Scientific Document Extraction

Helped a leading pharmaceutical company digitize its “PlaceHolder Document Creation” process, reducing processing time and manpower costs


About the Client

The client is an Indian multinational pharmaceutical company. It had recently acquired another leading pharmaceutical company and was undergoing a major digitization of its existing documentation and related processes.

The Business Challenge

The client had thousands of medicine preparation templates. The values in their blank spaces were filled in manually, which took a long time and was prone to human error.
As part of its Quality Systems initiative (a program under which all existing documents are digitized and end users are given an interface to work with them), the client wished to automate this template-filling process.

What Aptus Data Labs Did

We developed two Named Entity Recognition (NER) models: a traditional model based on part-of-speech (POS) tags, and a second model based on an LSTM (long short-term memory) network. Both models detected the relevant chemical entities on either side of each blank space in a document, and the blanks were then replaced with that text.
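The blank-detection and feature-extraction steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Aptus Data Labs' implementation: it assumes blanks are marked by runs of three or more underscores and that POS tags are supplied by an upstream tagger. The function names `find_blanks`, `context_tokens`, `token_features` and `fill_blanks`, the blank pattern, the window size and the sample template are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumption (hypothetical): a blank in a template is a run of 3+ underscores.
BLANK_PATTERN = re.compile(r"_{3,}")


def find_blanks(text: str) -> List[Tuple[int, int]]:
    """Return the (start, end) character offsets of every blank in `text`."""
    return [match.span() for match in BLANK_PATTERN.finditer(text)]


def context_tokens(
    text: str, span: Tuple[int, int], window: int = 3
) -> Tuple[List[str], List[str]]:
    """Return up to `window` whitespace tokens to the left and right of a blank."""
    if window < 1:
        raise ValueError("window must be at least 1")
    start, end = span
    left = text[:start].split()[-window:]
    right = text[end:].split()[:window]
    return left, right


def token_features(tokens: List[str], pos_tags: List[str], i: int) -> Dict[str, object]:
    """Hand-crafted features for token i: word shape plus its own and neighbouring POS tags."""
    word = tokens[i]
    features: Dict[str, object] = {
        "word.lower": word.lower(),
        "word.suffix3": word[-3:],  # suffixes tend to be informative for chemical names
        "word.istitle": word.istitle(),
        "word.hasdigit": any(ch.isdigit() for ch in word),
        "pos": pos_tags[i],
    }
    if i > 0:
        features["prev.word.lower"] = tokens[i - 1].lower()
        features["prev.pos"] = pos_tags[i - 1]
    else:
        features["BOS"] = True  # token starts the sequence
    if i < len(tokens) - 1:
        features["next.word.lower"] = tokens[i + 1].lower()
        features["next.pos"] = pos_tags[i + 1]
    else:
        features["EOS"] = True  # token ends the sequence
    return features


def fill_blanks(text: str, values: List[str]) -> str:
    """Replace blanks left to right with `values`; the counts must match."""
    if len(values) != len(find_blanks(text)):
        raise ValueError("number of values must equal number of blanks")
    remaining = iter(values)
    # A function replacement inserts each value literally (no backslash escapes).
    return BLANK_PATTERN.sub(lambda _match: next(remaining), text)


if __name__ == "__main__":
    # Hypothetical template line, invented for illustration.
    template = "Weigh ____ mg of Paracetamol and dissolve in ____ mL of purified water."
    blanks = find_blanks(template)
    print(context_tokens(template, blanks[0], window=2))  # (['Weigh'], ['mg', 'of'])
    print(fill_blanks(template, ["500", "100"]))
```

In a traditional setup, one feature dictionary per token would be passed to a sequence classifier such as a conditional random field (an assumption; the case study does not name the classifier). An LSTM tagger, by contrast, learns its own representations from the token sequence instead of relying on hand-crafted features like these.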

The Business Impact Aptus Data Labs Made

With the entire manual process automated, manual effort was reduced to just 5% of its original level, and human-resource costs fell substantially as well.

The solution was deployed on RapidMiner to provide a seamless interface for running the process on hundreds of documents. As a result, the time for the entire process was reduced from hours to minutes.

The Business and Technology Approach

Aptus Data Labs used the following methodology to automate the process and resolve the client's challenge. Aptus Data Labs:

Tools used

Python, Gensim, Keras, PowerShell scripts and RapidMiner.

The Outcome
