Data Digitization and Metadata Automation

Equipped a global specialist insurance market to mine text accurately and improve business performance


About the Client

The client is a UK-based corporate body engaged in general insurance and reinsurance underwriting. With roots in marine insurance, the client underwrites many types of risk, including marine, property, aviation, energy, casualty, and motor. Known as a specialist insurance market, the client has 54 agencies maintaining 80 syndicates and an established presence worldwide, especially in Europe and North America.

The Business Challenge

The client wanted to extract text from standard (editable) PDF files and scanned image documents into an Excel sheet, for use as data entry inputs to an Open-Twin application. The prerequisites for this case were:

What Aptus Data Labs Did

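To resolve this business challenge, our team first used PyPDF2 to distinguish editable PDF files from scanned ones. We then built a machine learning (ML) model based on conditional random fields (CRF) and hidden Markov models (HMM) for named entity recognition (NER) tagging, so that text could be extracted as the client required.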
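The first step, telling editable PDFs from scanned ones with PyPDF2, can be sketched as follows. This is a minimal illustration under an assumed heuristic, not necessarily the rule used in this project: a page counts as text-bearing if `extract_text()` returns at least a threshold number of non-whitespace characters, and a file is treated as editable when at least half of its pages qualify. The function names and both thresholds are hypothetical; only `PdfReader`, `.pages`, and `extract_text()` come from PyPDF2.

```python
from typing import Iterable, Optional

# Hypothetical thresholds for illustration; tune them on real documents.
MIN_CHARS_PER_PAGE = 20
MIN_TEXT_PAGE_RATIO = 0.5


def classify_page_texts(
    page_texts: Iterable[Optional[str]],
    min_chars: int = MIN_CHARS_PER_PAGE,
    min_ratio: float = MIN_TEXT_PAGE_RATIO,
) -> str:
    """Return "editable" if enough pages carry a text layer, else "scanned"."""
    texts = list(page_texts)
    if not texts:
        raise ValueError("document has no pages")
    # Count only non-whitespace characters so blank or near-blank layers don't qualify.
    text_pages = sum(1 for t in texts if len("".join((t or "").split())) >= min_chars)
    return "editable" if text_pages / len(texts) >= min_ratio else "scanned"


def classify_pdf(path: str) -> str:
    """Classify a PDF file on disk using PyPDF2's text extraction."""
    # Imported here so classify_page_texts() can run without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # Scanned pages are images, so extract_text() finds little or no text on them.
    return classify_page_texts(page.extract_text() for page in reader.pages)
```

Files classified as scanned have no usable text layer, so they would need an OCR step before any tagging. Separating the pure decision rule from the file I/O keeps the heuristic easy to test and adjust.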
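The NER tagging stage can be illustrated with the HMM half of the CRF/HMM approach named above. The sketch below trains a bigram HMM by counting tag transitions and word emissions on a tiny hand-labeled corpus (with add-one smoothing), then decodes new text with the Viterbi algorithm. The class name, the BIO tag set (`ORG`, `MONEY`), and the toy corpus are hypothetical; a real model would be trained on labeled client documents, and a CRF variant would typically be built with a library such as sklearn-crfsuite.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# A sentence is a list of (token, tag) pairs using BIO tags.
Sentence = List[Tuple[str, str]]

# Hypothetical hand-labeled training data, for illustration only.
TOY_CORPUS: List[Sentence] = [
    [("Insured", "O"), (":", "O"), ("Acme", "B-ORG"), ("Shipping", "I-ORG"), ("Ltd", "I-ORG")],
    [("Premium", "O"), ("GBP", "B-MONEY"), ("12,500", "I-MONEY")],
    [("Insured", "O"), (":", "O"), ("Nordic", "B-ORG"), ("Marine", "I-ORG"), ("Ltd", "I-ORG")],
    [("Premium", "O"), ("USD", "B-MONEY"), ("8,000", "I-MONEY")],
]


class HmmTagger:
    """Bigram HMM tagger: counting for training, Viterbi for decoding."""

    def __init__(self, smoothing: float = 1.0) -> None:
        self.smoothing = smoothing
        self.tags: List[str] = []
        self.start: Counter = Counter()
        self.trans: Dict[str, Counter] = defaultdict(Counter)
        self.emit: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, corpus: Sequence[Sentence]) -> "HmmTagger":
        for sentence in corpus:
            prev = None
            for token, tag in sentence:
                word = token.lower()
                self.vocab.add(word)
                self.emit[tag][word] += 1
                if prev is None:
                    self.start[tag] += 1
                else:
                    self.trans[prev][tag] += 1
                prev = tag
        self.tags = sorted(self.emit)
        return self

    def _log_prob(self, counts: Counter, key: str, n_outcomes: int) -> float:
        # Add-k smoothing keeps unseen transitions and words from scoring -inf.
        total = sum(counts.values())
        return math.log((counts[key] + self.smoothing) / (total + self.smoothing * n_outcomes))

    def predict(self, tokens: Sequence[str]) -> List[str]:
        if not tokens:
            return []
        n_tags = len(self.tags)
        n_words = len(self.vocab) + 1  # one extra slot for unknown words
        words = [t.lower() for t in tokens]
        # best[i][tag]: log-score of the best tag path ending in `tag` at position i.
        best = [{
            t: self._log_prob(self.start, t, n_tags) + self._log_prob(self.emit[t], words[0], n_words)
            for t in self.tags
        }]
        back: List[Dict[str, str]] = [{}]
        for i in range(1, len(words)):
            best.append({})
            back.append({})
            for t in self.tags:
                # Score every possible previous tag and keep the best one.
                scores = {p: best[i - 1][p] + self._log_prob(self.trans[p], t, n_tags) for p in self.tags}
                prev_tag = max(scores, key=scores.get)
                best[i][t] = scores[prev_tag] + self._log_prob(self.emit[t], words[i], n_words)
                back[i][t] = prev_tag
        # Follow back-pointers from the highest-scoring final tag.
        tag = max(best[-1], key=best[-1].get)
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = back[i][tag]
            path.append(tag)
        return path[::-1]


if __name__ == "__main__":
    tagger = HmmTagger().fit(TOY_CORPUS)
    print(tagger.predict("Insured : Acme Marine Ltd".split()))
```

The phrase "Acme Marine Ltd" never appears in the toy corpus, but its individual words and the learned transitions (`B-ORG` followed by `I-ORG`) are enough for the tagger to label it as one organization: `O, O, B-ORG, I-ORG, I-ORG`. Working in log space avoids numeric underflow on long documents.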

The Business Impact Aptus Data Labs Made

Our ML-based text mining solution helped the client to:

The Business and Technology Approach

Aptus Data Labs performed the following steps to address this business challenge.



Tools Used

The Outcome

The client was able to quickly extract the required text from several editable and scanned PDF files. Because the processed data was delivered in an Excel sheet as the prerequisites specified, the client could use the output directly as input for data entry. With the new ML-based text mining application, the client improved business performance and saved time and money.
