Data, Process, Batch Jobs and Workflow Migration to Hadoop Platform

Helped a public sector organization migrate large volumes of data to the Hadoop platform


About the Client

The client is a public sector organization based in Australia. Its primary focus is e-governance and administration services, where it is responsible for delivering government services to customers, businesses, and other government organizations. In providing these services, the client regularly needs to process large volumes of data drawn from application forms submitted through Information and Communication Technology (ICT) systems.

The Business Challenge

The client wanted to process large batches of data quickly, with up to 15 million transactions per day. The constant growth of e-governance projects generated a large amount of data that became historical every day, and an additional 8 TB of data was being added every month. The client wanted to manage these volumes and run analytical processing without performance issues.

What Aptus Data Labs Did

We migrated the client’s database to the Hadoop platform, running in a multi-node cluster environment, to process massive volumes of data. We used commodity hardware to reduce costs and distributed the processing using Hive and MapReduce. We also set up the platform so that only hot data (the most recent 3 months) stays in Teradata, while cold data (older than 3 months) is held in the Hadoop environment.
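The hot/cold split described above can be illustrated with a minimal sketch. It assumes a record is routed purely by its date, that "3 months" means calendar months, and that a record exactly on the cutoff counts as hot. The function names `months_back` and `storage_tier` are hypothetical; the case study does not describe how the platform actually performs this routing.

```python
import calendar
from datetime import date

# From the case study: hot data covers the most recent 3 months.
HOT_RETENTION_MONTHS = 3


def months_back(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped to the last valid day of the target month
    (for example, 31 May minus 3 months gives the last day of February).
    """
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def storage_tier(record_date: date, today: date) -> str:
    """Classify a record as hot (Teradata) or cold (Hadoop) by its age.

    Assumption: records dated on or after the cutoff are treated as hot.
    """
    cutoff = months_back(today, HOT_RETENTION_MONTHS)
    return "teradata" if record_date >= cutoff else "hadoop"


if __name__ == "__main__":
    today = date(2024, 5, 31)
    for d in (date(2024, 5, 1), date(2024, 2, 29), date(2024, 2, 28)):
        print(d.isoformat(), "->", storage_tier(d, today))
```

In practice a rule like this would typically run as part of a scheduled batch job that moves records past the cutoff from the hot store to the cold store; the sketch only shows the classification step.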

The Impact Aptus Data Labs Made

The new analytics platform reduced IT costs significantly. It also helped the client handle large volumes of data without performance degradation.

The Business and Technology Approach

Aptus Data Labs used the process below to migrate the data and resolve the existing challenge.

Workflow

The figure below shows the high-level components of the migration:

[Figure: Data, process, batch jobs and workflow migration]

Tools used

Hadoop

HDFS and Programming

The Outcome

The migrated Hadoop platform reduced IT costs significantly: because Hadoop is open source, there was no license cost. Running on commodity hardware lowered both the initial investment and the recurring costs of handling large volumes of data. Because the platform was set up as a cluster, new nodes could be added to handle the ever-growing volume of data. It also enabled the client to process massive volumes of data without performance degradation.
