Categorizing and Segmenting User SMS Data Using NLP Techniques

About the Client
The Business Challenge
The challenge was to segregate financial information, such as income and expenses, that is spread across multiple platforms. Users lacked a clear understanding of how they had spent during the month, for example the amounts spent on Shopping, Travel, Entertainment, EMI, Insurance, and more.
This required extracting key features from each SMS, identifying the merchant name, and mapping it to a category.
The solution also needed to give visibility into an individual's spending by:
- Category: Shopping, Travel, Entertainment, Bills, Food & Drinks, Groceries, Health, Fuel, Investment, etc.
- Mode of payment: credit card, UPI, bank transfer, etc.
- Bank name: SBI, HDFC, ICICI, etc.
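To make these three views concrete, here is a minimal sketch of pulling the amount, bank name, and mode of payment out of a single SMS with regular expressions. The lookup tables, patterns, and the `extract_fields` helper are illustrative assumptions, not the production rules; real bank SMS formats vary widely.

```python
import re

# Hypothetical lookup tables; the real lists are not given in this case study.
BANKS = ("SBI", "HDFC", "ICICI")
MODES = {
    "Credit card": re.compile(r"\bcredit card\b", re.I),
    "UPI": re.compile(r"\bUPI\b", re.I),
    "Bank transfer": re.compile(r"\b(?:NEFT|IMPS|RTGS)\b", re.I),
}
# Assumed amount format: "Rs." or "INR" followed by digits with optional commas.
AMOUNT = re.compile(r"(?:Rs\.?|INR)\s*([\d,]+(?:\.\d{1,2})?)", re.I)


def extract_fields(sms: str) -> dict:
    """Return the amount, bank, and payment mode found in one SMS (None if absent)."""
    amount_match = AMOUNT.search(sms)
    amount = float(amount_match.group(1).replace(",", "")) if amount_match else None
    bank = next((b for b in BANKS if re.search(rf"\b{b}\b", sms, re.I)), None)
    mode = next((name for name, pat in MODES.items() if pat.search(sms)), None)
    return {"amount": amount, "bank": bank, "mode": mode}
```

Calling `extract_fields` on a message such as "Rs.1,250.00 debited from your HDFC Bank a/c via UPI" would fill all three fields; messages that match none of the patterns simply return `None` values.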
What Aptus Data Labs Did
This is a summary of what we delivered:
- The user's bank transaction SMS and payment-due SMS are fetched via an Android application. The SMS reader is enabled only after the user provides consent for reading their SMS.
- A master data warehouse was created to store the classified bank SMS data for display to the user.
- Income and expense data is displayed to the user via the mobile app, and the data of all users of a particular organization is displayed on a dashboard for the admin to analyze and evaluate. The previous one year of SMS data is also shown, with categorical, bank-wise, and investment-wise classification.
- We integrated Kafka as a message broker, which avoids the cost of expensive database transactions. It also significantly reduces the time required to fetch, classify, and display the SMS.
- AI/ML algorithms read all the bank SMS available on the user's mobile since inception, then classify and store each SMS by category, along with the transaction type, date, bank name, and merchant name, for every user.
- We implemented an Access Control List (ACL) that handles all the permissions required for read/write operations. This makes it easier for the admin to manage clients and the access granted to them.
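The Access Control List idea can be illustrated with a small sketch. This is only an assumption about how role-based read/write permissions might be modelled; the class name, role names, and resource names are hypothetical and do not describe the delivered system.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControlList:
    """Maps role -> resource -> set of allowed actions (e.g. "read", "write")."""

    rules: dict = field(default_factory=dict)

    def grant(self, role: str, resource: str, *actions: str) -> None:
        # Create the nested entries on first use, then add the actions.
        self.rules.setdefault(role, {}).setdefault(resource, set()).update(actions)

    def is_allowed(self, role: str, resource: str, action: str) -> bool:
        # Anything not explicitly granted is denied.
        return action in self.rules.get(role, {}).get(resource, set())


# Hypothetical roles and resources, for illustration only.
acl = AccessControlList()
acl.grant("admin", "clients", "read", "write")
acl.grant("analyst", "clients", "read")
```

With these grants, `acl.is_allowed("analyst", "clients", "write")` is `False`, which mirrors the deny-by-default behaviour an admin would typically expect.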
The Impact Aptus Data Labs Made
This is the impact we created for the backend and the ML engine:
- The mobile app displays the user's income and expenses, the categorical classification, the merchant-wise classification, and the total sum of transactions within the selected date range.
- A pie chart of categorical expenses and the transaction amounts are also displayed, making the transactions easier to interpret.
- A payment reminders tab reminds the user of pending dues, along with the due date and a payment link.
- The dashboard lets the admin manage (create, update, delete) clients, each client's users, and their roles, in addition to viewing the classified SMS data.
- Users who are granted access can handle only the processes assigned to them.
- Admins can view category-wise graphs and the monthly expenditure in each category across all users. The dashboard also shows the user count, client count, SMS count, pull count, and the total income and expense volume. Category-wise expenses, along with the SMS count and the percentage each category contributes, are derived for further analysis and evaluation by the client. Data for the previous year is generated by default for the total number of SMS, investments, banks, and modes of payment. These insights can be filtered by date and downloaded as an Excel sheet.
- An SDK was built for integration with other products; it behaves similarly to the Android app.
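The category-wise figures described above (expense amount, SMS count, and percentage contribution) can be sketched as a simple aggregation. The `summarize_by_category` helper and the record layout with `category` and `amount` keys are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_by_category(transactions):
    """Aggregate classified transactions into per-category totals, counts, and shares."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for txn in transactions:
        totals[txn["category"]] += txn["amount"]
        counts[txn["category"]] += 1

    grand_total = sum(totals.values())
    return {
        category: {
            "amount": totals[category],
            "sms_count": counts[category],
            # Percentage of total spend; 0.0 when there is nothing to divide by.
            "share_pct": round(100 * totals[category] / grand_total, 2) if grand_total else 0.0,
        }
        for category in totals
    }
```

Each input record is assumed to represent one classified SMS, so the per-category count doubles as the SMS count shown on the dashboard.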
The Business and Technology Approach
- We followed a methodical process to execute the project and address the challenge described above.
- For the backend we used Golang, which provides fast execution and handles concurrency well.
- We used a relational DBMS instead of a NoSQL database because the classified expense message data has a consistent structure. An RDBMS also offers strong support for querying data and performing joins, which the admin and user views require.
- Python is used to build the ML engine: pandas, NumPy, regex, NLTK, and spaCy are used to clean and preprocess the data, and python-crfsuite and sklearn are used for modelling.
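Since the ML engine relies on a CRF library for modelling, the sketch below shows the kind of per-token feature dictionaries that CRF sequence taggers are commonly trained on. The specific features and the whitespace tokenization are assumptions; the features actually used in the project are not described here.

```python
def token_features(tokens, i):
    """Build a feature dict for the token at position i, with neighbour context."""
    token = tokens[i]
    features = {
        "lower": token.lower(),
        "is_upper": token.isupper(),   # merchant names are often written in capitals
        "is_title": token.istitle(),
        "is_digit": token.isdigit(),
        "suffix3": token[-3:].lower(),
    }
    if i > 0:
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of the message
    if i < len(tokens) - 1:
        features["next_lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of the message
    return features


def sms_to_features(sms):
    """Whitespace-tokenize one SMS and return one feature dict per token."""
    tokens = sms.split()
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A list of such feature sequences, paired with per-token labels (for example a merchant tag versus "other"), is the usual training input for a CRF tagger.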
Architecture
Backend
- The Expense Category Classification App is implemented as per the architecture below. At a high level there are three components: Web App, Android App, and SDK.
- First, the web app user goes through the middleware/authentication layer and creates a client profile. A unique secret key is then created for each organization and given to the client. This secret key is used to create a token for the client's respective users in the SDK. Android app users sign up via OTP or Google authentication, whereas the SDK is authenticated via the secret key and the token. After passing the authentication layer/middleware, they get access to the API service. Their messages are fetched via the SMS Receiver Service (Kafka) and classified by the ML engine to produce structured SMS data. This data is stored in the database and is visible to the Android users as well as in the web app portal. The web app's user interface enables the admin to view the classified data and its analytics.
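One way a per-organization secret key could be used to create and verify user tokens is an HMAC signature, sketched below. This is an assumption for illustration: the token scheme actually used is not stated, the function names are hypothetical, and the sketch is written in Python rather than the Golang used for the real backend.

```python
import hashlib
import hmac
import secrets


def create_client_secret() -> str:
    """Generate a random secret key for one organization."""
    return secrets.token_hex(32)


def issue_user_token(client_secret: str, user_id: str) -> str:
    """Sign the user id with the organization's secret to form a token."""
    signature = hmac.new(client_secret.encode(), user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{signature}"


def verify_user_token(client_secret: str, token: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    user_id, _, signature = token.rpartition(".")
    expected = hmac.new(client_secret.encode(), user_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, expected)
```

A token issued under one organization's secret fails verification under any other secret, which is the property that keeps each client's SDK users separate.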

ML Engine
- The raw messages are cleaned and classified into transaction, non-transaction, and payment reminder messages. Named entity recognition (NER), a natural language processing (NLP) technique that automatically identifies named entities in text, is used to predict the merchant name from each SMS. For payment reminder SMS, NER predicts both the merchant name and the due date. For transaction SMS, the predicted merchant name is mapped to a predefined category. The predefined category list consists of ATM, Bill, Crypto, E Commerce, Entertainment, Education, Food & Drinks, Food Delivery, Fuel, Groceries, Health, Home Service, Income, Insurance, Investment, Loan, Recharge, Rent, ITR, Retail, Salary, Travel, Wallet and Unknown.
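The two steps around NER, deciding the message type and mapping a predicted merchant to a category, can be sketched as follows. The keyword rules here are a simple stand-in for the trained classifier, and the merchant table is hypothetical; only the category names and the "Unknown" fallback come from the list above.

```python
import re

# Hypothetical merchant-to-category table; values come from the predefined category list.
MERCHANT_CATEGORIES = {
    "swiggy": "Food Delivery",
    "amazon": "E Commerce",
    "indian oil": "Fuel",
}

# Keyword rules standing in for the trained message-type classifier (an assumption).
REMINDER = re.compile(r"\b(?:due on|payment due|due date)\b", re.I)
TRANSACTION = re.compile(r"\b(?:debited|credited|spent|paid|received)\b", re.I)


def message_type(sms: str) -> str:
    """Label an SMS as payment reminder, transaction, or non-transaction."""
    if REMINDER.search(sms):
        return "payment reminder"
    if TRANSACTION.search(sms):
        return "transaction"
    return "non-transaction"


def category_for(merchant: str) -> str:
    """Map a merchant name (as predicted by NER) to a category, defaulting to Unknown."""
    return MERCHANT_CATEGORIES.get(merchant.strip().lower(), "Unknown")
```

Normalizing the merchant name before lookup keeps the table small, and the "Unknown" default ensures that unseen merchants still receive a category.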
