upGrad - Customer Support Ticket Classification using NLP

Introduction

This project aims to automate customer support ticket classification for a financial company using NLP techniques. The objective is to classify unstructured customer complaints into predefined categories to streamline the support process. The categories include:

Credit card/Prepaid card
Bank account services
Theft/Dispute reporting
Mortgages/loans
Others

The notebook works through the following stages:

Data Reading and Understanding
Data Cleaning
Text Preprocessing
Exploratory Data Analysis (EDA)
Feature Extraction
Topic Modeling
Model Building
Best Model
Model Inference

The data is provided in JSON format and is loaded into a pandas DataFrame for further processing.
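
As a minimal sketch of the loading step, the snippet below flattens nested JSON records into a pandas DataFrame with `pd.json_normalize`. The record layout and the field names (`id`, `source`, `complaint_text`, `product`) are assumptions made for illustration; this README does not describe the file's actual schema.

```python
import json

import pandas as pd

# Hypothetical sample standing in for the project's JSON file; the real
# schema and field names are not given in this README.
raw = """
[
  {"id": "1", "source": {"complaint_text": "Card was charged twice", "product": "Credit card"}},
  {"id": "2", "source": {"complaint_text": "", "product": "Mortgage"}},
  {"id": "3", "source": {"complaint_text": "Account closed without notice", "product": "Checking account"}}
]
"""

records = json.loads(raw)

# json_normalize flattens nested objects into dotted column names,
# for example "source.complaint_text".
df = pd.json_normalize(records)

# Drop rows whose complaint text is empty, since they carry nothing to classify.
df = df[df["source.complaint_text"].str.strip() != ""].reset_index(drop=True)

print(df.shape)
print(df.columns.tolist())
```

For a file on disk, the same pattern would apply after reading it with `json.load` on an open file handle.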

Exploratory Data Analysis (EDA)

Word Cloud

Complaint Length

Topic Modeling

Topic	Count	Label	Top Words
0	5787	Bank account services	account, check, bank, chase, money, deposit, fund, close, tell, branch, open, say, chase bank, day, transaction, checking, transfer, claim, close account, number
1	6134	Credit card / Prepaid card	card, charge, credit card, credit, chase, dispute, purchase, use, receive, merchant, chase credit card, transaction, refund, service, chase credit card, company, fraud, tell, say, pay
2	4132	Mortgages / Loans	loan, mortgage, chase, home, modification, property, letter, send, document, year, foreclosure, request, time, sale, receive, house, rate, tell, pay, loan modification
3	2535	Theft / Dispute reporting	report, credit, inquiry, credit report, hard, remove, hard inquiry, inquiry credit, account, inquiry credit report, bureau, debt, credit bureau, reporting, score, card, information, hard inquiry credit, hard inquiry credit report, identity
4	2564	Others	payment, pay, late, make, fee, balance, make payment, month, late payment, late fee, statement, payment make, account, monthly, chase, credit, payment chase, time, day, auto
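
One way to carry this table into code is a small lookup from topic index to label. The sketch below copies the labels and counts from the table and computes each topic's share of the labeled complaints; `label_for` is a hypothetical helper, not a function from the notebook.

```python
# Topic index -> (label, complaint count), copied from the table above.
topics = {
    0: ("Bank account services", 5787),
    1: ("Credit card / Prepaid card", 6134),
    2: ("Mortgages / Loans", 4132),
    3: ("Theft / Dispute reporting", 2535),
    4: ("Others", 2564),
}

# Total number of complaints that received a topic.
total = sum(count for _, count in topics.values())


def label_for(topic_id: int) -> str:
    """Return the human-readable label for a topic index (hypothetical helper)."""
    return topics[topic_id][0]


# Print each topic with its count and share of the total.
for topic_id, (label, count) in topics.items():
    share = count / total
    print(f"{topic_id}  {label:<28} {count:>5}  {share:.1%}")
```

The counts sum to 21,152 complaints. The two largest topics each hold more than twice as many complaints as either of the two smallest, so the classes are moderately imbalanced.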

Model Performance Summary

The following models were trained to classify the customer complaints:

Logistic Regression
Random Forest
Naive Bayes
Decision Tree
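
The four model families above can be compared with one loop, as in the sketch below. It assumes scikit-learn, TF-IDF features with unigrams and bigrams, and the multinomial variant of Naive Bayes; none of these choices is confirmed by this README, and the toy complaints are invented for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy complaints, invented for illustration only (not the project's data).
texts = [
    "my credit card was charged twice by the merchant",
    "dispute a charge on my credit card",
    "refund for a card purchase never arrived",
    "the bank closed my checking account",
    "deposit missing from my bank account",
    "branch refused to open a savings account",
    "loan modification request was denied",
    "mortgage payment applied to the wrong month",
    "foreclosure letter sent for my home loan",
]
labels = ["card"] * 3 + ["account"] * 3 + ["loan"] * 3

# Assumed estimators and settings; the notebook's actual configuration is unknown.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit one vectorizer-plus-classifier pipeline per model.
fitted = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    pipe.fit(texts, labels)
    fitted[name] = pipe

# Inference on an unseen complaint.
new_complaint = ["I was charged a fee on my credit card"]
predictions = {name: pipe.predict(new_complaint)[0] for name, pipe in fitted.items()}
print(predictions)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on training text only, which matters once a held-out test split is introduced.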

Performance Overview

Logistic Regression outperformed the other models on accuracy, precision, recall, F1-score, and ROC AUC. Ranked by overall effectiveness, the models are:

Logistic Regression - Best performing model with the highest scores across accuracy, precision, recall, and F1-score.
Random Forest - Good performance, but roughly 7 to 8 percentage points below Logistic Regression on accuracy, precision, recall, and F1-score.
Naive Bayes - Moderate performance with a balance of precision and recall.
Decision Tree - Lowest performance on accuracy, precision, recall, F1-score, and ROC AUC. Decision trees are prone to overfitting, which likely explains the weaker generalization.

Model Performance Summary

Model	Accuracy	Precision	Recall	F1-score*	ROC AUC	Confidence
Logistic Regression	0.9618	0.9624	0.9618	0.9617	0.9987	0.8623
Random Forest	0.8849	0.8894	0.8849	0.8831	0.9899	0.6782
Naive Bayes	0.8705	0.8712	0.8705	0.8685	0.9830	0.7967
Decision Tree	0.8204	0.8204	0.8204	0.8203	0.8831	1.0000

*Rows are sorted by F1-score in descending order.
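
In every row of the table, Recall equals Accuracy. That is what support-weighted averaging produces (as with `average="weighted"` in scikit-learn), since weighting per-class recall by class frequency reduces to overall accuracy. The README does not state which averaging was used, so the sketch below is an illustration of that assumption, written in plain Python with invented labels.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class frequency in y_true
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall, f1


# Invented labels for illustration; not the project's data.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]

accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Under this averaging the weighted F1-score is not the harmonic mean of the weighted precision and recall, so it can fall slightly below both, as it does for Logistic Regression in the table.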

Conclusion

The Logistic Regression model was selected as the best-performing model, with an accuracy of 0.9618, an F1-score of 0.9617, and a ROC AUC of 0.9987. Future improvements could include hyperparameter tuning and exploring deep learning approaches.
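
As a rough sketch of what hyperparameter tuning could look like, the snippet below runs a small grid search over the regularization strength `C` of a Logistic Regression pipeline. It assumes scikit-learn and TF-IDF features, uses invented toy data, and picks an arbitrary grid; none of this comes from the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy complaints, invented for illustration only (not the project's data).
texts = [
    "credit card charged twice",
    "dispute a credit card charge",
    "card refund never arrived",
    "merchant charged my card again",
    "bank closed my checking account",
    "deposit missing from bank account",
    "cannot open a savings account",
    "account transfer was rejected",
]
labels = ["card"] * 4 + ["account"] * 4

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical search space; a real search would likely cover more settings.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

# Two folds only because the toy data set is tiny.
search = GridSearchCV(pipe, param_grid, cv=2, scoring="f1_weighted")
search.fit(texts, labels)

print(search.best_params_)
```

Scoring with `f1_weighted` keeps the search aligned with the F1-score used to rank the models, assuming that column was computed with weighted averaging.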

Requirements

Install the required dependencies using:

pip install -r requirements.txt
