Data Science and Analytics Projects Portfolio

Classifying states based on Indian Youth Tobacco Survey

In this notebook, my analysis shows that north-eastern states have a high proportion of tobacco users among students. These states also see little effect from promotional efforts to encourage or discourage tobacco consumption. The analysis uses EDA and K-means clustering. What are the reasons for this phenomenon? Check my Python notebook to find out!

Access the Kaggle notebook here: https://shorturl.at/gpwG3

Customer Churn EDA & Prediction using RandomForestClassifier and Recall

Accuracy is not always the best metric for evaluating a model. Depending on the case, we also need to look at recall, precision, and specificity. Sometimes a prediction model may turn out to be unusable while simple EDA still generates crucial insights. In this project, I look at a dataset of telecom users and their churn and derive actionable insights. I then set up and refine a model that predicts whether a customer of a telecom service provider will switch to a competitor, using RandomForestClassifier with recall as the optimization metric.

Access the ipynb file here: https://shorturl.at/FUVZ1
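To illustrate why accuracy alone can mislead on churn data, here is a minimal standard-library sketch. It is not the notebook's code: the labels are made up, and the helper names `confusion_counts` and `summarize` are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Return accuracy, recall, precision, and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Made-up labels: 1 = churned, 0 = stayed. Only 2 of 10 customers churn.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A "lazy" model that predicts "stayed" for every customer.
y_pred_lazy = [0] * 10

metrics = summarize(y_true, y_pred_lazy)
print(metrics)
```

The lazy model looks acceptable on accuracy yet finds none of the churners, which is the kind of failure that optimizing for recall is meant to expose.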

Credit Card Fraud detection – with SMOTE

Here, I use a credit card transactions dataset and try various supervised ML algorithms to predict whether a transaction is fraudulent. I use SMOTE (Synthetic Minority Over-sampling Technique) to address the class imbalance in the target variable.

https://shorturl.at/qrM89
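The core idea of SMOTE is to create synthetic minority-class rows by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows only that idea with the standard library; the function name `smote_sketch`, the data, and the parameters are hypothetical, and in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up minority-class rows (two scaled features of fraudulent transactions).
fraud_rows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_rows = smote_sketch(fraud_rows, n_new=6)
print(new_rows)
```

Oversampling of this kind should be applied to the training split only, so that the test set keeps the original class balance.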

Silicon Valley Bank Crisis, Government intervention, and Public Sentiments

This notebook uses NLP and AI to assess the impact of government intervention on public perception during the Silicon Valley Bank crisis in the USA, using 279,000+ tweets. My analysis shows a statistically significant improvement in sentiment after the government's policy interventions.

https://shorturl.at/dfL78
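One way to check whether a change in average sentiment is statistically significant is a two-sample z-test on the sentiment scores before and after the intervention. The description above does not say which test the notebook uses, so this is an assumption, and the scores and the function name `two_sample_z_test` are made up for illustration.

```python
from statistics import NormalDist, mean, variance


def two_sample_z_test(before, after):
    """One-sided test of mean(after) > mean(before), using a normal approximation."""
    standard_error = (
        variance(before) / len(before) + variance(after) / len(after)
    ) ** 0.5
    z = (mean(after) - mean(before)) / standard_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Made-up sentiment scores in [-1, 1]; real data would be one score per tweet.
before = [-0.6, -0.4, -0.5, -0.3, -0.7, -0.5]
after = [-0.1, 0.0, 0.1, -0.2, 0.2, 0.0]

z, p_value = two_sample_z_test(before, after)
print(f"z = {z:.2f}, p = {p_value:.2g}")
```

The sample here is tiny and only illustrates the mechanics; the normal approximation is reasonable for large samples such as a corpus of hundreds of thousands of tweets.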

Online Retail Store analysis – Customer Segmentation – Kmeans

Here, I analyze the order data of an online retail store with the objective of segmenting customers using the unsupervised K-means algorithm.

https://shorturl.at/ACKXZ
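For readers unfamiliar with K-means, the sketch below shows the basic loop (assign each point to its nearest centroid, then recompute the centroids) on made-up customer features. It is a standard-library illustration, not the notebook's code; the features, starting centroids, and the function name `kmeans` are assumptions.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then update."""
    centroids = list(centroids)
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up customers: (number of orders, average order value), already scaled.
customers = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 8), (9, 10)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
print(centroids)
```

With real order data, features on different scales (for example order counts and monetary values) are usually standardized first, since K-means relies on Euclidean distance.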

HR Analytics and attrition using EDA, Random Forests and GridSearchCV

This project analyses an HR dataset to detect factors that contribute to employee attrition.

Later in the same project file, I also try to predict whether an employee will leave the employer, using a random forest model, and perform hyperparameter tuning on it.

https://shorturl.at/wzINQ

Courier Charges Reconciliation

Here, I analyze the data of an online retail business (referred to as X) and its courier partner to reconcile the courier partner's invoices and check them for mistakes. By the end, I create an order-wise report of the difference between the charges billed by the courier partner and the correct charges. I also generate a summary of how much the company has been overcharged or undercharged by the courier partner.

This helps company X make sure that its courier partner is not overcharging it for its services.

Link to the Python notebook: https://shorturl.at/mrzAT
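The reconciliation logic described above can be sketched roughly as follows. This is a simplified standard-library illustration, not the notebook's code: the order IDs, amounts, and the function name `reconcile` are hypothetical, and the expected charges are assumed to have been computed already.

```python
def reconcile(billed, expected):
    """Build an order-wise report of billed vs. expected courier charges."""
    report = []
    for order_id in sorted(billed):
        difference = billed[order_id] - expected[order_id]
        report.append(
            {
                "order_id": order_id,
                "billed": billed[order_id],
                "expected": expected[order_id],
                "difference": difference,  # positive means overcharged
            }
        )
    return report


# Made-up charges per order: what the courier billed vs. what was expected.
billed = {"A1": 120.0, "A2": 90.0, "A3": 60.0}
expected = {"A1": 100.0, "A2": 90.0, "A3": 75.0}

report = reconcile(billed, expected)
overcharged = sum(r["difference"] for r in report if r["difference"] > 0)
undercharged = -sum(r["difference"] for r in report if r["difference"] < 0)

for row in report:
    print(row)
print("Total overcharged:", overcharged)
print("Total undercharged:", undercharged)
```

Keeping the overcharged and undercharged totals separate, rather than netting them, makes it easier to see errors in both directions.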
