Data Science Portfolio
Here are some of my best data science projects. I have explored various deep learning and machine learning algorithms on different datasets. Feel free to contact me to learn more about my experience working on these projects.
Data Science and Machine Learning Capstone Project by IBM

Data: (1) data source API
(2) data source web scraping
Skills used: python, API request, web scraping, numpy, pandas, matplotlib, seaborn, plotly, folium, sklearn
Project Objective: To predict whether the first stage of a Falcon 9 rocket lands successfully
Quantifiable result: The models predicted the landing outcome with 83% accuracy.
- Used various techniques to clean the dataset and make it ready
- EDA was performed
- Logistic regression, decision tree, KNN and SVM models were trained, with hyperparameters tuned by grid search (GridSearchCV)
- All models reached 83% accuracy
- Models were evaluated with confusion matrices, accuracy and F1 scores
- Tested the models on entirely new data that they had not seen before
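The tuning and evaluation steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data from `make_classification`, not the launch dataset; the parameter grids, the 80/20 split and the 5-fold cross-validation are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch features (NOT the project's data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Each entry: (estimator, illustrative parameter grid). Grid values are assumptions.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"logisticregression__C": [0.01, 0.1, 1.0]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"decisiontreeclassifier__max_depth": [2, 4, 8]},
    ),
    "knn": (
        KNeighborsClassifier(),
        {"kneighborsclassifier__n_neighbors": [3, 5, 9]},
    ),
    "svm": (
        SVC(),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is re-fitted on each CV fold.
    pipeline = make_pipeline(StandardScaler(), estimator)
    search = GridSearchCV(pipeline, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)  # held-out data the search never saw
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, metrics["best_params"], round(metrics["accuracy"], 3))
```

Keeping the scaler inside the pipeline is a design choice that avoids leaking test-fold statistics into the cross-validation scores.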
Biodiesel Production Optimization

Data: Lab generated
Skills used: excel, ANOVA, matplotlib, seaborn, design expert
Project Objective: To determine the optimum biodiesel production conditions
Quantifiable result: Achieved an improved reaction conversion rate of 59% and 95% accuracy using degree-2 polynomial regression.
- Data was generated in the lab for the training, test and validation sets
- EDA was performed
- Polynomial regression with degree 2 was used
- 95% accuracy was achieved
- Model was evaluated using R squared and the sum of errors
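A degree-2 fit and the way it yields an optimum can be sketched as below. The process variable, the response values and the deterministic "wiggle" standing in for measurement noise are all invented for illustration; they are not the lab data, and a single input variable is assumed for simplicity.

```python
import numpy as np

# Hypothetical process variable (e.g. reaction temperature) and a made-up
# conversion response; these are NOT the lab measurements.
x = np.linspace(40.0, 70.0, 13)
wiggle = 0.5 * (-1.0) ** np.arange(x.size)  # deterministic stand-in for noise
y = -0.1 * (x - 55.0) ** 2 + 80.0 + wiggle

# Least-squares fit of y = a*x^2 + b*x + c.
a, b, c = np.polyfit(x, y, deg=2)
y_hat = np.polyval([a, b, c], x)

sse = float(np.sum((y - y_hat) ** 2))     # sum of squared errors
sst = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
r_squared = 1.0 - sse / sst

# For a concave parabola (a < 0) the fitted optimum sits at the vertex.
x_opt = -b / (2.0 * a)
y_opt = float(np.polyval([a, b, c], x_opt))
print(f"R^2={r_squared:.3f}  SSE={sse:.2f}  optimum x={x_opt:.1f}")
```

With several process variables the same idea applies, but the optimum comes from solving the gradient of the fitted quadratic surface rather than a single vertex formula.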
Plant-Disease-Prediction

Data: data
Skills used: Python, Numpy, Matplotlib, Seaborn, TensorFlow, Neural Networks, Keras
Project Objective: To predict the disease type
Quantifiable result: The model predicted diseases with 97% accuracy using CNN.
- Used various techniques to clean the dataset and make it ready
- EDA was performed
- A CNN model was built
- 97% accuracy was reached after 30 epochs
- Model was evaluated using accuracy
- Tested the model on entirely new data that it had not seen before
Amazon Fine Food Reviews Analysis (with SMOTE)

Data: data
Skills used: Python, Numpy, Matplotlib, Seaborn, nltk, re, scikitplot, SKlearn
Project Objective: To classify customer reviews as positive or negative
Quantifiable result: The model identified positive and negative reviews with 96% accuracy using a BoW model and 94% using a TF-IDF model, both with Naive Bayes.
- Used various techniques to clean the dataset and make it ready
- EDA was performed
- SMOTE was performed
- BoW model was built with a Naive Bayes classifier
- 97% accuracy was found
- TF-IDF model was built with a Naive Bayes classifier
- 94% accuracy was achieved
- Tested the models on entirely new data that they had not seen before
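The two text models above differ only in how reviews are turned into features. A minimal sketch, using a tiny invented corpus rather than the Amazon reviews and scikit-learn's default vectorizer settings (an assumption), might look like this. SMOTE is left out because it lives in a separate package (imbalanced-learn).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus (NOT the Amazon reviews); 1 = positive, 0 = negative.
train_texts = [
    "great taste love it",
    "delicious and fresh",
    "love this product great value",
    "excellent flavor highly recommend",
    "terrible taste waste of money",
    "stale and awful",
    "bad product would not recommend",
    "awful flavor very disappointed",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Bag of words: raw token counts per review.
bow_model = make_pipeline(CountVectorizer(), MultinomialNB())
# TF-IDF: counts reweighted so that words common to many reviews matter less.
tfidf_model = make_pipeline(TfidfVectorizer(), MultinomialNB())

bow_model.fit(train_texts, train_labels)
tfidf_model.fit(train_texts, train_labels)

# Reviews the models have not seen during training.
new_reviews = ["great delicious love", "awful terrible stale"]
bow_pred = bow_model.predict(new_reviews).tolist()
tfidf_pred = tfidf_model.predict(new_reviews).tolist()
print("BoW:", bow_pred, "TF-IDF:", tfidf_pred)
```

Wrapping the vectorizer and classifier in one pipeline means new reviews go through exactly the same vocabulary that was learned from the training text.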
Suicide Rate Prediction

Data: data
Skills used: Python, Pandas, SKlearn, Matplotlib, Seaborn, Pycountry, Gradio
Project Objective: To find features correlated with increased suicide rates across countries and socio-economic groups, and to make predictions for the future
Quantifiable result: The number of suicide cases per year for each country was predicted with 99% accuracy using a Random Forest Regressor.
- Used various techniques to clean the dataset and make it ready
- EDA was performed
- Linear Regression and Random Forest Regression were applied
- 99% accuracy was found using the Random Forest Regressor
- Model was evaluated using accuracy, precision and recall, and the confusion matrix and classification report were compared
- Explored various ways to interpret the results
- Model was deployed using Gradio
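To illustrate why a random forest can outperform linear regression, and how it can point to influential features, here is a small sketch on synthetic data. The three unnamed features, the non-linear target and the use of R squared as the score are assumptions for the example; none of it comes from the suicide-rate dataset, and the Gradio deployment step is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (NOT the project's): the target depends strongly and
# non-linearly on feature_0, weakly on feature_1, and not at all on feature_2.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 3))
y = 10.0 * np.sin(np.pi * X[:, 0]) + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# R squared on held-out data for both models.
linear_r2 = r2_score(y_test, linear.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))

# Impurity-based importances: one value per feature, summing to 1.
feature_names = ["feature_0", "feature_1", "feature_2"]
importances = dict(zip(feature_names, forest.feature_importances_))
top_feature = max(importances, key=importances.get)

print(f"linear R^2={linear_r2:.2f}  forest R^2={forest_r2:.2f}  top={top_feature}")
```

Impurity-based importances are only one way to interpret the model; they show which inputs the trees relied on, not the direction of the effect.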
Employees-Attrition-Prediction

Skills used: Python, Numpy, Pandas, SKlearn, Matplotlib, Seaborn, Plotly express
Project Objective: To predict whether a given employee will leave within the next two quarters
Quantifiable result: The probability of an employee leaving was predicted with an accuracy of 80% using a Naive Bayes classifier.
- Data was cleaned using various techniques
- EDA was performed to better understand the dataset
- Categorical variables were encoded
- Scaling was performed
- Cleaned, encoded and scaled data was fitted to Naive Bayes, Decision Tree, K Neighbors and Logistic Regression classifiers
- SMOTE was performed and the models were re-fitted
- Model was evaluated using accuracy, precision and recall, and the confusion matrix and classification report were compared
Data: data
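Attrition data is usually imbalanced, which is why precision and recall are reported alongside accuracy. The sketch below computes these metrics from the confusion-matrix counts in plain Python; the labels are made up for illustration, and `binary_metrics` is a hypothetical helper, not code from the project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus accuracy, precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Guard against division by zero when a class is never predicted.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels: 1 = employee left, 0 = stayed (8 stayers, 2 leavers).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy is 0.9 even though only half of the leavers are caught (recall 0.5), which is the kind of gap that oversampling methods such as SMOTE aim to reduce.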
Customer-Personality-Analysis

Skills used: Python, Numpy, Pandas, SKlearn, Matplotlib, Seaborn, Plotly express
Project Objective: To cluster customers based on their personality
Quantifiable result: Customers were clustered into four groups using K Means clustering.
- Data was cleaned using various techniques
- EDA was performed to better understand the dataset
- Categorical variables were encoded
- Outliers were taken care of
- Scaling was performed
- Cleaned, encoded and scaled data was fitted to K Means clustering
- Model was evaluated using the silhouette score
Data: data
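One common way to arrive at a cluster count is to compare silhouette scores across several values of k. The sketch below does this on synthetic blobs with four planted groups; the data, the range of k and the K Means settings are assumptions for illustration, not the project's configuration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic customers (NOT the project's data): four well-separated groups.
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)

# K Means is distance based, so features are scaled first.
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    # Silhouette lies in [-1, 1]; higher means tighter, better separated clusters.
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()}, "best k:", best_k)
```

On real customer data the peak is rarely this clear, so the silhouette score is usually read together with domain knowledge about the segments.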
Happiness-Score-Prediction

Skills used: Python, Numpy, Pandas, SKlearn, Matplotlib, Seaborn, Plotly express
Project Objective: To predict the happiness score for a given country
Quantifiable result: Happiness score was predicted with an accuracy of 92% using Linear Regression.
- Data was cleaned using various techniques
- EDA was performed to better understand the dataset
- Categorical variables were encoded
- Outliers were taken care of
- Scaling was performed
- Cleaned, encoded and scaled data was fitted to Linear Regression
- Result was explored to see the effect of important features
- Model was evaluated with accuracy score, MSE, r2, and RMSE
Data: https://www.kaggle.com/unsdsn/world-happiness
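For reference, MSE, RMSE and R squared can be computed directly from their definitions. The values below are invented scores used only to show the arithmetic, and `regression_metrics` is a hypothetical helper rather than the project's code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE and R squared from their definitions."""
    n = len(y_true)
    residual_ss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_true = sum(y_true) / n
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    mse = residual_ss / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        # Share of the variance in y_true explained by the predictions.
        "r2": 1.0 - residual_ss / total_ss,
    }


# Made-up scores, not taken from the World Happiness data.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)  # mse = 0.375, r2 = 0.925
```

RMSE is in the same units as the happiness score itself, which makes it the easiest of the three to interpret.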
Heart-Disease-Prediction

Skills used: Python, Numpy, Pandas, SKlearn, Matplotlib, Seaborn, Plotly express
Project Objective: To predict heart disease for a given input
Quantifiable result: Overall prediction accuracy of 59% using a Decision Tree Classifier.
- Data was cleaned using various techniques
- EDA was performed to better understand the dataset
- Categorical variables were encoded
- Outliers were taken care of
- Scaling was performed
- Cleaned, encoded and scaled data was fitted to various classifier models
- Result was explored to see the effect of important features
- Model was evaluated using accuracy, precision and recall, and the confusion matrix and classification report were compared
- Accuracy was low because the dataset has very few data points and five output categories
Data: data
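The encode, scale and fit steps listed above can be combined into a single scikit-learn pipeline. The sketch below uses an invented table with hypothetical column names (`age`, `cholesterol`, `chest_pain`) and a five-class target derived from age purely so the example has something to learn; it is not the heart-disease dataset, and the tree depth is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Invented data with hypothetical column names (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(30, 80, n),
    "cholesterol": rng.normal(240.0, 40.0, n),
    "chest_pain": rng.choice(["typical", "atypical", "none"], n),
})
# Five output categories (0..4), derived from age bins for illustration only.
target = ((df["age"] - 30) // 10).to_numpy()

numeric_cols = ["age", "cholesterol"]
categorical_cols = ["chest_pain"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), numeric_cols),
    ("encode", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", DecisionTreeClassifier(max_depth=4, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, target, test_size=0.25, stratify=target, random_state=0
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision and recall plus overall accuracy, as a dictionary.
report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)
print("accuracy:", round(report["accuracy"], 3))
```

Stratifying the split keeps all five categories represented in both the training and test sets, which matters most when the dataset is small.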