Piyal Banik

17 Aug, 13 tweets, 5 min read

#DataScience Project 4

Customer Segmentation

- Use Machine Learning to create a model that performs Customer Segmentation

Libraries Used
- NumPy
- pandas
- Matplotlib
- Seaborn
- scikit-learn

Models Trained
- KMeans Clustering
- Hierarchical Clustering

Code for this project can be found here 👇

[Please do consider giving an upvote if you find this notebook to be useful 😀]

kaggle.com/piyalbanik/seg…

1. Business Understanding

The goal of this project is to divide customers into groups based on common characteristics in order to maximize the value of each customer to the business.

2. Analytical Approach

Clustering customers by similar characteristics is an unsupervised learning task, because no observation has a target variable.

For this project, I will use two Machine Learning models
- KMeans Clustering
- Hierarchical clustering

3,4. Data Requirements and Data Collection

We need a dataset that gives us information about the customers of a market.

For this project, the dataset has been provided to us on Kaggle.
kaggle.com/vjchoudhary7/c…

5. Data Understanding

- There are 200 observations, each with 5 variables.
- The columns of the dataset are CustomerID, Gender, Age, Annual Income (k$), and Spending Score (1-100).
- There are no missing values 😀
- There is one categorical variable - Gender
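A rough sketch of how these checks could be done with pandas. The DataFrame below is randomly generated stand-in data with the same column names (an assumption for illustration, not the real Kaggle file), so only the shape and the checks carry over.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: same column names as the thread's dataset,
# but the values are random. With the real file you would read the CSV instead.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "CustomerID": np.arange(1, n + 1),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 71, size=n),
    "Annual Income (k$)": rng.integers(15, 138, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})

# Number of observations and variables
n_rows, n_cols = df.shape

# Count of missing values in each column
missing_per_column = df.isnull().sum()

# Columns that are not numeric, i.e. the categorical ones
categorical_columns = df.select_dtypes(exclude="number").columns.tolist()

print(n_rows, n_cols)
print(missing_per_column)
print(categorical_columns)
```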

Distribution plots of
- Age
- Annual Income(k$)
- Spending Score (1-100)
- Gender

The heatmap shows that there is no strong correlation among variables
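The numbers behind such a heatmap come from a correlation matrix. Here is a minimal sketch, again on random stand-in data (an assumption, so the values say nothing about the real customers). Passing `corr` to Seaborn's `heatmap` would draw the plot.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns with random values, for illustration only
rng = np.random.default_rng(1)
n = 200
numeric = pd.DataFrame({
    "Age": rng.integers(18, 71, size=n),
    "Annual Income (k$)": rng.integers(15, 138, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})

# Pairwise Pearson correlation between the numeric columns
corr = numeric.corr()

# Hide the diagonal (always 1.0) and find the strongest remaining correlation
off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))
strongest = off_diagonal.abs().max().max()

print(corr.round(2))
print(strongest)
```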

6. Feature Engineering

- Dropped CustomerID (not useful as it is unique for each customer)
- Created Dummy Variable for Gender
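The two steps above can be sketched with pandas like this. The four rows are made up, and `drop_first=True` (keeping a single 0/1 column for Gender) is my assumption about how the dummy variable was built.

```python
import pandas as pd

# Made-up rows with the same columns as the dataset
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [25, 31, 42, 58],
    "Annual Income (k$)": [20, 35, 60, 90],
    "Spending Score (1-100)": [40, 75, 10, 55],
})

# CustomerID is unique per row, so it carries no information for clustering
features = df.drop(columns=["CustomerID"])

# Replace Gender with one 0/1 column (assumption: drop the first category)
features = pd.get_dummies(features, columns=["Gender"], drop_first=True, dtype=int)

print(features)
```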

7. Modelling

K Means Clustering
- First, we determine the optimal number of clusters
- Then we determine starting values (initial centroids) for each cluster
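One common way to do both steps with scikit-learn is sketched below. The thread does not say which method was used, so the elbow method (comparing inertia across values of k) and `init="k-means++"` for the starting centroids are assumptions, and the three blobs are synthetic data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 50 points each (illustration only)
rng = np.random.default_rng(42)
centers = np.array([[20.0, 80.0], [60.0, 50.0], [100.0, 20.0]])
X = np.vstack([c + rng.normal(scale=3.0, size=(50, 2)) for c in centers])

# Elbow method: fit KMeans for several k and record the inertia
# (within-cluster sum of squared distances)
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    model.fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))

# Fit the final model with the k where the inertia curve flattens (3 here);
# k-means++ picks spread-out starting centroids
final_model = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = final_model.fit_predict(X)
```

On real data the elbow is rarely this clean, so it helps to plot the inertia values with Matplotlib before picking k.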

K Means Clustering Output 👇

Hierarchical clustering

The endpoint is a set of clusters, where each cluster is distinct from the others, and the objects within each cluster are broadly similar to each other.
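A minimal sketch of this with scikit-learn's `AgglomerativeClustering`. Ward linkage, 3 clusters, and the synthetic blobs are all assumptions for illustration; the notebook may use different settings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Synthetic data: three well-separated blobs of 50 points each (illustration only)
rng = np.random.default_rng(7)
centers = np.array([[20.0, 80.0], [60.0, 50.0], [100.0, 20.0]])
X = np.vstack([c + rng.normal(scale=3.0, size=(50, 2)) for c in centers])

# Bottom-up merging: every point starts as its own cluster, and the two
# closest clusters are merged repeatedly until 3 remain
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Number of points assigned to each cluster
print(np.bincount(labels))
```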

Output 👇

@PiyalBanik

That's it for this project 👋

Please do let me know if there is any mistake.

A retweet for the first one would really mean a lot 🙏

If you liked my content and want to get more threads on Data Science, Machine Learning & Python, do follow me @PiyalBanik

More from @PiyalBanik

Piyal Banik

@PiyalBanik

15 Aug

3 remote Data Science and Machine Learning Internship opportunities which are open for all.

🧵👇

1. Graduate Rotational Internship Program - The Sparks Foundation

The Graduate Rotational Internship Program is a unique offer for students and recent graduates to experience and join The Sparks Foundation.

Apply 👇
internship.thesparksfoundation.info

2. Omdena

Omdena AI projects are the best way to build sought-after data science and machine learning skills while solving real-world problems.

Apply 👇
omdena.com/projects/

Piyal Banik

@PiyalBanik

12 Aug

3 beginner-level Machine Learning projects with code

- Regression
- Classification
- Clustering

🧵👇

1. Regression

https://twitter.com/PiyalBanik/status/1422577878126845963?s=20

2. Classification

https://twitter.com/PiyalBanik/status/1419239949136584704?s=20

Piyal Banik

@PiyalBanik

8 Aug

#DataScience Project 3

Best Suburb to Open a Cafeteria in Melbourne 🇦🇺

- Create a Machine Learning model which suggests a location to open a Cafe.

Libraries Used
- Numpy
- Pandas
- Matplotlib
- Scikit Learn
- BeautifulSoup
- Geocoder
- Folium

Model Used:
- K Means Clustering

Please note: the main focus of this project was on data collection, visualization, and training a model. It did not involve data cleaning.

Code for this project 👇
github.com/Piyal-Banik/Me…

1. Business Understanding:

The main goal of this project is to collect and analyze data in order to select a location in Melbourne to open a Cafeteria. We want to help a business owner who plans to open a Cafe pick a location by exploring the facilities around each Suburb.

Piyal Banik

@PiyalBanik

26 Jul

Data Science Pipeline

🧵👇

@IBM

Acknowledgment:

- John Rollins, @IBM

- Data Science Methodology, @coursera
coursera.org/learn/data-sci…

1. Business Understanding: What is the problem that we are trying to solve?

- We should be clear about the exact problem we are going to solve.

- Asking the right questions as a Data Scientist starts with understanding the goal of the business.

Piyal Banik

@PiyalBanik

25 Jul

#DataScience Project 1

Titanic – Machine Learning from Disaster

Use Machine Learning to create a model that predicts which passengers survived the Titanic shipwreck.

Libraries Used
- Numpy
- Pandas
- Seaborn
- Scikit-Learn

Final Model Chosen
- Decision Tree: 93.03% accuracy🔥

The data science methodology followed has been outlined by John Rollins, IBM

- Business Understanding
- Analytical Approach
- Data requirements
- Data collection
- Data Understanding
- Data Preparation
- Modeling
- Evaluation

Project Code 👇
github.com/Piyal-Banik/Ti…

1. Business Understanding

Given a passenger's information, how can we predict whether he/she survived the Titanic disaster?

2. Analytical Approach:

Our target variable is categorical [survived / not survived], and hence we need classification models for this task.

Piyal Banik

@PiyalBanik

22 Jul

Data Science Books 📚 you should start reading

🧵👇

1. Data Science from Scratch

You’ll learn how many of the most fundamental DS tools and algorithms work by implementing them from scratch. Includes:

- Python basics
- Linear algebra, statistics, & probability
- Data collection & EDA
- Basic ML Algo

learning.oreilly.com/library/view/d…

2. Python for Data Analysis

This book deals with manipulating, processing, cleaning, and crunching data in Python. It is about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems.

learning.oreilly.com/library/view/p…
