Piyal Banik

8 Aug, 17 tweets, 6 min read

#DataScience Project 3

Best Suburb to Open a Cafeteria in Melbourne 🇦🇺

- Create a Machine Learning model which suggests a location to open a Cafe.

Libraries Used
- Numpy
- Pandas
- Matplotlib
- Scikit Learn
- BeautifulSoup
- Geocoder
- Folium

Model Used:
- K Means Clustering

Please note: the main focus of this project was data collection, visualization, and model training. It did not involve data cleaning.

Code for this project 👇
github.com/Piyal-Banik/Me…

1. Business Understanding:

The main goal of this project is to collect and analyze data in order to select a location in Melbourne to open a Cafeteria. We want to help a business owner who plans to open a Cafe choose a suburb by exploring the facilities around each one.

2. Analytical Approach:

This is an unsupervised machine learning problem where we need to group together suburbs having similar facilities. We will use K Means Clustering to solve this problem.

3. Data Requirements:

We would need a list of suburbs, the location of each suburb, and how many cafes are present in the suburb.

4. Data Collection:

- List of Suburbs in Melbourne, Australia which I have extracted from: en.wikipedia.org/wiki/Category:…

- Latitude & Longitude of all the suburbs using Geocoder

- Venues in each suburb from the Foursquare API: foursquare.com

5. Data Understanding

- The Wikipedia page contains a list of suburbs in Melbourne. I extracted 212 suburbs from it by web scraping with the Python BeautifulSoup and Requests packages.

- The geographical coordinates (latitude and longitude) of each suburb were collected using Python’s Geocoder package.

- The Foursquare API was then used to extract details about the various venues present in each suburb.

- Once the location data was extracted using Geocoder, I used the Folium package to visualize it on a map. This confirmed that the data we retrieved was correct.

- The Foursquare API was used to obtain the top 100 venues within a radius of 2000 meters of each suburb.
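As a rough illustration of the last point, the request for one suburb could be built as below. This is a minimal sketch, not the code from the project: it assumes the legacy Foursquare v2 `venues/explore` endpoint, and the helper name `build_explore_url`, the placeholder credentials, the version date, and the coordinates are all made up for the example. Only the URL is built; no request is sent.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "venues/explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=2000, limit=100, version="20210808"):
    """Return the request URL for venues near (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",  # latitude,longitude of the suburb
        "radius": radius,      # search radius in meters
        "limit": limit,        # maximum number of venues returned
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials and approximate coordinates for central Melbourne.
url = build_explore_url(-37.8136, 144.9631, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Calling this once per suburb, with the coordinates returned by Geocoder, would give one venue list per suburb.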

6. Feature Engineering

- Converted the venue categories into dummy variables using the get_dummies method of the Pandas package, which is needed before running the clustering algorithm.

- Grouped the data by suburb and took the mean of the frequency of occurrence of each category.

- Kept only the data for the Cafe category.

- Our final data frame had two variables: suburb name and the mean of the frequency of occurrence of cafes.
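The steps above can be sketched with Pandas as follows. The venue rows and the column names `Suburb` and `Category` are hypothetical, chosen only to show the shape of the transformation; the real data comes from the Foursquare results.

```python
import pandas as pd

# Hypothetical venue rows, one per venue returned for a suburb.
venues = pd.DataFrame({
    "Suburb": ["Carlton", "Carlton", "Carlton", "Fitzroy", "Fitzroy", "Kew"],
    "Category": ["Cafe", "Pub", "Cafe", "Cafe", "Park", "Park"],
})

# One-hot encode the venue category: one 0/1 column per category.
onehot = pd.get_dummies(venues["Category"], dtype=float)
onehot.insert(0, "Suburb", venues["Suburb"])

# The mean of each 0/1 column per suburb is that category's share
# of the suburb's venues.
freq = onehot.groupby("Suburb").mean().reset_index()

# Keep only the suburb name and the cafe frequency.
cafe_freq = freq[["Suburb", "Cafe"]]
print(cafe_freq)
```

Because each dummy column holds only 0s and 1s, its per-suburb mean is the fraction of that suburb's venues that are cafes.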

7. Modeling

- Performed clustering on the data using K-means clustering.

- Formed 3 clusters based on the frequency of occurrence of cafes in each suburb.

- Identified the suburbs with the highest and the lowest concentration of cafes.
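A minimal sketch of this modeling step with scikit-learn is shown below. The suburb names and frequencies are invented for illustration, and relabeling the clusters by ascending centroid is an assumption of this sketch, added so that 0, 1, and 2 read as low, moderate, and high; K-means itself assigns cluster numbers arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical cafe frequencies; the real values come from the Foursquare data.
cafe_freq = pd.DataFrame({
    "Suburb": ["A", "B", "C", "D", "E", "F"],
    "Cafe": [0.01, 0.02, 0.10, 0.12, 0.30, 0.33],
})

# Cluster on the single cafe-frequency feature.
X = cafe_freq[["Cafe"]].to_numpy()
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# K-means numbers its clusters arbitrarily, so relabel them by ascending
# centroid: 0 = low, 1 = moderate, 2 = high cafe frequency.
order = np.argsort(kmeans.cluster_centers_.ravel())
relabel = {int(old): new for new, old in enumerate(order)}
cafe_freq["Cluster"] = [relabel[int(label)] for label in kmeans.labels_]

print(cafe_freq.sort_values("Cafe"))
print("Highest:", cafe_freq.loc[cafe_freq["Cafe"].idxmax(), "Suburb"])
print("Lowest:", cafe_freq.loc[cafe_freq["Cafe"].idxmin(), "Suburb"])
```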

Results

Grouped the suburbs into 3 clusters using K-means clustering, based on the frequency of occurrence of ‘Cafe’.
- Cluster 0: Suburbs with a low number of cafes.
- Cluster 1: Suburbs with a moderate number of cafes.
- Cluster 2: Suburbs with a high concentration of cafes.

Evaluation

- Cluster 0, displayed in red, represents a greater opportunity and high potential, but carries the risk of fewer customers because those areas are not busy.

- As a new business owner, it would not be wise to choose cluster 2, where the concentration of cafes is already high.

Therefore, I would recommend choosing cluster 1, shown in blue, where there is medium competition but greater opportunity.

That's it for this project 👋

Please do let me know if you feel I have made any mistakes.

I am posting one Data Science Project each week

If you liked my content and want to get more threads on Data Science, Machine Learning & Python, do follow me @PiyalBanik

https://twitter.com/PiyalBanik/status/1424222003414781954?s=20

A like & retweet on the first tweet would mean a lot. Thank you.



