The main goal of this project is to collect and analyze data in order to choose a location in Melbourne for a new cafe. We want to help a business owner who plans to open a cafe by exploring the facilities available around each suburb.
2. Analytical Approach:
This is an unsupervised machine learning problem: we need to group together suburbs that have similar facilities. We will use K-means clustering to solve it.
3. Data Requirements:
We would need a list of suburbs, the location of each suburb, and how many cafes are present in the suburb.
- Latitude & Longitude of all the suburbs using Geocoder
- Venues in each suburb from the Foursquare API (foursquare.com)
5. Data Understanding
- The Wikipedia page contains a list of suburbs in Melbourne. It lists 212 suburbs, which I extracted by web scraping with Python's BeautifulSoup and Requests packages.
- The geographical coordinates (latitude and longitude) of each suburb were collected using Python's Geocoder package.
- Then the Foursquare API was used to extract details about the various venues present in each suburb.
- Once the location data had been extracted with Geocoder, I used the Folium package to visualize it on a map. This let us verify that the data we retrieved was correct.
- The Foursquare API was used to obtain the top 100 venues within a radius of 2000 meters of each suburb.
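As a rough illustration of the last step, the sketch below only builds the request URL for one suburb. It assumes the legacy Foursquare v2 `venues/explore` endpoint, which may not be what the project used and may not be available to newer accounts. The function name `build_explore_url`, the credentials, the version date, and the coordinates (roughly Melbourne's CBD) are all placeholders.

```python
from urllib.parse import urlencode


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=2000, limit=100, version="20180605"):
    """Build a legacy Foursquare v2 'venues/explore' URL (hypothetical helper)."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date, arbitrary here
        "ll": f"{lat},{lng}",            # suburb latitude,longitude
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues returned
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder coordinates and credentials, not taken from the project.
url = build_explore_url(-37.8136, 144.9631, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Sending this URL with Requests and flattening the JSON response into rows of (suburb, venue, category) would be the next step; that part is omitted because the response layout depends on the API version.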
6. Feature Engineering
- Converted the venue categories into dummy variables using the get_dummies method of the Pandas package, which is needed before running the clustering algorithm.
- Grouped the data by suburb and took the mean of the frequency of occurrence of each category.
- Extracted the data for the cafe category only.
- Our final data frame had two variables: the suburb name and the mean frequency of occurrence of cafes.
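The steps above can be sketched with Pandas as follows. The rows are made-up stand-ins for the Foursquare results, and the column names `Suburb` and `Venue Category` are assumptions, not necessarily the ones used in the project.

```python
import pandas as pd

# Hypothetical venue rows: one row per (suburb, venue) pair.
venues = pd.DataFrame({
    "Suburb": ["Carlton", "Carlton", "Carlton", "Carlton",
               "Footscray", "Footscray", "Footscray", "Footscray"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Bar",
                       "Cafe", "Market", "Park", "Bar"],
})

# One-hot encode the category: one 0/1 column per venue category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# Mean of each 0/1 column per suburb = share of that suburb's venues in the category.
category_freq = onehot.groupby(venues["Suburb"]).mean()

# Keep only the cafe column, giving a two-column frame: suburb and cafe frequency.
cafe_freq = category_freq[["Cafe"]].reset_index()
print(cafe_freq)
```

With this toy data, half of Carlton's venues and a quarter of Footscray's are cafes, so the `Cafe` column holds 0.5 and 0.25.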
7. Modeling
- Performed clustering on the data using K-means.
- Formed 3 clusters based on the frequency of occurrence of cafes in each suburb.
- Identified the suburbs with the highest and the lowest concentration of cafes.
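To make the clustering step concrete, here is a minimal plain-Python sketch of K-means on a single feature. It is not the project's code, which more likely used a library such as scikit-learn. The function `kmeans_1d` and the cafe frequencies are invented for illustration.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on a list of floats (k >= 2); returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # no centroid moved, so the clustering is stable
            break
        centroids = updated
    return labels, centroids


# Hypothetical mean cafe frequencies for six suburbs.
cafe_freq = [0.02, 0.03, 0.10, 0.12, 0.30, 0.28]
labels, centroids = kmeans_1d(cafe_freq, k=3)
print(labels, centroids)
```

In this toy run the starting centroids are sorted, so label 0 ends up with the lowest frequencies and label 2 with the highest. Library implementations number clusters arbitrarily, so it is worth checking the centroids before calling a cluster "low" or "high".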
Results
Categorized the suburbs into 3 clusters using K-means clustering, based on the frequency of occurrence of 'Cafe'.
- Cluster 0: Suburbs with a low number of cafes.
- Cluster 1: Suburbs with a moderate number of cafes.
- Cluster 2: Suburbs with a high concentration of cafes.
Evaluation
- Cluster 0, shown in red, offers greater opportunity and high potential because competition is low, but it carries the risk of fewer customers, since those areas are not busy.
- As a new business owner, it wouldn't be wise to choose cluster 2, where the concentration of cafes is already highest.
Therefore, I would recommend cluster 1, shown in blue, where competition is moderate but the opportunity is greater.
That's it for this project 👋
Please do let me know if you feel I have made any mistakes.
I am posting one Data Science Project each week
This book deals with manipulating, processing, cleaning, and crunching data in Python. It is about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems.
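1) Supervised learning (SL) has a feedback mechanism: the known labels show how far off each prediction is.
Unsupervised learning (UL) has no feedback mechanism.
2) Supervised learning involves building a model for predicting or estimating an output.
In unsupervised learning, we learn relationships and structures from the data.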
-regularization
-simpler model architecture
-more training data
-reduce noise in the data
-reduce the number of input attributes
-shorter training cycles
A few things to keep in mind before starting:
- Learn By Doing, Practicing & Not Just Reading
- Code By Hand [very effective]
- Share, Teach, Discuss and Ask For Help
- Use Online Resources
- Be consistent
- Learn to Use a Debugger
I have covered all the concepts mentioned below as part of the #100DaysOfCode challenge, and the code can be found in my @github profile.
[Projects & exercises not done. Let me know if you want the solutions.]