K-Means has two major problems:
- Number of clusters must be known
- Doesn't handle outliers
But there's a solution!
Introducing DBSCAN, a density-based clustering algorithm.
Here's an illustrated guide...
Simply put, DBSCAN groups together points in a dataset that are close to each other based on their spatial density.
It's very easy to understand, just follow along...
DBSCAN has two important parameters.
1️⃣ Epsilon (eps):
`eps` is the maximum distance between two points for them to be considered neighbours.
It is a bound on each neighbour-to-neighbour hop, not on the overall size of a cluster.
Check this out:
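To make the eps neighbourhood concrete, here's a minimal NumPy sketch. The toy points and the value `eps = 0.5` are made up for illustration and are not from the thread.

```python
import numpy as np

# Hypothetical toy points: three close together, one far away
points = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [3.0, 3.0]])
eps = 0.5  # assumed value, chosen only for this example

# Pairwise Euclidean distances between all points
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# neighbours[i, j] is True when point j lies within eps of point i
neighbours = dists <= eps

# Number of points in each eps neighbourhood (each point counts itself)
neighbour_counts = neighbours.sum(axis=1)
print(neighbour_counts)
```

The three nearby points all see each other, while the far-away point only sees itself.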
2️⃣ min_samples:
The minimum number of points that must be present within the eps distance for a point to be considered a core point.
Core points are points that have at least min_samples number of neighbours within the eps distance.
Check this out:
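Here's a rough sketch of the core point test. `core_point_mask` is a hypothetical helper, and the data and parameter values are assumptions. It follows scikit-learn's convention that a point counts itself towards `min_samples`.

```python
import numpy as np

def core_point_mask(points, eps, min_samples):
    """Return a boolean array marking which points are core points."""
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Size of each eps neighbourhood, counting the point itself
    counts = (dists <= eps).sum(axis=1)
    return counts >= min_samples

# Hypothetical toy points: a tight group of three plus one isolated point
points = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [3.0, 3.0]])
is_core = core_point_mask(points, eps=0.5, min_samples=3)
print(is_core)
```

The isolated point has only itself in its neighbourhood, so it fails the test and is not a core point.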
Now all the points that are not outliers and are within eps reachability of each other become part of the same cluster.
That's it, that's all that DBSCAN is about!
Check this image:
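Putting the two parameters together, the whole idea fits in a few lines. This is a simplified from-scratch sketch for intuition, not scikit-learn's implementation; `dbscan_sketch` and the toy data are hypothetical.

```python
from collections import deque

import numpy as np

def dbscan_sketch(points, eps, min_samples):
    """Label each point with a cluster id, or -1 for noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Indices of every point within eps of point i (including i itself)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, -1)  # -1 means noise or not yet assigned
    cluster_id = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            j = queue.popleft()
            if not is_core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id
                    queue.append(k)
        cluster_id += 1
    return labels

# Hypothetical data: two tight groups and one far-away outlier
points = np.array([
    [0.0, 0.0], [0.3, 0.0], [0.0, 0.3],
    [5.0, 5.0], [5.3, 5.0], [5.0, 5.3],
    [10.0, 0.0],
])
labels = dbscan_sketch(points, eps=0.5, min_samples=3)
print(labels)
```

Points left at -1 are the outliers: they are not core points and no core point reaches them.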
Now that we understand how DBSCAN works, let's see things in action.
Time for some code!
First we create some dummy data for clustering!
Check this out:
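One way to create such data is sketched below. The choice of scikit-learn's `make_moons` and all parameter values are assumptions for illustration, not necessarily what the original notebook uses.

```python
from sklearn.datasets import make_moons

# Two interleaving half-circles with a little Gaussian noise.
# n_samples, noise and random_state are assumed values.
X, y = make_moons(n_samples=300, noise=0.05, random_state=42)

print(X.shape)  # one row per point, two features (x and y coordinates)
```

Shapes like these are a nice test case because they are not round blobs.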
Applying DBSCAN doesn't get easier!
Notice that we don't need to worry about the number of clusters in the data; it's determined based on density!
Check this out:
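A minimal sketch with scikit-learn's `DBSCAN` could look like this. The dataset and the values `eps=0.2` and `min_samples=5` are assumptions for illustration and would need tuning on real data.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Assumed dummy data: two noisy half-moons
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# No number of clusters is passed in, only the two density parameters
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_  # cluster id per point, -1 marks noise

# Count the clusters found, ignoring the noise label
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
print(n_clusters, n_noise)
```

The cluster count is read off the labels after fitting, and any point labelled -1 is treated as an outlier.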
Find the Jupyter Notebook here:
Don't forget to star the repo!
github.com/patchy631/mach…
That's a wrap!
If you're interested in:
- Python
- Machine Learning
- MLOps
- CV/NLP
- LLMs
Find me → @akshay_pachaar
I also share a lot of knowledge around ML, MLOps & LLMs via my Newsletter! (It's FREE)
Check this out:
mlspring.beehiiv.com/subscribe
