Thread by @akshay_pachaar on Thread Reader App

K-Means has two major problems:

- The number of clusters must be chosen in advance
- It doesn't handle outliers well (every point gets forced into some cluster)

But there's a solution!

Introducing DBSCAN, a density-based clustering algorithm. 🚀

Here's an illustrated guide...👇

Simply put, DBSCAN groups together points that are densely packed, and treats points sitting alone in low-density regions as outliers.

It's very easy to understand, just follow along ...👇

DBSCAN has two important parameters.

1️⃣ Epsilon (eps):

`eps`: the maximum distance between two points for one to count as a neighbour of the other.

Points within this distance of each other are considered neighbours. Note that it's a neighbourhood radius, not a cap on how far apart two points in the same cluster can be.

Check this out 👇
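To make the neighbour idea concrete, here's a minimal NumPy sketch. The five points and the `eps` value are made up for illustration, they're not from the thread.

```python
import numpy as np

# Five made-up 2-D points (illustrative data, not from the thread).
points = np.array([
    [0.0, 0.0],
    [0.3, 0.0],
    [0.0, 0.3],
    [2.0, 2.0],
    [5.0, 5.0],
])
eps = 0.5  # assumed neighbourhood radius

# Pairwise Euclidean distances, shape (5, 5).
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# neighbours[i, j] is True when point j lies within eps of point i.
# Every point is trivially within eps of itself (distance 0).
neighbours = dists <= eps

for i, row in enumerate(neighbours):
    print(i, np.flatnonzero(row).tolist())
# 0 [0, 1, 2]
# 1 [0, 1, 2]
# 2 [0, 1, 2]
# 3 [3]
# 4 [4]
```

The first three points are all neighbours of each other, while the last two have nobody nearby.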

2️⃣ min_samples:

The minimum number of points that must lie within eps distance of a point for it to be considered a core point.

So core points are points that have at least min_samples points within eps distance. Implementations such as scikit-learn's count the point itself towards that number.

Check this out 👇
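Here's a small sketch of the core point check, again on made-up points. It assumes the convention where a point counts towards its own neighbourhood (as scikit-learn does); the values of `eps` and `min_samples` are arbitrary.

```python
import numpy as np

# Made-up points: a tight trio, one point hanging off its edge, one far away.
points = np.array([
    [0.0, 0.0],
    [0.3, 0.0],
    [0.0, 0.3],
    [0.7, 0.0],
    [5.0, 5.0],
])
eps, min_samples = 0.5, 3  # assumed values, for illustration only

dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Number of points within eps of each point, the point itself included.
neighbour_counts = (dists <= eps).sum(axis=1)

# A point is a core point when its neighbourhood is dense enough.
is_core = neighbour_counts >= min_samples

print(neighbour_counts.tolist())  # [3, 4, 3, 2, 1]
print(is_core.tolist())           # [True, True, True, False, False]
```

The fourth point is not a core point, but it sits within eps of one. The last point is close to nothing at all.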

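Now all the core points within eps distance of each other become part of the same cluster, together with any non-core points within eps of them (border points). Points that are within eps of no core point are the outliers.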
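Putting the two parameters together, cluster formation can be sketched in a few lines. This is a toy version for intuition, not how any library implements it: `dbscan_sketch` is a hypothetical helper, it computes all pairwise distances up front (so it only suits small data), and it uses -1 to mark outliers.

```python
from collections import deque

import numpy as np


def dbscan_sketch(points, eps, min_samples):
    """Toy DBSCAN: returns one label per point, -1 meaning outlier."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.flatnonzero(row <= eps) for row in dists]
    # Neighbourhoods include the point itself.
    is_core = np.array([len(nb) >= min_samples for nb in neighbours])

    labels = np.full(n, -1)
    cluster = 0
    for start in range(n):
        # Only an unlabelled core point can seed a new cluster.
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nb in neighbours[current]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    # Core points keep spreading the cluster;
                    # non-core points join it but do not spread it.
                    if is_core[nb]:
                        queue.append(nb)
        cluster += 1
    return labels


points = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.7, 0.0], [5.0, 5.0]]
labels = dbscan_sketch(points, eps=0.5, min_samples=3)
print(labels.tolist())  # [0, 0, 0, 0, -1]
```

The first four points end up in cluster 0 (the fourth one joins through its core neighbour), and the isolated point stays labelled -1.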

That's it, that's all that DBSCAN is about! 🎉

Check this image 👇

Now that we understand how DBSCAN works, let's see things in action 🚀

Time for some code 🔥

First we create some dummy data for clustering!

Check this out 👇
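The original screenshot isn't reproduced here, so the following is only a guess at what such dummy data could look like. It assumes scikit-learn and uses `make_moons` plus two hand-placed outliers; the dataset, sizes and noise level are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons

# Two interleaving half-circles: a shape K-Means typically struggles with.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# Add two far-away points by hand to act as outliers (assumed values).
outliers = np.array([[3.0, 3.0], [-2.5, -2.5]])
X = np.vstack([X, outliers])

print(X.shape)  # (302, 2)
```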

Applying DBSCAN doesn't get easier 🚀

Notice that we don't need to worry about the number of clusters in the data; it's determined based on density! ✅

Check this out 👇
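Since the screenshot is missing, here's a hedged sketch of what applying DBSCAN could look like with scikit-learn's `DBSCAN` class. The data and the `eps=0.2`, `min_samples=5` values are assumptions picked for this toy dataset, not the author's settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Recreate some dummy data: two moons plus two hand-placed outliers.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
X = np.vstack([X, [[3.0, 3.0], [-2.5, -2.5]]])

# No number of clusters anywhere: only the two density parameters.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# scikit-learn marks outliers with the label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_outliers = int(np.sum(labels == -1))

print("clusters found:", n_clusters)
print("outliers found:", n_outliers)
```

In practice `eps` and `min_samples` need tuning per dataset: too small an `eps` marks most points as outliers, too large merges everything into one cluster.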

🔵 Find Jupyter Notebook 📒 ⬇️

Don't forget to star the repo! 🌟
github.com/patchy631/mach…

That's a wrap!

If you're interested in:

- Python 🐍
- Machine Learning 🤖
- MLOps 🛠
- CV/NLP 🗣
- LLMs 🧠

Find me → @akshay_pachaar ✔️

I also share a lot of knowledge around ML, MLOps & LLMs via my Newsletter! (It's FREE)

Check this out👇
mlspring.beehiiv.com/subscribe
