👋 Day 5 of #31DaysofML

⚡️ Unsupervised learning ⚡️
The input data is unlabeled, and the program learns to recognize the inherent patterns in it

Eg: Data on a few people's eating habits
🔸 Model input = 🍏🥦🧅🍓🥞🥖🍚
🔸 Model output = clusters such as vegetarian/vegan

A 🧵
2/8
When is unsupervised learning used?
🔸 On large datasets where annotating (labeling) data is costly
🔸 When we don't know how many classes might exist in the data
🔸 When we want to cluster the data first and then apply classification to the individual clusters

#31DaysofML
@quaesita has demystified unsupervised learning here 👉 bit.ly/quaesita_unsup…
3/8
Common types of unsupervised learning:
🔹 Clustering - group the data by similarity
📌 Eg: Targeted marketing, customer recommendations
🔹 Dimensionality reduction - compress many features into a few that capture the wider dependencies in the data
📌 Eg: Big data visualizations, structure discovery

#31DaysofML
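To make dimensionality reduction concrete, here is a minimal sketch of principal component analysis (PCA) that reduces 2-D points to 1-D in plain Python. PCA is one common technique, not one the thread names; the function name `pca_2d_to_1d` and the sample points are illustrative assumptions.

```python
import math

def pca_2d_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Closed-form angle of the principal axis of a symmetric 2x2 matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))

    # One coordinate per point instead of two.
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected

# Hypothetical data lying on the line y = 2x, so one dimension is enough.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, projected = pca_2d_to_1d(points)
print(direction, projected)
```

For real datasets with many features, a library implementation would be the usual choice; the closed-form angle above only works in two dimensions.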
4/8
Most used clustering algorithms:
👉 Centroid-based clustering - eg: k-means
👉 Density-based clustering
👉 Distribution-based clustering - eg: Gaussian distributions
👉 Hierarchical clustering
5/8
👉 Centroid-based clustering organizes the data into non-hierarchical clusters
🔹 These algorithms are efficient but sensitive to initial conditions and outliers
🔹 k-means is the most widely used centroid-based clustering algorithm: it is efficient, effective, & simple
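A minimal sketch of k-means (Lloyd's algorithm) in plain Python, assuming 2-D points and a fixed random seed. The names `kmeans` and `squared_distance` and the sample points are illustrative, not from the thread.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, labels) for points given as tuples of floats."""
    rng = random.Random(seed)
    # The starting centroids are k random points; different starts can give
    # different results, which is the sensitivity to initial conditions.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because each centroid is a mean, a single extreme outlier can drag it away from the bulk of its cluster, which is the outlier sensitivity mentioned above.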
6/8
👉 Density-based clustering connects areas of high example density into clusters
🔹 Allows for arbitrarily shaped distributions, as long as dense areas can be connected
🔹 These algorithms have difficulty with data of varying densities & high dimensions
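One well-known density-based algorithm is DBSCAN (the thread does not name a specific one). Below is a minimal sketch in plain Python; the parameter names `eps` and `min_points` and the sample data are assumptions for illustration.

```python
def dbscan(points, eps, min_points):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = -1  # not dense enough: noise, unless a cluster reaches it later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a dense area
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_points:
                queue.extend(j_neighbors)  # dense point: keep connecting outward
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (5, 5)]
print(dbscan(points, eps=1.5, min_points=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

A single `eps` has to fit every cluster, which is why data with varying densities is hard for this family of algorithms.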
7/8
👉 Distribution-based clustering assumes the data is composed of distributions, such as Gaussian distributions
🔹 As distance from the distribution's center increases, the probability that a point belongs to the distribution decreases

#31DaysofML
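A small sketch of the idea in 1-D, assuming two Gaussian components whose weights, means, and standard deviations are already known (a full method would also have to estimate them, for example with expectation-maximization). The function names and numbers are illustrative.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    """Probability that x belongs to each (weight, mean, std) component."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

# Hypothetical mixture: two equally weighted Gaussians centered at 0 and 5.
components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
for x in (0.0, 1.0, 2.0, 2.5):
    print(x, gaussian_pdf(x, 0.0, 1.0), responsibilities(x, components))
```

The density of the first component falls as `x` moves away from its center at 0, and at the midpoint 2.5 the point is equally likely to belong to either component.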
8/8
👉 Hierarchical clustering
🔹 Creates a tree of clusters
🔹 Well suited to hierarchical data, such as taxonomies
🔹 Advantage: Any number of clusters can be chosen by cutting the tree at the right level

#31DaysofML
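A minimal sketch of bottom-up (agglomerative) hierarchical clustering on 1-D values, assuming single linkage, where the distance between two clusters is the gap between their closest members. Stopping the merges at `num_clusters` is equivalent to cutting the tree at that level. The names and data are illustrative.

```python
def agglomerative(values, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[v] for v in values]  # every value starts as its own cluster
    merges = []                       # merge history, i.e. the tree from the bottom up
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

# Hypothetical data: the same values cut at two different levels.
values = [1.0, 1.2, 5.0, 5.1, 9.0]
print(agglomerative(values, 3)[0])  # [[1.0, 1.2], [5.0, 5.1], [9.0]]
print(agglomerative(values, 2)[0])  # [[1.0, 1.2, 5.0, 5.1], [9.0]]
```

The merge order is the same for both calls; only the stopping level changes, which is why the number of clusters can be picked after the tree is built.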
Resource: bit.ly/gcp_ml



More from @pvergadia

28 Feb
Experimented with Teachable Machine today and created a #nocode classification model in less than 5 mins!

It's a web-based tool making #machinelearning models fast, easy, and accessible to everyone.

See how I did it 🧵👇

teachablemachine.withgoogle.com

#nocode #31DaysofML
How do I use it?

📌 Gather data (upload it)
📌 Train model (in the web interface)
📌 Export the model (use it in your app)
What can I use to teach it?

📌 Images
📌 Sounds
📌 Poses

We can use files or capture examples live through webcam/microphone.
Read 5 tweets
15 Feb
Day 14 #31DaysofML

🤔 How to pick the right #GoogleCloud #MachineLearning tool for your application?

Answer these questions
❓ What's your team's ML expertise?
❓ How much control/abstraction do you need?
❓ Would you like to handle the infrastructure components?

🧵 👇
@SRobTweets created this pyramid to explain the idea.
As you move up the pyramid, less ML expertise is required, and you also don't need to worry as much about the infrastructure behind your model.

To learn more, watch this video 👉

#31DaysofML 2/10
If you're using open source ML frameworks (#TensorFlow) to build the models, you get the flexibility of moving your workloads across different development & deployment environments. But you need to manage all the infrastructure yourself for training & serving

#31DaysofML 3/10
Read 10 tweets
14 Feb
Day 13 #31DaysofML

⚖️ How to deal with imbalanced datasets? ⚖️
Most real-world datasets are not perfectly balanced. If 90% of your dataset belongs to one class, & only 10% to the other, how can you prevent your model from predicting the majority class 90% of the time?

🧵 👇
🐱🐱🐱🐱🐱🐱🐱🐱🐱🐶 (90:10)
💳 💳 💳 💳 💳 💳 💳 💳 💳 ⚠️ (90:10)
There can be many reasons for imbalanced data. The first step is to see if it's possible to collect more data. If you're working with all the data that's available, these 👇 techniques can help

#31DaysofML 2/7
Here are 3 techniques for addressing data imbalance. You can use just one of these or all of them together:
⚖️ Downsampling
⚖️ Upsampling
⚖️ Weighted classes

#31DaysofML 3/7
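The three techniques can be sketched in plain Python. The helper names `class_weights` and `resample` are hypothetical, and the weight formula (total examples divided by number of classes times the class count) is one common heuristic, assumed here rather than taken from the thread.

```python
import random
from collections import Counter

def class_weights(labels):
    """Weight each class inversely to how often it appears."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

def resample(labels, target_per_class, seed=0):
    """Return example indices so every class has target_per_class entries."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    chosen = []
    for indices in by_class.values():
        if len(indices) >= target_per_class:
            # Downsampling: keep a random subset of the larger class.
            chosen += rng.sample(indices, target_per_class)
        else:
            # Upsampling: keep every example and repeat some at random.
            extra = rng.choices(indices, k=target_per_class - len(indices))
            chosen += indices + extra
    return chosen

# Hypothetical 90:10 dataset, as in the cat/dog example above.
labels = ["cat"] * 9 + ["dog"]
print(class_weights(labels))                            # dog gets the larger weight
print(Counter(labels[i] for i in resample(labels, 1)))  # downsampled to 1 per class
print(Counter(labels[i] for i in resample(labels, 9)))  # upsampled to 9 per class
```

Resampling changes the data the model sees, while class weights leave the data alone and change how much each mistake costs during training.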
Read 7 tweets
11 Feb
Since it is Day 10 of #31DaysofML, it's perfect to discuss 1️⃣0️⃣ things that can go wrong with #MachineLearning projects and what you can do about them!

I watched this amazing presentation by @kweinmeister that sums it all up

A 🧵
1️⃣ You aren't solving the right problem
❓ What's the goal of your ML model?
❓ How do you assess if your model is "good" or "bad"?
❓ What's your baseline?
👉 Focus on a long-term mission with maximum impact
👉 Ensure that your problem is a good fit for ML

#31DaysofML
2️⃣ Jumping into development without a prototype
👉 An ML project is an iterative process
👉 Start with a simple model & continue to refine it until you've reached your goal
👉 A quick prototype can tell you a lot about hidden requirements, implementation challenges, scope, etc

#31DaysofML
Read 11 tweets
10 Feb
Day 9 of #31DaysofML

💁‍♀️ I thought today I would share a tip that has helped me in my #MachineLearning journey
💡 The best way to learn ML is to pick a problem that you feel excited about & let it guide your learning path. Don't worry about the terms or tools, it's all secondary
Here's an example. A few weeks ago I wanted to live translate an episode of @GCPPodcast. The first question I asked myself was:
🤔 Does any video/audio translation API already exist?
🔹 If so, I would give that a try
🔹 If not, I would create it from scratch

#31DaysofML (2/5)
Next, I started digging into the Media Translation API, which would translate audio & video data.
My point is:
📌 You don't always need to create a model
📌 Save yourself time & resources by using the models that already exist (if they serve your purpose)

#31DaysofML (3/5)
Read 5 tweets
8 Feb
Day 7 of #31DaysofML

⬇️ Reducing Loss ⬇️
An iterative process of choosing model parameters that minimize loss
👉 The loss function is how we compute loss
👉 The loss function curve is convex for linear regression

A 🧵 👇
Calculating the loss for every possible value of w isn't efficient; the most common approach is called gradient descent
👉 Start with any value of w, b (weights & biases)
👉 Keep updating them in the direction that reduces loss until the overall loss stops changing or changes slowly
👉 That point is called convergence

#31DaysofML 2/4
As you probably already guessed, the gradient is a vector with:
👉 Direction
👉 Magnitude
Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (or step size) to determine the next point.

#31DaysofML 3/4
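A minimal sketch of gradient descent for linear regression, assuming a single feature, the model y = w*x + b, and mean squared error as the loss. The function names, default learning rate, and sample data are illustrative choices, not from the thread.

```python
def mean_squared_error(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, max_steps=5000, tolerance=1e-18):
    """Fit y = w*x + b by stepping against the gradient of the loss."""
    w, b = 0.0, 0.0  # any starting point works for a convex loss
    n = len(xs)
    previous_loss = mean_squared_error(w, b, xs, ys)
    for _ in range(max_steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Next point = current point minus learning rate times gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        loss = mean_squared_error(w, b, xs, ys)
        if abs(previous_loss - loss) < tolerance:
            break  # loss has effectively stopped changing: convergence
        previous_loss = loss
    return w, b

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
print(w, b)  # close to 2 and 1
```

A learning rate that is too large makes the steps overshoot and the loss grow; one that is too small makes convergence slow.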
Read 4 tweets
