Santiago
Feb 26, 2021 · 12 tweets · 3 min read
Imagine you have a ton of data, but most of it isn't labeled. Even worse: labeling is very expensive. 😑

How can we get past this problem?

Let's talk about a different—and pretty cool—way to train a machine learning model.

☕️👇
Let's say we want to classify videos in terms of maturity level. We have millions of them, but only a few have labels.

Labeling a video takes a long time (you have to watch it in full!). We also don't know how many labeled videos we need to build a good model.

[2 / 9]
In a traditional supervised approach, we don't have a choice: we need to spend the time and come up with a large dataset of labeled videos to train our model.

But this isn't always an option.

In some cases, this may be the end of the project. 😟

[3 / 9]
Here is a different approach: Active Learning.

Using Active Learning, we can have our algorithm start training with the data it has and interactively ask for new labeled data as it needs it.

Active Learning is a semi-supervised learning method.

[4 / 9]
Here is the most important part of "Active Learning":

The algorithm will look at all the unlabeled data and will pick the most informative examples. Then, it will ask humans to label those examples and use the answers as part of the training process.
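
To make the loop concrete, here is a minimal sketch, not a production recipe. Everything in it is an assumption for illustration: the data is one-dimensional, `oracle` stands in for the human annotator with a made-up rule (label 1 when x > 0.6), and the "model" is just a threshold placed halfway between the two classes.

```python
import random

def oracle(x):
    """Stand-in for the human annotator (hypothetical rule: 1 when x > 0.6)."""
    return int(x > 0.6)

def fit_threshold(labeled):
    """Toy model: put the decision boundary halfway between the classes."""
    highest_neg = max(x for x, y in labeled if y == 0)
    lowest_pos = min(x for x, y in labeled if y == 1)
    return (highest_neg + lowest_pos) / 2

def most_informative(pool, threshold):
    """Pick the unlabeled example closest to the boundary (least certain)."""
    return min(pool, key=lambda x: abs(x - threshold))

def active_learning(pool, labeled, budget):
    """Train, ask for one label, retrain. Repeat until the budget runs out."""
    pool, labeled = list(pool), list(labeled)
    threshold = fit_threshold(labeled)
    for _ in range(budget):
        if not pool:
            break
        x = most_informative(pool, threshold)
        pool.remove(x)
        labeled.append((x, oracle(x)))      # the "ask a human" step
        threshold = fit_threshold(labeled)  # retrain with the new label
    return threshold, labeled

random.seed(0)
pool = [random.random() for _ in range(1000)]
seed_labels = [(0.0, 0), (1.0, 1)]  # the few labels we start with
threshold, labeled = active_learning(pool, seed_labels, budget=10)
print(f"estimated boundary: {threshold:.3f} after {len(labeled) - 2} queries")
```

With only 10 requested labels out of 1,000 examples, the estimated boundary should land close to the hidden one. A real project would swap in a real model and real annotators, but the loop keeps the same shape.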

[5 / 9]
Determining which examples are the most informative is the problematic part.

Worst case, we can select unlabeled examples randomly, but that wouldn't be smart.

The better the selection process is, the less data you'll need to build a model.

[6 / 9]
When deciding, we want the algorithm to pick the most challenging examples for the model.

Here are some existing methods that you can research further:

- Least Confidence Uncertainty
- Smallest Margin Uncertainty
- Entropy Reduction
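
Here is a rough sketch of how these three scores can be computed from a model's predicted class probabilities. The video names and probabilities are made up, and I'm reading "Entropy Reduction" as plain entropy-based sampling (pick the prediction with the highest entropy), which is an assumption on my part.

```python
import math

def least_confidence(probs):
    """1 minus the top class probability: higher means less certain."""
    return 1.0 - max(probs)

def smallest_margin(probs):
    """Gap between the two most likely classes: smaller means less certain."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy of the prediction: higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical predicted probabilities for three unlabeled videos.
predictions = {
    "video_a": [0.90, 0.05, 0.05],
    "video_b": [0.50, 0.50, 0.00],
    "video_c": [0.40, 0.30, 0.30],
}

# Each strategy picks the example it considers most informative.
pick_lc = max(predictions, key=lambda v: least_confidence(predictions[v]))
pick_margin = min(predictions, key=lambda v: smallest_margin(predictions[v]))
pick_entropy = max(predictions, key=lambda v: entropy(predictions[v]))

print(pick_lc, pick_margin, pick_entropy)
```

Notice that the strategies don't always agree: the margin score prefers the video where two classes are tied, while least confidence and entropy prefer the one where the probability is spread across all three classes.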

[7 / 9]
In summary, Active Learning trains a model iteratively while minimizing the amount of labeled data it requires.

This translates into significant savings, and sometimes, it's the difference that makes a solution viable.

[8 / 9]
Do you enjoy these threads about machine learning? Are they informative?

If I were to make a change to improve them, what would you like that to be?

[9 / 9]

🦕
You can choose any size for your batches.

You could decide to update the model after each request, or you could build up a batch before updating the model.
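
As a small sketch of that choice, assuming you already have an uncertainty score per unlabeled example (the ids and scores below are hypothetical), the batch size is just how many of the top-ranked examples you send for labeling before retraining:

```python
def select_batch(uncertainty_by_id, batch_size):
    """Return the ids of the `batch_size` most uncertain examples."""
    ranked = sorted(uncertainty_by_id, key=uncertainty_by_id.get, reverse=True)
    return ranked[:batch_size]

# Hypothetical uncertainty scores (higher means the model is less sure).
scores = {"v1": 0.10, "v2": 0.62, "v3": 0.48, "v4": 0.05, "v5": 0.55}

one_at_a_time = select_batch(scores, batch_size=1)   # update after each request
batch_of_three = select_batch(scores, batch_size=3)  # build up a batch first

print(one_at_a_time, batch_of_three)
```

Smaller batches keep the selection fresh because the model is retrained more often. Larger batches mean fewer retraining rounds, which matters when training is slow.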

There are multiple ideas that you could follow here. Here are some examples:

▫️ Automatically identifying nudity is not a hard problem.

▫️ You could also identify profanity either with speech-to-text or through captions.

Other signals you could follow:

▫️ People who watch R-rated movies could be a link to find other R-rated movies.

▫️ Movie directors and actors/actresses could be a link too.

▫️ Genre is important as well.
