Santiago
Feb 26, 2021 · 12 tweets · 3 min read
Imagine you have a ton of data, but most of it isn't labeled. Even worse: labeling is very expensive. 😑

How can we get past this problem?

Let's talk about a different—and pretty cool—way to train a machine learning model.

☕️👇
Let's say we want to classify videos in terms of maturity level. We have millions of them, but only a few have labels.

Labeling a video takes a long time (you have to watch it in full!). We also don't know how many labeled videos we need to build a good model.

[2 / 9]
In a traditional supervised approach, we don't have a choice: we need to spend the time and come up with a large dataset of labeled videos to train our model.

But this isn't always an option.

In some cases, this may be the end of the project. 😟

[3 / 9]
Here is a different approach: Active Learning.

Using Active Learning, we can have our algorithm start training with the data it has and interactively ask for new labeled data as it needs it.

Active Learning is usually grouped with semi-supervised learning methods, since it works with both labeled and unlabeled data.

[4 / 9]
Here is the most important part of "Active Learning":

The algorithm will look at all the unlabeled data and will pick the most informative examples. Then, it will ask humans to label those examples and use the answers as part of the training process.

[5 / 9]
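
To make that loop concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is one-dimensional, `ask_human` is a hypothetical stand-in for a real annotator, and the "model" is just a threshold between the two classes. It is not a production setup, only the shape of the idea.

```python
import random

def ask_human(x):
    """Hypothetical annotator: the true label is 1 when x > 0.6."""
    return 1 if x > 0.6 else 0

def train(labeled):
    """Toy model: a threshold halfway between the largest labeled 0
    and the smallest labeled 1."""
    largest_zero = max(x for x, y in labeled if y == 0)
    smallest_one = min(x for x, y in labeled if y == 1)
    return (largest_zero + smallest_one) / 2

def most_informative(threshold, unlabeled):
    """The example closest to the threshold is the one the model
    is least sure about."""
    return min(unlabeled, key=lambda x: abs(x - threshold))

random.seed(0)
unlabeled = [random.random() for _ in range(1000)]
labeled = [(0.0, 0), (1.0, 1)]  # tiny seed set of labeled examples

for _ in range(10):
    threshold = train(labeled)                      # train on what we have
    query = most_informative(threshold, unlabeled)  # pick one example
    unlabeled.remove(query)
    labeled.append((query, ask_human(query)))       # ask for its label

threshold = train(labeled)  # final model after 10 human labels
```

With only 10 requested labels out of 1,000 examples, the learned threshold should land close to the true boundary at 0.6, because every question was asked right where the model was unsure.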
Determining which examples are the most informative is the problematic part.

Worst case, we can select unlabeled examples randomly, but that wouldn't be smart.

The better the selection process is, the less labeled data you'll need to build a model.

[6 / 9]
When deciding, we want the algorithm to pick the most challenging examples for the model.

Here are some existing methods that you can research further:

- Least Confidence Uncertainty
- Smallest Margin Uncertainty
- Entropy Reduction

[7 / 9]
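
Here is a rough sketch of how those three scores are commonly computed from a model's predicted class probabilities. The function names and the example probabilities are made up for illustration, and "Entropy Reduction" is interpreted here as plain entropy-based sampling, which is an assumption.

```python
import math

def least_confidence(probs):
    """1 minus the probability of the most likely class (higher = more uncertain)."""
    return 1.0 - max(probs)

def smallest_margin(probs):
    """Gap between the two most likely classes (smaller = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats of the predicted distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical predicted probabilities for three unlabeled videos.
predictions = {
    "video_a": [0.98, 0.01, 0.01],
    "video_b": [0.50, 0.49, 0.01],
    "video_c": [0.40, 0.30, 0.30],
}

pick_by_confidence = max(predictions, key=lambda v: least_confidence(predictions[v]))
pick_by_margin = min(predictions, key=lambda v: smallest_margin(predictions[v]))
pick_by_entropy = max(predictions, key=lambda v: entropy(predictions[v]))
```

Note that the methods don't always agree: on this toy data, least confidence and entropy pick `video_c`, while smallest margin picks `video_b`, where the top two classes are nearly tied.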
In summary, Active Learning trains a model iteratively while minimizing the amount of labeled data it requires.

This translates into significant savings, and sometimes, it's the difference that makes a solution viable.

[8 / 9]
Do you enjoy these threads about machine learning? Are they informative?

If I were to make a change to improve them, what would you like that to be?

[9 / 9]

🦕
You can determine any size for your batches.

You could decide to update the model after each request, or you could build up a batch before updating the model.
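
As a small sketch of that choice, assuming you already have an uncertainty score per unlabeled example (the scores and the `select_batch` helper below are hypothetical), the batch size is just how many of the top-ranked examples you send to humans before retraining:

```python
def select_batch(uncertainty_by_id, batch_size):
    """Return the ids of the `batch_size` most uncertain examples."""
    ranked = sorted(uncertainty_by_id, key=uncertainty_by_id.get, reverse=True)
    return ranked[:batch_size]

# Hypothetical uncertainty scores (higher = more uncertain).
scores = {"video_a": 0.02, "video_b": 0.50, "video_c": 0.60, "video_d": 0.35}

one_at_a_time = select_batch(scores, 1)   # update the model after each request
batch_of_three = select_batch(scores, 3)  # build up a batch before updating
```

A batch size of 1 keeps the model as fresh as possible between questions, while larger batches mean fewer retraining runs.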

There are multiple ideas that you could follow here. Here are some examples:

▫️ Automatically identifying nudity is not a hard problem.

▫️ You could also identify profanity either with speech-to-text or through captions.

Other signals you could follow:

▫️ People who watch R-rated movies could be a link to find other R-rated movies.

▫️ Movie directors and actors/actresses could be a link too.

▫️ Genre is important as well.
