Santiago
Jun 21 9 tweets 3 min read
The Hello World! of machine learning: Classifying handwritten digits.

But everyone solves this problem the same way.

Here is a different, non-boring approach that you haven't seen before.

1 of 9
I used Contrastive Learning to solve this problem.

Nobody gets away with listing MNIST in their portfolio unless they use a different, exciting approach.

Contrastive Learning is just that.

2 of 9
Here is the high-level idea:

1. Create a neural network that turns a picture of a digit into an embedding (a vector of numbers.)

2. Embeddings belonging to the same digit should be similar.

3. Embeddings belonging to different digits should be far apart.

3 of 9
Whenever we receive a new picture:

1. Use the network to create the embedding.

2. Compare it to every digit's template embedding.

3. The correct answer is the digit whose template embedding is most similar to the one we created.
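
A minimal sketch of that lookup, in plain Python. The `templates` dictionary, the 2-D toy embeddings, and the choice of Euclidean distance as the similarity measure are assumptions for illustration; the thread does not say how the template embeddings are built.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(embedding, templates):
    """Return the digit whose template embedding is closest to `embedding`.

    `templates` maps each digit to one reference embedding (hypothetical:
    for example, the mean embedding of a few training images of that digit).
    """
    return min(templates, key=lambda digit: euclidean(embedding, templates[digit]))


# Toy 2-D embeddings, made up for illustration only.
templates = {0: [0.0, 0.0], 1: [1.0, 0.0], 7: [0.0, 1.0]}

# Pretend this vector came out of the trained network.
new_embedding = [0.9, 0.2]
predicted = classify(new_embedding, templates)
print(predicted)
```

With real data, the embeddings would have many more dimensions, but the comparison step stays the same.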

4 of 9
To make this happen:

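1. Build a network with 2 heads: two inputs that share the same weights.
2. Feed it two images as the input.
3. Compute the distance between the two resulting embeddings.

The loss function will help us minimize the distance between images of the same digit and push images of different digits apart.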
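
Training on two images at a time means the dataset has to be turned into labeled pairs first. Here is one possible way to do it, as a sketch: the function name `make_pairs`, the one-positive-plus-one-negative sampling scheme, and the target convention (1 = same digit, 0 = different) are all assumptions, not taken from the original notebook.

```python
import random


def make_pairs(images, labels, seed=0):
    """Build (image_a, image_b, target) triples.

    target is 1 when both images show the same digit, 0 otherwise
    (an assumed convention; some implementations flip it).
    """
    rng = random.Random(seed)

    # Group image indices by digit so partners can be sampled quickly.
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)

    pairs = []
    for idx, label in enumerate(labels):
        # Positive pair: a different image of the same digit, if one exists.
        same = [j for j in by_label[label] if j != idx]
        if same:
            pairs.append((images[idx], images[rng.choice(same)], 1))

        # Negative pair: any image of a different digit, if one exists.
        other_labels = [l for l in by_label if l != label]
        if other_labels:
            other = rng.choice(other_labels)
            pairs.append((images[idx], images[rng.choice(by_label[other])], 0))
    return pairs


# Strings stand in for image arrays to keep the example tiny.
images = ["img_a", "img_b", "img_c", "img_d"]
labels = [3, 3, 8, 8]
pairs = make_pairs(images, labels)
for pair in pairs:
    print(pair)
```

Sampling one positive and one negative pair per image keeps the two kinds of pairs balanced.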

5 of 9
Look at the attached picture.

• Each input expects an image
• The "model" layer computes the embeddings
• The "lambda" layer computes the distance.

We want the distance to be small if both images show the same digit. If they show different digits, we want the distance to be large.
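
The data flow through those layers can be sketched in plain Python. This is not the Keras model from the picture: `embed` is a hypothetical one-layer stand-in for the "model" layer, the weights are made up, and Euclidean distance is an assumed choice for what the "lambda" layer computes.

```python
import math


def embed(image, weights):
    """Stand-in for the "model" layer: one linear layer followed by a ReLU."""
    return [max(0.0, sum(w * p for w, p in zip(row, image))) for row in weights]


def euclidean_distance(emb_a, emb_b):
    """What the "lambda" layer computes in this sketch (assumed Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))


def siamese_forward(image_a, image_b, weights):
    """Run both images through the SAME weights, then measure the distance."""
    return euclidean_distance(embed(image_a, weights), embed(image_b, weights))


# Made-up weights and 3-pixel "images", for illustration only.
weights = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
img1 = [1.0, 2.0, 0.0]
img2 = [1.0, 2.0, 0.0]  # identical to img1
img3 = [0.0, 0.0, 1.0]  # very different from img1

print(siamese_forward(img1, img2, weights))  # identical inputs give distance 0.0
print(siamese_forward(img1, img3, weights))
```

Because both images pass through the same weights, identical inputs always land on the same embedding.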

6 of 9
There's a name for this: "Siamese network."

Take a look at the attached code. It shows the Keras implementation of this model. I'm using @ylecun's Contrastive Loss.
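
For reference, a minimal pure-Python sketch of a contrastive loss. It is not the code from the notebook: the target convention (1 = same digit, 0 = different), the default `margin` of 1.0, and the absence of a 1/2 scaling factor are assumptions, and published variants differ on all three.

```python
def contrastive_loss(targets, distances, margin=1.0):
    """Mean contrastive loss over a batch of pairs.

    targets:   1 if the pair shows the same digit, 0 otherwise (assumed convention)
    distances: distance between the two embeddings of each pair
    margin:    how far apart different digits should be before the loss stops caring
    """
    total = 0.0
    for y, d in zip(targets, distances):
        pull = y * d ** 2                          # same digit: penalize any distance
        push = (1 - y) * max(margin - d, 0.0) ** 2  # different: penalize only inside the margin
        total += pull + push
    return total / len(targets)


print(contrastive_loss([1], [0.0]))            # same digit, zero distance: no loss
print(contrastive_loss([0], [2.0]))            # different digits, beyond the margin: no loss
print(contrastive_loss([1, 0], [0.5, 0.5]))    # both pairs contribute
```

The margin is what stops the network from pushing different digits infinitely far apart: once a negative pair is farther than `margin`, it no longer affects the loss.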

The entire code is here:
deepnote.com/@santiago-vald…

7 of 9
I trained the model for 15 epochs. It reached 92% accuracy on the test set.

Not ground-breaking results, but it's an excellent way to showcase Contrastive Learning.

Attached you can see 10 of the results (including 2 mistakes.)

8 of 9
Siamese Networks are handy in real-life applications. I've used them a ton.

For classification problems, they are ideal if you only have a few samples of each class.

9 of 9
