Santiago
Oct 4, 2022 · 20 tweets · 7 min read
Here is a simple machine learning model. One of the classics.

If you are new, let's go together line by line and understand what's happening here:

1 of 20
First, we load the MNIST dataset, containing 70,000 28x28 images showing handwritten digits.

You can load this dataset using Keras with a single line of code.

The function returns the dataset split into train and test sets.

2 of 20
x_train and x_test contain our train and test images.

y_train and y_test contain the target values: a number between 0 and 9 indicating the digit shown in the corresponding image.

We have 60,000 images to train the model and 10,000 to test it.
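A minimal sketch of this loading step, assuming TensorFlow's bundled Keras (the dataset is downloaded automatically on first use):

```python
# Load MNIST: 60,000 training images and 10,000 test images,
# each a 28x28 grayscale picture of a handwritten digit.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(x_test.shape)   # (10000, 28, 28)
```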

3 of 20
When dealing with images, we need a tensor with 4 dimensions: batch size, height, width, and color channels.

x_train is (60000, 28, 28). We need to reshape it to add the missing channels dimension ("1" because these images are grayscale.)
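The reshape can be sketched with NumPy alone (using a zero-filled array as a stand-in for the real x_train):

```python
import numpy as np

# Stand-in for x_train: 60,000 grayscale images of 28x28 pixels.
x_train = np.zeros((60000, 28, 28), dtype=np.uint8)

# Add the missing channels dimension: (batch, height, width, channels).
x_train = x_train.reshape((60000, 28, 28, 1))

print(x_train.shape)  # (60000, 28, 28, 1)
```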

4 of 20
Each pixel goes from 0 to 255. Neural networks work much better with smaller values.

Here we normalize pixels by dividing them by 255. That way, each pixel will go from 0 to 1.
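The normalization step, sketched with a tiny stand-in array (casting to float first so the division doesn't truncate to integers):

```python
import numpy as np

# A stand-in batch containing the extreme pixel values 0 and 255.
x = np.array([0, 127, 255], dtype=np.uint8)

# Normalize: every pixel now falls between 0 and 1.
x = x.astype("float32") / 255

print(x.min(), x.max())  # 0.0 1.0
```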

5 of 20
Target values go from 0 to 9 (the value of each digit.)

This line one-hot encodes these values.

For example, this will transform a value like 5 into an array of zeros with a single 1 at index 5:

[0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
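One-hot encoding can be sketched in plain NumPy (Keras provides keras.utils.to_categorical for the same job):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Each label indexes a row of the identity matrix, giving
    # a vector of zeros with a single 1 at that label's index.
    return np.eye(num_classes)[labels]

print(one_hot(np.array([5])))
# [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
```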

6 of 20
Let's now define our model.

There are several ways to create a model in Keras. This one is called the "Sequential API."

Our model will be a sequence of layers that we will define one by one.

7 of 20
A lot is going on with this first line.

First, we define our model's input shape: a 28x28x1 tensor (height, width, channels).

This is exactly the shape we have in our train dataset.

8 of 20
Then we define our first layer: a Conv2D layer with 32 filters and a 3x3 kernel.

This layer will slide each 3x3 filter over the input image, producing 32 different feature maps.

9 of 20
We also need to define the activation function used for this layer: ReLU.

You'll see ReLU everywhere. It's a popular activation function.

It will allow us to solve non-linear problems, like recognizing handwritten digits.
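ReLU itself is a one-liner; a sketch in NumPy:

```python
import numpy as np

def relu(x):
    # ReLU passes positive values through and clamps negatives to zero.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```

Stacking layers with this non-linearity between them is what lets the network model functions a purely linear model can't.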

10 of 20
After our Conv2D layer, we have a max pooling operation.

The goal of this layer is to downsample the feature maps produced by the convolutional layer.

We want to throw away unimportant details and retain what truly matters.
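A 2x2 max pooling pass can be sketched in NumPy, using a hypothetical 4x4 feature map:

```python
import numpy as np

# A hypothetical 4x4 feature map.
x = np.array([
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 8, 3, 2],
    [7, 6, 1, 0],
])

# Split into non-overlapping 2x2 blocks and keep each block's maximum.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))

print(pooled)
# [[4 8]
#  [9 3]]
```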

11 of 20
We are now going to flatten the output. We want everything in a single, flat list of values.

That's what the Flatten layer does. It turns each image's stack of feature maps into a one-dimensional tensor.
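Flattening is just a reshape that keeps the batch dimension. The 13x13x32 shape below is an inferred stand-in: it's what a 2x2 pool of a 26x26x32 Conv2D output would produce:

```python
import numpy as np

# Stand-in for the pooled output: a batch of 2 feature maps of 13x13x32.
x = np.zeros((2, 13, 13, 32))

# Collapse everything except the batch dimension into one long vector.
flat = x.reshape(x.shape[0], -1)

print(flat.shape)  # (2, 5408)
```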

12 of 20
Finally, we have a couple of Dense layers.

Notice how the output layer has a size of 10, one for each of our possible digit values, and a softmax activation.

The softmax ensures we get a probability distribution over the ten digits; the highest value tells us the most likely digit in the image.
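Putting the layers from the last few tweets together, the model could look like this (a sketch with tensorflow.keras; the hidden Dense layer's size of 64 is an assumption, since only the 10-unit output is specified):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # height, width, channels
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 filters, 3x3 kernel
    layers.MaxPooling2D((2, 2)),                   # downsample the feature maps
    layers.Flatten(),                              # one flat vector per image
    layers.Dense(64, activation="relu"),           # hidden size is an assumption
    layers.Dense(10, activation="softmax"),        # one probability per digit
])

print(model.output_shape)  # (None, 10)
```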

13 of 20
After creating our model, we compile it.

I'm using Stochastic Gradient Descent (SGD) as the optimizer.

The loss is categorical cross-entropy: this is a multi-class classification problem.

We want to record the accuracy as the model trains.
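The compile step, sketched against a minimal stand-in model:

```python
from tensorflow.keras import layers, models

# A minimal stand-in model, just to make the compile call runnable.
model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="sgd",                  # Stochastic Gradient Descent
    loss="categorical_crossentropy",  # multi-class loss for one-hot targets
    metrics=["accuracy"],             # recorded as the model trains
)
```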

14 of 20
Finally, we fit the model. This starts training it.

A couple of notes:

• I'm using a batch size of 32 images.
• I'm running 10 total epochs.

When fit() is done, we have a fully trained model!
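A runnable sketch of the fit call. To keep it fast, it trains a tiny stand-in model on random data for one epoch; with the real setup you'd pass x_train and y_train and use epochs=10:

```python
import numpy as np
from tensorflow.keras import layers, models

# Tiny synthetic stand-in for the MNIST arrays.
x = np.random.rand(64, 28, 28, 1).astype("float32")
y = np.eye(10)[np.random.randint(0, 10, size=64)]

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Thread settings: batch_size=32; epochs=10 in the real run, 1 here for speed.
history = model.fit(x, y, batch_size=32, epochs=1, verbose=0)

print(sorted(history.history.keys()))  # ['accuracy', 'loss']
```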

15 of 20
Let's now test the model.

This gets a random image from the test set and displays it.

Notice that we want the image to come from the test set, which contains data the model didn't see during training.

16 of 20
We can't forget to reshape and normalize the image as we did before with the entire train set.

I'm doing it this time for the image I use to test the model.

17 of 20
Finally, I predict the value of the image.

Remember that the result is a vector with ten probabilities (one per digit). That's why I take the argmax (the position with the highest probability) to get the predicted digit.
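The prediction pipeline from the last three tweets, sketched end to end. The untrained stand-in model and random image keep it self-contained; in the thread, the model is the trained network and the image comes from x_test:

```python
import numpy as np
from tensorflow.keras import layers, models

# Untrained stand-in for the trained model.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

# Stand-in for a random test image with raw 0-255 pixels.
image = np.random.randint(0, 256, size=(28, 28)).astype("float32")

# Reshape and normalize the single image, just like the training set.
image = image.reshape((1, 28, 28, 1)) / 255

# predict() returns one probability vector per image; argmax picks the digit.
probabilities = model.predict(image, verbose=0)
digit = int(np.argmax(probabilities[0]))
print(digit)
```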

18 of 20
Here is the source code:

Have at it, go nuts, and build something cool.

gist.github.com/svpino/3cb8367…

19 of 20
Every week, I break down machine learning concepts to give you ideas on applying them in real-life situations.

Follow me @svpino to ensure you don't miss what's coming next.

20 of 20
