Santiago (@svpino) · Feb 9, 2021
Seriously though, how the heck can a computer recognize what's in an image?

Grab a coffee ☕️, and let's talk about one of the core ideas that makes this possible.

(I'll try to stay away from the math, I promise.)

👇
If you are a developer, spend a few minutes trying to think about a way to solve this problem:

→ Given an image, you want to build a function that determines whether it shows a person's face.

2/
It gets overwhelming fast, right?

What are you going to do with all of these pixels?

3/
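To get a feel for the scale of the problem, here is a quick sketch. ("face.jpg" is just a placeholder for any photo you have around.)

```python
import cv2

# Load a photo as grayscale: one intensity value (0-255) per pixel.
image = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

print(image.shape)  # e.g. (1080, 1920): rows and columns of pixels
print(image.size)   # a couple of million raw numbers, for a single grayscale image
```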
Alright, you get the idea: this is a hard problem to solve, and we can't just develop our way out of it.

So let's talk about machine learning.

More specifically, let's talk about Convolutional Neural Networks.

4/
Well, I'm skipping like 300 layers of complexity here.

We should start talking about neural networks and build from that idea, but that'll be boring, and I'm sure you've heard of them before.

If you want a refresher, here is an amazing video:

5/
Fully connected networks are cool, but convolutional layers transformed the field.

I want to focus on them, so next time somebody mentions "convolution," you'll know exactly what's going on.

6/
Before getting too technical, let's try to break down the problem in a way that makes the solution a little bit more intuitive.

Understanding an image's contents is not about individual pixels but about the patterns formed by nearby pixels.

7/
For instance, think about Lena's picture attached here.

You get a bunch of pixels that together form the left eye. Another bunch that makes up the right eye. You have the nose, mouth, eyebrows, etc.

Put them together, and you get her face.

8/
Wave your magic wand and imagine you could build a function specializing in detecting each part of the face.

In the end, you run every function, and if every piece is found, you flag the image as a face.

Easy, right?

9/
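If you could actually build those detectors, the top-level function might look something like this. (Purely illustrative; every detector below is a hypothetical stub we don't know how to write yet.)

```python
# Hypothetical stubs: each one would specialize in finding one part of the face.
def detects_left_eye(image) -> bool: ...
def detects_right_eye(image) -> bool: ...
def detects_nose(image) -> bool: ...
def detects_mouth(image) -> bool: ...

def looks_like_a_face(image) -> bool:
    # Run every detector; flag the image as a face only if every piece is found.
    detectors = [detects_left_eye, detects_right_eye, detects_nose, detects_mouth]
    return all(detector(image) for detector in detectors)
```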
But, how do we find an eye on a picture?

Well, we could keep breaking the problem into smaller pieces.

There are lines, circles, colors, and patterns that together make up an eye. We could build more functions to detect each of those separately.

10/
See where I'm going here?

We could build hundreds of functions, each one specializing in a specific portion of the face. Then have them look at the entire picture.

We can then put them together like a giant puzzle to determine whether we are looking at a face.

🙃

11/
I'm happy with that idea because I think it makes sense!

But building hundreds of little functions looking for individual patterns in an image is still a huge hurdle.

😬

Where do we start?

12/
Enter the idea of a "filter," a small square matrix that we will move across the image from top left to bottom right.

Every time we do this, we compute a value using a "convolution" operation.

13/
Look at this picture.

A convolution operation is a dot product between the filter and the image patch underneath it: an element-wise multiplication whose results are summed into a single value.

After doing this, we move the filter over one position and do it again.

14/
Here is the first convolution operation.

It produces a single value (0.2).

After doing this, we will convolve the filter with the next patch from the image and repeat this until we cover the whole picture.

Ok, this is as much math as I want you to endure.

15/
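Here is the whole operation spelled out in a few lines of NumPy. (The numbers are made up; this is just to show the mechanics, not code from the thread.)

```python
import numpy as np

# A tiny 4x4 "image" and a 3x3 filter (made-up values).
image = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
    [1.3, 1.4, 1.5, 1.6],
])

filter_ = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
])

k = filter_.shape[0]
output = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1))

# Slide the filter from top left to bottom right.
for i in range(output.shape[0]):
    for j in range(output.shape[1]):
        patch = image[i:i + k, j:j + k]
        # Element-wise multiplication, then sum everything into a single value.
        output[i, j] = np.sum(patch * filter_)

print(output)  # one value per position of the filter
```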
Here's what's cool about this: convolving an image with different filters will produce different outputs!

The attached code uses the filter2D() function from OpenCV to convolve an image with two different filters.

Code: gist.github.com/svpino/be7ba9b…

16/
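The gist isn't reproduced here, but the idea looks roughly like this. (The filename and kernel values below are my own guesses, not necessarily what the gist uses.)

```python
import cv2
import numpy as np

# "picture.jpg" is a placeholder for whatever image you want to try.
image = cv2.imread("picture.jpg", cv2.IMREAD_GRAYSCALE)

# Two hand-crafted 3x3 filters: one reacts to horizontal edges, the other to vertical ones.
horizontal_edges = np.array([
    [-1, -1, -1],
    [ 0,  0,  0],
    [ 1,  1,  1],
], dtype=np.float32)

vertical_edges = horizontal_edges.T

# filter2D slides each filter across the image and writes the responses to a new image.
output_h = cv2.filter2D(image, -1, horizontal_edges)
output_v = cv2.filter2D(image, -1, vertical_edges)

cv2.imwrite("horizontal.jpg", output_h)
cv2.imwrite("vertical.jpg", output_v)
```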
Look at the results here.

Notice how one of the pictures shows all the horizontal edges, while the other only shows the vertical edges.

Pretty cool, huh?

17/
Even better: since we are convolving each filter with the entire input image, we can detect features regardless of where they are located!

This is a crucial characteristic of Convolutional Neural Networks. Smart people call it "translation invariance."

18/
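A tiny experiment that shows the point (using SciPy, not the thread's code): the same vertical edge, placed in two different spots, triggers the same filter response; only its location changes.

```python
import numpy as np
from scipy.signal import convolve2d

# A 3x3 filter that reacts to vertical edges.
edge_filter = np.array([[1, 0, -1]] * 3)

# The same vertical edge, placed at two different positions.
image_a = np.zeros((10, 10))
image_a[:, 3:] = 1.0

image_b = np.zeros((10, 10))
image_b[:, 7:] = 1.0

response_a = convolve2d(image_a, edge_filter, mode="valid")
response_b = convolve2d(image_b, edge_filter, mode="valid")

# Same peak response in both cases; only where it happens changes.
print(response_a.max(), response_b.max())
```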
Quick summary so far:

▫️ We have a bunch of filters
▫️ Each one worries about a specific pattern
▫️ We convolve them with the input image
▫️ They can detect patterns wherever they are

Do you see where this is going?

19/
The functions that we talked about before are just different filters that highlight different patterns from our image!

We can then combine the outputs of these filters to find larger patterns and uncover whether we have a face.

20/
One more thing: how do we come up with the values that we need for each filter?

Horizontal and vertical edges aren't a big deal, but we will need much more than that to solve our problem.

21/
Here is where the magic happens!

Our network will learn the values of the filters during training!

We'll show it many faces, and the network will come up with useful filters that will help detect faces.

🤯

22/
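What "learning the filters" looks like in code: the Conv2D layers below hold 3x3 filters whose values start out random and get adjusted during training. (A minimal Keras sketch, assuming 64x64 grayscale images labeled face / not face; this is not a model from the thread.)

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    layers.Conv2D(16, (3, 3), activation="relu"),  # 16 learned 3x3 filters
    layers.MaxPooling2D(),
    layers.Conv2D(32, (3, 3), activation="relu"),  # 32 more, built on top of the first ones
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),         # face / not a face
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit(images, labels, ...) is what actually learns the filter values from examples.
```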
None of this would be possible without everything you already know about neural networks.

I also didn't talk about other operations that make Convolutional Networks work.

But hopefully, this thread highlights the main idea: convolutions rock!

23/
If you enjoy my attempts to make machine learning a little more intuitive, stay tuned and check out @svpino for more of these threads.

There's no way to tell what specific features the filters will learn.

The expectation is that they'll focus on the face, but they may learn useless features as well.

Hence the importance of validating the results and properly curating the dataset.

Great question!

In this particular case, the resulting images have the same dimensions as the input because filter2D() pads the borders by default (cv2.BORDER_DEFAULT, which reflects the edge pixels).

But you are right: a pure convolution with no padding gives us an output with smaller dimensions.
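You can check that point with SciPy (again, not the gist's code, just a quick illustration):

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(100, 100)
kernel = np.ones((3, 3)) / 9.0

# No padding ("valid"): the output shrinks by (filter size - 1) in each dimension.
print(convolve2d(image, kernel, mode="valid").shape)  # (98, 98)

# Padded borders ("same"): the output keeps the input's dimensions,
# which is what filter2D does by default.
print(convolve2d(image, kernel, mode="same").shape)   # (100, 100)
```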

Speaking about patterns and generalization, here is the natural continuation of this thread:
