Santiago (@svpino) · Oct 2, 2020
You might have finished the engine, but there's still a lot of work to put the entire car together.

A Machine Learning model is just a small piece of the equation.

A lot more needs to happen. Let's talk about that.

🧵👇
For simplicity's sake, let's imagine a model that takes a picture of an animal and classifies it among 100 different species.

▫️Input: pre-processed pixels of the image.
▫️Output: a score for each one of the 100 species.

The final answer is the species with the highest score.

👇
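In code, that last step is little more than an argmax over the scores. A tiny sketch, with a made-up species list standing in for the real one:

```python
import numpy as np

# Hypothetical species list -- the real model knows 100 specific animals.
SPECIES = [f"species_{i}" for i in range(100)]

def pick_answer(scores: np.ndarray) -> str:
    """The final answer is simply the species with the highest score."""
    return SPECIES[int(np.argmax(scores))]

# With made-up scores, whichever index holds the highest value wins.
scores = np.random.rand(100)
print(pick_answer(scores))
```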
There's a lot of work involved in creating a model like this. There's even more work involved in preparing the data to train it.

But it doesn't stop there.

The model is just the start, the core, the engine of what will become a fully-fledged car.

👇
Unfortunately, many companies are hyper-focused on creating these models and forget that productizing them is not just a checkbox in the process.

Reports are pointing out that ~90% of Data Science projects never make it to production!

I'm not surprised.

👇
Our model predicting species is now ready!

— "Good job, everyone!"

— "Oh, wait. Now what? How do we use this thing?"

Let's take our model into production step by step.

👇
First, we need to wrap the model with code that:

1. Pre-processes the input image
2. Translates the output into an appropriate answer

I call this the "extended model." Complexity varies depending on your needs.

👇
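A minimal sketch of what an "extended model" could look like. The preprocessing details and the `model` callable are placeholders, not the actual implementation:

```python
import numpy as np
from PIL import Image

class ExtendedModel:
    """The raw model plus pre-processing of the input and translation of the output."""

    def __init__(self, model, species, img_size=224):
        # `model` is assumed to be any callable mapping a (1, H, W, 3) float
        # array to a (1, 100) array of scores -- a stand-in for the real network.
        self.model = model
        self.species = species
        self.img_size = img_size

    def _preprocess(self, image: Image.Image) -> np.ndarray:
        image = image.convert("RGB").resize((self.img_size, self.img_size))
        pixels = np.asarray(image, dtype=np.float32) / 255.0
        return pixels[np.newaxis, ...]  # add a batch dimension

    def predict(self, image: Image.Image) -> str:
        scores = self.model(self._preprocess(image))
        return self.species[int(np.argmax(scores[0]))]
```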
Frequently, processing a single image at a time is not enough, and you need to process batches of pictures (you know, to speed things up a bit).

Doing this requires a non-trivial amount of work.

👇
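Batching is mostly a matter of stacking preprocessed images and splitting the output back out. A rough sketch, reusing the hypothetical ExtendedModel from above:

```python
import numpy as np

def predict_batch(extended_model, images, batch_size=32):
    """Run the model on fixed-size batches instead of one picture at a time."""
    pixels = np.concatenate([extended_model._preprocess(img) for img in images])
    answers = []
    for start in range(0, len(pixels), batch_size):
        scores = extended_model.model(pixels[start:start + batch_size])
        answers.extend(
            extended_model.species[int(np.argmax(row))] for row in scores
        )
    return answers
```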
Now we need to expose the functionality of the extended model.

Usually, you can do this by creating a wrapper API (REST or RPC) and having client applications use it to communicate with the model.

Loading the model in memory brings some other exciting challenges.

👇
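One common way to expose it (not necessarily the only one) is a small FastAPI service that loads the model once at startup and keeps it in memory. `ExtendedModel`, `load_trained_model`, and `SPECIES` are the hypothetical pieces from the earlier sketches:

```python
import io

from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
model = None  # loaded once at startup, kept in memory between requests


@app.on_event("startup")
def load_model_into_memory():
    global model
    # `load_trained_model` is a placeholder for however you load the weights.
    model = ExtendedModel(load_trained_model(), SPECIES)


@app.post("/classify")
async def classify(file: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await file.read()))
    return {"species": model.predict(image)}
```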
Of course, we can't trust what comes into that API, so we need to validate its input:

▫️What's the format of the image we are getting?
▫️What happens if it doesn't exist?
▫️Does it have the expected resolution?
▫️Is it base64? URL?
▫️...

👇
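A sketch of what that validation could look like for a base64 payload; the specific rules and thresholds are illustrative:

```python
import base64
import binascii
import io

from PIL import Image, UnidentifiedImageError

MIN_WIDTH, MIN_HEIGHT = 64, 64  # illustrative minimum resolution

def validate_image(payload: str) -> Image.Image:
    """Reject bad requests before they ever reach the model."""
    if not payload:
        raise ValueError("No image provided.")
    try:
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error:
        raise ValueError("Payload is not valid base64.")
    try:
        Image.open(io.BytesIO(raw)).verify()  # cheap integrity check
    except (UnidentifiedImageError, OSError):
        raise ValueError("Payload is not a decodable image.")
    image = Image.open(io.BytesIO(raw))  # verify() invalidates the object; reopen
    if image.width < MIN_WIDTH or image.height < MIN_HEIGHT:
        raise ValueError(f"Image smaller than {MIN_WIDTH}x{MIN_HEIGHT}.")
    return image
```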
Now that our API is ready, we need to host it. Maybe with a cloud provider. Several things to worry about here:

▫️Package API and model in a container
▫️Where do we deploy it?
▫️How do we deploy it?
▫️How do we take advantage of acceleration?

👇
Also:

▫️How long do we have to return an answer?
▫️How many requests per second can we handle?
▫️Do we need automatic scaling?
▫️What are the criteria to scale in and out?
▫️How can we tell when a model is down?
▫️How do we log what happens?

👇
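Two of those questions (knowing when the model is down, logging what happens) are commonly answered with a health endpoint and per-request logging. A rough sketch on top of the hypothetical FastAPI service:

```python
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("species-api")
app = FastAPI()


@app.get("/health")
def health():
    # Load balancers and orchestrators poll this to tell when the service is down.
    return {"status": "ok"}


@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    logger.info(
        "%s %s -> %s in %.1f ms",
        request.method,
        request.url.path,
        response.status_code,
        (time.perf_counter() - start) * 1000,
    )
    return response
```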
Let's say we made it.

At this point, we have a frozen, stuck-in-time version of our model deployed.

But we aren't done yet. Far from it!

By now, there's probably a newer version of the model ready to go.

How do we deploy that version? Do we need to start again?

👇
And of course, it would be ideal if you didn't just snap the new version of the model in and pray that quality doesn't go down, right?

You want old and new side by side. Then migrate traffic over gradually.

This requires more work.

👇
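A toy version of that idea: weighted routing that sends a small slice of traffic to the new model and records which version answered. Real systems usually do this at the load balancer or service mesh, but the logic is the same:

```python
import random

def route(image, old_model, new_model, new_share=0.05):
    """Send a small, configurable slice of traffic to the new model version."""
    use_new = random.random() < new_share
    chosen = new_model if use_new else old_model
    answer = chosen.predict(image)
    # Record which version answered so the two can be compared before ramping up.
    return {"version": "new" if use_new else "old", "species": answer}
```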
Creating the pipeline that handles taking new models and hosting them in production takes a lot of planning and effort.

And you are probably thinking, "That's MLOps!"

Yes, it is! But giving it a name doesn't make it less complicated.

👇
And there's more.

As we collect more and more data, we need to train new versions of our model.

We can't expect our people to do this manually. We need to automate the training pipeline.

A whole lot more work!

👇
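A minimal, runnable sketch of one unattended retraining run. The synthetic data and the scikit-learn classifier stand in for the real dataset and the real model; the promotion threshold is illustrative:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

PROMOTION_THRESHOLD = 0.90  # only ship models that clear this bar

def retrain_and_maybe_publish(model_path="model.joblib") -> float:
    # Stand-in for "fetch the latest labeled data".
    X, y = make_classification(n_samples=5_000, n_classes=4, n_informative=8)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
    model = RandomForestClassifier().fit(X_train, y_train)
    accuracy = accuracy_score(y_val, model.predict(X_val))
    if accuracy >= PROMOTION_THRESHOLD:
        joblib.dump(model, model_path)  # "publish" = write to the model registry
    return accuracy

if __name__ == "__main__":
    print(f"validation accuracy: {retrain_and_maybe_publish():.3f}")
```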
Some questions:

1. Where's the data coming from?
2. How should it be split?
3. How much data should be used to retrain?
4. How will the training scripts run?
5. What metrics do we need?
6. How do we evaluate the quality of the model?

None of these have simple "Yes" or "No" answers.

👇
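Question 2 is sneakier than it looks: randomly shuffling data that arrives over time can leak "future" examples into training. One common answer (illustrative, not prescribed here) is to split by timestamp; the column name is hypothetical:

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, timestamp_col="captured_at", holdout_days=7):
    """Train on older data, evaluate on the most recent window, to avoid leakage."""
    df = df.sort_values(timestamp_col)
    cutoff = df[timestamp_col].max() - pd.Timedelta(days=holdout_days)
    train = df[df[timestamp_col] <= cutoff]
    holdout = df[df[timestamp_col] > cutoff]
    return train, holdout
```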
At this point, are we done yet?

Well, not quite 😞

We need to worry about monitoring our model. How is it performing?

That pesky "concept drift" (the real-world patterns our model learned keep shifting over time) ensures that the quality of our results will rapidly decay. We need to be on top of it!

👇
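One simple way to stay on top of it is to compare the distribution of the model's predictions over a recent window against a reference window. A toy sketch; it's a crude proxy, since measuring true concept drift ultimately needs fresh labels:

```python
import numpy as np
from scipy.stats import chisquare

def drift_detected(reference_counts, current_counts, alpha=0.01) -> bool:
    """Flag when the predicted-species distribution shifts significantly."""
    reference_counts = np.asarray(reference_counts, dtype=float)
    current_counts = np.asarray(current_counts, dtype=float)
    # Rescale the reference so both histograms have the same total.
    expected = reference_counts / reference_counts.sum() * current_counts.sum()
    _, p_value = chisquare(f_obs=current_counts, f_exp=expected)
    return p_value < alpha

# Example: predictions suddenly concentrate on a handful of species.
reference = np.array([50] * 100)
current = np.array([400] * 10 + [1] * 90)
print(drift_detected(reference, current))  # True
```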
And there's even more.

Here are some must-haves for well-rounded, safe production systems that I haven't covered yet:

▫️Ethics
▫️Data capturing and storage
▫️Data quality
▫️Integrating human feedback

👇
Here is the bottom line:

Creating a model with predictive capacity is just a small part of a much bigger equation.

There aren't a lot of companies that understand the entire picture. This opens up a lot of opportunities.

Opportunities for you and me.
