Here is an underrated machine learning technique that will give you important information about your data and model.
Let's talk about learning curves.
Grab your ☕️ and let's do this thing!
🧵👇
Start by creating a model. Something simple. You are still exploring what works and what doesn't, so don't get fancy yet.
We are now going to plot the loss (model error) against the training dataset size. This will help us answer the following questions (there's a short code sketch right after the list):
▫️ Do we need more data?
▫️ Do we have a bias problem?
▫️ Do we have a variance problem?
▫️ What's the ideal picture?
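Here is a minimal sketch of how you could generate these curves with scikit-learn. The synthetic dataset and logistic regression are stand-ins for your own data and simple model:

```python
# Minimal sketch: plot loss vs. training set size with scikit-learn.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Stand-in data; swap in your own X and y.
X, y = make_classification(n_samples=2000, random_state=42)

model = LogisticRegression(max_iter=1000)  # something simple to start with

train_sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    cv=5,
    scoring="neg_log_loss",              # negated so higher is better; flip the sign to get loss
    train_sizes=np.linspace(0.1, 1.0, 10),
)

train_loss = -train_scores.mean(axis=1)
val_loss = -val_scores.mean(axis=1)

plt.plot(train_sizes, train_loss, label="training loss")
plt.plot(train_sizes, val_loss, label="validation loss")
plt.xlabel("Training set size")
plt.ylabel("Loss")
plt.legend()
plt.show()
```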
▫️ Do we need more data?
As you increase the training size, if both curves converge towards each other and stop improving, you don't need more data.
If there's room for them to continue closing the gap, then more data should help.
This one should be self-explanatory: if the errors stop improving as we add more data, it's unlikely that more of it will do any good.
But if we still see the loss improving, more data should help push it even lower.
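A rough way to check this programmatically, reusing the train_loss and val_loss arrays from the sketch above. The thresholds are arbitrary and depend on the scale of your loss:

```python
# Rough heuristic: is the validation loss still dropping at the largest training sizes?
recent_improvement = val_loss[-3] - val_loss[-1]   # drop over the last few points
gap = val_loss[-1] - train_loss[-1]                # room left between the two curves

if recent_improvement > 0.01 and gap > 0.01:       # arbitrary thresholds; tune for your loss scale
    print("Curves are still converging. More data should help.")
else:
    print("Curves have flattened. More data is unlikely to help much.")
```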
▫️ Do we have a bias problem?
If the training error is too high, we have a high bias problem.
Also, if the validation error is too high, we have a bias problem, either low or high bias; the training error will tell us which.
A high bias indicates that our model is not powerful enough to learn the data. This is why our training error is high.
If the training error is low, that's a good thing: our model can fit the data.
A high validation error indicates that our model is not performing well on the validation data. We probably have a bias problem.
To know in which direction, look at the training error:
▫️ Low training error: low bias
▫️ High training error: high bias
▫️ Do we have a variance problem?
If there's a big gap between the training error and the validation error, we have high variance.
A very low training error combined with a much higher validation error also points to high variance.
High variance indicates that the model fits the training data too well (it's probably memorizing it).
When we evaluate on the validation set, we see that big gap: the model did great on the training set but sucked on the validation set.
A couple more important points:
▫️ High bias + low variance: we are underfitting.
▫️ High variance + low bias: we are overfitting.
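To make the diagnosis concrete, here is a toy heuristic over the final points of the curves. The threshold values are made up; what counts as "high" depends entirely on your metric and problem:

```python
def diagnose(train_err, val_err, high=0.3, gap=0.1):
    # Thresholds are illustrative only.
    if train_err > high:
        return "High bias / underfitting: the model can't even fit the training data."
    if val_err - train_err > gap:
        return "High variance / overfitting: great on training data, poor on validation."
    return "Low bias and low variance: the picture you want."

print(diagnose(train_err=0.08, val_err=0.45))  # big gap -> overfitting
print(diagnose(train_err=0.52, val_err=0.55))  # high training error -> underfitting
```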
▫️ What's the ideal picture?
These are the curves you want to end up with.
Both the training and validation errors converge to a low value.
Here is another chart that does an excellent job at explaining bias and variance.
You want low bias + low variance, but keep in mind there's always a tradeoff between them: you need to find a good enough balance for your specific use case.
If these threads help, then make sure to follow me, and you won't be disappointed.
And for even more in-depth machine learning stories, make sure you head over to digest.underfitted.io. The first issue is coming this Friday!
🐍
Here is a quick guide that will help you deal with overfitting and underfitting:
• You can use it with any of the major models (GPT-X, Gemini, Claude)
• It has an option to Chat and Edit with the model
• It has an Agent mode to make changes to the notebook autonomously
Knowledge graphs are a game-changer for AI Agents, and this is one example of how you can take advantage of them.
How this works:
1. Cursor connects to Graphiti's MCP Server. Graphiti is a very popular open-source Knowledge Graph library for AI agents.
2. Graphiti connects to Neo4j running locally.
Now, every time I interact with Cursor, the information is synthesized and stored in the knowledge graph. In short, Cursor now "remembers" everything about our project.
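To make the idea concrete, here is a simplified illustration of that memory, written directly against Neo4j with the official Python driver. This is not Graphiti's actual API: the node labels and Cypher queries are assumptions meant to show the pattern Graphiti automates for you.

```python
# Simplified illustration (not Graphiti's API): store and recall project "memories"
# in a local Neo4j instance using the official neo4j driver.
from neo4j import GraphDatabase

# Connection details are assumptions; use your own local credentials.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def remember(fact: str, topic: str):
    """Store a fact and link it to a topic node."""
    with driver.session() as session:
        session.run(
            "MERGE (t:Topic {name: $topic}) "
            "CREATE (f:Fact {text: $fact}) "
            "MERGE (f)-[:ABOUT]->(t)",
            topic=topic, fact=fact,
        )

def recall(topic: str):
    """Fetch every fact linked to a topic."""
    with driver.session() as session:
        result = session.run(
            "MATCH (f:Fact)-[:ABOUT]->(t:Topic {name: $topic}) RETURN f.text AS text",
            topic=topic,
        )
        return [record["text"] for record in result]

remember("The API uses cursor-based pagination", topic="backend")
print(recall("backend"))
```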
Huge!
Here is the video I recorded.
To get this working on your computer, follow the instructions at this link:
Something super cool about using Graphiti's MCP server:
You can use one model to develop the requirements and a completely different model to implement the code. This is a huge plus because you can pick whichever model is strongest for each stage.
Also, Graphiti supports custom entities, which you can use when running the MCP server.
You can use these custom entities to structure and recall domain-specific information, which can dramatically improve the accuracy of your results.
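As a rough sketch, Graphiti defines custom entity types as Pydantic models. The class and fields below are made-up examples for a coding project; check Graphiti's docs for the exact way to register them with the MCP server:

```python
# Hypothetical custom entity type for a coding project (fields are illustrative).
from pydantic import BaseModel, Field

class Requirement(BaseModel):
    """A product requirement captured during a conversation."""
    name: str = Field(..., description="Short name of the requirement")
    status: str = Field(..., description="e.g. proposed, accepted, implemented")
    details: str = Field(..., description="What the requirement actually asks for")
```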
Here is an explanation of what MCP is, how it works, and why I think it's awesome.
I will also show you the MCP server I'm building.
This is good stuff.
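To give you a feel for it, here is a minimal MCP server written with the official Python SDK's FastMCP helper. It's not the server from the video, just the smallest example I could write:

```python
# Minimal MCP server using the official Python SDK (FastMCP helper).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio by default
```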
For those who like YouTube better:
By the way, I won't like you anymore if you don't subscribe to my channel.
Here is where I'd start reading to understand what MCP is and what it does:
After you read "Core architecture", explore the rest of the concepts. They will give you an idea of everything you can do with MCP. modelcontextprotocol.io/docs/concepts/…