There is much more to machine learning than training models.

Most courses focus exclusively on training, but it is just a small part of the pipeline.

Here are the skills that will make you a true full stack machine learning engineer. ↓

1. Git

Breaking things is an inevitable consequence of building. Once your projects become serious, smashing Ctrl + Z won't get you out of trouble anymore.

This is where version control comes into play. It is essential to learn, especially when working in teams.

Learning git can seem difficult at first because of the extensive use of the command line.

To start, I recommend these interactive tutorials:

• Git Immersion (gitimmersion.com/index.html)
• Learn Git Branching (learngitbranching.js.org)

2. Experiment tracking

Training a model involves trying out hundreds of hyperparameter combinations, architectures, preprocessing pipelines, etc.

This can get overwhelming really fast, and if you fail to keep a log, you'll quickly end up running around in circles.

Nowadays, there are some absolutely fantastic tools that will organize and track your experiments for you, such as

• MLflow (@MLflow),
• neptune.ai (@neptune_ai),
• Weights & Biases (@weights_biases),

and many more. Pick one and profit immediately.
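To make the idea concrete, here is a minimal sketch of what these tools automate, using only the Python standard library. It is not the API of any of the tools above; the function names and the JSON-lines file format are assumptions made for illustration.

```python
import json
import time
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment (hyperparameters + results) as a JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(log_path, metric, higher_is_better=True):
    """Return the logged run with the best value of `metric`."""
    with Path(log_path).open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    pick = max if higher_is_better else min
    return pick(runs, key=lambda run: run["metrics"][metric])
```

Calling log_run("runs.jsonl", {"lr": 0.01}, {"val_accuracy": 0.91}) after every training run already beats relying on memory. The dedicated tools add dashboards, artifact storage, and team sharing on top of the same idea.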

3. Model deployment

Training a model is awesome and all, but who is going to use the model from a messy Jupyter notebook? No one.

In practice, a trained model is wrapped into an API, providing a pleasant interface for the user, whether that is a person or a smart device.

There are quite a few frameworks for building APIs in Python, but I recommend only one: FastAPI (fastapi.tiangolo.com).

I fell in love with this fantastic framework around two years ago, and since then, it has been my go-to choice for backend web development tasks.

If you are interested in FastAPI, there are several in-depth tutorials.

@testdrivenio has some great ones out there (testdriven.io/blog/topics/fa…), but I have also written a hands-on tutorial with FastAPI + Docker (towardsdatascience.com/how-to-properl…).
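What "wrapping a model into an API" boils down to can be sketched without any framework. The snippet below is a framework-agnostic illustration using only the standard library, not FastAPI code: it shows the part you write yourself (parse the request, validate it, call the model, serialize the answer). The model class, the "features" field, and the function name are hypothetical.

```python
import json


class MeanThresholdModel:
    """Hypothetical stand-in for a trained model loaded from disk."""

    def predict(self, features):
        return 1 if sum(features) / len(features) > 0.5 else 0


model = MeanThresholdModel()


def handle_predict(body: bytes):
    """Turn a raw JSON request body into a (status code, JSON response) pair."""
    try:
        payload = json.loads(body)
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or non-numeric values.
        return 400, json.dumps({"error": f"invalid request: {exc}"})
    return 200, json.dumps({"prediction": model.predict(features)})
```

A web framework such as FastAPI takes care of the HTTP side (routing, status codes, request parsing), so in a real service most of this function shrinks to the single call to model.predict.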

4. Dataset curation

Data is the single most important part of any machine learning pipeline. Surprisingly, it is often the most under-appreciated as well.

Never take data for granted. It can be extremely hard and expensive to produce, and you can't do anything without it.

The importance of data became widely recognized as we gained more and more experience in building machine learning systems.

Because of this, an ever-increasing amount of effort is being put into creating tools that help practitioners build the datasets that feed our algorithms.

There are many different approaches. For instance,

• Prodigy (prodi.gy) uses active learning to help users generate data,
• Snorkel (snorkel.ai) generates labels using weak supervision.
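To give a feel for weak supervision, here is a toy sketch in plain Python. It does not use Snorkel's API. Several cheap, noisy heuristics ("labeling functions") vote on each example, and a simple majority vote stands in for the more sophisticated ways real tools combine such votes. The heuristics and names are made up for illustration.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


def lf_contains_great(text):
    return "positive" if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text):
    return "negative" if "terrible" in text.lower() else ABSTAIN


def lf_exclamation(text):
    return "positive" if text.endswith("!") else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_great, lf_contains_terrible, lf_exclamation]


def weak_label(text):
    """Majority vote over the labeling functions; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting heuristics, no clear winner
    return top
```

Each heuristic is wrong some of the time, but together they can label a large unlabeled corpus far faster than manual annotation. The examples they abstain on are natural candidates for a human to look at.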

There are manual annotation tools for different types of data, such as

• Doccano (github.com/doccano/doccano) for NLP,
• CVAT (github.com/openvinotoolki…) for computer vision,

and many more.

Since datasets constantly change, version control can help to keep order. One tool for that is DVC (dvc.org).
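The core idea behind versioning data can be sketched in a few lines: fingerprint the dataset, store the fingerprint next to your code and experiment logs, and you can always tell which data a model was trained on. This is a standard-library illustration with a hypothetical function name, not how you would use DVC itself.

```python
import hashlib
from pathlib import Path


def dataset_fingerprint(root):
    """Hash every file's relative path and contents under `root` into one digest."""
    digest = hashlib.sha256()
    root = Path(root)
    # Sort so the result does not depend on the order the OS lists files in.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())  # fine for a sketch; stream large files
        digest.update(b"\0")
    return digest.hexdigest()
```

If a single label changes or a file is added, the fingerprint changes, so silently retraining on different data becomes visible.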

Investing in data curation has enormous potential returns, and learning this skill will make you a highly valuable engineer.

5. Visualization

We humans are primarily visual creatures. Presenting data in a clear visual form can provide insights that algorithmic solutions can miss.

I always start with visual analysis if possible.

In Python, there are dozens of libraries that you can use. The most basic ones are

• Matplotlib (matplotlib.org),
• and Seaborn (seaborn.pydata.org).

These will cover the simplest scenarios.

With a bit of effort, you can go all the way and build interactive applications. The most popular tools are

• Plotly (plotly.com/python/),
• Bokeh (bokeh.org),
• Streamlit (streamlit.io).

You can't go wrong with any of them.

Bonus tip. If you want an excellent learning resource that focuses on the entire machine learning pipeline, I recommend the Full Stack Deep Learning course by @full_stack_dl.

fullstackdeeplearning.com

Are there any other skills that you feel are important to mention? Feel free to share!

If you have enjoyed this thread, make sure to give me a follow!

I frequently post similar deep-dive threads about machine learning, mathematics, and the intersection of both.

More important skills, shared by you!

Feature engineering.

Although deep neural networks can learn complex features on their own, using expert-engineered features can help a lot. (Think of it as a special kind of transfer learning.)

Domain knowledge.

As pointed out by @sumedh_bp, having the right questions is essential. Without them, machine learning is just a shot in the dark.

Model explainability.

In certain applications, a black-box model is not enough. For instance, in life sciences, we want to understand why a model gave a certain result, obtaining insight that can guide further research.

Error analysis and performance monitoring.

Training data is not always perfectly representative of production data. Because of this, learning how to monitor the performance of the model is crucial.
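As a rough sketch of what monitoring can look like, the snippet below compares a single numeric feature between training and production data. It is a deliberately crude check, not an established drift test: the score is simply how many training standard deviations the production mean has moved, and the 0.5 threshold is an arbitrary assumption you would tune for your own data.

```python
import statistics


def mean_shift_score(train_values, production_values):
    """How many training standard deviations the production mean has moved."""
    train_mean = statistics.fmean(train_values)
    train_std = statistics.stdev(train_values)
    if train_std == 0:
        raise ValueError("training feature is constant; score is undefined")
    return abs(statistics.fmean(production_values) - train_mean) / train_std


def has_drifted(train_values, production_values, threshold=0.5):
    """Flag the feature when the shift exceeds the (assumed) threshold."""
    return mean_shift_score(train_values, production_values) > threshold
```

Running a check like this on a schedule for the most important input features, and on the model's predictions themselves, is a cheap way to notice that production no longer looks like the training set.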

