Tivadar Danka

11 Oct, 22 tweets, 9 min read

There is much more to machine learning than training models.

Most courses focus exclusively on this, but this is just a small part of the pipeline.

Here are the skills that will make you a true full-stack machine learning engineer. ↓

1. git

Breaking things is an inevitable consequence of building. Once your projects become serious, smashing Ctrl + Z won't get you out of trouble anymore.

This is where version control comes into play, which is essential to learn. (Especially when working in teams.)

Learning git can seem difficult at first because of the extensive use of the command line.

To start, I recommend these interactive tutorials:

• Git Immersion (gitimmersion.com/index.html)
• Learn Git Branching (learngitbranching.js.org)

2. Experiment tracking

Training a model involves trying out hundreds of hyperparameter combinations, architectures, preprocessing pipelines, etc.

This can get overwhelming really fast, and if you fail to keep a log, you'll quickly end up running around in circles.

Nowadays, there are some absolutely fantastic tools that will organize and track your experiments for you, like

• MLflow (@MLflow),
• neptune.ai (@neptune_ai),
• Weights & Biases (@weights_biases),

and many more. Pick one and profit immediately.
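To make the idea concrete, here is a minimal sketch of what an experiment tracker records, using only the Python standard library. It is not the API of MLflow, neptune.ai, or Weights & Biases: the helpers `log_run` and `best_run` and the `val_accuracy` metric are made up for illustration, and the scores are placeholders rather than real results.

```python
import json
import tempfile
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment (hyperparameters + results) as a JSON line."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"params": params, "metrics": metrics}) + "\n")


def best_run(log_path, metric):
    """Return the logged run with the highest value of `metric`."""
    with open(log_path, encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


# Hypothetical usage: the scores below are placeholders, not real results.
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_file, {"lr": 0.1, "layers": 2}, {"val_accuracy": 0.81})
log_run(log_file, {"lr": 0.01, "layers": 4}, {"val_accuracy": 0.87})
print(best_run(log_file, "val_accuracy")["params"])
```

Dedicated tools add the parts that matter at scale on top of this: dashboards, artifact storage, and comparison across many runs.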

3. Model deployment

Training a model is awesome and all, but who is going to use the model from a messy Jupyter notebook? No one.

In practice, a trained model is wrapped in an API, providing a pleasant interface for the user. (Be it a person or a smart device.)

There are quite a few frameworks for building APIs in Python, but I recommend only one: FastAPI (fastapi.tiangolo.com).

I fell in love with this fantastic framework around two years ago, and since then, this has been my go-to choice for backend web development tasks.

If you are interested in FastAPI, there are several in-depth tutorials.

@testdrivenio has some great ones out there (testdriven.io/blog/topics/fa…), but I have also written a hands-on tutorial with FastAPI + Docker (towardsdatascience.com/how-to-properl…).

4. Dataset curation

Data is the single most important part of any machine learning pipeline. Surprisingly, it is often the most under-appreciated as well.

Never take data for granted. It can be extremely hard and expensive to produce, and you can't do anything without it.

The importance of data became widely recognized as we gained more and more experience in building machine learning systems.

Because of this, an ever-increasing amount of effort is being put into creating tools that help practitioners build datasets that feed our algorithms.

There are a lot of different approaches, for instance,

• Prodigy (prodi.gy) uses active learning to help users generate data,
• Snorkel (snorkel.ai) generates labels using weak supervision.
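To illustrate the weak supervision idea, here is a toy sketch in plain Python. It does not use Snorkel's API: several noisy, hand-written heuristics ("labeling functions") vote on each example, and a simple majority vote stands in for the more sophisticated ways real tools combine them. The heuristics and the spam/ham task are invented for illustration.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote


def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 3 else ABSTAIN


def lf_greeting(text):
    return "ham" if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_many_exclamations, lf_greeting]


def weak_label(text):
    """Combine the noisy votes by majority; abstain on ties or no votes."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels
    return ranked[0][0]


print(weak_label("FREE prize!!! Click now"))
print(weak_label("Hello, are we still meeting tomorrow?"))
```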

There are manual annotation tools for different types of data, like

• Doccano (github.com/doccano/doccano) for NLP,
• CVAT (github.com/openvinotoolki…) for computer vision,

and many more.

Since datasets constantly change, version control can help to keep order. One tool for that is DVC (dvc.org).

Investing in data curation has enormous potential returns, and learning this skill will make you a highly valuable engineer.

5. Visualization

We humans are primarily visual creatures. Presenting data in a clear visual form can provide insights that algorithmic solutions can miss.

I always start with visual analysis if possible.

In Python, there are dozens of libraries that you can use. The most basic ones are

• Matplotlib (matplotlib.org),
• and Seaborn (seaborn.pydata.org).

These will cover the simplest scenarios.
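A first visual check can be as small as a histogram of one feature. The sketch below uses Matplotlib on synthetic data (random draws, not a real dataset); the helper name `plot_feature_histogram` and the output file name are arbitrary choices for illustration.

```python
import random
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render off-screen so this also runs without a display
import matplotlib.pyplot as plt


def plot_feature_histogram(values, bins=20):
    """Plot the distribution of a single numeric feature."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(values, bins=bins, edgecolor="black")
    ax.set_xlabel("feature value")
    ax.set_ylabel("count")
    ax.set_title("Distribution of a single feature")
    return fig


# Synthetic data standing in for a real feature column.
random.seed(0)
synthetic = [random.gauss(0.0, 1.0) for _ in range(1000)]

fig = plot_feature_histogram(synthetic)
out_path = Path(tempfile.mkdtemp()) / "feature_histogram.png"
fig.savefig(out_path, dpi=100)
```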

With a bit of effort, you can go all the way and build interactive applications. The most popular tools are

• Plotly (plotly.com/python/),
• Bokeh (bokeh.org),
• Streamlit (streamlit.io).

You can't go wrong with any of them.

Bonus tip. If you want an excellent learning resource that focuses on the entire machine learning pipeline, I recommend the Full Stack Deep Learning course by @full_stack_dl.

fullstackdeeplearning.com

Are there any other skills that you feel are important to mention? Feel free to share!

If you have enjoyed this thread, make sure to give me a follow!

I frequently post similar deep-dive threads about machine learning, mathematics, and the intersection of both.

More important skills, shared by you!

Feature engineering.

Although deep neural networks can learn complex features on their own, using expert-engineered features can help a lot. (Think of it as a special kind of transfer learning.)
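As a small illustration of an expert-engineered feature, the sketch below turns a raw timestamp into a few values a model can use directly. The feature names and the 9-to-17 definition of business hours are assumptions picked for the example, not a recommendation for any particular dataset.

```python
from datetime import datetime


def time_features(timestamp):
    """Derive simple calendar features from an ISO 8601 timestamp string."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),  # Monday = 0, Sunday = 6
        "is_weekend": dt.weekday() >= 5,
        # Assumed definition: weekdays between 09:00 and 17:00.
        "is_business_hours": dt.weekday() < 5 and 9 <= dt.hour < 17,
    }


print(time_features("2021-10-11T14:30:00"))
```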

https://twitter.com/josejorgexl/status/1447526256686342150

Domain knowledge.

As pointed out by @sumedh_bp, having the right questions is essential. Without them, machine learning is just a shot in the dark.

https://twitter.com/sumedh_bp/status/1447551454642724868

Model explainability.

In certain applications, a black-box model is not enough. For instance, in life sciences, we want to understand why a model gave a certain result, obtaining insight that can guide further research.
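One model-agnostic way to peek inside a black box is permutation importance: shuffle one feature at a time and see how much the error grows. The sketch below is a plain-Python illustration on synthetic data, with a toy `black_box` function standing in for a trained model; it is one technique among many, not the method referenced in the thread.

```python
import random


def mean_abs_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Error increase after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled copy.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_abs_error(model, shuffled, targets) - baseline)
    return importances


def black_box(row):
    # Toy stand-in for a trained model: depends on feature 0, ignores feature 1.
    return 3.0 * row[0]


data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [black_box(r) for r in rows]

scores = permutation_importance(black_box, rows, targets, n_features=2)
print(scores)  # feature 0 should matter, feature 1 should not
```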

https://twitter.com/HiramCoriaRodr1/status/1447624140983255045

Error analysis and performance monitoring.

Training data is not always perfectly representative of production data. Because of this, learning how to monitor the performance of the model is crucial.
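A very simple monitoring check, sketched below with the standard library, compares the mean of a feature in production against its mean in the training data, measured in training standard deviations. The function names, the sample values, and the threshold of 0.5 are all assumptions for illustration; real monitoring setups usually track many features and use more robust statistics.

```python
import statistics


def mean_shift(train_values, prod_values):
    """Shift of the production mean, in units of training standard deviation."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)  # assumes the feature is not constant
    return abs(statistics.mean(prod_values) - mu) / sigma


def has_drifted(train_values, prod_values, threshold=0.5):
    # The threshold is an arbitrary illustrative choice, not a standard value.
    return mean_shift(train_values, prod_values) > threshold


# Made-up feature values, not real measurements.
train = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
prod_similar = [1.0, 0.98, 1.02, 1.01]
prod_shifted = [1.4, 1.5, 1.45, 1.55]

print(has_drifted(train, prod_similar))
print(has_drifted(train, prod_shifted))
```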

https://twitter.com/Jeande_d/status/1447660906951987205
