Santiago (@svpino)
8 Jun, 10 tweets
The 4 stages of a machine learning project lifecycle:

1. Project scoping
2. Data definition and preparation
3. Model training and error analysis
4. Deployment, monitoring, and maintenance

Here are 29 questions that you can use at each step of the process.

Project scoping

• What problem are we trying to solve?
• Why do we need to solve this problem?
• What are the constraints?
• What are the risks?
• What's the best approach to solving it?
• How do we measure progress?
• What does success look like?

Data definition and preparation

• What data do we need?
• How are we going to get it?
• How frequently does it change?
• Do we trust the source?
• How is this data biased?
• Can we improve it somehow?
• How are we going to clean it?
• How are we going to augment it?
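
One quick way to start on "How is this data biased?" is to look at how the labels are distributed. Here is a minimal sketch, assuming the labels fit in a plain Python list. The helper names and the 0.8 threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter


def label_distribution(labels):
    """Return each label's share of the dataset as a fraction between 0 and 1."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_imbalanced(labels, max_share=0.8):
    """Flag the dataset when a single label exceeds max_share of the samples.

    max_share is an arbitrary illustrative threshold, not a standard value.
    """
    shares = label_distribution(labels)
    return max(shares.values()) > max_share


# Toy labels, made up for illustration only.
labels = ["spam"] * 9 + ["ham"] * 1
print(label_distribution(labels))
print(is_imbalanced(labels))
```

Label balance is only one kind of bias. It won't tell you anything about how the data was collected or who is missing from it.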

Model training and error analysis

• What's a good baseline?
• What's a good starting point?
• Has anyone solved this before?
• How are we going to test the model?
• Are the results good enough?
• Are we solving the problem?
• How can we improve the results?
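
For "What's a good baseline?", one common starting point for classification is to always predict the most frequent label. Here is a minimal sketch under that assumption. The function names and the toy data are hypothetical.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Build a predictor that always returns the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of samples where the prediction matches the true label."""
    correct = sum(predict(x) == y for x, y in zip(features, labels))
    return correct / len(labels)


# Toy data, made up for illustration only.
train_labels = [0, 0, 0, 1]
test_features = [[1.0], [2.0], [3.0], [4.0]]
test_labels = [0, 1, 0, 0]

baseline = majority_baseline(train_labels)
print(accuracy(baseline, test_features, test_labels))
```

Any real model should beat this number before it earns more of your time. Scikit-Learn offers the same idea through DummyClassifier with strategy="most_frequent".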

Deployment, monitoring, and maintenance

• Where do we host?
• How much do we need to scale?
• What metrics should we monitor?
• What results do we expect?
• How is the model doing compared to that?
• How do we keep the model up to date?
• What's our rollback strategy?
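
The questions about expected results and rollback can be tied together with a simple check. Here is a minimal sketch, assuming you track one metric where higher is better. The function name, the 0.05 tolerance, and the numbers are all hypothetical.

```python
def should_roll_back(expected, observed, tolerance=0.05):
    """Return True when the observed metric falls below expected by more than tolerance."""
    return (expected - observed) > tolerance


# Made-up values for illustration only.
expected_accuracy = 0.90
recent_accuracy = [0.91, 0.89, 0.82]

for day, value in enumerate(recent_accuracy, start=1):
    status = "roll back" if should_roll_back(expected_accuracy, value) else "ok"
    print(f"day {day}: accuracy={value:.2f} -> {status}")
```

In practice you would probably want to look at a window of observations instead of a single value, so one noisy day doesn't trigger a rollback.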

Every question opens a new set of possibilities, discoveries, and improvements.

The more you ask, the better your system will be.

Follow me @svpino, and I'll help you stay curious, one thread at a time, right on your Twitter timeline.

I post every single day.

If you build machine learning systems professionally, what questions would you recommend others start asking?

What questions lead to interesting discoveries with the potential to change the outcome of the project?

"Is machine learning the appropriate solution for this problem?" is a great question to ask while scoping the project.

See Rule #1 of Google's Rules of Machine Learning: "Don't be afraid to launch a product without machine learning."


More from @svpino

9 Jun
There are thousands of machine learning algorithms out there, but that's mostly noise.

You'll rarely need more than a handful.

A good start:

• Linear/Logistic Regression
• Decision Trees
• Neural Networks
• XGBoost
• Naive Bayes
• PCA
• KNN
• SVM
• t-SNE

Why these 9 instead of your favorite ones?

No specific reason. Your list will certainly include algorithms that I haven't even heard about.

But you must start somewhere, and these are certainly a good foundation.

In case you are curious, I also studied the following algorithms in the first couple of years of starting with machine learning:

• Random Forest
• AdaBoost
• K-Means
• Expectation Maximization
• Simulated Annealing
• Genetic Algorithms
8 Jun
Last week, an Italian artist sold an invisible sculpture for $18,300.

That's a lot of money for a lot of nothing!

I have a better idea for your money:

I can help you start your machine learning career. Something that will pay *you* for the rest of your life.

For the next 24 hours, you can grab my course for a massive 60% discount!

$6 only. That's the price of a cup of coffee where I live!

gum.co/kBjbC/60off

(Almost 2,000 customers with 208 reviews.)

If you still can't afford this, let me know, and I'll send you a free copy.

14 more hours, and the price goes back to $15.
7 Jun
Here is a photo from the newspaper of a communist island.

I'm the one standing. This was 20 years ago.

I've been developing software for 25+ years, and I've learned a few things.

I didn't have Internet back then, but now that I do, I can share 3 lessons with you:

Look at that photo again.

This was early 2000.

Those were the best computers Cuba had to offer to our Computer Science faculty. Outdated but good enough.

In a country where owning a personal computer was a crime, it was all we had.
One thing was missing: There was no Internet.

I know this might be hard to understand, so I'll rephrase:

We were going through our Computer Science bachelor's with no Internet access.

The entire wealth of information we had fit in a couple of books.
6 Jun
Good systems produce outstanding results.

↓ Some of my recommendations:

• Improve as a developer
• Improve your communication
• Take a course. Take another. Repeat.
• Solve problems. Many of them.
• Teach others.
• Analysis first. Code is secondary.
• Stay curious.

“Tutorial hell” only happens when you focus on consumption and neglect production.

Solve problems and put what you learn out there.

Curiosity pushes me to dig deeper. An infinite number of "but why?" questions.

There's something new and interesting on every layer you uncover.

And the more you dig, the better your understanding and the greater your capacity to create something new.

4 Jun
Many online courses are useless. They will not get you anywhere.

But there are gems out there.

Here is a curated list that will help you build a machine learning career without paying a fortune in tuition fees. ↓

5 specializations and 1 course, all from a single platform: Coursera.

Take these in order, and you'll end up with more than enough ammunition to tackle real-life problems.

Here is your roadmap: svpino.com/a-machine-lear…

If you find this useful, follow me @svpino, and I'll help put some practical machine learning thoughts right on your timeline.

And if you are looking for ideas that don't fit on Twitter, you can join the other 3,200+ subscribers of my newsletter: digest.underfitted.io.
3 Jun
The machine learning setup I've been using in 2021:

• Python
• NumPy, Pandas, Matplotlib, OpenCV
• Scikit-Learn, XGBoost
• TensorFlow
• Google Colab, Jupyter, VSCode
• Docker, Flask
• AWS SageMaker
• A 48-page Field Notes

This setup hasn't changed in a while, but I plan to introduce something new: Google's Vertex AI.

How can this platform help with production systems?

I guess we'll find out together.

Follow me @svpino for a practical point of view on machine learning stuff.

It doesn’t need to be like this.

Upload the data somewhere else. Maybe to an S3 bucket, Google Drive, anywhere.

Then download it from the Colab notebook.

This will be fast and reproducible every time you restart the notebook.
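
Here is a minimal sketch of that download step, assuming the file is reachable at a plain HTTPS URL (a private bucket would need credentials and a different client). The function name, the URL, and the paths are hypothetical.

```python
import urllib.request
from pathlib import Path


def fetch_dataset(url, dest):
    """Download url to dest unless dest already exists, then return the path."""
    dest = Path(dest)
    if not dest.exists():
        # Create the target folder on first use.
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
    return dest


# Example call from a notebook cell (hypothetical URL, replace with your own):
# path = fetch_dataset("https://example.com/my-dataset.zip", "data/my-dataset.zip")
```

Skipping the download when the file already exists keeps re-running the cell cheap within a session. After a restart wipes the disk, the same cell simply fetches the file again.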