Santiago
3 Jan, 4 tweets, 1 min read
A machine learning workflow:

1. Define the problem
2. Assemble a dataset
3. Determine success metrics
4. Decide on evaluation method
5. Prepare the data
6. Establish a baseline
7. Develop a model that beats the baseline
8. Overfit the model
9. Regularize the model
10. Tune the model
Where's model validation in this workflow?

Notice that steps 8, 9, and 10 presume the existence of a mechanism to evaluate the model. This means that model validation is implicitly part of this workflow.
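Steps 4, 6, and 7 can be made concrete with a minimal sketch. It assumes a simple hold-out split as the evaluation method (one option among several, such as k-fold cross-validation), a majority-class baseline, and a toy threshold "model" trained on hypothetical data. All helper names are illustrative, not part of any library.

```python
import random
from collections import Counter

def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def majority_baseline(train):
    """Baseline 'model': always predict the most common training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority

def fit_threshold(train):
    """Toy model: predict True when x >= t, picking t from the training inputs."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    best_t = max(xs, key=lambda t: accuracy([x >= t for x in xs], ys))
    return lambda x: x >= best_t

def evaluate(model, data):
    """Accuracy of a model on (input, label) pairs it was not trained on."""
    return accuracy([model(x) for x, _ in data], [y for _, y in data])

# Hypothetical toy data: the label is True when x >= 7.
data = [(x, x >= 7) for x in range(20)]
train, validation = train_validation_split(data)
baseline = majority_baseline(train)
model = fit_threshold(train)
print("baseline accuracy:", evaluate(baseline, validation))
print("model accuracy:   ", evaluate(model, validation))
```

Because the baseline ignores its input, a model is only worth keeping if it beats the baseline on the same validation split; steps 8 to 10 then reuse that split to judge each change.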
"Assembling a dataset" focuses on identifying the data sources we need to solve the problem.

Before we can define success metrics, we need access to the data we'll be using.

Later, "Preparing the data" works with that same data.
Once you know what the dataset looks like, "data analysis" becomes another implicit step that enables several of the steps in this workflow.
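The thread doesn't spell out what "data analysis" involves. One minimal sketch, assuming a small tabular dataset held in plain Python lists with `None` marking a missing value (the data and helper names are hypothetical): summarize a numeric column and check how the labels are distributed. The label check can feed straight into choosing success metrics, since plain accuracy is misleading when one label dominates.

```python
import statistics
from collections import Counter

def summarize_numeric(values):
    """Basic statistics for one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
    }

def label_distribution(labels):
    """Share of each label in the dataset."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

# Hypothetical columns from a small dataset.
square_feet = [1500, 2100, None, 980]
labels = ["spam", "ham", "ham", "ham"]

print(summarize_numeric(square_feet))
print(label_distribution(labels))
```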

• • •

More from @svpino

2 Jan
5 differences between a machine learning system and the software you are building today:

🧵👇
1. The Team

A machine learning system usually involves many different disciplines:

- Data Scientists
- Data Engineers
- Machine Learning Engineers

Plus, all the roles that a conventional software system needs.

👇
2. The Development Process

Machine learning is a very experimental process. Creating a model requires a lot of exploration that conventional software development usually doesn't need.

👇
Read 6 tweets
31 Dec 20
For a long time, I didn't understand how to use Virtual Environments in Python 🐍.

If this is you, let's end it here and now: 🧵👇
[2] Virtual Environments let you manage your code's dependencies on external Python libraries.

They prevent conflicts when your projects depend on different versions of the same library.

👇
[3] Let's imagine that you are building your first Python project and you install the "requests" library:

pip install requests

You get version 2.24.0 installed on your system.

👇
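A minimal sketch of where this is heading, assuming Python 3.9+ and only the standard-library `venv` module: each project gets its own environment, so one project can pin requests 2.24.0 while another installs a newer release. The helper names are hypothetical. The sketch keeps `venv.create`'s default `with_pip=False` so it runs offline; in real use pass `with_pip=True`, otherwise the printed pip commands have no pip to run.

```python
import sys
import tempfile
import venv
from pathlib import Path

def create_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create an isolated environment in env_dir."""
    # Symlink the interpreter on POSIX, as `python -m venv` does by default there.
    venv.create(env_dir, with_pip=with_pip, symlinks=(sys.platform != "win32"))
    return env_dir

def env_python(env_dir: Path) -> Path:
    """Path to the interpreter that belongs to this environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"

def pip_install_command(env_dir: Path, requirement: str) -> list:
    """Command that installs `requirement` into this environment only."""
    return [str(env_python(env_dir)), "-m", "pip", "install", requirement]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old_project = create_env(Path(root) / "old-project")
        new_project = create_env(Path(root) / "new-project")
        # Each project installs into its own site-packages, so versions never clash.
        print(pip_install_command(old_project, "requests==2.24.0"))
        print(pip_install_command(new_project, "requests"))
```

From a shell, `python -m venv <dir>` does the same job and installs pip into the new environment by default.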
Read 9 tweets
28 Dec 20
I told everyone that I didn't care.

"Screw math! I've never been great with it, so I'm not starting with machine learning to fail at the end."

That was many years ago.

Math is still hard, but I don't think you should be scared at all. Here is why: 🧵👇
[1] One thing changed my mind: school pushed me to the deep end of the pool, and I was forced to swim.

I had to face my fears, so I got started with machine learning and realized that the math involved is not as scary (or as extensive) as some people believe.

👇
[2] One of the most frequent questions I get is about the math needed for machine learning.

Answer:

▫️ Probabilities and Statistics
▫️ Linear Algebra
▫️ Calculus

But it turns out that this is not helpful.

👇
Read 18 tweets
27 Dec 20
I've worked with Dell, HP, IBM, Cisco, HSBC, Disney, and G4S, among other large companies.

Don't think for a minute that they have things figured out.

They have amazing development teams. They also have mediocre and downright horrible teams.

🧵👇
[2] In my experience, smaller companies tend to be more selective when hiring: they can't afford to make a mistake.

I've found that these smaller companies consistently build decent teams, although they have a harder time hiring talent.

👇
[3] Larger companies, on the other hand, build teams across many different departments. Maintaining consistency is hard, if not impossible.

I've met excellent teams: sharp, organized, and building great products with state-of-the-art technology.

👇
Read 6 tweets
27 Dec 20
An introduction to one of the most basic structures used in machine learning: a tensor.

🧵👇
Tensors are the data structure used by machine learning systems, and getting to know them is an essential skill you should build early on.

A tensor is a container for numerical data. It is the way we store the information that we'll use within our system.

(2 / 16)
Three primary attributes define a tensor:

▫️ Its rank
▫️ Its shape
▫️ Its data type

(3 / 16)
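The three attributes can be computed directly. This is a minimal sketch that uses plain nested Python lists so it runs without third-party packages; libraries such as NumPy expose the same ideas on arrays as `ndim`, `shape`, and `dtype`. The helper names are hypothetical, and unlike NumPy (which upcasts mixed ints and floats), this sketch simply rejects mixed element types.

```python
from numbers import Number

def _flatten(t):
    """Yield every scalar element of a nested list, in order."""
    if isinstance(t, list):
        for item in t:
            yield from _flatten(item)
    else:
        yield t

def tensor_shape(t):
    """Length along each axis; a bare number (a scalar) has shape ()."""
    if isinstance(t, list):
        if not t:
            return (0,)
        inner = tensor_shape(t[0])
        if any(tensor_shape(item) != inner for item in t[1:]):
            raise ValueError("ragged nesting: not a valid tensor")
        return (len(t),) + inner
    if isinstance(t, Number):
        return ()
    raise TypeError(f"tensors hold numbers, got {type(t).__name__}")

def tensor_rank(t):
    """Rank = number of axes = length of the shape."""
    return len(tensor_shape(t))

def tensor_dtype(t):
    """Data type shared by every element (this sketch requires exactly one)."""
    kinds = {type(x) for x in _flatten(t)}
    if len(kinds) != 1:
        raise ValueError("expected exactly one element type")
    return kinds.pop()

matrix = [[1, 2, 3], [4, 5, 6]]
print(tensor_rank(matrix), tensor_shape(matrix), tensor_dtype(matrix).__name__)
# prints: 2 (2, 3) int
```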
Read 16 tweets
26 Dec 20
11 key concepts of Machine Learning.

— Supervised Learning Edition —

🧵👇
😜

Before starting, remember that, if you follow me, one of your enemies will be immediately destroyed (and you'll get to read more of these threads, of course).

And if you don't follow me, well, you just hurt my feelings.

😜
1. Labels

(Also referred to as "y")

The label is the piece of information that we are predicting.

For example:

- the animal that's shown in a picture
- the price of a house
- whether a message is spam or not

👇
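In code, the label is usually kept apart from the inputs the model sees. A minimal sketch, assuming each example is a Python dict and using hypothetical house listings where "price" is the label (the data and the helper name are illustrative):

```python
def split_features_labels(rows, label_key):
    """Separate each example into its features (X) and its label (y)."""
    X = [{k: v for k, v in row.items() if k != label_key} for row in rows]
    y = [row[label_key] for row in rows]
    return X, y

# Hypothetical house listings: "price" is the label we want to predict.
houses = [
    {"bedrooms": 3, "square_feet": 1500, "price": 320_000},
    {"bedrooms": 2, "square_feet": 900, "price": 210_000},
]
X, y = split_features_labels(houses, "price")
print(y)  # [320000, 210000]
```

The same split applies to the other examples: for spam filtering, y would hold "spam or not" for each message.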
Read 13 tweets
