Santiago
I write about Machine Learning.
10 May
A 6-step process that completely changed my life:

• Maximize what you don't learn
• Avoid schedules
• Use uncomfortable situations
• Learn as a byproduct
• Teach somebody else
• Circle back in a month

On how to learn efficiently and get ahead in life: ↓
Everything starts with maximizing the things I don't learn.

If I spend time on things that don't bring me value, I can't focus on what really matters to me.

By default, everything around me is noise until it's impossible to ignore.
If I don't see the value right away, I'll ignore it. Important things will make their way back to me.

Ignoring the noise makes space for what truly matters.
Read 15 tweets
8 May
My recommendation to learn machine learning:

• Introduction to Python Programming (Udacity)
• Machine Learning Crash Course (Google)
• Machine Learning (Coursera)

In that order. They are all free. They are all amazing.

And take your time. This is a marathon, not a sprint.
Kaggle is an amazing place to practice what you learn.
And of course, there’s always my newsletter, and my Twitter account… if you truly want to learn machine learning, you definitely want to stay tuned!
Read 4 tweets
8 May
A topic that comes up in every interview.

Bias, variance, and their relationship with machine learning algorithms. One of the most basic concepts that you have to know by heart.

Here is a simple summary that you will easily remember.

Every machine learning algorithm deals with 3 types of errors:

1. Bias error
2. Variance error
3. Irreducible error

There's nothing we can do about #3.

Let's focus on the other two.

1/5
"Bias" refers to the assumptions the model makes to simplify the process of finding answers.

The more assumptions it makes, the more biased the model is.
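
For squared-error loss, the three error types listed above are commonly written as a single decomposition. This is a sketch under stated assumptions: the data follows y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a random training set. None of these symbols come from the thread itself.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The last term does not depend on the model, which is why nothing can be done about #3.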
Read 9 tweets
7 May
Do you know what scares me? Data labeling in machine learning.

We don't talk enough about it, and yet we can't do anything unless we solve this first. Labeling enough data is expensive or even outright impossible.

Some ideas to solve this problem.

Let's start with an example:

You have terrain and weather information for different locations. Your goal is to build a model that predicts where to drill to find oil.

How do you label this data? You drill to find out where the oil is.

This is ridiculously expensive.
To get around this problem, you need to minimize the number of labeled examples you need to build a good model.

1. Take the data
2. Select as few examples as possible
3. Drill those holes to come up with the labels
4. Train the model

How can you achieve #2?
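
One common way to approach step 2 is uncertainty sampling, a technique from active learning. It is offered here as an assumption, not as the thread's own answer: train a rough model on the few labels you have, then label only the examples it is least sure about. A minimal sketch, where the function name and the probabilities are hypothetical:

```python
def select_most_uncertain(probabilities, k):
    """Return the indices of the k examples whose predicted
    probability is closest to 0.5 (the most uncertain ones)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical predicted probabilities of finding oil at six
# unlabeled locations, produced by some initial model.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.80, 0.03]

# Pick the two locations worth drilling next.
to_drill = select_most_uncertain(pool_probabilities, k=2)
print(to_drill)
```

After drilling those locations, you would add the new labels to the training set, retrain, and repeat until the model is good enough.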
Read 11 tweets
6 May
12 machine learning YouTube videos.

On libraries, algorithms, tools, and theory.

1. Jupyter Notebooks:

2. Pandas:

3. Matplotlib:

4. Seaborn:
5. Numpy:

6. Decision Trees:

7. Neural Networks:

8. Scikit-Learn:
Read 4 tweets
5 May
Machine learning education is broken.

If you are preparing for a research position, you are good. If you are looking to get out there and start solving problems, not even close.

Here are some thoughts so you can get ahead.

Most classes, courses, and books cover the same road.

They start with a dataset. They finish with a working model. The focus is always on everything that happens in between.

Dataset → Model.

This is great, but not enough.
Real-life situations rarely start with a dataset, and they never end after you finish building your model.

Applying machine learning successfully is hard.

Here are a few examples that you should keep in mind.
Read 13 tweets
1 May
A little over 12 years ago, the police started building a case against me.

That was stressful. They were watching. They wanted to take me off the streets.

Here is the story of how I fled Cuba and came to the United States.

After finishing college, I started taking freelance projects.

That was illegal. The Cuban government didn't allow people to make money working for foreign companies.

If you were lucky, you could get 2 years in jail. They called it "illicit enrichment."
We were a small group of friends. We met at my house every morning.

We paid a foreign national for Internet access. Cubans weren't allowed to buy it, so we had to get creative.

It was a 56 kbps connection shared across 4 computers.
Read 13 tweets
30 Apr
We've all heard the horror stories.

Are you ready for machine learning math? Are you sure you can download a library, go through a course and make things happen?

This is some unsolicited advice.

↓ 1/10
Back then, we had to write custom training loops. Every time.

We all heard horror stories about the complexity of statistics and how ugly linear algebra was. This was a real thing.

The barrier to start with machine learning was high and full of thorns.

↓ 2/10
Today, things are different.

The lack of programming skills is a much bigger hurdle than not understanding how derivatives work.

Wanna have a better chance? Learn to code today. Worry about math tomorrow.

↓ 3/10
Read 10 tweets
28 Apr
Free machine learning education.

Many top universities are making their Machine Learning and Deep Learning programs publicly available. All of this information is now online and free for everyone!

Here are 6 of these programs. Pick one and get started!

Introduction to Deep Learning
MIT Course 6.S191
Alexander Amini and Ava Soleimany

Introductory course on deep learning methods and practical experience using TensorFlow. Covers applications to computer vision, natural language processing, and more.

introtodeeplearning.com
Deep Learning
NYU DS-GA 1008
Yann LeCun and Alfredo Canziani

This course covers the latest techniques in deep learning and representation learning with applications to computer vision, natural language understanding, and speech recognition.

atcold.github.io/pytorch-Deep-L…
Read 8 tweets
27 Apr
$5 only for the next 50 orders.

1,131 people have bought it. 99% 5-star ratings.

Don't like it, and you pay nothing.

Link here → gum.co/kBjbC/five
If you bought it already or aren't interested, like/retweet the original tweet, and you'll be supporting my work as much as if you were paying.

Thank you from the bottom of my heart!
40 left.
Read 5 tweets
27 Apr
Data is the core of machine learning.

It should not surprise you that most of the work you'll have to do is related to capturing, managing, processing, and validating data.

A few recommendations for those who would like to start.

1/7
As you get your feet wet, these are roughly some of the areas that you should cover:

• Data collection
• Data visualization
• Imputation
• Handling outliers
• Encoding
• Normalization and scaling
• Binning and grouping

2/7
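
To make three of the areas above concrete (imputation, scaling, and encoding), here is a minimal sketch in plain Python. The helper names and the sample values are hypothetical and chosen only for illustration; in practice a library such as Pandas or Scikit-Learn would usually do this work.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(values):
    """Encode categories as 0/1 vectors, one column per sorted category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ages = [20, None, 40, 60]               # hypothetical column with a gap
filled = impute_mean(ages)              # the gap becomes the mean, 40.0
scaled = min_max_scale(filled)          # values now lie in [0, 1]
colors = one_hot(["red", "green", "red"])  # columns: green, red

print(filled, scaled, colors)
```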
Here is a good, introductory, free course provided by Google:

"Data Preparation and Feature Engineering in ML." — developers.google.com/machine-learni…

It covers the process of collecting, transforming, splitting, and creating datasets that machine learning algorithms can use.

↓ 3/7
Read 7 tweets
26 Apr
20 quotes that I use as inspiration to change my life.

Print them out. Keep them close. Read them frequently. Reflect on them and the wisdom behind every word.

Let's start:

Humans are allergic to change. They love to say, "We've always done it this way." I try to fight that. That's why I have a clock on my wall that runs counterclockwise.

— Grace Hopper
With engineering, I view this year's failure as next year's opportunity to try it again.

Failures are not something to be avoided. You want to have them happen as quickly as you can so you can make progress rapidly.

— Gordon Moore
Read 21 tweets
24 Apr
A book that will significantly help with your Python 🐍 skills:

• "Effective Python: 90 Specific Ways to Write Better Python" by Brett Slatkin (@haxor).

amzn.to/3tM655V

Make sure you buy the second edition.

↓ 1/3
The book contains 90 short lessons. Attached is the entire list.

Each lesson will take you a few minutes, which is extremely powerful:

• Finish lunch 🍱
• Grab the book
• Read one lesson
• Find a place in your code to apply it
• Put the book back on the shelf

↓ 2/3
Do this consistently, and you will significantly improve your Python skills.

Post a summary here on Twitter of what you learned for even better results.

This book works best for those who already write Python. It's not what you need to start from scratch.

3/3
Read 4 tweets
23 Apr
I've been teaching people how to start with machine learning for more than a year now.

This is a thread with what I've learned and some advice if you are looking to start.

↓ 1/14
People enjoy the process of getting ready to start something new.

Like the first time they go to the gym: they buy supplements, shorts, shoes, and a new headband.

Unfortunately, this is not enough.

↓ 2/14
Most people never stop preparing.

They keep collecting books, tutorials, the "best" videos and ask the same "how to start" questions.

The only thing they forget: taking a concrete step forward.

↓ 3/14
Read 14 tweets
22 Apr
A 13-tweet introduction to one of the most basic structures used in machine learning: a tensor.

Understanding how tensors work is fundamental. They aren't complex but working with them may get confusing if you don't understand all the pieces.

Let's solve that today.

↓ 1/13
Three primary attributes define a tensor:

• Rank: Number of axes.
• Shape: Number of elements along each axis.
• Data type: Type of data contained in it.

↓ 2/13
The rank of a tensor refers to the tensor's number of axes.

Examples:

• Rank of a matrix is 2.
• Rank of a vector is 1.
• Rank of a scalar is 0.

↓ 3/13
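
A small sketch of rank and shape, using nested Python lists as stand-in tensors. The helpers `shape_of` and `rank_of` are hypothetical and assume regularly nested lists; NumPy arrays expose the same three attributes as `ndim`, `shape`, and `dtype`.

```python
def shape_of(tensor):
    """Return the shape of a regularly nested list as a tuple.
    A bare number (a scalar) has the empty shape ()."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        # Descend into the first element; stop on an empty list.
        tensor = tensor[0] if tensor else None
    return tuple(shape)


def rank_of(tensor):
    """The rank is the number of axes, i.e. the length of the shape."""
    return len(shape_of(tensor))


scalar = 7
vector = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6]]

for t in (scalar, vector, matrix):
    print(rank_of(t), shape_of(t))
```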
Read 14 tweets
21 Apr
700 people have watched "How To Get Started With Machine Learning." 86 have rated it.

Let's celebrate!

• You can buy the course today for $7.
• $0 if you don't like it.
• Back to $15 tomorrow.

gum.co/kBjbC/only7

If you can't afford it, keep reading:

For every copy I sell today, I'll give away one for free.

To apply for the free copy, reply below with why you think this course will help you.

I'll prioritize the best stories I read.

If you want to support my content, like/retweet this thread, so more people see it.
So far, 8 copies sold, and 8 free copies shared.

Thanks for the continuous support! It helps tremendously!
Read 8 tweets
21 Apr
Creating a good machine learning model is really sexy. That's what's different and where everyone focuses all of their attention.

But machine learning is much more than that.

A thread with a few thoughts about the real job.

1/9
Machine learning engineers spend a lot of time designing and training new models, but this is just a small fraction of their job.

2/9
In reality, dealing with data and operationalizing models is much more time-consuming and sometimes even harder and more involved than creating the models in the first place.

3/9
Read 10 tweets
20 Apr
The backbone of my end-to-end machine learning setup:

• A 48-page Field Notes
• Python
• NumPy, Pandas, Matplotlib, OpenCV
• Scikit-Learn, XGBoost
• TensorFlow
• Google Colab, Jupyter, VSCode
• Docker, Flask
• AWS SageMaker
I personally don't use C/C++.

That doesn't mean it's not useful. I know plenty of people in the industry who rely on C/C++ to do their work.

It just means that I personally haven't needed it.

There are a lot of satellite tools that I have to use depending on the project. Kinesis, Airflow, SQS... the list is endless.

I just tried to list the core of what I need, and it rarely varies.

Read 5 tweets
20 Apr
Yesterday, @PrasoonPratham posted a step-by-step guide to solve the Titanic challenge on Kaggle.

I thought it'd be fun to engineer some features that can help build an even better model.

Here are some ideas worth considering.

↓ 1/13
Attached you can find the original set of input variables that come with the data to solve the problem.

We are going to transform some of these into features that should help our model produce better results.

This is what Feature Engineering is all about.

↓ 2/13
Keep in mind that these are just hypotheses that you'll have to try and validate.

Some of these suggestions might not improve the results or could even make the model perform worse.

This is an exercise to try and think creatively about the data we are getting.

↓ 3/13
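
As an illustration of what such a transformation can look like, here is a minimal sketch. The column names (`Name`, `SibSp`, `Parch`) are assumed from the public Kaggle Titanic dataset, since the attached image is not reproduced here, and the derived features are commonly used examples, not necessarily the ones this thread goes on to propose.

```python
def engineer_features(passenger):
    """Derive a few illustrative features from one raw passenger record."""
    # Siblings/spouses plus parents/children plus the passenger.
    family_size = passenger["SibSp"] + passenger["Parch"] + 1
    # Names look like "Braund, Mr. Owen Harris": the title sits
    # between the comma and the first period.
    title = passenger["Name"].split(",")[1].split(".")[0].strip()
    return {
        "FamilySize": family_size,
        "IsAlone": int(family_size == 1),
        "Title": title,
    }


# Example record in the dataset's format.
passenger = {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0}
features = engineer_features(passenger)
print(features)
```

Whether any of these features helps is exactly the kind of hypothesis you would validate against a held-out set.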
Read 15 tweets
19 Apr
Is 10 twice as bad as 5? Sometimes it is, but sometimes it's even worse.

This is the question I always ask myself when deciding how to penalize my models.

Read on for more details and a couple of examples:

↓ 1/11
When we are training a machine learning model, we need to compute how different our predictions are from the expected results.

For example, if we predict a house's price as $150,000, but the correct answer is $200,000, our "error" is $50,000.

↓ 2/11
There are multiple ways we can compute this error, but two common choices are:

• RMSE — Root Mean Squared Error
• MAE — Mean Absolute Error

Both of these have different properties that will shine depending on the problem you want to solve.

↓ 3/11
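
A quick sketch of how the two metrics differ. The house prices (in thousands of dollars) are made up for illustration: both sets of predictions have the same total error, but one spreads it evenly and the other concentrates it in a single big miss.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: the average size of the misses."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: squaring weighs large misses more heavily."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


actual = [200, 200, 200, 200]        # hypothetical prices, in $1,000s
even_misses = [195, 195, 195, 195]   # four misses of 5
one_big_miss = [200, 200, 200, 180]  # a single miss of 20

print(mae(actual, even_misses), rmse(actual, even_misses))
print(mae(actual, one_big_miss), rmse(actual, one_big_miss))
```

MAE treats both cases the same, while RMSE is twice as high for the single large miss. That is the sense in which RMSE penalizes big errors more than proportionally.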
Read 12 tweets
18 Apr
Have you upgraded your project to Python 🐍 3.9 yet?

Read on for some of the new syntax and built-in features in Python that you don't want to miss.

1/5
1. You can now merge dictionaries by using a new operator "|".

See PEP 584 for more information: python.org/dev/peps/pep-0….

↓ 2/5
2. There's another new operator "|=". This one will let you update a dictionary.

See PEP 584 for more information: python.org/dev/peps/pep-0….

↓ 3/5
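
A short sketch of both operators (requires Python 3.9 or later). The dictionary names and values are hypothetical:

```python
defaults = {"lr": 0.01, "epochs": 10}
overrides = {"epochs": 20, "batch_size": 32}

# "|" builds a new dictionary; on duplicate keys the right-hand side wins.
merged = defaults | overrides
print(merged)    # defaults itself is left untouched

# "|=" updates the left-hand dictionary in place.
defaults |= overrides
print(defaults)
```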
Read 6 tweets