Santiago
27 Sep, 4 tweets, 1 min read
If you are teaching a machine learning class, or thinking of creating a course, please dedicate some time to having your students deal with data.

Most courses mention "data is important, you know?" and right away go into the 1,001 different ways to build a model.

A better way:

Have your students practice a skill they will face the very first day they go out there.

Data is messy. Incomplete. Noisy. Disorganized. Mislabeled.

Have them deal with this for a while. Don't worry about the modeling part.
A good exercise:

1. Give your students a dataset.
2. Give them a model.
3. Ask them to improve its performance.
4. They can't touch the model code.

They should focus exclusively on improving the data to get better performance.

This should be great practice.
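
Here is a minimal sketch of what that exercise could look like in Python. Everything in it is made up for illustration: the frozen `train_and_score` model (a nearest-centroid classifier on one feature), the toy rows, and the `max_plausible` cutoff that the hypothetical student picks in `clean`.

```python
from collections import defaultdict


# Frozen model: students may call it, but they can't edit it.
def train_and_score(train_rows, test_rows):
    """Fit a nearest-centroid classifier on one feature and return test accuracy."""
    groups = defaultdict(list)
    for value, label in train_rows:
        groups[label].append(value)
    centroids = {label: sum(vs) / len(vs) for label, vs in groups.items()}

    correct = 0
    for value, label in test_rows:
        predicted = min(centroids, key=lambda c: abs(value - centroids[c]))
        correct += predicted == label
    return correct / len(test_rows)


# The students' side: only the data changes.
def clean(rows, max_plausible=100.0):
    """Drop implausible values and normalize inconsistent labels."""
    cleaned = []
    for value, label in rows:
        if value > max_plausible:  # assumed data-entry error (e.g. 900 instead of 9)
            continue
        cleaned.append((value, label.strip().lower()))
    return cleaned


raw_train = [
    (1.0, "low"), (2.0, "low"), (3.0, "low"),
    (8.0, "high"), (9.0, "high"),
    (900.0, "high"),   # noisy value
    (10.0, "HIGH "),   # mislabeled: same class, inconsistent spelling
]
test = [(2.5, "low"), (1.5, "low"), (7.5, "high"), (9.5, "high")]

baseline = train_and_score(raw_train, test)
improved = train_and_score(clean(raw_train), test)
print(f"baseline accuracy: {baseline:.2f}, after cleaning: {improved:.2f}")
```

On this toy data, accuracy goes from 0.50 to 1.00 without a single change to the model code.
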
I think it is time for us to spend more time doing more data-centric activities.

That's the number one issue I see every day. Unfortunately, discussions about this are mostly absent from the classroom.

• • •

More from @svpino

24 Sep
I've been trying to identify the most effective trait for those building a career in software development.

If I were to give you one single recommendation, what would that be?

I think I figured it out. ↓
Here is a problem I see every day:

Most people start their careers solving the same boring exercises.

This is good in certain ways, but it also limits your experience to what everyone else is doing.

The key to getting out of this trap?

*Curiosity*
If there's a single trait that has helped me make continuous progress over the last two decades in building software, it has been a relentless curiosity.

And contrary to what many believe, you can learn to be curious.

This is what I do.
Read 9 tweets
23 Sep
I've heard multiple times that you don't need to do any feature engineering or selection whenever you are using neural networks.

This is not true.

Yes, neural networks can extract patterns and ignore unnecessary features from the dataset, but this is usually not enough.

First, neural networks can't compete with our expertise in understanding the data.

As long as we know the problem and the dataset, we can come up with features that it would be really hard for a network to reproduce.

This is a highly creative process. Really hard to reproduce.
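
A small sketch of the kind of feature I mean, assuming a dataset where each row has a timestamp and where we happen to know that weekends and late hours matter. The `time_features` function and the "night" rule are hypothetical; they stand in for whatever your own knowledge of the problem suggests.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Turn a raw timestamp into features that encode domain knowledge."""
    return {
        "hour": ts.hour,
        # weekday() returns 0 for Monday, so 5 and 6 are Saturday and Sunday.
        "is_weekend": ts.weekday() >= 5,
        # Assumed domain rule: "night" runs from 22:00 to 05:59.
        "is_night": ts.hour >= 22 or ts.hour < 6,
    }


saturday_night = time_features(datetime(2021, 9, 25, 23, 30))
monday_morning = time_features(datetime(2021, 9, 27, 9, 0))
print(saturday_night)
print(monday_morning)
```

A network that only sees the raw timestamp would have to discover both rules on its own.
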
A couple of notes regarding the ability of a network to do automatic feature selection:

Yes, networks can "ignore" features that have no bearing on the prediction.

But these features can still introduce noise that degrades the performance of the model.
Read 5 tweets
21 Sep
One issue I see with people applying for a job:

They struggle to highlight their experience in an effective way.

If you are trying to get a job as a Data Scientist or Machine Learning Engineer, here is something you can do.

The first step is to stop thinking of "experience" exclusively as a synonym for employment history.

Experience is about all of the work you have done. It doesn't matter whether someone else paid for it.

If you know how to get things done, you should highlight it.
The second step is doing some inventory.

I'm sure you can find examples and exercises you've solved over the past few months.

They don't have to be end-to-end applications. They just need to showcase your knowledge and ability to make things work.

Collect them all.
Read 13 tweets
20 Sep
When designing a machine learning model, remember the "stretch pants" approach:

Don't waste time looking for pants that perfectly match your size. Instead, use large stretch pants that will shrink down to the right size.

What does this mean for your model?
The "stretch pants" approach in machine learning:

Pick a model with more capacity than you need. Then, use regularization techniques to avoid overfitting.

You gotta thank Vincent Vanhoucke, a scientist at Google, for this analogy.
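
As a sketch of the "shrink down" part, here is one common regularization technique, an L2 penalty on the weights. The function names and the `lam` strength parameter are illustrative assumptions; the thread doesn't name a specific technique.

```python
def l2_penalty(weights, lam):
    """Penalty that grows with the squared size of the weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Loss the optimizer minimizes: fit to the data plus the L2 penalty."""
    return data_loss + l2_penalty(weights, lam)


# Same fit to the data, but larger weights now cost more.
small = regularized_loss(1.0, [0.3, 0.4], lam=0.1)
large = regularized_loss(1.0, [3.0, 4.0], lam=0.1)
print(small, large)
```

The larger `lam` is, the harder the oversized model gets pulled toward small weights.
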
One example:

Imagine designing a neural network, and you configure a hidden layer that's too small (not many neurons).

The network may not preserve all the valuable information from the data: you don't have enough power to do it!

Realizing this is difficult.
Read 4 tweets
18 Sep
In theory, you can model any function using a neural network with a single hidden layer.

However, deep networks are much more efficient than shallow ones.

Can you explain why?
If my first claim gives you pause, I'm talking about the Universal approximation theorem.

You can find more about it online, but the attached paragraph summarizes the relevant part very well.
Informally, we usually say that we can model any function with a single hidden layer neural network.

But there are a couple of caveats with this statement.
Read 12 tweets
17 Sep
I need your help.

The doctor tested me, and I came back positive for a disease that infects 1 of every 1,000 people.

The test comes back positive 99% of the time if the person has the disease. About 2% of uninfected patients also come back positive.

Do I have the disease?
To answer this question, we need to understand why the doctor tested me in the first place.

If I had symptoms or if she suspected I had the disease for any reason, the analysis would be different.

But let's assume that the doctor tested 10,000 patients for no specific reason.
Many people replied using Bayes' Theorem to solve this problem.

This is correct. But let's try to come up with an answer in a different—maybe more intuitive—way.
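
One way to do the counting, sketched in Python with the numbers above. It assumes the 10,000 patients were tested for no specific reason, so the 1-in-1,000 rate applies to them; the variable names are mine.

```python
patients = 10_000
prevalence = 1 / 1_000        # 1 of every 1,000 people has the disease
sensitivity = 0.99            # positive 99% of the time if infected
false_positive_rate = 0.02    # about 2% of uninfected patients test positive

infected = patients * prevalence
uninfected = patients - infected

true_positives = infected * sensitivity
false_positives = uninfected * false_positive_rate
all_positives = true_positives + false_positives

# Of everyone who tested positive, what fraction is actually infected?
p_disease_given_positive = true_positives / all_positives
print(f"{all_positives:.1f} positives, {p_disease_given_positive:.1%} truly infected")
```

About 10 of the 10,000 patients are infected, but roughly 210 test positive, so a positive result means only about a 4.7% chance of having the disease.
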
Read 8 tweets
