17 Oct, 9 tweets, 3 min read
Wanna maximize the potential reward of every hour you spend?

Here is a tangible way to do this when building real-life Machine Learning solutions.

π§΅π
Complex systems usually depend on multiple components working together to produce a solution.

Imagine a pipeline like this, where the input goes through 4 different components before getting to the appropriate output.

👇
When all is said and done, let's imagine this system is correct 60% of the time.

That sucks. We need to improve it.

Unfortunately, we tend to prioritize work in those areas where we *think* there's value. Even worse, areas that are easy or fun to change.

👇
This leads to suboptimal decisions that end up wasting a ton of time.

We are scientists. We can do better than that.

Let's talk about Ceiling Analysis and how it gives us a platform to decide where to zero in and make a difference.

👇
This is what we are going to do:

1β£Replace a component with a mocked solution that provides 100% accurate results.

2β£Measure the overall impact.

3β£Repeat with another component.

This will help us find the ceiling of potential improvements.

👇

Let's start with the first component. We are going to override its output with a pre-defined, 100% correct answer.

(We are basically cheating so we can determine the impact of improving this individual component.)

Then we measure the overall solution and write down the result (63% in this case).

👇
We do the same for all the components in our solution.

Remember: each iteration progressively overrides one component at a time.

1. A
2. A + B
3. A + B + C
4. A + B + C + D

Obviously, the last iteration gives you 100% correct results.
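The iterations above can be sketched as a toy simulation. Nothing here is the thread's actual system: the pipeline, its stages, and their error rates are all made up for illustration. Each "mocked" stage simply emits the ground-truth answer instead of running.

```python
# Toy sketch of ceiling analysis: a 4-stage pipeline where each stage may
# corrupt the value passing through it. We progressively replace stages
# with "perfect" mocks (the ground-truth output) and re-measure accuracy.
import random

random.seed(0)

GROUND_TRUTH = list(range(100))  # the correct final outputs

def noisy_stage(error_rate):
    """A stage that corrupts its input with some probability."""
    def run(x):
        return -1 if random.random() < error_rate else x
    return run

# Hypothetical per-stage error rates (assumptions, not from the thread).
stages = {"A": noisy_stage(0.03), "B": noisy_stage(0.02),
          "C": noisy_stage(0.16), "D": noisy_stage(0.19)}

def accuracy(mocked):
    """Run the pipeline; mocked stages emit the correct answer directly."""
    correct = 0
    for y in GROUND_TRUTH:
        x = y
        for name, stage in stages.items():
            x = y if name in mocked else stage(x)
        correct += x == y
    return correct / len(GROUND_TRUTH)

baseline = accuracy(set())
print("baseline:", baseline)

mocked = set()
for name in stages:          # iteration 1: A, 2: A+B, 3: A+B+C, 4: A+B+C+D
    mocked.add(name)
    print("+".join(sorted(mocked)), accuracy(mocked))
```

With all four stages mocked, accuracy is 1.0 by construction, exactly as in the last iteration above.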

👇
Now, it's time to determine where we want to focus our time.

Here is the maximum increase we'd get from improving each component:

A. 3% increase
B. 2% increase
C. 16% increase
D. 19% increase

Pretty clear that we want to focus on either D or C, right?
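The per-component numbers above fall out of simple subtraction between consecutive measurements. The thread states the 60% baseline and the 63% result after mocking A; the remaining cumulative values (65, 81, 100) are implied by the listed increases.

```python
# Cumulative accuracy after each ceiling-analysis iteration:
# baseline, then mocking A, A+B, A+B+C, A+B+C+D.
measurements = [60, 63, 65, 81, 100]

# The maximum possible gain from a component is the jump it produced.
increases = [b - a for a, b in zip(measurements, measurements[1:])]
print(dict(zip("ABCD", increases)))  # {'A': 3, 'B': 2, 'C': 16, 'D': 19}
```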
Ceiling analysis is extremely powerful and informative. It has been my go-to compass to shine a light on the road ahead.

β’ β’ β’


# More from @svpino

18 Oct
Bias vs. variance in 13 charts.

π§΅π
Here is a sample 2-dimensional dataset.

(We are just representing here the training data.)

👇
The red line represents a model.

Let's call it "Model A."

A very simple model. Just a straight line.
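The charts themselves aren't in the text, but "Model A" is just a straight line fit to the training data. A minimal sketch (the dataset here is made up):

```python
# Fit "Model A" -- a straight line -- to a clearly non-linear dataset.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 30)
y = 0.5 * x**2 + rng.normal(0, 2, 30)   # quadratic data plus noise

# Model A: y = slope * x + intercept (degree-1 polynomial fit)
slope, intercept = np.polyfit(x, y, deg=1)
print(f"Model A: y = {slope:.2f}x + {intercept:.2f}")
```

A straight line can't bend to follow the curve, which is the "high bias" half of the trade-off the thread goes on to chart.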

👇
14 Oct
Machine Learning 101:

β«οΈ Overfitting sucks β«οΈ

Here is what you need to know.

π§΅π
Overfitting is probably the most common problem when training a Machine Learning model (followed very close by underfitting.)

Overfitting means that your model didn't learn much, and instead, it's just memorizing stuff.

👇
Overfitting may be misleading: during training, it looks like your model learned awesomely well.

Look at the attached picture. It shows how the accuracy of a sample model increases as it's being trained.

The accuracy reaches close to 100%! That's awesome!

Or, is it?
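The attached picture isn't in the text, but the effect it shows is easy to reproduce: a very flexible model nails the training data while doing much worse on held-out points from the same curve. A small NumPy sketch (the dataset and degree are assumptions):

```python
# Overfitting in miniature: a degree-15 polynomial "memorizes" 20 noisy
# training points but generalizes poorly to held-out points.
import numpy as np

rng = np.random.default_rng(42)
x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0.025, 0.975, 20)      # points between the training ones
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 20)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.1, 20)

coeffs = np.polyfit(x_train, y_train, deg=15)   # very flexible model
mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)

print(f"train MSE: {mse(x_train, y_train):.4f}")  # near zero: looks awesome
print(f"test  MSE: {mse(x_test, y_test):.4f}")    # noticeably worse
```

The near-perfect training score is exactly the misleading signal the thread warns about.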

👇
13 Oct
Transfer Learning.

It sounds fancy because it is.

This is a thread about one of the most powerful tools that make it possible for knuckleheads like me to achieve state-of-the-art Deep Learning results on our laptops.

π§΅π
Deep Learning is all about "Deep" Neural Networks.

"Deep" means a lot of complexity. You can translate this to "We Need Very Complex Neural Networks." See the attached example.

The more complex a network is, the slower it is to train, and the more data we need to train it.

👇
To get state-of-the-art results when classifying images, we can use a network like ResNet50, for example.

It takes around 14 days to train this network with the "imagenet" dataset (1,300,000+ images.)

14 days!

That's assuming that you have a decent (very expensive) GPU.

👇
12 Oct
When I heard about Duck Typing for the first time, I had to laugh.

But Python 🐍 has surprised me before, and this time was no exception.

This is another short thread 🧵 that will change the way you write code.

👇
Here is the idea behind Duck Typing:

β«οΈIf it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.

Taking this to Python's world, the functionality of an object is more important than its type. If the object quacks, then it's a duck.

👇
Duck Typing is possible in dynamic languages (Hello, JavaScript fans 👋!)

Look at the attached example. Notice how "Playground" doesn't care about the specific type of the supplied item. Instead, it assumes that the item supports the bounce() method.
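The attached example isn't included in the text, so here is a minimal reconstruction of the idea (the class and method names besides "Playground" and "bounce()" are assumptions):

```python
# Duck typing: Playground never checks the type of the item it receives.
# Anything that supports bounce() works -- if it quacks, it's a duck.
class Ball:
    def bounce(self):
        return "boing!"

class Kangaroo:
    def bounce(self):
        return "hop!"

class Playground:
    def play_with(self, item):
        # No isinstance() check: the object's behavior is what matters.
        return item.bounce()

playground = Playground()
print(playground.play_with(Ball()))       # boing!
print(playground.play_with(Kangaroo()))   # hop!
```

Passing an object without a bounce() method would raise an AttributeError at call time, which is the trade-off duck typing accepts.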

👇
11 Oct
This is ridiculous.

I've been coding in Python 🐍 since 2014. Somehow I've always resisted embracing one of its most important principles.

👇

One of its core principles is about writing explicit code.

Take a look at the attached code. It's one of those classic examples showing bad versus good.

See the difference?
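The attached code isn't in the text; a classic bad-versus-good pair for explicitness (my assumption about what it shows) is the truthiness check on a default argument:

```python
# Implicit: "not discount" also treats a legitimate 0 as "no discount given".
def get_discount_implicit(price, discount=None):
    if not discount:
        discount = 0.1
    return price * (1 - discount)

# Explicit: only None means "use the default"; 0 is respected.
def get_discount_explicit(price, discount=None):
    if discount is None:
        discount = 0.1
    return price * (1 - discount)

print(get_discount_implicit(100, discount=0))  # 90.0 -- surprising!
print(get_discount_explicit(100, discount=0))  # 100  -- what was asked
```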

👇
There are more subtle ways in which Python encourages explicitness.

This example shows a function that checks whether two keys exist in a dictionary and adds them up if they do.

If one of the keys doesn't exist, the function returns None.

Nothing wrong here, right?
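The example itself isn't in the text; a sketch of the function as described (the names are assumptions):

```python
# Add two dictionary values if both keys exist; otherwise return None.
def add_keys(data, key1, key2):
    if key1 in data and key2 in data:
        return data[key1] + data[key2]
    return None

print(add_keys({"a": 1, "b": 2}, "a", "b"))  # 3
print(add_keys({"a": 1}, "a", "b"))          # None
```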

👇
10 Oct
I had a breakthrough that turned a Deep Learning problem on its head!

Here is the story.
Here is the lesson I learned.

π§΅π
No, I did not cure cancer.

This story is about a classification problem, specifically computer vision.

I get images, and I need to determine which objects they represent.

I have a ton of training data. I'm doing Deep Learning.

Life is good so far.

👇
I'm using transfer learning.

In this context, transfer learning consists of taking a model that was trained to identify other types of objects and leveraging everything it learned to make my problem easier.

This way I don't have to teach a model from scratch!
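The mechanic above can be sketched without a real pretrained network. In this toy version (everything here is an assumption, in plain NumPy), a frozen random "feature extractor" stands in for the pretrained layers, and we only train a small linear head on top:

```python
# Conceptual transfer learning: freeze the "pretrained" weights, train
# only a lightweight head for the new task.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came from a model trained on another task.
W_pretrained = rng.normal(size=(4, 16))

def featurize(x):
    """Frozen backbone: W_pretrained is never updated."""
    f = np.maximum(x @ W_pretrained, 0)           # ReLU features
    return np.hstack([f, np.ones((len(f), 1))])   # plus a bias column

# A tiny labeled dataset for *our* task.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train only the head: a least-squares linear fit on the frozen features.
F = featurize(X)
head, *_ = np.linalg.lstsq(F, y, rcond=None)

preds = (F @ head > 0.5).astype(float)
acc = (preds == y).mean()
print("train accuracy:", acc)
```

Only the `head` vector is learned; in real transfer learning the frozen part is a network like ResNet50 with "imagenet" weights, which is what saves the 14 days of training.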

👇