A neural network doesn't know when it doesn't know.
Think about it: recognizing when a data point is completely unlike anything seen before is a problem we rarely deal with.
However, it is essential.
In this thread, I'll explain how and why!
🧵 👇🏽
Suppose that this is your training data.
The situation looks fairly straightforward: a simple logistic regression solves the problem.
The model is deployed to production without a second thought.
Now comes the surprise!
As we start receiving new data for prediction, the following pattern emerges.
The new instances are classified confidently, but incorrectly.
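To make this concrete, here is a minimal sketch with made-up data (using scikit-learn): a logistic regression trained on two clean clusters will assign near-certain probability to a point far away from everything it has ever seen.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two well-separated training clusters (hypothetical data).
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(-2, 0.5, (50, 2)),
                     rng.normal(2, 0.5, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

model = LogisticRegression().fit(X_train, y_train)

# A point far from BOTH clusters: the model has never seen
# anything like it, yet it predicts class 1 with near certainty.
outlier = np.array([[20.0, 20.0]])
print(model.predict_proba(outlier))  # e.g. [[~0.0, ~1.0]]
```

The farther you move from the decision boundary, the more confident the model gets, regardless of whether the point resembles the training data at all.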
Situations like these happen all the time.
For instance, suppose you are building an image-based tool for retailers to detect products on a shelf. One day, a manufacturer comes out with a new product in new packaging.
How do you prepare your system for that?
One answer is open set recognition: detecting the unknowns.
Instead of looking at the class probabilities given by our network, we take a step back and analyze the distribution of the resulting feature vectors.
For a single feature, it may look something like this.
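In code, "taking a step back" just means reading the activations of the penultimate layer instead of the softmax output. A minimal PyTorch sketch with a hypothetical two-layer classifier (all names here are made up):

```python
import torch
import torch.nn as nn

# A hypothetical classifier; `features` is the penultimate layer.
class Net(nn.Module):
    def __init__(self, in_dim=2, hidden=16, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        return self.head(self.features(x))

net = Net()
x = torch.randn(8, 2)

# Instead of class probabilities...
probs = net(x).softmax(dim=-1)

# ...we inspect the feature vectors themselves.
feats = net.features(x)
print(feats.shape)  # torch.Size([8, 16])
```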
OpenMax (just one method among many, but one of the first) detects these anomalies by fitting a Weibull distribution to the distances between feature vectors and each class's mean activation, then checking how well new values fit.
In the paper Towards Open Set Deep Networks by Abhijit Bendale and Terrance E. Boult, the authors demonstrate how revealing the activation distributions are.
In their illustration below, you can see that new examples do indeed have different distributions.
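Below is a simplified sketch of the extreme-value idea behind OpenMax, not the paper's full score recalibration. The data, the tail size, and the rejection logic are all assumptions made for illustration; it uses scipy's Weibull fit.

```python
import numpy as np
from scipy.stats import weibull_min

# Feature vectors of correctly classified training examples
# for ONE class (hypothetical data).
feats = np.random.default_rng(0).normal(5, 1, (500, 16))
mean_activation = feats.mean(axis=0)

# Fit a Weibull to the tail: the largest distances from the mean.
dists = np.linalg.norm(feats - mean_activation, axis=1)
tail = np.sort(dists)[-20:]
shape, loc, scale = weibull_min.fit(tail)

def unknown_probability(x):
    # A CDF value near 1 means the distance is extreme even for
    # the tail, so the input likely doesn't belong to this class.
    d = np.linalg.norm(x - mean_activation)
    return weibull_min.cdf(d, shape, loc=loc, scale=scale)

print(unknown_probability(feats[0]))      # small: a typical member
print(unknown_probability(np.zeros(16)))  # close to 1: an outlier
```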
This method works even if you have no control over the training.
Can we perform better if we can modify the training? What if we train the network to be prepared for open set recognition tasks from the start?
Turns out, we can do this.
In their paper Reducing Network Agnostophobia, the authors Akshay Raj Dhamija, Manuel Günther, and Terrance E. Boult propose exactly such a method.
They observed that the activations of unknown examples tend to cluster around the origin, with much smaller magnitudes than those of known classes.
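Building on that observation, the paper introduces the Entropic Open-Set loss: ordinary cross-entropy for known classes, while background samples are pushed toward a uniform softmax. Here is a minimal sketch of that idea, assuming unknown samples are labeled -1 (a convention I chose for this sketch):

```python
import torch
import torch.nn.functional as F

def entropic_openset_loss(logits, targets):
    """Cross-entropy for knowns; uniform-softmax target for unknowns.

    A sketch of the Entropic Open-Set loss idea from 'Reducing
    Network Agnostophobia'. Unknowns are assumed to carry label -1.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    known = targets >= 0

    loss = torch.zeros(len(targets))
    # Standard cross-entropy for known classes.
    loss[known] = -log_probs[known, targets[known]]
    # Unknowns: cross-entropy against the uniform distribution,
    # which drives the softmax toward maximum entropy.
    loss[~known] = -log_probs[~known].mean(dim=-1)
    return loss.mean()

logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, -1, -1])
print(entropic_openset_loss(logits, targets))
```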
You ask me so often for free online resources about deep learning that I decided to collect my favorite courses!
These topics interest you the most:
🟩 practical deep learning,
🟩 deep learning theory,
🟩 math resources to understand the two above.
Let's see them!
🧵 👇🏽
1️⃣ Practical deep learning.
If you want to take a deep dive straight into the field and want to start training your models right away, hands down the best course for you out there is Practical Deep Learning for Coders by fast.ai. (course.fast.ai)
To move beyond training models and learn about tooling and infrastructure, IMO the best course for you is the Full Stack Deep Learning course by @full_stack_dl.
Have you ever thought about why neural networks are so powerful?
Why is it that no matter the task, you can find an architecture that knocks the problem out of the park?
One answer is that they can approximate any continuous function with arbitrary precision!
Let's see how!
🧵 👇🏽
From a mathematical viewpoint, machine learning is function approximation.
If you are given data points 𝑥 with observations 𝑦, learning essentially means finding a function 𝑓 such that 𝑓(𝑥) approximates the given 𝑦-s as accurately as possible.
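In its simplest form, this is just curve fitting. A toy sketch with NumPy and made-up data, picking 𝑓 from the family of cubic polynomials by minimizing the squared error:

```python
import numpy as np

# Hypothetical data: noisy observations y of an unknown function.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + rng.normal(0, 0.1, size=x.shape)

# "Learning" = choosing f from a family (here, cubic polynomials)
# that minimizes the squared error between f(x) and y.
coeffs = np.polyfit(x, y, deg=3)
f = np.poly1d(coeffs)

print(np.mean((f(x) - y) ** 2))  # the approximation error
```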
Approximation is a very natural idea in mathematics.
Let's see a simple example!
You probably know the exponential function well. But do you also know how to calculate its values?
The definition itself doesn't really help: computing the power 𝑒ˣ when 𝑥 is not an integer is hard to do directly.
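The classical remedy is to approximate: for instance, by summing the first few terms of the Taylor series 𝑒ˣ = Σ 𝑥ⁿ/𝑛!. A quick sketch in Python:

```python
import math

def exp_taylor(x, n_terms=10):
    # Partial sum of the Taylor series e^x = sum_{n>=0} x^n / n!
    return sum(x ** n / math.factorial(n) for n in range(n_terms))

print(exp_taylor(1.0))  # ~2.7182815
print(math.exp(1.0))    # 2.718281828...
```

Ten terms already match the true value to six decimal places.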