
More from @TivadarDanka

Tivadar Danka

@TivadarDanka

22 Mar

What if you want to optimize a function, but every evaluation costs you $100 and takes a day to execute?

Algorithms like gradient descent build on two key assumptions:

• the function is differentiable,
• and you can evaluate it on demand.

What if this is not the case?

🧵 👇🏽

For example, you want to tune the hyperparameters of a model that requires 24 hours of GPU time to train.

Can you find a good enough value within a reasonable time and budget?

One such method is the so-called Bayesian optimization.

Essentially, the method works as follows.

1️⃣ Model the expensive function with a Gaussian process.

Gaussian processes are easy to compute and offer a way to quantify uncertainty in the predictions.
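To make step 1 concrete, here is a minimal sketch of a Gaussian process surrogate. It makes several assumptions of its own: a zero-mean prior, a squared-exponential (RBF) kernel, scalar inputs, and kernel hyperparameters fixed by hand. The helper names (`rbf_kernel`, `gp_posterior`, etc.) are hypothetical. The code is pure Python so that it runs without dependencies. For each query point it returns a predicted mean and a standard deviation, which is the uncertainty estimate mentioned above.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(matrix):
    """Return lower-triangular L with matrix = L L^T (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def forward_sub(lower, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i])
    return y


def backward_sub_transpose(lower, y):
    """Solve L^T x = y, given the lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / lower[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean and standard deviation of a zero-mean GP at each point of x_test."""
    K = [[rbf_kernel(a, b, **kernel_args) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = backward_sub_transpose(L, forward_sub(L, y_train))  # K^{-1} y
    means, stds = [], []
    for x in x_test:
        k_star = [rbf_kernel(x, a, **kernel_args) for a in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = forward_sub(L, k_star)  # L^{-1} k_*, so k_*^T K^{-1} k_* = |v|^2
        var = rbf_kernel(x, x, **kernel_args) - sum(vi * vi for vi in v)
        stds.append(math.sqrt(max(var, 0.0)))  # clamp tiny negative round-off
    return means, stds


# Hypothetical stand-in for the expensive function: only three evaluations are available.
x_seen = [0.0, 1.0, 3.0]
y_seen = [math.sin(x) for x in x_seen]
means, stds = gp_posterior(x_seen, y_seen, [1.0, 2.0, 6.0])
```

The uncertainty is near zero at an already-evaluated point (x = 1). It grows between observations (x = 2) and returns to the prior level far from any observation (x = 6). This is the information a Bayesian optimizer can use to decide where the next costly evaluation is worth spending.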

Read 14 tweets

Tivadar Danka

@TivadarDanka

15 Mar

Building a good training dataset is harder than you think.

For example, you can have millions of unlabelled data points, but only have the resources to label a thousand.

This is a story about a case that I used to encounter almost every day in my work.

🧵 👇🏽

Do you know how new drugs are developed?

Essentially, thousands of candidate molecules are tested to see if they have the targeted effect. First, the testing is done on cell cultures.

Sometimes, there is no better option than scanning through libraries of molecules.

After cells are treated with a given molecule (or molecules in some cases), the effects are studied by screening them with microscopy.

The treated cells can exhibit hundreds of different phenotypes (= classes), some of which might be very rare.

Read 12 tweets

Tivadar Danka

@TivadarDanka

8 Mar

A neural network doesn't know when it doesn't know.

If you think about it, recognizing when a data point is unlike anything seen before is a problem that is rarely dealt with.

However, it is essential.

In this thread, I'll explain how and why!

🧵 👇🏽

Suppose that this is your training data.

The situation looks fairly straightforward: a simple logistic regression solves the problem.

The model is deployed to production without a second thought.

Now comes the surprise!

We start receiving new data for prediction, and the following pattern emerges.

The new instances are classified confidently, but incorrectly.
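The effect is easy to reproduce. The sketch below uses a small, hypothetical 2D dataset (two well-separated clusters) and fits a plain logistic regression by gradient descent. It then queries a point that lies far from every training point. The dataset, the learning rate, the epoch count and the query point are all illustrative assumptions. The model's probability depends only on which side of the decision boundary a point falls and how far from it, so the query point comes back with near-certain confidence even though nothing like it was ever seen.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(points, labels, lr=0.1, epochs=2000):
    """Fit weights (w1, w2) and bias b by batch gradient descent on the log-loss."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # d(log-loss)/d(logit)
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict_proba(model, point):
    """Predicted probability of class 1."""
    w1, w2, b = model
    return sigmoid(w1 * point[0] + w2 * point[1] + b)


# Hypothetical training data: class 0 on the left, class 1 on the right.
points = [(-2.0, 0.5), (-1.5, -0.5), (-1.0, 0.0), (1.0, 0.0), (1.5, 0.5), (2.0, -0.5)]
labels = [0, 0, 0, 1, 1, 1]
model = train_logistic_regression(points, labels)

far_point = (40.0, 30.0)  # unlike anything in the training data
confidence = predict_proba(model, far_point)  # essentially 1.0
```

Nothing in the model measures how far `far_point` is from the training data. Its high confidence only reflects the distance from the decision boundary.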

Read 11 tweets

Tivadar Danka

@TivadarDanka

4 Mar

Mistakes should be celebrated.

I used to struggle with everything I started until I became skilled at it.

The key was recognizing what I did wrong and going back to fix it. Over and over and over again.

Here is my list of failures that led me to success!

🧵 👇🏽

I was a bad student in school. The most difficult subject for me was mathematics, which I almost failed at one point.

Once I developed an interest, I started to improve very slowly.

Years later, I obtained a PhD in it after solving a problem that had been unsolved for decades.

As a teenager, I was overweight and physically weak. All fat, no muscle.

I was unable to do a single pushup.

Years later, I can regularly do 25-50 one-armed pushups. (Learning to do just a single one-armed pushup took me five years.)

Read 7 tweets

Tivadar Danka

@TivadarDanka

3 Mar

I am going to tell you the best-kept secret of linear algebra: matrices are graphs and graphs are matrices.

Encoding matrices as graphs is a cheat code, making complex behavior extremely simple to study.

Let me show you how!

🧵 👇🏽

If you looked at the example above, you probably figured out the rule.

Each row is a node, and each nonzero element of a row represents a directed edge.

The element in the 𝑖-th row and 𝑗-th column corresponds to the edge going from node 𝑖 to node 𝑗.

(Formal definition below.)
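The rule can be sketched in a few lines. The sketch makes these assumptions: nodes are numbered from 0, a zero entry means "no edge", and each edge keeps its matrix entry as a weight. The function name `matrix_to_graph` and the example matrix are hypothetical.

```python
def matrix_to_graph(matrix):
    """Map a square matrix to a weighted directed graph.

    Node i corresponds to row i; each nonzero entry matrix[i][j]
    becomes an edge i -> j whose weight is that entry.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return {
        (i, j): matrix[i][j]
        for i in range(n)
        for j in range(n)
        if matrix[i][j] != 0
    }


# Hypothetical example matrix.
A = [
    [0, 2, 0],
    [0, 0, 3],
    [1, 0, 4],
]
edges = matrix_to_graph(A)
# edges == {(0, 1): 2, (1, 2): 3, (2, 0): 1, (2, 2): 4}
```

A diagonal entry such as `A[2][2]` becomes a self-loop from node 2 to itself.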

Why is the directed graph representation beneficial for us?

The first example: powers of the matrix correspond to walks in the graph.

Take a look at how to calculate the elements of the square of a matrix.
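By the definition of matrix multiplication, (A²)_ij = Σ_k a_ik · a_kj. Each term is the product of the edge weights along the walk i → k → j. A term vanishes whenever one of the two edges is missing. So (A²)_ij is the sum of weights over all walks of length 2 from i to j, and the same argument extends to higher powers. For a 0/1 matrix every walk has weight 1, so (A^k)_ij simply counts walks of length k. The sketch below checks this numerically on a hypothetical 3×3 matrix by brute-force enumeration of walks. The helper names are hypothetical.

```python
from itertools import product


def matmul(a, b):
    """Matrix product: (ab)[i][j] = sum over k of a[i][k] * b[k][j]."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matrix_power(a, p):
    """Compute a**p for an integer p >= 1 by repeated multiplication."""
    result = a
    for _ in range(p - 1):
        result = matmul(result, a)
    return result


def walk_sum(a, i, j, length):
    """Sum, over all walks from i to j with `length` edges, of the product of edge weights.

    Intermediate nodes are enumerated exhaustively; a walk that uses a
    missing edge contributes 0 because one of its weights is 0.
    """
    n = len(a)
    total = 0
    for middle in product(range(n), repeat=length - 1):
        nodes = (i, *middle, j)
        weight = 1
        for u, v in zip(nodes, nodes[1:]):
            weight *= a[u][v]
        total += weight
    return total


A = [
    [0, 2, 0],
    [0, 0, 3],
    [1, 0, 4],
]
A2 = matrix_power(A, 2)
# A2 == [[0, 0, 6], [3, 0, 12], [4, 2, 16]]
# e.g. A2[0][2] = 6 comes from the single walk 0 -> 1 -> 2 with weight 2 * 3.
assert all(A2[i][j] == walk_sum(A, i, j, 2) for i in range(3) for j in range(3))

A3 = matrix_power(A, 3)
assert all(A3[i][j] == walk_sum(A, i, j, 3) for i in range(3) for j in range(3))
```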

Read 9 tweets

Tivadar Danka

@TivadarDanka

2 Mar

Besides Kaggle, there are several other competition platforms.

You can use these to

• learn,
• test your skills,
• collaborate with awesome people,
• enhance your resume,
• and possibly earn money.

Take a look at them below; you'll definitely find them useful!

🧵 👇🏽

1. Numerai (numer.ai)

This is quite a special one, since it only contains a single competition.

However, its aims are big: Numerai wants to build the world's first open hedge fund.

2. AIcrowd (aicrowd.com)

You can find all sorts of competitions here, covering a wide spectrum from applied problems to research.

Read 16 tweets
