Reading research papers is a skill in itself.

I learned it the hard way. After reading hundreds of articles, I figured out the simplest way to learn from them and extract the information I need.

Here is how.

🧵 👇🏽
Regardless of the field, most well-written papers have a similar structure:

What is the problem?
🠓
What are the previous works?
🠓
What did previous works miss?
🠓
What is the main result?
🠓
Why does it work?
🠓
How does it compare to others?
🠓
What are its limitations?
However, research papers are not meant to be read linearly.

There are several levels of understanding:

knowing
1. how to use the result,
2. when to use it,
3. why and how it works,
4. and how to improve it.

Depending on your goal, the reading paths might differ.
When starting out with a paper, you don't need to read everything in one go. You don't even need to follow the order of the sections.

Get the big picture
In the first reading, aim to get a clear picture of

🔹 what is the problem (introduction section),
🔹 what is the solution (the main result),
🔹 and its potential shortcomings (the conclusion/discussion section).

With these, you are ready to use the result.
Now you are ready to move towards a deeper understanding. Focus on

🔹 What other solutions are there?
🔹 What are they lacking?
🔹 How do they compare with the authors' solution?

These require more mental effort, but they will be easier to answer after the first read.
Note that for this, you still don't need to read the whole paper.

If a section doesn't answer these questions (like methods and proofs), you can skip it for now.

This way, you can focus on the information you really need to piece the puzzle together.
To reproduce the result and even improve on it, you have to

🔹 understand why the solution works (methods section and proofs),
🔹 and read the previous works if necessary.

If you don't want to do further research, don't feel bad for skipping these.
In case you do want to do further research, the process starts over at this point.

(You read either subsequent works or previous results to see how others attempted to solve this problem.)

More from @TivadarDanka

12 May
What you see below is a 2D representation of the MNIST dataset.

It was produced by t-SNE, a completely unsupervised algorithm. The labels were unknown to it, yet it almost perfectly separates the classes. The result is amazing.

This is how the magic is done!

🧵 👇🏽
Even though real-life datasets can have several thousand features, often the data itself lies on a lower-dimensional manifold.

Dimensionality reduction aims to find these manifolds to simplify data processing down the line.
So, we have data points 𝑥ᵢ in a high-dimensional space, and we are looking for lower-dimensional representations 𝑦ᵢ.

We want the 𝑦ᵢ-s to preserve as many properties of the original as possible.

For instance, if 𝑥ᵢ is close to 𝑥ⱼ, we want 𝑦ᵢ to be close to 𝑦ⱼ as well.
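One hedged way to make "close stays close" concrete is to count how many nearest-neighbor relations survive the mapping. The sketch below is not t-SNE itself. The helper names `nearest_neighbors` and `neighbor_overlap` and the toy points are made up for illustration.

```python
import math

def nearest_neighbors(points, k):
    """For each point, return the index set of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result

def neighbor_overlap(xs, ys, k=1):
    """Fraction of k-nearest-neighbor relations in xs that also hold in ys."""
    nx, ny = nearest_neighbors(xs, k), nearest_neighbors(ys, k)
    shared = sum(len(a & b) for a, b in zip(nx, ny))
    return shared / (len(xs) * k)

# Toy data (an assumption): 3D points that all lie on a single line,
# so one coordinate is enough to describe them.
ts = [0.0, 1.0, 3.0, 7.0, 15.0]
xs = [(t, 2 * t, -t) for t in ts]

good_ys = [(t,) for t in ts]                        # keeps the order along the line
bad_ys = [(0.0,), (15.0,), (1.0,), (7.0,), (3.0,)]  # scrambles it

good_score = neighbor_overlap(xs, good_ys)
bad_score = neighbor_overlap(xs, bad_ys)
print(good_score, bad_score)
```

A representation that keeps the neighbors scores higher than one that scrambles them. Real methods optimize a smoother version of this idea instead of a hard count.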
Read 15 tweets
11 May
There is a mathematical formula so beautiful that it is almost unbelievable.

Euler's identity combines the famous numbers 𝑒, 𝑖, π, 0, and 1 in a single constellation. At first sight, most people doubt that it is true. Surprisingly, it is.

This is why.

🧵 👇🏽
Let's talk about the famous exponential function 𝑒ˣ first.

Have you ever thought about how this is calculated in practice? After all, raising an irrational number to any power is not trivial.

It turns out that the function can be written as an infinite sum!
In fact, this can be done with many other functions.

For those that are differentiable infinitely many times, there is a recipe to find the infinite sum form. This form is called the Taylor expansion.

It does not always yield the original function, but it works for 𝑒ˣ.
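For 𝑒ˣ, the infinite sum is 1 + 𝑥 + 𝑥²/2! + 𝑥³/3! + …, and cutting it off after a few dozen terms already gives a good approximation. Here is a minimal sketch of that idea. The function name `exp_taylor` and the default of 30 terms are choices made for this example, not how any particular math library computes the exponential.

```python
import math

def exp_taylor(x, n_terms=30):
    """Approximate e**x by the first n_terms of its Taylor expansion."""
    term = 1.0    # the k-th term x**k / k!, starting with k = 0
    total = 0.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)   # turn x**k / k! into x**(k+1) / (k+1)!
    return total

approx_e = exp_taylor(1.0)
error = abs(approx_e - math.e)

# The same sum also accepts complex inputs, which is how
# Euler's identity can be checked numerically.
euler_residual = abs(exp_taylor(1j * math.pi) + 1)
print(approx_e, error, euler_residual)
```

Both the error against `math.e` and the residual of 𝑒^(𝑖π) + 1 come out tiny, limited only by floating-point rounding.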
Read 9 tweets
10 May
Creative abuse of rules can lead to game-changing discoveries.

In high school, you learned that -1 has no square roots. Yet, by ignoring this, you'll soon discover something that changed mathematics forever: complex numbers.

Follow along, and you'll see how!

🧵 👇🏽
Let's start with a very simple equation:

𝑥² + 1 = 0

Can we solve this? Not at first glance, since for any real 𝑥, the left side of the equation is at least one. The equation is equivalent to

𝑥² = -1,

which is (apparently) not possible.
But let's disregard this and imagine a number whose square is -1.

Let's appropriately name it the 𝑖𝑚𝑎𝑔𝑖𝑛𝑎𝑟𝑦 𝑛𝑢𝑚𝑏𝑒𝑟 and denote it with 𝑖.

So, 𝑖² = -1.

Now that we have this strange entity, what can we do?
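If you want to play with this entity right away, Python has it built in: the imaginary unit is written `1j`, and the standard `cmath` module takes square roots of negative numbers. A small sketch, with variable names picked only for this example:

```python
import cmath

i = 1j                      # Python's notation for the imaginary unit
square = i * i              # should be -1

# Both candidate solutions of x**2 + 1 = 0
roots = [cmath.sqrt(-1), -cmath.sqrt(-1)]
residuals = [abs(x * x + 1) for x in roots]

print(square, roots, residuals)
```

Both roots make the left side of 𝑥² + 1 = 0 vanish, so the equation that looked unsolvable has two solutions: 𝑖 and -𝑖.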
Read 12 tweets
7 May
One of the biggest misconceptions regarding education is that its main purpose is to give knowledge you can immediately use.

It is not.

The best thing education can give you is the mental agility to obtain knowledge at the speed of light.

Let's unpack this idea a bit!

1/7
Consider a course where you build a custom neural network framework with NumPy.

This is hardly usable in practice: working with a custom library is insane.

However, if you know how such frameworks are built, you only need to learn the interface to master an actual framework!

2/7
By understanding how the framework is built and how the underlying algorithms work, you'll be able to do much more: experiment with custom optimizers, implement your own layers, etc.

3/7
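To give a feel for what "implementing your own layer" can look like, here is a minimal sketch of a fully connected layer in NumPy. The class name `Dense`, its method names, and the initialization scale are assumptions made for this example. Actual frameworks organize this differently.

```python
import numpy as np

class Dense:
    """A toy fully connected layer: y = x @ W + b."""

    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.1, size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                      # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        # Chain rule: gradients with respect to the parameters and the input
        self.dW = self.x.T @ grad_out
        self.db = grad_out.sum(axis=0)
        return grad_out @ self.W.T

rng = np.random.default_rng(0)
layer = Dense(3, 2, rng)
x = rng.normal(size=(4, 3))             # a batch of 4 samples with 3 features

out = layer.forward(x)
grad_in = layer.backward(np.ones_like(out))
print(out.shape, grad_in.shape)
```

Once the forward and backward passes of one layer are clear, a custom optimizer is just a rule for updating `W` and `b` from `dW` and `db`.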
Read 7 tweets
5 May
An exciting result came out from @GoogleAI recently, which raises several questions about how deep network architectures should be.

Here is their announcement, including a very interesting post. I would like to unpack this a bit.

Suppose that you have a trained network and a set of samples 𝑋. You take this data and run it through the network, storing all intermediate results.

The output of the 𝑖-th layer is denoted by 𝑋ᵢ. These encode the intermediate internal representations of the data.
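In code, collecting the 𝑋ᵢ-s could look something like this. It is only a sketch: the network is a stack of random weight matrices with ReLU activations standing in for a trained model, and the function name `forward_with_activations` is made up for the example.

```python
import numpy as np

def forward_with_activations(X, weights):
    """Run X through the layers and keep every intermediate output."""
    activations = []
    out = X
    for W in weights:
        out = np.maximum(out @ W, 0.0)   # linear map followed by ReLU
        activations.append(out)          # this is X_i for the i-th layer
    return activations

rng = np.random.default_rng(0)

# Stand-in for a trained network (an assumption): three random layers.
weights = [
    rng.normal(size=(8, 16)),
    rng.normal(size=(16, 16)),
    rng.normal(size=(16, 4)),
]
X = rng.normal(size=(10, 8))             # 10 samples with 8 features each

activations = forward_with_activations(X, weights)
shapes = [a.shape for a in activations]
print(shapes)
```

Each entry of `activations` holds one row per sample, so representations from different layers can be compared sample by sample.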
In general, the further you go, the higher level these representations become.

For a convolutional network, filters in earlier layers detect edges, while later activations represent objects.

Check the fantastic article below for more details!

distill.pub/2017/feature-v…
Read 8 tweets
28 Apr
Principal Component Analysis is one of the most fundamental techniques in data science.

Despite its simplicity, it has several equivalent forms that you might not have seen.

In this thread, we'll explore what PCA is really doing!

🧵 👇🏽
PCA is most commonly introduced as an algorithm that iteratively finds vectors in the feature space, each of which

• is orthogonal to the previously identified vectors,
• and maximizes the variance of the data projected onto it.

These vectors are called the principal components.
The idea behind this is that we want features that convey as much information as possible.

Low variance means that the feature is more concentrated, so it is easier to predict its value in principle.

Features with low enough variances can even be omitted.
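A quick way to see these properties in action is the sketch below. Note that it does not run the iterative search described above. It uses an equivalent route, the eigenvectors of the covariance matrix sorted by eigenvalue. The function name `pca` and the synthetic data are assumptions for this example.

```python
import numpy as np

def pca(X):
    """Return principal components (as rows) and the variance along each."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvecs[:, order].T, eigvals[order]

rng = np.random.default_rng(0)

# Synthetic data (an assumption): three features with very different spreads.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])

components, variances = pca(X)
projected = (X - X.mean(axis=0)) @ components.T
projected_variances = projected.var(axis=0, ddof=1)
print(variances)
```

The components are mutually orthogonal, the variances come out in decreasing order, and the last one is small enough that dropping that direction loses very little.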
Read 10 tweets
