I learned it the hard way. After reading hundreds of articles, I figured out the simplest way to learn from them and extract the information I need.
Here is how.
🧵 👇🏽
Regardless of fields, most well-written papers have a similar structure:
What is the problem?
🠓
What are the previous works?
🠓
What did previous works miss?
🠓
What is the main result?
🠓
Why does it work?
🠓
How does it compare to others?
🠓
What are its limitations?
However, research papers are not meant to be read linearly.
There are several levels of understanding:
knowing 1. how to use the result, 2. when to use it, 3. why and how it works, 4. and how to improve it.
Depending on your goal, the reading paths might differ.
When starting out with a paper, you don't need to read everything in one go. You don't even need to follow the order of the sections.
Get the big picture
In the first reading, aim to get a clear picture of
🔹 what is the problem (introduction section),
🔹 what is the solution (the main result),
🔹 and its potential shortcomings (the conclusion/discussion section).
With these, you are ready to use the result.
Now you are ready to move towards a deeper understanding. Focus on
🔹 What other solutions are there?
🔹 What are they lacking?
🔹 How do they compare with the authors' solution?
Answering these requires more mental effort, but it will be easier after the first read.
Note that for this, you still don't need to read the whole paper.
If a section doesn't answer these questions (like methods and proofs), you can skip it for now.
This way, you can focus on the information you really need to piece the puzzle together.
To reproduce the result and even improve on it, you have to
🔹 understand why the solution works (methods section and proofs),
🔹 and read the previous works if necessary.
If you don't want to do further research, don't feel bad for skipping these.
In case you do want to do further research, the process starts over at this point.
(You either read subsequent works or previous results to see how others attempted to solve this problem.)
• • •
What you see below is a 2D representation of the MNIST dataset.
It was produced by t-SNE, a completely unsupervised algorithm. The labels were unknown to it, yet it almost perfectly separates the classes. The result is amazing.
This is how the magic is done!
🧵 👇🏽
Even though real-life datasets can have several thousand features, often the data itself lies on a lower-dimensional manifold.
Dimensionality reduction aims to find these manifolds to simplify data processing down the line.
So, we have data points 𝑥ᵢ in a high-dimensional space, and we are looking for lower-dimensional representations 𝑦ᵢ.
We want the 𝑦ᵢ-s to preserve as many properties of the original as possible.
For instance, if 𝑥ᵢ is close to 𝑥ⱼ, we want 𝑦ᵢ to be close to 𝑦ⱼ as well.
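To make the "close stays close" idea concrete, here is a minimal sketch on toy data. It is not t-SNE: it assumes the simplest possible case, where the 3D points lie exactly on a straight line, so a plain projection onto that line is enough. The data, `direction`, `reduce_to_1d`, and `max_distance_error` are all made up for illustration.

```python
import math

# Hypothetical toy data: 3D points that all lie on one line through the origin,
# i.e. a 1-dimensional manifold hidden in a 3-dimensional space.
direction = (1 / 3, 2 / 3, 2 / 3)  # unit vector, since (1 + 4 + 4) / 9 = 1
params = [0.0, 0.5, 2.0, 2.1, 5.0]
xs = [tuple(t * d for d in direction) for t in params]


def reduce_to_1d(x):
    """Project a 3D point onto the line's direction to get its 1D representation."""
    return sum(a * b for a, b in zip(x, direction))


ys = [reduce_to_1d(x) for x in xs]


def max_distance_error(xs, ys):
    """Largest gap between a pairwise distance in 3D and the same distance in 1D."""
    worst = 0.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            high = math.dist(xs[i], xs[j])  # distance between x_i and x_j
            low = abs(ys[i] - ys[j])        # distance between y_i and y_j
            worst = max(worst, abs(high - low))
    return worst


print(max_distance_error(xs, ys))
```

Here every pairwise distance survives the reduction up to floating-point error. Real data rarely lies on a flat line, which is why nonlinear methods like t-SNE exist and why they only preserve closeness approximately.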
There is a mathematical formula so beautiful that it is almost unbelievable.
Euler's identity combines the famous numbers 𝑒, 𝑖, π, 0, and 1 in a single constellation. At first sight, most people doubt that it is true. Surprisingly, it is.
This is why.
🧵 👇🏽
Let's talk about the famous exponential function 𝑒ˣ first.
Have you ever thought about how this is calculated in practice? After all, raising an irrational number to any power is not trivial.
It turns out that the function can be written as an infinite sum!
In fact, this can be done with many other functions.
For those that are differentiable infinitely many times, there is a recipe to find the infinite sum form. This form is called the Taylor expansion.
It does not always yield the original function, but it works for 𝑒ˣ.
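For reference, the Taylor expansion of the exponential function around zero is 𝑒ˣ = 1 + 𝑥 + 𝑥²/2! + 𝑥³/3! + … Below is a minimal sketch of how one could evaluate it by cutting the sum off after a fixed number of terms. The function name `taylor_exp` and the default of 30 terms are assumptions for illustration, and real math libraries use more refined methods.

```python
import math


def taylor_exp(x, terms=30):
    """Approximate e**x with the first `terms` terms of its Taylor series at 0."""
    total = 0.0
    term = 1.0  # the n = 0 term: x**0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turn x**n / n! into x**(n+1) / (n+1)!
    return total


# Compare the truncated sum with the library implementation.
print(taylor_exp(1.0), math.e)
print(taylor_exp(2.5), math.exp(2.5))

# The same sum also accepts complex inputs, which is where Euler's identity comes from.
print(taylor_exp(1j * math.pi))
```

For moderate inputs, 30 terms already agree with `math.exp` to many decimal places. Plugging in 𝑖π gives a value very close to -1, with only floating-point error left over.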
Creative abuse of rules can lead to game-changing discoveries.
In high school, you learned that -1 has no square roots. Yet, by ignoring this, you'll soon discover something that changed mathematics forever: complex numbers.
Follow along, and you'll see how!
🧵 👇🏽
Let's start with a very simple equation:
𝑥² + 1 = 0
Can we solve this? Not at first glance: for any real 𝑥, the left side of the equation is at least one, so it can never be zero. The equation is equivalent to
𝑥² = -1,
which is (apparently) not possible.
But let's disregard this and imagine a number whose square is -1.
Let's appropriately name it the 𝑖𝑚𝑎𝑔𝑖𝑛𝑎𝑟𝑦 𝑛𝑢𝑚𝑏𝑒𝑟 and denote it with 𝑖.
So, 𝑖² = -1.
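You can play with this number right away. As a small sketch, Python has complex numbers built in and writes the imaginary unit as `1j`; the helper `f` and the other variable names below are just for illustration.

```python
import cmath

i = 1j  # Python's built-in notation for the imaginary unit


def f(x):
    """Left side of the equation x**2 + 1 = 0."""
    return x * x + 1


# Both i and -i should make the left side vanish.
roots = [i, -i]
residuals = [f(r) for r in roots]

# cmath (unlike math) is happy to take the square root of a negative number.
root_of_minus_one = cmath.sqrt(-1)

print(i * i)
print(residuals)
print(root_of_minus_one)
```

Note that -𝑖 squares to -1 as well, so the equation has two solutions, not just one.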
Now that we have this strange entity, what can we do?
One of the biggest misconceptions regarding education is that its main purpose is to give knowledge you can immediately use.
It is not.
The best thing education can give you is the mental agility to obtain knowledge at the speed of light.
Let's unpack this idea a bit!
1/7
Consider a course where you build a custom neural network framework with NumPy.
This is hardly usable in practice: working with a custom library is insane.
However, if you know how frameworks are built, you only need to learn the interface to master an actual one!
2/7
By understanding how the framework is built and how the underlying algorithms work, you'll be able to do much more: experiment with custom optimizers, implement your own layers, etc.
3/7