There is a mathematical formula so beautiful that it is almost unbelievable.

Euler's identity combines the famous numbers 𝑒, 𝑖, π, 0, and 1 in a single constellation. At first sight, most people doubt that it is true. Surprisingly, it is.

This is why.

๐Ÿงต ๐Ÿ‘‡๐Ÿฝ
Let's talk about the famous exponential function ๐‘’หฃ first.

Have you ever thought about how this is calculated in practice? After all, raising an irrational number to any power is not trivial.

It turns out that the function can be written as an infinite sum!
In fact, this can be done with many other functions.

For those that are differentiable infinitely many times, there is a recipe to find the infinite sum form. This form is called the Taylor expansion.

It does not always yield the original function, but it works for 𝑒ˣ.
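For reference, here is the recipe in its standard textbook form, expanded around 0. The notation is an assumption on my part, since the formula images of the original thread are not reproduced here. The first equality holds only for functions whose Taylor series actually converges back to them, which is the case for the exponential function.

```latex
f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n,
\qquad
e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}
    = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots
```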
Taylor expansions are advantageous for two reasons.

First, we can approximate functions by cutting off the sum at some N.

Second, we can simply extend functions to the complex plane with this formula!
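Both points can be tried out in a few lines of Python. This is only a minimal sketch: the helper name `exp_taylor` and the default of 30 terms are illustrative choices, not something from the thread.

```python
import cmath
import math


def exp_taylor(z, n_terms=30):
    """Partial sum of 1 + z + z^2/2! + ... using n_terms terms (hypothetical helper)."""
    total = 0
    term = 1  # z^0 / 0!
    for n in range(n_terms):
        total += term
        term = term * z / (n + 1)  # next term: z^(n+1) / (n+1)!
    return total


# First: cutting the sum off at some N gives an approximation that improves with N.
for n_terms in (2, 5, 10, 20):
    print(n_terms, exp_taylor(1.0, n_terms), math.e)

# Second: the very same formula accepts complex inputs without any change.
z = 1 + 2j
print(exp_taylor(z), cmath.exp(z))
```

Each term is built from the previous one, so no factorial or power has to be computed from scratch.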
The exponential function is not the only one that can be written as a Taylor series.

We can also do this with the trigonometric functions sine and cosine.

(Feel free to check this by hand using the general Taylor expansion formula.)
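If you would rather check this numerically than by hand, here is a small sketch that sums the standard series sin x = x - x³/3! + x⁵/5! - ... and cos x = 1 - x²/2! + x⁴/4! - ... and compares them with Python's built-ins. The function names and the 20-term cutoff are my own illustrative choices.

```python
import math


def sin_taylor(x, n_terms=20):
    """Partial sum of x - x^3/3! + x^5/5! - ... (hypothetical helper)."""
    total = 0.0
    term = x  # first term: x^1 / 1!
    for n in range(n_terms):
        total += term
        # Multiply by -x^2 / ((2n+2)(2n+3)) to get the next term.
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


def cos_taylor(x, n_terms=20):
    """Partial sum of 1 - x^2/2! + x^4/4! - ... (hypothetical helper)."""
    total = 0.0
    term = 1.0  # first term: x^0 / 0!
    for n in range(n_terms):
        total += term
        # Multiply by -x^2 / ((2n+1)(2n+2)) to get the next term.
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total


for x in (0.5, 1.0, 2.0):
    print(x, sin_taylor(x) - math.sin(x), cos_taylor(x) - math.cos(x))
```

The printed differences should be on the order of floating-point rounding error.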
By plugging 𝑖𝑧 into the exponential function, we discover that the complex exponential function can be written in terms of trigonometric functions!

(We use that 𝑖² = -1.)
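Written out, the computation goes roughly as follows. This is a sketch of the standard argument, not a copy of the thread's original image. The powers of i cycle through 1, i, -1, -i, so the even-order terms are real and the odd-order terms carry a factor of i. Regrouping the terms is allowed because the series converges absolutely.

```latex
\begin{aligned}
e^{iz} &= \sum_{n=0}^{\infty} \frac{(iz)^n}{n!} \\
       &= \left(1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots\right)
        + i\left(z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots\right) \\
       &= \cos z + i \sin z
\end{aligned}
```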
In the special case 𝑧 = π, we obtain the famous formula called Euler's identity.

This is how the magic happens.
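Since cos π = -1 and sin π = 0, the formula gives e^(iπ) = -1, that is, e^(iπ) + 1 = 0. A quick numerical sanity check, using only Python's standard `cmath` and `math` modules (the variable names are illustrative):

```python
import cmath
import math

z = math.pi

# Left side: the complex exponential. Right side: cosine plus i times sine.
lhs = cmath.exp(1j * z)
rhs = complex(math.cos(z), math.sin(z))
print(abs(lhs - rhs))

# Euler's identity: e^(i*pi) + 1 should vanish up to rounding error.
euler = cmath.exp(1j * math.pi) + 1
print(abs(euler))
```

Neither printed value is exactly zero, because π itself can only be stored approximately as a floating-point number.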
When asked, mathematicians often name Euler's identity as the most beautiful formula ever.

It is amazing not only because it connects a bunch of famous constants, but also because it establishes a connection between the exponential and trigonometric functions.
If you enjoyed this explanation, consider following me and hitting a like/retweet on the first tweet of the thread!

I regularly post simple explanations of mathematical concepts in machine learning. Make sure you don't miss out on the next one!

Keep Current with Tivadar Danka

