This clay tablet from 1800-1600 BC shows that the ancient Babylonians were able to approximate the square root of two with 99.9999% accuracy.
How did they do it?
First, let’s decipher the tablet itself. It is called YBC 7289 (short for the 7289th item in the Yale Babylonian Collection), and it depicts a square, its diagonal, and numbers written around them.
Here is a stylized version.
As the Pythagorean theorem implies, the diagonal's length for a unit square is √(1² + 1²) = √2. Let's focus on the symbols there!
These are numbers, written in Babylonian cuneiform numerals. They read as 1, 24, 51, and 10.
Since the Babylonians used the base 60 numeral system (also known as sexagesimal), the number 1.24 51 10 reads as 1.41421296296 in decimal.
This matches √2 up to the sixth digit, meaning a 99.9999% accuracy!
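If you would like to double-check this, here is a quick Python sketch (my own verification, not anything from the tablet): it treats 1, 24, 51, 10 as sexagesimal digit places and compares the result to √2.

```python
import math

# The tablet's digits, read as sexagesimal places: 1 + 24/60 + 51/60² + 10/60³.
digits = [1, 24, 51, 10]
approx = sum(d / 60**i for i, d in enumerate(digits))

print(approx)        # ≈ 1.41421296...
print(math.sqrt(2))  # ≈ 1.41421356...

# The relative error is about 4.2e-07, i.e. 99.99996% accuracy.
print(abs(approx - math.sqrt(2)) / math.sqrt(2))
```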
The computational accuracy is stunning. To appreciate this, pick up a pen and try to reproduce this without a calculator. It’s not that easy!
Here is how the ancient Babylonians did it.
We start by picking a number x₀ between 1 and √2. I know, this feels random, but let’s just roll with it for now. One such example is 1.2, which is going to be our first approximation.
Because x₀ is smaller than √2, its pair 2/x₀ is larger than √2. (Indeed, dividing 2 by a number below √2 yields a number above √2, since 2/√2 = √2.)
Thus, the interval [x₀, 2/x₀] contains √2.
From this, it follows that the midpoint of the interval [x₀, 2/x₀] is a better approximation to √2. As you can see in the figure below, this is significantly better!
Let's define x₁ as this midpoint: x₁ = (x₀ + 2/x₀)/2. For x₀ = 1.2, this gives x₁ = (1.2 + 2/1.2)/2 ≈ 1.4333.
Continuing this thread, we can define an approximating sequence by taking the midpoints of such intervals: xₙ₊₁ = (xₙ + 2/xₙ)/2.
Here are the first few terms of the sequence. Even the third term is a surprisingly good approximation.
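Here is a minimal Python sketch of the iteration (the starting value 1.2 and the number of steps are just the ones from our example):

```python
import math

x = 1.2  # x₀, our first approximation
for n in range(1, 4):
    x = (x + 2 / x) / 2  # midpoint of the interval [x, 2/x]
    print(n, x, abs(x - math.sqrt(2)))  # term, value, distance from √2
```

The printed errors shrink very fast; already the third term matches √2 to about eight decimal places.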
If we put these numbers on a scatterplot, we practically need a microscope to tell the difference from √2 after a few steps.
Were the Babylonians just lucky, or did they hit the nail right on the head?
"How large that number in the Law of Large Numbers is?"
Sometimes, a thousand samples are large enough. Sometimes, even ten million samples fall short.
How do we know? I'll explain.
First things first: the law of large numbers (LLN).
Roughly speaking, it states that the averages of independent, identically distributed samples converge to the expected value as the number of samples grows to infinity.
We are going to dig deeper.
There are two kinds of LLNs: weak and strong.
The weak law makes a probabilistic statement about the sample averages: for any ε > 0, the probability that the sample average falls farther than ε from the expected value goes to zero as the sample size grows. In symbols: P(|X̄ₙ − μ| > ε) → 0 as n → ∞.
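To make the weak law tangible, here is a small simulation sketch (the fair die, ε = 0.1, and the sample sizes are my own illustrative choices): it estimates how often the average of n die rolls lands farther than ε from the expected value 3.5.

```python
import random

def deviation_probability(n, eps=0.1, trials=5_000):
    """Estimate P(|sample average - 3.5| > eps) for n fair-die rolls."""
    count = 0
    for _ in range(trials):
        avg = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(avg - 3.5) > eps:
            count += 1
    return count / trials

for n in [10, 100, 1000]:
    print(n, deviation_probability(n))  # the estimates shrink toward zero
```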
The single biggest argument about statistics: is probability frequentist or Bayesian? It's neither, and I'll explain why.
Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.
Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This holds regardless of interpretation; it's a rule set in stone.
In the language of probability theory, the events are formalized by sets within an event space.
(The event space is also a set, usually denoted by Ω.)
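As a toy illustration (the fair-die model below is my own example, not standard notation), events can literally be modeled as subsets of Ω:

```python
# Event space for a single roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

# Events are subsets of the event space.
rolling_six = {6}
even_roll = {2, 4, 6}

def probability(event):
    # For a fair die, each outcome is equally likely.
    return len(event) / len(omega)

print(probability(rolling_six))  # 0.1666... = 1/6
print(probability(even_roll))    # 0.5
```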