Tivadar Danka
Oct 15 · 30 tweets · 8 min read
"Probability is the logic of science."

There is a deep truth behind this conventional wisdom: probability is the mathematical extension of logic, augmenting our reasoning toolkit with the concept of uncertainty.

In-depth exploration of probabilistic thinking incoming.
Our journey ahead has three stops:

1. an introduction to mathematical logic,
2. a touch of elementary set theory,
3. and finally, understanding probabilistic thinking.

First things first: mathematical logic.
In logic, we work with propositions.

A proposition is a statement that is either true or false, like

• "it's raining outside",
• "the sidewalk is wet".

These are often abbreviated as variables, such as A = "it's raining outside".
We can formulate complex propositions from smaller building blocks with logical connectives.

Consider the proposition "if it is raining outside, then the sidewalk is wet". This is the combination of two propositions, connected by the implication connective.
There are four essential connectives:

• NOT (¬), also known as negation,
• AND (∧), also known as conjunction,
• OR (∨), also known as disjunction,
• THEN (→), also known as implication.
Connectives are defined by the truth values of the resulting propositions. For instance, if A is true, then NOT A is false; if A is false, then NOT A is true.

Denoting true by 1 and false by 0, we can describe connectives with truth tables. Here is the one for negation (¬):

A | ¬A
1 | 0
0 | 1
AND (∧) and OR (∨) connect two propositions. A ∧ B is true if both A and B are true, and A ∨ B is true if either one is.
The implication connective THEN (→) formalizes the deduction of a conclusion B from a premise A.

By definition, A → B is true unless A is true and B is false. Equivalently: A → B is true if B is true or A is false.

An example: if "it's raining outside", THEN "the sidewalk is wet".
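The four connectives can be tabulated in a few lines of Python. (A quick sketch; the function names NOT, AND, OR, and THEN are mine, chosen to match the symbols above, with 1 for true and 0 for false.)

```python
# Truth tables for the four connectives, with 1 = true and 0 = false.
# NOT, AND, OR map directly onto integer bit operations; implication
# uses the equivalence  A -> B  ==  (NOT A) OR B.

def NOT(a):
    return 1 - a

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def THEN(a, b):
    return OR(NOT(a), b)

print("A B  A∧B A∨B A→B")
for a in (1, 0):
    for b in (1, 0):
        print(a, b, "", AND(a, b), " ", OR(a, b), " ", THEN(a, b))
```

Note that THEN(1, 0) is the only false row: an implication fails exactly when the premise holds but the conclusion does not.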
Science is just a collection of complex propositions like "if X is a closed system, THEN the entropy of X cannot decrease". (This particular one is the second law of thermodynamics.)

The entire body of scientific knowledge is made of A → B propositions.
In practice, our thinking process is the following.

"I know that A → B is true and A is true. Therefore, B must be true as well."

This is called modus ponens, the cornerstone of scientific reasoning.
(If you don't understand modus ponens, take a look at the truth table of the → connective a few tweets above.

The case when A → B is true and A is true appears only in the very first row, and in that row B is true as well.)
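Modus ponens can even be verified by brute force: enumerate all truth assignments and check that whenever both A → B and A hold, B holds too. (A minimal sketch; `implies` is my own helper.)

```python
# Modus ponens, checked exhaustively: in every row of the truth table
# where both A -> B and A are true, B is true as well.

def implies(a, b):
    return (not a) or b

rows = [(a, b) for a in (True, False) for b in (True, False)]
consistent = all(b for (a, b) in rows if implies(a, b) and a)
print(consistent)  # True: the only such row is A = true, B = true
```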
Logical connectives can be translated to the language of sets.

Union (∪) and intersection (∩), two fundamental operations, are particularly relevant for us.

Notice how similar the symbols for AND (∧) and intersection (∩) are? This is not an accident.
By definition, any element 𝑥 is the element of A ∩ B if and only if (𝑥 is an element of A) AND (𝑥 is an element of B).

Similarly, union corresponds to the OR connective.
What's most important for us is that the implication connective THEN (→) corresponds to the "subset of" relation, denoted by the ⊆ symbol.
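This logic–set dictionary is easy to see in Python, where sets support intersection, union, and the subset test directly. (The example sets below are arbitrary, chosen only for illustration.)

```python
# AND <-> intersection, OR <-> union, THEN <-> subset.

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print(A & B)        # intersection {3, 4}: x in A AND x in B
print(A | B)        # union {1, ..., 6}: x in A OR x in B

# "A implies B" corresponds to A being a subset of B:
# whenever x is in A (premise holds), x is also in B (conclusion holds).
print({3, 4} <= B)  # True: {3, 4} is a subset of B
```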
Now that we understand how to formulate scientific truths as "premise → conclusion" statements and see how this translates to sets, we are finally ready to talk about probability.

What is the biggest flaw of mathematical logic?
We rarely have all the information to decide if a proposition is true or false.

Consider the following: "it'll rain tomorrow". During the rainy season, all we can say is that rain is more likely, but tomorrow can be sunny as well.
Probability theory generalizes classical logic by measuring truth on a scale between 0 and 1, where 0 is false and 1 is true.

If the probability of rain tomorrow is 0.9, it means that rain is significantly more likely, but not absolutely certain.
Instead of propositions, probability operates on events. In turn, events are represented by sets.

For example, if I roll a die, the event "the result is less than five" is represented by the set A = {1, 2, 3, 4}.

In fact, P(A) = 4/6. (P denotes the probability of an event.)
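For a fair die, this is just counting: the probability of an event is the number of favorable outcomes divided by the size of the sample space. A minimal sketch with exact fractions:

```python
# Classical probability by counting: P(A) = |A| / |Ω| for a fair die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}           # sample space of one die roll
A = {x for x in omega if x < 5}      # event "the result is less than five"

P_A = Fraction(len(A), len(omega))
print(P_A)  # 2/3, i.e. 4/6
```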
As discussed earlier, the logical connectives AND and OR correspond to basic set operations: AND is intersection, OR is union.

This translates to probabilities as well.
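Continuing the die example: the AND of two events is the probability of their intersection, the OR is the probability of their union, and the two are tied together by inclusion-exclusion. (The event B = "the result is even" is my own illustrative choice.)

```python
# AND/OR for events, plus inclusion-exclusion:
# P(A or B) = P(A) + P(B) - P(A and B).
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}    # "less than five"
B = {2, 4, 6}       # "even"

def P(event):
    return Fraction(len(event), len(omega))

print(P(A & B))                  # P(A AND B) = 2/6 = 1/3
print(P(A | B))                  # P(A OR B) = 5/6
print(P(A) + P(B) - P(A & B))    # inclusion-exclusion gives 5/6 as well
```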
How can probability be used to generalize the logical implication?

A "probabilistic A → B" should represent the likelihood of B, given that A is observed.

This is formalized by conditional probability.

At the deepest level, the conditional probability P(B | A) is the mathematical formulation of our belief in the hypothesis B, given empirical evidence A.

A high P(B | A) makes B more likely to happen, given that A is observed.
On the other hand, a low P(B | A) makes B less likely to happen when A occurs.

This is why probability is called the logic of science.
To give you a concrete example, let's go back to the one mentioned earlier: the rain and the wet sidewalk. For simplicity, denote the events by

A = "the sidewalk is wet",
B = "it's raining outside".
The sidewalk can be wet for many reasons, say the neighbor just watered the lawn. Yet, the primary cause of a wet sidewalk is rain, so P(B | A) is close to 1.

If somebody comes in and tells you that the sidewalk is wet, it is safe to infer rain.
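We can put numbers on this story with a small simulation. All the probabilities below (how often rain wets the sidewalk, how often sprinklers do, the base rate of rain) are made up for illustration; the point is that P(B | A) = P(A and B) / P(A) can be estimated by counting.

```python
# A toy simulation of the wet-sidewalk example. Hypothetical numbers:
# rain almost always wets the sidewalk; other causes (sprinklers, etc.)
# wet it only occasionally.
import random

random.seed(0)
N = 100_000
p_rain = 0.30              # hypothetical base rate of rain
wet_given_rain = 0.99      # hypothetical
wet_given_no_rain = 0.05   # hypothetical

wet = rain_and_wet = 0
for _ in range(N):
    rain = random.random() < p_rain
    p_wet = wet_given_rain if rain else wet_given_no_rain
    if random.random() < p_wet:
        wet += 1
        rain_and_wet += rain

estimate = rain_and_wet / wet
print(estimate)  # estimate of P(rain | wet): far above the 0.30 base rate
```

Seeing the evidence "the sidewalk is wet" pushes the probability of rain from its base rate of 0.30 up toward 1, which is exactly the inference described above.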
Probabilistic inference like the above is the foundation of machine learning.

For instance, the output of (most) classification models is the distribution of class probabilities, given an observation.
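For instance, many neural classifiers produce such a distribution by passing raw scores through a softmax. This is a generic sketch, not tied to any particular model; the logits are hypothetical.

```python
# Softmax: turn raw class scores (logits) into a probability
# distribution over classes.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical logits for three classes
print(probs)       # three probabilities summing to 1
print(sum(probs))  # ~1.0 (up to floating point)
```

The model's "answer" is then the class with the highest probability, but the full distribution carries the uncertainty, just as P(B | A) does above.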
To wrap up, here is how the famous physicist James Clerk Maxwell thought about probability.

"The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on."
"Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man's mind."

(James Clerk Maxwell)

By now, you can fully understand what Maxwell meant.
Read the unrolled thread here:

tivadardanka.com/blog/probabili…
If you have enjoyed this thread, share it with your friends and follow me!

I regularly post deep-dive explanations about mathematics and machine learning.
