Paras Chopra

All 🧮 models are wrong, but some are useful.

(a thread unpacking this brilliant idea)

1/ What did the British statistician George Box mean when he wrote these now-famous words in 1976, and how is it relevant to you?

en.wikipedia.org/wiki/George_E.…

2/ Whenever we try understanding the world around us – be it our customers’ behavior, or how stars circle the center of a galaxy, or how coronavirus affects the human body – we never have direct and full access to the underlying reality.

3/ Instead, what we have access to is only the data generated by the process that we want to understand.

4/ So, for example, we know that a customer clicked on one button but ignored the other one.

Based on this data point, we now infer what sort of customer she must be and predict what her needs are and how we can fulfill them.

5/ Notice that the data hasn’t told you a lot – it’s sterile, a mere bunch of numbers.

But when you combine the fact that a customer clicked on the button with your assumptions about how people behave, you get the magical ability to predict the future (e.g. what she'll buy).

6/ But your assumptions are not equal to reality, and that’s why Box said: “all models are wrong”.

To understand this clearly, look at the following chart:

7/ Imagine you start observing the data from the start (bottom left) and you move in time and collect more data points (up to the blue circle).

At this stage, you have to decide whether future data points will lie on the red curve or the black curve.

Which one will you pick?

8/ Actually, in this case, even though the black and red curves are generated by two very DIFFERENT realities (see their equations), no amount of historical data can help you choose between them, because all the data up to the blue circle is CONSISTENT with both.
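
Here is a minimal sketch of the same situation in Python. The two curves are hypothetical stand-ins (exponential growth and logistic growth with an assumed ceiling of 1000), not the equations from the chart, and the 5% measurement tolerance is also an assumption. Up to t = 3 the two agree within that tolerance; by t = 10 they describe very different worlds.

```python
import math

# Two hypothetical "realities" (assumed for illustration, not the chart's equations).
def exponential(t, rate=1.0):
    """Unbounded growth starting from 1."""
    return math.exp(rate * t)

def logistic(t, rate=1.0, capacity=1000.0):
    """Growth starting from 1 that saturates at `capacity`."""
    return capacity / (1.0 + (capacity - 1.0) * math.exp(-rate * t))

def relative_gap(t):
    """Relative difference between the two curves at time t."""
    a, b = exponential(t), logistic(t)
    return abs(a - b) / max(a, b)

# The data we have "so far": t = 0.0, 0.5, ..., 3.0
observed_times = [0.5 * i for i in range(7)]
max_observed_gap = max(relative_gap(t) for t in observed_times)

# A point in the future that we have not observed yet
future_gap = relative_gap(10.0)

print(f"largest gap in observed range: {max_observed_gap:.1%}")
print(f"gap at t = 10: {future_gap:.1%}")
```

If your measurements are only accurate to a few percent, nothing in the observed range lets you tell these two apart.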

9/ Data cannot help you select between hypotheses; it can only help you eliminate them.

The idea that theories can never be proved true, only shown to be wrong, is core to how science is done.
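
A small sketch of elimination in Python, under stated assumptions: the three candidate models and the three observations are made up for illustration. Each observation can knock a candidate out, but the ones that survive are only "not yet wrong".

```python
# Hypothetical candidate models (all names and formulas are illustrative).
candidates = {
    "linear": lambda x: 2 * x,
    "quadratic": lambda x: x ** 2,
    "cubic_wiggle": lambda x: 2 * x + x * (x - 1) * (x - 2),
}

# Made-up observations as (x, y) pairs.
observations = [(0, 0), (1, 2), (2, 4)]

def is_consistent(model, data, tolerance=1e-9):
    """A model survives only if it matches every observation."""
    return all(abs(model(x) - y) <= tolerance for x, y in data)

survivors = [name for name, model in candidates.items()
             if is_consistent(model, observations)]
eliminated = [name for name in candidates if name not in survivors]

# The survivors agree on everything seen so far but disagree about x = 3.
next_predictions = {name: candidates[name](3) for name in survivors}

print("eliminated:", eliminated)
print("still standing:", survivors)
print("predictions at x = 3:", next_predictions)
```

The data ruled out one candidate for certain, yet it cannot say which of the remaining two is right. Only a new observation at x = 3 could eliminate one more.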

10/ Scientists definitely know which theories are wrong, but they never know for certain which theories are right.

(Surprisingly, that’s also how VCs work: while investing, they know for sure which companies are “duds” but they never know which ones are going to be “unicorns”).

11/ Box also said: “some (models) are useful”.

Notice that he didn’t say some models are correct.

He used the term “useful”.

12/ The usefulness of models points to their ability to predict the future.

Scientific laws (like Newton’s law of gravitation) are models that help us predict solar eclipses hundreds of years ahead.

13/ Newton's laws work quite well for this purpose, while another model, like throwing darts to predict eclipses, will fail horribly.

So even though both models are wrong, Newton’s law gives us more mileage because it’s proven to be useful in a variety of contexts *that matter*.
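
To make "both wrong, one useful" concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the "reality" function is synthetic (free fall with a small made-up correction), the "simple law" ignores that correction, and the "darts" model guesses at random. Usefulness is measured as mean absolute prediction error.

```python
import random

def reality(t):
    """Synthetic ground truth: free fall with a small made-up correction."""
    return 4.9 * t ** 2 * (1 - 0.01 * t)

def simple_law(t):
    """A wrong-but-useful model: ignores the correction entirely."""
    return 4.9 * t ** 2

def make_darts_model(seed=0, low=0.0, high=500.0):
    """A wrong-and-useless model: a random guess, seeded for repeatability."""
    rng = random.Random(seed)
    return lambda t: rng.uniform(low, high)

def mean_abs_error(model, times):
    """Average distance between a model's predictions and reality."""
    return sum(abs(model(t) - reality(t)) for t in times) / len(times)

times = list(range(1, 11))  # t = 1 ... 10
law_error = mean_abs_error(simple_law, times)
darts_error = mean_abs_error(make_darts_model(), times)

print(f"simple law error: {law_error:.1f}")
print(f"darts error:      {darts_error:.1f}")
```

Neither model has zero error, so neither is "right". The simple law is just far less wrong on the predictions we care about.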

14/ TLDR:

- Don’t shoot for being right because there’s no such thing as the “correct” assumptions. Shoot for having “useful” assumptions.

- Data alone is sterile. Whenever you think data is giving you insights, it’s actually the data plus your assumptions that are informing you.

15/ That's it! Hope you enjoyed it :)

If you have feedback or comments, do reply.

This thread is a cross-post from my monthly letter on VWO list vwo.com/blog/probabili…
