A little thread on how the Gaussian (aka normal) distribution arises, why it seems to be everywhere, and why under closer inspection it is almost nowhere. Jump in👇
The Gaussian distribution is so prevalent because it arises as the limit of averaging many independent random variables with finite variance. This fundamental law of statistics is called the Central Limit Theorem, or CLT. en.wikipedia.org/wiki/Central_l…
This can be seen very well in a little simulation: a 200x400 grid (so 80,000) of independent random variables, each uniform on (-0.5, 0.5), is simulated repeatedly, and a histogram of the grid averages is plotted, clearly showing the bell curve as expected.
So are we done? What else is there to say? Well... the CLT has two main assumptions. First, that all the individual variables have finite variance, and that is the case most of the time. Second, that they are independent, and that is where things get a little complicated.
Because when even a tiny bit of dependence is introduced into these variables, the CLT falls apart. I simulate this by adding a small bias to all the random cells, nothing even noticeable by eye. But suddenly the averages explode into the tail of the distribution:
6 sigma, 10 sigma, 13 sigma: such events should be pretty much impossible under a normal distribution. A 10-sigma event should happen once every 5.249e+20 years (that's half a sextillion). But of course with slight dependence the mean of these variables is no longer Gaussian.
And that often happens in the real world, where everything is pretty much always slightly dependent. But often that dependence is so weak that the CLT still works; statisticians are happy, models work, and everything is great. But every now and then things suddenly become strongly dependent.
E.g. in the stock market, an index is a combination of individual stocks whose prices are mostly independent, so it often behaves like a Gaussian random walk. Until, of course, an event occurs that affects all these companies at once; suddenly they are dependent and you see a 10-sigma jump.
This should be taught in every statistics class as literally the first thing after the CLT. But it often isn't, and hence people misuse statistics and apply the wrong models to complex data. Read more from @nntaleb, who was an inspiration for this little thread.
BTW: here is the code snippet I wrote to generate these animations, if you want to fiddle with it: github.com/piekniewski/ra…
