A little thread on how the Gaussian (aka normal) distribution arises, why it seems to be everywhere, and why under closer inspection it is almost nowhere. Jump in👇
The Gaussian distribution is so prevalent because it arises as the limit of averaging many independent random variables with finite variance. This fundamental law of statistics is called the Central Limit Theorem, or CLT. en.wikipedia.org/wiki/Central_l…
This can be seen clearly in the little simulation below: a 200x400 grid (so 80,000) of independent random variables with uniform distribution on (-0.5, 0.5) is simulated, and a histogram of the averages is plotted, clearly showing the expected bell curve.
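Here is a minimal sketch of that simulation (the actual animation code is in the repo linked at the end of the thread; the grid size and the uniform(-0.5, 0.5) cells come from the description above, the number of histogrammed frames is my own choice):

```python
import numpy as np
import matplotlib.pyplot as plt

rows, cols = 200, 400           # 80,000 independent cells per frame
n_frames = 2000                 # how many grid averages to histogram (my choice)

# each frame: draw the full grid of uniform(-0.5, 0.5) cells and average them
means = np.array([
    np.random.uniform(-0.5, 0.5, size=(rows, cols)).mean()
    for _ in range(n_frames)
])

plt.hist(means, bins=50, density=True)
plt.title("Histogram of grid averages")
plt.show()
```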
So are we done now? What else is there to say? Well... there are two main assumptions behind the CLT. First, that all individual variables have finite variance, and that is most of the time the case. Second, that they are independent, and that is where things get a little complicated.
Because when even a tiny bit of dependence is introduced into these variables, the CLT falls apart. I simulate this below by adding a small bias to all the random cells, nothing even noticeable by eye. But suddenly the averages explode into the tail of the distribution:
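A rough sketch of that experiment; I'm assuming the bias is a small constant offset shared by every cell (the exact value used for the animation is in the linked repo, 0.01 here is just a guess that is invisible on a (-0.5, 0.5) scale but huge relative to the std of the grid average):

```python
import numpy as np

rows, cols = 200, 400
bias = 0.01                     # assumed value; invisible on a (-0.5, 0.5) scale

# one biased frame: every cell gets the same small offset, acting as a common factor
grid = np.random.uniform(-0.5, 0.5, (rows, cols)) + bias
avg = grid.mean()

# std of the average of n iid uniform(-0.5, 0.5) cells: sqrt(1/12) / sqrt(n)
sigma_avg = np.sqrt(1.0 / 12.0) / np.sqrt(rows * cols)
print(f"grid average = {avg:.5f}, i.e. {avg / sigma_avg:.1f} sigma "
      f"under the unbiased, independent model")
```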
6 sigma, 10 sigma, 13 sigma: such events should be pretty much impossible under a normal distribution. A 10-sigma event would happen once every 5.249e+20 years (that's half a sextillion). But of course with slight dependence the mean of these variables is no longer Gaussian.
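For the curious, a quick sanity check on just how rare a 10-sigma event is under a true Gaussian; the exact "years between events" figure depends on how often you assume a draw happens (one per day below is my arbitrary choice, which lands in the same ballpark as the number quoted above):

```python
from scipy.stats import norm

p = norm.sf(10)                               # one-sided tail P(Z > 10), ~7.6e-24
print(f"P(Z > 10) = {p:.3e}")

years = 1 / p / 365.25                        # assuming one draw per day
print(f"roughly {years:.2e} years between 10-sigma events at one draw per day")
```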
And that often happens in the real world, where everything is pretty much always slightly dependent. But often that dependence is so weak that the CLT still works, statisticians are happy, models work, and everything is great. But every now and then things suddenly get dependent.
E.g. in the stock market, an index is a combination of individual stocks whose prices are mostly independent, so it often behaves like a Gaussian random walk. Until of course an event occurs that affects all these companies at once, suddenly they are dependent, and you see a 10-sigma jump.
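A toy sketch of that effect, not a market model: an equal-weight "index" of many independent random walks looks perfectly Gaussian day to day, until one common shock hits every stock at once (all numbers below are made up for illustration):

```python
import numpy as np

n_stocks, n_days = 500, 1000
daily_sigma = 0.01                               # ~1% independent daily moves per stock

returns = np.random.normal(0.0, daily_sigma, (n_days, n_stocks))
returns[700, :] -= 0.05                          # one day, a common shock hits everything

index_returns = returns.mean(axis=1)             # equal-weight index
z = (index_returns - index_returns.mean()) / index_returns.std()
print(f"worst index day: {z.min():.1f} sigma")   # way outside anything a Gaussian allows
```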
This should be taught in every statistics class as literally the first thing after the CLT. But often it isn't, and hence people misuse statistics and apply the wrong models to complex data. Read more from @nntaleb, who was an inspiration for this little thread.
BTW: here is the code snippet I wrote to generate these animations, if you want to fiddle with it: github.com/piekniewski/ra…
• • •
Small thread: 👇1⃣ The surprising success of machine learning in the last decade stems from the fact that in many tasks roughly 80-90% of the variance relevant for classification/regression is contained in relatively low-level statistical features.
2⃣ On the flip side, the source of the apparent failure of ML techniques in robotics (Moravec's paradox) and mission-critical autonomy is that the remaining 10% of variance is hidden in hellish complexity. Hence the march of 9's becomes much harder with every 9.
3⃣ To get over that final 9% and then the last 1% we need a much more sophisticated system that can actually create relevant, physically reasonable models of the processes that generate the data. As in brains. The entire AI craze of the last 10+ years seems to indicate
The shutdown of Argo AI is certainly a silent signpost on the road to autonomy. But just as I've been saying before, the road to autonomy we are following now has more than one step for which the best idea of how to solve it can be summarized by:
¯\_(ツ)_/¯
It all got ignited by the DARPA challenge in 2005, in which a few teams managed to get a car through tens of miles of desert road. It got noticed by Google, and money started flowing. What followed was the deep learning revolution, which at least on the surface provided the missing magic.
Perception modules could now be way better than before, and it seemed like enough training, enough priors somehow injected into the track planner, and enough mapping data to augment the SLAM would be just enough to pull this off.
For years I've been claiming that our #AI, being based in statistics, is not equipped for dealing with real-world data. That data is most likely fat-tailed, and hence not possible to characterize by sampling. Let me explain this in the thread below: 👇
First of all, the basics: random distributions can be either short-tailed (finite variance, such as the Gaussian) or fat-tailed (e.g. Pareto). The nice property of finite-variance distributions is that we can learn almost everything about the distribution by sampling it.
For example, look at the running mean estimated below:
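Since the original animation isn't reproduced here, a minimal sketch of the comparison: the running mean of a Gaussian sample settles quickly, while for a Pareto sample with tail index alpha < 2 (1.2 below, my choice) it keeps getting knocked around by rare huge draws:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 100_000
gaussian = np.random.normal(0, 1, n)
pareto = np.random.pareto(1.2, n)        # alpha = 1.2: finite mean, infinite variance

def running_mean(x):
    # mean of the first k samples, for every k
    return np.cumsum(x) / np.arange(1, len(x) + 1)

plt.plot(running_mean(gaussian), label="Gaussian")
plt.plot(running_mean(pareto), label="Pareto (alpha = 1.2)")
plt.xlabel("number of samples")
plt.ylabel("running mean")
plt.legend()
plt.show()
```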
A long time ago (the 1980's), in a country far far away (Poland), in a reality that no longer exists, there was a science show on TV. A show unlike any other, unique to this day: Sonda. Thread👇
1/9 Each episode focused on some piece of modern technology or a recent scientific discovery. In reality, though, the technology was never the star of the show. It was the dialogue between the hosts that really made this show unique.
2/9 Whatever the topic may have been, one of the hosts always took the role of an enthusiast and the other the natural adversarial role of a skeptic. Each episode was like a trial in which the hosts were the prosecutor and the defense, and you, the viewer, were the judge.
For safety-critical applications, how things fail is potentially more important than how often they fail. Thread 👇
1/9 For example, a diving regulator valve can potentially freeze over. But it is made in such a way that if it does, it freezes in the open position, feeding air constantly rather than only when the diver inhales. That way, the diver immediately knows something is wrong and needs to
2/9 ascend, and more importantly, is not cut off from the air he needs to breathe. Numerous mechanical and electrical systems are engineered with similar failure cases in mind. This is in general called the "fail-safe" approach.
I've had a lengthy discussion today about the SpaceX mission. I'm observing the following: people act like this is some kind of breakthrough, almost at the scale of the Apollo program. Now I don't want to demean the effort, space is hard, but the reality as I see it is:
We are sending a capsule on top of a rocket to LEO. This has been done thousands of times by various space programs, and was done regularly in the US in the 60's. Aside from some external sugar such as LCD screens, this technology hasn't fundamentally changed since then.
In many ways the Space Shuttle was a much more impressive vehicle. Much larger, capable of rendezvous missions (remember the Hubble telescope service missions?), in principle mostly reusable; remember, the Shuttle only discarded the big external fuel tank. In theory ALL of the engines, the most $$ part