Understanding the Historical Divide Between Machine Learning and Statistics
On social media, it's common to encounter strong reactions to statements like "Machine learning is not statistics."
Much of this stems from a lack of historical context about the development of ML as a field.
It's important to note that the modern foundations of machine learning were largely shaped in places like the USSR and the USA—not the UK.
While Alan Turing’s legacy is significant, the UK's direct contributions to core ML theory during the 20th century were almost non-existent.
For example, the first dedicated machine learning department in the UK, founded at Royal Holloway, University of London (RHUL), was built by prominent figures from elsewhere: Vladimir Vapnik and Alexey Chervonenkis from the USSR, Ray Solomonoff from the US, and others.
To clarify the distinction:
On one side, you had the Institute of Control Sciences in the USSR, a powerhouse that developed many of ML’s theoretical foundations—Statistical Learning Theory, VC dimension, Support Vector Machines, and kernel methods.
On the other side, the Central Economic Mathematical Institute (CEMI) focused on statistics and economics, producing luminaries like Nobel laureate Leonid Kantorovich; its work, however, was not directly tied to machine learning.
The historical separation between ML and statistics wasn’t just geographical—it was conceptual. ML papers were often rejected by traditional statistics journals because the two fields had different goals, methodologies, and assumptions.
So the next time someone confidently asserts that "ML is just statistics," it may be worth encouraging them to look at the actual history of both disciplines before drawing conclusions.
#machinelearning #statistics
Many data scientists don't truly understand forecasting.
They never did.
Forecasting is fundamentally different from general data science or typical machine learning.
Those who excel at forecasting often come from a strong econometrics background, understanding deeply rooted concepts like autoregression, stationarity, and lagged dependencies—ideas established nearly a century ago by statisticians like Yule.
This is why generalist ML researchers keep failing at forecasting. They continuously attempt to reinvent time series analysis with methods like 'time series LLMs' or tools like Facebook Prophet, often without grasping the fundamental laws and unique dynamics governing time series data.
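To make those econometric ideas concrete, here is a minimal sketch (not tied to any forecasting library; the parameter values are illustrative) of the kind of process Yule formalised: an AR(1) model with a single lagged dependency, which is stationary only when |phi| < 1.

```python
# Minimal AR(1) sketch: y_t = phi * y_{t-1} + eps_t (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
phi_true = 0.8
y = np.zeros(5000)
for t in range(1, y.size):
    y[t] = phi_true * y[t - 1] + rng.standard_normal()

# Estimate phi by regressing y_t on its own lag (ordinary least squares).
y_lag, y_now = y[:-1], y[1:]
phi_hat = (y_lag @ y_now) / (y_lag @ y_lag)
print(f"estimated phi: {phi_hat:.3f}")  # close to the true 0.8

# Stationarity (|phi| < 1) shows up in the variance settling near sigma^2 / (1 - phi^2).
print("sample variance:", round(y.var(), 2),
      "theoretical:", round(1 / (1 - phi_true**2), 2))
```

Autoregression, lagged dependence, and stationarity are all visible in those few lines; methods that ignore them tend to break on real time series.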
Bayesianism isn’t just a misguided methodology — it’s a cargo cult, dressed in equations, pretending to be science.
From the beginning, the intellectual titans of statistics rejected it.
Sir Ronald Fisher — the man who gave us maximum likelihood, experimental design, and modern statistical inference — openly mocked Bayesianism as absurd and dangerously unscientific.
Jerzy Neyman and Egon Pearson, who built the foundations of hypothesis testing, had no use for it either.
🌟 Spectral Entropy: The "Musical Playlist" of Data Science 🎵
Ever wondered how scientists distinguish a calm brain from a chaotic one or predict stock market crashes? The answer lies in spectral entropy—a powerful tool that measures the "rhythm" of chaos in data. Let’s dive in!
🔍 What Is Spectral Entropy? Think "Radio Stations"!
Imagine tuning a radio:
Low Spectral Entropy = One clear station (e.g., classical music). All energy is focused, like a heartbeat or a pendulum. 🎻
High Spectral Entropy = Static noise. Energy is scattered across frequencies, like a random walk or chaotic brainwaves. 📻💥
Technically, it’s Shannon entropy applied to a signal’s power spectrum, quantifying how “spread out” energy is across frequencies.
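As a rough illustration of that definition, here is one way to compute it with NumPy and SciPy. This is a sketch, not a reference implementation: the function name spectral_entropy, the sampling rate, and the test signals are my own choices.

```python
# Sketch: spectral entropy of a 1-D signal sampled at fs Hz.
import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs, normalize=True):
    # Estimate the power spectral density with Welch's method.
    freqs, psd = welch(x, fs=fs)
    # Treat the PSD as a probability distribution over frequencies.
    p = psd / psd.sum()
    # Shannon entropy of that distribution (zero bins contribute nothing).
    h = -np.sum(p * np.log2(p, where=p > 0, out=np.zeros_like(p)))
    if normalize:
        h /= np.log2(len(p))  # scale to [0, 1]
    return h

rng = np.random.default_rng(0)
t = np.arange(0, 4, 1 / 250)          # 4 seconds at 250 Hz
tone = np.sin(2 * np.pi * 10 * t)     # one "clear station"
noise = rng.standard_normal(t.size)   # "static"
print(spectral_entropy(tone, fs=250))   # low: energy concentrated
print(spectral_entropy(noise, fs=250))  # high: energy spread out
```

The pure tone comes out far lower than the white noise, matching the "radio station" intuition above.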
📈 Andrey Markov and the Birth of Stochastic Chains
In 1906, Russian mathematician Andrey Markov introduced a revolutionary idea: modeling dependent sequences of events using what we now call Markov chains.
At a time when probability was largely limited to independent events like coin flips or dice rolls, Markov broke new ground. He showed how we could still apply the laws of probability – such as the law of large numbers – to systems where each event depends on the previous one.
His famous 1913 analysis of vowel/consonant patterns in Pushkin’s Eugene Onegin wasn't just poetic; it proved that dependency didn’t invalidate statistical convergence. As one historian put it: Markov “founded a new branch of probability theory by applying mathematics to poetry.”
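To make the idea concrete, here is a toy simulation in the spirit of Markov's study. The transition probabilities below are made-up numbers, not Markov's actual counts from Eugene Onegin.

```python
# Two-state Markov chain: state 0 = "vowel", state 1 = "consonant".
import numpy as np

# P[i, j] = probability of moving from state i to state j (illustrative values).
P = np.array([[0.13, 0.87],   # after a vowel
              [0.66, 0.34]])  # after a consonant

rng = np.random.default_rng(42)
seq = [0]  # start with a "vowel"
for _ in range(100_000):
    seq.append(rng.choice(2, p=P[seq[-1]]))

# Despite each step depending on the previous one, the empirical vowel
# frequency converges to the chain's stationary distribution.
print("empirical vowel frequency:", np.mean(np.array(seq) == 0))

# Stationary distribution: the eigenvector of P^T with eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = v[:, np.argmin(np.abs(w - 1))].real
print("stationary vowel probability:", (pi / pi.sum())[0])
```

Even though every letter depends on the one before it, the running frequency of "vowels" still settles to a fixed value, exactly the kind of convergence Markov demonstrated for dependent events.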
🚀 From Napoleon’s Era to Your Smartphone: The 200-Year Saga of the Fourier Transform 🔥
🌍 18th Century: A Mathematical Mystery
Before Fourier, giants like Euler and Bernoulli dared to ask: “Can complex vibrations be built from simple waves?” But it was all speculation—no proof, just raw genius chasing an idea too wild to tame.
🔥 1807: Fourier Drops a Bomb
Enter Joseph Fourier, a man so bold he told Napoleon’s skeptical elite (Lagrange, Laplace) that ANY function could be shattered into sines and cosines. Critics scoffed. “Impossible!” they cried.
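Fourier's claim is easy to test numerically today. Here is a minimal sketch, assuming a square wave as the target and using partial sums of its standard Fourier series:

```python
# Build a square wave from sines: partial sums of 4/pi * sum sin((2k+1)x)/(2k+1).
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
square = np.sign(np.sin(x))        # the "complex vibration" to reconstruct

approx = np.zeros_like(x)
for k in range(50):                # add 50 odd harmonics
    n = 2 * k + 1
    approx += (4 / np.pi) * np.sin(n * x) / n

# Away from the jumps, the sum of sines hugs the square wave closely.
mask = (x > 0.3) & (x < np.pi - 0.3)
print("max error away from the discontinuities:",
      np.max(np.abs(square - approx)[mask]))
```

Fifty sine terms already trace the square wave closely away from the jumps, the very decomposition the critics called impossible.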
🧠 When Kolmogorov–Arnold Networks (KANs) dropped last year, I said it loud and clear: this is one of the most innovative technologies of 2024.
At the time, a few skeptics scoffed.
Some made sweeping claims that KANs wouldn’t scale, wouldn’t generalize, and certainly wouldn’t touch Transformers.
📅 Fast forward to 2025 — and let’s just say the skeptics have not only eaten their hats… they’ve had seconds.
🚨 The latest proof? iTFKAN — a new paper from China (Interpretable Time Series Forecasting with Kolmogorov–Arnold Networks) — which shows that KANs don’t just compete with Transformer-based models like Informer, Autoformer, and FEDformer…