Valeriy M., PhD, MBA, CQF
Jun 20
Understanding the Historical Divide Between Machine Learning and Statistics

On social media, it's common to encounter strong reactions to statements like "Machine learning is not statistics."

Much of this stems from a lack of historical context about the development of ML as a field.
It's important to note that the modern foundations of machine learning were largely shaped in places like the USSR and the USA—not the UK.

While Alan Turing’s legacy is significant, the UK's direct contributions to core ML theory during the 20th century were almost non-existent.
For example, the first dedicated machine learning department in the UK, founded at Royal Holloway (RHUL), was built by prominent figures from elsewhere—Vladimir Vapnik and Alexey Chervonenkis from the USSR, Ray Solomonoff from the US, and others.

To clarify the distinction:
On one side, you had the Institute of Control Sciences in the USSR, a powerhouse that developed many of ML’s theoretical foundations—Statistical Learning Theory, VC dimension, Support Vector Machines, and kernel methods.
On the other side, the Central Economic Mathematical Institute (CEMI) focused on statistics and economics, producing luminaries like Nobel laureate Leonid Kantorovich. However, their work was not directly tied to machine learning.
The historical separation between ML and statistics wasn’t just geographical—it was conceptual. ML papers were often rejected by traditional statistics journals because the two fields had different goals, methodologies, and assumptions.
So the next time someone confidently asserts that "statistics is ML," it may be worth encouraging them to look at the actual history of both disciplines before drawing conclusions.

#machinelearning #statistics

More from @predict_addict

May 20
Many data scientists don't truly understand forecasting.

They never did.

Forecasting is fundamentally different from general data science or typical machine learning.
Those who excel at forecasting often come from a strong econometrics background, understanding deeply rooted concepts like autoregression, stationarity, and lagged dependencies—ideas established nearly a century ago by statisticians like Yule.
This is why generalist ML researchers keep failing at forecasting. They repeatedly attempt to reinvent time series analysis with methods like 'time series LLMs' or tools like Facebook Prophet, often without grasping the fundamental laws and unique dynamics governing time series data.
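To make the lagged-dependency idea from those Yule-era concepts concrete, here is a minimal sketch of my own (synthetic data, an AR(1) model chosen purely for illustration, not something from the thread): each value depends on the previous one, |phi| < 1 keeps the process stationary, and the lag coefficient is recovered by a simple lag regression.

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.7, 5_000                  # |phi| < 1 keeps the AR(1) process stationary
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()   # today's value depends on yesterday's

# Recover the lag coefficient by ordinary least squares of x_t on x_{t-1}.
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
print(f"true phi = {phi}, estimated phi = {phi_hat:.3f}")
```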
May 17
Bayesianism: The Cargo Cult of Modern Statistics
Bayesianism isn’t just a misguided methodology — it’s a cargo cult, dressed in equations, pretending to be science.

From the beginning, the intellectual titans of statistics rejected it.
Sir Ronald Fisher — the man who gave us maximum likelihood, experimental design, and modern statistical inference — openly mocked Bayesianism as absurd and dangerously unscientific.

Jerzy Neyman and Egon Pearson, who built the foundations of hypothesis testing, had no use for it either.
May 3
🌟 Spectral Entropy: The "Musical Playlist" of Data Science 🎵
Ever wondered how scientists distinguish a calm brain from a chaotic one or predict stock market crashes? The answer lies in spectral entropy—a powerful tool that measures the "rhythm" of chaos in data. Let’s dive in!
🔍 What Is Spectral Entropy? Think "Radio Stations"!

Imagine tuning a radio:

Low Spectral Entropy = One clear station (e.g., classical music). All energy is focused, like a heartbeat or a pendulum. 🎻
High Spectral Entropy = Static noise. Energy is scattered across frequencies, like a random walk or chaotic brainwaves. 📻💥
Technically, it’s Shannon entropy applied to a signal’s power spectrum, quantifying how “spread out” energy is across frequencies.
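As a rough illustration of that definition (my own sketch on synthetic signals, not code from the thread): compute the power spectrum, normalize it into a probability distribution over frequency bins, and take its Shannon entropy.

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy of a signal's normalized power spectrum (illustrative helper)."""
    psd = np.abs(np.fft.rfft(x)) ** 2      # power at each frequency bin
    psd = psd / psd.sum()                  # normalize so the bins form a probability distribution
    n_bins = len(psd)
    psd = psd[psd > 0]                     # drop empty bins to avoid log(0)
    h = -np.sum(psd * np.log2(psd))        # Shannon entropy in bits
    return h / np.log2(n_bins)             # scale to [0, 1] for comparability

fs = 1000                                   # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(0)

pure_tone = np.sin(2 * np.pi * 50 * t)      # one clear "station"
white_noise = rng.standard_normal(t.size)   # energy scattered everywhere

print(spectral_entropy(pure_tone))          # low: energy concentrated at one frequency
print(spectral_entropy(white_noise))        # high: energy spread across frequencies
```

The pure tone scores near 0 because its energy sits in a single bin; white noise scores near 1 because its energy is spread across all of them.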

🧠 The Minds Behind the Magic

Spectral entropy wasn’t born in a vacuum!
May 2
📈 Andrey Markov and the Birth of Stochastic Chains

In 1906, Russian mathematician Andrey Markov introduced a revolutionary idea: modeling dependent sequences of events using what we now call Markov chains.
At a time when probability was largely limited to independent events like coin flips or dice rolls, Markov broke new ground. He showed how we could still apply the laws of probability – such as the law of large numbers – to systems where each event depends on the previous one.
His famous 1913 analysis of vowel/consonant patterns in Pushkin’s Eugene Onegin wasn't just poetic; it proved that dependency didn’t invalidate statistical convergence. As one historian put it: Markov “founded a new branch of probability theory by applying mathematics to poetry.”
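A toy sketch of that result (illustrative transition probabilities of my choosing, not Markov's exact Onegin counts): simulate a two-state vowel/consonant chain and watch the long-run letter frequencies settle toward a fixed limit even though consecutive letters are dependent.

```python
import numpy as np

states = ("vowel", "consonant")
# Illustrative transition probabilities (rows: current letter, cols: next letter).
P = np.array([[0.13, 0.87],    # after a vowel
              [0.66, 0.34]])   # after a consonant

rng = np.random.default_rng(42)
n, state = 100_000, 0
counts = np.zeros(2)
for _ in range(n):
    counts[state] += 1
    state = rng.choice(2, p=P[state])   # next letter depends only on the current one

print("empirical frequencies:", dict(zip(states, counts / n)))

# The limit they converge to: the stationary distribution pi satisfying pi = pi @ P.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print("stationary distribution:", dict(zip(states, pi / pi.sum())))
```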
Apr 27
🚀 From Napoleon’s Era to Your Smartphone: The 200-Year Saga of the Fourier Transform 🔥

🌍 18th Century: A Mathematical Mystery
Before Fourier, giants like Euler and Bernoulli dared to ask: “Can complex vibrations be built from simple waves?” But it was all speculation—no proof, just raw genius chasing an idea too wild to tame.

🔥 1807: Fourier Drops a Bomb
Enter Joseph Fourier, a man so bold he told Napoleon’s skeptical elite (Lagrange, Laplace) that ANY function could be shattered into sines and cosines. Critics scoffed. “Impossible!” they cried.
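A quick sketch of what Fourier was claiming (my illustration using the textbook square-wave series, not anything from the thread): even a discontinuous square wave is approximated by summing a handful of sines.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 1_000, endpoint=False)
square = np.sign(np.sin(t))            # a "complex vibration": a discontinuous square wave

# Partial Fourier series of the square wave: odd harmonics with weights 4 / (pi * k).
approx = np.zeros_like(t)
for k in range(1, 40, 2):
    approx += (4 / (np.pi * k)) * np.sin(k * t)

rmse = np.sqrt(np.mean((square - approx) ** 2))
print(f"RMS error with 20 sine terms: {rmse:.3f}")   # shrinks as more harmonics are added
```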
Apr 25
🧠 When Kolmogorov–Arnold Networks (KANs) dropped last year, I said it loud and clear: this is one of the most innovative technologies of 2024.

At the time, a few skeptics scoffed.
Some made sweeping claims about how KANs wouldn’t scale, wouldn’t generalize, and certainly wouldn’t touch Transformers.

📅 Fast forward to 2025 — and let’s just say the skeptics have not only eaten their hats… they’ve had seconds.
🚨 The latest proof? iTFKAN — a new paper from China (Interpretable Time Series Forecasting with Kolmogorov–Arnold Networks) — which shows that KANs don’t just compete with Transformer-based models like Informer, Autoformer, and FEDformer…

They slay them.
