Experienced Data Science Leader | PhD in Machine Learning | 4x Author | Black Belt 🥋 in Time Series | Chief Conformal Prediction Promoter | Mathematician
Sep 2 • 5 tweets • 1 min read
A Perfect 2025 Startup Opportunity: Replacing Fossilized Scikit-Learn with a Next-Gen ML Toolkit
Scikit-learn in 2025 is a fossil: slow, CPU-bound, and hard to scale.
The ecosystem has moved forward with specialized libraries (XGBoost, LightGBM, CatBoost), distributed frameworks (Dask), and AutoML tools. But there’s still no unified, modern toolkit that combines:
🚀 High performance: GPU-native, parallelized, edge-friendly.
Aug 27 • 11 tweets • 2 min read
Taming the Chaos: Your Statistical Safety Net
Forget everything you know about the gentle, predictable bell curve. What if your data is wild, skewed, or just plain weird?
How can you make any predictions then? Enter Chebyshev’s Inequality, probability's most reliable safety net.
This powerful tool doesn't ask about your data's shape. It doesn't care if it's normal, uniform, or looks like a mountain range on Mars.
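The bound itself: for any distribution with finite mean μ and standard deviation σ, P(|X − μ| ≥ kσ) ≤ 1/k². A minimal numpy sketch (my illustration, not from the thread) checks it on a heavily skewed distribution:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=1_000_000)   # heavily skewed: nothing like a bell curve
mu, sigma = x.mean(), x.std()

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2, regardless of the shape of X
for k in (2, 3, 4):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: empirical tail {empirical:.4f} <= Chebyshev bound {1 / k**2:.4f}")
```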
Aug 26 • 12 tweets • 2 min read
Why the Gaussian Distribution Naturally Emerges from the Maximum Entropy Principle
One of the most fascinating things about the Maximum Entropy Principle (MaxEnt) is how it gives rise to some of the most fundamental distributions in probability and statistics.
Among them, the Gaussian distribution (a.k.a. the bell curve) is perhaps the most iconic. But why does it appear so often? And why does MaxEnt single it out?
The answer lies in a simple fact: the Gaussian is the least biased choice when you know only two things, the mean and the variance.
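Sketched in symbols (the standard variational argument, condensed here; not spelled out in the thread):

```latex
\max_{p}\; -\int p(x)\,\ln p(x)\,dx
\quad \text{s.t.} \quad
\int p(x)\,dx = 1,\qquad
\mathbb{E}[X] = \mu,\qquad
\operatorname{Var}(X) = \sigma^2 .
```

Lagrange multipliers force p(x) ∝ exp(λ₁x + λ₂x²) with λ₂ < 0, and the two constraints pin that exponential of a quadratic down to exactly N(μ, σ²).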
Aug 26 • 7 tweets • 2 min read
The Night That Forged Modern Algebra
On the night of May 29, 1832, a 20-year-old French revolutionary sat at his desk, staring at a stack of paper. He wasn't writing a manifesto or a farewell letter to a lover.
He was racing against the sunrise, feverishly scribbling equations that would change the world.
His name was Évariste Galois. By the next evening, he would be dead, shot in a pointless duel over a woman's honor.
Aug 24 • 18 tweets • 3 min read
The most honest math you'll ever use. 🤝
The Maximum Entropy Principle (MaxEnt) is a genius rule for modeling with incomplete data. It tells you to:
✅ Respect what you KNOW.
🚫 Assume nothing else.
It's the reason behind: Logistic Regression, Gaussian distributions, and smarter AI exploration.
Here’s how it works and why it matters: 👇
Think about the last time you built a model. You had some data—a few key averages, a couple of constraints. But beyond that?
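Here is what "respect what you know, assume nothing else" looks like in code: a minimal scipy sketch (my illustration, not the author's) of Jaynes' classic dice problem. Given only that a die's average roll is 4.5, find the least-committal distribution:

```python
import numpy as np
from scipy.optimize import minimize

faces = np.arange(1, 7)

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)          # guard against log(0)
    return np.sum(p * np.log(p))         # minimizing this maximizes entropy

constraints = [
    {"type": "eq", "fun": lambda p: p.sum() - 1.0},    # what we KNOW: it's a distribution
    {"type": "eq", "fun": lambda p: faces @ p - 4.5},  # what we KNOW: the mean is 4.5
]
res = minimize(neg_entropy, x0=np.full(6, 1 / 6),
               bounds=[(0, 1)] * 6, constraints=constraints)
print(res.x.round(4))   # tilted toward high faces, but otherwise maximally noncommittal
```

The solution is an exponential tilt of the uniform distribution, the same family that makes logistic regression a MaxEnt model.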
Aug 23 • 7 tweets • 2 min read
Why the Central Limit Theorem Misleads You: Exactly Where It Matters Most
Everyone learns the Central Limit Theorem (CLT):
“Add up a lot of independent random variables, and the distribution looks Gaussian.”
It’s true — but dangerously incomplete.
The Problem
The CLT describes what happens near the average, within fluctuations of order √n. That’s the “bulk” of the distribution.
But what about the rare events — the tails?
The CLT says nothing about them.
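A quick simulation makes the gap visible (a sketch of mine, using Exp(1) summands so the exact distribution of the sum is Gamma):

```python
import numpy as np
from scipy import stats

n = 50                     # sum of n i.i.d. Exp(1) variables ~ Gamma(n): mean n, variance n
rng = np.random.default_rng(1)
sums = rng.gamma(shape=n, size=5_000_000)   # exact samples of the n-term sum

for k in (3, 5):           # thresholds k standard deviations above the mean
    t = n + k * np.sqrt(n)
    clt = stats.norm.sf(t, loc=n, scale=np.sqrt(n))   # what the Gaussian bulk predicts
    emp = (sums > t).mean()                           # what actually happens
    print(f"{k} sigma: CLT {clt:.2e} vs empirical {emp:.2e}")
```

At 5σ the Gaussian approximation underestimates the true tail by more than an order of magnitude; large-deviations theory, not the CLT, governs out there.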
Aug 3 • 15 tweets • 3 min read
🧠 Grigori Perelman, the Poincaré Conjecture, and What Academic Integrity Demands
In the early 2000s, Russian mathematician Grigori Perelman published a solution to the Poincaré Conjecture, a century-old problem and one of the Clay Millennium Prize challenges.
His work was brilliant, concise, and transformative.
And yet—he rejected both the Fields Medal (2006) and the $1 million Millennium Prize (2010).
Jul 25 • 5 tweets • 1 min read
🔍 Understanding Entropy and Mutual Information in One Diagram
This Venn diagram is a great way to visualize how entropy, conditional entropy, and mutual information relate for two random variables, X and Y:
📌 Key Concepts:
* H(X): Total uncertainty in X (left circle)
* H(Y): Total uncertainty in Y (right circle)
* H(X, Y): Joint uncertainty in the pair (X, Y) — the full union of both circles
* H(X|Y): What we still don't know about X after knowing Y (left-only part)
* H(Y|X): What we still don't know about Y after knowing X (right-only part)
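These identities are easy to verify numerically; here is a minimal numpy sketch (my addition) on a toy 2×2 joint distribution:

```python
import numpy as np

# Joint distribution of (X, Y): rows index X, columns index Y
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])

def H(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()          # Shannon entropy in bits

px, py = pxy.sum(axis=1), pxy.sum(axis=0)   # marginals
Hx, Hy, Hxy = H(px), H(py), H(pxy.ravel())

print("H(X|Y) =", Hxy - Hy)                 # left-only region: H(X,Y) - H(Y)
print("H(Y|X) =", Hxy - Hx)                 # right-only region
print("I(X;Y) =", Hx + Hy - Hxy)            # the overlap of the two circles
```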
Jul 25 • 9 tweets • 2 min read
How a Feud Between Mathematicians Birthed Markov Chains—and Revolutionized Probability
Picture this: Russia, 1906. Two brilliant mathematicians are locked in a heated debate. On one side, Pavel Nekrasov insists that the law of large numbers only works under strict independence.
On the other, Andrey Markov, sharp, stubborn, and about to make history, declares: "Not so fast."
What followed wasn’t just a war of words. It was the birth of Markov chains, a concept so powerful it reshaped randomness itself.
Jul 12 • 6 tweets • 1 min read
Fourier’s Vision, Kolmogorov’s Counterexample
Joseph Fourier boldly claimed that any function could be represented as a sum of sines and cosines — a Fourier series.
His insight revolutionized physics and mathematics, but it came with a major flaw: a lack of rigor.
Fourier provided little justification for when such series converge or what kinds of functions they truly represent.
For decades, mathematicians worked to shore up the theory he had opened.
Then came Andrey Kolmogorov.
In 1923, at just 20 years old, Kolmogorov constructed an integrable (L¹) function whose Fourier series diverges almost everywhere.
Jun 20 • 7 tweets • 2 min read
Understanding the Historical Divide Between Machine Learning and Statistics
On social media, it's common to encounter strong reactions to statements like "Machine learning is not statistics."
Much of this stems from a lack of historical context about the development of ML as a field.
It's important to note that the modern foundations of machine learning were largely shaped in places like the USSR and the USA—not the UK.
While Alan Turing’s legacy is significant, the UK's direct contributions to core ML theory during the 20th century were almost non-existent.
May 20 • 5 tweets • 1 min read
Many data scientists don't truly understand forecasting.
They never did.
Forecasting is fundamentally different from general data science or typical machine learning.
Those who excel at forecasting often come from a strong econometrics background, understanding deeply rooted concepts like autoregression, stationarity, and lagged dependencies—ideas established nearly a century ago by statisticians like Yule.
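Those ideas are concrete, as a tiny sketch shows (mine, using statsmodels' AutoReg; the 0.8 coefficient is made up):

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Simulate a stationary AR(1): y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()

fit = AutoReg(y, lags=1).fit()
print(fit.params)             # intercept and lag coefficient, close to (0, 0.8)
print(fit.forecast(steps=3))  # forecasts decay toward the mean, as stationarity implies
```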
May 17 • 13 tweets • 3 min read
Bayesianism: The Cargo Cult of Modern Statistics
Bayesianism isn’t just a misguided methodology — it’s a cargo cult, dressed in equations, pretending to be science.
From the beginning, the intellectual titans of statistics rejected it.
May 3 • 9 tweets • 2 min read
🌟 Spectral Entropy: The "Musical Playlist" of Data Science 🎵
Ever wondered how scientists distinguish a calm brain from a chaotic one or predict stock market crashes? The answer lies in spectral entropy—a powerful tool that measures the "rhythm" of chaos in data. Let’s dive in!
🔍 What Is Spectral Entropy? Think "Radio Stations"!
Imagine tuning a radio:
Low Spectral Entropy = One clear station (e.g., classical music). All energy is focused, like a heartbeat or a pendulum. 🎻
High Spectral Entropy = Static noise.
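In code, the radio metaphor is a few lines (a minimal sketch of mine using scipy's Welch estimator; normalizing by the log of the bin count is one common convention):

```python
import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs=1.0):
    # Estimate the power spectral density, then treat it as a probability distribution
    f, psd = welch(x, fs=fs)
    p = psd / psd.sum()
    p = p[p > 0]
    # Shannon entropy of the spectrum, scaled to [0, 1] by the number of bins
    return float(-(p * np.log(p)).sum() / np.log(psd.size))

rng = np.random.default_rng(0)
t = np.arange(4096)
print(spectral_entropy(np.sin(2 * np.pi * 0.05 * t)))  # low: one clear station
print(spectral_entropy(rng.standard_normal(4096)))     # high: static across the dial
```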
May 2 • 7 tweets • 2 min read
📈 Andrey Markov and the Birth of Stochastic Chains
In 1906, Russian mathematician Andrey Markov introduced a revolutionary idea: modeling dependent sequences of events using what we now call Markov chains.
At a time when probability was largely limited to independent events like coin flips or dice rolls, Markov broke new ground. He showed how we could still apply the laws of probability – such as the law of large numbers – to systems where each event depends on the previous one.
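A two-state chain makes the point concrete (a minimal numpy sketch of mine, with invented transition probabilities): every step depends on the previous one, yet long-run frequencies still converge:

```python
import numpy as np

# Two-state weather chain: P(rain -> rain) = 0.7, P(sun -> sun) = 0.8
P = np.array([[0.7, 0.3],
              [0.2, 0.8]])
rng = np.random.default_rng(0)

state, visits = 0, np.zeros(2)
for _ in range(100_000):
    visits[state] += 1
    state = rng.choice(2, p=P[state])   # the next step depends only on the current state

print(visits / visits.sum())  # approaches the stationary distribution [0.4, 0.6]
```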
Apr 27 • 11 tweets • 2 min read
🚀 From Napoleon’s Era to Your Smartphone: The 200-Year Saga of the Fourier Transform 🔥
🌍 18th Century: A Mathematical Mystery
Before Fourier, giants like Euler and Bernoulli dared to ask: “Can complex vibrations be built from simple waves?” But it was all speculation—no proof, just raw genius chasing an idea too wild to tame.
🔥 1807: Fourier Drops a Bomb
Apr 25 • 7 tweets • 2 min read
🧠 When Kolmogorov–Arnold Networks (KANs) dropped last year, I said it loud and clear: this is one of the most innovative technologies of 2024.
At the time, a few skeptics scoffed.
Some confidently claimed that KANs wouldn’t scale, wouldn’t generalize, and certainly wouldn’t touch Transformers.
📅 Fast forward to 2025 — and let’s just say the skeptics have not only eaten their hats… they’ve had seconds.
Apr 19 • 5 tweets • 1 min read
📈 Kolmogorov & Wiener: The Godfathers of Modern Forecasting
Before the 1950s, forecasting was part art, part guesswork. That changed thanks to two brilliant minds—Andrey Kolmogorov and Norbert Wiener.
🇺🇸 Wiener, with his applied wartime work, and 🇷🇺 Kolmogorov, with his deep theoretical insights, independently showed that if you know a process’s mean and covariance structure, you can build the optimal linear forecast—and even compute its error.
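In modern notation (my condensation of the standard result): for a stationary process with autocovariance γ, the best linear predictor from the last p values is found by solving the normal (Yule-Walker-type) equations:

```latex
\hat{X}_{t+1} = \sum_{i=0}^{p-1} a_i X_{t-i},
\qquad
\min_{a}\, \mathbb{E}\big[(X_{t+1} - \hat{X}_{t+1})^2\big]
\;\Longleftrightarrow\;
\sum_{i=0}^{p-1} a_i\, \gamma(i - j) = \gamma(j + 1),
\quad j = 0, \dots, p-1 .
```

Knowing γ is all it takes: both the coefficients and the resulting prediction error follow from the covariance structure alone.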
Apr 19 • 4 tweets • 1 min read
🎯 Tracking enemy aircraft with math? Enter the godfather of cybernetics Norbert Wiener.
During WWII, MIT mathematician Norbert Wiener (1894–1964) was tasked with a critical challenge: predict the future position of German bomber aircraft using only past radar observations.
📡 The result?
A groundbreaking method of signal filtering—now called the Wiener Filter.
His 1942 solution (declassified in 1949) used advanced frequency-domain math to develop the optimal way to estimate future values of a time series.
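In its simplest noncausal (smoothing) form, the idea reduces to the textbook frequency-domain Wiener filter (my summary; Wiener's actual prediction problem demanded the harder causal solution via spectral factorization):

```latex
H(f) \;=\; \frac{S_s(f)}{S_s(f) + S_n(f)}
```

where S_s(f) and S_n(f) are the power spectra of signal and noise: the filter trusts each frequency exactly in proportion to how much the signal dominates there.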
Apr 18 • 6 tweets • 2 min read
📈 Fond of moving averages? You might be surprised to learn they can create phantom cycles out of pure randomness.
Back in 1927, Soviet economist Eugen Slutsky (1880–1948) made a groundbreaking discovery.
🔁 Slutsky showed that when you apply moving averages (or rolling summations) to random data, the result can look surprisingly cyclical—even though there's no real underlying pattern. Just white noise + smoothing = the illusion of a cycle.
💡 His insight?
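The effect itself fits in a few lines of numpy (a minimal sketch of mine, not Slutsky's original construction):

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.standard_normal(1000)                            # pure white noise: no cycle anywhere
smooth = np.convolve(noise, np.ones(20) / 20, mode="valid")  # 20-point moving average

# The smoothed series stays correlated with itself for ~20 lags, so it drifts in
# slow waves that look like genuine cycles, even though the input is patternless.
s = smooth - smooth.mean()
acf = np.correlate(s, s, mode="full")[len(s) - 1:] / (s.var() * len(s))
print(acf[[1, 5, 10, 20]].round(2))
```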
Apr 6 • 7 tweets • 2 min read
🔍 R² Is a Misleading Metric for Forecast Evaluation
R² is frequently reported in papers, dashboards, and even on social media as a measure of forecasting performance — but when it comes to time series forecasting, it's often the wrong metric.
Here's why practitioners and researchers should be cautious:
📉 R² does not measure forecast error
It quantifies in-sample fit — not out-of-sample accuracy. A model can achieve a high R² while producing systematically biased or inaccurate forecasts.
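A classic way to see it (a sketch of mine, on a simulated random walk): regress today on yesterday, and R² looks stellar while the forecasts beat nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(1200))   # a random walk: no forecastable signal at all
train, test = y[:1000], y[1000:]

# Fit y_t = a + b * y_{t-1} on the training window: in-sample R^2 is nearly 1...
x, t = train[:-1], train[1:]
b, a = np.polyfit(x, t, 1)
resid = t - (a + b * x)
print(f"in-sample R^2: {1 - resid.var() / t.var():.3f}")

# ...yet out of sample the model is no better than the naive "tomorrow = today"
pred = a + b * test[:-1]
print(f"model MAE: {np.mean(np.abs(test[1:] - pred)):.3f}")
print(f"naive MAE: {np.mean(np.abs(test[1:] - test[:-1])):.3f}")
```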