A kinda academic but important question for those who have done systematic statistical analysis of cricket stats, beyond pivot tables & visualization. Like regression type analysis.
Do y'all also have problems with the IID assumption? I do.
And this might be a concern not just with cricket, but all sports. IID assumes that all outcomes are "independent" and "identically distributed". When you toss a coin or roll a die, each result truly is IID.
But sports events are so not at all IID!
In fact, in sports, there should be mandatory autocorrelation terms and some kind of hierarchical var-covar structure. Markov process is an option, but it ties you too strongly to t-1. But year specific, series specific, venue specific covariances make more "theoretical" sense.
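To make the "series specific covariance" idea concrete, here is a rough simulation sketch, not a proposed model. It assumes a latent performance score made of a baseline, a shared per-series effect, and independent innings noise. All the numbers (baseline 40, standard deviations 10 and 20) and variable names are made up for illustration, and the normal latent score is not meant to be literal runs.

```python
import numpy as np

rng = np.random.default_rng(0)

n_series, n_innings = 2000, 6   # hypothetical sizes, chosen only for a stable estimate
baseline = 40.0                 # hypothetical long-run mean performance
sd_series, sd_noise = 10.0, 20.0  # hypothetical standard deviations

# One random effect per series, shared by every innings in that series.
series_effect = rng.normal(0.0, sd_series, size=n_series)
# Independent innings-level noise.
noise = rng.normal(0.0, sd_noise, size=(n_series, n_innings))
scores = baseline + series_effect[:, None] + noise

# Correlation between two innings from the same series.
within_series_corr = float(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])

# Under these assumptions the implied intraclass correlation is
# var(series) / (var(series) + var(noise)).
implied_icc = sd_series**2 / (sd_series**2 + sd_noise**2)

print(round(within_series_corr, 3), implied_icc)
```

Even with no t-1 term anywhere, two innings in the same series end up correlated just because they share the series effect. That is the kind of dependence a pure Markov setup would not capture.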
Just thinking out loud as I'm starting my first "formal" research project using cricket data. And model specification is where I spend a lot of my time to pre-empt all possible reviewer 3 objections. What even is the underlying distribution? There has to be one.
Let me try to put this in layperson terms.
For us to make inferences and predictions from data, we need to make defensible assumptions about how the phenomenon of interest plays out.
Coin toss 50-50 is one such assumption. Normal distribution is another. And so on.
So what "distribution" (if a parametric one even exists) does a batter's score come from? Currently, when we calculate things like standard deviations and probability ranges in cricket stats, we assume IID. Like a batter is a roulette wheel, without memory.
Longitudinal models, Markov process etc take care of this by not assuming independence, but rather, assuming a correlation with the previous outcome. Which makes sense for share prices or commodity prices, which have stickiness between t-1 and t. But sports is different.
If Apple stock price yesterday and the day before were around 175, it's not like tomorrow it'll open at 22 or 3500. So that t-1 method captures a lot of the time variance. But in cricket, and many sports, your "score" fluctuates wildly.
Not all sports.
If you're modeling a sprinter or runner or swimmer's times, yeah, the t-1 will have a lot of info. But a batter at the top of their game can have scores like 175, 0, 23, 117, 3, 222 and you'll be like, wow, great series. But what's the underlying distribution?
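As a quick sanity check on how little t-1 tells you here, this is a minimal sketch that computes the lag-1 sample autocorrelation of the six scores quoted above. The helper name `lag1_autocorr` is made up for this example, and six observations are far too few to conclude anything. It only illustrates the calculation.

```python
import numpy as np

def lag1_autocorr(scores):
    """Lag-1 sample autocorrelation: sum of products of adjacent
    deviations from the mean, divided by the total sum of squares."""
    x = np.asarray(scores, dtype=float)
    d = x - x.mean()
    return float((d[:-1] * d[1:]).sum() / (d ** 2).sum())

# The example scores from the thread.
scores = [175, 0, 23, 117, 3, 222]
r1 = lag1_autocorr(scores)
print(round(r1, 3))
```

For this particular sequence the estimate comes out negative, so the previous score is no help in guessing the next one. A sprinter's times would typically give something much closer to +1.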
I know some of you might be thinking, what about the Random Walk process? Yeah, could work. But still not quite "elegant", is it?
What is the underlying distribution and/or process for a cricket innings? Cos there are sooooo many endogenous variables!
Anyway, just thinking out loud. It's model specification stage where I spend months asking questions like these to myself all day and then trying to answer them.
The research on this specific broader question is sparse, but interesting. Like this 20-year-old JASA paper that tested this for tennis and found that tennis points are very much NOT IID!
Cricket is a way more complex game than tennis.
In fact in terms of the sheer range of values for scores, I don't think any sport comes close to cricket. We have stats that range from the thousands and the hundreds to multi dimensional scores to fractions and such. No other sport uses the decimal system so enthusiastically!
Oh I forgot I live in the same city as this guy. Will just go to his office after the holidays and pick his brains. Cos this is exactly what I'm thinking about as well. The memoryless assumptions, be they IID or Markovian, seem to lack face validity.
Imagine the number of interesting statistical tests you can run on each single delivery. Soooo many outcomes!
Out or not: Seems like a Bernoulli trial on each ball (binomial over many balls), but with an extremely skewed parameter, since the probability of "out" is tiny
Runs scored: Zero inflated Poisson?
And many more. Fun stuff.
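For the runs-per-delivery guess, here is a small sketch of what a zero-inflated Poisson looks like. With probability `pi` the ball is a "structural" dot ball, otherwise runs come from a Poisson with mean `lam`. The values `pi = 0.4` and `lam = 1.5` are invented for illustration, not estimated from any data. A plain Poisson is also unbounded, while runs off a single ball are capped in practice, so treat this as a starting point only.

```python
import math

def poisson_pmf(k, lam):
    """Plain Poisson probability of exactly k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: extra mass pi at zero, and the
    remaining (1 - pi) spread according to Poisson(lam)."""
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1.0 - pi) * poisson_pmf(k, lam)

pi, lam = 0.4, 1.5  # hypothetical parameters

# Probabilities of 0..6 runs under the plain and zero-inflated models.
plain = [poisson_pmf(k, lam) for k in range(7)]
inflated = [zip_pmf(k, pi, lam) for k in range(7)]

for k in range(7):
    print(k, round(plain[k], 3), round(inflated[k], 3))
```

The zero-inflated version puts much more weight on dot balls than a Poisson with the same `lam`, which is the whole reason for reaching for it.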
This point is more academic/philosophical but just because some data behave like IID doesn't make the underlying process IID by default. Especially when we can very clearly see that there is no such thing as independence or memorylessness in sport. Quasi-IID at best.
• • •
More Indians need to say this and say this explicitly. The Pakistani nation-state has done a lot of horrible cruel things over the years, split in 2 itself, has many dark dark issues.
But what's happening in India right now is a whole other level of mainstream hate-mongering.
Criticizing sanghi barbarism by drawing parallels to Islamofascists implies that somehow, sanghism is still a "lite" kind of extremism. Nope, it's pretty original in its own barbarism, bigotry, violence, misogyny, greed, everything. Listen to #HaridwarHateAssembly speeches.
This. Exactly this! Cos I was thinking about it re my parents, who might want to fly here soon and would prefer to be boosted before getting on the plane.
But if they are saying 9 months, then what was the point of that 8 PM announcement by Modi? Most won't qualify for months!
This is such blatant bait & switch. Modi got his primetime speech and the resulting media & WhatsApp footage. Declaring boosters for elderly and all. But only if you were vaccinated 9 months ago. Very very few were.
Why o why is the Indian government so intent on stretching out this pandemic as much as possible and treating vaccines and tests as some valuable commodities that have to be rationed instead of being aggressively distributed? NYC is giving $100 bonuses. India says 60+, 9 mths.
I know 83 is an important movie and all, but please remind young youths of today that the greatness of that WI team wasn't in winning a couple of ICC tournaments, but going 16 years, 30 test series, without losing a single test series. In a VERY competitive era in the sport.
I think of West Indies' wins in 1975 and 1979 as... Nice. Cute. It's their test record that makes you go whoaaaaaaa!
When future generations look back at Virat Kohli's record, they won't care about random "ICC silverware". It's the test triumphs that endure. Fifty years later, they'll find it funny that any Indian captains before him were considered even comparable to him in tests.
Every year for the past decade, the time from Christmas dinner to December 30th lunch is the time when wife & I interact the absolute minimum amount in the whole year. Cos you see the MCG test starts at 7 pm Christmas night in NYC, the SA boxing day test at 3 AM soon after.
Every year at this time, unless I'm traveling, I'm waking up at Melbourne time and going to sleep at Durban (this time Centurion) time, which in NYC time means I'm sleeping most of the day. And wife, who isn't into cricket & has her busiest time at work, keeps normal hours.
And then we spend some time over the new year eve, much like the players and their families presumably. And then the Sydney test starts and also another in South Africa and that's another week where we are like on two different shifts.
Kohli departs after another start. But pundits will now go "he should convert the starts into centuries" as if he decided, nah, 35 is enough, who needs 100?
That's the thing about this lean phase for Kohli, Rahane, Pujara. They've been getting starts regularly. #INDvSA
If a batter is getting starts, they are generally *NOT* out of form. You can't get to 25 regularly if you're truly out of form.
In such instances, more often than not, the "lean phase" is just a stochastic inevitability. And remember, in this case it's happening in a golden age of fast bowling.
What I'm saying is that Kohli and Rahane and to some extent Pujara are having a lean patch mainly because of "luck". Not in the colloquial sense but the way @cricketingview has explained "luck". It's a probability thing. Doesn't make for 2000 word essays, but that's it.
The third umpire is named Allahueddin Palekar! Sounds like a character from a Film Institute movie, no? #INDvSA
This Palekar name reminds me of a very @bvhk story. Twenty years or so ago, Harish, @quatrainman & I were at Shabri on FC Road in Pune after one of our usual quiz club sessions. All of us, big Amol Palekar fans. Acting and directing. Not just Golmaal fan types. Plus quizzers.
This also happened to be a phase when Amol Palekar's career was considered a cool eclectic topic in Indian quizzing. And we had each made many such quiz questions, really dived into his career.
That day, we spotted the man himself in Shabri. With his family.