Michael Wagener
Jun 6 · 24 tweets · 6 min read
#cricket #stats #geeky thread alert - don't bother reading this thread if you're not interested in either cricket or statistics.

A few people have asked me why I never talk about a batter's average in particular countries. There's a really simple reason for that.
There just isn't enough international cricket played - particularly not international test cricket - to make most of those "averages in a country" statistics meaningful in any way.

There's so much randomness in results that looking at a sample size of (say) 12 innings is nuts.
What do I mean by "randomness"? Well, let's think about it this way. Each ball there's a chance that a batter will get runs, not get runs or get out. For the best batters, that risk of getting out will generally be lower, and it will fluctuate throughout an innings.
We know that players are more likely to be dismissed in the first 20 balls that they face than they are in the balls from 101 to 120 (if they last that long). They're also more likely to be dismissed if they're facing a top bowler than a part timer. And so on.
However, the probabilities actually don't tend to change that much - surprisingly. The best players tend to average about 10 more after they "get their eye in" than when they first start, but generally they manage the risk of getting out by adopting more judicious shot selection.
Because of that self-regulation, the distribution of risk throughout an innings, when measured by runs scored, is actually remarkably consistent. They tend to have a similar probability of going from 20-30 as they do of going from 110-120.
A good example of this is Sachin Tendulkar. He played so many innings that he's one of the few players to give us a really good sample size.

In 16% of his innings where he got to 30, he got out before 40.
From 60 to 70, 15%
From 110 to 120, 13%
But even with him, we start getting very small sample sizes once we get over about 130. However, the fact remains that the risks of getting out do remain relatively stable, after an initial period of being more likely to be dismissed.
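The percentages above are conditional dismissal rates: of the innings that reached one score, what share ended in a dismissal before the next milestone. Here's a minimal sketch of that calculation. The function name and the sample innings are hypothetical (they are not Tendulkar's real scores), and the handling of not-out innings is an assumption rather than something the thread spells out.

```python
def dismissal_rate(innings, start, stop):
    """Share of innings reaching `start` that ended in a dismissal before `stop`.

    `innings` is a list of (score, was_dismissed) pairs.
    Returns None when no innings qualifies.
    """
    reached = [(score, out) for score, out in innings if score >= start]
    # Assumption: a not-out innings that stopped inside the band tells us
    # nothing about whether the batter would have reached `stop`, so drop it.
    at_risk = [(score, out) for score, out in reached if out or score >= stop]
    if not at_risk:
        return None
    dismissed = sum(1 for score, out in at_risk if out and score < stop)
    return dismissed / len(at_risk)


# Hypothetical scores, purely for illustration.
innings = [(35, True), (32, False), (48, True), (120, True),
           (31, True), (75, False), (12, True), (38, True)]

print(dismissal_rate(innings, 30, 40))
```

With only a handful of innings in a band, the rate jumps around a lot, which is exactly why the numbers get unreliable once the scores get large.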
There's some really interesting work that has been done in this area, and I've personally really enjoyed looking through some of the stuff that @StatsSteves has done in this field, as well as some of @cricketingview 's excellent database work.
But for the purpose of what I'm going to look at next, I'm going to assume that the risk of getting out before getting the next run is completely consistent, regardless of the score. I know that it isn't, but let's just pretend. We'll address how this impacts my result at the end.
I'm also going to assume that a batter's ability is fixed. That's a bit unrealistic. A player's ability isn't a fixed thing. In statistical terms I'm very much a Bayesian - I don't believe that there's a single number that can be used to model a batter's skill throughout a career.
However, again, for the purpose of this exercise, I'm going to assume that that is the case. Again I'll address the implications of that on the result at the end.
I'd say that by now I've lost all but the geekiest readers. If you're still going, well done!

If we assume that a player has equal chance of getting out in any progression from one run to the next in their career, we can say therefore that that player's career can be modelled.
Modelling is where we use a mathematical method to create an algorithm that can tell us about the properties of the way that their scores might be distributed.
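One way to write that assumption down is as a geometric distribution: if the chance of getting out before the next run is always p, the chance of finishing on exactly k runs is (1 - p)^k * p, and the expected score is (1 - p) / p. The sketch below is an illustration under that assumption only. The function names are made up, and it treats runs as arriving one at a time, which real scoring doesn't do.

```python
def out_probability(average):
    """Per-run dismissal probability p whose expected score equals `average`."""
    # Solving (1 - p) / p = average for p.
    return 1.0 / (average + 1.0)


def expected_score(p, max_runs=5000):
    """Expected score, summing k * P(score = k) with P(score = k) = (1 - p)**k * p."""
    # The tail beyond `max_runs` is negligible for any realistic p.
    return sum(k * (1.0 - p) ** k * p for k in range(max_runs))


def survive(p, runs):
    """Chance of adding `runs` more runs without being dismissed."""
    return (1.0 - p) ** runs


p = out_probability(40)
print(p, expected_score(p))
# Under this model the chance of adding 10 runs is the same from any score,
# so going from 20 to 30 is as likely as going from 110 to 120.
print(survive(p, 10))
```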

Using this model, we'd expect that Sachin Tendulkar would have averaged about 54.5. Not far from his average of 53.8.
So it isn't perfect, but it's pretty good.

Once we model something, we have many ways that we can use that data. One of them is by using that model to generate random numbers to look at what the expected results might look like.
I did just that, and generated 500 different batters, each of whom had the ability of a batter who averaged 40.

I gave each of them 200 innings, and then looked at their average after those 200 innings. Here's a graph showing it: [Image: the simulated batters' averages over their 200 innings]
We see that there's a lot of variation in their averages after a few innings, but they start to settle down as time goes on. But, interestingly, there's still a lot of variation after even as many as 50 innings.
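A simulation along these lines can be sketched in a few lines of Python. This is not the code behind the graph. It assumes the constant-risk model described earlier, assumes every innings ends in a dismissal (so the average is just total runs divided by innings), and the seed and function names are arbitrary choices, so the exact numbers will differ from the ones quoted in this thread.

```python
import math
import random


def simulate_innings(rng, p):
    """Draw one score: runs made before a dismissal with per-run probability p."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    # Inverse-transform sampling of a geometric distribution.
    return int(math.log(u) / math.log(1.0 - p))


def running_averages(rng, p, n_innings):
    """Career average after each innings, assuming no not-outs."""
    total = 0
    averages = []
    for innings_played in range(1, n_innings + 1):
        total += simulate_innings(rng, p)
        averages.append(total / innings_played)
    return averages


rng = random.Random(2024)   # fixed seed so the run is repeatable
p = 1.0 / 41.0              # per-run risk that gives a true average of 40
careers = [running_averages(rng, p, 200) for _ in range(500)]

after_50 = sorted(career[49] for career in careers)
after_200 = sorted(career[199] for career in careers)
print("after 50 innings:  min %.1f, max %.1f" % (after_50[0], after_50[-1]))
print("after 200 innings: min %.1f, max %.1f" % (after_200[0], after_200[-1]))
```

Plotting each list in `careers` against the innings number gives the same kind of picture: a wide fan early on that narrows slowly.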
Now 50 innings is a lot of test innings. That's at least 25 tests. For most players in the world, that's roughly 3 full years. And even after that, a player who is good enough to average 40 could be averaging as little as 23.2 or as much as 59.5.
Roughly 10% are averaging over 48, and roughly 10% averaging under 33. We would describe one of those as a potential great, and there would be calls for the other to be dropped. But both groups have exactly the same ability. One group have just been lucky, and the other unlucky.
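Those 10% figures can be sanity-checked without a simulation. Under the constant-risk model, a single innings has a standard deviation of sqrt(1 - p) / p, which is about as large as the average itself, and the average of n innings has that divided by sqrt(n). The sketch below uses a normal approximation for the 10th and 90th percentiles. That approximation is an assumption (the real distribution is skewed), and the function name is made up.

```python
import math
from statistics import NormalDist


def average_spread(true_average, n_innings, lower=0.10, upper=0.90):
    """Approximate percentiles of the observed average after `n_innings`."""
    p = 1.0 / (true_average + 1.0)               # per-run dismissal risk
    sd_innings = math.sqrt(1.0 - p) / p          # spread of a single score
    se = sd_innings / math.sqrt(n_innings)       # spread of the average
    dist = NormalDist(mu=true_average, sigma=se)  # normal approximation
    return dist.inv_cdf(lower), dist.inv_cdf(upper)


low, high = average_spread(40, 50)
print(round(low, 1), round(high, 1))
```

For a true average of 40 over 50 innings this comes out near 33 and 47, which lines up with the simulated figures above.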
Now you'll remember that I made some assumptions at the start. All of those assumptions made the spread more predictable, and reduced the variation.

In real life, we would expect an even greater difference in the data from players of exactly the same ability.
I really can't look at that distribution, and look at what I know about how cricket works, and bring myself to write off a player as being "unable to handle bounce" because they average 21 in 7 innings in South Africa (for example).
And if you're going to come on my post and try to tell me that someone is no good because they average 26 in 9 matches in Sri Lanka, then I'm very likely to not pay your argument much heed at all.
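To put a rough number on how unremarkable a low average over a handful of innings is, here's a sketch that computes the chance exactly under the same simplified model. The total from n dismissed innings follows a negative binomial distribution, so the tail can be summed directly. The function name is hypothetical, and it assumes a true average of 40, integer inputs and no not-outs.

```python
from math import comb


def prob_average_at_most(true_average, n_innings, observed_average):
    """Chance that a batter of `true_average` averages `observed_average` or less."""
    p = 1.0 / (true_average + 1.0)                 # per-run dismissal risk
    max_total = int(observed_average * n_innings)  # most runs that still qualify
    # P(total = k) for the sum of n geometric scores (negative binomial).
    return sum(
        comb(k + n_innings - 1, n_innings - 1) * p ** n_innings * (1.0 - p) ** k
        for k in range(max_total + 1)
    )


print(prob_average_at_most(40, 7, 21))
```

Even in this idealised model, where ability never changes, a batter who is good enough to average 40 has a real chance of averaging 21 or less across 7 innings purely through luck.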

Cricket is a beautiful game, with high amounts of skill and luck.
To be successful you need to be both good enough and lucky enough. I love analysis of cricket players and matches. I really enjoy looking for stories in the numbers, and seeing if they relate to what I observe on the field. But numbers can be very deceptive if you aren't careful.
And if you aren't careful, and you do come at me with "well they failed in Pakistan, so they can't be any good" expect that you'll get this graphic heading your way. [Image]
