Win Probability, Explained 📈

What is win probability? How does it work? Is it ever "correct"?

If you ever find yourself asking these questions, this thread is for you 🧵
1/ Win probability (abbreviated "WP") is the likelihood that a team will win a particular game, expressed as a percentage.

50% WP: If the game is played 1000 times, the team will win ~500

10% WP: If the game is played 1000 times, the team will win ~100
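
To make the "played 1000 times" idea concrete, here's a minimal simulation sketch. It assumes every replay of the game is an independent coin flip weighted by the WP, which is a simplification, and `simulate_wins` is a name made up for this example.

```python
import random

def simulate_wins(wp, n_games=1000, seed=0):
    """Replay a game n_games times and count the wins for a team with win probability wp."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Each replay is a win when a uniform draw in [0, 1) falls below wp.
    return sum(rng.random() < wp for _ in range(n_games))

for wp in (0.50, 0.10):
    print(f"{wp:.0%} WP -> {simulate_wins(wp)} wins in 1000 simulated games")
```

The counts land near 500 and 100 rather than exactly on them, which is why the "~" matters.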
2/ Importantly, WP is not a statement about whether a team WILL or WON'T win a game. A team having a 30% WP means they're MORE LIKELY TO LOSE THAN WIN but not that THEY'RE GOING TO LOSE.
3/ Generally speaking, there are two types of WP models:

- Pre-game WP models
- In-game WP models
4/ Pre-game WP models predict the probability that a team will win based on how good each team is, home field advantage, and any other information that is knowable *before the game*.

As should be obvious, these models calculate WP before the game is played.
5/ For example, a pre-game 🏈 WP model might calculate Team A's WP against Team B by taking into account the following:

- Team A is strong on offense and mediocre on defense
- Team B is weak on offense and defense
- Team B is home
- It'll be raining during the game
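
One way a pre-game model could turn inputs like these into a number is an Elo-style logistic curve on the gap in team ratings. This is only a sketch of that one approach, not how any particular model works: `pregame_wp`, the ratings, and the 50-point home edge are all hypothetical, and weather is left out entirely.

```python
def pregame_wp(rating_a, rating_b, home_edge=0.0):
    """Elo-style pre-game win probability for Team A.

    home_edge is added to Team A's rating: positive if A is home,
    negative if B is home, zero for a neutral site.
    """
    diff = rating_a - rating_b + home_edge
    return 1.0 / (1.0 + 10 ** (-diff / 400))

# Hypothetical ratings: Team A is the stronger team, but Team B is at home.
wp_a = pregame_wp(rating_a=1600, rating_b=1450, home_edge=-50)
print(f"Team A pre-game WP: {wp_a:.0%}")
```

Equal ratings on a neutral field give exactly 50%, and the two teams' probabilities always add up to 100%.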
6/ In-game WP models calculate a team's WP while the game is being played. These models often take into account the same things that pre-game models do (e.g. team strength), but also take into account the game situation.
7/ For example, an in-game 🏈 WP model might calculate Team A's WP by taking into account that:

- Team A is strong on offense, OK on defense
- Team B is weak on offense & defense
- Team B is home
- It's raining
- Team A is up 21-7
- Team B has the ball, 3rd & 8 from Team A's 40
8/ So how do these models actually work? Well, that depends...

There's more than one way to build a WP model, ranging from simple to incredibly complicated. They're all trying to predict the same thing (the probability that a team wins); they just do it in different ways.
9/ The simplest version of a WP model doesn't take a statistics degree to understand.

Let's say that we want to predict the in-game win probability for Team C, an NBA team currently beating Team D by 8 points with 3 minutes left in the 3rd quarter...
10/ To calculate the WP, we find all past NBA games where a team was beating its opponent by 8 points with 3 minutes left in the 3rd quarter.

Let's say there are 100 such games in the last 20 years, and the team leading by 8 ended up winning 85 of them.
11/ From this, we'd deduce that Team C has an 85% WP because 85% of teams that have been in this situation ultimately won.

Of course, this doesn't account for a number of factors, including team strength. We'd want to layer that information in for a better prediction.
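
The lookup described in 10/ and 11/ can be sketched in a few lines. The record layout and the `empirical_wp` name are assumptions for illustration, and the toy data simply mirrors the 85-out-of-100 example rather than real NBA results.

```python
def empirical_wp(past_games, lead, quarter, minutes_left):
    """Share of past games in the same situation that the leading team went on to win."""
    matches = [
        g for g in past_games
        if g["lead"] == lead
        and g["quarter"] == quarter
        and g["minutes_left"] == minutes_left
    ]
    if not matches:
        return None  # no comparable games, so no estimate
    return sum(g["leader_won"] for g in matches) / len(matches)

# Toy history: 100 games matching the situation (85 won by the leader),
# plus one game from a different situation that the filter should ignore.
situation = {"lead": 8, "quarter": 3, "minutes_left": 3}
past_games = (
    [dict(situation, leader_won=True)] * 85
    + [dict(situation, leader_won=False)] * 15
    + [{"lead": 2, "quarter": 4, "minutes_left": 1, "leader_won": False}]
)

print(empirical_wp(past_games, lead=8, quarter=3, minutes_left=3))  # 0.85
```

Returning `None` when nothing matches shows the weakness of exact matching: rare situations have few or no comparable games.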
12/ More advanced models riff on this theme, using information on the outcomes of past games to make predictions about future games or games in-progress...
13/ But more advanced models take into account additional information (including information from games that don't exactly resemble the game in question) and use more sophisticated techniques to build predictions.
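
As one illustration of borrowing information from games that don't exactly match, here's a sketch that weights every past game by how similar its situation was. Kernel weighting is just one of many possible techniques, picked here because it's short; `smoothed_wp` and the two scale parameters are hypothetical.

```python
import math

def smoothed_wp(games, lead, seconds_left, lead_scale=3.0, time_scale=120.0):
    """Similarity-weighted win rate for the leading team.

    games is a list of (lead, seconds_left, leader_won) tuples.
    Games closer to the target situation get larger Gaussian weights.
    """
    num = den = 0.0
    for g_lead, g_secs, won in games:
        dist_sq = ((g_lead - lead) / lead_scale) ** 2 \
                + ((g_secs - seconds_left) / time_scale) ** 2
        weight = math.exp(-0.5 * dist_sq)
        num += weight * won
        den += weight
    return num / den if den else None

# Hypothetical history: leads of 7 to 9 points with roughly 15 minutes left.
games = [(8, 900, 1), (7, 880, 1), (9, 930, 1), (8, 910, 0), (7, 905, 1)]
print(f"Smoothed WP: {smoothed_wp(games, lead=8, seconds_left=900):.0%}")
```

The scales control how quickly a game stops counting as "similar", so a real model would need to tune them rather than guess.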
14/ Are these models ever tested?

If they're good, yes. A good WP model will have been tested against actual historical games to determine how well it would have predicted win probabilities before and/or during those games.
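
One common way to run that kind of test is a calibration check: group past predictions into buckets and see whether teams given roughly X% actually won roughly X% of the time. This is a sketch under the assumption that you have a list of predictions and matching 0/1 outcomes; `calibration_table` is a name made up for this example.

```python
from collections import defaultdict

def calibration_table(predictions, outcomes, n_bins=10):
    """Return (mean predicted WP, actual win rate, game count) for each bucket."""
    buckets = defaultdict(list)
    for p, won in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a prediction of 1.0 goes in the top bucket
        buckets[idx].append((p, won))
    table = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(won for _, won in rows) / len(rows)
        table.append((mean_pred, win_rate, len(rows)))
    return table

# Toy backtest: 100 games where the model said 85% and the team won 85 of them.
preds = [0.85] * 100
results = [1] * 85 + [0] * 15
for mean_pred, win_rate, n in calibration_table(preds, results):
    print(f"predicted {mean_pred:.2f}, actual {win_rate:.2f}, games {n}")
```

A well-calibrated model shows predicted and actual values that sit close together in every bucket with enough games.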
15/ I see different models showing different probabilities. Why are they different?

Different models produce different results because they take into account different factors (one might account for weather while another doesn't) or are built using different techniques.
16/ Okay, but which model is right?

No model is "right," per se. WP can't be precisely known and all models that try to quantify it are imperfect estimates (though in some cases, very good imperfect estimates).

"All models are wrong, but some are useful"
-George E.P. Box
17/ That said, there are ways to determine which models tend to be more accurate over time.

For more on analyzing the accuracy of WP models, check out this piece from @StatsbyLopez: statsbylopez.com/2017/03/08/all…
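
As a small illustration of comparing accuracy over time, here's a sketch using the Brier score (the mean squared gap between predicted probability and the 0/1 result, where lower is better). It's one standard metric among several, and the two models' predictions below are invented purely for the example.

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted win probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# Hypothetical predictions from two models for the same four games.
outcomes = [1, 1, 0, 1]
model_a = [0.9, 0.8, 0.3, 0.6]
model_b = [0.6, 0.6, 0.5, 0.5]

print(f"Model A Brier score: {brier_score(model_a, outcomes):.3f}")
print(f"Model B Brier score: {brier_score(model_b, outcomes):.3f}")
```

Four games is far too few to judge anything; in practice you'd want this computed over many seasons of predictions.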
18/ Interested in getting a bit more technical? Check these out:

Stephen Hill's 🏈 WP model walk-through: medium.com/@technocat79/b…

@recspecs730's 🏀 model description:
lukebenz.com/post/ncaahoopr…
19/ Well, that's all for today. If you have any questions or if there's a metric you'd like me to cover in the future, drop a reply below!
