A 🧵 on @karun1710 's talk at the @StatsBomb conference about some really interesting Data Analytics stuff happening at @Arsenal
Starting with the types of data to encode a sequence in a game, we have 3 types (images attached in order)
1. Event tracking data
2. Full tracking data
3. Broadcast tracking data
This data can be described by its ease of availability (Coverage) and the level of detail it provides (Granularity)
1. Event data (High coverage, Low granularity, Limited)
2. Full tracking (Low coverage, High granularity)
3. Broadcast tracking (High coverage, Medium-high granularity)
How to use this data at scale?
We want to answer the question "How dangerous is the situation?" using tracking data
This question can be viewed through different lenses:
Attacking runs, Phase of play, Structure, Counter threat etc.
Several "models" built on tracking data each try to answer part of this question (listed below), but together they still don't give a complete answer
Expected Threat
Passing options
Attacking runs
Phase of play
Team structure
Counter threat
ISSUES with this:
1. Models are SEGREGATED and ISOLATED + have no common way of communicating with each other
2. High maintenance cost for all these models
So can we build a Unified model? How to go about it? Let's see the 2 approaches
Approach 1
A model requires features and here we extract features from the tracking data
E.g. player-level, team-level and situation-level features
These features are handcrafted and we are limited by our own heuristics + biases on how these questions should be solved
They also have a high maintenance cost
They make sense with full tracking but not with broadcast tracking, where the data can be noisy and incomplete
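To make Approach 1 concrete, here is a minimal sketch of what handcrafted features could look like for a single tracking frame. Everything in it is an assumption for illustration, not something shown in the talk: the `Player` class, the 105 x 68 m pitch, the goal position and the four features themselves.

```python
import math
from dataclasses import dataclass

@dataclass
class Player:
    x: float   # metres along the pitch; the attacking team attacks towards x = 105
    y: float   # metres across the pitch (0..68)
    team: str  # "att" or "def" (hypothetical labels)

GOAL = (105.0, 34.0)  # assumed centre of the goal being attacked

def handcrafted_features(players, ball):
    """Return a few hand-picked features for one frame."""
    attackers = [p for p in players if p.team == "att"]
    defenders = [p for p in players if p.team == "def"]
    return {
        # Player level: distance from the ball to the nearest defender
        "pressure_on_ball": min(math.dist((p.x, p.y), ball) for p in defenders),
        # Team level: average x position of the defending team
        "def_line_height": sum(p.x for p in defenders) / len(defenders),
        # Situation level: how far the ball is from goal
        "ball_to_goal": math.dist(ball, GOAL),
        # Situation level: attackers positioned ahead of the ball
        "attackers_ahead": sum(p.x > ball[0] for p in attackers),
    }

frame = [Player(80, 34, "att"), Player(90, 40, "att"),
         Player(95, 34, "def"), Player(85, 30, "def")]
print(handcrafted_features(frame, ball=(80.0, 34.0)))
```

Each of these features encodes a human guess about what matters, and each one breaks if a player is missing from the frame, which is the problem with noisy broadcast tracking.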
So what to do now? Can we make the model "learn" features by itself?
Approach 2:
Let the Unified Model derive answers from first principles instead of us feeding handcrafted features to it
The focus now is on designing the system by *designing the questions* rather than designing the way it should arrive at our answers
This is the difference!
Remember ChatGPT? We use the 'T' in it to solve our problem (Transformers)
Why are Transformers relevant?
Tracking data is a sequence of events, and Transformers are good at modelling sequences
So you can feed individual video frames to the model as inputs, use the different "models" we talked about (xT, passing options etc.) as outputs, and let the model "learn" the features by itself!
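A rough sketch of that idea in plain Python: one shared self-attention encoder over a sequence of frames, with one small output head per question. This is NOT the model from the talk. The class name, the sizes, the sigmoid outputs and the head names are all assumptions, the weights are random and untrained, and a real Transformer would add positional encodings, multiple layers and training.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyUnifiedModel:
    """Hypothetical toy: shared encoder, one output head per question."""

    def __init__(self, d_in, d_model, heads, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.Wq = rand_matrix(rng, d_model, d_in)
        self.Wk = rand_matrix(rng, d_model, d_in)
        self.Wv = rand_matrix(rng, d_model, d_in)
        self.heads = {name: rand_matrix(rng, 1, d_model) for name in heads}

    def encode(self, frames):
        # Single-head scaled dot-product self-attention over the frames
        q = [matvec(self.Wq, f) for f in frames]
        k = [matvec(self.Wk, f) for f in frames]
        v = [matvec(self.Wv, f) for f in frames]
        attended = []
        for qi in q:
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.d_model)
                      for kj in k]
            weights = softmax(scores)
            attended.append([sum(w * vj[d] for w, vj in zip(weights, v))
                             for d in range(self.d_model)])
        # Mean-pool over time to get one vector per situation
        n = len(attended)
        return [sum(a[d] for a in attended) / n for d in range(self.d_model)]

    def predict(self, frames):
        z = self.encode(frames)
        # Every head reads the SAME shared representation
        return {name: 1 / (1 + math.exp(-matvec(W, z)[0]))
                for name, W in self.heads.items()}

model = TinyUnifiedModel(d_in=4, d_model=8,
                         heads=["expected_threat", "passing_options", "counter_threat"])
frames = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.2, 0.3, 0.5], [0.3, 0.1, 0.4, 0.5]]
print(model.predict(frames))
```

The point of the shape is that adding a new question means adding a new head, not a new isolated model with its own feature pipeline.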
Now onto the use cases!
1. Tactics board interface
You can move players around like on a tactics board, but here the interface UPDATES the possession value as you move each player, so you can see what the team could have done better in every single SITUATION!
Eg: Brentford 👇
2. Situation Search
Now this is INCREDIBLE!
You can search across "All Man U situations"
E.g. 1) You can look at all Luke Shaw overlapping situations across their history!
2) You can also look at similar situations in which Arsenal conceded from other teams' attacks! 🤯
AMAZING!!
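The talk didn't show how the search works under the hood. One common way to build something like it, assuming each situation has already been turned into an embedding vector by a model, is a nearest-neighbour lookup by cosine similarity. The ids and vectors below are made up:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, library, top_k=2):
    """Return the top_k situations most similar to the query embedding."""
    ranked = sorted(library,
                    key=lambda item: cosine(query, item["embedding"]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical library of stored situations
library = [
    {"id": "overlap_left_01", "embedding": [0.9, 0.1, 0.0]},
    {"id": "counter_attack_07", "embedding": [0.0, 1.0, 0.0]},
    {"id": "overlap_left_02", "embedding": [0.7, 0.7, 0.0]},
]
print([hit["id"] for hit in search([1.0, 0.0, 0.0], library)])
```

At real scale you would swap the sorted scan for an approximate nearest-neighbour index, but the idea is the same.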
3. Live match dashboards
i. Plotting live Momentum and Game dominance charts using expected threat
ii. Showing data and metrics for different phases, such as low block, mid block and high press, live in-game
Something @m8arteta could look at during the game to make substitutions
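For the momentum chart, one simple way to get a curve like that (an assumption on my part, not the formula from the talk) is a rolling window of net expected threat: home xT minus away xT over the last few minutes. The event tuples and the 5-minute window are illustrative:

```python
def momentum(events, window=5.0):
    """events: list of (minute, team, xt), team is "home" or "away".

    Returns one (minute, net_xt) point per event, where net_xt is
    home xT minus away xT over the preceding `window` minutes.
    """
    points = []
    for minute, _, _ in events:
        net = 0.0
        for m, team, xt in events:
            if minute - window < m <= minute:
                net += xt if team == "home" else -xt
        points.append((minute, net))
    return points

events = [(1.0, "home", 0.05), (2.0, "away", 0.02), (8.0, "home", 0.10)]
for minute, net in momentum(events):
    print(minute, round(net, 3))
```

Positive values mean the home side has been creating more threat recently, negative values mean the away side has.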
Credit to @karun1710 and @StatsBomb for the images and the wonderful talk
If you found it interesting and learnt something new then drop a like or a RT!
End of thread!
@GiantGooner @scottjwillis @adamvoge @veeyahborna @A1ZH4RY @GeorgeV_AFC @TMftbl @RjArsenalBlog @watmanAFC @TacticsJournal would like to know your thoughts on it