SkalskiP
Open-source Lead @roboflow. VLMs. GPU poor. Dog person. Coffee addict. Dyslexic. | GH: https://t.co/dEmzMDGq5H | HF: https://t.co/4Lx1Yw34W7

Sep 24, 10 tweets

I finally solved player recognition

- player and number detection with RF-DETR
- player tracking with SAM2
- team clustering with SigLIP, UMAP and KMeans
- number recognition with SmolVLM2

stay tuned for the YT tutorial: youtube.com/c/Roboflow

↓ full breakdown + code

we start with an RF-DETR model fine-tuned to detect players, numbers, referees, the ball, and the rim

model + dataset: universe.roboflow.com/roboflow-jvuqo…

I recently used the same model to build a jump shot make-or-miss demo, which will also be included in my upcoming YT tutorial

google colab: github.com/roboflow/noteb…

SAM2.1 tracks objects across video using visual prompts like boxes or points

we use a fine-tuned RF-DETR to detect all players in the first frame, pass these detections to SAM2.1, and track them in the following frames

I sample frames, detect players, crop the central regions, generate SigLIP embeddings, reduce them with UMAP, and cluster with KMeans to separate players into two teams
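a rough sketch of the last two steps, cropping and clustering, in plain Python. this is not the project code: the real pipeline embeds crops with SigLIP, reduces them with UMAP, and clusters with KMeans. here the 2D points are made-up stand-ins for reduced embeddings, the tiny 2-means loop is a hand-written substitute for a library KMeans, and `central_crop_box`, `ratio`, and `kmeans_two_teams` are hypothetical names chosen for illustration

```python
def central_crop_box(box, ratio=0.5):
    """Shrink an (x1, y1, x2, y2) box around its center.

    `ratio` is an assumed value; the thread does not say how large
    the central region is.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * ratio / 2
    half_h = (y2 - y1) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_teams(points, iterations=10):
    """Toy 2-means: returns a 0/1 team label for each point."""
    # Deterministic init: the first point and the point farthest from it.
    first = points[0]
    farthest = max(points, key=lambda p: dist2(p, first))
    centroids = [first, farthest]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearest centroid.
        labels = [
            0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Made-up stand-ins for UMAP-reduced SigLIP embeddings of six player crops.
reduced_embeddings = [
    (0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
    (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),
]
team_labels = kmeans_two_teams(reduced_embeddings)
crop = central_crop_box((100, 50, 200, 250))
```

in practice you would fit the clustering once on crops from sampled frames, then reuse it to label each tracked player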

I used the same strategy in last year’s Football AI project. I also recorded a YouTube tutorial covering it

check it out if you haven’t already:

reading player numbers from small and blurry crops is not easy

traditional OCR models struggle with this task

for this reason, we decided to use SmolVLM2, fine-tuned on a custom multi-modal dataset

@andimarafioti @mervenoyann

model + dataset: universe.roboflow.com/roboflow-jvuqo…

the next step is to match each jersey number to the right player using the mask IoS (intersection over smaller area) metric

unlike IoU, which measures overlap against the union, IoS measures it against the smaller area, so a smaller object fully inside a larger one gives IoS = 1
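to make the difference concrete, here is a minimal sketch with masks as nested lists of 0/1. the matching helper `match_numbers_to_players` and its `min_ios` threshold are assumptions for illustration, not the project's actual code

```python
def mask_area(mask):
    """Number of foreground pixels in a binary mask."""
    return sum(sum(row) for row in mask)


def mask_intersection(a, b):
    """Number of pixels that are foreground in both masks."""
    return sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def mask_iou(a, b):
    """Intersection over union."""
    inter = mask_intersection(a, b)
    union = mask_area(a) + mask_area(b) - inter
    return inter / union if union else 0.0


def mask_ios(a, b):
    """Intersection over the smaller of the two mask areas."""
    inter = mask_intersection(a, b)
    smaller = min(mask_area(a), mask_area(b))
    return inter / smaller if smaller else 0.0


def match_numbers_to_players(number_masks, player_masks, min_ios=0.9):
    """Map each number index to the player index with the highest IoS.

    `min_ios` is an assumed threshold; numbers below it stay unmatched.
    """
    matches = {}
    for n_idx, n_mask in enumerate(number_masks):
        scores = [mask_ios(n_mask, p_mask) for p_mask in player_masks]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= min_ios:
            matches[n_idx] = best
    return matches


player = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
other_player = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
number = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

ios = mask_ios(player, number)   # number lies fully inside the player mask
iou = mask_iou(player, number)   # small, because the player mask is much larger
matches = match_numbers_to_players([number], [other_player, player])
```

the number mask covers 2 pixels and the player mask 12, so IoS is 2/2 while IoU is only 2/12. that gap is why IoU is a poor fit for matching a small number to a large player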

as players move, jersey numbers are not always clearly visible, so a single prediction is unreliable

to reduce errors, we validate numbers across frames

you can see in the video how numbers stabilize once they stay visible across consecutive frames
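one possible way to do this kind of validation, sketched under assumptions: a number is only accepted for a tracked player after the same prediction shows up in several consecutive frames. the class name `JerseyNumberValidator` and the `min_consecutive=3` threshold are hypothetical, and the actual rule used in the project may differ

```python
class JerseyNumberValidator:
    """Confirm a jersey number per track after a streak of equal predictions."""

    def __init__(self, min_consecutive=3):
        self.min_consecutive = min_consecutive  # assumed threshold
        self._candidate = {}   # track_id -> (number, current streak length)
        self._confirmed = {}   # track_id -> last confirmed number

    def update(self, track_id, number):
        """Feed one per-frame prediction; return the confirmed number or None.

        Pass number=None when no number was read for this track in the frame.
        """
        if number is None:
            # A missing read breaks the streak but keeps any confirmed number.
            self._candidate.pop(track_id, None)
            return self._confirmed.get(track_id)

        previous, streak = self._candidate.get(track_id, (None, 0))
        streak = streak + 1 if number == previous else 1
        self._candidate[track_id] = (number, streak)

        if streak >= self.min_consecutive:
            self._confirmed[track_id] = number
        return self._confirmed.get(track_id)


validator = JerseyNumberValidator(min_consecutive=3)
# Noisy per-frame reads for one tracked player (track id 7).
reads = ["23", "28", "23", "23", "23", None, "28"]
confirmed = [validator.update(7, read) for read in reads]
```

here the stray "28" reads never reach a streak of three, so the player stays labeled "23" once that number has been seen three frames in a row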

code for those of you who really want to have some fun: colab.research.google.com/github/roboflo…
