- player and number detection with RF-DETR
- player tracking with SAM2
- team clustering with SigLIP, UMAP and KMeans (see the sketch after this list)
- number recognition with SmolVLM2
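
here's a minimal sketch of the team clustering step, assuming SigLIP via transformers, umap-learn and scikit-learn; the checkpoint name, crop source and cluster count are my assumptions, not the exact pipeline code:

```python
# sketch: embed player crops with SigLIP, reduce with UMAP, cluster with KMeans
import numpy as np
import torch
import umap
from PIL import Image
from sklearn.cluster import KMeans
from transformers import AutoProcessor, SiglipVisionModel

CHECKPOINT = "google/siglip-base-patch16-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(CHECKPOINT)
model = SiglipVisionModel.from_pretrained(CHECKPOINT).eval()

def embed_crops(crops: list[Image.Image]) -> np.ndarray:
    # one SigLIP embedding vector per player crop
    inputs = processor(images=crops, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.pooler_output.cpu().numpy()

def assign_teams(crops: list[Image.Image]) -> np.ndarray:
    embeddings = embed_crops(crops)
    # project to a low-dimensional space where jersey appearance separates
    reduced = umap.UMAP(n_components=3).fit_transform(embeddings)
    # two clusters, one per team
    return KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
```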
added support for parsing and visualizing detection results from @alibaba_cloud Qwen2.5-VL, @moondreamai, and @GoogleDeepMind Gemini 2.0 and 2.5 models.
this comes in addition to existing support for @Microsoft Florence-2 and @GoogleDeepMind PaliGemma.
here's an awesome @huggingface space by @SergioPaniego and @onuralpszr, where they compare Moondream and Qwen2.5-VL object understanding using supervision-0.26.0 for parsing and visualization
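
for reference, parsing and annotating one of these responses looks roughly like this; I'm sketching from memory of the supervision docs, so the exact keyword arguments below are assumptions to double-check against 0.26.0:

```python
# rough sketch: parse a raw Qwen2.5-VL response into sv.Detections
import supervision as sv
from PIL import Image

image = Image.open("match.jpg")  # hypothetical input image
result = "..."  # raw text response from Qwen2.5-VL containing boxes

detections = sv.Detections.from_vlm(
    vlm=sv.VLM.QWEN_2_5_VL,
    result=result,
    input_wh=(1024, 1024),     # assumed kwarg: resolution the model saw
    resolution_wh=image.size,  # assumed kwarg: original image resolution
)

# the usual annotators then work on the parsed detections
annotated = sv.BoxAnnotator().annotate(scene=image.copy(), detections=detections)
annotated = sv.LabelAnnotator().annotate(scene=annotated, detections=detections)
```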
YOLOE is a real-time zero-shot detector (similar to YOLO-World), but it lets you prompt with text or boxes
here I used YOLOE to detect croissants on a conveyor belt using a box prompt; I just picked the first frame, drew a box, and ran prediction on the other frames; it runs at around 15 fps on a T4
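
the box-prompt flow looks roughly like this with the ultralytics API; the checkpoint name, the refer_image kwarg and the coordinates are assumptions on my side:

```python
# sketch: prompt YOLOE with a box drawn on the first frame, run on a video
import numpy as np
from ultralytics import YOLOE
from ultralytics.models.yolo.yoloe import YOLOEVPSegPredictor

model = YOLOE("yoloe-11l-seg.pt")  # assumed checkpoint name

visual_prompts = dict(
    bboxes=np.array([[120, 80, 260, 190]]),  # placeholder xyxy box
    cls=np.array([0]),                       # one prompt class: croissant
)

# reuse the first-frame box prompt for every frame of the video
results = model.predict(
    "conveyor.mp4",                 # hypothetical video path
    refer_image="first_frame.jpg",  # assumed kwarg: frame the box was drawn on
    visual_prompts=visual_prompts,
    predictor=YOLOEVPSegPredictor,
)
```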
you can prompt the model to detect multiple object classes at the same time
if there are too many objects in the image, or we try to detect too many classes at once, the model can get confused and spin in circles until it reaches the token limit.
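
the token-limit remark suggests these lines refer to VLM-based detection rather than a conventional detector; here's a sketch of a multi-class detection prompt with Qwen2.5-VL, where the checkpoint, prompt wording and token budget are illustrative assumptions:

```python
# sketch: multi-class detection prompt for Qwen2.5-VL
import torch
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

CHECKPOINT = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    CHECKPOINT, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(CHECKPOINT)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "bakery.jpg"},  # hypothetical image
        {"type": "text", "text": (
            "Detect every croissant, baguette and tray in the image. "
            "Return the bounding boxes as JSON.")},
    ],
}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt").to(model.device)

# keep the generation budget generous; with too many objects or classes
# the model can hit this limit before it finishes listing boxes
generated = model.generate(**inputs, max_new_tokens=1024)
response = processor.batch_decode(
    generated[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)
```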
the Skip-Gram model predicts the surrounding context words given a center word.
during training, the Skip-Gram model learns word embeddings (numerical representations of words) that capture semantic relationships, which can then be used for various natural language processing tasks like word similarity.
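
a toy illustration of both halves of that: the (center, context) pairs the model is trained to predict, and embeddings trained with gensim (sg=1 selects Skip-Gram rather than CBOW); the corpus and hyperparameters are illustrative:

```python
# toy Skip-Gram illustration with gensim
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

def skipgram_pairs(tokens, window=2):
    # the (center, context) pairs Skip-Gram learns to predict
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]

print(list(skipgram_pairs(sentences[0])))  # e.g. ('cat', 'the'), ('cat', 'sat'), ...

# sg=1 selects Skip-Gram (sg=0 would be CBOW)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# embeddings place words from similar contexts close together
print(model.wv.most_similar("cat", topn=3))
```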