SkalskiP · Mar 21, 2024 · 9 tweets · 5 min read
time analysis with computer vision

- blurring faces
- detection and tracking
- smoothing detections
- filtering detections by zone
- calculating time

let me know if you want me to explain anything else. ;)

code: github.com/roboflow/super…
the full tutorial will be available on Monday on the @roboflow YouTube channel; subscribe so you don't miss it.

link to YouTube: youtube.com/@Roboflow
to ensure the privacy of store employees, I decided to blur their faces; to do this, we will first need a model capable of detecting them.

yesterday, I quickly labeled a few dozen images, which I used to train my model.

this is enough for the demo, but a real deployment of this use case would need many more images.

dataset link: universe.roboflow.com/roboflow-jvuqo…
here's the result! I trained the model in Google Colab.

colab link: colab.research.google.com/github/roboflo…
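
once the face detector returns boxes, the blurring step itself is simple. below is a minimal sketch in plain Python, not the code from the repository: it assumes a grayscale frame stored as a list of rows and a box given as (x1, y1, x2, y2) with exclusive right/bottom edges. `blur_region` and its parameters are hypothetical names for illustration; a real pipeline would run on color frames with an image library.

```python
def blur_region(frame, box, radius=1):
    """Return a copy of `frame` with a box blur applied inside `box`.

    frame: list of rows, each a list of grayscale ints (0-255)
    box:   (x1, y1, x2, y2) in pixels, x2 / y2 exclusive (assumed convention)
    """
    height, width = len(frame), len(frame[0])
    x1, y1, x2, y2 = box
    # clamp the box to the frame so detections touching the border are safe
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(width, x2), min(height, y2)

    out = [row[:] for row in frame]  # read from `frame`, write to `out`
    for y in range(y1, y2):
        for x in range(x1, x2):
            # average every pixel in the window around (x, y), cut at the frame edge
            window = [
                frame[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            out[y][x] = sum(window) // len(window)
    return out


# a 4x4 frame with one bright pixel standing in for a "face"
frame = [
    [0, 0, 0, 0],
    [0, 90, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blurred = blur_region(frame, box=(0, 0, 3, 3), radius=1)
```

pixels outside the box are left untouched, so only the detected face area loses detail.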
I used the inference package to run a model pre-trained on the COCO dataset to detect people. I used ByteTrack for tracking.

you can learn more about the Inference + Supervision combo from this tutorial.

tutorial link: supervision.roboflow.com/latest/how_to/…
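
to give an idea of what a tracker does with the per-frame person boxes, here is a toy sketch. this is NOT ByteTrack: it is a much simpler greedy IoU matcher that I am swapping in for illustration, and `GreedyIoUTracker` is a hypothetical name. it only shows the core idea of carrying an id from frame to frame when boxes overlap enough.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Toy tracker: match each new box to the best-overlapping known track.

    Simplification: lost tracks are never removed and there is no motion model.
    """

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # tracker_id -> last seen box
        self._next_id = count(1)

    def update(self, boxes):
        """Return a list of tracker ids, one per input box."""
        ids = []
        unmatched = dict(self.tracks)  # tracks still available in this frame
        for box in boxes:
            best_id = max(
                unmatched, key=lambda t: iou(unmatched[t], box), default=None
            )
            if best_id is not None and iou(unmatched[best_id], box) >= self.iou_threshold:
                del unmatched[best_id]  # each track is matched at most once
            else:
                best_id = next(self._next_id)  # no good match: start a new track
            self.tracks[best_id] = box
            ids.append(best_id)
        return ids


tracker = GreedyIoUTracker()
first = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
second = tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)])  # same people, new order
third = tracker.update([(200, 200, 210, 210)])               # someone new
```

the ids stay attached to the same person even when the detection order changes between frames, which is what makes per-person time measurement possible later.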
we want boxes to be stable; to reduce box flickering, we'll use smoothing - averaging the box positions based on the last N frames.

pay attention to customer #37: without smoothing on the left, with smoothing on the right.

smoothing docs: supervision.roboflow.com/latest/detecti…
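
the averaging idea can be sketched in a few lines. this is a minimal illustration, not the library's implementation: `BoxSmoother` is a hypothetical name, and it assumes boxes are (x1, y1, x2, y2) tuples keyed by the tracker id.

```python
from collections import defaultdict, deque


class BoxSmoother:
    """Average each tracked box over its last `length` frames."""

    def __init__(self, length=5):
        # one fixed-size history per tracker id; old boxes fall off automatically
        self.history = defaultdict(lambda: deque(maxlen=length))

    def update(self, tracker_id, box):
        """Add the latest (x1, y1, x2, y2) box and return the smoothed one."""
        self.history[tracker_id].append(box)
        boxes = self.history[tracker_id]
        # coordinate-wise mean over the stored boxes
        return tuple(sum(coord) / len(boxes) for coord in zip(*boxes))


smoother = BoxSmoother(length=3)
# a box that jitters left and right by a few pixels from frame to frame
raw = [(100, 50, 140, 150), (104, 50, 144, 150), (96, 50, 136, 150), (100, 50, 140, 150)]
smoothed = [smoother.update(37, box) for box in raw]
```

a larger N gives steadier boxes but makes them lag behind fast-moving people, so it is worth tuning per video.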
I drew the zones using MakeSense - an open-source photo labeling program I created while living in a dormitory several years ago.

MakeSense GitHub link: github.com/SkalskiP/make-…
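
once a zone is exported as a list of polygon points, filtering detections comes down to a point-in-polygon test. a minimal sketch, assuming (my assumption, not stated in the thread) that each person is anchored at the bottom center of their box, roughly where the feet are; all function names here are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        (xa, ya), (xb, yb) = polygon[i], polygon[i - 1]
        # does this edge cross the horizontal ray going right from the point?
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def bottom_center(box):
    """Anchor a person box at the middle of its bottom edge."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)


def filter_by_zone(boxes, polygon):
    """Keep only boxes whose bottom-center anchor lies inside the zone."""
    return [box for box in boxes if point_in_polygon(bottom_center(box), polygon)]


zone = [(0, 0), (100, 0), (100, 100), (0, 100)]
boxes = [(10, 10, 30, 60), (90, 40, 130, 90), (40, 60, 60, 140)]
in_zone = filter_by_zone(boxes, zone)
```

only the first box survives: the second one has its anchor to the right of the zone and the third one below it, even though both boxes overlap the zone.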
here is a fragment of the logic responsible for counting time.

- if you're working with static video files, I recommend an FPS-based approach.
- if you're working with video streams, I recommend a clock-based approach.
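
the two approaches can be sketched like this. it is a minimal illustration under my own assumptions, not the exact code from the repository: the class names and the `tick` method are hypothetical, and each timer expects the tracker ids currently inside the zone once per frame.

```python
import time


class FPSBasedTimer:
    """For video files: every processed frame adds exactly 1 / fps seconds."""

    def __init__(self, fps):
        self.fps = fps
        self.frame_index = 0
        self.first_seen = {}  # tracker_id -> frame index when first seen in zone

    def tick(self, tracker_ids):
        """Call once per frame; returns {tracker_id: seconds in zone}."""
        self.frame_index += 1
        times = {}
        for tracker_id in tracker_ids:
            start = self.first_seen.setdefault(tracker_id, self.frame_index)
            times[tracker_id] = (self.frame_index - start) / self.fps
        return times


class ClockBasedTimer:
    """For live streams: frames may be dropped, so read a wall clock instead."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock  # injectable so the timer can be tested without waiting
        self.first_seen = {}  # tracker_id -> clock reading when first seen in zone

    def tick(self, tracker_ids):
        """Call once per frame; returns {tracker_id: seconds in zone}."""
        now = self.clock()
        times = {}
        for tracker_id in tracker_ids:
            start = self.first_seen.setdefault(tracker_id, now)
            times[tracker_id] = now - start
        return times


# video file: 91 frames at 30 fps means customer 37 has waited 90 / 30 seconds
fps_timer = FPSBasedTimer(fps=30)
for _ in range(91):
    fps_times = fps_timer.tick([37])

# stream: a fake clock stands in for irregular frame arrival times
readings = iter([10.0, 10.5, 12.0])
clock_timer = ClockBasedTimer(clock=lambda: next(readings))
t0 = clock_timer.tick([1])
t1 = clock_timer.tick([1, 2])
t2 = clock_timer.tick([2])
```

the FPS-based timer gives the same result no matter how fast the file is processed, while the clock-based timer stays correct when a stream skips or delays frames.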
calculating the time customers spend waiting in line at a store is just one of the potential applications.

here's another use case where we calculate how long drivers wait to pass through an intersection.
