🔥 ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation

Heads up: I’m preparing a visual summary on ZSD-YOLO.

So, what is Zero-Shot Detection?

• Zero-shot detection allows a model to detect something in an image even if the model has never seen that thing before

• So, if you have an image of a Chimpanzee and the model has never seen a Chimpanzee before, you can use your zero-shot detector to locate it in the image

• ZSD-YOLO leverages 2 models:
- CLIP: a pretrained Vision-Language model
- YOLOv5: a modified version that replaces the classification branch
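
A common way to get zero-shot labels from a vision-language model is to compare an embedding predicted for each box with the text embeddings of the class names, then pick the closest one. The sketch below illustrates that general idea with toy vectors. It is not the ZSD-YOLO implementation: the function name `zero_shot_classify` and the 3-d "embeddings" are hypothetical stand-ins for real CLIP outputs.

```python
import numpy as np

def zero_shot_classify(region_embeddings, text_embeddings, class_names):
    """Assign each region the class whose text embedding is closest (cosine similarity)."""
    # L2-normalise both sides so the dot product equals cosine similarity.
    regions = region_embeddings / np.linalg.norm(region_embeddings, axis=1, keepdims=True)
    texts = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    similarity = regions @ texts.T  # shape: (num_regions, num_classes)
    best = similarity.argmax(axis=1)
    return [class_names[i] for i in best], similarity

# Toy vectors standing in for text embeddings of the class names (hypothetical values).
class_names = ["dog", "chimpanzee"]
text_embeddings = np.array([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0]])

# Toy embedding that a detector head might predict for one box (hypothetical values).
region_embeddings = np.array([[0.1, 0.9, 0.2]])

labels, scores = zero_shot_classify(region_embeddings, text_embeddings, class_names)
print(labels)  # ['chimpanzee']
```

Because the class list is only a set of text embeddings, a new class can be added at inference time by appending its embedding, with no retraining in this toy setup.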


AI Fast Track (70/70)

@ai_fast_track

23 Dec 21

Many open-world applications require the detection of novel objects, but state-of-the-art object detection and instance segmentation models are unable to do so.

• It’s because models learn to suppress any unannotated objects by treating them as background

• To address that issue, the authors propose a simple yet surprisingly powerful data augmentation and training scheme they call Learning to Detect Every Thing (LDET)

• To avoid suppressing hidden objects (background objects that are visible but unlabeled), they paste annotated objects onto a background image sampled from a small region of the original image (see figure)
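
The copy-paste step can be sketched in a few lines of NumPy. This is a minimal illustration of the idea described above, not the authors' code: the function name `ldet_style_augment`, the `region_size` parameter, and the nearest-neighbour upscaling are all assumptions made for the example.

```python
import numpy as np

def ldet_style_augment(image, mask, region_size=8, rng=None):
    """Paste annotated pixels onto a background built from a small crop of the image.

    image: (H, W, C) array; mask: (H, W) boolean array marking annotated objects.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape[:2]

    # Sample a small square region from the original image.
    top = rng.integers(0, h - region_size + 1)
    left = rng.integers(0, w - region_size + 1)
    patch = image[top:top + region_size, left:left + region_size]

    # Upscale the patch to the full image size (nearest neighbour, an assumption).
    rows = np.arange(h) * region_size // h
    cols = np.arange(w) * region_size // w
    background = patch[rows][:, cols]

    # Paste the annotated object pixels on top of the synthetic background.
    augmented = background.copy()
    augmented[mask] = image[mask]
    return augmented

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
mask = np.zeros((32, 32), dtype=bool)
mask[10:20, 10:20] = True  # one annotated "object"

augmented = ldet_style_augment(image, mask, region_size=8, rng=rng)
print(augmented.shape)
```

The annotated pixels survive unchanged, while everything else is replaced by a stretched crop that is unlikely to contain a whole unlabeled object.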


20 Dec 21

❓ What is Multi-Scale Training (MST)?

💡 MST helps your model become robust to different image sizes and get better performance

• Training on small images is faster

• Training on large images increases your model performance

How is MST done?

Every N (e.g., 10) epochs, we randomly choose a new image dimension from a set of sizes (e.g., [640, 768, 800]) and train our model at that size

This means the same network becomes better at predicting at different resolutions.
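
Here is a minimal sketch of that schedule in plain Python. The function name `multiscale_schedule` and the fixed seed are assumptions for the example; a real training loop would resize each batch to the chosen size rather than just record it.

```python
import random

def multiscale_schedule(num_epochs, sizes=(640, 768, 800), every_n=10, seed=0):
    """Return the image size to train with at each epoch."""
    rng = random.Random(seed)
    schedule = []
    current = None
    for epoch in range(num_epochs):
        # Pick a new size at the start of every block of `every_n` epochs.
        if epoch % every_n == 0:
            current = rng.choice(sizes)
        schedule.append(current)
    return schedule

schedule = multiscale_schedule(num_epochs=30)
for epoch in range(0, 30, 10):
    print(f"epochs {epoch}-{epoch + 9}: train at {schedule[epoch]}px")
```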

• In MMDetection, models trained with the multi-scale technique have “_mstrain_” in their name.

• Example: vfnet_r50_fpn_mstrain_2x_coco


14 Dec 21


🔥 After just one week of learning, my mentee @strickvl:

• Built a dataset of ~ 450 images: redacted documents

• Labelled them using @explosion_ai Prodigy

• Trained a VFNet model using IceVision and @fastdotai

• Reached a COCO metric score of 73% 🚀

Blog posts: 👇

📝 The main takeaway of this story: you can learn object detection very quickly if you:

• Are determined
• Follow the optimal learning path
• Embrace the 80-20 and KISS principles
• Have access to highly curated content and libraries
• Know how to avoid roadblocks
• Stay focused and avoid distractions

✨ Like many things in life, object detection is:
• neither too hard
• nor too easy
• right in between ... when you have the right ingredients


9 Dec 21

Interested in parsing any custom object detection dataset format?

I will show you how easy it is to parse your dataset and train one of the many IceVision models on your own data.

In this post, I walk you through the steps to parse data stored in CSV files, using the Chess dataset as an example.

You can apply the same logic to any other format.

• First, both COCO and VOC formats are transparently supported in IceVision.

• The CSV format differs from the COCO and VOC formats, hence the custom parsing.

• IceVision auto-generates your parser class skeleton.

• You only need to connect some attributes to their corresponding ones in your CSV file

• Once done, you instantiate your class, and parse your data

• From there, all the subsequent code is common to any other parsed data.

• Parsed data are a collection of record objects
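
To make the "CSV rows in, records out" idea concrete, here is a plain-Python sketch using only the standard library. It deliberately does not use IceVision's parser API: the column names, the sample rows, and the `parse_csv` helper are all hypothetical, and each record is just a dict rather than an IceVision record object.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout: one row per bounding box.
CSV_TEXT = """filename,width,height,label,xmin,ymin,xmax,ymax
board1.jpg,416,416,white-pawn,10,20,50,80
board1.jpg,416,416,black-king,100,120,160,200
board2.jpg,416,416,white-queen,30,40,90,140
"""

def parse_csv(text):
    """Group CSV rows by image so that each image becomes one record."""
    records = defaultdict(lambda: {"bboxes": [], "labels": []})
    for row in csv.DictReader(io.StringIO(text)):
        record = records[row["filename"]]
        # Connect each record attribute to its corresponding CSV column.
        record["filepath"] = row["filename"]
        record["img_size"] = (int(row["width"]), int(row["height"]))
        record["bboxes"].append(
            tuple(int(row[key]) for key in ("xmin", "ymin", "xmax", "ymax"))
        )
        record["labels"].append(row["label"])
    return list(records.values())

records = parse_csv(CSV_TEXT)
for record in records:
    print(record["filepath"], record["labels"])
```

The only dataset-specific part is the mapping from columns to record attributes. Everything downstream can work on the list of records without knowing the data came from a CSV file.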
