🔥 ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation
Heads up: I’m preparing a visual summary on ZSD-YOLO.
So, what is Zero-Shot Detection?
• Zero-shot detection allows a model to detect objects in an image even if the model has never seen that class of object during training
• So, if you have an image of a chimpanzee and the model has never been trained on chimpanzees, you can still use your zero-shot detector to locate it in the image
• ZSD-YOLO leverages 2 models:
- CLIP: a pretrained Vision-Language model
- YOLOv5: modified so that its classification branch is replaced with embeddings aligned to CLIP's, letting it score boxes against arbitrary class names
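The core idea can be sketched in a few lines: instead of a fixed classification head, each predicted box gets an embedding that is compared against CLIP text embeddings of the class names. This is a minimal toy sketch, assuming small made-up 4-d embeddings in place of real CLIP outputs (real CLIP embeddings are 512-d and come from `encode_image`/`encode_text`):

```python
import numpy as np

def zero_shot_classify(region_embedding, text_embeddings, class_names):
    """Assign the class whose text embedding is most similar
    to the detector's predicted region embedding (cosine similarity)."""
    # Normalize so dot products equal cosine similarities
    region = region_embedding / np.linalg.norm(region_embedding)
    texts = text_embeddings / np.linalg.norm(text_embeddings, axis=1, keepdims=True)
    scores = texts @ region
    return class_names[int(np.argmax(scores))], scores

# Toy 4-d embeddings standing in for real CLIP outputs (assumption)
names = ["chimpanzee", "dog", "car"]
text_emb = np.array([[0.9, 0.1, 0.0, 0.1],
                     [0.1, 0.9, 0.1, 0.0],
                     [0.0, 0.1, 0.9, 0.1]])
region_emb = np.array([0.8, 0.2, 0.05, 0.1])  # box embedding close to "chimpanzee"
label, scores = zero_shot_classify(region_emb, text_emb, names)
print(label)  # → chimpanzee
```

Because class names only enter through text embeddings, adding a new class at inference time is just one more row in `text_emb` — no retraining needed.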
Many open-world applications require the detection of novel objects, but state-of-the-art object detection and instance segmentation models are unable to do so.
• That's because these models learn to suppress any unannotated objects by treating them as background
• To address that issue, the authors propose a simple yet surprisingly powerful data augmentation and training scheme they call Learning to Detect Every Thing (LDET)
• To avoid suppressing hidden objects (background objects that are visible but unannotated), they paste annotated objects onto a background synthesized from a small region of the original image (see figure)
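The augmentation above can be sketched with plain numpy. This is a simplified illustration of the idea, not LDET's actual implementation: it assumes nearest-neighbor upscaling, integer scale factors, and boxes given as `(x1, y1, x2, y2)`:

```python
import numpy as np

def ldet_background_erase(image, boxes, patch_size=8, rng=None):
    """Sketch of LDET-style augmentation (hypothetical helper):
    synthesize a background from a small patch of the original image,
    then paste the annotated objects back on top. Unlabeled background
    objects are erased, so the model is not trained to suppress them."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]  # assumes h and w are divisible by patch_size
    # 1. Sample a small region of the original image
    y = rng.integers(0, h - patch_size + 1)
    x = rng.integers(0, w - patch_size + 1)
    patch = image[y:y + patch_size, x:x + patch_size]
    # 2. Upscale the patch to full image size (nearest neighbor)
    bg = np.repeat(np.repeat(patch, h // patch_size, axis=0),
                   w // patch_size, axis=1)
    # 3. Paste each annotated object back at its original location
    for (x1, y1, x2, y2) in boxes:
        bg[y1:y2, x1:x2] = image[y1:y2, x1:x2]
    return bg

rng = np.random.default_rng(1)
img = rng.integers(0, 255, size=(64, 64, 3), dtype=np.uint8)
aug = ldet_background_erase(img, [(10, 10, 20, 20)])
```

Everything outside the pasted boxes is now a texture with no hidden objects, so treating it as background during training is actually correct.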
• Trained a VFNet model using IceVision and @fastdotai
• Reached 73% 🚀 on the COCO metric
Blog posts: 👇
📝 The main takeaway of this story: you can learn object detection very quickly if you:
• Are determined
• Follow the optimal learning path
• Embrace the 80-20 and KISS principles
• Have access to highly curated content and libraries
• Know how to avoid roadblocks
• Stay focused, and avoid distractions
✨ Like many things in life, object detection is:
• neither too hard
• nor too easy
• right in between ... when you have the right ingredients