How do you use transfer learning with images with 3+ (or 1) channel(s)?
The timm library, developed by @wightmanr, has an elegant way to handle that:
You can specify any number of input channels (e.g. in_chans=1 or in_chans=8) with the timm.create_model() function, like this:
m = timm.create_model('resnet34', pretrained=True, in_chans=8)
How does it work?
• Case 1: number of input channels is 1
timm simply sums the pretrained weights of the 3 input channels of the first conv layer into a single channel
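A minimal NumPy sketch of that Case-1 weight surgery. The summing step mirrors timm's documented behavior; the array values and shapes here are random stand-ins, not real pretrained weights:

```python
import numpy as np

# Stand-in for pretrained first-conv weights, shape (out_ch, in_ch=3, kH, kW).
rng = np.random.default_rng(0)
w_rgb = rng.standard_normal((64, 3, 7, 7))

# Case 1 (in_chans=1): collapse the three input-channel slices into one,
# so a single-channel (e.g. grayscale) input produces activations of a
# similar scale to the original RGB input.
w_gray = w_rgb.sum(axis=1, keepdims=True)  # shape (64, 1, 7, 7)
```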
• Case 2: number of input channels is 8 (more than 3)
timm repeats the 3-channel weights as many times as needed, then keeps only the required number of input-channel weights
In the 8-channel example: repeat 3 times (9 channels generated), then keep the first 8
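The same Case-2 repeat-then-truncate step, sketched in NumPy on random stand-in weights (timm additionally rescales the result to preserve activation magnitude; that step is omitted here for clarity):

```python
import numpy as np

# Stand-in for pretrained first-conv weights, shape (out_ch, in_ch=3, kH, kW).
rng = np.random.default_rng(0)
w_rgb = rng.standard_normal((64, 3, 7, 7))

in_chans = 8
reps = -(-in_chans // 3)                   # ceil(8 / 3) = 3 repeats
w_tiled = np.tile(w_rgb, (1, reps, 1, 1))  # 9 channels generated: (64, 9, 7, 7)
w_new = w_tiled[:, :in_chans]              # keep the first 8: (64, 8, 7, 7)
```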
🔥 ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation
Heads up: I'm preparing a visual summary on ZSD-YOLO.
So, what is Zero-Shot Detection?
• Zero-shot detection allows a model to detect something in an image even if the model has never seen that thing before
• So, if you have an image of a chimpanzee and the model has never seen one before, the zero-shot detector can still locate it in the image
• ZSD-YOLO leverages 2 models:
- CLIP: a pretrained Vision-Language model
- YOLOv5: a modified version whose classification branch is replaced so that detections can be matched against CLIP's text embeddings, trained via knowledge distillation
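The matching idea above can be sketched as a cosine-similarity lookup between a detected box's visual embedding and text embeddings of candidate class names. Everything below (the 512-dim embeddings, the class list, the cosine_sim helper) is an illustrative assumption, not the actual ZSD-YOLO or CLIP code:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Toy embeddings standing in for CLIP outputs.
rng = np.random.default_rng(0)
text_emb = rng.standard_normal((3, 512))   # e.g. "chimpanzee", "dog", "car"

# A detected box's embedding, close to the "chimpanzee" text embedding.
region_emb = text_emb[0] + 0.1 * rng.standard_normal(512)

# Zero-shot classification: pick the class name whose text embedding is closest.
scores = cosine_sim(region_emb[None, :], text_emb)[0]
pred = int(scores.argmax())  # index 0, i.e. "chimpanzee"
```

Because the class names enter only as text embeddings, new classes can be added at inference time without retraining the detector.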
🔥 LDET: Learning to Detect Every Thing
Many open-world applications require the detection of novel objects, but state-of-the-art object detection and instance segmentation models are unable to do so.
• It's because models learn to suppress any unannotated objects by treating them as background
• To address that issue, the authors propose a simple yet surprisingly powerful data augmentation and training scheme they call Learning to Detect Every Thing (LDET)
• To avoid suppressing hidden objects (objects that are visible but unannotated), they paste the annotated objects onto a synthesized background built from a small region sampled from the original image (see figure)
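A toy NumPy sketch of that augmentation. The patch size, upsampling method, and box coordinates are illustrative assumptions, not LDET's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in training image

# 1) Sample a small region of the original image as background texture.
patch = img[:8, :8]

# 2) Blow the patch up to full size (nearest-neighbor), producing a
#    synthesized background that cannot contain hidden, unlabeled objects.
bg = patch.repeat(8, axis=0).repeat(8, axis=1)           # (64, 64, 3)

# 3) Paste the annotated object crop back at its ground-truth box,
#    so every object visible in the new image is labeled.
y0, x0, y1, x1 = 20, 20, 40, 40                          # hypothetical GT box
bg[y0:y1, x0:x1] = img[y0:y1, x0:x1]
```

Training on such images removes the incentive to classify unlabeled-but-visible objects as background, since the synthetic background contains none.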