How do you use transfer learning with images with 3+ (or 1) channel(s)?
The timm library, developed by @wightmanr, has an elegant way to handle that:
You can specify any number of input channels (e.g. in_chans=1 or in_chans=8) with the timm.create_model() function, like this:
m = timm.create_model('resnet34', pretrained=True, in_chans=8)
How does it work?
• Case 1: number of input channels is 1
timm simply sums the weights of the 3 pretrained input channels into a single channel
• Case 2: number of input channels is 8 (more than 3)
timm repeats the 3 channel weights as many times as required, and then selects the required number of input channel weights
In the 8-channel example, that would be: repeat 3 times (9 channels generated), then keep the first 8
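Here is a rough sketch of the two cases in plain Python, just to make the sum / repeat-and-truncate logic concrete. The helper name adapt_first_conv and the nested-list weight layout [out_chans][3][kh][kw] are assumptions made for illustration; this is not timm's code.

```python
import math

def adapt_first_conv(weight, in_chans):
    """Adapt pretrained 3-channel conv weights to `in_chans` input channels.

    `weight` is a nested list shaped [out_chans][3][kh][kw].
    Hypothetical helper for illustration, not part of timm.
    """
    adapted = []
    for filt in weight:  # filt holds the 3 per-channel kernels of one filter
        if in_chans == 1:
            # Case 1: element-wise sum of the 3 kernels into a single kernel.
            summed = [[sum(vals) for vals in zip(*rows)] for rows in zip(*filt)]
            adapted.append([summed])
        else:
            # Case 2: repeat the 3 kernels, then keep the first `in_chans`.
            repeat = math.ceil(in_chans / 3)
            repeated = filt * repeat
            kept = repeated[:in_chans]
            # Copy the rows so the result does not alias the input lists.
            adapted.append([[row[:] for row in kernel] for kernel in kept])
    return adapted

# One filter with three 1x2 kernels (R, G, B), chosen so each is recognisable.
w = [[[[1, 2]], [[10, 20]], [[100, 200]]]]
one = adapt_first_conv(w, 1)
eight = adapt_first_conv(w, 8)
print(one[0])         # the single summed kernel
print(len(eight[0]))  # number of input-channel kernels after adaptation
```

With 8 channels the kernels end up in the order R, G, B, R, G, B, R, G. The real library works on tensors and may also rescale the repeated weights; that part is not covered in this sketch.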
🔥 ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation
Heads up: I’m preparing a visual summary on ZSD-YOLO.
So, what is Zero-Shot Detection?
• Zero-shot detection allows a model to detect something in an image even if the model has never seen that thing before
• So, if you have an image of a chimpanzee and the model has never seen a chimpanzee before, you can use your zero-shot detector to locate it in the image
• ZSD-YOLO leverages 2 models:
- CLIP: a pretrained Vision-Language model
- YOLOv5: a modified version that replaces the classification branch