✨ Tips & Tricks for Training Object Detection Models ✨

🧊 Data Labeling

📌 Avoid adding low-quality data: it confuses your model

📌 Prevent data leakage between your training and validation sets

📌 Dataset size: a smaller dataset can work with pretrained models; training from scratch needs a bigger one

📌 Use prototypical (representative) data for each class

📌 Identify incorrectly labelled classes

📌 Identify ambiguously labelled images

📌 Balance your data distribution

📌 Train from scratch if your dataset is very different from COCO

📌 When your model stops improving, adding more data can move the needle:
📌 Use soft-labelling: label new data with pretrained models => free labels

📌 Use self-training: label new data with the model you are currently training, add the newly labelled data to your training set, and loop
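The self-training loop hinges on keeping only confident pseudo-labels. A minimal sketch (the threshold value and the `(box, class_id, score)` prediction format are assumptions for illustration, not from the thread):

```python
def select_pseudo_labels(predictions, conf_threshold=0.9):
    """Keep only high-confidence predictions as pseudo-labels.

    `predictions` is a list of (box, class_id, score) tuples. The
    threshold trades label noise against the amount of new data:
    lower it and you get more labels, but more wrong ones too.
    """
    return [(box, cls) for box, cls, score in predictions
            if score >= conf_threshold]


# One round of self-training, sketched with hypothetical helpers:
# model = train(model, labelled_data)
# preds = model.predict(unlabelled_images)
# labelled_data += select_pseudo_labels(preds)
```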

🧊 Modeling

📌 Use larger models: they outperform smaller ones

📌 Use smaller models 😀 when training on a small dataset

📌 Use Focal Loss for the classification head (RetinaNet, EfficientDet, ...)

📌 Use GIoU Loss for the regression head (box location): needs a separate post 😄
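Until that separate post, here is a plain-Python sketch of the GIoU loss for one pair of `(x1, y1, x2, y2)` boxes; real training code would use a batched tensor version (e.g. torchvision's `generalized_box_iou_loss`):

```python
def giou_loss(pred, target):
    """GIoU loss for two axis-aligned boxes in (x1, y1, x2, y2) form.

    GIoU = IoU - |C \ (A ∪ B)| / |C|, where C is the smallest box
    enclosing both. Loss = 1 - GIoU, which stays informative even for
    non-overlapping boxes (plain IoU would be flat at 0 there).
    """
    ax1, ay1, ax2, ay2 = pred
    bx1, by1, bx2, by2 = target
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection and union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c_area = cw * ch
    giou = iou - (c_area - union) / c_area
    return 1.0 - giou
```

For a perfect match the loss is 0; for disjoint boxes it grows with the distance between them, which is exactly the gradient signal IoU alone lacks.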
📌 Pretraining on ImageNet isn't always effective

📌 The YOLO-ReT paper showed that transferring the whole pretrained backbone can harm model performance

📌 Use backbone truncation (like YOLO-ReT): only ~60% of the backbone is initialised with pretrained weights.
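One way to implement backbone truncation is to load pretrained weights only for the early layers and let the rest initialise randomly. A sketch over a plain name-to-tensor dict (the 60% cut-off and the ordered layer list are illustrative; YOLO-ReT chooses the split per architecture):

```python
def truncate_pretrained(state_dict, layer_names, fraction=0.6):
    """Return pretrained weights for only the first `fraction` of layers.

    `layer_names` lists the backbone parameters in depth order. Early
    layers keep their pretrained (generic, transferable) weights; the
    later, more task-specific layers are dropped so the framework
    initialises them randomly instead.
    """
    keep = set(layer_names[: int(len(layer_names) * fraction)])
    return {k: v for k, v in state_dict.items() if k in keep}
```

In PyTorch this partial dict would then be loaded with `model.load_state_dict(..., strict=False)`.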

🧊 Anchor Boxes

📌 Use anchor boxes with sizes/ratios close to your target boxes
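A common way to get anchors that match your targets is to cluster the ground-truth box sizes. A toy Euclidean k-means sketch (YOLO-style pipelines usually cluster with a 1 − IoU distance instead; the deterministic init and `len(wh) >= k` assumption are mine):

```python
def kmeans_anchors(wh, k, iters=20):
    """Cluster ground-truth (width, height) pairs into k anchor sizes.

    Plain Euclidean k-means over box dimensions; initial centers are
    spread across the sorted sizes so the result is deterministic.
    """
    centers = sorted(wh)[:: max(1, len(wh) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for w, h in wh:
            # Assign each box to its nearest center
            i = min(range(len(centers)),
                    key=lambda j: (w - centers[j][0]) ** 2
                                  + (h - centers[j][1]) ** 2)
            groups[i].append((w, h))
        # Move each center to the mean of its group
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else c
            for g, c in zip(groups, centers)
        ]
    return centers
```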

📌 Make sure you know what your anchor boxes look like

📌 Try anchor-free OD models (e.g., VFNet): no anchor boxes, and some perform even better than anchor-based models

🧊 Data augmentation

📌 Oversample images with small boxes
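One simple way to oversample is per-image sampling weights. A sketch (the 2x weight and the 32x32 px "small" threshold, borrowed from COCO's size buckets, are assumptions you should tune):

```python
def oversampling_weights(per_image_boxes, small_area=32 * 32):
    """Per-image sampling weights that favour images with small boxes.

    `per_image_boxes` holds, for each image, a list of (w, h) box
    sizes. Images containing at least one box under `small_area` get
    double weight; feed the result to a weighted sampler such as
    torch.utils.data.WeightedRandomSampler.
    """
    weights = []
    for boxes in per_image_boxes:
        has_small = any(w * h < small_area for w, h in boxes)
        weights.append(2.0 if has_small else 1.0)
    return weights
```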

📌 Use transforms close to your use case

📌 Use Copy & Paste boxes data augmentation

📌 Use mosaic data augmentation

📌 Some suggest heavy data augmentation at the beginning of training and light data augmentation at the end
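The heavy-to-light idea can be as simple as a decaying strength factor that scales your transforms' probability or magnitude. A linear sketch (the linear shape and the 1.0 → 0.1 range are assumptions; step or cosine decays work too):

```python
def aug_strength(epoch, total_epochs, start=1.0, end=0.1):
    """Linearly decay augmentation strength over training.

    Returns a factor in [end, start]: multiply your transforms'
    probabilities/magnitudes by it so early epochs see heavy
    augmentation and the last epochs see nearly clean images.
    """
    t = epoch / max(1, total_epochs - 1)
    return start + (end - start) * t
```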

🧊 Training

📌 Don’t worry too much about overfitting at the beginning of a project: adding more data / data augmentation should mitigate it

📌 Train for longer (more epochs), as long as your loss keeps decreasing

📌 In transfer learning, when freezing NN layers, leave the BatchNorm layers trainable
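A name-based sketch of that freezing policy (matching "bn" in parameter names is a naming-convention heuristic; robust PyTorch code would check module types with `isinstance(m, nn.BatchNorm2d)` and set `param.requires_grad` from the resulting mask):

```python
def trainable_mask(param_names, frozen_prefix="backbone."):
    """Decide which parameters stay trainable when freezing a backbone.

    Everything under `frozen_prefix` is frozen *except* BatchNorm
    parameters: keeping their affine weights trainable lets the
    normalisation adapt to the new data distribution even while the
    rest of the backbone is frozen.
    """
    return {
        name: (not name.startswith(frozen_prefix)) or ("bn" in name)
        for name in param_names
    }
```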

📌 Train using progressive resizing
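Progressive resizing just means an epoch-to-resolution schedule, small images first. A sketch (the three sizes and the equal stage lengths are example values, not from the thread):

```python
def progressive_sizes(total_epochs, sizes=(256, 384, 512)):
    """Assign an input resolution to each epoch, small to large.

    Early epochs train fast on small images and learn coarse features;
    later epochs fine-tune at the final resolution. Leftover epochs
    (when total_epochs isn't divisible) run at the largest size.
    """
    per_stage = total_epochs // len(sizes)
    schedule = []
    for size in sizes:
        schedule += [size] * per_stage
    schedule += [sizes[-1]] * (total_epochs - len(schedule))
    return schedule
```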

📌 Use a discriminative learning rate: a low learning rate for the backbone, a higher one for the head
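In PyTorch this is expressed as optimiser parameter groups. A sketch building them from `model.named_parameters()`-style pairs (the 0.1 backbone factor and the "backbone." name prefix are assumptions about your model):

```python
def lr_param_groups(named_params, base_lr=1e-3, backbone_factor=0.1):
    """Build optimiser parameter groups with a lower backbone LR.

    Pretrained backbone weights only need small updates, so they get
    base_lr * backbone_factor; the randomly initialised head gets the
    full base_lr. The returned dicts use the per-parameter-group
    format that torch.optim optimisers accept.
    """
    backbone = [p for n, p in named_params if n.startswith("backbone.")]
    head = [p for n, p in named_params if not n.startswith("backbone.")]
    return [
        {"params": backbone, "lr": base_lr * backbone_factor},
        {"params": head, "lr": base_lr},
    ]
```

You would then pass the result straight to the optimiser, e.g. `torch.optim.AdamW(lr_param_groups(model.named_parameters()))`.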

📌 Use the LR scheduler recommended for your model

🧊 Inference

📌 Use the same image size at inference as the one you trained your model with

📌 With high-resolution images, run inference on patches/slices and stitch the results together: e.g., Slicing-Aided Hyper Inference (SAHI)
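The core of sliced inference is computing overlapping tile coordinates. A sketch (the 512 px tile and 64 px overlap are example values; SAHI itself adds details like per-tile resizing and merging strategies):

```python
def slice_boxes(img_w, img_h, tile=512, overlap=64):
    """Compute overlapping tile coordinates for sliced inference.

    Run the detector on each (x1, y1, x2, y2) tile, shift detections
    back by the tile's (x1, y1) offset, then merge everything with
    NMS. The overlap keeps objects that straddle a tile border fully
    visible in at least one tile.
    """
    step = tile - overlap
    tiles = []
    for y in range(0, max(1, img_h - overlap), step):
        for x in range(0, max(1, img_w - overlap), step):
            tiles.append((x, y, min(x + tile, img_w), min(y + tile, img_h)))
    return tiles
```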
📌 Don’t forget to put the model in evaluation mode (model.eval()): it switches Dropout and BatchNorm to inference behaviour. Note it does not disable gradient tracking; wrap inference in torch.no_grad() for that.
🎉 This is my longest thread since I joined 🐦

If you like this kind of content, follow @ai_fast_track for more demystified OD / CV content in your feed

πŸ™If you could give the thread a quick retweet, it would help others discover this content! Thanks!


