More from @ai_fast_track

AI Fast Track

@ai_fast_track

17 Nov

❇ VFNet: A very interesting model that shouldn't stay under the radar. You should give it a try :)

VarifocalNet: An IoU-aware Dense Object Detector

🧊 Background:
📌 Accurately ranking candidate detections is crucial for dense object detectors to achieve high performance
...

📌 Prior work uses the classification score or a combination of classification and predicted localization scores (centerness) to rank candidates.

📌 Neither of those 2 ranking scores is optimal

🧊 Novelty:
📌 VFNet proposes to learn an IoU-Aware Classification Score (IACS)

📌 IACS is used as a joint representation of object presence confidence and localization accuracy using IoU

📌 VFNet introduces the Varifocal Loss

📌 The Varifocal Loss down-weights only negative examples to address the class imbalance problem during training
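
To make the asymmetric weighting concrete, here is a minimal per-sample sketch of a Varifocal-style loss. It assumes `p` is the predicted IACS and `q` is the target (the IoU with the ground-truth box for a positive, 0 for a negative). The function name and the `alpha` and `gamma` defaults are assumptions for illustration, not values taken from the thread.

```python
import math

def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Sketch of a Varifocal-style loss for one prediction.

    p: predicted IoU-aware classification score in (0, 1)
    q: target score (IoU for a positive sample, 0 for a negative)
    alpha, gamma: assumed down-weighting parameters for negatives
    """
    # Clamp to avoid log(0).
    p = min(max(p, eps), 1.0 - eps)
    if q > 0:
        # Positive sample: binary cross-entropy against the soft target q,
        # weighted by q so high-IoU positives contribute more.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative sample: cross-entropy down-weighted by alpha * p**gamma,
    # so confidently rejected negatives contribute very little.
    return -alpha * (p ** gamma) * math.log(1.0 - p)
```

Only the negative branch carries the `p ** gamma` factor, which is the "down-weights only negative examples" behaviour described above.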

AI Fast Track

@ai_fast_track

15 Nov

4 Feature Pyramid Network (FPN) designs you should know:

FPN, PANet, NAS-FPN, and BiFPN

📌 (a) FPN uses a top-down pathway to fuse multi-scale features from level 3 to 7 (P3 - P7);

📌 (b) PANet adds an additional bottom-up pathway on top of FPN;

📌 (c) NAS-FPN uses neural architecture search to find an irregular feature network topology and then repeatedly applies the same block;

📌 (d) BiFPN is somewhat similar to PANet, adds shortcut fusing, and then repeatedly applies the same block
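
As a rough illustration of the top-down pathway in (a), the sketch below fuses single-channel feature maps stored as nested lists. It assumes each level is exactly half the resolution of the previous one and leaves out the lateral and smoothing convolutions a real FPN uses. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list of floats."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_down_fuse(levels):
    """Fuse maps ordered from finest (e.g. P3) to coarsest (e.g. P7).

    The coarsest map is kept as-is; every finer map receives the
    upsampled result of the level above it.
    """
    fused = [None] * len(levels)
    fused[-1] = levels[-1]
    for i in range(len(levels) - 2, -1, -1):
        fused[i] = add_maps(levels[i], upsample2x(fused[i + 1]))
    return fused
```

PANet-style designs would follow this with a second, bottom-up pass over the fused maps.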

📝 Some other observations:

📌 The model diagram corresponds to the One-Stage Object Detection Architecture

📌 The FPN illustration is extracted from the EfficientDet paper

📌 The (P3-P5) layers are referred to as the Convolutional (C3-C5) layers in other papers

AI Fast Track

@ai_fast_track

5 Nov

https://twitter.com/ai_fast_track/status/1407333442145206280

🤔 How to increase your small-object Average Precision (APs)?

💡 By increasing both image and backbone sizes when training your model:

📌 Increasing both image and backbone sizes in EfficientDet raised APs by 14+%

📌 Increasing backbone size in RFBNet increased APs

📌 Increasing image size from 320 to 608 in PP-YOLO led to a 10+% increase in APs

For more tips and tricks to improve small object detection, check out the list I shared in my first tweet.

Benchmarks are extracted from the PP-YOLO paper:

📰 Paper: PP-YOLO: An Effective and Efficient Implementation of Object Detector

PDF: arxiv.org/pdf/2007.12099…

AI Fast Track

@ai_fast_track

4 Nov

🥇 FCOS3D won 1st place among all the vision-only methods in the nuScenes 3D Detection Challenge of NeurIPS 2020.

Here is a brief description:

📌 FCOS3D is a monocular 3D object detector

📌 It’s an anchor-free model based on its 2D counterpart, FCOS

📌 It replaces the FCOS regression branch with 6 branches

📌 The center-ness is redefined with a 2D Gaussian distribution based on the 3D-center

📌 The authors showed some failure cases, mainly in the detection of large objects and occluded objects.
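
The Gaussian center-ness idea can be sketched as follows. This assumes `dx` and `dy` are the offsets between a feature-map location and the projected 3D center, and that `alpha` controls how fast the score decays. The function name and the default `alpha` are assumptions for illustration.

```python
import math

def gaussian_centerness(dx, dy, alpha=2.5):
    """Center-ness target from a 2D Gaussian around the projected 3D center.

    dx, dy: offsets from the location to the projected 3D center
    alpha: assumed decay rate; larger values concentrate the score
           more tightly around the center
    """
    return math.exp(-alpha * (dx * dx + dy * dy))
```

The score is 1.0 exactly at the center and decays smoothly toward 0 as the location moves away from it.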

⏹ Source code and models are shared in the MMDetection3D repo:
github.com/open-mmlab/mmd…

⏹ MMDetection3D also has many other 3D detection models:

AI Fast Track

@ai_fast_track

2 Nov

The YOLO Real-Time (YOLO-ReT) architecture targets edge devices.

It achieves 68.75 mAP on Pascal VOC and 34.91 mAP on COCO using a MobileNetV2×0.75 backbone.

Here is a brief description of the YOLO-ReT 👇

Both model accuracy and execution speed (frames per second) are crucial when deploying a model on an edge device. YOLO-ReT is based on these 2 ideas:

⏹ Backbone Truncation: Only 60% of the backbone is initialised with pretrained weights. Using all the weights harms model accuracy

⏹ Raw Feature Collection and Redistribution (RFCR):

📌 Fuse {C2, C3, C4} into the C5 layer (fused feature map)

📌 Discard the last CNN layers

📌 Pass the fused feature map through a 5x5 Mobile Convolution block (MBConv)
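
The Backbone Truncation idea above can be sketched as a partial weight initialisation. This assumes the 60% is counted over an ordered list of backbone layers, which is an interpretation rather than something the thread states. `truncated_init` and its arguments are hypothetical names.

```python
def truncated_init(layer_names, pretrained, keep_fraction=0.6):
    """Pick pretrained weights for only the first part of a backbone.

    layer_names: backbone layer names, ordered from input to output
    pretrained: dict mapping layer name -> pretrained weights
    keep_fraction: assumed share of layers initialised from pretrained weights

    Returns a dict mapping each layer name to its pretrained weights,
    or None when the layer should be randomly initialised instead.
    """
    n_keep = round(len(layer_names) * keep_fraction)
    init = {}
    for i, name in enumerate(layer_names):
        if i < n_keep and name in pretrained:
            init[name] = pretrained[name]
        else:
            init[name] = None  # left for random initialisation
    return init
```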

AI Fast Track

@ai_fast_track

27 Oct

✨ Common Object Detector Architectures you should be familiar with:

📌 Common object detectors are divided into One-Stage Detectors (OSD), and Two-Stage Detectors (TSD)

📌 Both OSD and TSD can be either anchor-based (relying on anchor boxes) or anchor-free

📌 OSD use the whole feature maps to predict bounding boxes/labels: Dense Prediction

📌 TSD have an extra step (hence "two-stage"): extracting proposals (regions of interest)

📌 Proposals are used to extract feature map regions to predict bounding boxes/labels: Sparse Prediction

📌 TSD don't use the whole feature map for prediction

📌 TSD (e.g. Faster R-CNN) used to be more accurate than OSD (e.g. SSD, YOLO, etc.)

📌 OSD (e.g. EfficientDet, RetinaNet, VFNet, YOLOX, etc.) have recently shown better results than TSD

📌 OSD are faster than TSD
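
To get a feel for what "Dense Prediction" means in numbers, the sketch below counts the feature-map locations a one-stage detector predicts at. It assumes pyramid levels with strides 8, 16, 32, 64 and 128, a common convention that the thread does not specify. The function name is hypothetical.

```python
import math

def dense_locations(height, width, strides=(8, 16, 32, 64, 128)):
    """Count feature-map locations across pyramid levels.

    Each level with stride s has ceil(height / s) * ceil(width / s)
    locations, and a dense detector predicts at every one of them.
    """
    return sum(math.ceil(height / s) * math.ceil(width / s) for s in strides)

# Example: a 640x640 input under the assumed strides.
total = dense_locations(640, 640)
```

An anchor-based detector multiplies this count by the number of anchors per location, whereas a two-stage head only scores the proposals kept by the first stage.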
