📢 The amazing @OpenMMLab just released a new project:

MMFlow: an open-source optical flow toolbox written in PyTorch

OpenMMLab hosts several impressive open-source projects for both academic research and industrial applications.
They cover a wide range of computer vision research topics, e.g., classification, detection, segmentation, and super-resolution.

📌 MMCV: Foundational library for computer vision.
📌 MIM: MIM Installs OpenMMLab Packages.
📌 MMClassification: Image classification toolbox and benchmark.
📌 MMDetection: Detection toolbox and benchmark.
📌 MMDetection3D: Next-generation platform for general 3D object detection.
📌 MMSegmentation: Semantic segmentation toolbox and benchmark.
📌 MMAction2: Next-generation action understanding toolbox and benchmark.
📌 MMTracking: Video perception toolbox and benchmark.
📌 MMPose: Pose estimation toolbox and benchmark.
📌 MMEditing: Image and video editing toolbox.
📌 MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
📌 MMGeneration: Image and video generative models toolbox.
📌 MMFlow: Optical flow toolbox and benchmark.

Repo: github.com/open-mmlab/mmf…
🟦If you enjoy reading this kind of content, follow @ai_fast_track for more OD / CV demystified content in your feed

🟧If you could give the thread a quick retweet, it would help others discover this content!

• • •


More from @ai_fast_track

17 Nov
❇ VFNet: A very interesting model that isn't under the radar. You should give it a try :)

VarifocalNet: An IoU-aware Dense Object Detector

🧊 Background:
📌 Accurately ranking candidate detections is crucial for dense object detectors to achieve high performance
📌 Prior work uses the classification score or a combination of classification and predicted localization scores (centerness) to rank candidates.

📌 Those 2 scores are still not optimal

🧊 Novelty:
📌 VFNet proposes to learn an IoU-Aware Classification Score (IACS)
📌 IACS is used as a joint representation of object presence confidence and localization accuracy using IoU

📌 VFNet introduces the Varifocal Loss

📌 The Varifocal Loss down-weights only negative examples to address the class imbalance problem during training
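To make the last point concrete, here is a minimal Python sketch of a varifocal-style loss for a single prediction. It follows the formula as I recall it from the VarifocalNet paper, so treat the function name, the clamping, and the defaults alpha=0.75 and gamma=2.0 as assumptions rather than the authors' code. Here p is the predicted IACS and q is the target score (the IoU with the ground-truth box for a positive, 0 for a negative).

```python
import math


def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Varifocal-style loss for one prediction (illustrative sketch).

    p: predicted IoU-aware classification score, in (0, 1)
    q: target score; IoU with the ground-truth box for a positive, 0 for a negative
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
    if q > 0:
        # Positive: cross-entropy against the soft target q, weighted by q,
        # so high-IoU positives contribute more. No focal down-weighting here.
        return -q * (q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    # Negative: the p**gamma factor shrinks the loss of easy negatives.
    return -alpha * (p ** gamma) * math.log(1.0 - p)


easy_negative = varifocal_loss(0.1, 0.0)  # confident, correct "no object"
hard_negative = varifocal_loss(0.9, 0.0)  # confident, wrong "object"
positive = varifocal_loss(0.6, 0.8)       # positive with IoU target 0.8
print(easy_negative, hard_negative, positive)
```

Only the negative branch carries the p**gamma factor, so an easy negative contributes almost nothing, while positives are weighted by their IoU target q.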
15 Nov
4 Feature Pyramid Network (FPN) designs you should know:

FPN, PANet, NAS-FPN, and BiFPN

📌 (a) FPN uses a top-down pathway to fuse multi-scale features from levels 3 to 7 (P3 - P7);

📌 (b) PANet adds an additional bottom-up pathway on top of FPN;
📌 (c) NAS-FPN uses neural architecture search to find an irregular feature network topology and then repeatedly applies the same block;

📌 (d) BiFPN is somewhat similar to PANet: it adds shortcut fusion and then repeatedly applies the same block
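As a rough illustration of (a), the top-down pathway can be sketched in plain Python on single-channel feature maps. This is a simplified sketch under stated assumptions: real FPNs apply 1x1 lateral convolutions before the sum and a 3x3 convolution after it, both omitted here, and the helper names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(laterals):
    """Fuse feature maps ordered fine to coarse, each half the size of the previous."""
    fused = [None] * len(laterals)
    fused[-1] = laterals[-1]  # the coarsest level passes through unchanged
    for i in range(len(laterals) - 2, -1, -1):
        # Each finer level receives the upsampled, already-fused coarser level.
        fused[i] = add_maps(laterals[i], upsample2x(fused[i + 1]))
    return fused


c3 = [[1] * 4 for _ in range(4)]    # finest level, 4x4
c4 = [[10] * 2 for _ in range(2)]   # 2x2
c5 = [[100]]                        # coarsest level, 1x1
p3, p4, p5 = top_down_fuse([c3, c4, c5])
print(p4)  # [[110, 110], [110, 110]]
```

PANet's extra bottom-up pathway would run a second pass in the opposite direction (downsampling the fine levels and adding them to the coarser ones).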
📝 Some other observations:

📌 The model diagram corresponds to the One-Stage Object Detection Architecture

📌 The FPN illustration is extracted from the EfficientDet paper

📌 The (P3-P5) layers are referred to as the Convolutional (C3-C5) Layers in other papers
5 Nov
🤔 How to increase your Small Object Detection Average Precision (APs)?

💡 By increasing both image and backbone sizes when training your model:

📌 Increasing both image and backbone sizes in EfficientDet raised APs by 14+%

📌 Increasing backbone size in RFBNet increased APs
📌 Increasing image size from 320 to 608 in PP-YOLO led to a 10+% increase in APs

For more tips and tricks to improve small object detection, check out the list I shared in my first tweet.
Benchmarks are extracted from the PP-YOLO paper:

📰 Paper: PP-YOLO: An Effective and Efficient Implementation of Object Detector

PDF: arxiv.org/pdf/2007.12099…
4 Nov
🥇 FCOS3D won 1st place among all the vision-only methods in the nuScenes 3D Detection Challenge of NeurIPS 2020.

Here is a brief description:

📌 FCOS3D is a monocular 3D object detector

📌 It's an anchor-free model based on its 2D counterpart, FCOS
📌 It replaces the FCOS regression branch with 6 branches

📌 The center-ness is redefined with a 2D Gaussian distribution based on the 3D-center

📌 The authors showed some failure cases, mainly focused on the detection of large objects and occluded objects.
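For intuition on the redefined center-ness mentioned above, here is a tiny Python sketch of a 2D Gaussian centered on the projected 3D center. The form exp(-alpha * (dx^2 + dy^2)) and the value alpha = 2.5 are my recollection of the FCOS3D paper and should be read as assumptions; dx and dy stand for a location's offsets to the projected 3D center, and the function name is hypothetical.

```python
import math


def gaussian_centerness(dx, dy, alpha=2.5):
    """Center-ness target that decays with distance from the projected 3D center.

    dx, dy: offsets from a feature-map location to the projected 3D center
    alpha:  controls how quickly the score falls off (assumed default)
    """
    return math.exp(-alpha * (dx * dx + dy * dy))


at_center = gaussian_centerness(0.0, 0.0)   # highest possible score
nearby = gaussian_centerness(0.5, 0.0)
far_away = gaussian_centerness(1.0, 1.0)
print(at_center, nearby, far_away)
```

Locations close to the 3D center get a score near 1, and the score decays smoothly toward 0 as the offsets grow.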
⏹ Source code and models are shared in the MMDetection3D repo:
github.com/open-mmlab/mmd…

⏹ MMDetection3D also has many other 3D detection models:
2 Nov
YOLO Real-Time (YOLO-ReT) architecture targets edge devices.

It achieves 68.75 mAP on Pascal VOC and 34.91 mAP on COCO using a MobileNetV2×0.75 backbone.

Here is a brief description of YOLO-ReT 👇
Both model accuracy and execution speed (frames per second) are crucial when deploying a model on an edge device. YOLO-ReT is based on these 2 ideas:

⏹ Backbone Truncation: Only 60% of the backbone is initialised with pretrained weights. Using all the weights harms model accuracy
⏹ Raw Feature Collection and Redistribution (RFCR):

📌 Fuse {C2, C3, C4} into the C5 layer (fused feature map)

📌 Discard last CNN layers

📌 Pass the fused feature map through a 5x5 Mobile Convolution block (MBConv)
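The Backbone Truncation idea above can be sketched as a partial weight initialisation. This is only an illustration under stated assumptions: weights are plain dictionaries keyed by layer name, the 60% is counted over an ordered list of layers (the thread does not say how the paper measures it), and `partial_init` is a hypothetical helper, not part of YOLO-ReT's code.

```python
def partial_init(random_weights, pretrained_weights, layer_order, fraction=0.6):
    """Return weights where only the first `fraction` of layers are pretrained."""
    cutoff = round(len(layer_order) * fraction)  # number of layers to initialise
    merged = dict(random_weights)                # start from the random init
    for name in layer_order[:cutoff]:
        if name in pretrained_weights:
            merged[name] = pretrained_weights[name]
    return merged


# Toy backbone with 10 blocks; strings stand in for real weight tensors.
layers = [f"block{i}" for i in range(10)]
random_w = {name: "random" for name in layers}
pretrained_w = {name: "pretrained" for name in layers}

weights = partial_init(random_w, pretrained_w, layers)
print([weights[name] for name in layers])
```

With 10 blocks and the default fraction, the first 6 blocks take pretrained weights and the last 4 keep their random initialisation.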
27 Oct
✨ Common Object Detector Architectures you should be familiar with:

📌 Common object detectors are divided into One-Stage Detectors (OSD) and Two-Stage Detectors (TSD)

📌 Both OSD and TSD can be either anchor-based (relying on anchor boxes) or anchor-free
📌 OSD use the whole feature maps to predict bounding boxes/labels: Dense Prediction

📌 TSD have an extra step, hence two-stage: extracting proposals (regions of interest)

📌 Proposals are used to extract feature map regions to predict bounding boxes/labels: Sparse Prediction
📌 TSD don't use the whole feature map for prediction

📌 TSD (e.g. Faster R-CNN) used to be more accurate than OSD (e.g. SSD, YOLO, etc.)

📌 OSD (e.g. EfficientDet, RetinaNet, VFNet, YOLOX, etc.) recently show better results than TSD

📌 OSD are faster than TSD
