Day 24/30: 🔥 EfficientDet is a very popular object detection model for a good reason!
Let's see why.
EfficientDet achieved state-of-the-art (SOTA) accuracy when it was released, while reducing both the number of parameters and the FLOPs. It's still a very good contender.
Before EfficientDet was introduced, models were getting impressively big to achieve SOTA results.
The authors asked the following question:
"Is it possible to build a scalable detection architecture with both higher accuracy and better efficiency across a wide spectrum of resource constraints?"
So, they systematically studied neural network architecture design choices for object detection, and proposed several key optimizations to improve efficiency:
1- A weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multiscale feature fusion
2- A compound scaling method that uniformly scales the resolution (image size), depth (# layers), and width (# channels) for all backbone, feature network, and box/class prediction networks at the same time
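To make the "weighted" part of BiFPN concrete, here is a minimal sketch of the fast normalized fusion idea: each input feature map gets one learnable scalar weight, the weights are kept non-negative with a ReLU, and the weighted sum is divided by the sum of the weights. The function name, the flat-list feature maps, and the example values are assumptions for illustration only. Resizing the inputs to a common resolution and the convolution applied after fusion are left out.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with non-negative, normalized weights.

    features: list of feature maps, each a flat list of floats (same length).
    weights:  one scalar per feature map (learnable in a real model,
              plain floats in this sketch).
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    # ReLU keeps every weight non-negative so the normalization stays stable.
    w = [max(0.0, x) for x in weights]
    # eps avoids a division by zero when all weights are clipped to zero.
    denom = eps + sum(w)
    # Weighted sum over the inputs, computed element by element.
    return [
        sum(wi * f[k] for wi, f in zip(w, features)) / denom
        for k in range(len(features[0]))
    ]


# Hypothetical example: fuse a top-down feature with a same-level input feature.
p_top_down = [1.0, 2.0, 3.0]
p_input = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p_top_down, p_input], [1.0, 1.0])
print(fused)
```

With equal weights the result is close to a plain average. A negative weight is clipped to zero, so that input simply drops out of the fusion.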
As you might have noticed in the figure, the image size, the # of layers, and the # of channels all depend on a single compound coefficient, phi. Scaling all three together with phi is what lets EfficientDet consistently achieve better accuracy with far fewer parameters and FLOPs than previous object detectors.
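As a rough sketch of how one coefficient drives everything, the snippet below evaluates the scaling rules as I recall them from the EfficientDet paper: BiFPN width 64 * 1.35^phi, BiFPN depth 3 + phi, box/class head depth 3 + floor(phi / 3), and input size 512 + 128 * phi. The function and key names are hypothetical. The published configurations round the widths to hardware-friendly values and the largest variants deviate from these formulas, so treat the output as nominal. The backbone, which scales separately across EfficientNet variants, is not modeled here.

```python
def efficientdet_scaling(phi):
    """Return the nominal EfficientDet settings for compound coefficient phi."""
    return {
        "bifpn_width": 64 * (1.35 ** phi),  # channels, before any rounding
        "bifpn_depth": 3 + phi,             # number of BiFPN layers
        "head_depth": 3 + phi // 3,         # layers in the box/class networks
        "input_size": 512 + 128 * phi,      # square input resolution in pixels
    }


# Print the nominal settings for the first few coefficients.
for phi in range(4):
    print(phi, efficientdet_scaling(phi))
```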
EfficientDet-D7 achieves state-of-the-art 55.1 AP on COCO test-dev with 77M parameters and 410B FLOPs, being 4x to 9x smaller and using 13x to 42x fewer FLOPs than previous detectors.
IceVision supports EfficientDet. Check out how simple it is to instantiate an EfficientDet model.
FCOS was one of the first competitors to anchor-based single-stage and two-stage object detectors.
Understanding FCOS will help you understand other models inspired by it.
Summary:
- FCOS reformulates object detection in a per-pixel prediction fashion
- It uses multi-level prediction to improve recall and resolve the ambiguity resulting from overlapping bounding boxes
- It proposes a "center-ness" branch, which helps suppress low-quality detected bounding boxes and improves the overall performance by a large margin
- It avoids the complex computations tied to anchor boxes, such as the intersection-over-union (IoU) between anchors and ground-truth boxes during training
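To illustrate the per-pixel formulation and the center-ness score from the summary, here is a minimal sketch. For a location inside a ground-truth box, the regression target is the distance to the four sides (l, t, r, b), and center-ness is sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)). The function names and the example box are hypothetical, treating border locations as negatives is an assumption, and the multi-level assignment that resolves overlapping boxes is left out.

```python
import math


def fcos_targets(px, py, box):
    """Distances from location (px, py) to the four sides of box.

    box is (x0, y0, x1, y1). Returns (l, t, r, b), or None when the
    location is not strictly inside the box (assumed negative sample).
    """
    x0, y0, x1, y1 = box
    l, t, r, b = px - x0, py - y0, x1 - px, y1 - py
    if min(l, t, r, b) <= 0:
        return None
    return l, t, r, b


def centerness(l, t, r, b):
    """1.0 at the box center, decaying toward 0 near the box edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


box = (0, 0, 100, 100)  # hypothetical ground-truth box
for location in [(50, 50), (10, 50), (150, 50)]:
    target = fcos_targets(*location, box)
    score = centerness(*target) if target else None
    print(location, target, score)
```

At inference time the predicted center-ness is multiplied with the classification score, so boxes predicted from locations far from an object's center are ranked lower before non-maximum suppression.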