Dickson Neoh πŸš€

May 3, 2022, 15 tweets

Deploying object detection models on a CPU is a PAIN.

In this thread I will show you how I optimized my YOLOX model for a 10x speedup, from 5 FPS to 50 FPS, on a CPU.

Yes!! CPU!

And yes for FREE.

The optimized model runs FASTER on a CPU than on a GPU 🀯

dicksonneoh.com/portfolio/how_…

A thread πŸ‘‡

By the end of this thread, you will find out how we go from this πŸ‘‡πŸŒ

To this πŸ‘‡πŸš€

You will learn how to:

πŸ₯‹ Train a state-of-the-art YOLOX model with your own data.

πŸ€– Convert the YOLOX PyTorch model into ONNX and Intel's OpenVINO IR format. @intel

πŸš€ Run a quantization algorithm to 10x your model's inference speed.

Let's dive inπŸ‘‡

🚦 Motivation

In production environments, CPUs are far more common than GPUs.

But object detection models run a lot slower on CPUs. Not cool for real-time applications like #Tesla cars.

Can we feasibly deploy real-time object detection models on CPUs?

YES we canβœ…

β›· Modeling with YOLOX

Let's model a simple task of license plate detection using the YOLOX package by @Megvii.

For that, I collected 40 images of license plates from around my neighbourhood and labeled them with CVAT by @IntelSoftware.

The annotations are in COCO format.

Once the annotations and images are in place, training a YOLOX model is as simple as running πŸ‘‡
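
(The original command screenshot isn't preserved here, so below is a hedged sketch using YOLOX's stock training CLI. "exps/custom/yolox_s_plate.py" is a hypothetical custom Exp file for the license plate dataset.)

```python
# Hedged sketch: kick off training via YOLOX's stock CLI (flags per the
# YOLOX repo as of mid-2022). The Exp file name is a placeholder pointing
# at the COCO-format license plate annotations.
import subprocess

subprocess.run([
    "python", "tools/train.py",
    "-f", "exps/custom/yolox_s_plate.py",  # custom experiment config
    "-d", "1",                             # train on 1 device
    "-b", "8",                             # total batch size
    "--fp16",                              # mixed-precision training
    "-c", "yolox_s.pth",                   # COCO-pretrained weights to fine-tune
], check=True)
```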

After the training ends, we should have a @PyTorch checkpoint. Let's use the checkpoint to run inference on a video and see how it performs πŸ‘‡
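
(Again hedged: YOLOX ships a demo script for exactly this. The checkpoint and video names are placeholders.)

```python
# Hedged sketch: run the trained checkpoint on a video with YOLOX's demo
# script, forced to CPU so the FPS reflects CPU-only inference.
import subprocess

subprocess.run([
    "python", "tools/demo.py", "video",
    "-f", "exps/custom/yolox_s_plate.py",  # same hypothetical Exp file as above
    "-c", "best_ckpt.pth",                 # checkpoint saved during training
    "--path", "plates.mp4",                # input video (placeholder)
    "--device", "cpu",
    "--save_result",                       # writes the annotated video to disk
], check=True)
```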

Out of the box, that's about 5 FPS with no optimization.

Let's see how we can improve it πŸ€”

πŸ€– ONNX Runtime by @onnxai

Now let's convert the PyTorch checkpoint into ONNX format and run it with ONNX Runtime πŸ‘‡
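
(A minimal sketch, assuming YOLOX's bundled ONNX exporter and ONNX Runtime's Python API. File names are placeholders.)

```python
# Hedged sketch: first export the checkpoint with YOLOX's exporter, e.g.
#   python tools/export_onnx.py --output-name yolox_s_plate.onnx \
#       -f exps/custom/yolox_s_plate.py -c best_ckpt.pth
# then run the ONNX model on the CPU with ONNX Runtime:
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolox_s_plate.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Stand-in for a preprocessed video frame: NCHW float32, YOLOX-s 640x640 input.
frame = np.random.rand(1, 3, 640, 640).astype(np.float32)

outputs = session.run(None, {input_name: frame})
print(outputs[0].shape)  # raw predictions; YOLOX's postprocess decodes the boxes
```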

We instantly improved from 5 FPS to about 10 FPS on a CPU. That's a 2x boost.

But it's still not ideal for real-time detection. We need more 🦾

πŸ”— OpenVINO Intermediate Representation (IR)

#OpenVINO is a toolkit by @intel to optimize DL models.

Let's convert our ONNX model into IR (FP16) format and run the same inference πŸ‘‡
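
(Another minimal sketch, assuming the 2022.x Model Optimizer CLI and OpenVINO runtime API. File names are placeholders.)

```python
# Hedged sketch: convert ONNX to FP16 IR with Model Optimizer, e.g.
#   mo --input_model yolox_s_plate.onnx --data_type FP16
# (--data_type per the 2022.x releases; newer ones use --compress_to_fp16)
# then run the IR with the OpenVINO runtime:
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("yolox_s_plate.xml")             # IR produced by mo
compiled = core.compile_model(model, device_name="CPU")

frame = np.random.rand(1, 3, 640, 640).astype(np.float32)
result = compiled([frame])[compiled.output(0)]           # single forward pass
print(result.shape)
```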

Doing this, we bumped the speed up to 16 FPS!

We're almost there, but can we do better? πŸ‘€

πŸ›  Post-Training Quantization

#OpenVINO also comes with a Post-training Optimization Toolkit (POT) designed to supercharge the inference of DL models.

POT runs 8-bit quantization and optimizes the model to use integer tensors instead of floating-point tensors.
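
Here's a minimal sketch of what that looks like with POT's Python API, assuming the 2022.x release. The calibration loader and file names below are placeholders; a real run would feed a few hundred representative images πŸ‘‡

```python
# Hedged sketch: 8-bit post-training quantization with OpenVINO's POT
# (openvino.tools.pot, 2022.x API). File names and the loader are placeholders.
import numpy as np
from openvino.tools.pot import (DataLoader, IEEngine, create_pipeline,
                                load_model, save_model)

class CalibrationLoader(DataLoader):
    """Supplies calibration frames; DefaultQuantization ignores annotations."""
    def __init__(self):
        super().__init__(config={})
    def __len__(self):
        return 300
    def __getitem__(self, index):
        frame = np.random.rand(1, 3, 640, 640).astype(np.float32)  # stand-in
        return frame, None  # (data, annotation) per the 2022.x simplified API

model = load_model({"model_name": "yolox_s_plate",
                    "model": "yolox_s_plate.xml",
                    "weights": "yolox_s_plate.bin"})
engine = IEEngine(config={"device": "CPU"}, data_loader=CalibrationLoader())
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "CPU",
                          "preset": "performance",
                          "stat_subset_size": 300}}]

pipeline = create_pipeline(algorithms, engine)
quantized_model = pipeline.run(model)
save_model(quantized_model, save_path="quantized_ir")  # writes INT8 .xml/.bin
```

Note that DefaultQuantization only needs unlabeled images for calibration, which is why the loader can return None for annotations.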

The result? πŸš€πŸ‘‡

This is nothing short of mindblowing! I never thought this was possible with a CPU.

For reference, the same model (using PyTorch) runs on an RTX3090 at about 40+ FPS.

In this thread, I've shown you how you can 10x your YOLOX model (from 5 FPS to 50 FPS) on a CPU using simple and free techniques.

The end result is a model that runs faster on a CPU than on a GPU.

Again, nothing short of mindblowing! I never thought this was even possible.

If you like what you see and don't wish to miss gems like these, consider following me and retweeting.

This will 10x (pun intended) my energy to keep producing content like this! πŸ™

More details in the blog post dicksonneoh.com/portfolio/how_…

@PINTO03091
I would love to know what you think of this as someone who's experienced in Intel's software and OpenVINO. Can this be better?
