Cameron R. Wolfe (@cwolferesearch)
May 6 · 8 tweets · 3 min read
The recently proposed YOLO-NAS object detector achieves a mean average precision (mAP) of 52 on Microsoft COCO with <5 ms latency, while YOLOv1 requires 30-50 ms of latency to achieve comparable mAP. Here are the three key ideas that make YOLO-NAS fast and performant… 🧵 [1/8] Image
1. Neural Architecture Search (NAS)

The authors of YOLO-NAS define a space of 10^14 possible architecture configurations, inspired by YOLOv6/YOLOv8, then discover a suite of models with different performance/latency tradeoffs using a hardware-aware NAS algorithm called AutoNAC. [2/8] Image
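
AutoNAC itself is proprietary and the thread doesn't describe it in detail, so the snippet below is only a toy sketch of the general idea behind hardware-aware NAS: enumerate candidate architectures and score each one with an objective that rewards an accuracy estimate while penalizing latency measured on the target hardware. The configuration space, the accuracy proxy, and the penalty weight are all hypothetical placeholders; a real search would train (or partially train) each candidate to estimate its accuracy.

```python
# Toy hardware-aware architecture search (illustrative only, not AutoNAC).
import itertools
import time

import torch
import torch.nn as nn


def build_candidate(depth: int, width: int) -> nn.Module:
    """Build a small convolutional backbone for a given depth/width choice."""
    layers, channels = [], 3
    for _ in range(depth):
        layers += [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU()]
        channels = width
    return nn.Sequential(*layers)


@torch.no_grad()
def measure_latency_ms(model: nn.Module, runs: int = 3) -> float:
    """Time the forward pass on the deployment hardware (CPU here)."""
    x = torch.randn(1, 3, 224, 224)
    model(x)  # warmup
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs * 1e3


def accuracy_proxy(model: nn.Module) -> float:
    """Placeholder for a trained-and-evaluated mAP; here we simply reward capacity."""
    return sum(p.numel() for p in model.parameters()) ** 0.25


best = None
for depth, width in itertools.product([2, 4, 8], [16, 32, 64]):
    candidate = build_candidate(depth, width)
    latency = measure_latency_ms(candidate)
    score = accuracy_proxy(candidate) - 0.05 * latency  # latency-aware objective
    if best is None or score > best[0]:
        best = (score, depth, width, latency)

print(f"best config: depth={best[1]}, width={best[2]}, latency={best[3]:.1f} ms")
```

The key property carried over from the real method is that latency is measured on the target hardware rather than approximated analytically, so the search naturally favors architectures that run fast on that device.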
2. Model Quantization

YOLO-NAS adopts a sophisticated quantization strategy to minimize latency in the resulting model. First, it builds a quantization-aware structure into each of its layers and leverages quantization-aware training (a minimal sketch of this idea follows below).

🔗: arxiv.org/abs/2212.01593

[3/8]
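
The paper doesn't spell out the exact implementation, but the core mechanism behind quantization-aware training is easy to sketch: insert "fake quantization" ops that round values to a low-precision grid in the forward pass while letting gradients flow through unchanged (a straight-through estimator), so the network learns weights that survive INT8 conversion. The module below is a minimal, hypothetical PyTorch illustration, not the YOLO-NAS training code.

```python
# Minimal fake-quantization module for quantization-aware training (illustrative).
import torch
import torch.nn as nn


class FakeQuant(nn.Module):
    """Simulate INT8 quantization in the forward pass with a straight-through estimator."""

    def __init__(self, num_bits: int = 8):
        super().__init__()
        self.qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = x.detach().abs().max().clamp(min=1e-8) / self.qmax
        x_q = torch.round(x / scale).clamp(-self.qmax - 1, self.qmax) * scale
        # Forward pass uses the quantized values; backward pass sees an identity gradient.
        return x + (x_q - x).detach()


# A tiny block that trains with quantization "baked in" to its activations.
block = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), FakeQuant())
x = torch.randn(2, 3, 32, 32)
loss = block(x).square().mean()
loss.backward()  # gradients still reach the conv weights through the fake-quant op
```

PyTorch also ships ready-made quantization-aware training utilities; the hand-rolled version above just makes the mechanism explicit.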
YOLO-NAS also adopts a hybrid/dynamic quantization strategy that avoids quantizing layers where quantization damages performance. As a result, YOLO-NAS loses only ~0.5 mAP during post-training quantization, whereas most other models see a 1-2 point degradation in mAP. [4/8] Image
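
One simple way to realize this kind of selective quantization, purely as an illustration rather than the procedure Deci actually uses, is to quantize one layer at a time on a calibration batch, measure how much the model's outputs move, and keep quantization only for the layers whose error stays under a tolerance. The model, tolerance, and error metric below are all made up for the demo.

```python
# Toy selective (hybrid) post-training quantization based on per-layer sensitivity.
import copy

import torch
import torch.nn as nn


def quantize_weights_int8(layer: nn.Linear) -> None:
    """Symmetric per-tensor INT8 weight quantization, applied in place."""
    with torch.no_grad():
        w = layer.weight
        scale = w.abs().max().clamp(min=1e-8) / 127
        layer.weight.copy_(torch.round(w / scale).clamp(-128, 127) * scale)


model = nn.Sequential(
    nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10)
)
calib = torch.randn(256, 64)            # stand-in calibration batch
reference = model(calib).detach()       # full-precision outputs

# Probe each layer's sensitivity by quantizing it alone and measuring output drift.
layers_to_quantize = []
for idx, layer in enumerate(model):
    if not isinstance(layer, nn.Linear):
        continue
    trial = copy.deepcopy(model)
    quantize_weights_int8(trial[idx])
    error = (trial(calib) - reference).abs().mean().item()
    if error < 0.05:                    # tolerance chosen arbitrarily for the demo
        layers_to_quantize.append(idx)

# Quantize only the layers that tolerated it; sensitive layers stay in full precision.
for idx in layers_to_quantize:
    quantize_weights_int8(model[idx])
print("quantized layers:", layers_to_quantize)
```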
For an in-depth, recent analysis of post-training quantization, I would recommend the following paper. It provides a ton of awesome context and up-to-date material on the topic that is both useful and accessible.

🔗: arxiv.org/abs/2304.09785

[5/8]
3. Training Regime

YOLO-NAS is trained in three stages: pre-training on the Objects365 dataset, training on pseudo-/weakly-labeled data, and knowledge distillation from a larger model. These components allow the lightweight YOLO-NAS model to punch above its weight in performance. [6/8]
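The thread doesn't detail the distillation objective, so here is the generic knowledge-distillation recipe for reference (not the exact detection-specific setup used for YOLO-NAS): the lightweight student is trained to match the softened output distribution of a larger teacher in addition to the ground-truth labels. The temperature and mixing weight are arbitrary example values.

```python
# Generic knowledge-distillation training step (illustrative, classification-style).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 10)   # stands in for a large, pre-trained teacher
student = nn.Linear(128, 10)   # the lightweight model being trained
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)
T, alpha = 4.0, 0.5            # temperature and loss-mixing weight (example values)

x = torch.randn(32, 128)
labels = torch.randint(0, 10, (32,))

with torch.no_grad():
    teacher_logits = teacher(x)
student_logits = student(x)

# KL divergence between softened teacher and student distributions,
# plus the usual supervised loss on hard labels.
distill_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)
hard_loss = F.cross_entropy(student_logits, labels)
loss = alpha * distill_loss + (1 - alpha) * hard_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```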
TL;DR: YOLO-NAS achieves impressive latency (200 FPS even with the largest model!) and performs well relative to other lightweight object detectors. Thanks to my friend @DataScienceHarp for sharing this interesting research with me!

🔗: deci.ai/blog/YOLO-NAS-…

[7/8]
YOLO-NAS is completely open-source, so you can access the model, look at training and fine-tuning examples, and read more about how it works on the associated GitHub page.

🔗: github.com/Deci-AI/super-…

[8/8]
