The recently-proposed YOLO-NAS object detector achieves a mean average precision (mAP) of 52 on Microsoft COCO with <5 ms of latency, while YOLOv1 requires 30-50 ms of latency to achieve comparable mAP. Here are the three key ideas that make YOLO-NAS fast and performant… 🧵 [1/8]
1. Neural Architecture Search (NAS)
The authors of YOLO-NAS define a search space of 10^14 possible architecture configurations inspired by YOLOv6/YOLOv8, then use a hardware-aware NAS algorithm called AutoNAC to discover a suite of models with different performance/latency tradeoffs. [2/8]
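AutoNAC itself isn't fully described publicly, but here's a minimal sketch of the general idea behind hardware-aware NAS: sample candidate configurations, reject those that blow the latency budget on the target hardware, and keep the most accurate of the rest. The search space, accuracy proxy, and latency model below are all made up for illustration.

```python
import random

# Hypothetical search space loosely inspired by YOLO-style detectors (illustrative only).
SEARCH_SPACE = {
    "depth_per_stage": [2, 3, 4, 6],
    "width_multiplier": [0.5, 0.75, 1.0, 1.25],
    "block_type": ["rep_conv", "csp", "elan"],
}

def sample_config():
    """Sample one architecture configuration from the search space."""
    return {key: random.choice(options) for key, options in SEARCH_SPACE.items()}

def accuracy_proxy(config):
    """Stand-in for a cheap accuracy estimate (e.g., a zero-cost proxy or short training run)."""
    return 0.4 + 0.05 * config["width_multiplier"] + 0.01 * config["depth_per_stage"]

def measure_latency_ms(config):
    """Stand-in for profiling the candidate on the target hardware."""
    return 1.0 + 0.8 * config["width_multiplier"] * config["depth_per_stage"]

def hardware_aware_search(num_candidates=1000, latency_budget_ms=5.0):
    """Keep the most accurate sampled candidate that fits the latency budget."""
    best, best_acc = None, -1.0
    for _ in range(num_candidates):
        cfg = sample_config()
        if measure_latency_ms(cfg) > latency_budget_ms:
            continue  # hardware-aware: reject configs that exceed the budget
        acc = accuracy_proxy(cfg)
        if acc > best_acc:
            best, best_acc = cfg, acc
    return best, best_acc

print(hardware_aware_search())
```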
2. Model Quantization
YOLO-NAS adopts a multi-part quantization strategy to minimize the latency of the resulting model. First, it uses a quantization-aware structure within each of its layers and leverages quantization-aware training. [3/8]
YOLO-NAS also adopts a hybrid/dynamic quantization strategy that avoids quantizing layers that damage performance. As a result, YOLO-NAS loses only ~0.5 mAP during post-training quantization, whereas most other models see a 1-2 point degradation in mAP. [4/8]
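To make the idea of hybrid/selective quantization concrete, here's a minimal sketch using PyTorch's dynamic post-training quantization, where only layers believed to tolerate INT8 get quantized. The toy model, layer names, and sensitivity assumptions are mine for illustration; YOLO-NAS's actual quantization pipeline is more involved than this.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic, default_dynamic_qconfig

# Toy stand-in for part of a detection model (layer names and "sensitivity" are
# assumptions for illustration, not actual YOLO-NAS layers).
class TinyHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(64, 64)     # assume: tolerant to INT8 quantization
        self.cls_out = nn.Linear(64, 80)  # assume: sensitive, keep in FP32

    def forward(self, x):
        return self.cls_out(torch.relu(self.proj(x)))

model = TinyHead().eval()

# Selective/hybrid post-training quantization: only quantize the layer we believe
# is robust to INT8, and leave the "sensitive" layer in full precision.
quantized = quantize_dynamic(
    model,
    qconfig_spec={"proj": default_dynamic_qconfig},
    dtype=torch.qint8,
)
print(quantized)  # `proj` becomes a dynamically quantized Linear; `cls_out` stays FP32
```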
For an in-depth, recent analysis of post-training quantization, I would recommend the following paper. It provides a ton of useful, accessible, and up-to-date context on the topic. [5/8]
3. Training Strategy
YOLO-NAS is trained in three parts: pre-training over the Objects365 dataset, training over pseudo/weakly-labeled data, and knowledge distillation from a larger model. These components allow the lightweight YOLO-NAS model to punch above its weight in performance. [6/8]
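The knowledge distillation piece can be summarized with a generic distillation loss: blend the usual hard-label loss with a term that pushes the student toward the teacher's softened outputs. Here's a minimal sketch; the temperature, weighting, and classification-style loss are illustrative stand-ins, not YOLO-NAS's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    """Generic knowledge-distillation loss: mix the usual hard-label loss with a
    soft-label term that matches the (temperature-softened) teacher distribution.
    The hyperparameters here are illustrative, not YOLO-NAS's actual settings."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Dummy usage:
student = torch.randn(8, 80)          # student logits for a batch
teacher = torch.randn(8, 80)          # logits from the larger, frozen teacher
labels = torch.randint(0, 80, (8,))   # ground-truth classes
print(distillation_loss(student, teacher, labels))
```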
TL;DR: YOLO-NAS achieves impressive latency (200 FPS even with the largest model!) and performs well relative to other lightweight object detectors. Thanks to my friend @DataScienceHarp for sharing this interesting research with me! [7/8]
YOLO-NAS is completely open-source, so you can access the model, look at training or fine-tuning examples, and read more about how it works on the associated GitHub page. [8/8]
We can use few-shot learning and instruction prompting to solve many problems with large language models (LLMs), but what should we do when these techniques fall short? Here are three advanced prompting approaches that can be used to solve complex problems with LLMs… 🧵 [1/8]
1. Chain of Thought (CoT) Prompting
LLMs are typically poor at solving multi-step or reasoning-based problems. CoT prompting mitigates this problem by encouraging LLMs via few-shot learning to generate a step-by-step problem-solving rationale along with their final answer. [2/8]
Prior work has shown that teaching language models (e.g., via fine-tuning) to generate problem-solving rationales improves their reasoning capabilities. CoT prompting drastically improves LLM performance on commonsense, arithmetic, and symbolic reasoning benchmarks. [3/8]
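Here's a minimal sketch of what CoT prompting looks like in practice: we insert a few exemplars that contain a step-by-step rationale, then append the new question. The exemplar wording and the (commented-out) complete() call are placeholders for your own data and LLM client.

```python
# Hypothetical few-shot CoT prompt construction; the exemplar and `complete()` are
# placeholders for your own data and LLM client.
COT_EXEMPLARS = [
    {
        "question": (
            "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
            "Each can has 3 tennis balls. How many tennis balls does he have now?"
        ),
        "rationale": "Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11.",
        "answer": "11",
    },
]

def build_cot_prompt(question):
    """Insert step-by-step rationales into the prompt so the LLM imitates them."""
    parts = []
    for ex in COT_EXEMPLARS:
        parts.append(f"Q: {ex['question']}\nA: {ex['rationale']} The answer is {ex['answer']}.")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

prompt = build_cot_prompt("A farm has 15 cows and sells 6 of them. How many cows remain?")
print(prompt)
# answer = complete(prompt)  # call whatever LLM API you use here
```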
Prompt engineering for language models usually involves tweaking the wording or structure of a prompt. But, recent research has explored automated prompt engineering via continuous updates (e.g., via SGD) to a prompt’s embedding. Here’s how these techniques work… 🧵 [1/8]
First, what is a prompt embedding? Given some raw text, a language model will tokenize the text (creating a list of words/sub-words) then look up each token’s associated embedding. The resulting list of token embeddings can also be referred to as a prompt embedding. [2/8]
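Here's a minimal sketch of that lookup using Hugging Face transformers; GPT-2 is just an illustrative choice of model.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# GPT-2 is just an illustrative choice; any language model works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "Translate this sentence to French:"
token_ids = tokenizer(text, return_tensors="pt").input_ids       # text -> token ids
with torch.no_grad():
    prompt_embedding = model.get_input_embeddings()(token_ids)   # ids -> token embeddings

print(token_ids.shape)          # (1, num_tokens)
print(prompt_embedding.shape)   # (1, num_tokens, hidden_size)
```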
AutoPrompt combines the original prompt input with a set of shared “trigger token” embeddings that are selected/trained to improve model performance by using gradient descent. Trigger tokens are shared across all inputs provided to the language model. [3/8]
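AutoPrompt's search is over discrete tokens, but it is still gradient-guided: candidate replacements for a trigger token are ranked by a first-order (HotFlip-style) approximation of how much they would reduce the loss. Here's a rough sketch of that scoring step with dummy tensors; a real implementation obtains the gradient by running the full model and backpropagating the task loss.

```python
import torch

# Dummy tensors; a real implementation gets `grad_wrt_trigger` by backpropagating the
# task loss through the model to the embedding of one trigger-token position.
vocab_size, hidden_size = 50_000, 768
embedding_matrix = torch.randn(vocab_size, hidden_size)   # the model's token embedding table
grad_wrt_trigger = torch.randn(hidden_size)                # dLoss/d(trigger embedding)

# First-order estimate of how the loss changes if we swap the trigger token for each
# vocabulary entry; keep the candidates predicted to lower the loss the most.
approx_loss_change = embedding_matrix @ grad_wrt_trigger
candidate_token_ids = torch.topk(-approx_loss_change, k=10).indices
print(candidate_token_ids)
```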
I’m currently writing a survey/overview of important prompt engineering tactics for my newsletter. Here are my top-5 findings so far… 🧵 [1/7]
1. Start simple
Prompt engineering is an empirical science. We need to start with a simple baseline, then slowly add complexity. Starting with a long, complex prompt wastes tokens and might perform worse than something simple. [2/7]
2. Prompt tracking and versioning
We need a mechanism to track different versions of a prompt over time. We can do this by combining git with tools like Prompt Templates in LangChain. [3/7]
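For example (a rough sketch — note that import paths vary across LangChain versions, and the template text here is made up), keeping prompts as PromptTemplate objects in source files means every wording change shows up as an ordinary git diff:

```python
from langchain.prompts import PromptTemplate

# Hypothetical prompt kept in a version-controlled source file; the template text
# and variables are made up for illustration.
summarize_v2 = PromptTemplate.from_template(
    "You are a helpful assistant.\n"
    "Summarize the following article in {num_sentences} sentences:\n\n{article}"
)

print(summarize_v2.format(num_sentences=3, article="Large language models are..."))
```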
Many different (text-based) transformer architectures exist, but when and where should we use them? Here’s a quick list of four important transformer variants and the best applications to use them for… 🧵 [1/7]
To gain a better understanding of these architectures, please check out the tweet below! In this thread, we will focus on the tasks for which each architecture is most appropriate, rather than the architectures themselves. [2/7]
Decoder-only transformers are used by (large) language models. Why? Because their use of masked self-attention makes them highly compatible with next-token prediction. Each token only pays attention to prior tokens in the sequence! [3/7]
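Here's a tiny illustration of that masking with dummy attention scores: after applying a lower-triangular (causal) mask, each row of the attention matrix only places weight on the current and earlier positions.

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)                  # dummy raw attention scores
causal_mask = torch.tril(torch.ones(seq_len, seq_len))  # lower-triangular mask

# Positions may not attend to later tokens, so those scores become -inf
# and receive zero weight after the softmax.
scores = scores.masked_fill(causal_mask == 0, float("-inf"))
attention_weights = torch.softmax(scores, dim=-1)
print(attention_weights)  # upper triangle is all zeros
```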
Large Language Models (LLMs) are notoriously bad at solving reasoning-based tasks. However, we can drastically improve their reasoning performance using simple techniques that require no fine-tuning or task-specific verifiers. Here’s how… 🧵 [1/7]
The technique is called chain-of-thought (CoT) prompting. It improves the reasoning abilities of LLMs using few-shot learning. In particular, CoT prompting inserts several examples of “chains of thought” for solving a reasoning problem into the LLM’s prompt. [2/7]
Here, a chain of thought is defined as “a coherent series of intermediate reasoning steps that lead to the final answer for a problem”. A CoT mimics how we solve reasoning problems as humans -- by breaking the problem down into intermediate steps that are easier to solve. [3/7]
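Concretely, here is the difference between a standard few-shot exemplar and a CoT exemplar; the question and rationale wording below are illustrative.

```python
# Illustrative wording: the same question, with and without a chain of thought.
STANDARD_EXEMPLAR = (
    "Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A: The answer is 9."
)

COT_EXEMPLAR = (
    "Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. "
    "How many apples do they have?\n"
    "A: The cafeteria started with 23 apples. After using 20, 23 - 20 = 3 remain. "
    "Buying 6 more gives 3 + 6 = 9. The answer is 9."
)

print(COT_EXEMPLAR)
```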
Can large language models (LLMs) train themselves? Recent research indicates that the answer might be yes… 🧵 [1/7]
But, what exactly do we mean by this? One notable method of using LLMs to train other LLMs involves using these models to generate data for instruction tuning. Typically, a larger, more powerful model is used for generation. [2/7]
This technique was pioneered by the self-instruct framework. Beginning with a small set of initial tasks (including one instruction and one input-output example per task), self-instruct uses LLMs to generate more data for instruction tuning. [3/7]
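Here's a minimal sketch of that loop: seed tasks get formatted into a meta-prompt, a larger teacher LLM proposes new instruction/input/output triples, and the (filtered) results grow the pool. The seed task, complete() call, and filter_low_quality() helper are all hypothetical placeholders.

```python
import json
import random

# Hypothetical seed pool; `complete()` and `filter_low_quality()` are placeholders
# for your LLM client and quality/novelty filters.
seed_tasks = [
    {
        "instruction": "Classify the sentiment of the given review as positive or negative.",
        "input": "The battery dies within an hour.",
        "output": "negative",
    },
]

def build_meta_prompt(tasks, num_new=3):
    """Show a few existing tasks, then ask the teacher LLM for new ones in the same format."""
    shown = "\n\n".join(json.dumps(t, indent=2) for t in random.sample(tasks, k=min(3, len(tasks))))
    return (
        "Here are example tasks, each with an instruction, an input, and an output:\n\n"
        f"{shown}\n\n"
        f"Write {num_new} new, diverse tasks in the same JSON format."
    )

meta_prompt = build_meta_prompt(seed_tasks)
print(meta_prompt)
# new_tasks = json.loads(complete(meta_prompt))        # generate with the larger teacher model
# seed_tasks.extend(filter_low_quality(new_tasks))     # filter, then grow the task pool
```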