Large Language Models (LLMs) are notoriously bad at solving reasoning-based tasks. However, we can drastically improve their reasoning performance using simple techniques that require no fine-tuning or task-specific verifiers. Here’s how… 🧵 [1/7]
The technique is called chain-of-thought (CoT) prompting. It improves the reasoning abilities of LLMs using few-shot learning. In particular, CoT prompting inserts several examples of “chains of thought” for solving a reasoning problem into the LLM’s prompt. [2/7]
Here, a chain of thought is defined as “a coherent series of intermediate reasoning steps that lead to the final answer for a problem”. A CoT mimics how we solve reasoning problems as humans -- by breaking the problem down into intermediate steps that are easier to solve. [3/7]
Prior techniques teach LLMs how to generate coherent chains of thought via fine-tuning. Although this improves reasoning performance, such an approach requires an annotated dataset of reasoning problems with an associated CoT, which is burdensome and expensive to create. [4/7]
CoT prompting combines the idea of using chains of thought to improve reasoning performance with the few-shot learning abilities of LLMs. We can teach LLMs to generate a coherent CoT with their solution by just providing exemplars as part of their prompt. [5/7]
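To make this concrete, here’s a minimal sketch of what a CoT prompt might look like. The exemplar is illustrative (written in the style of the paper’s arithmetic examples), and the helper function is just one possible way to assemble the prompt:

```python
# Minimal sketch of a chain-of-thought prompt. The exemplar below is
# illustrative; any few-shot example that spells out intermediate
# reasoning steps works the same way.

COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend one (or more) CoT exemplars to the target question."""
    return f"{COT_EXEMPLAR}\nQ: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
# `prompt` is then passed to any pre-trained LLM; no fine-tuning required.
```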
Such an approach massively improves LLM performance on tasks like arithmetic, commonsense, and symbolic reasoning. Plus, it requires minimal data curation (i.e., just a few examples for the prompt) and no fine-tuning of the LLM. [6/7]
Put simply, CoT prompting is a lightweight technique that can be applied to any pre-trained LLM checkpoint to improve reasoning performance. See the overview below for more details. [7/7]
Can large language models (LLMs) train themselves? Recent research indicates that the answer might be yes… 🧵 [1/7]
But what exactly do we mean by this? One notable approach uses LLMs to generate data for instruction tuning of other LLMs. Typically, a larger, more powerful model is used for generation. [2/7]
This technique was pioneered by the self-instruct framework. Beginning with a small set of initial tasks (including one instruction and one input-output example per task), self-instruct uses LLMs to generate more data for instruction tuning. [3/7]
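At a high level, the loop looks something like the sketch below. Everything here is a simplified approximation of the framework: `llm` stands in for a call to a strong generator model, and the real pipeline also filters out duplicates and low-quality generations:

```python
import random

# Hypothetical sketch of the self-instruct loop; `llm` stands in for any
# call to a strong LLM (e.g., an API call), not a real library function.

def self_instruct(seed_tasks: list[dict], llm, num_rounds: int = 100) -> list[dict]:
    """Grow an instruction-tuning dataset from a small pool of seed tasks."""
    pool = list(seed_tasks)  # each task: {"instruction": ..., "example": ...}
    for _ in range(num_rounds):
        # 1. Sample a few existing tasks as in-context exemplars.
        exemplars = random.sample(pool, k=min(4, len(pool)))
        # 2. Ask the LLM to propose a brand-new instruction.
        new_instruction = llm(
            f"Here are some tasks: {exemplars}\nWrite a new, different task:"
        )
        # 3. Ask the LLM to produce an input-output example for it.
        new_example = llm(
            f"Instruction: {new_instruction}\nGive one input and its output:"
        )
        # 4. (The real framework filters near-duplicate and low-quality
        #    generations before adding them back to the pool.)
        pool.append({"instruction": new_instruction, "example": new_example})
    return pool
```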
Nearly all recently-proposed large language models (LLMs) are based upon the decoder-only transformer architecture. But, is this always the best architecture to use? It depends… 🧵 [1/8]
First of all, what is a decoder-only architecture? Well, it is exactly what it sounds like: a transformer architecture with the encoder removed. See the tweet below for more details. [2/8]
Decoder-only architectures use masked self-attention in each of their layers, meaning that each token considers only preceding tokens during the computation of self-attention. [3/8]
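For intuition, here’s a minimal (single-head, unbatched) sketch of masked self-attention in PyTorch. Production implementations are batched, multi-headed, and heavily optimized, but the causal mask works the same way:

```python
import torch
import torch.nn.functional as F

# Minimal sketch of masked (causal) self-attention for a single head.
# q, k, v: [seq_len, d] tensors; real implementations are batched/multi-head.
def causal_self_attention(q, k, v):
    seq_len, d = q.shape
    scores = q @ k.T / d**0.5  # [seq_len, seq_len] attention scores
    # Upper-triangular mask: position i may not attend to positions j > i.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    # Each output row is a weighted sum over tokens at or before position i.
    return F.softmax(scores, dim=-1) @ v
```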
The new Stable Diffusion XL (SDXL) model is amazing, but I think there is considerable work to be done before prompt-based, generative image models reach their true potential. In particular, we need to fix one major problem… 🧵 [1/7]
Most of the text generated in images from SDXL is still garbled and often illegible! See, for example, the images generated from different prompts below. The text present in each image is mostly gibberish. [2/7]
At first, this may seem like a pretty small issue, and in some ways it is! SDXL is an incredible model that can produce a variety of useful outputs and understand intricate semantic details of textual prompts. [3/7]
Following the release of LLaMA, we saw a rapid explosion of open-source research on large language models (LLMs). Here are the three most notable model releases during this time… 🧵 [1/8]
1. Alpaca
Alpaca is a fine-tuned version of the LLaMA-7B LLM that performs similarly to OpenAI’s text-davinci-003 (i.e., GPT-3.5). It is created via instruction fine-tuning on data generated with the self-instruct framework. [2/8]
Alpaca is trained for less than $600 (including both data collection and the compute cost of fine-tuning) and is found to roughly match the performance of GPT-3.5. Believe it or not, other LLaMA-based LLMs (following Alpaca) are created for even less than this! [3/8]
“Chain of thought prompting can improve performance on various reasoning tasks... the benefits of chain of thought prompting only materialize with a sufficient number of model parameters (around 100B).”
Large language models (LLMs) are poor at solving basic reasoning tasks. We can improve this ability with chain-of-thought (CoT) prompting, which simply breaks a reasoning task into a multi-step process (i.e., chain-of-thought) within the LLM's prompt. [2/4]
CoT prompting is a generic idea that can be applied to many different reasoning tasks. Although different tasks may require some prompt engineering, we always follow the generic approach of injecting a chain-of-thought into the LLM's prompt. [3/4]
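To illustrate this genericity, here’s a small sketch in which the same prompt template is reused across tasks simply by swapping in task-appropriate exemplars (the symbolic-reasoning exemplar below is illustrative):

```python
# Hedged sketch of the generic recipe: one prompt template works for any
# reasoning task once we supply task-appropriate CoT exemplars.

def cot_prompt(exemplars: list[tuple[str, str]], question: str) -> str:
    """Assemble a few-shot CoT prompt from (question, reasoning) pairs."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    return f"{shots}\n\nQ: {question}\nA:"

# Illustrative exemplar for a symbolic-reasoning task (last-letter concatenation).
symbolic_shots = [(
    'Take the last letters of the words in "Elon Musk" and concatenate them.',
    'The last letter of "Elon" is "n". The last letter of "Musk" is "k". '
    'Concatenating them gives "nk". The answer is nk.',
)]

print(cot_prompt(
    symbolic_shots,
    'Take the last letters of the words in "Bill Gates" and concatenate them.',
))
```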
As Large Language Models (LLMs) improve in quality, evaluating them becomes more difficult. Recent models are so good that even humans struggle to discern differences in quality. Luckily, we can just create an automated evaluation framework using GPT-4! 🧵 [1/6]
This technique was pioneered by the recent Vicuna model, which is a version of LLaMA-13B that has undergone supervised fine-tuning (SFT) over a set of 70K instruction-following examples from ShareGPT. [2/6]
To perform evaluation, the authors of Vicuna devise eight question categories and have GPT-4 generate ten benchmark questions per category. Surprisingly, GPT-4 is capable (with proper prompt engineering) of generating challenging questions that many LLMs struggle to answer. [3/6]
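A minimal sketch of the LLM-as-a-judge idea is shown below, using the OpenAI Python SDK. The judge prompt wording here is illustrative, not the exact rubric used by the Vicuna authors:

```python
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 to compare two assistant answers to the same question."""
    # Illustrative judge prompt -- not the exact wording from the Vicuna eval.
    prompt = (
        f"Question: {question}\n\n"
        f"Assistant A's answer: {answer_a}\n\n"
        f"Assistant B's answer: {answer_b}\n\n"
        "Rate each answer on a scale of 1-10 for helpfulness and accuracy, "
        "then explain which answer is better."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```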