A simple overview of the state of massive language models like GPT-3.

/thread
Since 2018, each year has brought new models that are typically 10x+ larger than the models of the year before (a quick sanity check follows the list).
2018:
GPT-1 | 110M Parameters
BERT | 340M Parameters

2019:
GPT-2 | 1.5B Parameters
Megatron | 8.3B Parameters

2020:
Turing-NLG | 17B Parameters
GPT-3 | 175B Parameters

2021:
Switch Transformer (Google) | 1.6T Parameters
(Switch is a sparse Mixture-of-Experts model, so only a fraction of those parameters is active for any given token — its count isn't directly comparable to dense models like GPT-3.)
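As a rough check on the "10x+ per year" claim, here's a toy calculation using the largest model listed for each year (parameter counts as above; again, Switch's sparse count overstates the comparison):

```python
# Year-over-year growth of the largest listed model's parameter count.
largest = {2018: 340e6, 2019: 8.3e9, 2020: 175e9, 2021: 1.6e12}
for year in range(2019, 2022):
    growth = largest[year] / largest[year - 1]
    print(f"{year - 1} -> {year}: {growth:.0f}x")
# 2018 -> 2019: 24x
# 2019 -> 2020: 21x
# 2020 -> 2021: 9x
```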

What is coming next?
We’re at a point where these models are capable enough to perform many tasks. Optimization now becomes just as important as scaling up further.
Techniques like Mixture of Experts, PPLM (plug-and-play language models), distillation, and random feature attention are all being actively researched.

These will cut both costs and compute requirements, and improve the control developers have over large language models.
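To make the Mixture of Experts idea concrete, here's a minimal toy sketch in PyTorch. It is not any production model's actual design — the dimensions, expert count, and top-1 routing (similar in spirit to Switch Transformer) are illustrative assumptions:

```python
# Toy Mixture of Experts layer: a gating network routes each token to one
# expert, so only a fraction of the total parameters runs per input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    def __init__(self, d_model=512, d_hidden=2048, n_experts=4):
        super().__init__()
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )
        # The gate scores the experts for each token.
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)   # (n_tokens, n_experts)
        top_score, top_idx = scores.max(dim=-1)    # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # Scale by the gate score so the routing stays differentiable.
                out[mask] = top_score[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = MixtureOfExperts()
tokens = torch.randn(10, 512)
print(layer(tokens).shape)  # torch.Size([10, 512])
```

The payoff: total parameters grow with the number of experts, but per-token compute stays roughly constant — which is how Switch reaches 1.6T parameters without a proportional compute bill.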
The largest models (GPT-3, Turing-NLG, etc.) already hold a great deal of knowledge and capability. The question is: how do we retrieve that knowledge more effectively, reliably, and systematically?

As answers to this question become clearer, language models will become more useful.
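One concrete retrieval technique is prompt design. A hedged sketch of few-shot prompting — the Q&A examples are made up for illustration, and the client call reflects the OpenAI Python API as it stood around 2021 (engine names and library details may differ):

```python
# Few-shot prompting: show the model the pattern you want, then let it
# complete the next instance. Reads the API key from OPENAI_API_KEY.
import openai  # pip install openai

prompt = """Q: Who wrote Pride and Prejudice?
A: Jane Austen

Q: In what year did Apollo 11 land on the Moon?
A: 1969

Q: Who developed the theory of general relativity?
A:"""

response = openai.Completion.create(
    engine="davinci",   # engine name is an assumption for this sketch
    prompt=prompt,
    max_tokens=16,
    temperature=0.0,    # low temperature for factual recall
    stop="\n",          # stop at the end of the answer line
)
print(response.choices[0].text.strip())  # e.g. "Albert Einstein"
```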
In this paper from OpenAI: cdn.openai.com/papers/ai_and_…

“We argue that algorithmic progress has an aspect that is both straightforward to measure and interesting: reductions over time in the compute needed to reach past capabilities.”
We’re seeing algorithmic efficiency doubling every 16 months.

By the end of 2021, it will cost around half of what it cost in early 2020 to train a GPT-3-sized model.
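A back-of-the-envelope check of that claim, assuming roughly 22 months between early 2020 and the end of 2021 and the 16-month doubling rate above:

```python
# Efficiency doubling every 16 months means the compute (and thus cost)
# to reach a fixed capability halves every 16 months.
months = 22            # early 2020 -> end of 2021, approximately
doubling_period = 16   # months per 2x efficiency gain
cost_factor = 0.5 ** (months / doubling_period)
print(f"{cost_factor:.2f}")  # ~0.39 -- around half or less of the 2020 cost
```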
Hundreds of products are being built on top of language models, including HyperWrite (hyperwrite.ai), @OthersideAI’s AI writing companion.

@OpenAI’s customers are generating billions of words each day with GPT-3.

@Microsoft is even integrating GPT-3 into its Power Apps platform.
Massive amounts of capital are being invested in this space.

@OpenAI just announced a $100M fund for startups using their API.

@AnthropicAI announced a $124M raise to fund research into large models.
This is just the start. Language is powerful on its own, but when you combine it with other modalities, you get even more capable models.

Imagine a model that is trained on both text and video. This is coming, and soon.

Multi-modal models.
If you are interested in following along as these models progress, here are some accounts to follow:

@OpenAI
@MSFTResearch
@GoogleAI
@huggingface
@allen_ai

Drop me a follow for more content like this!

