We’re at a point where these models are capable enough to perform many tasks. Optimization now becomes just as important as scaling up further.
Techniques like Mixture of Experts, PPLM, distillation, and random feature attention are all being actively researched.
These will reduce both costs and compute needs, while giving developers more control over large language models.
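To make one of these techniques concrete, here is a minimal sketch of the core of knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution. The function names and the two-class logits below are illustrative, not from any specific library.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution,
    optionally softened by a temperature > 1."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the
    student's, scaled by T^2 (the standard distillation scaling)."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2
```

Minimizing this loss pushes a small, cheap student model toward the behavior of a large teacher, which is one way the cost reductions above get realized in practice.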
The largest models (GPT-3, Turing-NLG, etc.) already have lots of knowledge and capabilities. The question is, how do we more effectively, reliably, and systematically retrieve that knowledge?
As answers to this question become clearer, language models will become more useful.
“We argue that algorithmic progress has an aspect that is both straightforward to measure and interesting: reductions over time in the compute needed to reach past capabilities.”
We’re seeing algorithmic efficiency doubling every 16 months.
By the end of 2021, training a GPT-3-sized model should cost around half of what it cost in early 2020.
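The cost claim follows directly from the doubling rate. A rough sketch of the math, assuming algorithmic efficiency doubles (so cost halves) every 16 months:

```python
def cost_multiplier(months_elapsed, doubling_period=16):
    """Relative training cost after `months_elapsed` months, assuming
    the compute needed for a fixed capability halves every
    `doubling_period` months."""
    return 0.5 ** (months_elapsed / doubling_period)

# After one doubling period (16 months), cost is half.
# After two periods (32 months), cost is a quarter.
```

This is an extrapolation of a measured trend, not a guarantee, but it shows why training costs for a fixed model size fall so quickly.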
Hundreds of products are being built on top of language models, including hyperwrite.ai, @OthersideAI’s AI writing companion.
@OpenAI’s customers are generating billions of words each day with GPT-3.
@Microsoft is even integrating GPT-3 into its Power Apps platform.
Massive amounts of capital are being invested in this space.
@OpenAI just announced a $100M fund for startups using their API.
@AnthropicAI announced a $124M raise to fund research into large models.
This is just the start. Language is powerful on its own, but when you begin to combine language with other modalities, you get even more powerful and capable models.
Imagine a model that is trained on both text and video. This is coming, and soon.
Multi-modal models.
If you are interested in following along as these models progress, here are some accounts to follow:
Working with GPT-3 is just a game of figuring out how to structure text to get the results you want.
Here are some methods that work well.
Some of these methods can be used together. There’s an art to figuring out which methods are best for obtaining the results you want.
You can use zero-shot, one-shot, or few-shot methods, depending on the task. Your goal should typically be to zero-shot or one-shot, as latency and costs will be lower.
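The difference between these methods is just how the prompt text is assembled. A minimal sketch (the helper function and the sentiment task are illustrative, not part of any API):

```python
def build_prompt(instruction, examples=None, query=""):
    """Assemble a completion-style prompt: an instruction, optional
    worked examples (few-shot), then the new input to complete."""
    parts = [instruction.strip()]
    for inp, out in (examples or []):
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Zero-shot: rely on the instruction alone (lowest latency and cost,
# since the prompt contains no examples).
zero_shot = build_prompt(
    "Classify the sentiment as Positive or Negative.",
    query="I loved this movie!",
)

# Few-shot: prepend demonstrations when the task needs more guidance.
few_shot = build_prompt(
    "Classify the sentiment as Positive or Negative.",
    examples=[("The plot dragged on forever.", "Negative"),
              ("A delightful surprise!", "Positive")],
    query="I loved this movie!",
)
```

Every example you add lengthens the prompt, which is why zero-shot or one-shot is the cheaper, faster default when it works.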