Got my invite to the @OpenAI GPT-3 API from @gdb. I actually think it deserves more hype than it’s getting, but not necessarily for the magical reasons Twitter touts. Why? My quick thoughts and impressions: (1/11)
First, let me summarize the API documentation. A user has access to 4 models (varying sizes). They cannot fine-tune the models. There’s basically only one function: the user can input some text (“priming”) and the model will predict the next several tokens (words). (2/11)
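Roughly, a call looks something like this — a minimal sketch only, assuming the early openai Python client; the engine name "davinci" and the parameter values are placeholders I've filled in, not from the docs quoted here:

```python
# Minimal sketch of a completion request (assumptions: early openai Python
# client, engine name "davinci", illustrative parameter values).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    engine="davinci",            # one of the four models of varying size
    prompt="Once upon a time",   # the "priming" text
    max_tokens=32,               # how many tokens to predict
    temperature=0.7,             # sampling randomness
)
print(response["choices"][0]["text"])
```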
When I first played with the API, I was stumped. Why couldn’t I replicate the stellar Twitter demos? Was everyone just sharing cherry-picked text samples? I wanted the model to identify basic patterns in some unstructured data, but it gave garbage when I input just the data. (3/11)
@notsleepingturk suggested I reformat my inputs as tuples of (unstructured data, example pattern, indicator if the pattern exists). The model could then easily “autocomplete” tuples with missing indicators. Damn. Priming is obviously an art. (4/11)
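Something like the sketch below (every string, pattern, and label here is made up for illustration; the exact format is not from the thread):

```python
# Hypothetical priming format: (unstructured data, example pattern, indicator).
# All example data below is invented for illustration.
examples = [
    ("Order #1234 shipped on 2020-07-01", "contains a date", "yes"),
    ("Thanks for reaching out!",          "contains a date", "no"),
    ("Invoice due 2020-08-15",            "contains a date", "yes"),
]

# The query tuple is left without an indicator; the model "autocompletes" it.
query = ("Your package left the warehouse", "contains a date")

prompt = "\n".join(f"({d}, {p}, {label})" for d, p, label in examples)
prompt += f"\n({query[0]}, {query[1]},"
print(prompt)
```

The point isn't this exact format — it's that the prompt shows the model a few completed tuples, so the continuation it predicts for the last, incomplete one carries the answer.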
So why is GPT-3 so hype? It’s amazingly powerful *if* you know how to prime the model well. It’s going to change the ML paradigm — instead of constructing giant train sets for models, we’ll be crafting a few examples for models to do “few-shot” extrapolation from. (5/11)
@sharifshameem cracked the skill of priming in his demos. We don’t see what he prepends the demo input with before sending it to the API. Figuring out how to prime models properly will be the key to successfully utilizing language models in the future. (6/11)
Let’s deep-dive into how, in theory, one can be a ⭐️ primer. The model’s goal is to maximize the log-likelihood of successive tokens given the primed input. In English, this ends up being: find similar patterns in the training set, and output similar successive tokens. (7/11)
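In symbols (this is just the standard autoregressive factorization, not something spelled out in the thread): given the primed input $c$, a continuation $x_1, \dots, x_T$ is scored as

$$\log p(x_{1:T} \mid c) = \sum_{t=1}^{T} \log p_\theta(x_t \mid c, x_{<t}),$$

so the model keeps picking tokens that make this sum large — i.e., tokens that look like continuations of patterns it saw during training.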
GPT-3 is a great example of the “garbage-in-garbage-out” principle. If you prime poorly, you get shitty results. But since the models are probably trained on basically every piece of data on the Internet, chances are if you prime well, you’ll get intelligent outputs. (8/11)
I like to think of these language models as “children with infinite memory.” Children’s skills are not all that refined, but they have basic pattern-matching skills. Coupled with a superpower to memorize the entire world, well, couldn’t they be extremely useful? (9/11)
What else is so hype? The API’s best model is 350 GB. Serving this monstrosity efficiently and cheaply is an entirely new software problem for the industry. If @OpenAI cracks this, they can become the AWS of modeling. (10/11)
TLDR, if this takes off:

1) Expect the next generation of good ML practitioners to be way more creative. It’s taking me a while to wrap my head around how to prime this model to get cool demos, lol.
2) Startups will move away from training their own in-house models. (11/11)