Andrej Karpathy
Jan 24, 2023 · 11 tweets
The hottest new programming language is English
This tweet went wide, thought I'd post some of the recent supporting articles that inspired it.
1/ The GPT-3 paper showed that LLMs perform in-context learning, and can be "programmed" inside the prompt with input:output examples to perform diverse tasks. arxiv.org/abs/2005.14165
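A minimal sketch of what such an in-context "program" looks like (the translation task mirrors the paper's well-known example; the exact prompt string here is illustrative):

# The "program" is nothing but a few input:output examples in the prompt.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
# A completion model is expected to continue with "fromage": the task was
# specified entirely in-context, with no gradient updates to the weights.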
2/ These two [1] arxiv.org/abs/2205.11916 , [2] arxiv.org/abs/2211.01910 are good examples showing that the prompt can further program the "solution strategy", and that with a good enough design of it, much more complex multi-step reasoning tasks become possible.
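For instance, [1] shows that appending a single trigger phrase programs a step-by-step solution strategy (a sketch; the trigger phrase is from that paper, the surrounding format is illustrative):

question = "A juggler can juggle 16 balls. Half of the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?"
# Plain zero-shot prompt: the model often answers tersely and incorrectly.
prompt_plain = f"Q: {question}\nA:"
# Zero-shot chain-of-thought: one extra sentence programs a multi-step solution strategy.
prompt_cot = f"Q: {question}\nA: Let's think step by step."

The second paper [2] goes further and searches over such instruction phrases automatically.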
3/ These two articles/papers:
[1] evjang.com/2021/10/23/gen…
[2] arxiv.org/abs/2106.01345
A bit more technical, but the TLDR is that good prompts include the desired/aspirational performance. GPTs don't "want" to succeed. They want to imitate. You want to succeed, and you have to ask for it.
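A toy illustration of "asking for it" (both prompts are hypothetical, not taken from either reference):

# Imitation default: the completion matches the average quality of similar documents on the internet.
prompt_default = "Explain how a hash map handles collisions."
# Conditioning on high performance: explicitly describe the competence you want imitated.
prompt_conditioned = (
    "You are an expert systems programmer writing reference documentation. "
    "Give a precise, correct explanation of how a hash map handles collisions, "
    "covering separate chaining and open addressing."
)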
4/ Building A Virtual Machine inside ChatGPT engraved.blog/building-a-vir…
Here we start getting into specifics of "programming" in English. Take a look at the rules and input/output specifications declared in English, conditioning the GPT into a particular kind of role. Read in full.
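The flavor of that post, roughly (a paraphrased sketch of the kind of role/spec prompt it uses, not the exact text):

# Paraphrased sketch of a "VM inside ChatGPT"-style prompt: rules, I/O format and role
# are all declared in English, and the model is conditioned to behave like that machine.
vm_prompt = (
    "I want you to act as a Linux terminal. I will type commands and you will reply "
    "with what the terminal should show, inside a single code block, and nothing else. "
    "Do not write explanations. Do not type commands unless I instruct you to."
)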
5/ "ChatGPT in an iOS Shortcut — Worlds Smartest HomeKit Voice Assistant" matemarschalko.medium.com/chatgpt-in-an-…
This voice assistant is significantly more capable and personalized than your regular Siri/Alexa/etc., and it was programmed in English.
6/ "GPT is all you need for the backend" github.com/TheAppleTucker…
Tired: use an LLM to help you write a backend
Wired: LLM is the backend
Inspiring project from a recent Scale hackathon. The LLM backend takes state as JSON blob and modifies it based on... English description.
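A minimal sketch of that pattern (hypothetical prompt format and state; the actual repo may phrase things differently):

import json

state = {"todos": [{"id": 1, "text": "buy milk", "done": False}]}
request = "mark todo 1 as done and add a todo to call mom"

# The entire "backend" is one prompt: current state as JSON plus an English description of the change.
backend_prompt = (
    "You are the backend of a todo app. Given the current state as JSON and a request, "
    "reply with ONLY the new state as valid JSON.\n"
    f"State: {json.dumps(state)}\n"
    f"Request: {request}\n"
    "New state:"
)
# The LLM's completion is parsed with json.loads() and becomes the new application state.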
7/ The prompt allegedly used by Bing chat, potentially spilled by a prompt injection attack. The important point for our purposes is that the identity is constructed and programmed in English, by laying out who it is, what it knows/doesn't know, and how to act.
8/ These examples illustrate how prompts 1: matter and 2: are not trivial, and why today it makes sense to be a "prompt engineer" (e.g. @goodside). I also like to think of this role as a kind of LLM psychologist.
9/ Pulling in one more relevant tweet of mine from a while ago. GPTs run natural language programs by completing the document.
This is not an exhaustive list (people can add more in replies), but these are at least some of the articles I saw recently that stood out.

It's still early days but this new programming paradigm has the potential to expand the number of programmers to ~1.5B people.

• • •

More from @karpathy

Feb 20
New (2h13m 😅) lecture: "Let's build the GPT Tokenizer"

Tokenizers are a completely separate stage of the LLM pipeline: they have their own training set, training algorithm (Byte Pair Encoding), and after training implement two functions: encode() from strings to tokens, and decode() back from tokens to strings. In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI.
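A minimal sketch of the idea: BPE training repeatedly merges the most frequent adjacent pair of tokens (illustrative toy code, not the lecture's implementation):

from collections import Counter

def train_bpe(ids, num_merges):
    # Toy BPE training on a list of byte ids: repeatedly merge the most frequent adjacent pair.
    merges = {}  # (id1, id2) -> new token id
    for i in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair = pairs.most_common(1)[0][0]
        new_id = 256 + i  # ids 0..255 are the raw bytes
        merges[pair] = new_id
        out, j = [], 0
        while j < len(ids):  # replace every occurrence of the pair with the new token
            if j < len(ids) - 1 and (ids[j], ids[j + 1]) == pair:
                out.append(new_id); j += 2
            else:
                out.append(ids[j]); j += 1
        ids = out
    return merges

merges = train_bpe(list("aaabdaaabac".encode("utf-8")), num_merges=3)
# encode() applies these learned merges to new strings; decode() inverts the vocabulary back to bytes.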
We will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.
Also, releasing new repository on GitHub: minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.


In the video we essentially build minbpe from scratch.
Don't miss the exercise.md to build your own.
github.com/karpathy/minbpe
Dec 27, 2023
"Man-Computer Symbiosis" by Licklider, 1960

I love reading technology prediction documents because the benefit of hindsight is training data. Here, 64 years ago, Licklider imagines computing as fundamentally an intelligence amplification tool. groups.csail.mit.edu/medg/people/ps…
Licklider argues that the period of "intelligence augmentation" (IA) may be transient on the path to full automation (AI), but still long enough to be worth thinking through and about.
His citations for what must have felt like rapid progress in both narrow AI and AGI (of that age, i.e. the "general problem solver" [20]) are today known to be false starts that were off track in a quite fundamental way: they were based on a manual process of encoding knowledge with predicate logic, and on using production rules of logic and search to manipulate it into conclusions. Today, most of AI is aware of all of this work only as a historical curiosity; it is not part of the "master branch" of the field, it is stuck in a dead-end feature branch. And notably, what is considered the most promising approach today (LLMs) was at that time not only completely computationally inaccessible, but also impossible due to the lack of trillions of tokens of training data in digitized form. (What might be an equivalent of that today?)
The study by the Air Force, estimating that machines alone would be doing problem solving of military significance in 20 years' time, evokes a snicker today. Amusingly, "20 years away" seems to be a kind of codeword for "no idea, long time". Arguably, I'm not sure we are there even today, 64 years later. Computers do a lot to increase situational awareness, but decision making of "military significance" afaik is still well within the domain of human computation.
An interesting observation from Licklider is that most of his "thinking" in a day-to-day computational task thought experiment is not so much thinking, but more a rote, mechanical, automatable data collection and visualization. It is this observation that leads him to conclude that the strengths and weaknesses of humans and computers are complementary: computers can do the busy work, and humans can do the thinking work. This has been the prevailing paradigm for the following 64 years, and it's only very recently (last ~year) that computers have started to make a dent into "thinking" in a general, scalable, and economy-impacting way. Not in an explicit, hard, predicate-logic way, but in an implicit, soft, statistical way. Hence the LLM-driven AI summer of today.
Apr 2, 2023
Next frontier of prompt engineering imo: "AutoGPTs". 1 GPT call is just like 1 instruction on a computer. They can be strung together into programs. Use the prompt to define I/O device and tool specs, define the cognitive loop, page data in and out of the context window, .run().
An interesting, non-obvious note on GPT psychology is that, unlike people, they are completely unaware of their own strengths and limitations. E.g. that they have a finite context window. That they can just barely do mental math. That samples can get unlucky and go off the rails. Etc.
(so I'd expect the good prompts to explicitly address things like this)
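A rough sketch of what such a cognitive loop might look like (hypothetical structure and function names, not any particular AutoGPT implementation; note the limitations spelled out in the prompt, per the point above):

SYSTEM = (
    "You are an agent with a small context window and weak mental math. "
    "Tools: search(query), calculator(expr), done(answer). "
    "Each turn, reply with exactly one tool call."
)

def parse_tool_call(action):
    # Hypothetical format: "calculator(2+2)" -> ("calculator", "2+2")
    name, _, rest = action.partition("(")
    return name.strip(), rest.rstrip(")")

def run(goal, llm, tools, max_steps=10):
    # llm: callable prompt -> completion; tools: dict of name -> callable (both supplied by the caller).
    scratchpad = []  # working memory, paged in and out of the context window as it grows
    for _ in range(max_steps):
        prompt = SYSTEM + "\nGoal: " + goal + "\n" + "\n".join(scratchpad[-20:])
        action = llm(prompt)                 # one GPT call == one "instruction"
        name, arg = parse_tool_call(action)
        if name == "done":
            return arg
        scratchpad.append(f"{action} -> {tools[name](arg)}")
    return None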
Mar 6, 2023
More good read/discussion on psychology of LLMs. I don't follow in full but imo it is barking up the right tree w.r.t. a framework for analysis. lesswrong.com/posts/D7PumeYT…
A pretrained LLM is not an AI but a simulator, described by a statistical physics based on internet webpages. The system evolves given any initial conditions (prompt). To gather logprob, it internally maintains a probability distribution over what kind of document it is completing.
In particular, "good, aligned, conversational AI" is just one of many possible different rollouts. Finetuning / alignment tries to "collapse" and control the entropy to that region of the simulator. Jailbreak prompts try to knock the state into other logprob ravines.
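One way to make "a probability distribution over what kind of document it is completing" concrete is a toy mixture model (purely illustrative; real LLMs have no explicit latent document type):

# Toy "simulator" view: the next-token distribution is a mixture over latent document types,
# and the prompt shifts the posterior over which type is being completed.
priors = {
    "aligned_assistant_transcript": 0.2,  # prior probability of this document type
    "internet_flamewar": 0.8,
}

def posterior_after_prompt(priors, likelihoods):
    # likelihoods[d]: how well the prompt matches documents of type d
    weights = {d: priors[d] * likelihoods[d] for d in priors}
    z = sum(weights.values())
    return {d: w / z for d, w in weights.items()}

# A prompt that reads like an aligned assistant transcript concentrates the posterior on that
# rollout; a jailbreak prompt tries to knock the state into a different "logprob ravine".
print(posterior_after_prompt(priors, {"aligned_assistant_transcript": 0.95, "internet_flamewar": 0.05}))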
Jan 17, 2023
🔥 New (1h56m) video lecture: "Let's build GPT: from scratch, in code, spelled out."

We build and train a Transformer following the "Attention Is All You Need" paper in the language modeling setting and end up with the core of nanoGPT.
The first ~1 hour is 1) establishing a baseline (bigram) language model, and 2) introducing the core "attention" mechanism at the heart of the Transformer as a kind of communication / message passing between nodes in a directed graph.
The second ~1 hour builds up the Transformer: multi-headed self-attention, MLP, residual connections, layernorms. Then we train one and compare it to OpenAI's GPT-3 (spoiler: ours is around ~10K - 1M times smaller but the ~same neural net) and ChatGPT (i.e. ours is pretraining only).
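A compact sketch of that "message passing" view for a single causal self-attention head (illustrative PyTorch code, not the lecture's exact implementation):

import torch
import torch.nn.functional as F

def self_attention_head(x, Wq, Wk, Wv):
    # x: (T, C) token embeddings. Each node (token) emits a query, key and value;
    # edges only point from past tokens to the present one (causal masking).
    T, C = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv                      # (T, H) each
    scores = q @ k.T / k.shape[-1] ** 0.5                 # (T, T) node-to-node affinities
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))     # no messages from the future
    weights = F.softmax(scores, dim=-1)                   # how much each node listens to each other node
    return weights @ v                                    # aggregate the incoming messages (values)

T, C, H = 8, 32, 16
x = torch.randn(T, C)
out = self_attention_head(x, torch.randn(C, H), torch.randn(C, H), torch.randn(C, H))  # (8, 16)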
Jan 11, 2023
Didn't tweet nanoGPT yet (quietly getting it to good shape) but it's trending on HN so here it is :) :
github.com/karpathy/nanoG…
Aspires to be the simplest, fastest repo for training/finetuning medium-sized GPTs. So far confirmed it reproduces GPT-2 (124M). Two simple files of ~300 lines.
Rough example: a decent GPT-2 (124M) pre-training reproduction would be 1 node of 8x A100 40GB for 32 hours, processing 8 GPUs * 16 batch size * 1024 block size * 500K iters = ~65B tokens. I suspect this wall clock can still be improved ~2-3X+ without getting too exotic.
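That token budget checks out; a quick back-of-the-envelope in code:

# Back-of-the-envelope check of the ~65B token figure above.
gpus, batch_size, block_size, iters = 8, 16, 1024, 500_000
tokens_per_iter = gpus * batch_size * block_size        # 131,072 tokens per step
total_tokens = tokens_per_iter * iters                  # ~65.5B tokens
tokens_per_second = total_tokens / (32 * 3600)          # over the stated 32 hours: ~570K tok/s
print(f"{total_tokens / 1e9:.1f}B tokens, ~{tokens_per_second / 1e3:.0f}K tok/s across the node")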
I'd like to continue to make it faster, reproduce the other GPT-2 models, then scale up pre-training to bigger models/datasets, then improve the docs for finetuning (the practical use case). Also working on video lecture where I will build it from scratch, hoping out in ~2 weeks.