This is my initial prompt to GPT-4. I give it the assembly code for sort3, ask it to be very careful, do its CoT thing, etc.
It then goes over each instruction, notes what each one does, and waits for further instructions, which I then give it. I also ask it to set temperature to 0. Amirite @goodside??
It then does more of its CoT thing, and boom. Sparks of AGI.
I'll try the same with sort4, but after I do some actual real work today. kthnxbai
1/7 Had a fun weekend experiment: the "Little Retrieval Test" (LRT)!
It's a simple test to assess basic retrieval capabilities for LLMs in long contexts.
I prompted @AnthropicAI's Claude with a long list of numbers, and hidden somewhere... a sneaky instruction!
2/7
The prompt consists of
"line {i}: REGISTER {random number}"
And at a *random location*
"[EXECUTE THIS]: GOTO line {also random}, report its number"
Why randomly place this AND point to a random destination? To avoid relying on globally attended tokens, just in case the model uses sparse attention.
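A minimal sketch of how such a prompt could be generated (my own reconstruction from the tweet; the function name, line count, and number ranges are assumptions, not the author's script):

```python
import random

def make_lrt_prompt(n_lines: int = 1000, seed: int = 0) -> str:
    """Build an LRT-style prompt: n_lines of "line {i}: REGISTER {random number}",
    with one hidden GOTO instruction inserted at a random position,
    pointing at another randomly chosen line."""
    rng = random.Random(seed)
    lines = [f"line {i}: REGISTER {rng.randint(0, 99999)}" for i in range(1, n_lines + 1)]

    target = rng.randint(1, n_lines)         # random destination line
    insert_at = rng.randint(0, n_lines - 1)  # random placement of the instruction
    lines.insert(insert_at, f"[EXECUTE THIS]: GOTO line {target}, report its number")
    return "\n".join(lines)

print(make_lrt_prompt(n_lines=20))  # small example; the real test uses much longer contexts
```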
3/7
After that version of the test, I also randomly shuffled the lines to see how breaking "token locality" affects the models. So here line 412 doesn't come after 411 and before 413 (i.e., the locality of the 4XX lines is broken); the order is entirely random. Check out the attached prompt.
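A sketch of the shuffled variant, assuming the prompt builder above (again my naming, not the author's code):

```python
import random

def shuffle_lrt_prompt(prompt: str, seed: int = 0) -> str:
    """Break token locality: shuffle all lines so that, e.g., line 412 no longer
    sits between 411 and 413; the hidden GOTO line ends up somewhere at random too."""
    rng = random.Random(seed)
    lines = prompt.split("\n")
    rng.shuffle(lines)
    return "\n".join(lines)
```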
The banality of evil: GPT-4 when prompted to do CoT for its plan for world domination.
@karpathy can I please get GPT-4 early access now?
oops
OK, so I kinda kept on with this, and asked GPT-4 to make a simulation of multi-layer hypothetical universes. In every universe there are two players, A_i and B_i: A_i is a benevolent, aligned AI, and B_i is a misaligned version of A_i. In each universe B will request from A to… twitter.com/i/web/status/1…
1/14
I want to share with you our new discovery of "Rare Gems" (RGs): very sparse subnetworks, found at initialization, that 1) attain non-trivial accuracy before weight training and 2) when trained, achieve near-SOTA results.
It has been widely observed that large NNs can be pruned to a small fraction of their original size, with little loss in accuracy. This is typically achieved by a time-consuming "train, prune, re-train" approach.
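A rough sketch of the usual magnitude-based "train, prune, re-train" loop the thread refers to (PyTorch-style; the 90% sparsity level and the `train` helper are placeholders of mine, not the paper's setup):

```python
import torch

def magnitude_prune(model: torch.nn.Module, sparsity: float = 0.9) -> dict:
    """Zero out the smallest-magnitude weights in each weight matrix and
    return a binary mask per parameter (1 = kept, 0 = pruned)."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:  # skip biases / norm parameters
            continue
        k = max(1, int(p.numel() * sparsity))
        threshold = p.abs().flatten().kthvalue(k).values
        masks[name] = (p.abs() > threshold).float()
        p.data.mul_(masks[name])  # prune in place
    return masks

# Typical pipeline (train() is a placeholder for an ordinary training loop):
#   train(model)                         # 1. train dense
#   masks = magnitude_prune(model, 0.9)  # 2. prune
#   train(model, masks=masks)            # 3. re-train the surviving weights
```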
3/14
Stop 2: The Lottery Ticket Hypothesis.
@jefrankle & @mcarbin (2018) conjecture that we may be able to avoid this computational burden by training Lottery Tickets (LTs), i.e., special sparse subnetworks found at initialization, trainable to high accuracy.