Eric Zelikman
si @xAI // was phd-ing @stanford
Mar 15, 2024 8 tweets 4 min read
Language models today are trained to reason either 1) generally, imitating online reasoning data or 2) narrowly, self-teaching on their own solutions to specific tasks

Can LMs teach themselves to reason generally? 🌟 Introducing Quiet-STaR, self-teaching via internal monologue! 🧵

Reasoning is everywhere in text -- just hidden between the lines. That's because people (often) think before they speak. So LMs can learn to reason from diverse online text if they:
🧠1) reason about what text is next
💬2) see if the thought helped
🧑‍🎓3) learn from useful thoughts
[Image: Visualization of the thoughts generated in parallel for all tokens in an input text on an addition problem, showing how intermediate thoughts can be useful]
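
The three steps above can be sketched in a few lines. This is only a toy illustration, not the Quiet-STaR training code: `toy_log_prob` is a made-up stand-in for a language model, the names `score_thoughts` and `useful_thoughts` are hypothetical, and "learning" is reduced to keeping the thoughts that raised the log-probability of the text that actually came next.

```python
import math
from typing import Callable, List, Tuple

# (context, continuation) -> log-probability. Hypothetical stand-in for an LM.
LogProb = Callable[[str, str], float]


def score_thoughts(
    context: str, next_text: str, thoughts: List[str], log_prob: LogProb
) -> List[Tuple[str, float]]:
    """Step 2: measure how much each thought helps predict the real next text."""
    baseline = log_prob(context, next_text)  # prediction without any thought
    return [(t, log_prob(context + t, next_text) - baseline) for t in thoughts]


def useful_thoughts(scored: List[Tuple[str, float]]) -> List[str]:
    """Step 3 (simplified): keep only thoughts that improved the prediction."""
    return [thought for thought, gain in scored if gain > 0]


def toy_log_prob(context: str, continuation: str) -> float:
    """Assumption for the demo: the 'model' is confident only if the
    continuation already appears somewhere in its context."""
    return math.log(0.9) if continuation in context else math.log(0.1)


# Step 1 (stubbed): candidate thoughts a model might generate before answering.
context = "2 + 3 = "
thoughts = [" (2 plus 3 is 5) ", " (the sky is blue) "]

scored = score_thoughts(context, "5", thoughts, toy_log_prob)
kept = useful_thoughts(scored)
```

Here only the first thought is kept, because it is the only one that makes the next token easier to predict. A real system would use that gain as a training signal rather than a filter.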
Oct 5, 2023 8 tweets 3 min read
“Recursive self-improvement” (RSI) is one of the oldest ideas in AI. Can language models write code that recursively improves itself?

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
w/@elianalorch, @LesterMackey, @adamfungi
(1/n)
[Image: Pipeline figure for STOP. On the left, improver_0 improves itself to become improver_1, etc. until improver_T. On the right, improver_0 is expanded to visualize that improver_0, the seed improver, takes a program and returns the best improvement the language model generates.]

We start with a simple seed "improver" program that takes code and an objective function and improves the code with a language model (returning the best of k improvements). But improving code is a task, so we can pass the improver to itself! Then, repeat…
arxiv.org/abs/2310.02304
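
A minimal sketch of the seed improver described in this thread, under loud assumptions: `toy_propose` is a hypothetical stand-in for the language model, `utility` is a made-up objective, and none of the names come from the STOP codebase. It only shows the "best of k improvements" step, not the recursive self-application.

```python
from typing import Callable, List

Utility = Callable[[str], float]            # scores a program's source code
Proposer = Callable[[str, int], List[str]]  # stand-in for a language model


def seed_improver(program: str, utility: Utility, propose: Proposer, k: int = 4) -> str:
    """Ask for k candidate rewrites and return the highest-scoring program.

    The original program stays in the pool, so the result is never worse.
    """
    candidates = [program] + propose(program, k)
    return max(candidates, key=utility)


def toy_propose(program: str, k: int) -> List[str]:
    """Hypothetical 'model': returns k variants with a different constant."""
    return [program.replace("x + 1", f"x + {i + 2}") for i in range(k)]


def utility(program: str) -> float:
    """Toy objective: run the program and score the value of f(1)."""
    scope: dict = {}
    exec(program, scope)  # acceptable for a toy; never exec untrusted code
    return float(scope["f"](1))


start = "def f(x): return x + 1"
improved = seed_improver(start, utility, toy_propose, k=4)
```

The recursive step in the thread would pass the improver's own source as `program`, with a utility that scores how well an improver improves other programs; that part is left out here.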
Sep 12, 2023 8 tweets 3 min read
Did you know there’s a task people easily solve but GPT-4 fails? From a few input-output grids, ARC asks you to infer and apply a rule

With Hypothesis Search, we double GPT-4’s score


w/@ruocheng_w @GabrielPoesia @evanthebouncy @nickhaber @noahdgoodman
🧵 arxiv.org/abs/2309.05660
[Image: Pipeline overview. From left to right: train examples, generate hypotheses, select, implement, validate]

This kind of problem solving is “inductive reasoning,” and it’s essential to science and creativity. That’s why ARC has been used to argue that LLMs can’t reason and also why, when @Ruocheng suggested tackling @fchollet’s ARC, I called it a nerd snipe (xkcd.com/356/)
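
The validate step of that pipeline can be illustrated with a toy example. Everything here is an assumption made for illustration: the hypotheses and their implementations are hand-written in `HYPOTHESES` (in the thread they come from GPT-4), and `search` and `validate` are hypothetical names, not the paper's code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, output grid)
Program = Callable[[Grid], Grid]

# Natural-language hypotheses paired with implementations.
# Hand-written stand-ins for what a language model would generate.
HYPOTHESES: Dict[str, Program] = {
    "flip the grid top to bottom": lambda g: g[::-1],
    "transpose the grid": lambda g: [list(col) for col in zip(*g)],
    "mirror each row left to right": lambda g: [row[::-1] for row in g],
}


def validate(program: Program, train: List[Example]) -> bool:
    """A hypothesis survives only if it reproduces every training output."""
    return all(program(x) == y for x, y in train)


def search(
    hypotheses: Dict[str, Program], train: List[Example]
) -> Optional[Tuple[str, Program]]:
    """Return the first hypothesis whose program passes validation."""
    for description, program in hypotheses.items():
        if validate(program, train):
            return description, program
    return None


train = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
found = search(HYPOTHESES, train)

# Apply the surviving rule to an unseen test input.
prediction = found[1]([[5, 6, 7]]) if found else None
```

Only the "mirror each row" hypothesis is consistent with the training pair, so it is the one applied to the test grid.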
Feb 6, 2023 5 tweets 3 min read
You can now generate complex programs from natural language without writing unit tests! Automatic test generation 🤖🧪 has been added to Parsel🐍

Code here: github.com/ezelikman/pars… (1/5)
[Images: Parsel code with four funct... / Generated Python code imple...]

Decomposition🧩 and test generation🧪 go together well: if interconnected parts all pass tests, then it's more likely the solution and tests are good. But how do we know that the generated tests are any good? (2/5)
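
One way to picture "interconnected parts all pass tests" is a search over combinations of candidate implementations. This is a hedged sketch and not Parsel's actual code: the candidates and tests are hand-written stand-ins for generated ones, and `first_passing_combination` is a hypothetical helper.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Impl = Dict[str, Callable]       # function name -> chosen implementation
Test = Callable[[Impl], bool]    # a test runs against a full combination


def first_passing_combination(
    candidates: Dict[str, List[Callable]], tests: List[Test]
) -> Optional[Impl]:
    """Try every combination of candidates; return the first passing all tests."""
    names = list(candidates)
    for combo in product(*(candidates[name] for name in names)):
        impl = dict(zip(names, combo))
        if all(test(impl) for test in tests):
            return impl
    return None


# Two candidates per function; one of each is wrong.
# sum_squares depends on whichever `square` is chosen in the combination.
candidates = {
    "square": [lambda x: x + x, lambda x: x * x],
    "sum_squares": [
        lambda impl, xs: sum(xs),
        lambda impl, xs: sum(impl["square"](x) for x in xs),
    ],
}

# Stand-ins for generated tests.
tests = [
    lambda impl: impl["square"](3) == 9,
    lambda impl: impl["sum_squares"](impl, [1, 2]) == 5,
]

chosen = first_passing_combination(candidates, tests)
```

Because `sum_squares` calls `square`, a wrong `square` also breaks the `sum_squares` test, which is why passing tests across connected parts is stronger evidence than passing each test in isolation.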
Jan 26, 2023 7 tweets 4 min read
For code language models, every token is a new chance to break a program. What if LLMs wrote code like people, decomposing programs into solvable parts? They can solve competition-level coding problems by writing natural language programs in Parsel🐍, beating prior SoTA by >75%!
[Image: Plot showing competition-level pass rate of Parsel using Cod…]

Parsel 🐍: A Unified Natural Language Framework for Algorithmic Reasoning
Work done w/ @qhwang3 @GabrielPoesia @noahdgoodman @nickhaber
Website [🕸️]: zelikman.me/parselpaper/
Paper [📜]: zelikman.me/parselpaper/pa…
Code [💻]: github.com/ezelikman/pars…
[Image: Flow chart visualizing Parsel - first, the language model de…]
Dec 8, 2022 17 tweets 7 min read
ChatGPT can write stories and then write DALLE-2 prompts to illustrate them. I asked it to write a children's story about "a robot that wanted to be a human." Here's the story it came up with: (0/11)

Once upon a time, in a land far, far away, there was a robot named Robby who lived in a world full of machines. Robby was different from the other robots, though. He didn't want to spend his days following orders and carrying out tasks like the other robots did.
(1/11)
[Image: The image shows a robot standing among a group of other robo…]