Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Eric Zelikman

@ericzelikman

Jan 26, 2023 • 7 tweets • 4 min read • Read on X

Scrolly

For code language models, every token is a new chance to break a program. What if LLMs wrote code like people, decomposing programs into solvable parts? They can solve competition-level coding problems by writing natural language programs in Parsel🐍, beating prior SoTA by >75%!

@qhwang3

Parsel 🐍: A Unified Natural Language Framework for Algorithmic Reasoning
Work done w/ @qhwang3 @GabrielPoesia @noahdgoodman @nickhaber
Website [🕸️]: zelikman.me/parselpaper/
Paper [📜]: zelikman.me/parselpaper/pa…
Code [💻]: github.com/ezelikman/pars…

In the paper where OpenAI introduced Codex, they showed that code language models fail to generate programs that chain together many simple tasks, while humans can - Parsel solves this by separating out decomposition and implementation

Plus, excitingly, when LLMs write Parsel to generate step-by-step robotic plans from high-level tasks, the plans are consistently more accurate than a zero-shot planner baseline - more than 2/3 of the time! We've also shown Parsel can prove theorems, but highlight key challenges

Our initial goal was to let people write code in natural language, but we found LLMs are also good Parsel coders! We just asked GPT-3 to "think step by step to come up with a clever algorithm" (see arxiv.org/abs/2205.11916), then asked to translate into Parsel given a few examples

@GabrielPoesia

To understand the quality of the generated Parsel programs, @GabrielPoesia (an experienced competitive coder) solved a bunch of competition-level APPS problems with Parsel. He solved 5/10 problems in 6 hours, with 3 where GPT-3 failed, suggesting there's still a long way to go!

This new version of the paper goes into more detail on how Parsel addresses the limitations of code language models and better quantifies the ability of LLMs to generate Parsel programs. We think there's still a ton more to be done - we look forward to hearing your thoughts!

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @ericzelikman

Eric Zelikman

@ericzelikman

Mar 15, 2024

Language models today are trained to reason either 1) generally, imitating online reasoning data or 2) narrowly, self-teaching on their own solutions to specific tasks

Can LMs teach themselves to reason generally?🌟Introducing Quiet-STaR, self-teaching via internal monologue!🧵

Reasoning is everywhere in text -- just hidden between the lines. That's because people (often) think before they speak. So LMs can learn to reason from diverse online text if they:
🧠1) reason about what text is next
💬2) see if the thought helped
🧑‍🎓3) learn from useful thoughts

Excitingly, self-teaching reasoning on diverse web text automatically improves other reasoning! Mistral self-taught by training on web data increases its zero-shot commonsense reasoning accuracy by a third and nearly doubles its zero-shot direct grade-school-math accuracy

Read 8 tweets

Eric Zelikman

@ericzelikman

Oct 5, 2023

“Recursive self-improvement” (RSI) is one of the oldest ideas in AI. Can language models write code that recursively improves itself?

Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
w/@elianalorch, @LesterMackey, @adamfungi
(1/n)

We start with a simple seed "improver" program that takes code and an objective function and improves the code with a language model (returning the best of k improvements). But improving code is a task, so we can pass the improver to itself! Then, repeat…
arxiv.org/abs/2310.02304

If you apply this enough times, GPT-4 comes up with some really creative code self-improvement strategies, like genetic algorithms, simulated annealing, or multi-armed prompt bandits. This is especially surprising when you realize it's only been trained on data until 2021!

Read 8 tweets

Eric Zelikman

@ericzelikman

Sep 12, 2023

Did you know there’s a task people easily solve but GPT-4 fails? From a few input-output grids, ARC asks you to infer and apply a rule

With Hypothesis Search, we double GPT-4’s score

w/@ruocheng_w @GabrielPoesia @evanthebouncy @nickhaber @noahdgoodman
🧵 arxiv.org/abs/2309.05660

This kind of problem solving is “inductive reasoning,” and it’s essential to science and creativity. That’s why ARC has been used to argue that LLMs can’t reason and also why, when @Ruocheng suggested tackling @fchollet’s ARC, I called it a nerd snipe ()xkcd.com/356/

Hypothesis Search strengthens LLMs’ inductive reasoning:
1) Given training pairs, prompt LM to come up with hypotheses for the underlying rule
2) For each hypothesis, prompt LM to implement it in code
3) Run the code on training pairs. Revise if it errors; else submit for testing

Read 8 tweets

Eric Zelikman

@ericzelikman

Feb 6, 2023

You can now generate complex programs from natural language without writing unit tests! Automatic test generation 🤖🧪 has been added to Parsel🐍

Code here: github.com/ezelikman/pars… (1/5)

Decomposition🧩 and test generation🧪 go together well: if interconnected parts all pass tests, then it's more likely the solution and tests are good. But how do we know that the generated tests are any good? (2/5)

We prompt Codex to generate tests like CodeT (arxiv.org/abs/2207.10397) and try to pass as many tests as possible, but when testing solutions, we have two new constraints: a good test set must 1) test every function and 2) test at least two outputs per function (3/5)

Read 5 tweets

Eric Zelikman

@ericzelikman

Dec 8, 2022

ChatGPT can write stories and then tell DALLE-2 prompts to illustrate them. I asked it to write a children's story about "a robot that wanted to be a human." Here's the story it came up with: (0/11)

Once upon a time, in a land far, far away, there was a robot named Robby who lived in a world full of machines. Robby was different from the other robots, though. He didn't want to spend his days following orders and carrying out tasks like the other robots did.
(1/11)

Instead, Robby dreamed of being a human. He longed to feel the sun on his face, to breathe fresh air, and to experience all the wonderful things that humans did. But most of all, Robby wanted to be able to make his own choices and live his own life. (2/11)

Read 17 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Eric Zelikman

Try unrolling a thread yourself!

More from @ericzelikman

Eric Zelikman

Eric Zelikman

Eric Zelikman

Eric Zelikman

Eric Zelikman

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!