Research @ OpenAI | MIT PhD | exchanging algorithms with ai
Nov 10, 2024 • 16 tweets • 5 min read
Why do we treat train and test times so differently?
Why is one “training” and the other “in-context learning”?
Just take a few gradient steps at test time, a simple way to scale test-time compute, and reach SoTA on the ARC public validation set: 61%, matching the average human score! @arcprize
We investigate the existing idea of test-time training (TTT): you construct an auxiliary dataset based on your test inputs and update the model before making a prediction.
But it's not clear what tasks to train on, what kind of inference to use, or which base model to start from.
Nov 29, 2022 • 13 tweets • 5 min read
How does in-context learning work?
Maybe language models unexpectedly discover how to store/simulate/train other models in their hidden units.
So few-shot prompting can be equivalent to fine-tuning that runs inside the LM!
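The hypothesis can be checked numerically on a linear toy case: an attention-like weighted sum over in-context examples can produce exactly the prediction you would get from one explicit gradient step. Everything here (the linear model, the data, the learning rate) is an illustrative assumption, not the thread's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)                      # frozen "pretrained" weights
xs = [rng.normal(size=3) for _ in range(4)] # in-context inputs
ys = [rng.normal() for _ in xs]             # in-context targets
q = rng.normal(size=3)                      # query input
lr = 0.1

# (a) Explicit fine-tuning: one gradient step on the context pairs.
w_ft = w + lr * sum((y - w @ x) * x for x, y in zip(xs, ys))
pred_ft = w_ft @ q

# (b) "In-context" form: never update w; instead add an attention-like
# correction, a sum over context pairs weighted by key-query overlap x·q,
# with values given by the prediction errors (y - w·x).
pred_icl = w @ q + lr * sum((y - w @ x) * (x @ q) for x, y in zip(xs, ys))

print(pred_ft, pred_icl)  # the two predictions coincide
```

So a network that can compute weighted sums of its context, which attention does, has the machinery to simulate a gradient step without ever touching its weights.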
🔢 Does GPT-3 know arithmetic? Are LM scratchpads/chain-of-thought prompting always helpful? What should go into a successful scratchpad when sampling predictions from GPT-3? Check out our blog post, where we examine these questions on the addition problem! lingo.csail.mit.edu/blog/arithmeti…
Past work found that prompting large language models to generate short sentences describing intermediate steps before producing the final answer significantly boosts performance on tasks that require symbolic reasoning.
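For addition, such an intermediate-step trace might look like the following. This is a hypothetical scratchpad format for illustration; the exact format studied in the blog post may differ.

```python
def addition_scratchpad(a: int, b: int) -> str:
    # Build a digit-by-digit trace of a + b with explicit carries,
    # the kind of intermediate steps a scratchpad prompt elicits.
    lines = [f"Input: {a} + {b}"]
    da, db = str(a)[::-1], str(b)[::-1]  # least significant digit first
    carry, out = 0, []
    for i in range(max(len(da), len(db))):
        x = int(da[i]) if i < len(da) else 0
        y = int(db[i]) if i < len(db) else 0
        s = x + y + carry
        out.append(s % 10)
        lines.append(f"{x} + {y} + carry {carry} = {s}: write {s % 10}, carry {s // 10}")
        carry = s // 10
    if carry:
        out.append(carry)
    lines.append("Answer: " + "".join(map(str, reversed(out))))
    return "\n".join(lines)

print(addition_scratchpad(487, 356))
```

Conditioning the model on traces like this, rather than asking for the answer directly, is what makes the intermediate computation explicit.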