Assistant professor @YaleLinguistics. Studying computational linguistics, cognitive science, and AI. He/him.
Oct 10, 2024 • 17 tweets • 6 min read
🤖🧠NOW OUT IN PNAS🧠🤖
Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29
In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do
Major updates since the preprint!
1/n pnas.org/doi/10.1073/pn…
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab In this thread, find a summary of the work & some extensions (yes, the results hold for OpenAI o1!)
And note that we've condensed it to 12 pages - making it a much quicker read than the 84-page preprint!
2/n
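As a rough illustration of the counting effect mentioned above, here is a minimal sketch (my own, not the paper's evaluation code) of how one might probe it: build list-counting prompts whose correct answer is 29 vs. 30 and compare accuracy. The query_model function is a placeholder for whatever LM API you use.

```python
# Minimal sketch (not from the paper) of a counting probe:
# compare accuracy when the correct answer is a "common" number (30)
# vs. a neighboring "rare" number (29).

import random

WORDS = ["apple", "river", "stone", "cloud", "piano", "tiger", "lamp", "fern"]

def make_counting_prompt(n: int, seed: int = 0) -> str:
    """Build a prompt that lists n words and asks how many there are."""
    rng = random.Random(seed)
    items = [rng.choice(WORDS) for _ in range(n)]
    return ("Here is a list: " + ", ".join(items) +
            ". How many words are in the list? Answer with a number.")

def score(query_model, n: int, trials: int = 50) -> float:
    """query_model is a placeholder: any function mapping a prompt string to a response string."""
    correct = 0
    for seed in range(trials):
        answer = query_model(make_counting_prompt(n, seed))
        correct += answer.strip().startswith(str(n))
    return correct / trials

# Expected pattern from the paper: score(query_model, 30) > score(query_model, 29),
# because "30" is a more probable output than "29" under the training distribution.
```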
Sep 26, 2023 • 14 tweets • 5 min read
🤖🧠NEW PAPER🧠🤖
Language models are so broadly useful that it's easy to forget what they are: next-word prediction systems
Remembering this fact reveals surprising behavioral patterns: 🔥Embers of Autoregression🔥 (counterpart to "Sparks of AGI")
1/8 arxiv.org/abs/2309.13638
@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab Our big question: How can we develop a holistic understanding of large language models (LLMs)?
One popular approach has been to evaluate them w/ tests made for humans
But LLMs are not humans! The tests that are most informative about them may differ from the tests that are most informative about us
2/8
May 30, 2023 • 15 tweets • 6 min read
🤖🧠NEW PAPER🧠🤖
Bayesian models can learn rapidly. Neural networks can handle messy, naturalistic data. How can we combine these strengths?
Our answer: Use meta-learning to distill Bayesian priors into a neural network!
1/n
Bayesian models can learn from few examples because they have strong inductive biases - factors that guide generalization. But the costs of inference and the difficulty of specifying generative models can make naturalistic data a challenge.
2/n
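To make the idea concrete, here is a toy sketch (my own illustration, not the paper's setup): sample many small tasks from a Bayesian prior, train a single network across all of them, and the network's predictions come to approximate the Bayesian posterior predictive. That is the sense in which the prior has been distilled into the network's weights.

```python
# Toy sketch of prior distillation (not the paper's code): tasks are coins
# with bias theta ~ Beta(1, 1); the network sees 10 flips and predicts the next.

import torch
import torch.nn as nn

SEQ_LEN = 10

def sample_task(batch_size):
    """Sample a bias from the prior, then flips; return (context flips, next flip)."""
    theta = torch.rand(batch_size, 1)
    flips = (torch.rand(batch_size, SEQ_LEN + 1) < theta).float()
    return flips[:, :SEQ_LEN], flips[:, SEQ_LEN]

net = nn.Sequential(nn.Linear(SEQ_LEN, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):  # meta-training across many prior-sampled tasks
    x, y = sample_task(256)
    loss = loss_fn(net(x).squeeze(-1), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, the network's prediction for a context with k heads out of 10
# should approach the Bayesian posterior predictive (k + 1) / (10 + 2):
# rapid generalization from few examples, carried out by a neural network.
```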
Feb 14, 2023 • 15 tweets • 4 min read
This very nice piece by Ted Chiang describes ChatGPT as a lossy compression of the Internet.
This idea is helpful for building intuition, but it's easy to miss an important point: Lossiness is not always a problem! In fact, if done right, it is exactly what we want.
1/5
To understand current AI, we need some insights from CogSci and from 20th-century AI.
In CogSci, two crucial factors for human-level intelligence are compositionality and continuity.
2/5
Nov 19, 2021 • 13 tweets • 5 min read
*NEW PREPRINT*
Neural-network language models (e.g., GPT-2) can generate high-quality text. Are they simply copying text they have seen before, or do they have generalizable linguistic abilities?
We generate text from language models and then analyze whether the text is novel or duplicated from the training set. We analyze novelty for sequential structure (n-grams) and syntactic structure.
2/n
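For concreteness, here is a minimal sketch of the n-gram novelty check (a simplified stand-in for the actual pipeline): collect every n-gram in the generated text and test whether it also occurs verbatim in the training corpus.

```python
# Minimal sketch (not the paper's pipeline) of n-gram novelty analysis.

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def novelty_rate(generated_tokens, training_tokens, n):
    """Fraction of generated n-grams that never appear in the training data."""
    seen = set(ngrams(training_tokens, n))
    gen = ngrams(generated_tokens, n)
    if not gen:
        return 0.0
    return sum(g not in seen for g in gen) / len(gen)

training = "the cat sat on the mat".split()
generated = "the cat sat on the sofa".split()
for n in range(2, 6):
    print(n, novelty_rate(generated, training, n))
# Longer n-grams are increasingly likely to be novel; long duplicated spans
# are the interesting case for detecting copying from the training set.
```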
Jan 14, 2020 • 12 tweets • 9 min read
New paper: "Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks" w/ @Bob_Frank & @TalLinzen to appear in TACL
Interested in syntactic generalization? Read on! 1/
@Bob_Frank @TalLinzen For 2 syntactic tasks, we train models on training sets that are ambiguous between two rules: one rule based on hierarchical structure and one based on linear order.
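To make the ambiguity concrete, here is a toy illustration (my own templates and toy rules, not the paper's grammar) using English question formation: a linear rule and a hierarchical rule agree on simple training sentences but disagree on test sentences containing a relative clause.

```python
# Hedged illustration of a training set that is ambiguous between a linear
# and a hierarchical rule for question formation. The "hierarchical" rule
# below is a toy approximation that happens to pick out the main-clause
# auxiliary in these examples.

AUX = {"does", "can"}

def move_first_aux(tokens):
    """Linear rule: front the first auxiliary in the sentence."""
    i = next(i for i, t in enumerate(tokens) if t in AUX)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def move_main_aux(tokens):
    """Toy hierarchical rule: front the main-clause auxiliary
    (approximated here as the last auxiliary in the sentence)."""
    i = max(i for i, t in enumerate(tokens) if t in AUX)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

# Ambiguous training example: only one auxiliary, so both rules agree.
train = "my walrus does giggle".split()
assert move_first_aux(train) == move_main_aux(train)

# Disambiguating test example: a relative clause adds a second auxiliary,
# so the two rules now make different predictions.
test = "my walrus that can smile does giggle".split()
print(" ".join(move_first_aux(test)))  # can my walrus that smile does giggle
print(" ".join(move_main_aux(test)))   # does my walrus that can smile giggle
```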