Tom McCoy Profile picture
Assistant professor @YaleLinguistics. Studying computational linguistics, cognitive science, and AI. He/him.
Oct 10 17 tweets 6 min read
🤖🧠NOW OUT IN PNAS🧠🤖

Language models show many surprising behaviors. E.g., they can count 30 items more easily than 29

In Embers of Autoregression, we explain such effects by analyzing what LMs are trained to do


Major updates since the preprint!

1/n pnas.org/doi/10.1073/pn…

[Image] At the top is the title of the paper: "Embers of autoregression show how large language models are shaped by the problem they are trained to solve". Below on the left is a screenshot of ChatGPT being asked to count how many words are in a list; the correct answer is 29, but it says 30. Next to it is a plot showing ChatGPT's accuracy at counting elements in a list: in general, it does well on multiples of 10 but poorly on other numbers. The explanation offered at the bottom of the image is: in training sets, round numbers are much more common than other numbers.

@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab In this thread, find a summary of the work & some extensions (yes, the results hold for OpenAI o1!)

And note that we've condensed it to 12 pages - making it a much quicker read than the 84-page preprint!

2/n
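An aside for readers who want to poke at the frequency explanation themselves: a minimal sketch (not the paper's analysis code) that tallies how often each numeral appears in whatever plain-text corpus you have on hand. `corpus.txt` is a placeholder path.

```python
# Minimal sketch, not the paper's code: tally how often the numerals 25-35
# appear in a corpus, to check whether round numbers (multiples of 10) really
# are much more frequent than their neighbors.
# "corpus.txt" is a hypothetical path - point it at any large plain-text file.
import re
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:
    text = f.read()

counts = Counter(int(m) for m in re.findall(r"\b\d{1,3}\b", text))

for n in range(25, 36):
    marker = "  <- round number" if n % 10 == 0 else ""
    print(f"{n:3d}: {counts[n]:8d}{marker}")
```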
Sep 26, 2023 14 tweets 5 min read
🤖🧠NEW PAPER🧠🤖

Language models are so broadly useful that it's easy to forget what they are: next-word prediction systems

Remembering this fact reveals surprising behavioral patterns: 🔥Embers of Autoregression🔥 (counterpart to "Sparks of AGI")


1/8 arxiv.org/abs/2309.13638
[Image] The top says: “Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve. By R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, and Thomas L. Griffiths.” The bottom left shows a ClipArt image of fire; the top of the fire is labeled “Sparks of AGI,” and the bottom is labeled “Embers of autoregression”. The bottom right shows a box labeled “Shift ciphers” with two examples of GPT-4 responses. First, when asked to shift each letter in a message back by 13, GPT-4 gets the correct answer: “I think everyone has their own path, and they ca...

@ShunyuYao12 @danfriedman0 @mdahardy @cocosci_lab Our big question: How can we develop a holistic understanding of large language models (LLMs)?

One popular approach has been to evaluate them w/ tests made for humans

But LLMs are not humans! The tests that are most informative about them might be different from the ones that are most informative about us

2/8

[Image] Left: A table listing exams designed for humans that have been used to test GPT-4, such as the LSAT or the SAT Math test. Right: A cartoon showing a bunch of animals lined up (a bird, a monkey, a penguin, an elephant, a fish, a seal, and a dog). In front of the animals is a person saying “For a fair selection, everybody has to take the same exam: Please climb that tree.” The cartoon is by Barry Linton, based on an earlier version by Hans Traxler.
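For context on the shift-cipher example in the image: the cipher itself is trivial to compute; what differs between shift-by-13 and other shifts is only how often they appear in Internet text. A minimal sketch (my own, not the paper's evaluation harness) for generating test items:

```python
# Minimal sketch, not the paper's evaluation code: a shift (Caesar) cipher for
# generating test items. rot-13 is common online; other shifts are rare.
def shift_cipher(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.islower():
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch.isupper():
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

message = "I think everyone has their own path"
encoded = shift_cipher(message, 13)
assert shift_cipher(encoded, -13) == message   # decoding = shifting back
print(encoded)
print(shift_cipher(message, 12))               # a rarer shift, same difficulty in principle
```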
May 30, 2023 15 tweets 6 min read
🤖🧠NEW PAPER🧠🤖

Bayesian models can learn rapidly. Neural networks can handle messy, naturalistic data. How can we combine these strengths?

Our answer: Use meta-learning to distill Bayesian priors into a neural network!

Paper: arxiv.org/abs/2305.14701

1/n

[Image] A schematic of our method. ...

Bayesian models can learn from few examples because they have strong inductive biases - factors that guide generalization. But the costs of inference and the difficulty of specifying generative models can make naturalistic data a challenge.

2/n

[Image] Screenshot of a demo of Bay...
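A toy illustration of the general idea (my own sketch, not the paper's method or code, which targets much richer language-learning settings): sample tasks from a prior, train a network across those tasks, and its predictions come to approximate Bayesian inference under that prior.

```python
# Toy sketch, not the paper's code: distill a uniform (Beta(1,1)) prior over coin
# biases into a small network by training across many sampled "coins". After
# training, the network's next-flip prediction approximates the Bayesian
# posterior predictive (heads + 1) / (n + 2).
import torch
import torch.nn as nn

SEQ_LEN = 10
net = nn.Sequential(nn.Linear(SEQ_LEN, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(5000):
    bias = torch.rand(64, 1)                              # one coin bias per task
    flips = (torch.rand(64, SEQ_LEN + 1) < bias).float()  # flips from each coin
    context, target = flips[:, :SEQ_LEN], flips[:, SEQ_LEN:]
    loss = nn.functional.binary_cross_entropy_with_logits(net(context), target)
    opt.zero_grad(); loss.backward(); opt.step()

seq = torch.tensor([[1., 1, 1, 0, 1, 1, 0, 1, 1, 1]])     # 8 heads out of 10
print("network:", torch.sigmoid(net(seq)).item())
print("Bayes:  ", ((seq.sum() + 1) / (SEQ_LEN + 2)).item())
```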
Feb 14, 2023 15 tweets 4 min read
This very nice piece by Ted Chiang describes ChatGPT as a lossy compression of the Internet.

This idea is helpful for building intuition, but it's easy to miss an important point: Lossiness is not always a problem! In fact, if done right, it is exactly what we want.

1/14

To make this concrete, let’s consider a specific example. Suppose you encounter this list of sequences:

2/14

[Image] A list of sequences: 1. a ...
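Since the example list in the image is cut off here, a stand-in illustration (my own, not the thread's example) of why lossiness can be exactly what we want:

```python
# Toy illustration (not the thread's example): lossless storage memorizes the
# training pairs but cannot answer anything new; a lossy summary (a single fitted
# slope) discards the individual entries yet generalizes.
train = {2: 4, 5: 10, 7: 14, 11: 22}          # hypothetical data following x -> 2x

lossless = dict(train)                         # verbatim lookup table
print(lossless.get(6))                         # None: unseen input, no answer

xs, ys = zip(*train.items())
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(round(slope * 6))                        # 12: the lossy summary generalizes
```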
May 4, 2022 5 tweets 3 min read
🤖🧠NEW PAPER🧠🤖

What explains the dramatic recent progress in AI?

The standard answer is scale (more data & compute). But this misses a crucial factor: a new type of computation.

Shorter opinion piece: arxiv.org/abs/2205.01128
Longer tutorial: microsoft.com/en-us/research…

1/5

[Image] At the top is a paper title...

To understand current AI, we need some insights from CogSci and from 20th-century AI.

In CogSci, two crucial factors for human-level intelligence are compositionality and continuity.

2/5
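One concrete way to see how compositional structure can live in continuous vectors, in the spirit of the tensor-product-style binding discussed in this line of work (my own toy code, not the paper's):

```python
# Toy sketch (my own, not the paper's code): tensor-product-style binding shows
# how a representation can be compositional (built from roles and fillers) and
# continuous (just vectors and sums) at the same time.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
roles = {"subject": rng.standard_normal(dim), "object": rng.standard_normal(dim)}
fillers = {"dog": rng.standard_normal(dim), "cat": rng.standard_normal(dim)}

# "dog chases cat": bind filler vectors to role vectors via outer products, then sum.
rep = np.outer(fillers["dog"], roles["subject"]) + np.outer(fillers["cat"], roles["object"])

# Approximately unbind the subject by querying with the subject role vector.
subj = rep @ roles["subject"] / (roles["subject"] @ roles["subject"])
print("recovered subject is closer to 'dog':",
      subj @ fillers["dog"] > subj @ fillers["cat"])
```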
Nov 19, 2021 13 tweets 5 min read
*NEW PREPRINT*

Neural-network language models (e.g., GPT-2) can generate high-quality text. Are they simply copying text they have seen before, or do they have generalizable linguistic abilities?

Answer: Some of both!

Paper: arxiv.org/abs/2111.09509

1/n

[Image] Paper title: “How much do language models copy from their…

Work done with @tallinzen, Paul Smolensky, @JianfengGao0217, & @real_asli.

We generate text from language models and then analyze whether the text is novel or duplicated from the training set. We analyze novelty for sequential structure (n-grams) and syntactic structure.

2/n
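For intuition about the analysis, a minimal sketch (not the paper's pipeline) of checking which n-grams in generated text also occur in the training text:

```python
# Minimal sketch, not the paper's analysis code: measure how many n-grams in a
# generated text are novel, i.e. absent from the training text. The texts below
# are hypothetical stand-ins.
def ngram_set(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

training_text = "the cat sat on the mat and the dog sat on the rug"
generated_text = "the dog sat on the mat"

train_toks, gen_toks = training_text.split(), generated_text.split()
for n in range(2, 6):
    gen = [tuple(gen_toks[i:i + n]) for i in range(len(gen_toks) - n + 1)]
    novel = [g for g in gen if g not in ngram_set(train_toks, n)]
    print(f"{n}-grams: {len(novel)}/{len(gen)} novel")
```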
Jan 14, 2020 12 tweets 9 min read
New paper: "Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks" w/ @Bob_Frank & @TalLinzen to appear in TACL

Paper arxiv.org/pdf/2001.03632…
Website rtmccoy.com/rnn_hierarchic…

Interested in syntactic generalization? Read on! 1/

@bob_frank @tallinzen For 2 syntactic tasks, we train models on training sets that are ambiguous between two rules: one rule based on hierarchical structure and one based on linear order.

2/12
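To make the two candidate rules concrete, here is a toy illustration with English question formation (my own sketch, not the paper's data-generation code; the main-clause auxiliary is hard-coded for these examples):

```python
# Toy sketch, not the paper's code: two rules that agree on simple training
# sentences but diverge on sentences containing a relative clause.
def linear_rule(sentence):
    """Front the FIRST auxiliary ('can') in the word string."""
    words = sentence.split()
    i = words.index("can")
    return " ".join(["can"] + words[:i] + words[i + 1:]) + "?"

def hierarchical_rule(sentence):
    """Front the MAIN-CLAUSE auxiliary (hard-coded here as the last 'can')."""
    words = sentence.split()
    i = len(words) - 1 - words[::-1].index("can")
    return " ".join(["can"] + words[:i] + words[i + 1:]) + "?"

ambiguous = "the dog can bark"                   # both rules give "can the dog bark?"
diagnostic = "the dog that can swim can bark"    # now the rules diverge

print(linear_rule(ambiguous) == hierarchical_rule(ambiguous))  # True: training data can't tell them apart
print(linear_rule(diagnostic))        # can the dog that swim can bark?  (linear order)
print(hierarchical_rule(diagnostic))  # can the dog that can swim bark?  (hierarchy)
```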