David Duvenaud
Machine learning prof @UofT. Former team lead at Anthropic. Working on generative models, inference, & latent structure.
Apr 27
Announcing Talkie: a new, open-weight historical LLM! We trained and fine-tuned a 13B model on a newly curated dataset of only pre-1930 data. Try it below!

with @AlecRad and @status_effects 🧵

Blog post with more details:
talkie-lm.com/introducing-ta…
Feb 27, 2025
LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text.

LLMs pick up on all kinds of subtle and unusual structure: 🧵

This is fun because LLMs can condition on free-form side information, and make predictions about anything. This turns qualitative knowledge into quantitative predictions.

Here we condition Llama 3 on two datapoints, plus text. Changing the text changes the meaning of the data.
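A hedged sketch of the general recipe, not the authors' actual code (`sample_completion` is a placeholder for whatever LLM sampling call you use, e.g. a Llama 3 endpoint): format the datapoints and the free-form text into one prompt, sample many numeric continuations at nonzero temperature, and treat the parsed numbers as draws from the predictive distribution.

```python
import re
import numpy as np

def llm_predictive_samples(sample_completion, datapoints, side_text, query_x, n=200):
    """Draw samples from an LLM's predictive distribution for y at query_x.

    sample_completion(prompt) is a stand-in for any LLM sampling call
    that returns one completion string.
    """
    data_lines = "\n".join(f"x = {x}, y = {y}" for x, y in datapoints)
    prompt = f"{side_text}\n{data_lines}\nx = {query_x}, y ="
    samples = []
    for _ in range(n):
        completion = sample_completion(prompt)
        match = re.search(r"-?\d+(\.\d+)?", completion)  # first number in the output
        if match:
            samples.append(float(match.group()))
    return np.array(samples)  # histogram these to visualize the predictive density
```

Changing `side_text` while holding the two datapoints fixed is exactly the kind of intervention shown in the thread's figures.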
Jan 30, 2025
New paper: What happens once AIs make humans obsolete?

Even without AIs seeking power, we argue that competitive pressures will fully erode human influence and values.



with @jankulveit @raymondadouglas @AmmannNora @degerturann @DavidSKrueger 🧵
gradual-disempowerment.ai

The major takeaways:

1) No one has a concrete plausible plan for stopping gradual human disempowerment.

2) Aligning individual AI systems with their designers’ intentions is not sufficient. This is because our civilization and institutions aren’t robustly aligned with humans.
Jun 3, 2021
Gradient descent in differentiable games (GANs, for instance) rotates around solutions instead of converging. We solve this with a simple trick: complex momentum damps the oscillations.

arxiv.org/abs/2102.08431
With @jonLorraine9 @davidjesusacu @PaulVicol

Our method is a two-line change from standard momentum updates in JAX and PyTorch, and it still gives real-valued updates.
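A minimal sketch of the update, reconstructed from the description above rather than copied from the paper's code; the learning rate and the magnitude and phase of beta here are illustrative assumptions:

```python
import jax.numpy as jnp

def complex_momentum_step(param, grad, buf, lr=1e-3,
                          beta=0.9 * jnp.exp(1j * jnp.pi / 8)):
    """One momentum step with a complex-valued coefficient beta.

    buf is a complex momentum buffer with the same shape as param.
    Taking the real part of the buffer keeps the parameter update real-valued.
    """
    buf = beta * buf - grad             # line 1 of the change: beta and buf are complex
    param = param + lr * jnp.real(buf)  # line 2: apply only the real part
    return param, buf
```

With the phase of beta set to zero this reduces to ordinary real momentum; a nonzero phase is what damps the rotation around game solutions.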
Jul 17, 2020
Neural ODEs are slow. We speed them up by regularizing their higher derivatives, learning ODEs that are easy to solve:
arxiv.org/pdf/2007.04504…
with @jacobjinkelly @jessebett @SingularMattrix

The main idea is that solvers can take large steps when the higher derivatives of the solutions are small. So we just encourage them to be small during training! For most problems, we saw about a 2x speedup, barely hurting training loss.
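As a concrete simplified sketch: for an autonomous ODE dx/dt = f(x), the second time derivative is J_f(x) f(x), a single Jacobian-vector product. The paper regularizes general K-th order derivatives using Taylor-mode autodiff; the K = 2 case below is my minimal illustration, not the paper's code:

```python
import jax
import jax.numpy as jnp

def second_derivative_penalty(f, x):
    """Penalty on the second time derivative of dx/dt = f(x).

    By the chain rule, d2x/dt2 = J_f(x) f(x): a Jacobian-vector product.
    Encouraging it to be small lets adaptive solvers take larger steps.
    """
    xdot = f(x)
    _, xddot = jax.jvp(f, (x,), (xdot,))  # computes J_f(x) @ f(x)
    return jnp.mean(xddot ** 2)

# Training objective: task loss plus the weighted penalty, e.g.
#   loss = task_loss + reg_weight * second_derivative_penalty(f, x)
```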
Jan 9, 2020
Training Neural SDEs: We worked out how to do scalable reverse-mode autodiff for stochastic differential equations. This lets us fit SDEs defined by neural nets with black-box adaptive higher-order solvers.
arxiv.org/pdf/2001.01328…
With @lxuechen, @rtqichen and @wongtkleonard.

For neural ODEs, continuous-time backprop had already been worked out. For SDEs, surprisingly, there was no analogous reverse-mode method. The algorithm ended up being a simple extension of the ODE method with fixed noise, a sort of continuous-time reparameterization trick.
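The method is available in the torchsde library released alongside the paper. A minimal sketch of fitting a neural SDE with the stochastic adjoint (the network sizes and toy loss here are placeholders, not from the paper):

```python
import torch
import torchsde

class NeuralSDE(torch.nn.Module):
    noise_type = "diagonal"  # torchsde requires these two attributes
    sde_type = "ito"

    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.drift = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))
        self.diffusion = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))

    def f(self, t, y):  # drift term, parameterized by a neural net
        return self.drift(y)

    def g(self, t, y):  # diagonal diffusion term
        return self.diffusion(y)

sde = NeuralSDE()
y0 = torch.zeros(16, 2)                    # a batch of initial states
ts = torch.linspace(0.0, 1.0, 20)          # times at which to return the solution
ys = torchsde.sdeint_adjoint(sde, y0, ts)  # reverse-mode via the stochastic adjoint
loss = ys[-1].pow(2).mean()                # toy objective on the final state
loss.backward()                            # gradients w.r.t. all SDE parameters
```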
Jul 10, 2019
We scaled up neural ODE time series models to real data, such as medical records with irregularly-timed measurements. Specifically, we made ODE-RNN hybrids, and also improved inference in latent ODEs: arxiv.org/abs/1907.03907

By amazing students @YuliaRubanova and @rtqichen.

Sometimes the fact that an observation happened at all tells us a lot. For example, the fact that a patient shows up to the hospital at a particular time is informative. We modeled observation times using Poisson processes whose rate depends on the latent state.
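A sketch of that Poisson-process term, in my notation rather than the paper's code: the log-likelihood of the observed times is the sum of log-intensities at the events minus the integral of the intensity over the observation window, with the intensity read off a discretized latent trajectory.

```python
import jax.numpy as jnp

def observation_time_log_likelihood(rate_fn, latents, ts, obs_idx):
    """Inhomogeneous-Poisson log-likelihood of observation times.

    rate_fn maps latent states z(t) to intensities lambda(t) > 0.
    latents: latent states on the time grid ts.
    obs_idx: grid indices where observations actually occurred.

    log p(times) = sum_i log lambda(t_i) - integral_0^T lambda(t) dt
    """
    rates = rate_fn(latents)                       # lambda(t) on the grid
    event_term = jnp.sum(jnp.log(rates[obs_idx]))  # sum over observed events
    # Trapezoidal approximation of the integral of the intensity.
    integral = jnp.sum(0.5 * (rates[1:] + rates[:-1]) * jnp.diff(ts))
    return event_term - integral
```

A term like this augments the usual reconstruction likelihood, so the model is rewarded for assigning high intensity where observations actually occurred.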