Patrick Kidger · Feb 8 · 21 tweets
⚡️ My PhD thesis is on arXiv! ⚡️

To quote my examiners, it is "the textbook of neural differential equations" - across ordinary/controlled/stochastic diffeqs.

w/ unpublished material:
- generalised adjoint methods
- symbolic regression
- + more!

arxiv.org/abs/2202.02435

🧵 1/n
If you follow me then there's a decent chance that you already know what an NDE is. (If you don't, go read the introductory Chapter 1 to my thesis haha -- it's only 6 pages long.) Put a neural network inside a differential equation, and suddenly cool stuff starts happening.

2/n
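The idea can be sketched in a few lines. This is not code from the thesis (which uses Diffrax/JAX), just a minimal NumPy toy where the "neural network" is a one-hidden-layer MLP with random stand-in weights, integrated with explicit Euler:

```python
import numpy as np

# Hypothetical "trained" parameters: a tiny MLP vector field f_theta.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)) * 0.1, np.zeros(2)

def f(y):
    """The learned vector field of the neural ODE dy/dt = f_theta(y)."""
    return W2 @ np.tanh(W1 @ y + b1) + b2

def solve_neural_ode(y0, t0=0.0, t1=1.0, steps=100):
    """Integrate dy/dt = f(y) from t0 to t1 with the explicit Euler method."""
    y, dt = np.asarray(y0, dtype=float), (t1 - t0) / steps
    for _ in range(steps):
        y = y + dt * f(y)
    return y

y1 = solve_neural_ode([1.0, -1.0])
```

Training then means differentiating `y1` with respect to the weights, which is where autodiff libraries (and adjoint methods, below) come in.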
Neural differential equations are a beautiful way of building models, offering:
- high-capacity function approximation;
- strong priors on model space;
- the ability to handle irregular data;
- memory efficiency;
- a foundation of well-understood theory.

3/n
You can model the evolution of unknown trajectories (cough finance cough) via neural SDEs:

4/n
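As a rough sketch of what that means (again a NumPy toy, not the thesis's code; the weights are random placeholders for trained parameters): a neural SDE has learned drift and diffusion terms, and Euler--Maruyama samples a trajectory from it:

```python
import numpy as np

rng = np.random.default_rng(0)
Wmu = rng.normal(size=(2, 2)) * 0.1   # hypothetical drift weights
Wsig = rng.normal(size=(2, 2)) * 0.1  # hypothetical diffusion weights

def mu(y):     # learned drift
    return np.tanh(Wmu @ y)

def sigma(y):  # learned diffusion
    return 0.1 * np.tanh(Wsig @ y)

def sample_path(y0, t1=1.0, steps=200, seed=1):
    """Euler--Maruyama: y_{k+1} = y_k + mu(y_k) dt + sigma(y_k) dW_k."""
    g = np.random.default_rng(seed)
    dt = t1 / steps
    y = np.asarray(y0, dtype=float)
    path = [y]
    for _ in range(steps):
        dW = g.normal(scale=np.sqrt(dt), size=y.shape)  # Brownian increment
        y = y + mu(y) * dt + sigma(y) * dW
        path.append(y)
    return np.stack(path)

path = sample_path([0.0, 0.0])
```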
...or model unknown distributions via continuous normalising flows (aka the Fokker--Planck equation):

In this case the target distribution is a 2D picture. (Obtained by me and Microsoft Paint at 2am.)

5/n
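The mechanism behind CNFs is the instantaneous change-of-variables formula: for dy/dt = f(y), the log-density evolves as d/dt log p(y(t)) = -tr(df/dy). A tiny numerical check of that identity (my own toy, using a linear field f(y) = Ay so that the trace term is exactly tr(A)):

```python
import numpy as np

# For a linear vector field f(y) = A y, the Jacobian trace is constant,
# so over [0, 1] the accumulated log-density change equals tr(A).
A = np.array([[0.3, -0.2], [0.1, 0.4]])

def f(y):
    return A @ y

def euler_flow_logdet(y0, steps=1000):
    """Jointly integrate the state and the log-density correction term."""
    dt = 1.0 / steps
    y, logdet = np.asarray(y0, dtype=float), 0.0
    for _ in range(steps):
        y = y + dt * f(y)
        logdet += dt * np.trace(A)  # tr(df/dy) for a linear field
    return y, logdet

_, logdet = euler_flow_logdet([1.0, 1.0])
print(round(logdet, 4))  # 0.7 = tr(A)
```

In a real CNF, f is a neural network and tr(df/dy) is estimated (e.g. via Hutchinson's trace estimator) rather than known in closed form.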
Or you can study unknown physical dynamics -- here an unknown Hamiltonian system parameterised by neural kinetic and potential terms:

6/n
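A sketch of the structure (not the thesis's implementation; the "neural" potential here is a random-feature stand-in, and gradients are taken by finite differences for self-containedness): Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq, with H = T(p) + V(q), integrated symplectically so that energy is approximately conserved:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8) * 0.5  # hypothetical "neural" potential weights

def V(q):  # learned potential energy (tiny random-feature net)
    return np.sum(np.tanh(w * q))

def T(p):  # learned kinetic energy; quadratic here for simplicity
    return 0.5 * p * p

def grad(fn, x, eps=1e-6):
    """Central finite-difference gradient (stand-in for autodiff)."""
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)

def leapfrog(q, p, dt=0.01, steps=1000):
    """Symplectic integration of dq/dt = dH/dp, dp/dt = -dH/dq."""
    for _ in range(steps):
        p = p - 0.5 * dt * grad(V, q)
        q = q + dt * grad(T, p)
        p = p - 0.5 * dt * grad(V, q)
    return q, p

q0, p0 = 0.5, 0.0
q1, p1 = leapfrog(q0, p0)
drift = abs((T(p1) + V(q1)) - (T(p0) + V(q0)))
print(f"energy drift: {drift:.2e}")  # small: energy is ~conserved
```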
You can build "continuous time RNNs", by using the theory of controlled differential equations:

7/n
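Discretised, a neural CDE dy = f(y) dX looks exactly like an RNN driven by the *increments* of the data path, which is why it handles irregularly sampled time series so naturally. A NumPy toy (random stand-in weights, not the thesis's code):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical learned tensor: f(y) maps the hidden state to a
# (hidden_dim x input_dim) matrix, which is contracted against dX.
Wf = rng.normal(size=(4, 4, 2)) * 0.1

def f(y):
    return np.tanh(np.einsum("hij,h->ij", Wf, y))

def neural_cde(xs, y0):
    """Discretised neural CDE dy = f(y) dX: an RNN driven by increments of x."""
    y = np.asarray(y0, dtype=float)
    for dx in np.diff(xs, axis=0):  # increments of the control path
        y = y + f(y) @ dx
    return y

xs = np.cumsum(rng.normal(size=(50, 2)) * 0.1, axis=0)  # a sample data path
y = neural_cde(xs, np.zeros(4))
```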
As a final example, you can understand neural nets via diffeqs. It's relatively famous that ResNets are the explicit Euler method applied to a neural ODE...

8/n
...but did you know that the feature distinguishing GRUs and LSTMs from generic RNNs is a very precise differential-equation-like structure? For example, a GRU has an exponential decay term.

(No wonder they struggle to learn long-term dependencies.)

9/n
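Both observations above fit in a few lines (my own illustrative sketch, with random stand-in weights): a residual block y + f(y) is literally one explicit Euler step with dt = 1, and the GRU-style update h' = z*h + (1-z)*h_hat is the Euler discretisation of dh/dt = (1-z)(h_hat - h), i.e. exponential decay of the hidden state towards the candidate:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) * 0.1

def f(y):                   # a residual branch / ODE vector field
    return np.tanh(W @ y)

def resnet_block(y):        # y_{k+1} = y_k + f(y_k)
    return y + f(y)

def euler_step(y, dt=1.0):  # explicit Euler on dy/dt = f(y)
    return y + dt * f(y)

y = rng.normal(size=3)
same = np.allclose(resnet_block(y), euler_step(y))
print(same)  # True: identical update

# GRU-style update with a fixed gate z and zero candidate: the hidden
# state decays geometrically (the discrete analogue of exponential decay).
z, h, h_hat = 0.9, np.ones(3), np.zeros(3)
for _ in range(10):
    h = z * h + (1 - z) * h_hat
decayed = np.allclose(h, 0.9 ** 10 * np.ones(3))
print(decayed)  # True
```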
I've hinted that my thesis includes some previously unpublished material.

For example, did you know that a neural ODE can be a universal approximator even if its vector fields are not universal approximators? (That's Section 2.4.2.)

10/n
Or that all these "adjoint methods" floating about -- for ODEs, for SDEs, whatever -- are all special cases of the same thing, applied to the general notion of an RDE, aka a Rough Differential Equation? (That's Appendix C.3.)

11/n
If you've ever heard of SINDy -- symbolic regression for dynamical systems -- then there's *also* some unpublished material on improving on this via genetic algorithms. (That's Section 6.1.)

[Joint work with @MilesCranmer!]

12/n
And of course, all accompanying code is provided -- available as the examples in the brand-new Diffrax software library! Your one-stop-shop for numerical differential equation solvers in #JAX.

GitHub: github.com/patrick-kidger…
Documentation: docs.kidger.site/diffrax/

13/n
I'm planning on doing some individual posts about some highlights from the thesis over the next few days.

Diffrax is available a little early as a sneak peek. I'll be doing a proper announcement on it next week!

14/n
Credit where it's due. A doctorate doesn't happen in a vacuum.

My friends and family have been an amazing support. Chloe, thank you for all the food! Juliette, thank you for the south of France. Mum, Dad: thank you for everything.

On a more academic note:

15/n
(Feel free to @ everyone else who'd like to know about this!)

I have been fortunate to work, collaborate, or communicate with most of the above list. (+Many others, both on and off Twitter.) So in a very practical way, you made this work possible.

17/n
Okay, let's wrap this up. If you're studying NDEs and want a reference text, then maybe this is it?

231 pages of everything you ever wanted to know about N ordinary DEs, N controlled DEs, N stochastic DEs, and N rough DEs.

Once again, link here: arxiv.org/abs/2202.02435

18/18
Appendix: as a fun historical note, whilst "Neural Ordinary Differential Equations" won best paper at NeurIPS 2018 -- which is why people have heard of the field of NDEs -- I'm actually aware of work on NDEs dating back to 1991!

ieeexplore.ieee.org/document/47913…

19/18
Also, do check out the "Comments" section at the end of each chapter. There you can find various marginalia on extensions, references, open problems, and musings about the field of NDEs in general. :)

20/18

Fin. [Both of this tweet thread... and of my doctorate!]
Accompanying Reddit thread here: reddit.com/r/MachineLearn…

