Dr. Chris Rackauckas
Oct 18, 2022 · 19 tweets
Differentiable programming (dP) is great: train neural networks to match anything w/ gradients! ODEs? Neural ODEs. Physics? Yes. Agent-Based models? Nope, not differentiable... or are they? Check out our new paper at NeurIPS on Stochastic dP!🧵

arxiv.org/abs/2210.08572
Problem: if you flip a coin with probability p of being heads, how do you write code that takes the derivative with respect to that p? Of course that's not well-defined: the coin gives a 0 or a 1, so there are no "small changes" to speak of. Is there a better definition?
Its mean (or in math words, "expectation") can be differentiable! So let's change the question: is there a form of automatic differentiation that generates a program which directly calculates the derivative with respect to the mean?
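Concretely: E[X(p)] = p·1 + (1-p)·0 = p, so dE[X(p)]/dp = 1. That's a perfectly well-defined number, even though each individual sample is just a 0 or a 1.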
There were some prior approaches to this problem, but they had exponential scaling, bias (the prediction is "off" by a little bit), and high variance in the estimation. Can we make it so you can take any code and get fast and accurate derivatives, just like "standard AD"?
To understand how to get there, let's take a look at Forward-Mode AD. One way to phrase AD is the following: instead of running a program on a one-dimensional number x, can you instead run a program on a two-dimensional number d = (x,y) such that f(d) = (f(x),f'(x)y)?
This is the dual number formulation of forward-mode AD. More details here: book.sciml.ai/notes/08/. Key takeaway: AD is moving from "the standard algebra on real numbers" to a new number type. Recompile the code for the new numbers and the second value is the derivative!
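Here's a minimal dual-number sketch in #julialang (a toy for intuition, not the real ForwardDiff.jl implementation): define a two-component number type and teach each primitive its derivative rule.

```julia
# Toy forward-mode AD: d = (x, y) propagates through f as (f(x), f'(x)*y).
struct Dual
    x::Float64  # primal value
    y::Float64  # tangent (derivative) component
end

Base.:+(a::Dual, b::Dual) = Dual(a.x + b.x, a.y + b.y)              # sum rule
Base.:*(a::Dual, b::Dual) = Dual(a.x * b.x, a.x * b.y + a.y * b.x)  # product rule
Base.sin(a::Dual) = Dual(sin(a.x), cos(a.x) * a.y)                  # chain rule

f(x) = sin(x) * x + x
d = f(Dual(2.0, 1.0))  # seed y = 1 so d.y carries df/dx
# d.x == f(2.0) and d.y == 2cos(2) + sin(2) + 1 == f'(2.0)
```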
Now let's extend it to discrete stochastic! The core idea behind the paper is the following: if we treat a program as a random variable X(p), can we come up with a similar number definition such that we get two random variables, (X(p),Y(p)), such that E[Y(p)] = dE[X(p)]/dp?
Note that in this formulation, we do not "need" to relax to a continuous relationship. X(p) = {1 with probability p, 0 with probability 1-p}. Y(p) can be defined conditionally on X: if X = 1, then ?, if X = 0, then ? The (?) values are then chosen so that the property E[Y(p)] = dE[X(p)]/dp holds.
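To make that concrete, here's a toy Monte Carlo sketch for the single-coin case (my own minimal construction for illustration; the paper's appendix gives the general recipe). Couple X to a uniform draw, so a +dp nudge can only flip a 0 into a 1, with conditional probability dp/(1-p):

```julia
using Statistics

# f is any downstream payoff: E[f(X(p))] = p*f(1) + (1-p)*f(0),
# so the true derivative is dE/dp = f(1) - f(0).
function y_estimate(p, f)
    X = rand() < p  # the coin: 1 with probability p
    if X
        0.0  # with this coupling, a +dp nudge never flips a 1 back to 0
    else
        (f(1) - f(0)) / (1 - p)  # flip prob. dp/(1-p) times the payoff jump
    end
end

f(x) = 3x + 1  # example payoff, so dE/dp = f(1) - f(0) = 3
mean(y_estimate(0.3, f) for _ in 1:10^6)  # ≈ 3, and unbiased
```

Quick check: E[Y] = P(X=0)·(f(1)-f(0))/(1-p) = f(1)-f(0), which is exactly dE[f(X(p))]/dp.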
With this, we can interpret "infinitesimal" differently. Instead of "small changes", a discrete stochastic variable can have a "small probability of a discrete change": an infinitesimal chance of changing by ±1, ±2, ±3, ...
With this you can extend dual numbers to stochastic triples, which have a "continuous derivative" (like dual numbers) and a "discrete derivative", a probability of O(1) changes. This is what it "looks like" in our new AD package github.com/gaurav-arya/St…
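For example, a quick sketch of the package in use (function names per the repo README at release — check the docs for the current interface):

```julia
using StochasticAD, Distributions, Statistics

f(p) = rand(Bernoulli(p))  # 1 with probability p, 0 otherwise

stochastic_triple(f, 0.5)  # value + continuous ε-part + discrete jump part,
                           # e.g. 0 + 0ε + (1 with probability 2.0ε)

# Each call gives one unbiased sample of dE[f(p)]/dp = 1; average many:
mean(derivative_estimate(f, 0.5) for _ in 1:10_000)  # ≈ 1.0
```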
You can interpret this as taking a stochastic program and making it run the program with two correlated paths. You run the program and suddenly X% of the way through, "something different" could have happened. Take the difference = derivative. But correlation = low variance!
By delaying "derivative smoothing", this also avoids the bias issues of "traditional" smoothing-based techniques. Want derivatives of agent-based models like a stochastic Game of Life? Just slap these numbers in and it will recompile to give fast, accurate derivatives!
Now you can start differentiating all kinds of crazy code. Draw a random number to choose which agent gets infected? Differentiate particle filters? All of these are now differentiable!
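For instance, a hypothetical mini agent-based step (a stand-in model made up for illustration, not one from the paper): count infections when each of N agents independently catches the disease with probability p, so E[count] = N·p and the true derivative is N.

```julia
using StochasticAD, Distributions, Statistics

# Toy "epidemic step": each of N agents gets infected with probability p.
infected(p; N = 100) = sum(rand(Bernoulli(p)) for _ in 1:N)

mean(derivative_estimate(infected, 0.05) for _ in 1:10_000)  # ≈ N = 100
```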
Of course there are a lot more details. How exactly do you define the conditional random number so that you get the right distribution? We described two coupled paths of a program; what about 3 or 4? Etc. Check out the extensive appendix with all of the details.

arxiv.org/abs/2210.08572
That's enough math for now. If you want to start differentiating agent-based models today, take a look at our new #julialang package StochasticAD.jl with this method! github.com/gaurav-arya/St… Expand the #sciml universe!
Also, please like and follow if you want more content on differentiable programming and scientific machine learning! Neural networks in everything!
Also like and follow co-authors @NotGauravArya @MoritzSchauer @_Frank_Schaefer! Gaurav, an undergrad at @MIT, has really been the driving force here. We had something going, but he really made it miles better. Really bright future ahead for him!
@MoritzSchauer put together a nice demo of the package in action!

A note for precision: the score-function method is unbiased, but it has high variance, as the plots show. Other methods in this area, like Gumbel-Softmax, are biased, with a variance-bias trade-off to manage.
