I love Hamiltonian Monte Carlo, and I am sure that many of you do too, but I'm not sure if you are fully aware of how _miraculous_ the entire method is. A short thread on how Hamiltonian Monte Carlo is built from layers of geometric miracles stacked on top of one another.
Estimating expectation values is really hard because probability distributions are hard to quantify, especially in high (like more than 3) dimensions. To have any success we need an efficient, compact representation of a given distribution, which ends up being the typical set.
That shifts the computational problem to first finding and then exploring the breadth of the typical set, which in high-dimensional spaces is extremely challenging. For more details see betanalpha.github.io/assets/case_st….
In order to explore as efficiently as possible we need _coherent_ exploration that concentrates where the typical set concentrates. Amazingly we can achieve that by following a _measure-preserving flow_.
This clarifies not only the structure of the problem but also how desperate a problem it is. Measure-preserving flows are _exceedingly rare_ mathematical objects, usually showing up only in systems with _lots_ of symmetry. Trying to construct them in general seems hopeless.
Hope, however, finds a way in the form of our first miracle: Hamiltonian geometries. These systems have natural flows _and_ probability distributions that are compatible with each other, meaning the flows are always measure-preserving.
Unfortunately a general probabilistic system isn't going to have the right structure to be Hamiltonian and enjoy all of these features. Because we can't force the necessary structure in, we can't exploit it to generate the measure-preserving flows that we desperately want.
But what's that? It's another miracle! And a _huge_ one. While a given probabilistic system won't have the necessary structure, there is a natural way to _lift_ smooth distributions into higher-dimensional spaces that do have all the structure that we want.
That means we can build measure-preserving flows on these special lifted spaces and then project the exploration down onto our original space and enjoy the sweet, sweet coherent exploration. In other words we just have to follow the _shadow_ of the lifted dynamics.
For the geometrically inclined: if our distribution is defined over a smooth manifold then we can always lift to the associated cotangent bundle which enjoys a symplectic structure. Defining distributions over the cotangent fibers lifts the target distribution up as well.
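Concretely, the standard "Euclidean" version of this lift (a generic sketch of the construction, not any particular library's API) augments the positions q with momenta p and works with the Hamiltonian H(q, p) = -log pi(q) + p.p / 2, so that exp(-H) is the joint density of the target times a standard Gaussian over the fibers:

```python
import numpy as np

# Sketch of the standard Euclidean lift: augment positions q with
# momenta p and define H(q, p) = -log pi(q) + p.p / 2, so that
# exp(-H(q, p)) is proportional to pi(q) * N(p | 0, I).
def neg_log_pi(q):
    # Toy target: standard multivariate normal, -log pi(q) up to a constant.
    return 0.5 * np.dot(q, q)

def hamiltonian(q, p):
    return neg_log_pi(q) + 0.5 * np.dot(p, p)

q = np.array([1.0, -2.0])
p = np.array([0.5, 0.0])
print(hamiltonian(q, p))  # 2.5 + 0.125 = 2.625
```

Marginalizing the momenta out of exp(-H) recovers the original target exactly, which is what makes projecting the lifted dynamics back down legitimate.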
Arguably there's a sub-miracle here -- by lifting to a higher-dimensional space we can reparameterize our base space without compromising the necessary structure on the lifted space. Playing around with the lifted spaces costs us _nothing_.
More formally, diffeomorphisms on the base space induce symplectomorphisms on the cotangent bundle allowing us to keep everything organized if we're careful. If we're not as careful, however, there are some subtle consequences. For more see arxiv.org/abs/1910.09407.
Okay, let's summarize. We want to explore the typical set of a given target probability distribution but don't know how. Measure-preserving flows will help, but we don't know how to construct them in general and we're unlikely to luck into a problem that already has one.
But if our target distribution is differentiable then we can lift to a higher-dimensional space where we can build measure-preserving flows and guide the exploration that we're after. Well, in theory...
The problem is that measure-preserving flows are defined implicitly in terms of ordinary differential equations. We won't know how to solve those explicitly for anything but the simplest problems, which means we're going to have to deal with numerical integration.
Sadly numerical integration of ordinary differential equations is really hard, especially in high dimensions. In particular most ODE solvers exhibit _drift_ where the numerical solutions quickly wander away from the exact solution.
Consequently unless we use an obscene amount of computation we can trust these numerical trajectories for only short periods of time, which means we wouldn't be able to guide effective exploration for very long.
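To see that drift concretely, here is a toy illustration of my own (not from the thread): forward Euler applied to the harmonic oscillator H = (q^2 + p^2) / 2, whose exact trajectories are circles of constant energy. Each Euler step inflates the energy by a factor of (1 + eps^2), so the numerical trajectory spirals steadily away from the true level set:

```python
# Forward Euler on the harmonic oscillator H = (q^2 + p^2) / 2.
# Exact trajectories conserve H; each Euler step multiplies the
# numerical energy by (1 + eps^2), so the trajectory spirals outward.
def euler_energy_drift(q0, p0, eps, n_steps):
    q, p = q0, p0
    for _ in range(n_steps):
        q, p = q + eps * p, p - eps * q  # naive explicit update
    return 0.5 * (q * q + p * p)

# Starting energy is 0.5; after 100 steps it has grown by 1.01^100,
# i.e. roughly 2.7x, even on this trivially simple system.
print(euler_energy_drift(1.0, 0.0, 0.1, 100))
```

Shrinking the step size slows the drift but never stops it, which is exactly why generic solvers can be trusted only over short integration times.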
The miracle of the cotangent bundle is a theoretical miracle, and it means nothing if we can't fully realize it in practice. Without efficient integration of ordinary differential equations it seems like this is indeed the case, and the prior miracles are for naught.
Oh, but wait. We're not solving _any_ differential equations here. We're solving very special differential equations that arise from that special Hamiltonian geometry. In fact they're so special that they can be simulated extremely accurately without too much cost!
Enter the miracle of the symplectic integrator. Symplectic integrators generate amazingly well-behaved numerical Hamiltonian trajectories, allowing us to actually implement the corresponding measure-preserving flows with just a few lines of code!
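Those few lines are typically the leapfrog (Stoermer-Verlet) scheme. A minimal sketch for a Hamiltonian of the form H(q, p) = U(q) + p.p / 2, with the gradient of the potential supplied by the caller:

```python
import numpy as np

# Leapfrog (Stoermer-Verlet) integrator for H(q, p) = U(q) + p.p / 2:
# half-step the momentum, alternate full steps, half-step at the end.
def leapfrog(q, p, grad_U, eps, n_steps):
    p = p - 0.5 * eps * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + eps * p
        p = p - eps * grad_U(q)
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)
    return q, p

# Toy check on the harmonic oscillator U(q) = q^2 / 2: the energy
# error stays bounded even over a long trajectory -- no drift.
grad_U = lambda q: q
q, p = leapfrog(np.array([1.0]), np.array([0.0]), grad_U, 0.1, 1000)
energy = 0.5 * q[0]**2 + 0.5 * p[0]**2
print(energy)  # stays close to the initial energy of 0.5
```

Contrast this with the Euler drift above: the leapfrog energy error oscillates within a narrow band of order eps^2 forever instead of accumulating.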
Symplectic integrators are accurate in the ways we need them to be, and that accuracy persists even in high dimensions. Moreover they have special properties that allow us to exactly compensate for the small errors they do introduce! This is probably worth its own miracle.
Why are they so accurate? Because they exactly simulate the measure-preserving flows from a _shadow_ Hamiltonian geometry that is intimately related to our given Hamiltonian geometry.
In fact by analyzing this shadow geometry we can figure out how to tune symplectic integrators for optimal performance in practice. See for example one of my favorite papers that no one has ever read, arxiv.org/abs/1411.6669. Yes, I know that no one reads any of my papers.
Now symplectic integrators are usually really well-behaved, but occasionally they go a little rogue. It's best to think about this in terms of the _stability_ of the numerical integration. These integrators don't drift, but occasionally they will go haywire and quickly _diverge_.
This isn't an accident. We can define stability in terms of the _topology_ of the shadow Hamiltonian level sets on which the numerical trajectories lie. The exact Hamiltonian level sets are always compact, and stable integrators will also follow compact shadow level sets.
Unstable integrators, however, will follow non-compact shadow level sets that reach aaaaaaaall the way out to infinity, hence the rapid divergences. For demonstrative figures see Figure 32 of arxiv.org/abs/1701.02434.
And this sets us up for our final miracle. It turns out that unstable symplectic integrators are typically triggered by pathological geometries in our original target distribution. In other words these divergences identify the features that obstruct accurate computation!
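In practice a divergence is flagged with a simple check on the energy error along the trajectory; the sketch below uses a leapfrog loop and a large threshold on H(q, p) - H(q0, p0) (the threshold value is a common heuristic choice here, not a universal constant):

```python
import numpy as np

# Run a leapfrog trajectory for H(q, p) = U(q) + p.p / 2 and flag a
# divergence whenever the energy error blows past a large threshold.
# The 1000.0 default is a heuristic, not a universal constant.
def run_with_divergence_check(q, p, U, grad_U, eps, n_steps,
                              max_energy_error=1000.0):
    H0 = U(q) + 0.5 * np.dot(p, p)
    for _ in range(n_steps):
        p = p - 0.5 * eps * grad_U(q)
        q = q + eps * p
        p = p - 0.5 * eps * grad_U(q)
        if U(q) + 0.5 * np.dot(p, p) - H0 > max_energy_error:
            return q, p, True  # divergent: escaping toward infinity
    return q, p, False

# A step size far too large for a narrow Gaussian (scale 0.01)
# immediately triggers the instability:
U = lambda q: 0.5 * np.dot(q, q) / 0.01**2
grad_U = lambda q: q / 0.01**2
_, _, divergent = run_with_divergence_check(
    np.array([0.02]), np.array([0.0]), U, grad_U, 0.1, 50)
print(divergent)  # True
```

The same trajectory with a step size below the stability limit runs cleanly, which is why divergences are such a sharp diagnostic: they appear only where the target's local geometry outruns the integrator.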
Symplectic integrators not only accurately and efficiently simulate measure-preserving flows when they are stable but also clearly indicate problems in the target distribution when they are unstable. ARE YOU NOT ENTERTAINED?!
Hamiltonian Monte Carlo puts all of these miracles together, giving us an algorithm that not only is extremely efficient when it works but also self-diagnoses when it doesn't work. If you want any more from a statistical algorithm then I don't think you can ever be pleased.
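Putting the pieces together, a bare-bones HMC transition (a pedagogical sketch, nothing like Stan's adaptive implementation) samples a momentum from the Gaussian fibers, runs a leapfrog trajectory, and applies a Metropolis correction that exactly compensates for the integrator's small energy error:

```python
import numpy as np

# Minimal Hamiltonian Monte Carlo transition for a target with
# potential U(q) = -log pi(q): sample a momentum, run leapfrog,
# then accept/reject based on the energy error H1 - H0.
def hmc_step(q, U, grad_U, eps, n_steps, rng):
    p = rng.standard_normal(q.shape)
    H0 = U(q) + 0.5 * np.dot(p, p)
    q_new, p_new = q.copy(), p.copy()
    p_new = p_new - 0.5 * eps * grad_U(q_new)
    for _ in range(n_steps - 1):
        q_new = q_new + eps * p_new
        p_new = p_new - eps * grad_U(q_new)
    q_new = q_new + eps * p_new
    p_new = p_new - 0.5 * eps * grad_U(q_new)
    H1 = U(q_new) + 0.5 * np.dot(p_new, p_new)
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return q_new  # accept the end of the trajectory
    return q          # reject: stay at the current point

# Usage: sample a standard normal and eyeball the empirical moments.
rng = np.random.default_rng(1)
U = lambda q: 0.5 * np.dot(q, q)
grad_U = lambda q: q
q = np.zeros(1)
draws = []
for _ in range(2000):
    q = hmc_step(q, U, grad_U, 0.2, 10, rng)
    draws.append(q[0])
print(np.mean(draws), np.std(draws))  # close to 0 and 1
```

The accept/reject step is the "exact compensation" mentioned above: it turns the integrator's bounded energy error into a chain whose stationary distribution is exactly the target.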
Now as a user of a tool like Stan that runs Hamiltonian Monte Carlo in the background, you don't have to worry about any of this. You can just sit back and enjoy the miracles, silently hating me every time you see a divergence (until you accept how important they are).
If you're considering developing your own implementation of Hamiltonian Monte Carlo or your own algorithm entirely, however, then there are many critical lessons here that I recommend you don't ignore.
The success of Hamiltonian Monte Carlo is not an accident. It is a careful conspiracy of geometry, and if you compromise any of the necessary components then the _entire thing_ falls apart. If you want to tweak the method then you have to respect this critical structure.
In general probabilistic computation is too hard for heuristics to have any reasonable chance at robust success. In order to build something useful you have to figure out why a method might work and then how to preserve the necessary features in the implementation.
Many clever ideas have been passed by because they could not be realized in practice; applied methods require _implementable_ theory.
Thanks for taking the time to read this, and high fives if all of this geometry gets you as excited as it gets me. I'm looking forward to squeezing every last drop of insight from it to make tools like Stan the best that they can be.
If you want to support that work you can always support me on Patreon, patreon.com/betanalpha, hire me to teach a course for your organization or company, betanalpha.github.io/courses/, or just tell others about my work, betanalpha.github.io/writing/.
Thread by Michael "El Muy Muy" Betancourt