Thread by @betanalpha on Thread Reader App

Intersection of physics and probabilistic computation story time! These coin falling toys demonstrate both conservation of angular momentum and why funnel-shaped densities are hard to fit with Hamiltonian Monte Carlo.

As the coin spirals down, gravitational potential energy is converted to kinetic energy -- the coin falls and accelerates. Because angular momentum is conserved, the shape of the spiral is constrained; as the coin's speed around the funnel increases, the radius of the spiral has to decrease in inverse proportion.

The exact trajectory is ultimately determined by the shape of the funnel, and how the normal force that can be exerted on the coin interacts with all of these conserved energies and momenta.

In particular, because of the conservation of angular momentum spirals will be confined to a relatively narrow band of heights. The coin can't fall too far without adding more angular momentum! On a perfectly frictionless surface the coin would settle into a stable orbit.
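
To make the height band concrete, here is a sketch of the energy bookkeeping under some simplifying assumptions that are mine, not the toy's: the coin is treated as a frictionless point mass $m$ sliding on a surface of revolution $z = h(r)$, and rolling and spin are ignored. The symbols $E$, $L$, $h$, and $\phi$ are introduced only for this illustration.

```latex
% Conserved angular momentum about the funnel axis
L = m r^2 \dot{\phi}

% Conserved energy, with the angular velocity eliminated using L
E = \tfrac{1}{2} m \left( 1 + h'(r)^2 \right) \dot{r}^2
  + \frac{L^2}{2 m r^2}
  + m g \, h(r)

% Since the radial kinetic term is non-negative, the motion is confined to
\frac{L^2}{2 m r^2} + m g \, h(r) \le E
```

The centrifugal term $L^2 / (2 m r^2)$ blows up as $r \to 0$ while $m g\, h(r)$ grows as the coin moves outward and up, so for fixed $E$ and $L$ the allowed radii, and hence heights, form a band. Changing the band requires changing $E$ or $L$, which is what friction does in the real toy.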

In reality friction and collisions between the coin and imperfections in the funnel surface dissipate energy, allowing the coin to fall without having to store up too much kinetic energy.

Now typical implementations of Hamiltonian Monte Carlo, technically the ones with Gaussian-Euclidean cotangent disintegrations, i.e. constant "mass matrices"? The trajectories they generate are mathematically equivalent to a frictionless particle in a certain physical system.

The physical system corresponding to "funnel" target density functions, like those that arise in latent Gaussian models, is pretty much equivalent to the spiraling coin system. In particular our Hamiltonian Monte Carlo trajectories have the same height restriction!

No matter how long we integrate, the trajectories can go only so far up and down the funnel. Only by resampling the momenta between trajectories can we add or remove energy and move further up or down the funnel. But this process is slow, leading to diffusive exploration.
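
The height restriction can be checked numerically. The sketch below is only an illustration under assumptions of my own: the target is Neal's funnel with v ~ Normal(0, 3) and two latent coordinates x_i | v ~ Normal(0, exp(v / 2)), the mass matrix is the identity, and the function names are hypothetical. A small-step leapfrog integrator stands in for the exact flow. With two latent coordinates the potential is rotationally symmetric in x, so the trajectory has a conserved angular momentum just like the coin.

```python
import numpy as np

# Assumed target (not from the thread): Neal's funnel with
#   v ~ Normal(0, 3),  x_i | v ~ Normal(0, exp(v / 2)),  i = 1..n
# The potential energy is the negative log density, up to a constant.

def potential(q):
    v, x = q[0], q[1:]
    return v**2 / 18.0 + 0.5 * x.size * v + 0.5 * np.exp(-v) * (x @ x)

def grad_potential(q):
    v, x = q[0], q[1:]
    dv = v / 9.0 + 0.5 * x.size - 0.5 * np.exp(-v) * (x @ x)
    dx = np.exp(-v) * x
    return np.concatenate(([dv], dx))

def energy(q, p):
    # Hamiltonian with a unit ("Euclidean") mass matrix
    return potential(q) + 0.5 * (p @ p)

def leapfrog_heights(q0, p0, step, n_steps):
    """Integrate one long trajectory and record every visited value of v."""
    q, p = q0.astype(float), p0.astype(float)  # astype returns copies
    heights = np.empty(n_steps + 1)
    heights[0] = q[0]
    p = p - 0.5 * step * grad_potential(q)  # initial half step in momentum
    for i in range(n_steps):
        q = q + step * p
        heights[i + 1] = q[0]
        # full momentum steps in the interior, half step at the very end
        scale = step if i < n_steps - 1 else 0.5 * step
        p = p - scale * grad_potential(q)
    return q, p, heights

q0 = np.array([0.0, 1.0, 0.0])  # (v, x1, x2)
p0 = np.array([0.0, 0.0, 1.0])  # momentum tangent to the circle in x
q1, p1, heights = leapfrog_heights(q0, p0, step=0.01, n_steps=20_000)

print("energy drift:", energy(q1, p1) - energy(q0, p0))
print("range of v:  ", heights.min(), heights.max())
```

For this initial condition the energy is 1 and the angular momentum in the x plane is 1, and together they confine the exact trajectory to v between about -2.1 and 0, even though the marginal standard deviation of v is 3. Integrating for longer does not widen that band; only a fresh momentum draw can.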

In higher dimensions the diffusion gets even slower -- resampling the momenta is much more likely to add energy than remove it, making it more likely to move up higher in the funnel than down deeper into it. That's why it takes forever to explore the neck of the funnel.

Keep in mind that this is a property of the exact trajectories and so we get slow exploration _even with perfect integrators_. When we use numerical integrators we have to deal with more problems like divergences, but they're layered on top of these fundamental issues.

Gaussian-Riemannian cotangent disintegrations, i.e. varying "mass matrices", require a log determinant normalization term. Conveniently this acts like an energy reservoir in the physical system, soaking up energy to allow the particle to quickly drop to the bottom of the funnel.
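
For reference, this is the usual way the two Hamiltonians are written. The notation ($q$ for position, $p$ for momentum, $\pi$ for the target density, $M$ for the "mass matrix") is mine rather than the thread's, so read it as a sketch of the standard form.

```latex
% Constant mass matrix: the log determinant is a constant and can be dropped
H_{\text{Euclidean}}(q, p) = -\log \pi(q) + \tfrac{1}{2} \, p^{T} M^{-1} p

% Position-dependent mass matrix: the log determinant is part of the energy
H_{\text{Riemannian}}(q, p) = -\log \pi(q)
  + \tfrac{1}{2} \log \left| M(q) \right|
  + \tfrac{1}{2} \, p^{T} M(q)^{-1} p
```

Because $\tfrac{1}{2} \log |M(q)|$ changes with position, it can take up or give back energy along a trajectory, which is the "reservoir" role described above.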

That's why so-called "Riemannian Hamiltonian Monte Carlo" is so much better equipped to fit hierarchical models. It's not so much the numerical integration that's better, it's the actual geometry of the Hamiltonian trajectories!

Unfortunately "Riemannian" Hamiltonian Monte Carlo is a giant pain to implement efficiently, and even harder to implement automatically (which is why it's not exposed in Stan). Fortunately we can exactly emulate that better geometry by non-centering the funnel!
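
As a small sanity check on what non-centering does, the sketch below again assumes Neal's funnel (v ~ Normal(0, 3), x_i | v ~ Normal(0, exp(v / 2))), which is my choice of example rather than something fixed by the thread, and uses hypothetical function names. It verifies that substituting x = exp(v / 2) * z and accounting for the Jacobian turns the funnel into independent normals in (v, z), which is the geometry the sampler actually sees after reparameterizing.

```python
import numpy as np

def log_density_centered(v, x):
    # Funnel in the centered parameterization (v, x), up to a constant
    return -v**2 / 18.0 - 0.5 * x.size * v - 0.5 * np.exp(-v) * (x @ x)

def log_density_noncentered(v, z):
    # Independent normals in the non-centered parameterization (v, z)
    return -v**2 / 18.0 - 0.5 * (z @ z)

def to_centered(v, z):
    # Deterministic map back to the original latent variables
    return v, np.exp(0.5 * v) * z

rng = np.random.default_rng(0)
v = rng.normal(scale=3.0)
z = rng.normal(size=5)

_, x = to_centered(v, z)
log_jacobian = 0.5 * z.size * v  # log |dx/dz| for x = exp(v / 2) * z

lhs = log_density_noncentered(v, z)
rhs = log_density_centered(v, x) + log_jacobian
print("densities agree:", np.isclose(lhs, rhs))
```

In practice one samples (v, z) and reports x = exp(v / 2) * z afterwards, so the funnel never has to be explored directly.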

This equivalence between reparameterizations of the target space and better geometries for Hamiltonian Monte Carlo is the subject of my last geometry paper, arxiv.org/abs/1910.09407. It's a bit technical but there are lots of pictures!

I also wrote much more about the trials and tribulations of Hamiltonian trajectories in the funnel in arxiv.org/abs/1312.0906. This includes lots of pictures of both good and bad trajectories.

But why limit ourselves to pictures when we can have _movies_? This is a typical "Euclidean" trajectory. Notice how it bounces within a narrow band of heights.

On the other hand the "energy reservoir" in "Riemannian" Hamiltonian Monte Carlo allows trajectories to span huge differences in heights by absorbing and releasing energy as needed.

ANYWAYS. Hierarchical models in centered parameterizations are hard to fit not just because of divergences but also due to fundamental constraints on the trajectories.

In this case we can build up some intuition for why by using a relatively familiar physical analogy, but most of the time pathologies in Hamiltonian Monte Carlo fits are much more sophisticated, so we shouldn't try to lean on physical analogies _too_ much! -fin-
