What is Probabilistic Numerics (PN)? To illustrate, take one core use case of PN: computing integrals. Most integrals are intractable (life is hard), so we must often integrate numerically. Sadly, numerical integrators are unreliable & computationally expensive.
The integrand f(x) in this example is simple: ~20 characters, only atomic functions, evaluated in nanoseconds. And yet the integral F is intractable! Let's try to compute F numerically using PN.
The central idea of Probabilistic Numerics is to treat a numerical method as a *learning machine*. What about when the numerical method is an integrator? Well, a learning machine
• receives data,
• predicts and then
• takes actions.
QUIZ: in numerical integration,
• data = ?
• quantity of interest (the thing we want to predict) = ?
• actions = ?
Answers in two tweets' time!
No peeking!
ANSWERS:
Probabilistic Numerics views integration as:
• data = evaluations (also known as samples), f(x_i)
• quantity of interest = integral, F
• actions = choices of evaluation locations, x_i.
What does the learning machine use to turn data into predictions & actions? A model! A Probabilistic Numerical Method is (partly) defined by the choice of that model. The PN approach to integration is known as Bayesian quadrature, and often uses a Gaussian process (GP) model.
Why a GP?
1. We must choose evaluation locations, x_i. A GP, p(f(x)), gives trustworthy uncertainty (error bars) for f(x), so it can guide which x_i to evaluate next so as to best reduce uncertainty. Such informative evaluations give high efficiency and low compute cost.
Why a GP?
2. A GP on the integrand, p(f(x)), leads to a neat one-d Gaussian, p(F), for the integral, F, 😎. p(F) gives not just an estimate, hat{F}, for F—it also tells us if/how that estimate is *trustworthy*. We get reliable error-bars for the integral!
Why a GP?
3. There exist other stochastic processes that might give trustworthy p(f(x)) and p(F)—at the cost of introducing additional, hard, numerical integration problems (a chicken/egg situation)! In contrast, a GP implementation requires (principally) only linear algebra.
But which GP? The beating heart of a GP is its covariance, k—different k's give different GPs, and hence different Bayesian quadrature (BQ) methods. PN is cool because it gives all the tools of Bayes (i.e. model selection) to help pick the k, and hence to pick the best BQ method.
Let's try one particular covariance, k = min(x, x') - χ. This k gives a GP model for f whose 'best guess', the expected value E[f(x)], linearly interpolates between evaluations.
Does linear interpolation for numerical integration seem familiar? Well—do you remember the trapezoid rule? Used in ancient Babylon?
The trapezoid rule is just BQ with k=min(x,x')-χ!
The code for BQ with k=min(x,x')-χ IS IDENTICAL to the code for the trapezoid rule! 🎉
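A quick numerical check of this claim (my own sketch, not code from the book): with the Brownian-motion covariance k(x, x') = min(x, x'), evaluation nodes that include both endpoints of [a, b], and a > 0 (so the Gram matrix is invertible), the BQ posterior mean for F coincides with the trapezoid rule.

```python
import numpy as np

a, b = 0.5, 2.0
xs = np.linspace(a, b, 8)        # nodes include both endpoints
fs = np.sin(xs)                  # an assumed example integrand

# Bayesian quadrature with k(x, x') = min(x, x')
K = np.minimum.outer(xs, xs)                 # Gram matrix K_ij = min(x_i, x_j)
z = xs * b - xs**2 / 2 - a**2 / 2            # z_i = integral of min(x, x_i) over [a,b]
bq_mean = z @ np.linalg.solve(K, fs)         # BQ estimate of F

# The classical trapezoid rule on the same nodes
trap = 0.5 * np.sum((xs[1:] - xs[:-1]) * (fs[1:] + fs[:-1]))
# bq_mean and trap agree to floating-point precision
```

The agreement follows because the GP posterior mean under this kernel is exactly the piecewise-linear interpolant of the evaluations, and integrating that interpolant is the trapezoid rule.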
But we can do A LOT better than the trapezoid rule—by choosing a k that better fits f(x), e.g. a k that correctly assumes that f(x) is smooth. BQ with such a k (e.g. a k that reproduces Gauss-Legendre integration) is MUCH faster—with error that converges superpolynomially.
Notice that Monte Carlo integration… didn't do so well? Guess what: Monte Carlo is ALSO BQ, with a k that assumes that f(x) is constant plus noise. This assumption is false for our smooth f(x)—and so Monte Carlo converges slowly. Assuming that f(x) is noise is fairly odd!
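To see the slow convergence in miniature (my own sketch, with sin(x) as an assumed smooth integrand): with the same budget of 64 evaluations, the trapezoid rule lands within ~1e-5 of the truth, while plain Monte Carlo typically misses by ~1e-2.

```python
import numpy as np

rng = np.random.default_rng(0)
f, a, b = np.sin, 0.0, 1.0
true_F = 1.0 - np.cos(1.0)       # exact value of the integral over [0, 1]

n = 64
xs = np.linspace(a, b, n)
fs = f(xs)
# Trapezoid rule: error shrinks like O(n^-2) on smooth integrands
trap = 0.5 * np.sum((xs[1:] - xs[:-1]) * (fs[1:] + fs[:-1]))
# Plain Monte Carlo: error shrinks like O(n^-1/2), regardless of smoothness
mc = (b - a) * f(rng.uniform(a, b, n)).mean()
```

The gap widens as n grows: Monte Carlo's O(n^-1/2) rate is the price of assuming the smooth integrand is noise.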
Many different k's are possible! Almost anything that is known about the integrand can be baked into a k—the more we correctly assume, the faster our method. And it is possible to know a lot about the integrand: we usually have full access to the integrand's source code.
So, PN (in the form of BQ) can give integrators that are more
• Reliable—BQ gives error bars on F, quantifying reliability.
• Computationally cheap—via allowing better models, and via allowing efficient selection of evaluations, BQ can give very fast convergence.
Thanks for reading! This thread was drawn from our new book, #PNbook (with @PhilippHennig5 and @HansKersting)—check it out at the links below.
When I got #LongCovid in March 2020, I was 38 and healthy. If you are anything like I was then, it is hard to understand how bad Long Covid is. I think that we all have an instinct to just… look away. But, please, it is important that you look. 1/15
My own low points: early on, I collapsed, shaking, and was taken to A&E in an ambulance. A year later, I did not have the energy to leave the house. Formerly, I was a marathon runner, but I brought on a bad relapse with a 700m walk. Many people have it much, much worse. 2/
Long Covid feels like a hex. Your body and brain are wrong, in different ways on different days, unpredictable and unsettling. On the good days, you doubt yourself; on the bad, you doubt everything. The illness is capricious, boundless, wicked. 3/
I have now been sick with #longcovid for almost a year—below, some reflections on my convalescence. (1/10)
While remaining mostly functional, in many ways, I'm more sick in 2021 than I was in 2020. Two weeks ago, when I last felt well enough to walk outside, I managed only 0.7km before the post-exertional malaise came on: brain fog, fatigue, pain in my neck and arm. (2/10)
I was formerly a (somewhat) competitive distance runner. It's not that I'm ignorant of how to push my body, nor of the consequences. During my first marathon, I pushed through hypoglycaemia and black-and-white vision before having a seizure just over the finish line. (3/10)
Long COVID is nasty, but it is also really *weird*. (1/6)
1. My eyesight had always been fine, but became progressively worse after falling ill. Everything looked blurry. It got to the point where I maxed out the text size on all my devices. Then, over the course of two weeks, it just got better! (2/6)
2. When I'm particularly fatigued, if I attend to a particular place in my head, I feel like I'm floating. Anywhere else—my chin, my fingers, my calves—and I sink. What the heck is happening? (3/6)
If you doubt expert forecasts of the future of work, you might like Appendix A of our 2017 report (robots.ox.ac.uk/~mosb/public/p…): we present results using only raw extrapolation of employment trends.
This extrapolation is much more optimistic than our forecasts using experts.
The horizontal (x) axis of the plot is the probability of an occupation having a higher share of employment in 2030 than in 2017. The vertical (y) axis is employment, or the number of jobs.
My personal view is that any attempt to adopt a purely data-driven approach to forecasting something as complicated as the future of work is just as likely to fail as using only expert judgement. Our report adopted a combination of the two.