Willie Neiswanger Profile picture
Assistant Professor @USC in CS + AI. Previously @Stanford, @SCSatCMU. Machine Learning, Decision Making, AI-for-Science, GenAI, Uncertainty Estimation, Systems.

Jul 7, 2021, 9 tweets

(1/9) Presenting: Bayesian Algorithm Execution (BAX) and the InfoBAX algorithm.

Bayesian optimization finds global optima of expensive black-box functions. But what about other function properties?

w/ @KAlexanderWang @StefanoErmon at #ICML2021

URL: willieneis.github.io/bax-website

(2/9) Methods like Bayes opt / quadrature can be viewed as estimating properties of a black-box function (e.g. global optima, integrals). But in many applications we also care about local optima, level sets, top-k optima, boundaries, integrals, roots, graph properties, and more

(3/9) For a given property, we can often find an algorithm that computes it, in the absence of any budget constraint. We then reframe the task as: how do we adaptively sample to estimate *the output of this algorithm*?

We call this task “Bayesian algorithm execution” or BAX.

(4/9) Given an algorithm that computes the property we desire, our procedure InfoBAX sequentially chooses queries to the black-box function that maximally inform us of this computable property.

(5/9) Example 1: Bayesian local optimization.

If we run InfoBAX with a local optimization algorithm (e.g. evolution strategies, Nelder-Mead method, gradient descent), we get a variant of Bayesian optimization that targets local, rather than global, optima.

(6/9) Here’s an animation of InfoBAX with an evolution strategy.

Notice that, unlike (standard) Bayesian optimization, this algorithm only spends budget converging to a single local optima. It shows good performance in high dimensions.

(7/9) Example 2: Estimating shortest paths in graphs.

In graphs where it is expensive to access edge weights (e.g. transportation networks) we can use InfoBAX with Dijkstra’s algorithm to estimate shortest paths.

(8/9) Example 3: Top-k estimation.

Given a set of points X, we may wish to estimate the subset of k points that have the highest value under a black-box function f. This generalizes standard Bayesian optimization (i.e. top-1 estimation) to top-k estimation.

(9/9) Our method has connections to stats/ML areas such as Bayesian optimal experimental design, information theory, optimal sensor placement, stepwise uncertainty reduction, likelihood-free Bayesian inference, and more.

See more discussion in our paper arxiv.org/abs/2104.09460

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Keep scrolling