\mathfrak{Michael}
Once and future physicist masquerading as a statistician. Reluctant geometer. @mcmc_stan developer. Support my writing at https://t.co/Ut25MyjIAy. He/him.
Mar 6
You want to know why we keep seeing terrible explanations of p-values? Because concise explanations understandable by a general audience are fundamentally impossible.

Oh yeah, it's a thread about communicating p-values.

The proper definition of a p-value -- not just what it is but also how it is used -- is technical and, even in full formality, pretty subtle.
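To see how much machinery even the minimal definition drags in, here's a rough sketch of the ingredients (a toy null model and test statistic of my own choosing, not anything from the thread): a null model, a test statistic, and the probability, under the null, of a statistic at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Null model: y ~ normal(0, 1); test statistic: the sample mean.
def statistic(y):
    return np.mean(y)

y_obs = rng.normal(0.5, 1, size=20)   # observed data (drawn off-null here)
t_obs = statistic(y_obs)

# Null distribution of the statistic, estimated by simulation.
t_null = np.array([statistic(rng.normal(0, 1, size=20))
                   for _ in range(10000)])

# One-sided p-value: probability, under the null model, of a
# statistic at least as extreme as the observed one.
p_value = np.mean(t_null >= t_obs)
print(p_value)
```

Every clause in that description -- "under the null", "at least as extreme", the choice of statistic -- is load-bearing, which is exactly why the concise verbal summaries keep going wrong.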
Feb 8, 2023
Friendly reminder that models at best _approximate_ some true data generating process. A natural question is how well a model approximates the true data generating process, but without actually knowing that truth we can't provide a meaningful answer. 1/n

The best we can do in practice is ask how consistent models are with the brief glimpses of the true data generating process encoded in observations.

The divergence between these questions -- what we want to ask and what we can ask -- is also known as overfitting. 2/n
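A toy demonstration of that divergence (the polynomial setup and noise scale are my own invention): a flexible model can look ever more consistent with the observed glimpse while drifting away from the true process, which only shows up against fresh draws.

```python
import numpy as np

rng = np.random.default_rng(2)

# True data generating process: y = sin(x) + noise.
def simulate(n):
    x = rng.uniform(0, 3, n)
    return x, np.sin(x) + rng.normal(0, 0.3, n)

x_train, y_train = simulate(20)
x_test, y_test = simulate(1000)

for degree in [1, 3, 15]:
    coef = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    # High degree: small in-sample error, large out-of-sample error.
    print(degree, round(mse_train, 3), round(mse_test, 3))
```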
Jan 6, 2023
It's the first Friday of the new year so let me take the opportunity to complain about the scourge of "default" prior models, meaningless terms like "informative prior", and the real harm that they cause.

One of the fundamental challenges of statistical inference is that making and validating assumptions is hard and context dependent. In particular there are no universal assumptions that are adequate in every analysis. There's a reason why "it depends" is a statistical mantra.
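One way to see the context dependence is a prior predictive check: push a candidate prior through the observational model and ask whether the simulated behavior is even plausible on the scale of the actual measurement. A minimal sketch, with a model and scales I made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Observational model: logistic regression on a standardized covariate.
# A "default" normal(0, 10) prior on the slope looks harmless until you
# push it through the model: most of the implied probabilities pile up
# at 0 and 1, an extreme assumption in almost any applied context.
for prior_sd in [1, 10]:
    beta = rng.normal(0, prior_sd, size=10000)
    x = 1.0  # one standard deviation from the covariate mean
    p = 1 / (1 + np.exp(-beta * x))
    extreme = np.mean((p < 0.01) | (p > 0.99))
    print(prior_sd, round(extreme, 2))
```

The same normal(0, 10) prior that is nearly inconsequential in one parameterization is wildly assertive in another, which is the whole problem with calling any prior a "default".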
Sep 23, 2022
I was just asked an interesting question so why don't we turn it into a little thread? How are we able to do any probabilistic inference on continuous observational spaces when the probability of any single observation is zero?

More formally let's consider a continuous space Y, such as the real numbers. An observational model pi_{theta} is a collection of probability distributions over Y, each indexed by a model configuration, or parameter, theta.
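The resolution, to preview it numerically: continuous distributions allocate probability to subsets, not points, and a probability density function is what we integrate to recover those subset probabilities. A quick illustration with a standard normal (my choice of example):

```python
from scipy import stats

model = stats.norm(0, 1)

# The probability allocated to any single point is zero...
print(model.cdf(1.0) - model.cdf(1.0))   # 0.0

# ...but intervals carry nonzero probability, obtained by
# integrating the probability density function over them.
print(model.cdf(1.1) - model.cdf(0.9))   # ~0.066

# The density itself is not a probability; it can even exceed 1.
print(stats.norm(0, 0.1).pdf(0.0))       # ~3.989
```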
Sep 7, 2022
One of the aspects of Bayesian modeling that continually brings me joy is the difficulty in hiding from modeling assumptions and their consequences. You don't have to look at them, of course, but a Bayesian analysis makes them _really_ bright.

To be clear, accessible modeling assumptions aren't unique to Bayesian inference. Frequentist estimators are calibrated with respect to some assumed model, but when everyone focuses on estimators and takes the calibrations for granted there's little motivation to look.
Jul 21, 2022
One of my constant frustrations is people taking "factor of two"/"order of magnitude"/"back of the envelope" calculations too seriously based on the implicit rationalization that the accuracy of the outputs should be similar to the accuracy of the inputs. A short thread.

These calculations necessarily convolve all of the input uncertainties together, which typically results in a more uncertain output. In fact the output uncertainty is often surprisingly large relative to our naive expectations.
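One way to see how quickly the input uncertainties compound: propagate them through a Fermi-style product with Monte Carlo. The numbers below are made up, but the log-scale arithmetic is general.

```python
import numpy as np

rng = np.random.default_rng(5)

# A back-of-the-envelope product of three inputs, each known only
# "to within a factor of two" (log-normal, sd log(2) on the log scale).
n = 100000
a = np.exp(rng.normal(np.log(10), np.log(2), n))
b = np.exp(rng.normal(np.log(5), np.log(2), n))
c = np.exp(rng.normal(np.log(2), np.log(2), n))

output = a * b * c

# Log-scale variances add across the inputs, so the output is
# uncertain to roughly a factor of 2**sqrt(3) ~ 3.3, not a factor of 2.
q = np.quantile(output, [0.16, 0.5, 0.84])
print(q[1] / q[0], q[2] / q[1])
```

Three factor-of-two inputs already give a factor-of-three-plus output, and real envelopes have many more than three inputs.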
Jun 14, 2022
"Samples" is one of those statistics words that everyone seems to interpret a little bit differently, and those different interpretations can have important practical consequence. Come along as I dig into what "posterior samples" actually are and how we should describe them. Firstly I go into much more detail on this topic, along with lots of pictures and references, in my "Rumble in the Ensemble" case study, betanalpha.github.io/assets/case_st…. Here I'll just review some of the important concepts.
May 26, 2022
This question comes up from time to time and unfortunately I don't think that there is a satisfying, let alone productive, answer. The problem is how "science" becomes awkwardly operationalized in the literature. A short, spontaneous thread.

We often start our journey into science with an early presentation of the "scientific method" that we tend to internalize and take for granted. If you go back to the individual steps in the scientific method, however, they aren't all that well defined.
May 26, 2022
Recently I've been thinking about how to expand my Markov chain Monte Carlo material, in particular how to demonstrate each of the theoretical concepts and why they are important with explicit examples. This is a very, very rough first step towards that betanalpha.github.io/assets/case_st….

My ultimate goal is to start the material with a discussion like this about the basics of Markov chain Monte Carlo _given_ a particular Markov transition. This would include the basics of convergence and estimation and why we need all of those "mild technical conditions".
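A minimal sketch of that "given a particular Markov transition" starting point (my own toy setup, not the case study's): a random walk Metropolis transition targeting a standard normal, with Markov chain Monte Carlo estimators built from the resulting states.

```python
import numpy as np

rng = np.random.default_rng(7)

def transition(x):
    """One random walk Metropolis step targeting a standard normal."""
    proposal = x + rng.normal(0, 1.5)
    # Accept with probability min(1, pi(proposal) / pi(x)).
    if np.log(rng.uniform()) < 0.5 * (x**2 - proposal**2):
        return proposal
    return x

# Run the chain and form Markov chain Monte Carlo estimators.
x, states = 10.0, []   # deliberately poor initialization
for _ in range(10000):
    x = transition(x)
    states.append(x)

warm = np.array(states[1000:])          # drop early, pre-convergence states
print(np.mean(warm), np.mean(warm**2))  # should approach 0 and 1
```

Convergence (why we can drop the early states) and estimation (why the remaining averages behave) are exactly the properties that the "mild technical conditions" are there to guarantee.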
Dec 30, 2021
New paper alert! In this paper Charles Margossian and I attempt a unified framework for incorporating implicit functions into automatic differentiation, arxiv.org/abs/2112.14217.

There are many kinds of implicit systems -- ranging from finite dimensional systems like difference equations to infinite dimensional systems like differential equations -- and while many have been implemented in automatic differentiation the details vary between systems.
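The unifying idea, roughly: if x(theta) is defined implicitly by f(x, theta) = 0 then the implicit function theorem gives dx/dtheta = -(df/dx)^(-1) df/dtheta, so a solver's output can be differentiated without differentiating through the solver's iterations. A scalar sketch of my own, far simpler than the paper's general treatment:

```python
from scipy import optimize

# x(theta) defined implicitly by f(x, theta) = x**3 + x - theta = 0.
def f(x, theta):
    return x**3 + x - theta

theta = 2.0
x = optimize.brentq(lambda x: f(x, theta), -10, 10)   # x = 1 here

# Implicit function theorem: dx/dtheta = -(df/dx)^(-1) df/dtheta,
# evaluated at the solution -- no need to differentiate the solver.
df_dx = 3 * x**2 + 1
df_dtheta = -1.0
dx_dtheta = -df_dtheta / df_dx
print(x, dx_dtheta)   # 1.0, 0.25
```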
Aug 26, 2021
Everyone loves statistics by simulation/sampling these days, but samples are not magic and they do not allow you to implement every important probabilistic operation in a straightforward way. A short thread on the common ways to work with probability distributions.

There are basically two ways that we can actually implement probability theory in practice, and by that I mean estimate expectation values of various functions with respect to a given probability distribution.
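To preview the dichotomy in one dimension (my example, assuming a standard normal target): expectation values can be estimated by weighting density evaluations over a deterministic grid, or by averaging over samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
f = np.square   # estimate E[theta^2] under normal(0, 1); exact value 1

# Density-based: quadrature over a grid, weighting f by the density.
grid = np.linspace(-8, 8, 1001)
quad = np.trapz(f(grid) * stats.norm.pdf(grid), grid)

# Sample-based: a Monte Carlo average over exact draws.
mc = np.mean(f(rng.normal(0, 1, size=10000)))

print(quad, mc)
```

Grids scale terribly with dimension and samples make some operations (like evaluating the density itself) awkward, which is why neither approach is magic.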
Aug 24, 2021
Friendly reminder that if deriving a point estimator is really fast but quantifying a Bayesian posterior is really slow then the point estimator provides only an extremely incomplete characterization of the full likelihood function.

Under _certain_ conditions point estimators like maximum likelihood estimators, and differential information at the point estimate, can provide a reasonable approximation to the entire likelihood function, and hence all of the model configurations consistent with the data.
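The usual construction behind that approximation, sketched for a normal location model of my own choosing: the maximum likelihood estimate plus the observed information define a quadratic (Laplace) approximation to the log likelihood function.

```python
import numpy as np

rng = np.random.default_rng(10)
y = rng.normal(3.0, 1.0, size=50)   # known unit scale, unknown location mu

# Maximum likelihood estimate and observed information for mu.
mu_hat = np.mean(y)                 # maximizes the likelihood
info = len(y) / 1.0**2              # -d2/dmu2 of the log likelihood

# Quadratic approximation to the log likelihood around mu_hat; for this
# Gaussian model the approximation happens to be exact.
def log_lik(mu):
    return -0.5 * np.sum((y - mu) ** 2)

def log_lik_approx(mu):
    return log_lik(mu_hat) - 0.5 * info * (mu - mu_hat) ** 2

print(log_lik(2.5) - log_lik(mu_hat),
      log_lik_approx(2.5) - log_lik_approx(mu_hat))   # identical here
```

When those "certain conditions" fail -- multimodality, strong skew, parameters near boundaries -- the point estimate and curvature silently discard most of what the likelihood function knows.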
Aug 22, 2021
Of course no one will ever know everything, but statistics education is particularly well-suited to making people feel simultaneously (yes pun intended) frustrated and betrayed long after the courses are over. Some late night rambling thoughts while I do hurricane prep!

Statistics, both theoretical and applied, is a fundamentally challenging subject. Probability theory on continuous spaces is all kinds of messed up, and modeling real measurements is messier than the measurements themselves. If you think otherwise then this thread isn't for you.
Aug 12, 2021
This is an important question that hits on some of the crucial differences between the idealizations of Bayesian inference that are usually taught in introductory classes and how Bayesian inference is actually implemented in practice. A short thread!

One of the nice theoretical properties of Bayesian updating (i.e. the application of Bayes' Theorem in Bayesian inference) is that it's compatible with any _product structure_ of the observational model.
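The concrete payoff of that product structure: updating on all of the data at once agrees with updating sequentially, batch by batch, with each posterior serving as the next prior. A conjugate beta-binomial sketch (my toy example):

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.binomial(1, 0.3, size=100)   # conditionally independent trials

# Batch update: beta(1, 1) prior conditioned on all data at once.
a_batch, b_batch = 1 + np.sum(y), 1 + len(y) - np.sum(y)

# Sequential update: yesterday's posterior is today's prior.
a, b = 1, 1
for y_n in y:
    a, b = a + y_n, b + (1 - y_n)

print((a_batch, b_batch) == (a, b))   # True
```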
Jul 8, 2021
Because it's thunderstorming outside let's do a quick thread on the subtle differences between real spaces and Euclidean spaces.

The real line is our usual mathematical model for a continuum -- no matter how deep we zoom in there are still an infinite number of points in any neighborhood. (Today isn't your day, p-adics.)
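The punchline, in symbols (my summary of the distinction): a real space is just a set of points, while a Euclidean space is that set equipped with extra metric structure.

```latex
% The real space R^N is just the set of N-tuples of real numbers,
\[
  \mathbb{R}^{N} = \{ (x_{1}, \ldots, x_{N}) : x_{n} \in \mathbb{R} \},
\]
% while a Euclidean space adds a distinguished distance function,
\[
  d(x, y) = \sqrt{ \sum_{n = 1}^{N} (x_{n} - y_{n})^{2} },
\]
% that is extra structure, not intrinsic to the points themselves.
```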
Jul 6, 2021
Because this keeps coming up here's a short thread on why tools that claim to seamlessly fit discrete parameters are not the boons that you might think they are. First let's make sure we're on the same page with what a "fit" is in Bayesian inference.

A Bayesian model is specified by a joint probability distribution over the data y and model configurations theta; conditioning that distribution on observed data yields a posterior distribution.
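For context, the standard alternative to "fitting" a discrete parameter is to marginalize it out analytically. For a two-component mixture the discrete assignment disappears into a sum inside the likelihood; a sketch with a toy model of my own:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(12)

# Two-component normal mixture: the discrete assignment z_n is never
# "fit" directly -- it is summed out of the joint density analytically.
def log_lik(y, lam, mu1, mu2):
    lp1 = np.log(lam) + stats.norm.logpdf(y, mu1, 1)      # z_n = 1 branch
    lp2 = np.log1p(-lam) + stats.norm.logpdf(y, mu2, 1)   # z_n = 2 branch
    return np.sum(logsumexp([lp1, lp2], axis=0))

y = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
print(log_lik(y, 0.5, -2.0, 2.0))
```

The marginalized likelihood is smooth in the continuous parameters, which is exactly what gradient-based samplers need and what "seamless" discrete sampling gives up.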
May 25, 2021
This summer I hope to draft a case study exploring the ambiguous and varied ways in which "generative modeling" is defined. "Modeling the data generating process" sounds appealing but what does it really mean? There is a lot to unpack but I want to mention a few key points here.

One of the key features of "generative" in the statistical modeling sense is that it's not a monolithic description of a model, nor is it a binary classification. Parts of a model can be more generative and other parts can be less generative.
May 22, 2021
Apologies @osazuwa I'm going to commandeer this question to advertise my most recent case study on sampling, betanalpha.github.io/assets/case_st…. A short thread.

The MAP, or "maximum a posteriori" point, is the model configuration theta that maximizes the posterior density function pi(theta | tilde{y}). Because the posterior density representation depends on how the model configuration space is parameterized this is kind of a weird object.
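A quick demonstration of that weirdness (my toy example): the same posterior distribution has its density maximized at one point in the nominal parameterization and at a completely different point after reparameterizing, because the Jacobian reshapes the density.

```python
import numpy as np
from scipy import optimize

# Posterior density proportional to exp(-x) on x > 0: maximized at x = 0.
neg_log_density_x = lambda x: x

# Reparameterize with theta = log(x); the Jacobian factor exp(theta)
# moves the density maximum to theta = 0, i.e. x = 1, not x = 0.
neg_log_density_theta = lambda t: np.exp(t) - t

map_x = optimize.minimize_scalar(neg_log_density_x,
                                 bounds=(0, 10), method="bounded").x
map_theta = optimize.minimize_scalar(neg_log_density_theta,
                                     bounds=(-10, 10), method="bounded").x

print(map_x, np.exp(map_theta))   # ~0 and ~1: "the" MAP depends on
                                  # the chosen parameterization
```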
Apr 15, 2021
Alright, let's do it.

Random variables: what they actually are and the many incompatible ways that they're often interpreted.

Consider a space, X, and a probability distribution, pi, that self-consistently allocates probability to nice subsets of X [nice refers to elements of a sigma-algebra over X]. In this thread I'll try to wrap largely ignorable technical comments in square brackets like this.
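For a concrete handle on where this is going: formally a random variable is a [measurable] function from X to the real numbers, and it inherits a pushforward distribution from pi. A sketch using samples to stand in for pi (my example, with X = [0, 1] and the uniform distribution):

```python
import numpy as np

rng = np.random.default_rng(15)

# An ambient space X with probability distribution pi, here represented
# by exact samples: X = [0, 1] with the uniform distribution.
x_samples = rng.uniform(0, 1, size=10000)

# A random variable is just a [measurable] function f: X -> R...
f = lambda x: -np.log(x)

# ...and it inherits a "pushforward" distribution on R from pi, here
# the unit exponential: E[f] = 1 and P[f > 1] = exp(-1) ~ 0.368.
fx = f(x_samples)
print(np.mean(fx), np.mean(fx > 1))
```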
Mar 19, 2021
<writes a compact paper that requires certain proficiencies> OMG THIS IS TOO DENSE AND TECHNICAL NO ONE CAN UNDERSTAND THIS!

<writes a long paper that's mostly introduction of the required proficiencies> OMG THIS IS SO LONG AND NO ONE HAS THE TIME READ IT!

A short Friday rant.

First and foremost statistics, and the probability theory on which it relies, is a mathematical tool. It requires a certain mathematical proficiency -- both technical and conceptual -- to be understood and wielded responsibly. _Anybody_ can become proficient, but it takes time.
Mar 17, 2021
Excited to have contributed to this paper on practical methods for investigating the robustness of Bayes factors led by Daniel Schad, arxiv.org/abs/2103.08744. A short thread with some of my favorite insights.

I believe this paper was initiated towards the end of drafting the Bayesian workflow in cognitive science paper with Daniel and @ShravanVasishth when I mentioned that many of the workflow ideas could be generalized to Bayes factor implementations with a little bit of work.
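The flavor of that simulation-based approach, in a conjugate toy of my own (far simpler than anything in the paper): simulate data under a model, compute the Bayes factor analytically, and inspect how it behaves across simulations.

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(16)
n = 30

def log_bayes_factor(k):
    """log BF of M1 (theta ~ beta(1, 1)) over M0 (theta = 0.5),
    for k successes in n Bernoulli trials."""
    log_ml_1 = betaln(k + 1, n - k + 1) - betaln(1, 1)   # beta-binomial
    log_ml_0 = n * np.log(0.5)
    return log_ml_1 - log_ml_0

# Simulate data under M0 and look at the Bayes factor's distribution:
# how often, and by how much, are we misled even when M0 is true?
k_sims = rng.binomial(n, 0.5, size=10000)
log_bfs = np.array([log_bayes_factor(k) for k in k_sims])
print(np.mean(log_bfs > 0))   # fraction of simulations favoring M1
```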