\mathfrak{Michael}
Once and future physicist masquerading as a statistician. Reluctant geometer. @mcmc_stan developer. Support my writing at https://t.co/Ut25MyjIAy. He/him.
Mar 6
You want to know why we keep seeing terrible explanations of p-values? Because concise explanations understandable by a general audience are fundamentally impossible.

Oh yeah, it's a thread about communicating p-values.

The proper definition of a p-value -- not just what it is but also how it is used -- is technical and, even in full formality, pretty subtle.
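To see how much machinery even the minimal definition drags in, here's a rough sketch of the ingredients (a toy null model and test statistic of my own choosing, not anything from the thread): a null model, a test statistic, and the probability, under the null, of a statistic at least as extreme as the one observed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Null model: y ~ normal(0, 1); test statistic: the sample mean.
def statistic(y):
    return np.mean(y)

y_obs = rng.normal(0.5, 1, size=20)   # observed data (drawn off-null here)
t_obs = statistic(y_obs)

# Null distribution of the statistic, estimated by simulation.
t_null = np.array([statistic(rng.normal(0, 1, size=20))
                   for _ in range(10000)])

# One-sided p-value: probability, under the null model, of a
# statistic at least as extreme as the observed one.
p_value = np.mean(t_null >= t_obs)
print(p_value)
```

Every clause in that description -- "under the null", "at least as extreme", the choice of statistic -- is load-bearing, which is exactly why the concise verbal summaries keep going wrong.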
Feb 8, 2023
Friendly reminder that models at best _approximate_ some true data generating process. A natural question is how well a model approximates the true data generating process, but without actually knowing that truth we can't provide a meaningful answer. 1/n

The best we can do in practice is ask how consistent models are with the brief glimpses of the true data generating process encoded in observations.

The divergence between these questions -- what we want to ask and what we can ask -- is also known as overfitting. 2/n
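A toy demonstration of that divergence (the polynomial setup and noise scale are my own invention): a flexible model can look ever more consistent with the observed glimpse while drifting away from the true process, which only shows up against fresh draws.

```python
import numpy as np

rng = np.random.default_rng(2)

# True data generating process: y = sin(x) + noise.
def simulate(n):
    x = rng.uniform(0, 3, n)
    return x, np.sin(x) + rng.normal(0, 0.3, n)

x_train, y_train = simulate(20)
x_test, y_test = simulate(1000)

for degree in [1, 3, 15]:
    coef = np.polyfit(x_train, y_train, degree)
    mse_train = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    mse_test = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    # High degree: small in-sample error, large out-of-sample error.
    print(degree, round(mse_train, 3), round(mse_test, 3))
```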
Jan 6, 2023
It's the first Friday of the new year so let me take the opportunity to complain about the scourge of "default" prior models, meaningless terms like "informative prior", and the real harm that they cause.

One of the fundamental challenges of statistical inference is that making and validating assumptions is hard and context dependent. In particular there are no universal assumptions that are adequate in every analysis. There's a reason why "it depends" is a statistical mantra.
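One way to see the context dependence is a prior predictive check: push a candidate prior through the observational model and ask whether the simulated behavior is even plausible on the scale of the actual measurement. A minimal sketch, with a model and scales I made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Observational model: logistic regression on a standardized covariate.
# A "default" normal(0, 10) prior on the slope looks harmless until you
# push it through the model: most of the implied probabilities pile up
# at 0 and 1, an extreme assumption in almost any applied context.
for prior_sd in [1, 10]:
    beta = rng.normal(0, prior_sd, size=10000)
    x = 1.0  # one standard deviation from the covariate mean
    p = 1 / (1 + np.exp(-beta * x))
    extreme = np.mean((p < 0.01) | (p > 0.99))
    print(prior_sd, round(extreme, 2))
```

The same normal(0, 10) prior that is nearly inconsequential in one parameterization is wildly assertive in another, which is the whole problem with calling any prior a "default".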
Sep 23, 2022
I was just asked an interesting question so why don't we turn it into a little thread? How are we able to do any probabilistic inference on continuous observational spaces when the probability of any single observation is zero?

More formally let's consider a continuous space Y, such as the real numbers. An observational model pi_{theta} is a collection of probability distributions over Y, each indexed by a model configuration, or parameter, theta.
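The resolution, to preview it numerically: continuous distributions allocate probability to subsets, not points, and a probability density function is what we integrate to recover those subset probabilities. A quick illustration with a standard normal (my choice of example):

```python
from scipy import stats

model = stats.norm(0, 1)

# The probability allocated to any single point is zero...
print(model.cdf(1.0) - model.cdf(1.0))   # 0.0

# ...but intervals carry nonzero probability, obtained by
# integrating the probability density function over them.
print(model.cdf(1.1) - model.cdf(0.9))   # ~0.066

# The density itself is not a probability; it can even exceed 1.
print(stats.norm(0, 0.1).pdf(0.0))       # ~3.989
```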
Sep 7, 2022
One of the aspects of Bayesian modeling that continually brings me joy is the difficulty in hiding from modeling assumptions and their consequences. You don't have to look at them, of course, but a Bayesian analysis makes them _really_ bright.

To be clear, accessible modeling assumptions aren't unique to Bayesian inference. Frequentist estimators are calibrated with respect to some assumed model, but when everyone focuses on estimators and takes the calibrations for granted there's little motivation to look.
Jul 21, 2022
One of my constant frustrations is people taking "factor of two"/"order of magnitude"/"back of the envelope" calculations too seriously based on the implicit rationalization that the accuracy of the outputs should be similar to the accuracy of the inputs. A short thread.

These calculations necessarily convolve all of the input uncertainties together, which typically results in a more uncertain output. In fact the output uncertainty is often surprisingly large relative to our naive expectations.
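One way to see how quickly the input uncertainties compound: propagate them through a Fermi-style product with Monte Carlo. The numbers below are made up, but the log-scale arithmetic is general.

```python
import numpy as np

rng = np.random.default_rng(5)

# A back-of-the-envelope product of three inputs, each known only
# "to within a factor of two" (log-normal, sd log(2) on the log scale).
n = 100000
a = np.exp(rng.normal(np.log(10), np.log(2), n))
b = np.exp(rng.normal(np.log(5), np.log(2), n))
c = np.exp(rng.normal(np.log(2), np.log(2), n))

output = a * b * c

# Log-scale variances add across the inputs, so the output is
# uncertain to roughly a factor of 2**sqrt(3) ~ 3.3, not a factor of 2.
q = np.quantile(output, [0.16, 0.5, 0.84])
print(q[1] / q[0], q[2] / q[1])
```

Three factor-of-two inputs already give a factor-of-three-plus output, and real envelopes have many more than three inputs.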
Jun 14, 2022
"Samples" is one of those statistics words that everyone seems to interpret a little bit differently, and those different interpretations can have important practical consequence. Come along as I dig into what "posterior samples" actually are and how we should describe them. Firstly I go into much more detail on this topic, along with lots of pictures and references, in my "Rumble in the Ensemble" case study, betanalpha.github.io/assets/case_st…. Here I'll just review some of the important concepts.
May 26, 2022
This question comes up from time to time and unfortunately I don't think that there is a satisfying, let alone productive, answer. The problem is how "science" becomes awkwardly operationalized in the literature. A short, spontaneous thread.

We often start our journey into science with an early presentation of the "scientific method" that we tend to internalize and take for granted. If you go back to the individual steps in the scientific method, however, they aren't all that well defined.
May 26, 2022
Recently I've been thinking about how to expand my Markov chain Monte Carlo material, in particular how to demonstrate each of the theoretical concepts and why they are important with explicit examples. This is a very, very rough first step towards that betanalpha.github.io/assets/case_st….

My ultimate goal is to start the material with a discussion like this about the basics of Markov chain Monte Carlo _given_ a particular Markov transition. This would include the basics of convergence and estimation and why we need all of those "mild technical conditions".
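A minimal sketch of that "given a particular Markov transition" starting point (my own toy setup, not the case study's): a random walk Metropolis transition targeting a standard normal, with Markov chain Monte Carlo estimators built from the resulting states.

```python
import numpy as np

rng = np.random.default_rng(7)

def transition(x):
    """One random walk Metropolis step targeting a standard normal."""
    proposal = x + rng.normal(0, 1.5)
    # Accept with probability min(1, pi(proposal) / pi(x)).
    if np.log(rng.uniform()) < 0.5 * (x**2 - proposal**2):
        return proposal
    return x

# Run the chain and form Markov chain Monte Carlo estimators.
x, states = 10.0, []   # deliberately poor initialization
for _ in range(10000):
    x = transition(x)
    states.append(x)

warm = np.array(states[1000:])          # drop early, pre-convergence states
print(np.mean(warm), np.mean(warm**2))  # should approach 0 and 1
```

Convergence (why we can drop the early states) and estimation (why the remaining averages behave) are exactly the properties that the "mild technical conditions" are there to guarantee.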
Dec 30, 2021
New paper alert! In this paper Charles Margossian and I attempt a unified framework for incorporating implicit functions into automatic differentiation, arxiv.org/abs/2112.14217.

There are many kinds of implicit systems -- ranging from finite dimensional systems like difference equations to infinite dimensional systems like differential equations -- and while many have been implemented in automatic differentiation the details vary between systems.
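The unifying idea, roughly: if x(theta) is defined implicitly by f(x, theta) = 0 then the implicit function theorem gives dx/dtheta = -(df/dx)^(-1) df/dtheta, so a solver's output can be differentiated without differentiating through the solver's iterations. A scalar sketch of my own, far simpler than the paper's general treatment:

```python
from scipy import optimize

# x(theta) defined implicitly by f(x, theta) = x**3 + x - theta = 0.
def f(x, theta):
    return x**3 + x - theta

theta = 2.0
x = optimize.brentq(lambda x: f(x, theta), -10, 10)   # x = 1 here

# Implicit function theorem: dx/dtheta = -(df/dx)^(-1) df/dtheta,
# evaluated at the solution -- no need to differentiate the solver.
df_dx = 3 * x**2 + 1
df_dtheta = -1.0
dx_dtheta = -df_dtheta / df_dx
print(x, dx_dtheta)   # 1.0, 0.25
```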
Aug 26, 2021
Everyone loves statistics by simulation/sampling these days, but samples are not magic and they do not allow you to implement every important probabilistic operation in a straightforward way. A short thread on the common ways to work with probability distributions.

There are basically two ways that we can actually implement probability theory in practice, and by that I mean estimate expectation values of various functions with respect to a given probability distribution.
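To preview the dichotomy in one dimension (my example, assuming a standard normal target): expectation values can be estimated by weighting density evaluations over a deterministic grid, or by averaging over samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
f = np.square   # estimate E[theta^2] under normal(0, 1); exact value 1

# Density-based: quadrature over a grid, weighting f by the density.
grid = np.linspace(-8, 8, 1001)
quad = np.trapz(f(grid) * stats.norm.pdf(grid), grid)

# Sample-based: a Monte Carlo average over exact draws.
mc = np.mean(f(rng.normal(0, 1, size=10000)))

print(quad, mc)
```

Grids scale terribly with dimension and samples make some operations (like evaluating the density itself) awkward, which is why neither approach is magic.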
Aug 24, 2021
Friendly reminder that if deriving a point estimator is really fast but quantifying a Bayesian posterior is really slow then the point estimator provides only an extremely incomplete characterization of the full likelihood function.

Under _certain_ conditions point estimators like maximum likelihood estimators, and differential information at the point estimate, can provide a reasonable approximation to the entire likelihood function, and hence all of the model configurations consistent with the data.
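The usual construction behind that approximation, sketched for a normal location model of my own choosing: the maximum likelihood estimate plus the observed information define a quadratic (Laplace) approximation to the log likelihood function.

```python
import numpy as np

rng = np.random.default_rng(10)
y = rng.normal(3.0, 1.0, size=50)   # known unit scale, unknown location mu

# Maximum likelihood estimate and observed information for mu.
mu_hat = np.mean(y)                 # maximizes the likelihood
info = len(y) / 1.0**2              # -d2/dmu2 of the log likelihood

# Quadratic approximation to the log likelihood around mu_hat; for this
# Gaussian model the approximation happens to be exact.
def log_lik(mu):
    return -0.5 * np.sum((y - mu) ** 2)

def log_lik_approx(mu):
    return log_lik(mu_hat) - 0.5 * info * (mu - mu_hat) ** 2

print(log_lik(2.5) - log_lik(mu_hat),
      log_lik_approx(2.5) - log_lik_approx(mu_hat))   # identical here
```

When those "certain conditions" fail -- multimodality, strong skew, parameters near boundaries -- the point estimate and curvature silently discard most of what the likelihood function knows.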
Aug 22, 2021
Of course no one will ever know everything, but statistics education is particularly well-suited to making people feel simultaneously (yes pun intended) frustrated and betrayed long after the courses are over. Some late night rambling thoughts while I do hurricane prep!

Statistics, both theoretical and applied, is a fundamentally challenging subject. Probability theory on continuous spaces is all kinds of messed up, and modeling real measurements is messier than the measurements themselves. If you think otherwise then this thread isn't for you.
Aug 12, 2021
This is an important question that hits on some of the crucial differences between the idealizations of Bayesian inference that are usually taught in introductory classes and how Bayesian inference is actually implemented in practice. A short thread!

One of the nice theoretical properties of Bayesian updating (i.e. the application of Bayes' Theorem in Bayesian inference) is that it's compatible with any _product structure_ of the observational model.
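The concrete payoff of that product structure: updating on all of the data at once agrees with updating sequentially, batch by batch, with each posterior serving as the next prior. A conjugate beta-binomial sketch (my toy example):

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.binomial(1, 0.3, size=100)   # conditionally independent trials

# Batch update: beta(1, 1) prior conditioned on all data at once.
a_batch, b_batch = 1 + np.sum(y), 1 + len(y) - np.sum(y)

# Sequential update: yesterday's posterior is today's prior.
a, b = 1, 1
for y_n in y:
    a, b = a + y_n, b + (1 - y_n)

print((a_batch, b_batch) == (a, b))   # True
```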
Jul 8, 2021
Because it's thunderstorming outside let's do a quick thread on the subtle differences between real spaces and Euclidean spaces.

The real line is our usual mathematical model for a continuum -- no matter how deep we zoom in there are still an infinite number of points in any neighborhood. (Today isn't your day, p-adics.)
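The punchline, in symbols (my summary of the distinction): a real space is just a set of points, while a Euclidean space is that set equipped with extra metric structure.

```latex
% The real space R^N is just the set of N-tuples of real numbers,
\[
  \mathbb{R}^{N} = \{ (x_{1}, \ldots, x_{N}) : x_{n} \in \mathbb{R} \},
\]
% while a Euclidean space adds a distinguished distance function,
\[
  d(x, y) = \sqrt{ \sum_{n = 1}^{N} (x_{n} - y_{n})^{2} },
\]
% that is extra structure, not intrinsic to the points themselves.
```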
Jul 6, 2021
Because this keeps coming up here's a short thread on why tools that claim to seamlessly fit discrete parameters are not the boons that you might think they are. First let's make sure we're on the same page with what a "fit" is in Bayesian inference.

A Bayesian model is specified by a joint probability distribution over the data y and model configurations theta; conditioning that distribution on observed data yields a posterior distribution.
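For context, the standard alternative to "fitting" a discrete parameter is to marginalize it out analytically. For a two-component mixture the discrete assignment disappears into a sum inside the likelihood; a sketch with a toy model of my own:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(12)

# Two-component normal mixture: the discrete assignment z_n is never
# "fit" directly -- it is summed out of the joint density analytically.
def log_lik(y, lam, mu1, mu2):
    lp1 = np.log(lam) + stats.norm.logpdf(y, mu1, 1)      # z_n = 1 branch
    lp2 = np.log1p(-lam) + stats.norm.logpdf(y, mu2, 1)   # z_n = 2 branch
    return np.sum(logsumexp([lp1, lp2], axis=0))

y = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
print(log_lik(y, 0.5, -2.0, 2.0))
```

The marginalized likelihood is smooth in the continuous parameters, which is exactly what gradient-based samplers need and what "seamless" discrete sampling gives up.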
May 25, 2021
This summer I hope to draft a case study exploring the ambiguous and varied ways in which "generative modeling" is defined. "Modeling the data generating process" sounds appealing but what does it really mean? There is a lot to unpack but I want to mention a few key points here.

One of the key features of "generative" in the statistical modeling sense is that it's not a monolithic description of a model, nor is it a binary classification. Parts of a model can be more generative and other parts can be less generative.
May 22, 2021
Apologies @osazuwa I'm going to commandeer this question to advertise my most recent case study on sampling, betanalpha.github.io/assets/case_st…. A short thread.

The MAP, or "maximum a posteriori" point, is the model configuration theta that maximizes the posterior density function pi(theta | tilde{y}). Because the posterior density representation depends on how the model configuration space is parameterized this is kind of a weird object.
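A quick demonstration of that weirdness (my toy example): the same posterior distribution has its density maximized at one point in the nominal parameterization and at a completely different point after reparameterizing, because the Jacobian reshapes the density.

```python
import numpy as np
from scipy import optimize

# Posterior density proportional to exp(-x) on x > 0: maximized at x = 0.
neg_log_density_x = lambda x: x

# Reparameterize with theta = log(x); the Jacobian factor exp(theta)
# moves the density maximum to theta = 0, i.e. x = 1, not x = 0.
neg_log_density_theta = lambda t: np.exp(t) - t

map_x = optimize.minimize_scalar(neg_log_density_x,
                                 bounds=(0, 10), method="bounded").x
map_theta = optimize.minimize_scalar(neg_log_density_theta,
                                     bounds=(-10, 10), method="bounded").x

print(map_x, np.exp(map_theta))   # ~0 and ~1: "the" MAP depends on
                                  # the chosen parameterization
```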
Apr 15, 2021
Alright, let's do it.

Random variables: what they actually are and the many incompatible ways that they're often interpreted.

Consider a space, X, and a probability distribution, pi, that self-consistently allocates probability to nice subsets of X [nice refers to elements of a sigma-algebra over X]. In this thread I'll try to wrap largely ignorable technical comments in square brackets like this.
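For a concrete handle on where this is going: formally a random variable is a [measurable] function from X to the real numbers, and it inherits a pushforward distribution from pi. A sketch using samples to stand in for pi (my example, with X = [0, 1] and the uniform distribution):

```python
import numpy as np

rng = np.random.default_rng(15)

# An ambient space X with probability distribution pi, here represented
# by exact samples: X = [0, 1] with the uniform distribution.
x_samples = rng.uniform(0, 1, size=10000)

# A random variable is just a [measurable] function f: X -> R...
f = lambda x: -np.log(x)

# ...and it inherits a "pushforward" distribution on R from pi, here
# the unit exponential: E[f] = 1 and P[f > 1] = exp(-1) ~ 0.368.
fx = f(x_samples)
print(np.mean(fx), np.mean(fx > 1))
```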
Mar 19, 2021
<writes a compact paper that requires certain proficiencies> OMG THIS IS TOO DENSE AND TECHNICAL NO ONE CAN UNDERSTAND THIS!

<writes a long paper that's mostly introduction of the required proficiencies> OMG THIS IS SO LONG AND NO ONE HAS THE TIME READ IT!

A short Friday rant.

First and foremost statistics, and the probability theory on which it relies, is a mathematical tool. It requires a certain mathematical proficiency -- both technical and conceptual -- to be understood and wielded responsibly. _Anybody_ can become proficient, but it takes time.
Mar 17, 2021
Excited to have contributed to this paper on practical methods for investigating the robustness of Bayes factors led by Daniel Schad, arxiv.org/abs/2103.08744. A short thread with some of my favorite insights.

I believe this paper was initiated towards the end of drafting the Bayesian workflow in cognitive science paper with Daniel and @ShravanVasishth when I mentioned that many of the workflow ideas could be generalized to Bayes factor implementations with a little bit of work.
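The flavor of that simulation-based approach, in a conjugate toy of my own (far simpler than anything in the paper): simulate data under a model, compute the Bayes factor analytically, and inspect how it behaves across simulations.

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(16)
n = 30

def log_bayes_factor(k):
    """log BF of M1 (theta ~ beta(1, 1)) over M0 (theta = 0.5),
    for k successes in n Bernoulli trials."""
    log_ml_1 = betaln(k + 1, n - k + 1) - betaln(1, 1)   # beta-binomial
    log_ml_0 = n * np.log(0.5)
    return log_ml_1 - log_ml_0

# Simulate data under M0 and look at the Bayes factor's distribution:
# how often, and by how much, are we misled even when M0 is true?
k_sims = rng.binomial(n, 0.5, size=10000)
log_bfs = np.array([log_bayes_factor(k) for k in k_sims])
print(np.mean(log_bfs > 0))   # fraction of simulations favoring M1
```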