What a wonderful opportunity to talk about experimental design and the subtleties of what overfitting means in a Bayesian context! An "I've been consulting all day and my brain is too tired to do real work" thread.
Overfitting and identifiability are intimately related concepts. When you have a complex model but only a small data set, there will be _many_ model configurations consistent with the little data that you observed.
If your inferences are quantified by a point estimate then you will have to choose a single point amongst the entire subset of model configurations that are similarly consistent with any given observation. Any choice of a single point, however, is unlikely to generalize well.
In order to generalize well one has to quantify _all_ of those nearly equivalent model configurations. How do we quantify a subspace of nearly equivalent model configurations? Probability distributions are a wonderful possibility, which is why Bayesian inference is so powerful!
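To make the degeneracy concrete, here's a minimal numpy sketch (my own hypothetical example, not from the thread) of a non-identified model, y ~ Normal(a + b, 1), where the data constrain only the sum a + b: an entire ridge of (a, b) configurations is nearly equivalent, and no single point estimate stands out.

```python
import numpy as np

# Hypothetical non-identified model: y ~ Normal(a + b, 1), so the data only
# constrain the sum a + b and a ridge of (a, b) pairs fits equally well.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=3)  # small data set, true a + b = 1

def log_lik(a, b):
    return -0.5 * np.sum((y - (a + b)) ** 2)

grid = np.linspace(-3.0, 3.0, 201)
ll = np.array([[log_lik(a, b) for a in grid] for b in grid])

# Many grid points sit within a tiny tolerance of the maximum log likelihood:
# a ridge of nearly equivalent configurations rather than a single best point.
print("near-optimal (a, b) pairs:", int((ll > ll.max() - 0.01).sum()))
```

Picking any one of those near-optimal pairs as a point estimate is arbitrary; quantifying the whole ridge with a posterior distribution is what lets the inferences generalize.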
But it's one thing to _say_ that a Bayesian posterior quantifies which model configurations are consistent with the observed data. It's entirely another to accurately compute the extent of a posterior that spans a complex subspace of nearly equivalent model configurations.
The more degenerate a model, the harder the Bayesian computation will be, which is why you often hear people say that a bad fit indicates a modeling problem. It's also why algorithms with sensitive fitting diagnostics are so important for robust inference. #divergences4lyfe
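As an illustration (not the thread's own example), the centered "funnel" below is a classic degenerate geometry where dynamic Hamiltonian Monte Carlo typically flags its own trouble with divergent transitions; the pattern worth copying is checking the sampler's diagnostics after every fit. This is a sketch assuming PyMC and its default NUTS sampler.

```python
import pymc as pm

# Hypothetical illustration: a centered "funnel" geometry, a classic case where
# dynamic Hamiltonian Monte Carlo flags its own trouble via divergent transitions.
with pm.Model():
    log_tau = pm.Normal("log_tau", 0.0, 3.0)
    theta = pm.Normal("theta", 0.0, pm.math.exp(log_tau), shape=9)
    idata = pm.sample(1000, tune=1000, random_seed=1)

# Inspect the sampler's own diagnostics rather than trusting the fit blindly.
n_divergent = int(idata.sample_stats["diverging"].sum())
print(f"divergent transitions: {n_divergent}")
```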
How do you know whether the measurements from a given experiment will sufficiently inform a phenomenological model? You have to analyze the _experimental design_. In practice that means analyzing simulated observations and seeing how often they are sufficiently informative.
If a given experiment doesn't provide enough information _in expectation_ then you need to collect more data, or make complementary observations, or supplement the analysis with more domain expertise. Even then you might get unlucky with a particularly uninformative observation.
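Here's a hedged sketch of what such a design analysis can look like for a hypothetical conjugate Gamma-Poisson model: simulate whole experiments from the prior predictive distribution and count how often the resulting posterior is "sufficiently informative", using an arbitrary posterior-standard-deviation threshold as the criterion.

```python
import numpy as np

# Hypothetical design analysis: rate lambda ~ Gamma(alpha, beta),
# counts y_i ~ Poisson(lambda) for i = 1..N. Simulate experiments from the
# prior predictive distribution and check how often the conjugate posterior,
# Gamma(alpha + sum(y), beta + N), is narrow enough to be useful.
rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.0          # prior shape and rate
N, n_sims, target_sd = 5, 1000, 0.5

informative = 0
for _ in range(n_sims):
    lam = rng.gamma(alpha, 1.0 / beta)   # ground truth drawn from the prior
    y = rng.poisson(lam, size=N)         # one simulated experiment
    post_sd = np.sqrt(alpha + y.sum()) / (beta + N)
    if post_sd < target_sd:
        informative += 1

print(f"fraction of simulated experiments informative enough: "
      f"{informative / n_sims:.2f}")
```

Note that even when the expected behavior looks fine, individual simulated experiments can still come out uninformative, which is exactly the "you might get unlucky" caveat above.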
<me trying to talk anyone out of doing a nonparametric analysis which has lovely asymptotic properties at the expense of shitty behavior for any reasonably sized observation>
In summary: from a statistical perspective overfitting manifests as degenerate likelihoods and hence, in a Bayesian analysis, degenerate/poorly-identified posteriors.
If your model doesn't force overly complex configurations, i.e. if it includes simpler models, then an accurate quantification of the predictions generated from those degenerate posteriors will be particularly robust to overfitting.
Accurate quantification of degenerate posteriors, however, is hard, even for sophisticated tools like dynamic Hamiltonian Monte Carlo. Your best friends in these circumstances are algorithms that can identify when they can't fit. Like Hamiltonian Monte Carlo!
Those diagnostics, combined with careful experimental design via simulation studies, can inform when you are at risk of overfitting and hence when you might need more informative observational models or more informative prior models.
In that sense overfitting is the consequence of careless analysis, reflecting more on the priorities of the analyzer than any particular method chosen.
I forgot one of the most important caveats! Your experimental design analysis is only useful if your experimental design is well implemented. If the practical implementation of your experiment is very different from the design, then those expectations will be nearly worthless.
This is why "blinding" and "preregistration" are so insufficient. You'll never know how poorly your experimental design was implemented, and how sophisticated of an experimental model you'll need, until analyzing the actual data.
Awkward high fives to the optimists who think that they can precisely predict how an experiment will play out, and all of the subtle systematic effects that have to be modeled, without being there or looking at the actual data.
Then again maybe I'm the only one who's ever had to deal with the influence of cross talk on detector readouts, paint fumes on mosquito behaviors, selection bias on reported results, multiple response behaviors, etc. 🤷‍♂️