One of the most frustrating aspects of the computational statistics literature is the varying convention for which assumptions will be implicitly assumed or not, making it nearly impossible to read a paper outside the context of that convention. Let's talk about one bad consequence.

Like many other fields, comp stats has evolved to be very provincial, with different perspectives evolving in different communities. For example in Markov chain Monte Carlo theory you can see the differences between work from the UK, Duke, Minnesota, etc.

Each of those perspectives gives rise to its own conventions for notation, terminology, and, most importantly, assumptions. Papers written by that community for that community will often leave those assumptions implicit, making them hard to read for those outside the group.

Just because it bothers me so much, a short thread on why a false discovery rate is not sufficient to make a binary decision.

A common motivation in model comparison is identifying which of two statistical models is more consistent with a given observation. Let's call one the "null hypothesis" and the other the "alternative hypothesis".

In certain completely arbitrary scientific fields the null hypothesis might be called the "background model" and the alternative hypothesis would be the "background plus signal model". Just an example...

Friendly reminder that from a math perspective probabilities in logistic regression are almost exactly the same as velocities in special relativity. If you understand log odds ratios then you secretly know the basics of special relativity! That may be why the former is so hard...

Another way of thinking about it -- logistic probabilities add in almost exactly the same weird way that relativistic velocities do. But everything adds _approximately_ linearly around p = 1/2 and v/c = 0. Keep that in mind the next time you use a linear probability model...
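The correspondence is easy to check numerically. A minimal sketch (my own illustrative code, not from the thread): "adding" probabilities by adding their log odds matches relativistic velocity addition exactly under the map v = 2p - 1, since logit(p) = 2 atanh(2p - 1), i.e. the log odds is twice the rapidity.

```python
import math

def logit(p):
    # log odds of a probability
    return math.log(p / (1 - p))

def inv_logit(s):
    # inverse of logit
    return 1 / (1 + math.exp(-s))

def prob_add(p, q):
    # "add" two probabilities by adding their log odds
    return inv_logit(logit(p) + logit(q))

def velocity_add(u, v):
    # relativistic velocity addition, in units where c = 1
    return (u + v) / (1 + u * v)

# The map v = 2p - 1 sends probabilities to velocities and log odds
# to twice the rapidity: logit(p) = 2 * atanh(2p - 1).
p, q = 0.7, 0.6
u, v = 2 * p - 1, 2 * q - 1   # 0.4, 0.2

lhs = 2 * prob_add(p, q) - 1  # add in probability space, map to velocity
rhs = velocity_add(u, v)      # add in velocity space directly
print(lhs, rhs)               # the two agree exactly

# Near p = 1/2 (i.e. v = 0) both additions are approximately linear:
print(prob_add(0.5 + 0.01, 0.5 + 0.01))  # close to 0.5 + 0.02
```

The same nonlinearity that keeps relativistic velocities below c is what keeps summed log odds mapping back into (0, 1).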

For the non-physicists who want to impress their friends: the axes x^- and x^+ in the right plot are light-cone coordinates and the straight lines are the trajectories of objects moving at constant velocities. w is the rapidity, an unconstrained relativistic velocity.

"Multilevel" model is perhaps the most loaded term in all of statistics; in most use cases it carries with it a surprisingly large number of independent assumptions. A short thread.

Important caveat: the language used in the applied and theoretical stats literature is inconsistent and terms are often motivated by historical contexts that are no longer relevant. The language I will use in this thread is entirely my own.

"Multilevel" models are used in the context of regression where we want to understand how changes in some known covariates influence the statistical behavior of some unknown variates.
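To make that concrete, here is a minimal generative sketch of one common flavor of "multilevel" regression, a varying-intercept model. All names and values here are my own illustrative assumptions, not the thread's notation: group intercepts are drawn around a shared population mean, which is what couples the groups together.

```python
import numpy as np

rng = np.random.default_rng(0)

n_groups = 5
n_per_group = 20

# Population level: group intercepts scatter around a shared mean.
mu_alpha, tau_alpha = 1.0, 0.5
alpha = rng.normal(mu_alpha, tau_alpha, size=n_groups)

beta = 2.0    # common slope for the known covariate
sigma = 0.3   # observational noise scale

# Individual level: each observation uses its own group's intercept.
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=n_groups * n_per_group)   # known covariates
y = alpha[group] + beta * x \
    + rng.normal(0, sigma, size=x.size)       # unknown variates

print(y.shape)  # (100,)
```

Fitting such a model means learning the group intercepts and the population parameters jointly, which is where the extra, often implicit, assumptions come in.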

Intersection of physics and probabilistic computation story time! These coin falling toys demonstrate both conservation of angular momentum and why funnel-shaped densities are hard to fit with Hamiltonian Monte Carlo.

As the coin spirals down, gravitational potential energy is converted to kinetic energy -- the coin falls and accelerates. Because angular momentum is conserved the shape of the spiral is constrained; as the coin gets faster the radius of the spiral has to decrease proportionally.

The exact trajectory is ultimately determined by the shape of the funnel, and how the normal force that can be exerted on the coin interacts with all of these conserved energies and momenta.
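The proportionality claim follows from the two conservation laws. A hedged, idealized sketch (ignoring friction and the detailed geometry of the funnel):

```latex
% Energy conservation converts height into speed,
\frac{1}{2} m v^2 + m g h = E_0,
% while conservation of angular momentum about the funnel's axis,
L = m \, v_\theta \, r = \text{const},
% forces the orbital radius to shrink as the tangential speed grows:
r = \frac{L}{m \, v_\theta} \propto \frac{1}{v_\theta}.
```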

One of the delightful insights in Good's "Good Thinking" is why collecting data until you achieve significance is fundamentally doomed to fail. I'm pretty sure this isn't a novel perspective, but it's the first time I had seen it explained so clearly.

Under a point null hypothesis, tail probabilities past the observed statistic (i.e. p-values) are uniformly distributed no matter the model. Consequently the p-values corresponding to a growing sequence of measurements will be uniformly distributed marginally at each iteration.

At the same time because the data are growing at each iteration the p-values will be correlated with previous p-values. In other words the sequence of p-values forms a Markov chain (of some order) whose stationary distribution is that uniform distribution.
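A quick simulation makes the consequence vivid (my own sketch, using a z-test for a point null with known unit variance): at any fixed sample size the p-value is exactly uniform, yet an analyst who peeks after every observation crosses the 0.05 threshold far more than 5% of the time.

```python
import math
import random

random.seed(1)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(xs):
    # Two-sided p-value for the point null "mean = 0" with known unit
    # variance, via the z statistic of the running sample sum.
    z = sum(xs) / math.sqrt(len(xs))
    return 2 * (1 - normal_cdf(abs(z)))

# Many analysts each keep collecting data under the null,
# checking significance after every new observation.
n_reps, n_max = 2000, 50
p_at_n_max = []
crossed = 0  # how many ever dip below 0.05 along the way
for _ in range(n_reps):
    xs = []
    hit = False
    for _ in range(n_max):
        xs.append(random.gauss(0, 1))
        if p_value(xs) < 0.05:
            hit = True
    p_at_n_max.append(p_value(xs))
    crossed += hit

# Marginally, the p-value at any fixed n is uniform on [0, 1]...
print(sum(p_at_n_max) / n_reps)  # near 0.5
# ...but peeking at every n inflates the chance of ever
# crossing 0.05 well past 5%.
print(crossed / n_reps)
```

The correlated Markov structure is exactly why "sample until significant" is doomed: the chain keeps revisiting the tail, so the running minimum p-value drifts toward zero even when the null is true.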