"Multilevel" model is perhaps the most loaded term in all of statistics; in most use cases it carries with it a surprisingly large number of independent assumptions. A short thread.
Important caveat: the language used in the applied and theoretical stats literature is inconsistent and terms are often motivated by historical contexts that are no longer relevant. The language I will use in this thread is entirely my own.
"Multilevel" models are used in the context of regression where we want to understand how changes in some known covariates influence the statistical behavior of some unknown variates.
Typically we specify the observational model for the variates with parametric families of probability density functions and then introduce covariate dependence by replacing one or more of the parameters with functions of the covariates,

pi(y | x; phi) = pi(y; f(x, phi)).
I like this construction because it allows us to reason about what behaviors of the variate distribution are correlated with the covariates. The covariates are correlated only with the location of the variates? Then only the location parameter is a function of the covariates.
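A minimal sketch of this construction in Python, assuming a normal observational model in which only the location depends on the covariate. The linear form of `f` and the names `normal_pdf`, `conditional_pdf`, `alpha`, and `beta` are illustrative assumptions, not something the thread specifies.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with location mu and scale sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def f(x, phi):
    """Hypothetical covariate dependence: a location that varies linearly with x."""
    alpha, beta = phi
    return alpha + beta * x

def conditional_pdf(y, x, phi, sigma=1.0):
    """pi(y | x; phi) = pi(y; f(x, phi)), with only the location depending on x."""
    return normal_pdf(y, f(x, phi), sigma)

phi = (1.0, 2.0)
# Evaluate the density at y equal to the covariate-dependent location f(2, phi) = 5.
density_at_mode = conditional_pdf(y=5.0, x=2.0, phi=phi)
```

Letting the scale depend on the covariates as well would only require replacing `sigma` with a second function of `x`.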
All of this regression context then motivates looking for functions of the form f(x, phi) that might be relevant to a particular data generating process. In particular, what functional forms will be useful?
Linear regression approximates f with its Taylor expansions in some neighborhood of the covariates. This is by far the best way to think about linear regression and I will not be taking any questions on that matter...
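As a rough illustration of the Taylor expansion view, the sketch below linearizes a hypothetical nonlinear `f` around a baseline covariate value `x0`, using a central finite difference for the slope. The particular `f`, the baseline, and the step size `h` are assumptions chosen only for the example.

```python
import math

def f(x):
    """Hypothetical 'true' covariate dependence, nonlinear in x."""
    return math.exp(0.5 * x)

def linearize(func, x0, h=1e-5):
    """First-order Taylor coefficients of func around x0 (central difference slope)."""
    alpha = func(x0)
    beta = (func(x0 + h) - func(x0 - h)) / (2.0 * h)
    return alpha, beta

x0 = 1.0
alpha, beta = linearize(f, x0)

def f_linear(x):
    """Linear model alpha + beta * (x - x0), accurate only near x0."""
    return alpha + beta * (x - x0)

error_near = abs(f(1.1) - f_linear(1.1))  # small close to the baseline
error_far = abs(f(3.0) - f_linear(3.0))   # much larger far from the baseline
```

The approximation degrades away from `x0`, which is why the neighborhood of covariate values matters in this reading of linear regression.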
"Multilevel" models take a different approach. Instead of thinking about an explicit functional form they discretize f(x, phi) into separate parameters corresponding to intervals of covariate values,

f_{n} \approx f(x, phi) for x_{n} < x < x_{n + 1}.
This discretized approximation is quite flexible, able to capture a variety of functional behaviors at the expense of losing resolution of the covariate values. It's particularly useful when the x are already discretized.
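The discretization can be sketched as a lookup from a covariate value to the parameter of the interval that contains it. The interval edges and `f_n` values below are made up for illustration, and assigning a boundary value to the interval on its right is an arbitrary convention of this sketch.

```python
import bisect

edges = [0.0, 1.0, 2.0, 3.0]  # hypothetical interval boundaries x_0 < x_1 < x_2 < x_3
f_n = [0.2, 0.9, 1.4]         # one parameter per interval (illustrative values)

def interval_index(x, edges):
    """Return n such that edges[n] <= x < edges[n + 1]."""
    if not edges[0] <= x < edges[-1]:
        raise ValueError("x lies outside the discretized range")
    return bisect.bisect_right(edges, x) - 1

def f_discrete(x):
    """Piecewise-constant approximation: every x in an interval shares one f_n."""
    return f_n[interval_index(x, edges)]
```

Any two covariate values in the same interval, such as 0.1 and 0.7 here, receive the same parameter, which is the loss of resolution mentioned above.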
What is the set of discrete f_n parameters called? Is it a factor? Or a level? Honestly I have given up on trying to find consistent terminology that everyone will agree upon.

Anyways, we're nowhere near done.
So far we've discretized the influence of just one covariate, but what if we have multiple covariates? If we discretize more than one covariate then a function like f(x^1, x^2, phi) is characterized by an infinite series of parameters.
The first order behavior assumes that the covariates influence f independently, so that f decomposes into a sum of one-covariate contributions,

f(x^1, x^2, phi) = f^1(x^1, phi) + f^2(x^2, phi).

Then we can write

f(x^1_n, x^2_m, phi) \approx f^1_n + f^2_m.
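A small sketch of the first-order approximation with two discretized covariates. The parameter values are invented for the example; the point is only that each cell of the grid is the sum of one parameter per covariate.

```python
f1 = [-0.5, 0.0, 0.5]  # f^1_n: one parameter per interval of the first covariate
f2 = [1.0, 2.0]        # f^2_m: one parameter per interval of the second covariate

def f_first_order(n, m):
    """First-order approximation f^1_n + f^2_m for the cell (n, m)."""
    return f1[n] + f2[m]

# Every cell of the 3 x 2 grid is determined by only 3 + 2 parameters.
table = [[f_first_order(n, m) for m in range(len(f2))] for n in range(len(f1))]
n_parameters = len(f1) + len(f2)
n_cells = len(f1) * len(f2)
```

Because the contributions add, the difference between two intervals of the first covariate is the same in every interval of the second; anything beyond that needs the higher-order terms.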
The second order behavior assumes that the covariates influence f in pairs,

f(x^1, x^2, x^3, phi) = f^{12}(x^1, x^2, phi) + f^{23}(x^2, x^3, phi) + f^{13}(x^1, x^3, phi).

Then we can write

f(x^1_n, x^2_m, x^3_l, phi) \approx f^{12}_{n^{12}} + f^{23}_{n^{23}} + f^{13}_{n^{13}}.
Here n^ij indexes the pairwise intersections of the covariate discretizations. It's a bit easier to see with pictures but this thread is already too long, and really you should hire me to give a course at your company to see accompanying figures. ;-)
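One way to picture the pairwise index n^ij is as a flattened index over the grid formed by two discretizations. The sketch below assumes three covariates with 2, 3, and 2 intervals; the helper `pair_index` and all parameter values are hypothetical.

```python
N, M, L = 2, 3, 2  # number of intervals for covariates 1, 2, 3 (assumed)

def pair_index(i, j, n_j):
    """Flatten a pair of interval indices into one index over their intersections."""
    return i * n_j + j

# One parameter per pairwise intersection (illustrative values).
f12 = [0.1 * k for k in range(N * M)]
f23 = [1.0 * k for k in range(M * L)]
f13 = [10.0 * k for k in range(N * L)]

def f_second_order(n, m, l):
    """Sum of the three pairwise contributions for the cell (n, m, l)."""
    return (f12[pair_index(n, m, M)]
            + f23[pair_index(m, l, L)]
            + f13[pair_index(n, l, L)])
```

Here the 12 cells of the full grid are described by 6 + 6 + 4 pairwise parameters, so the savings over a fully saturated model only appear with more covariates or finer discretizations.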
For the mathematically inclined we're treating each discretized covariate group as a vector and expanding the output function as a tower of tensor products,

1 \otimes 2 \otimes 3 = (1 \oplus 2 \oplus 3) \otimes ( (1 \otimes 2) \oplus (2 \otimes 3) \oplus (1 \otimes 3) )...
Beyond the math what are we actually doing? We're assuming that some complex function relationship between a parameter in our model and observed covariates can be decomposed into independent contributions from each discrete covariate value.
If we're feeling particularly saucy then we might add corrections to account for two-way interactions, three-way interactions, etc.
To summarize, the heart of a "multilevel" model is assuming that the covariates influence the rest of the model independently (at least to first order) and then discretizing the covariate influence into a finite number of functions.
I used to call the parameters corresponding to each first-order covariate influence a "level", with "multilevel" corresponding to adding those first order influences together to approximate the total influence, but I'm not sure if that's going to confuse everyone.
Anyways, note that "hierarchical model" has not yet been involved. That's because mathematically there's nothing in this construction that has _required_ hierarchical priors. In practice, however, they are _almost always assumed_.
In particular each group of parameters corresponding to a covariate (sometimes people call these discretization levels...) is given its own hierarchical model to add some dynamics regularization and help fit when all of the covariate intervals aren't well populated.
Ugh. "these discretization levels" -> "these discretizations levels" and "dynamics regularization" -> "dynamical regularization". Autocorrect is murdering me right now.
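The regularizing effect of a hierarchical prior on sparsely populated intervals can be sketched in the simplest conjugate setting: normal observations with known scale `sigma` and a normal population model with fixed location `mu` and scale `tau`. Fixing `mu` and `tau` is an assumption made to keep the example closed-form; a full hierarchical model would infer them jointly with the interval parameters.

```python
def partial_pool(ys, mu, tau, sigma):
    """Posterior mean of one interval parameter given its observations,
    for fixed population location mu, population scale tau, and noise scale sigma."""
    prior_precision = 1.0 / tau**2
    data_precision = len(ys) / sigma**2
    return (prior_precision * mu + sum(ys) / sigma**2) / (prior_precision + data_precision)

mu, tau, sigma = 0.0, 1.0, 2.0

est_empty = partial_pool([], mu, tau, sigma)           # no data: falls back to mu
est_sparse = partial_pool([4.0], mu, tau, sigma)       # one observation: pulled strongly toward mu
est_dense = partial_pool([4.0] * 40, mu, tau, sigma)   # many observations: close to the data mean
```

Intervals with few observations borrow strength from the population model, while well-populated intervals are dominated by their own data.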
Now go to software like `lm` and its derivatives and "multilevel" implies that the original statistical model for the variate takes the form of a general linear model. In other words, "multilevel" presumes hierarchical priors and a general linear model for the variate.
This kind of shorthand is incredibly dangerous, especially after a few generations where the original motivation is lost. By taking all of those assumptions for granted we forget to ask if they're needed or can be replaced with other assumptions better suited to an application.
I much prefer to say "multilevel hierarchical general linear model" to make clear the entire model that I am assuming and to facilitate discussion about whether all of those assumptions are appropriate.
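Spelled out as a log joint density, each word in that name maps to a separate piece of the model. The sketch below is one possible reading, with two discretized covariates, a normal observational model, and normal hyperpriors whose scales are arbitrary assumptions; none of these specific choices come from the thread.

```python
import math

def normal_lpdf(y, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_joint(ys, idx1, idx2, alpha, f1, f2, tau1, tau2, sigma):
    """Log joint density, up to a constant, of a two-covariate model.
    idx1[k] and idx2[k] are the interval indices of observation k."""
    lp = 0.0
    # "hierarchical": each group of interval parameters shares a population scale.
    lp += sum(normal_lpdf(v, 0.0, tau1) for v in f1)
    lp += sum(normal_lpdf(v, 0.0, tau2) for v in f2)
    # Hypothetical hyperpriors (half-normal on the positive scales, up to a constant).
    lp += normal_lpdf(tau1, 0.0, 1.0) + normal_lpdf(tau2, 0.0, 1.0)
    lp += normal_lpdf(sigma, 0.0, 1.0) + normal_lpdf(alpha, 0.0, 5.0)
    # "multilevel" + "general linear model": normal observations whose location
    # is a baseline plus one additive contribution per discretized covariate.
    for y, n, m in zip(ys, idx1, idx2):
        lp += normal_lpdf(y, alpha + f1[n] + f2[m], sigma)
    return lp

ys, idx1, idx2 = [1.0, 2.0, 3.0], [0, 1, 1], [0, 0, 1]
lp_good = log_joint(ys, idx1, idx2, 0.0, [0.0, 1.0], [1.0, 2.0], 1.0, 1.0, 1.0)
lp_bad = log_joint(ys, idx1, idx2, 10.0, [0.0, 1.0], [1.0, 2.0], 1.0, 1.0, 1.0)
```

Written this way, swapping out any single assumption, such as the normal observational model or the additive structure, is a local change to one block of the function.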
Even better let's stop trying to specify models with loaded terminology entirely and just specify the full model with probabilistic programs that are rich enough to communicate all of the assumptions directly. Is that too much to ask? -fin-
Keep Current with \mathfrak{Michael "El Muy Muy" Betancourt}