Yesterday I tweeted about nested data, comparing multi-level models (MLM) with OLS + cluster-robust variance estimation (CRVE). This made me think about another confusion that arises: the distinction between what are called fixed versus random effects.
Let’s begin with a simple relationship between a covariate X and Y in nested data, e.g. students i nested in school j. We are interested in understanding the relationship between X and Y at the student level.
Approach 1: Assume the schools are fixed, but that students are a random sample within these schools. Assume the relationship between X and Y is the same in all schools. This often amounts to including a dummy variable for each school in the model. Here I use OLS to estimate β_1.
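To make Approach 1 concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration (the number of schools, the true slope of 1.5, the variable names); it only shows what "a dummy variable for each school plus a common slope, fit by OLS" can look like.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: J schools with n students each.
J, n = 5, 200
school = np.repeat(np.arange(J), n)        # school index j for each student i
alpha = rng.normal(0.0, 2.0, size=J)       # school-specific intercepts (treated as fixed)
true_beta_1 = 1.5                          # common slope, assumed for the simulation
x = rng.normal(size=J * n)
y = alpha[school] + true_beta_1 * x + rng.normal(size=J * n)

# Design matrix: one dummy column per school (no global intercept), then X.
dummies = (school[:, None] == np.arange(J)).astype(float)
design = np.column_stack([dummies, x])

# Ordinary least squares.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
alpha_hat, beta_1_hat = coef[:J], coef[J]
print("estimated common slope:", beta_1_hat)
```

The single coefficient on X is the β_1 of Approach 1; the J dummy coefficients absorb all between-school differences in level.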
Approach 2: Assume schools are fixed, but the relationship between X and Y can vary across schools. Now add in interactions. Here I use OLS to estimate separate relationships between X and Y for each school (η_1, …, η_J).
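A similar sketch for Approach 2, again on made-up data (the intercepts, slopes, and names below are assumptions, not values from the thread). Interacting X with each school dummy gives one slope per school, which is the η_1, …, η_J above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulated data: each school has its own intercept and slope.
J, n = 4, 300
school = np.repeat(np.arange(J), n)
alpha = np.array([0.0, 1.0, -1.0, 2.0])    # assumed school intercepts
eta = np.array([0.5, 1.0, 1.5, 2.0])       # assumed school-specific slopes
x = rng.normal(size=J * n)
y = alpha[school] + eta[school] * x + rng.normal(size=J * n)

# School dummies plus school-by-X interactions (one slope column per school).
dummies = (school[:, None] == np.arange(J)).astype(float)
design = np.column_stack([dummies, dummies * x[:, None]])

# OLS on the interacted design; the last J coefficients are the slopes.
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
eta_hat = coef[J:]
print("estimated school slopes:", eta_hat)
```

With a full set of dummies and interactions, this gives the same point estimates as running a separate regression of Y on X inside each school.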
Approach 3: Assume the schools are a random sample from some population, but treat the relationship between X and Y as the same in all schools. Estimate β_1 using generalized least squares (GLS).
Approach 4: Assume schools are random, but now allow the relationship between X and Y to vary across schools (with that variation also treated as random). Same as Approach 3 but change the last equation. Estimate both β_1 and the variation across schools.
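One common way to write Approaches 3 and 4 is sketched below. The notation is an assumption on my part (the random terms u_{0j}, u_{1j} and the variances τ_0², τ_1², σ² are standard textbook symbols, not taken from this thread), so treat it as an illustration rather than the exact equations referred to above.

```latex
% Approach 3: random school intercepts, common slope \beta_1
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + \varepsilon_{ij},
\qquad u_{0j} \sim N(0, \tau_0^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% Approach 4: the slope also varies randomly across schools
y_{ij} = \beta_0 + (\beta_1 + u_{1j})\, x_{ij} + u_{0j} + \varepsilon_{ij},
\qquad u_{1j} \sim N(0, \tau_1^2)
```

In this notation, Approach 3 estimates β_1 only, while Approach 4 estimates β_1 together with τ_1², the variation in slopes across schools.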
Ok, here’s where the language gets confusing. Approach 3 is referred to as “Fixed” (as in ‘treat the relationship between X and Y as fixed’) in the MLM literature, but as a “Random Effects” model in economics!
In summary: Multi-level modelers use “fixed” or “random” for coefficients, while economists use “fixed” or “random” for models.
• • •
You know how excited @daniela_witten gets about SVD? I have about the same thing with kernels. Except that I'm not sure I explain them as well as she does SVD. Still, you're getting a thread on kernels!
Maybe one way of putting it is that kernels are dot products on steroids. The dot product is already pretty cool.
1) It's easy to compute and you learn about it in high school math (at least I did, who knows what kids learn in high school now).
Take two p-dimensional vectors x = (x1, x2, ..., xp) and y = (y1, y2, ..., yp). Their dot product <x|y> is simply the sum of the products of their coordinates:
<x|y> = x1 y1 + x2 y2 + ... + xp yp.
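As a quick illustration of that formula, here is a minimal sketch in plain Python. The function name `dot` and the example vectors are mine, picked only for the example.

```python
def dot(x, y):
    """Dot product <x|y>: the sum of the coordinate-wise products."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension p")
    return sum(xi * yi for xi, yi in zip(x, y))


x = [1, 2, 3]
y = [4, 5, 6]
print(dot(x, y))  # 1*4 + 2*5 + 3*6 = 32
```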
Good morning everybody! Let's talk a bit about how I came to develop statistical / machine learning tools for genomics, healthcare and drug discovery.
I trained as an engineer at @IMTAtlantique, with a specialization in computer science. I didn't really enjoy statistics and graduated in 2005, back when AI belonged to scifi and nobody knew what machine learning was.
What really interested me was bioinformatics - the idea that my training in maths and computer science could be put to use to help solve problems from the life sciences was very appealing! So I jumped at the opportunity to intern in a lab that was doing just that.
OK, so a bit of background about me: I'm French (and tweeting from Paris), and I'm currently an associate professor at an engineering school called @MINES_ParisTech.
The research group I'm in (CBIO) has a partnership with @institut_curie, which is a cancer research institute. CBIO has four PIs, working on various topics related to, you've guessed it, statistics / machine learning & cancer.
My plan for the week is to talk more about my career path, my research topics, and my love of kernels. Of course I'll also talk about what we do at @WiMLDS_Paris, about open/reproducible science, and about teaching machine learning!
I have organized multiple conferences over the years.
Tips to conference organizers to support women at your meeting
1- Actively consider gender and career stage balance in speakers.
2- Women and minorities may take a longer route to success; try to avoid ageist selection.
3- Provide lactation rooms (with equipment & milk storage). Pumps are heavy and a pain to carry around a meeting. The room should be close by, not a long walk away.
4- Small babies are welcome. Check there is a changing table accessible to dads & mums.
5- Parents of young children are often postdocs or junior faculty, who need and are grateful for childcare and/or travel scholarships.
6- Go hybrid. Live stream & record talks. It's great if one is stuck in a lactation room, or watching remotely.
@Bioconductor provides genome annotation for thousands of species, and its packages are used in almost every biological discipline, including:
immunology
oncology
evolution and phylogenetics
cheminformatics
comparative genomics
epigenetics
pharmacogenomics
systems biology
etc.
The core team, together with the community, creates standard class structures for data. Developers write methods that use these, creating a connected framework where packages work together and provide entire analysis workflows.
The current release, @Bioconductor 3.14, consists of 2083 #RStats packages, 408 experiment data packages, 904 annotation packages, 29 workflows and 8 books.