Even for science and medical applications, I am becoming weary of elaborate, fine-tuned statistical modeling efforts. I believe that we should standardize on a handful of powerful and robust methods.

An opinionated thread to give context for


1/8
First, analytic variability is a killer.

e.g. in "standard" brain-mapping analyses: onlinelibrary.wiley.com/doi/full/10.10…
in machine learning for brain imaging: sciencedirect.com/science/articl…
or, more generally, in "hypothesis-driven" statistical testing: go.gale.com/ps/anonymous?i…

2/8
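To make analytic variability concrete, here is a minimal "multiverse" sketch, assuming a toy null dataset and two illustrative analysis choices (an outlier-exclusion threshold, and Pearson vs Spearman correlation). The pipelines and the helper name `run_pipeline` are hypothetical. Each pipeline is defensible on its own, yet they give different p-values on the same data, and reporting only the most favorable one turns this variability into spurious findings.

```python
import itertools

import numpy as np
from scipy import stats

# Null data: x and y are independent, so any "significant" result is spurious.
rng = np.random.default_rng(0)
n = 60
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Hypothetical, equally "reasonable" analysis choices (assumptions for illustration).
outlier_thresholds = [None, 2.0, 2.5, 3.0]  # exclude points beyond k SDs
tests = {
    "pearson": lambda a, b: stats.pearsonr(a, b)[1],    # [1] is the p-value
    "spearman": lambda a, b: stats.spearmanr(a, b)[1],
}


def run_pipeline(threshold, test_name):
    """Apply one hypothetical analysis pipeline and return its p-value."""
    keep = np.ones(n, dtype=bool)
    if threshold is not None:
        # Drop observations whose z-score exceeds the threshold on either variable.
        zx = np.abs(stats.zscore(x))
        zy = np.abs(stats.zscore(y))
        keep = (zx < threshold) & (zy < threshold)
    return tests[test_name](x[keep], y[keep])


# Run every combination of choices on the same data.
p_values = {
    (thr, name): run_pipeline(thr, name)
    for thr, name in itertools.product(outlier_thresholds, tests)
}
for (thr, name), p in p_values.items():
    print(f"outliers > {thr} SD, {name}: p = {p:.3f}")
print("smallest p across pipelines:", min(p_values.values()))
```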
We need weakly parametric models that can fit data as raw as possible, without relying on untestable assumptions.

Machine learning provides these, and tree-based models need little data transformation.

3/8
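One reason tree-based models need little transformation: their splits depend only on the ordering of each feature's values, so a monotone transform such as a log leaves the learned partition unchanged. A minimal sketch, assuming synthetic skewed data and using a single scikit-learn `DecisionTreeRegressor` for simplicity; the same ordering argument underlies the trees inside forests and boosting.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Strongly skewed, positive features (synthetic): the kind of "raw" data one
# might be tempted to log-transform before fitting a linear model.
X_raw = rng.lognormal(mean=0.0, sigma=1.5, size=(300, 3))
y = np.log(X_raw[:, 0]) - 0.5 * np.log(X_raw[:, 1]) + rng.normal(scale=0.1, size=300)

# The same model fitted on raw features and on log-transformed features.
tree_raw = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_raw, y)
tree_log = DecisionTreeRegressor(max_depth=5, random_state=0).fit(np.log(X_raw), y)

# A monotone transform preserves the ordering of feature values, so both
# trees partition the training samples identically.
pred_raw = tree_raw.predict(X_raw)
pred_log = tree_log.predict(np.log(X_raw))
print("identical training predictions:", np.allclose(pred_raw, pred_log))
```

A linear model fitted on `X_raw` versus `np.log(X_raw)` would, by contrast, give very different fits, which is why its preprocessing becomes one more analytic choice.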
We need non-parametric model selection and testing that does not break if the model is wrong.

Cross-validation and permutation importance provide these, once we have chosen the input (exogenous) and output (endogenous) variables.

4/8
If there are fewer than a thousand data points, all but the simplest statistical questions can and will be gamed (sometimes unconsciously), partly because model selection cannot be done reliably. An example in neuroimaging: biorxiv.org/content/10.110…

I no longer trust such endeavors, including my own.

5/8
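A back-of-the-envelope way to see why small samples defeat model selection: treat each test prediction as an independent correct/incorrect trial, so the measured accuracy has a binomial standard error. A minimal sketch, assuming a true accuracy of 75% and a normal-approximation 95% interval; the helper `accuracy_ci_halfwidth` is hypothetical. This approximation ignores the extra variance that comes from cross-validation folds, so it is generally optimistic.

```python
import math


def accuracy_ci_halfwidth(accuracy: float, n_test: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% CI for an accuracy
    measured on n_test independent test samples (binomial model)."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_test)


for n_test in (100, 1000, 10000):
    hw = accuracy_ci_halfwidth(0.75, n_test)
    print(f"n = {n_test:>5}: 75% accuracy +/- {100 * hw:.1f} points")
# n =   100: 75% accuracy +/- 8.5 points
# n =  1000: 75% accuracy +/- 2.7 points
# n = 10000: 75% accuracy +/- 0.8 points
```

With 100 test samples, two models whose accuracies differ by a few points cannot be told apart, which leaves room to pick whichever one supports the desired story.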
For thousands of data points and moderate dimensionality (99% of cases), gradient-boosted trees provide the necessary regression model
scikit-learn.org/stable/modules…
They are robust to the data distribution and support missing values (even outside missing-at-random, or MAR, settings: arxiv.org/abs/1902.06931).

6/8
For thousands of data points and large dimensionality, linear models (ridge) are needed.

But applying them without thousands of data points (as I tried to for many years) is hazardous. Get more data, or change the question (e.g. analyze across cohorts).

7/8
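For the high-dimensional case, a minimal ridge sketch, assuming synthetic "wide" data (more features than samples) from `make_regression`. `RidgeCV` chooses the regularization strength over a grid of `alphas` (the grid here is an arbitrary assumption) by efficient leave-one-out cross-validation, and an outer `cross_val_score` keeps the performance estimate honest.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic wide data (an assumption): 3000 features, 2000 samples.
X, y = make_regression(n_samples=2000, n_features=3000, n_informative=300,
                       noise=10.0, random_state=0)

# Inner model selection: RidgeCV picks alpha by leave-one-out CV (its default).
alphas = np.logspace(-2, 4, 13)
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))

# Outer cross-validation: performance estimate not biased by the choice of alpha.
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.3f}")

# Refit on all data to inspect the selected regularization strength.
model.fit(X, y)
print("selected alpha:", model[-1].alpha_)
```

Rerunning the same sketch with only a few hundred samples is a quick way to see how unstable the selected `alpha_` and the scores become.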
Most questions are not about "prediction". But machine learning is about estimating functions that approximate conditional expectations or probabilities. We need to get better at integrating it into our scientific inference pipelines.

For more, push me to write a paper on this.

8/8

Thread by Gael Varoquaux