Gael Varoquaux 🦋
Coder & Research director @inria ►Data, Health, & Computer science ►Python coder, (co)founder of @scikit_learn, joblib, @probabl_ai ►Art: @artgael ►Physics PhD
Jun 3 • 14 tweets • 5 min read
✨ #ICML2024 accepted! CARTE: Pretraining and Transfer for Tabular Learning


Why this is a jump forward for tabular deep learning 🤯, a stepping stone for tabular foundation models 🎉, and a study with many lessons on learning on real tables 👇
1/13 arxiv.org/abs/2402.16785

Teaser: the contribution leads to sizeable improvements compared to many strong baselines, across 51 datasets.

We worked really hard on the baselines, testing many, some being new combinations of tools (many lessons on neural networks and on handling categories)
2/13
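For context on such baselines: a common strong recipe for tables with string categories is an automatic dataframe encoder feeding gradient boosting. A minimal sketch, assuming the skrub library's TableVectorizer (an illustration of the genre, not the paper's actual baseline code):

```python
# Sketch of a strong tabular baseline: encode heterogeneous columns
# automatically, then apply gradient boosting (toy data, illustrative only).
import pandas as pd
from skrub import TableVectorizer  # assumed API: dataframe -> numeric features
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.pipeline import make_pipeline

df = pd.DataFrame({
    "position_title": ["Office Aide", "Police Officer", "Office Aide", "Nurse"],
    "year_first_hired": [1998, 2005, 2012, 2001],
})
y = [0, 1, 0, 1]

model = make_pipeline(TableVectorizer(), HistGradientBoostingClassifier())
model.fit(df, y)
print(model.predict(df))
```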
Jan 30, 2023 • 10 tweets • 5 min read
⚠️ A widespread confusion: calibration of predictors, as measured by the expected calibration error, does not fully guarantee that the predictor gives the true probabilities P(y|X):

A predictor may be overconfident on some individuals and underconfident on others
🧵
1/10

The question is: do the confidence scores of predictors correspond to actually controlled probabilities?

This question matters for decisions balancing harm-benefit tradeoffs, e.g. in medicine, and has motivated characterizing the calibration of predictors.
2/10
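To see the confusion concretely: here is a toy example (mine, not the paper's) where a predictor has near-zero expected calibration error, yet is overconfident on one subgroup and underconfident on the other:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, n)            # two hidden subpopulations
p_pred = np.full(n, 0.5)                 # predictor always says 0.5
p_true = np.where(group == 0, 0.3, 0.7)  # true P(y=1|X) differs by group
y = rng.random(n) < p_true

# All confidence scores fall in one bin, so the expected calibration
# error reduces to |mean confidence - mean outcome|, which is ~0:
print("ECE:", abs(p_pred.mean() - y.mean()))

# Yet conditional on the subgroup, the probabilities are off by ~0.2:
for g in (0, 1):
    print("group", g, "gap:", abs(0.5 - y[group == g].mean()))
```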
Jan 17, 2023 • 5 tweets • 3 min read
Our benchmark of tree-based models vs deep learning for tabular data: final version.

TL;DR: at a small compute budget, @scikit_learn's HistGradientBoosting is best. With finer tuning of hyperparameters, XGBoost brings a gain (here n ranges from 3,000 to 10,000).

For large datasets (> 10,000), the picture differs slightly:
▸ Classification: deep learning brings benefits for limited compute power
▸ Regression: XGBoost always outperforms @scikit_learn's HistGradientBoosting

(these are relative units)
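A minimal sketch of the small-vs-larger budget comparison on synthetic data (my own illustration, assuming the xgboost package; the benchmark itself uses real datasets and a far larger search):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Small compute budget: HistGradientBoosting with default hyperparameters.
hgb = HistGradientBoostingClassifier().fit(X_tr, y_tr)
print("HGB (defaults):", hgb.score(X_te, y_te))

# Larger budget: random search over XGBoost hyperparameters.
search = RandomizedSearchCV(
    XGBClassifier(),
    {"max_depth": [3, 5, 7], "learning_rate": [0.03, 0.1, 0.3],
     "n_estimators": [100, 300, 1000], "subsample": [0.7, 1.0]},
    n_iter=10, random_state=0,
).fit(X_tr, y_tr)
print("XGBoost (tuned):", search.score(X_te, y_te))
```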
Oct 13, 2022 • 4 tweets • 2 min read
I make about 3200€ net / month (2760€ after tax) + yearly bonus ~ 6000€ (correcting previous tweet).
I'm a research director (tenured prof equivalent), 13 years after PhD, with (I think) a good track record.


Why I think that my salary is not too low 👇

First, 80% of French workers earn less than I do. So I am, all in all, privileged
insee.fr/fr/statistique…
(we typically get these numbers wrong, so a reality check is useful) 2/3
Jul 19, 2022 • 11 tweets • 4 min read
⚡️Preprint: Why do tree-based models still outperform deep learning on tabular data?

We give solid evidence that, on tabular data, achieving good prediction is easier with tree-based methods than with deep learning (even modern architectures), and we explore why
hal.archives-ouvertes.fr/hal-03723551

1/9

We make explicit what differentiates tabular data from signals (heterogeneity of columns) and select 45 open datasets, defining a standard benchmark.

We study average performance as a function of the hyperparameter tuning budget: tree-based methods give the best performance with less tuning.

2/9
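The tuning-budget analysis can be emulated with a simple loop: draw random hyperparameter configurations and record the best validation score reached after each draw. A sketch under my own assumptions (not the benchmark's actual search spaces):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=2_000, n_features=15, noise=10, random_state=0)
rng = np.random.default_rng(0)

curve, best = [], -np.inf
for _ in range(20):                       # tuning budget: 20 random draws
    params = {
        "learning_rate": 10 ** rng.uniform(-2, 0),
        "max_leaf_nodes": int(rng.integers(8, 128)),
    }
    score = cross_val_score(
        HistGradientBoostingRegressor(**params), X, y, cv=3
    ).mean()
    best = max(best, score)
    curve.append(best)                    # performance as a function of budget

print(curve)
```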
May 4, 2021 • 5 tweets • 3 min read
New paper in @PLOSCompBiol: Extracting representations of cognition across neuroimaging studies improves brain decoding

A deep model for transfer learning, decoding a completely new study via universal latent representations of cognition
journals.plos.org/ploscompbiol/a…
1/5

The challenge we address: small cognitive neuroimaging studies tackle precise cognitive questions, but typically suffer from low statistical power.

We accumulate data across studies to improve the statistical performance of decoding within each study. Smaller studies benefit most.

2/5
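The core idea, a shared latent space of cognition with per-study decoding heads, can be sketched as follows (my own minimal rendition in PyTorch, not the paper's model):

```python
import torch
from torch import nn

n_voxels, n_latent = 1000, 50
studies = {"study_a": 8, "study_b": 12}  # hypothetical label counts per study

# Shared encoder: maps brain images into a universal latent representation.
shared = nn.Linear(n_voxels, n_latent)
# One decoding head per study, all reading the same latent space.
heads = nn.ModuleDict({s: nn.Linear(n_latent, k) for s, k in studies.items()})

def decode(x, study):
    return heads[study](torch.relu(shared(x)))

x = torch.randn(4, n_voxels)             # 4 fake brain maps
print(decode(x, "study_a").shape)        # torch.Size([4, 8])
```

Training such a model on pooled studies lets the shared encoder benefit from all the data, which is why the smaller studies gain most.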
Mar 5, 2021 • 8 tweets • 3 min read
New preprint: Accounting for Variance in Machine Learning Benchmarks

Led by @bouthilx and @Mila_Quebec friends

We show that ML benchmarks contain multiple sources of uncontrolled variation, not only inits. We propose a procedure for reliable conclusions 1/8 arxiv.org/abs/2103.03098

Data split and hyper-parameter selection (even with fancy hyper-parameter optimization) appear as the leading sources of arbitrary variation in ML benchmarks, beyond random weight init.

These must be sampled to give empirical evidence on algorithm comparisons that generalizes 2/8
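Concretely, "sampling" these sources means rerunning the benchmark while varying the data split (and, in a full benchmark, the hyper-parameter search seed), not just the weight init. A minimal sketch of the idea (mine, not the paper's code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, random_state=0)

scores = []
for seed in range(20):
    # Vary the data split, a leading source of variance, on each run.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

# Report the spread, not a single number:
print(f"{np.mean(scores):.3f} ± {np.std(scores):.3f}")
```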
Dec 4, 2020 • 6 tweets • 4 min read
Our #NeurIPS2020 oral: NeuMiss networks: differentiable programming for supervised learning with missing values
neurips.cc/virtual/2020/p…

with Marine Le Morvan @JulieJosseStat @tomamoral @ErwanScornet, visiting @Mila_Quebec

Goal: predict under various missingness mechanisms
Thread 1/5

The intuition: as features go missing, the best predictor must use the covariances between features to adjust the slopes on the observed features.

Classic approach: fitting a probabilistic model with EM.
Its limitations: it requires a model of the missingness mechanism & becomes intractable as the number of features p grows
2/n
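One way past these limitations, as I understand NeuMiss, is to approximate the inverse covariance of the observed features with a truncated Neumann series, which unrolls naturally into shared-weight network layers. A toy NumPy sketch of just the series, on a fixed matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# A well-conditioned symmetric matrix, scaled so the series converges
# (spectral radius of I - A below 1).
A = np.eye(5) + 0.1 * rng.standard_normal((5, 5))
A = (A + A.T) / 2
A /= np.linalg.norm(A, 2)

# Truncated Neumann series: A^{-1} ≈ sum_k (I - A)^k.
approx, term = np.zeros_like(A), np.eye(5)
for _ in range(30):
    approx += term
    term = term @ (np.eye(5) - A)

print(np.allclose(approx, np.linalg.inv(A), atol=1e-3))  # True
```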
Feb 1, 2020 • 8 tweets • 3 min read
Even for science and medical applications, I am becoming weary of fine statistical modeling efforts, and believe that we should standardize on a handful of powerful and robust methods.

An opinionated thread to give context for


1/8
First, analytic variability is a killer.

e.g. in "standard" analysis for brain mapping onlinelibrary.wiley.com/doi/full/10.10…, for machine learning in brain imaging
sciencedirect.com/science/articl…
or more generally in "hypothesis driven" statistical testing
go.gale.com/ps/anonymous?i…

2/8
Dec 8, 2019 • 10 tweets • 5 min read
Comparing distributions: Kernels estimate good representations, l1 distances give good tests

A simple summary of our #NeurIPS2019 work
gael-varoquaux.info/science/compar…

Given two sets of observations, how do we know whether they are drawn from the same distribution? Short answer in the thread…

For instance, do McDonald's and KFC use different logic to position restaurants? Difficult question! We have access to data points, but not to the underlying generative mechanism, governed by marketing strategies.
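The headline recipe, as I read it: represent each sample with a kernel mean embedding (e.g. via random Fourier features approximating an RBF kernel), compare the two embeddings with an l1 distance, and calibrate the test by permutation. A toy sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(X, W, b):
    # Random Fourier features: an explicit map approximating an RBF kernel;
    # averaging over the sample gives a kernel mean embedding.
    return np.cos(X @ W + b).mean(axis=0)

def l1_stat(X, Y, W, b):
    return np.abs(embed(X, W, b) - embed(Y, W, b)).sum()

X = rng.normal(0.0, 1, size=(500, 2))    # two point clouds to compare
Y = rng.normal(0.3, 1, size=(500, 2))
W = rng.normal(size=(2, 200))
b = rng.uniform(0, 2 * np.pi, 200)

obs = l1_stat(X, Y, W, b)
# Permutation test: under H0 the pooled points are exchangeable.
Z = np.vstack([X, Y])
null = []
for _ in range(200):
    rng.shuffle(Z)
    null.append(l1_stat(Z[:500], Z[500:], W, b))
print("p-value ~", (np.sum(np.array(null) >= obs) + 1) / 201)
```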