Our #NeurIPS2020 oral: NeuMiss networks: differentiable programming for supervised learning with missing values
neurips.cc/virtual/2020/p…

with Marine Le Morvan @JulieJosseStat @tomamoral @ErwanScornet, visiting @Mila_Quebec

Goal: predict in the presence of various missing-data mechanisms
Thread 1/5
The intuition: as features go missing, the best predictor must use the covariances between features to compensate, adjusting the slopes on the observed features.
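To make this concrete, here is a minimal numpy sketch (an illustration, not the paper's code) of the Bayes predictor under a joint Gaussian model with zero means: the slope on the observed features shifts by a term driven by the feature covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.normal(size=(p, p))
Sigma = A @ A.T                      # covariance of the features
beta = rng.normal(size=p)            # true regression coefficients

obs = np.array([0, 2])               # indices of observed features
mis = np.array([1, 3])               # indices of missing features

# E[X_mis | X_obs] = Sigma_mo @ inv(Sigma_oo) @ x_obs  (zero means)
Sigma_oo = Sigma[np.ix_(obs, obs)]
Sigma_mo = Sigma[np.ix_(mis, obs)]
transfer = Sigma_mo @ np.linalg.inv(Sigma_oo)

# Effective slope on the observed features for this missingness pattern:
effective_slope = beta[obs] + beta[mis] @ transfer
print(effective_slope)  # differs from beta[obs]: covariances compensate
```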

Classic approach: fit a probabilistic model with EM.
Its limitations: it requires a model of the missing-data mechanism, and it becomes intractable as the number of features p grows (the number of missingness patterns scales as 2^p).
2/5
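For reference, a common practical baseline is impute-then-regress. A minimal scikit-learn sketch (IterativeImputer does conditional imputation, a cousin of the EM approach rather than EM itself; the data here is synthetic):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5)
X[rng.random(X.shape) < 0.3] = np.nan     # introduce missing values

# Fill in missing entries with a conditional model, then regress:
model = make_pipeline(IterativeImputer(), Ridge())
model.fit(X, y)
```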
Our approach: write down the optimal predictor under various assumptions, then approximate it with a composition of differentiable functions: a neural network.

This theory leads us to introduce a new non-linearity: multiplication by the missingness mask at each layer
3/5
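A minimal PyTorch sketch of this mask non-linearity (an illustration of the idea, not the authors' implementation; the class name, shared weight, and skip connection are my assumptions):

```python
import torch
import torch.nn as nn

class MaskedBlock(nn.Module):
    """Toy NeuMiss-style predictor: multiply by the missingness mask
    at each layer (mask = 1 where observed, 0 where missing)."""

    def __init__(self, p: int, depth: int):
        super().__init__()
        self.W = nn.Linear(p, p, bias=False)   # shared across layers
        self.depth = depth
        self.out = nn.Linear(p, 1)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        x0 = torch.nan_to_num(x) * mask        # zero-fill missing entries
        h = x0
        for _ in range(self.depth):
            h = mask * self.W(h) + x0          # the mask non-linearity
        return self.out(h).squeeze(-1)
```

Intuitively, multiplying by the mask at each layer restricts the linear map to the observed coordinates, so depth plays the role of iterating an approximation of the pattern-dependent computation.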
This non-linearity has a much better approximation capability than wide or deep MLPs, in theory and in practice
(our previous work showed that wide ReLU MLPs are consistent with missing values: proceedings.mlr.press/v108/morvan20a…)
4/5
These approximations are good for multiple missing-value mechanisms, including missing _not_ at random, unlike EM or imputation (and those don't scale to many features).

The trick: differentiable programming to optimize a predictor function well suited for missing values
5/5
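A compact sketch of what "differentiable programming" means here: mask handling and prediction form one differentiable function, trained end-to-end by gradient descent (all names and dimensions are illustrative, not the paper's code):

```python
import torch

p, n = 10, 1000
X = torch.randn(n, p)
mask = (torch.rand(n, p) > 0.3).float()   # 1 = observed, 0 = missing
y = X @ torch.randn(p) + 0.1 * torch.randn(n)
X_in = X * mask                           # zero out missing entries

W = torch.zeros(p, p, requires_grad=True)
beta = torch.zeros(p, requires_grad=True)
opt = torch.optim.Adam([W, beta], lr=1e-2)

for _ in range(500):
    h = mask * (X_in @ W) + X_in          # mask non-linearity
    loss = ((h @ beta - y) ** 2).mean()   # supervised objective
    opt.zero_grad(); loss.backward(); opt.step()
```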
This work will be presented in French on Tuesday, December 8, at lunchtime

More from @GaelVaroquaux

1 Feb
Even for science and medical applications, I am becoming weary of fine statistical modeling efforts, and believe that we should standardize on a handful of powerful and robust methods.

An opinionated thread to give context.

1/8
First, analytic variability is a killer.

e.g. in "standard" analyses for brain mapping onlinelibrary.wiley.com/doi/full/10.10…, in machine learning for brain imaging
sciencedirect.com/science/articl…
or more generally in "hypothesis-driven" statistical testing
go.gale.com/ps/anonymous?i…

2/8
We need weakly-parametric models that can fit data as raw as possible, without relying on untestable assumptions.

Machine learning provides these, and tree-based models need little data transformation.

3/8
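As an illustration of that last point (my example, not from the thread): scikit-learn's histogram gradient boosting consumes raw numeric features without scaling and handles missing values natively.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + np.sin(X[:, 1])
X[rng.random(X.shape) < 0.2] = np.nan    # raw data with missing entries

model = HistGradientBoostingRegressor()  # no scaling or imputation needed
model.fit(X, y)                          # NaNs are handled natively
```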
8 Dec 19
Comparing distributions: Kernels estimate good representations, l1 distances give good tests

A simple summary of our #NeurIPS2019 work
gael-varoquaux.info/science/compar…

Given two sets of observations, how can we tell whether they are drawn from the same distribution? Short answer in the thread…
For instance, do McDonald’s and KFC use different logic to position restaurants? Difficult question! We have access to data points, but not the underlying generative mechanism, governed by marketing strategies.
To capture the information in the spatial proximity of data points, kernel mean embeddings are useful. They are intuitively related to kernel density estimates.
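A minimal numpy sketch of the idea (not the paper's code): embed each sample set with random Fourier features approximating an RBF kernel, average to get an empirical kernel mean embedding, then compare the embeddings with an l1 distance, echoing "kernels for representations, l1 for tests". The bandwidth and feature count are arbitrary choices here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))   # e.g. locations of brand A
Y = rng.normal(0.3, 1.0, size=(500, 2))   # locations of brand B

d, D, gamma = 2, 256, 1.0                 # D random features, RBF bandwidth
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(Z):
    """Random Fourier features approximating exp(-gamma * ||x - y||^2)."""
    return np.sqrt(2.0 / D) * np.cos(Z @ W + b)

mu_X, mu_Y = phi(X).mean(axis=0), phi(Y).mean(axis=0)  # mean embeddings
print(np.abs(mu_X - mu_Y).sum())          # l1 distance between embeddings
```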