Goal: predict under various missingness mechanisms
Thread 1/5
The intuition: as features go missing, the best predictor must use the covariances between features to adjust the slopes of the observed features (see the sketch after this tweet).
Classic approach: fitting a probabilistic model with EM.
Its limitations: it requires a model of the missingness mechanism & becomes intractable for large p (many features) 2/5
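To make the covariance intuition concrete, here is a hedged sketch of the optimal (Bayes) predictor in a linear-Gaussian setting; the notation (β, μ, Σ, the obs/mis index sets) is mine, not from the thread, and the closed form assumes Gaussian data and a mechanism such as MCAR:

```latex
% Linear model: y = \beta^\top x + \varepsilon, with x \sim \mathcal{N}(\mu, \Sigma).
% Split x into observed (obs) and missing (mis) coordinates. Conditioning on
% the observed features gives the Bayes predictor:
\[
\mathbb{E}[y \mid x_{\mathrm{obs}}]
  = \beta_{\mathrm{obs}}^{\top} x_{\mathrm{obs}}
  + \beta_{\mathrm{mis}}^{\top}\!\left(
      \mu_{\mathrm{mis}}
      + \Sigma_{\mathrm{mis},\mathrm{obs}}\,
        \Sigma_{\mathrm{obs},\mathrm{obs}}^{-1}
        \left(x_{\mathrm{obs}} - \mu_{\mathrm{obs}}\right)
    \right).
\]
% The matrix \Sigma_{\mathrm{mis},\mathrm{obs}} \Sigma_{\mathrm{obs},\mathrm{obs}}^{-1}
% is exactly what shifts the slopes of the observed features: the
% "compensation via covariances" described above. Note that
% \Sigma_{\mathrm{obs},\mathrm{obs}}^{-1} changes with each missingness
% pattern, so there are up to 2^p patterns to handle.
```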
Our approach: write down the optimal predictor under various assumptions, then approximate it with a differentiable composition of functions: a neural network.
This theory leads us to introduce a new non-linearity: multiplication by the missingness mask at each layer (minimal sketch below) 3/5
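A minimal PyTorch sketch of that mask non-linearity, assuming zero-imputation of the input and a mask that is 1 where a feature is observed; class and variable names are mine, and this illustrates the idea rather than the exact published architecture:

```python
import torch
import torch.nn as nn

class MaskBlock(nn.Module):
    """One layer: a linear map, then elementwise multiplication by the
    missingness mask (the new non-linearity from the thread)."""
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, n_features, bias=False)

    def forward(self, h, mask):
        # mask is 1.0 where the feature is observed, 0.0 where missing
        return self.linear(h) * mask

class MaskedNet(nn.Module):
    """Stack of mask blocks followed by a linear read-out."""
    def __init__(self, n_features, depth):
        super().__init__()
        self.blocks = nn.ModuleList(
            [MaskBlock(n_features) for _ in range(depth)]
        )
        self.out = nn.Linear(n_features, 1)

    def forward(self, x, mask):
        h = torch.nan_to_num(x) * mask  # zero-impute the NaNs, then mask
        for block in self.blocks:
            h = block(h, mask)
        return self.out(h)
```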
This non-linearity has a much better approximation capability than wide or deep MLPs, in theory and in practice
(our previous work showed that wide ReLU MLPs are consistent with missing values proceedings.mlr.press/v108/morvan20a… ) 4/5
These approximations are good for multiple missing-value mechanisms, including missing _not_ at random, unlike EM or imputation (which don't scale to many features).
The trick: differentiable programming to optimize a predictor function well suited to missing values (training sketch below)
5/5
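To illustrate the differentiable-programming point, a hypothetical end-to-end training loop for the MaskedNet sketched above (synthetic data, my own hyperparameters):

```python
import torch
import torch.nn.functional as F

# Synthetic regression data with roughly 30% of entries missing
torch.manual_seed(0)
x = torch.randn(256, 10)
x[torch.rand_like(x) < 0.3] = float("nan")
y = torch.randn(256, 1)

mask = (~torch.isnan(x)).float()
model = MaskedNet(n_features=10, depth=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# The whole predictor, mask handling included, is differentiable,
# so we simply minimize the prediction loss by gradient descent.
for _ in range(200):
    opt.zero_grad()
    loss = F.mse_loss(model(x, mask), y)
    loss.backward()
    opt.step()
```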
This work will be presented in French on Tuesday 08/12 over lunch
Even for science and medical applications, I am becoming weary of fine statistical modeling efforts, and believe that we should standardize on a handful of powerful and robust methods.
Given two sets of observations, how do we know if they are drawn from the same distribution? Short answer in the thread…
For instance, do McDonald’s and KFC use different logic to position restaurants? Difficult question! We have access to data points, but not to the underlying generative mechanism, which is governed by marketing strategies.
To capture the information in the spatial proximity of data points, kernel mean embeddings are useful. They are intuitively related to Kernel Density Estimates.
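A hedged Python sketch of both ideas: the empirical kernel mean embedding (which, with an RBF kernel, is a kernel density estimate up to normalization) and the resulting two-sample statistic, the Maximum Mean Discrepancy (MMD). The toy data, the bandwidth gamma, and all function names are my own:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def mean_embedding(X, T, gamma=1.0):
    """Empirical kernel mean embedding of sample X, evaluated at points T:
    mu_X(t) = mean_i k(x_i, t). With an RBF kernel this is, up to a
    normalization constant, a kernel density estimate of X at T."""
    return rbf_kernel(X, T, gamma=gamma).mean(axis=0)

def mmd2(X, Y, gamma=1.0):
    """Squared MMD = ||mu_X - mu_Y||^2 in the kernel's feature space,
    estimated here with the (biased) V-statistic from two samples."""
    return (rbf_kernel(X, X, gamma=gamma).mean()
            + rbf_kernel(Y, Y, gamma=gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma=gamma).mean())

# Toy stand-in for two chains' restaurant coordinates
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(300, 2))   # chain A locations
Y = rng.normal(0.5, 1.0, size=(300, 2))   # chain B locations
print(mmd2(X, Y))  # larger values suggest different spatial distributions
```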