Goal: predict in the presence of missing values, under various missingness mechanisms
The intuition: when features go missing, the best predictor must use the covariances between features to adjust the slopes of the observed features and compensate.
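To make this intuition concrete, here is a minimal sketch under assumptions that the thread does not state: a linear model y = beta0 + beta·x + noise, jointly Gaussian features x ~ N(mu, sigma), and entries missing completely at random (marked as NaN). The function name `bayes_predict` and its parameters are hypothetical, chosen for illustration. In this setting the best prediction replaces each missing feature by its conditional expectation given the observed ones, which amounts to changing the slopes applied to the observed features.

```python
import numpy as np

def bayes_predict(x, beta0, beta, mu, sigma):
    """Conditional expectation E[y | observed entries of x].

    Illustrative assumptions: y = beta0 + beta @ x + noise, x ~ N(mu, sigma),
    entries missing completely at random and encoded as NaN.
    """
    obs = ~np.isnan(x)
    mis = ~obs
    if not obs.any():
        # Nothing observed: fall back on the feature means
        return beta0 + beta @ mu
    # Expected value of the missing features given the observed ones
    resid = x[obs] - mu[obs]
    sigma_oo = sigma[np.ix_(obs, obs)]
    sigma_mo = sigma[np.ix_(mis, obs)]
    cond_mis = mu[mis] + sigma_mo @ np.linalg.solve(sigma_oo, resid)
    return beta0 + beta[obs] @ x[obs] + beta[mis] @ cond_mis

mu = np.zeros(2)
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
beta = np.array([1.0, 2.0])
print(bayes_predict(np.array([2.0, 3.0]), 0.0, beta, mu, sigma))     # both observed
print(bayes_predict(np.array([2.0, np.nan]), 0.0, beta, mu, sigma))  # second missing
```

In the second call the effective slope on the first feature goes from 1 to 1 + 2 × 0.5 = 2: the covariance between the two features carries the weight of the missing one over to the observed one.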
Classic approach: fitting a probabilistic model with EM.
Its limitations: it requires a model of the missingness mechanism and becomes intractable as the number of features p grows 2/n
Our approach: write the optimal predictor under various assumptions, then approximate it with a composition of differentiable functions: a neural network.
This theory leads us to introduce a new non-linearity: multiplication by the missingness mask at each layer 3/5
This non-linearity has much better approximation capability than wide or deep MLPs, in theory and in practice
(our previous work showed that wide ReLU MLPs are consistent with missing values proceedings.mlr.press/v108/morvan20a… ) 4/5
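One way a mask multiplication at each layer can arise, sketched here under an assumption not spelled out in the thread: approximate the inverse of the covariance block of the observed features with a truncated Neumann series. Each iteration is then a linear map followed by an element-wise product with the mask. The helper `masked_neumann_solve` and its parameters are hypothetical; the weights are fixed from a known covariance rather than learned, so this illustrates the mechanism only and is not the authors' architecture.

```python
import numpy as np

def masked_neumann_solve(sigma, v, observed, depth=200):
    """Approximate inv(sigma[obs, obs]) @ v[obs] using mask multiplications.

    Hypothetical helper for illustration. `observed` is a 0/1 vector;
    the result is zero on the missing coordinates.
    """
    m = observed.astype(float)
    # Rescale so that the Neumann series converges for any observed subset
    scale = np.linalg.eigvalsh(sigma).max()
    w = np.eye(len(v)) - sigma / scale
    v = m * v
    h = v.copy()
    for _ in range(depth):
        # One "layer": linear map, then multiplication by the mask
        h = m * (w @ h) + v
    return h / scale

sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
v = np.array([1.0, 2.0, 3.0])
observed = np.array([1, 0, 1])  # second feature is missing
approx = masked_neumann_solve(sigma, v, observed)
exact = np.linalg.solve(sigma[np.ix_([0, 2], [0, 2])], v[[0, 2]])
print(approx[[0, 2]], exact)
```

The same weight matrix `w` serves every missingness pattern; only the mask changes, so no separate parameters are needed for each pattern.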
These approximations work well under multiple missing-value mechanisms, including missing _not_ at random, unlike EM or imputation (which also do not scale to many features).
The trick: differentiable programming to optimize a predictor function well suited for missing values
This work will be presented in French on Tuesday 08/12 at lunchtime
