Department meetings at @UCSF_Epibiostat have become surprisingly fabulous. Every meeting, a thoughtful covid update from George Rutherford. Today: new faculty member @Jean_J_Feng (jeanfeng.com) on fair machine learning algorithms in medical care.
She thinks of ML apps along 2 dimensions: severity of the healthcare situation (low = Apple Watch for AFib; high = Viz.ai ContaCT to ID a stroke in progress) and significance of the information provided (low = Apple Watch; high = IDx-DR for diabetic retinopathy).
Nothing approved yet with both high severity and high significance of information. Suggests we are really uncomfortable deferring decisions to an ML algorithm instead of our doctor. Why?
Doctor has multiple priorities to balance: Hippocratic oath, health of individuals, public health.
ML algorithm: 1 goal: predictive accuracy across entire population.
Can we do better? 3 goals:
Encode self-awareness in the algorithm: know what it doesn't know.
Interpretability: clarify how important each variable is for making predictions.
Regulation: figure out how to regulate continuously evolving machine learning algorithms.
#1: self-awareness: the case of misleading predictions. Linear regressions let us extrapolate off-support. With very high-dimensional data it's hard to tell if you're far away from your training data. So: train a model to tell us whether we should even be making predictions for a new observation.
2 types of uncertainty would lead us to abstain from making a prediction. Aleatoric uncertainty: the response is inherently noisy at x.
Epistemic uncertainty: the input query is a rare set of predictors, e.g. you haven't seen patients like this in your training data.
Need a model that considers both types of uncertainty. Approach: penalized selective prediction models. (I may not have this right.) Include fake input data in your testing data, and penalize the model for making predictions for those people.
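A minimal sketch of how that could look (given the caveat above, this is entirely my reconstruction, not the speaker's actual method): add an explicit "abstain" class, generate fake off-support inputs, and upweight the cost of predicting (rather than abstaining) on them.

```python
# Toy "self-aware" classifier: my own sketch, not necessarily the talk's
# method. An extra ABSTAIN class plus fake off-support inputs teaches the
# model where it should refuse to predict.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Toy training data: 2 features drawn from N(0, I), binary outcome.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

# Fake inputs sampled well outside the training support (norm > 4).
X_fake = rng.uniform(-8, 8, size=(3000, 2))
X_fake = X_fake[np.linalg.norm(X_fake, axis=1) > 4][:500]

ABSTAIN = 2  # classes: 0/1 = real outcomes, 2 = abstain
X_all = np.vstack([X, X_fake])
y_all = np.concatenate([y, np.full(len(X_fake), ABSTAIN)])

# Penalty weight: failing to abstain on a fake point costs more than
# a misclassification on a real point.
lam = 2.0
w = np.where(y_all == ABSTAIN, lam, 1.0)

# Quadratic features let the abstain region surround the training data.
clf = make_pipeline(PolynomialFeatures(2), StandardScaler(),
                    LogisticRegression(max_iter=5000))
clf.fit(X_all, y_all, logisticregression__sample_weight=w)

print(clf.predict([[0.5, -0.2]]))  # in-support: predicts 0 or 1
print(clf.predict([[6.0, 6.0]]))   # off-support: should predict ABSTAIN (2)
```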
Next project: interpretability / variable importance. Example: you're trying to predict patient survival based on different medical tests. Current measures of variable importance are problematic for highly correlated variables.
So they turned to Shapley values, a game-theory formulation. Game: predict the outcome. Players: the variables. Payout: prediction accuracy. How do we assign payouts to players based on their contribution to the total payout?
The importance of variable Xi is the average increase in prediction accuracy from adding Xi to all possible subsets of the other variables. So if evaluating, say, the GCS test, add it onto all possible subsets of other variables and see how much it improves accuracy in each.
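In symbols (my transcription of the standard Shapley value, with the payout v(S) read as the prediction accuracy of a model fit on feature subset S, and p variables in total):

```latex
\phi_i \;=\; \sum_{S \subseteq \{1,\dots,p\}\setminus\{i\}}
  \frac{|S|!\,(p-|S|-1)!}{p!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```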
This is challenging to calculate b/c there are too many possible subsets. Their solution: randomly sample feature subsets! So cool. How many subsets do you need to sample? At least as many as the number of observations.
This approach of sampling from the set of all possible variable subsets is computationally feasible. You get CIs! And the importance estimates agree across different ML algorithms.
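A toy Monte Carlo version of the idea (my own sketch, not their estimator, which is more sophisticated): sample random feature orderings, accumulate each variable's marginal contribution to R², and use the spread across samples for rough CIs.

```python
# Monte Carlo Shapley variable importance: a toy sketch of the sampling
# idea, not Feng et al.'s actual estimator.
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 4 correlated predictors, linear outcome. Correlation means
# even the "null" features 2 and 3 pick up some importance.
n, p = 400, 4
A = rng.normal(size=(p, p))
X = rng.normal(size=(n, p)) @ A
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def value(subset):
    """Payout v(S): in-sample R^2 of OLS using only the features in S."""
    if not subset:
        return 0.0
    Xs = np.hstack([X[:, list(subset)], np.ones((n, 1))])
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return 1.0 - resid.var() / y.var()

# Random orderings give one marginal contribution per feature each;
# this is equivalent to sampling subsets with the Shapley weights.
n_samples = 200
contrib = np.zeros((n_samples, p))
for s in range(n_samples):
    seen = []
    for i in rng.permutation(p):
        before = value(seen)
        seen.append(i)
        contrib[s, i] = value(seen) - before

phi = contrib.mean(axis=0)
se = contrib.std(axis=0, ddof=1) / np.sqrt(n_samples)
for i in range(p):
    print(f"X{i}: {phi[i]:.3f} +/- {1.96 * se[i]:.3f}")  # rough 95% CI
```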
3rd project: online machine learning in healthcare systems w/ feedback. The risks of modifying ML algorithms are a more severe version of the risks of modifying drugs: the iteration cycle for an ML algorithm might be days or weeks, but for drugs it's months to years.
So we need to decide: what is an acceptable modification? Might want to specify a non-inferiority margin. And what is an acceptable sequence of modifications? Even if each modification is acceptable on its own, in combination they may not be. Bio-creep!
They've proposed an online hypothesis testing framework: approval is contingent on showing acceptability w/r/t all previously approved modifications. You also need to correct for multiple comparisons over the entire time period. Awesome talk.
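A stylized sketch of how such a gatekeeping scheme could look (my own toy illustration, not their actual framework): each candidate modification must pass a one-sided non-inferiority test against every previously approved version, with the overall alpha split across a planned testing horizon.

```python
# Toy sequential approval with a non-inferiority margin: my illustration
# of the gatekeeping idea, not the framework from the talk. Accuracies are
# treated as observed proportions from equal-sized evaluation arms.
import numpy as np
from scipy.stats import norm

MARGIN = 0.02              # non-inferiority margin on accuracy
TOTAL_ALPHA = 0.05
MAX_TESTS = 20             # planned testing horizon
alpha = TOTAL_ALPHA / MAX_TESTS  # crude Bonferroni-style alpha-spending
n = 5000                   # evaluation sample size per arm

def noninferior(acc_new, acc_ref):
    """One-sided z-test of H0: acc_new <= acc_ref - MARGIN."""
    se = np.sqrt(acc_new * (1 - acc_new) / n + acc_ref * (1 - acc_ref) / n)
    return (acc_new - acc_ref + MARGIN) / se > norm.ppf(1 - alpha)

approved = [0.90]  # accuracy of the originally approved model
for acc in [0.905, 0.897, 0.91]:  # hypothetical proposed updates
    # Gatekeeping against ALL prior approvals guards against bio-creep:
    # the middle candidate can't show non-inferiority to everything
    # already approved, so it's rejected.
    if all(noninferior(acc, ref) for ref in approved):
        approved.append(acc)
        print(f"approved at accuracy {acc:.3f}")
    else:
        print(f"rejected at accuracy {acc:.3f}")
```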