So excited for @sherrirose's long-awaited workshop on computational health economics & outcomes. @UCSF_Epibiostat #epitwitter
She leads by making the case for interdisciplinary research: we need methods that are both theoretically strong and practical/relevant for practice. Sometimes the theoretical ideal and practicability are in conflict. Callout for articles on methods grounded in real problems for the journal @biostatistics.
Computational health economics (#econtwitter): how can we affect policy?
Data first, methods second. Electronic health databases are a new resource, but their usefulness for research really varies (fancy stats doesn't solve major data problems).
Data buckets: billing (big n, but limited variables; a challenge for causal inference, see doi.org/10.1111/1475-6…), clinical, imaging (clinical & non-clinical, e.g. place), registry (often deep phenotyping, but limited follow-up and n),
data from surveys (calls out @Oakes007 paper w/ Messer & Mason (doi.org/10.1093/aje/kw…) on structural confounding), gov't, wearables, news media.
Digital (Google) data have many limitations but may also include people who are not in our other data sources, e.g. Abebe 2019, "Using Search Queries to Understand Health Information Needs in Africa," re getting highly sensitive info. aaai.org/ojs/index.php/…
Big challenge is linking across data sets.
Generalizability & fairness: Reminder! Start w/ the research question: who is in the target pop? Often people are excited about the data or the algorithm but haven't thought through the research question or target population.
Generalizability questions are relevant for prediction, clustering & inference questions (not just causal questions!). She has concerns about generalizability of complex machine learning prediction models. See her commentary: doi:10.1001/jamanetworkopen.2018.1404
"The machine learning researchers who develop novel algorithms for prediction and the clinical teams interested in implementing them are frequently and unfortunately 2 nonintersecting groups."
Question re getting tuning parameters right. She says pay attention! E.g., random forests (in R?) have a default tuning parameter that allows only one person per final leaf! Yikes. Talk about overfitting.
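A minimal sketch of that point, assuming she meant the randomForest R package (where the nodesize parameter defaults to 1 for classification, i.e., one observation per terminal node); the toy data and comparison are mine:

```r
## randomForest's classification default, nodesize = 1, grows trees
## down to single-observation leaves; a larger nodesize smooths them.
library(randomForest)

set.seed(1)
n <- 500
x <- matrix(rnorm(n * 10), n, 10)
y <- factor(rbinom(n, 1, plogis(x[, 1])))   # toy binary outcome

rf_default <- randomForest(x, y)                 # nodesize = 1
rf_larger  <- randomForest(x, y, nodesize = 25)  # fuller leaves

## compare out-of-bag error as a rough overfitting check
tail(rf_default$err.rate[, "OOB"], 1)
tail(rf_larger$err.rate[, "OOB"], 1)
```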
Shows results from Ellis et al re persistence of chronic conditions across years in health records. eg 85% of people coded for HIV in year 1 are also coded so in year 2. But for paraplegia, only 57%!! doi.org/10.1016/B978-0…
Fairness: who decides the research q? Who is the target popln? What do the data reflect (e.g., given structural racism or other biases, who's in the data)? How will the algorithm be assessed? What counts as fair for this algorithm? Getting a lot of att'n in comp sci but not much in health.
We need to think about social impact of algorithms.
Talks about the @propublica & @statnews analysis of Black and Native American representation in clinical trials of drugs for conditions differentially affecting those groups. propublica.org/article/black-…
Now on to prediction: with big data there are often hundreds of potential predictors. It is difficult to correctly specify a parametric regression (the true function is likely complex, beyond main effects + interaction terms). We may have more unknown parameters than observations.
Prediction is tough if p > n or the data are sparse (most people share the same value, usually 0, on any given covariate; only a few people have any particular diagnosis). We hit the inherent bias-variance trade-off. Many possible algorithms (logistic regression, regression trees, neural nets...). How to choose?
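To make the p > n point concrete, a tiny sketch (my choice of the lasso via the glmnet R package, not something shown in the talk):

```r
## p > n toy: 100 observations, 500 sparse binary covariates (think
## rare diagnosis flags). OLS is not identifiable here; the lasso
## still yields a usable predictor by shrinking coefficients and
## setting most of them to zero.
library(glmnet)

set.seed(1)
n <- 100; p <- 500
X <- matrix(rbinom(n * p, 1, 0.02), n, p)   # sparse indicators
Y <- 2 * X[, 1] - X[, 2] + rnorm(n)

fit <- cv.glmnet(X, Y)                      # lambda chosen by CV
coef(fit, s = "lambda.min")[1:5, ]          # intercept + first coefs
```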
Even for similar questions and similar data sets, we do not know which approaches will perform "best". This uncertainty about which to choose is the reason to explore ensembles, i.e., weighted averages of multiple algorithms. What do we mean by best? Define a loss function.
Common loss function: squared error. The goal is to choose the estimator that minimizes expected loss (risk). Use cross-validation to compare performance. Ensembles: build the collection consisting of all weighted averages of the candidate algorithms; the optimal weighted average performs at least as well as any single algorithm.
We have 3 estimators under consideration; how can we responsibly combine them? Do cross-validation for each of the 3 algorithms to get vectors of cross-validated predicted values. Choose the optimal weights by regressing the outcome on the 3 vectors Z (the predicted values from each algorithm).
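A from-scratch sketch of that recipe (toy data and three arbitrary learners of my choosing; the SuperLearner package, below, does this properly):

```r
## From-scratch stacking, per the recipe above: 10-fold CV predictions
## Z from 3 learners, then regress Y on Z to get ensemble weights.
set.seed(1)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- d$x1^2 + d$x2 + rnorm(n)

folds <- sample(rep(1:10, length.out = n))
Z <- matrix(NA, n, 3)   # one column of CV predictions per learner

for (k in 1:10) {
  train <- d[folds != k, ]
  test  <- folds == k
  f1 <- lm(y ~ x1 + x2, data = train)                   # linear
  f2 <- lm(y ~ poly(x1, 2) + poly(x2, 2), data = train) # quadratic
  f3 <- lm(y ~ x1 * x2, data = train)                   # interaction
  Z[test, ] <- cbind(predict(f1, d[test, ]),
                     predict(f2, d[test, ]),
                     predict(f3, d[test, ]))
}

## Weights from regressing the outcome on the CV predictions.
## (SuperLearner instead uses non-negative weights summing to 1;
## unconstrained least squares here just illustrates the idea.)
coef(lm(d$y ~ Z - 1))
```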
This is what SuperLearner does. It performs asymptotically as well as the best choice among the family of candidate algorithms. Adding more algorithms in general only improves performance.
Another use of machine learning is to pre-screen variables, reducing the # fed into another algorithm (sketch below).
This can allow you to build investigator knowledge into the process. Refs to the stacking algorithm, Wolpert 1992...
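For instance, the SuperLearner R package (listed under software below) lets you pair each prediction algorithm with a screening wrapper; the wrappers named here exist in the package, but the particular pairings are my own illustration:

```r
## Sketch: screening wrappers pre-select variables before each
## prediction algorithm runs. Pairings are illustrative only.
library(SuperLearner)

sl_library <- list(
  c("SL.glmnet", "screen.corP"),          # lasso on variables passing
                                          #  a correlation p-value screen
  c("SL.randomForest", "screen.glmnet"),  # RF on variables lasso keeps
  "SL.mean"                               # benchmark: overall mean
)
## sl_library would be passed as SL.library = sl_library in a call
## like the toy example further down.
```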
James et al., An Introduction to Statistical Learning.
Software options: SuperLearner (Polley's GitHub, for R); H2O.
SuperLearner-SAS on GitHub.
She recs playing with a toy data set to "change stuff and see what happens" when learning a new software package.
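In that spirit, a toy run with the SuperLearner R package (simulated data; the library of learners is an arbitrary choice of mine):

```r
## Toy SuperLearner run: simulate data, fit a small ensemble, and
## look at the cross-validated risks and weights it reports.
library(SuperLearner)

set.seed(1)
n <- 500
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- rbinom(n, 1, plogis(X$x1 - 0.5 * X$x2))

fit <- SuperLearner(Y = Y, X = X, family = binomial(),
                    SL.library = c("SL.glm", "SL.glmnet", "SL.mean"),
                    cvControl = list(V = 10))
fit   # prints each learner's CV risk and its ensemble weight

## "change stuff and see what happens": swap in other wrappers,
## change V, drop a covariate, and watch the weights move
pred <- predict(fit, newdata = X)$pred
```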
Real examples: risk adjustment for insurance payments. Over 50 million people in the US are enrolled in plans that use risk adjustment. The goal is to encourage competition based on efficiency and quality, not "cherry-picking" healthy patients.
Plan payment risk adjustment is based essentially on regression: spending predicted by a set of hierarchical condition categories (HCCs). "Risk adjustment is one of the most consequential applications of regression for social policy."
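Schematically, it looks something like the sketch below (simulated data; the HCC indicator columns, age/sex cells, and dollar amounts are all hypothetical, and real plan-payment formulas are more involved):

```r
## Sketch of the regression at the heart of plan-payment risk
## adjustment: annual spending on age/sex cells plus 0/1 indicators
## for hierarchical condition categories (HCCs).
set.seed(1)
n <- 2000
enrollees <- data.frame(
  age_sex_cell = factor(sample(1:24, n, replace = TRUE)),
  hcc_diabetes = rbinom(n, 1, 0.10),
  hcc_chf      = rbinom(n, 1, 0.05)
)
enrollees$annual_spend <- 2000 + 3000 * enrollees$hcc_diabetes +
  8000 * enrollees$hcc_chf + rexp(n, rate = 1 / 2000)

## fitted coefficients act like dollar add-ons per condition;
## predictions become the risk-adjusted payment for each enrollee
fit <- lm(annual_spend ~ age_sex_cell + hcc_diabetes + hcc_chf,
          data = enrollees)
payments <- predict(fit)
```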
Insurers have an incentive to assign patients as many condition categories as possible (upcoding). If we can reduce the number of codes, we might limit upcoding. Her 2nd paper on risk adjustment showed that a reduced set of 10 variables was 92% as efficient as using all of the codes.
But there's a problem w/ generalizability: results w/ the limited set of variables didn't hold up in new data w/ a different population.
Wow - she is such a good lecturer and this talk is an amazing combination of inspiring and educational, but my computer battery is about to die (thanks @Microsoft) so...