Rohit Vashisht from @UCSF_BCHSI presenting at @UCSF_Epibiostat department meeting about using electronic health record (EHR) data to emulate target trials to understand treatment effects for chronic disease management, example with type 2 diabetes
Beautiful explanation of the data gap: there is just no way to have good head-to-head RCTs of all the important medication decisions for all of the important potential outcomes (retinopathy, acute CVD event, etc). We must use "real world data", e.g. EHR data.
7 simple steps!
Step 1 is creating a common data model, which requires a controlled vocabulary to map (messy) data in EHRs into analyzable data elements. There are >5 million OMOP concepts right now. OMOP uses 98 distinct vocabs (eg ICDs, SNOMED, LOINC, HCPCS).
>137 million relationships between OMOP concepts and different vocabs. For example, type 2 diabetes is defined using 407 OMOP concept IDs. So this is a big deal.
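(Not from the talk: a minimal sketch of what an OMOP "Maps to" lookup looks like. The `concept` and `concept_relationship` tables are part of the published OMOP CDM; the toy rows and in-memory database below are illustrative stand-ins, not real vocabulary content.)

```python
import sqlite3

# Toy in-memory stand-in for two real OMOP CDM tables; rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_id INT, concept_name TEXT,
                      vocabulary_id TEXT, concept_code TEXT);
CREATE TABLE concept_relationship (concept_id_1 INT, concept_id_2 INT,
                                   relationship_id TEXT);
INSERT INTO concept VALUES
  (1001, 'Type 2 diabetes mellitus without complications', 'ICD10CM', 'E11.9'),
  (201826, 'Type 2 diabetes mellitus', 'SNOMED', '44054006');
INSERT INTO concept_relationship VALUES (1001, 201826, 'Maps to');
""")

-- = None  # (ignore) keep linters quiet about SQL-style comments below

# Map a source ICD-10-CM code to its standard (SNOMED) concept
# via the 'Maps to' relationship.
query = """
SELECT c2.concept_id, c2.concept_name, c2.vocabulary_id
FROM concept c1
JOIN concept_relationship cr
  ON cr.concept_id_1 = c1.concept_id AND cr.relationship_id = 'Maps to'
JOIN concept c2 ON c2.concept_id = cr.concept_id_2
WHERE c1.vocabulary_id = 'ICD10CM' AND c1.concept_code = 'E11.9'
"""
print(conn.execute(query).fetchall())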
Really ambitious but great progress w/ UCSF data. They start w/ UCSF and then replicate across UCs, based on the UC Data Discovery Portal (transparent, fair, safe, and respectful; eg code he writes could be shared w/ anyone across UCs).
Next step is to define the new-user study design: treatment of interest (eg SGLT2 inhibitors), exclusion criteria (no other diabetes drug prior to the SGLT2 prescription), inclusion criteria (type 2 diabetes, 1+ HbA1c measurement, metformin). Match as best possible to what you'd do in an RCT
Note time of switching to another drug so we can decide whether to censor or use another approach.
Define primary outcomes, e.g., reduction in HbA1c, and secondary outcomes, e.g., CVD, kidney, ESRD, anemia, dementia, etc.
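(My own pandas sketch of the new-user cohort logic in the last few tweets; the table/column names and toy rows are mine, not the portal's schema.)

```python
import pandas as pd

rx = pd.DataFrame({
    "person_id":  [1, 1, 2, 3, 3],
    "drug_class": ["metformin", "SGLT2i", "SGLT2i", "insulin", "SGLT2i"],
    "start_date": pd.to_datetime(["2018-01-01", "2019-06-01",
                                  "2019-02-01", "2018-05-01", "2019-09-01"]),
})

# Index date = first SGLT2-inhibitor fill per patient (new-user design).
index = (rx[rx.drug_class == "SGLT2i"]
         .groupby("person_id").start_date.min().rename("index_date"))
cohort = index.reset_index()

# Exclusion: any non-metformin diabetes drug before the index date.
other = rx[~rx.drug_class.isin(["SGLT2i", "metformin"])]
prior = other.merge(cohort, on="person_id")
bad = prior.loc[prior.start_date < prior.index_date, "person_id"].unique()
cohort = cohort[~cohort.person_id.isin(bad)]
print(cohort)  # patient 3 dropped (prior insulin); patients 1 and 2 remain

# Inclusion criteria (T2DM dx, >=1 HbA1c, metformin) would filter the same
# way against condition/lab tables; switches to another drug would be
# flagged from later rx rows for the censoring decision above.
```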
All implemented in the UCSF data discovery portal, so easily shared with other researchers. Study design can be written as Spark SQL code. Then describe the included population (he showed amazing graphs comparing eligible individuals across UC institutions).
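(A hedged sketch of the "study design as Spark SQL" idea, assuming OMOP tables are registered as Spark views; everything here is a placeholder, not the Data Discovery Portal's actual code.)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("target_trial_emulation").getOrCreate()

# Eligibility as a declarative query: first SGLT2i exposure per person,
# given a (hypothetical) pre-built table of SGLT2i concept IDs.
eligible = spark.sql("""
    SELECT d.person_id, MIN(d.drug_exposure_start_date) AS index_date
    FROM drug_exposure d
    JOIN sglt2_concept_ids s ON s.concept_id = d.drug_concept_id
    GROUP BY d.person_id
""")
eligible.show(5)
```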
Now need to understand how treatment assignment occurred. Here he uses a patient feature tensor. I think this is a really fancy propensity score model but I may be wrong. Okay, they actually used lasso regression w/ 10-fold cross-validation, then match using a distance metric.
Based on the tensor model you can just describe baseline clinical features of treated vs untreated. Again "easy" to do across UCs. Cute picture of matching based on propensity score. After matching, the propensity score distributions of treated vs untreated people are similar.
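(A rough sketch of the propensity step as I understood it: L1-penalized (lasso) logistic regression with 10-fold CV, then nearest-neighbor matching on the estimated score. X/t are synthetic stand-ins for the patient feature tensor and treatment indicator, and absolute PS difference is one common distance metric, not necessarily theirs.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))            # toy feature matrix
t = (rng.random(500) < 0.3).astype(int)   # toy treatment indicator

ps_model = LogisticRegressionCV(Cs=10, cv=10, penalty="l1",
                                solver="liblinear", scoring="neg_log_loss")
ps_model.fit(X, t)
ps = ps_model.predict_proba(X)[:, 1]      # estimated propensity scores

# Greedy 1:1 matching (with replacement) of each treated patient to the
# control with the closest propensity score.
treated = np.where(t == 1)[0]
control = np.where(t == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
_, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
pairs = list(zip(treated, control[idx.ravel()]))
```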
Developing methods to assess multi-dimensional covariate balance. Matching actually did not improve balance for ALL covariates. They do the same thing at UCD, UCI, UCLA, and UCSD.
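(The standard per-covariate balance check, for reference; their multi-dimensional balance methods go beyond this. |SMD| < 0.1 is a conventional threshold; computing this for every covariate before vs after matching is how you'd see that matching doesn't improve balance for all of them.)

```python
import numpy as np

def smd(x_treated: np.ndarray, x_control: np.ndarray) -> float:
    """Standardized mean difference for one covariate."""
    pooled_sd = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / pooled_sd
```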
Once it's all matched, you can implement a standard analysis, e.g., Cox model. This is work in progress so I won't post his slide with prelim results, but very cool they could do it across 5 UC systems, then meta-analyze results.
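(A hedged sketch of the downstream analysis; lifelines is my choice of library, and the toy data are made up, since he only said "e.g., Cox model".)

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
treated = rng.integers(0, 2, n)
time = rng.exponential(scale=5.0, size=n) * (1.0 + 0.3 * treated)  # toy benefit
event = (rng.random(n) < 0.7).astype(int)  # ~30% censored

df = pd.DataFrame({"T": time, "E": event, "treated": treated})
cph = CoxPHFitter()
cph.fit(df, duration_col="T", event_col="E")
print(cph.summary.loc["treated", ["coef", "exp(coef)"]])  # exp(coef) = HR
```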
They can then analyze heterogeneity across UC systems. Very flexible to run the comparison of one drug against many other alternatives. They are incorporating empirical calibration to address unobserved confounding.
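(What the cross-site step could look like: fixed-effect inverse-variance meta-analysis of per-site log hazard ratios, plus Cochran's Q / I^2 for heterogeneity. The five log-HRs and SEs are made-up placeholders, emphatically not their preliminary results.)

```python
import numpy as np

log_hr = np.array([-0.20, -0.15, -0.30, -0.10, -0.25])  # one per UC site (toy)
se     = np.array([ 0.10,  0.12,  0.15,  0.11,  0.14])

w = 1 / se**2
pooled = np.sum(w * log_hr) / np.sum(w)          # inverse-variance pooling
pooled_se = np.sqrt(1 / np.sum(w))

Q = np.sum(w * (log_hr - pooled) ** 2)           # Cochran's Q
I2 = max(0.0, (Q - (len(log_hr) - 1)) / Q)       # I^2: share of variance
print(f"pooled HR = {np.exp(pooled):.2f}, I^2 = {I2:.0%}")
```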
Limitations: of course still may be residual confounding, and measurement quality in EHRs is not perfect.
Really cool platform.
Some questions: yes, the tensor feature is estimating the propensity score, but b/c there are more potential predictors than observations, we need some type of penalization method. They are comparing lasso vs regression trees vs other stuff.
q: the sample is still a bit small and leaves uncertainty.
Rohit's answer: goal is to develop a platform that could be replicated anywhere, e.g., other institutions or w/ optum or other large databases. They are really developing a framework.
q: what about using physician identity as an instrumental variable? @Vashishtrv answers: yes, in theory possible but not in the OMOP data right now.
Department meetings at @UCSF_Epibiostat have become surprisingly fabulous. Every meeting a thoughtful covid update from George Rutherford. +Today new faculty member @Jean_J_Feng (jeanfeng.com) re fair machine learning algorithms in medical care.
She thinks of ML apps along 2 dimensions: severity of the healthcare situation (low = Apple Watch for AFib; high = Viz.ai ContaCT to ID a stroke in progress) and significance of the information provided (low = Apple Watch; high = IDx-DR diabetic retinopathy).
Nothing approved yet with high severity and high significance of information. Suggests we are really uncomfortable deferring decisions to an ML algorithm instead of our doctor. Why?
Their question is super important: can we generalize from highly selected samples (HSS), eg UK Biobank, to populations? HSS are much cheaper than representative samples (and in general, high response rates are expensive to achieve), but … 2/n.
Causal estimates in highly selected samples can differ from truth in populations if (1) effects differ in the people who participate vs those who did not or (2) selective participation creates spurious associations so we don’t get the right answer even for those who participated! 3/n
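(A toy numpy simulation of point (2), mine, not from the talk: when both X and Y affect participation, selection induces an X-Y association even though none exists in the full population, i.e., collider bias. All numbers are illustrative.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)                             # truly independent of x
participate = (x + y + rng.normal(size=n)) > 1.0   # selection depends on both

print(np.corrcoef(x, y)[0, 1])                            # ~0 in population
print(np.corrcoef(x[participate], y[participate])[0, 1])  # negative in sample
```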
Dr Julia Adler-Milstein re turning digital fumes into fresh air (ie useful evidence for clinical care & system design) at @UCSF_Epibiostat seminar. Super cool new Center for Clinical Informatics and Improvement Research. medicine.ucsf.edu/cliir @CliirUcsf 1/n
Digital transformation requires constant evolution & improving the tools we use, which is achieved by observing how users interact w/ & use those tools. In health care, we're still in the early stages of this work; need to move into an era where we adopt a continuous test/refine cycle.
@johnwitte: does better user experience=spending more $? JAM: maybe for Amazon, but not us.
Digital transformation enables new discovery of *what* care should be delivered AND *how* care should be delivered. Provide tools to support clinical decisions & care team design.
She leads by calling out the value of interdisciplinary research. Need both strong theory & practical relevance for practice. Sometimes the theoretical ideal and practicability are in conflict. Callout for articles on methods grounded in real problems for the journal @biostatistics.
@biostatistics Computational health economics (#econtwitter): how can we affect policy?
Data first, methods second. Electronic health databases are a new resource, but their usefulness for research really varies (fancy stats doesn't solve major data problems).