Really excited for Dr. Patrick Ball of @hrdag's talk. His background is quantitative analysis for evaluating human rights violations, e.g., for truth commissions. @UCSF_Epibiostat #SamplingKnowledgeHub #epitwitter
This guy's incredible. Silicon Valley background: made software to help people safely document human rights violations. Started in 1991, when he was a grad student in sociology and demography at @UMich. Struggled after seeing what was happening in Guatemala and El Salvador: so many crimes.
He adopted the defense of human rights as a personal motivation. In El Salvador, began non-violent accompaniment: you accompany someone who is under threat of violence, e.g., a religious leader. You are 'noisy' & use your privilege to try to protect them, w/ camera, passport, & home-country network.
This actually worked. But as an aside, it was boring: you mostly sat around waiting for important people to have important meetings, while you as the accompanier were not really having important meetings. He spent his time trying to fix up floppy disks (1991!).
This led to requests to computerize hard-copy case files of eyewitness accounts. The group that asked for his help wanted to link these accounts to the career trajectories of El Salvadoran military officers (constructed from newspapers and resistance reports).
Goal was to figure out which crimes occurred under specific officers, to hold them accountable. This was a key input into the peace process then being negotiated, b/c they could force the specific people who were the "worst" to retire from the military.
"We don't get a lot of wins in human rights. In human rights we're using moral force to contest ..." structures with real power. So when you do have a win, what's the lesson and how do you scale it up? He concluded: data.
In the 1990s, this was mostly about databases. But there's also a huge statistical problem: we don't know the sampling process that determines which events get reported amid all of this violence. We want to understand the whole picture.
From a human rights perspective, the goal is not just counting 'how many' people were murdered, kidnapped, or forced from their homes, but also disaggregating over time to see patterns. So population size estimation is about getting it right over and over, in different time periods.
"Silences"= people don't always tell their stories.
Some technical issues with Patrick's computer. (note to self, consider contributing to @hrdag so they can buy him a new computer)
Goal of their work is to develop policy pushbacks, e.g., criminal accountability for people who committed mass crimes. This is rare. More common and often more meaningful to victims is historical memory. The idea that we will not forget - we will retain and remember the people.
To achieve either of these goals, we must be right. We must get the facts right. When talking about stats, e.g., how many people were killed? Did violence go up or down in April? This is tricky. We don't have data on all the events. Some people wouldn't talk to us.
Some people were inaccessible. Some people were afraid. Some people we didn't even know to ask about it. The catchphrase for this is "we don't know what we don't know." What we don't know is likely to be systematically different than what we know.
There's a social process that determined which people ended up talking to us.
Slight disruption d/t technical problems. Non-linear but important note: the foundation of authoritarian govts is to tell you things that are absurdly, obviously false. Then insist you believe those lies. Then the test of your loyalty is whether you're willing to believe them.
The way we must respond to these lies is to come back to the truth. Both the qualitative truth of the experience of the victim and the quantitative truth of how many people, when, who...
Back to "we don't know what we don't know". But sometimes there are clues. Project "Iraq Body Count", number of people killed from 2003 allied invasion on. Collected info from world media across multiple languages. He was concerned about their statistical inferences.
Not only underreporting some deaths but over-representing other types of deaths, so giving a very distorted picture. For example, consider event size (I think this is # of deaths) and # of sources. Most important data is the data for which you have 0 sources.
V. cool graph shows that large events have many, many sources of reporting. Small events (1 victim) are usually only reported by 1 or 2 sources. Large events (15+ victims) are usually reported by 15+ sources. Implies that most of the events with just 1 victim are not reported.
Important b/c large events are likely to be perpetrated by AQI (al Qaeda in Iraq) or as coalition collateral damage, w/ IEDs or airstrikes, against random victims; the goal is destabilization/control. Small events are totally different.
Small events are likely to be committed by Shi'a militias w/ firearms, killing adult men, w/ the goal of ethnic cleansing. If we collect data in a way that shares the biases of our presuppositions, we are not testing our priors, we are reinforcing them. Naive statistics reinforce international biases.
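To make that concrete, here's a toy simulation (all numbers invented, not from the talk): if the chance an event gets reported rises with its size, the observed mix of deaths can look nothing like the true mix.

```python
# Hypothetical simulation: reporting probability rises with event size,
# so large-event perpetrators dominate the *observed* data even when
# small-event perpetrators account for a large share of the killings.
import numpy as np

rng = np.random.default_rng(0)

# Made-up "true world": many 1-victim killings, fewer 20-victim attacks.
small_events = 9000   # 1 victim each
large_events = 500    # 20 victims each

# Chance that at least one source reports the event grows with size.
p_report_small = 0.15
p_report_large = 0.95

obs_small = rng.binomial(small_events, p_report_small)
obs_large = rng.binomial(large_events, p_report_large)

true_small_deaths, true_large_deaths = small_events * 1, large_events * 20
obs_small_deaths, obs_large_deaths = obs_small * 1, obs_large * 20

print("true share of deaths from small events:",
      round(true_small_deaths / (true_small_deaths + true_large_deaths), 2))
print("observed share of deaths from small events:",
      round(obs_small_deaths / (obs_small_deaths + obs_large_deaths), 2))
```

With these invented numbers, small events account for roughly half of the true deaths but only a small fraction of the observed ones.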
Imagine you collect and combine 3 databases. Do the 3 databases in combination recover most of the events, or only a small fraction? E.g., in Peru, they knew most of the violations by the Peruvian army, but relatively few of the violations by Sendero Luminoso.
The relationship between what is observed (the sample) and what is true (the population) is the coverage rate. Only a formal, probability-based model can bridge that gap. This is the role of multiple systems estimation (aka capture-recapture).
Toy example: you've got 2 lists, A (50 events) and B (100 events). Check the intersection of the lists (25 events); call the count of events on both lists M. Total population size can be estimated as A*B/M (50*100/25 = 200). Many unrealistic assumptions, but that's the idea.
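That toy calculation in code, a minimal sketch of the two-list (Lincoln-Petersen) estimator with the same numbers:

```python
# Two-list capture-recapture (Lincoln-Petersen) with the toy numbers above.
def lincoln_petersen(n_a: int, n_b: int, n_both: int) -> float:
    """Estimate total population size from two overlapping lists.

    n_a, n_b: number of events on lists A and B
    n_both:   number of events appearing on both lists
    Assumes (unrealistically) that the lists are independent and every
    event has the same chance of landing on each list.
    """
    return n_a * n_b / n_both

print(lincoln_petersen(50, 100, 25))  # -> 200.0
```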
What goes wrong with this basic approach? Consider the case of police homicides in the US. BJS linked media reports of police homicides w/ the FBI homicide reports. (Side note: the FBI only reports these homicides if they determine it's a legal homicide??)
Problem is that the probability of observation in media and in FBI systems is highly correlated, so the simple MSE calc doesn't work. Imagine 2 people murdered: (1) a US citizen, murder is videotaped; within a day, the recording is everywhere. (2) An undocumented immigrant, not videotaped.
The homicide of the undocumented immigrant is not widely reported. Social visibility creates a strong positive correlation between the two sources. This leads to a downward bias in the estimate.
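A made-up simulation of that failure mode (not the actual BJS analysis): when shared "visibility" makes the two sources positively correlated, the two-list estimate lands below the truth.

```python
# Hypothetical illustration of positive list dependence biasing the
# two-list estimator downward (numbers invented, not from the BJS study).
import numpy as np

rng = np.random.default_rng(1)
N = 2000  # true number of deaths

# "Visibility" varies across deaths and raises the chance of appearing
# on BOTH sources, inducing positive dependence between the lists.
visibility = rng.uniform(0.2, 1.0, size=N)
on_media = rng.random(N) < 0.6 * visibility
on_fbi   = rng.random(N) < 0.7 * visibility

n_a, n_b = on_media.sum(), on_fbi.sum()
n_both = (on_media & on_fbi).sum()

print("true N:", N)
print("Lincoln-Petersen estimate:", round(n_a * n_b / n_both))  # comes in low
```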
They went back and estimated list dependence using estimates from Kosovo, Colombia, and others. Adjusted estimates incorporating list dependence from other countries vary a bit, but come out closer to 10,000 instead of the original estimate of 7,300.
Suggests 8-10% of all homicides in the US are committed by police. Really staggering b/c ~3/4 of homicides (I may have the precise # wrong) are by someone you know, so the probability that, if you're killed by a stranger, it was a police officer is very high...
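Rough back-of-envelope, using the approximate figures as tweeted:

```python
# Back-of-envelope version of the "killed by a stranger" point, using the
# approximate figures from the thread (the author flags them as imprecise).
police_share_of_homicides = 0.09    # ~8-10% of all homicides
stranger_share_of_homicides = 0.25  # if ~3/4 are by someone the victim knows

# Crude share of stranger homicides that would involve police:
print(police_share_of_homicides / stranger_share_of_homicides)  # ~0.36
```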
Expert testimony in the Ríos Montt case in Guatemala. The 1990s truth commission concluded acts of genocide occurred. Finally, in 2013, former prez Ríos Montt was brought to trial. Side note, what's genocide? Not just killing, but *targeted* killing of people in specific religious, ethnic, or other groups.
In epi-speak, we might say the relative risk of death for some groups is very high. Their group calculated whether deaths were disproportionately among indigenous vs. non-indigenous people.
They used the census in Guatemala to get denominators in a region and showed a relative risk of about 8 for indigenous people to be killed. Similarly in Rwanda, the RR for being Tutsi vs Hutu in the one area where they had good data was ~5. In Bosnia, the RR associated w/ being Muslim was ~3.
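The RR calculation itself is simple once you have the numerators (estimated deaths) and denominators (census populations); a sketch with placeholder numbers:

```python
# Relative risk of being killed, indigenous vs non-indigenous, in one region.
# Death counts would come from multiple systems estimation; population
# denominators from the census. All numbers here are placeholders.
deaths_indigenous, pop_indigenous = 8_000, 100_000
deaths_non_indigenous, pop_non_indigenous = 1_000, 100_000

risk_indigenous = deaths_indigenous / pop_indigenous
risk_non_indigenous = deaths_non_indigenous / pop_non_indigenous

print("relative risk:", risk_indigenous / risk_non_indigenous)  # ~8, as in the talk
```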
They did this using log-linear models for population size estimation to get the numerators for the above calculation. They had 4 sources of info on killings, so they could evaluate the extent of overlap for each combination of lists.
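A minimal sketch of what a log-linear population size estimate looks like with three lists (the talk used four sources; the capture-history counts below are invented, and this version assumes list independence, which the dependence modelling discussed above is meant to relax):

```python
# Log-linear multiple systems estimation sketch with 3 hypothetical lists.
import numpy as np
import statsmodels.api as sm

# Capture histories over lists A, B, C (1 = on the list) and the number of
# victims observed with each history. The (0,0,0) cell is unobserved.
patterns = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 0], [1, 0, 1], [0, 1, 1],
    [1, 1, 1],
])
counts = np.array([200, 150, 120, 40, 30, 25, 10])  # invented counts

# Main-effects-only Poisson log-linear model fit to the 7 observed cells.
X = sm.add_constant(patterns.astype(float))
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()

# Predicted count for the unobserved (0,0,0) cell: exp(intercept).
unobserved = fit.predict(np.array([[1.0, 0.0, 0.0, 0.0]]))[0]
print("estimated uncounted victims:", round(unobserved))
print("estimated total:", counts.sum() + round(unobserved))
```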
Side note: Ríos Montt died during the trials.
Human rights work is using moral force and information against real power. It was important to make the compelling case that genocide occurred.
Side note about the stats methods of dmanriqu.pages.iu.edu: they address the independence assumption by conceptualizing it as holding within latent strata (LCMCR), allowing the use of many more sources. This gives more robust estimates when certain sources are omitted. Quantitative bias analysis.
These stories are so incredible I cannot capture them. Huge discovery of documents in Guatemala: all national police actions over the past century+. How can they make sense of this? A warehouse of data. They undertook "topographic sampling," based on the piles and piles of docs.
They had to periodically redo the sampling because, as the piles were being processed, the piles changed. The papers were often memos signed off by multiple people: this office, then that office, then the office that files the memo. This helps show accountability.
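As I understood it, the sampling here is essentially stratified random sampling over physical locations in the archive; a rough sketch of how stratum-level sampling rates would scale up to an archive-wide estimate (strata and counts invented, not HRDAG's actual protocol):

```python
# Rough sketch of scaling stratified sample counts up to archive totals.
# Strata = physical locations ("piles") in the warehouse; every number
# here is invented for illustration.
strata = {
    # name: (total_docs_in_stratum, docs_sampled, relevant_docs_in_sample)
    "room_1_shelves": (120_000, 400, 36),
    "room_2_bundles": (80_000, 300, 9),
    "basement_piles": (200_000, 500, 15),
}

estimated_relevant = 0.0
for name, (total, sampled, relevant) in strata.items():
    # Scale each stratum's sample rate up to its total document count.
    estimated_relevant += total * (relevant / sampled)

print("estimated relevant documents in archive:", round(estimated_relevant))
```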
Colleagues found documents ordering a sweep of "subversives" in the particular area where people had disappeared. They could find the officers who got the orders. When they found those cops, they said: yep, we did it, but we were just following orders... They got 40 years in prison.
And who gave those orders? The document identified the grand boss, who was charged w/ command responsibility for the disappearances. A standard defense in such cases is "I was really a reformer; those were rogue agents." They could show there was nothing special or rogue about this campaign.
It was fully bureaucratic. Bureaucracies dedicated to violence are often much more controlling b/c there's a "principal-agent" problem: they have to be controlling to make sure the agents are doing the violence the bureaucracy wants.
We have to get it right. The only way we get to the justice is to tell the truth. That's every bit as important when we're doing statistics or any other human rights work.
This is truly among the most compelling talks about stats and epi methods that I've ever heard. Everyone should get to hear it.
