1/15 Why is AI in health pretty terrifying? Let me use this example, clinicbarcelona.org/ca/14oticies/e…, to illustrate why this may not be an area where we want algorithms making decisions. Thread starting 👇
2/15 Disclaimer: I only know what the article says about this application, so some points are based on assumptions. They still help illustrate the fundamental issues.
3/15 The application: a neural network predicts how COVID-19 will progress in new patients. I take this to mean it predicts severity, outcome, or symptoms, e.g., whether a patient will need treatment or hospitalization, based on their data.
4/15 The classifier is trained on 3051 samples from 2440 COVID-19 patients. This is already worrisome: more samples than patients means some patients contribute several samples, so the samples are not independent. But for the sake of argument, let's assume there is one sample per patient and leave this for another thread...
5/15 All the data was collected at the same location. Partnering with hospitals and doctors is necessary to collect and understand the data, but a single data source can be dangerous, because it brings the following issues:
6/15 Issue 1: distributional shift. Assumption: the training samples come from the same distribution as the samples the model will be deployed on, e.g., a model trained to predict outcomes at a clinic in Barcelona will work in any other part of Spain. Consequence when this fails: bias and loss of accuracy.
7/15 Imagine most patients at the training clinic were middle-class, urban, highly educated citizens with easy access to health care. But in deployment, the model is used where many patients are poor, rural, uneducated, or undocumented workers without access to health care...
8/15 Then the classifier will often fail to predict outcomes correctly for these patients (it never learned from cases like theirs!). An incorrect prediction, e.g., hospitalizing someone who doesn't need it, could severely harm the patient and even be deadly.
9/15 Issue 2: Correlation IS NOT causation! Just because two variables are correlated does not mean one caused the other.
10/15 Imagine wealthy patients have better outcomes bc they see a doctor sooner after symptoms develop than poorer patients do (they can hire a babysitter or take time off work). The classifier considers occupation, which is relevant if you're a coal miner but is also a great proxy for wealth.
11/15 The classifier may look at occupation and conclude that poor patients will have worse health outcomes. Those patients may then be kept in the hospital unnecessarily (which carries its own risks, e.g., staph infections) or given unnecessary medical procedures.
12/15 Which brings us to Issue 3: transparency (or here, really, interpretability). If the model were interpretable, doctors could inspect it, notice that poor patients are kept in the hospital regardless of when they arrived, and realize that its predictions may be biased.
13/15 It's not even clear how to solve this problem. How can we separate occupation's role as a proxy for wealth from the real influence some occupations have on health outcomes?
14/15 And this is oversimplified. Wealth may also act as a proxy for other factors (e.g., health before infection), and when you add race or immigration status into the mix, you may find that you're over-hospitalizing Black and immigrant patients for the same reasons.
15/15 There's more I didn't cover bc this is just a Twitter thread (liability, accountability, recourse, security & privacy). But this already shows that with AI technologies, the question is not always "can we deploy this tech right?" but "should we do this at all?"
Appendix 1: Issue 1 parallels a problem in traditional drug trials: most drugs are tested only on adult (overwhelmingly white) men, so negative effects that might present in women or people of color remain unknown (theguardian.com/lifeandstyle/2…).
Finally, it’s important to laugh in 2020, so I wanna translate a phrase from the article. From September 8th. In Cataluña.
1/n Alternative headline: "Deceptive use of deep learning to permaban Twitter users you don't like." As someone who works on both the spread of disinformation and privacy, I can say a lot about this. Let's go. elpais.com/tecnologia/202…
2/n Let's get this out of the way: disinformation is a real problem that needs to be addressed, and disinformation about COVID-19 is amplifying the crisis. We should be working on ways to dampen disinformation and promote quality journalism.
3/n This article argues that one way to stop the spread of disinformation is to record users' keystroke patterns. Then, when you catch someone spreading disinformation, you can recognize and ban them even if they create a new account to get around the ban.