An overall lack of recognition for the invisible, arduous, & taken-for-granted data work in AI leads to poor data practices, resulting in data cascades (negative, downstream events)... “Everyone wants to do the model work, not the data work” 1/
Data quality issues in AI are addressed with the wrong tools, ones created for and fitted to other tech problems: they are approached as a database problem, a legal compliance issue, or a licensing deal. 3/
“In real life, we never see clean data. Courses focus on models & tools but rarely teach about data cleaning & pipeline gaps.” CS curricula don't include training for dealing w/ domain-specific ‘dirty data’, documenting datasets, designing data collection, training raters, ... 4/
ML data collection practices often conflict w/ existing workflows of domain experts. Data creation was added as extraneous work to on-the-ground partners (e.g., nurses, patrollers, farmers) who already had several responsibilities and were not adequately compensated. 5/
Missing metadata led practitioners to make assumptions, ultimately leading to the costly discarding of datasets or re-collection of data. A lack of metadata, plus collaborators changing the schema w/out understanding its context, led to the loss of 4 months of precious medical robotics data collection 6/
From goodness-of-fit to goodness-of-data:
Goodness-of-fit metrics, such as F1, accuracy, and AUC, tell us little about the fidelity and validity of the data. Currently, there are no standardised metrics for characterising goodness-of-data 7/
We find drastic differences in data & compute in African countries & India compared to the USA... the Global South is viewed as a site for low-level data annotation work, an emerging market for extraction from ‘bottom billion’ data subjects, or a beneficiary of AI for social good 8/
[Faulty] assumptions in the design & deployment of AI systems:
- user is an individual
- individual prioritizes personal well-being
- text & context can be separated
- the only useful knowledge is that produced through rational instrumentality... jasonedwardlewis.medium.com/from-impoveris… @jaspernotwell
"...This makes AI system engineers blind to vital aspects of human existence — such as trust, care, and community — that are fundamental to how intelligence actually operates." 2/
"The people who produced that data were not asked if it be used this way, they were not compensated for this use, & the use does not benefit them directly.
Indigenous communities have long histories with people like this. We recognize them for what they are: colonizers" 3/
One depressing aspect of the pandemic is how countries refuse to learn from other countries. Within a country, states refuse to learn from other states. Many refuse to learn from history. Many believe in exceptionalism, that they won’t face what everyone else has. 1/
I still remember first seeing the images of tent hospitals in Lombardy and realizing that this could happen everywhere. Jeremy & I did a data analysis and wrote at the time 2/
75% of people aged 16+ in the UK have had both doses of covid vaccine & there are currently 700 covid deaths PER WEEK in the UK
Some Aus leaders want to reopen when vaccination rates for ages 16+ hit 70-80%. If our death rate is proportionate to the UK's, that would mean 266 Australians dying PER WEEK. 1/
Many in the UK have already had covid, so the AUS death rate could well be higher than that 266 ppl per week
75% of ppl 16+ is only 60% of the whole population. 60% coverage against Delta is not enough. We need to vaccinate children & we need rates ~90%. 2/
Some point out that society accepts deaths from flu. In 2019, there were 486 flu deaths in Australia (an average of 9 per week). 2017 was particularly bad, with 1,255 flu deaths (avg 24 per week).
What we are facing with covid is over 10x more. These are not the same. 3/
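A quick sanity check of the arithmetic in this thread, using only the figures quoted above. Spreading annual flu deaths evenly over 52 weeks is an assumption, and the function names are illustrative; the sketch only checks the stated ratios and does not model covid deaths.

```python
"""Sanity-check the weekly-death arithmetic quoted in the thread."""

WEEKS_PER_YEAR = 52  # assumption: annual totals are spread evenly over 52 weeks

# Figures as quoted in the thread
UK_DEATHS_PER_WEEK = 700
AUS_PROJECTED_PER_WEEK = 266
FLU_DEATHS_BY_YEAR = {2019: 486, 2017: 1255}


def weekly_average(annual_deaths: int) -> float:
    """Average deaths per week for an annual total."""
    return annual_deaths / WEEKS_PER_YEAR


def implied_population_ratio() -> float:
    """AUS/UK population ratio implied by scaling 700/week down to 266/week."""
    return AUS_PROJECTED_PER_WEEK / UK_DEATHS_PER_WEEK


def share_aged_16_plus(adult_coverage: float = 0.75, total_coverage: float = 0.60) -> float:
    """If 75% of people aged 16+ equals 60% of everyone, the 16+ share is 0.60 / 0.75."""
    return total_coverage / adult_coverage


if __name__ == "__main__":
    print(f"implied AUS/UK population ratio: {implied_population_ratio():.2f}")
    print(f"share of population aged 16+: {share_aged_16_plus():.2f}")
    for year, deaths in sorted(FLU_DEATHS_BY_YEAR.items()):
        avg = weekly_average(deaths)
        ratio = AUS_PROJECTED_PER_WEEK / avg
        print(f"{year} flu: {avg:.1f}/week; projected covid is {ratio:.1f}x that")
```

Even against the bad 2017 season (about 24 flu deaths per week), 266 per week works out to roughly 11x, and against 2019 it is roughly 28x, consistent with "over 10x more".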
I’m hearing more people in Australia talk about wanting to "live with Covid", even though only 22% of the population is fully vaccinated. #LivingWithCovid (combined with low vaccination rates) means… 1/
Living with Covid (+ low vaccine rates) is:
- delaying surgery for cancer, organ transplants, brain tumors
- waiting an hour to get an ambulance after a heart attack
- turning a medical emergency into a catastrophe b/c the hospital is maxed out
- millions disabled with LongCovid 2/
Living with covid is not just the death count, it is 10-30% of so-called "mild" cases becoming permanently disabled with LongCovid, which can include debilitating neurological effects and constant pain. 3/
5 Myths of Co-Design for Ethical ML
- ‘Better’ involvement➡️ 'better’ design outcomes
- Co-design increases agency of patients
- Representation reduces risk of harms
- Co-design is an inherently ethical approach
- All problems can be co-design problems @josephdonia @jayshaw29 1/
Novel challenges when co-designing AI:
- participation is often unwitting (eg as training data)
- AI technologies can be repurposed after deployment
- black-box nature
- hard to account for how data produced by system will be used in future 2/
"Better" involvement does not imply a stronger focus on the whole system (including consequences related to data commodification & surveillance), much of which is out of view for both users & designers 3/
Jeff even includes Black in AI, a fantastic org co-founded by @timnitGebru, whom he fired and then tried to portray using the angry Black woman trope. 2/
One of the 3 conflicting stories that Google has provided about why Dr. Gebru was fired is that it was for being honest about how working on diversity initiatives at Google made her life HARDER. 3/