Talk two of paper session 15 at #FAccT21:
"Spoken Corpora Data, Automatic Speech Recognition, and Bias Against African American Language: The Case of Habitual ‘Be’" by Joshua Martin
Studies of racial linguistic bias have begun to appear, but there are still relatively few. The talk points to ASR (automatic speech recognition) systems and a paper from last year showing that 5 major systems (Apple, Amazon, Google, ...) had much higher error rates for Black speakers.
This paper looks specifically at the habitual "be", which is unique to AAVE/AAL. "Angela be studying" is used as an example, and the talk points out that "Angela is studying" is *not* an adequate or correct equivalent. Speech recognition systems struggle with this.
Because of how they are trained, ASR systems might take "Angela be studying" and transcribe it as "Angela is studying" (with 93% accuracy), even though this is an incorrect transcription. These ASR systems are therefore biased against AAVE/AAL speakers.
They looked at 4 common ASR corpora (some of which notably include only 4% Black speakers) and the Corpus of Regional African American Language (CORAAL), whose speakers are 100% Black, to compare the frequency and usage of the habitual 'be'.
Shocking results: LibriSpeech (134M words) has only 42 instances of the habitual 'be'. Switchboard (3M words) has only *3* instances. CORAAL (1M words), by contrast, has 485 instances. The other corpora are the ones used to train ASR systems, yet they are woefully lacking in instances of the habitual 'be'.
53% of CORAAL texts include the habitual 'be', whereas in the 4 other popular corpora only 0% to 2% of texts do. The findings reveal a lack of representation in these datasets, which in turn affects the ASR systems trained on them. Yikes!! 😬
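The raw counts above aren't directly comparable because the corpora differ so much in size. Here's a quick sketch that normalizes them to occurrences per million words. It uses only the (rounded) word counts and instance counts quoted in the talk, and only the three corpora whose numbers are given here; the `per_million` helper is just an illustrative name, not anything from the paper.

```python
# Habitual 'be' counts and approximate corpus sizes as quoted in the talk.
corpora = {
    "LibriSpeech": (42, 134_000_000),
    "Switchboard": (3, 3_000_000),
    "CORAAL": (485, 1_000_000),
}

def per_million(count: int, words: int) -> float:
    """Return occurrences per million words (hypothetical helper)."""
    return count / words * 1_000_000

# Normalized rate for each corpus.
rates = {name: per_million(c, w) for name, (c, w) in corpora.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per million words")
```

By this rough measure, CORAAL has on the order of 1,500 times as many habitual 'be' instances per word as LibriSpeech.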
• • •
Excited for this final keynote! For those not in the know, Julia Angwin was the journalist who broke the "Machine Bias" story at ProPublica that just about everyone in this field now cites. She also founded The Markup & is the EIC (editor-in-chief) there. Her work has been field-changing.
@JuliaAngwin is talking about how The Markup does things differently, emphasizing building trust with readers: by writing stories and showing their analysis work, but also through a privacy promise of not tracking *anything* about people who visit their website. No cookies!
@JuliaAngwin: "We don't participate in the game that is pretty common in Silicon Valley .... we don't think someone who gets paid to be a spokesperson for an organization deserves the cloak of anonymity. That's what we do differently from other journalists they might talk to."
On the last-minute name change: "Rather than say the ways that we would like to deviate from the inevitable, we want to talk about the ways in which the implications of the future are up for grabs." - @alixtrot 🔥🔥
.@schock tells us to "put our money where our mouth is" and sign up for and support the Turkopticon organizing effort on behalf of Amazon Mechanical Turk workers:
.@cori_crider talks about Prop 22 here in CA, which companies like Uber spent $200M on to write into law that drivers are not employees. "Having secured that victory, they're seeking to roll out that model in other legislatures." "That is Uber's vision of the future."
Let's goooo!!! The second of two papers on AI education is coming up in a bit. As an AI educator focused on inclusion and co-generative pedagogy, I'm *really* excited for this talk on exclusionary pedagogy. Will tweet some take-aways in this thread:
First, for those who don't know: I've been a CS educator since 2013, and in 2017 I moved into AI education specifically, focusing on inclusive, accessible, and culturally responsive high school curriculum, pedagogy, and classroom experiences. That informs my POV.
.@rajiinio starts the talk off by mentioning that there's an AI ethics crisis happening & we're seeing more coverage of the harms of AI deployments in the news. This paper asks the question, "Is CS education the answer to the AI ethics crisis, or actually part of the problem?" 🤔
This is one of my favorite papers at #FAccT21 for sure, and I highly recommend folks watch the talk and read the paper if they can! Tons of nuggets of insight; I was so busy taking notes that I couldn't live-tweet it. Here are some take-aways, though:
The paper looked at racial categories in computer vision, motivated by some of the applications of computer vision today.
For instance, face recognition is deployed by law enforcement. One study found that these systems "mistook darker-skinned women for men 31% of the time."
They ask: how do we even classify people by race? If this is done just by geographic region, Zaid Khan argues, the categories are badly defined, since these regions were drawn by colonial empires and reflect "a long history of shifting imperial borders". 🔥🔥
First paper of session 22 at #FAccT21 is on "Bias in Generative Art" with Ramya Srinivasan. It looks at AI systems that try to generate art in the styles of specific historical artists, and uses causal methods to analyze the biases in the generated art.
They note: it's not just racial bias that emerges, but also bias that stereotypes the artists' styles (e.g., reducing their styles to their use of color), which doesn't reflect their true cognitive abilities. This can hinder cultural preservation and historical understanding.
Their study looks at AI models that generate art mainly in the style of Renaissance artists, with only one non-Western art style (Ukiyo-e) included. Why, you might ask?
There are "no established state-of-the-art models that study non-Western art other than Ukiyo-e"!!
Happening now: the book launch of "Your Computer is on Fire", which is an anthology of essays on technology and inequity, marginalization, and bias.
@tsmullaney with opening remarks on how this *four and a half* year journey has been an incredibly personal one.
I can't believe it's been four years!! I remember attending the early Stanford conferences that led to the completion of this book. At the time I think I was just returning from NYC to Oakland... so much has changed since then, in the world & this field, truly.
@histoftech: "As Sarah Roberts (@ubiquity75) shows in her chapter in this book, the fiction that platforms that are our main arbiters of information are also somehow neutral has effectively destroyed the public commons"