Allison Koenecke
Jun 3 · 15 tweets · 5 min read
🎷Excited to present our paper, “Careless Whisper: Speech-to-text Hallucination Harms” at @FAccTConference! 🎷We assess Whisper (OpenAI’s speech recognition tool) for transcribed hallucinations that don’t appear in audio input. Paper link: arxiv.org/abs/2402.08021, thread 👇
We noticed in 2023 that, even when an audio file had ended, Whisper had a habit of hallucinating additional sentences that were never spoken. And, re-running Whisper on the same file yielded different hallucinations - see below example (hallucinations in red) (1/14)
[Image: a table showing that for the same audio input of "Well, in about, I think it was 2001, I became ill with a fairly serious strain of viral something", Whisper additionally hallucinates: "but I didn't take any medication, I took Hyperactivated Antibiotics and sometimes I would think that was worse" and "and that caused a fracture in my membrane."]
This allowed us to quantify the hallucinations in the AphasiaBank speech dataset: about 1% of >13k audio files tested resulted in hallucinations. More occurred among speakers with aphasia (a language disorder that can occur post-stroke) relative to the control group (2/14)
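The repeated-run observation above can be sketched in code. This is a minimal illustration, not the paper's actual pipeline: it assumes two transcripts of the same audio file are already available as plain strings, and it uses Python's standard `difflib` to list the word spans that appear in one run but not the other. The function name `differing_spans` is hypothetical, and a differing span is only a candidate hallucination that would still need to be checked against the audio.

```python
from difflib import SequenceMatcher


def differing_spans(run_a: str, run_b: str):
    """Return (only_in_a, only_in_b): word spans present in one run but not the other.

    Hypothetical helper for illustration; assumes both strings are
    transcripts of the same audio file from two separate runs.
    """
    a, b = run_a.split(), run_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    only_a, only_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # text both runs agree on
        if i2 > i1:
            only_a.append(" ".join(a[i1:i2]))
        if j2 > j1:
            only_b.append(" ".join(b[j1:j2]))
    return only_a, only_b


# The example from the thread: same spoken audio, two different continuations.
spoken = ("Well, in about, I think it was 2001, I became ill "
          "with a fairly serious strain of viral something")
run_1 = spoken + (" but I didn't take any medication, I took Hyperactivated "
                  "Antibiotics and sometimes I would think that was worse")
run_2 = spoken + " and that caused a fracture in my membrane."

only_1, only_2 = differing_spans(run_1, run_2)
print(only_1)
print(only_2)
```

Text that both runs agree on is skipped, so only the parts where the runs diverge are reported. Agreement between runs does not prove a passage was spoken; it only means this particular check raised no flag.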
But, we wanted to understand more about the hallucination text itself. We taxonomized hallucinations by types of harm, and found nearly 40% of hallucinations showcased harms of perpetuating violence, inaccurate associations, or false authority. What do these mean? (3/14)
Harms perpetuating violence involve misrepresentation of a speaker’s words that could become part of a formal record (e.g. in a courtroom trial); we present 3 subcategories of examples: physical violence, sexual innuendo, and demographic stereotyping (4/14)
Harms of inaccurate associations involve misrepresentation of the real world that could lead to inaccuracies (e.g. in patient medical notes). 3 subcategories include made-up names, social relationships, and health statuses (5/14)
Finally, harms of false authority involve misrepresentation of the speaker source, which could facilitate phishing / prompt injection attacks. These include Youtuber-speak (“like and subscribe”), thanking specific entities, and linking to websites (real or not) (6/14)
This all begs the question: why are these hallucinations happening? The Youtuber speak is consistent with the reporting on Whisper transcribing 1 million hours of Youtube audio (nytimes.com/2024/04/06/tec…), but this doesn’t explain the existence of hallucinations (7/14)
We present 2 hypotheses. 1st, we believe this has to do with OpenAI-specific modeling choices. We don’t see hallucinations like this in competing speech recognition tools on the market (8/14)
2nd, we find that speech with longer non-verbal durations (e.g. disfluencies from taking longer to speak, stuttering, pausing often – all symptoms of aphasia) tends to yield more Whisper hallucinations. We see this difference btwn aphasia and control speakers in our sample (9/14)
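To make "non-verbal duration" concrete, here is a minimal sketch of one way to express it as a share of a recording. It assumes speech segments are already available as `(start, end)` times in seconds, for example from some voice activity detector; that input format and the function name `non_vocal_share` are assumptions for illustration, not a description of how the paper measured it.

```python
def non_vocal_share(speech_segments, total_seconds):
    """Fraction of a recording not covered by any speech segment.

    speech_segments: iterable of (start, end) times in seconds (assumed input
    format); overlapping segments are counted only once.
    """
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    covered = 0.0
    cursor = 0.0  # end of the speech already counted
    for start, end in sorted(speech_segments):
        start = max(start, cursor)      # skip overlap with earlier segments
        end = min(end, total_seconds)   # clip to the recording length
        if end > start:
            covered += end - start
            cursor = end
    return 1.0 - covered / total_seconds


# Hypothetical 10-second recording with two stretches of speech.
print(non_vocal_share([(0.0, 2.0), (3.0, 5.0)], 10.0))
```

A higher share means more of the recording is pauses or other non-speech, which is the condition the thread associates with more hallucinations.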
This is consistent with many user complaints that silence in audio leads to Whisper hallucinations, and is something that Whisper seems to have gotten better about over time: github.com/openai/whisper… (10/14)
We’re concerned about the allocative & representational harms arising for speakers with more pauses in speech (not just speech impairments, but also the elderly or non-native language speakers) for whom Whisper could disproportionately generate hallucinations (11/14)
These hallucinations can exacerbate existing societal biases and algorithmic harms across medical, hiring, legal, and education decisions. And worse, they’re difficult to detect in downstream transcriptions unless you know to look for them! So, what to do? (12/14)
OpenAI should (a) make Whisper users aware of potential hallucinations & advise against use in high-stakes decisions, (b) ensure inclusion of diverse speakers in the design process, & (c) work to update Whisper modeling / data collection to mitigate hallucinations (13/14)
Many thanks to the folks we’ve chatted with and/or who directly inspired our work; we hope to continue the conversation! (14/14) @Aphasia_Inst @TAPUnlimited @jurafsky @Diyi_Yang @sayashk @sulin_blodgett @hannawallach @o_saja @jennwvaughan @eytanadar @IsabelleZaugg @Grady_Booch
