They sure will. Here's a quick analysis of how, from my perspective as someone who studies societal impacts of natural language technology:
First, some guesses about system components, based on current tech: it will include a very large language model (akin to GPT-3) trained on huge amounts of web text, including Reddit and the like.
It will also likely be trained on sample input/output pairs, where they asked crowdworkers to create the bulleted summaries for news articles.
The system will be some sort of encoder-decoder that "reads" the news article and then uses its resulting internal state to output bullet points. Likely no controls to make sure the bullet points are each grounded in specific statements in the article.
(This is sometimes called "abstractive" summarization, as opposed to "extractive", which has to use substrings of the article. Maybe they're doing the latter, but based on what the research world is all excited about right now, I'm guessing the former.)
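To make the extractive/abstractive distinction concrete, here's a toy sketch of an extractive summarizer (my own illustration, not anything the company has described): score each sentence by the average frequency of its words in the article and keep the top scorers. Note that every output line is, by construction, a verbatim substring of the input — which is exactly the grounding guarantee an abstractive system lacks.

```python
from collections import Counter
import re

def extractive_summary(article: str, n_sentences: int = 2) -> list[str]:
    """Toy extractive summarizer: score each sentence by the average
    frequency of its words across the whole article, keep the top scorers,
    and return them in article order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", article.lower()))

    def score(sent: str) -> float:
        toks = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in toks) / (len(toks) or 1)

    kept = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in kept]  # restore article order

article = (
    "The council met on Tuesday. The council voted to close the bridge. "
    "Residents protested the closure. The bridge closure takes effect Friday."
)
for bullet in extractive_summary(article):
    print("-", bullet)
```

An abstractive system, by contrast, generates free text from an internal encoding, so nothing forces its bullets to correspond to any sentence here.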
So, in what ways will these end up racist?
1) The generator, producing the outputs, will be heavily guided by its large language model. When the input points it at topics it has racist training data for, it may spit out racist statements, even if they aren't supported by the article.
2) The sample input/output pairs will likely have been created by people who haven't done much reflecting on their own internalized racism, so the kinds of things they choose to highlight will probably reflect a white gaze, which the system will replicate.
[Ex: A Black person is murdered by the police. The article includes details about their family, education, hobbies, etc as well as statements by the police about possibly planted evidence. Which ends up in the summary?]
1&2 are about being racist in the sense of saying racist things. It will also likely be racist in the sense of disparate performance:
3) Trained mostly on mainstream/white-gaze texts, the system won't perform as well when asked to summarize an article written from a BIPOC point of view.
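A performance gap like this only shows up if you disaggregate evaluation by source. Here's a sketch of what that could look like, using a toy ROUGE-1 recall metric and entirely hypothetical evaluation data (the outlet labels and summaries are invented for illustration):

```python
def rouge1_recall(reference: str, summary: str) -> float:
    """Unigram recall: fraction of reference-summary words that
    appear in the system summary."""
    ref = reference.lower().split()
    summ = set(summary.lower().split())
    return sum(w in summ for w in ref) / len(ref)

# Hypothetical data: (reference summary, system summary) pairs per source type.
evaluations = {
    "mainstream outlet": [
        ("council votes to close bridge", "council votes to close the bridge"),
    ],
    "community outlet": [
        ("organizers win halt to sweeps", "police clear encampment"),
    ],
}
for group, pairs in evaluations.items():
    avg = sum(rouge1_recall(r, s) for r, s in pairs) / len(pairs)
    print(f"{group}: ROUGE-1 recall = {avg:.2f}")
```

Aggregate scores would average this gap away; reporting per-group is what surfaces it.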
4) When the system encounters names that are infrequent in American news media, it may not recognize them as names of people, sending the generator down weird paths.
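One mechanism behind this: modern systems use subword vocabularies learned from frequency, so a name common in the training data survives as a few coherent pieces while an infrequent one shatters into fragments the model has weak associations for. A toy greedy longest-match tokenizer (the vocabulary here is invented to make the point) shows the effect:

```python
def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match subword tokenization (WordPiece-style sketch).
    Falls back to single characters when no vocab entry matches."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

# Hypothetical vocabulary skewed toward names frequent in U.S. news text.
vocab = {"john", "son", "smith", "er", "an", "on"}
print(subword_tokenize("johnson", vocab))    # frequent name: a few coherent pieces
print(subword_tokenize("oluwaseun", vocab))  # infrequent name: shatters into characters
```

The downstream generator then has to reassemble a person from character-level confetti, which is where the "weird paths" start.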
1-4 are all about system performance, but what about its impact in the world?
5) System output probably won't come with any indication of its degree of uncertainty/general level of accuracy/accuracy with the type of text being fed in. So people will pick up 'factoids' from these summaries that are wrong (and racist).
6) System output won't indicate what kinds of things usually make it into the summary, so the already racist patterns in how, e.g., Black victims of police violence are described in the press will only get worse as readers consume these summaries (see pt. 2).
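On pt. 5: even a crude per-bullet "support score" would be better than the nothing these systems typically ship with. A toy sketch (lexical overlap with the best-matching article sentence — a weak proxy for groundedness, but enough to flag obvious fabrications):

```python
def support_score(bullet: str, article_sentences: list[str]) -> float:
    """Fraction of the bullet's words found in its best-matching article
    sentence -- a crude proxy for whether the bullet is grounded."""
    bullet_words = set(bullet.lower().split())
    best = 0.0
    for sent in article_sentences:
        sent_words = set(sent.lower().split())
        best = max(best, len(bullet_words & sent_words) / len(bullet_words))
    return best

sentences = [
    "the council voted to close the bridge",
    "residents protested the closure",
]
print(support_score("council voted to close the bridge", sentences))  # well supported
print(support_score("residents rioted downtown", sentences))          # weakly supported
```

Surfacing a score like this alongside each bullet wouldn't fix the underlying biases, but it would at least give readers a signal that some "factoids" are unsupported.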
I'm sure there's more, but that's my quick analysis for tonight.

Thread by 🕯️❄️Emily M. Bender❄️🕯️ (@emilymbender), unrolled via @ThreadReaderApp.