This is a really important paper for #NLProc, #ethNLP and #ethicalAI folks.

The authors look deep into a use case for text that is ungrounded in either the world or any commitment to what's being communicated, but is nonetheless fluent, apparently coherent, and of a specified style. You know, exactly #GPT3's specialty.

What's that use case? The kind of text needed, and apparently needed in quantity, for discussion boards whose purpose is recruitment and entrenchment in extremist ideologies.

And guess what? They find that #GPT3's trick of "few-shot" learning is definitely up to this challenge.
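For readers unfamiliar with the mechanism: "few-shot" here doesn't mean retraining the model. The examples are simply prepended to the prompt and the model continues the pattern. A minimal sketch of how such a prompt is assembled (the Q/A format and the innocuous example pairs are my own illustration, not from the paper):

```python
# Sketch of few-shot prompting: the "training" is just a handful of
# examples concatenated into the prompt text; no weights are updated.
# The model then continues in the style the examples establish.

def build_few_shot_prompt(examples, query):
    """Join labeled example pairs, then append the new query."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

# Deliberately innocuous placeholder examples.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
prompt = build_few_shot_prompt(examples, "What is the capital of Italy?")
print(prompt)
```

The point the paper exploits is that the same mechanism works for style and ideology, not just facts: swap in a few examples of extremist text and the model continues in that register.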

I don’t think GPT-3 could produce text written from the point of view of a conspiracy theorist if it didn’t have such texts among its training data. But, in the spirit of healthy skepticism, if someone wants to explain how it could, I’m curious about your theories. #NLProc

The next question, then, is: how much of such data does it need? Are we seeing a reflection of lots of this garbage getting sucked into the maw of the data-hungry algorithm? Or does it only take a little?

And if it only takes a little, that’s actually much worse, because it’s much harder to design processes that can filter out tiny amounts of this. E.g. would examples quoted in serious articles discussing the threat of online fora like this be enough?
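To make the filtering difficulty concrete, here is a hedged sketch of the naive document-level approach (the blocklist term is a placeholder, not a real one): quoted material inside an otherwise-legitimate document forces an all-or-nothing choice.

```python
# Sketch: why small amounts of unwanted text resist filtering.
# A naive blocklist works at the whole-document level, so a serious
# news article that *quotes* extremist language triggers the filter
# exactly like the original source does.

BLOCKLIST = {"examplefringeterm"}  # hypothetical placeholder term

def keep_document(document):
    """Return True to keep a document, False to drop it entirely."""
    tokens = document.lower().split()
    return not any(t.strip('.,"\'') in BLOCKLIST for t in tokens)

article = 'The report warns that forums promote "examplefringeterm" rhetoric.'
print(keep_document(article))  # False: the whole article is dropped
```

Either you drop the entire article (losing legitimate journalism) or you keep it (letting the quoted example into the training data). Finer-grained filtering requires distinguishing quotation from endorsement, which keyword matching cannot do.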

My takeaway 1: ML systems that rely on datasets too large to actually examine are inherently unsafe. (Quote previous tweet on this.)

My takeaway 2: This paper shows the immense value of interdisciplinary perspectives in evaluating the potential risks of technology.

p.s. The paper does talk about #GPT3 having "knowledge" of various conspiracy theories. I think this is a category error, but it does not detract from the point the paper is making. For more on why, though, see…

cc: @KrisMcguffie @AlexBNewhouse
