Thanks for the ping, @michaelbrundage

I don't think there's anything specific to LLMs here. Rather, this is endemic to the way ML is applied these days:
1. Someone creates a dataset & describes it as a benchmark for some skill. Sometimes, the skill is well-scoped & the benchmark represents a portion of it reasonably (eg ASR for news in X language). Other times, the skill is bogus (all the physiognomy tasks; IQ from text samples; &c).
2. ML researchers use the benchmarks to test different approaches to learning. This can be done well (for well-scoped tasks): which algorithms are suited to which tasks and why? (Requires error analysis, and understanding the task as well as the algorithm.)
3. The focus shifts from understanding how learning algorithms relate to different kinds of data/tasks to leaderboardism. Many progress, much arXiv, wow!
4. In this research paradigm, no time is spent on critical analysis of benchmarks, including the data that make them up but especially also the task definitions.
5. #AIhype kicks in and the research is framed as "solving" such grand challenges as "language understanding" or "visual understanding" or...
6. That critical analysis work is happening, in at least two places: Within ML, like the work you point to, with typically a narrow lens ("hey look, the labels are bad!") but maybe getting some attention from the #AIhype crew.
But more important versions of the work, those that critique the underlying conceptions of tasks and goals and claims, tend to be marginalized: seen as lower on the 'hierarchy of knowledge' (see @timnitGebru's talk at Spelman).
The folks who think of themselves as at the top of that hierarchy because they are the ones who create & apply the algorithms seem to largely dismiss (or just ignore) the work of people like Birhane, Gebru, Benjamin, Noble, Mitchell, Raji, Whittaker.
Note that some of this critical work is coming from people with deep training in ML, Gebru, Mitchell & Raji among them, but I think it's still an uphill battle to have it be taken as serious research by researchers who think of themselves as 'core ML'.
So we get:

MLbro: Look, I've solved language!
Critical researcher: No, you haven't. That task is bogus.
Rev2: But you didn't show what other task would prove the claim, so reject.

MLbro: Look, I can predict criminality from faces!
Critical researcher: No, you can't, and claiming you can feeds into racism/other systems of oppression.
Rev2: Keep your politics out of our conference.

But this works:

MLbro: Look, I've solved language!
Other ML researcher: Well actually, the dataset in that benchmark is really messy and maybe you're just modeling noise.
Rev2: Okay, I guess we'll let this one through.

If ML/AI were just an obscure academic field this might not matter so urgently, but that's not the world we live in. The hierarchies of knowledge (see ⬆️) are closely modeled by hierarchies of funding and meanwhile techcos are pushing out AI snakeoil that is doing real harm.
Further reading/watching:…