@TaliaRinger Okay, so 1st some history. There was a big statistical revolution in the 90s, coming out of earlier work on ASR & statistical MT. By about 2003, Bob Moore (of MSR) was going around with a talk gloating about how over ~10yrs ACL papers went from mostly symbolic to mostly stats.
1/
@TaliaRinger That statistical NLP work was still closely coupled with understanding the shape of the problem being solved, specifically in feature engineering. Then (2010s) we got the next "invasion" from ML land (deep learning), where the idea was that the computer would learn the features!

2/
@TaliaRinger Aside: As a (computational) linguist who saw both of these waves (though I really joined when the first was fully here), it was fun, in a way, to watch the established stats NLP folks be grumpy about the DL newcomers.

3/
@TaliaRinger So, what went well? The first wave definitely brought with it worthwhile evaluation practices. I think there is also a lot of scientific interest in the interplay between manual data analysis (theory, annotation) and pattern recognition applied to large datasets.

4/
@TaliaRinger What's gone poorly: I definitely have more to say here!

1. It seems that the academic goal of ML is (frequently?) to show that things can be done by a machine instead of by humans, and so a lot of the framing is "this is too expensive/too tedious to do by hand".

5/
@TaliaRinger The rhetoric in #1 seems to support a lot of dismissive attitudes towards domain expertise. The humans who know how to do it by hand aren't only "relieved" of their burden but also devalued in the process.

6/
@TaliaRinger It would be one thing if the machines could actually do the work effectively & accurately, but very often the domain experts aren't even consulted enough to be able to evaluate such claims.

7/
@TaliaRinger 2. Relatedly: those with domain expertise are relegated to the role of producing data that the ML folks consume in pursuit of showing the value of their algorithmic or architectural innovations. That data work is extremely devalued, despite being, ahem, foundational.

8/
@TaliaRinger See also Sambasivan et al. (2021), “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

dl.acm.org/doi/abs/10.114…

9/
@TaliaRinger 3. There's a huge mismatch in timescales behind MPUs (minimal publishable units) in ML vs. just about everything else. So the folks doing the model work get to churn out publications, & that leads to an unsustainable "pace" for the field.

10/
@TaliaRinger Folks who are excited about this talk about the "progress" of AI in terms of churn on leaderboards/the ever-shrinking window between a benchmark being published and its being maxed out.

11/
@TaliaRinger Side effects of #3 include flag planting, everyone feeling like they have to constantly stay on top of what's on arXiv, and of course needing to publish & promote quickly to the detriment of everyone's mental health, the inclusivity of the field, and anonymous peer-review.

12/
@TaliaRinger Re pace of research, I really enjoyed the keynote by @knmnyn at #COLING2018



13/
@TaliaRinger @knmnyn The field of #NLProc is inherently interdisciplinary: it simply cannot be done well without BOTH (a) domain expertise, shaping understanding of the problem space, and (b) expertise in crafting and applying algorithms to that space.

14/
@TaliaRinger @knmnyn Here's a blog post I wrote in 2017 on interdisciplinarity:

medium.com/@emilymenonben…

15/
@TaliaRinger @knmnyn The culture of ML (and this doesn't have to be true of everyone doing ML for it to be the dominant culture) devalues domain expertise and displaces domain experts ... and then somehow misses that this messes things up for the (scientific) goals of ML:


16/
@TaliaRinger @knmnyn Whenever I talk about this, there's a chorus of responses from CS types who are offended by what I have to say and describe my stance as "gatekeeping". But none of this is about trying to keep CS/ML out; it's about developing real interdisciplinarity:


17/
@TaliaRinger @knmnyn I'll finish this thread with an anecdote from a workshop in 2009:

aclanthology.org/events/ws-2009…

18/
@TaliaRinger @knmnyn At that workshop, in a Q&A session, Mark Johnson (linguist turned statistical NLP person at that point) asserted that hand-annotated treebanks "scaled better" than grammar development. At a coffee break, I pressed him to explain *in what dimension*.

19/
@TaliaRinger @knmnyn Because once you have a grammar, you can produce parses for open-ended amounts of text and even move pretty easily to texts from new genres/domains. Hand-annotated treebanks are expandable ... one item at a time.

20/
@TaliaRinger @knmnyn (Years later, I wrote a paper about this with Dan Flickinger and @oepen:

Sustainable Development and Refinement of Complex Linguistic Annotations at Scale

link.springer.com/chapter/10.100…

)

21/
@TaliaRinger @knmnyn @oepen So what was the dimension that Mark Johnson had in mind?

The number of CS labs whose work could be supported by the resource!

Which seems like ... maybe not the dimension to be optimizing on.

22/
@TaliaRinger I don't have a lot of detailed suggestions for maximizing good/minimizing bad here, but I think the solutions you are looking for are ones that put the ML & domain folks on an equal footing overall and the domain folks ascendant in the discourse around what is a "solution".

23/
@TaliaRinger And just to be very clear: in #NLProc, what counts as a "solution" also depends heavily on the application domain. Sometimes it's linguists who know (e.g. evaluating word sense disambiguation); other times linguists can help, but the domain experts are others (e.g. biomedical IE).

/fin

:(