Matt Gardner
Senior research scientist at @allen_ai. Original architect of @ai2_allennlp. Co-host of #nlphighlights.
Jun 12, 2021 22 tweets 4 min read
Here's some more context on my arguments about the ethics of crowdsourcing, as I don't think we are all operating under the same set of facts. I have read the papers cited in this section on crowdsourcing, and I'm very skeptical of the numbers presented there. First, I would venture that I am probably among the people who have used crowdsourcing the most in NLP recently: I have created some ten relatively large-scale datasets on Mechanical Turk in the last few years. My experience does not match those reported numbers at all.
May 23, 2020 7 tweets 2 min read
This shows, once again, the problem of conflating a format with a phenomenon (not talking about Graham specifically here, but about the field as a whole). Taking two sentences and classifying the relationship between them is a format that permits arbitrary scope. That this format got conflated with the semantic notion of entailment as a whole is a (collective) mistake that has caused a *ton* of confusion about what the capabilities of any particular trained model should be.
Apr 7, 2020 18 tweets 9 min read
Evaluating NLP Models via Contrast Sets

New work that is a collaboration between 26 people at 10 institutions (!)

arxiv.org/abs/2004.02709

Trying to tag everyone at the top of the thread, here goes: @yoavartzi, Victoria Basmova, @JonathanBerant, @ben_bogin, @soshsihao, @pdasigi, @ddua17, @yanaiela, Ananth Gottumukkala, @nitish_gup, @HannaHajishirzi, @gabriel_ilharco
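To make the evaluation idea concrete, here is a minimal sketch of a consistency-style metric over contrast sets: the fraction of sets for which a model gets every instance right, where each set pairs an original test instance with small, label-changing perturbations of it. The function name and types here are hypothetical, for illustration only, not code from the paper:

```python
from typing import Callable, List, Tuple

def contrast_consistency(
    predict: Callable[[str], str],
    contrast_sets: List[List[Tuple[str, str]]],
) -> float:
    """Fraction of contrast sets a model gets entirely right.

    Each contrast set holds an original test instance plus small,
    expert-written perturbations that change the gold label; every
    element is an (input, gold_label) pair.
    """
    fully_correct = sum(
        all(predict(text) == gold for text, gold in cs)
        for cs in contrast_sets
    )
    return fully_correct / len(contrast_sets)
```

A per-instance accuracy can hide a model that only succeeds on the original phrasing; requiring the whole set to be correct is what makes the metric sensitive to a model's local decision boundary.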
Mar 3, 2020 12 tweets 3 min read
This was an interesting paper to read. It's well-written, and the method clearly works very well. A few things struck me as I read. First, how simple the method is: they use an inner-product search to retrieve encoded documents, then pass the retrieved documents to some end task, doing a very shallow approximation of marginalizing over the retrieval. That's it.
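A rough sketch of that retrieve-then-marginalize pattern, assuming pre-encoded documents and exact search (the names and the exact-search shortcut are my own simplifications, not the paper's actual implementation):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def retrieve_and_marginalize(query_vec, doc_vecs, reader_scores, k=5):
    """Inner-product retrieval plus a shallow marginalization.

    query_vec:     (d,) encoded query
    doc_vecs:      (n, d) pre-encoded document index
    reader_scores: maps a retrieved doc's index to per-answer scores
    """
    # Exact inner-product search over the index (a real system would
    # use an approximate-nearest-neighbor library instead).
    sims = doc_vecs @ query_vec
    top_k = np.argsort(-sims)[:k]

    # Treat normalized retrieval scores as p(doc | query), then
    # marginalize: p(answer | q) ~= sum_doc p(doc | q) * p(answer | doc, q).
    p_doc = softmax(sims[top_k])
    return sum(p * reader_scores(i) for p, i in zip(p_doc, top_k))
```

The "shallow" part is that only the top-k documents contribute; everything outside the retrieved set is treated as having zero probability.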
May 14, 2019 7 tweets 1 min read
Thus begins my tri-annual brooding on why we put gatekeepers between our papers and their intended audience. And I say this as someone whose papers mostly made it past the gatekeepers. I really don't see the point. The long-term impact of your work doesn't depend on which stamps of approval it got from the gatekeepers; it depends on how useful the community as a whole finds your contributions to be.
Mar 4, 2019 14 tweets 5 min read
Announcing DROP, a new reading comprehension benchmark that requires discrete reasoning over paragraphs of text. New @NAACLHLT paper by @ddua17, @yizhongwyz, @pdasigi, @GabiStanovsky, @sameer_, and me. allennlp.org/drop.html arxiv.org/abs/1903.00161 I am super excited about this; I've been thinking about it for over a year, and we finally decided to pursue it as our first collaboration between AI2 Irvine and the UCI NLP group. This is a hard dataset that uses complex questions to test comprehensive understanding of paragraphs.
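To give a flavor of what "discrete reasoning" means here, a made-up mini-example (the passage and question are invented, not drawn from the dataset): answering often requires extracting numbers from the passage and composing them with an operation like counting, sorting, or subtraction.

```python
import re

# Hypothetical mini-example of DROP-style discrete reasoning.
passage = "Brees threw touchdown passes of 25, 42, and 9 yards."
question = "How many more yards was the longest touchdown than the shortest?"

yards = [int(n) for n in re.findall(r"\d+", passage)]  # [25, 42, 9]

# The answer is not a span of the passage: it has to be *computed*.
answer = max(yards) - min(yards)
assert answer == 33
```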