Discover and read the best of Twitter Threads about #EMNLP2021

Most recent (13)

#EMNLP2021 ends, but the Insights from Negative Results workshop is coming tomorrow! The workshop is hybrid: virtual posters, talks by/for a mix of on-site & online speakers & attendees. Hosts: @JoaoSedoc @shabnamt1 @arumshisky @annargrs

Really proud of the program this year🧵:
8:45 Opening remarks
9:00 🗣️ Invited talk by Bonnie Webber: The Reviewers & the Reviewed: Institutional Memory & Institutional Incentives
10:00 💬🗨 Gathertown Poster session 1
Read 11 tweets
A highlight from the fascinating #EMNLP2021 keynote by @StevenBird:
NLP often comes with a set of assumptions about what communities with low-resource languages need. But we need to learn what they *actually* need; they may have a completely different epistemology.
/1
AR: this is such a thought-provoking talk, pointing at the missing bridges between language tech and the social sciences, esp. anthropology. As a computational linguist lucky enough to spend a year at @CPH_SODAS, I still don't think I even see the depth of everything we're missing.
/2
An audience question (@bonadossou from @MasakhaneNLP?): how do we increase the volume of NLP research on low-resource languages when such work is not as incentivized?
@StevenBird: keep submitting. I've had many rejections. Theme track for ACL2022 will be language diversity.
/3
Read 4 tweets
At #EMNLP2021, Evelina Fedorenko makes a strong case for defusing the criticism that neural language models cannot "think": neither can the human language modules in the brain, she argues, based on human brain studies. #EMNLP2021livetweet
In contrast, due to its predictive-coding nature, language is inherently very well suited to communication. #EMNLP2021livetweet
As far as human brain studies suggest, language is *not suitable for complex thought*, Fedorenko concludes her keynote at #EMNLP2021, as she outlines her future research. #EMNLP2021livetweet
Read 5 tweets
How large an emergency fund do I need? Do I have enough time to grab lunch before my next meeting?
We intuitively solve questions like these every day. Renowned physicist Enrico Fermi had a particular knack for them; such questions have become well known as Fermi Problems.
1/N
Solving Fermi Problems requires recursive decomposition, science/commonsense reasoning, abstraction, and creativity. The inherent complexity of these problems makes them an ideal candidate for #AI reasoning.
2/N
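A toy sketch of the kind of decomposition a Fermi estimate involves, using the lunch-break question above (all numbers are illustrative assumptions, not from the datasets):

```python
# "Do I have enough time to grab lunch before my next meeting?"
# Decompose into sub-quantities, estimate each roughly, then combine.

def minutes_for_lunch(walk_there=7, queue=10, order_and_pickup=5, walk_back=7):
    """Sum of rough per-step estimates, in minutes (illustrative numbers)."""
    return walk_there + queue + order_and_pickup + walk_back

minutes_until_meeting = 45            # assumed free slot
estimate = minutes_for_lunch()        # 29 minutes
# Fermi answers are judged on rough magnitude / margin, not exact values.
print(estimate, estimate < minutes_until_meeting)   # 29 True
```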
To spur research in this direction, we created two datasets — realFP and synthFP, a collection of real-world and templated Fermi problems. We found that large-scale LMs perform poorly even after fine-tuning, making estimates that can be off by 2+ orders of magnitude.
3/N
Read 4 tweets
📜 Excited to share our new work:

You have your nice multilingual translation model? Congrats 🎉
...
but what do you do if you want to add a new language (e.g., 🇳🇱) and don't have parallel data (🏴󠁧󠁢󠁥󠁮󠁧󠁿 - 🇳🇱) ?
Bonus✨: you can finally get rid of back-translation

🧵1/8
If you take a multilingual language model like mBART, add task adapters, and fine-tune them with cross-attention for translation ➡️ this works well for your supervised pairs, but for your new language 🇳🇱, mBART forgets everything it learned before.

2/8
So we added denoising adapters. Our recipe is simple (rough sketch below):
0️⃣ take mBART
1️⃣ add adapters for your languages and train them to reconstruct monolingual text
2️⃣ fine-tune cross-attention for translation

3/8
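A rough sketch of that recipe in PyTorch / Hugging Face Transformers. The bottleneck adapter and the freeze/unfreeze logic are generic illustrations, not the authors' code; the `encoder_attn` name for the decoder's cross-attention follows the HF mBART implementation, but treat it as an assumption:

```python
import torch.nn as nn
from transformers import MBartForConditionalGeneration

class Adapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, d_model, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, hidden):
        return hidden + self.up(self.act(self.down(hidden)))

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")

# 0️⃣/1️⃣ freeze mBART and train only per-language adapters on a denoising
#        (reconstruct-the-monolingual-text) objective
for p in model.parameters():
    p.requires_grad = False
adapters = nn.ModuleDict({lang: Adapter(model.config.d_model) for lang in ["en_XX", "nl_XX"]})
# (plugging each adapter into the Transformer layer outputs is omitted here)

# 2️⃣ then unfreeze only the decoder's cross-attention and fine-tune it for translation
for name, p in model.named_parameters():
    if "encoder_attn" in name:
        p.requires_grad = True
```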
Read 8 tweets
Why does a model often attend to salient words even though it's not required by the training loss? To understand this inductive bias, we need to analyze the optimization trajectory 🧐

Sharing our preprint "Approximating How Single Head Attention Learns" #NLProc
We approximate training in two stages: early on, when attention is uniform, the model learns to translate an individual input word `i` to an output word `o` if they co-occur frequently. Later, the model learns to attend to `i` when the correct output is `o`, because it already knows that `i` translates to `o`.
All approximations are "wrong" (and apparently reviewers do not like our assumptions), but we are able to explain many existing empirical phenomena as well as predict new ones: with our theory, we construct a distribution that is easy to express but hard to learn.
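Not the paper's analysis, just a toy illustration of the stage-1 premise: with near-zero query/key weights, attention scores are all close to zero, so the softmax is roughly uniform and early gradients mainly train the word-to-word translation path.

```python
import torch
import torch.nn.functional as F

d = 16
torch.manual_seed(0)
x = torch.randn(5, d)                      # embeddings of 5 input words
Wq = torch.zeros(d, d)                     # stand-in for tiny query/key weights early in training
Wk = torch.zeros(d, d)

scores = (x @ Wq) @ (x @ Wk).T / d ** 0.5  # all zeros here
attn = F.softmax(scores, dim=-1)
print(attn[0])                             # ~[0.2, 0.2, 0.2, 0.2, 0.2]: uniform attention
# With uniform attention the output is just an average of the inputs, so early updates
# mostly teach the value/output path which input word maps to which output word (stage 1);
# only later do the query/key weights learn *where* to attend (stage 2).
```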
Read 6 tweets
Dense retrieval models (e.g. DPR) achieve SOTA on various datasets. Does this really mean dense models are better than sparse models (e.g. BM25)?
No! Our #EMNLP2021 paper shows that dense retrievers fail even on simple entity-centric questions.

arxiv.org/abs/2109.08535 (1/6)
We construct EntityQuestions, consisting of simple, entity-rich questions such as “Where was Arve Furset born?”. We find dense retrieval models drastically underperform sparse models! (2/6)
We decouple the two distinct aspects of these questions: the entities and the question patterns. We find that dense retrieval models can only generalize to common entities or to question patterns that have been observed during training. (3/6)
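A toy contrast between the two scoring styles on an entity-centric query (the passages and the overlap scorer are illustrative stand-ins; `query_encoder`/`passage_encoder` are hypothetical dense encoders):

```python
query = "Where was Arve Furset born?"
passages = [
    "Arve Furset is a Norwegian composer born in Askvoll.",
    "Oslo is the capital and most populous city of Norway.",
]

def tokens(s):
    return set(s.lower().replace("?", "").replace(".", "").split())

def sparse_score(q, p):
    """Crude lexical-overlap stand-in for BM25: rare name tokens like 'furset' match exactly."""
    return len(tokens(q) & tokens(p))

# dense_score(q, p) = query_encoder(q) @ passage_encoder(p)   # hypothetical DPR-style scoring;
# the finding above is that this dot product is unreliable for entities rarely seen in training.

print([sparse_score(query, p) for p in passages])   # [3, 0]: the entity-bearing passage wins
```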
Read 6 tweets
Now that "Do Transformer Modifications Transfer Across Implementations and Applications?" has been accepted to #EMNLP2021, we can finally tweet about it!

Paper 📝: arxiv.org/abs/2102.11972
Code 💾: github.com/google-researc…
Thread summary: ⬇️ (1/8)
After we published the T5 paper where we empirically surveyed many transfer learning methods to find out what works best, we decided to do something similar for Transformer architecture modifications. (2/8)
In the ~3 years since the Transformer was proposed, hundreds of architectural modifications have been proposed, but almost none of them are commonly used. In other words, most of the Transformers people were training were largely the same as the one proposed in "Attention Is All You Need". (3/8)
Read 8 tweets
Excited to announce our #EMNLP2021 paper that shows how to turn a pre-trained language model or even a randomly initialized model into a strong few-shot learner.

Paper: arxiv.org/abs/2109.06270
w/ amazing collaborators: @lmthang, @quocleix, @GradySimon, @MohitIyyer

1/9👇
Despite their strong performance on many tasks, large-scale pre-trained language models do not perform as well when limited labeled data is available (e.g., on small datasets or in few-shot settings). Collecting more labeled data can help but can also be prohibitively expensive.
We propose STraTA, which stands for Self-Training with Task Augmentation, an approach that combines two complementary methods, task augmentation and self-training, to effectively leverage task-specific unlabeled data, which is comparatively cheap to obtain.
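A schematic of the self-training half of the recipe. `fine_tune` and `predict` are hypothetical helpers, and the task-augmentation half (synthesizing auxiliary in-domain training data to get a stronger base model) is only noted in a comment:

```python
def self_train(base_model, labeled, unlabeled, rounds=3, threshold=0.9):
    # Task augmentation (not shown): first strengthen base_model with synthesized auxiliary data.
    # `fine_tune` and `predict` are hypothetical stand-ins for the usual training/inference calls.
    model = fine_tune(base_model, labeled)              # start from the few labeled examples
    for _ in range(rounds):
        pseudo = []
        for x in unlabeled:
            label, confidence = model.predict(x)        # pseudo-label task-specific unlabeled data
            if confidence >= threshold:                 # keep only confident predictions
                pseudo.append((x, label))
        model = fine_tune(base_model, labeled + pseudo) # retrain on labeled + pseudo-labeled data
    return model
```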
Read 9 tweets
🚨 Can response generation models read between the lines? Our 🆕 #EMNLP2021 paper probes RG models to see if they can identify common-sense (CS) reasons: we annotate CS explanations in dialogues and evaluate RG models for CS reasoning capabilities.
We formalize CS as a *latent variable* that helps explain the observed variable “response” in the RG process and instantiate CS using textual explanations of the response.
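One way to write that formalization down (a hedged reading of the description above, with r the response, d the dialogue history, and z the latent textual common-sense explanation):

```latex
p(r \mid d) \;=\; \sum_{z} p(z \mid d)\, p(r \mid d, z)
```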
To collect annotations of CS explanations that justify dialogue responses, we first generate candidates using a large T5 model trained on a story explanation dataset, GLUCOSE (@nasrinmmm et al.). Next, we conduct a carefully designed two-stage human verification process.
Read 6 tweets
My first @GoogleAI residency project was accepted to @emnlpmeeting #EMNLP2021!

Prompt Tuning can condition a frozen T5 XXL model to perform new tasks while adding only 0.003% more parameters, with no performance loss.

Camera Ready 📸: arxiv.org/abs/2104.08691

Quick Thread 🧵(1/7) [Image: graph of model performance on SuperGLUE vs. model size]
Fine-tuning all the parameters of large pre-trained models works well and is the core of many SotA NLP results right now, but has some sharp edges. The size can make them difficult to work with and serve, plus each fine-tuning run creates a fork. (2/7)
Prompt Tuning, learning a small set of parameters that are prepended to the embedded input, can eliminate these problems. Freezing pre-trained models enables mixed-task batching and efficient ensembling, without the need for multiple copies. (3/7) [Image: diagram of model tuning vs. prompt tuning at inference]
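A minimal sketch of the idea in PyTorch (a generic wrapper, not the released code; it assumes a backbone that accepts `inputs_embeds`, as Hugging Face T5 does, and it omits the matching attention-mask extension):

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    """Freeze the backbone; train only a small matrix of soft prompt embeddings."""
    def __init__(self, backbone, d_model, prompt_length=20):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                      # frozen pre-trained model
        self.prompt = nn.Parameter(torch.randn(prompt_length, d_model) * 0.5)

    def forward(self, input_embeds, **kwargs):
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the learned soft prompt to the embedded input, then run the frozen model.
        return self.backbone(inputs_embeds=torch.cat([prompt, input_embeds], dim=1), **kwargs)
```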
Read 7 tweets
We can prompt language models for 0-shot learning ... but it's not what they are optimized for😢.

Our #emnlp2021 paper proposes a straightforward fix: "Adapting LMs for 0-shot Learning by Meta-tuning on Dataset and Prompt Collections".

Many interesting takeaways below 👇
1. Prompting a language model out of the box can be highly suboptimal. For example, GPT-3 (175B parameters) gets 80% on SST-2 zero-shot, while UnifiedQA (700M) gets 92% 🤔 so even being adapted to generic question answering can make a 200x smaller model better ...
2. We fix this by directly fine-tuning the model to produce the desired output given the task description and the task inputs. To get the training data, we unified datasets from 43 different sources into the same QA format and wrote 441 task descriptions in total *on our own*.
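A schematic of that unification step (field names and the yes/no verbalization are illustrative, not the paper's exact schema):

```python
def to_zero_shot_format(task_description, text, label):
    """Turn one labeled classification example into a (description + input, answer) pair."""
    prompt = f"{task_description}\nText: {text}\nAnswer:"
    answer = "Yes" if label == 1 else "No"
    return prompt, answer

prompt, answer = to_zero_shot_format(
    "Is the following movie review positive?",
    "A gorgeous, funny and ultimately moving film.",
    1,
)
print(prompt)
print(answer)   # "Yes"
```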
Read 9 tweets
I'm happy to announce that our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers" has been accepted to #EMNLP2021!

paper: arxiv.org/abs/2108.12284
code: github.com/robertcsordas/…

1/4
We improve the systematic generalization of Transformers on SCAN (0 -> 100% with length cutoff=26), CFQ (66 -> 81% on the output length split), PCFG (50 -> 85% on the productivity split, 72 -> 96% on the systematicity split), COGS (35 -> 81%), and the Mathematics dataset.

2/4
We achieve these large improvements by revisiting model configuration basics such as the scaling of embeddings, early stopping, relative positional embeddings, and weight sharing (Universal Transformers); see the sketch below.

3/4
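Two of those details in schematic PyTorch form (hyperparameters are illustrative): scaling the token embeddings by sqrt(d_model), and Universal-Transformer-style weight sharing, i.e. reusing one layer for several steps instead of stacking independent layers.

```python
import torch.nn as nn

d_model, vocab_size, n_steps = 256, 1000, 6
embed = nn.Embedding(vocab_size, d_model)
shared_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)

def encode(token_ids):                       # token_ids: LongTensor of shape (batch, seq_len)
    h = embed(token_ids) * d_model ** 0.5    # embedding scaling
    for _ in range(n_steps):                 # weight sharing: the same layer applied repeatedly
        h = shared_layer(h)
    return h
```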
Read 4 tweets
