The #ICLR2021 Workshop on Enormous Language Models (WELM) is tomorrow, May 7th!

Full info: welmworkshop.github.io
Livestream: welmworkshop.github.io/livestream/
Gather.town info for ICLR registrants: iclr.cc/virtual/2021/w…

Thread summarizing the talks & panels ⬇️ (1/14)
Our first talk will be by Thomas Margoni, who will provide some legal perspective on the use of web data for training large language models. He'll touch on topics like copyright law, rights, and licenses, as they pertain to training data for LMs. (2/14)
Then, @JesseDodge will give a talk on how to document datasets and improve reproducibility of research. He'll discuss the NLP reproducibility checklist, a recent study on documenting C4, and a framework for modeling bias in data. (3/14)
We'll then have @emilymbender and @mcmillan_majora present the recent (but already seminal) Stochastic Parrots 🦜 paper, which places a critical lens on enormous LMs from many perspectives, including environmental and financial costs and dataset accountability. (4/14)
After a short break, @Thom_Wolf will give an introduction to the nascent Big Science effort, whose goal is to build a giant LM through a massive community collaboration (taking inspiration from similar efforts in particle physics). (5/14)
Then, @em_dinan will discuss her extensive work on measuring, documenting, and mitigating toxic generation from large LMs. After arguing why this is an important and challenging problem, she'll discuss adversarial human-in-the-loop benchmarking as a path for progress. (6/14)
Our first panel of the day, on “Bias, safety, copyright, and efficiency”, will be moderated by @katherine1ee and will include panelists @Thom_Wolf, Thomas Margoni, @em_dinan, @natschluter, and @JesseDodge. (7/14)
As part of WELM we created the participant-led BIG-bench (github.com/google/BIG-ben…) for more rigorous evaluation of giant LMs. After an introduction and overview by @jaschasd, we'll have 10 spotlights and talks by @megamor2, @oliverrr_shen, Nathan A. Chi, and Rowan Jacobs. (8/14)
After the BIG-bench session, Noam Shazeer will give a talk on a new way to characterize and describe the distributed computation algorithms required for training and serving enormous language models. (9/14)
Then, @ml_perception will discuss various architectures that can dramatically expand language modeling capabilities without using brute-force scaling, including nonparametric and sparsely-activated models. (10/14)
After that, Nicholas Carlini will discuss work that demonstrates the possibility of extracting training data from large language models -- even in the case where the model has not "overfit" to the training data in the traditional sense. (11/14)
We'll then have a talk by @AlisonGopnik describing how young children can perform causal, counterfactual, and relational inference, and what this means for building language models that should have similar capabilities. (12/14)
Our last talk will be by @yejinchoinka, a professor at UW who has led the development of many large LMs and benchmarks from within academia. She'll lend perspective as to how to do work in this space without access to industry-sized resources. (13/14)
Finally, our workshop will end with a second panel on “Extrapolating the capabilities of language models” with @AlisonGopnik, @yejinchoinka, @ml_perception, and @emilymbender. I am incredibly excited about this workshop and hope to see you (virtually) there! (14/14)



More from @colinraffel

17 Dec 20
I've recently had a number of aspiring ML researchers ask me how to stay on top of the paper onslaught. Here are three concrete tips:
1) Pick a tiny subfield to focus on
2) Skim
3) Rely on your community
Thread to explain ⬇️ (1/5)
1) Pick a tiny subfield to focus on
It's impossible to stay on top of "all of ML". It's a gigantic and diverse field. Being an effective researcher requires laser-focusing on a subfield. Pick a problem that is important, excites you, and you feel you could make progress on. (2/5)
2) Skim
You'll find that many papers within your subfield of choice have a lot in common - there is often only a small nugget of novelty in each paper. It's incredibly important to develop your ability to find this nugget as quickly as possible. (3/5)
12 Dec 19
In case you missed our #neurips poster on MixMatch (arxiv.org/abs/1905.02249) today because you aren't in Vancouver or didn't survive the poster session stampede, here's the PDF: github.com/google-researc… and here's a transcript of what I said to everyone who came by: ⬇️ 1/11
The goal in semi-supervised learning (SSL) is to use unlabeled data to improve a model's performance. Many approaches do this by using the model to produce "label guesses" for unlabeled data, and then training the model to predict those guesses. 2/11
Two common ingredients for producing label guesses are consistency regularization ("When I perturb the input or model, the model's prediction shouldn't change.") and entropy minimization ("The model should output low-entropy/confident predictions on unlabeled data.") 3/11
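The label-guessing recipe above (average the model's predictions over augmentations for consistency, then sharpen the average toward a low-entropy guess) can be sketched in a few lines of plain Python; the temperature and toy predictions here are illustrative, not values from the paper:

```python
import math

def sharpen(p, T=0.5):
    # Entropy minimization via temperature sharpening (as in MixMatch):
    # raise each probability to 1/T and renormalize, which pushes the
    # guessed label distribution toward a confident, low-entropy one.
    powered = [pi ** (1.0 / T) for pi in p]
    total = sum(powered)
    return [pi / total for pi in powered]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Consistency idea: average predictions over K augmentations of the
# same unlabeled example, then sharpen the average into a label guess.
preds = [[0.6, 0.3, 0.1],
         [0.5, 0.4, 0.1]]
avg = [sum(col) / len(preds) for col in zip(*preds)]
guess = sharpen(avg)
```

Training the model to predict `guess` on unlabeled inputs combines both ingredients: the augmentation average enforces consistency, and sharpening enforces confidence.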
24 Oct 19
New paper! We perform a systematic study of transfer learning for NLP using a unified text-to-text model, then push the limits to achieve SoTA on GLUE, SuperGLUE, CNN/DM, and SQuAD.
Paper: arxiv.org/abs/1910.10683
Code/models/data/etc: git.io/Je0cZ
Summary ⬇️ (1/14)
Our approach casts *every* language problem as a text-to-text task. For example, English-to-German translation -- input: "translate English to German: That is good." target: "Das ist gut." or sentiment ID -- input: "sentiment: This movie is terrible!", target: "negative" (2/14)
The text-to-text approach allows us to use the same model, loss function, decoding process, training procedure, etc. across every task we study. It also provides a standard testbed for the many ideas we evaluate in our empirical survey. (3/14)
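The casting itself is simple enough to sketch: a task prefix is prepended to the input, and the target is always a plain string. The helper below is a hypothetical illustration of that framing, not code from the released repo:

```python
def format_example(task_prefix, input_text, target_text):
    # Cast any NLP task as text-to-text: the model always maps a
    # "prefix: input" string to a target string (hypothetical helper).
    return f"{task_prefix}: {input_text}", target_text

# Translation and sentiment ID, framed identically:
src, tgt = format_example("translate English to German",
                          "That is good.", "Das ist gut.")
cls_src, cls_tgt = format_example("sentiment",
                                  "This movie is terrible!", "negative")
```

Because every task reduces to string-to-string pairs like these, one model, loss, and decoding procedure covers them all.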
19 Sep 19
If you are reeling from a NeurIPS rejection or stressing about an ICLR submission, remember that some of the best papers were never published anywhere except arXiv. Thread of a few favorites (1/5):
"Generating Sequences with RNNs" by Graves arxiv.org/abs/1308.0850 This paper blew my mind when it came out, showing that it was possible to generate plausible text and handwriting with RNNs. Includes the predecessors of attention, Adam, etc... (2/5)
WaveNet by van den Oord et al. arxiv.org/abs/1609.03499 Until this came out I don't think most of us expected that we'd be able to generate raw waveforms with deep networks anytime soon. The results were surprisingly good and the architecture remains influential. (3/5)
