So many fascinating ideas at yesterday's #blackboxNLP workshop at #emnlp2020. Too many bookmarked papers. Some takeaways: 1- There's more room to adopt input saliency methods in NLP. Grad*input and Integrated Gradients are key gradient-based methods (see the sketch after these takeaways).
2- NLP language models (GPT2-XL especially -- rightmost in the graph) accurately predict neural responses in the human brain. The next-word prediction task robustly predicts neural scores. @IbanDlank @martin_schrimpf @ev_fedorenko
This line of work investigating the human brain's "core language network" using fMRI is helping build hypotheses of what IS a language task and what is not. e.g. GPT3 doing arithmetic is beyond what the human brain's language network is responsible for. biorxiv.org/content/10.110…
3- @roger_p_levy shows another way of comparing language models against the human brain in reading comprehension: humans take longer to read unexpected words -- that reading time correlates with the language model's probability scores (sketch below).
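For takeaway 1, here's a minimal sketch of the Grad*input idea -- per-token saliency as the dot product of each input embedding with its gradient. It assumes PyTorch and the Hugging Face transformers library; the model (GPT-2 small) and the sentence are just illustrative choices, not taken from the workshop papers.

```python
# Minimal Grad*input saliency sketch (assumes PyTorch + Hugging Face transformers).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The keys to the cabinet are on the"
ids = tokenizer(text, return_tensors="pt").input_ids

# Embed the tokens ourselves so we can take gradients w.r.t. the input embeddings.
embeds = model.transformer.wte(ids).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds).logits

# Saliency for the model's top next-token prediction at the final position.
target_id = logits[0, -1].argmax()
logits[0, -1, target_id].backward()

# Grad*input: elementwise product of gradient and embedding, summed per token.
saliency = (embeds.grad * embeds).sum(dim=-1).squeeze(0)
for tok, score in zip(tokenizer.convert_ids_to_tokens(ids[0].tolist()), saliency.tolist()):
    print(f"{tok:>12} {score:+8.4f}")
```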
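For takeaway 3, a rough sketch of the per-token surprisal scores you would compare against human reading times. Same assumed stack (PyTorch + transformers, GPT-2 small as a stand-in); the garden-path sentence is made up, and a real analysis would correlate these scores with measured reading times over many sentences.

```python
# Per-token surprisal from a language model (sketch; assumes PyTorch + transformers).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "The old man the boats"
ids = tokenizer(sentence, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits

# Surprisal of token t = -log p(token_t | preceding tokens); the first token has no context.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
positions = torch.arange(ids.shape[1] - 1)
surprisal = -log_probs[0, positions, ids[0, 1:]]

tokens = tokenizer.convert_ids_to_tokens(ids[0].tolist())[1:]
for tok, s in zip(tokens, surprisal.tolist()):
    print(f"{tok:>10}  surprisal = {s:5.2f} nats")

# In a reading-time study you'd correlate these surprisals with measured
# per-word reading times across many sentences (e.g. scipy.stats.spearmanr).
```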
We can optionally pass it some text as input, which influences its output.
The output is generated from what the model "learned" during its training period, in which it scanned vast amounts of text (see the sketch below).
1/n
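GPT3 itself is only reachable through an API, but the same text-in, text-out flow can be sketched with a smaller open model. This assumes a recent version of the Hugging Face transformers library; the prompt is just a made-up example.

```python
# Prompting a smaller, open causal language model -- a sketch of the same
# "optional input text -> generated output" flow (assumes transformers).
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Once upon a time, a robot learned to"  # the optional input text
inputs = tokenizer(prompt, return_tensors="pt")

# The continuation comes entirely from parameters learned during training.
output_ids = model.generate(
    **inputs,
    max_new_tokens=25,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```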
Training is the process of exposing the model to lots of text. It was done once and is now complete. All the experiments you see now use that one trained model. Training was estimated to take 355 GPU years and cost $4.6m.
2/n
The dataset of 300 billion tokens of text is used to generate training examples for the model. For example, these are three training examples generated from the one sentence at the top.
You can see how sliding a window across all the text produces lots of examples.
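A toy sketch of that sliding-window idea in plain Python -- word-level and tiny on purpose; real training uses subword tokens, a much longer context window (2048 tokens for GPT-3), and of course far more text. The sentence and window size here are just illustrative.

```python
# Toy sliding-window example generation: each example asks the model to predict
# the next token given the tokens before it. (Illustrative only -- real training
# uses subword tokens and much longer windows.)
def make_examples(text, window_size=4):
    tokens = text.split()  # word-level stand-in for a real tokenizer
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - window_size):i]
        target = tokens[i]
        examples.append((context, target))
    return examples

sentence = "a robot must obey the orders given it by human beings"
for context, target in make_examples(sentence)[:3]:
    print(f"input: {' '.join(context):<25} -> label: {target}")
```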
On the transformer side of #acl2020nlp, three works stood out to me as relevant if you've followed the Illustrated Transformer/BERT series on my blog: 1- SpanBERT 2- BART 3- Quantifying Attention Flow
(1/n)
SpanBERT (by @mandarjoshi_ @danqi_chen @YinhanL @dsweld @LukeZettlemoyer @omerlevy_) came out last year but was published in this year's ACL. It found that BERT pre-training works better when you mask contiguous spans of tokens, rather than BERT's 15% of individually scattered tokens.
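Not SpanBERT's actual implementation, just a toy contrast between BERT-style scattered masking and masking a contiguous span. The span length and sentence are made up here; the paper itself samples span lengths from a clipped geometric distribution and still masks ~15% of tokens overall.

```python
# Toy contrast between scattered token masking and contiguous span masking.
import random

def scatter_mask(tokens, ratio=0.15):
    # BERT-style: mask ~15% of tokens, chosen individually at random.
    out = list(tokens)
    for i in random.sample(range(len(out)), k=max(1, int(len(out) * ratio))):
        out[i] = "[MASK]"
    return out

def span_mask(tokens, span_len=3):
    # SpanBERT-style: mask a contiguous run of tokens.
    out = list(tokens)
    start = random.randrange(0, len(out) - span_len + 1)
    for i in range(start, start + span_len):
        out[i] = "[MASK]"
    return out

random.seed(0)
tokens = "an american football game is played between two teams".split()
print("BERT-style:    ", " ".join(scatter_mask(tokens)))
print("SpanBERT-style:", " ".join(span_mask(tokens)))
```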