Jacob Andreas
Teaching computers to read. Assoc. prof @MITEECS / @MIT_CSAIL (he/him). https://t.co/5kCnXHjtlY https://t.co/2A3qF5vdJw
Dec 9, 2022 5 tweets 1 min read
I wish the research community had good shared shorthand for "This ML model can X {syntax, meaning, intent, ...}", where X is roughly "represent and use, with less robustness and systematicity than humans but greater success than any trivial baseline we know of".

Much of the current "can LMs model meaning" discourse is disagreement not about "meaning", but "model". There are lots of interesting intermediate states between total failure and total success. "Can model meaning" suggests a binary criterion & we all have different thresholds.
Dec 6, 2022 8 tweets 2 min read
Speculative (!!!) paper arguing that big LMs can model agency & communicative intent: arxiv.org/abs/2212.01681 (somehow in EMNLP findings). Briefly:

1. LMs do not in general have beliefs or goals. An LM trained on the Internet models a distribution over next tokens *marginalized* over all the authors who could have produced the context. All large-scale training sets are generated by a mixture of authors w/ mutually incompatible beliefs & goals, and we shouldn't expect a model of the marginal dist. over utterances to be coherent.

BUT
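One compact way to write the marginalization claim in point 1 (my notation, not the paper's):

```latex
% Next-token prediction as a mixture over latent authors/agents a, each with
% their own (mutually incompatible) beliefs and goals:
\[
  p(x_t \mid x_{<t}) \;=\; \sum_{a} p(a \mid x_{<t})\, p(x_t \mid x_{<t}, a)
\]
```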
Sep 21, 2020 9 tweets 2 min read
New TACL paper from Semantic Machines! A first look at the approach we've been developing for modeling complex, task-oriented conversations using dataflow graphs.

aka.ms/AA9oxf3

I am extremely excited about this work for several reasons:

1/ It highlights one of the most consequential but overlooked effects of neural models in dialogue: explicit representations of intent still matter (a production system can't lie about what it did even 1% of the time), but now *we can predict whatever representations we want*.
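To make "explicit representations of intent" concrete, here is a toy dataflow-style program; the node and operation names are invented for illustration and are not the Semantic Machines schema (see the paper for the real one).

```python
# Toy dataflow-style program for "schedule a meeting with Tom tomorrow at 3".
# Node and op names are invented; the real representation is much richer
# (revision, exceptions, references to earlier turns, ...).
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                                       # e.g. "find_person", "create_event"
    args: dict = field(default_factory=dict)
    inputs: list = field(default_factory=list)    # upstream Node objects

# Downstream nodes reference upstream results explicitly, so later turns
# ("actually, make it 4pm") can refer back to these nodes.
tom = Node("find_person", {"name": "Tom"})
time = Node("datetime", {"day": "tomorrow", "hour": 15})
event = Node("create_event", {"title": "meeting"}, inputs=[tom, time])

def describe(node: Node, depth: int = 0) -> None:
    """Print the predicted plan before executing it -- the 'can't lie' part."""
    print("  " * depth + f"{node.op}({node.args})")
    for child in node.inputs:
        describe(child, depth + 1)

describe(event)
```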
Jul 9, 2020 5 tweets 2 min read
Was also very happy to get multiple pointers to @alexandersclark & Eyraud's work on substitutable languages (dl.acm.org/doi/pdf/10.555…). Haven't done a full read yet but "weak substitutability <=> syntactic congruence" is exactly what (1-fragment, full-context) GECA assumes--- so you can think of GECA as attempting to constrain models to substitutable languages by computing the closure of the training data over observed syntactic congruences. This closure need not produce a CFL (which answers a question I had about GECA!)...
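A minimal sketch of that closure idea, with 1-token fragments and full-sentence contexts (a simplification of the actual GECA procedure, arxiv.org/abs/1904.09545):

```python
# If two fragments ever appear in the same context, treat them as congruent
# and substitute one for the other in each other's contexts.
from collections import defaultdict
from itertools import combinations

data = ["the cat sleeps", "the dog sleeps", "the dog barks"]

# Map each environment (sentence with one token removed) to the tokens seen in it.
env_to_frags = defaultdict(set)
for sent in data:
    toks = sent.split()
    for i, tok in enumerate(toks):
        env = tuple(toks[:i] + ["_"] + toks[i + 1:])
        env_to_frags[env].add(tok)

# Tokens sharing an environment are treated as (weakly) substitutable.
congruent = defaultdict(set)
for frags in env_to_frags.values():
    for a, b in combinations(sorted(frags), 2):
        congruent[a].add(b)
        congruent[b].add(a)

# One augmentation pass over observed congruences (a rough stand-in for the closure).
augmented = set(data)
for env, frags in env_to_frags.items():
    for f in frags:
        for g in congruent[f]:
            augmented.add(" ".join(g if t == "_" else t for t in env))

print(sorted(augmented))  # now includes "the cat barks"
```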
Jul 6, 2020 7 tweets 2 min read
Thoughts on Kathy McKeown's #acl2020nlp keynote: someone should go back and categorize all the "what's most important about deep learning" responses---literally nobody agrees! Accessibility, scalability, empirical performance, representations of lexical meaning, feature representations more broadly, support for new inputs, ...
Jul 3, 2020 8 tweets 1 min read
sing minutes grammars and three transducers neural semantic persons
Jun 26, 2020 4 tweets 2 min read
New preprint led by Jesse Mu (@jayelmnop) on discovering compositional concepts in deep networks! You've heard of the "cat neuron" and the "sentiment neuron"; now, meet the green-and-brown-water neuron, the castle-or-surgery neuron, and the cheating-at-SNLI neuron. 1/

Earlier work from Berkeley (arxiv.org/abs/1707.08139) and MIT (netdissect.csail.mit.edu) automatically labels deep features with textual descriptions from a predefined set. In our new work, we generate more precise & expressive explanations by composing them on the fly. 2/
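A rough sketch of the recipe (my reconstruction on toy data, not the paper's code): binarize a neuron's activations, then search over short logical formulas of concept masks, scoring candidates by IoU with the neuron's firing pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                                   # pooled spatial positions / examples
neuron = rng.random(n) > 0.9                 # binarized neuron activations (toy)
concepts = {"water": rng.random(n) > 0.8,    # toy concept annotations
            "green": rng.random(n) > 0.7,
            "brown": rng.random(n) > 0.7}

def iou(a, b):
    return (a & b).sum() / max((a | b).sum(), 1)

# Length-1 formulas: single concepts, keep a small beam.
beam = [((name,), mask) for name, mask in concepts.items()]
beam.sort(key=lambda fm: -iou(neuron, fm[1]))
beam = beam[:2]

# Length-2 formulas: extend the beam with one more concept via AND / OR.
candidates = list(beam)
for formula, mask in beam:
    for name, cmask in concepts.items():
        candidates.append((formula + ("AND", name), mask & cmask))
        candidates.append((formula + ("OR", name), mask | cmask))

best_formula, best_mask = max(candidates, key=lambda fm: iou(neuron, fm[1]))
print(best_formula, round(iou(neuron, best_mask), 3))
```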
May 15, 2020 4 tweets 2 min read
Reading arxiv.org/pdf/2005.07064… by @aggielaz Potapenko & Tieleman. Nice side-by-side comparison of different combos of sup grounding loss & communicative / task loss for NLG. Even when you lock down semantics, agents trained w/ task loss form weird *pragmatic* conventions.

This last observation is a clear illustration of the fact that RSA (at least as usually applied to captioning) only explains a limited subset of pragmatic implicatures! In settings like the one studied here, learners are free to come up with their own conventions for the rest.
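For reference, a minimal RSA computation on a toy lexicon (my example, not the paper's setup): it derives a scalar implicature from a *fixed* lexicon, but has no mechanism for the invented conventions described above.

```python
# Utterances "some"/"all", meanings "some-but-not-all"/"all". RSA predicts that
# "some" is interpreted as "not all" -- given the lexicon, priors, and alpha.
import numpy as np

utterances = ["some", "all"]
meanings = ["some_not_all", "all"]
lexicon = np.array([[1.0, 1.0],   # "some" is literally true of both meanings
                    [0.0, 1.0]])  # "all" is only true of the "all" meaning
prior = np.array([0.5, 0.5])
alpha = 1.0

def normalize(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

L0 = normalize(lexicon * prior, axis=1)   # P_L0(meaning | utterance)
S1 = normalize(L0.T ** alpha, axis=1)     # P_S1(utterance | meaning)
L1 = normalize(S1.T * prior, axis=1)      # P_L1(meaning | utterance)

print(dict(zip(meanings, L1[0])))         # "some" now favors "some_not_all"
```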
May 13, 2020 8 tweets 2 min read
Reading arxiv.org/pdf/2005.00724… @sanjayssub et al. Evidence of "non-faithfulness" in NMNs: the disentangled neural modules we get from e.g. CLEVR don't arise in models trained on natural data. But what does it mean for an NMN to be faithful? 1/

(First, a couple of other goodies in this paper: module layouts from BREAK question decompositions; joint NMN training on QA and localization tasks, which is something I always mention as a nice feature of these models in talks but hardly anybody does.) 2/
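For readers unfamiliar with NMNs, a toy illustration of what a module layout is (names invented; real modules are neural networks over image feature maps, not strings). Faithfulness asks whether the thing wired up as find("sphere") really behaves like a sphere detector.

```python
def find(concept):
    return f"attend({concept})"

def relate(direction, attn):
    return f"shift({direction}, {attn})"

def filter_(concept, attn):
    return f"intersect(attend({concept}), {attn})"

def count(attn):
    return f"count({attn})"

# Layout for: "How many red cubes are left of the sphere?"
layout = count(filter_("red", filter_("cube", relate("left", find("sphere")))))
print(layout)
```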
May 1, 2020 5 tweets 2 min read
New paper, continuing on the theme of extreme forms of generalization. Lots of work on compositionality in ML focuses on building inductive bias into models. Can we get model-agnostic compositional generalization by changing the data instead? arxiv.org/abs/1904.09545 1/

This paper presents GECA, a (rule-based!) data augmentation scheme that allows an ordinary seq2seq model to solve @LakeBrenden & Baroni's SCAN dataset, boosts semantic parser performance across multiple meaning representations, and even helps a bit with low-resource language modeling. 2/
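For context, the flavor of the SCAN-style splits mentioned above (toy examples in the spirit of Lake & Baroni's data, not verbatim splits): the primitive "jump" and the modifier "twice" each appear in training, but never together.

```python
train = {
    "walk": "WALK",
    "walk twice": "WALK WALK",
    "run twice": "RUN RUN",
    "jump": "JUMP",
}
test = {
    "jump twice": "JUMP JUMP",   # ordinary seq2seq models often fail here
}
```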
Apr 29, 2020 7 tweets 3 min read
New paper led by @alanamarzoev: can we build effective models for language understanding using only data from a hand-written grammar, and *no* hand annotated training examples? What does "sim-to-real transfer" look like for NLP?

arxiv.org/abs/2004.13645
alanamarzoev.com/blog-1/unnatur…

Some observations: (1) pretrained LM representations are pretty good at modeling similarity between human-generated and synthetic sentences, but (2) models trained only on LM representations of synthetic sentences still overfit and generalize badly to real ones.
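A sketch of the "no hand-annotated examples" setup (the grammar below is a toy stand-in, not the one from the paper): sample paired utterances and programs from a hand-written synchronous grammar, train on those, and evaluate on real human utterances.

```python
import random

# Each nonterminal maps to paired (utterance, logical-form) templates; shared
# nonterminals are expanded with the same choice on both sides.
rules = {
    "S":    [("show me NP", "list(NP)")],
    "NP":   [("flights PP", "flights(PP)"), ("hotels PP", "hotels(PP)")],
    "PP":   [("to CITY", "dst=CITY"), ("from CITY", "src=CITY")],
    "CITY": [("boston", "boston"), ("denver", "denver")],
}

def sample(sym="S"):
    nl, lf = random.choice(rules[sym])
    for nt in rules:
        while nt in nl.split():
            sub_nl, sub_lf = sample(nt)
            nl = nl.replace(nt, sub_nl, 1)
            lf = lf.replace(nt, sub_lf, 1)
    return nl, lf

print(sample())  # e.g. ('show me flights to denver', 'list(flights(dst=denver))')
```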
Apr 1, 2020 6 tweets 1 min read
RT'd this yesterday but wanted to say more on why I'm excited! Most of the tools we have for asking if some linguistic feature is "encoded" in a representation look at how accurately it can be predicted relative to a control model or control task. But this can be misleading: with a powerful enough probe, the wrong control task will lead you to conclude that easy-to-predict abstractions like parts of speech are not meaningfully encoded, and the wrong control model might lead you to conclude that *nothing* is meaningfully encoded.
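For contrast, a sketch of the probe + control-task recipe being discussed (in the spirit of Hewitt & Liang's selectivity; synthetic stand-in data, not a real experiment). Both labelings here are lookups on word type, so a big probe fits the control nearly as well as the "real" task and selectivity collapses: exactly the powerful-probe failure mode above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_words, n_types, dim = 5000, 200, 64
word_ids = rng.integers(0, n_types, size=n_words)
type_emb = rng.normal(size=(n_types, dim))
reps = type_emb[word_ids] + 0.1 * rng.normal(size=(n_words, dim))  # stand-in representations

pos_of_type = rng.integers(0, 5, size=n_types)       # "real" tag per word type (toy)
control_of_type = rng.integers(0, 5, size=n_types)   # random control tag per word type

def probe_accuracy(labels):
    Xtr, Xte, ytr, yte = train_test_split(reps, labels, random_state=0)
    probe = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300).fit(Xtr, ytr)
    return probe.score(Xte, yte)

task_acc = probe_accuracy(pos_of_type[word_ids])
control_acc = probe_accuracy(control_of_type[word_ids])
print(f"task {task_acc:.2f}  control {control_acc:.2f}  selectivity {task_acc - control_acc:.2f}")
```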
Jun 5, 2019 4 tweets 2 min read
arxiv.org/abs/1905.13358 Cool paper from Haoshuo Huang, Vihan Jain, @jasonbaldridge et al showing that you can get most of the benefits of our 2018 navigation paper (arxiv.org/abs/1806.02724) using about 5% of the synthetic data filtered through a learned ranking model. Would be interesting to know whether this opens the door to more sophisticated generation procedures---we tried data aug with the "reasoning speaker" in the original paper but it didn't improve over the direct baseline; possibly because (as the new paper points out)...
Apr 7, 2019 5 tweets 3 min read
Beautiful paper by @mrdrozdov @pat_verga Mohit Yadav et al: build RNN shaped like the inside-outside dynamic program w/ soft child selection. Train outside reprs of leaves to predict their words. CKY decode on child scores gives SOTA unsup const parser! arxiv.org/abs/1904.02142

Similar flavor to arxiv.org/abs/1808.09942 (by @nitish_gup & Mike Lewis), which just does the inside pass but also assigns every cell a denotation, and gives a solid unsupervised *semantic* parser.
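A schematic of an inside pass with soft child selection (shapes and control flow only; the real model uses learned compose/score networks and runs an outside pass as well):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 16                                    # sentence length, vector size
leaf = rng.normal(size=(T, d))                  # leaf (word) vectors

inside_vec = {(i, i): leaf[i] for i in range(T)}
inside_score = {(i, i): 0.0 for i in range(T)}

def compose(l, r):                              # stand-in for a learned composer
    return np.tanh(l + r)

def score(l, r):                                # stand-in for a learned scorer
    return float(l @ r)

for width in range(1, T):
    for i in range(T - width):
        j = i + width
        vecs, scores = [], []
        for k in range(i, j):                   # split point: (i..k) + (k+1..j)
            l, r = inside_vec[(i, k)], inside_vec[(k + 1, j)]
            vecs.append(compose(l, r))
            scores.append(score(l, r) + inside_score[(i, k)] + inside_score[(k + 1, j)])
        scores = np.array(scores)
        w = np.exp(scores - scores.max())
        w /= w.sum()                            # soft child selection over splits
        inside_vec[(i, j)] = sum(wi * v for wi, v in zip(w, vecs))
        inside_score[(i, j)] = float(w @ scores)

print(inside_vec[(0, T - 1)].shape)             # (16,) -- the root representation
```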
Feb 18, 2019 9 tweets 2 min read
I think the most interesting thing about the current LM discussion is the huge quality difference between the raw samples in github (github.com/openai/gpt-2/b…) and the cherry-picked samples on the blog. The raw ones are good, but not fall-out-of-your-chair good like Zombie Kennedy. Interesting not because it's a surprise that top-25 is so much better, but because it was a surprise (to me at least) that we should care about top-25.
Dec 3, 2017 4 tweets 2 min read
Trained a linear reg on ICLR title unigrams and reviewer scores. Bad words: end, autonomous, decision, q, approximate. Good words: relational, exploration, distributional, grounded, intrinsic.

@xtimv @Smerity words have to appear 10x but this still seems to mostly be effects from a few papers
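Roughly the experiment described above, reconstructed (placeholder titles and scores; not the original code or data):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression

titles = ["grounded compositional exploration", "approximate q decision methods"]  # placeholders
scores = [7.0, 4.5]                                                                # placeholders

vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b")  # keep 1-char tokens like "q";
X = vec.fit_transform(titles)                        # the real run also required >=10 occurrences
reg = LinearRegression().fit(X, scores)

order = np.argsort(reg.coef_)
vocab = np.array(vec.get_feature_names_out())
print("bad words: ", vocab[order[:5]])
print("good words:", vocab[order[-5:]])
```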
Aug 2, 2017 5 tweets 1 min read
ACL business meeting: Chris Manning talking now (I missed what looked like a good anti-arxiv position) #acl2017nlp

equity is essential; double-blind is best way to preserve it
Jul 31, 2017 8 tweets 1 min read
Luheng He on deep SRL: homes.cs.washington.edu/~luheng/files/… #acl2017nlp

previous SOTA is a CRF with neural potentials; here we're just going to train greedily (?) and Viterbi decode at test
Jul 31, 2017 8 tweets 3 min read
Next: @_shrdlu_ on joint semantic synthesis and morphological analysis of the derived word: arxiv.org/abs/1701.00946

@_shrdlu_ The phonology track is a wasteland
Jul 31, 2017 8 tweets 1 min read
Next: Tim Vieira on learning to prune #acl2017nlp

we're going to learn to prune a coarse-to-fine DP by optimizing a linear combination of accuracy & runtime
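One way to write that objective, with a trade-off weight λ (my gloss of "linear combination of accuracy & runtime", not the exact formulation from the talk):

```latex
\[
  \max_{\theta} \;\; \mathrm{accuracy}(\theta) \;-\; \lambda \cdot \mathrm{runtime}(\theta)
\]
```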
Jul 31, 2017 5 tweets 1 min read
Next: Ravi Shekhar on breaking captioning models arxiv.org/abs/1705.01359 #acl2017nlp

Take a captioned image, replace one noun with a different noun that makes it a false caption. Can do this automatically w/ COCO category labels
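A sketch of that recipe (FOIL-style noun swapping; my toy code, not the paper's pipeline): find a noun in the caption that is a COCO category and swap in a different category, yielding a near-miss false caption.

```python
import random

coco_categories = {"dog", "cat", "bicycle", "pizza", "couch"}

def make_foil(caption: str) -> str:
    words = caption.split()
    swappable = [i for i, w in enumerate(words) if w in coco_categories]
    i = random.choice(swappable)                  # assumes the caption mentions a category
    words[i] = random.choice(sorted(coco_categories - {words[i]}))
    return " ".join(words)

print(make_foil("a dog sleeping on a couch"))     # e.g. "a cat sleeping on a couch"
```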