I wish the research community had good shared shorthand for "This ML model can X {syntax, meaning, intent, ...}", where X is roughly "represent and use, with less robustness and systematicity than humans but greater success than any trivial baseline we know of".
Much of the current "can LMs model meaning" discourse is disagreement not about "meaning", but "model". There are lots of interesting intermediate states between total failure and total success. "Can model meaning" suggests a binary criterion & we all have different thresholds.
In particular, I (and I think many others who grew up in the statistical NLP tradition) still think of n-gram models as "prototypical" language generation systems. Now we wake up every day and say "holy crap, a talking n-gram model" (mit.edu/people/dpolica…)
Speculative (!!!) paper arguing that big LMs can model agency & communicative intent: arxiv.org/abs/2212.01681 (somehow in EMNLP findings). Briefly:
1. LMs do not in general have beliefs or goals. An LM trained on the Internet models a distribution over next tokens *marginalized*
over all the authors who could have produced the context. All large-scale training sets are generated by a mixture of authors w/ mutually incompatible beliefs & goals, and we shouldn't expect a model of the marginal dist. over utts to be coherent.
BUT
2. B/c single training documents are written by individuals who do have specific communicative intentions, and b/c understanding these intentions helps w/ next-word prediction, we should expect LMs to *infer and represent* the latent beliefs/intentions/etc that give rise to a ctx
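In symbols (my notation, not the paper's): if θ ranges over the latent authors/intents that could have produced a document, the LM is fit to the marginal

$$p(x_t \mid x_{<t}) \;=\; \sum_{\theta} p(\theta \mid x_{<t})\, p(x_t \mid x_{<t}, \theta)$$

The left-hand side averages over mutually incompatible θs, which is why we shouldn't expect it to be coherent (point 1); but predicting it well rewards a good implicit estimate of p(θ | x_{<t}), i.e. inferring the latent intent behind the context (point 2).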
New TACL paper from Semantic Machines! A first look at the approach we've been developing for modeling complex, task-oriented conversations using dataflow graphs.
I am extremely excited about this work for several reasons:
1/ It highlights one of the most consequential but overlooked implications of neural models for dialogue: explicit representations of intent still matter (a production system can't lie about what it did even 1% of the time), but now *we can predict whatever representations we want*
If you look at the earliest dialogue research (e.g. Grosz & Sidner's SharedPlans), the community used to be way more ambitious about the dialogue phenomena we tried to represent. Most of that went out the window when contemporary ML approaches weren't up to modeling it.
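To make "explicit representations of intent" concrete, here's a toy dataflow-style encoding of two turns. This is a hypothetical, simplified schema of my own (names like find_event and the execute loop are illustrative, not the paper's actual representation); the point is just that each predicted graph is an executable, auditable record of what the agent did.

```python
# Toy dataflow-graph encoding of a dialogue (hypothetical schema, not the
# paper's actual format). Nodes are API calls; a string like "n0" in an
# argument is a dataflow edge, so later turns can reuse earlier results
# instead of restating them.

def find_event(attendee):
    return {"title": f"sync with {attendee}", "start": "Tue 10:00"}

def get_start(event):
    return event["start"]

def create_event(title, before):
    return {"title": title, "before": before}

BACKEND = {"find_event": find_event, "get_start": get_start,
           "create_event": create_event}

def execute(graph, values):
    """Run a predicted graph; `values` carries results across turns and
    doubles as an explicit log of everything the system did."""
    for name, (op, args) in graph.items():
        resolved = {k: values.get(v, v) if isinstance(v, str) else v
                    for k, v in args.items()}
        values[name] = BACKEND[op](**resolved)
    return values

values = {}
# Turn 1: "When is my meeting with Megan?"
execute({"n0": ("find_event", {"attendee": "Megan"}),
         "n1": ("get_start", {"event": "n0"})}, values)
# Turn 2: "Schedule prep before that." ("that" is just a reference to n1.)
execute({"n2": ("create_event", {"title": "prep", "before": "n1"})}, values)
print(values["n1"], values["n2"])  # Tue 10:00  {'title': 'prep', 'before': 'Tue 10:00'}
```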
Was also very happy to get multiple pointers to @alexandersclark & Eyraud's work on substitutable languages (dl.acm.org/doi/pdf/10.555…). Haven't done a full read yet but "weak substitutability <=> syntactic congruence" is exactly what (1-fragment, full-context) GECA assumes---
so you can think of GECA as attempting to constrain models to substitutable languages by computing the closure of the training data over observed syntactic congruences. This closure need not produce a CFL (which answers a question I had about GECA!)...
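Here's that closure in a toy form, to make the connection concrete: contiguous fragments only, whole-sentence contexts, and none of the efficiency tricks from the GECA paper, so treat the function and its details as my sketch of the idea rather than the released implementation.

```python
from collections import defaultdict

def geca_closure(sentences):
    """Toy (1-fragment, full-context) GECA: every contiguous span is a
    fragment, its sentence-with-a-gap is the context, two fragments that
    share a context are treated as syntactically congruent, and we close
    the data under swapping congruent fragments."""
    frag_to_ctxs = defaultdict(set)   # fragment -> contexts it fills
    ctx_to_frags = defaultdict(set)   # context  -> fragments seen in it
    for sent in sentences:
        toks = sent.split()
        for i in range(len(toks)):
            for j in range(i + 1, len(toks) + 1):
                frag = tuple(toks[i:j])
                ctx = (tuple(toks[:i]), tuple(toks[j:]))  # full context
                frag_to_ctxs[frag].add(ctx)
                ctx_to_frags[ctx].add(frag)

    closure = set(sentences)
    # If a and b ever share a context, b may fill every gap that a fills.
    for frags in ctx_to_frags.values():
        for a in frags:
            for left, right in frag_to_ctxs[a]:
                for b in frags:
                    closure.add(" ".join(left + b + right))
    return closure

data = ["she picks the wug up", "she puts the wug down", "Pat picks the wug up"]
print("Pat puts the wug down" in geca_closure(data))  # True: "picks the wug up"
# and "puts the wug down" share the context "she ___", so the second is
# licensed everywhere the first occurs.
```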
someone should go back and categorize all the "what's most important about deep learning" responses---literally nobody agrees! Accessibility, scalability, empirical performance, representations of lexical meaning, feature representations more broadly, support for new inputs, ...
Always good to be reminded that apart from Aravind, almost all of the "founding members" of what's now the NLP community were women: Spärck Jones, Webber, Grosz, Hajičová.