My first #ICML2020 was different from my n-th #acl2020nlp, but despite (or perhaps because of) that, I did try to look for interesting papers that I could relate to but that might still teach me something new!
Papers, in roughly chronological order---each with a short summary :) [1/42]
Use the touching times when expanding spheres from the data points as an indicator of topology, and try to make the distances for these connections match up in the latent space.
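If I read the idea right, those "touching times" are just the 0-dimensional persistence pairs, i.e., the minimum-spanning-tree edge lengths over the data. Not the authors' code, just my back-of-the-envelope sketch (the real loss is also symmetric, using the latent-space MST too):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def topo_matching_loss(X, Z):
    # Pairwise distances in data space and in latent space
    D_x, D_z = squareform(pdist(X)), squareform(pdist(Z))
    # MST edges = the distances at which growing spheres first touch
    mst = minimum_spanning_tree(D_x).toarray()
    i, j = np.nonzero(mst)
    # Ask the latent space to reproduce exactly those distances
    return np.mean((D_x[i, j] - D_z[i, j]) ** 2)
```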
“Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers” (Zhuohan Li/@zhuohan123, @Eric_Wallace_, Sheng Shen/@shengs1123, Kevin Lin/@nlpkevinl, Kurt Keutzer, Dan Klein, Joseph Gonzalez/@mejoeyg)
[“Involutive MCMC: One Way to Derive Them All” cont.]
Many MCMC variants can be cast as involution+auxiliary variable. The talk gives a bit of an intuition: involutions preserve stationary density, auxiliary variables help walk through state space.
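To make that framing concrete for myself: plain random-walk Metropolis-Hastings falls out of it if the auxiliary variable is the step v and the involution is f(x, v) = (x + v, -v), which applied twice gives back (x, v). Toy sketch of mine, not from the paper:

```python
import numpy as np

def norm_logpdf(v, s):
    return -0.5 * (v / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))

def involutive_mh_step(x, log_p, rng, step=0.5):
    v = rng.normal(0.0, step)                  # sample the auxiliary variable
    x_new, v_new = x + v, -v                   # apply the involution
    # MH ratio on the joint (x, v); |det Jacobian| = 1 for this involution
    log_alpha = (log_p(x_new) + norm_logpdf(v_new, step)) \
              - (log_p(x) + norm_logpdf(v, step))
    return x_new if np.log(rng.uniform()) < log_alpha else x

# e.g., sampling a standard normal
rng = np.random.default_rng(0)
x, xs = 0.0, []
for _ in range(10000):
    x = involutive_mh_step(x, lambda t: -0.5 * t * t, rng)
    xs.append(x)
```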
[“Predictive Sampling with Forecasting Autoregressive Models” cont.]
...then check whether those indeed came out. If yes, advance as many steps as were right; if not, discard the wrong guesses, use the actual output as the next input, and only advance one step.
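A minimal sketch of that speculate-then-check loop (function names and the lookahead size are mine, not the paper's; the real speedup comes from verifying all guessed positions in one batched forward pass, which I spell out sequentially here only to make the advance/discard logic explicit):

```python
def predictive_decode(model_step, forecaster, prompt, max_len, k=4):
    # model_step(seq) -> next token from the real autoregressive model
    # forecaster(seq, k) -> k cheaply guessed future tokens
    seq = list(prompt)
    while len(seq) < max_len:
        guesses = forecaster(seq, k)
        accepted = []
        for g in guesses:
            actual = model_step(seq + accepted)
            if actual == g:
                accepted.append(g)          # guess confirmed: advance past it
            else:
                accepted.append(actual)     # mismatch: keep the real token, stop here
                break
        seq.extend(accepted)
    return seq[:max_len]
```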
Instead of softmaxing the dot product, kernelize that distribution/similarity (i.e., use feature functions) and exploit associativity to become linear instead of quadratic (in both compute and memory). Works fine sometimes and not so fine other times.
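Roughly what that looks like in code (single head, no masking; the elu(x)+1 feature map is one choice from the linear-attention literature, used here purely for illustration):

```python
import numpy as np

def phi(x):
    # positive feature map standing in for exp(q·k): elu(x) + 1
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    A = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (A / A.sum(-1, keepdims=True)) @ V       # O(n^2) in sequence length n

def linear_attention(Q, K, V):
    Qf, Kf = phi(Q), phi(K)
    KV = Kf.T @ V                                   # (d x d): compute this first...
    Z = Qf @ Kf.sum(axis=0)                         # ...plus a per-query normalizer
    return (Qf @ KV) / Z[:, None]                   # O(n) thanks to associativity
```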
[“Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics” cont.]
Compare four fairness metrics, ask people to apply, explain, and judge them. Interesting (if moderately discouraging/worrying) results. Food for thought!
[“Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions” cont.]
...you’d get confidence intervals---now don’t retrain but adapt parameters to “remove” training data through some Hessian and influence function magic.
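The core trick, stripped down to least squares where everything is easy to check (the paper does this blockwise for RNNs; this is only my toy illustration of the "remove training data without retraining" step):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Fit once on all data; per-example loss L_i = (x_i^T w - y_i)^2
w = np.linalg.lstsq(X, y, rcond=None)[0]
H = 2.0 * X.T @ X / n                       # average Hessian of the loss

# Influence-function approximation to removing example j without retraining:
#   w_{-j} ~= w + (1/n) H^{-1} grad L_j(w)
j = 3
grad_j = 2.0 * X[j] * (X[j] @ w - y[j])
w_minus_j_approx = w + np.linalg.solve(H, grad_j) / n

# Compare against actually retraining without example j
mask = np.arange(n) != j
w_minus_j_exact = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
print(np.abs(w_minus_j_approx - w_minus_j_exact).max())   # small (first-order approx.)
```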
“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” (Robert Peharz/@ropeharz, Steven Lang, Antonio Vergari/@tetraduzione, Karl Stelzner, Alejandro Molina/@alejom_ml, @martin_trapp, [...wow not even the author list fits...]
[“Aligned Cross Entropy for Non-Autoregressive Machine Translation” cont.]
Find the best monotonic alignment (via Viterbi-style dynamic programming) so a non-autoregressive NMT model isn’t coaxed into predicting the same thing multiple times in the hope of getting one copy into the *right* position.
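The flavor of that alignment search, as a bare-bones DP I wrote for intuition (a simplification, not the exact AXE recursion, which also handles skipped/blank predictions):

```python
import numpy as np

def best_monotonic_alignment_nll(log_probs, target):
    # log_probs: (N positions, vocab) predicted log-probabilities
    # target: list of T token ids, T <= N
    T, N = len(target), log_probs.shape[0]
    # dp[i][j] = best cost of covering the first i targets with the first j positions
    dp = np.full((T + 1, N + 1), np.inf)
    dp[0, :] = 0.0
    for i in range(1, T + 1):
        for j in range(1, N + 1):
            emit = -log_probs[j - 1, target[i - 1]]
            dp[i, j] = min(dp[i, j - 1],             # leave position j unused
                           dp[i - 1, j - 1] + emit)  # align target i to position j
    return dp[T, N]
```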
[“The continuous categorical: a novel simplex-valued exponential family” cont.]
...is always at a vertex. Probabilities for vertices are defined and you can’t get infinite interior likelihoods (like for infinitely concentrated Dirichlets), only on vertices.
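A tiny numerical check of the "mode at a vertex" point, assuming (as I read the paper) a density on the simplex proportional to ∏_k λ_k^{x_k}, i.e. log-linear in x:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.2, 0.5, 0.3])

def unnorm_logpdf(x, lam):
    return x @ np.log(lam)        # log-linear in x, so maximized at a vertex

interior = rng.dirichlet(np.ones(3), size=10000)   # random interior points
vertices = np.eye(3)
print(unnorm_logpdf(interior, lam).max())          # strictly below...
print(unnorm_logpdf(vertices, lam).max())          # ...the best vertex, log(0.5)
```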
[“Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information” cont.]
...variationally estimating it in a minimax game. An n-gram model over bitstrings as a structured discrete distribution over z gives you efficient inference.
Tokenization—the least interesting #NLProc topic? Hell no! We, members of the @BigScienceW tokenization group, are proud to present:
✨Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP✨ arxiv.org/abs/2112.10508
What's in it? [1/10]
@BigscienceW We start by examining the theoretical and linguistic foundation of trying to identify discrete units of language (§2), leading into "old-school" tokenization, these days more often called pretokenization (§3). Are words really obvious? Oh no they aren't... [2/10]
But say we treat words as our atoms: then we probably want to augment our models by also considering the spellings of our "atomic" words (§4)—and this leads us to models that try to *learn* and *find* word boundaries (think of Chinese Word Segmentation) in an unsupervised way (§5)! [3/10]
With @iclr_conf #ICLR2020 over and a bit of sleep under my belt, I'd like to give my short summary of a truly great event---and offer a list of the papers I enjoyed seeing (for those who are into that kind of thing).
In general, I feel lucky to live in a time where we have venues like these, full of really interesting papers at the intersection of NLP and ML (and other fields, but that's what I personally am most into, so my experience is biased).
First off, echoing what everyone else concluded: the website was great. For those who didn't attend, I hope you'll get to see it soon. Having a prerecorded 5-minute talk for each paper, along with slides you could click through, made for excellent paper browsing, to my mind: