Sabrina J. Mielke
Jul 19, 2020 · 43 tweets
My first #ICML2020 was different from my n-th #acl2020nlp, but, or perhaps because of that, I did try to look for interesting papers that I could relate to but that might still teach me something new!

Papers, in roughly chronological order---each with a short summary :) [1/42]
“How Good is the Bayes Posterior in Deep Neural Networks Really?” (Florian Wenzel/@flwenz, Kevin Roth, @BasVeeling, Jakub Swiatkowski, Linh Tran, @s_mandt, @JasperSnoek, @TimSalimans, @RJenatton, Sebastian Nowozin)

arxiv.org/abs/2002.02405


#ICML2020 [2/42]
[“How Good is the Bayes Posterior in Deep Neural Networks Really?” cont.]

As shown in @andrewgwils’ awesome tutorial, tempering works, probably because of bad priors?

#ICML2020 [3/42]
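
For intuition, a minimal sketch of what tempering means here (my toy code, not the paper's SG-MCMC machinery): scale the log posterior by 1/T; T < 1 gives the "cold" posteriors that empirically work better.

    import numpy as np

    def tempered_log_posterior(theta, log_prior, log_lik, data, T=1.0):
        # Cold/tempered posterior: p_T(theta | data) proportional to exp(U(theta) / T),
        # where U is the usual log posterior. T = 1 is the Bayes posterior, T < 1 sharpens it.
        U = log_prior(theta) + sum(log_lik(theta, x) for x in data)
        return U / T

    # Toy example: Gaussian prior, Gaussian likelihood around scalar observations.
    log_prior = lambda th: -0.5 * th ** 2
    log_lik = lambda th, x: -0.5 * (x - th) ** 2
    data = [0.8, 1.2, 1.0]
    print(tempered_log_posterior(1.0, log_prior, log_lik, data, T=0.5))
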
“Improving the Gating Mechanism of Recurrent Neural Networks” (Albert Gu, Caglar Gulcehre/@caglarml, Thomas Paine, Matthew Hoffman, Razvan Pascanu)

arxiv.org/abs/1910.09890

Initialize more randomly (uniform), and saturate faster!

#ICML2020 [4/42]
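
A minimal sketch of the uniform-gate-initialization idea under assumed details (not the authors' code): draw the desired initial gate activations uniformly in (0, 1) and back the biases out through the inverse sigmoid, so gates start spread out rather than all near 0.5.

    import numpy as np

    def uniform_gate_bias(hidden_size, eps=1e-3, seed=0):
        # Sample target initial gate activations u ~ Uniform(eps, 1 - eps), then set
        # bias = logit(u): with near-zero input contributions at init, sigmoid(bias) = u.
        rng = np.random.default_rng(seed)
        u = rng.uniform(eps, 1 - eps, size=hidden_size)
        return np.log(u) - np.log(1 - u)

    print(uniform_gate_bias(8))
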
“ControlVAE: Controllable Variational Autoencoder” (Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher)

arxiv.org/abs/2004.05988

Do something like PID on the beta of a beta-VAE!

#ICML2020 [5/42]
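
Roughly, the control loop looks like this; a hedged sketch with made-up gains, not the authors' exact (nonlinear) PI controller:

    class BetaController:
        # Minimal PID-style controller for the beta of a beta-VAE.
        # Hypothetical gains and limits; the paper tunes its own variant.
        def __init__(self, target_kl, kp=0.01, ki=0.001, kd=0.0, beta_min=0.0, beta_max=1.0):
            self.target_kl, self.kp, self.ki, self.kd = target_kl, kp, ki, kd
            self.beta_min, self.beta_max = beta_min, beta_max
            self.integral, self.prev_err = 0.0, 0.0

        def step(self, current_kl):
            err = self.target_kl - current_kl        # negative if the KL is too large
            self.integral += err
            deriv, self.prev_err = err - self.prev_err, err
            # KL above target -> negative error -> raise beta to push the KL back down.
            beta = -(self.kp * err + self.ki * self.integral + self.kd * deriv)
            return min(max(beta, self.beta_min), self.beta_max)

    ctrl = BetaController(target_kl=25.0)
    print([round(ctrl.step(kl), 3) for kl in [80.0, 60.0, 40.0, 26.0, 24.0]])
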
“A Chance-Constrained Generative Framework for Sequence Optimization” (Xianggen Liu, Jian Peng, Qiang Liu, Sen Song)

proceedings.icml.cc/static/paper_f…

Add a Lagrangian term enforcing probabilistic grammaticality of the molecule strings as a constraint, then iterate back and forth between optimizing the objective and updating the multiplier.

#ICML2020 [6/42]
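
The general alternating pattern as a toy (generic Lagrangian relaxation with a made-up objective and constraint, not the paper's molecule setup):

    def alternate_lagrangian(objective, violation, x0, lam0=0.0, outer=25, inner=50, lr=0.02):
        # Alternate: (a) improve x on L(x, lam) = objective(x) - lam * violation(x),
        #            (b) raise the multiplier lam while the constraint is still violated.
        x, lam, eps = x0, lam0, 1e-4
        for _ in range(outer):
            for _ in range(inner):
                grad = ((objective(x + eps) - lam * violation(x + eps))
                        - (objective(x - eps) - lam * violation(x - eps))) / (2 * eps)
                x += lr * grad                       # finite-difference ascent step on x
            lam = max(0.0, lam + violation(x))       # multiplier update
        return x, lam

    # Toy: maximize -(x - 3)^2 subject to x <= 2 (violation = max(0, x - 2)); optimum is x = 2.
    x, lam = alternate_lagrangian(lambda x: -(x - 3) ** 2, lambda x: max(0.0, x - 2.0), x0=0.0)
    print(round(x, 2), round(lam, 2))
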
“Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning” (Qing Li/@Sealiqing, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu)

arxiv.org/abs/2006.06649


#ICML2020 [7/42]
[“Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning” cont.]

Search over possible parses of the recognized hand-written formulas to find one that agrees with the known solution---works better than trying straight-up RL.

#ICML2020 [8/42]
“How recurrent networks implement contextual processing in sentiment analysis” (Niru Maheswaranathan/@niru_m, @SussilloDavid)

arxiv.org/abs/2004.08013


#ICML2020 [9/42]
[“How recurrent networks implement contextual processing in sentiment analysis” cont.]

Look at RNNs as the dynamical systems they are---states evolve given input and you can see modifiers excite the state away from manifolds…

#ICML2020 [10/42]
[“Topological Autoencoders” cont.]

Use touching time when expanding spheres from data points as an indicator of topology and try to make distance match up in latents for these connections.

#ICML2020 [12/42]
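
A toy 0-dimensional version of that idea (assuming I read it right): the "touching times" are exactly the minimum-spanning-tree edge lengths of the point cloud, and the loss asks latent distances to match data distances on those edges, in both directions.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def topo_loss_0dim(X, Z):
        # 0-dim persistence pairings of a Vietoris-Rips filtration are the MST edges:
        # the distances at which growing balls first touch and clusters merge. Penalize
        # the distance mismatch on the data's MST edges and on the latents' MST edges.
        DX, DZ = squareform(pdist(X)), squareform(pdist(Z))
        def one_way(D_src, D_tgt):
            i, j = np.nonzero(minimum_spanning_tree(D_src).toarray())
            return float(np.mean((D_src[i, j] - D_tgt[i, j]) ** 2))
        return one_way(DX, DZ) + one_way(DZ, DX)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    print(topo_loss_0dim(X, X[:, :2]))   # latents = a crude 2-D projection of X
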
“Problems with Shapley-value-based explanations as feature importance measures” (Lizzie Kumar/@ziebrah, Suresh Venkatasubramanian/@geomblog, Carlos Scheidegger/@scheidegger, Sorelle Friedler)

arxiv.org/abs/2002.11097

#ICML2020 [13/42]
[“Problems with Shapley-value-based explanations as feature importance measures” cont.]

The game theoretic view and usual constructions are bad in a few small counterexamples, so be careful!

#ICML2020 [14/42]
“Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers” (Zhuohan Li/@zhuohan123, @Eric_Wallace_, Sheng Shen/@shengs1123, Kevin Lin/@nlpkevinl, Kurt Keutzer, Dan Klein, Joseph Gonzalez/@mejoeyg)

#ICML2020 [15/42]
[“Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers” cont.]

arxiv.org/abs/2002.11794


Bigger models train faster and are more compressible (...as in parameter pruning and quantization).

#ICML2020 [16/42]
“The many Shapley values for model explanation” (Mukund Sundararajan/@modelanalyst, Amir Najmi)

arxiv.org/abs/1908.08474

There are many ways to apply Shapley values to model explanation and they have different weaknesses and oddities---especially when you go to continuous features (Integrated Gradients!).

#ICML2020 [17/42]
“Fast Differentiable Sorting and Ranking” (Mathieu Blondel/@mblondel_ml, Olivier Teboul, Quentin Berthet/@qberthet, Josip Djolonga)

arxiv.org/abs/2002.08871


#ICML2020 [18/42]
[“Fast Differentiable Sorting and Ranking” cont.]

Sorting and in particular ranking are very non-differentiable---their relaxation can be forward-solved in O(n log n)!

#ICML2020 [19/42]
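
Not their algorithm, just to show what a "soft rank" is: an O(n^2) pairwise-sigmoid relaxation for illustration; the paper's contribution is a proper projection-based relaxation solved in O(n log n).

    import numpy as np

    def soft_rank(x, tau=0.1):
        # Differentiable surrogate for ascending 1-based ranks: entry i gets
        # 1 + sum_{j != i} sigmoid((x_i - x_j) / tau), which tends to the hard
        # ranks as tau -> 0. O(n^2), unlike the paper's O(n log n) approach.
        x = np.asarray(x, dtype=float)
        diff = (x[:, None] - x[None, :]) / tau
        return 0.5 + (1.0 / (1.0 + np.exp(-diff))).sum(axis=1)

    print(soft_rank([0.3, -1.0, 2.5]))   # approx. [2., 1., 3.]
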
“Structural Language Models of Code” (Uri Alon/@urialon1, @RoySadaka, @omerlevy_, Eran Yahav/@yahave)

arxiv.org/abs/1910.00577


Predict AST node given all leaf-to-parent paths, contextualized and attended to by the root-to-parent-path.

#ICML2020 [20/42]
“Proving the Lottery Ticket Hypothesis: Pruning is All You Need” (Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir)

arxiv.org/abs/2002.00585

Provable NN approximation by pruning random weights (not neurons tho) in an only polynomially bigger net!

#ICML2020 [21/42]
“Involutive MCMC: One Way to Derive Them All” (Kirill Neklyudov/@k_neklyudov, Max Welling/@wellingmax, Evgenii Egorov/@eeevgen, Dmitry Vetrov)

arxiv.org/abs/2006.16653


#ICML2020 [22/42]
[“Involutive MCMC: One Way to Derive Them All” cont.]

Many MCMC variants can be cast as involution+auxiliary variable. The talk gives a bit of an intuition: involutions preserve stationary density, auxiliary variables help walk through state space.

#ICML2020 [23/42]
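
A tiny instantiation of the template (my toy, not from the paper): auxiliary Gaussian v, involution f(x, v) = (x + v, -v), and the usual MH acceptance; this happens to recover a random-walk kernel.

    import numpy as np

    def involutive_mcmc(log_p, x0, n_steps=5000, step=0.5, seed=0):
        # Generic involutive MCMC step: sample auxiliary v ~ N(0, step^2), apply the
        # involution f(x, v) = (x + v, -v) (note f(f(x, v)) = (x, v), |det Jf| = 1),
        # accept with probability min(1, p(x') q(v') / (p(x) q(v))).
        rng = np.random.default_rng(seed)
        log_q = lambda v: -0.5 * (v / step) ** 2
        x, xs = x0, []
        for _ in range(n_steps):
            v = rng.normal(0.0, step)
            x_new, v_new = x + v, -v
            log_ratio = (log_p(x_new) + log_q(v_new)) - (log_p(x) + log_q(v))
            if np.log(rng.uniform()) < log_ratio:
                x = x_new
            xs.append(x)
        return np.array(xs)

    samples = involutive_mcmc(lambda x: -0.5 * x ** 2, x0=3.0)   # target: N(0, 1)
    print(samples.mean().round(2), samples.std().round(2))
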
“Predictive Sampling with Forecasting Autoregressive Models” (Auke Wiggers/@aukejw, @emiel_hoogeboom)

arxiv.org/abs/2002.09928


Guess what a transformer is gonna predict for next outputs, feed that as inputs for subsequent steps, ...

#ICML2020 [24/42]
[“Predictive Sampling with Forecasting Autoregressive Models” cont.]

...then check whether those indeed came out. If yes, advance as many steps as were right, if not, discard and use the actual output as next input and only advance one step.

#ICML2020 [25/42]
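
The accept/advance loop, sketched with assumed helper names (model_argmax and forecast are hypothetical; this is the scheme as I understood it, not the authors' code):

    def predictive_sampling(model_argmax, forecast, prefix, max_len=20):
        # model_argmax(seq): greedy next-token prediction at every position of seq (one parallel pass).
        # forecast(seq, k): k cheaply guessed future tokens.
        seq = list(prefix)
        while len(seq) < max_len:
            guesses = forecast(seq, k=4)              # cheap draft of the next few tokens
            preds = model_argmax(seq + guesses)       # verify them all in one batched call
            n_ok = 0
            for i, g in enumerate(guesses):
                if preds[len(seq) - 1 + i] == g:      # the model would indeed have produced g
                    n_ok += 1
                else:
                    break
            # Keep the verified guesses, then take the model's own token at the first mismatch.
            seq = seq + guesses[:n_ok] + [preds[len(seq) - 1 + n_ok]]
        return seq[:max_len]

    # Toy check: a "model" that counts up by one, and a forecaster that only gets its first guess right.
    model = lambda s: [t + 1 for t in s]
    fc = lambda s, k: [s[-1] + 1] + [0] * (k - 1)
    print(predictive_sampling(model, fc, prefix=[0]))
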
“Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention” (Angelos Katharopoulos/@angeloskath, Apoorv Vyas/@apoorv2904, Nikolaos Pappas/@nik0spapp, @francoisfleuret)

arxiv.org/abs/2006.16236


#ICML2020 [26/42]
[“Transformers are RNNs: [...]” cont.]

Instead of softmaxing the dot product, kernelize that distribution/similarity (i.e., use feature functions) and exploit associativity to make time and memory linear in sequence length. Works fine sometimes and not so fine other times.

#ICML2020 [27/42]
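
The core identity in numpy for the non-causal case (feature map elu(x)+1, as far as I recall; treat details as assumptions): replace softmax(Q K^T) V by phi(Q) (phi(K)^T V), so the N-by-N attention matrix is never built. The causal version turns the key/value summary into a running sum, hence "transformers are RNNs".

    import numpy as np

    def linear_attention(Q, K, V, eps=1e-6):
        # Kernelized attention: phi(Q) @ (phi(K).T @ V), normalized per query.
        # phi(x) = elu(x) + 1 keeps the features positive. Cost O(N d^2) instead of O(N^2 d).
        phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
        Qf, Kf = phi(Q), phi(K)
        context = Kf.T @ V                        # (d, d_v): summarize keys and values once
        norm = Qf @ Kf.sum(axis=0) + eps          # (N,): per-query normalizer
        return (Qf @ context) / norm[:, None]

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
    print(linear_attention(Q, K, V).shape)        # (6, 4)
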
“Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics” (Debjani Saha, @CandiceSchumann, @DuncanMcelfresh, @johnpdickerson, Michelle Mazurek/@mmazurek_, Michael Tschantz)

arxiv.org/abs/2001.00089


#ICML2020 [28/42]
[“Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics” cont.]

Compare four fairness metrics, ask people to apply, explain, and judge them. Interesting (if moderately discouraging/worrying) results. Food for thought!

#ICML2020 [29/42]
“Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions” (@_AhmedAlaa_, Mihaela van der Schaar/@MihaelaVDS)

arxiv.org/abs/2006.13707

If you were to “retrain” RNNs on subsets of the training data, [...]

#ICML2020 [30/42]
[“Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions” cont.]

...you’d get confidence intervals---now don’t retrain but adapt parameters to “remove” training data through some Hessian and influence function magic.

#ICML2020 [31/42]
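
The flavor of the trick on a toy 1-D least-squares problem (the standard influence-function approximation; the paper's blockwise RNN version is fancier):

    import numpy as np

    # Toy: fit y ~ w * x by least squares, loss_i = 0.5 * (w * x_i - y_i)^2.
    x = np.array([0.5, 1.0, 1.5, 2.0])
    y = 2.0 * x + np.array([0.1, -0.1, 0.05, 0.0])
    w_hat = (x @ y) / (x @ x)                          # fit on all data

    grad = lambda w, i: (w * x[i] - y[i]) * x[i]       # per-example gradient
    hess = x @ x                                       # Hessian of the total loss (a scalar here)

    block = [1, 2]                                     # the training block we pretend to remove
    # Influence-function update: shift parameters by H^{-1} times the removed gradients.
    w_approx = w_hat + sum(grad(w_hat, i) for i in block) / hess

    keep = np.ones_like(x, dtype=bool); keep[block] = False
    w_exact = (x[keep] @ y[keep]) / (x[keep] @ x[keep])   # actually retrain without the block
    print(round(w_approx, 4), round(w_exact, 4))          # close, with no retraining
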
“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” (Robert Peharz/@ropeharz, Steven Lang, Antonio Vergari/@tetraduzione, Karl Stelzner, Alejandro Molina/@alejom_ml, @martin_trapp, [...wow not even the author list fits...]

#ICML2020 [32/42]
[“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” cont.]

...Guy Van den Broeck/@guyvdb, Kristian Kersting/@kerstingAIML, Zoubin Ghahramani)

arxiv.org/abs/2004.06231


#ICML2020 [33/42]
[“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” cont.]

Probabilistic circuits (products and weighted sums of distributions) are cool but slow; batching leads to Einsum networks!

#ICML2020 [34/42]
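
To see why batching helps, a toy sum-product layer in log space (my sketch of the general idea; the real EiNets do this with a numerically stabilized "log-einsum-exp" contraction):

    import numpy as np
    from scipy.special import logsumexp

    def sum_product_layer(log_left, log_right, log_w):
        # Product nodes: every (i, j) pair of left/right children -> log p_i + log q_j.
        # Sum nodes: K mixtures over those products with weights w[k, i, j]
        # (normalized over (i, j) for each k). Shapes: (B, I), (B, J), (K, I, J) -> (B, K).
        prod = log_left[:, :, None] + log_right[:, None, :]           # (B, I, J)
        return logsumexp(log_w[None] + prod[:, None], axis=(2, 3))    # (B, K)

    B, I, J, K = 4, 3, 2, 5
    rng = np.random.default_rng(0)
    log_left = np.log(rng.dirichlet(np.ones(I), size=B))    # stand-in leaf log-likelihoods
    log_right = np.log(rng.dirichlet(np.ones(J), size=B))
    log_w = np.log(rng.dirichlet(np.ones(I * J), size=K)).reshape(K, I, J)
    print(sum_product_layer(log_left, log_right, log_w).shape)        # (4, 5)
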
“Aligned Cross Entropy for Non-Autoregressive Machine Translation” (Marjan Ghazvininejad/@gh_marjan, Vladimir Karpukhin, @LukeZettlemoyer, @omerlevy_)

arxiv.org/abs/2004.01655


#ICML2020 [35/42]
[“Aligned Cross Entropy for Non-Autoregressive Machine Translation” cont.]

Find the best monotonic alignment (Viterbi) so a non-autoregressive MT model isn’t coaxed into predicting things multiple times in the hope of getting one in the *right* position.

#ICML2020 [36/42]
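
A rough DP for the monotonic-alignment part (my simplified recursion, not the exact AXE loss with its skip/blank penalties):

    import numpy as np

    def best_monotonic_alignment_cost(log_probs, target):
        # log_probs: (T, V) per-position log-probabilities of a non-autoregressive decoder;
        # target: list of token ids. Returns the cheapest monotonic assignment of each
        # target token to a distinct prediction position.
        # dp[i, j] = best cost of aligning the first i targets within the first j positions.
        T, n = len(log_probs), len(target)
        dp = np.full((n + 1, T + 1), np.inf)
        dp[0, :] = 0.0                                    # no targets left to place: free
        for i in range(1, n + 1):
            for j in range(i, T + 1):
                emit = -log_probs[j - 1][target[i - 1]]   # put target i at position j
                dp[i, j] = min(dp[i - 1, j - 1] + emit,   # use position j for target i
                               dp[i, j - 1])              # or skip position j
        return dp[n, T]

    # Toy: 4 prediction positions over a 3-word vocabulary, target sequence "0 2".
    logp = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.6, 0.3, 0.1],
                            [0.1, 0.1, 0.8],
                            [0.3, 0.3, 0.4]]))
    print(round(best_monotonic_alignment_cost(logp, [0, 2]), 3))   # 0.58
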
“The continuous categorical: a novel simplex-valued exponential family” (Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John Cunningham)

arxiv.org/abs/2002.08563

A simple distribution over the simplex that is convex, so the mode [...]

#ICML2020 [37/42]
[“The continuous categorical: a novel simplex-valued exponential family” cont.]

...is always at a vertex. Probabilities for vertices are defined and you can’t get infinite interior likelihoods (like for infinitely concentrated Dirichlets), only on vertices.

#ICML2020 [38/42]
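
Concretely, the (unnormalized) density is log-linear in x over the simplex, which is why the mode sits at a vertex; a tiny sketch (the paper also gives the normalizing constant in closed form):

    import numpy as np

    def cc_unnorm_logpdf(x, lam):
        # Unnormalized log-density of the continuous categorical on the simplex:
        # log p(x; lam) = sum_i x_i * log(lam_i) + const. Linear in x, so the density
        # is maximized at a vertex and stays finite everywhere in the interior.
        x, lam = np.asarray(x, float), np.asarray(lam, float)
        assert np.isclose(x.sum(), 1.0) and np.all(x >= 0)
        return float(x @ np.log(lam))

    lam = [0.2, 0.5, 0.3]
    print(cc_unnorm_logpdf([1/3, 1/3, 1/3], lam))   # interior point
    print(cc_unnorm_logpdf([0.0, 1.0, 0.0], lam))   # the vertex with the largest lam: higher
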
“Latent Variable Modelling with Hyperbolic Normalizing Flows” (Joey Bose/@bose_joey, Ariella Smofsky/@asmoog1, Renjie Liao/@lrjconan, Prakash Panangaden/@prakash127, Will Hamilton/@williamleif)

arxiv.org/abs/2002.06336


#ICML2020 [39/42]
[“Latent Variable Modelling with Hyperbolic Normalizing Flows” cont.]

Build hyperbolic normalizing flows using invertible moves to and from tangent spaces and metric-preserving moves between tangent spaces.

#ICML2020 [40/42]
“Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information” (Karl Stratos/@karlstratos, Sam Wiseman)

arxiv.org/abs/2004.03991

Encode data by maximizing mutual information between its origin and the representation, [...]

#ICML2020 [41/42]
[“Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information” cont.]

...variationally estimating it in a minimax game. An n-gram model over bitstrings as a structured discrete distribution over z gives you efficient inference.

#ICML2020 [42/42]
Alright! (with total count only off-by-one) 💪

Hope I didn’t mess any paper up too much (or missed tagging an author)... 😰 Let me know! ☺️

Looking forward to the next #ICML2020 🤩 [43/42, fin.]
