Sabrina J. Mielke
Jul 19, 2020 · 43 tweets
My first #ICML2020 was different from my n-th #acl2020nlp, but, or perhaps because of that, I did try to look for interesting papers that I could relate to but that might still teach me something new!

Papers, in roughly chronological order---each with a short summary :) [1/42]
“How Good is the Bayes Posterior in Deep Neural Networks Really?” (Florian Wenzel/@flwenz, Kevin Roth, @BasVeeling, Jakub Swiatkowski, Linh Tran, @s_mandt, @JasperSnoek, @TimSalimans, @RJenatton, Sebastian Nowozin)

arxiv.org/abs/2002.02405


#ICML2020 [2/42]
[“How Good is the Bayes Posterior in Deep Neural Networks Really?” cont.]

As shown in @andrewgwils’ awesome tutorial, tempering works, probably because of bad priors?

#ICML2020 [3/42]
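
For intuition, a minimal sketch of what tempering means here (my toy code, not the paper's SG-MCMC machinery): scale the log posterior by 1/T; T < 1 gives the "cold" posteriors that empirically work better.

    import numpy as np

    def tempered_log_posterior(theta, log_prior, log_lik, data, T=1.0):
        # Cold/tempered posterior: p_T(theta | data) proportional to exp(U(theta) / T),
        # where U is the usual log posterior. T = 1 is the Bayes posterior, T < 1 sharpens it.
        U = log_prior(theta) + sum(log_lik(theta, x) for x in data)
        return U / T

    # Toy example: Gaussian prior, Gaussian likelihood around scalar observations.
    log_prior = lambda th: -0.5 * th ** 2
    log_lik = lambda th, x: -0.5 * (x - th) ** 2
    data = [0.8, 1.2, 1.0]
    print(tempered_log_posterior(1.0, log_prior, log_lik, data, T=0.5))
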
“Improving the Gating Mechanism of Recurrent Neural Networks” (Albert Gu, Caglar Gulcehre/@caglarml, Thomas Paine, Matthew Hoffman, Razvan Pascanu)

arxiv.org/abs/1910.09890

Initialize more randomly (uniform), and saturate faster!

#ICML2020 [4/42]
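
A minimal sketch of the uniform-gate-initialization idea under assumed details (not the authors' code): draw the desired initial gate activations uniformly in (0, 1) and back the biases out through the inverse sigmoid, so gates start spread out rather than all near 0.5.

    import numpy as np

    def uniform_gate_bias(hidden_size, eps=1e-3, seed=0):
        # Sample target initial gate activations u ~ Uniform(eps, 1 - eps), then set
        # bias = logit(u): with near-zero input contributions at init, sigmoid(bias) = u.
        rng = np.random.default_rng(seed)
        u = rng.uniform(eps, 1 - eps, size=hidden_size)
        return np.log(u) - np.log(1 - u)

    print(uniform_gate_bias(8))
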
“ControlVAE: Controllable Variational Autoencoder” (Huajie Shao, Shuochao Yao, Dachun Sun, Aston Zhang, Shengzhong Liu, Dongxin Liu, Jun Wang, Tarek Abdelzaher)

arxiv.org/abs/2004.05988

Do something like PID on the beta of a beta-VAE!

#ICML2020 [5/42]
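
Roughly, the control loop looks like this; a hedged sketch with made-up gains, not the authors' exact (nonlinear) PI controller:

    class BetaController:
        # Minimal PID-style controller for the beta of a beta-VAE.
        # Hypothetical gains and limits; the paper tunes its own variant.
        def __init__(self, target_kl, kp=0.01, ki=0.001, kd=0.0, beta_min=0.0, beta_max=1.0):
            self.target_kl, self.kp, self.ki, self.kd = target_kl, kp, ki, kd
            self.beta_min, self.beta_max = beta_min, beta_max
            self.integral, self.prev_err = 0.0, 0.0

        def step(self, current_kl):
            err = self.target_kl - current_kl        # negative if the KL is too large
            self.integral += err
            deriv, self.prev_err = err - self.prev_err, err
            # KL above target -> negative error -> raise beta to push the KL back down.
            beta = -(self.kp * err + self.ki * self.integral + self.kd * deriv)
            return min(max(beta, self.beta_min), self.beta_max)

    ctrl = BetaController(target_kl=25.0)
    print([round(ctrl.step(kl), 3) for kl in [80.0, 60.0, 40.0, 26.0, 24.0]])
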
“A Chance-Constrained Generative Framework for Sequence Optimization” (Xianggen Liu, Jian Peng, Qiang Liu, Sen Song)

proceedings.icml.cc/static/paper_f…

Add a Lagrangian term enforcing probabilistic grammaticality of the molecule strings as a constraint, then iterate back and forth between optimizing the objective and updating the multiplier.

#ICML2020 [6/42]
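
The general alternating pattern as a toy (generic Lagrangian relaxation with a made-up objective and constraint, not the paper's molecule setup):

    def alternate_lagrangian(objective, violation, x0, lam0=0.0, outer=25, inner=50, lr=0.02):
        # Alternate: (a) improve x on L(x, lam) = objective(x) - lam * violation(x),
        #            (b) raise the multiplier lam while the constraint is still violated.
        x, lam, eps = x0, lam0, 1e-4
        for _ in range(outer):
            for _ in range(inner):
                grad = ((objective(x + eps) - lam * violation(x + eps))
                        - (objective(x - eps) - lam * violation(x - eps))) / (2 * eps)
                x += lr * grad                       # finite-difference ascent step on x
            lam = max(0.0, lam + violation(x))       # multiplier update
        return x, lam

    # Toy: maximize -(x - 3)^2 subject to x <= 2 (violation = max(0, x - 2)); optimum is x = 2.
    x, lam = alternate_lagrangian(lambda x: -(x - 3) ** 2, lambda x: max(0.0, x - 2.0), x0=0.0)
    print(round(x, 2), round(lam, 2))
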
“Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning” (Qing Li/@Sealiqing, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu)

arxiv.org/abs/2006.06649


#ICML2020 [7/42]
[“Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning” cont.]

Search over possible parses of the recognized hand-written formulas to find one that agrees with the known solution---works better than trying straight-up RL.

#ICML2020 [8/42]
“How recurrent networks implement contextual processing in sentiment analysis” (Niru Maheswaranathan/@niru_m, @SussilloDavid)

arxiv.org/abs/2004.08013


#ICML2020 [9/42]
[“How recurrent networks implement contextual processing in sentiment analysis” cont.]

Look at RNNs as the dynamical systems they are---states evolve given input and you can see modifiers excite the state away from manifolds…

#ICML2020 [10/42]
[“Topological Autoencoders” cont.]

Use touching time when expanding spheres from data points as an indicator of topology and try to make distance match up in latents for these connections.

#ICML2020 [12/42]
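
A toy 0-dimensional version of that idea (assuming I read it right): the "touching times" are exactly the minimum-spanning-tree edge lengths of the point cloud, and the loss asks latent distances to match data distances on those edges, in both directions.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def topo_loss_0dim(X, Z):
        # 0-dim persistence pairings of a Vietoris-Rips filtration are the MST edges:
        # the distances at which growing balls first touch and clusters merge. Penalize
        # the distance mismatch on the data's MST edges and on the latents' MST edges.
        DX, DZ = squareform(pdist(X)), squareform(pdist(Z))
        def one_way(D_src, D_tgt):
            i, j = np.nonzero(minimum_spanning_tree(D_src).toarray())
            return float(np.mean((D_src[i, j] - D_tgt[i, j]) ** 2))
        return one_way(DX, DZ) + one_way(DZ, DX)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    print(topo_loss_0dim(X, X[:, :2]))   # latents = a crude 2-D projection of X
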
“Problems with Shapley-value-based explanations as feature importance measures” (Lizzie Kumar/@ziebrah, Suresh Venkatasubramanian/@geomblog, Carlos Scheidegger/@scheidegger, Sorelle Friedler)

arxiv.org/abs/2002.11097

#ICML2020 [13/42]
[“Problems with Shapley-value-based explanations as feature importance measures” cont.]

The game theoretic view and usual constructions are bad in a few small counterexamples, so be careful!

#ICML2020 [14/42]
“Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers” (Zhuohan Li/@zhuohan123, @Eric_Wallace_, Sheng Shen/@shengs1123, Kevin Lin/@nlpkevinl, Kurt Keutzer, Dan Klein, Joseph Gonzalez/@mejoeyg)

#ICML2020 [15/42]
[“Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers” cont.]

arxiv.org/abs/2002.11794


Bigger models train faster and are more compressible (...as in parameter pruning and quantization).

#ICML2020 [16/42]
“The many Shapley values for model explanation” (Mukund Sundararajan/@modelanalyst, Amir Najmi)

arxiv.org/abs/1908.08474

There are many ways to apply Shapley values to model explanation and they have different weaknesses and oddities---especially when you go to continuous features (Integrated Gradients!).

#ICML2020 [17/42]
“Fast Differentiable Sorting and Ranking” (Mathieu Blondel/@mblondel_ml, Olivier Teboul, Quentin Berthet/@qberthet, Josip Djolonga)

arxiv.org/abs/2002.08871


#ICML2020 [18/42]
[“Fast Differentiable Sorting and Ranking” cont.]

Sorting and in particular ranking are very non-differentiable---their relaxation can be forward-solved in O(n log n)!

#ICML2020 [19/42]
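
Not their algorithm, just to show what a "soft rank" is: an O(n^2) pairwise-sigmoid relaxation for illustration; the paper's contribution is a proper projection-based relaxation solved in O(n log n).

    import numpy as np

    def soft_rank(x, tau=0.1):
        # Differentiable surrogate for ascending 1-based ranks: entry i gets
        # 1 + sum_{j != i} sigmoid((x_i - x_j) / tau), which tends to the hard
        # ranks as tau -> 0. O(n^2), unlike the paper's O(n log n) approach.
        x = np.asarray(x, dtype=float)
        diff = (x[:, None] - x[None, :]) / tau
        return 0.5 + (1.0 / (1.0 + np.exp(-diff))).sum(axis=1)

    print(soft_rank([0.3, -1.0, 2.5]))   # approx. [2., 1., 3.]
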
“Structural Language Models of Code” (Uri Alon/@urialon1, @RoySadaka, @omerlevy_, Eran Yahav/@yahave)

arxiv.org/abs/1910.00577


Predict AST node given all leaf-to-parent paths, contextualized and attended to by the root-to-parent-path.

#ICML2020 [20/42]
“Proving the Lottery Ticket Hypothesis: Pruning is All You Need” (Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir)

arxiv.org/abs/2002.00585

Provable NN approximation by pruning random weights (not neurons tho) in an only polynomially bigger net!

#ICML2020 [21/42]
“Involutive MCMC: One Way to Derive Them All” (Kirill Neklyudov/@k_neklyudov, Max Welling/@wellingmax, Evgenii Egorov/@eeevgen, Dmitry Vetrov)

arxiv.org/abs/2006.16653


#ICML2020 [22/42]
[“Involutive MCMC: One Way to Derive Them All” cont.]

Many MCMC variants can be cast as involution+auxiliary variable. The talk gives a bit of an intuition: involutions preserve stationary density, auxiliary variables help walk through state space.

#ICML2020 [23/42]
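
A tiny instantiation of the template (my toy, not from the paper): auxiliary Gaussian v, involution f(x, v) = (x + v, -v), and the usual MH acceptance; this happens to recover a random-walk kernel.

    import numpy as np

    def involutive_mcmc(log_p, x0, n_steps=5000, step=0.5, seed=0):
        # Generic involutive MCMC step: sample auxiliary v ~ N(0, step^2), apply the
        # involution f(x, v) = (x + v, -v) (note f(f(x, v)) = (x, v), |det Jf| = 1),
        # accept with probability min(1, p(x') q(v') / (p(x) q(v))).
        rng = np.random.default_rng(seed)
        log_q = lambda v: -0.5 * (v / step) ** 2
        x, xs = x0, []
        for _ in range(n_steps):
            v = rng.normal(0.0, step)
            x_new, v_new = x + v, -v
            log_ratio = (log_p(x_new) + log_q(v_new)) - (log_p(x) + log_q(v))
            if np.log(rng.uniform()) < log_ratio:
                x = x_new
            xs.append(x)
        return np.array(xs)

    samples = involutive_mcmc(lambda x: -0.5 * x ** 2, x0=3.0)   # target: N(0, 1)
    print(samples.mean().round(2), samples.std().round(2))
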
“Predictive Sampling with Forecasting Autoregressive Models” (Auke Wiggers/@aukejw, @emiel_hoogeboom)

arxiv.org/abs/2002.09928


Guess what a transformer is gonna predict for next outputs, feed that as inputs for subsequent steps, ...

#ICML2020 [24/42]
[“Predictive Sampling with Forecasting Autoregressive Models” cont.]

...then check whether those indeed came out. If yes, advance as many steps as were right, if not, discard and use the actual output as next input and only advance one step.

#ICML2020 [25/42]
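
The accept/advance loop, sketched with assumed helper names (model_argmax and forecast are hypothetical; this is the scheme as I understood it, not the authors' code):

    def predictive_sampling(model_argmax, forecast, prefix, max_len=20):
        # model_argmax(seq): greedy next-token prediction at every position of seq (one parallel pass).
        # forecast(seq, k): k cheaply guessed future tokens.
        seq = list(prefix)
        while len(seq) < max_len:
            guesses = forecast(seq, k=4)              # cheap draft of the next few tokens
            preds = model_argmax(seq + guesses)       # verify them all in one batched call
            n_ok = 0
            for i, g in enumerate(guesses):
                if preds[len(seq) - 1 + i] == g:      # the model would indeed have produced g
                    n_ok += 1
                else:
                    break
            # Keep the verified guesses, then take the model's own token at the first mismatch.
            seq = seq + guesses[:n_ok] + [preds[len(seq) - 1 + n_ok]]
        return seq[:max_len]

    # Toy check: a "model" that counts up by one, and a forecaster that only gets its first guess right.
    model = lambda s: [t + 1 for t in s]
    fc = lambda s, k: [s[-1] + 1] + [0] * (k - 1)
    print(predictive_sampling(model, fc, prefix=[0]))
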
“Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention” (Angelos Katharopoulos/@angeloskath, Apoorv Vyas/@apoorv2904, Nikolaos Pappas/@nik0spapp, @francoisfleuret)

arxiv.org/abs/2006.16236


#ICML2020 [26/42]
[“Transformers are RNNs: [...]” cont.]

Instead of softmaxing the dot product, kernelize that distribution/similarity (i.e., use feature functions) and exploit associativity to make time and memory linear in sequence length. Works fine sometimes and not so fine other times.

#ICML2020 [27/42]
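
The core identity in numpy for the non-causal case (feature map elu(x)+1, as far as I recall; treat details as assumptions): replace softmax(Q K^T) V by phi(Q) (phi(K)^T V), so the N-by-N attention matrix is never built. The causal version turns the key/value summary into a running sum, hence "transformers are RNNs".

    import numpy as np

    def linear_attention(Q, K, V, eps=1e-6):
        # Kernelized attention: phi(Q) @ (phi(K).T @ V), normalized per query.
        # phi(x) = elu(x) + 1 keeps the features positive. Cost O(N d^2) instead of O(N^2 d).
        phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
        Qf, Kf = phi(Q), phi(K)
        context = Kf.T @ V                        # (d, d_v): summarize keys and values once
        norm = Qf @ Kf.sum(axis=0) + eps          # (N,): per-query normalizer
        return (Qf @ context) / norm[:, None]

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(6, 4)) for _ in range(3))
    print(linear_attention(Q, K, V).shape)        # (6, 4)
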
“Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics” (Debjani Saha, @CandiceSchumann, @DuncanMcelfresh, @johnpdickerson, Michelle Mazurek/@mmazurek_, Michael Tschantz)

arxiv.org/abs/2001.00089


#ICML2020 [28/42]
[“Measuring Non-Expert Comprehension of Machine Learning Fairness Metrics” cont.]

Compare four fairness metrics, ask people to apply, explain, and judge them. Interesting (if moderately discouraging/worrying) results. Food for thought!

#ICML2020 [29/42]
“Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions” (@_AhmedAlaa_, Mihaela van der Schaar/@MihaelaVDS)

arxiv.org/abs/2006.13707

If you were to “retrain” RNNs on subsets of the training data, [...]

#ICML2020 [30/42]
[“Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions” cont.]

...you’d get confidence intervals---now don’t retrain but adapt parameters to “remove” training data through some Hessian and influence function magic.

#ICML2020 [31/42]
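
The flavor of the trick on a toy 1-D least-squares problem (the standard influence-function approximation; the paper's blockwise RNN version is fancier):

    import numpy as np

    # Toy: fit y ~ w * x by least squares, loss_i = 0.5 * (w * x_i - y_i)^2.
    x = np.array([0.5, 1.0, 1.5, 2.0])
    y = 2.0 * x + np.array([0.1, -0.1, 0.05, 0.0])
    w_hat = (x @ y) / (x @ x)                          # fit on all data

    grad = lambda w, i: (w * x[i] - y[i]) * x[i]       # per-example gradient
    hess = x @ x                                       # Hessian of the total loss (a scalar here)

    block = [1, 2]                                     # the training block we pretend to remove
    # Influence-function update: shift parameters by H^{-1} times the removed gradients.
    w_approx = w_hat + sum(grad(w_hat, i) for i in block) / hess

    keep = np.ones_like(x, dtype=bool); keep[block] = False
    w_exact = (x[keep] @ y[keep]) / (x[keep] @ x[keep])   # actually retrain without the block
    print(round(w_approx, 4), round(w_exact, 4))          # close, with no retraining
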
“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” (Robert Peharz/@ropeharz, Steven Lang, Antonio Vergari/@tetraduzione, Karl Stelzner, Alejandro Molina/@alejom_ml, @martin_trapp, [...wow not even the author list fits...]

#ICML2020 [32/42]
[“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” cont.]

...Guy Van den Broeck/@guyvdb, Kristian Kersting/@kerstingAIML, Zoubin Ghahramani)

arxiv.org/abs/2004.06231


#ICML2020 [33/42]
[“Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits” cont.]

Probabilistic circuits (products and weighted sums of distributions) are cool but slow; batching leads to Einsum networks!

#ICML2020 [34/42]
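
To see why batching helps, a toy sum-product layer in log space (my sketch of the general idea; the real EiNets do this with a numerically stabilized "log-einsum-exp" contraction):

    import numpy as np
    from scipy.special import logsumexp

    def sum_product_layer(log_left, log_right, log_w):
        # Product nodes: every (i, j) pair of left/right children -> log p_i + log q_j.
        # Sum nodes: K mixtures over those products with weights w[k, i, j]
        # (normalized over (i, j) for each k). Shapes: (B, I), (B, J), (K, I, J) -> (B, K).
        prod = log_left[:, :, None] + log_right[:, None, :]           # (B, I, J)
        return logsumexp(log_w[None] + prod[:, None], axis=(2, 3))    # (B, K)

    B, I, J, K = 4, 3, 2, 5
    rng = np.random.default_rng(0)
    log_left = np.log(rng.dirichlet(np.ones(I), size=B))    # stand-in leaf log-likelihoods
    log_right = np.log(rng.dirichlet(np.ones(J), size=B))
    log_w = np.log(rng.dirichlet(np.ones(I * J), size=K)).reshape(K, I, J)
    print(sum_product_layer(log_left, log_right, log_w).shape)        # (4, 5)
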
“Aligned Cross Entropy for Non-Autoregressive Machine Translation” (Marjan Ghazvininejad/@gh_marjan, Vladimir Karpukhin, @LukeZettlemoyer, @omerlevy_)

arxiv.org/abs/2004.01655


#ICML2020 [35/42]
[“Aligned Cross Entropy for Non-Autoregressive Machine Translation” cont.]

Find the best monotonic alignment (Viterbi) so a non-autoregressive MT model isn’t coaxed into predicting things multiple times in the hope of getting one in the *right* position.

#ICML2020 [36/42]
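
A rough DP for the monotonic-alignment part (my simplified recursion, not the exact AXE loss with its skip/blank penalties):

    import numpy as np

    def best_monotonic_alignment_cost(log_probs, target):
        # log_probs: (T, V) per-position log-probabilities of a non-autoregressive decoder;
        # target: list of token ids. Returns the cheapest monotonic assignment of each
        # target token to a distinct prediction position.
        # dp[i, j] = best cost of aligning the first i targets within the first j positions.
        T, n = len(log_probs), len(target)
        dp = np.full((n + 1, T + 1), np.inf)
        dp[0, :] = 0.0                                    # no targets left to place: free
        for i in range(1, n + 1):
            for j in range(i, T + 1):
                emit = -log_probs[j - 1][target[i - 1]]   # put target i at position j
                dp[i, j] = min(dp[i - 1, j - 1] + emit,   # use position j for target i
                               dp[i, j - 1])              # or skip position j
        return dp[n, T]

    # Toy: 4 prediction positions over a 3-word vocabulary, target sequence "0 2".
    logp = np.log(np.array([[0.7, 0.2, 0.1],
                            [0.6, 0.3, 0.1],
                            [0.1, 0.1, 0.8],
                            [0.3, 0.3, 0.4]]))
    print(round(best_monotonic_alignment_cost(logp, [0, 2]), 3))   # 0.58
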
“The continuous categorical: a novel simplex-valued exponential family” (Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, John Cunningham)

arxiv.org/abs/2002.08563

A simple distribution over the simplex that is convex, so the mode [...]

#ICML2020 [37/42]
[“The continuous categorical: a novel simplex-valued exponential family” cont.]

...is always at a vertex. Probabilities for vertices are defined and you can’t get infinite interior likelihoods (like for infinitely concentrated Dirichlets), only on vertices.

#ICML2020 [38/42]
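
Concretely, the (unnormalized) density is log-linear in x over the simplex, which is why the mode sits at a vertex; a tiny sketch (the paper also gives the normalizing constant in closed form):

    import numpy as np

    def cc_unnorm_logpdf(x, lam):
        # Unnormalized log-density of the continuous categorical on the simplex:
        # log p(x; lam) = sum_i x_i * log(lam_i) + const. Linear in x, so the density
        # is maximized at a vertex and stays finite everywhere in the interior.
        x, lam = np.asarray(x, float), np.asarray(lam, float)
        assert np.isclose(x.sum(), 1.0) and np.all(x >= 0)
        return float(x @ np.log(lam))

    lam = [0.2, 0.5, 0.3]
    print(cc_unnorm_logpdf([1/3, 1/3, 1/3], lam))   # interior point
    print(cc_unnorm_logpdf([0.0, 1.0, 0.0], lam))   # the vertex with the largest lam: higher
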
“Latent Variable Modelling with Hyperbolic Normalizing Flows” (Joey Bose/@bose_joey, Ariella Smofsky/@asmoog1, Renjie Liao/@lrjconan, Prakash Panangaden/@prakash127, Will Hamilton/@williamleif)

arxiv.org/abs/2002.06336


#ICML2020 [39/42]
[“Latent Variable Modelling with Hyperbolic Normalizing Flows” cont.]

Build hyperbolic normalizing flows using invertible moves to and from tangent spaces and metric-preserving moves between tangent spaces.

#ICML2020 [40/42]
“Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information” (Karl Stratos/@karlstratos, Sam Wiseman)

arxiv.org/abs/2004.03991

Encode data by maximizing mutual information between its origin and the representation, [...]

#ICML2020 [41/42]
[“Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information” cont.]

...variationally estimating it in a minimax game. An n-gram model over bitstrings as a structured discrete distribution over z gives you efficient inference.

#ICML2020 [42/42]
Alright! (with total count only off-by-one) 💪

Hope I didn’t mess any paper up too much (or missed tagging an author)... 😰 Let me know! ☺️

Looking forward to the next #ICML2020 🤩 [43/42, fin.]
