The tutorial slides can be found here: deep-spin.github.io/tutorial/acl.p…

updates here 👇👇

#ACL2019nlp

* End-to-end models learn latent continuous vectors that are useful for downstream tasks, but those might not be as interpretable as structured hidden representations.

Those models are mostly trained using EM, under some strict assumptions.

1. Pre-train external classifier (over cont. latent)

2. Multi-task learning

3. Stochastic latent variables

4. Gradient surrogates

5. Continuous relaxation (Gumbel softmax)

Mainly for two problems: high variance and co-adaptation.

"Control variates", a.k.a. baselines, are a common way to reduce the variance of gradient estimates in reinforcement learning.
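A minimal numpy sketch of REINFORCE with a baseline; the toy 3-action policy and reward function here are made-up illustrations, not the tutorial's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_grad(theta, reward_fn, baseline=0.0, n_samples=2000):
    # REINFORCE: grad = E[(R(a) - b) * d log pi(a|theta) / d theta].
    # Subtracting a baseline b keeps the estimator unbiased
    # (since E[b * dlogp] = 0) while it can greatly reduce variance.
    p = softmax(theta)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        a = rng.choice(len(theta), p=p)
        dlogp = -p.copy()
        dlogp[a] += 1.0          # gradient of log softmax w.r.t. the logits
        grad += (reward_fn(a) - baseline) * dlogp
    return grad / n_samples

# Toy policy over 3 actions; only action 0 is rewarded.
theta = np.zeros(3)
g = reinforce_grad(theta, lambda a: 1.0 if a == 0 else 0.0)
```

Setting the baseline to, e.g., the running mean reward leaves the expectation unchanged but shrinks the variance of the estimate.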

The straight-through estimator, the reparameterization trick, the Gumbel-softmax trick.

* the straight-through estimator: in the forward pass you pretend "z" is discrete so you can make discrete decisions; in the backward pass you pretend "z" is continuous and propagate error gradients through it.

* the reparameterization trick [Kingma and Welling, 2014]

* the Gumbel-softmax trick
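The straight-through idea can be sketched in numpy by pairing a hard forward decision with a soft backward Jacobian; the one-hot/softmax setup here is my illustrative assumption, not code from the tutorial:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def st_forward(logits):
    # forward pass: pretend z is discrete -> emit a hard one-hot decision
    p = softmax(logits)
    z = np.zeros_like(p)
    z[np.argmax(p)] = 1.0
    return z, p

def st_backward(grad_z, p):
    # backward pass: pretend z was the continuous softmax p, and push
    # the incoming gradient through the softmax Jacobian
    # d p_i / d logit_j = p_i * (delta_ij - p_j)
    jac = np.diag(p) - np.outer(p, p)
    return jac @ grad_z

z, p = st_forward(np.array([2.0, 1.0, 0.1]))   # z is exactly one-hot
grad_logits = st_backward(np.array([1.0, 0.0, 0.0]), p)
```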

I am going to skip details but here's a nice read about it:

casmls.github.io/general/2017/0…

and the original paper Jang et al. ICLR 2017

arxiv.org/pdf/1611.01144…

* RL (REINFORCE): unbiased but high-variance estimators of the loss gradient

* Gradient surrogates (Gumbel softmax): biased but low-variance approximations of the loss gradient
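A sketch of the Gumbel-softmax surrogate in numpy (the logits and temperature are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax_sample(logits, tau=0.5):
    # sample Gumbel(0, 1) noise via -log(-log(U)), U ~ Uniform(0, 1)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    # temperature-controlled softmax: as tau -> 0 the sample approaches
    # a one-hot vector, but it stays differentiable for any tau > 0
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

sample = gumbel_softmax_sample(np.array([1.0, 2.0, 3.0]))
```

Because the sample is a deterministic, differentiable function of the logits plus fixed noise, gradients can flow through it, at the cost of bias from the relaxation.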

Here are some examples from the #nlproc literature for different structured prediction problems.

Structured Attention Networks Kim et al. 2017

arxiv.org/abs/1702.00887

Learning Structured Text Representations

arxiv.org/pdf/1705.09207…

pros:

* familiar to NLPers

* all computations are exact so no approximation

cons:

* forward-pass marginals are dense, so we cannot expect them to form a tree; wrong paths may still get probability > 0

* back-prop through dynamic programming is tricky

unrelated to the training signal. I wonder if this might also be why we haven't yet converged to correct grammatical structures.
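One way to see why back-prop through DP is possible yet yields dense marginals: swap Viterbi's max for logsumexp, which turns the recursion into the smooth forward algorithm. A toy chain sketch with made-up scores, not the tutorial's code:

```python
import numpy as np

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def chain_log_partition(scores):
    # scores: (T-1, S, S) transition scores for a length-T chain over S states.
    # Replacing max with logsumexp makes the DP differentiable;
    # d logZ / d scores then gives the (dense) edge marginals.
    alpha = np.zeros(scores.shape[1])
    for t in range(scores.shape[0]):
        alpha = np.array([logsumexp(alpha + scores[t][:, j])
                          for j in range(scores.shape[1])])
    return logsumexp(alpha)

scores = np.array([[[0.5, 1.0], [0.2, -0.3]],
                   [[1.1, 0.0], [-0.5, 0.7]]])
logZ = chain_log_partition(scores)
```

For this tiny example, logZ matches a brute-force logsumexp over all 8 paths, which is what makes the computation exact rather than approximate.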

Andre F. T. Martins

@tsvetomila

@meloncholist

@vnfrombucharest

Check out their paper too, Monday 1:50pm in the poster session.

"Sparse Sequence-to-Sequence Models"

arxiv.org/pdf/1905.05702…
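Sparse seq-to-seq models build on sparsemax [Martins and Astudillo, 2016]; here is a minimal numpy version (my sketch, not the authors' code):

```python
import numpy as np

def sparsemax(z):
    # sparsemax: Euclidean projection of z onto the probability simplex.
    # Unlike softmax, it can assign exactly zero probability to some entries.
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv      # entries that stay nonzero
    k_z = k[support][-1]
    tau = (cssv[support][-1] - 1) / k_z    # threshold to subtract
    return np.maximum(z - tau, 0.0)

p = sparsemax(np.array([2.0, 0.0, 0.0]))   # puts all mass on the first entry
```

With well-separated scores the output is genuinely sparse, which is exactly the property that lets these models rule out wrong paths entirely.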

Full thread here.