The tutorial slides can be found here: deep-spin.github.io/tutorial/acl.p…
updates here 👇👇
#ACL2019nlp
* End-to-end models learn latent continuous vectors that are useful for downstream tasks, but these might not be as interpretable as structured hidden representations.
Those models are mostly trained using EM, under some strict assumptions.
1. Pre-train external classifier (over cont. latent)
2. Multi-task learning
3. Stochastic latent variables
4. Gradient surrogates
5. Continuous relaxation (Gumbel softmax)
Mainly due to two problems: high variance and co-adaptation.
"Control variates" a.k.a baselines are a common way to reduce the variance of reinforcement learning.
Three common gradient surrogates: the straight-through estimator, the reparametrization trick, the Gumbel softmax trick.
* The straight-through estimator: in the forward pass you pretend "z" is discrete so you can make discrete decisions; in the backward pass you pretend "z" is continuous and propagate error gradients through it (sketch after this list).
* the reparameterization trick [Kingma and Welling, 2014]
* the Gumbel softmax trick (sketch a bit further down)
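A rough sketch of the straight-through trick in PyTorch (my example, not the tutorial's): the forward pass uses a hard one-hot "z", the backward pass flows gradients through the soft probabilities.

```python
import torch
import torch.nn.functional as F

def straight_through(logits):
    probs = F.softmax(logits, dim=-1)                        # continuous "z" (backward path)
    index = probs.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(probs).scatter_(-1, index, 1.0)  # discrete "z" (forward path)
    # Forward pass returns the hard one-hot; backward pass sees the gradient of probs.
    return hard + probs - probs.detach()
```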
I am going to skip the details, but here's a nice read about the Gumbel softmax:
casmls.github.io/general/2017/0…
and the original paper Jang et al. ICLR 2017
arxiv.org/pdf/1611.01144…
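For reference, a rough sketch of Gumbel softmax sampling (the temperature and shapes here are illustrative assumptions on my part):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=1.0):
    # Sample Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # Perturb the logits with the noise and relax the argmax into a softmax.
    return F.softmax((logits + gumbel) / temperature, dim=-1)
```

PyTorch also ships torch.nn.functional.gumbel_softmax, which additionally has a hard (straight-through) mode.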
* RL (REINFORCE): unbiased but high-variance estimators of the loss gradient
* Gradient surrogates (Gumbel softmax): biased but low-variance approximations of the loss gradient
Here are some examples from the #nlproc literature for different structured prediction problems.
Structured Attention Networks Kim et al. 2017
arxiv.org/abs/1702.00887
Learning Structured Text Representations
arxiv.org/pdf/1705.09207…
pros:
* familiar to NLPers
* all computations are exact so no approximation
cons:
* forward-pass marginals are dense, so we cannot expect them to give a tree; wrong paths may still have > 0 probability (tiny illustration after this list)
* back-prop through Dynamic programming is tricky
unrelated to the training signal. I wonder if this might also be why we don't yet converge to correct grammatical structures.
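A tiny illustration of that dense-marginals point (mine, with made-up scores): softmax-style marginals are strictly positive, so even clearly wrong candidate structures keep some probability mass.

```python
import numpy as np

scores = np.array([4.0, 1.5, -2.0])               # scores for 3 candidate trees
marginals = np.exp(scores) / np.exp(scores).sum()
print(marginals)                                  # roughly [0.92, 0.08, 0.002]; every entry > 0
```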
André F. T. Martins
@tsvetomila
@meloncholist
@vnfrombucharest
Also check out their paper, Monday 1:50pm in the poster session.
"Sparse Sequence-to-Sequence Models"
arxiv.org/pdf/1905.05702…
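The paper builds on sparse probability mappings (sparsemax / entmax). A small numpy sketch of sparsemax, written by me for illustration rather than taken from the authors' code:

```python
import numpy as np

def sparsemax(z):
    # Euclidean projection of the score vector z onto the probability simplex;
    # unlike softmax, the output can contain exact zeros.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum        # entries that stay in the support
    k_z = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_z      # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.0, 0.8, -1.0])))   # -> [0.6, 0.4, 0.0]: sums to 1, with an exact zero
```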
Full thread here.