On NMT Search Errors and Model Errors: Cat Got Your Tongue?
by Felix Stahlberg & Bill Byrne
aclweb.org/anthology/D19-…
Our current NMT models might be "wrong"🚫
👇thread 👇 1/7
The surprise is that in 51.8% of cases, the model's single highest-scoring translation — found by exact search, not beam search — is the empty sequence.
In other words: for half of the inputs, your model thinks the best translation is to say nothing at all.
This is not a case of "Even a fool, when he keeps silent, is considered wise".
Longer sequences have lower probability: every extra token multiplies in another factor below 1. So while a lone EOS token (i.e., an empty translation) is itself very unlikely, it can still outscore anything long.
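To put toy numbers on this — purely illustrative, not from the paper — here is a minimal sketch of the arithmetic:

```python
import math

# Toy numbers, purely illustrative -- not from the paper.
# Suppose the model assigns p(EOS | source) = 0.05 at the very
# first step (the empty translation), and an average per-token
# probability of 0.85 to a fluent 20-token translation + EOS.

p_empty  = 0.05          # one step: emit EOS immediately
p_fluent = 0.85 ** 21    # 20 tokens + final EOS, 0.85 each

print(f"P(empty)  = {p_empty:.3f}  (log {math.log(p_empty):+.2f})")
print(f"P(fluent) = {p_fluent:.3f}  (log {math.log(p_fluent):+.2f})")

# 0.050 > 0.033: the model ranks "say nothing" above a translation
# whose every single step looked confident. Beam search usually
# prunes the empty hypothesis early, which is why exact search is
# needed to expose it.
assert p_empty > p_fluent
```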
This is very disturbing, as we are all playing with models whose single most preferred output is useless.
It also points to two unsatisfactory practices:
- inference is done differently than training: we train with teacher forcing on per-token likelihood, then decode with beam search
- the length of translations is handled implicitly, by when the model emits EOS. Is this the best way? (see the sketch below)
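For context, most systems work around this by not trusting the raw model score at decoding time: hypotheses are rescored with a length penalty, e.g. the GNMT one of Wu et al. (2016). A minimal sketch of that standard heuristic (reusing the toy numbers from above; this is not the paper's code):

```python
def gnmt_length_penalty(length: int, alpha: float = 0.6) -> float:
    # Length penalty from Wu et al. (2016), "Google's Neural
    # Machine Translation System": grows with hypothesis length.
    return ((5 + length) ** alpha) / ((5 + 1) ** alpha)

def normalized_score(log_prob: float, length: int) -> float:
    # Dividing the raw log-probability by the penalty boosts
    # longer hypotheses at decoding time.
    return log_prob / gnmt_length_penalty(length)

# Reusing the toy log-probs from the snippet above:
print(normalized_score(-3.00, length=1))    # empty (just EOS): -3.00
print(normalized_score(-3.41, length=21))   # 20 tokens + EOS: ~ -1.41
```

With normalization the fluent translation wins again — but note what that means: a decoding-time heuristic is overriding the model, which on its own still ranks the empty output highest.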