Luheng He on deep SRL: homes.cs.washington.edu/~luheng/files/… #acl2017nlp
previous SOTA is a CRF with neural potentials; here the model just trains greedily (?) and uses Viterbi decoding at test time
the NN isn't smart enough to learn on its own what well-formed BIO taggings look like, so hard constraints are added at decoding time
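(For concreteness, a minimal sketch of what constrained Viterbi decoding over per-token tag scores could look like. This is my own illustration, not the paper's code; the scoring and tag set are placeholders, and the only hard constraint encoded is that I-X must follow B-X or I-X.)

```python
import numpy as np

def viterbi_bio(scores, tags):
    """scores: [num_tokens, num_tags] local scores (e.g. per-token log-probs).
    tags: tag strings like 'O', 'B-ARG0', 'I-ARG0', ...
    Returns the highest-scoring tag sequence that is a well-formed BIO tagging."""
    T, K = scores.shape
    # Hard transition constraints: I-X may only follow B-X or I-X.
    trans = np.zeros((K, K))
    for j, to in enumerate(tags):
        if to.startswith("I-"):
            role = to[2:]
            for i, frm in enumerate(tags):
                if frm not in ("B-" + role, "I-" + role):
                    trans[i, j] = -np.inf
    best = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    # The first token cannot start inside a span.
    best[0] = [s if not tag.startswith("I-") else -np.inf
               for s, tag in zip(scores[0], tags)]
    for t in range(1, T):
        cand = best[t - 1][:, None] + trans + scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        best[t] = cand.max(axis=0)
    # Follow backpointers from the best final tag.
    path = [int(best[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return [tags[i] for i in reversed(path)]
```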
8 layers of LSTMs!
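(Roughly the kind of deep tagger being described: a deep stacked LSTM over word embeddings, with a linear layer producing the per-token tag scores the decoder above consumes. The hyperparameters and the plain bidirectional nn.LSTM stacking here are my assumptions; the actual model interleaves forward/backward layers and uses highway connections, which this sketch omits.)

```python
import torch.nn as nn

class DeepLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=300, layers=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 8 stacked LSTM layers (bidirectional here for simplicity).
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=layers,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: [batch, seq_len] -> tag scores: [batch, seq_len, num_tags]
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)
```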
SOTA results with this simple model. What mistakes does it make? Do we still need syntax?
mistakes are things that people find hard: argument vs. adjunct, PP attachment
next, add a constraint requiring predicted argument spans to be syntactic constituents
this helps a lot with gold syntax, but not much with modern parsers---that last 6 F1 matters!
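(Sketch of the syntactic-constraint idea: keep only predicted argument spans whose boundaries match a constituent from a parse. Shown as a post-hoc filter purely for clarity; the helper names and span format are made up, and the real system would enforce this during decoding rather than afterwards.)

```python
def bio_to_spans(tags):
    """Turn a BIO tag sequence into (label, start, end_exclusive) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and not tag.startswith("I-"):
            spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def filter_to_constituents(tags, constituents):
    """constituents: set of (start, end_exclusive) spans from a parser.
    Relabel as O any predicted argument span that is not a constituent."""
    kept = list(tags)
    for label, start, end in bio_to_spans(tags):
        if (start, end) not in constituents:
            kept[start:end] = ["O"] * (end - start)
    return kept
```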
