Check out our new awesome word aligner, AWESOME aligner by @ZiYiDou 😀: github.com/neulab/awesome…
* Uses multilingual BERT, so it can align words for any language pair the model covers
* No additional training needed, so you can align even a single sentence pair!
* Excellent accuracy 1/3
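The core extraction step can be sketched in a few lines. This is a minimal illustration with toy vectors standing in for mBERT's contextual embeddings (real use would embed both sentences with multilingual BERT first); the softmax-and-intersect scheme and the `threshold` value are simplifying assumptions, not the tool's exact defaults:

```python
import numpy as np

def extract_alignments(src_emb, tgt_emb, threshold=0.001):
    """Align words via a similarity matrix between contextual embeddings,
    normalized in both directions; keep pairs probable in both."""
    sim = src_emb @ tgt_emb.T                                  # (src_len, tgt_len)
    # softmax over target words for each source word, and vice versa
    p_src2tgt = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    p_tgt2src = np.exp(sim) / np.exp(sim).sum(axis=0, keepdims=True)
    # a pair (i, j) counts as aligned only if both directions agree
    agree = (p_src2tgt * p_tgt2src) > threshold
    return sorted(zip(*np.nonzero(agree)))
```

Because the embeddings come from a pretrained multilingual model, nothing here needs parallel training data, which is what makes single-sentence-pair alignment possible.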
A paper describing the methodology will appear at #EACL2021: arxiv.org/abs/2101.08231
The model is trained on parallel data using contrastive and self-training losses. But it generalizes zero-shot to new language pairs without any training data! 2/3
Why do we need word alignments in the first place? We use them for lexicon learning, model analysis, and cross-lingual learning. For example, AWESOME aligner improves annotation projection for cross-lingual NER. Check it out, and we welcome comments/issues! 3/3
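Annotation projection with word alignments is simple to sketch: copy each source-side label across its alignment link. The helper below is a hypothetical illustration, not the paper's pipeline (real projection also has to handle fertility and span consistency):

```python
def project_labels(src_labels, alignments, tgt_len, default="O"):
    """Project token-level labels from source to target along
    word-alignment links (i, j); unaligned target tokens get `default`."""
    tgt_labels = [default] * tgt_len
    for i, j in alignments:
        tgt_labels[j] = src_labels[i]
    return tgt_labels

# e.g. NER tags for "Barack Obama visited Paris", monotone alignment
src = ["B-PER", "I-PER", "O", "B-LOC"]
links = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(project_labels(src, links, 4))  # ['B-PER', 'I-PER', 'O', 'B-LOC']
```

Better alignments mean fewer labels copied to the wrong target token, which is why aligner quality shows up directly in cross-lingual NER scores.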
Super-excited about our new #ICASSP2020 paper on "Universal Phone Recognition with a Multilingual Allophone System" arxiv.org/abs/2002.11800
We create a multi-lingual ASR model that can do zero-shot phone recognition in up to 2,186 languages! How? A little linguistics :) 1/5
In our speech there are phonemes (sounds that can support lexical contrasts in a *particular* language) and their corresponding phones (the sounds that are actually spoken, which are language *independent*). Most multilingual ASR models conflate these two concepts. 2/5
We create a model that first recognizes language-independent phones, and then converts these phones to language-specific phonemes. This makes our underlying representations of phones more universal and generalizable across languages. 3/5
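The phone-to-phoneme step can be sketched as taking, for each phoneme, the best score among its allophones. The tiny inventory below is illustrative only (e.g. English aspirated [pʰ] and plain [p] are both realizations of /p/), not the paper's actual allophone database:

```python
# Hypothetical mini-inventory: each phoneme maps to its allophone phones.
ALLOPHONES = {"p": ["p", "pʰ"], "t": ["t", "tʰ", "ɾ"], "a": ["a"]}
PHONES = ["p", "pʰ", "t", "tʰ", "ɾ", "a"]  # universal phone label set

def phones_to_phonemes(phone_probs):
    """Collapse a universal phone distribution into language-specific
    phoneme scores by maxing over each phoneme's allophones."""
    idx = {ph: i for i, ph in enumerate(PHONES)}
    return {pm: max(phone_probs[idx[ph]] for ph in allos)
            for pm, allos in ALLOPHONES.items()}

probs = [0.1, 0.6, 0.05, 0.05, 0.1, 0.1]   # acoustic model favors [pʰ]
scores = phones_to_phonemes(probs)
print(max(scores, key=scores.get))          # "p": aspirated [pʰ] maps to /p/
```

Because the acoustic model only ever predicts universal phones, swapping in a different language just means swapping the allophone map, which is what enables zero-shot recognition in unseen languages.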