Excited to share our new work:
You have your nice multilingual translation model? Congrats!
...
but what do you do if you want to add a new language (e.g., 🇳🇱) and don't have parallel data (🏴󠁧󠁢󠁥󠁮󠁧󠁿 - 🇳🇱)?
Bonus ✨: you can finally get rid of back-translation
🧵 1/8
If you take a multilingual language model like mBART, add task adapters and fine-tune them together with cross-attention for translation ➡️ this works well for your supervised pairs, but for your new language 🇳🇱, mBART forgets everything it learned before:
2/8
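Here's a minimal sketch of that setup, assuming the Hugging Face mBART checkpoint; which sub-modules stay trainable (matched here by the "encoder_attn" parameter name, i.e. the decoder's cross-attention) is an assumption for illustration, not the paper's actual code:

```python
# Sketch: freeze pretrained mBART and fine-tune only the decoder's
# cross-attention on the supervised translation pairs.
# Assumption: Hugging Face parameter naming ("encoder_attn" = cross-attention).
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

trainable = 0
for name, param in model.named_parameters():
    param.requires_grad = "encoder_attn" in name
    trainable += param.numel() if param.requires_grad else 0
print(f"trainable parameters: {trainable:,}")
```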
So we added denoising adapters. Our recipe is simple (rough sketch after this list):
0️⃣ take mBART
1️⃣ add adapters for your languages and train them to reconstruct monolingual text
2️⃣ fine-tune cross-attention for translation
3/8
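A minimal sketch of steps 0️⃣-1️⃣ (not the paper's actual code): a standard bottleneck adapter plus a toy word-masking noise function. Where exactly the adapters sit inside mBART's layers, and the real noising scheme, are simplified assumptions here.

```python
import random
import torch
import torch.nn as nn

class DenoisingAdapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, d_model: int = 1024, bottleneck: int = 256):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))

def add_noise(tokens: list, mask: str = "<mask>", p: float = 0.35) -> list:
    """Randomly mask tokens; the adapter (with mBART kept frozen) is trained to
    reconstruct the original monolingual sentence from this noised input."""
    return [mask if random.random() < p else t for t in tokens]

# One adapter per language, each trained only on that language's monolingual data.
adapters = {lang: DenoisingAdapter() for lang in ("en", "nl", "be", "el")}
```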
So when you now want to translate from or into 🇳🇱, you just need monolingual data and step 1️⃣. Plug in your nice new 🇳🇱 adapter and off you go
NO catastrophic forgetting
4/8
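Hypothetical plumbing for this step (helper names like set_encoder_adapter and generate_text are made up for illustration, not the paper's API): pick the source- and target-language adapters and decode as usual; no parallel 🇳🇱 data anywhere.

```python
def translate(model, adapters, src_lang: str, tgt_lang: str, text: str) -> str:
    model.set_encoder_adapter(adapters[src_lang])  # assumed helper: adapter on the encoder side
    model.set_decoder_adapter(adapters[tgt_lang])  # assumed helper: adapter on the decoder side
    return model.generate_text(text)               # assumed helper: regular beam-search decoding

# e.g. English -> Dutch without any en-nl parallel data:
# translate(mbart_with_adapters, adapters, "en", "nl", "Denoising adapters are neat.")
```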
Your BLEU scores on the new languages go up, even beating a supervised bilingual model or a fine-tuned mBART
5/8
And it gets better: you can even add languages mBART never saw 🇧🇾 🇬🇷
6/8
✨✨ And here is the bonus: this recipe gives you a way of leveraging monolingual data without back-translation (BT). No back-and-forth, no need for a good reverse model.
The approach on its own is as good as BT, and can be improved further by combining the two
7/8
Joint work with Alexandre Bérard, @laurent_besacie and @mgalle at @naverlabseurope (to be presented at #EMNLP2021)
Paper: arxiv.org/abs/2110.10472
8/8