Ahmet Üstün
Research Scientist at Cohere For AI (@CohereForAI, @CohereAI). Formerly @GroNlp, @naverlabseurope.

Oct 22, 2021, 8 tweets

📜 Excited to share our new work:

You have your nice multilingual translation model? Congrats 🎉
...
but what do you do if you want to add a new language (e.g., 🇳🇱) and don't have parallel data (🏴󠁧󠁢󠁥󠁮󠁧󠁿 - 🇳🇱)?
Bonus ✨: you can finally get rid of back-translation

🧵 1/8

If you take a multilingual language model like mBART, add task adapters and fine-tune them with cross-attention for translation ➡️ this works well for your supervised pairs, but for your new language 🇳🇱, mBART forgets everything it learned before:

2/8
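(For readers who want to see what an adapter looks like: here is a minimal PyTorch sketch of a standard bottleneck adapter of the kind typically inserted into each Transformer layer while the pretrained weights stay frozen. Dimensions and names are illustrative assumptions, not taken from the paper's code.)

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity,
    up-project, plus a residual connection. Only these few weights
    are trained; the surrounding mBART layer stays frozen."""

    def __init__(self, d_model: int = 1024, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # The residual keeps the pretrained representation intact;
        # the adapter learns only a small correction on top of it.
        return hidden + self.up(self.act(self.down(hidden)))
```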

So we added denoising adapters. Our recipe is simple:
0️⃣ take mBART
1️⃣ add adapters for your languages and train them to reconstruct monolingual text (sketch below)
2️⃣ fine-tune cross-attention for translation

3/8
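(A minimal sketch of step 1️⃣, assuming a HuggingFace-style mBART with adapters already inserted: freeze everything except the new language adapter and train it to reconstruct clean monolingual text from a noised version. `noise_fn` and the batch format are illustrative assumptions, not the paper's implementation.)

```python
import torch

def train_language_adapter(mbart, adapter_params, batches, noise_fn, lr=1e-4):
    """Denoising objective on monolingual data only: the adapter learns
    to map noised text back to the original. No parallel data needed."""
    for p in mbart.parameters():
        p.requires_grad = False          # freeze all pretrained weights
    for p in adapter_params:
        p.requires_grad = True           # train only the language adapter
    optimizer = torch.optim.Adam(adapter_params, lr=lr)
    for clean_ids in batches:            # token ids of monolingual sentences
        noisy_ids = noise_fn(clean_ids)  # e.g. mBART-style span masking
        loss = mbart(input_ids=noisy_ids, labels=clean_ids).loss
        loss.backward()                  # gradients reach only the adapter
        optimizer.step()
        optimizer.zero_grad()
```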

So when you want to translate from or into 🇳🇱 now, you just need monolingual data and step 1️⃣. Plug in your nice new 🇳🇱 adapter and off you go (see the sketch below)

NO catastrophic forgetting 😎

4/8
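(Plugging in the new language is then just an adapter swap at inference time. A hypothetical sketch: `set_language_adapter` stands in for whatever hook your adapter implementation exposes; it is not a real mBART API.)

```python
def translate(model, tokenizer, text, src_adapter="en", tgt_adapter="nl"):
    # Activate the source-side and target-side language adapters.
    # `set_language_adapter` is an illustrative hook, not a real API;
    # cross-attention was already fine-tuned once, in step 2.
    model.model.encoder.set_language_adapter(src_adapter)
    model.model.decoder.set_language_adapter(tgt_adapter)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```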

Your BLEU scores on new languages go up, performing even better than a supervised bilingual model or a fine-tuned mBART

5/8

But it gets even better: you can even add languages mBART never saw 🇧🇾 🇬🇷

6/8

✨✨ And here is the bonus: this recipe gives you a way of leveraging monolingual data without using back-translation (BT). No back-and-forth, no need for a good reverse model.
This approach is as good as BT, and can be improved further by combining the two

7/8

Joint work with Alexandre Bérard, @laurent_besacie and @mgalle at @naverlabseurope (to be presented at #EMNLP2021)
Paper: arxiv.org/abs/2110.10472

8/8
