You have your nice multilingual translation model? Congrats 🎉
...
but what do you do if you want to add a new language (e.g., 🇳🇱) and don't have parallel data (🏴 - 🇳🇱)?
Bonus✨: you can finally get rid of back-translation
🧵1/8
Take a multilingual language model like mBART, add task adapters, and fine-tune them together with the cross-attention for translation ➡️ this works well for your supervised pairs, but for your new language 🇳🇱, mBART forgets everything it learned before (adapter sketch below).
2/8
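Roughly what one of those adapters looks like: a small residual bottleneck inserted after each frozen Transformer sub-layer. A minimal PyTorch sketch, not the paper's exact code (the class name and dimensions here are illustrative):

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small residual bottleneck placed after a frozen mBART sub-layer."""
    def __init__(self, d_model=1024, bottleneck=64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden_states):
        # Residual connection: the frozen layer's output passes through
        # untouched, plus a small learned correction.
        z = self.layer_norm(hidden_states)
        z = self.up(torch.relu(self.down(z)))
        return hidden_states + z
```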
So we added denoising adapters. Our recipe is simple (sketch after the list):
0️⃣ take mBART
1️⃣ add adapters for your languages and train them to reconstruct monolingual text
2️⃣ fine-tune cross-attention for translation
3/8
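In code, the recipe mostly comes down to what you freeze at each stage. A hedged sketch on top of HuggingFace's mBART: it assumes adapters like the one above were inserted with "adapter" in their parameter names, and it leaves out the actual training loops:

```python
from transformers import MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# 1️⃣ Denoising adapters: freeze all of mBART, train only the adapter weights
#    to reconstruct (noised) monolingual text in each language.
for name, param in model.named_parameters():
    param.requires_grad = "adapter" in name  # assumes inserted adapters carry "adapter" in their names

# ... train on monolingual data with the denoising objective ...

# 2️⃣ Translation: freeze mBART *and* the adapters, fine-tune only the decoder's
#    cross-attention (and its layer norm) on the parallel pairs you do have.
for name, param in model.named_parameters():
    param.requires_grad = "encoder_attn" in name  # cross-attention modules in HF mBART decoder layers
```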
So when you now want to translate from or into 🇳🇱, all you need is monolingual data for step 1️⃣. Plug in your shiny new 🇳🇱 adapter and off you go (sketch below).
NO catastrophic forgetting 😎
4/8
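Plugging it in looks roughly like this (a sketch; `plug_nl_adapter` is a hypothetical stand-in for loading the 🇳🇱 denoising adapter from 1️⃣ and the translation-tuned cross-attention from 2️⃣):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

tok = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25",
                                     src_lang="nl_XX", tgt_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

# plug_nl_adapter(model)  # hypothetical helper: swap in the trained 🇳🇱 adapter weights

batch = tok("Hallo wereld, dit is een test.", return_tensors="pt")
out = model.generate(**batch, decoder_start_token_id=tok.lang_code_to_id["en_XX"])
print(tok.batch_decode(out, skip_special_tokens=True))
```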
Your BLEU scores on the new languages go up, beating even a supervised bilingual model or a fine-tuned mBART
5/8
But it gets even better: you can add languages mBART never saw 🇧🇾 🇬🇷
6/8
✨✨ And here is the bonus: this recipe gives you a way to leverage monolingual data without back-translation (BT). No back-and-forth, no need for a good reverse model.
This approach is as good as BT, and you can still improve it further by adding BT on top
7/8