🧵 If you're using GPT-3 for multilingual applications, read this!
I took a 555-word / 2,928-character English text. It becomes 706 GPT-3 tokens.
The Chinese version of the same text: 2,170 tokens (~3x 🇬🇧).
Hindi: 4,480 tokens (~6x 🇬🇧).
Implications 👇
First off, I only tested this on a very small sample, so I can't say whether it's a general trend. The exact ratios will certainly vary with other texts.
Also, the English and Chinese texts are professionally translated parallel texts.
The Hindi was machine-translated from the English with Google Translate.
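If you want to reproduce this comparison yourself, here's a minimal sketch using OpenAI's tiktoken library, assuming the r50k_base encoding used by the original GPT-3 (davinci) models. The placeholder strings are hypothetical; swap in your own parallel texts:

```python
# Minimal sketch: compare GPT-3 token counts across parallel texts.
# Assumes tiktoken's r50k_base encoding (original GPT-3 / davinci).
import tiktoken

enc = tiktoken.get_encoding("r50k_base")

# Hypothetical placeholders; replace with your own parallel texts.
texts = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "Chinese": "敏捷的棕色狐狸跳过了懒狗。",
    "Hindi": "तेज़ भूरी लोमड़ी आलसी कुत्ते के ऊपर से कूद गई।",
}

baseline = None
for lang, text in texts.items():
    n_tokens = len(enc.encode(text))
    if baseline is None:
        baseline = n_tokens  # first entry (English) is the baseline
    print(f"{lang}: {n_tokens} tokens ({n_tokens / baseline:.1f}x English)")
```

The ratios you get will depend heavily on the texts themselves, so treat any single measurement as a rough indicator, not a constant.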