👇[1/9]
Diffusion models are trained to denoise images scraped from the web. These images are often vulgar or malicious, and many are risky to use (e.g., copyrighted).
Moreover, many ongoing projects apply diffusion models to privacy-sensitive applications such as medical imaging. [2/9]
May 1, 2020 • 10 tweets • 5 min read
We show that adversaries can attack production machine translation systems like Google Translate.
First, train a model to imitate API outputs. Then, transfer adversarial examples from the imitation model.
How to defend? 👇 [1/9]
Model "stealing" can be a goal in itself. It allows adversaries to launch their own competitor service or to avoid long-term API costs.
We "steal" black-box MT systems by querying them with monolingual sentences and training "imitation models" on system outputs. [2/9]
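A minimal sketch of that query-and-imitate loop. The `query_translation_api` stub is a hypothetical stand-in for the victim system's client, and the Marian student model is an illustrative choice, not the thread's exact setup:

```python
# Sketch: "steal" a black-box MT system by querying it with monolingual
# sentences and fine-tuning an imitation model on its outputs.
import torch
from transformers import MarianMTModel, MarianTokenizer

def query_translation_api(sentence: str) -> str:
    # Placeholder for the victim API (hypothetical); replace with a real
    # client. Identity stub here only so the sketch runs end-to-end.
    return sentence

# 1) Query the black box with monolingual source sentences.
monolingual = ["The weather is nice today.", "She bought three apples."]
pairs = [(src, query_translation_api(src)) for src in monolingual]

# 2) Fine-tune an "imitation model" on the (source, API-output) pairs.
name = "Helsinki-NLP/opus-mt-en-de"
tok = MarianTokenizer.from_pretrained(name)
student = MarianMTModel.from_pretrained(name)
opt = torch.optim.AdamW(student.parameters(), lr=3e-5)

for src, hyp in pairs:
    batch = tok(src, return_tensors="pt")
    labels = tok(text_target=hyp, return_tensors="pt").input_ids
    loss = student(**batch, labels=labels).loss  # standard seq2seq NLL
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Once the imitation model is trained, adversarial examples can be crafted against it with white-box access and then transferred to the original API.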
Mar 5, 2020 • 6 tweets • 5 min read
Not everyone can afford to train huge neural models. So, we typically *reduce* model size to train/test faster.
However, for transformers under a fixed compute budget, you should actually *increase* model size to speed up training and inference.
Why? [1/6] 👇
For most people, the go-to strategy is to train small models because they run fast and use little memory. [2/6]
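A minimal sketch of the "train large, then compress" recipe, using PyTorch dynamic quantization as the compression step. The model and sizes are illustrative, not the paper's exact setup:

```python
# Train a large transformer for a fixed wall-clock budget, then compress
# it for deployment instead of training a small model from scratch.
import torch
import torch.nn as nn

# Stand-in for a "large" transformer encoder (illustrative sizes).
large = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, dim_feedforward=4096),
    num_layers=24,
)

# ... train `large` here; bigger models reach a given loss in fewer
# gradient steps, so they can win under the same compute budget ...

# Compress for inference: quantize the Linear layers to int8 weights,
# shrinking memory and speeding up CPU inference.
compressed = torch.quantization.quantize_dynamic(
    large, {nn.Linear}, dtype=torch.qint8
)
```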
Sep 18, 2019 • 6 tweets • 3 min read
Most NLP models treat numbers (e.g., “91”) in the same way as other tokens, i.e., they embed them as vectors. Is this a good representation for downstream numerical tasks such as DROP, math questions, etc.?
Yes! Pre-trained vectors (BERT, GloVe, ELMo) know numbers. [1/6]
We begin by testing QA models on questions that evaluate numerical reasoning (e.g., sorting, comparing, or summing numbers), taken from the DROP dataset. Standard models excel on these types of questions! [2/6]
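A minimal sketch of a value-decoding probe in the spirit of the thread: train a linear probe to read a number's magnitude out of its token embedding, and test extrapolation on held-out larger numbers. The `embed` function is a hypothetical stand-in for a pretrained embedder (GloVe/BERT/ELMo lookup); here it is a random table just so the sketch runs:

```python
# Probe: can a linear model decode a number's value from its embedding?
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
_table = {str(n): rng.normal(size=50) for n in range(1000)}

def embed(token: str) -> np.ndarray:
    return _table[token]  # replace with real pretrained vectors

# Train on small numbers, test extrapolation to larger held-out ones.
train_nums = list(range(0, 100))
test_nums = list(range(100, 200))

probe = Ridge(alpha=1.0)
probe.fit([embed(str(n)) for n in train_nums], train_nums)
preds = probe.predict([embed(str(n)) for n in test_nums])
rmse = float(np.sqrt(np.mean((preds - np.array(test_nums)) ** 2)))
print("extrapolation RMSE:", rmse)
```

With real pretrained vectors in place of the random table, a low RMSE would indicate the embeddings encode number magnitude.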