Sebastian Ruder
Research scientist @GoogleAI • Natural language processing • Transfer learning • Multilinguality • Blog: https://t.co/naxDPsILJU
Apr 23, 2021 8 tweets 3 min read
Really enjoyed today’s ML for NLP session at @eaclmeeting. It included 3 papers on task-agnostic methods to improve pre-trained models for downstream tasks via:

- test-time adaptation w/ meta-learning
- many robust classification heads
- combining adapters (a minimal adapter sketch follows this list).
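
For context: an adapter is a small bottleneck module inserted into a frozen pre-trained Transformer, so only a few parameters are trained per task, which is what makes combining them across tasks cheap. A minimal sketch in PyTorch (the hidden and bottleneck sizes below are illustrative assumptions, and the combination method in the actual paper may differ):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Minimal bottleneck adapter: down-project, non-linearity, up-project,
    plus a residual connection so the pre-trained representation is preserved."""
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))

# Combining adapters can then be as simple as a weighted sum of their outputs.
```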

Here are some notes:

Keep Learning: Self-supervised Meta-learning for Learning from Inference

Fine-tunes the model on its most confident predictions at test time. Class-balanced filtering, meta-learning, and regularization toward the pre-trained weights are all important.

aclweb.org/anthology/2021…
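
A loose sketch of that recipe in PyTorch (pseudo-labeling on confident test predictions with per-class balancing, plus L2 regularization toward the pre-trained weights; the meta-learning component is omitted, and per_class_k / reg_weight are illustrative assumptions, not the paper's values):

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, test_batches, optimizer,
                    per_class_k=8, reg_weight=0.1):
    """Sketch: fine-tune on confident test-time predictions.
    per_class_k and reg_weight are illustrative, not the paper's values."""
    # Frozen copy of the pre-trained weights to regularize toward.
    pretrained = [p.detach().clone() for p in model.parameters()]

    for x in test_batches:                    # unlabeled test inputs
        logits = model(x)
        probs = F.softmax(logits, dim=-1)
        conf, pseudo = probs.max(dim=-1)      # confidence + pseudo-labels

        # Class-balanced filtering: keep the top-k most confident
        # examples for each *predicted* class.
        keep = torch.zeros_like(conf, dtype=torch.bool)
        for c in pseudo.unique():
            idx = (pseudo == c).nonzero(as_tuple=True)[0]
            k = min(per_class_k, idx.numel())
            keep[idx[conf[idx].topk(k).indices]] = True

        loss = F.cross_entropy(logits[keep], pseudo[keep])
        # Stay close to the pre-trained weights so the model does not
        # drift onto its own mistakes.
        reg = sum((p - q).pow(2).sum()
                  for p, q in zip(model.parameters(), pretrained))
        (loss + reg_weight * reg).backward()
        optimizer.step()
        optimizer.zero_grad()
```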
Jun 5, 2019 49 tweets 15 min read
Coming up: a live Twitter thread of Session 8B: Machine Learning @NAACLHLT, with some awesome papers on vocabulary size, subwords, Bayesian learning, multi-task learning, and inductive biases.

First paper: How Large a Vocabulary Does Text Classification Need?
A Variational Approach to Vocabulary Selection aclweb.org/anthology/N19-…
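
The high-level idea: learn a keep/drop gate for each vocabulary word jointly with the classifier, then prune words whose gates are off. A loose sketch of one way to implement this (a Gumbel-sigmoid relaxation of Bernoulli gates with a sparsity penalty; the paper's exact variational objective differs):

```python
import torch
import torch.nn as nn

class GatedEmbedding(nn.Module):
    """Loose sketch: learn a per-word keep gate alongside the embeddings.
    Uses a Gumbel-sigmoid relaxation of Bernoulli gates; this only
    illustrates the idea, not the paper's exact objective."""
    def __init__(self, vocab_size: int, dim: int, temp: float = 0.5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.gate_logits = nn.Parameter(torch.zeros(vocab_size))
        self.temp = temp

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        logits = self.gate_logits[token_ids]
        if self.training:
            # Relaxed Bernoulli "keep this word" gate via logistic noise.
            u = torch.rand_like(logits)
            noise = torch.log(u) - torch.log1p(-u)
            gate = torch.sigmoid((logits + noise) / self.temp)
        else:
            gate = (logits > 0).float()  # prune words with keep-prob < 0.5
        return gate.unsqueeze(-1) * self.emb(token_ids)

    def sparsity_loss(self) -> torch.Tensor:
        # Penalize the expected vocabulary size to push words out.
        return torch.sigmoid(self.gate_logits).sum()
```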
Sep 13, 2018 10 tweets 4 min read
David Silver on Principles for Reinforcement Learning at #DLIndaba2018. Important principles that apply not only to RL but to ML research in general, e.g. leaderboard-driven research vs. hypothesis-driven research (see the slides below).

Principle 2: How an algorithm scales is more important than its starting point. Avoid performance ceilings. Deep learning is successful because it scales so effectively.
Principles are meant to be controversial. I would argue that sample efficiency is at least as important.
Jul 20, 2018 39 tweets 6 min read
#Repl4NLP at #ACL2018 panel discussion:
Q: Given that the amount of data and computing power is rapidly increasing, should we just quit working on models altogether?
Yejin: Sounds like a good idea for the companies. The more data the better. Please create more data.

Meg: Different people have different strengths. People say: "We should all care about ethics." Geek out about what you love. Apply yourself to what you're good at. Lots of other things come to bear besides just working with data, e.g. sociology, psychology, maths, etc.
Jun 5, 2018 22 tweets 4 min read
All-star panel at the Generalization in Deep Learning workshop at @NAACLHLT #Deepgen2018:

"We should have more inductive biases. We are clueless about how to add inductive biases, so we do dataset augmentation and create pseudo training data to encode those biases. Seems like a strange way to go about doing things."
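
As a concrete (hypothetical) illustration of the practice being criticized: to give a text classifier a bias toward lexical invariance, a common workaround is to generate pseudo training data by swapping in synonyms rather than building the invariance into the model. A minimal sketch, assuming a toy synonym table:

```python
import random

# Hypothetical synonym table; in practice this might come from WordNet.
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film"],
    "bad": ["poor", "awful"],
}

def augment(sentence, p=0.3, seed=None):
    """Create pseudo training data encoding a lexical-invariance bias:
    the label should not change when a word is replaced by a synonym."""
    rng = random.Random(seed)
    out = [rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < p else w
           for w in sentence.split()]
    return " ".join(out)

# Each augmented copy keeps the original's label, so the dataset itself
# carries the invariance that the model architecture does not.
print(augment("the movie was good", p=1.0, seed=0))
```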
Mar 31, 2018 16 tweets 3 min read
1/ People (mostly people working in computer vision) say that CV is ahead of other ML application domains by at least 6 months to a year. I would like to explore why this is, whether it is something to be concerned about, and what it might take to catch up.

2/ I can't speak to other application areas, so I will mostly compare CV vs. NLP. This is just a braindump, so feel free to criticize, correct, and disagree.