Sebastian Ruder
Research scientist @DeepMindAI • Natural language processing • Transfer learning • Making ML & NLP accessible @eurnlp @DeepIndaba
13 Sep 19
It's great to see the growing landscape of NLP transfer learning libraries:
- pytorch-transformers by @huggingface: github.com/huggingface/py…
- spacy-pytorch-transformers by @explosion_ai: github.com/explosion/spac…
- FARM by @deepset_ai: github.com/deepset-ai/FARM
@huggingface @explosion_ai @deepset_ai @zalandoresearch @feedly @ai2_allennlp Here's a nice comparison of the target groups and core features of pytorch-transformers, spacy-pytorch-transformers, and FARM, put together by @deepset_ai.
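For context, here is a minimal sketch of the kind of workflow these libraries enable, using pytorch-transformers to extract contextual features from a pretrained BERT model (the snippet and model names are illustrative, not taken from the thread):

```python
# Minimal sketch, assuming `torch` and `pytorch_transformers` are installed.
# The example sentence and the choice of bert-base-uncased are illustrative.
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

# Tokenize and map to vocabulary ids
tokens = tokenizer.tokenize("Transfer learning is changing NLP.")
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

# Extract contextual representations without computing gradients
with torch.no_grad():
    last_hidden_states = model(input_ids)[0]  # shape: (1, seq_len, 768)

print(last_hidden_states.shape)
```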
14 Aug 19
A Survey on Cross-lingual Word Embedding Models has been published in @JAIR_Editor. If you're interested in cross-lingual learning, then this should be a good starting point. It covers the history and points to interesting future directions.
jair.org/index.php/jair…
@JAIR_Editor For a more in-depth review that covers the most recent models, you can check out our book Cross-lingual Word Embeddings: morganclaypoolpublishers.com/catalog_Orig/p…
5 Jun 19
Coming up: A live Twitter thread of Session 8B: Machine Learning @NAACLHLT with some awesome papers on vocabulary size, subwords, Bayesian learning, multi-task learning, and inductive biases
@NAACLHLT First paper: How Large a Vocabulary Does Text Classification Need?
A Variational Approach to Vocabulary Selection aclweb.org/anthology/N19-…
@NAACLHLT Wenhu:
- Typically need to predefine vocabulary to get embeddings
- Most common approach: frequency-based cutoff; can lead to an under-sized or over-sized vocabulary (see the sketch below)
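As a point of reference, here is a minimal sketch of the frequency-based cutoff baseline the talk contrasts with (the toy corpus and threshold are made up for illustration and are not from the paper):

```python
# Minimal sketch of a frequency-based vocabulary cutoff (illustrative only).
from collections import Counter

corpus = [["the", "movie", "was", "great"], ["the", "plot", "was", "thin"]]
min_freq = 2  # cutoff: too high -> under-sized vocab, too low -> over-sized

counts = Counter(tok for sent in corpus for tok in sent)
vocab = {"<unk>": 0}
for tok, freq in counts.most_common():
    if freq >= min_freq:
        vocab[tok] = len(vocab)

# Tokens below the cutoff are mapped to <unk>
ids = [[vocab.get(tok, vocab["<unk>"]) for tok in sent] for sent in corpus]
print(vocab, ids)
```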
13 Sep 18
David Silver on Principles for Reinforcement Learning at the #DLIndaba2018. Important principles that are not only applicable to RL, but to ML research in general. E.g. leaderboard-driven research vs. hypothesis-driven research (see the slides below).
Principle 2. How an algorithm scales is more important than its starting point. Avoid performance ceilings. Deep Learning is successful because it scales so effectively.
Principles are meant to be controversial. I would argue that sample efficiency is at least as important.
Principle 3. Generality (how your algorithm performs on other tasks) is super important. Key is to design a diverse set of challenging tasks.
This. We should evaluate on out-of-distribution data and new tasks.
20 Jul 18
#Repl4NLP at #ACL2018 panel discussion:
Q: Given that the amount of data and computing power is rapidly increasing, should we just quit working on models altogether?
Yejin: Sounds like a good idea for the companies. The more data the better. Please create more data.
Meg: Different people have different strengths. People say: “We should all care about ethics”. Geek out about what you love. Apply yourself to what you love. Lots of other things come to bear besides just working with data, e.g. sociology, psychology, maths, etc.
Important to focus on what you really love. Work with people who have complementary and different interests.
Yoav: Personally don’t work on huge data. If some company would like to train a huge LM on the entire web, that’d be great to have and analyze.
5 Jun 18
All-star panel at the generalization in deep learning workshop at @NAACLHLT #Deepgen2018
: "We should have more inductive biases. We are clueless about how to add inductive biases so we do dataset augmentation, create pseudo training data to encode those biases. Seems like a strange way to go about doing things."
Yejin Choi: Language specific inductive bias is necessary to push NLG. Inductive bias as architectural choices. Current biases are not good at going beyond the sentence-level but language is about more than a sentence. We require building a world model.
31 Mar 18
1/ People (mostly people working with Computer Vision) say that CV is ahead of other ML application domains by at least 6 months to a year. I would like to explore why this is, whether it is something to be concerned about, and what it might take to catch up.
2/ I can’t speak about other application areas, so I will mostly compare CV vs. NLP. This is just a braindump, so feel free to criticize, correct, and disagree.
3/ First, is that really true? For many specialized applications that require task- or domain-specific tools, such as core NLP tasks (parsing, POS tagging, NER), comparing to another discipline is not meaningful.