Sebastian Ruder @seb_ruder
All-star panel at the Generalization in Deep Learning workshop at @NAACLHLT #Deepgen2018
: "We should have more inductive biases. We are clueless about how to add inductive biases so we do dataset augmentation, create pseudo training data to encode those biases. Seems like a strange way to go about doing things."
Yejin Choi: Language-specific inductive bias is necessary to push NLG forward. Inductive bias comes in as architectural choices. Current biases are not good at going beyond the sentence level, but language is about more than a single sentence; this requires building a world model.
A promising direction along these lines: memory networks and entity networks (see the sketch below).
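To make the entity-tracking idea concrete, here is a minimal sketch of a gated per-entity memory update in the spirit of entity networks (EntNet). The shapes, names, and toy parameters are illustrative assumptions of mine, not something described on the panel.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def entnet_update(memories, keys, s, U, V, W):
    """One EntNet-style update of per-entity memory slots.

    memories: (n_slots, d) current memory for each tracked entity
    keys:     (n_slots, d) learned key vector per entity slot
    s:        (d,)         encoding of the current sentence
    U, V, W:  (d, d)       learned projection matrices
    """
    new_memories = []
    for h, w in zip(memories, keys):
        # Gate: how relevant is this sentence to this entity slot?
        g = sigmoid(s @ h + s @ w)
        # Candidate new memory content for this slot
        h_tilde = np.tanh(U @ h + V @ w + W @ s)
        # Gated update, then re-normalize to keep memories bounded
        h_new = h + g * h_tilde
        h_new = h_new / (np.linalg.norm(h_new) + 1e-8)
        new_memories.append(h_new)
    return np.stack(new_memories)

# Toy usage: 3 entity slots, 8-dim states, random parameters
rng = np.random.default_rng(0)
d, n_slots = 8, 3
mem = rng.normal(size=(n_slots, d))
keys = rng.normal(size=(n_slots, d))
U, V, W = (rng.normal(size=(d, d)) for _ in range(3))
sentence = rng.normal(size=d)
mem = entnet_update(mem, keys, sentence, U, V, W)
```

The point of the gate is that each sentence only rewrites the memory slots of the entities it mentions, which is what lets such models track state beyond a single sentence.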
Chris: Inductive bias is not language-specific; entities need to be tracked e.g. by animals for perception. Early work in recursive neural networks can also be applied to vision.
Yejin: World models applicable to other parts, but for the moment, developing models for language (vs. language + perception, etc.) is most feasible.
Sam: Not aware of any recent work using a more linguistically oriented inductive bias that works in practice on a benchmark task.
Percy: Cycle of new paradigms, which throw out previous extensions and start from scratch (rules --> DL).
Q: Is there an endless cycle of collecting data, finding biases, trying to address these biases?
Sam: Building datasets makes vague intuitive definitions more concrete.
It is hard to define tasks precisely enough to break out of the cycle.
Yejin: Revisit how datasets are constructed (balance them, counteract biases). Come up with algorithms that generate datasets.
Chris: The problem is a lack of education. In psychology, students spend a lot of time on experimental design.
A standard CS degree does not include a single lecture on how to design an experiment. In grad school, the vast majority of PhD students have to run experiments without ever having been taught how to do so; the resulting problems are unsurprising.
Yejin: Not clear whether to maintain the natural data distribution or to balance and modify it. Important to run baselines and analyze the biases in a dataset, and to modify the data distribution if the task is too easy (see the baseline sketch below).
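One concrete form such a baseline can take (my illustration, not a method the panel named) is a partial-input baseline: for NLI, train a classifier on the hypothesis alone. If it beats the majority class by a wide margin, labels are leaking through annotation artifacts rather than through inference over the premise.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy NLI examples: (premise, hypothesis, label).
# In practice you would load a real corpus such as SNLI/MultiNLI.
data = [
    ("A man is eating.",  "Nobody is eating.",        "contradiction"),
    ("A dog runs.",       "An animal is outside.",    "neutral"),
    ("Kids play soccer.", "Children are playing.",    "entailment"),
    ("A woman sleeps.",   "Nobody is awake.",         "contradiction"),
    ("A cat sits.",       "An animal might be home.", "neutral"),
    ("Two men talk.",     "People are talking.",      "entailment"),
]
hypotheses = [h for _, h, _ in data]
labels = [y for _, _, y in data]

# Hypothesis-only classifier: it never sees the premise.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(hypotheses, labels)

# Held-out accuracy far above the majority-class rate would indicate
# annotation artifacts (e.g. negation words marking contradiction).
print("train accuracy:", clf.score(hypotheses, labels))
```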
Percy: Six months ago I was really worried about bias; not anymore.
It is hard to rule out bias in general, and a model with 100% accuracy would get everything correct anyway. Better to improve models than to try to eradicate bias. ML is used precisely when we don't know what's going on (vs. rules).
Sam: We cannot break out of the cycle of experimental design (research question --> experiments --> results). We can do better, but we cannot break out of the cycle.
Devi: Synthetic, complex datasets are also useful to build models that do certain things (e.g. CLEVR).
Chris: Psychology is too obsessed with controlled experiments, compared with the more natural data used in ML/NLP.
The indigenous NLP tradition has been replaced by the ML tradition over the last decade. ML people require i.i.d. data; instead, test data should come from a different distribution than the training data.
The requirement of i.i.d. data came in through ML empiricism, but linguistic data is not i.i.d.: any text is associated with other pieces of text, metadata, etc. It is better to have models that generalize to data that is not i.i.d. with the training data.
It is important to return to the indigenous NLP tradition and ignore some of the things ML has brought into NLP.
Q: Other ways to induce inductive bias besides architecture/data?
Yejin: Evaluation metrics like BLEU/ROUGE are not that meaningful; it is important to do more human evaluation (see the BLEU example below).
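For context (my example, not the panel's): BLEU scores n-gram overlap with references, so a paraphrase that fully preserves meaning can still score near zero, which is one reason such metrics are weak proxies for quality. A minimal sketch using the sacrebleu package:

```python
import sacrebleu  # pip install sacrebleu

references = [["The cat is sitting on the mat."]]  # one reference stream
good_paraphrase = ["A cat sits upon the mat."]
verbatim_copy   = ["The cat is sitting on the mat."]

# BLEU rewards exact n-gram overlap, not meaning: the
# meaning-preserving paraphrase scores far below the copy.
print(sacrebleu.corpus_bleu(good_paraphrase, references).score)
print(sacrebleu.corpus_bleu(verbatim_copy, references).score)
```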
Sam: On SQuAD, performance drops even when students ask factoid questions (vs. the MTurk workers who wrote the test data).
Chris: Humans are not good at constructing auxiliary tasks.
Percy: A summarization model trained with a human in the loop on lots of examples would reach human performance.
It would still be pretty far from language understanding.
Q: Will pre-trained models be used in all NLP tasks in the coming years?
Percy: There is room for pre-trained representations in some tasks; for most tasks, we will need to go beyond them.
Q: Should people release more challenge datasets?
Percy: The numbers will look really low. Possibly tag examples with the phenomenon/difficulty they test. The problem is that we don't have challenging datasets that are big enough (for training).
Devi: Don't want to lose easy knowledge. E.g. for VQA, not clear what problems subsume each other.
Q: Possible to have declarative representations to build NLU upon?
Yejin: Possible. We shouldn't repeat what people have tried with symbolic logic. A model could encode natural language in a knowledge representation.
Percy: Knowledge representations are quite different from current, task-oriented mentality. Worth investigating.
Q: For MultiNLI, there is a small gap between in-domain and out-of-domain performance. Do we learn more about the way annotators generate training examples than about natural language phenomena?
Sam: The brittleness of models is not a matter of overfitting to genre.
Q: How do we evaluate the abstractiveness of NLG systems?
Yejin: Good question. One might want to measure whether a summary compresses the text through rewriting vs. mere substitution.
Sam: Abstractiveness is used as a measure in the new Newsroom corpus (NAACL-HLT 2018); a simple proxy is sketched below.
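As a rough illustration of quantifying abstractiveness (my sketch; the Newsroom paper itself uses extractive-fragment coverage and density), one simple proxy is the fraction of summary n-grams that never appear in the source:

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novel_ngram_fraction(source, summary, n=2):
    """Fraction of summary n-grams absent from the source text.

    Higher values suggest a more abstractive (rewritten) summary;
    values near 0 suggest copy-paste extraction.
    """
    src = ngrams(source.lower().split(), n)
    summ = ngrams(summary.lower().split(), n)
    if not summ:
        return 0.0
    return len(summ - src) / len(summ)

source = "the quick brown fox jumps over the lazy dog near the river"
extractive = "the quick brown fox jumps over the lazy dog"
abstractive = "a fast fox leaps across a sleepy dog by the water"
print(novel_ngram_fraction(source, extractive))   # 0.0: pure copy
print(novel_ngram_fraction(source, abstractive))  # close to 1.0
```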
Fin.