Are all word orders equally hard to learn?
Apparently: for RNNs yes, for Transformers no

@JenniferCWhite @sleepinyourhat
aclanthology.org/2021.acl-long.…

github.com/rycolab/artifi… (currently empty)
#NLProc
Really cool, with a caveat.
The paper creates synthetic languages (using a PCFG) with various ordering rules, making it possible to compare the difficulty of each order.
They also add agreement and a vocabulary, to introduce more of the features that matter in real language (e.g., long-distance dependencies).
Finally, the words are pseudo-words that could plausibly have been English (e.g., "daxing").
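To make the setup concrete, here is a minimal sketch of the idea, with my own toy grammar, weights, and pseudo-words (not the paper's): a single "switch" flips the VP production from VO to OV order while everything else in the grammar stays fixed.

```python
import random

# Toy PCFG sketch (my own rules and weights, not the paper's grammar).
# A single "switch" flips the VP production from VO to OV order,
# while the rest of the grammar stays identical.
RULES = {
    "S":  [(("NP", "VP"), 1.0)],
    "VP": [(("V", "NP"), 0.7), (("V",), 0.3)],   # VO order
    "NP": [(("N",), 0.8), (("N", "PP"), 0.2)],
    "PP": [(("P", "NP"), 1.0)],
    "V":  [(("daxes",), 1.0)],
    "N":  [(("wug",), 0.5), (("blicket",), 0.5)],
    "P":  [(("near",), 1.0)],
}

def set_ov(rules):
    """Return a copy of the grammar with the object-before-verb switch on."""
    flipped = dict(rules)
    flipped["VP"] = [(("NP", "V"), 0.7), (("V",), 0.3)]  # OV order
    return flipped

def sample(rules, symbol="S"):
    """Recursively expand a symbol according to the weighted productions."""
    if symbol not in rules:                      # terminal word
        return [symbol]
    expansions, weights = zip(*rules[symbol])
    rhs = random.choices(expansions, weights=weights)[0]
    return [word for sym in rhs for word in sample(rules, sym)]

print(" ".join(sample(RULES)))           # e.g. "wug daxes blicket"
print(" ".join(sample(set_ov(RULES))))   # same grammar, OV order
```

The same trick extends to other switches (subject position, adposition order, etc.), which is how one base grammar yields the whole family of artificial languages.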
Their results show that Transformers do have inductive biases, but not towards the word orders most common in natural languages (e.g., VOS is quite easy for Transformers).
Another interesting finding is that RNNs do not show this, despite everything we know about their strong inductive bias towards forgetting (it is hard for them to relate far-away words/information).
Finally, a lot of what makes a language easy or hard seems related to agreement, at least judging by which switches matter together.
I really like this direction (which they discuss at length; it seems a reviewer was quite annoying), and the results are interesting.

However, honestly, I also have a lot of criticism of this work.
A lot of details are missing. For example, how were the words tokenized? If with BPE, is the morphology split into tokens cleanly? If there is no subword tokenization, then their sophisticated way of choosing words doesn't matter, because the words are converted into one-hot vectors anyway.
How large were the vocabularies? What hyperparameters were the networks trained with? There is work showing that depth changes inductive biases, not only the architecture...
Because of the large number of networks (~128 × 10 repetitions), they train on really small amounts of data. It is hard to say how much that mattered, and whether it is reasonable given a small vocabulary. Either way, it makes the results hard to generalize from.
Last, and this is especially peculiar:
"Weights given to each production were chosen manually through experimentation. Some principles for choosing weights for a grammar in this manner are described by Eisner and Smith (2008)"
So we can't know what kind of sentences they create (balanced trees, long sentences, very low weights on something that would make a phenomenon/switch less interesting), and they don't share the method or even the chosen weights.
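As a toy illustration of why the weights matter (my numbers, nothing to do with their grammar): in a recursive grammar, the weight on the recursive production alone already decides whether sampled sentences are short or long, i.e., whether long-distance phenomena even get a chance to show up.

```python
import random

# Toy illustration with made-up weights (not the paper's grammar):
# for S -> "a" S (weight p) | "a" (weight 1 - p), the recursion weight p
# alone determines how long the sampled sentences are.
def avg_length(p_recurse, n=5000):
    def sample_length():
        length = 1
        while random.random() < p_recurse:
            length += 1
        return length
    return sum(sample_length() for _ in range(n)) / n

for p in (0.3, 0.6, 0.9):
    print(f"recursion weight {p}: ~{avg_length(p):.1f} words per sentence")
```

With the weights unreported, readers can't tell where on that spectrum the training sentences actually fall.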
To sum up: this is a very interesting approach with a lot of potential, which would benefit from more analysis of the results (e.g., per switch). Analysing it myself is also problematic, since a lot is left out of the paper (and there is no appendix with the information).
