Xiang Zhou
Jul 1, 2022
Happy to share our #NAACL2022 paper: “Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?”

🐭MPoSM models long-term & bidirectional tag dependency.

arxiv.org/abs/2206.14969

w/ @byryuer @mohitban47

Join us in Seattle (oral session 6B, July 12)
🧵
Previous Part-Of-Speech (POS) induction models usually make independence assumptions (e.g., Markov, unidirectional, local dependency) that do not hold in real languages. For example, subject-verb agreement can be both long-term and bidirectional.
Our Masked Part-Of-Speech Model (🐭MPoSM, pronounced "m-possum") is inspired by masked language modeling. It has 2 parts: a Local POS Prediction module and a Masked POS Reconstruction module. Through the reconstruction objective, it models arbitrary tag dependencies.
MPoSM achieves competitive results on both the English Penn WSJ dataset & the universal treebank containing 10 languages (de, en, es, fr, id, it, ja, ko, pt-br, sv). It reaches 77.9 M1 on the WSJ dataset. Similar to the recent work by Gupta et al., it can also leverage the power of mBERT.
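For readers unfamiliar with the metric: a minimal sketch of how an M1 score can be computed, assuming M1 here means the many-to-one accuracy commonly used in POS induction (each induced cluster is mapped to its most frequent gold tag, then token accuracy is measured). The function name and the toy data are made up for illustration and are not taken from the paper's code.

```python
from collections import Counter, defaultdict


def many_to_one_accuracy(predicted_clusters, gold_tags):
    """Map each induced cluster to its most frequent gold tag, then score accuracy.

    Hypothetical helper for illustration; not the paper's evaluation script.
    """
    if len(predicted_clusters) != len(gold_tags):
        raise ValueError("predictions and gold tags must have the same length")
    if not gold_tags:
        raise ValueError("cannot score an empty sequence")

    # Count how often each gold tag co-occurs with each induced cluster.
    counts = defaultdict(Counter)
    for cluster, tag in zip(predicted_clusters, gold_tags):
        counts[cluster][tag] += 1

    # Under the best many-to-one mapping, every token carrying the majority
    # gold tag of its cluster counts as correct.
    correct = sum(max(tag_counts.values()) for tag_counts in counts.values())
    return correct / len(gold_tags)


# Toy example: three induced clusters over six tokens.
clusters = [0, 0, 0, 1, 1, 2]
gold = ["NOUN", "NOUN", "VERB", "VERB", "VERB", "DET"]
score = many_to_one_accuracy(clusters, gold)
print(f"M1 = {score:.3f}")
```

Because several clusters may map to the same gold tag, this metric tends to reward models that use more clusters, which is why it is usually reported with a fixed number of clusters.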
Back to the title: Does Modeling Long Context Help Unsupervised POS-tagging?

Surprisingly, we see mixed results.

This is interesting because, ideally, modeling a longer context should help in most cases (e.g., long-term subject-verb agreement).
To better understand this phenomenon, we design a synthetic experiment that specifically diagnoses the ability to learn tag agreement. Surprisingly, strong baselines fail to solve this problem consistently, even in a very simplified setting: the agreement between adjacent words.
In this experiment, MPoSM achieves better overall performance but is still not perfect (indicating the optimization challenge in learning agreements).
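To make the adjacent-agreement setting concrete, here is a toy sketch of what such synthetic tag sequences could look like. Everything in it is an assumption for illustration: the thread does not describe the actual data construction, so the "HEAD"/"DEP" roles, the number of agreement classes, and both function names are hypothetical.

```python
import random


def sample_agreement_tags(num_pairs, num_classes=2, rng=None):
    """Sample a tag sequence made of adjacent (HEAD, DEP) pairs that agree.

    Hypothetical generator: each pair shares one randomly chosen agreement class.
    """
    rng = rng or random.Random(0)
    tags = []
    for _ in range(num_pairs):
        agreement_class = rng.randrange(num_classes)
        tags.append(("HEAD", agreement_class))
        tags.append(("DEP", agreement_class))
    return tags


def adjacent_agreement_rate(tags):
    """Fraction of consecutive (HEAD, DEP) pairs whose agreement classes match."""
    pairs = list(zip(tags[0::2], tags[1::2]))
    if not pairs:
        raise ValueError("need at least one (HEAD, DEP) pair")
    agreeing = sum(1 for head, dep in pairs if head[1] == dep[1])
    return agreeing / len(pairs)


gold_tags = sample_agreement_tags(num_pairs=5, rng=random.Random(1))
print(gold_tags)
print(adjacent_agreement_rate(gold_tags))
```

A tagger that has learned the agreement would assign matching classes to each adjacent pair; a rate below 1.0 on predicted tags would indicate missed agreements.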

Lastly, we conduct a detailed error analysis to shed light on other challenges (cluster size differences, dataset biases, etc.).
Our code is available at github.com/owenzx/MPoSM

We will also give an in-person oral presentation at @naaclmeeting from 16:15 to 17:45 PST on July 12.

See you in Seattle!
