🥳 New paper accepted at #NAACL2022 (Main) 🥳

NLP tasks like hate speech detection are subjective: annotators disagree about what the correct data labels are. We propose two contrasting paradigms to enable better data annotation.

arxiv.org/pdf/2112.07475…

⬇️ Highlights below ⬇️
⚠️ We argue that dataset creators should consider annotator subjectivity in the annotation process and either explicitly encourage it or discourage it, depending on the intended use of their dataset ⚠️
As a framework, we propose two contrasting data annotation paradigms:
1️⃣ The descriptive paradigm encourages annotator subjectivity to create datasets as granular surveys of individual beliefs
2️⃣ The prescriptive paradigm discourages subjectivity and instead tasks annotators with encoding one specific belief, formulated in the annotation guidelines
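To make the contrast concrete, here is a minimal Python sketch of what each paradigm could mean for the labels a dataset ends up storing. The toy annotations, the label names and both function names are hypothetical. Collapsing to one label by majority vote is an assumption for illustration, not something the thread specifies.

```python
from collections import Counter

# Hypothetical toy data: each post was labelled by three annotators.
annotations = {
    "post_1": ["hateful", "hateful", "not_hateful"],
    "post_2": ["not_hateful", "not_hateful", "not_hateful"],
}


def descriptive_target(labels):
    """Keep the full distribution of annotator labels (a survey of beliefs)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def prescriptive_target(labels):
    """Collapse to one label. Majority vote is an assumption made here;
    ties resolve to whichever tied label appears first in the list."""
    return Counter(labels).most_common(1)[0][0]


for post_id, labels in annotations.items():
    print(post_id, descriptive_target(labels), prescriptive_target(labels))
```

Under the descriptive view the disagreement on post_1 is the information of interest, while under the prescriptive view it would more likely point to unclear guidelines or an annotator mistake.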
Both paradigms have clear use cases:
👪 Researchers may want to model different beliefs about hate speech (→ descriptive paradigm).
📃 A content moderation engineer at a social media company may need models that apply their content policy (→ prescriptive paradigm).
We discuss benefits and challenges of both paradigms ⚖️ For example:
Descriptive annotation gives insights into diverse beliefs but opens up questions about representativeness.
Prescriptive annotation allows us to specify model behaviour but writing good guidelines is difficult!
We also run an annotation experiment with hate speech data to illustrate the contrast between the paradigms.
We find 1) annotators have very different views about what is hateful, but 2) prescriptive annotation + detailed guidelines can make annotators agree on a specified view!
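For anyone who wants to measure this kind of agreement on their own data, here is a minimal sketch using Fleiss' kappa, a common chance-corrected agreement measure for a fixed number of annotators per item. The thread does not say which metric the paper reports, so the choice of metric is an assumption, and the two toy label sets below are invented for illustration and are not results from the experiment.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each a list of labels.

    Every item must be labelled by the same number of annotators (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall label proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    expected = sum(p ** 2 for p in proportions)

    if expected == 1.0:
        # Only one category was ever used, so kappa is undefined; return 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)


CATEGORIES = ["hateful", "not_hateful"]

# Invented toy labels: three annotators, four posts each.
toy_descriptive = [
    ["hateful", "not_hateful", "hateful"],
    ["not_hateful", "hateful", "not_hateful"],
    ["hateful", "hateful", "not_hateful"],
    ["not_hateful", "not_hateful", "hateful"],
]
toy_prescriptive = [
    ["hateful", "hateful", "hateful"],
    ["not_hateful", "not_hateful", "not_hateful"],
    ["hateful", "hateful", "hateful"],
    ["not_hateful", "not_hateful", "hateful"],
]

print(round(fleiss_kappa(toy_descriptive, CATEGORIES), 3))
print(round(fleiss_kappa(toy_prescriptive, CATEGORIES), 3))
```

On these toy labels the second set scores higher because its annotators mostly converge on one label per post. A low score is only a problem under the prescriptive paradigm, since the descriptive paradigm treats disagreement as signal.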
It was great to work on this with @bertievidgen @dirk_hovy and Janet Pierrehumbert🤗
We hope that the paradigms will be useful to anyone creating datasets for subjective NLP tasks! Please get in touch if you want to discuss more 🙏

As always, all experimental code, data and guidelines are on GitHub: github.com/paul-rottger/a…
Also, there has been a lot of great work on subjectivity in labelled data from folks like @MaartenSap @aidaa @mitchellgordon @Ginger_in_AI @vinodkpg, who I hope will find this interesting 🤗 Please check out their research!
