Paul Röttger @ NAACL

May 2 • 9 tweets • 4 min read

🥳 New paper accepted at #NAACL2022 (Main) 🥳

NLP tasks like hate speech detection are subjective: annotators disagree about what the correct data labels are. We propose two contrasting paradigms to enable better data annotation.

arxiv.org/pdf/2112.07475…

⬇️ Highlights below ⬇️

⚠️ We argue that dataset creators should consider annotator subjectivity in the annotation process and either explicitly encourage it or discourage it, depending on the intended use of their dataset ⚠️

As a framework, we propose two contrasting data annotation paradigms:

1️⃣ The descriptive paradigm encourages annotator subjectivity to create datasets as granular surveys of individual beliefs
2️⃣ The prescriptive paradigm discourages subjectivity and instead tasks annotators with encoding one specific belief, formulated in the annotation guidelines

Both paradigms have clear use cases:
👪 Researchers may want to model different beliefs about hate speech (→ descriptive paradigm).
📃 A content moderation engineer at a social media company may need models that apply their content policy (→ prescriptive paradigm).

We discuss benefits and challenges of both paradigms ⚖️ For example:
Descriptive annotation gives insights into diverse beliefs but opens up questions about representativeness.
Prescriptive annotation allows us to specify model behaviour but writing good guidelines is difficult!

We also run an annotation experiment with hate speech data, to illustrate the contrast between the paradigms.
We find 1) annotators have very different views about what is hateful, but 2) prescriptive annotation + detailed guidelines can make annotators agree on a specified view!
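
To make "annotators agree" concrete, here is a minimal sketch of how agreement between annotators could be measured. The thread does not say which measure the experiment used; Fleiss' kappa is one common choice and is assumed here. The labels below are made-up toy data, not results from the paper, and `fleiss_kappa` is a hypothetical helper written for this illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels.

    Assumes every item was labelled by the same number of annotators (>= 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    label_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        label_totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        per_item_agreement.append(agreeing / (n_raters * (n_raters - 1)))

    observed = sum(per_item_agreement) / n_items
    # Agreement expected by chance, from the overall label distribution.
    total = n_items * n_raters
    expected = sum((c / total) ** 2 for c in label_totals.values())
    if expected == 1.0:
        # Only one label was ever used, so kappa is undefined; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (illustrative only): 4 posts, 3 annotators each.
descriptive_labels = [
    ["hateful", "not", "hateful"],
    ["not", "not", "hateful"],
    ["hateful", "not", "not"],
    ["not", "hateful", "hateful"],
]
prescriptive_labels = [
    ["hateful", "hateful", "hateful"],
    ["not", "not", "not"],
    ["hateful", "hateful", "hateful"],
    ["not", "not", "hateful"],
]

print("descriptive kappa:", round(fleiss_kappa(descriptive_labels), 3))
print("prescriptive kappa:", round(fleiss_kappa(prescriptive_labels), 3))
```

Under the descriptive paradigm, low agreement is not a defect: the disagreement is the signal being collected. Under the prescriptive paradigm, higher agreement suggests the guidelines are conveying the intended view.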

It was great to work on this with @bertievidgen @dirk_hovy and Janet Pierrehumbert🤗

We hope that the paradigms will be useful to anyone creating datasets for subjective NLP tasks! Please get in touch if you want to discuss more 🙏

As always, all experimental code, data and guidelines are on GitHub: github.com/paul-rottger/a…

Also, there has been a lot of great work on subjectivity in labelled data from folks like @MaartenSap @aidaa @mitchellgordon @Ginger_in_AI @vinodkpg, who I hope will find this interesting 🤗 Please check out their research!
