Paul Röttger
Postdoc @MilaNLProc, researching social values in large language models and LLM safety. Previously PhD @oiioxford & CTO/co-founder @rewire_online
Mar 14, 2023 6 tweets 2 min read
I was part of OpenAI’s red team for GPT-4, testing its ability to generate harmful content.

Working with the model in various iterations over the course of six months convinced me that model safety is the most difficult and most exciting challenge in NLP right now.

🧵

Safety is hard because models today are general-purpose tools. And for nearly every prompt that is safe and useful, there is an unsafe version.

You want the model to write good job ads, but not for some Nazi group. Blog posts? Not for terrorists. Chemistry? Not for explosives…
May 2, 2022 9 tweets 4 min read
🥳 New paper accepted at #NAACL2022 (Main) 🥳

NLP tasks like hate speech detection are subjective: annotators disagree about what the correct data labels are. We propose two contrasting paradigms to enable better data annotation.

arxiv.org/pdf/2112.07475…

⬇️ Highlights below ⬇️

⚠️ We argue that dataset creators should consider annotator subjectivity in the annotation process and either explicitly encourage it or discourage it, depending on the intended use of their dataset ⚠️
As a framework, we propose two contrasting data annotation paradigms: