Postdoc @MilaNLProc, researching social values in large language models and LLM safety. Previously PhD @oiioxford & CTO/co-founder @rewire_online
Mar 14, 2023 • 6 tweets • 2 min read
I was part of OpenAI’s red team for GPT-4, testing its ability to generate harmful content.
Working with the model across its various iterations over six months convinced me that model safety is the most difficult and most exciting challenge in NLP right now.
NLP tasks like hate speech detection are subjective: annotators disagree about what the correct data labels are. We propose two contrasting paradigms to enable better data annotation.
⬇️ Highlights below ⬇️
⚠️ We argue that dataset creators should consider annotator subjectivity in the annotation process and either explicitly encourage it or discourage it, depending on the intended use of their dataset ⚠️
As a framework, we propose two contrasting data annotation paradigms:
1️⃣ The descriptive paradigm encourages annotator subjectivity, so that datasets capture the diversity of annotator beliefs.
2️⃣ The prescriptive paradigm discourages subjectivity, asking annotators to consistently apply one view of the task encoded in the annotation guidelines.
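A minimal sketch of how the two paradigms might play out when aggregating annotations (the data, function names, and majority-vote rule here are illustrative assumptions, not from the paper): a prescriptive setup collapses annotator votes into a single gold label, while a descriptive setup keeps the full distribution of judgments as a soft label.

```python
from collections import Counter

# Toy annotations: three annotators label each example as "hateful" or "not_hateful".
annotations = {
    "example_1": ["hateful", "hateful", "not_hateful"],
    "example_2": ["not_hateful", "not_hateful", "not_hateful"],
}

def prescriptive_label(labels):
    """One gold label per example: annotators are expected to apply the
    guidelines consistently, so disagreement is resolved (here, by simple
    majority vote) rather than preserved."""
    return Counter(labels).most_common(1)[0][0]

def descriptive_labels(labels):
    """Soft labels: annotator subjectivity is treated as signal, so the
    full distribution of judgments is kept instead of being collapsed."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}

for example, labels in annotations.items():
    print(example, "prescriptive:", prescriptive_label(labels))
    print(example, "descriptive: ", descriptive_labels(labels))
```

Under these assumptions, the prescriptive output suits applications that need one consistent labeling policy, while the descriptive distributions could, for instance, serve as soft training targets that reflect genuine disagreement.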