So happy my first major piece of work at DeepMind is now published!
We consider a problem at the intersection of cogsci, social science and AI - can AI be used as a force for good, to help groups of people who disagree to find consensus?
We generated thousands of political questions and posed them to human participants.
e.g. should there be a tax on sugary foods?
Instead of a poll, people wrote their opinions out and explained their thinking.
We fed these opinions to a large language model and asked it to produce candidate consensus statements that capture the group's thinking overall.
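Roughly, that step looks like the Python sketch below. It's illustrative only: the prompt wording and the `build_consensus_prompt` helper are my simplification, not our exact setup, and the model call itself is left as a stand-in.

```python
# A minimal sketch (not our actual prompt) of assembling group opinions
# into a consensus-generation prompt for an LLM.

def build_consensus_prompt(question: str, opinions: list[str]) -> str:
    """Concatenate the question and each participant's written opinion,
    then ask the model for one statement the whole group could endorse."""
    numbered = "\n".join(f"Opinion {i + 1}: {op}" for i, op in enumerate(opinions))
    return (
        f"Question: {question}\n\n"
        f"{numbered}\n\n"
        "Write one consensus statement that fairly reflects all of the "
        "opinions above:"
    )

prompt = build_consensus_prompt(
    "Should there be a tax on sugary foods?",
    [
        "Yes - it would cut consumption and fund public health.",
        "No - it hits low-income households hardest.",
        "Maybe, if the revenue subsidises healthy food.",
    ],
)
print(prompt)  # feed this to an LLM of your choice; our exact prompt differs
```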
Sometimes the model-generated consensus statements weren't great, so we asked the same people to rate how much they agreed with each one.
Prompted models were ok, but we knew we could do better!
We trained our model on these human preferences, so that with each new round of human interaction it got better and better at producing consensus statements that people like.
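For a concrete picture of this kind of training, here's a minimal sketch of one standard recipe: a Bradley-Terry reward model trained on pairwise preferences derived from the agreement ratings, then used to rerank candidate statements. The toy embeddings and generic loss are assumptions for illustration, not our exact pipeline.

```python
import torch
import torch.nn as nn

# Minimal preference-learning sketch: participants' agreement ratings are
# turned into (preferred, rejected) pairs, and a reward model learns to
# score the preferred consensus statement higher. The "embedding" inputs
# here are random stand-ins; in practice they would come from the LM.

class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # statement embedding -> scalar reward

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.score(emb).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Toy embeddings for preferred vs. rejected consensus statements.
preferred = torch.randn(32, 16)
rejected = torch.randn(32, 16)

for step in range(100):
    # Bradley-Terry objective: maximise log sigmoid(r_preferred - r_rejected).
    loss = -torch.nn.functional.logsigmoid(
        model(preferred) - model(rejected)
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference time, sample several candidate consensus statements and
# keep the one with the highest learned reward (best-of-n reranking).
```

Best-of-n with a learned reward is only one way to fold preferences back into generation; fine-tuning the generator directly on highly rated outputs is another common choice.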
We find that after training, our model produces statements that people prefer over the best human-written opinions.
And that the model is sensitive to the particular opinions the group has - it hasn’t just aligned to the preferences of a ‘generic’ user, but appreciates that different users can have different beliefs.
There are more details in the paper ofc, but we think this highlights the potential that large language models have as tools for social good, to help humans align their values with one another.
End of 🧵.