Joon Sung Park
Nov 18 · 14 tweets · 3 min read
Simulating human behavior with AI agents promises a testbed for policy and the social sciences. We interviewed 1,000 people for two hours each to create generative agents of them. These agents replicate their source individuals’ attitudes and behaviors. 🧵 arxiv.org/abs/2411.10109
When we presented generative agents last year, we pointed to a future where we can simulate life to understand ourselves better in situations where direct engagement or observation is impossible (e.g., health policies, product launches, or external shocks). (2/14)
But we felt our story was incomplete: to trust these simulations, they ought to avoid flattening agents to demographic stereotypes, and measurement of their accuracy needs to advance beyond replication success or failure on average treatment effects. (3/14)
We found our answer in models of individuals—creating generative agents that reflect real individuals and validating them by measuring how well they replicate the individual's responses to the General Social Survey, Big Five Personality tests, economic games, and RCTs. (4/14)
To achieve this, we turned to a foundational social science method: interviews. We developed a real-time, voice-to-voice AI interviewer that conducted two-hour, semi-structured interviews to teach us about these individuals’ lives and beliefs. (5/14)
Our finding: the agents perform well. They replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers two weeks later, and perform comparably in predicting personality traits and experimental outcomes. (6/14)
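A note on how the "85% as accurately" figure can be read: it describes agent accuracy relative to how consistently participants reproduce their own answers two weeks apart. The thread does not spell out the formula, so the sketch below is only one plausible reading, with assumptions stated up front: the agent is scored against the participant's first-wave answers, accuracy is a simple exact-match rate, and the function name and toy data are made up for illustration.

```python
def normalized_accuracy(agent_answers, wave1_answers, wave2_answers):
    """Agent accuracy divided by the participant's own two-week consistency.

    Assumption (not confirmed by the thread): the agent is compared with
    the first-wave answers, and both rates are exact-match proportions.
    """
    n = len(wave1_answers)
    if n == 0 or len(agent_answers) != n or len(wave2_answers) != n:
        raise ValueError("all answer lists must be non-empty and the same length")

    # Share of items where the agent matches the participant's first answers.
    agent_accuracy = sum(a == w for a, w in zip(agent_answers, wave1_answers)) / n

    # Share of items where the participant repeats their own earlier answer.
    self_consistency = sum(w1 == w2 for w1, w2 in zip(wave1_answers, wave2_answers)) / n
    if self_consistency == 0:
        raise ValueError("self-consistency is zero; the ratio is undefined")

    return agent_accuracy / self_consistency


# Toy, made-up responses for one participant on five survey items.
wave1 = ["agree", "disagree", "agree", "neutral", "agree"]
wave2 = ["agree", "disagree", "neutral", "neutral", "agree"]   # 4 of 5 repeated
agent = ["agree", "disagree", "agree", "agree", "disagree"]    # 3 of 5 matched

ratio = normalized_accuracy(agent, wave1, wave2)
print(f"normalized accuracy: {ratio:.2f}")
```

Dividing by self-consistency matters because people do not answer identically on a retest, so a raw match rate of 100% is not a realistic ceiling for any predictor.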
In addition, our interview-based agents reduce accuracy biases across racial and ideological groups compared to agents provided with demographic descriptions. We attribute this to the agents in our study reflecting the myriad idiosyncratic factors of real individuals. (7/14)
In sum, this work opens the door to simulating individuals. We believe that accurately modeling the individuals who make up our society ought to be the foundation of simulations. The resulting agent bank of 1,000 generative agents will further facilitate this function. (8/14)
At the same time, this work points to the beginning of an era in which generative agents can represent real people. This ought to bring both excitement and concerns: how can we balance the potential benefits while safeguarding individuals' representation and agency? (9/14)
We spent countless hours discussing ethics with the team, the IRB, and participants. Here’s what we believe: systems hosting generative agents of real people must, at a minimum, support usage audits, provide withdrawal options, and respect individuals' consent and agency. (10/14)
So, to support research while protecting participant privacy, we (Stanford authors) plan to offer a two-pronged access system in the coming months: 1) open access to aggregated responses on fixed tasks, and 2) restricted access to individual responses on open tasks. (11/14)
For those interested, here is an open-source repository and a Python package for this work:
GitHub: github.com/joonspk-resear…

(While we are not releasing the participant data, I have included my personal generative agent in the repo. :)) (12/14)
In closing, doing great interdisciplinary work that respects the tradition and rigor of each field is beyond any one person. This work would not have been possible without an all-star team that embodied its interdisciplinary nature, intersecting AI and social sciences. (13/14)
Thank you to my coauthors, @msbernst, @percyliang, @RobbWiller, @cqzou, @aaronshaw, @makoshark, @merrierm, @carriejcai. And thank you @KolluriAkaash for helping out with the open source release, and to @StanfordHCI and @StanfordNLP for fostering this work. (14/14)

More from @joon_s_pk

Aug 30, 2022
Our new research estimates that *one in twenty* comments on Reddit are violations of its norms: anti-social behaviors that most subreddits try to moderate. But almost none are moderated.

🧵 on my upcoming #cscw2022 paper w/ @josephseering and @msbernst: arxiv.org/abs/2208.13094
First, what does this mean? It means if you are scrolling through a post on Reddit, in a single scroll, you will likely see at least one comment that exemplifies bad behaviors such as personal attacks or bigotry that most communities would choose not to see. (2/13)
So let’s get into the details. What did we measure exactly? We measured the proportion of unmoderated comments in the 97 most popular subreddits that are violations of one of its platform norms that most subreddits try to moderate (e.g., personal attacks, bigotry). (3/13)
Aug 11, 2022
How might an online community look after many people join? My paper w/ @lindsaypopowski @Carryveggies @merrierm @percyliang @msbernst introduces "social simulacra": a method of generating compelling social behaviors to prototype social designs 🧵
arxiv.org/abs/2208.04024 #uist2022
You can see some of its generated behaviors—posts, replies, trolls—in our demo here: social-simulacra.herokuapp.com

E.g., say you are creating a new community for discussing a Star Wars game with a few rules. Given this description, our tool generated a simulacrum like this: (2/10) [Image: A screenshot of a synthetic...]
Why are these useful? In social computing design, understanding our design decisions’ impact is hard since many challenges do not arise until a system is populated by *many*. Think about: newcomers with unintentional norm-breaking, trolling, or other antisocial behaviors (3/10)