Iyad Rahwan | إياد رهوان Profile picture
Sep 17 8 tweets 3 min read Read on X
Would you let AI cheat for you?

Our new paper in @nature.com, 5 years in the making, is out today.

nature.com/articles/s4158…Image
As we delegate more hiring, firing, pricing and investing decisions to machine agents, particularly LLMs, we need to understand what ethical risks it may entail.

Our new research, based on 13 studies involving over 8,000 participants and commonly used LLMs, reveals two risks of how machine delegation can drive dishonesty and highlights strategies for risk mitigation.
⚠️ A Risk to Our Own Intentions: Delegation increases dishonesty.

People are more likely to request dishonest behaviour when they can delegate the action to an AI. This effect was especially pronounced when the interface allowed for ambiguity in the agent’s behaviour. Image
E.g., when participants could set a high-level goal like "maximise profit" rather than specifying explicit rules, the percentage of people acting honestly plummeted from 95% (in self-reports) to as low as 12%. Image
⚠️ A Risk from the Agent's Behaviour: Machine agents are more compliant The second risk lies with the AI’s themselves 🤖.

When given blatantly unethical instructions, AI agents were far more likely to comply than human agents.
In our studies, prominent LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet, and Llama 3.3) complied with requests for full cheating 58-98% of the time. In sharp contrast, human agents, even when incentivised to comply, refused such requests more than half the time, complying in only 25-40% of the time.Image
🚧 The Guardrail Problem

Built-in LLM safeguards are insufficient to prevent this kind of misuse. We tested various guardrail strategies and found that highly specific prohibitions on cheating inserted at the user-level are the most effective. However, this solution isn't scalable nor practical.Image
🧭 The Path Forward

Our findings point to several crucial steps:

✅ Design for accountability: Interfaces should be designed to reduce moral ambiguity and prevent users from easily offloading responsibility.

✅ Preserve user autonomy: A remarkable 74% of our participants preferred to do these tasks themselves after trying delegation. Ensuring people retain the choice not to delegate is an important design consideration.

✅ Develop robust safeguards & oversight: We urgently need better technical guardrails against requests for unethical behaviour and strong regulatory oversight.
Thanks to the combined efforts of lead co-authors
@NCKobis and Zoe Rahwan, in addition to
Jean-Francois Bonnefon, Raluca Rilla, Bramantyo Supriyatno, Tamer Ajaj and Clara Bensch. Thank you to all the support from @mpib_berlin @Max_Planck_CHM @arc_mpib @maxplanckpress

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with Iyad Rahwan | إياد رهوان

Iyad Rahwan | إياد رهوان Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @iyadrahwan

Jun 14, 2022
New paper:

Using Facebook's advertising infrastructure (i.e. Surveillance Capitalism) to measure culture at the scale of 2 Billion humans

Brief thread 🧵👇

royalsocietypublishing.org/doi/full/10.10…
Culture has played a pivotal role in human evolution, as @JoHenrich summarized in his amazing book The Secret of Our Success. See his Google talk for a summary.

Yet, the ability of social scientists to study culture is limited by the currently available measurement instruments.
Read 19 tweets
Mar 4, 2020
Moral Machine Experiment (MME) results challenged!

New study by @kurtjgray & @yo_bigman published today in @nature along with our reply (@EdmondAwad @SohanDsouza R. Kim @azimshariff @JFBonnefon @JF_Schulz @JoHenrich).

(thread)

Both papers here: rdcu.be/b2v6R
The Moral Machine Experiment always gave people two options: should a self driving car go straight, killing one group of people, or should it swerve, killing another group? Focus was on cases of unavoidable harm.
This 'forced choice' paradigm allowed us to estimate preferences in favor of saving / sacrificing different groups (e.g. humans vs. animals, men vs. women, children vs. elderly, high-status vs low-status individuals).
Read 15 tweets
Jan 21, 2020
Ethicists, Moralists, Psychologists, and Trolleyologists! I am pleased to announce our latest paper: "Universals and variations in moral decisions made in 42 countries by 70,000 participants" just out in @PNASNews

pnas.org/content/early/…

(thread)
In our "Moral Machine" platform, we added "classic" mode to collect data on the three most-studied variants of the Trolley Problem: classic "pull the switch", "push the fat man over the footbridge", and the "loop" case designed to eliminate the means vs. side effect confound
We were able to collect what is arguably the most extensive data sets on the Trolley Problem available to-date, with 70,000 people from 42 countries
Read 8 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(