Post

How to get URL link on X (Twitter) App

On the Twitter thread, click on or icon on the bottom
Click again on or Share Via icon
Click on Copy Link to Tweet
Paste it above and click "Unroll Thread"!
More info at Twitter Help

Iyad Rahwan | إياد رهوان

@iyadrahwan

Sep 17 • 8 tweets • 3 min read • Read on X

Scrolly

Would you let AI cheat for you?

Our new paper in @nature.com, 5 years in the making, is out today.

nature.com/articles/s4158…

As we delegate more hiring, firing, pricing and investing decisions to machine agents, particularly LLMs, we need to understand what ethical risks it may entail.

Our new research, based on 13 studies involving over 8,000 participants and commonly used LLMs, reveals two risks of how machine delegation can drive dishonesty and highlights strategies for risk mitigation.

⚠️ A Risk to Our Own Intentions: Delegation increases dishonesty.

People are more likely to request dishonest behaviour when they can delegate the action to an AI. This effect was especially pronounced when the interface allowed for ambiguity in the agent’s behaviour.

E.g., when participants could set a high-level goal like "maximise profit" rather than specifying explicit rules, the percentage of people acting honestly plummeted from 95% (in self-reports) to as low as 12%.

⚠️ A Risk from the Agent's Behaviour: Machine agents are more compliant The second risk lies with the AI’s themselves 🤖.

When given blatantly unethical instructions, AI agents were far more likely to comply than human agents.
In our studies, prominent LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet, and Llama 3.3) complied with requests for full cheating 58-98% of the time. In sharp contrast, human agents, even when incentivised to comply, refused such requests more than half the time, complying in only 25-40% of the time.

🚧 The Guardrail Problem

Built-in LLM safeguards are insufficient to prevent this kind of misuse. We tested various guardrail strategies and found that highly specific prohibitions on cheating inserted at the user-level are the most effective. However, this solution isn't scalable nor practical.

🧭 The Path Forward

Our findings point to several crucial steps:

✅ Design for accountability: Interfaces should be designed to reduce moral ambiguity and prevent users from easily offloading responsibility.

✅ Preserve user autonomy: A remarkable 74% of our participants preferred to do these tasks themselves after trying delegation. Ensuring people retain the choice not to delegate is an important design consideration.

✅ Develop robust safeguards & oversight: We urgently need better technical guardrails against requests for unethical behaviour and strong regulatory oversight.

Thanks to the combined efforts of lead co-authors
@NCKobis and Zoe Rahwan, in addition to
Jean-Francois Bonnefon, Raluca Rilla, Bramantyo Supriyatno, Tamer Ajaj and Clara Bensch. Thank you to all the support from @mpib_berlin @Max_Planck_CHM @arc_mpib @maxplanckpress

• • •

Missing some Tweet in this thread? You can try to force a refresh

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @iyadrahwan

Iyad Rahwan | إياد رهوان

@iyadrahwan

Jun 14, 2022

New paper:

Using Facebook's advertising infrastructure (i.e. Surveillance Capitalism) to measure culture at the scale of 2 Billion humans

Brief thread 🧵👇

royalsocietypublishing.org/doi/full/10.10…

@JoHenrich

Culture has played a pivotal role in human evolution, as @JoHenrich summarized in his amazing book The Secret of Our Success. See his Google talk for a summary.

Yet, the ability of social scientists to study culture is limited by the currently available measurement instruments.

Read 19 tweets

Iyad Rahwan | إياد رهوان

@iyadrahwan

Mar 4, 2020

@kurtjgray

Moral Machine Experiment (MME) results challenged!

New study by @kurtjgray & @yo_bigman published today in @nature along with our reply (@EdmondAwad @SohanDsouza R. Kim @azimshariff @JFBonnefon @JF_Schulz @JoHenrich).

(thread)

Both papers here: rdcu.be/b2v6R

The Moral Machine Experiment always gave people two options: should a self driving car go straight, killing one group of people, or should it swerve, killing another group? Focus was on cases of unavoidable harm.

This 'forced choice' paradigm allowed us to estimate preferences in favor of saving / sacrificing different groups (e.g. humans vs. animals, men vs. women, children vs. elderly, high-status vs low-status individuals).

Read 15 tweets

Iyad Rahwan | إياد رهوان

@iyadrahwan

Jan 21, 2020

@PNASNews

Ethicists, Moralists, Psychologists, and Trolleyologists! I am pleased to announce our latest paper: "Universals and variations in moral decisions made in 42 countries by 70,000 participants" just out in @PNASNews

pnas.org/content/early/…

(thread)

In our "Moral Machine" platform, we added "classic" mode to collect data on the three most-studied variants of the Trolley Problem: classic "pull the switch", "push the fat man over the footbridge", and the "loop" case designed to eliminate the means vs. side effect confound

We were able to collect what is arguably the most extensive data sets on the Trolley Problem available to-date, with 70,000 people from 42 countries

Read 8 tweets

Support us! We are indie developers!

This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Share this page!

Enter URL or ID to Unroll

Iyad Rahwan | إياد رهوان

Try unrolling a thread yourself!

More from @iyadrahwan

Iyad Rahwan | إياد رهوان

Iyad Rahwan | إياد رهوان

Iyad Rahwan | إياد رهوان

Did Thread Reader help you today?

Don't want to be a Premium member but still want to support us?

Send Email!