Ingar Haaland
Mar 13, 10 tweets, 4 min read
OpenAI has launched Operator, an agent that can perform tasks in your browser. I asked it to complete a Qualtrics survey I created. The results are very promising for Operator but *very* concerning for survey researchers.
On the first question, I included a standard attention check. It had no problem picking the top and bottom options. It didn't even bother asking me for help on this.
On the next page, I added a CAPTCHA verification. Here, it simply asked me whether I could take control of the browser and complete it for it.
I next asked a binary question about gender. It wanted me to confirm that it should answer "Male", which it then did. Next, I asked an open-ended question about the survey experience. Here, it simply provided a reasonable answer.
It is, of course, troublingly good at answering open-ended questions. Here, I tried to get Operator to disclose that it was an AI agent by asking it to tell me "a little bit about yourself", but it's good at staying in character.
Troublingly, when I asked it directly whether it was an AI agent, it asked me whether it should disclose this. It then complied with a request to "prove" that it was a human.
If you try a more sophisticated LLM detection check, it will not reveal to you that it is an LLM; rather, it will ask you to take control of the conversation before it will continue.
I did more testing today. It seems the model has gotten stricter. In one case, it even disclosed that it was an LLM agent, but it did not always do so and often asked for advice on how to proceed in these situations.
Interestingly, it also tried to fill out the CAPTCHA itself today. It struggled a bit, but it will obviously improve fast at this.
A new feature seems to be that OpenAI has added security checks to flag potential prompt injection attacks. This also makes it more difficult to "trick" Operator into revealing itself.
