Eric, Jan 27, 2023
ChatGPT (and others) generate very fluent (but not always truthful) text.

Some worry that teachers, news readers (like you!), and society in general will be swamped with AI-generated content.

That's why we built DetectGPT, a method for detecting whether text comes from a language model (LM).
One way to detect LM text is with a trained classifier (another LM). This works, but can overfit to the models/topics it was trained on.

Instead, if we can access the LM itself, we can use its own log probabilities to do detection *zero-shot*, without any training at all!
But how? Well, one simple approach is to measure the log probability of the text under the model. A high log prob suggests a model sample.

This works, but DetectGPT takes a different approach that turns out to be consistently more accurate in our experiments.
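For reference, here's a minimal sketch of the log-probability baseline mentioned above. It assumes you already have per-token log probabilities from the LM. The function names, the length normalization, and the threshold are illustrative assumptions, not details from the paper.

```python
from typing import Sequence


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Average per-token log probability of a text under the LM."""
    if not token_log_probs:
        raise ValueError("need at least one token log probability")
    return sum(token_log_probs) / len(token_log_probs)


def looks_like_model_sample(token_log_probs: Sequence[float], threshold: float) -> bool:
    """Flag the text as a likely model sample when its mean log prob is high."""
    return mean_log_prob(token_log_probs) > threshold


# Hypothetical per-token log probs for two texts (made-up numbers).
likely_text = [-0.5, -1.0, -1.5]
unlikely_text = [-4.0, -6.0, -5.0]
print(looks_like_model_sample(likely_text, threshold=-2.0))
print(looks_like_model_sample(unlikely_text, threshold=-2.0))
```

In practice the threshold would have to be picked on held-out data; the value here is arbitrary.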
Quick q:

What do we expect the log probability function to look like in the neighborhood of a model sample?

We hypothesized that a model's samples usually sit at local maxima of its log probability function, or more generally, in regions of negative curvature.

Spoiler: they are!
But how to measure curvature of an LM's logprob, you ask??

We can approximate *directional second derivatives* by perturbing the text a bit with T5 and comparing the log prob under the LM before and after.

Add Hutchinson's trace estimator & we get an approximate trace of the Hessian.
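Here's a rough sketch of the perturb-and-compare idea above. It is not the paper's code: `log_prob_fn` and `perturb_fn` are hypothetical callables standing in for the LM's log probability and a T5-based perturbation, and the demo swaps in a 1-D toy function so the example runs without any model. The sign convention (original minus perturbed, so a positive value means negative curvature) is also a choice made for this sketch.

```python
import math
import random
from typing import Callable, TypeVar

X = TypeVar("X")


def perturbation_discrepancy(
    x: X,
    log_prob_fn: Callable[[X], float],
    perturb_fn: Callable[[X], X],
    n_perturbations: int = 100,
) -> float:
    """log p(x) minus the mean log p of randomly perturbed copies of x.

    A large positive value means perturbing x tends to lower its log prob,
    i.e. x sits in a region of negative curvature.
    """
    if n_perturbations < 1:
        raise ValueError("n_perturbations must be at least 1")
    original = log_prob_fn(x)
    perturbed = [log_prob_fn(perturb_fn(x)) for _ in range(n_perturbations)]
    return original - sum(perturbed) / len(perturbed)


# Toy stand-in (NOT a language model): cos has a local maximum at 0
# and a local minimum at pi, so the curvature signs are known.
rng = random.Random(0)
toy_log_prob = math.cos


def toy_perturb(x: float) -> float:
    # Small random nudge, playing the role of a T5 rewrite.
    return x + rng.gauss(0.0, 0.3)


at_maximum = perturbation_discrepancy(0.0, toy_log_prob, toy_perturb)
at_minimum = perturbation_discrepancy(math.pi, toy_log_prob, toy_perturb)
print(at_maximum > 0, at_minimum < 0)
```

With a real LM, `x` would be a passage of text, and the discrepancy would be the score that gets thresholded.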
DetectGPT takes this approximate Hessian trace, and simply thresholds it to get a detector.

Hessian trace very negative? Probably a model sample!

Turns out this quantity discriminates between human-written and model-generated text very well, for various models and scales.
Does it work?

DetectGPT consistently improves AUROC (the probability that a randomly chosen fake text is scored as more model-like than a randomly chosen human text) over existing zero-shot methods, for models with 100M to 175B parameters.

It's also competitive with supervised classifiers, outperforming them in some domains.
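To make the AUROC definition concrete, here's a small sketch that computes it directly from the pairwise definition. The scores are made-up numbers, and the function name is just for illustration; ties are counted as half a win, which is the usual convention.

```python
from typing import Sequence


def pairwise_auroc(fake_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Fraction of (fake, human) pairs where the fake text gets the higher score."""
    if not fake_scores or not human_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for fake in fake_scores:
        for human in human_scores:
            if fake > human:
                wins += 1.0
            elif fake == human:
                wins += 0.5  # ties count as half
    return wins / (len(fake_scores) * len(human_scores))


# Hypothetical detector scores (higher = more model-like).
fake = [0.9, 0.7, 0.4]
human = [0.5, 0.2]
print(pairwise_auroc(fake, human))
```

A detector that ranks every fake text above every human text scores 1.0, and random guessing lands around 0.5.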
There are other goodies in the experiments… for example, we explore robustness of detection to machine-generated text that has been partially revised.

Check out the paper for more (code and demo coming to the website soon):

arxiv.org/abs/2301.11305
ericmitchell.ai/detectgpt
So much fun working on this with:

@yoonholeee
@SashaKhazatsky
@chrmanning
@chelseabfinn

Also extremely grateful for the support of Stanford's Center for Research on Foundation Models @StanfordCRFM in running experiments on some very large LMs!!
