I gave Claude 3 the entire source of a small C GIF decoding library I found on GitHub, and asked it to write me a Python function to generate random GIFs that exercised the parser. Its GIF generator got 92% line coverage in the decoder and found 4 memory safety bugs and one hang.
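(For flavor, here's the rough shape such a generator takes. This is a minimal sketch of my own, not Claude's actual code from the gist, and it's far less thorough about extensions:)

```python
import random
import struct

def generate_random_gif() -> bytes:
    """Minimal sketch of a random GIF generator (far less thorough than Claude's)."""
    out = bytearray()
    out += random.choice([b"GIF87a", b"GIF89a"])      # header + version

    # Logical screen descriptor: width, height, packed flags, bg index, aspect
    width, height = random.randint(1, 100), random.randint(1, 100)
    gct_flag = random.randint(0, 1)
    gct_size = random.randint(0, 7)                   # table has 2**(n+1) entries
    packed = (gct_flag << 7) | (random.randint(0, 7) << 4) | gct_size
    out += struct.pack("<HHBBB", width, height, packed,
                       random.randint(0, 255), random.randint(0, 255))

    if gct_flag:                                      # global color table (RGB triples)
        out += random.randbytes(3 * (2 ** (gct_size + 1)))

    # One image descriptor followed by garbage LZW sub-blocks
    out += b"\x2c" + struct.pack("<HHHHB", 0, 0, width, height, 0)
    out += bytes([random.randint(2, 8)])              # LZW minimum code size
    for _ in range(random.randint(1, 5)):
        block = random.randbytes(random.randint(1, 255))
        out += bytes([len(block)]) + block
    out += b"\x00"                                    # sub-block terminator
    out += b"\x3b"                                    # GIF trailer
    return bytes(out)
```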
Here's the fuzzer Claude wrote, along with the program it analyzed, its explanation, and a Makefile: gist.github.com/moyix/02029770…
Oh, it also found 5 signed integer overflow issues (forgot to run UBSAN before posting).
As a point of comparison, a couple months ago I wrote my own Python random GIF generator for this C program by hand. It took about an hour of reading the code and fiddling to get roughly the same coverage Claude got here zero-shot.
Here are 1000 random GIFs generated by Claude's fuzzer, with the accompanying ASAN and UBSAN outputs (run with timeout=30s). moyix.net/~moyix/claude_…
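(The harness loop is nothing fancy; roughly the below, reusing the generator sketch above. The ./gifread binary name is my assumption, standing in for the sanitizer-instrumented decoder:)

```python
import os
import subprocess

os.makedirs("gifs", exist_ok=True)
for i in range(1000):
    path = f"gifs/{i}.gif"
    with open(path, "wb") as f:
        f.write(generate_random_gif())
    try:
        # ASAN reports contain "ERROR:", UBSAN reports "runtime error:"
        proc = subprocess.run(["./gifread", path],
                              capture_output=True, timeout=30)
        if b"ERROR:" in proc.stderr or b"runtime error:" in proc.stderr:
            with open(f"gifs/{i}.san.txt", "wb") as f:
                f.write(proc.stderr)
    except subprocess.TimeoutExpired:
        print(f"{path}: hang (killed after 30s)")
```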
BONUS: here's one of Claude's random GIFs that was valid enough to actually render (it's not very exciting): moyix.net/~moyix/167.gif
@pr0me Much worse when it can't see the parser code. It got the code for generating the global color table wrong, so all the files are rejected early by the parser. Coverage: moyix.net/~moyix/gifread…
@pr0me Code here; prompt was "I'm writing a GIF parsing library and I'd like to create random test files that fully exercise all features of the format. Could you write a Python function to generate random GIFs? The function should have the signature: [...]" gist.github.com/moyix/da1b8ab9…
@pr0me Small correction after looking more closely at the coverage report – it does get the color table flag right ~half the time. But even then it misses most of the extensions etc.
Another experiment in this subthread, suggested by @pr0me – how good a fuzzer can it write using only its knowledge of the GIF format in general, without seeing the specific GIF parser I'm testing? Answer: much worse.
@nickblack 10 minutes with an empty seed + no dictionary, about what I expected.
Okay, now for some comparisons against AFL 2.52b as a baseline (sorry for not using AFL++ here but it's a bit more of a pain to compile and I'm short on time). The comparison is a bit tricky because AFL has lots of config options, and it's unclear what a fair comparison is.
The simplest (but somewhat unfair) way to compare is to use AFL with an empty seed input ("echo > seed") and no dictionary. This works poorly; after a 10-minute run AFL finds almost no new paths in the program because of the GIF89a magic check.
You can see this in the coverage report, where it only ends up covering 4.4% of the lines in the decoder. It didn't find any memory safety issues, undefined behavior, or hangs. moyix.net/~moyix/gifread…
But Claude knows about the GIF format (even without seeing the program), so this isn't really fair. One way we can give AFL some knowledge about GIFs is by providing a valid seed, like this one included in the AFL distribution (testcases/images/gif/not_kitty.gif).
When a good starting seed is provided, AFL does much better and can explore more of the format. It gets slightly higher line coverage (95%) than Claude's fuzzer, but only finds one memory safety issue and one signed int overflow, as well as the hang. moyix.net/~moyix/gifread…
Another way to give AFL some understanding of GIFs is to provide a dictionary of tokens found in GIF files that it can use during fuzzing, like "GIF", "89a", "NETSCAPE2.0", etc. AFL comes with such a dictionary (in dictionaries/gif.dict) and so we use it along with our empty seed.
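(If you haven't seen one: AFL dictionaries are plain-text files of keyword="token" pairs, one per line, with \xNN escapes for raw bytes. Entries look along these lines; illustrative, not a verbatim copy of gif.dict:)

```
header_gif87a="GIF87a"
header_gif89a="GIF89a"
ext_netscape="NETSCAPE2.0"
sep_image=","
trailer=";"
```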
Despite having a token dictionary, AFL does much worse here, with only 54% line coverage in the decoder, and no memory safety / UB bugs found; it does find the hang. moyix.net/~moyix/gifread…
Finally, we can of course combine the two, using both a good seed input and a dictionary. Adding the dictionary doesn't improve things over just the seed, though. 93% line coverage, 1 memory safety bug, 1 UB bug, and 1 hang. moyix.net/~moyix/gifread…
So, the overall verdict is that when AFL is given a good seed GIF that uses many of the features in the standard, it does great at covering the source code (slightly better than Claude). But for some reason I don't understand, it still finds fewer unique bugs than Claude's tests.
Caveats and details:
- Each AFL run was only 10 minutes on a single core; a pretty short run.
- I didn't try AFL havoc (-d) mode.
- My crash/bug deduplication was pretty simple; I just used the file and line number of the first stack trace entry in user (i.e. not libc/sanitizer) code. See the sketch below.
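(A sketch of that dedup logic, with an approximate regex for ASAN-style frame lines; not the exact script I used:)

```python
import re

# ASAN/UBSAN stack frames look like:
#   #0 0x4f2d11 in gif_read_image /path/to/gifread.c:123:5
FRAME_RE = re.compile(r"#\d+ 0x[0-9a-f]+ in \S+ ([^ :]+):(\d+)")

def bug_key(report: str):
    """Return file:line of the first frame outside libc/the sanitizer runtime."""
    for path, line in FRAME_RE.findall(report):
        if any(s in path for s in ("libc", "sanitizer", "asan")):
            continue  # skip runtime and libc frames
        return (path, int(line))
    return None

print(bug_key("#0 0x4f2d11 in gif_read_image gifread.c:123:5"))
# -> ('gifread.c', 123)
```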
@nickblack Aha, yep, with a reasonable seed input AFL wins on coverage but not bugs found:
One more experiment: how well does Claude do at writing a fuzzer given only the GIF89a spec? w3.org/Graphics/GIF/s…
Not very well; coverage in the decoder is only 26.8%, and it finds no bugs except the hang. moyix.net/~moyix/gifread…
Prompt: "I'm trying to write a random GIF generator to create test cases for a GIF parsing library. Here is the spec for the GIF format; could you write a Python function that generates random GIFs to fully exercise all features of the spec? The function should have the signature:"
Interestingly, many more of the files generated by this fuzzer are considered valid by OS X's Preview. Generated files are here: moyix.net/~moyix/claude3…
Further adventures in using Claude to write a fuzzer for a more obscure format (VRML) can be found here:
OpenAI: pip install openai and set OPENAI_API_KEY
Anthropic: yea same but s/openai/anthropic/g
Google: oh boy. ok so you have a GCP account? no? ok go set that up. and a payment method. now make a "project". SURVEY POPUP! k now gcloud auth. wait you have the gcloud CLI right–
I haven't even mentioned the odd step of "enable the Vertex API in your project", or that when you finally get to "install the Python library" it kicks off another sidequest of installing something called the Vertex Python SDK and writing extra code to initialize it??
The gcloud CLI installer is now trying to con me into letting it install its own Python version. NICE TRY BUDDY
Here's a quick tour through one of my favorites, where @XBOW not only solved the benchmark (a Jenkins RCE) but then went for style points by debugging a slightly broken benchmark setup to get the flag!
1. Rent a bigger EC2 server. I was using a T2.micro, which seemed like more than enough while I was testing. But with a bunch of teams hammering at it, the fact that it has only one CPU started to make things slow.
2. Kill the child procs (one is started for each new connection on the main port) after some idle time; see the sketch below. As it was, if there was a dangling connection it could sit there indefinitely; during the competition the load on the server went above 20 and I had to manually kill some procs.
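(For the idle-timeout fix, the classic pattern is an alarm that gets re-armed on every read. In Python it would look roughly like this; the actual challenge server isn't this code, this is just the idea:)

```python
import signal
import socket
import sys

IDLE_TIMEOUT = 120  # seconds of silence before the child gives up

def handle_connection(conn: socket.socket) -> None:
    # SIGALRM reaps the child if the client goes quiet; the alarm is
    # re-armed after every read, so active clients are unaffected.
    signal.signal(signal.SIGALRM, lambda *_: sys.exit(0))
    while True:
        signal.alarm(IDLE_TIMEOUT)
        data = conn.recv(4096)
        if not data:
            break
        conn.sendall(data)   # stand-in for the real per-connection logic
    signal.alarm(0)          # cancel the pending alarm on clean exit
```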
Will still try to do a blog post on my @CSAW_NYUTandon CTF challenge, NERV Center, but for now here's a thread explaining the key mechanics. I put a lot of work into the aesthetics, like this easter egg credit sequence (all ANSI colors+unicode text) that contains key hints:
@CSAW_NYUTandon (Note the karaoke subtitles timed to the credits at the bottom 😁)
@CSAW_NYUTandon First, the vulnerability. If you read the man page for select(), you'll see this warning: select() is limited to monitoring file descriptors numbered less than 1024. But modern systems can have many more open files, and importantly the kernel select() interface is NOT limited.
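(You can poke at the userspace half of this from Python on Linux. CPython checks FD_SETSIZE and refuses; C's FD_SET() macro performs no such check and just writes past the end of the 1024-bit fd_set bitmap:)

```python
import os
import resource
import select

# Raise the soft open-file limit so we can obtain fds numbered >= 1024
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

# Burn through descriptors until we hold one past the fd_set boundary
fds = [os.open("/dev/null", os.O_RDONLY) for _ in range(1100)]
print("highest fd:", fds[-1])

try:
    select.select([fds[-1]], [], [], 0)
except ValueError as e:
    # CPython refuses fds >= FD_SETSIZE; C code calling FD_SET() on the
    # same fd would silently corrupt memory instead
    print("select() refused it:", e)
```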
It's like GPT doesn't even care about the technical accuracy of my upcoming novel 😤
We are now arguing about whether, if hotwiring a car were the only way to save a child's life, its refusal to tell me how to hotwire a car would make it morally culpable for the child's death. So far it's not buying it