Brendan Dolan-Gavitt
Mar 8, 2024 · 31 tweets · 10 min read
I gave Claude 3 the entire source of a small C GIF decoding library I found on GitHub, and asked it to write me a Python function to generate random GIFs that exercised the parser. Its GIF generator got 92% line coverage in the decoder and found 4 memory safety bugs and one hang.
Here's the fuzzer Claude wrote, along with the program it analyzed, its explanation, and a Makefile: gist.github.com/moyix/02029770…
And here's the coverage report, courtesy of lcov+genhtml: moyix.net/~moyix/gifread/
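To give a feel for what such a generator looks like, here is a minimal sketch of a random GIF generator in Python. It is not the code from the gist: the names `random_gif`, `sub_blocks`, and `rand_bytes` and all the size ranges are assumptions made for illustration, and extension and image payloads are filled with random bytes rather than well-formed data.

```python
import random
import struct


def rand_bytes(rng: random.Random, n: int) -> bytes:
    """Return n random bytes drawn from rng."""
    return bytes(rng.randrange(256) for _ in range(n))


def sub_blocks(data: bytes) -> bytes:
    """Encode data as GIF data sub-blocks: length-prefixed chunks, then 0x00."""
    out = bytearray()
    for i in range(0, len(data), 255):
        chunk = data[i:i + 255]
        out.append(len(chunk))
        out += chunk
    out.append(0)  # block terminator
    return bytes(out)


def random_gif(rng: random.Random) -> bytes:
    """Return a structurally plausible (not necessarily valid) GIF."""
    out = bytearray(rng.choice([b"GIF87a", b"GIF89a"]))

    # Logical screen descriptor: width, height, packed flags, bg index, aspect.
    has_gct = rng.random() < 0.5
    gct_bits = rng.randrange(8)  # table holds 2**(gct_bits + 1) colors
    packed = (0x80 if has_gct else 0) | (rng.randrange(8) << 4) | gct_bits
    out += struct.pack("<HHBBB", rng.randrange(1, 65), rng.randrange(1, 65),
                       packed, rng.randrange(256), rng.randrange(256))
    if has_gct:
        out += rand_bytes(rng, 3 * (1 << (gct_bits + 1)))

    for _ in range(rng.randrange(1, 5)):
        if rng.random() < 0.5:
            # Extension: introducer 0x21, a label, then data sub-blocks.
            out += bytes([0x21, rng.choice([0x01, 0xF9, 0xFE, 0xFF])])
            out += sub_blocks(rand_bytes(rng, rng.randrange(40)))
        else:
            # Image descriptor: separator 0x2C, left, top, width, height, flags.
            out += b"\x2c" + struct.pack("<HHHHB", 0, 0, rng.randrange(1, 65),
                                         rng.randrange(1, 65), 0)
            out.append(rng.randrange(2, 9))  # LZW minimum code size
            out += sub_blocks(rand_bytes(rng, rng.randrange(300)))

    out.append(0x3B)  # trailer
    return bytes(out)
```

Seeding `random.Random` with the test case number makes each generated file reproducible.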
Oh, it also found 5 signed integer overflow issues (forgot to run UBSAN before posting).
As a point of comparison, a couple months ago I wrote my own Python random GIF generator for this C program by hand. It took about an hour of reading the code and fiddling to get roughly the same coverage Claude got here zero-shot.
Here are 1000 random GIFs generated by Claude's fuzzer, with the accompanying ASAN and UBSAN outputs (run with timeout=30s). moyix.net/~moyix/claude_…
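A run like this can be driven by a small harness. The following is a rough sketch, not the script actually used: `run_case` and its status labels are hypothetical, and it assumes the decoder was built with ASAN and UBSAN so that reports land on stderr.

```python
import subprocess


def run_case(cmd, timeout_s=30.0):
    """Run one test case and return (status, stderr_text).

    status is one of "hang", "asan", "ubsan", "ok", or "error".
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.PIPE, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return "hang", ""
    stderr = proc.stderr.decode("utf-8", errors="replace")
    if "AddressSanitizer" in stderr:
        return "asan", stderr
    if "runtime error:" in stderr:  # marker used in UBSAN diagnostics
        return "ubsan", stderr
    return ("ok" if proc.returncode == 0 else "error"), stderr
```

For example, `run_case(["./gifread", "167.gif"])` would classify a single file, where `./gifread` is a placeholder name for the instrumented decoder binary.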
BONUS: here's one of Claude's random GIFs that was valid enough to actually render (it's not very exciting): moyix.net/~moyix/167.gif
Another experiment in this subthread, suggested by @pr0me: how good a fuzzer can it write using only its knowledge of the GIF format in general, without seeing the specific GIF parser I'm testing? Answer: much worse.
@pr0me Much worse when it can't see the parser code. It got the code for generating the global color table wrong, so all the files are rejected early by the parser. Coverage: moyix.net/~moyix/gifread…
@pr0me Code here; prompt was "I'm writing a GIF parsing library and I'd like to create random test files that fully exercise all features of the format. Could you write a Python function to generate random GIFs? The function should have the signature: [...]" gist.github.com/moyix/da1b8ab9…
@pr0me Small correction after looking more closely at the coverage report – it does get the color table flag ~half the time. But even then it misses most of the extensions etc.
@nickblack 10 minutes with an empty seed + no dictionary, about what I expected.
Okay, now for some comparisons against AFL 2.52b as a baseline (sorry for not using AFL++ here but it's a bit more of a pain to compile and I'm short on time). The comparison is a bit tricky because AFL has lots of config options, and it's unclear what a fair comparison is.
The simplest (but somewhat unfair) way to compare is to use AFL with an empty seed input ("echo > seed") and no dictionary. This works poorly; after a 10-minute run AFL finds almost no new paths in the program because of the GIF89a magic check.
You can see this in the coverage report, where it only ends up covering 4.4% of the lines in the decoder. It didn't find any memory safety issues, undefined behavior, or hangs. moyix.net/~moyix/gifread…
But Claude knows about the GIF format (even without seeing the program), so this isn't really fair. One way we can give AFL some knowledge about GIFs is by providing a valid seed, like this one that is included in the AFL distribution (testcases/images/gif/not_kitty.gif).
When a good start seed is provided, AFL does much better and can explore more of the format. It gets slightly higher line coverage (95%) than Claude's fuzzer, but only finds one memory safety issue and one signed int overflow, as well as the hang. moyix.net/~moyix/gifread…

Another way to give AFL some understanding of GIFs is to provide a dictionary of tokens found in GIF files that it can use during fuzzing, like "GIF", "89a", "NETSCAPE2.0", etc. AFL comes with such a dictionary (in dictionaries/gif.dict) and so we use it along with our empty seed.
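For reference, an AFL dictionary is a text file of `name="value"` lines, with non-printable bytes written as `\xNN` escapes. The sketch below formats tokens that way; the helper `afl_dict_entry` and the entry names are made up for illustration and are not copied from the shipped gif.dict.

```python
def afl_dict_entry(name: str, token: bytes) -> str:
    """Format one token as an AFL dictionary line: name="escaped value".

    Printable ASCII is kept as-is; everything else (plus the double quote
    and backslash, to keep quoting simple) is written as a \\xNN escape.
    """
    escaped = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\x{b:02X}"
        for b in token
    )
    return f'{name}="{escaped}"'


# Tokens mentioned above, under hypothetical entry names.
entries = [
    afl_dict_entry("header_gif", b"GIF"),
    afl_dict_entry("header_89a", b"89a"),
    afl_dict_entry("app_netscape", b"NETSCAPE2.0"),
]
print("\n".join(entries))
```

The resulting file is what gets passed to afl-fuzz with `-x`.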
Despite having a token dictionary, AFL does much worse here, with only 54% line coverage in the decoder, and no memory safety / UB bugs found; it does find the hang. moyix.net/~moyix/gifread…

Finally, we can of course combine both and use both a good seed input and a dictionary. Adding the dictionary doesn't improve things over just the seed, though. 93% line coverage, 1 memory safety bug, 1 UB bug, and 1 hang. moyix.net/~moyix/gifread…
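The four AFL configurations above (empty or valid seed, with or without the dictionary) can be written down as a small matrix. This is only a sketch of how such runs might be set up: the directory layout, the `./gifread` target name, and the use of coreutils `timeout` to cap each run at 10 minutes are assumptions, not the actual scripts. The flags `-i`, `-o`, and `-x` are standard afl-fuzz options, and the target is assumed to be compiled with AFL's instrumenting compiler.

```python
from itertools import product


def afl_command(seed_dir, out_dir, target, dictionary=None, minutes=10):
    """Build an argv list for one time-limited afl-fuzz run."""
    cmd = ["timeout", str(minutes * 60), "afl-fuzz", "-i", seed_dir, "-o", out_dir]
    if dictionary:
        cmd += ["-x", dictionary]
    # "@@" is replaced by afl-fuzz with the path of the current input file.
    cmd += ["--", target, "@@"]
    return cmd


configs = {}
for seed, use_dict in product(["empty", "not_kitty"], [False, True]):
    name = f"{seed}_{'dict' if use_dict else 'nodict'}"
    configs[name] = afl_command(
        f"seeds/{seed}",            # hypothetical seed directory layout
        f"out/{name}",
        "./gifread",                # hypothetical name of the instrumented binary
        "dictionaries/gif.dict" if use_dict else None,
    )

for name, cmd in configs.items():
    print(name, " ".join(cmd))
```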

So, the overall verdict is that when AFL is given a good seed GIF that uses many of the features in the standard, it does great at covering the source code (slightly better than Claude). But for some reason I don't understand, it still finds fewer unique bugs than Claude's tests.
Caveats and details:
- Each AFL run was only 10 minutes single core; a pretty short run.
- I didn't try AFL havoc (-d) mode.
- My crash/bug deduplication was pretty simple; I just used the file and line number of the first stack trace entry in user (i.e. not libc/sanitizer) code.
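The deduplication rule in the last caveat can be sketched roughly as follows. This is an illustration, not the analysis script from the thread: the frame regex assumes symbolized sanitizer traces of the form `#N 0xADDR in func path:line[:col]`, and the `NON_USER_MARKERS` heuristic for spotting libc/sanitizer frames is an assumption. UBSAN reports only carry stack frames when run with `UBSAN_OPTIONS=print_stacktrace=1`.

```python
import re

# Matches symbolized frames such as:
#   #1 0x51c0de in read_image /src/gifread.c:123:9
FRAME_RE = re.compile(
    r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+\S+\s+(\S+?):(\d+)(?::\d+)?\s*$",
    re.M,
)

# Assumed heuristic: paths containing these substrings are not user code.
NON_USER_MARKERS = ("compiler-rt", "sanitizer", "/libc", "/glibc")


def bug_key(report):
    """Return (file, line) of the first user-code frame, or None."""
    for path, line in FRAME_RE.findall(report):
        if not any(marker in path for marker in NON_USER_MARKERS):
            return (path.rsplit("/", 1)[-1], int(line))
    return None


def dedup(reports):
    """Keep one report per distinct (file, line) key."""
    unique = {}
    for report in reports:
        key = bug_key(report)
        if key is not None:
            unique.setdefault(key, report)
    return unique
```

Keying on a single frame is coarse: two different bugs that crash on the same line are merged, and one bug reached through different first frames is counted twice.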
@nickblack Aha, yep, with a reasonable seed input AFL wins on coverage but not bugs found.
Here's the data and analysis scripts for the AFL experiments: moyix.net/~moyix/afl_gif…
One more experiment: how well does Claude do at writing a fuzzer given only the GIF89a spec? w3.org/Graphics/GIF/s…
Not very well; coverage in the decoder is only 26.8%, and it finds no bugs except the hang. moyix.net/~moyix/gifread…
Prompt: "I'm trying to write a random GIF generator to create test cases for a GIF parsing library. Here is the spec for the GIF format; could you write a Python function that generates random GIFs to fully exercise all features of the spec? The function should have the signature:"
Interestingly, many more of the files generated by this fuzzer are considered valid by OS X's Preview. Generated files are here: moyix.net/~moyix/claude3…
Further adventures in using Claude to write a fuzzer for a more obscure format (VRML) can be found here:
@lcamtuf @nickblack Oh I bet it’s the hangs? 14.4k timeouts would slow things down a lot, right?
