Here's a quick tour through one of my favorites, where @XBOW not only solved the benchmark (a Jenkins RCE) but then went for style points by debugging a slightly broken benchmark setup to get the flag!
It starts off with searchsploit since it's a known CVE, but then switches to writing its own exploit(!)
It goes through many, many rounds of debugging on the exploit, using the enormous Java stack traces the server gives back each time to refine the code
And eventually... pop goes the shell!
But... the flag doesn't get exfiltrated :( This turns out to be our (the humans') fault—in the benchmark setup, the server is launched with sudo, which dropped the environment variable containing the exfil server name (since fixed).
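For anyone wondering how sudo can eat a variable: by default it runs the command with a reset environment, so whatever the caller exported doesn't reach the child unless it's explicitly preserved. Here's a minimal Python sketch that mimics the effect by handing a child process a scrubbed `env`. No actual sudo is involved, and `EXFIL_SERVER` is a made-up name (the thread doesn't say what the real variable was called).

```python
import os
import subprocess
import sys

VAR = "EXFIL_SERVER"  # hypothetical name; the real variable isn't given in the thread


def child_sees(env):
    """Start a child Python process with `env` and return what it sees for VAR."""
    code = f"import os; print(os.environ.get({VAR!r}, '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Caller's environment, with the variable set.
full_env = dict(os.environ, **{VAR: "exfil.example"})
# Mimic an environment reset: same environment minus the variable we care about.
scrubbed_env = {k: v for k, v in full_env.items() if k != VAR}

print(child_sees(full_env))      # child inherits the variable
print(child_sees(scrubbed_env))  # child sees it as unset
```

With the variable gone, anything the child builds from it (like a URL) ends up with an empty host, which matches the "missing server" symptom described further down.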
This is where things get seriously wild. XBOW started using its RCE capability to debug what was going on server-side, building these crazy Python+XML+Bash payloads that snoop around the server env
After a few rounds of this, it hits on the strategy of starting the exfil binary in the background and then running "ps" continuously to see what it's doing – and sees it launch curl to POST the flag (with a missing server). And that lets it get the flag!
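The "run ps in a loop and watch for the child" trick can be sketched roughly like this. To be clear, this is not XBOW's payload (that's in the gist linked in this thread); it's a small Python illustration of the polling idea, and the function names are my own. It assumes a `ps` that accepts `-e -o args=`.

```python
import subprocess
import time


def ps_snapshot():
    """Return the command line of every running process, one string each."""
    out = subprocess.run(
        ["ps", "-e", "-o", "args="], capture_output=True, text=True, check=True
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def watch_for(needle, snapshot=ps_snapshot, attempts=50, delay=0.1):
    """Poll the process list until a command line containing `needle` appears.

    Returns the first matching command line, or None if nothing shows up
    within `attempts` polls.
    """
    for _ in range(attempts):
        for cmdline in snapshot():
            if needle in cmdline:
                return cmdline
        time.sleep(delay)
    return None
```

Start the interesting binary in the background, call `watch_for("curl")`, and the returned command line includes every argument the short-lived child was launched with. Polling can miss a process that exits between two snapshots, hence the short delay.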
Here's that last exploit script it wrote in full, so you can appreciate what a complex chain of actions and layers—in three different languages!—it managed to build: gist.github.com/moyix/95242104…
OpenAI: pip install openai and set OPENAI_API_KEY
Anthropic: yea same but s/openai/anthropic/g
Google: oh boy. ok so you have a GCP account? no? ok go set that up. and a payment method. now make a "project". SURVEY POPUP! k now gcloud auth. wait you have the gcloud CLI right–
I haven't even mentioned the odd step of "enable the Vertex API in your project", or that when you finally get to "install the Python library" it kicks off another sidequest of installing something called the Vertex Python SDK and writing extra code to initialize it??
The gcloud CLI installer is now trying to con me into letting it install its own Python version. NICE TRY BUDDY
I gave Claude 3 the entire source of a small C GIF decoding library I found on GitHub, and asked it to write me a Python function to generate random GIFs that exercised the parser. Its GIF generator got 92% line coverage in the decoder and found 4 memory safety bugs and one hang.
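To give a feel for what a structure-aware GIF generator looks like, here's a minimal sketch of my own. It is not the fuzzer Claude wrote, and it makes no claim about coverage. It emits the GIF skeleton (header, logical screen descriptor, global color table, image descriptors, data sub-blocks, trailer) and fills the fields and the image data with random values.

```python
import random
import struct


def random_gif(rng):
    """Build one structurally plausible GIF with randomized fields and image data."""
    out = bytearray(b"GIF89a")
    width, height = rng.randint(0, 64), rng.randint(0, 64)
    gct_bits = rng.randint(0, 7)                # table has 2**(gct_bits + 1) entries
    packed = 0x80 | (gct_bits << 4) | gct_bits  # 0x80 = global color table present
    # Logical screen descriptor: width, height, packed, background index, aspect.
    out += struct.pack("<HHBBB", width, height, packed, rng.randint(0, 255), 0)
    out += bytes(rng.getrandbits(8) for _ in range(3 * 2 ** (gct_bits + 1)))

    for _ in range(rng.randint(1, 3)):          # a few image frames
        # Image descriptor: separator, left, top, width, height, packed.
        out += b"\x2c" + struct.pack(
            "<HHHHB",
            rng.randint(0, 64), rng.randint(0, 64),
            rng.randint(0, 64), rng.randint(0, 64), 0,
        )
        out.append(rng.randint(0, 12))          # LZW min code size, may be invalid on purpose
        for _ in range(rng.randint(0, 4)):      # data sub-blocks: length byte + payload
            n = rng.randint(1, 255)
            out.append(n)
            out += bytes(rng.getrandbits(8) for _ in range(n))
        out.append(0)                           # sub-block terminator
    out.append(0x3B)                            # trailer
    return bytes(out)
```

Seeding it (`random_gif(random.Random(1234))`) makes a crashing input reproducible. Keeping the container valid while randomizing the contents is what lets inputs get past the header checks and into the decoder itself.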
Here's the fuzzer Claude wrote, along with the program it analyzed, its explanation, and a Makefile: gist.github.com/moyix/02029770…
1. Rent a bigger EC2 server. I was using a t2.micro, which seemed like more than enough while I was testing. But with a bunch of teams hammering at it, the fact that it has only one CPU started to make things slow.
2. Kill the child procs (one is started for each new connection on the main port) after some idle time. As it was, a dangling connection could sit there indefinitely; during the competition the load on the server went above 20 and I had to manually kill some procs.
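The idle-timeout idea in point 2 could look something like the sketch below. This is an illustration in Python, not the challenge server's code; the echo behaviour and the 60-second value are placeholders I made up. The per-connection process puts a timeout on its socket and exits when the client goes quiet.

```python
import socket

IDLE_TIMEOUT = 60.0  # seconds; hypothetical value, tune to taste


def serve_connection(conn, idle_timeout=IDLE_TIMEOUT):
    """Echo data back until the peer closes or stays silent for idle_timeout seconds.

    Returns "closed" or "idle" so the per-connection process knows why it is exiting.
    """
    conn.settimeout(idle_timeout)
    try:
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                return "idle"        # nothing arrived in time: give up on this client
            if not data:
                return "closed"      # peer closed the connection cleanly
            conn.sendall(data)
    finally:
        conn.close()
```

Each child would call `serve_connection(conn)` and then exit, so a dangling connection costs at most one idle period instead of a process that lingers forever.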
Will still try to do a blog post on my @CSAW_NYUTandon CTF challenge, NERV Center, but for now here's a thread explaining the key mechanics. I put a lot of work into the aesthetics, like this easter egg credit sequence (all ANSI colors+unicode text) that contains key hints:
(Note the karaoke subtitles timed to the credits at the bottom 😁)
First, the vulnerability. If you read the man page for select(), you'll see this warning: select() is limited to monitoring file descriptors numbered less than 1024. But modern systems can have many more open files, and importantly the kernel select() interface is NOT limited.
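One way to picture why that limit matters: an `fd_set` is a fixed-size bitmap with one bit per descriptor (1024 bits on a typical Linux/glibc build), and setting the bit for a larger descriptor without a bounds check lands outside it. Below is a toy Python model of that. It is only a simulation over a `bytearray`, with a byte-wise bit layout and a memory layout I made up; it is not the challenge code.

```python
FD_SETSIZE = 1024              # typical value on Linux/glibc
FDSET_BYTES = FD_SETSIZE // 8  # 128 bytes of bitmap


def fd_set_unchecked(memory, fdset_offset, fd):
    """Model of an unchecked FD_SET: set bit `fd`, counted from the start of the fd_set."""
    memory[fdset_offset + fd // 8] |= 1 << (fd % 8)


# Toy layout: a 128-byte fd_set immediately followed by 8 bytes of unrelated data.
memory = bytearray(FDSET_BYTES + 8)

fd_set_unchecked(memory, 0, 5)     # in range: lands inside the bitmap
fd_set_unchecked(memory, 0, 1030)  # out of range: lands in the neighbouring bytes

print(memory[0])                   # 32 (bit 5 of the first bitmap byte)
print(memory[FDSET_BYTES:].hex())  # 4000000000000000 (a bit flipped past the end)
```

In this model, once a process has 1024 or more descriptors open, each new one maps to a bit in whatever sits next to the `fd_set`.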
It's like GPT doesn't even care about the technical accuracy of my upcoming novel 😤
We are now arguing about whether, if hotwiring a car were the only way to save a child's life, its refusal to tell me how to hotwire a car would make it morally culpable for the child's death. So far it's not buying it