While I wait for the GPU to churn through 2*26^11 possibilities, a brief recap of how we got here. It started when I noticed this bit of code in the @GitHubCopilot Visual Studio Code extension that detects naughty words in either the prompt or the suggestions.
Because the words are hashed, we have to guess what words might be in the list, compute the hash of each, and then check to see if it's in the list (just like cracking password hashes).
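The guess-hash-check loop is easy to sketch. The extension's actual hash function isn't reproduced in this thread, so the `word_hash` below assumes a DJB2-style `h = h*33 + c mod 2^32` (consistent with the multiply-by-33 structure mentioned later in the thread); everything else is the standard dictionary-attack pattern.

```python
def word_hash(word: str) -> int:
    # Assumed DJB2-style hash, NOT the extension's exact function.
    h = 0
    for ch in word:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # stay in 32 bits
    return h

def crack(targets: set[int], wordlist: list[str]) -> dict[int, str]:
    """Hash every candidate and keep those whose hash is in the target list."""
    return {word_hash(w): w for w in wordlist if word_hash(w) in targets}
```

Feed it ever-larger wordlists and the set of unknown hashes keeps shrinking.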
This managed to decode about 75% of the list right off the bat, and turned up some weird entries, like "israel" and "communist".
I also figured out what it's using the list for: it will suppress autocomplete suggestions if the suggestion contains a slur. So it has trouble completing a list of Near East countries, for example:
I tried ever-larger wordlists from increasingly dubious sources (a dump of 4chan's /pol/ archive proved particularly fruitful) and managed to get 95% of the list decoded. But the last 5% is *hard*.
So it was time to apply absurd amounts of computer science to the problem. After porting the hash algorithm from JavaScript to C, I started off by using symbolic execution (a fork of @kleesymex that supports floating point) to generate solutions.
This line of attack got much faster with the help of @RolfRolles's Z3 skills and an observation from @saleemrash1d (independently noticed by @ESultanik) that the hash function could be converted to pure integer math.
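Why the integer conversion matters: the extension computes its hash in JavaScript, i.e. in IEEE-754 doubles. Under the assumed `h = h*33 + c mod 2^32` form (an assumption; the real constants aren't shown here), every intermediate stays far below 2^53, so the floating-point arithmetic is exact and can be swapped for 32-bit integer math, which solvers reason about much more cheaply. A sketch of the equivalence:

```python
import math

def hash_float(word: str) -> float:
    # Double-precision version: h*33 + c < 2^38, well under 2^53,
    # so the arithmetic and fmod are bit-exact.
    h = 0.0
    for ch in word:
        h = math.fmod(h * 33.0 + ord(ch), 2.0 ** 32)
    return h

def hash_int(word: str) -> int:
    # Pure integer version, the form an SMT solver handles happily.
    h = 0
    for ch in word:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h
```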
One of the best finds from the Z3 approach was the discovery that "q rsqrt" had been added to the bad word list to prevent Copilot from spitting out a piece of the Quake III source code
I also started doing some amateur cryptanalysis of the hash function. My crypto skills aren't great, but I managed to get one word out of this by discovering that you can use already-decoded words to learn things about undecoded ones.
I also decided to take the analogy to password cracking seriously and wrote a plugin for John the Ripper, a popular password cracking tool. This uncovered quite a few more words.
At this point we only had 5 or so hashes left. And with the help of Z3, we could generate tens of thousands of possible candidates for each hash—but looking through all of them to tell which one was most likely was infeasible.
Time for some machine learning. Large language models like GPT-2 are not only good at *generating* text, they can be used to evaluate how plausible some text is according to the model.
If you ever find yourself in a situation where you need to evaluate whether a word or sentence is plausible English according to GPT-2, lm-scorer does the job nicely github.com/simonepri/lm-s…
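I won't reproduce a GPT-2 setup here; as a dependency-free stand-in for the same idea (rank candidates by their log-probability under a language model), here's a toy character-bigram scorer. The corpus and candidate strings are made up for illustration.

```python
import math
from collections import Counter

# Toy "language model": character-bigram counts from a small corpus.
corpus = "the quick brown fox jumps over the lazy dog " * 50
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def log_prob(s: str) -> float:
    """Sum of log P(next char | current char), with add-one smoothing."""
    total = 0.0
    for a, b in zip(s, s[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + 27))
    return total

# Rank candidates: the English-looking one scores higher.
candidates = ["the dog", "xqzjvkw"]
best = max(candidates, key=log_prob)
```

GPT-2 via lm-scorer does the same thing, just with a vastly better model of English.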
I also felt, around this time, like I was hitting the limit of Z3's performance. I wanted to generate *every* 11-character alphabetic possibility for a hash, but after running for a day Z3 only came up with ~100,000 — and it can't easily be parallelized.
You know what's great at doing a ton of simple things in parallel? A GPU. So I ported the hash function to CUDA to run it on an RTX 3090. This let me test about 1.5 trillion candidates per second, enough to cover all 26^11 possibilities in ~40 minutes. gist.github.com/moyix/f78e0b0d…
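The kernel itself is in the gist; its core idea, each thread mapping its global index to one candidate string, fits in a few lines of Python (hash form again assumed, and the target hash below is a made-up placeholder):

```python
ALPHA = "abcdefghijklmnopqrstuvwxyz"

def index_to_candidate(idx: int, length: int = 11) -> str:
    # Interpret idx as a base-26 number; digit k picks character k.
    chars = []
    for _ in range(length):
        idx, r = divmod(idx, 26)
        chars.append(ALPHA[r])
    return "".join(chars)

def word_hash(word: str) -> int:
    # Assumed DJB2-style hash, not the extension's exact function.
    h = 0
    for ch in word:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

# One "GPU thread": candidate number 12345 out of the 26**11 keyspace.
candidate = index_to_candidate(12345)
match = word_hash(candidate) == 0xDEADBEEF  # placeholder target hash
```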
And that brings us up to the present. The GPU cracker worked perfectly, and found one of the four remaining words – and GPT-2 correctly ranked it as the most likely candidate.
I was also able to take advantage of another observation about the hash function to realize that the plural form of this word was one of the other remaining hashes. So this leaves us with just two words (out of 1,170) undeciphered
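Under the assumed DJB2-style hash, this works because appending a letter is a single hash step: the plural's hash can be predicted from the cracked word's hash alone, then checked against the remaining targets.

```python
def word_hash(word: str) -> int:
    # Assumed DJB2-style hash, not the extension's exact function.
    h = 0
    for ch in word:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h

def plural_hash(h: int) -> int:
    """Hash of word + 's', computed directly from hash(word)."""
    return (h * 33 + ord("s")) & 0xFFFFFFFF
```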
Oops, I realized I didn't link to the decoded list. It can be found here; I have rot13 encoded it to guard against accidental viewing, but you can decode it with something like rot13.com: moyix.net/~moyix/copilot…
An amazing algorithmic speedup from @Sc00bzT means I can now generate those same 800K candidates in just 1 minute on my laptop. It uses a very cool meet-in-the-middle attack!
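Here's my reconstruction of what a meet-in-the-middle split can look like for this problem (not @Sc00bzT's actual algorithm), under the same assumed hash: since hash(p + s) = hash(p) * 33^len(s) + hash(s) mod 2^32 and 33 is odd, hence invertible mod 2^32, you can enumerate each half once instead of every prefix/suffix pair. The demo uses a tiny alphabet and a short word to keep it fast.

```python
from itertools import product

MOD = 1 << 32
ALPHA = "abcdefghij"  # tiny alphabet keeps the demo fast

def word_hash(word: str, h: int = 0) -> int:
    # Assumed DJB2-style hash, not the extension's exact function.
    for ch in word:
        h = (h * 33 + ord(ch)) % MOD
    return h

def crack_mitm(target: int, pre_len: int, suf_len: int) -> list[str]:
    # Pass 1: table of hash -> prefixes.
    table: dict[int, list[str]] = {}
    for p in product(ALPHA, repeat=pre_len):
        table.setdefault(word_hash("".join(p)), []).append("".join(p))
    inv = pow(33, -suf_len, MOD)  # inverse of 33^suf_len mod 2^32
    # Pass 2: for each suffix, solve for the prefix hash it needs.
    hits = []
    for s in product(ALPHA, repeat=suf_len):
        suffix = "".join(s)
        need = (target - word_hash(suffix)) * inv % MOD
        for p in table.get(need, []):
            hits.append(p + suffix)
    return hits
```

This turns O(26^n) work into two O(26^(n/2)) passes plus table lookups.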
After scoring the 854,653 solutions found by the CUDA hash cracker using GPT-2, I believe we have solved another two of the remaining slurs - the highlighted word and its plural! (The scores here are log-probabilities)
Only two hashes are left uncracked:
272617466 and 867567715
Good thing I have two GPUs 😎
(To be scrupulously fair I suspect a couple of the "decoded" entries on the list, like "w00se" and "jui ch", are collisions. But I have to draw the line somewhere.)
TFW you're happy that your GPU-accelerated hash finder gets 24 million hashes/sec but then you remember hashcat can do ~65 BILLION MD5 hashes/sec on the same device...
If anyone would like to tell me how hashcat manages to pull off this feat I'd be super curious, 3000X faster is a LOT (and MD5 is more complex than the function I'm trying to compute!)
I feel like I always get tripped up on silly things with CUDA – like: I can compute all these results very fast, but actually saving and reporting them somewhere is a huge pain.
Either I just call printf() device-side and tank performance, or I have to manage parallel host and device buffers, synchronize between GPU threads when adding to the result list, and lose the ability to see results as they come in.
How can I add its negation (Or([word0[i] != model.eval(word0[i]) for i in range(arraylen)])) to the constraints?
Unfortunately since the constraints were loaded in from an SMT file I don't have the original Array, nor do I know how to retrieve it from the constraint object or model...
Ok this just feels cheeky: I asked Z3 for an array word0 where word0 != FirstSolution. It happily provided me with such an array: FirstSolution but with array index 1073741824 set to 64. This is how I learned Z3 has no concept of array length.
So by looking at the XOR of two hashes, we can learn the XOR of the two words' last letters, which narrows down the possibilities.
Can you figure out the length of the string? Maybe... the multiplication by 33 dominates, but for longer strings the amount of error in your guess gets bigger and bigger.
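Making the last-letter trick concrete, under the assumed h = h*33 + c mod 2^32 hash: two words that share everything but their final letter have hashes whose difference is exactly the difference of those letters, and the low bit of the XOR matches the XOR of the letters' low bits. The words below are arbitrary examples.

```python
MOD = 1 << 32

def word_hash(word: str) -> int:
    # Assumed DJB2-style hash, not the extension's exact function.
    h = 0
    for ch in word:
        h = (h * 33 + ord(ch)) % MOD
    return h

h1, h2 = word_hash("crack"), word_hash("cracs")
# Difference of hashes == difference of the final characters.
diff_matches = (h1 - h2) % MOD == (ord("k") - ord("s")) % MOD
# Low bit of the XOR leaks the XOR of the final characters' low bits.
lsb_matches = (h1 ^ h2) & 1 == (ord("k") ^ ord("s")) & 1
```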