TFW you're happy that your GPU-accelerated hash finder gets 24 million hashes/sec but then you remember hashcat can do ~65 BILLION MD5 hashes/sec on the same device...
If anyone would like to tell me how hashcat manages to pull off this feat I'd be super curious, 3000X faster is a LOT (and MD5 is more complex than the function I'm trying to compute!)
It's trying to cover the space [a-z]{11} so each thread gets a slice of that space, converts the start index into a base-26 number, and then increments the base-26 number and checks its hash (assuming I did it right :p).
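The slicing scheme can be sketched host-side in Python (the kernel logic, not the CUDA itself; names are mine):

```python
ALPHA = "abcdefghijklmnopqrstuvwxyz"

def index_to_word(idx, length=11):
    """Convert a start index into its base-26 'number', i.e. a candidate string."""
    chars = []
    for _ in range(length):
        idx, r = divmod(idx, 26)
        chars.append(ALPHA[r])
    return "".join(reversed(chars))

def increment(word):
    """Advance to the next candidate, like incrementing a base-26 number."""
    chars = list(word)
    i = len(chars) - 1
    while i >= 0:
        n = ALPHA.index(chars[i]) + 1
        if n < 26:
            chars[i] = ALPHA[n]
            return "".join(chars)
        chars[i] = "a"  # carry
        i -= 1
    return "a" * len(chars)  # wrapped around the whole space
```

Each thread would call index_to_word on its slice's start index once, then loop with increment, hashing as it goes.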

While I wait for the GPU to churn through 2*26^11 possibilities, a brief recap of how we got here. It started when I noticed this bit of code in the @GitHubCopilot Visual Studio Code extension that detects naughty words in either the prompt or the suggestions.
Because the words are hashed, we have to guess what words might be in the list, compute the hash of each, and then check to see if it's in the list (just like cracking password hashes).
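The guess-and-check loop is simple to sketch. The hash function here is an assumption: a 32-bit xor-variant djb2 (the multiply-by-33 structure comes up later in the thread); the real list's hash may differ, and the word list is a stand-in:

```python
def djb2a(s):
    """xor-variant djb2 (h = h*33 ^ c), truncated to 32 bits -- an assumed
    stand-in for the extension's actual hash function."""
    h = 5381
    for c in s:
        h = ((h * 33) ^ ord(c)) & 0xFFFFFFFF
    return h

# The list stores only hashes, so we guess words and check membership,
# exactly like cracking password hashes.
stored = {djb2a(w) for w in ["apple", "banana"]}  # toy stand-ins
assert djb2a("apple") in stored
assert djb2a("carrot") not in stored
```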
This managed to decode about 75% of the list right off the bat, and turned up some weird entries, like "israel" and "communist".
After scoring the 854,653 solutions found by the CUDA hash cracker using GPT-2, I believe we have solved another two of the remaining slurs - the highlighted word and its plural! (The scores here are log-probabilities)
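A back-of-envelope check on that solution count, assuming the hashes are 32-bit values (the two uncracked ones fit in 32 bits): the [a-z]{11} space holds 26^11 strings, so for a single hash you'd expect roughly 26^11 / 2^32 preimages, i.e. about 854,569 — strikingly close to the 854,653 found, if those solutions are all for one hash:

```python
space = 26 ** 11    # candidate strings in [a-z]{11}
buckets = 2 ** 32   # assuming 32-bit hash values
expected = space / buckets
print(f"{expected:,.0f}")  # expected preimages per hash value
```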
Only two hashes are left uncracked:
272617466 and 867567715
Good thing I have two GPUs 😎
(To be scrupulously fair I suspect a couple of the "decoded" entries on the list, like "w00se" and "jui ch", are collisions. But I have to draw the line somewhere.)
I feel like I always get tripped up on silly things with CUDA – like: I can compute all these results very fast, but actually saving and reporting them somewhere is a huge pain.
Either I just call printf() device-side and tank performance, or I have to manage parallel host and device buffers, synchronize between GPU threads when adding things to the result list, and lose the ability to see results as they come in.
How can I add its negation (Or([word0[i] != model.eval(word0[i]) for i in range(arraylen)])) to the constraints?
Unfortunately since the constraints were loaded in from an SMT file I don't have the original Array, nor do I know how to retrieve it from the constraint object or model...
Ok this just feels cheeky: I asked Z3 for an array word0 where word0 != FirstSolution. It happily provided me with such an array: FirstSolution but with array index 1073741824 set to 64. This is how I learned Z3 has no concept of array length.
So by looking at the XOR of two hashes we can learn the XOR of the two strings' last letters, which narrows down the possibilities.
Can you figure out the length of the string? Maybe... the multiplication by 33 dominates, but for longer strings the amount of error in your guess gets bigger and bigger.
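The last-letter leak is easy to demonstrate, assuming the xor-variant of djb2 (h = h*33 ^ c): two strings sharing a prefix reach the final character with identical state, so XORing their hashes cancels everything except the final characters:

```python
def djb2a(s):
    """xor-variant djb2, 32-bit -- an assumed stand-in for the real hash."""
    h = 5381
    for c in s:
        h = ((h * 33) ^ ord(c)) & 0xFFFFFFFF
    return h

# Same prefix, different last character: the hash XOR is exactly the
# XOR of those last characters.
h1, h2 = djb2a("hunteq"), djb2a("hunter")
print(h1 ^ h2 == ord('q') ^ ord('r'))  # True
```

With different prefixes (or different lengths) the cancellation no longer holds exactly, which is where the length-guessing error creeps in.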
A VQGAN+CLIP interpretation of Kubla Khan (Or, a vision in a dream. A Fragment.) by Coleridge, style by @GurneyJourney [inspired by @moultano's Sacred Library]
________
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.