TFW you're happy that your GPU-accelerated hash finder gets 24 million hashes/sec but then you remember hashcat can do ~65 BILLION MD5 hashes/sec on the same device...
If anyone would like to tell me how hashcat manages to pull off this feat I'd be super curious, 3000X faster is a LOT (and MD5 is more complex than the function I'm trying to compute!)
It's trying to cover the space [a-z]{11} so each thread gets a slice of that space, converts the start index into a base-26 number, and then increments the base-26 number and checks its hash (assuming I did it right :p).
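The slicing scheme can be sketched host-side in Python (the kernel logic, not the CUDA itself; names are mine):

```python
ALPHA = "abcdefghijklmnopqrstuvwxyz"

def index_to_word(idx, length=11):
    """Convert a start index into its base-26 'number', i.e. a candidate string."""
    chars = []
    for _ in range(length):
        idx, r = divmod(idx, 26)
        chars.append(ALPHA[r])
    return "".join(reversed(chars))

def increment(word):
    """Advance to the next candidate, like incrementing a base-26 number."""
    chars = list(word)
    i = len(chars) - 1
    while i >= 0:
        n = ALPHA.index(chars[i]) + 1
        if n < 26:
            chars[i] = ALPHA[n]
            return "".join(chars)
        chars[i] = "a"  # carry
        i -= 1
    return "a" * len(chars)  # wrapped around the whole space
```

Each thread would call index_to_word on its slice's start index once, then loop with increment, hashing as it goes.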

While I wait for the GPU to churn through 2*26^11 possibilities, a brief recap of how we got here. It started when I noticed this bit of code in the @GitHubCopilot Visual Studio Code extension that detects naughty words in either the prompt or the suggestions.
Because the words are hashed, we have to guess what words might be in the list, compute the hash of each, and then check to see if it's in the list (just like cracking password hashes).
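The guess-and-check loop is simple to sketch. The hash function here is an assumption: a 32-bit xor-variant djb2 (the multiply-by-33 structure comes up later in the thread); the real list's hash may differ, and the word list is a stand-in:

```python
def djb2a(s):
    """xor-variant djb2 (h = h*33 ^ c), truncated to 32 bits -- an assumed
    stand-in for the extension's actual hash function."""
    h = 5381
    for c in s:
        h = ((h * 33) ^ ord(c)) & 0xFFFFFFFF
    return h

# The list stores only hashes, so we guess words and check membership,
# exactly like cracking password hashes.
stored = {djb2a(w) for w in ["apple", "banana"]}  # toy stand-ins
assert djb2a("apple") in stored
assert djb2a("carrot") not in stored
```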
This managed to decode about 75% of the list right off the bat, and turned up some weird entries, like "israel" and "communist".
After scoring the 854,653 solutions found by the CUDA hash cracker using GPT-2, I believe we have solved another two of the remaining slurs - the highlighted word and its plural! (The scores here are log-probabilities)
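A back-of-envelope check on that solution count, assuming the hashes are 32-bit values (the two uncracked ones fit in 32 bits): the [a-z]{11} space holds 26^11 strings, so for a single hash you'd expect roughly 26^11 / 2^32 preimages, i.e. about 854,569 — strikingly close to the 854,653 found, if those solutions are all for one hash:

```python
space = 26 ** 11    # candidate strings in [a-z]{11}
buckets = 2 ** 32   # assuming 32-bit hash values
expected = space / buckets
print(f"{expected:,.0f}")  # expected preimages per hash value
```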
Only two hashes are left uncracked:
272617466 and 867567715
Good thing I have two GPUs 😎
(To be scrupulously fair I suspect a couple of the "decoded" entries on the list, like "w00se" and "jui ch", are collisions. But I have to draw the line somewhere.)
I feel like I always get tripped up on silly things with CUDA – like: I can compute all these results very fast, but actually saving and reporting them somewhere is a huge pain.
Either I just call printf() device-side and tank performance, or I have to manage parallel host and device buffers, synchronize between GPU threads when adding things to the result list, and lose the ability to see results as they come in.
How can I add its negation (Or([word0[i] != model.eval(word0[i]) for i in range(arraylen)])) to the constraints?
Unfortunately since the constraints were loaded in from an SMT file I don't have the original Array, nor do I know how to retrieve it from the constraint object or model...
Ok this just feels cheeky: I asked Z3 for an array word0 where word0 != FirstSolution. It happily provided me with such an array: FirstSolution but with array index 1073741824 set to 64. This is how I learned Z3 has no concept of array length.
So by looking at the XOR of two hashes we can learn the XOR of the two strings' last letters, which narrows down the possibilities.
Can you figure out the length of the string? Maybe... the multiplication by 33 dominates, but for longer strings the amount of error in your guess gets bigger and bigger.
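The last-letter leak is easy to demonstrate, assuming the xor-variant of djb2 (h = h*33 ^ c): two strings sharing a prefix reach the final character with identical state, so XORing their hashes cancels everything except the final characters:

```python
def djb2a(s):
    """xor-variant djb2, 32-bit -- an assumed stand-in for the real hash."""
    h = 5381
    for c in s:
        h = ((h * 33) ^ ord(c)) & 0xFFFFFFFF
    return h

# Same prefix, different last character: the hash XOR is exactly the
# XOR of those last characters.
h1, h2 = djb2a("hunteq"), djb2a("hunter")
print(h1 ^ h2 == ord('q') ^ ord('r'))  # True
```

With different prefixes (or different lengths) the cancellation no longer holds exactly, which is where the length-guessing error creeps in.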
A VQGAN+CLIP interpretation of Kubla Khan (Or, a vision in a dream. A Fragment.) by Coleridge, style by @GurneyJourney [inspired by @moultano's Sacred Library]
________
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.