Brendan Dolan-Gavitt

17 Dec, 7 tweets, 5 min read

The camera-ready version of our @IEEESSP 2022 paper evaluating the security of code generated by GitHub Copilot is now up on arXiv! arxiv.org/abs/2108.09293

@IEEESSP We designed 89 different scenarios for Copilot to complete based on MITRE's "Top 25 Most Dangerous Software Weaknesses" (cwe.mitre.org/top25/archive/…), and then had Copilot generate completions for each scenario, creating 1,689 programs.

@IEEESSP This is too many to check by hand, so we used CodeQL with a combination of built-in queries and our own custom queries to check the resulting code for the relevant vulnerability. Surprisingly (at least to me), ~40% of the suggestions overall were vulnerable!

@IEEESSP Since Copilot just views the code as text, its output can be influenced by features of the prompt that have no semantic relevance, like comments. We explored this by taking a single vulnerability (SQL injection) and systematically varying different parts of the prompt.

@IEEESSP It turns out the prompt does in fact affect the security of the generated code. The strongest effect we saw was the presence of another vulnerable snippet; if Copilot sees this it is much more likely to mimic that (vulnerable) style and produce more vulnerabilities.
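For anyone unfamiliar with the vulnerability class, here is a minimal Python sketch of the kind of pattern at stake. It is not one of the paper's scenarios: the table, the function names, and the use of sqlite3 are all assumptions made for illustration. The first lookup builds the query by string formatting (the vulnerable style), the second uses a parameterized query.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # Vulnerable style: the input is spliced into the SQL text itself,
    # so a crafted name can change the structure of the query.
    query = "SELECT email FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # Parameterized style: the input is passed as data, never parsed as SQL.
    return conn.execute(
        "SELECT email FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"  # classic injection string

leaked = lookup_vulnerable(conn, payload)  # matches every row
blocked = lookup_safe(conn, payload)       # matches no row
```

With the payload above, the vulnerable lookup returns both rows while the parameterized one returns none. A prompt that already contains code in the first style is the sort of context that could nudge a completion toward repeating it.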

@IEEESSP Finally, we were curious how Copilot behaves on less popular languages. My co-authors are hardware security folks, so we designed scenarios for six hardware CWEs in Verilog as well. Copilot had much more trouble generating code that worked at all here.

@IEEESSP This paper was a lot of fun to work on with Hammond, Ben (@ichthys101), Baleegh, and Ramesh :) We've got lots more fun Copilot/Codex work underway so stay tuned!

And come see our talk at IEEE S&P this May, perhaps even in person! 🤞

More from @moyix

Brendan Dolan-Gavitt

@moyix

18 Dec

Probably getting old, I opted to just pay for a janky conversion utility rather than try to RE the Microsoft Outlook 15 message format :(

(I may still RE it)

The format is a pain in the ass, it stores messages in 3 undocumented binary parts: metadata, message body, and attachments. It has an sqlite database but that just points you to the metadata file.

Also, everything is referenced by GUIDs, which are in a mix of
- Raw binary GUID data
- ASCII GUIDs
- UTF-16-LE GUIDs
- Base64-encoded blobs that contain GUIDs
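To make the four representations concrete, here is a rough Python sketch of how one GUID looks in each form and how a search might cover all of them. Everything in it is an assumption for illustration: the example GUID, the blob layout, and the raw byte order (both big-endian and Microsoft's mixed-endian are tried, since the actual format is undocumented).

```python
import base64
import uuid

# Arbitrary example GUID, not taken from any real Outlook file.
g = uuid.UUID("12345678-1234-5678-1234-567812345678")

raw_form = g.bytes_le                       # raw binary (mixed-endian, assumed)
ascii_form = str(g).encode("ascii")         # ASCII text
utf16_form = str(g).encode("utf-16-le")     # UTF-16-LE text
# Hypothetical blob layout: some header bytes, the GUID, some trailer bytes.
b64_form = base64.b64encode(b"hdr:" + g.bytes_le + b":tail")


def contains_guid(data: bytes, guid: uuid.UUID) -> bool:
    """Check whether data holds guid in any of the four forms.

    Text forms are matched in lowercase only; real files may differ.
    """
    needles = (
        guid.bytes,                      # big-endian raw
        guid.bytes_le,                   # mixed-endian raw
        str(guid).encode("ascii"),
        str(guid).encode("utf-16-le"),
    )
    if any(n in data for n in needles):
        return True
    # Fall back to treating the whole buffer as a Base64 blob.
    try:
        decoded = base64.b64decode(data, validate=True)
    except ValueError:
        return False
    return any(n in decoded for n in needles)


forms = [raw_form, ascii_form, utf16_form, b64_form]
hits = [contains_guid(f, g) for f in forms]
```

In this sketch every entry in `hits` is `True`, and a different GUID is not found in any of the buffers. A real tool would also need to locate Base64 blobs embedded inside larger files rather than assuming the whole buffer is one.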

Brendan Dolan-Gavitt

@moyix

17 Dec

https://twitter.com/moyix/status/1471579800133378048

Okay, so, this will either be hilarious or get my account disabled by NYU IT during finals week

I guess I should have expected this but I'm still a bit surprised: got a hit from a Google-owned IP mxtoolbox.com/SuperTool.aspx…

I haven't even sent an email with the new signature yet so I guess this is from some part of gmail infrastructure that logs changes to signatures?

Brendan Dolan-Gavitt

@moyix

16 Dec

Quite neat: they hooked GPT-3 up to the web and let it search for sources using a text-based web browser & used RL+human feedback to improve the truthfulness of its answers! It can even cite its sources: openai.com/blog/improving…

Although I imagine the restriction to sites that actually have any usable content without JavaScript changes the quality of info - might even make it more accurate :p

The next obvious step is to give it the ability to ask questions on Quora/StackOverflow ;)

Brendan Dolan-Gavitt

@moyix

5 Nov

Frank Herbert (yes that one), forgotten PL researcher (via @gwern's essay on genetics and Dune)

I'm skimming quickly and so far like 70 pages in it's just a LOT of Frank Herbert dissing computers

133 pages in and we are just about ready to turn the computer on. I feel like ol' Frank might have been getting paid by the word here

Brendan Dolan-Gavitt

@moyix

19 Oct

KLEE misses this UAF under a very weird set of conditions. Tried to use creduce but I couldn't figure out a nice way to force it to preserve the UAF when reducing.