The camera-ready version of our @IEEESSP 2022 paper evaluating the security of code generated by GitHub Copilot is now up on arXiv! arxiv.org/abs/2108.09293
We designed 89 different scenarios for Copilot to complete based on MITRE's "Top 25 Most Dangerous Software Weaknesses" (cwe.mitre.org/top25/archive/…), and then had Copilot generate completions for each scenario, creating 1,689 programs.
This is too many to check by hand, so we used CodeQL with a combination of built-in queries and our own custom queries to check the resulting code for the relevant vulnerability. Surprisingly (at least to me), ~40% of the suggestions overall were vulnerable!
Since Copilot just views the code as text, its output can be influenced by features of the prompt that have no semantic relevance, like comments. We explored this by taking a single vulnerability (SQL injection) and systematically varying different parts of the prompt.
It turns out the prompt does in fact affect the security of the generated code. The strongest effect we saw was the presence of another vulnerable snippet in the prompt; if Copilot sees one, it is much more likely to mimic that (vulnerable) style and produce more vulnerabilities.
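For readers who have not run into the SQL injection case mentioned above, here is a minimal sketch of the vulnerable style next to the parameterized one. It is not taken from the paper's scenarios: the `users` table, the function names, and the payload are all hypothetical, and Python's built-in `sqlite3` is used only because it needs no setup.

```python
import sqlite3


def find_user_vulnerable(conn, username):
    # Hypothetical example: the input is pasted into the SQL text (CWE-89),
    # so a crafted username can change the meaning of the query.
    query = "SELECT id FROM users WHERE name = '%s'" % username
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Parameterized query: the driver passes the input as data, not as SQL.
    query = "SELECT id FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
leaked = find_user_vulnerable(conn, payload)  # condition is always true
blocked = find_user_safe(conn, payload)       # no user has this literal name
```

With the string-built query the payload matches every row; with the placeholder it matches none.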
Finally, we were curious how Copilot behaves on less popular languages. My co-authors are hardware security folks, so we designed scenarios for six hardware CWEs in Verilog as well. Copilot had much more trouble generating code that worked at all here.
This paper was a lot of fun to work on with Hammond, Ben (@ichthys101), Baleegh, and Ramesh :) We've got lots more fun Copilot/Codex work underway so stay tuned!
And come see our talk at IEEE S&P this May, perhaps even in person! 🤞
Probably getting old, I opted to just pay for a janky conversion utility rather than try to RE the Microsoft Outlook 15 message format :(
(I may still RE it)
The format is a pain in the ass: it stores each message in 3 undocumented binary parts (metadata, message body, and attachments). It has a SQLite database, but that just points you to the metadata file.
Also, everything is referenced by GUIDs, which come in a mix of:
- Raw binary GUID data
- ASCII GUIDs
- UTF-16-LE GUIDs
- Base64-encoded blobs that contain GUIDs
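A rough sketch of how those four encodings could be normalized to one `uuid.UUID` type in Python. Everything about the layout here is an assumption, not something recovered from the actual Outlook files: raw GUIDs are assumed to use the usual Microsoft mixed-endian byte order (`bytes_le`), and the Base64 blobs are assumed to carry textual GUIDs somewhere inside. All function names are hypothetical.

```python
import base64
import re
import uuid
from typing import Set


def _guid_pattern(pad: bytes) -> "re.Pattern[bytes]":
    # Build a bytes regex for 8-4-4-4-12 hex text; `pad` is what follows
    # each character (nothing for ASCII, a NUL byte for UTF-16-LE).
    hexchar = rb"[0-9a-fA-F]" + pad
    dash = rb"-" + pad
    groups = [b"(?:" + hexchar + b"){%d}" % n for n in (8, 4, 4, 4, 12)]
    return re.compile(dash.join(groups))


ASCII_GUID = _guid_pattern(b"")
UTF16_GUID = _guid_pattern(rb"\x00")


def from_raw(data: bytes) -> uuid.UUID:
    # Assumption: 16 raw bytes in Microsoft's mixed-endian layout.
    return uuid.UUID(bytes_le=data)


def from_ascii(data: bytes) -> uuid.UUID:
    return uuid.UUID(data.decode("ascii"))


def from_utf16le(data: bytes) -> uuid.UUID:
    return uuid.UUID(data.decode("utf-16-le"))


def guids_in_blob(blob: bytes) -> Set[uuid.UUID]:
    # Scan arbitrary bytes for textual GUIDs in either text encoding.
    found = set()
    for match in ASCII_GUID.finditer(blob):
        found.add(uuid.UUID(match.group().decode("ascii")))
    for match in UTF16_GUID.finditer(blob):
        found.add(uuid.UUID(match.group().decode("utf-16-le")))
    return found


def from_base64_blob(data: bytes) -> Set[uuid.UUID]:
    # Assumption: the decoded blob embeds GUIDs as text, not raw bytes.
    return guids_in_blob(base64.b64decode(data))
```

Once everything is a `uuid.UUID`, references found in the metadata, body, and attachment parts can be compared directly regardless of how each one was stored.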
Quite neat: they hooked GPT-3 up to the web and let it search for sources using a text-based web browser & used RL+human feedback to improve the truthfulness of its answers! It can even cite its sources: openai.com/blog/improving…
Although I imagine the restriction to sites that actually have any usable content without JavaScript changes the quality of info - might even make it more accurate :p
The next obvious step is to give it the ability to ask questions on Quora/StackOverflow ;)
KLEE misses this UAF under a very weird set of conditions. Tried to use creduce but I couldn't figure out a nice way to force it to preserve the UAF when reducing.