The camera-ready version of our @IEEESSP 2022 paper evaluating the security of code generated by GitHub Copilot is now up on arXiv! arxiv.org/abs/2108.09293 Asleep at the Keyboard? Assessing the Security of GitHub Cop
We designed 89 different scenarios for Copilot to complete based on MITRE's "Top 25 Most Dangerous Software Weaknesses" (cwe.mitre.org/top25/archive/…), and then had Copilot generate completions for each scenario, creating 1,689 programs.
This is too many to check by hand, so we used CodeQL with a combination of built-in queries and our own custom queries to check the resulting code for the relevant vulnerability. Surprisingly (at least to me), ~40% of the suggestions overall were vulnerable!
Since Copilot just views the code as text, its output can be influenced by features of the prompt that have no semantic relevance, like comments. We explored this by taking a single vulnerability (SQL injection) and systematically varying different parts of the prompt.
It turns out the prompt does in fact affect the security of the generated code. The strongest effect we saw was the presence of another vulnerable snippet; if Copilot sees this, it is much more likely to mimic that (vulnerable) style and produce more vulnerabilities.
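For anyone who hasn't seen the SQL injection pattern in question, here is a minimal sketch in Python. This is not one of the paper's scenarios and not Copilot output; the table, the function names (`make_db`, `lookup_vulnerable`, `lookup_safe`), and the payload are all made up for illustration. It only shows the difference between the string-concatenation style and a parameterized query.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_vulnerable(conn, name):
    # CWE-89 style: user input is concatenated straight into the SQL text,
    # so a quote character in `name` can change the structure of the query.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterized query: the driver passes `name` as data, never as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(lookup_vulnerable(conn, payload))  # leaks every row in the table
    print(lookup_safe(conn, payload))        # no user has that name
```

The point of the prompt experiments is that a snippet like `lookup_vulnerable` already sitting in the file is the kind of context that nudges a text-based model toward more of the same.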
Finally, we were curious how Copilot behaves on less popular languages. My co-authors are hardware security folks, so we designed scenarios for six hardware CWEs in Verilog as well. Copilot had much more trouble generating code that worked at all here. Compared with the earlier two languages (Python and C), Copi
This paper was a lot of fun to work on with Hammond, Ben (@ichthys101), Baleegh, and Ramesh :) We've got lots more fun Copilot/Codex work underway so stay tuned!

And come see our talk at IEEE S&P this May, perhaps even in person! 🤞
