Joshua Saxe
AI+cybersecurity at Meta; past lives in academic history, labor / community organizing, classical/jazz piano, hacking scene
Jul 23, 2024
With today’s launch of Llama 3.1, we release CyberSecEval 3, a wide-ranging evaluation framework for LLM security used in the development of the models. Additionally, we introduce and improve three LLM security guardrails. Summary in this 🧵, links to paper/github at bottom:

CyberSecEval 3 extends our previous work with several new test suites: a cyber attack range to measure LLM offensive capabilities, social engineering capability evaluations, and visual prompt injection tests.
Aug 13, 2023
Making this deck for my Defcon AI Village keynote took an inordinate amount of time because it meant publicly murdering my darlings: the ~80% of MLsec R&D efforts I worked on over ~10 years that never reached deployment 🧵

And I guess it meant more: admitting that on many of these projects I could have seen the end before I started, had I really admitted the hard limits of 2010s-era machine learning.
Nov 17, 2020
How to evaluate a cybersecurity vendor's ML claims even if you don't know much about ML (thread).

1) Ask them why they didn't solely rely on rules/signatures in their system -- why is ML necessary? If they don't have a clear explanation, deduct a point.

2) Ask them how they know their ML system is good. Where does their test data come from? How do they know their test data is anything like real life data? How do they monitor system performance in the field? If their story isn't convincing, deduct a point.
Jan 28, 2020
1\ Surprisingly, you could build a very mediocre PE malware detector with a single PE feature: the PE compile timestamp. In fact, I built a little random forest detector that uses only the timestamp as its feature and gets 62% detection on previously unseen malware at a 1% FPR.

2\ The timestamp field poses a low-key problem for attackers. If they leave the compiler-assigned value, they reveal telling details. If they assign a concocted value, the tampering can make them easier to detect. Here's an 'allaple' malware set's random, insane timestamps:
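A minimal sketch of pulling that single timestamp feature out of a PE header with nothing but the standard library. The random forest itself would sit on top of features like this; `looks_suspicious` below is just a hypothetical illustration (my assumption, not the thread's model) of why zeroed or future-dated timestamps stand out:

```python
import struct
from datetime import datetime, timezone

def pe_timestamp(data: bytes) -> int:
    """Extract the compile TimeDateStamp from a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file (missing MZ signature)")
    # e_lfanew at offset 0x3C points to the PE signature
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header follows: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4)
    (ts,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return ts

def looks_suspicious(ts: int, first_seen: datetime) -> bool:
    """Single-feature heuristic: zeroed timestamps, or compile times that
    postdate when the sample was first observed, suggest tampering."""
    if ts == 0:
        return True
    compiled = datetime.fromtimestamp(ts, tz=timezone.utc)
    return compiled > first_seen
```

In a real pipeline you'd feed the raw timestamp (plus derived features like "is zero" or "is in the future") into e.g. a scikit-learn random forest trained on labeled samples.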
Jan 24, 2020
1/ Here's a thread on how to build the kind of security artifact "social network" graph popularized by @virustotal and others, but customized, and on your own private security data. Consider the following graph, where the nodes are malware samples:

2/ What you're seeing are relationships between samples from the old Chinese nation-state APT1 malware set provided by @snowfl0w / @Mandiant (fireeye.com/content/dam/fi…). The clusters are samples that appear to share C2, based on the kinds of relationships shown in the image here:
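In broad strokes, that graph construction can be sketched in plain Python: link two samples whenever they share a C2 indicator, then treat connected components as likely shared-infrastructure clusters. The sample names and indicators below are made up for illustration; in practice you'd extract C2 domains/IPs from your own sandbox or static-analysis output:

```python
from collections import defaultdict
from itertools import combinations

def shared_c2_graph(sample_iocs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build an undirected graph linking samples that share any C2 indicator."""
    by_ioc = defaultdict(set)
    for sample, iocs in sample_iocs.items():
        for ioc in iocs:
            by_ioc[ioc].add(sample)
    graph = {s: set() for s in sample_iocs}
    for samples in by_ioc.values():
        # every pair of samples sharing this indicator gets an edge
        for a, b in combinations(sorted(samples), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def clusters(graph: dict[str, set[str]]) -> list[set[str]]:
    """Connected components = candidate shared-infrastructure families."""
    seen, out = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        out.append(comp)
    return out
```

A graph library like networkx would give you the same components plus layout for visualization; the point here is that the core relationship extraction is a few lines over your own data.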
Jan 2, 2020
Thread on cognitive biases in cybersecurity I've noticed:

Maginot Line: you got breached by an impersonation attack, so you go buy an anti-impersonation solution and assume you're much safer. Sort of like checking people's shoes at the airport.

Survivorship/reporting bias: you treat statistics on breaches that have been reported publicly as representative of the threat landscape, when the most successful breaches go undetected.