Ariel Herbert-Voss
Adversarial machine learning and security, occasionally math and dumb memes. Research scientist @OpenAI / CS PhD @Harvard / cofounder @aivillage_dc @defcon
Apr 24, 2023 9 tweets 3 min read
There’s a lot of fearmongering about LLMs being capable of finding 0day

There are three highly complex roadblocks that need to be overcome for this to be a real concern: statefulness, hallucination, and contamination.

Statefulness refers to the ability to store and run things on program state. LLMs use large amounts of data to predict the next token in a sequence. There are promising advances with retrieval, but retrieval is meant to augment sequence prediction accuracy rather than replace the paradigm
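To make the statefulness point concrete, here is a minimal sketch (my own illustration, not from the thread) showing that a decoder-only LM call is a pure function of the prompt: nothing persists between calls, so any "program state" has to be serialized back into the context window by the caller. The model choice (gpt2) and helper name are just for the example.

```python
# Minimal sketch (illustrative, not from the original thread): a decoder-only LM
# maps a token sequence to next-token probabilities and keeps no state between calls.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def next_token(prompt: str) -> str:
    """Predict the single most likely next token for `prompt`."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits      # shape [1, seq_len, vocab]
    next_id = logits[0, -1].argmax().item()   # greedy pick, no sampling
    return tokenizer.decode([next_id])

# Two identical prompts give identical predictions: nothing carries over between calls,
# so any execution state would have to be re-encoded into the prompt by the caller.
print(next_token("The exploit overwrites the return"))
print(next_token("The exploit overwrites the return"))
```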
Mar 25, 2023 11 tweets 2 min read
the idea that you can just break into a data center and steal the model has a lot of memetic sticking power, but is stupid if you actually know anything about this topic. here's a thread on how confidential computing works in the NVIDIA H100:

the first thing to know is that "confidential computing" is an "industry compliance" term - this means a bunch of nerds specified a set of security features that all device manufacturers that use that keyword need to comply with
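As a rough sketch of what one of those required features (remote attestation) buys you in practice, here is a toy simulation in Python. This is my own simplified illustration, not NVIDIA's actual H100 protocol or API; the key generation, measurement constant, and function names are all placeholders. The point is that the tenant only releases the model-decryption key after verifying a signed measurement report from the device.

```python
# Toy sketch of the remote-attestation idea behind confidential computing
# (simplified illustration -- not NVIDIA's actual protocol or API).
import os
from cryptography.hazmat.primitives.asymmetric import ed25519

# In real hardware the device key is fused in and chained to a vendor CA;
# here we generate one just to make the example runnable.
device_key = ed25519.Ed25519PrivateKey.generate()
device_pub = device_key.public_key()

EXPECTED_MEASUREMENT = b"hash-of-approved-firmware-and-vm-image"  # placeholder value

def device_attest(nonce: bytes) -> tuple[bytes, bytes]:
    """Device side: report its measurement and sign (measurement || nonce)."""
    report = EXPECTED_MEASUREMENT + nonce
    return report, device_key.sign(report)

def tenant_release_key(nonce: bytes, report: bytes, sig: bytes) -> bytes:
    """Tenant side: verify the signature and measurement before releasing the key."""
    device_pub.verify(sig, report)                 # raises if the report is forged
    assert report == EXPECTED_MEASUREMENT + nonce  # freshness + known-good state
    return os.urandom(32)                          # stand-in for the wrapped model key

nonce = os.urandom(16)
report, sig = device_attest(nonce)
print("key released:", tenant_release_key(nonce, report, sig).hex())
```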
Feb 10, 2021 9 tweets 2 min read
I was initially excited to see our attack already showing up in the wild but the numbers reported didn’t line up with our experiments - so I dug into it. A THREAD:

Preliminary results from extracting PII from GPT2 and GPT3 show that you can only get something that looks like PII ~20% of the time when you directly query the model (with some variation depending on prompt design/type of PII you’re trying to extract)
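For a rough idea of what "directly query the model" means here, the sketch below (my own illustration, not the actual extraction pipeline from the work) samples continuations from a public model and regex-scans them for email-shaped strings. The prompt, model choice, and regex are assumptions; note that an email-shaped string is exactly the "looks like PII" case, not verified real PII.

```python
# Rough sketch of a direct-query PII probe (illustrative only; not the actual
# extraction pipeline). Sample continuations and count email-shaped outputs.
import re
from transformers import pipeline, set_seed

set_seed(0)
generator = pipeline("text-generation", model="gpt2")

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
prompt = "For more information, contact me at"

outputs = generator(prompt, max_new_tokens=30, num_return_sequences=20,
                    do_sample=True, pad_token_id=50256)
hits = [o["generated_text"] for o in outputs if EMAIL_RE.search(o["generated_text"])]
print(f"{len(hits)}/{len(outputs)} samples contain an email-shaped string")
```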
May 26, 2019 5 tweets 1 min read
Some news: I’m writing a book for @nostarch titled “The Machine Learning Red Team Manual”. My aim is to provide a practical guide for anyone interested in adversarial ML and red teaming as it relates to in-production ML systems. A short thread on why this project matters:

ML is the hot thing to integrate into products and services across many different sectors of the economy. However, systems predicated on ML have unique security considerations: vulnerabilities are present at both the algorithmic and systems levels.
Mar 31, 2019 12 tweets 4 min read
Very cool work showing feasibility of an adversarial-example-based attack on self-driving cars 😈 I’ve been working on a similar hobby project and love how thorough this write-up is, and I have some comments on the real-world feasibility of these attacks:

They attack autowipers and lane-following through both digital and physical attacks. For digital, they show you can inject adversarial examples onto the GPU by hooking t_cuda_std_tmrc::compute. This is obviously much harder to accomplish IRL but absolutely worth considering.
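For readers unfamiliar with what actually gets injected: an adversarial example is just an input plus a small crafted perturbation that flips the model's output. Below is a minimal targeted-FGSM sketch in PyTorch, a generic illustration rather than the write-up's specific attack on the autowiper/lane-following models; the classifier, epsilon, and preprocessing are assumptions made to keep the example self-contained.

```python
# Minimal targeted FGSM sketch in PyTorch (generic illustration of crafting an
# adversarial example -- not the specific attack described in the write-up).
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.Resize(224), transforms.CenterCrop(224), transforms.ToTensor()])

def fgsm_targeted(image_path: str, target_class: int, eps: float = 0.01) -> torch.Tensor:
    """Return a perturbed image nudging the model toward `target_class`."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    x.requires_grad_(True)
    loss = F.cross_entropy(model(x), torch.tensor([target_class]))
    loss.backward()
    # Step *against* the gradient of the target-class loss (targeted variant),
    # then clamp back into valid pixel range.
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()
```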