Hmm, this is actually much less impressive than I expected as far as inverting PhotoDNA (based only on reading @hackerfactor's blog post) reddit.com/r/MachineLearn…
@hackerfactor@matthew_d_green perhaps of interest if you haven't seen it yet and want to take a break from fighting with half of CS twitter about NFTs ;)
Ah, I see, it's taking a pure black box ML approach to try and learn the inverse straight from the hashes. OK, that is pretty impressive!
I had assumed it would use the limitations described here as a starting point, and then use the ML to sharpen/refine/upscale the generated image – but that would require a lot more RE on the PhotoDNA DLL hackerfactor.com/blog/index.php…
And having spent a weekend staring at that DLL's optimized XMM code until my eyes bled last summer, I certainly can't blame @anishathalye for taking the black box approach ;)
• • •
Missing some Tweet in this thread? You can try to
force a refresh
The basic problem we're looking at in this paper is: if you buy some embedded/IoT device, it may come with a bunch of features that you don't use (say, Bluetooth) that nonetheless require driver support and expose unnecessary attack surface.
Maybe you're a company deploying a fleet of Meraki routers, you don't need the Bluetooth Low Energy localization stuff, and you're worried about vulnerabilities like this one arstechnica.com/information-te…
So, with Broadcom's acquisition of Symantec, it seems like the source code for PGP Desktop (aka Symantec Encryption Desktop) is nowhere on the internet? I have a copy but I'm pretty sure I can't host it anywhere:
Seems like a loss for archival and data recovery work! :(
FWIW, the version I have is:
MD5 (PGPDesktop10.0.1_Source.zip) = c9193850f923cda995e6d4b9f45fcbdf
Probably getting old, I opted to just pay for a janky conversion utility rather than try to RE the Microsoft Outlook 15 message format :(
(I may still RE it)
The format is a pain in the ass, it stores messages in 3 undocumented binary parts: metadata, message body, and attachments. It has an sqlite database but that just points you to the metadata file.
Also, everything is referenced by GUIDs, which are in a mix of
- Raw binary GUID data
- ASCII GUIDs
- UTF-16-LE GUIDs
- Base64-encoded blobs that contain GUIDs
The camera-ready version of our @IEEESSP 2022 paper evaluating the security of code generated by GitHub CoPilot is now up on arXiv! arxiv.org/abs/2108.09293
@IEEESSP We designed 89 different scenarios for Copilot to complete based on MITRE's "Top 25 Most Dangerous Software Weaknesses" (cwe.mitre.org/top25/archive/…), and then had Copilot generate completions for each scenario, creating 1,689 programs.
@IEEESSP This is too many to check by hand, so we used CodeQL with a combination of built-in queries and our own custom queries to check the resulting code for the relevant vulnerability. Surprisingly (at least to me), ~40% of the suggestions overall were vulnerable!
Quite neat: they hooked GPT-3 up to the web and let it search for sources using a text-based web browser & used RL+human feedback to improve the truthfulness of its answers! It can even cite its sources: openai.com/blog/improving…
Although I imagine the restriction to sites that actually have any usable content without JavaScript changes the quality of info - might even make it more accurate :p
The next obvious step is to give it the ability to ask questions on Quora/StackOverflow ;)