CS Security types: FYI, Language-theoretic Security (aka the stuff discussed on langsec.org) is one of the more important ideas in security you might not know about. Core idea: most security bugs are parsing or unparsing bugs. 1/
Whenever your software accepts inputs, it is parsing a language, even if you don't think of it as doing that. Parsing is one of the better understood parts of computer science, but most programs use ad hoc, "shotgun" parsers to deal with things they read. 2/
If you're trying to parse a complicated, context-free or context-sensitive language (and data formats are languages in the CS sense) but you're using ad hoc methods instead of principled ones to parse things, you're using a "shotgun parser". 3/
Using a shotgun parser is begging for exploits. Even if your language does bounds checking, you can still end up allowing nasty data-driven security attacks. The solution is to use a proper parser, properly written against a formal specification of the input language. 4/
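To make the contrast concrete, here is a minimal sketch in Python. The message grammar, `parse_message`, and `shotgun_parse` are all hypothetical, invented for illustration. The principled version recognizes the whole input against the grammar before anything else touches it; the ad hoc version validates in passing and quietly accepts malformed input.

```python
import re

# Hypothetical grammar, for illustration only:
#   message := pair (";" pair)*
#   pair    := key "=" value
#   key     := [a-z]+
#   value   := [0-9]+
# The language is regular, so a regex is a legitimate recognizer for it.
_MESSAGE = re.compile(r"[a-z]+=[0-9]+(?:;[a-z]+=[0-9]+)*")


def parse_message(text: str) -> dict[str, int]:
    """Accept the whole input against the grammar, or reject it outright."""
    if _MESSAGE.fullmatch(text) is None:
        raise ValueError("input is not in the message language")
    result: dict[str, int] = {}
    for pair in text.split(";"):
        key, value = pair.split("=")
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = int(value)
    return result


def shotgun_parse(text: str) -> dict[str, str]:
    """Ad hoc version: checks are scattered, incomplete, and never reject."""
    result: dict[str, str] = {}
    for pair in text.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            result[key] = value
    return result
```

Given `a=1;;b=2=3`, `parse_message` raises `ValueError`, while `shotgun_parse` returns a dictionary and leaves the rest of the program to cope with a value that was never valid.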
Most programmers assume that they don't need proper parsers, and that's often the seed of later exploits. Also, some format and protocol designers make their input languages unnecessarily complicated, requiring excessively complicated parsers that are hard to get right. 5/
The dual is the unparsing problem. Say you're doing something like automatically generating some SQL for a database to parse, but the AST generated by the destination parser is not a faithful representation of what you had in mind. You now have a shotgun _unparser_. 6/
If your internal data representation is X, and Parse(Unparse(X)) ≠ X, you have a code injection attack waiting to happen. The solution is to use principled _unparsers_, but most people aren't even aware such a thing is possible, and so they build shotgun unparsers. 7/
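A small sketch of the round-trip property, assuming a toy target language that consists only of single-quoted string literals with `''` as the escaped quote. The names `parse_literal`, `shotgun_unparse`, and `unparse` are hypothetical, made up for this example.

```python
def parse_literal(text: str) -> tuple[str, str]:
    """Parse one single-quoted literal, where '' is an escaped quote.

    Returns (value, unconsumed remainder of the input).
    """
    if not text.startswith("'"):
        raise ValueError("expected opening quote")
    out = []
    i = 1
    while i < len(text):
        if text[i] == "'":
            if text[i + 1 : i + 2] == "'":  # doubled quote: literal '
                out.append("'")
                i += 2
                continue
            return "".join(out), text[i + 1 :]  # closing quote
        out.append(text[i])
        i += 1
    raise ValueError("unterminated literal")


def shotgun_unparse(value: str) -> str:
    """Ad hoc serializer: ignores the target language's quoting rules."""
    return "'" + value + "'"


def unparse(value: str) -> str:
    """Serializer that is the inverse of parse_literal."""
    return "'" + value.replace("'", "''") + "'"
```

With `x = "x' OR '1'='1"`, `parse_literal(unparse(x))` gives back `x` with nothing left over. `parse_literal(shotgun_unparse(x))` gives back only `"x"`, and the remainder `" OR '1'='1'"` is what the destination parser goes on to treat as code. For real SQL, parameterized queries sidestep the problem by never unparsing data into the query text at all.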
The result of shotgun unparsers is, of course, injection attacks. There's a similar problem for state machines, in which you're implementing a protocol state machine with a rat's nest of conditionals instead of proper state machine code, leading to vulnerabilities. 8/
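One way "proper state machine code" can look, sketched in Python: every legal transition lives in one explicit table, and anything not in the table is rejected. The protocol here (`HELLO`, `AUTH`, `DATA`, `QUIT`) and the names `State`, `TRANSITIONS`, `step`, and `run` are hypothetical, chosen only for illustration.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    GREETED = auto()
    AUTHENTICATED = auto()
    CLOSED = auto()


# The complete set of legal moves for the hypothetical protocol.
# Any (state, event) pair missing from this table is an error.
TRANSITIONS = {
    (State.START, "HELLO"): State.GREETED,
    (State.GREETED, "AUTH"): State.AUTHENTICATED,
    (State.GREETED, "QUIT"): State.CLOSED,
    (State.AUTHENTICATED, "DATA"): State.AUTHENTICATED,
    (State.AUTHENTICATED, "QUIT"): State.CLOSED,
}


def step(state: State, event: str) -> State:
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.name}"
        ) from None


def run(events: list[str]) -> State:
    """Drive the machine from START through a sequence of events."""
    state = State.START
    for event in events:
        state = step(state, event)
    return state
```

Here `run(["HELLO", "DATA"])` raises `ValueError`, because sending `DATA` before `AUTH` is simply not in the table. With nested conditionals, that same check is easy to forget on one branch.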
Once you really understand this problem, you start seeing it more or less everywhere. When you spot a shotgun parser, or a shotgun state machine, or a shotgun unparser, you can pretty much predict that there's a successful exploit waiting to be found. 9/
But almost no programmers understand the issue. We learn, throughout computer science, to build ad hoc input and output handling systems, almost from intro on, except for a few classes like compilers. We're also never taught that this is a source of dangerous bugs. 10/
If you hunt for exploitable bugs for a living, or you defend systems from being exploited, understanding this principle is almost magical. If you see tangled spaghetti code implementing a protocol handler or the like, an alarm should go off. "Bug lurking here." 11/
There are even formal ways of hunting such bugs, like parse tree differential analysis. Regardless: unlike many better-known problems, language-theoretic security is something I've noticed a surprising number of smart professionals are totally unaware of. That shouldn't be. 12/12
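A heavily simplified flavor of the differential idea, as a Python sketch: feed the same inputs to two implementations of "the same" format and flag every input on which they disagree. Both parsers and the `differential` helper are hypothetical, and this version compares only the resulting dictionaries, not full parse trees as the real technique does.

```python
def parse_first_wins(text: str) -> dict[str, str]:
    """Hypothetical parser A: the first occurrence of a key wins."""
    result: dict[str, str] = {}
    for pair in text.split("&"):
        key, _, value = pair.partition("=")
        result.setdefault(key, value)
    return result


def parse_last_wins(text: str) -> dict[str, str]:
    """Hypothetical parser B: the last occurrence of a key wins."""
    result: dict[str, str] = {}
    for pair in text.split("&"):
        key, _, value = pair.partition("=")
        result[key] = value
    return result


def differential(inputs: list[str]) -> list[str]:
    """Return the inputs on which the two parsers disagree."""
    return [t for t in inputs if parse_first_wins(t) != parse_last_wins(t)]
```

On `["user=alice", "user=alice&user=admin"]` this flags only the second input. If one component checks permissions with parser A and another acts on the result of parser B, that disagreement is the exploit.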
Addenda: 1. There's a significant literature on this stuff now. It's worth learning about. As I said before, one place you can hunt for papers to read is langsec.org. 1/2
2. It's surprising how often data format and protocol designers create language decision problems that are excessively complicated and sometimes actually undecidable. Even seemingly innocent schemes like Type-Length-Value protocol formats can be unexpectedly dangerous. 2/2
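As one illustration of where Type-Length-Value handling can go wrong, here is a sketch of a reader for a hypothetical TLV layout (one type byte, one length byte, then the value). The layout and the name `parse_tlv` are assumptions made for this example. The key point is that the declared length is attacker-controlled data and has to be checked against the bytes actually remaining before it is used.

```python
def parse_tlv(data: bytes) -> list[tuple[int, bytes]]:
    """Parse a buffer of hypothetical TLV records into (type, value) pairs.

    Rejects the whole buffer if any header is truncated or any declared
    length runs past the end of the input.
    """
    records: list[tuple[int, bytes]] = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            raise ValueError("truncated record header")
        tag = data[offset]
        length = data[offset + 1]
        offset += 2
        # Never trust the declared length: compare it with what is left.
        if length > len(data) - offset:
            raise ValueError("declared length exceeds remaining input")
        records.append((tag, data[offset : offset + length]))
        offset += length
    return records
```

In Python a missing check would only yield a silently short value. In a language without bounds checking, the same omission becomes an out-of-bounds read driven directly by the length field.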
Thread by Perry E. Metzger