
Vess

@VessOnSecurity

May 16 • 30 tweets • 4 min read

So, Microsoft's scanner started detecting malware in password-protected ZIP archives and people are losing their shit because they have no goddamn clue how anti-virus programs work.

arstechnica.com/information-te…

Strap in, kids, because I'm in a lecturing mood. Thread:

For some unfathomable reason, people seem to think that scanning works like this:

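def naive_scan(fname, scan_strings):
    # Read the whole file, then search it for each known byte string.
    with open(fname, "rb") as f:
        buffer = f.read()
    for pattern, name in scan_strings:
        if pattern in buffer:
            print(f"{fname}: {name}")  # report the malware and stop
            break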

Dudes, maybe this is how scanners worked 35 years ago (and maybe this is still how crap like ClamAV works; I haven't checked lately) - but dontcha think that there's been a little bit of progress since?

Note - I am *not* saying that scanners don't use scan strings. Or that they don't inspect the contents of the files. Of course they do. But not in the way most people think.

For instance, in our now-defunct scanner (F-PROT), we used 8-byte scan strings.

What happened if one of them was found? Nothing visible to the user.

Other, more rigorous algorithms for malware detection were triggered and *they* decided whether to report anything or not.

What happened if none was found? Nothing visible to the user, either.

So, what was the point of using them, then?

Well, you see, those "other methods" were computationally expensive. Users tend to be annoyed if your scanner spends too much time scanning their files, only to tell them "there's nothing bad there". So, we needed a method for quickly eliminating the need for slow scanning.

(If the system is actually infected, the user is willing to wait, so it's not a problem if the scanner is slow there.)
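The two-stage idea described above can be sketched roughly as follows. This is only an illustration, not F-PROT's actual design: the database layout, the `slow_exact_check` stand-in, and the names `Example.A` and `DEADBEEF` are all made up for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical database: 8-byte scan string -> (malware name, slow verifier).
# The verifier stands in for the "more rigorous algorithms" mentioned above.
Verifier = Callable[[bytes], bool]


def slow_exact_check(expected: bytes) -> Verifier:
    """Build a stand-in for an expensive check (here: exact byte equality)."""
    def verify(data: bytes) -> bool:
        return data == expected
    return verify


def scan(data: bytes, db: Dict[bytes, Tuple[str, Verifier]]) -> Optional[str]:
    """Report a name only if a slow verifier confirms a scan-string hit."""
    for scan_string, (name, verify) in db.items():
        if scan_string not in data:
            continue      # fast path: no hit, so skip the expensive work
        if verify(data):  # hit: the slow check makes the actual decision
            return name
    return None           # a bare scan-string hit is never reported


sample = b"\x90\x90DEADBEEF\xcd\x21"
db = {b"DEADBEEF": ("Example.A", slow_exact_check(sample))}
```

A clean file that happens to contain the scan string costs one slow check and is still reported as clean; a file without it never reaches the slow path at all.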

And that's just for DOS/Windows viruses. In our macro scanner, I used only a handful of scan strings and only to detect extremely polymorphic macro viruses or to detect new variants of widespread macro viruses that my heuristics wouldn't catch.

So, how did I detect macro viruses, then? I computed a checksum over the whole body of each macro and stored that in the database of the scanner. This way I'd detect even a single-bit modification of a known macro virus as a new one. I was a big fan of exact identification, see?
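A minimal sketch of checksum-based exact identification. The thread does not say which checksum was used, so CRC-32 from Python's `zlib` is an assumption here, and the macro text and the name `Example.Macro.A` are invented for illustration.

```python
import zlib
from typing import Dict, Optional


def macro_checksum(body: bytes) -> int:
    """Checksum over the entire macro body (CRC-32 is an assumed choice)."""
    return zlib.crc32(body)


def identify(body: bytes, known: Dict[int, str]) -> Optional[str]:
    """Return the name of a known macro only on an exact checksum match."""
    return known.get(macro_checksum(body))


# Hypothetical known sample and its database entry.
original = b'Sub AutoOpen()\n  If Day(Now) = 13 Then Kill "*.*"\nEnd Sub\n'
known = {macro_checksum(original): "Example.Macro.A"}

# Flip a single bit in the first byte to simulate a minimally modified variant.
variant = bytes([original[0] ^ 0x01]) + original[1:]
```

Here `identify(original, known)` returns the stored name, while `identify(variant, known)` returns `None`: the one-bit change is enough to make the variant look like something new rather than being mistaken for the known sample.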

Why is exact identification important? Because the difference between malware that deletes files every Friday the 13th and malware that deletes files every day *except* on Friday the 13th can be only a single bit. Wouldn't you rather know which one you've got?

And this is just a simplistic example, of course. There were much more elaborate cases.

For instance, there was a Polish virus that looked like nothing special. Could be detected with a scan string or any other imprecise method you could come up with.

Then a new variant was released, which was almost identical, except for a couple of bytes. But if you confused it with the old one, when your scanner tried to remove the virus from the infected files, it would corrupt them irreparably!

But I digress. Let's get back to the subject of scanning password-protected archives.

To begin with, if you actually bother to read the article I posted a link to in the first tweet of this thread, you'll learn that Microsoft's scanner is detecting malware in ZIP archives protected with the password "infected".

Clearly, if you know the password, it's not a problem to decompress the encrypted file and to scan it. Scanners have been doing this since the '90s. I think McAfee's scanner was the first to try the password "infected" if it encountered an encrypted ZIP archive.
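Trying a well-known password could look roughly like this with Python's standard `zipfile` module. It is a sketch, not any vendor's implementation: `read_members` is a made-up helper, and `zipfile` can only decrypt the legacy ZIP encryption scheme, so members using other schemes are simply skipped here.

```python
import io
import zipfile
import zlib
from typing import Dict, Iterable


def read_members(archive: bytes,
                 passwords: Iterable[bytes] = (b"infected",)) -> Dict[str, bytes]:
    """Return {member name: contents} for every member that can be extracted."""
    extracted = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            encrypted = bool(info.flag_bits & 0x1)  # bit 0 marks an encrypted member
            candidates = tuple(passwords) if encrypted else (None,)
            for pwd in candidates:
                try:
                    extracted[info.filename] = zf.read(info, pwd=pwd)
                    break
                except (RuntimeError, zipfile.BadZipFile, zlib.error):
                    # Wrong password or unsupported scheme: try the next candidate.
                    continue
    return extracted
```

Whatever comes back from `read_members` can then be handed to the normal scanning code, exactly as if the files had never been archived.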

As an aside, what is the point of "protecting" a ZIP archive with a password everyone knows?

You see, the idea here is not secrecy. The idea is safety.

These archives with malware are (or at least were) often sent by e-mail from one researcher to another.

It's easy to mistype someone's e-mail address and we wanted to make sure that if some random person, other than the intended recipient, received the malware by mistake, they wouldn't infect themselves by accidentally running it.

But in order to detect malware in password-protected archives, you don't necessarily need to know the password! What you need are 3 preconditions:

One, the malware must be known to you (duh!). This sounds like a triviality but it means that you can't use heuristics or other means for detecting new, unknown malware.

Two, the malware must be in files that have constant contents. This excludes parasitic viruses (can be attached to different files), polymorphic worms (change themselves), macros (reside in different documents), etc. Still, this leaves a lot of malware that *can* be detected.

Three, your detection must not be limited to scanning for scan strings. (That is, your scanner needs to use technology more advanced than the one used in 1988.)

So, how do you do that? Scanning password-protected ZIP archives for such malware, I mean.

The only thing encrypted in the ZIP archive is the compressed image of the file. The metadata - file names, length, etc. - are in the clear. That's why you can list the contents of a ZIP archive even if it's password-protected; you need the password only to extract files from it.
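To illustrate, Python's standard `zipfile` module reads this metadata without ever being given a password. The helper name `list_members` is made up for this sketch; the fields it reads (`filename`, `file_size`, `flag_bits`) are regular `ZipInfo` attributes.

```python
import io
import zipfile
from typing import List, Tuple


def list_members(archive: bytes) -> List[Tuple[str, int, bool]]:
    """Return (name, uncompressed size, is_encrypted) for each member.

    No password is supplied anywhere: only the archive's directory is read.
    """
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return [
            (info.filename, info.file_size, bool(info.flag_bits & 0x1))
            for info in zf.infolist()
        ]
```

For an archive encrypted the classic way, the last element of each tuple is `True`, yet names and sizes are still readable.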

Now, when the archiver extracts files from the archive, it needs a way to know that the process has been completed without errors - i.e., you got out the actual file that was put in and not some corrupted crap. (Even single-bit errors could corrupt the file beyond recognition.)

How does it do that? Simple - it keeps among the other metadata (and, therefore, in the clear) a checksum (CRC-32) of the uncompressed file - which it then checks after decompression.

Do you see where I'm going with this line of reasoning?

If the malware is in a constant, non-modifiable file, all you need is a CRC-32 of this file stored in the scanner's database. Then you just look it up among the (unencrypted!) CRCs of the files stored in the archive and presto, you've detected your malware, no passwords needed!
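The lookup described above can be sketched with Python's `zipfile`, which exposes the stored checksum as `ZipInfo.CRC`. The database contents, the sample bytes, and the name `Example.Dropper` are hypothetical. One caveat: this relies on the classic ZIP layout the thread describes, and some newer encryption schemes zero out or hide the CRC field.

```python
import io
import zipfile
import zlib
from typing import Dict, List, Tuple


def scan_zip_by_crc(archive: bytes,
                    known_crcs: Dict[int, str]) -> List[Tuple[str, str]]:
    """Return (member name, malware name) for members whose stored CRC-32 is known.

    Nothing is decrypted or decompressed; only the archive's directory is read.
    """
    hits = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            name = known_crcs.get(info.CRC)
            if name is not None:
                hits.append((info.filename, name))
    return hits


# Hypothetical database entry: CRC-32 of a known, constant malware file.
known_sample = b"MZ\x90\x00 pretend this is a known malicious binary"
known_crcs = {zlib.crc32(known_sample): "Example.Dropper"}
```

Since CRC-32 is only 32 bits wide, unrelated files can share a value; also comparing `info.file_size` against the known sample's length is a cheap way to cut down on false positives.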

BTW, this method was invented by the anti-virus researcher Dmitry Gryaznov, then working for McAfee.

End of lecture.
