So, Microsoft's scanner started detecting malware in password-protected ZIP archives and people are losing their shit because they have no goddamn clue how anti-virus programs work.
Strap in, kids, because I'm in a lecturing mood. Thread:
For some unfathomable reason, people seem to think that scanning works like this:
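def naive_scan(fname, scan_strings, report_malware):
    # Read the whole file, then report the first scan string found in it.
    with open(fname, "rb") as f:
        buffer = f.read()
    for name, pattern in scan_strings.items():  # malware name -> byte pattern
        if pattern in buffer:
            report_malware(name)
            break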
Dudes, maybe this is how scanners worked 35 years ago (and maybe this is still how crap like ClamAV works; I haven't checked lately) - but dontcha think that there's been a little bit of progress since?
Note - I am *not* saying that scanners don't use scan strings. Or that they don't inspect the contents of the files. Of course they do. But not in the way most people think.
For instance, in our now-defunct scanner (F-PROT), we used 8-byte scan strings.
What happened if one of them was found? Nothing visible to the user.
Other, more rigorous algorithms for malware detection were triggered and *they* decided whether to report anything or not.
What happened if none was found? Nothing visible to the user, either.
So, what was the point of using them, then?
Well, you see, those "other methods" were computationally expensive. Users tend to get annoyed if your scanner spends too much time scanning their files, only to tell them "there's nothing bad there". So, we needed a method for quickly ruling out the need for slow scanning.
(If the system is actually infected, the user is willing to wait, so it's not a problem if the scanner is slow there.)
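The two-stage idea described above can be sketched roughly like this in Python. This is not F-PROT's actual code: the signature table, the sample bytes and `slow_verify` are made up for illustration, and a whole-file SHA-256 stands in for the unspecified "more rigorous algorithms".

```python
import hashlib
from typing import Optional

# Stand-in "malware" sample and a hypothetical signature database:
# 8-byte scan string -> (malware name, SHA-256 of the whole known sample).
SAMPLE = b"\x90\x90" + b"EVILCODE" + b"\xcc" * 16
SIGNATURES = {b"EVILCODE": ("Example.A", hashlib.sha256(SAMPLE).hexdigest())}

def slow_verify(buffer: bytes, expected_digest: str) -> bool:
    """Stand-in for the expensive, rigorous check (here: a whole-file hash)."""
    return hashlib.sha256(buffer).hexdigest() == expected_digest

def scan(buffer: bytes) -> Optional[str]:
    """Return the malware name, or None if nothing is confirmed."""
    for pattern, (name, digest) in SIGNATURES.items():
        # Cheap prefilter: a hit is never reported on its own,
        # it only decides whether the slow check is worth running.
        if pattern in buffer and slow_verify(buffer, digest):
            return name
    return None

print(scan(SAMPLE))                         # scan string found and confirmed
print(scan(b"a text that says EVILCODE"))   # scan string found, not confirmed
print(scan(b"nothing to see here"))         # slow check never runs
```

The clean-file case is the common one, so the point of the design is that most files never reach `slow_verify` at all.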
And that's just for DOS/Windows viruses. In our macro scanner, I used only a handful of scan strings and only to detect extremely polymorphic macro viruses or to detect new variants of widespread macro viruses that my heuristics wouldn't catch.
So, how did I detect macro viruses, then? I computed a checksum over the whole body of each macro and stored it in the scanner's database. This way, even a single-bit modification of a known macro virus would be detected as a new variant. I was a big fan of exact identification, see?
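A minimal Python sketch of checksum-based exact identification follows. The thread does not say which checksum was used, so CRC-32 from `zlib` is an assumption here, and the macro text, the database and the names are invented for the example.

```python
import zlib

def body_checksum(macro_body: bytes) -> int:
    """Checksum over the whole macro body (CRC-32 is an assumption)."""
    return zlib.crc32(macro_body)

# Hypothetical database: checksum of a known macro body -> exact variant name.
known_body = b"Sub AutoOpen()\n  ' stand-in macro text\nEnd Sub\n"
KNOWN_MACROS = {body_checksum(known_body): "Example.Macro.A"}

def identify(macro_body: bytes) -> str:
    return KNOWN_MACROS.get(body_checksum(macro_body),
                            "unknown (possibly a new variant)")

# Flip a single bit in the known body: the checksum no longer matches,
# so the modified macro is not mistaken for the known variant.
variant = bytearray(known_body)
variant[0] ^= 0x01

print(identify(known_body))
print(identify(bytes(variant)))
```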
Why is exact identification important? Because the difference between malware that deletes files every Friday the 13th and malware that deletes files every day *except* on Friday the 13th can be only a single bit. Wouldn't you rather know which one you've got?
And this is just a simplistic example, of course. There were much more elaborate cases.
For instance, there was a Polish virus that looked like nothing special. Could be detected with a scan string or any other imprecise method you could come up with.
Then a new variant was released, which was almost identical, except for a couple of bytes. But if you confused it with the old one, when your scanner tried to remove the virus from the infected files, it would corrupt them irreparably!
But I digress. Let's get back to the subject of scanning password-protected archives.
To begin with, if you actually bother to read the article I posted a link to in the first tweet of this thread, you'll learn that Microsoft's scanner is detecting malware in ZIP archives protected with the password "infected".
Clearly, if you know the password, it's not a problem to decompress the encrypted file and to scan it. Scanners have been doing this since the '90s. I think McAfee's scanner was the first to try the password "infected" if it encountered an encrypted ZIP archive.
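For illustration, here is how "try the well-known password" might look with Python's standard `zipfile` module. This is only a sketch: `extract_members` and the password list are hypothetical names, the module handles only the traditional ZIP encryption (not AES), and it cannot create encrypted archives, so the small demo at the end runs on an unencrypted one.

```python
import io
import zipfile
import zlib

WELL_KNOWN_PASSWORDS = (b"infected",)  # the researchers' convention

def extract_members(archive, passwords=WELL_KNOWN_PASSWORDS):
    """Yield (filename, data) for every member that can be decompressed."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            encrypted = bool(info.flag_bits & 0x1)  # bit 0: entry is encrypted
            for pwd in (passwords if encrypted else (None,)):
                try:
                    data = zf.read(info, pwd=pwd)
                except (RuntimeError, zipfile.BadZipFile,
                        zlib.error, NotImplementedError):
                    continue  # wrong password or unsupported encryption
                yield info.filename, data  # hand this to the real scanner
                break

# Demo on an in-memory, unencrypted archive.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("a.txt", b"hello")
members = list(extract_members(io.BytesIO(demo.getvalue())))
print(members)
```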
As an aside, what is the point of "protecting" a ZIP archive with a password everyone knows?
You see, the idea here is not secrecy. The idea is safety.
These archives with malware are (or at least were) often sent by e-mail from one researcher to another.
It's easy to mistype someone's e-mail address and we wanted to make sure that if some random person, other than the intended recipient, received the malware by mistake, they wouldn't infect themselves by accidentally running it.
But in order to detect malware in password-protected archives, you don't necessarily need to know the password! What you need is for 3 preconditions to hold:
One, the malware must be known to you (duh!). This sounds like a triviality but it means that you can't use heuristics or other means for detecting new, unknown malware.
Two, the malware must be in files that have constant contents. This excludes parasitic viruses (can be attached to different files), polymorphic worms (change themselves), macros (reside in different documents), etc. Still, this leaves a lot of malware that *can* be detected.
Three, your detection must not be limited to scanning for scan strings. (That is, your scanner needs to use technology more advanced than the one used in 1988.)
So, how do you do that? Scanning password-protected ZIP archives for such malware, I mean.
The only thing encrypted in the ZIP archive is the compressed image of the file. The metadata - file names, lengths, etc. - is in the clear. That's why you can list the contents of a ZIP archive even if it is password-protected; you need the password only to extract files from it.
Now, when the archiver extracts files from the archive, it needs a way to know that the process has been completed without errors - i.e., you got out the actual file that was put in and not some corrupted crap. (Even single-bit errors could corrupt the file beyond recognition.)
How does it do that? Simple - it keeps among the other metadata (and, therefore, in the clear) a checksum (CRC-32) of the uncompressed file - which it then checks after decompression.
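The metadata described above is easy to look at with Python's standard `zipfile` module. The sketch below only reads the archive's directory and never decompresses anything; `list_metadata` and the sample payload are illustrative names. Since `zipfile` cannot write encrypted entries, the demo archive is unencrypted, but the same fields (name, size, CRC-32, encryption flag) are what the module exposes for password-protected entries too.

```python
import io
import zipfile
import zlib

payload = b"stand-in file contents " * 10

# Build a small archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.bin", payload)

def list_metadata(archive):
    """Return (name, uncompressed size, stored CRC-32, encrypted?) per entry.

    Only the directory is read; no member is decompressed.
    """
    with zipfile.ZipFile(archive) as zf:
        return [(i.filename, i.file_size, i.CRC, bool(i.flag_bits & 0x1))
                for i in zf.infolist()]

entries = list_metadata(io.BytesIO(buf.getvalue()))
name, size, stored_crc, encrypted = entries[0]
# The stored CRC-32 is that of the original, uncompressed bytes.
print(name, size, hex(stored_crc), stored_crc == zlib.crc32(payload))
```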
Do you see where I'm going with this line of reasoning?
If the malware is in a constant, non-modifiable file, all you need is a CRC-32 of this file stored in the scanner's database. Then you just look it up among the (unencrypted!) CRCs of the files stored in the archive and presto, you've detected your malware, no passwords needed!
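Here is a small Python sketch of that lookup. The database, the sample bytes and the function name are all hypothetical, and the demo archive is unencrypted because the standard `zipfile` module cannot write encrypted ones; the lookup itself never supplies a password or decompresses a member.

```python
import io
import zipfile
import zlib

malware = b"MZ" + b"stand-in for a known, never-changing malicious file"
clean = b"just a readme"

# Hypothetical scanner database: CRC-32 of the whole file -> malware name.
KNOWN_CRCS = {zlib.crc32(malware): "Example.Trojan.A"}

def scan_without_password(archive):
    """Match the CRC-32 values stored in the archive's metadata."""
    hits = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = KNOWN_CRCS.get(info.CRC)
            if name is not None:
                hits.append((info.filename, name))
    return hits

# Demo archive with one "malicious" and one clean member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dropper.exe", malware)
    zf.writestr("readme.txt", clean)

hits = scan_without_password(io.BytesIO(buf.getvalue()))
print(hits)
```

A 32-bit CRC can collide with unrelated files, so a real scanner would presumably also compare the stored uncompressed size or treat a match as a first hint only. That part is a guess; the thread itself mentions just the CRC-32.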
BTW, this method was invented by the anti-virus researcher Dmitry Gryaznov, then working for McAfee.
End of lecture.
Apparently, the Bulgarian "crypto queen", Ruja Ignatova, famous for the OneCoin scam, who has disappeared and is on the FBI most wanted list, was killed in November 2018 by order of a Bulgarian narco boss.
The killer, also a Bulgarian, is presently in a Dutch prison for some drug-dealing crime.
She was killed on a yacht in the Ionian Sea, her body was cut into pieces and thrown overboard.
The reason for the killing was, apparently, to hide the narco boss' participation in the OneCoin scam.
The narco boss currently lives in Dubai, after escaping Bulgarian law enforcement.
Update. My mother's knee is swollen but not enough for them to have to drain it. So, she spends her time in bed with a large bag of ice on her knee.
She's the oldest patient in the room. Also the sanest one.
The other two grannies, despite being younger than her, are senile. One of them talks to herself the whole night. The other, who has been operated on, was trying to rip off her bandages, so they had to tie her to the bed.
My mother has been appointed sentinel of the room and is supposed to alert the nurses when the other two start some kind of mischief.
It's even harder to argue with computers that are idiots.
But it is the hardest to argue with people who are idiots and are armed with computers that are idiots.
Case in point. A former colleague of mine submitted a paper for a conference. The conference organizers used one of those idiotic plagiarism-checking tools. It came back with the result that 31% of my colleague's paper was plagiarized.
The plagiarized parts were marked. Let's see:
1) A table, containing values with leading zeroes. This caused nearly 100 plagiarism notifications - apparently many people have values with leading zeroes in their tables and are copying these zeroes from each other.
My bank just served me a cookie consent pop-up. Being of the curious sort, I decided to delve into the options and see what exactly I am agreeing to.
There were several categories of cookies: strictly necessary, statistical, marketing.
By default, only the cookies in the "strictly necessary" category are marked as the ones the user is agreeing to (although there is a big fat "accept all" button that most people would click). So far, so good.
OK, let's see what's "strictly necessary" to my bank, shall we?
ASP.NET_SessionId - Preserves the visitor's session state across page requests.
(Apologies for locking this thread but I'm really not in the mood to answer anyone's comments on this subject.)
In 1990, I established the Laboratory of Computer Virology at the Bulgarian Academy of Sciences.
Computer viruses were very prevalent in my country at the time. I was single-handedly developing anti-virus programs for them and cleaning people's computers, so I thought it would be a good idea to have an institution that would do this more professionally.
I was the Lab's first director. Although I didn't work there all the time (there was a long pause as I spent 4 years in Germany, writing my Ph.D. thesis and 10 years in Iceland, working in the anti-virus industry), I did work there for many years.