Marcus Botacin

12 Oct, 62 tweets, 18 min read

[New Paper] How do antiviruses really work? How do they really detect #malware? Do they still use signatures? (A thread) Thanks to all of my coauthors (some on Twitter: @fabriciojoc @abedgregio @pgeus)
Publisher: sciencedirect.com/science/articl…
Archived: secret.inf.ufpr.br/papers/marcus_…

In this study, I analyze almost all aspects of AV operation to understand how AVs really work. It is important to say that this is not exactly a reverse engineering work in the sense of digging into all details of all components.

To avoid violating copyright, I did not perform any decompilation, so all my analyses come from inspecting dropped files, as any skilled user could do to understand what runs on their own computer.

This study does not aim to describe every detail of how an AV works, because that would be too much, but to present an overview of how things work across multiple solutions.

I did my best to understand how AVs really work, but I can't guarantee that there are no exceptions or special cases left uncovered by the paper. So, it is not perfect, but I still believe it might be useful. It's about 80 pages of content to introduce anyone to how AVs work.

AVs are complex and modular pieces of software. Thus, in this paper, when I talk about AVs, I talk about the following components: the DLLs injected into the processes, the browser plugins, the GUI, the engines, and the kernel drivers.

When we talk about AVs, it is important to notice that we mostly talk about AV engines, not AV products. This distinction matters because many AV products share the same detection engines.

The effect of sharing engines can be noticed when different products assign the same labels to the same samples. The figure shows this effect over time on a million-scale set of samples submitted to @virustotal. Some market movements can be clearly seen, such as the merger of Avast and AVG.
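
A minimal sketch of how this label-sharing effect could be measured, assuming a hypothetical mapping from each sample to the labels its scanners assigned. The product names, labels, and data layout below are invented for illustration; they are neither the VirusTotal report format nor data from the paper.

```python
from itertools import combinations

def label_agreement(reports):
    """Fraction of commonly detected samples on which each pair of
    products assigns exactly the same label.

    `reports` maps sample id -> {product name: label}. This layout is
    an assumption made for the sketch.
    """
    products = sorted({p for labels in reports.values() for p in labels})
    agreement = {}
    for a, b in combinations(products, 2):
        # Only samples that both products detected can be compared.
        both = [labels for labels in reports.values() if a in labels and b in labels]
        if not both:
            continue
        same = sum(1 for labels in both if labels[a] == labels[b])
        agreement[(a, b)] = same / len(both)
    return agreement

# Hypothetical labels: ProductA and ProductB behave as if they shared an engine.
reports = {
    "s1": {"ProductA": "Win32:Agent-X", "ProductB": "Win32:Agent-X", "ProductC": "Trojan.Generic"},
    "s2": {"ProductA": "Win32:Dropper-Y", "ProductB": "Win32:Dropper-Y", "ProductC": "Dropper.Y"},
    "s3": {"ProductA": "Win32:Agent-Z", "ProductC": "Trojan.Generic"},
}
scores = label_agreement(reports)
```

A pair with a score close to 1.0 over many samples is a candidate for a shared engine; tracking the score per month would expose events such as a merger.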

Another important aspect to clarify when we talk about AVs is that modern AVs are not only solutions that detect malware: they try to be complete security solutions by incorporating multiple security modules.

When we analyze the AVs, we also analyze parts of these modules, as they are coupled with the engines. The following table summarizes the main modules we found on the inspected AVs.

Our analyses of AVs started with the installation procedure. We wanted to discover things like how resistant they were to preinstalled threats. Short answer: not much!

Something that I'm still trying to understand is Avast's choice to download files in the clear (plain HTTP), which might put it at risk if additional measures were not taken.

Most of the files downloaded by Avast during an installation procedure were updates. We tracked the updates on a real user's machine for a month. They occur all the time, although with no clear pattern in how many arrive per day.

The most important take-home message for us is that updates are too frequent to be neglected. Unfortunately, few research works in the literature consider AV updates.

The additional measure that Avast (in fact, all AVs) takes to avoid being affected by malicious connections is signing its binaries. Analyzing the structure of the update (VPX) files, we discovered that hashes and certificates are used for this task.
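
A minimal sketch of the hash half of such a check, assuming the expected digest comes from a manifest whose signature was already verified. The real VPX layout is not reproduced here, and the use of SHA-256 is an assumption of this sketch.

```python
import hashlib
import hmac

def verify_update(payload: bytes, expected_sha256_hex: str) -> bool:
    """Accept a downloaded update only if its digest matches the expected one.

    `expected_sha256_hex` is assumed to come from a signed, already
    verified manifest; the hash algorithm is also an assumption.
    """
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())

# Toy usage with made-up payloads.
good = b"definitions v1"
digest = hashlib.sha256(good).hexdigest()
ok = verify_update(good, digest)
tampered = verify_update(b"definitions v1 + implant", digest)
```

The hash alone is not enough over plain HTTP: whoever can alter the file can alter an unsigned hash too, which is why the certificate-based signature matters.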

One thing that always bothered me about AVs was the claims that they do or do not use signatures anymore to detect malware. To find out what the current situation is, I designed an experiment.

In it, distinct patches of the binaries are submitted for AV evaluation in a divide-and-conquer strategy, as shown in the figure. If a specific patch is detected, I conclude that the given AV used a signature to detect the given sample.
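
A minimal sketch of the divide-and-conquer idea, assuming a hypothetical `is_detected` callback that stands in for submitting a piece of a binary to an AV. It is a simplification of the experiment: real binaries need valid headers, and the toy scanner below just looks for a fixed byte pattern.

```python
from typing import Callable, Optional, Tuple

def locate_signature(data: bytes,
                     is_detected: Callable[[bytes], bool],
                     min_size: int = 4) -> Optional[Tuple[int, int]]:
    """Narrow down a region of `data` that is still detected on its own.

    Returns (start, end) offsets, or None if the whole sample is not
    detected. `is_detected` is a hypothetical stand-in for an AV scan.
    """
    if not is_detected(data):
        return None
    start, end = 0, len(data)
    while end - start > min_size:
        mid = (start + end) // 2
        if is_detected(data[start:mid]):
            end = mid          # the left half alone still triggers detection
        elif is_detected(data[mid:end]):
            start = mid        # the right half alone still triggers detection
        else:
            break              # the matched bytes straddle the midpoint
    return start, end

# Toy scanner: flags any chunk containing a made-up byte pattern.
PATTERN = b"EVIL_PATTERN"
toy_scan = lambda chunk: PATTERN in chunk
sample = b"A" * 100 + PATTERN + b"B" * 300
region = locate_signature(sample, toy_scan)
```

If some piece stays detected in isolation, a byte-pattern signature is a plausible explanation; if only the full file is detected, other mechanisms are more likely at play.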

What I discovered with this experiment is that around 1/3 of all samples detected by an AV are detected using signatures. On the one hand, the majority of samples are detected using other strategies (ML, heuristics, and so on), which might be more robust against evasion.

On the other hand, 1/3 is still a significant number, which shows that signatures are still relevant and should still be studied and evaluated. Unfortunately, many researchers seem to have completely discarded the study of signatures from their research agendas.

The experiment also revealed nice information about the signature sizes. One thing I noticed in the literature is that there is almost no information/guidelines about how to write a good signature, so maybe real-world data might help.

Although there is huge variation, signature sizes seem to average in the KB range across all solutions.

One thing that must be made clear about AVs is that having a detection capability does not mean that it is exercised all the time. The following table shows experiment results for distinct payload encoding strategies.

In the Avast case, for instance, although it is able to detect a malware payload encoded as base64, this kind of detection is triggered only for on-demand (OD) checks and is not performed in real-time (RT).
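
A minimal sketch that models this observed behavior, not Avast's implementation. The signature bytes and the `on_demand` flag are hypothetical names for the illustration.

```python
import base64
import binascii

SIGNATURE = b"MALICIOUS-MARKER"  # made-up byte pattern for the sketch

def scan(data: bytes, on_demand: bool) -> bool:
    """Return True if the (hypothetical) signature is found.

    The raw check always runs; the more expensive base64 decoding step
    runs only for on-demand scans, mirroring the OD/RT split above.
    """
    if SIGNATURE in data:
        return True
    if not on_demand:
        return False
    try:
        decoded = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return False  # not valid base64, nothing more to inspect
    return SIGNATURE in decoded

encoded = base64.b64encode(b"header" + SIGNATURE)
real_time_hit = scan(encoded, on_demand=False)
on_demand_hit = scan(encoded, on_demand=True)
```

The same payload is missed in one mode and caught in the other, which is why a single "detects base64 payloads" checkbox does not describe an AV well.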

AVs also have to make choices about where to put their efforts, e.g., which packers they will support. This is not only a technical decision: it is also related to the identified prevalence of a given packer in the population to be protected.

In our experiments, we identified that distinct AVs implement different detectors/unpackers.
When evaluating this result, keep in mind that these are the packer detectors that we identified for our tested samples. There might be other embedded unpackers that we did not find.

If a packer is very popular, AVs might put more effort into detecting it than if it is less popular. For instance, they might add custom unpackers. In the figure, we show experiment results comparing the ability of distinct AVs to detect UPX samples.

Although most of them will likely claim they can detect UPX-packed samples, the nature of their detection varies significantly. Some are able to detect even modified UPX versions (which we compiled ourselves), whereas others only detect the standard UPX versions.
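
A minimal sketch of one plausible reason for this gap, assuming a naive detector that only looks for the markers a stock UPX build leaves behind (the `UPX0`/`UPX1` section names and the `UPX!` magic). How the tested AVs actually detect UPX, and what exactly was changed in the modified builds, is not reproduced here.

```python
UPX_MARKERS = (b"UPX0", b"UPX1", b"UPX!")

def looks_like_standard_upx(data: bytes) -> bool:
    """True if any well-known marker of a stock UPX build is present."""
    return any(marker in data for marker in UPX_MARKERS)

# Toy byte strings standing in for packed files (not real PE images).
stock = b"MZ...UPX0....UPX1....UPX!...packed-body"
modified = (stock.replace(b"UPX0", b"AAA0")
                 .replace(b"UPX1", b"AAA1")
                 .replace(b"UPX!", b"AAA!"))

stock_flagged = looks_like_standard_upx(stock)
modified_flagged = looks_like_standard_upx(modified)
```

A detector of this kind catches the stock build and misses the renamed one; recognizing the unpacking stub itself would be more robust but costs more effort.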

One thing that I always asked myself was how AVs handle cross-platform threats. This is very important because it is increasingly common to transfer files between platforms (e.g., from Windows to Android).

We discovered that AVs act differently when scanning samples on their native platform and on a "foreign" one. AVs often do not perform real-time scans of cross-architecture threats, even though they are able to detect them if an on-demand scan is requested.

The variation among the operation modes is also observed when inspecting compressed files.
This type of scan is very important because files are not always presented in the clear, but compressed (e.g., in zip files).

We discovered that most AVs can open compressed files, but mostly in on-demand checks. It's rare to find compressed threats detected in real-time, during the filesystem copy.

To detect these samples, AVs embed their own libraries that implement the compression algorithms.
It is funny that in some cases the AV supports compressed file formats that the system cannot natively open (e.g., 7z).
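
A minimal sketch of an on-demand scan that opens an archive and checks each member, using Python's standard `zipfile` module in place of an AV's embedded library. The signature bytes are made up for the illustration.

```python
import io
import zipfile
from typing import List

SIGNATURE = b"MALICIOUS-MARKER"  # made-up byte pattern for the sketch

def scan_zip(archive_bytes: bytes) -> List[str]:
    """Return the names of archive members that contain the signature."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            try:
                content = zf.read(info)
            except RuntimeError:
                # zipfile raises RuntimeError for encrypted members when
                # no password is supplied; skip them instead of guessing.
                continue
            if SIGNATURE in content:
                flagged.append(info.filename)
    return flagged

# Build a small archive in memory for the demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", b"nothing to see here")
    zf.writestr("bad.bin", b"prefix" + SIGNATURE + b"suffix")
hits = scan_zip(buffer.getvalue())
```

Decompressing every archive that touches the filesystem is costly, which is one plausible reason this kind of check shows up mostly in on-demand scans.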

I got a bit disappointed when inspecting password-protected files. I was expecting AVs to try some brute-force approach to open them, but this seems not to be the case. Instead, AVs seem to prefer to wait for the user to unpack the files and only then inspect them.

I was also curious to check how AVs react to memory attacks. Memory attacks are getting very popular and I talked about them a few times before: secret.inf.ufpr.br/2021/09/29/adv…

and here: secret.inf.ufpr.br/2020/12/17/fro…

There are many code injection techniques and what I found in common is that all of them are mostly detected statically (e.g., AVs try to infer a memory injection attack from the imported libraries, and so on). Once we bypassed this step, attacks were rarely detected in real-time.
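
A minimal sketch of such a static check, assuming the import names were already extracted from the binary. The API list and the threshold are assumptions of this sketch, not rules taken from any AV.

```python
from typing import Iterable, Set

# Windows APIs commonly combined for remote code injection.
INJECTION_APIS = {
    "OpenProcess",
    "VirtualAllocEx",
    "WriteProcessMemory",
    "CreateRemoteThread",
}

def suspicious_imports(imported: Iterable[str]) -> Set[str]:
    """Injection-related APIs present in a binary's import list."""
    return INJECTION_APIS & set(imported)

def flags_injection(imported: Iterable[str], threshold: int = 3) -> bool:
    """Flag a binary that imports at least `threshold` injection APIs.

    The threshold is an arbitrary choice for the illustration.
    """
    return len(suspicious_imports(imported)) >= threshold

plain_injector = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory",
                  "CreateRemoteThread", "ExitProcess"]
# Same behavior, but the APIs are resolved at runtime, so they never
# appear in the import table.
dynamic_injector = ["LoadLibraryA", "GetProcAddress", "ExitProcess"]
```

The second list shows how this step can be bypassed: once the names leave the import table, only a runtime check could notice the injection.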

Our analysis started with the installers and finished only after the scan result, so we were able to evaluate what happens after the scan. As in the past, most AVs keep infected files in a quarantine (check the paper for implementation details).

What we missed most is an effective way to mark files as FPs and/or FNs. This should be a '1-click' step, but some AVs ask the user to go to a website and fill out a form. Not very good for usability!

An interesting aspect of AV development is that AVs can hardly ever rely on third-party code for security reasons, so they end up implementing lots of stuff by themselves. Thus, it was difficult for us to identify all components or to understand what each of them does.

In some cases, however, we could find very popular tools embedded in the AVs, such as Snort rules embedded in the VIPRE AV.

Another important aspect of an AV that I think is a bit overlooked is its own security. Some people believe that because AVs are security solutions they are secure software, but that is not always true. The figure shows the number of CVEs reported for AVs (probably only a lower bound).

One aspect of AV operation that is well studied is its performance. I believe everybody has at some point complained about AVs slowing the system down.
However, I believe the performance of AVs still needs to be better characterized according to the multiple operation steps.

I tried to contribute a bit in this direction. For instance, we analyzed the impact of AV daemons. When is it worth preloading the malware definition database? The figure shows the case of ClamAV's daemon.
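
A minimal sketch of the trade-off in its simplest form, assuming a standalone scanner reloads the definition database on every run while a daemon loads it once. The timings are hypothetical, not measurements from the paper, and memory cost is ignored.

```python
def total_time(n_scans: int, load_s: float, scan_s: float, daemon: bool) -> float:
    """Total seconds spent scanning `n_scans` files.

    Standalone mode pays the database load for every scan; daemon mode
    pays it once and keeps the database in memory.
    """
    if daemon:
        return load_s + n_scans * scan_s
    return n_scans * (load_s + scan_s)

# Hypothetical timings chosen only to make the arithmetic visible.
LOAD_S, SCAN_S = 10.0, 0.5
standalone = total_time(20, LOAD_S, SCAN_S, daemon=False)
preloaded = total_time(20, LOAD_S, SCAN_S, daemon=True)
saved = standalone - preloaded
```

Under this model the daemon breaks even at a single scan and wins from the second one on; the price is the memory the loaded database occupies while idle.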

In addition to Snort, we also found YARA rules embedded in many of the most popular AVs.
To get the best performance from YARA rules in the long term, precompiling the rules is a good choice.

This time the AVs got it all right: every solution using YARA was using precompiled rules. The gains of precompiling are shown in the figure below.

Academic-wise, researchers often refer to AVs as platforms to implement their hypothesized solutions.
In many cases, the proposed solutions are Machine Learning (ML) approaches to classify function arguments in real-time. Is it really possible?

To answer this question, we searched for the functions hooked by the multiple AVs. In the Avast case (see below), only a few functions are hooked, so most ML-based mechanisms would be impractical as a straightforward adaptation of this AV.

On the other hand, AVs such as BitDefender seem to be a good platform for ML approaches in this sense, given the number of hooked functions (almost filling an entire page of the paper!)

Before this work, I had already worked with AVs, such as when trying to evaluate their detection results.
Publisher link: sciencedirect.com/science/articl…
Archived link: secret.inf.ufpr.br/papers/marcus_…

One thing that bothered me at that time was the inability to turn off some engines so as to perform a fair evaluation (e.g., signatures vs signatures, ML vs ML, and so on).

This time I discovered that many of these options are available behind the scenes in the AVs, but they are not exposed in the GUI (a pity!).
The figure shows a configuration file of one of the AVs that is parsed at startup. The engines to be used in that run are defined in the config file.

Closing the paper, we analyzed the databases associated with the AVs. Databases are used for many purposes, such as telemetry and logging.

The most interesting use case, in our view, however, was speeding up analyses. After an AV analyzes a file, it often stores the scanned file's information in a database. When a scan of a file is requested, the AV checks whether a scan was already performed and, if so, skips the new one.

These database and telemetry issues are another example of an important aspect of AV operation that is hardly ever present in the academic literature (not to say completely absent from any AV information folder).
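
A minimal sketch of such a scan cache, assuming files are keyed by their SHA-256 digest and verdicts are kept in SQLite. The table layout and the `scanner` callback are hypothetical; the schemas of the inspected AVs are not reproduced here.

```python
import hashlib
import sqlite3
from typing import Callable, Tuple

class ScanCache:
    """Remember verdicts by content hash so repeated scans are skipped."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS scans ("
            "sha256 TEXT PRIMARY KEY, malicious INTEGER NOT NULL)"
        )

    def check(self, data: bytes, scanner: Callable[[bytes], bool]) -> Tuple[bool, bool]:
        """Return (verdict, served_from_cache)."""
        digest = hashlib.sha256(data).hexdigest()
        row = self.db.execute(
            "SELECT malicious FROM scans WHERE sha256 = ?", (digest,)
        ).fetchone()
        if row is not None:
            return bool(row[0]), True       # already scanned: skip the engine
        verdict = bool(scanner(data))       # first time: run the real scan
        self.db.execute(
            "INSERT INTO scans (sha256, malicious) VALUES (?, ?)",
            (digest, int(verdict)),
        )
        self.db.commit()
        return verdict, False

# Toy scanner that counts how many times it is actually invoked.
calls = []
def toy_scanner(data: bytes) -> bool:
    calls.append(1)
    return b"MALICIOUS-MARKER" in data

cache = ScanCache()
first = cache.check(b"some file contents", toy_scanner)
second = cache.check(b"some file contents", toy_scanner)
```

A real cache would presumably also have to be invalidated when new definitions arrive, since an old "clean" verdict may no longer hold; that part is not modeled here.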

To be fair, recently, after I had submitted this paper for review, I found an interesting paper from @leylabilge1 @balzarot @tudor_dumitras bridging part of this gap. Worth reading: usenix.org/conference/use…

To conclude, I'd like to thank all the reviewers that claimed in previous reviews of my papers that AVs did/didn't use signatures for motivating me to do this research and show what happens in the real-world (Irony alert!)

In the end, AVs are neither heroes nor villains; they are pieces of software with good and bad implementation and design decisions that are worth studying and understanding.

Now seriously thanking the funding sources: @CNPq_Oficial for my scholarship and @iserrapilheira for Raphael's scholarship

I'd also like to congratulate Felipe, an undergraduate student who performed many experiments to help me and who kept working even without being funded by a scholarship!

I'd really like to thank @matalaz, whose book inspired me a lot in this work.

And I believe this paper might (hopefully) interest other authors, such as @MalFuzzer, who recently published a book on AVs as well.

Finally (once again), to finish this tweet storm, I'd like to humbly invite some AV folks whom I follow (@assolini), who follow me (@mer0x36), or who worked in the AV industry (@cpuodzius) to take a look at this paper.
