[2/20]
Hashing and encryption functions make good targets for #detection as they are reasonably unique to each malware family and often contain lengthy and specific byte sequences due to the mathematical operations involved.
These characteristics make for good Yara rules 😁
[3/20] The biggest challenge is locating the functions responsible for hashing and encryption. I'll leave that for another thread, but for now...
You can typically recognize hashing/encryption by the use of bitwise operators inside a loop (xor ^, shift >>, etc.).
[4/20] For example, here's a string hashing function utilised by recent #qakbot samples.
Note the heavy use of mathematical operators, such as xor (^), right shift (>>) and bitwise "AND" (&).
These will typically produce a unique sequence of bytecodes.
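As a generic illustration of the "bitwise operators inside a loop" pattern, here is a minimal Python sketch of a bit-by-bit CRC-32. This is the standard CRC-32 algorithm, chosen only because it is easy to verify; it is not the Qakbot routine from the screenshot, and the function name is hypothetical.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-by-bit CRC-32 using the reflected polynomial 0xEDB88320."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte  # xor the next input byte into the state
        for _ in range(8):
            # Shift right by one bit; xor in the polynomial if the low bit was set.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Sanity check against the standard library implementation.
assert crc32_bitwise(b"LoadLibraryA") == zlib.crc32(b"LoadLibraryA")
```

When compiled, the xor, shift and "AND" operations in a loop like this become the kind of instruction sequence the thread is describing.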
[5/20] The disassembly and bytecodes for those instructions can be used for a Yara rule.
To grab the bytecodes, highlight the decompiled code (right). This automatically highlights the corresponding disassembly and bytecodes (left)...
[6/20] Highlighting the entire function should be avoided, as it is only the mathematical operators that will be consistent enough between samples.
For example, including the do/while loop would also pull the jump instructions (JZ/JC etc.) into the selected disassembly...
[7/20] ... Cont'd
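Jumps (JZ/JC/JNZ) == inconsistent byte values == not good for a Yara rule. (Jump operands encode relative offsets, which shift whenever the surrounding code is laid out differently.)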
If a jump is accidentally included, it can be manually unselected in the #Ghidra disassembly window.
The final result should look like this.
[8/20] At this point, it's useful to obtain multiple samples of the same malware, in order to check that the remaining selected bytes are the same between samples.
With #qakbot, this value (red) does change between samples. It's important to account for this in the final rule.
[9/20] The bytecodes are easily obtained using #Ghidra.
Highlight-> Right-Click -> Copy Special -> Byte String.
This copies the highlighted code in a format that can be used by #Yara.
[10/20] The bytes can then be pasted directly into a #Yara template.
I'll keep the rule as minimal as possible to demonstrate the concept.
(IRL - Filters would be added to improve performance)
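A minimal Python sketch of the "paste into a template" step, assuming the text copied from Ghidra is a run of space-separated hex pairs. The rule name, string identifier and bytes shown are hypothetical placeholders, not the real Qakbot values from the screenshots.

```python
import re


def build_rule(rule_name: str, byte_string: str) -> str:
    """Wrap a space-separated hex byte string in a minimal Yara rule."""
    tokens = byte_string.split()
    if not tokens:
        raise ValueError("no bytes supplied")
    for token in tokens:
        # Accept two hex digits or a full-byte wildcard (??) only.
        if not re.fullmatch(r"[0-9A-Fa-f]{2}|\?\?", token):
            raise ValueError(f"not a hex byte or wildcard: {token!r}")
    hex_body = " ".join(token.upper() for token in tokens)
    return (
        f"rule {rule_name}\n"
        "{\n"
        "    strings:\n"
        f"        $hash_routine = {{ {hex_body} }}\n"
        "    condition:\n"
        "        $hash_routine\n"
        "}\n"
    )


# Placeholder bytes for illustration only.
print(build_rule("Example_Hash_Routine", "33 c1 25 ff 00 00 00"))
```

The condition is just the string itself, which mirrors the "as minimal as possible" rule described above.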
[11/20] Running the rule at this point successfully finds the original sample, but the 3 other related samples in the same folder remain undetected.
[11/20] This is due to the issue mentioned, where bytes unrelated to mathematical operations can differ between similar samples.
Comparing two samples from the same Qakbot campaign in #Ghidra shows minor differences that are enough to break the original Yara rule.
[12/20] To correct this, wildcards can be added to the bytes that differ between samples.
An example of this can be seen below. The new Yara rule is *mostly* the same, but with a few wildcards (??) added where the bytes differed.
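Separately from the screenshot, the wildcarding step can be sketched in Python: line up the same byte sequence from several samples and emit ?? wherever they disagree. This assumes the sequences have already been extracted and are the same length; the function name and the sample bytes are hypothetical placeholders.

```python
def wildcard_pattern(samples: list[bytes]) -> str:
    """Build a Yara hex string body, wildcarding bytes that differ."""
    if not samples:
        raise ValueError("need at least one sample")
    length = len(samples[0])
    if any(len(sample) != length for sample in samples):
        raise ValueError("all byte sequences must be the same length")
    tokens = []
    for column in zip(*samples):  # one tuple of byte values per offset
        if len(set(column)) == 1:
            tokens.append(f"{column[0]:02X}")  # identical in every sample
        else:
            tokens.append("??")  # differs somewhere, so wildcard it
    return " ".join(tokens)


# Placeholder bytes: only the fourth byte differs between the two samples.
sample_a = bytes.fromhex("33C125FF000000")
sample_b = bytes.fromhex("33C125FE000000")
print(wildcard_pattern([sample_a, sample_b]))
```

The more samples that go in, the more confident you can be that the remaining fixed bytes are actually stable.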
[13/20] With the new changes saved, the rule can be re-run and multiple samples are now detected.
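To see why the wildcarded rule now hits several samples, here is a small Python emulation of how a hex string with ?? matches. It is only a sketch using the standard `re` module, not how Yara itself is implemented, and it handles full-byte wildcards only (no nibble wildcards or jumps). The bytes are hypothetical placeholders.

```python
import re


def hex_pattern_to_regex(pattern: str) -> "re.Pattern[bytes]":
    """Translate a hex string like '33 C1 ?? 00' into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that '.' also matches the newline byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


rx = hex_pattern_to_regex("33 C1 25 ?? 00")
# Both placeholder samples match despite the differing fourth byte.
assert rx.search(bytes.fromhex("9033C125FF0090"))
assert rx.search(bytes.fromhex("9033C125FE0090"))
# A change in a fixed (non-wildcard) byte still breaks the match.
assert rx.search(bytes.fromhex("9033C225FF0090")) is None
```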
[14/20] Running the rule against running processes identifies where #Qakbot has successfully injected itself.
Qakbot likes to inject into OneDriveSetup.exe, so this is likely a True Positive.
[15/20] Qakbot was used for this example, but the concept works well across other malware families.
#IcedID is a good example where the unique encryption can be used for detection.
[16/20] Now for a few notable and important caveats....
{1}: This technique is generally only effective against unpacked payloads or in situations where the malware is already executing in memory.
Detecting packed files on disk will typically require a different approach.
[17/20]
{2} - Technically the same approach can be applied to the bytecodes of unpacking routines used by loaders. But that tends to be more complex and a topic for another day.
[18/20]
{3}: The final rule has been kept simple to demonstrate the concept.
Although technically accurate for the use case, it lacks filters to perform quickly and without consuming large amounts of CPU. This would likely need to be adjusted in a production environment.
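The thread does not say which filters would be used, so the following is only an assumed example of the idea for files on disk: cheap checks (size cap, "MZ" header) that run before the expensive byte-pattern search. In Yara the equivalent conditions would be along the lines of `uint16(0) == 0x5A4D` and a `filesize` limit; the Python below just mirrors that logic, and the 5 MB cap and function name are hypothetical.

```python
import struct

MAX_SIZE = 5 * 1024 * 1024  # assumed size cap, tune for your environment


def passes_prefilter(data: bytes, max_size: int = MAX_SIZE) -> bool:
    """Cheap checks to run before scanning for the byte pattern."""
    if len(data) < 2 or len(data) > max_size:
        return False
    # Read the first two bytes as a little-endian 16-bit integer.
    (magic,) = struct.unpack_from("<H", data, 0)
    return magic == 0x5A4D  # "MZ", the DOS/PE header magic


assert passes_prefilter(b"MZ\x90\x00")      # looks like a PE file
assert not passes_prefilter(b"PK\x03\x04")  # zip archive, skipped
```

A header check like this would not apply as-is when scanning process memory, where the payload may not sit at offset 0.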
[19/20]
{4} - Malware authors can avoid this type of detection by updating encryption/hashing logic with each sample (or by introducing randomised junk instructions between the "real" code).
[20/20]
{5} - In-memory masking (like Foliage) will defend against this type of detection very well.
I'm yet to see this implemented by the major malware families, but it has been implemented (very effectively) by #HavocC2 and #BruteRatel.
[21] The base rule can be found on my GitHub here.