
Entropy is a measure of randomness in a set of data: the more random the data, the higher its entropy. When we talk about entropy we generally mean Shannon entropy, an algorithm that produces a value between 0 and 8 (bits per byte), where 8 means the data has no discernible pattern (it is very random) and 0 means the data follows a completely predictable pattern. Once you are used to evaluating digital data, it is easy to spot these patterns without doing any calculation. It is the analog of the movie The Matrix, where the operators can see what is going on in the Matrix code without a translator. Random, high entropy digital data looks very chaotic, while low entropy data looks more organized; you may even see symmetric patterns formed by the array of bytes, though the data is still not readable in any known language, and again, no calculation is generally needed.

It turns out that one of the techniques almost universally used to hide malicious code is packing or encoding it to change its byte-pattern signature and fool security products. The problem with this approach is also universal: as you pack or obfuscate the malicious code, its entropy increases. In fact, studies have demonstrated that entropy can be used very effectively to separate benign code from malicious code. Malicious samples will typically have an entropy above 7.2, while legitimate software ranges from 4.8 to 7.2. Entropy close to 8 is seen in about 30% of malicious samples but in only 1% of innocent code; more than half of malicious samples have an entropy above 7.2, while only about one in ten legitimate applications does. In conclusion, not all malicious samples have high entropy (but a great majority do), and not all legitimate applications have low entropy (but the majority do).
Legitimate samples can have high entropy because packing is a valid technique used to decrease the size of executables and protect resources, and many applications take advantage of it. So even though entropy alone is not enough to differentiate malicious code from normal, innocent code, malware analysis tools have used high entropy as one of the main elements in their malicious scoring systems to flag samples. If you have been in the crypter world, you can corroborate this by scanning your stub alone and then your stub with the encoded payload stored in the data section or resources section: detection immediately increases. Anti-malware solutions look for high entropy not only as a global value over the entire executable but also section by section, and there is little we can do to counter it. Even though this has been a problem for many years, the proposed techniques are not definitive. If you read on the topic, the first solution malware authors propose is to increase the amount of normal code relative to obfuscated code. The stub then has to be bigger and full of unused code, so although possible, I don't see this as a good solution. The second solution is even more inconvenient because it requires reducing the amount of obfuscated code, meaning the malicious code would have to be small compared to the stub. Unfortunately, in most situations it is the opposite: the stub is far smaller than the payload, so this approach limits you to the very few small payloads. Another possibility is to obfuscate the code with an algorithm that does not increase entropy (like XORing or Base64 encoding). This last one is, I think, the most convenient, which does not mean it is perfect: XORed as well as Base64-encoded data can be easily decoded to unmask the real purpose of the code, and signatures can be created directly against both the XORed and the Base64-encoded data.
Finally, some anti-malware solutions can even decode these simple schemes during the emulation phase of the analysis. Thinking about better alternatives, I came up with an interesting idea that I have seen applied to a few malware samples in the past. If the problem is randomness, why not disguise the obfuscated malicious code by inserting patterns that reduce its randomness, and therefore its global entropy? This way you are not limited to obfuscating the code with simple algorithms and yet stay under the radar of anti-malware solutions; the obfuscated code can also be of any size. With this in mind I created a POC to reduce the Shannon entropy of an obfuscated malicious array of bytes. The idea is to break the array into chunks and insert a low entropy pattern of bytes between one chunk and the next. When the sample is executed, we rebuild the original payload in memory, and at that point we have bypassed static detection of the high entropy code. I want to pause on the pattern used for the low entropy bytes, because you can create different patterns in order to avoid signature detection. Using various mathematical equations you could come up with many different low entropy patterns, but for the purposes of this article we will use a simple one. The second task is the one that interleaves the high entropy chunks of bytes with the low entropy ones; not much to say about it. The third task restores the original array of bytes by eliminating the low entropy patterns, because, after all, we need to restore the obfuscated code to what it was originally in order to proceed to the de-obfuscation phase.

So let's present and explain the code that performs the first task.

#include <cstdio>
#include <Windows.h>
#include "Entropy.h"

using namespace std;

// Simulated high entropy code
BYTE payload[] = {
    0xFF, 0x44, 0x01, 0xE2, 0x86, 0xD1, 0xC6, 0x88, 0x39, 0xA2, 0xAF,
    0x59, 0x3D, 0x8C, 0x0F, 0x90, 0x7E, 0xB2, 0xAF, 0xBD, 0xAA, 0xE0,
    0xBB, 0x87, 0x10, 0xF0, 0xD1, 0x37, 0x00, 0x55, 0x07
};

constexpr int number_of_chunks = 5;                                // Number of chunks. You can randomize this too.
constexpr int chunk_size = sizeof payload / number_of_chunks;      // Size of each chunk
constexpr int remaining_bytes = sizeof payload % number_of_chunks; // Bytes left over after the last full chunk is processed

// Array sized to hold the original high entropy code plus the low entropy inserts
BYTE low_entropy_payload[sizeof payload * 2 - remaining_bytes] = {0};

// Total size of the reduced entropy payload (the leftover tail bytes get no insert after them)
constexpr int payload_size_after_entropy_reduction = sizeof payload * 2 - remaining_bytes;

Notice that all of these calculations are stored in global variables, and the high entropy code is also located in the global area of the code to ensure it is stored in the data section of the executable; it could just as well be located in the resources section and loaded at runtime. You could even store the high entropy byte pattern inside the main function, but then the pattern would be placed in the .text section and copied onto the stack at runtime, instead of residing in the data or resources section. This is important because the stack can't handle very big arrays of bytes, and some compilers complain when a local array is too big.
The next task is to divide the high entropy code into chunks and add the low entropy patterns:

PBYTE reduce_entropy(PBYTE orig_payload)
{
    // Range of the random byte used to build the low entropy filler
    constexpr int max_n = 0xEF;
    constexpr int min_n = 0x01;

    // Low entropy filler of the same size as one high entropy chunk
    char random_hex[chunk_size];

    // Offsets into the output and input arrays
    int offset_of_low_entropy_payload = 0;
    int offset_of_original_payload = 0;

    // Pick one random byte and repeat it to fill the low entropy chunk
    const BYTE new_n = static_cast<BYTE>(rand() % (max_n + 1 - min_n) + min_n);
    for (char& i : random_hex)
    {
        i = static_cast<char>(new_n);
    }

    // Build the low entropy array: a high entropy chunk, then a low entropy one
    for (size_t i = 0; i < number_of_chunks; i++)
    {
        for (size_t j = 0; j < chunk_size; j++)
        {
            low_entropy_payload[offset_of_low_entropy_payload++] = orig_payload[offset_of_original_payload++];
        }
        // Append the filler; a single repeated value keeps its entropy at zero
        for (const char k : random_hex)
        {
            low_entropy_payload[offset_of_low_entropy_payload++] = k;
        }
    }

    // If there are remaining bytes, add them to the tail of the array
    if (remaining_bytes)
    {
        for (int i = 0; i < remaining_bytes; i++)
        {
            low_entropy_payload[offset_of_low_entropy_payload++] = orig_payload[offset_of_original_payload++];
        }
    }

    // Return the newly created low entropy array
    return low_entropy_payload;
}
participants (1)
-
zeynepaydogan