Entropy is a measure of randomness in a set of data: the more random the
data, the higher its entropy. When we talk about entropy we are generally
referring to Shannon entropy, which, measured in bits per byte, produces a
value between 0 and 8. A value of 8 means there is no pattern in the data
(it is essentially random), while a value of 0 means the data is completely
predictable. Once you are used to evaluating digital data, it is very easy
to spot these patterns without doing any calculation. It is the analog of
the movie The Matrix, where you can see what is going on in the Matrix code
without the translator. Random, high-entropy digital data looks very
chaotic, while low-entropy data looks more organized; you may even see
symmetric patterns formed by the array of bytes, even though the data is
still not readable in any known language. Again, no calculation is
generally needed. It turns
out that one of the techniques that is, I would say, almost universally
used to hide malicious code is packing or encoding it to change its
byte-pattern signature and fool security products. The problem with this
approach is also universal: as you pack or obfuscate the malicious code,
its entropy increases. In fact, studies have demonstrated that entropy can
be used very effectively to separate non-malicious code from malicious
code. Malicious samples will typically have an entropy over 7.2, while
legitimate software ranges from 4.8 to 7.2. An entropy
close to 8 will be seen in 30% of malicious samples but in only 1% of
innocent code. More than half of malicious samples will have an entropy
over 7.2, while only one out of ten legitimate applications will have such
high entropy. In conclusion, not all malicious samples have high entropy
(but a great majority do), and not all legitimate applications have low
entropy (but the majority do). The reason legitimate samples can have high
entropy is that packing is also a valid technique for decreasing the size
of executables and protecting resources, and many applications take
advantage of it.
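As a reference for the values discussed above, per-byte Shannon entropy takes only a few lines to compute; this is a minimal sketch (the sample inputs are illustrative, not real binaries):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits: 0 (fully predictable) to 8 (random)."""
    if not data:
        return 0.0
    total = len(data)
    # H = -sum(p * log2(p)) over the byte values actually present
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(data).values())

print(shannon_entropy(b"\x00" * 1024))         # constant data -> 0.0
print(shannon_entropy(bytes(range(256)) * 4))  # uniform bytes -> 8.0
```

Running this over an executable (or over each of its sections) is essentially what scanners do before comparing the result against thresholds like the 7.2 mentioned above.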
So even though entropy alone is not enough to differentiate malicious code
from normal, innocent code, malware analysis tools have used high entropy
as one of the main elements in their malicious-scoring systems to flag
samples. If you have been in the crypters world, you can corroborate this
by scanning your stub alone and then your stub with the encoded payload
stored in the data section or resources section: detection immediately
increases. Anti-malware solutions will look for high entropy not only as a
global value over the entire executable but also section by section, and
there is little we can do to counter it. Even though this has been a
problem for many years, the proposed techniques are not definitive. If you read
on the topic, malware authors propose, as a first solution, increasing the
amount of normal code relative to the obfuscated code. Evidently the stub
then has to be bigger and full of unused code, so although possible, I
don't see this as a good solution.
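To see why this first approach forces the stub to grow, here is a small sketch (the `0x90` filler standing in for "normal" code is illustrative) of how much low-entropy material it takes to dilute a random payload:

```python
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits (0 to 8)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

payload = os.urandom(4096)   # stand-in for an encrypted payload (entropy near 8)
filler = b"\x90" * 4096      # stand-in for repetitive, low-entropy "normal" code

for ratio in (0, 1, 4):      # filler-to-payload size ratio
    sample = payload + filler * ratio
    print(ratio, round(shannon_entropy(sample), 2))  # entropy drops as ratio grows
```

Roughly speaking, matching the payload byte for byte only gets the global value down to around 5; pushing it well below that means a stub several times larger than the payload.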
The second solution is even more inconvenient because it would require
reducing the amount of obfuscated code, meaning the malicious code would
have to be small in comparison to the stub. Unfortunately, in most
situations it's the opposite: the stub is much smaller than the payload.
This therefore limits the possible payloads to the very few small ones.
Another possibility is to obfuscate the code using an algorithm that does
not increase entropy (like XORing or Base64 encoding). This last one is, I
think, the most convenient, which does not mean it is perfect. XORed as
well as Base64-encoded data can be easily decoded to unmask the real
purpose of the code. Also, signatures can be created directly against both
the XORed and the Base64-encoded data. Finally, some anti-malware solutions
can even decode these simple schemes during the emulation phase of the
analysis.
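A quick sketch of why these two encodings keep entropy in check (the XOR key is arbitrary): a single-byte XOR merely permutes byte values, so entropy is unchanged, while Base64 maps everything into a 64-symbol alphabet, capping entropy at roughly 6 bits per byte.

```python
import base64
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits (0 to 8)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

payload = os.urandom(4096)                  # stand-in for already-random data
xored = bytes(b ^ 0x5A for b in payload)    # single-byte XOR, key 0x5A is arbitrary
b64 = base64.b64encode(payload)

print(round(shannon_entropy(payload), 2))   # close to 8
print(round(shannon_entropy(xored), 2))     # identical: XOR only relabels byte values
print(round(shannon_entropy(b64), 2))       # close to 6: only ~64 symbols are used
```

Note that Base64 buys the lower per-byte value at the cost of a 33% size increase, and neither scheme survives a scanner that simply tries the decoding itself.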
Thinking about better possible alternatives, I came up with an interesting
idea that I have seen applied to a few malware samples in the past. If the
problem is randomness, then why not try to disguise the malicious
obfuscated code by inserting patterns that reduce its randomness and
therefore its global entropy? This way you are not limited to obfuscating
the code with simple algorithms and can still stay under the radar of
anti-malware solutions; also, the obfuscated code can be of any size.
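The idea just described can be sketched in a few lines; the chunk size, the all-zero pad, and the function names here are illustrative choices, not the actual POC code:

```python
import math
import os
from collections import Counter

CHUNK = 64
PAD = b"\x00" * 64   # low-entropy pattern inserted between chunks

def shannon_entropy(data: bytes) -> float:
    """Per-byte Shannon entropy in bits (0 to 8)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def disguise(payload: bytes) -> bytes:
    """Interleave the low-entropy pad after every CHUNK bytes of the payload."""
    out = bytearray()
    for i in range(0, len(payload), CHUNK):
        out += payload[i:i + CHUNK] + PAD
    return bytes(out)

def rebuild(disguised: bytes) -> bytes:
    """Drop the pads to recover the original payload in memory.
    For simplicity, assumes len(payload) was a multiple of CHUNK."""
    step = CHUNK + len(PAD)
    out = bytearray()
    for i in range(0, len(disguised), step):
        out += disguised[i:i + CHUNK]
    return bytes(out)

payload = os.urandom(4096)                 # stand-in for an encrypted payload
stored = disguise(payload)
print(round(shannon_entropy(payload), 2))  # near 8
print(round(shannon_entropy(stored), 2))   # noticeably lower
assert rebuild(stored) == payload          # round-trip check
```

The pad bytes double the stored size here, but unlike the dilution approach they cost nothing at runtime: they are simply skipped when the payload is rebuilt.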
With this in mind, I created a POC to reduce the Shannon entropy of an
obfuscated malicious array of bytes. The idea is to break the array into
chunks and insert a low-entropy pattern of bytes between one chunk and the
next. When the sample is to be executed, we have to rebuild the original
payload in memory, and at this point we