Entropy is a measure of the randomness in a set of data: the more random the data, the higher its entropy. When we talk about entropy we are generally referring to Shannon entropy, which for byte-oriented data yields a value between 0 and 8 bits per byte, where 8 means the data shows no pattern at all (it is essentially random) and 0 means the data is completely predictable. Once you are used to evaluating digital data, it is very easy to see these patterns without doing any calculation. It is the analog of The Matrix, where the operator can see what is going on in the Matrix code without the translator. Random, high-entropy data looks chaotic in a hex dump, while low-entropy data looks more organized; you may even notice symmetric patterns formed by the array of bytes, though the data is still not readable in any known language. Again, no calculation is generally needed.

It turns out that one of the techniques used almost universally to hide malicious code is packing or encoding it to change its byte-pattern signature and fool security products. The problem with this approach is just as universal: as you pack or obfuscate the malicious code, its entropy increases. In fact, studies have demonstrated that entropy can be used very effectively to separate non-malicious code from malicious code. Malicious samples typically have an entropy above 7.2, while legitimate software ranges from 4.8 to 7.2. Entropy close to 8 is seen in 30% of malicious samples but in only 1% of innocent code, and more than half of malicious samples score above 7.2 while only about one in ten legitimate applications does. In conclusion, not all malicious samples have high entropy (but a great majority do), and not all legitimate applications have low entropy (but the majority do). Legitimate samples can score high because packing is also a valid technique for reducing the size of executables and protecting resources, and many applications take advantage of it.

So even though entropy alone is not enough to differentiate malicious code from normal, innocent code, malware analysis tools have used high entropy as one of the main elements in their malicious scoring systems to flag samples. If you have spent time in the crypter world, you can corroborate this by scanning your stub alone and then the stub with the encoded payload stored in the data section or resource section: detections immediately increase. Anti-malware solutions look for high entropy not only as a global value over the entire executable but also section by section, and there is little we can do to counter that.

Even though this has been a problem for many years, the proposed techniques are not definitive. If you read on the topic, the first solution malware authors propose is to increase the amount of normal code relative to the obfuscated code. The stub then has to be bigger and full of unused code, so although possible, I don't see it as a good solution. The second solution is even more inconvenient because it requires reducing the amount of obfuscated code, meaning the malicious code has to be small in comparison to the stub. Unfortunately, in most situations it is the opposite: the stub is much smaller than the payload. This would therefore limit the possible payloads to the very few small ones. Another possibility is to obfuscate the code with an algorithm that does not increase entropy, like XORing or Base64 encoding.
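To make the 0-to-8 scale concrete, here is a minimal sketch of the Shannon entropy calculation, H = -sum(p_i * log2(p_i)) taken over the 256 possible byte values. The function name shannon_entropy and the test buffers are my own, purely for illustration:

#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Shannon entropy of a byte buffer, in bits per byte (0.0 .. 8.0). */
double shannon_entropy(const unsigned char *data, size_t len)
{
    size_t counts[256] = {0};
    double entropy = 0.0;

    if (len == 0)
        return 0.0;

    for (size_t i = 0; i < len; i++)
        counts[data[i]]++;

    for (int b = 0; b < 256; b++) {
        if (counts[b] == 0)
            continue;
        double p = (double)counts[b] / (double)len;
        entropy -= p * log2(p);
    }
    return entropy;
}

int main(void)
{
    unsigned char low[64];
    memset(low, 'A', sizeof(low));      /* one repeated byte: entropy 0.0  */
    printf("low : %f\n", shannon_entropy(low, sizeof(low)));

    unsigned char high[256];
    for (int i = 0; i < 256; i++)       /* every byte value once: entropy 8.0 */
        high[i] = (unsigned char)i;
    printf("high: %f\n", shannon_entropy(high, sizeof(high)));
    return 0;
}

Compile with -lm; the repeated-byte buffer scores 0.0 and the buffer containing every byte value exactly once scores 8.0. The same function run over each section of an executable is essentially what the section-by-section checks described above boil down to.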
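As for why XORing does not raise entropy: a single-byte XOR key is a bijection on byte values, so it merely permutes the byte histogram and the entropy value is exactly preserved. Base64 goes further and actually lowers per-byte entropy, since its output alphabet has only 64 symbols, capping it at log2(64) = 6 bits per byte. A quick sketch, reusing shannon_entropy() from above (the buffer contents and key are arbitrary):

#include <stdio.h>

double shannon_entropy(const unsigned char *data, size_t len);

/* XOR every byte with a fixed key: the histogram is permuted,
 * never reshaped, so the values printed before and after match. */
static void xor_buffer(unsigned char *data, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key;
}

int main(void)
{
    unsigned char buf[] = "some payload bytes to obfuscate";
    printf("before: %f\n", shannon_entropy(buf, sizeof(buf) - 1));
    xor_buffer(buf, sizeof(buf) - 1, 0x5A);
    printf("after : %f\n", shannon_entropy(buf, sizeof(buf) - 1));
    return 0;
}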
Of these, I think this last option is the most convenient, which does not mean it is perfect. XORed as well as Base64-encoded data can be easily decoded to unmask the real purpose of the code. Signatures can also be created directly against the XORed or Base64-encoded data. Finally, some anti-malware solutions can even decode these simple schemes during the emulation phase of the analysis.

Thinking about better alternatives, I came up with an interesting idea that I have seen applied to a few malware samples in the past. If the problem is randomness, then why not disguise the obfuscated malicious code by inserting patterns that reduce its randomness and therefore its global entropy? This way you are not limited to obfuscating the code with simple algorithms, you still stay under the radar of anti-malware solutions, and the obfuscated code can be of any size.

With this in mind I created a POC to reduce the Shannon entropy of an obfuscated malicious array of bytes. The idea is to break the array into chunks and insert a low-entropy pattern of bytes between one chunk and the next. When the sample is executed, we rebuild the original payload in memory, and at that point we have already bypassed static detection of the high-entropy code.

I want to dwell on the pattern to follow for the low-entropy bytes, because you can create different patterns in order to avoid signature detection. Using various mathematical equations you could come up with many different low-entropy patterns, but for the purposes of this article we will use a simple one. The second task is the one that puts the high-entropy chunks of bytes together with the low-entropy ones; not much to say about it. The third task restores the original array of bytes by eliminating the low-entropy patterns because, after all, we need to return the obfuscated code to its original form in order to proceed to the de-obfuscation phase. Both ends of the process are sketched below.
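Here is a minimal sketch of the first two tasks. The constants CHUNK_SIZE and PAD_SIZE, the seed-driven make_pattern(), and the interleave() helper are my own illustrative choices, not the exact layout of the POC:

#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 64   /* bytes of high-entropy payload per chunk */
#define PAD_SIZE   64   /* bytes of low-entropy pattern per chunk  */

/* Task 1: fill `pad` with a simple low-entropy pattern. A short
 * repeating sequence keeps entropy low, and varying `seed` produces
 * different patterns to dodge signatures on the pattern itself.    */
static void make_pattern(unsigned char *pad, size_t len, unsigned char seed)
{
    for (size_t i = 0; i < len; i++)
        pad[i] = (unsigned char)(seed + (i % 4)); /* e.g. 0x10,0x11,0x12,0x13,... */
}

/* Task 2: interleave payload chunks with low-entropy padding.
 * Returns a malloc'd buffer; *out_len receives its total size.     */
unsigned char *interleave(const unsigned char *payload, size_t len,
                          size_t *out_len)
{
    size_t chunks = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
    unsigned char *out = malloc(len + chunks * PAD_SIZE);
    if (!out)
        return NULL;

    size_t src = 0, dst = 0;
    while (src < len) {
        size_t n = (len - src < CHUNK_SIZE) ? (len - src) : CHUNK_SIZE;
        memcpy(out + dst, payload + src, n);      /* copy one chunk   */
        src += n;
        dst += n;
        make_pattern(out + dst, PAD_SIZE, 0x10);  /* then one pattern */
        dst += PAD_SIZE;
    }
    *out_len = dst;
    return out;
}

With equal parts payload and pattern, a fully random payload near 8 bits per byte drops to roughly 6, comfortably below the 7.2 threshold mentioned earlier. Raising PAD_SIZE relative to CHUNK_SIZE lowers the global entropy further at the cost of a larger sample, and changing the seed (or the equation inside make_pattern()) varies the pattern to frustrate signatures.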
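And a matching sketch of the third task. It continues the snippet above (same CHUNK_SIZE and PAD_SIZE) and assumes the stub knows the original payload length, which is my simplification; you could just as well store the length in a small header:

/* Task 3: strip the low-entropy padding and rebuild the original
 * obfuscated payload in memory. Assumes the same CHUNK_SIZE and
 * PAD_SIZE used by interleave() and the original length orig_len.  */
unsigned char *restore(const unsigned char *mixed, size_t orig_len)
{
    unsigned char *out = malloc(orig_len);
    if (!out)
        return NULL;

    size_t src = 0, dst = 0;
    while (dst < orig_len) {
        size_t n = (orig_len - dst < CHUNK_SIZE) ? (orig_len - dst)
                                                 : CHUNK_SIZE;
        memcpy(out + dst, mixed + src, n);
        dst += n;
        src += n + PAD_SIZE;  /* skip over the inserted pattern */
    }
    return out;
}

From here the restored buffer is exactly the obfuscated payload as it was before interleaving, ready for the normal de-obfuscation phase.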