As a side note: in C++ you can't pass full arrays to functions by value as you would in managed languages; you pass a pointer to the first byte of the array to be processed, and then return a pointer to the first byte of the resulting processed array.

The function copies each chunk of high-entropy code to a new array and, after each chunk, inserts a run of ascending bytes starting from a random initial byte. Not all of the inserted bytes can be random, or the entropy would not be reduced, so only the first one is; the rest follow a pattern, and of course many different patterns could be chosen. For this example I used a simple incremental pattern.

The final task is to restore the original byte array by eliminating the inserted low-entropy chunks of code:

    PBYTE restore_original(PBYTE high_ent_payload)
    {
        // Re-calculate the original payload size
        constexpr int payload_size = (payload_size_after_entropy_reduction + 1) / 2;

        // Create a working copy of the bytes to be processed
        BYTE low_entropy_payload_holder[payload_size_after_entropy_reduction] = {0};
        memcpy_s(low_entropy_payload_holder, sizeof low_entropy_payload_holder,
                 high_ent_payload, payload_size_after_entropy_reduction);

        // Create an empty array which will contain the restored data
        static BYTE restored_payload[payload_size] = {0};

        int offset_of_hi_entropy_payload = 0;
        int offset_of_original_payload = 0;

        // High- and low-entropy chunks have the same size, so simply copy each
        // high-entropy chunk to the restored array and skip the low-entropy one
        for (size_t i = 0; i < number_of_chunks; i++)
        {
            for (size_t j = 0; j < chunk_size; j++)
            {
                restored_payload[offset_of_original_payload] =
                    low_entropy_payload_holder[offset_of_hi_entropy_payload];
                offset_of_hi_entropy_payload++;
                offset_of_original_payload++;
            }
            for (size_t k = 0; k < chunk_size; k++)
            {
                offset_of_hi_entropy_payload++;
            }
        }

        // Copy the remaining, never-modified bytes
        if (remaining_bytes)
        {
            for (size_t i = 0; i < remaining_bytes; i++)
            {
                restored_payload[offset_of_original_payload++] =
                    high_ent_payload[offset_of_hi_entropy_payload++];
            }
        }

        return restored_payload;
    }

So let's explain this graphically. Suppose a byte belonging to a high-entropy chunk is represented by the letter "H", a byte belonging to a low-entropy chunk by the letter "L", and the remaining, never-modified bytes by the letter "R".

The original high-entropy array looks something like this:

    HHHHHHHHHHHHHHHHRRR

The processed low-entropy array looks something like this:

    HHHHLLLLHHHHLLLLHHHHLLLLHHHHLLLLRRR

So to restore the original high-entropy array from the processed low-entropy array, we simply eliminate the low-entropy "L" bytes and finally append the remaining "R" bytes. After eliminating the low-entropy chunks, the restored array is identical to the original:

    HHHHHHHHHHHHHHHHRRR

Finally, we can explain the main function, which uses the previous two functions to do all the work. To make sure all of this works, we calculate the Shannon entropy at each step to corroborate that entropy is high at the beginning, reduced after processing, and high once more after restoration.
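The interleave/skip layout described above can also be sketched with standard containers. The following is a minimal, hypothetical re-implementation of the idea using `std::vector` (the names `reduce_entropy_sketch` and `restore_sketch` are my own, not from the article, which uses fixed-size `BYTE` arrays and compile-time constants instead):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sketch of the entropy-reduction step: after each high-entropy chunk,
// insert a same-sized run of ascending bytes starting at one random byte.
std::vector<uint8_t> reduce_entropy_sketch(const std::vector<uint8_t>& payload,
                                           std::size_t chunk_size) {
    std::vector<uint8_t> out;
    std::size_t full = payload.size() / chunk_size;  // number_of_chunks
    std::size_t rest = payload.size() % chunk_size;  // remaining_bytes

    for (std::size_t i = 0; i < full; ++i) {
        // Copy one high-entropy chunk unchanged
        out.insert(out.end(),
                   payload.begin() + i * chunk_size,
                   payload.begin() + (i + 1) * chunk_size);
        // Follow it with a low-entropy chunk: only the first byte is random,
        // the rest ascend from it, so the run scores as low entropy
        uint8_t b = static_cast<uint8_t>(std::rand());
        for (std::size_t j = 0; j < chunk_size; ++j)
            out.push_back(static_cast<uint8_t>(b + j));
    }
    // The tail shorter than one chunk is appended untouched (the "R" bytes)
    out.insert(out.end(), payload.end() - rest, payload.end());
    return out;
}

// Sketch of the restore step: keep each high-entropy chunk, skip the
// low-entropy chunk that follows it, then append the remaining bytes.
std::vector<uint8_t> restore_sketch(const std::vector<uint8_t>& mixed,
                                    std::size_t chunk_size,
                                    std::size_t original_size) {
    std::vector<uint8_t> out;
    std::size_t full = original_size / chunk_size;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < full; ++i) {
        out.insert(out.end(), mixed.begin() + pos,
                   mixed.begin() + pos + chunk_size);
        pos += 2 * chunk_size;  // skip the inserted low-entropy chunk
    }
    out.insert(out.end(), mixed.begin() + pos, mixed.end());  // "R" bytes
    return out;
}
```

A round trip through both functions returns the original byte sequence, which mirrors the H/L/R diagram: the reduced array is roughly twice the size of the original, and restoration discards every second chunk.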
    int main()
    {
        // Process the original high-entropy array and get a pointer to the
        // reduced-entropy processed array
        const auto lowEntropyArrayPointer = reduce_entropy(payload);

        // Copy the resulting array to a new array for entropy-calculation purposes
        BYTE original_hi_entropy_payload[sizeof payload * 2 - remaining_bytes] = {0};
        memcpy_s(original_hi_entropy_payload, sizeof original_hi_entropy_payload,
                 lowEntropyArrayPointer, sizeof payload * 2 - remaining_bytes);

        // Calculate entropy after processing
        const auto first_array = calculate_entropy(
            reinterpret_cast<char*>(original_hi_entropy_payload),
            sizeof original_hi_entropy_payload);

        // Restore the original array
        const auto restored_payload = restore_original(original_hi_entropy_payload);
        BYTE restored_low_entropy_payload[(payload_size_after_entropy_reduction + 1) / 2] = {0};
        memcpy_s(restored_low_entropy_payload,
                 (payload_size_after_entropy_reduction + 1) / 2,
                 restored_payload,
                 (payload_size_after_entropy_reduction + 1) / 2);

        // Calculate the restored array's entropy
        const auto second_array = calculate_entropy(
            reinterpret_cast<char*>(restored_low_entropy_payload),
            sizeof restored_low_entropy_payload);

        // Calculate the entropy of the original, unprocessed sample
        const auto original_array_entropy = calculate_entropy(
            reinterpret_cast<char*>(payload), sizeof payload);

        // Print the results to the console
        printf("\r\nOriginal array Entropy is: %f\r\n", original_array_entropy);
        printf("Processed array Entropy is: %f\r\n", first_array);
        printf("\r\nRestored array Entropy is: %f\r\n", second_array);
        getchar();
    }

By compiling and running the source code provided, you get the following results printed to the console:

    Original array Entropy is: 4.825164
    Processed array Entropy is: 3.451938
    Restored array Entropy is: 4.825164

I also tested with a very large random array, and entropy was reduced from more than 7 to less than 4. In this example, because the original array is small, the reduction is modest, but we can still see it and prove the point.
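The `calculate_entropy` function is called above but its body is not listed in this excerpt. A minimal sketch of the standard Shannon formula, H = -Σ p_i · log2(p_i) over byte frequencies, would behave as the measurements suggest: 0 bits for constant data, up to 8 bits per byte for uniformly random data. The `unsigned char*` signature here is an assumption; the article's version takes `char*`:

```cpp
#include <cmath>
#include <cstddef>

// Shannon entropy of a byte buffer, in bits per byte (0.0 to 8.0).
double calculate_entropy(const unsigned char* data, std::size_t size) {
    std::size_t counts[256] = {0};
    for (std::size_t i = 0; i < size; ++i)
        counts[data[i]]++;              // histogram of byte values

    double h = 0.0;
    for (int b = 0; b < 256; ++b) {
        if (counts[b] == 0) continue;   // log2(0) is undefined; term is 0
        double p = static_cast<double>(counts[b]) / size;
        h -= p * std::log2(p);          // accumulate -p * log2(p)
    }
    return h;
}
```

Because the inserted low-entropy chunks concentrate probability mass on predictable ascending runs, this measure drops after processing, exactly as the printed figures show.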
It is also important to realize that all calculations are done first on the original array and then on the converted one, which means you don't need to know the original array's characteristics (such as its original length) to restore it successfully, because the algorithm employed for conversion is completely reversible. However, if you wanted to use randomly sized low-entropy chunk patterns, you would need to store not only the converted array but also the data needed for complete restoration. Although the code would be more complex, the results would surely be better in terms of resisting signature creation. Reducing the entropy of obfuscated malware code is simple; it can be used to evade detection, and on top of that it may confer some additional protection against signature creation. The presented code can be adapted to create solutions that could help defeat the use of entropy as a means of detecting malware. Creating better low-entropy byte patterns using different mathematical equations and differently sized low-entropy pieces of code may further increase the robustness of the method. I would like to wish everyone to become more educated, smarter and more ethical. @zeynep
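As one illustration of the "different mathematical equations" idea, the filler generator could be parameterized rather than hard-coded to "+1 ascending". The helper below (`low_entropy_chunk` and its `step` parameter are hypothetical, not part of the article's code) produces an arithmetic sequence with a configurable step; any deterministic sequence works, because the restore side only needs to know the chunk boundaries and simply discards the filler bytes:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Generate one low-entropy filler chunk as an arithmetic sequence:
// start, start+step, start+2*step, ... (wrapping modulo 256).
std::vector<uint8_t> low_entropy_chunk(uint8_t start, uint8_t step,
                                       std::size_t chunk_size) {
    std::vector<uint8_t> chunk(chunk_size);
    for (std::size_t j = 0; j < chunk_size; ++j)
        chunk[j] = static_cast<uint8_t>(start + j * step);
    return chunk;
}
```

Varying `step` (or swapping in another closed-form sequence) changes the byte pattern that lands in the output without affecting reversibility, which makes static signatures against the filler harder to write.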