Malware and XOR - Part 2
In part 1, I gave some examples of recovering XOR keys from encoded executables when some of the content of the unencoded file is known (a known-plaintext attack).
In this part, I give some examples of automating this process with my xor-kpa tool.
xor-kpa.py takes 2 files as input: the first contains the plaintext, and the second is the encoded file. We are going to search for the string "This program cannot be run in DOS mode". We could put this string in a file and use it as input, but because I often use this string, xor-kpa also provides it as a predefined plaintext named dos, which can be selected with option -n.
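The idea behind this search can be sketched in a few lines of Python. This is not xor-kpa's own code, only a minimal illustration: the helper names xor_bytes and candidate_keystreams and the tiny test sample are made up for this example. XORing the known plaintext with the encoded bytes at a given position yields the keystream that would have produced that plaintext at that position.

```python
from itertools import cycle

DOS_STUB = b"This program cannot be run in DOS mode"

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (hypothetical helper)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def candidate_keystreams(encoded: bytes, plaintext: bytes):
    """Yield (position, keystream) for every position where the plaintext fits."""
    for pos in range(len(encoded) - len(plaintext) + 1):
        window = encoded[pos:pos + len(plaintext)]
        # ciphertext XOR plaintext = keystream, if the plaintext really is here
        yield pos, bytes(c ^ p for c, p in zip(window, plaintext))

# Made-up sample: 16 bytes of header, then the DOS stub message.
sample = b"MZ" + bytes(14) + DOS_STUB + b"\r\n$"
encoded = xor_bytes(sample, b"Password")

streams = dict(candidate_keystreams(encoded, DOS_STUB))
print(streams[16])  # keystream at the position where the message really is
```

At the correct position the keystream is the key repeated; at every other position it is noise. Telling the two apart is the job of the key extraction step.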
xor-kpa displays some potential keys, in ascending order of extra characters.
The value Key is the recovered key, and Key (hex) is its hexadecimal representation (in case the key is not printable).
Keystream is the keystream, from which xor-kpa extracted the key by looking for repeating strings.
Extra is the difference between the length of the keystream and the length of the key. If this is just one character, the proposed key is very unlikely to be the encoding key. Output can be filtered by requiring a minimum value for extra by using option -e.
Divide is the number of times the key is present in the keystream.
Finally, counts reports the number of times the same key was recovered at different positions in the encoded file.
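To make these fields concrete, here is a rough Python sketch of how a key, its extra and divide values, and the counts could be derived. It is an approximation under my own assumptions, not the tool's actual implementation: extract_key and propose_keys are hypothetical names, divide is taken as the number of whole repetitions of the key in the keystream, and min_extra only mimics the filtering that option -e provides.

```python
from collections import Counter
from itertools import cycle

def extract_key(keystream: bytes):
    """Return (key, extra, divide) for the shortest repeating unit, or None."""
    for size in range(1, len(keystream)):
        key = keystream[:size]
        repeated = key * (len(keystream) // size + 1)
        if repeated[:len(keystream)] == keystream:
            extra = len(keystream) - size      # keystream bytes beyond one key
            divide = len(keystream) // size    # assumption: whole repetitions
            return key, extra, divide
    return None

def propose_keys(encoded: bytes, plaintext: bytes, min_extra: int = 1):
    """Count each (key, extra, divide) over all positions, sorted by extra."""
    counts = Counter()
    for pos in range(len(encoded) - len(plaintext) + 1):
        window = encoded[pos:pos + len(plaintext)]
        keystream = bytes(c ^ p for c, p in zip(window, plaintext))
        result = extract_key(keystream)
        if result is not None and result[1] >= min_extra:
            counts[result] += 1
    return sorted(counts.items(), key=lambda item: item[0][1])

# Made-up sample encoded with the key "Password".
plaintext = b"This program cannot be run in DOS mode"
sample = b"MZ" + bytes(14) + plaintext + b"\r\n$"
encoded = bytes(b ^ k for b, k in zip(sample, cycle(b"Password")))

proposals = propose_keys(encoded, plaintext)
for (key, extra, divide), count in proposals:
    print(key, key.hex(), extra, divide, count)
```

Note that in this sketch a key recovered at a position that is not a multiple of the key length comes out rotated. The sample places the message at offset 16 so that the key appears unrotated.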
So by using this known plaintext (This program cannot be run in DOS mode) with the encoded file, xor-kpa proposes a number of keys. In this example, the key with the highest number of extra characters is the actual encoding key (Password).
Another way to recover the key, which we saw yesterday, is to look for sequences of null bytes (0x00) that have been encoded. xor-kpa.py can do this too, by giving 000000000000... as plaintext. We could create a file containing null bytes, but it's also possible to provide the plaintext in hex on the command line using notation #h#.
As this can be long to type, we can also use notation #e# to instruct xor-kpa to build a sequence by repetition. In this example, we created a sequence of 256 bytes with value zero (0x00).
The key was recovered, and the count is very high, so it's very likely that the executable contains sequences of 0x00 bytes even longer than 256 bytes.
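The reasoning behind the high count can be illustrated with a small Python sketch. It is a demonstration with a made-up sample and hypothetical helper names (xor_bytes, rotations, count_zero_run_hits), and it assumes the key is already known; it only shows the counting argument, not how xor-kpa works internally. Since any byte XORed with 0x00 is unchanged, an encoded run of null bytes is simply the key repeated, and a run of n null bytes contains n - 256 + 1 positions where a 256-byte null plaintext fits.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (hypothetical helper)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def rotations(key: bytes):
    """All rotations of the key, since a zero run can start at any key offset."""
    return {key[i:] + key[:i] for i in range(len(key))}

def count_zero_run_hits(encoded: bytes, key: bytes, run_length: int) -> int:
    """Count positions where run_length encoded bytes are the key repeated."""
    rots = rotations(key)
    hits = 0
    for pos in range(len(encoded) - run_length + 1):
        keystream = encoded[pos:pos + run_length]  # x ^ 0x00 == x
        unit = keystream[:len(key)]
        expected = (unit * (run_length // len(key) + 1))[:run_length]
        if unit in rots and keystream == expected:
            hits += 1
    return hits

# Made-up sample: a run of 300 null bytes between two markers.
key = b"Password"
plain = b"MZ" + bytes(300) + b"end"
encoded = xor_bytes(plain, key)

hits = count_zero_run_hits(encoded, key, 256)
print(hits)
```

With a run of 300 null bytes, the 256-byte plaintext matches at 300 - 256 + 1 positions, so a count well above one indicates runs longer than the plaintext itself.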
Another known plaintext, usable in executables with an embedded manifest (as a resource), is PADDINGXX.
Here we use a sequence of ten times the string PADDINGXX as known plaintext.
Please post a comment if you have ideas for other known plaintexts in executables.
Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com