I've been looking at the RC4 (or alleged RC4) code a bit. Unlike most ciphers, RC4 doesn't seem to have any particular word length dependancies in its principles. That is to say, a cipher like IDEA has lots of magic numbers involved, but RC4 does not, which means that one could, in principle, extend it from being byte oriented stream to being word oriented stream without causing particular harm. (It would, of course, become incompatible, but thats not a real issue.) Can anyone see any reason why one could not change RC4 to being a word oriented stream cipher, call it "ERC4"? The reason I ask is because this would speed things up by a factor of four on 32 bit machines, which would mean modest hardware could possibly break 100mbps speeds. The 64 bit extension on 64 bit RISC processors could go far, far, faster still. This is a real consideration in the protection of network traffic, where extremely fast encryption in software has been a stumbling block. Perry
I realized a few minutes later that I was mistaken to write:
OTOH maybe that is not necessary because probably the whole array does not have to be set up in order to tell whether a given key will work. 1/3 of the entries in the table are fixed once they have been swapped once, so if you checked after doing the first 20 entries, say, about 7 should have their final values, and we can perhaps reject a key already in a known plaintext situation just from that. So actually the large table size may not help against exhaustive key search. (The mod I suggested to the key setup would defend against this possibility, which raises the question of whether this design aspect was chosen to allow for export approval.)
Just knowing several of the first few entries in the table doesn't allow you to quickly reject keys because the algorithm selects entries from throughout the table to xor with the data stream. So this does not imply that keys can be rejected quickly, nor does it suggest that the particular setup algorithm used is particularly weak or was chosen for export approval. Sorry about the error. Hal
perry@imsi.com (Perry E. Metzger) writes: Can anyone see any reason why one could not change RC4 to being a word oriented stream cipher, call it "ERC4"?
The reason I ask is because this would speed things up by a factor of four on 32 bit machines, which would mean modest hardware could possibly break 100mbps speeds. The 64 bit extension on 64 bit RISC processors could go far, far, faster still.
Is mbps megabits per second? If so, I'm within a factor of 3 of confirming your numbers. If it's megabytes, I'm more than an order of magnitude away from understanding what "modest hardware" means. The original code plods along on my 50 Mhz '486 laptop (Borland C++ Pro) at a paltry 1.43mbits/s. Turning the inner loop into obfuscated C picks up a little to 3.84mbits/s, and doing it with 8086-compatible assembler yields only 8.40mbits/s. The compiler could certainly be a lot smarter, but the assembler probably couldn't be improved by a factor of 2 without modifying the algorithm as you suggested -- the current incarnation is at 15 instructions per encrypted byte. Anybody else have timing numbers? Jim Gillogly 25 Halimath S.R. 1994, 19:18
Jim Gillogly says:
Is mbps megabits per second?
Yes. John Ioannidis has gotten the code up to 24mbit/sec on SparcStation IIs.
The original code plods along on my 50 Mhz '486 laptop (Borland C++ Pro) at a paltry 1.43mbits/s. Turning the inner loop into obfuscated C picks up a little to 3.84mbits/s, and doing it with 8086-compatible assembler yields only 8.40mbits/s.
A 50 Mhz '486 shouldn't be that far off a SparcStation if you are operating in the right mode... You don't have to get very obfuscated, but moving the swap in line, doing a bit of unrolling and playing some games with word operations can get you pretty far... Perry
perry@imsi.com (Perry E. Metzger) writes:
Unlike most ciphers, RC4 doesn't seem to have any particular word length dependancies in its principles. That is to say, a cipher like IDEA has lots of magic numbers involved, but RC4 does not, which means that one could, in principle, extend it from being byte oriented stream to being word oriented stream without causing particular harm. (It would, of course, become incompatible, but thats not a real issue.) Can anyone see any reason why one could not change RC4 tO being a word oriented stream cipher, call it "ERC4"?
I'm not sure exactly how you would generalize it. Right now it has a 256 entry table which holds a permutation of the values in 0..255. A byte is selected from this table and xor'd with the data stream. To increase to four bytes per entry and keep it as a permutation we would have to have 4 billion entries taking up 16 GB of memory which seems a bit much. Altenatively we could still have 256 entries but have them four bytes each, but then it's not clear that you keep the cryptographic properties since you no longer have a permutation. However a good application of Perry's suggestion would be to go to a two-byte formulation. You would have 64K entries of two bytes each, holding a permutation of 0..65535, and then use the same algorithm with the 256's replaced by 65536 and the chars replaced by shorts. This would retain the cryptographic properties and IMO would make many sorts of attacks harder (at least requiring more data, probably by a factor of 256). The main down side is that key setup takes 256 times longer, but it shouldn't take much time to init a 64K entry table with a couple of indexes and xor's per entry. So on the whole it seems like a worthwhile extension. I wonder if the NSA would approve it? I think it was Bill Sommerfield who pointed out that it was a little curious that NSA approves RC4 with a 40 bit key when hardware-assisted search like the DES key cracker would appear to be impractical. Maybe some other parallel machine would be suitable, though. (But another possibility is that they can break the cypher and the key length restriction is just cover for that.) Trying to get a 16-bit RC4 approved for export would perhaps not work for 40 bit keys because key setup takes 256 times longer, but key size could be decreased to 32 bits to compensate. OTOH maybe that is not necessary because probably the whole array does not have to be set up in order to tell whether a given key will work. 1/3 of the entries in the table are fixed once they have been swapped once, so if you checked after doing the first 20 entries, say, about 7 should have their final values, and we can perhaps reject a key already in a known plaintext situation just from that. So actually the large table size may not help against exhaustive key search. (The mod I suggested to the key setup would defend against this possibility, which raises the question of whether this design aspect was chosen to allow for export approval.) Hal Hal
Hal says:
perry@imsi.com (Perry E. Metzger) writes:
Unlike most ciphers, RC4 doesn't seem to have any particular word length dependancies in its principles. [...] I'm not sure exactly how you would generalize it. Right now it has a 256 entry table which holds a permutation of the values in 0..255. A byte is selected from this table and xor'd with the data stream. To increase to four bytes per entry and keep it as a permutation we would have to have 4 billion entries taking up 16 GB of memory which seems a bit much. Altenatively we could still have 256 entries but have them four bytes each, but then it's not clear that you keep the cryptographic properties since you no longer have a permutation.
Am I being thick? If you simply do all array indexes modulo the length of the table, wouldn't you still have a permutation? (Its true, however, that one could slow down the algorithm quite a bit if one isn't careful with how one does this...) .pm
I wonder if the NSA would approve it? I think it was Bill Sommerfield who pointed out that it was a little curious that NSA approves RC4 with a 40 bit key when hardware-assisted search like the DES key cracker would appear to be impractical.
Actually, I'm not sure that it's that impractical, but I don't know a heck of a lot about VLSI or hardware design. A fully pipelined chip would require significantly more more chip area than the DES cracker, but you probably don't need that. I'm pretty sure you could make a blazingly fast, non-pipelined, chip with a "key setup" unit and then a "trial encrypt" unit which run in parallel; you clock the key setup unit 256 times to set up the key, then the key gets fed to the trial encrypt unit where it gets tried against the known plaintext/ciphertext pair.. Back of the envelope calculation: massively parallel RC4 cracker. 2**16 chips, cycled at 2**23 hz (8Mhz; fairly conservative), one trial every 2**8 cycles per chip. -> 2**31 trials per second. -> with this hardware, you can break 40-bit RC4 in 256 seconds on average (512 seconds worst case). - Bill
On Thu, 15 Sep 1994, Bill Sommerfeld wrote:
I wonder if the NSA would approve it? I think it was Bill Sommerfield who pointed out that it was a little curious that NSA approves RC4 with a 40 bit key when hardware-assisted search like the DES key cracker would appear to be impractical.
Actually, I'm not sure that it's that impractical, but I don't know a heck of a lot about VLSI or hardware design. A fully pipelined chip would require significantly more more chip area than the DES cracker, but you probably don't need that. I'm pretty sure you could make a blazingly fast, non-pipelined, chip with a "key setup" unit and then a "trial encrypt" unit which run in parallel; you clock the key setup unit 256 times to set up the key, then the key gets fed to the trial encrypt unit where it gets tried against the known plaintext/ciphertext pair.. ...
Don't forget the precomputation attack. The key setup only has to be done 2^40 times, ever. The initial state of the stream cipher can be stored on a set of tapes that are read in parallel to perform the brute force attack.
Mike Johnson second login says:
Don't forget the precomputation attack. The key setup only has to be done 2^40 times, ever. The initial state of the stream cipher can be stored on a set of tapes that are read in parallel to perform the brute force attack.
You may be interested to know that the SPA/NSA agreement covered this; you are allowed to use a 40 bit "salt" thats appended to the key when you use RC4 in an exported application provided the salt is sent along with the message. .pm
On Sep 15, 1:05pm, Bill Sommerfeld wrote:
Actually, I'm not sure that it's that impractical, but I don't know a heck of a lot about VLSI or hardware design. A fully pipelined chip would require significantly more more chip area than the DES cracker, but you probably don't need that.
One of the issues I looked at over the weekend was the parallelization of the key scheduler, which is definitely a non-trivial problem. One thought that did occur to me was that there might be a massively parallel solution to this which has a practical implementation up to 48 bits, but not over this. I'll post more about this when I get some time, but I've got to disagree with Bill here that a simple RC4 implementation (without a parallel key schedule setup) would take more die area than a DES cracker. Ultimately, it is a VERY simple cipher, and the VLSI implementation would reflect this. Even so, the release of the algorithm confirms the RSADSI position that an exhaustive keysearch would be a slow operation, given the setup time required for the key schedule setup. BTW, just an idle question: why is RC4 a stream cipher, as opposed to an 8-bit block cipher? Based on the implementation, it would seem to be the later to me. Ian.
participants (7)
-
Bill Sommerfeld -
Hal -
Ian Farquhar -
Jim Gillogly -
Mike Johnson second login -
Perry E. Metzger -
perry@imsi.com