[cc'd to cypherpunks] Michael Johnson <mikej2@Exabyte.COM> writes:
> On Thu, 17 Oct 1996, Ted DiSilvestre wrote:
> > ... As someone posted a few weeks ago, rather than finding one message to crack, we should work on ten or a hundred simultaneously if possible. This seems to be the easiest way (assuming we can find that many "good" messages) to increase cracking efficiency by orders of magnitude.
> Hmmm... I don't see that this increases cracking efficiency. The surprising (to me) result is that it doesn't really hurt it either. The increase in the average expected time to find a particular key is offset by the fact that more keys are tried at once. We could crack just as many keys per year, on average, but we become less sure of which keys will be cracked first.
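A toy simulation makes the quoted point concrete: with m target keys in a keyspace of size N, a linear sweep finds its *first* key after about N/(m+1) trials, yet still cracks m keys per full sweep, so keys cracked per candidate tried stays m/N either way. The sketch below is an illustration added for this point, not code from the thread; it uses a toy 2^15 keyspace and libc rand(), not DES.

/* Toy Monte Carlo: first-hit time shrinks with more targets, but
   keys cracked per candidate tried is unchanged.  Illustration only. */
#include <stdio.h>
#include <stdlib.h>

#define N    (1 << 15)   /* toy keyspace (fits the guaranteed RAND_MAX) */
#define RUNS 1000        /* Monte Carlo repetitions */

static double avg_first_hit(int m)
{
    double total = 0.0;
    for (int r = 0; r < RUNS; r++) {
        int first = N;
        for (int i = 0; i < m; i++) {
            int k = rand() & (N - 1);   /* position of target key i */
            if (k < first)
                first = k;              /* a 0..N-1 sweep hits the min first */
        }
        total += first + 1.0;
    }
    return total / RUNS;
}

int main(void)
{
    const int ms[] = { 1, 10, 100 };
    srand(1);
    for (int i = 0; i < 3; i++) {
        int m = ms[i];
        printf("m = %3d targets: first key after ~%6.0f trials "
               "(N/(m+1) = %6.0f); %d keys per N-trial sweep\n",
               m, avg_first_hit(m), (double)N / (m + 1), m);
    }
    return 0;
}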
You do save something: you only need one key schedule per candidate key that you try, and you can then try that key against multiple ciphertexts. A negative effect is that you might lose caching opportunities as code and data sizes increase.

For the timings you're likely to get for DES, you hit the law of diminishing returns quickly: the additional keys/sec from doing more keys at once tapers off asymptotically. The number of keys it would be most efficient to use (say, the point where adding another key speeds things up by less than 1%, or whatever) hinges heavily on the actual balance between how fast the key schedule can be made to go as compared to a CBC block operation (or whatever mode you're attacking). The protocol you are attacking, and the position of the known plaintext within it, also affect things greatly: the further along the plaintext, the more blocks you have to cycle through, and the more data you have to move around.

To illustrate the effect, here are some quick calculations based (loosely) on the timings being quoted: Peter Trei's projected DES block timing scaled from a Pentium 90 to a Pentium 133, the approximate equivalence of ECB and CBC demonstrated by Eric's code, combined with Eric's key schedule timing. (This set of figures is hand-waving: I'm not at all sure like is being compared with like, and we are now extrapolating from extrapolations.)

  key schedule  = 12.2us  (Eric Young's)
  DES CBC block =  0.47us (from Peter Trei's figure, extrapolated to a 133MHz Pentium)

 n keys    | key    | DES     | elapsed for | elapsed per |
 at a time | sched  | cbc     | n key tests | key test    | keys/sec
-----------------------------------------------------------------------
      1     12.2us     1.4us     13.6us        13.6us       73.5k keys/s
      2     12.2us     2.8us     15.0us         7.5us      133.2k keys/s
      4     12.2us     5.6us     17.8us         4.5us      224.2k keys/s
      8     12.2us    11.3us     23.5us         2.9us      340.7k keys/s
     16     12.2us    22.6us     34.8us         2.2us      460.3k keys/s
     32     12.2us    45.1us     57.3us         1.8us      558.3k keys/s
     64     12.2us    90.2us    102.4us         1.6us      624.8k keys/s
    128     12.2us   180.5us    192.7us         1.5us      664.3k keys/s
    256     12.2us   361.0us    373.2us         1.5us      686.0k keys/s

So 256 keys gives a ~9x improvement in keys/sec with these even-less-accurate guesstimates, and using more keys won't increase that improvement much: it tails off asymptotically around 9.5x for reasonable numbers (< 64k keys). Improving the key schedule in the second table adds hardly anything, because it is basically irrelevant once you get to 256 keys. (However, a faster key schedule _is_ significant for one key at a time.)

I think Peter Trei / Phil Karn are fairly close to the limit, and if anything the resulting figure will fall a fair bit short of what I have extrapolated, because I suspect their peak Mb/s figure will go down once you start doing practical things, with the addition of extra data shifting, etc.

So it all depends on what corners can be cut in the key schedule, how much you gain by coding the key schedule in hand-optimised assembler, and all the other factors discussed above. In summary:

- key schedule / CBC block timing ratio
- how far into the message the known plaintext is (how many blocks to cycle)
- cache limits (an additional restriction on the optimal number of keys)
- extra code overhead over peak cache Mb/s (moving data, extra code)
- extra code size and complexity (for multiple-key code)

Adam
--
print pack"C*",split/\D+/,`echo "16iII*o\U@{$/=$z;[(pop,pop,unpack"H*",<>
)]}\EsMsKsN0[lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<J]dsJxp"|dc`
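The table above follows from a simple cost model: per candidate key, one key schedule S, plus one trial test B against each of the n target messages. The minimal C sketch below (my reconstruction, not code from the thread) regenerates it; S = 12.2us is from the post, while the per-test cost B = 3 x 0.47us (three CBC blocks per test) is an inference from the "x 3" blocks-per-test arithmetic in the follow-up post.

/* Regenerate the multiple-key timing table from the implied cost
   model: elapsed(n) = S + n*B per candidate key, tested against
   n target messages at once. */
#include <stdio.h>

int main(void)
{
    const double S = 12.2;        /* key schedule, us (Eric Young's figure) */
    const double B = 3 * 0.47;    /* per-target test: 3 CBC blocks, us (assumed) */

    printf("n keys | elapsed for n | per key test | keys/sec\n");
    for (int n = 1; n <= 256; n *= 2) {
        double elapsed  = S + n * B;     /* us to test one candidate vs n targets */
        double per_test = elapsed / n;   /* us per (candidate, target) pair */
        printf("%6d | %11.1fus | %10.1fus | %6.1fk\n",
               n, elapsed, per_test, 1000.0 / per_test);
    }
    /* limit as n grows: 1/B = ~709k tests/sec, i.e. the ~9.5x asymptote */
    return 0;
}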
Eric Young pointed out to me a factor of 8 error in my DES CBC block timing calculations (Eric's figures are in megabytes, but Peter's are in mega_bits_; I used megabytes for both, and this fact escaped my notice, though I must say I was very impressed with Peter's optimisations :-). Peter has since posted his cool key schedule optimisations, which change things also.

So here's the table revisited (extrapolated to a 133MHz Pentium from Peter's 90MHz figures, to keep it in line with the previously posted table):

  keysched = 3.4us
  DES cbc  = 7.3us

 n keys    | key    | DES      | elapsed for | elapsed per |
 at a time | sched  | cbc      | n key tests | key test    | keys/sec
-----------------------------------------------------------------------
      1      3.4us     21.9us      25.3us       25.3us      39.5k keys/s
      2      3.4us     43.8us      47.2us       23.6us      42.4k keys/s
      4      3.4us     87.6us      91.0us       22.7us      44.0k keys/s
      8      3.4us    175.2us     178.6us       22.3us      44.8k keys/s
     16      3.4us    350.4us     353.8us       22.1us      45.2k keys/s
     32      3.4us    700.8us     704.2us       22.0us      45.4k keys/s
     64      3.4us   1401.6us    1405.0us       22.0us      45.6k keys/s
    128      3.4us   2803.2us    2806.6us       21.9us      45.6k keys/s

So as you can see, this greatly reduces the gains to be made from multiple keys. It's not worth doing more than 64 keys, and 64 keys only buys a 15% increase in keys/sec.

When Peter adds what he calls the "glue" code in his paper (extra code to move data, compare results, etc.), the advantage of multiple keys may go down further. Also, if the extra code for testing multiple keys pushes the code requirements over the 8k L1 code cache, or the extra data pushes the data requirements over the 8k L1 data cache, this may lose more than is gained. The extra code complexity will add a small amount of overhead too.

The data requirements for multiple keys aren't that high. (Extra data is number of keys x blocks required per test x 8-byte block size = 64 x 3 x 8 = 1.5k, or 384 bytes if you restrict yourself to 16 keys at once and lose the last 1% gain.)

Adam
--
print pack"C*",split/\D+/,`echo "16iII*o\U@{$/=$z;[(pop,pop,unpack"H*",<>
)]}\EsMsKsN0[lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<J]dsJxp"|dc`
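For concreteness, here is a hedged C sketch of the inner loop being costed in these tables: one key schedule per candidate key, amortised over several target ciphertexts. The DES primitives are extern placeholders standing in for a real library (e.g. Eric Young's libdes); their names and signatures are illustrative, not the actual libdes API, and a known-plaintext block shared across all targets is also an assumption.

/* One key schedule per candidate, amortised across NTARGETS targets.
   des_set_key()/des_cbc_decrypt() are placeholder declarations, NOT
   a real library's API. */
#include <stdint.h>
#include <string.h>

#define NTARGETS 16   /* 16 targets at a time: 16 x 3 x 8 = 384 bytes, as above */
#define NBLOCKS   3   /* blocks to cycle through to reach the known plaintext */

typedef uint8_t block[8];

/* assumed provided by a DES library -- placeholders */
extern void des_set_key(const block key, uint32_t sched[32]);
extern void des_cbc_decrypt(const uint32_t sched[32], const block *ct,
                            block *pt, int nblocks, const block iv);

/* Returns the index of the target this candidate key decrypts, or -1.
   ct[i] holds the first NBLOCKS ciphertext blocks of message i. */
int try_candidate(const block key,
                  const block ct[NTARGETS][NBLOCKS],
                  const block iv[NTARGETS],
                  const block known_plain)
{
    uint32_t sched[32];
    block pt[NBLOCKS];

    des_set_key(key, sched);                  /* the per-key cost, paid once */
    for (int i = 0; i < NTARGETS; i++) {      /* ...amortised NTARGETS ways  */
        des_cbc_decrypt(sched, ct[i], pt, NBLOCKS, iv[i]);
        if (memcmp(pt[NBLOCKS - 1], known_plain, 8) == 0)
            return i;                         /* key matches target i */
    }
    return -1;
}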