alleged-RC4
Bill Sommerfeld
sommerfeld at orchard.medford.ma.us
Wed Sep 14 08:17:12 PDT 1994
Actually, in looking at the assembly code generated by three different
compilers (GCC on i386, GCC on PA, and HP's PA compiler), strangely
enough, the `% 256' should be `& 0xff' (it shaves a few instructions
off the inner loop, for some reason which isn't immediately apparent
to me).
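
One possible explanation, offered as an assumption rather than something the compiler output above confirms: if the index being reduced is a signed `int', C's `%' can return a negative remainder, so the compiler cannot replace it with a plain mask and has to emit extra fix-up instructions. With an unsigned operand the two forms are equivalent. The helper names below are hypothetical and exist only to show the difference.

```c
#include <assert.h>

/* Signed operand: `%' truncates toward zero (guaranteed since C99),
   so a negative x gives a non-positive remainder. The compiler must
   preserve that, which costs instructions beyond a single AND. */
static int mod_signed(int x)  { return x % 256; }

/* Masking always yields 0..255, even for negative x
   (assuming the usual two's-complement representation). */
static int mask_signed(int x) { return x & 0xff; }

/* Unsigned operand: both forms compute the same value, and a
   compiler is free to turn the `%' into the mask itself. */
static unsigned mod_unsigned(unsigned x)  { return x % 256u; }
static unsigned mask_unsigned(unsigned x) { return x & 0xffu; }

/* Small self-check illustrating the points above. */
static void check_mod_vs_mask(void)
{
    assert(mod_signed(-1) != mask_signed(-1));     /* forms differ when x < 0 */
    assert(mod_signed(300) == mask_signed(300));   /* agree when x >= 0 */
    assert(mod_unsigned(300u) == mask_unsigned(300u));
}
```

If the indices are declared `unsigned' (or `unsigned char') in the first place, a compiler may well generate the same code for both spellings.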
On the PA, I got a ~30% speedup by unrolling the inner loop 4x,
assembling the pad into an `unsigned long', and doing one 4-byte-wide
XOR with the user data. I think most of the speedup comes from giving
the instruction scheduler more instructions to reorder to avoid
load-store conflicts. Your mileage will vary on other architectures.
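
A minimal sketch of the unrolling described above, not the code that was actually timed. It assumes the cipher follows the widely circulated alleged-RC4 description; the names `rc4_state', `rc4_byte' and `rc4_xor_unrolled' are made up for illustration, and `uint32_t' stands in for the `unsigned long' (32 bits on the PA). The pad is assembled with memcpy() so the result does not depend on byte order or on the alignment of the user data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t s[256]; uint8_t i, j; } rc4_state;

/* Key setup: start from the identity permutation, then mix in the key. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    unsigned i, j = 0;
    for (i = 0; i < 256; i++) st->s[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t = st->s[i];
        j = (j + t + key[i % keylen]) & 0xff;
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->i = st->j = 0;
}

/* One keystream byte; the uint8_t indices wrap modulo 256 on their own. */
static uint8_t rc4_byte(rc4_state *st)
{
    uint8_t a, b;
    st->i = (uint8_t)(st->i + 1);
    a = st->s[st->i];
    st->j = (uint8_t)(st->j + a);
    b = st->s[st->j];
    st->s[st->i] = b;                 /* swap s[i] and s[j] */
    st->s[st->j] = a;
    return st->s[(a + b) & 0xff];
}

/* Reference version: one XOR per byte. */
static void rc4_xor_bytewise(rc4_state *st, uint8_t *buf, size_t len)
{
    size_t n;
    for (n = 0; n < len; n++) buf[n] ^= rc4_byte(st);
}

/* Unrolled 4x: build four pad bytes, then do one 4-byte-wide XOR. */
static void rc4_xor_unrolled(rc4_state *st, uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (; n + 4 <= len; n += 4) {
        uint8_t k[4];
        uint32_t pad, word;
        k[0] = rc4_byte(st);
        k[1] = rc4_byte(st);
        k[2] = rc4_byte(st);
        k[3] = rc4_byte(st);
        memcpy(&pad, k, 4);           /* pad bytes in memory order */
        memcpy(&word, buf + n, 4);
        word ^= pad;
        memcpy(buf + n, &word, 4);
    }
    for (; n < len; n++) buf[n] ^= rc4_byte(st);   /* 0..3 leftover bytes */
}

/* Self-check: both paths must produce identical output. */
static void rc4_selfcheck(void)
{
    rc4_state a, b;
    uint8_t x[9], y[9];
    memcpy(x, "Plaintext", 9);
    memcpy(y, "Plaintext", 9);
    rc4_init(&a, (const uint8_t *)"Key", 3);
    rc4_init(&b, (const uint8_t *)"Key", 3);
    rc4_xor_bytewise(&a, x, 9);
    rc4_xor_unrolled(&b, y, 9);
    assert(memcmp(x, y, 9) == 0);
}
```

Whether this is any faster than the bytewise loop depends on how well the compiler inlines rc4_byte() and schedules the four independent table lookups, so it needs measuring on each target.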
- Bill