This is somewhat different than the kind of fast multiplication you are looking for.
Yes, but even scalar multiplication is so much faster on a DSP than on most general purpose CPUs that it seems like a definite win. The 486 takes from 13-42 clock cycles to perform a multiply, depending on the operand sizes and number of significant bits in the multiplier. Even if you couldn't keep the pipeline full on a chip like the PowerPC, you'd still be well ahead. But then I hear people say that it's not the multiplication that slows down modular exponentiation, it's the modular reduction. Phil
But then I hear people say that it's not the multiplication that slows down modular exponentiation, it's the modular reduction.
A once saw a short paper on "modular multiplication without trial division" or some such. The down side was that (at least for the 486 doing RSA) you didnt seem to get any extra speed over using a straight forward test-subtract-n-shift method. Unfortunatly, I dont have a reference. Sorry. brad
Phil Karn writes:
But then I hear people say that it's not the multiplication that slows down modular exponentiation, it's the modular reduction.
That's one of the driving reasons for using Montgomery multiplication. You do some up front work that changes the representation into one where the reduction on each multiply is a multple of 2^N (a shift, or fetch of the LSW or MSW of the result). See "Modular Multiplication Without Trial Division", Peter L. Montgomery, Mathematics of Computation, v44, n170, pp 519-521, Apr 1985.
participants (3)
-
Brad Huntting -
Eric Blossom -
Phil Karn