In message <199408281656.JAA14318@netcom.netcom.com> Norman Hardy writes:
Modular reduction is scarcely worse than the multiplication. If I have a 60 word multi precision number N to be reduced by a 30 word number M, I compute a guess by dividing the 32 bit most significant bits N by the most significant 32 bits of M.
On a DSP chip like the Texas C40, 32-bit multiplication takes one clock cycle. Modular reduction will take something on the order of one hundred clocks. Modular reduction is much more expensive than multiplication.
I then multiply this quotient by M and subtract that from N. That reduces N by some multiple of M leaving N mod M unchanged. The error in the guess might mean that N is less than 32 bits shorter than it was before the operation but this method gets nearly 32 bits per pass. The inner loop of the is the same as in multiplication.
For all of this using the floating point unit wins on most modern CPUs.
Not on DSP chips. On the C40, reals are only 32 bits long, so there is no benefit to using them. They are less precise than integers. -- Jim Dixon
participants (1)
-
jdd@aiki.demon.co.uk