Re: fast modular reduction

17 Dec 2003

...
In the following pseudocode, B is the radix in which the numbers are 
represented (2^32 for a 32-bit machine), n is the length of modulus in 
blocks, U is B^(n+1) mod the modulus, X is the number to be reduced, k+1 
is the length of X, and Y is the result.
1. Y = X
2. For i from k down to n+1, repeat steps 3 and 4
3.	Y = Y - Y[i] * B^i + Y[i] * U * B^(i-n-1)
4.	If Y >= B^i, then Y = Y - B^i + U * B^(i-n-1)
Is there a proof of correctness available for this algorithm? It
looks almost like a Radix-B peasant division algorithm with some
modifications. Is there an algorithmic analysis available? I also
I think there is a bug in your description. Let k+1 = n+1
(e.g. the dividend is 1 more "block" than the modulus). Then
i=n starting out, and we have

3. Y=Y - Y[n] * B^n + Y[n] * U * B^(n-n-1)  [we have B^-1] I'm assuming
this was unintended.

How does this algorithm compare to computing the reciprocal
via Newton's Formula, and then multiplying by the reciprocal
using Karatsuba multiplication? While I was at IBM Watson I invented
a modular reduction algorithm that saves 1/4 the number of 
multiplications required on average once you have the reciprocal 
computed.

-Ray

Re: fast modular reduction

Ray Cromwell