During the Crypto' 95 Rump Session, Josh Benaloh of Microsoft Corp. presented a new modular reduction algorithm that he and I developed. It is faster than the Montgomery method by about 10 to 15%, and is more general and easier to understand. The central idea is that it is easy to reduce a number to an equivalent one that's just one "block" (machine word) longer than the modulus, by repeatedly subtracting off the highest block, and adding back something that's equivalent, but smaller. In the following pseudocode, B is the radix in which the numbers are represented (2^32 for a 32-bit machine), n is the length of modulus in blocks, U is B^(n+1) mod the modulus, X is the number to be reduced, k+1 is the length of X, and Y is the result. 1. Y = X 2. For i from k down to n+1, repeat steps 3 and 4 3. Y = Y - Y[i] * B^i + Y[i] * U * B^(i-n-1) 4. If Y >= B^i, then Y = Y - B^i + U * B^(i-n-1) Tricks can be used to eliminate step 4, and to reduce Y to n blocks using one single precision division, and n more single precision multiplications. The algorithm will hopefully be written up more completely soon. Wei Dai