Re: Fast modular exponentiation

17 Dec 2003

      Phil Karn <karn@unix.ka9q.ampr.org>  writes:
...
I.e., if I come up with a list of clock counts for each basic
arithmetic instruction, how can I tell which algorithm is probably
best for my machine?
Back in the days of Mix, Knuth worked out the model. But with modern
pipelined chips with significant on-chip cache, the model becomes
too complex to solve arithmetically.

The usual solution is to use Berkeley's Architect's Work Bench (AWB)
which allows you to model the chip's instruction set, cache structure,
pipeline stall characterists, etc. while using a compiler to
generate actual code to execute. You can then execute your algorithm
against the chip, and declare a winner.

Of course, you have to validate the chip model, and you have to know
how the compiler optimizations work, how it interacts with branch prediction
logic, etc.

While awb is readily available for the usual Unix systems, using it for
anything less trivial than a grad school compiler optimization course is
a ton of work. It makes sense when you are inventing a new chip
architecture, or even a significant revision to an existing chip.
I believe that it is far too much work to use awb (or anything of similar
capabilities) to evaluate algorithms for real world chips.

For algorithm optimization, it makes more sense to study the chip's
characteristics, and use a heuristic approach, testing real implementations.

I've already measured nearly a four to one difference in execution times
using Phil's DES code using different compilers and operating systems
on the same hardware (my 486).

But it is pretty unsatisfying to say that the best algorithm "depends" on
half a dozen variables, and that we can't reliably predict (engineer) a
solution.

Pat

Pat Farrell      Grad Student                 pfarrell@cs.gmu.edu
Department of Computer Science    George Mason University, Fairfax, VA
Public key availble via finger          #include <standard.disclaimer>