------- Forwarded Message JOURNAL OF RECREATIONAL MATHEMATICS Volume 25, Number 1 -- 1993 Book Reviews, edited by Charles Ashbacher Note: The following is neither a book nor a software review. It is a review of a piece of hardware--but it should be of prime interest to readers of this journal. The Dubner PC Cruncher-A Microcomputer Coprocessor Card for Doing Integer Arithmetic, H & R Dubner, 449 Beverly Road, Ridgewood, New Jersey 07450, Telephone 1-201-652-1825. $2,500. Need more processor speed for arithmetic operations? Try the Dubner "Cruncher." This is an add-in board for IBM PC Compatibles (ISA bus) which is designed to quickly add, subtract, multiply, and divide large numbers. For example, to check a number of 1000 digits for probable-primality using Fermat's theorem (a^p = a(mod p) whenever p is prime) takes about two minutes on a typical 486DX machine, but less than five seconds if this same machine has a Cruncher installed. The improvement in speed depends on the size of the numbers. Numbers with 100 digits are multiplied about six times as fast, numbers with 1000 digits about forty times as fast, and those with 10,000 digits are multiplied about ninety-eight times as fast. The Cruncher is currently limited to integer arithmetic. Its speed comes from the use of a chip designed for real time digital signal processing (LSI Logic's L64240 MFIR) which can perform 1.28 billion multiply/add operations per second. Access is provided to this power on two levels: through a menu interface and through a set of ANSI compatible C language routines. The menu-based interface is primitive but workable. It allows you to enter numbers from the keyboard and then perform all the various Cruncher operations, including the basics operations, using any of several factorization methods, access to a primality proving algorithm, and even the ability to screen for primes of certain forms, such as abc + 1 where either a or c is incremented during the search. Also included are the necessary tools for a programmer to access the Cruncher's power from within their own C programs. This includes a set of instructions similar to those in an assembly language, for example, hmult(a,b,c) multiplies the numbers pointed to by b and c and places the result in the location that a points to. The procedure hpowerm(a,b,c) raises a to the b-th power modulo c and places the result in a. Functions are included to do the basic arithmetic operations as well as such things as calculating n!, ab(mod c), and the integer part of the n-th root of a number. These instructions are easily used in any C program and make it unnecessary for the user to know how the hardware actually works. I learned C just to use the Cruncher in my own research and the speed was well worth the effort! The near super-computer speed provided by the Cruncher is very impressive! For example, I recently used the Cruncher on my 386 machine to search for the next prime of the form n!-1 . After about five days I found it: 3507!-1 (with 10,912 digits.) This search would have taken about a year using a very fast 486! Though it is not necessary, I usually use the Cruncher from within the Microsoft Windows environment - that way it can crunch away twenty-four hours a day in the background and I can still do my word processing and programming in the foreground. However, Windows does slow the Cruncher down a few percent. The Cruncher's price, $2,500, is not cheap, but is definitely the least expensive way to make your machine multiply and divide fifty times faster. In short: 1) the speed is unbelievable; 2) the menu interface is workable, but if your interests diverge too much from the Dubners', you will need to be able to program in C; and 3) if you program in C, the tools for accessing Cruncher's power are excellent. Chris Caldwell Mathematics and Computer Science University of Tennessee at Martin Martin, Tennessee 38238 The Dubner PC Cruncher We can give you the computational power you've only dreamed of. Dollar for dollar, we believe that we have built the most powerful hardware available for doing multiple-precision integer arithmetic. Here is a list of execution times for Fermat testing of numbers with 1,000 decimal digits: Vaxstation 3100/38 770 sec Sun 4/330 377 sec Decstation 3100 287 sec IBM RS 6000/320 128 sec 486/33 128 sec IBM 3090 19 sec 486/33 with Dubner PC Cruncher 4.4 sec Fujitsu VP 2200/10 2.1 sec You'll note that the Fujitsu VP 2200/10 can test a 1,000-digit number faster than the Cruncher. But the Fujitsu is a multi- million dollar supercomputer comparable to a four-processor Cray XMP. The PC Cruncher is an add-in board that plugs into your IBM-compatible PC/AT. Its cost: $2,500. Furthermore, the Cruncher becomes more efficient as the numbers grow larger. At 3,000 digits, the Cruncher is slightly faster than the Fujitsu. By 10,000 digits, the Cruncher is 1.5 times faster. The chances are that you already have a PC which you probably turn off at night. Install the Cruncher, run it during those nighttime hours, and you can get the same amount of computing done as if you had access to ten or more hours of supercomputer time. How can we do this? We are Harvey Dubner and Bob Dubner. Harvey is an electrical engineer and computer systems designer with a long-term interest in mathematical theory, particularly in number theory (with over twenty published papers.) Bob is an electronic design engineer with a lot of experience in high-speed digital systems. The PC Cruncher comes from that combination of talent: Harvey has an insatiable need for processing power, and Bob likes to design powerful processors. Over the last ten years we have built and programmed a succession of high-speed, number-crunching circuits that augment the ability of personal computers to do multiple- precision arithmetic. With the previous version of the hardware, we discovered over half of all the known prime numbers of more than 2,000 digits. With 632 numbers, we were by far the biggest contributor to that list. The PC Cruncher is roughly ten times faster than the previous version, and we are very excited about the contributions to theory that will be made when many investigators have this kind of computational power available. Description of the board: The hardware consists of a full-sized circuit board that requires a 16-bit slot in an IBM-compatible PC/AT. At the heart of the design is a 64-tap Finite-Impulse- Response filtering chip made by the LSI Logic Corporation. It is this chip's ability to perform 1.28 billion multiply/add operations per second that gives the PC Cruncher its remarkable performance. (And it is the chip's $1,100 purchase price that makes the PC Cruncher as expensive as it is.) The board dissipates about three or four Watts, and has no special space, power, or cooling requirements. The PC Cruncher has 256K of on-board memory for storing operands, and another 64K of high-speed memory for accumulating intermediate results. All communication between the host and the Cruncher is done with I/O ports -- the Cruncher doesn't use up any valuable memory space. The software for driving the Cruncher will run fastest on a 386 or 486 executing in protected mode. We use Symantec's Zortech C compiler, because it easily and conveniently generates 32-bit protected-mode code. All of the code has been written in strictly ANSI-compatible C, with some MASM 5.1 assembly language. When you buy the Cruncher, you will receive the board, complete schematics, complete documentation for the programmable logic on the board, complete source-level C and MASM code for driving the board, and documentation for that code. Please contact us for additional information. Order from: ( USmail, Email, Fax, or telephone ) Dubner International, Inc. 13 Westervelt Place Westwood, NJ 07675 Tel: 201-664-6434 Fax: 201-358-9377 Tel: Harvey Dubner 201-652-1825 Robert Dubner 201-664-6434 E-mail: Harvey Dubner 70372.1170@compuserve.com Robert Dubner 73247.2334@compuserve.com PC Cruncher Performance Benchmarking: Assessing just how fast the PC Cruncher can calculate depends on many things. The major factor is the size of the operands that are being operated on. Consider multiplication. The PC Cruncher multiplies using the schoolboy algorithm -- it does long multiplication just like you do, except that instead of multiplying and accumulating 2-digit products, the Cruncher multiplies and accumulates 308-digit products. And it only takes the hardware about 6.4 microseconds to multiply-and-accumulate a 308-digit product. Unfortunately, it can take dozens of microseconds of fiddling with the PC's sluggish ISA buss to set up the hardware to create that 308- digit product. The efficiency at that point is relatively low. Even at 1,000 digits, the software is spending about half its time frantically trying to get the hardware going. At about 4,000 digits, the effects of overhead are reduced, but they continue to be significant until about 10,000 digits. To give you some idea of what this means, let's look at the time needed to square an N-digit number. Here is a comparison of a 486/33 with a PC Cruncher, compared with a 486/33 running a highly-optimized multiple- precision multiply routine: Number of digits Time without Cruncher Time with Cruncher Factor 100 .000220 secs .000037 secs 6 500 .004000 .000165 24 1,000 .015000 .000357 42 5,000 .375000 .004400 85 10,000 1.500000 .015283 98 Division performance is even more complicated. The basic inner loop of division is more complicated than that of multiplying, with even more overhead. Dividing a 200-digit number by a 100-digit number is about 6 times slower than squaring a 100-digit number; at 2,000 by 1,000 digits division is 3 times slower than the corresponding multiply, and at 20,000 by 10,000 digits division is 1.7 times slower. Added to all this is a 600-microsecond pre-division calculation time that must be added to all randomly-started divisions. Happily, this calculation need only be done once whenever repeated divisions by the same number are executed, as in the heavily-used 'A**B MOD C' function. In summary: In the 50 to 200 digit range, a 486/33 PC equipped with a PC Cruncher board will be 3 to 10 times faster than the same machine without the Cruncher. The actual speed will depend on the size of the numbers and the function mix. As you go up to 1,000 digits, performance will be 20 to 40 times faster. Past 4,000 or so digits performance will be 50 to 100 times faster, which will put you in supercomputer territory. If you start with a slower PC, the relative performance gains are even more spectacular, since the Cruncher's performance is hindered only a little by the slower processor once you get past a few hundred digits. At 100 digits, a Cruncher-equipped 386/20 will be about 10 to 30 times faster than the same machine without the board, and at 4,000 digits and up will be 150 to 300 times faster. Dubner PC Cruncher -- Other Possibilities RSA Encrypting/Decrypting A 486/33 equipped with a Dubner PC Cruncher board can perform 1,024-bit A**B MOD C calculations in about 0.473 seconds. You could therefore decrypt an entire message at the rate of about 270 bytes per second -- ght times faster than an unaided 486/33. re prepended to a DES-encrypted messages, can get going in about one-half second instead body out there have a network that needs a key Need more speed? Let usere is enough interest, we can build this same basic hardware onto an EISA slave board, instead of an ISA board. We have yet to do a complete analysis, but our feeling is we'd get about a three-fold performance increase in the 1,024-bit range. This added speed would come from a reduction in overhead when manipulating these smallish 308-digit numbers. Need even more speed? Talk to us. It just costs money. The PC Cruncher design would scale up nicely with two or four of the big FIR chips. For somewhere between $6,000 and $10,000 you could blow the maintenance panels off of anything. But we'd need a sponsor. Used to working with workstations? Well, you could spend about $2,000 on a 486/33, add in the $2,500 PC Cruncher, write some software and stick it on the network as a number crunching server, and hardly notice the difference. Or talk to us -- with sufficient interest we could build a Cruncher with its own on-board RISC processor, and stick a SCSI port on it so it'll plug right into a workstation. Not a bad idea -- we could get rid of a lot of overhead that way, and get real improvements in the 100- digit and up range. But again, we'd need a sponsor. ------- End of Forwarded Message