RSA-130 Falls to NFS - Lenstra Posting to sci.crypt.research
From: alenstra@fwi.uva.nl (A. Lenstra) Newsgroups: sci.crypt.research Subject: new factorization record Date: 14 Apr 1996 10:35:20 GMT Organization: FWI, University of Amsterdam, Bellcore Lines: 189 Message-ID: <4kqkd8$g37@net.auckland.ac.nz> Summary: factorization of RSA-130 using the Number Field Sieve Keywords: factoring, Number Field Sieve, RSA On April 10, 1996, we found that RSA-130 = 18070820886874048059516561644059055662781025167694013491701270214\ 50056662540244048387341127590812303371781887966563182013214880557 has the following factorization RSA-130 = 39685999459597454290161126162883786067576449112810064832555157243 * 45534498646735972188403686897274408864356301263205069600999044599 This factorization was found using the Number Field Sieve (NFS) factoring algorithm, and beats the 129-digit record that was set on April 2, 1994, by the Quadratic Sieve (QS) factoring algorithm (cf. [AGLL]). The amount of computer time spent on this new 130-digit NFS-record is only a fraction of what was spent on the old 129-digit QS-record (see below for details). For information about NFS, see [LL]. For additional information, implementations and previous large NFS factorizations, see [BLZ, DL, E, GLM]. We used the polynomial 5748,30224,87384,05200 X^5 + 9882,26191,74822,86102 X^4 - 13392,49938,91281,76685 X^3 + 16875,25245,88776,84989 X^2 + 3759,90017,48552,08738 X - 46769,93055,39319,05995 and its root 125,74411,16841,80059,80468 modulo RSA-130. This polynomial was selected from a list of 14 candidates provided by Scott Huddleston, after extensive sieving experiments carried out by Joerg Zayer at the University of Saarland. Sieving was done on a great variety of workstations at many different locations: 28.37% by Bruce Dodson (Lehigh University) 27.77% by Marije Elkenbracht-Huizing (CWI, Amsterdam) 19.11% by Arjen K. Lenstra (Bellcore) 17.17% by contributors to the www-factoring project (organized by Jim Cowie, Wojtek Furmanski, and Arjen Lenstra, among others) 4.36% by Matt Fante (IDA) 1.66% by Paul Leyland (Oxford University) 1.56% by Damian Weber (University of Saarland) Except for a relatively small part of the contribution of the CWI and the entire contribution by the University of Saarland, all contributors used the NFS sieving program that was developed at Bellcore. This program uses `lattice sieving with sieving by vectors' as introduced by Pollard in [P], and is based on the implementation described in [GLM]. The main difference is the more liberal use of `special q-primes' that define the lattices (see also [E]). Unlike [GLM], these special q's do not necessarily belong to the factor base (as is the case in [P]); this idea can also be found in [B]. Another difference is the more liberal interpretation of the factor base sizes, which results in a much more flexible memory usage. These changes allowed us to run the sieving program in parallel on almost any number of processors, as long as they have at least about 6 megabytes of memory. This was exploited in the Web-based sieving effort, which used a collection of CGI scripts ("FAFNER", from Cooperating Systems Corporation) to automate and coordinate the flow of tasks and relations within the globally distributed network of anonymous sieving clients. As a consequence almost any user of the Web can contribute to future, larger factoring efforts, simply by a few appropriate mouse clicks. The changes also made it hard to estimate how much time was spent on the sieving stage, because the performance of the siever strongly depends on the amount of memory it gets. We can say, however, that we would have spent about 500 mips years (i.e., 10% of the computing time spent on the 129-digit QS-record) if we had done all the sieving on average workstations with at least 24 megabytes of memory. Sieving started in September 1995, initially on a very limited number of workstations. The Web-based sieving started relatively late, in December 1995. Relations were collected and merged and duplicates were removed at Bellcore. On Jan 14, 1996, we had 56,515,672 unique relations. In uncompressed ASCII format, with only the primes >2000000 listed per factorization, storage of the relations required more than 3.5 gigabytes. With a rational factor base of 250,001 elements (the primes <= 3,497,867) and an algebraic factor base of 750,001 elements (ideals of norm <= 11,380,951), the breakdown of full and partial relations is as follows. \ number of prime ideals of norm > 11,380,951: \________ 0 1 2 3 4 5 6 number of \ rational \_____________________________________________________________ primes > | 3,497,867 | | 0 | 48400 479737 1701253 1995537 6836 403 9 1 | 272793 2728107 9617073 11313254 39755 2212 44 2 | 336850 3328437 11520120 13030845 56146 3214 71 3 | 1056 9022 24455 0 0 0 0 4 | 3 9 31 0 0 0 0 The first successful dependency used 4143834 relations, of which 3506 were free relations. The breakdown of large prime ideals amongst the other contributing relations is as follows. 0 | 24242 154099 330738 255742 1054 52 1 1 | 75789 443647 885136 648148 2734 164 2 2 | 56326 300369 565605 389046 1923 131 4 3 | 182 776 1105 0 0 0 0 4 | 2 4 7 0 0 0 0 Once every week during the collection the cycles were counted at Bellcore. The final collection of 56,467,272 relations with one or more large primes generated 2,844,859 cycles. In these cycles 18,830,237 (33.3%) of the partial relations occurred (i.e., were useful). As in our previous NFS factorizations, we witnessed an explosion in the number of cycles, with first a sharp increase in the number of useful relations, followed by a sudden growth of the number of cycles: # partials # usefuls # cycles 41,319,347 47,660 16,914 45,431,262 8,214,349 224,865 53,282,421 11,960,120 972,121 56,467,272 18,830,237 2,844,859 Using the approach sketched in [DL], these data resulted in a 3,504,823 x 3,516,502 matrix of total weight 138,690,744 (on average 39.4 entries per column). Using Peter Montgomery's Cray implementation of his blocked Lanczos algorithm (cf. [M95]), it took 67.5 CPU-hours and 700 Mbyte central memory on the Cray-C90 at the SARA Computer Center in Amsterdam to do the linear algebra. This resulted in 18 useful dependencies. These were processed on 1 processor of an SGI Challenge (150 MHz R4400SC processors) using Peter Montgomery's square root program (cf. [M93]), which took 49.5 hours per dependency (with initial numerator and denominator of approximately 9.7 million decimal digits). The factorization was found by the third dependency. It is likely that slightly more sieving (and therefore more partials) would have led to substantially smaller (and easier) matrix and square root problems. Arjen K. Lenstra, Bellcore, April 11, 1996 with Jim Cowie Marije Elkenbracht-Huizing Wojtek Furmanski Peter L. Montgomery Damian Weber Joerg Zayer Acknowledgements are due to the contributors, and to the Dutch National Computing Facilities Foundation (NCF) for the use of the Cray-C90 supercomputer. [AGLL] D. Atkins, M. Graff, A.K. Lenstra, P.C. Leyland, THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE, Proceedings Asiacrypt'94, Lecture Notes in Comput. Sci. 917, (1995) 263-277. [B] D.J. Bernstein, The multiple-lattice number field sieve, Chapter 3 of Ph.D. thesis, ftp://koobera.math.uic.edu/pub/papers/mlnfs.dvi. [BLZ] J. Buchmann, J. Loho, J. Zayer, An implementation of the general number field sieve, Proceedings Crypto'93, Lecture Notes in Comput. Sci. 773, (1994) 159-165. [DL] B. Dodson, A.K. Lenstra, NFS with four large primes: an explosive experiment, Proceedings Crypto 95, Lecture Notes in Comput. Sci. 963, (1995) 372-385. [E] R.M. Elkenbracht-Huizing, An implementation of the number field sieve, Technical Report NM-R9511, Centrum voor Wiskunde en Informatica, Amsterdam, 1995. To appear in Experimental Mathematics [GLM] R. Golliver, A.K. Lenstra, K.S. McCurley, Lattice sieving and trial division, Algorithmic number theory symposium, proceedings, Lecture Notes in Comput. Sci. 877, (1994) 18-27. [LL] A.K. Lenstra, H.W. Lenstra, Jr., The development of the number field sieve, Lecture Notes in Math. 1554, Springer- Verlag, Berlin, 1993 [M93] Peter L. Montgomery, Square roots of products of algebraic numbers, in Proceedings of Symposia in Applied Mathematics, Mathematics of Computation 1943-1993, Vancouver, 1993, Walter Gautschi, ed. [M95] Peter L. Montgomery, A block Lanczos algorithm for finding dependencies over GF(2), Proceedings Eurocrypt 1995, Lecture Notes in Comput. Sci. 921, (1995) 106-120. [P] J.M. Pollard, The lattice sieve, pages 43-49 in [LL]. # Thanks; Bill # Bill Stewart, stewarts@ix.netcom.com, +1-415-442-2215
Excellent work! --- My preferred and soon to be permanent e-mail address:unicorn@schloss.li "In fact, had Bancroft not existed, potestas scientiae in usu est Franklin might have had to invent him." in nihilum nil posse reverti 00B9289C28DC0E55 E16D5378B81E1C96 - Finger for Current Key Information Opp. Counsel: For all your expert testimony needs: jimbell@pacifier.com
On Fri, 29 Mar 1996, Mike Duvos wrote:
On a more serious note, does anyone know what is happening with Arjen Lenstra and RSA-130? Last I heard back in late December, FAFNER, the magic WWW sieving dragon, had collected more than enough relations from participants to yield a factorization. Surely they have not spent an additional four months crunching the big boolean matrix at CWI.
On Sat, 30 Mar 1996, Wei Dai wrote:
Apparently the Cray they are using to crunch the matrix is busy with higher priority users and they have not been able to squeeze in enough CPU time. I was told at the beginning of March that they didn't expect to finish before late April, but now it looks like the job will take another two to three months. Anyone got a spare supercomputer laying around?
On Sun, 14 Apr 1996, Arjen Lenstra wrote:
On April 10, 1996, we found that [RSA-130] has the following factorization
RSA-130 = 39685999459597454290161126162883786067576449112810064832555157243 * 45534498646735972188403686897274408864356301263205069600999044599
[deletia]
Using Peter Montgomery's Cray implementation of his blocked Lanczos algorithm (cf. [M95]), it took 67.5 CPU-hours and 700 Mbyte central memory on the Cray-C90 at the SARA Computer Center in Amsterdam to do the linear algebra.
It appears that the estimates of "another two to three months" were overly pessimistic. Does anyone know how big a check Jim Bidzos has to write for this one? Also, a ballpark guess of how this result extrapolates to the MIPS years required to factor a 512 bit PGP key would probably be of interest to all. -- Mike Duvos $ PGP 2.6 Public Key available $ mpd@netcom.com $ via Finger. $
mpd@netcom.com (Mike Duvos) writes:
one? Also, a ballpark guess of how this result extrapolates to the MIPS years required to factor a 512 bit PGP key would probably be of interest to all.
I don't have a good guess for this, but Arjen did say that the cost to break RSA-130 was a fraction of what it cost to break RSA-129 because of the improved algorithm, so I'd guess we'll find out soon. The next target of the consortium is planned to be RSA-155, I believe, which is above 512 bits; that means skipping RSA-140 to go for the one with the higher psychological value. While 512-bit PGP keys are interesting to Cypherpunks, other 512-bit RSA keys are vitally important to some banks. Jim Gillogly Trewesday, 25 Astron S.R. 1996, 07:50
regarding these collaborative, "open" factorizations and cracking projects: I have been wondering about malicious hackers getting into these pools. would it be possible for them to contribute false data that screws up the end results? or are such anomalies easily discarded or disregarded by the final processes? there is a reduction step in the NFS (number field sieve, technique used to factor large numbers) in which all the collected data is mashed. how sensitive is this process to spurious data? i.e. if there was a little bit of bad data in its computation, does it completely screw it up, or is it robust and resistant to this kind of problem? it seems to me that in many cases, these collaborative projects virtually cannot check the validity of the supplied data without repeating the computation effort, although there may be good tests that tend to screen out "most" bad data. future implementors of these programs might amuse themselves with trying to create such safeguards or anticipate such "attacks" which are pretty significant the more the processes become distributed.
"Vladimir Z. Nuri" <vznuri@netcom.com> writes:
I have been wondering about malicious hackers getting into these pools. would it be possible for them to contribute false data that screws up the end results? or are such anomalies easily discarded or disregarded by the final processes?
The latter, for this application -- unlike the straightforward approach to RC4 cracking, the partial relations that contributors find for the factoring exercise are (like the factoring itself) time-consuming to compute but dead simple to check... and, in fact, each of them is checked before accepting it.
it seems to me that in many cases, these collaborative projects virtually cannot check the validity of the supplied data without repeating the computation effort, although there may be good tests that tend to screen out "most" bad data.
Yes, that's a good point and one we hashed around a bit at the beginning of the RC4 project, with less than a perfect conclusion -- but some good ideas. You need to account for several kinds of people, including people plaing with less than a full deck of clues; and the target of the cracking ring allocating and turning in a "not found" report on the actual target part of the space.
future implementors of these programs might amuse themselves with trying to create such safeguards or anticipate such "attacks" which are pretty significant the more the processes become distributed.
Absolutely. Jim Gillogly Trewesday, 25 Astron S.R. 1996, 21:32
On Mon, 15 Apr 1996, Vladimir Z. Nuri wrote:
I have been wondering about malicious hackers getting into these pools. would it be possible for them to contribute false data that screws up the end results? or are such anomalies easily discarded or disregarded by the final processes?
future implementors of these programs might amuse themselves with trying to create such safeguards or anticipate such "attacks" which are pretty significant the more the processes become distributed.
I guess I would have to ask you why you think hackers would be interested in these projects in the first place? Your typical hacker would care very little about such a project and in fact may be interested in seeing it succeed. However, I do feel that you may have a valid point when switching "hackers" to "opponents of the research." Anyone with an interest in preventing or slowing down the progress in such a project would be more dangerous in my mind than your average hacker. Preventing that from happening would be necessary if it is decided that such a threat truly exists. Bruce Marshall
On Mon, 15 Apr 1996, Vladimir Z. Nuri wrote:
I have been wondering about malicious hackers getting into these pools. would it be possible for them to contribute false data that screws up the end results? or are such anomalies easily discarded or disregarded by the final processes?
future implementors of these programs might amuse themselves with trying to create such safeguards or anticipate such "attacks" which are pretty significant the more the processes become distributed.
I guess I would have to ask you why you think hackers would be interested in these projects in the first place? Your typical hacker would care very little about such a project and in fact may be interested in seeing it succeed.
the malicious type of hacker has the psychology of taking great glee in tearing anything meaningful down. they don't necessarily need a plausible reason. the purpose of destruction alone can be a powerful motivating force. those who destroy carefully constructed things for fun obtain a sense of power from it.
However, I do feel that you may have a valid point when switching "hackers" to "opponents of the research." Anyone with an interest in preventing or slowing down the progress in such a project would be more dangerous in my mind than your average hacker.
the point is, when you are sharing your project among a lot of elements "out there" on a network, you have to worry more and more about "safe computing". when you are working on a purely voluntary basis, what is your guarantee that everyone who volunteers is actually on your side? again, a bigger problem the more a task is decentralized. one interesting argument in favor of centralized computing (I'm not saying it is a definitive argument, quite far from that of course-- just pointing out that Distribution is not necessarily the Panacea to All Problems).
On Tue, 16 Apr 1996, Vladimir Z. Nuri wrote:
On Mon, 15 Apr 1996, Vladimir Z. Nuri wrote:
I guess I would have to ask you why you think hackers would be interested in these projects in the first place? Your typical hacker would care very little about such a project and in fact may be interested in seeing it succeed.
the malicious type of hacker has the psychology of taking great glee in tearing anything meaningful down. they don't necessarily need a plausible reason. the purpose of destruction alone can be a powerful motivating force. those who destroy carefully constructed things for fun obtain a sense of power from it.
True. Yet, my estimate as to the number of 'malicious hackers' that would take interest in disturbing such a project (and have the ability to do so) is very low. That number would increase though if you were the NSA or any other agency perceived as Big Brother, since the hackers would probably see your efforts as a threat.
However, I do feel that you may have a valid point when switching "hackers" to "opponents of the research." Anyone with an interest in preventing or slowing down the progress in such a project would be more dangerous in my mind than your average hacker.
the point is, when you are sharing your project among a lot of elements "out there" on a network, you have to worry more and more about "safe computing". when you are working on a purely voluntary basis, what is your guarantee that everyone who volunteers is actually on your side? again, a bigger problem the more a task is decentralized. one interesting argument in favor of centralized computing (I'm not saying it is a definitive argument, quite far from that of course-- just pointing out that Distribution is not necessarily the Panacea to All Problems).
In every aspect of life we have to deal with the threat of someone working to counteract our efforts. However, to continue functioning we rate these threats as probable or inprobable and deal with them accordingly. I don't see skewed results, due to falsifying or tampering with records, as being a very probable threat in the present especially when you are dealing with volunteers. That threat would increase in magnitude when you start paying people for computer time (as they typically have less of a loyalty bond to you than volunteers would) or if a person would benefit by corrupting the data (such as in the case of a competition). Bruce Marshall
"Vladimir Z. Nuri" <vznuri@netcom.com> writes:
I have been wondering about malicious hackers getting into these pools. would it be possible for them to contribute false data that screws up the end results? or are such anomalies easily discarded or disregarded by the final processes?
If you are doing a distributed search of a key space, then it is of course possible that people, either accidently or deliberately, may fail to correctly do their part of the search and report misleading results. You may recall a hostile attack on SSL a while back where the design was flawless but the desired key failed to appear when the results of the individual searches were merged. Fortunately, where integer factorization is concerned, it is trivial to verify the full and partial relations for correctness and discard any bad data during the counting process. Thus there is no chance of garbage making it into the final reduction.
there is a reduction step in the NFS (number field sieve, technique used to factor large numbers) in which all the collected data is mashed. how sensitive is this process to spurious data? i.e. if there was a little bit of bad data in its computation, does it completely screw it up, or is it robust and resistant to this kind of problem?
The input to the reduction step is simply a large number of ways of making the number "1" by multiplying together elements of the factor base, all modulo the number of be factored. Given an overdetermined set of such relations, one can can search for linearly dependent combinations of their exponents modulo 2. Each such dependency permits one to construct a relation in which all the exponents are even, and possibly a non-trivial square root of 1 modulo N. Since each dependency has at least a 50-50 chance of yielding a factor of N, only a handful of them are needed. Certainly bad data in the matrix could cause problems, but the matrix is sparse and damage would probably be localized. You might get out some dependencies that weren't real, but unless you had quite a lot of garbage data, you would probably get enough good ones to succeed. Non-relations involving small primes would probably be more poisonous than ones involving the high end of the factor base. In any case, as I stated earlier, it is trivial to guarantee all the data going into the final reduction has been sterilized.
it seems to me that in many cases, these collaborative projects virtually cannot check the validity of the supplied data without repeating the computation effort, although there may be good tests that tend to screen out "most" bad data.
future implementors of these programs might amuse themselves with trying to create such safeguards or anticipate such "attacks" which are pretty significant the more the processes become distributed.
The only safeguards I can think of when doing a distributed search of a keyspace are to randomly assign each area to be searched to multiple participants, and to encapsulate the software in some sort of hack-resistant module, possibly calculating a running hash which could be checked when results were submitted to the central authority. If you have 10,000 volunteers, each searching 0.01% of the keyspace using a klutz-proof software module, quite a few sophisticated users would have to collaborate to create a significant chance of missing the key. In cyptography, as in life, there are no guarantees. -- Mike $ PGP 2.6 Public Key available $ mpd@netcom.com $ via Finger. $
participants (8)
-
Bill Stewart -
Black Unicorn -
Bruce Marshall -
Jim Gillogly -
Leslie Farnsworth -
mpd@netcom.com -
Perry E. Metzger -
Vladimir Z. Nuri