At 2:20 PM -0500 11/4/00, dmolnar wrote:
On Sat, 4 Nov 2000, Jim Choate wrote:
On Sat, 4 Nov 2000, Declan McCullagh wrote:
"NP" problems, on the other hand, are those that can be solved in nondeterministic polynomial time (think only by guessing). NP includes P.
Actualy any time that can't be described using a polynomial (i.e. a0 + a1x + a2x^2 + ...) is NP. For example something that executes in factorial or exponential time is NP.
I'm sorry - by the definitions I know, Declan has it closer. I'm not sure what you're getting at with "any time that can't be described..." or "something that executes in factorial or exponential time." As far as I know, NP is a class of *problems*, not a class of running times or even a class of algorithms.
And I'm going to give a completely informal, but I hope useful, introduction. Though formalism is very important, and jargon is useful, I suspect that all the talk of "succinct certificates," "oracles," "reducibility," "nondeterministic polynomial time," "Co-NP," etc., isn't very useful to someone just coming to this stuff for the first time. I figure understanding math comes from thinking about specific problems, drawing pictures, mulling things over, drawing more pictures, and basically just "becoming one with the problem." Formal definitions then begin to make a lot more sense. While Bourbaki may favor only the tersest of explanations, I think they are dead wrong. (Fair warning: I knew a lot more about this stuff in 1992 when I was reading Garey and Johnson, Harel, etc. and trying to figure out the zero knowledge papers of Goldwasser and her colleagues. These days, terms like "Co-NP" are not in my daily repertoire of concepts I have a good handle on. But the basic ideas don't need such formal definitions. It's more important to have some _intuition_ about common problems and then see the obvious connections with crypto. David Molnar and others are much better versed in the current lingo.) So, the German guy, Olav, who asked about what P and NP and all that stuff means should think of a specific problem. The "Travelling Salesman Problem" is one problem that's a lot of fun to think about, for complexity issues (and also for specific algorithms, like "simulated annealing," "heuristic search," "genetic programming," etc.). However, I'm going to pick the "Hamiltonian Path" (or Hamiltonian Circuit) problem for most of my discussion. It's equally fun, and is one of the canonical "NP-complete" examples. It turns out that these problems are all similar in a deep way to each other. Though there may not be obvious links between Hamiltonian paths, tiling problems in the plane, Go problems on generalized Go boards, grammar problems, "Monkey puzzles," the Minesweeper game mentioned in this thread, and so on, it turns out that they share deep similarities. In fact, with some effort ("polynomial time effort," so to speak) one problem can be converted to another. Hence the notion that if one could find an "easy" algorithm to solve one, one would have solved all of them. (And always keep in mind that these problems are considered in their _general_ forms, with something like N cities, M x N tile arrays, a Go board of N x N grid points, and so on. Any _specific_ instance is not the essence, though of course a specific instance may still be hideously complicated to solve. And slight factors of 2 or 20 or even 20 million, or, indeed, anything short of "exponential in N," are not important. This is often called "Big O" notation, e.g, the complexity/effort goes as "O (N^3)" or "O (N!)". For exact definitions of these kinds of terms, consult any of the many books on this stuff; I'm just trying to provide the motivation and basic ideas here.) TRAVELLING SALESMAN PROBLEM Take 10 cities in Europe. For example: Berlin, Paris, Madrid, Rome, Marseilles, Hannover, Geneva, Amsterdam, Warsaw, and London. The TSP (Travelling Salesman Problem) would be to find the shortest path that connects all cities. Exhaustive search finds the shortest path on the order of (N -1)! calculations, where N is the number of cities. Actually, (N -1)! divided by two. Neither the direction of the path (the factor of 2) nor the starting city (the N -1) matters. For 10 cities, this is trivial to solve exhaustively: a mere 180,440 paths to be computed. However, for 20 cities the number of paths to be computed is about 6 x 10^16. For 50 cities, 3 x 10^62 paths. Whew. Are their better algorithms than exhaustive search over all paths? There may be many algorithms which give "pretty good" results. Dividing the cities into regions and optimizing each one, then stitching the results together works pretty well. (Used in a lot of algorithms, developed at Los Alamos for bomb designs...the Metropolis algorithm, for example.). Simulated annealing works pretty well. And so on. But these are all just approximations, not actual solutions. Good enough for engineering, and evolution (which is why a rabbit trying to get from his burrow to a food source to another food source doesn't die of starvation while he's trying to solve the Travelling Rabbit Problem exactly). One of the characteristics of this kind of problem is that there is often/usually no way to really measure "convergence on a solution." In a maze, for example, as one travels down various maze passages one may know that the goal is "just a few meters away," but this does little good: one may have to backtrack, or undo, ALL moves all the way back to the beginning of the maze search to take another branch point! "Close doesn't count." (The similarities with most modern crypto should be getting obvious. Most modern crypto only falls to "brute force" -- exhaustive search, trying all the paths, trying to factor a modulus, etc. There is no "getting closer" in most modern ciphers.) HAMILTONIAN PATH PROBLEM Find a path or cycle on a graph which passes through each node once and only once. (Or demonstrate whether any such cycle exists, a slightly different form.) I said I would also use the Hamiltonian Path Problem, HPP. This one is worth spending an hour or two drawing pictures and trying to find clever solutions. It will make the ideas much clearer, I think. And will also lead to a good understanding of "zero knowledge proofs" and the applications of them to things like pass phrases and security systems which don't leak information to wiretappers or even to the system being accessed! (Quite a feat, that.) OK, go back to those 10 cities in Europe. As we know, some of those cities have direct rail connections to other of the cities, some don't. Berlin and Paris are connected (ignore trivial issues of their perhaps being intermediate cities and towns...). Madrid and London are not connected directly by rail lines. The HPP is to take a graph, the set of cities and the links between them, and find a path or cycle which passes through each node (city) once and only once. And returns to the starting node. For example, one such path might look like: Rome to Marseilles to Madrid to Geneva to Warsaw to Berlin to Hannover to Amsterdam to Paris to London...whoops, London is only connected to Paris, so we're stuck in London. (This isn't the essence of a HPP, and one could stipulate that all cities must be connected to at least two other cities.) Let's throw London out and only consider N cities with connections to at least two other cities. How many possible paths need to be calculated depends on the number of interconnections. Some time spent with a pencil and paper will be invaluable. As the number of cities increases, the number of paths to consider goes up roughly as N! (N factorial, as above with the TSP). This is not polynomial in the number of cities. (Hence, for newcomers, one starts to get the idea of "nonpolynomial time," though there are some nuances and quibbles to deal with.) However, suppose someone presented a purported Hamiltonian cycle for a graph? That is, a claimed path through the N cities that passed through each city once and only once? This could be verified in practically no time, just by eyeballing the purported cycle. And thus one gets at the idea of an "oracle," a machine or god which can "guess" the solution. (Hence the idea behind "nondeterministic polynomial time." Again, there are nuances and formal issues, but this is the general idea.) (The intuition goes like this: For a large graph, of, say, 100 cities, the calculations required to compute the O (100!) paths would be vastly greater than all the computers that will ever be built could ever do in a billion universes, blah blah. If someone presents a solution, they must have "oracular" powers. Well, not really, as we shall see.) ZERO KNOWLEDGE--APPLYING THIS TO PASS PHRASES "I am Tim May and I present my proof of this: I know a Hamiltonian cycle for this particular graph which is my signature graph." So I present a graph with 100 cities on it, linked in various ways, and show a Hamiltonian cycle. Proof. Except that now I've given this proof to anyone watching, including the system or person I just showed the proof to. Is there a way to prove beyond any doubt that I know the Hamiltonian cycle without actually revealing it. There is. Wow. Trippy stuff. I'll wait a day or two to explain. However, related to our above discussion of HOW FREAKING HARD it is to compute such a Hamiltonian cycle on a reasonably large graph, HOW DID I EVER FIND ONE? Well, I have no oracular or magical abilities to "guess" ("non-deterministic polynomial time"). Instead, I constructed the Hamiltonian cycle from scratch! I took N cities, with no specified links, and connected them in some Hamiltonian cycle. Very easy to do. Just draw N cities or nodes and draw lines connecting them, satisfying the once and only once criterion. Ah, but then draw in a bunch of _other_ links between the nodes. The full graph, nodes and links, is VERY HARD for anyone else to find a Hamiltonian cycle for, but trivial for _me_ to find a Hamiltonian cycle for! So I can use the fact that I know a Hamiltonian cycle for "my" "signature graph" as a pass phrase, or other proof of identity. The trick to be able to prove that I know it without actually revealing it. As I said, I'll describe the trick later today or tomorrow. By the way, my favorite book on this is David Harel's "Algorithmics." Not exactly intended for a beginning student, but much more descriptive and basic than _most_ of the books on complexity theory. Lots of pictures, lots of descriptions of actual problems (tiling puzzles, my favorites). I confess that I only skimmed the sections on "Presburger arithmetic" and why it is "beyond NP" in some weird sense. Have fun, --Tim May -- ---------:---------:---------:---------:---------:---------:---------:---- Timothy C. May | Crypto Anarchy: encryption, digital money, ComSec 3DES: 831-728-0152 | anonymous networks, digital pseudonyms, zero W.A.S.T.E.: Corralitos, CA | knowledge, reputations, information markets, "Cyphernomicon" | black markets, collapse of governments.