NOTE: This page corresponds to Section 7 of Comp.Theory FAQ, whose table of contents is the following:

Table of Contents of Comp.Theory FAQ

  1. Introduction
  2. Purely Theoretical Models of Computation
  3. Theoretical Models of More Practical Importance
  4. Kolmogorov Complexity
  5. Sorting and related topics
  6. Permutations and Random number generators
  7. P vs NP <------------------------------- This section
  8. Complexity Theory
  9. Logic in Computer Science
  10. Algorithm Libraries
  11. Open Problems
  12. Electronic Resources
  13. Bibliography

7. P vs NP

Historical Context

Going back a couple of hundred years, we can see that the historical motivation for the study of the complexity of algorithms is the desire to identify, within a formal framework, those problems that can be solved "fast".

To achieve this, we need to formally define what we mean by "problem", "solve" and "fast".

Let's postpone the issue of what "problem" and "solve" mean by restricting ourselves to well-defined mathematical problems such as addition, multiplication, and factorization.

One of the first observations that can be made, then, is that even some "simple" problems may take a long time if the question is long enough. For example, computing the product of two numbers seems like a fast enough task. Nevertheless, one can easily produce numbers large enough to bog down a fast computer for a few minutes.

We can conclude, then, that we must consider the size of the "question" as a parameter for time complexity. Using this criterion, we can observe that answers computed in constant time as a function of the size of the question are fast, and those requiring exponential time are not. But what about all the problems that might lie in between?

It turns out that even though digital computers have only been around for fifty years, people have been trying for at least thrice that long to come up with a good definition of "fast". (For example, Jeff Shallit from the University of Waterloo, has collected an impressive list of historical references of mathematicians discussing time complexity, particularly as it relates to Algorithmic Number Theory).

As people gained more experience with computing devices, it became apparent that polynomial time algorithms were fast, and that exponential time algorithms were not.

In 1965, Jack Edmonds, in his article "Paths, trees, and flowers", proposed that "polynomial time in the length of the input" be adopted as a working definition of "fast".

We have thus defined the class of problems that can be solved "fast", i.e. in polynomial time. That is, there exists a polynomial p(n) such that the number of steps taken by a computer on an input x of length n is bounded from above by p(n). This class is commonly denoted by P.

By the late 1960s it had become clear that there were some seemingly simple problems that resisted polynomial time algorithmic solutions. In an attempt to classify this family of problems, Steve Cook came up with a very clever observation: for a problem to be solved in polynomial time, one should be able --at the very least-- to verify a given correct solution in polynomial time. This is called certifying a solution in polynomial time.

Because, you see, if we can solve a problem in polynomial time and somebody comes up with a proposed solution S, we can always rerun the program, obtain the correct solution C and compare the two, all in polynomial time.

Thus the class NP of problems for which one can verify the solution in polynomial time was born. Cook also showed that among all NP problems there are some that are the hardest of them all, in the sense that if you could solve any one of those in polynomial time, then all NP problems could be solved in polynomial time. This fact is known as Cook's theorem, and the class of those "hardest" problems in NP is known as the NP-complete problems. The result was independently discovered by Leonid Levin and published in the USSR at about the same time.

In that sense all NP-complete problems are equivalent under polynomial time transformation.

A year later, Richard Karp showed that some very interesting problems that had eluded polynomial time solutions could be shown to be NP-complete, and in this sense, while hard, they were not beyond hope. The list grew quite rapidly as others contributed, and it now includes many naturally occurring problems which cannot yet be solved in polynomial time.

References

S. Cook. ``The complexity of theorem-proving procedures'', Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, ACM, New York, 1971, pp. 151-158.
J. Edmonds. ``Paths, trees, and flowers'', Canadian Journal of Mathematics, 17, 1965, pp. 449-467.
M. R. Garey, D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., 1979.
L. Levin. ``Universal Search Problems'', Problemy Peredachi Informatsii, 9(3), 1973. A translation appears in B. A. Trakhtenbrot, ``A survey of Russian approaches to Perebor (brute-force search) algorithms'', Annals of the History of Computing, 6(4):384-400, 1984.

(a) What are P and NP?

First we need to define formally what we mean by a problem. Typically a problem consists of a question and an answer. Moreover we group problems by general similarities.

Again using the multiplication example, we define a multiplication problem as a pair of numbers, and the answer is their product. An instance of the multiplication problem is a specific pair of numbers to be multiplied.

Problem. Multiplication
Input. A pair of numbers x and y
Output. The product x times y

A clever observation is that we can convert a multiplication problem into a yes/no question by joining together the original question and the answer and asking whether they form a correct pair. In the case of multiplication, we can convert a question like
4 x 9 = ??

into a yes/no statement such as

"is it true that 4 x 9 = 36?" (yes), or
"is it true that 5 x 7 = 48?" (no).

In general we can apply this technique to most (if not all) problems, simplifying formal treatment of problems.

Definition A decision problem is a language L of strings over an alphabet. A particular instance of the problem is a question of the form "is x in L?" where x is a string. The answer is yes or no.
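To make this concrete, the multiplication example above can be written as a decider for such a language (a minimal Python sketch; the function name is our own):

```python
def is_valid_product(x: int, y: int, z: int) -> bool:
    """Decide the language { (x, y, z) : x * y == z }.

    Each instance of the decision problem is a triple of numbers;
    the answer is yes exactly when the triple encodes a correct
    multiplication.
    """
    return x * y == z

# "is it true that 4 x 9 = 36?" -> yes
print(is_valid_product(4, 9, 36))   # True
# "is it true that 5 x 7 = 48?" -> no
print(is_valid_product(5, 7, 48))   # False
```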

The rest of this section was written by Daniel Jimenez

P is the class of decision problems for which we can find a solution in polynomial time.

Definition A polynomial time function is just a function that can be computed in a time polynomial in the size of its parameters.
Definition P is the class of decision problems (languages) L such that there is a polynomial time function f(x), where x is a string, and f(x)=True (i.e. yes) if and only if x is in L.
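For example, the language of binary palindromes is in P, since membership can be decided in time linear in the length of the input (a toy illustration of our own choosing):

```python
def is_binary_palindrome(x: str) -> bool:
    """Linear-time decider for the language of binary strings that
    read the same forwards and backwards.  A linear bound is in
    particular a polynomial bound, so this language is in P."""
    return x == x[::-1]

print(is_binary_palindrome("10101"))  # True:  in the language
print(is_binary_palindrome("10"))     # False: not in the language
```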

NP is the class of decision problems for which we can check solutions in polynomial time.

Definition NP is the class of decision problems (languages) L such that there is a polynomial time function f(x,c) where x is a string, c is another string whose size is polynomial in the size of x, and f(x,c)=True if and only if x is in L.

c in the definition is called a "certificate", the extra information needed to show that x is indeed in the language. NP stands for "nondeterministic polynomial time", from an alternate, but equivalent, definition involving nondeterministic Turing machines that are allowed to guess a certificate and then check it in polynomial time. (Note: A common error when speaking of P and NP is to misremember that NP stands for "non-polynomial"; avoid this trap, unless you want to prove it :-)

An example of a decision problem in NP is:

Decision Problem. Composite Number
Instance. Binary encoding of a positive integer n.
Language. All instances for which n is composite, i.e., not a prime number.

We can look at this as a language L by simply coding n in log n bits as a binary number, so every binary composite number is in L, and nothing else. We can show this problem is in NP by providing a polynomial time f(x,c) (also known as a "polynomial time proof system" for L). In this case, c can be the binary encoding of a non-trivial factor of n. Since c can be no bigger than n, the size of c is polynomial (at most linear) in the size of n. The function f simply checks to see whether c divides n evenly; if it does, then n is proved to be composite and f returns True. Since division can be done in time polynomial in the size of the operands, Composite Number is in NP.
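The proof system described above can be sketched as follows (our own Python rendering of f(x,c); names and encoding details are ours):

```python
def f(x: str, c: str) -> bool:
    """Polynomial-time proof system for Composite Number.

    x is the binary encoding of a positive integer n, and c is the
    certificate: the binary encoding of a proposed non-trivial
    factor.  Returns True iff c proves that n is composite.
    Trial division runs in time polynomial in the input sizes.
    """
    n, factor = int(x, 2), int(c, 2)
    return 1 < factor < n and n % factor == 0

print(f(bin(15)[2:], bin(3)[2:]))   # True:  3 divides 15
print(f(bin(13)[2:], bin(5)[2:]))   # False: 13 is prime
```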

(b) What is NP-hard?

An NP-hard problem is at least as hard as any problem in NP: given a method for solving an NP-hard problem, we can solve any problem in NP with only polynomially more work.

Here's some more terminology. A language L' is polynomial time reducible to a language L if there exists a polynomial time function f(x) from strings to strings such that x is in L' if and only if f(x) is in L. This means that if we can test strings for membership in L in time t, we can use f to test strings for membership in L' in time polynomial in t. An example of this is the relationship between Composite Number and Boolean Circuit Satisfiability.

Decision Problem. Boolean Circuit Satisfiability
Instance. A Boolean circuit with n inputs and one output. (Note: in this and the following descriptions of decision problems, it is assumed that the actual instance is a reasonable string encoding of the given instance, so we can still talk about languages of strings.)
Language. All instances for which there is an assignment to the inputs that causes the output to become True.
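To make the definition concrete, here is a brute-force decider for this language (our own sketch, representing a circuit as a Python function on n Boolean inputs); note that it tries all 2^n assignments, so it runs in exponential, not polynomial, time:

```python
from itertools import product

def is_satisfiable(circuit, n: int) -> bool:
    """Decide Boolean Circuit Satisfiability by exhaustive search.

    circuit: a function taking n booleans and returning the output bit.
    Tries all 2**n input assignments -- exponential time in n.
    """
    return any(circuit(*bits) for bits in product([False, True], repeat=n))

# (a AND b) AND (NOT a) is unsatisfiable; (a OR b) is satisfiable.
print(is_satisfiable(lambda a, b: a and b and not a, 2))  # False
print(is_satisfiable(lambda a, b: a or b, 2))             # True
```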

Composite Number is polynomial time reducible to Boolean Circuit Satisfiability by the following reduction: To decide whether an instance x is in Composite Number, construct a circuit that multiplies two integers given in binary on its inputs and compares the result to x, giving True as the output if and only if the result of the multiplication is x and neither of the input integers is one. The multiplier can be constructed and checked in polynomial time and space, and the comparison can be done in linear time and space.

Polynomial time reducibility formalizes the notion of one problem being at least as hard as another. If L can be used to solve instances of L', then L is at least as hard as L'.

Definition A decision problem L is NP-hard if, for every language L' in NP, L' is polynomial time reducible to L.

So a solution to an NP-hard problem running in time t can be used to solve any problem in NP in time polynomial in t (possibly different polynomials for different problems); NP-hard problems are thus at least as hard as any problem in NP. Boolean Circuit Satisfiability is an example of an NP-hard problem. A related problem, Boolean Formula Satisfiability (commonly called SAT), is also NP-hard; see Garey and Johnson for a proof of Cook's Theorem, which was the first to show that a specific problem (satisfiability) is NP-hard.

An example of an NP-hard problem that isn't known to be in NP is Maximum Satisfiability:

Decision Problem. Maximum Satisfiability (MAXSAT)
Instance. A Boolean formula F and an integer k.
Language. All instances for which F has at least k satisfying assignments.

This problem is at least as hard as SAT because of the following reduction: suppose we want to decide whether a formula F is in SAT. We can simply choose k to be one and ask whether (F, k) is in MAXSAT. F has at least one satisfying assignment, and hence is in SAT, if and only if (F, 1) is in MAXSAT.
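The reduction is trivial to write down. Here is a sketch assuming some decision procedure for MAXSAT is available; we plug in a brute-force counter over formulas given as Python functions (all names are ours):

```python
from itertools import product

def count_satisfying(formula, n: int) -> int:
    """Count satisfying assignments of a formula on n Boolean
    variables (brute force, standing in for a MAXSAT solver)."""
    return sum(1 for bits in product([False, True], repeat=n) if formula(*bits))

def in_maxsat(formula, n: int, k: int) -> bool:
    """Decide MAXSAT: does the formula have >= k satisfying assignments?"""
    return count_satisfying(formula, n) >= k

def in_sat(formula, n: int) -> bool:
    """Reduce SAT to MAXSAT: F is satisfiable iff (F, 1) is in MAXSAT."""
    return in_maxsat(formula, n, 1)

print(in_sat(lambda a, b: a and not a, 2))  # False: a AND NOT a never holds
print(in_sat(lambda a, b: a != b, 2))       # True:  a XOR b is satisfiable
```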

(c) What is NP-complete?

Definition A decision problem L is NP-complete if it is both NP-hard and in NP.

So NP-complete problems are the hardest problems in NP. Since Cook's Theorem was proved in 1971, thousands of problems have been proved to be NP-complete. Probably the most famous example is the Travelling Salesman Problem:

Decision Problem. Travelling Salesman Problem (TSP)
Instance. A set S of cities, a function f:S x S -> N giving the distances between the cities, and an integer k.
Language. A tour departs from a starting city, goes through each of the other cities exactly once, and returns to the start. The language is all instances for which there exists such a tour through the cities of S of total length at most k.
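Although no polynomial time algorithm is known for deciding TSP, a proposed tour is easy to check, which is what places the decision problem in NP. A sketch of the certificate check (the distance function is given as a dictionary keyed by city pairs; all names are ours):

```python
def verify_tour(cities, dist, k, tour) -> bool:
    """Check in polynomial time that `tour` certifies a yes-instance:
    it must visit every city exactly once, and its total length,
    counting the final leg back to the starting city, must be <= k."""
    if sorted(tour) != sorted(cities):
        return False
    legs = zip(tour, tour[1:] + tour[:1])   # consecutive pairs, wrapping around
    return sum(dist[a, b] for a, b in legs) <= k

cities = ["A", "B", "C"]
dist = {(a, b): 1 for a in cities for b in cities if a != b}
print(verify_tour(cities, dist, 3, ["A", "B", "C"]))  # True:  tour length 3 <= 3
print(verify_tour(cities, dist, 2, ["A", "B", "C"]))  # False: tour length 3 > 2
```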

(d) NP-complete list

Pierluigi Crescenzi and Viggo Kann maintain a good list of NP optimization problems

[...]

10. Algorithm Libraries

Stony Brook Algorithms Repository

Library of Efficient Datatypes and Algorithms (LEDA)