Np-completeness

Overview of NP-Completeness: Almost all the algorithms we have studied thus far have been polynomial-time algorithms: on inputs of size n, their worst-case running time is O(n^k) for some constant k. It is natural to wonder whether all problems can be solved in polynomial time. The answer is no. For example, there are problems, such as Turing's famous "Halting Problem," that cannot be solved by any computer, no matter how much time is provided. There are also problems that can be solved, but not in time O(n^k) for any constant k. Generally, we think of problems that are solvable by polynomial-time algorithms as being tractable, or easy, and problems that require superpolynomial time as being intractable, or hard.

The subject of this chapter, however, is an interesting class of problems, called the "NP-complete" problems, whose status is unknown. No polynomial-time algorithm has yet been discovered for an NP-complete problem, nor has anyone yet been able to prove that no polynomial-time algorithm can exist for any one of them. This so-called P ≠ NP question has been one of the deepest, most perplexing open research problems in theoretical computer science since it was first posed in 1971.

A particularly tantalizing aspect of the NP-complete problems is that several of them seem on the surface to be similar to problems that have polynomial-time algorithms. In each of the following pairs of problems, one is solvable in polynomial time and the other is NP-complete, but the difference between problems appears to be slight:

Shortest vs. longest simple paths: In Single-Source Shortest Paths, we saw that even with negative edge weights, we can find shortest paths from a single source in a directed graph G = (V, E) in O(V E) time. Finding the longest simple path between two vertices is NP-complete, however. In fact, it is NP-complete even if all edge weights are 1.
Euler tour vs. hamiltonian cycle: An Euler tour of a connected, directed graph G = (V, E) is a cycle that traverses each edge of G exactly once, although it may visit a vertex more than once. A hamiltonian cycle of a directed graph G = (V, E) is a simple cycle that contains each vertex in V . Determining whether a directed graph has a hamiltonian cycle is NP-complete. (Later in this chapter, we shall prove that determining whether an undirected graph has a hamiltonian cycle is NP-complete.)
2-CNF satisfiability vs. 3-CNF satisfiability: A boolean formula contains variables whose values are 0 or 1; boolean connectives such as ∧ (AND), ∨ (OR), and ¬ (NOT); and parentheses. A boolean formula is satisfiable if there is some assignment of the values 0 and 1 to its variables that causes it to evaluate to 1. We shall define terms more formally later in this chapter, but informally, a boolean formula is in k-conjunctive normal form, or k-CNF, if it is the AND of clauses of ORs of exactly k variables or their negations. For example, the boolean formula (x₁ ∨ ¬x₂) ∧ (¬x₁ ∨ x₃) ∧ (¬x₂ ∨ ¬x₃) is in 2-CNF. (It has the satisfying assignment x₁ = 1, x₂ = 0, x₃ = 1.) There is a polynomial-time algorithm to determine whether a 2-CNF formula is satisfiable, but as we shall see later in this chapter, determining whether a 3-CNF formula is satisfiable is NP-complete.

NP-completeness and the classes P and NP: The class P consists of those problems that are solvable in polynomial time. More specifically, they are problems that can be solved in time O(n^k) for some constant k, where n is the size of the input to the problem. Most of the problems examined in previous chapters are in P.

The class NP consists of those problems that are "verifiable" in polynomial time. What we mean here is that if we were somehow given a "certificate" of a solution, then we could verify that the certificate is correct in time polynomial in the size of the input to the problem. For example, in the hamiltonian-cycle problem, given a directed graph G = (V, E), a certificate would be a sequence 〈v₁, v₂, v₃, ..., v_|V|〉 of |V| vertices. It is easy to check in polynomial time that (v_i, v_{i 1}) ∈ E for i = 1, 2, 3, ..., |V| - 1 and that (v_|V|, v₁) ∈ E as well. As another example, for 3-CNF satisfiability, a certificate would be an assignment of values to variables. We can easily check in polynomial time that this assignment satisfies the boolean formula.

Any problem in P is also in NP, since if a problem is in P then we can solve it in polynomial time without even being given a certificate. We will formalize this notion later in this chapter, but for now we can believe that P ⊆ NP. The open question is whether or not P is a proper subset of NP.

Informally, a problem is in the class NPC-and we refer to it as being NP-complete-if it is in NP and is as "hard" as any problem in NP. We shall formally define what it means to be as hard as any problem in NP later in this chapter. In the meantime, we will state without proof that if any NP-complete problem can be solved in polynomial time, then every NP-complete problem has a polynomial-time algorithm. Most theoretical computer scientists believe that the NP-complete problems are intractable, since given the wide range of NP-complete problems that have been studied to date-without anyone having discovered a polynomial-time solution to any of them-it would be truly astounding if all of them could be solved in polynomial time. Yet, given the effort devoted thus far to proving that NP-complete problems are intractable-without a conclusive outcome-we cannot rule out the possibility that the NP-complete problems are in fact solvable in polynomial time.

Decision problems vs. optimization problems: Many problems of interest are optimization problems, in which each feasible (i.e., "legal") solution has an associated value, and we wish to find the feasible solution with the best value. For example, in a problem that we call SHORTEST-PATH, we are given an undirected graph G and vertices u and v, and we wish to find the path from u to v that uses the fewest edges. (In other words, SHORTEST-PATH is the single-pair shortest-path problem in an unweighted, undirected graph.) NP-completeness applies directly not to optimization problems, however, but to decision problems, in which the answer is simply "yes" or "no" (or, more formally, "1" or "0").

Although showing that a problem is NP-complete confines us to the realm of decision problems, there is a convenient relationship between optimization problems and decision problems. We usually can cast a given optimization problem as a related decision problem by imposing a bound on the value to be optimized. For SHORTEST-PATH, for example, a related decision problem, which we call PATH, is whether, given a directed graph G, vertices u and v, and an integer k, a path exists from u to v consisting of at most k edges.

The relationship between an optimization problem and its related decision problem works in our favor when we try to show that the optimization problem is "hard." That is because the decision problem is in a sense "easier," or at least "no harder." As a specific example, we can solve PATH by solving SHORTEST-PATH and then comparing the number of edges in the shortest path found to the value of the decision-problem parameter k. In other words, if an optimization problem is easy, its related decision problem is easy as well. Stated in a way that has more relevance to NP-completeness, if we can provide evidence that a decision problem is hard, we also provide evidence that its related optimization problem is hard. Thus, even though it restricts attention to decision problems, the theory of NP-completeness often has implications for optimization problems as well.