What brings you to Androma?

This course introduces the central ideas of computational complexity theory, with a focus on the boundary between problems that are efficiently solvable and those that appear inherently difficult. It develops the formal language used to measure computational resources, especially time, space, and the effect of different models of computation and encodings. The main goal is to build a rigorous framework for understanding classes such as P and NP, and to study how problems relate to one another through reductions, completeness, and hierarchy theorems. The early chapters establish the foundations: how computation is modeled, how inputs are encoded, and how complexity measures are defined. From there, the course moves to deterministic polynomial time and then to nondeterminism, introducing NP as the natural class of efficiently verifiable problems. Polynomial-time reductions then become the main organizing tool, allowing one to compare problems and prove NP-completeness, with SAT serving as the canonical starting point and a range of core NP-complete problems illustrating the method. Later chapters deepen the picture in several directions. Students learn how to design and check reductions carefully, confront the P versus NP question directly, and explore space-bounded computation, complementation phenomena, and nondeterministic space. The course then broadens to the polynomial hierarchy and randomized complexity classes, showing how the foundational ideas of P and NP connect to richer landscapes of tractability, verification, and probabilistic computation. # Introduction Computational complexity asks what can be computed when time, memory, randomness, and interaction are treated as mathematical resources. This course focuses first on deterministic polynomial time and nondeterministic polynomial time, then uses reductions to explain why many search and decision problems have the same apparent difficulty. The guiding theme is robustness: although real computers differ from Turing machines in many engineering details, the distinction between polynomial and super-polynomial resource use survives reasonable changes of model and encoding. This viewpoint connects algorithms to logic through satisfiability and to cryptography through the need for functions that are easy to evaluate but hard to invert. The chapter is an introduction to the vocabulary and viewpoint used throughout the course. It fixes the objects of study, explains why decision problems are the default formalism, and previews how the main resource bounds, reductions, and completeness notions fit together. ## What Complexity Theory Measures Why is computability alone too coarse for algorithm design? A problem may be solvable in principle but useless in practice if every algorithm requires an astronomical number of steps on moderate input sizes. Complexity theory refines the question "is there an algorithm?" into "which resources are needed as the input size grows?" [definition: Computational Problem] Let $\Sigma$ and $\Gamma$ be finite alphabets. A computational problem is a specified task whose instances are strings in $\Sigma^*$ and whose valid outputs are specified by a relation $R\subset \Sigma^*\times \Gamma^*$. On input $x\in\Sigma^*$, an algorithm solves the problem by outputting some $y\in\Gamma^*$ such that $(x,y)\in R$, when such an output is required. [/definition] For decision problems the output alphabet may be taken to encode accept and reject, or equivalently the relation is determined by a set of yes-instances. This definition forces mathematical objects such as graphs, formulae, integers, and machines to be represented by finite strings. It also prevents an ambiguity that would otherwise appear in tasks such as shortest path: asking for the distance, asking for an actual path, and asking whether a path of length at most $k$ exists are related but have different output spaces. To compare algorithms for such problems, we need a way to say how much of a resource is allowed as the input length increases. [definition: Resource Bound] A resource bound is a function $f:\mathbb N \to \mathbb N$ used to bound a computational resource, such as time or space, on all inputs of length $n$. [/definition] Resource bounds are worst-case unless explicitly stated otherwise. The main resource bound in the opening part of the course is time, and the central dividing line is whether the number of steps is bounded by a polynomial in the input length. This motivates the formal notion of polynomial time. [definition: Polynomial Time] An algorithm runs in polynomial time if there exist constants $C>0$ and $k\in\mathbb N$ such that, for every input of length $n$, the algorithm halts after at most $Cn^k$ computation steps. [/definition] Polynomial time is not a perfect synonym for practical efficiency, since $n^{100}$ is polynomial and a large hidden constant may matter. Its importance is structural: it is stable under composition, stable under standard encodings, and broad enough to include the algorithms usually regarded as scalable. [example: Graph Reachability As A Computational Problem] An instance is a string encoding a finite directed graph $G=(V,E)$ together with two distinguished vertices $s,t\in V$. The decision task is to accept exactly when there is a directed path from $s$ to $t$. Breadth-first search solves this language as follows. Start with a queue containing $s$ and a set $R=\{s\}$ of reached vertices. Repeatedly remove the first vertex $u$ from the queue, inspect every outgoing edge $(u,v)\in E$, and if $v\notin R$, add $v$ to $R$ and append $v$ to the queue. Each vertex is appended to the queue at most once, because it is appended only when it first enters $R$. Each directed edge $(u,v)$ is inspected exactly when its tail $u$ is removed from the queue. Therefore the number of queue removals is at most $|V|$, and the number of edge inspections is at most $|E|$, so the search uses at most $C(|V|+|E|)$ elementary graph operations for some constant $C$ depending on the encoding convention. At the end, accept iff $t\in R$. The invariant is that $R$ contains exactly the vertices reachable from $s$ by a directed path whose internal vertices have already been processed or queued: initially this holds for the length-$0$ path from $s$ to itself, and inspecting $(u,v)$ adds precisely the one-step extension of a known path from $s$ to $u$. When the queue is empty, no outgoing edge from a reached vertex leads to an unreached vertex, so no further directed path from $s$ can reach a new vertex. Since a standard graph encoding has length at least proportional to $|V|+|E|$, this running time is polynomial in the input length, making directed graph reachability a basic example of efficient computation. [/example] The graph example also illustrates a convention used throughout the course: we measure performance as a function of input length, not as a function of the informal size of the object before encoding. This keeps the theory model-independent enough to compare problems from logic, combinatorics, algebra, and optimization. ## Decision Problems And Languages How can a single formalism cover satisfiability, graph colourability, shortest paths, and many other tasks? The standard move is to begin with yes-no questions. This loses less information than it first seems, because search and optimization problems often reduce to associated decision versions. Without fixing a language, the same informal problem statement can mix incompatible tasks: an invalid graph encoding might be ignored, rejected, or repaired, and "shortest path" might mean outputting a path, outputting its length, or deciding whether a path of length at most $k$ exists. [definition: Language] A language over a finite alphabet $\Sigma$ is a subset $L \subset \Sigma^*$. [/definition] A language represents a decision problem: strings in $L$ are yes-instances, and strings outside $L$ are no-instances. The empty string may be an input, but most examples encode structured objects and therefore use only a subset of all strings as well-formed encodings. [definition: Decider] A decider for a language $L\subset\Sigma^*$ is a total algorithm \begin{align*} D:\Sigma^*\to\{\mathrm{accept},\mathrm{reject}\} \end{align*} such that, for every input $x\in\Sigma^*$, $D(x)=\mathrm{accept}$ iff $x\in L$. [/definition] The halting requirement matters because complexity classes bound resources on all inputs. Algorithms that sometimes run forever belong to computability theory, but time complexity for decision problems is built around total algorithms. [example: Satisfiability As A Language] Fix an encoding that maps each Boolean formula over variables $x_1,\dots,x_n$ to a finite string. The corresponding decision language is \begin{align*} \mathrm{SAT} = \{\langle \varphi\rangle : \text{there is an assignment } a:\{x_1,\dots,x_n\}\to\{0,1\}\text{ such that }\varphi(a)=1\}. \end{align*} Thus the input to the decision problem is the string $\langle\varphi\rangle$, and the answer is accept exactly when some truth assignment makes $\varphi$ true. This separates the decision problem from the search task. The search task asks for an assignment $a$ with $\varphi(a)=1$, while the language $\mathrm{SAT}$ asks only whether such an $a$ exists. A yes-instance nevertheless has a short witness: write the assignment as the length-$n$ bit string \begin{align*} w=a(x_1)a(x_2)\cdots a(x_n). \end{align*} Given $\langle\varphi\rangle$ and $w$, one checks each variable value from $w$ and evaluates the formula gate by gate or connective by connective. The verifier accepts exactly when the final value of $\varphi$ is $1$, so satisfiability becomes a yes-no language whose positive instances carry explicitly checkable certificates. [/example] The word "witness" in the satisfiability example points toward NP. A decision problem can be hard to solve directly while still having proposed yes-answers that are fast to check. ## The Classes $\mathrm{P}$ And $\mathrm{NP}$ Which problems should count as efficiently decidable, and which problems have efficiently checkable certificates? These two questions lead to P and NP, the central classes of the course. [definition: The Class P] The class $\mathrm P$ is the set of languages decidable by a deterministic Turing machine in polynomial time. [/definition] Chapter 1 introduces the formal Turing-machine model. For the introduction, the essential point is that P contains the decision problems for which there is a deterministic algorithm with a polynomial worst-case running-time bound. To describe NP, we need a second polynomial-time object: a verifier, which checks a proposed certificate rather than finding it from scratch. [definition: Polynomial-Time Verifier] A polynomial-time verifier for a language $L\subset\Sigma^*$ is a deterministic polynomial-time algorithm \begin{align*} V:\Sigma^*\times\Sigma^*\to\{\mathrm{accept},\mathrm{reject}\} \end{align*} and a polynomial $p$ such that for every $x\in\Sigma^*$, \begin{align*} x\in L \iff \exists w\in\Sigma^*\text{ with } |w|\le p(|x|) \text{ and } V(x,w)=\mathrm{accept}. \end{align*} [/definition] Here $w$ is called a certificate or witness. The verifier need not find $w$; it only checks a proposed witness within polynomial time. [definition: The Class NP] The class $\mathrm{NP}$ is the set of languages that have a polynomial-time verifier. [/definition] This verifier-based definition is the version used for reductions and NP-completeness. Chapter 4 proves its equivalence with nondeterministic polynomial-time Turing machines, which explains the name NP. Before that equivalence is available, we can already compare P and NP directly from the verifier definition. [quotetheorem:6172] [citeproof:6172] This inclusion fixes the baseline relationship between the two classes: every efficiently decidable language is also efficiently verifiable. The polynomial-time and totality requirements are essential parts of that comparison, because NP is a class of decision languages with uniformly bounded verification procedures. The theorem does not say that certificates can be found efficiently, and it does not say that every efficiently checkable problem is efficiently decidable. It sets up the central comparison between direct algorithms and verification procedures. [remark: The P Versus NP Question] The question $\mathrm P = \mathrm{NP}$ asks whether every decision problem with polynomial-size certificates checkable in polynomial time is itself decidable in polynomial time. Most complexity theorists expect $\mathrm P \ne \mathrm{NP}$, but no proof is known. [/remark] The rest of the theory develops a precise framework showing that a large family of natural problems are equivalent in difficulty to SAT under efficient reductions. ## Reductions As Comparisons Of Difficulty How can we justify the claim that two unrelated problems have the same computational difficulty? An informal comparison is not meaningful unless it specifies how instances of one problem are converted into instances of the other, and unless yes-instances and no-instances are preserved. Reductions provide the comparison tool: an efficient transformation from instances of one problem into instances of another transfers algorithms in the reverse direction. [definition: Polynomial-Time Many-One Reduction] Let $A\subset\Sigma^*$ and $B\subset\Gamma^*$ be languages. A polynomial-time many-one reduction from $A$ to $B$ is a function $f:\Sigma^*\to\Gamma^*$ computable in polynomial time such that, for every $x\in\Sigma^*$, \begin{align*} x\in A \iff f(x)\in B. \end{align*} We write $A\le_m^p B$. [/definition] The direction of the notation is important. If $A\le_m^p B$, then an efficient algorithm for $B$ should give an efficient algorithm for $A$ by first computing $f(x)$ and then solving $B$. The next theorem records the resource accounting that makes this intuition a formal closure property of P. [quotetheorem:6173] [citeproof:6173] This theorem is the engine behind NP-completeness. The reduction must be polynomial-time computable, because otherwise the preprocessing step could hide all the difficulty of solving $A$. A concrete failure occurs if $B=\{0\}$ and $A$ is any decidable language with a very slow decision procedure: define $f(x)=0$ when $x\in A$ and $f(x)=1$ when $x\notin A$. Then $x\in A$ iff $f(x)\in B$, and $B\in\mathrm P$, but the construction of $f$ may require solving $A$ first; without a polynomial-time bound on $f$, this gives no polynomial-time algorithm for $A$. The target hypothesis $B\in\mathrm P$ is also essential. For example, every language $A$ reduces to itself by the identity map, but the identity reduction $A\le_m^p A$ does not imply $A\in\mathrm P$ unless a polynomial-time decider for the target copy of $A$ is already available. The result is only a transfer of upper bounds along a many-one reduction; it does not say that $A$ and $B$ have identical structure, nor that an algorithm for $A$ would solve $B$. To prove that a problem is hard, we reduce a known hard problem to it; to use an algorithmic breakthrough, we run the reduction in the other direction and solve the original problem. [definition: NP-Hard And NP-Complete] A language $B$ is NP-hard if $A\le_m^p B$ for every $A\in\mathrm{NP}$. A language $B$ is NP-complete if $B\in\mathrm{NP}$ and $B$ is NP-hard. [/definition] NP-complete problems are the central landmarks of the course. They are in NP, so yes-instances have efficiently checkable certificates, and they are at least as hard as every problem in NP under polynomial-time reductions. [example: Why SAT Is The Prototype] Let $\varphi$ be a Boolean formula with variables $x_1,\dots,x_n$. A certificate for $\langle\varphi\rangle$ is the bit string \begin{align*} w=b_1b_2\cdots b_n\in\{0,1\}^n, \end{align*} interpreted as the assignment $a(x_i)=b_i$ for each $i$. The verifier reads $w$, assigns each variable its indicated bit, and evaluates $\varphi$ from the leaves upward: a variable gate $x_i$ receives value $b_i$, a negation gate receives $1-c$ when its input has value $c$, an AND gate receives $cd$ from inputs $c,d\in\{0,1\}$, and an OR gate receives $1-(1-c)(1-d)$. It accepts exactly when the value at the output gate is $1$. The certificate has length $n$, which is at most the length of the encoded formula up to the fixed encoding convention, and each symbol or gate of $\varphi$ is evaluated once. Thus this is a polynomial-time verifier for $\mathrm{SAT}$, so $\mathrm{SAT}\in\mathrm{NP}$. The *Cook-Levin theorem* supplies the hardness direction: for every $L\in\mathrm{NP}$ and input $x$, it constructs in polynomial time a formula $\varphi_x$ such that \begin{align*} x\in L \iff \varphi_x\in\mathrm{SAT}. \end{align*} Therefore every language in $\mathrm{NP}$ reduces to $\mathrm{SAT}$, and together with $\mathrm{SAT}\in\mathrm{NP}$ this makes $\mathrm{SAT}$ NP-complete. [/example] The example previews a recurring pattern. Membership in NP usually comes from the natural certificate, while hardness comes from designing an encoding that preserves yes-instances and no-instances. ## The Road Through The Course What must be built before NP-completeness becomes a usable technique? The course first fixes computational models, then defines resource classes, then develops reductions and complete problems. Later topics show how the same ideas extend to space, alternation, hierarchy, and randomness. [explanation: Course Map] The first lecture block introduces deterministic Turing machines, configurations, halting, and language decision. It also treats encodings of graphs, formulae, integers, and machines, because complexity statements must be invariant under reasonable representations. The second block defines time and space complexity. For a resource bound $f$, $\mathrm{TIME}(f(n))$ denotes languages decidable within $O(f(n))$ time, while $\mathrm{SPACE}(f(n))$ denotes languages decidable within $O(f(n))$ space. Important space classes include $\mathrm L$, deterministic logarithmic space, and $\mathrm{PSPACE}$, polynomial space. This block proves basic containment results and introduces hierarchy theorems as formal evidence that more resources can decide more languages. The middle of the course develops $\mathrm P$, $\mathrm{NP}$, verifiers, nondeterminism, and polynomial-time reductions. This leads to the Cook-Levin theorem and then to standard NP-completeness reductions for problems involving formulae, graphs, sets, and Hamiltonian structure. The final lecture block introduces coNP, the class of complements of NP languages, the polynomial hierarchy, and randomized classes. These include $\mathrm{RP}$, where false positives are forbidden and yes-instances are accepted with probability at least a fixed positive constant such as $1/2$; $\mathrm{BPP}$, where both error types are allowed but bounded away from $1/2$; and $\mathrm{ZPP}$, the zero-error randomized polynomial-time class. These topics show that changing the mode of computation gives a landscape of classes rather than a single boundary between feasible and infeasible problems. [/explanation] This map also explains the role of examples. Each abstract definition will be paired with concrete encodings and reductions, because the skill of the subject is not just knowing the definitions but being able to use them to classify new problems. [remark: How To Read These Notes] When a theorem states a containment, ask which simulation or resource accounting argument proves it. When a theorem states hardness, ask what information is encoded by the reduction and why the transformation is polynomial-time computable. These two habits cover a large part of the course's proof technique. [/remark] The first chapter gave the big picture: problems, languages, P, NP, and reductions as the main objects of study. Before any complexity bound can be meaningful, we need a fixed machine model and a precise encoding scheme, so the next chapter makes those conventions explicit. # 1. Models of Computation and Encodings After Chapter 0 introduced problems, languages, P, NP, and reductions at a high level, this chapter fixes the computational model used throughout the course. Complexity theory studies how resource requirements grow with the input size, so we need a precise notion of algorithm and a precise way to measure the size of an instance. The main point is that the exact low-level choices do not matter for the classes $P$ and $NP$, provided our encodings are reasonable and transformations between them have polynomial cost. ## Deterministic Turing Machines as Deciders What does it mean to say that a finite piece of code solves every instance of a decision problem? We model the code by a finite control, the input by a string over a finite alphabet, and the working memory by one or more tapes. This is austere enough to support formal proofs, but expressive enough to represent ordinary discrete algorithms. [definition: Alphabet and String] An alphabet is a finite nonempty set $\Sigma$ of symbols. A string over $\Sigma$ is a finite sequence of symbols from $\Sigma$. The set of all strings over $\Sigma$ is denoted $\Sigma^*$, and the length of a string $x \in \Sigma^*$ is denoted $|x|$. [/definition] The alphabet separates the mathematical input from the physical way it is stored. Once the alphabet is fixed, a decision problem becomes a collection of inputs that should receive the answer yes, so we name that collection before describing any machine. [definition: Language] A language over an alphabet $\Sigma$ is a subset $L \subseteq \Sigma^*$. [/definition] A language is the formal version of a yes-no problem: strings in $L$ are yes-instances, and strings outside $L$ are no-instances. We now need a finite mechanical device whose behaviour on such strings can be specified without appeal to an implementation-dependent programming language. [definition: Deterministic Single-Tape Turing Machine] A deterministic single-tape Turing machine is a tuple \begin{align*} M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\mathrm{acc}}, q_{\mathrm{rej}}), \end{align*} where $Q$ is a finite set of states, $\Sigma$ is the input alphabet, $\Gamma$ is the tape alphabet with $\Sigma \subseteq \Gamma$ and a blank symbol $\blank \in \Gamma \setminus \Sigma$, $q_0 \in Q$ is the start state, $q_{\mathrm{acc}}, q_{\mathrm{rej}} \in Q$ are distinct halting states, and \begin{align*} \delta : (Q \setminus \{q_{\mathrm{acc}}, q_{\mathrm{rej}}\}) \times \Gamma \to Q \times \Gamma \times \{L,R,S\} \end{align*} is the transition function. [/definition] The symbols $L,R,S$ mean move the head left, move it right, or keep it stationary after writing the new tape symbol. This gives the local rule, but correctness and running time require a name for the complete instantaneous situation after any number of steps; this motivates the next definition. [definition: Configuration] A configuration of a deterministic single-tape Turing machine $M$ is a finite string describing the current state, the current tape contents on all nonblank cells and the scanned cell, and the current head position. [/definition] The transition function maps each nonhalting configuration to its unique successor. A computation path is not enough for decision problems unless it also has a specified terminal interpretation, so we next record the accept-reject convention used throughout the course. [definition: Halting and Deciding] A deterministic Turing machine $M$ halts on input $x$ if its computation on $x$ reaches either $q_{\mathrm{acc}}$ or $q_{\mathrm{rej}}$. It accepts $x$ if it halts in $q_{\mathrm{acc}}$, and rejects $x$ if it halts in $q_{\mathrm{rej}}$. The machine $M$ decides a language $L \subseteq \Sigma^*$ if, for every $x \in \Sigma^*$, $M$ halts on $x$ and accepts $x$ iff $x \in L$. [/definition] Decision is stronger than recognition because the machine must eventually answer no on every no-instance. Complexity classes in this course are classes of decidable languages with specified resource bounds, so it is useful to see how an abstract transition acts on an encoded configuration. [example: Simulating One Single-Tape Transition] Let a configuration be encoded as $u q a v$, where $u,v \in \Gamma^*$, the symbols in $u$ are strictly to the left of the head, the head scans $a \in \Gamma$, and $q \in Q$ is the current state. Suppose \begin{align*} \delta(q,a)=(p,b,R). \end{align*} By the definition of the transition function, the machine writes $b$ in the scanned cell, changes state from $q$ to $p$, and moves the head one cell to the right. If $v=cw$ with $c \in \Gamma$ and $w \in \Gamma^*$, then the tape segment \begin{align*} u q a c w \end{align*} first becomes \begin{align*} u p b c w \end{align*} after replacing the state $q$ by $p$ and overwriting $a$ by $b$. Since the head then moves right, the state marker must be placed immediately before the next scanned symbol $c$, giving \begin{align*} u b p c w. \end{align*} If $v$ is empty, then the cell to the right of $a$ is blank, so after writing $b$ and moving right the scanned symbol is $\blank$, and the successor configuration is \begin{align*} u b p \blank. \end{align*} Thus this single transition changes only the scanned cell, the state marker, and the adjacent cell that becomes scanned next. [/example] The local nature of transitions is what makes the Turing-machine model suitable for complexity theory. A computation of $t$ steps is a sequence of $t+1$ configurations, each obtained from the preceding one by a bounded syntactic change. ## Multi-Tape Machines and Polynomial Simulation Real algorithms often keep several work arrays, counters, or queues. A single tape can encode all of this information, but manipulating many logical tapes on one physical tape may be slower. The next model gives the machine several tapes, then compares it with the single-tape model. [definition: Deterministic Multi-Tape Turing Machine] A deterministic $k$-tape Turing machine is a tuple \begin{align*} M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\mathrm{acc}}, q_{\mathrm{rej}}), \end{align*} where $Q$ is a finite set of states, $\Sigma$ is the input alphabet, $\Gamma$ is the tape alphabet with $\Sigma \subseteq \Gamma$ and a blank symbol $\blank \in \Gamma \setminus \Sigma$, $q_0 \in Q$ is the start state, $q_{\mathrm{acc}}, q_{\mathrm{rej}} \in Q$ are distinct halting states, and \begin{align*} \delta : (Q \setminus \{q_{\mathrm{acc}}, q_{\mathrm{rej}}\}) \times \Gamma^k \to Q \times \Gamma^k \times \{L,R,S\}^k \end{align*} is the transition function. [/definition] Multi-tape machines are usually more convenient for designing algorithms: one tape can store the input, another a work string, and another a counter or output scratch space. Before relying on this convenience, we must compare it with the single-tape model; this motivates the following theorem. [quotetheorem:6174] [citeproof:6174] This direction says that adding tapes does not reduce what can be computed, and the proof uses the strongest possible hypothesis: the multi-tape machine can devote one tape to the simulated tape and ignore the rest. The hypothesis that the simulator has at least one usable tape matching the simulated tape is doing the work; with that tape available, every single-tape move has a direct multi-tape move. It does not say that several tapes are never faster; in fact, extra tapes can substantially improve exact running times for particular algorithms. What matters for this course is only that the overhead in this direction is linear, so any lower-level choice between one tape and many tapes must be examined through the coarser lens of polynomial overhead. The more important complexity question goes the other way: does the convenience of several tapes create languages that are polynomial-time only because the model is stronger? [quotetheorem:6175] [citeproof:6175] The quadratic overhead is not the main point; the essential point is that the overhead is polynomial. The fixed-$k$ hypothesis is part of the statement: if the number of tapes were allowed to grow with the input and were hidden from the input length, the encoding of configurations and the cost of scanning head markers would no longer be controlled by one constant. For a concrete failure mode, imagine a family where inputs of length $n$ are processed by a machine with $k=n$ work tapes, one tape for each input position; even recording all head markers on a single tape now carries an $n$-dependent bookkeeping cost before any simulated work has been done. The theorem also does not identify exact time classes, since a problem running in linear time on two tapes may only be guaranteed quadratic time by this simulation on one tape. Therefore a language decidable in polynomial time on a fixed multi-tape machine is decidable in polynomial time on a single-tape machine, and conversely, but fine-grained time bounds remain model-sensitive. [example: Two Work Tapes on One Tape] Suppose the two-tape machine is in state $q$, the active contents are $101$ on tape $1$ and $abba$ on tape $2$, and the heads scan the second and third symbols respectively. The marked single-tape encoding is \begin{align*} 1\dot{0}1\#ab\dot{b}a, \end{align*} so the symbols currently under the virtual heads are $0$ on the first tape and $b$ on the second tape. For a concrete simulated step, suppose \begin{align*} \delta(q,0,b)=(p,1,a,R,L). \end{align*} This rule says: change the state to $p$, write $1$ on tape $1$, write $a$ on tape $2$, move the first virtual head right, and move the second virtual head left. First rewrite only the two scanned cells: \begin{align*} 1\dot{0}1\#ab\dot{b}a &\longmapsto 1\dot{1}1\#ab\dot{a}a. \end{align*} Then move the first dot one symbol to the right, because the first head moves from the second cell to the third cell: \begin{align*} 1\dot{1}1\#ab\dot{a}a &\longmapsto 11\dot{1}\#ab\dot{a}a. \end{align*} Finally move the second dot one symbol to the left, because the second head moves from the third cell of $abba$ to the second cell: \begin{align*} 11\dot{1}\#ab\dot{a}a &\longmapsto 11\dot{1}\#a\dot{b}aa. \end{align*} Thus the single-tape encoding records one two-tape transition by reading the two dotted symbols, rewriting those two cells, and relocating the two dots to the next virtual head positions. [/example] This example also explains why precise encodings must be chosen with care. A bad encoding can hide a large object behind a short string, while a reasonable encoding records enough explicit structure that local updates and parsing have polynomial cost. The same lesson appears outside complexity theory whenever data structures are compared by the cost of moving between representations. A graph library, a compiler, or a database engine may store the same mathematical object in several formats, and the useful invariants are the ones preserved after accounting for parsing, indexing, and translation overhead. Complexity theory isolates the coarse version of this principle: representation changes are acceptable when they do not change polynomial feasibility. ## Encoding Mathematical Objects as Strings Complexity classes contain languages, but the problems we care about are usually about graphs, formulas, integers, machines, and other finite objects. To place such a problem inside the theory, we choose an encoding map that turns each finite object into a string. [definition: Encoding] An encoding of a class of finite objects $\mathcal{C}$ over an alphabet $\Sigma$ is an injective map \begin{align*} \operatorname{enc}: \mathcal{C} \to \Sigma^*. \end{align*} [/definition] Injectivity prevents two distinct objects from being represented by the same string. In practice we also require the set of valid encodings to be decidable efficiently, and the following examples show the size tradeoffs created by standard choices. [example: Graph Encodings] Let $G=(V,E)$ be a simple graph with labelled vertex set $V=\{1,\dots,n\}$. In the adjacency matrix encoding, we write one bit $A_{ij}$ for each ordered pair $(i,j)\in V\times V$, with $A_{ij}=1$ iff $\{i,j\}\in E$. Since there are $n$ choices for $i$ and $n$ choices for $j$, the number of matrix entries is \begin{align*} |V\times V|=n\cdot n=n^2. \end{align*} Thus the matrix part of the encoding has exactly $n^2$ bits, up to any fixed-size or polynomially bounded header used to record $n$. In an adjacency list encoding, for each vertex $i$ we write the label $i$, separators, and then the labels of all neighbours of $i$. A vertex label from $\{1,\dots,n\}$ needs $\lceil \log_2(n+1)\rceil$ bits in binary, so writing the $n$ vertex labels costs \begin{align*} n\lceil \log_2(n+1)\rceil = O(n\log n). \end{align*} Each undirected edge $\{i,j\}$ appears once in the list for $i$ and once in the list for $j$, so the total number of neighbour labels written is \begin{align*} \sum_{i=1}^n \deg(i)=2|E|. \end{align*} These neighbour labels therefore cost \begin{align*} 2|E|\lceil \log_2(n+1)\rceil = O(|E|\log n) \end{align*} bits, and the separators contribute only linearly in the number of list items. Hence the adjacency list length is \begin{align*} O(n\log n)+O(|E|\log n)=O((n+|E|)\log n). \end{align*} Both encodings determine exactly the same labelled graph, but when $|E|$ is much smaller than $n^2$, the list encoding records far fewer edge positions than the matrix encoding. [/example] This comparison shows both the usefulness and the danger of representation choices. The two lengths can differ by a polynomial factor, which is harmless for polynomial-time complexity, but not by an exponential factor. [example: Boolean Formula Encoding] Let the formula \begin{align*} ((x_1 \wedge (\neg x_2)) \vee x_3) \end{align*} be encoded as the fully parenthesised token string \begin{align*} (,\ (,\, x_1,\, \wedge,\, (,\, \neg,\, x_2,\, ),\, ),\, \vee,\, x_3,\, ). \end{align*} Counting the displayed tokens gives \begin{align*} 1+1+1+1+1+1+1+1+1+1+1+1=12 \end{align*} tokens, so this particular encoding length is proportional to the number of symbols in the written formula. For a formula whose encoded string has length $m$, a parser can scan the string from left to right, create one syntax-tree node for each variable and connective token, and use the parentheses to determine parent-child relations. Since the scan reads $m$ symbols and creates at most $m$ nodes, the parsing time is $O(m)$. Given a truth assignment, evaluation visits each node once: variable nodes read their assigned truth values, a $\neg$ node flips the value of its child, and $\wedge,\vee$ nodes combine the two child values. Thus evaluation uses at most one constant-time Boolean operation per syntax-tree node, hence takes $O(m)$ time after parsing. This encoding exposes the formula's syntax explicitly, so both parsing and evaluation are polynomial in the encoded length. [/example] Formulas illustrate the usual standard: the encoding should expose the syntax that algorithms need to inspect. Integers give another important case, because unary and binary representations lead to different input lengths. [example: Integer Encodings] For a positive integer $N$, the unary encoding is the string $1^N$, so its length is \begin{align*} |1^N|=N. \end{align*} In binary, if $k=\lfloor \log_2 N\rfloor$, then \begin{align*} k \le \log_2 N < k+1, \end{align*} so \begin{align*} 2^k \le N < 2^{k+1}. \end{align*} Thus the largest power of $2$ appearing in the binary expansion of $N$ is $2^k$, and the binary representation has positions $2^0,2^1,\dots,2^k$, namely \begin{align*} k+1=\lfloor \log_2 N\rfloor+1 \end{align*} bits. This difference in length changes what “polynomial time” means. For example, a running time $T(N)=N^2$ is polynomial in the unary input length because \begin{align*} T(N)=N^2=|1^N|^2. \end{align*} If the same integer is given in binary and its binary length is $m$, then \begin{align*} m=\lfloor \log_2 N\rfloor+1 \end{align*} implies \begin{align*} m-1 \le \log_2 N, \end{align*} hence \begin{align*} 2^{m-1}\le N. \end{align*} Therefore \begin{align*} T(N)=N^2 \ge (2^{m-1})^2 = 2^{2m-2}, \end{align*} which is exponential in the binary length $m$. Thus unary and binary encodings can turn the same numerical procedure into very different complexity statements, so binary notation is the standard compact encoding unless the problem explicitly declares unary input. [/example] The integer example is the first warning that not every encoding change is harmless. Complexity statements must always be interpreted relative to the chosen input representation. ## Reasonable Encodings and Invariance of Polynomial Time Why are we allowed to speak of the class $P$ without constantly specifying whether graphs are given by matrices, lists, or some other syntax? The answer is that all standard encodings translate into one another within polynomial time and with only polynomial change in length. [definition: Polynomially Related Encodings] Two encodings $\operatorname{enc}_1: \mathcal{C} \to \Sigma_1^*$ and $\operatorname{enc}_2: \mathcal{C} \to \Sigma_2^*$ are polynomially related if there are deterministic Turing machines computing functions \begin{align*} T_{12} : \operatorname{enc}_1(\mathcal{C}) \to \operatorname{enc}_2(\mathcal{C}) \end{align*} and \begin{align*} T_{21} : \operatorname{enc}_2(\mathcal{C}) \to \operatorname{enc}_1(\mathcal{C}) \end{align*} such that $T_{12}(\operatorname{enc}_1(c)) = \operatorname{enc}_2(c)$ and $T_{21}(\operatorname{enc}_2(c)) = \operatorname{enc}_1(c)$ for every $c \in \mathcal{C}$, with both machines running in time polynomial in the input length. [/definition] This condition rules out encodings that compress exponentially much structure without an efficient way to recover it. The issue is not just aesthetic: if translating between two representations could require super-polynomial time, then an algorithm that is efficient in one representation might become inefficient before it even begins its real computation. The formal invariance principle records exactly when a change of encoding can be absorbed into the polynomial-time bound. [quotetheorem:6176] [citeproof:6176] This theorem is the formal reason that the course can use whichever standard encoding is most convenient in a proof. It does not say that every imaginable representation is acceptable; it says that the robust class is obtained after excluding representations whose translations require super-polynomial work. For example, encoding the integer $N$ by the binary representation of $\lfloor \log_2 N \rfloor$ loses enough size information that reconstructing a standard binary representation of $N$ may require output length exponential in the new input length. More generally, a representation that hides an exponentially large object behind a short code can turn an algorithm that merely writes or scans the object into a super-polynomial translation step. Later reductions therefore always specify encodings, but once the standard encodings are known to be polynomially related, the proof can focus on the computational transformation rather than the syntax of the input strings. [example: Translating Graph Matrices and Lists] Let $G$ be a labelled graph with vertex set $\{1,\dots,n\}$. In the adjacency matrix encoding, the input contains one bit $A_{ij}$ for each ordered pair $(i,j)$, so the matrix has \begin{align*} n\cdot n=n^2 \end{align*} entries. To translate the matrix to adjacency lists, scan the entries in lexicographic order: \begin{align*} A_{11},A_{12},\dots,A_{1n},A_{21},\dots,A_{nn}. \end{align*} For each entry $A_{ij}=1$, write the label $j$ into the list for vertex $i$; for each entry $A_{ij}=0$, write nothing to that neighbour list. The scan reads exactly $n^2$ entries. Each label from $\{1,\dots,n\}$ has length at most $\lceil \log_2(n+1)\rceil$, and there are at most $n^2$ entries equal to $1$, so the total number of label bits written is at most \begin{align*} n^2\lceil \log_2(n+1)\rceil. \end{align*} Including separators and the $n$ list headers adds at most a polynomial number of further symbols, so the whole matrix-to-list translation runs in time polynomial in the matrix length. Conversely, suppose the input is an adjacency list whose length is $m$ and whose header declares the vertex set $\{1,\dots,n\}$. To translate it to a matrix, first write an $n\times n$ block of zeros. This writes \begin{align*} n\cdot n=n^2 \end{align*} bits. Then scan the adjacency list once. Whenever the list for vertex $i$ contains a neighbour label $j$, change the matrix entry $A_{ij}$ from $0$ to $1$. The scan reads at most $m$ input symbols, and the number of neighbour labels encountered is at most $m$, because each listed neighbour occupies at least one symbol. Thus the update phase uses polynomial time in $m$ plus the time needed to write the $n^2$ output bits. Since the vertex declarations list all $n$ vertices, the input length satisfies $m\ge n$, hence \begin{align*} n^2 \le m^2. \end{align*} Therefore the list-to-matrix translation is polynomial in the list length. The two translations recover the same labelled graph in both directions, so adjacency matrices and standard adjacency lists are polynomially related encodings of labelled finite graphs. [/example] The output length matters in the second translation: producing an $n^2$-bit matrix already costs $\Theta(n^2)$ time. Since the matrix length is polynomially bounded in the list length for labelled graphs with vertex names written in binary and all vertices declared, this remains a polynomial translation. [remark: Machine Encodings] A Turing machine itself is a finite object: its states, alphabets, and transition table can be written as a string. Standard machine encodings list the states and transition rules using separators and fixed finite alphabets. Such encodings are essential later when machines are used as inputs to universal simulation and to reductions. [/remark] Machine encodings close the circle: algorithms can be represented as data, and a universal machine can interpret that data. Later chapters use this idea when defining nondeterministic verification and when proving completeness results. ## Languages Associated to Encoded Problems The last step is to connect an informal decision problem with the language that represents it. Once an encoding is fixed, the yes-instances form a language, and complexity classes are defined by the existence of machines deciding that language within a resource bound. [definition: Encoded Language of a Decision Problem] Let $A \subseteq \mathcal{C}$ be a decision problem on a class of finite objects, and let $\operatorname{enc}:\mathcal{C}\to\Sigma^*$ be an encoding. The encoded language of $A$ is \begin{align*} L_{A,\operatorname{enc}} = \{\operatorname{enc}(c) \in \Sigma^* : c \in A\}. \end{align*} [/definition] The notation reminds us that membership in a complexity class is formally a statement about strings. In ordinary mathematical speech we often suppress the encoding when the standard choices are polynomially related, and the following example shows that convention in a familiar graph problem. [example: Graph Connectivity as a Language] Let $A$ be the set of connected finite labelled graphs, and fix the standard adjacency-list encoding whose header declares the vertices $\{1,\dots,n\}$. Then \begin{align*} L_{A,\operatorname{enc}} = \{\operatorname{enc}(G):G \text{ is a connected labelled graph}\}. \end{align*} A decider first checks that the input is a well-formed adjacency-list encoding: the header lists vertex labels, separators occur in the required positions, and every neighbour label belongs to $\{1,\dots,n\}$. This check scans the input string and compares each displayed label with the declared range, so if the input length is $m$, it uses at most polynomial time in $m$. If the encoding is well formed, run breadth-first search starting from vertex $1$. Each time the search removes a vertex $i$ from the queue, it scans the list of neighbours written for $i$ and marks every unmarked neighbour. Across the whole run, each vertex list is scanned at most once after its vertex is first reached. Therefore the total number of scanned list symbols is at most the input length $m$, and the number of vertex-marking operations is at most $n$. Since the header itself names all $n$ vertices, we have \begin{align*} n \le m. \end{align*} Thus the search time is bounded by \begin{align*} O(m)+O(n) \le O(m)+O(m)=O(m). \end{align*} After the search finishes, the machine scans the $n$ vertex marks and accepts exactly when all of them are marked; this final scan costs $O(n)\le O(m)$. Hence the whole decider runs in polynomial time in the encoded input length and accepts exactly the encodings of connected labelled graphs. [/example] This chapter has introduced the conventions that make the rest of the course stable: deterministic machines decide languages, multi-tape convenience does not change polynomial time, and standard encodings of finite objects are interchangeable up to polynomial translation. The next chapter attaches explicit time and space bounds to these machines and begins the systematic study of complexity classes. Chapter 1 established the computational model and showed that reasonable encodings and machine variants agree up to polynomial factors. With that foundation in place, we can now measure how much time and space a computation uses and begin comparing algorithms by resource bounds. # 2. Time and Space Complexity This chapter turns the machine models from the first chapter into quantitative tools. It assumes familiarity with deterministic Turing machines, deciders, languages over finite alphabets, input length, and the asymptotic notation $O(\cdot)$ from the first chapter. A decider is no longer judged only by whether it halts with the right answer; we now ask how many tape moves it needs and how much memory it uses as a function of the input length. The central theme is that robust complexity classes arise when we measure resources in the worst case and ignore constant-factor details of the chosen reasonable machine model. ## Measuring Running Time The first question is how to attach a number to the cost of deciding a language. Since a language contains inputs of every length, the relevant quantity is not the running time on a single input but the worst running time among all inputs of the same length. [definition: Worst-Case Running Time] Let $M$ be a deterministic Turing-machine decider over input alphabet $\Sigma$. The worst-case running time of $M$ is the function $T_M: \mathbb N \to \mathbb N$ given by $n \mapsto T_M(n)$, where \begin{align*} T_M(n) = \max\{\text{number of steps taken by }M\text{ on input }x : x \in \Sigma^*,\ |x| = n\}. \end{align*} [/definition] The maximum exists because there are finitely many strings of length $n$ and $M$ halts on every input. This worst-case convention is deliberately pessimistic: it asks for a uniform guarantee over all inputs of length $n$, which is the form of guarantee that survives reductions and composition of algorithms. [example: Linear Scan] Let $L=\{x\in\{0,1\}^*:x\text{ contains at least one }1\}$. Define a one-tape decider $M$ as follows: starting at the leftmost input symbol, it scans one cell to the right at a time; if it reads $1$, it accepts immediately; if it reaches the blank cell after the input without having read a $1$, it rejects. For an input $x$ of length $n$, if the first $1$ occurs in position $i$, then $M$ reads the $i$ input symbols in positions $1,\dots,i$ and then accepts. Thus that run takes at most $a i+b$ steps for fixed machine-dependent constants $a,b>0$, because each scanned input symbol costs only a bounded number of tape moves and state transitions. If $x$ contains no $1$, then $x=0^n$, and $M$ reads all $n$ input symbols before seeing the blank cell and rejecting, so this run takes at least $c n$ steps and at most $a n+b$ steps for fixed constants $a,b,c>0$. Therefore \begin{align*} c n \le T_M(n) \le a n+b \end{align*} for all $n\ge 1$, so the worst-case running time grows linearly with the input length. The input $0^n$ is the worst case because it forces the scan to continue through every input symbol instead of stopping early at a $1$. [/example] This example gives a machine-level bound for one language, but complexity theory needs language classes that collect all problems solvable within a common resource budget. To make such collections insensitive to low-level implementation details, we build constant factors and additive constants into the definition. [definition: Time Complexity Class] Let $f: \mathbb N \to \mathbb N$. The class $\operatorname{TIME}(f(n))$ consists of all languages $L$ for which there are a deterministic Turing-machine decider $M$ and a constant $C>0$ such that $M$ decides $L$ and \begin{align*} T_M(n) \le C f(n) + C \end{align*} for all $n \in \mathbb N$. [/definition] The definition records deterministic decidability within time proportional to $f(n)$, up to the constant adjustments already discussed. The next task is to identify the growth regimes that will matter throughout the course, especially the distinction between feasible search and exhaustive search. [definition: Polynomial And Exponential Time Bounds] A time bound $f: \mathbb N \to \mathbb N$ is polynomial if there exists $k \in \mathbb N$ such that $f(n) = O(n^k)$. It is exponential if there exist $c>1$ and $k \in \mathbb N$ such that $f(n) = O(c^{n^k})$. [/definition] Polynomial time is treated as the basic formal proxy for feasible deterministic computation in this course. Exponential time is not a single class but a family of much larger bounds, often arising from exhaustive search over objects whose number grows exponentially with the input length. [example: Brute Force Satisfiability] Given a Boolean formula $\varphi$ with variables $p_1,\dots,p_m$ and input length $n$, a truth assignment chooses one value in $\{0,1\}$ for each variable. Therefore the set of all assignments is $\{0,1\}^m$, whose size is \begin{align*} |\{0,1\}^m| = 2^m. \end{align*} The brute-force decider enumerates these $2^m$ assignments, evaluates $\varphi$ on each one, accepts as soon as one assignment makes $\varphi$ true, and rejects if every assignment makes $\varphi$ false. Since the input must at least encode the $m$ variables appearing in $\varphi$, we have $m\le n$, and hence \begin{align*} 2^m \le 2^n. \end{align*} If one evaluation of $\varphi$ takes at most $C n^k$ steps for fixed constants $C>0$ and $k\in\mathbb N$, then the total running time is at most \begin{align*} 2^m C n^k \le 2^n C n^k. \end{align*} For $n\ge 1$, the inequality $n\le 2^n$ gives \begin{align*} n^k \le (2^n)^k. \end{align*} Since $(2^n)^k=2^{kn}$, we obtain \begin{align*} 2^n C n^k \le C2^n 2^{kn}. \end{align*} Combining the powers of $2$ gives \begin{align*} C2^n 2^{kn}=C2^{(k+1)n}. \end{align*} Thus brute-force satisfiability has deterministic running time bounded by an exponential function of the input length; the exponential factor comes from trying all truth assignments. [/example] Some hierarchy theorems require the machine to know when its allotted time has expired. Without such a clock, a claimed bound may be too artificial to use inside a diagonal construction: the simulator would be asked to stop after $f(n)$ steps without being able to determine when those steps have elapsed. This motivates a mild effectiveness condition on time bounds. [definition: Time-Constructible Function] A function $f: \mathbb N \to \mathbb N$ with $f(n)\ge n$ for all sufficiently large $n$ is time-constructible if there is a deterministic Turing machine which, on every input of length $n$, computes a standard representation of $f(n)$ in $O(f(n))$ steps. [/definition] The condition rules out artificial bounds that cannot be used as clocks inside simulations. For the clocking applications in this course, computing $f(n)$ within $O(f(n))$ time lets the simulator build a counter or unary clock of length proportional to $f(n)$. Standard integer-valued bounds such as $n$, $\lceil n\log n\rceil$, $n^k$, and $2^n$ are time-constructible after fixing routine conventions for counters and rounding. ## More Time Gives More Power A natural question is whether giving an algorithm more time can always be simulated away by clever programming. The answer is no: under mild constructibility hypotheses, deterministic time has a strict hierarchy. [quotetheorem:6177] [citeproof:6177] The theorem is a formal version of the intuition that extra time can buy genuinely new decidable languages, but its hypotheses are doing real work. A constant bound such as $t(n)=1$ cannot support the stated conclusion, since a machine running for one step cannot even inspect arbitrary inputs and the expression $O(t(n)^3)$ gives no genuinely larger asymptotic budget. Time-constructibility is also needed because the diagonal machine must know when to stop simulating the candidate decider; otherwise the proposed resource bound would not give an implementable cutoff. The overhead in the upper bound is not cosmetic: diagonalisation must pay for universal simulation, decoding, and clocking, so the theorem separates $O(t(n))$ from a sufficiently larger bound rather than from $O(t(n)+1)$. It does not separate $\operatorname{P}$ from $\operatorname{NP}$, because both classes live inside broad polynomial-time regimes where the hierarchy theorem's padding and overhead do not target nondeterminism. [example: A Hierarchy Consequence] Fix $k\ge 1$ and set $t(n)=n^k$. Powers of $n$ are time-constructible, and $n^k\ge n$ for all $n\ge 1$, so the hypotheses of *[Deterministic Time Hierarchy Theorem](/theorems/6177)* apply. The theorem gives a language decidable in time $O(t(n)^3)$ and not decidable in time $O(t(n))$. Substituting $t(n)=n^k$ into the upper bound gives \begin{align*} t(n)^3=(n^k)^3. \end{align*} Using the exponent rule $(a^b)^c=a^{bc}$, this becomes \begin{align*} (n^k)^3=n^{3k}. \end{align*} Therefore \begin{align*} O(t(n)^3)=O(n^{3k}). \end{align*} The lower-side statement is \begin{align*} O(t(n))=O(n^k). \end{align*} Thus, for this fixed exponent $k$, there is a language decidable within deterministic time $O(n^{3k})$ that is not decidable within deterministic time $O(n^k)$ on the chosen standard model. This separates fixed polynomial exponents as fine-grained time bounds, but it does not produce a language outside $\operatorname{P}$, because $O(n^{3k})$ is still a polynomial-time upper bound and $\operatorname{P}$ is the union over all polynomial exponents. [/example] ## Measuring Space Running time is not the only scarce resource. We next ask how many tape cells a computation needs, separating read-only access to the input from writable work memory so that a long input is not automatically counted as used space. If the input tape were counted as workspace, then every nonempty input of length $n$ would automatically cost $n$ cells before the algorithm did any computation, making logarithmic-space algorithms impossible by definition. [definition: Work-Tape Space Usage] For a deterministic Turing machine with a read-only input tape and one or more read-write work tapes, the space used on input $x$ is the number of distinct work-tape cells visited during the computation on $x$. The worst-case space usage is the function $S_M: \mathbb N \to \mathbb N$ given by $n \mapsto S_M(n)$, where \begin{align*} S_M(n)=\max\{\text{space used by }M\text{ on }x : |x|=n\}. \end{align*} [/definition] The input tape is excluded because every algorithm must at least be able to inspect its input. Counting only work tapes captures auxiliary memory; as with time, we then need classes that group languages by a shared asymptotic memory budget. [definition: Space Complexity Class] Let $f: \mathbb N \to \mathbb N$. The class $\operatorname{SPACE}(f(n))$ consists of all languages $L$ for which there are a deterministic Turing-machine decider $M$ and a constant $C>0$ such that $M$ decides $L$ using at most $C f(n)+C$ work-tape cells on every input of length $n$. [/definition] Unlike time, space can be reused. A computation may run for many steps while continually overwriting the same small workspace. This motivates naming two important scales: enough memory to store input positions, and enough memory to store polynomial-size data structures. [definition: Logarithmic Space And Polynomial Space] The class $\operatorname{L}$ is \begin{align*} \operatorname{L}=\operatorname{SPACE}(\log n). \end{align*} The class $\operatorname{PSPACE}$ is \begin{align*} \operatorname{PSPACE}=\bigcup_{k\in\mathbb N}\operatorname{SPACE}(n^k). \end{align*} [/definition] Logarithmic space is large enough to store a constant number of pointers into the input, because an input position from $1$ to $n$ can be represented using $O(\log n)$ bits. Polynomial space permits large dynamic data structures but still forbids explicitly writing objects of exponential length. [example: Graph Reachability By Breadth-First Search] Let $G$ be a directed graph with $n$ vertices and distinguished vertices $s,t$. Breadth-first search maintains one visited bit for each vertex and a queue of vertices whose outgoing edges still have to be explored; it accepts when $t$ is marked visited and rejects when the queue becomes empty. The visited array has one bit for each of the $n$ vertices, so it uses exactly $n$ bits. A vertex name is an integer from $1$ to $n$, and such an integer can be written using $\lceil \log_2 n\rceil$ bits. The queue never contains more than $n$ vertex names, because a vertex is enqueued only when it is first marked visited. Hence the queue uses at most \begin{align*} n\lceil \log_2 n\rceil \end{align*} bits. The visited array and queue together therefore use at most \begin{align*} n+n\lceil \log_2 n\rceil \end{align*} bits. For $n\ge 2$, $\lceil \log_2 n\rceil\le \log_2 n+1\le 2\log_2 n$, so \begin{align*} n+n\lceil \log_2 n\rceil\le n+2n\log_2 n. \end{align*} Since $\log_2 n\ge 1$ for $n\ge 2$, we have $n\le n\log_2 n$, and therefore \begin{align*} n+2n\log_2 n\le 3n\log_2 n. \end{align*} Thus the queue and visited array use $O(n\log n)$ bits, which is polynomial in $n$. For the running time, each vertex is removed from the queue at most once. With an adjacency-list encoding, scanning the outgoing lists over the whole run examines each directed edge at most once, so the total scan cost is $O(n+m)$, where $m$ is the number of directed edges. A directed graph on $n$ vertices has at most $n^2$ directed edges, so $m\le n^2$. Hence \begin{align*} n+m\le n+n^2. \end{align*} For $n\ge 1$, $n\le n^2$, and so \begin{align*} n+n^2\le 2n^2. \end{align*} Thus the adjacency-list version runs in $O(n^2)$ time. With an adjacency-matrix encoding, each dequeued vertex requires scanning one row of length $n$, and at most $n$ vertices are dequeued, so the number of matrix entries inspected is \begin{align*} n\cdot n=n^2. \end{align*} In either standard encoding, breadth-first search decides directed reachability using polynomial time and polynomial space. [/example] This deterministic algorithm is efficient in both time and space, but it is not the most space-frugal way to think about reachability. Later chapters compare this with nondeterministic guessing, where a path can be guessed vertex by vertex while storing only the current vertex and a counter. [example: Reachability By Guessing A Path] For a directed graph with $n$ vertices and distinguished vertices $s,t$, a nondeterministic reachability procedure stores a current vertex $v$ and a counter $j$. It initializes $v=s$ and $j=0$. If $v=t$, it accepts. Otherwise, while $j<n-1$, it guesses one outgoing edge $(v,u)$, updates $v$ to $u$, updates $j$ to $j+1$, and accepts if the new current vertex is $t$. If no accepting sequence of guesses occurs within $n-1$ edge traversals, that computation branch rejects. The bound $n-1$ is enough because any path from $s$ to $t$ can be shortened to a simple path by deleting repeated-vertex cycles. A simple path in a graph with $n$ vertices contains at most $n$ distinct vertices, so it contains at most $n-1$ edges. Thus there is an accepting branch exactly when $t$ is reachable from $s$. The space usage is logarithmic. A vertex name is an integer from $1$ to $n$, so the current vertex can be stored using $\lceil \log_2 n\rceil$ bits. The counter ranges from $0$ to $n-1$, so it also uses $\lceil \log_2 n\rceil$ bits. Hence these two registers use \begin{align*} \lceil \log_2 n\rceil+\lceil \log_2 n\rceil=2\lceil \log_2 n\rceil \end{align*} bits. For $n\ge 2$, \begin{align*} \lceil \log_2 n\rceil \le \log_2 n+1 \le 2\log_2 n, \end{align*} so \begin{align*} 2\lceil \log_2 n\rceil \le 4\log_2 n. \end{align*} Thus the nondeterministic procedure uses $O(\log n)$ space, previewing the class $\operatorname{NL}$ while making the deterministic space accounting from this chapter available for comparison. [/example] ## Relating Time And Space The next question is how time and space bounds constrain each other. A machine that runs for a bounded number of steps can visit only that many cells, while a space-bounded decider has only finitely many configurations available. [quotetheorem:6178] [citeproof:6178] This containment is conceptually simple but important: time is always at least as restrictive as the corresponding space bound. Its one-way nature is essential. For example, a machine can use $O(\log n)$ space to keep a counter and repeatedly sweep across a read-only input for polynomially many rounds, so small space does not force comparably small time. More generally, the same work cells may be overwritten many times, and the theorem gives no upper bound on how often this reuse can happen. The missing control comes from determinism and halting. Once a deterministic machine's state, head positions, and work-tape contents are fixed, its entire future is fixed; if the same complete configuration appears twice, the computation is trapped in the same cycle. Thus a space-bounded decider cannot run longer than its number of possible configurations, and the next theorem turns that counting principle into a time simulation. [quotetheorem:6179] [citeproof:6179] The hypothesis $f(n)\ge \log n$ is the point at which the input-head position can be absorbed into the exponential term. If $f(n)$ were smaller than logarithmic, the configuration count would contain a separate factor of roughly $n$ for the input-head position, so the clean bound $2^{O(f(n))}$ would not follow from this counting argument. The theorem also does not say that every space-bounded computation actually needs exponential time; it gives a universal simulation bound, and many space-bounded algorithms halt much earlier. Its role is to turn memory bounds into time bounds by configuration counting, a technique that will reappear when comparing deterministic, nondeterministic, and alternating classes. Combining the two containments gives the basic resource chain \begin{align*} \operatorname{TIME}(f(n)) \subseteq \operatorname{SPACE}(f(n)) \subseteq \operatorname{TIME}(2^{O(f(n))}) \end{align*} for the standard models used in this course, whenever $f(n)\ge \log n$. This motivates applying the chain to polynomial space, because it gives the first useful comparison between $\operatorname{PSPACE}$ and exponential-time computation. [quotetheorem:6180] [citeproof:6180] This theorem is the first appearance of a pattern that recurs throughout complexity theory: a space bound often implies a much larger but still controlled time bound by counting configurations. The polynomial-space hypothesis is essential to the displayed exponential-time conclusion, since a larger space allowance such as $2^n$ would only give a double-exponential configuration bound by the same argument. The result is only an upper bound: it does not imply that $\operatorname{PSPACE}$ problems require exponential time, nor does it separate $\operatorname{P}$ from $\operatorname{PSPACE}$. What it does provide is a stable upper envelope for later complete problems, where showing a problem lies in $\operatorname{PSPACE}$ automatically places it within deterministic exponential time. ## Machine Models And Concrete Comparisons The abstract definitions become easier to use when applied to small language-recognition tasks. Palindromes are a good test case because the input must be compared at opposite ends, and the cost depends strongly on how many tapes the machine has. [example: Palindromes On A One-Tape Machine] Let $\operatorname{PAL}=\{w\in\{0,1\}^*: w=w^{\mathrm{rev}}\}$. On the classical writable one-tape model, consider the decider that repeatedly compares the leftmost unchecked symbol with the rightmost unchecked symbol, rejects if they differ, marks both symbols as checked if they agree, and returns to the leftmost unchecked symbol for the next round. If the input length is $n$, then after $r$ successful rounds, exactly $2r$ symbols have been marked, except possibly when the single middle symbol of an odd-length input is handled at the end. There are at most $\lceil n/2\rceil$ comparison rounds, because each non-final round marks two new symbols, and an odd-length input has one final middle symbol. During round $r$, with $0\le r<\lceil n/2\rceil$, the unchecked portion has length at most $n-2r$, and the extra moves needed to cross marked symbols, inspect endpoints, and return are bounded by a constant multiple of the same scan plus a fixed overhead. Thus the number of steps in round $r$ is at most $a(n-2r)+b$ for fixed machine-dependent constants $a,b>0$. Since $n-2r\le n$, the total running time is at most \begin{align*} \sum_{r=0}^{\lceil n/2\rceil-1}\bigl(a(n-2r)+b\bigr)\le \sum_{r=0}^{\lceil n/2\rceil-1}(an+b). \end{align*} The right-hand side has exactly $\lceil n/2\rceil$ equal summands, so \begin{align*} \sum_{r=0}^{\lceil n/2\rceil-1}(an+b)=(an+b)\lceil n/2\rceil. \end{align*} For $n\ge 1$, $\lceil n/2\rceil\le n$, hence \begin{align*} (an+b)\lceil n/2\rceil\le (an+b)n=an^2+bn. \end{align*} Again for $n\ge 1$, $bn\le bn^2$, so \begin{align*} an^2+bn\le (a+b)n^2. \end{align*} Therefore this one-tape palindrome decider runs in $O(n^2)$ time. This is a time comparison with the writable-input model from the first chapter: under the read-only-input/work-tape space convention used above, the marks would have to be stored on a work tape, so the example does not assert membership in $\operatorname{SPACE}(1)$ for that model. [/example] The same language has a faster algorithm when two tapes are available. This illustrates why the first chapter emphasized polynomial robustness rather than exact step counts across machine models. [example: Palindromes On A Two-Tape Machine] Let $w\in\{0,1\}^*$ have length $n$. A two-tape decider for $\operatorname{PAL}$ first copies $w$ from the input tape onto a work tape: for each $i$ with $1\le i\le n$, it reads the $i$th input symbol, writes the same symbol on the work tape, and moves both heads one cell to the right. Each copied symbol costs at most a fixed number $a$ of transitions, so the copying phase takes at most $an+b$ steps for fixed machine-dependent constants $a,b>0$. After copying, the input head is returned to the first input symbol and the work-tape head is moved left from the blank after the copied word to the last copied symbol. These head movements cross at most $n$ input cells and a constant number of additional boundary cells, so they take at most $cn+d$ steps for fixed constants $c,d>0$. The comparison phase then performs at most $n$ comparisons: on each comparison it reads one symbol from the input tape and one symbol from the work tape, rejects if they differ, and otherwise moves the input head one cell right while moving the work-tape head one cell left. Each comparison costs at most a fixed number $e$ of transitions, so this phase takes at most $en+f$ steps for fixed constants $e,f>0$. The total running time is therefore bounded by \begin{align*} (an+b)+(cn+d)+(en+f)=(a+c+e)n+(b+d+f). \end{align*} For $n\ge 1$, the constant term satisfies \begin{align*} b+d+f\le (b+d+f)n. \end{align*} Substituting this into the previous bound gives \begin{align*} (a+c+e)n+(b+d+f)\le (a+c+e)n+(b+d+f)n. \end{align*} Combining the coefficients of $n$ gives \begin{align*} (a+c+e)n+(b+d+f)n=(a+b+c+d+e+f)n. \end{align*} Hence the two-tape decider runs in $O(n)$ time. The work tape contains one copied symbol for each of the $n$ input symbols, plus at most a constant number of extra boundary or blank cells visited by the head. Thus its work-tape usage is at most $n+g$ cells for some constant $g\ge 0$. If $g=0$, this is exactly $n$. If $g>0$ and $n\ge 1$, then \begin{align*} g\le gn. \end{align*} Therefore \begin{align*} n+g\le n+gn=(g+1)n. \end{align*} So the space usage is $O(n)$. The second tape makes palindrome checking linear-time by allowing the machine to compare the input against a reversed traversal of its copied contents instead of repeatedly scanning across the whole input. [/example] These examples show why complexity classes are defined with respect to a fixed standard model but interpreted through robust families of bounds. Quadratic versus linear time matters for algorithm analysis, while polynomial versus exponential time is the level at which the classes $\operatorname{P}$, $\operatorname{NP}$, and $\operatorname{PSPACE}$ are designed to be stable. The previous chapter turned machines into objects with measurable cost, making time and space bounds precise and robust under model changes. The next chapter focuses on the first major efficiency class, isolating the languages decidable in polynomial time. # 3. The Class $\mathrm{P}$ Polynomial time is the first resource bound in the course that behaves like a mathematical version of feasible computation. Chapter 2 measured time and space for deciders; here we isolate the languages decidable within a polynomial number of steps and study the stability properties that make this class useful. The chapter also introduces several canonical algorithmic problems in polynomial time, because complexity classes are best understood through both definitions and examples. ## Efficient Decision Problems Which decision problems should count as efficiently solvable? A running time such as $n^2$ or $n^5$ depends on the encoding and machine model, but it is stable under the polynomial simulations discussed earlier, and it separates finite brute force from algorithms that scale by repeated local computation. The central object is a language rather than an optimization problem, because the Turing-machine model decides membership. Optimization problems enter the theory by asking associated yes-no questions, such as whether a graph has a matching of size at least $k$. [definition: Polynomial-Time Decidable Language] Let $L$ be a language over a finite alphabet $\Sigma$. The language $L$ is polynomial-time decidable if there exist a deterministic Turing machine $M$ and constants $c,d>0$ such that: 1. $M$ halts on every input $x \in \Sigma^*$; 2. $M$ accepts $x$ if and only if $x \in L$; 3. on every input $x$ of length $n$, $M$ runs for at most $c n^d + c$ steps. [/definition] The additive constant handles short inputs, while the exponent records the large-scale growth rate. Replacing $c n^d+c$ by $O(n^d)$ gives the same class, but the explicit bound reminds us that the statement is about every input of a given length. [definition: The Class P] The class $\mathrm{P}$ is the set of all polynomial-time decidable languages: \begin{align*} \mathrm{P} = \bigcup_{d \ge 1} \operatorname{TIME}(n^d). \end{align*} [/definition] This definition hides the chosen deterministic machine model. Without a robustness theorem, the class could be an artefact of a particular tape layout or instruction set: a problem might appear efficient only because one model has a built-in operation that another must simulate. The next result isolates the exact hypothesis needed in this course, namely polynomial-time mutual simulation with polynomially bounded configuration encodings. [quotetheorem:6181] [citeproof:6181] The polynomial bounds in the hypotheses are doing real work. If a model allowed exponentially long configurations to be accessed in one step, or if simulating one step required exponential time, polynomial time would no longer be preserved. The theorem also does not say that all models of computation are equivalent, only that models related by polynomial overhead define the same class. From now on, an algorithm described in ordinary discrete-mathematical terms represents a polynomial-time decider once its data structures have polynomial-size encodings and each operation can be implemented with polynomial overhead. [example: Unary and Binary Encodings] Consider the language of encoded pairs $(N,k)$ with $N\ge 1$, $k\ge 1$, and $k$ dividing $N$. If $N$ and $k$ are written in binary and the total input length is $m$, then both integers have at most $m$ bits. Schoolbook long division performs at most $m$ quotient-bit steps; each step compares and subtracts binary strings of length at most $m$, so the running time is bounded by a polynomial in $m$, for instance $O(m^2)$ under the usual bit-operation accounting. The decider accepts exactly when the final remainder is $0$. If instead the input is written in unary as $1^N\#1^k$, then the input length is \begin{align*} m=N+k+1. \end{align*} Repeated subtraction starts with $r=N$ and replaces $r$ by $r-k$ while $r\ge k$. The number of subtraction steps is at most $\lfloor N/k\rfloor$. Since $k\ge 1$, we have \begin{align*} \lfloor N/k\rfloor \le N. \end{align*} Also $N\le N+k+1=m$, so the number of subtractions is at most $m$. Each subtraction can be implemented by scanning and crossing off at most $m$ unary symbols, giving total time $O(m^2)$. Thus divisibility is polynomial-time decidable under both ordinary encodings, but the polynomial bound is always measured against the chosen input length; artificially compressed or padded encodings can therefore change the complexity statement. [/example] ## Closure Properties of $\mathrm{P}$ Once a problem has been placed in $\mathrm{P}$, which new problems can be built from it without leaving $\mathrm{P}$? Closure properties answer this question and let us combine algorithms modularly. The Boolean operations are the first test. A complexity class that failed to survive finite logical combinations would be awkward as a theory of efficient decision-making: even asking for inputs satisfying two already efficient tests could leave the class. The obstruction to watch for is not correctness but resource growth, so the proof must use the fact that only finitely many polynomial-time deciders are being combined. [quotetheorem:6182] [citeproof:6182] The finiteness of the Boolean combination is essential. Intersecting polynomial-time languages over an input-dependent or infinite family could require running too many tests, so the theorem does not license arbitrary quantified combinations. The complement clause also uses that the machines are deciders: reversing the answer of a recogniser that might not halt would not give a decision procedure. This closure theorem is often used without comment in later reductions, where polynomial-time algorithms are assembled from a fixed number of efficient subroutines. [example: Combining Graph Properties] Let $L_1$ be the language of encoded graphs that are connected, and let $L_2$ be the language of encoded graphs that are bipartite. Breadth-first search decides membership in $L_1$ in polynomial time: starting from one vertex, it marks every reachable vertex, scans each vertex and edge at most a bounded number of times, and accepts exactly when all vertices have been marked. The two-colouring breadth-first search for bipartiteness decides membership in $L_2$ in polynomial time by assigning opposite colours across each inspected edge and rejecting exactly when an edge has two endpoints of the same colour. To decide whether an encoded graph lies in $L_1\cap L_2$, run the connectivity decider and the bipartiteness decider on the same input. If their running times are bounded by polynomials $p_1(n)$ and $p_2(n)$ on inputs of length $n$, then the combined machine runs in time \begin{align*} p_1(n)+p_2(n)+O(1), \end{align*} which is polynomial because the sum of two polynomials is a polynomial. It accepts exactly when the graph is both connected and bipartite, so this language lies in $\mathrm{P}$ by *Boolean Closure of P*. [/example] Boolean closure combines yes-no answers after running deciders on the same input. The next operation is different: it changes the input itself before applying another algorithm, so we need a formal notion of efficient string transformation. [definition: Polynomial-Time Computable Function] A function $f:\Sigma^* \to \Gamma^*$ is polynomial-time computable if there exist a deterministic Turing machine $M$ and a polynomial $p$ such that, on every input $x \in \Sigma^*$, the machine $M$ halts within $p(|x|)$ steps with $f(x)$ written on its output tape. [/definition] Since a machine cannot write more symbols than the number of steps it takes, a polynomial-time computable function has polynomially bounded output length. This observation is the key estimate needed to prove that performing two efficient transformations in sequence remains efficient. Without the output-length bound, the second computation could receive an exponentially longer string even if the first stage were described as a preprocessing step. [quotetheorem:6183] [citeproof:6183] The theorem uses both assumptions: $f$ must be computable quickly, and $g$ must run quickly on the actual length of $f(x)$. If $f$ were allowed to output exponentially many symbols, the second stage could be too large; if $g$ were only efficient on the original input length, the bound would not follow. The theorem does not say that every useful preprocessing step is harmless, only that polynomial-time transformations with polynomially bounded outputs can be chained. To compare two decision problems, however, we need the preprocessing map to preserve yes-instances and no-instances in a precise way; this motivates the reduction notion used throughout NP-completeness. [definition: Polynomial-Time Many-One Reduction] Let $A \subseteq \Sigma^*$ and $B \subseteq \Gamma^*$ be languages. A polynomial-time many-one reduction from $A$ to $B$ is a polynomial-time computable function $f:\Sigma^* \to \Gamma^*$ such that, for every $x \in \Sigma^*$, \begin{align*} x \in A \iff f(x) \in B. \end{align*} We write $A \le_m^p B$. [/definition] A reduction is useful because it converts an algorithm for $B$ into an algorithm for $A$. The equivalence condition is what prevents a misleading transformation: mapping every input to a fixed yes-instance of $B$ would be computable in constant time but would preserve no information about $A$. The next theorem records this transfer formally and fixes the direction: if $A \le_m^p B$, then $A$ is no harder than $B$ up to polynomial preprocessing. [quotetheorem:6184] [citeproof:6184] The direction of the implication is the main point. The theorem gives an algorithm for $A$ from an algorithm for $B$, not conversely; knowing $A \le_m^p B$ and $A \in \mathrm{P}$ says nothing by itself about whether $B$ is easy. The polynomial-time requirement on $f$ is also essential, because an exponentially expensive transformation could hide the whole difficulty of deciding $A$. Later chapters reverse the emotional use of this theorem: reductions into a problem give algorithms, while reductions from many known hard problems give evidence of hardness. Both uses depend on the same formal statement. ## Reachability and Graph Search Which graph problems are solvable by exploring only a polynomial number of local neighbourhoods? Reachability is the basic example: given a directed graph and two vertices, decide whether a path leads from the start vertex to the target. [definition: Directed s-t Reachability] The language $\mathrm{PATH}$ consists of encodings $\langle G,s,t\rangle$ where $G=(V,E)$ is a finite directed graph, $s,t \in V$, and there exists a directed path in $G$ from $s$ to $t$. [/definition] The natural algorithm marks all vertices reachable from $s$ by repeatedly following outgoing edges. The possible obstruction is that a graph may contain exponentially many walks, since cycles allow repeated visits; a polynomial-time algorithm must avoid enumerating walks and instead remember which vertices have already been discovered. This reachability problem is important because it turns graph exploration into a decision language and supplies the simplest canonical example of a nonconstant problem in $\mathrm{P}$. [quotetheorem:6185] [citeproof:6185] The theorem relies on the graph being given explicitly, for instance by an adjacency list or another polynomial-size encoding from which neighbours can be scanned efficiently. It does not say that reachability is always easy in exponentially large implicit graphs, where merely listing the reachable region may be impossible in polynomial time. The important algorithmic pattern is to replace exponentially many possible walks by a monotone set of discovered vertices, with each vertex and edge charged only a bounded number of times. This pattern reappears below in implication graphs for $\mathrm{2SAT}$ and in alternating searches for bipartite matching. [example: Algorithm for s-t Reachability] Given a directed graph $G=(V,E)$ with adjacency lists and vertices $s,t\in V$, initialise $R=\{s\}$ and a queue containing $s$. While the queue is nonempty, remove the front vertex $v$ and inspect each outgoing edge $(v,w)\in E$ in the adjacency list of $v$; if $w\notin R$, add $w$ to $R$ and enqueue $w$. The algorithm accepts exactly when $t\in R$ after the queue is exhausted. At every stage, each vertex in $R$ is reachable from $s$: initially this is true because $s$ is reachable from itself by the length-$0$ path, and if $v$ is reachable from $s$ and $(v,w)\in E$, then appending the edge $(v,w)$ gives a directed path from $s$ to $w$. Conversely, when the queue is exhausted, every outgoing neighbour of every vertex in $R$ has already been added to $R$. If there were a path \begin{align*} s=v_0 \to v_1 \to \cdots \to v_\ell=t \end{align*} with $t\notin R$, choose the smallest index $i$ such that $v_i\notin R$. Then $i\ge 1$, so $v_{i-1}\in R$, and the edge $(v_{i-1},v_i)$ would have been inspected when $v_{i-1}$ was removed from the queue. That inspection would add $v_i$ to $R$, contradicting the choice of $i$. Hence the final set $R$ is exactly the set of vertices reachable from $s$. For the running time, a vertex is enqueued only at the moment it is first added to $R$, so there are at most $|V|$ enqueue operations and at most $|V|$ removals from the queue. When a vertex $v$ is removed, the algorithm scans precisely the adjacency-list entries for edges leaving $v$. Since no vertex is removed twice, the total number of scanned adjacency-list entries is \begin{align*} \sum_{v\in V}\operatorname{outdeg}(v)=|E|. \end{align*} Thus the total work is bounded by a constant multiple of \begin{align*} |V|+|E|, \end{align*} so the reachability test runs in $O(|V|+|E|)$ time and decides whether there is a directed path from $s$ to $t$. [/example] Reachability propagates a single label, namely discovered or undiscovered. The next graph problem asks whether two labels can be propagated consistently across every edge, so we first name the target structure. [definition: Bipartite Graph] An undirected graph $G=(V,E)$ is bipartite if there exist sets $A,B \subset V$ such that: 1. $A \cap B = \varnothing$; 2. $A \cup B = V$; 3. every edge of $G$ has one endpoint in $A$ and one endpoint in $B$. [/definition] A bipartition is the same data as a valid two-colouring of the graph. Trying all partitions would require checking exponentially many assignments of vertices to sides, so the useful observation is that once one vertex in a connected component is coloured, every edge forces the colours of its neighbours. This converts the decision problem into a consistency test for local constraints along edges, which suggests another breadth-first propagation algorithm. [quotetheorem:6186] [citeproof:6186] Several qualifications matter here. A colouring started in one connected component gives no information about another component, so the algorithm must restart the search until every vertex has been considered. The graph is also undirected; for directed graphs, an analogous colouring question would need a separately specified constraint on arc directions. The theorem does not claim that all graph partitioning problems are efficient, since requiring three colours instead of two leads to a very different problem later in the course. The same-colour conflict condition also points toward short certificates of failure, because it can be traced back to an odd cycle. [example: Testing Whether a Graph Is Bipartite] Consider the graph with vertex set $V=\{1,2,3,4\}$ and edge set \begin{align*} E=\{12,23,34,41\}. \end{align*} Start the breadth-first two-colouring procedure at vertex $1$ and set $c(1)=0$. Since $1$ is adjacent to $2$ and $4$, both neighbours receive the opposite colour: \begin{align*} c(2)=1-c(1)=1-0=1. \end{align*} \begin{align*} c(4)=1-c(1)=1-0=1. \end{align*} Now inspect the edge $23$. Since $c(2)=1$, vertex $3$ receives \begin{align*} c(3)=1-c(2)=1-1=0. \end{align*} The remaining edge $34$ is consistent because $c(3)=0$ and $c(4)=1$, so its endpoints have opposite colours. Checking the four edges one by one, $12$ has endpoint colours $0$ and $1$, $23$ has endpoint colours $1$ and $0$, $34$ has endpoint colours $0$ and $1$, and $41$ has endpoint colours $1$ and $0$. Thus every edge joins one vertex of colour $0$ to one vertex of colour $1$, so the algorithm accepts the graph as bipartite. If the extra edge $13$ is added, the same propagation gives \begin{align*} c(1)=0. \end{align*} \begin{align*} c(3)=0. \end{align*} The edge $13$ therefore has both endpoints in the same colour class, so the two-colouring condition fails at that edge and the algorithm rejects. [/example] ## Polynomial-Time Satisfiability Fragments General Boolean satisfiability will become the central NP-complete problem, so it is important to identify restricted formula classes that remain polynomial-time solvable. The contrast shows that hardness is not caused by the mere presence of Boolean variables, but by the structure of their interactions. A 2-CNF formula restricts each clause to at most two literals. Brute-force search over truth assignments still has size $2^n$, so the restriction is useful only if it exposes extra structure. The key observation is that each two-literal clause can be read as a pair of implications, creating a directed graph on literals. [definition: Two-Satisfiability] The language $\mathrm{2SAT}$ consists of encodings of Boolean formulas in conjunctive normal form where every clause has at most two literals and the formula is satisfiable. [/definition] For a clause $(a \lor b)$, the forbidden assignment is $\neg a$ and $\neg b$ simultaneously. Translating each clause into implications turns satisfiability into a reachability question inside a directed graph, which makes polynomial-time graph algorithms applicable. [quotetheorem:6187] [citeproof:6187] The theorem depends crucially on the two-literal restriction: it is what turns each clause into implications between single literals. It does not give a polynomial-time algorithm for arbitrary CNF satisfiability, where clauses of length three are already enough for the later NP-completeness theorem. The proof gives a complete structural test, not a search over assignments, and it connects graph reachability to logic by replacing truth assignments with strongly connected components. This motivates the next definition: a second tractable satisfiability fragment uses clauses that behave like monotone rules fired until a fixed point is reached. [definition: Horn Clause] A Horn clause is a disjunction of literals containing at most one positive literal. [/definition] The restriction means each clause can be read as a rule: if all listed negative literals are false, then the single positive literal, if present, must be true. To turn this rule format into a decision problem, we collect formulas whose every clause has this restricted shape. [definition: Horn-SAT] The language $\mathrm{HornSAT}$ consists of encodings of satisfiable Boolean formulas in conjunctive normal form whose clauses are all Horn clauses. [/definition] The forward-chaining algorithm constructs the least assignment forced by all definite clauses. Without the Horn restriction, setting a variable to satisfy one clause can later force a choice that must be undone, creating backtracking. Horn clauses avoid this obstruction because the operation of turning variables from false to true is monotone. This motivates the following theorem: if a negative goal clause is triggered, the formula is unsatisfiable; otherwise the least forced assignment satisfies all clauses. [quotetheorem:6188] [citeproof:6188] The theorem uses the asymmetry in the definition of a Horn clause. If clauses could contain two positive literals, a rule premise might force a disjunctive choice rather than a single variable, and the least-forced-assignment argument would break down. The theorem also decides satisfiability, not every optimisation question about Horn formulas. This algorithm is a prototype for fixed-point computation in finite systems: graph reachability repeatedly closes a set of vertices under outgoing edges, while Horn-SAT repeatedly closes a set of true variables under implication rules. That shared monotone-closure pattern is one reason these fragments stay inside $\mathrm{P}$. [example: Polynomial-Time Algorithm for Horn-SAT] For the Horn formula \begin{align*} (x) \land (\neg x \lor y) \land (\neg y \lor z) \land (\neg z \lor \neg w), \end{align*} rewrite each clause as an implication. The unit clause $(x)$ requires $x$ to be true, so it becomes $\top\to x$. The clause $(\neg x\lor y)$ is false exactly when $x=\mathrm{true}$ and $y=\mathrm{false}$, so it is equivalent to $x\to y$. The clause $(\neg y\lor z)$ is equivalent in the same way to $y\to z$. The final clause $(\neg z\lor \neg w)$ has no positive literal; it is false exactly when $z=\mathrm{true}$ and $w=\mathrm{true}$, so it becomes the contradiction rule $z\land w\to \bot$. Forward chaining starts with all variables false: \begin{align*} x=\mathrm{false},\quad y=\mathrm{false},\quad z=\mathrm{false},\quad w=\mathrm{false}. \end{align*} The premise $\top$ of $\top\to x$ is true, so the algorithm sets \begin{align*} x=\mathrm{true}. \end{align*} Now the premise of $x\to y$ is true, so it sets \begin{align*} y=\mathrm{true}. \end{align*} Now the premise of $y\to z$ is true, so it sets \begin{align*} z=\mathrm{true}. \end{align*} The variable $w$ is never the conclusion of a rule, so it remains \begin{align*} w=\mathrm{false}. \end{align*} Under the final assignment, the contradiction rule has premise \begin{align*} z\land w=\mathrm{true}\land \mathrm{false}=\mathrm{false}, \end{align*} so $z\land w\to\bot$ does not fire. Checking the original clauses gives $(x)=\mathrm{true}$. Also \begin{align*} (\neg x\lor y)=\mathrm{false}\lor \mathrm{true}=\mathrm{true}. \end{align*} Next, \begin{align*} (\neg y\lor z)=\mathrm{false}\lor \mathrm{true}=\mathrm{true}. \end{align*} Finally, \begin{align*} (\neg z\lor \neg w)=\mathrm{false}\lor \mathrm{true}=\mathrm{true}. \end{align*} Thus the final assignment $x=y=z=\mathrm{true}$ and $w=\mathrm{false}$ satisfies every clause. Each ordinary rule can force its conclusion only once, because variables only change from false to true, and contradiction rules are checked when their premises become true; hence forward chaining uses only polynomially many rule checks in the size of the Horn formula. [/example] ## Matching and Arithmetic Problems in $\mathrm{P}$ Polynomial time contains algorithms whose design is much less local than breadth-first search or forward chaining. Bipartite matching and primality testing are canonical examples: both are efficiently solvable, but their algorithms reveal deeper combinatorial or number-theoretic structure. [definition: Bipartite Matching Decision Problem] The language $\mathrm{BIPARTITE\text{-}MATCHING}$ consists of encodings $\langle G,k\rangle$ where $G=(A\cup B,E)$ is a finite bipartite graph and there exists a matching in $G$ of size at least $k$. [/definition] A matching is a set of edges no two of which share an endpoint. Greedily adding any available edge can get stuck below the maximum, because an early choice may block two later compatible choices. The decision version is polynomial-time equivalent to computing a maximum matching size, because the computed maximum can be compared with $k$; the algorithm must therefore include a way to repair previous choices. [quotetheorem:6189] [citeproof:6189] [example: A Small Matching Instance] Let $A=\{a_1,a_2,a_3\}$, $B=\{b_1,b_2,b_3\}$, and \begin{align*} E=\{a_1b_1,a_1b_2,a_2b_2,a_3b_1,a_3b_3\}. \end{align*} Consider the current matching \begin{align*} M=\{a_1b_1,a_2b_2\}. \end{align*} The vertices incident to edges of $M$ are $a_1,b_1,a_2,b_2$, so the unmatched vertices are $a_3$ and $b_3$. The sequence \begin{align*} a_3,b_1,a_1,b_2,a_2 \end{align*} uses the edges $a_3b_1$, $a_1b_1$, $a_1b_2$, and $a_2b_2$. Their membership in $M$ is \begin{align*} a_3b_1\notin M,\quad a_1b_1\in M,\quad a_1b_2\notin M,\quad a_2b_2\in M. \end{align*} Thus the path is alternating. It is not augmenting, because it starts at the unmatched vertex $a_3$ but ends at $a_2$, and $a_2$ is matched by $a_2b_2\in M$. The edge $a_3b_3$ belongs to $E$, and neither $a_3$ nor $b_3$ is incident to an edge of $M$. Hence \begin{align*} a_3,b_3 \end{align*} is an augmenting path. Flipping along it adds $a_3b_3$ and removes no edge of $M$, so the new matching is \begin{align*} M'=M\cup\{a_3b_3\}=\{a_1b_1,a_2b_2,a_3b_3\}. \end{align*} Therefore $|M|=2$ and $|M'|=3$. Now change the graph by removing $a_3b_3$ and adding $a_2b_3$. The new edge set is \begin{align*} E'=\{a_1b_1,a_1b_2,a_2b_2,a_3b_1,a_2b_3\}. \end{align*} With the same matching $M=\{a_1b_1,a_2b_2\}$, the sequence \begin{align*} a_3,b_1,a_1,b_2,a_2,b_3 \end{align*} uses the edges $a_3b_1$, $a_1b_1$, $a_1b_2$, $a_2b_2$, and $a_2b_3$. Their membership in $M$ is \begin{align*} a_3b_1\notin M,\quad a_1b_1\in M,\quad a_1b_2\notin M,\quad a_2b_2\in M,\quad a_2b_3\notin M. \end{align*} The path starts at the unmatched vertex $a_3$ and ends at the unmatched vertex $b_3$, so it is augmenting. Flipping along this path removes the matched edges on the path and adds the unmatched edges on the path. The matched edges on the path are $a_1b_1$ and $a_2b_2$, and the unmatched edges on the path are $a_3b_1$, $a_1b_2$, and $a_2b_3$. Hence \begin{align*} M''=\bigl(M\setminus\{a_1b_1,a_2b_2\}\bigr)\cup\{a_3b_1,a_1b_2,a_2b_3\}. \end{align*} Substituting $M=\{a_1b_1,a_2b_2\}$ gives \begin{align*} M''=\bigl(\{a_1b_1,a_2b_2\}\setminus\{a_1b_1,a_2b_2\}\bigr)\cup\{a_3b_1,a_1b_2,a_2b_3\}. \end{align*} Since $\{a_1b_1,a_2b_2\}\setminus\{a_1b_1,a_2b_2\}=\varnothing$, this becomes \begin{align*} M''=\{a_3b_1,a_1b_2,a_2b_3\}. \end{align*} The three edges in $M''$ have distinct $A$-endpoints $a_3,a_1,a_2$ and distinct $B$-endpoints $b_1,b_2,b_3$, so $M''$ is a matching of size $3$. This shows that augmentation can require reorganising earlier matched edges, not merely adding one currently available edge. [/example] Arithmetic problems require care because the input length is the number of bits, not the magnitude of the integer. Trial division up to $\sqrt{N}$ is exponential in the bit length of $N$, so primality is not placed in $\mathrm{P}$ by elementary trial division. [definition: Primality Testing] The language $\mathrm{PRIMES}$ consists of binary encodings of integers $N \ge 2$ such that $N$ is prime. [/definition] The natural trial-division approach fails because the number of possible divisors is measured against the value of $N$, while complexity is measured against the bit length of $N$. The forward question is whether primality nevertheless has an efficient deterministic recognition method when the input is given in binary. The AKS primality test answers yes: primality belongs to $\mathrm{P}$. In these notes this result is used as a boundary marker rather than developed as a theorem with proof. It shows that deterministic polynomial time includes some algorithms whose efficiency is not visible from elementary search over candidate divisors. The limitation is important. A primality test only distinguishes prime inputs from composite inputs; it need not produce a nontrivial divisor when the answer is composite. Trial division gives a concrete failure mode for naive algorithms: on an $m$-bit integer $N$, checking divisors up to $\sqrt{N}$ may require about $2^{m/2}$ candidates, so the procedure is exponential in the input length even though it looks natural as an algorithm in the value of $N$. Randomized tests such as Miller-Rabin give fast and reliable evidence in practice, but by themselves they do not establish deterministic membership in $\mathrm{P}$ unless the error is removed or derandomised. Thus the theorem does not collapse the computational distinction between recognising compositeness, certifying compositeness, and finding a useful factorisation. The AKS hypotheses also depend on binary input length and exact modular arithmetic, so the result should not be read as saying that every arithmetic question phrased near primality inherits a polynomial-time algorithm. [remark: P Is Not the Same as Practical] A polynomial-time algorithm may have a large exponent or constants that make it unsuitable for realistic input sizes, while an exponential-time algorithm may perform well on small structured instances. The class $\mathrm{P}$ is a coarse asymptotic class, not a complete performance model. Its importance is that it is robust, mathematically tractable, and aligned with many major algorithmic successes. [/remark] ## How $\mathrm{P}$ Will Be Used Later Why spend so much time proving that familiar problems lie in $\mathrm{P}$ before introducing $\mathrm{NP}$? The reason is that $\mathrm{P}$ is the baseline class of efficiently decidable languages, so every later hardness statement is measured against it. The closure results mean that polynomial preprocessing, post-processing, and finite combinations do not change efficient decidability. The examples show several algorithmic patterns: graph search closes a reachable set, $\mathrm{2SAT}$ turns clauses into reachability inside an implication graph, Horn-SAT computes a monotone fixed point, matching repairs partial solutions by augmenting paths, and primality uses algebraic identities rather than factor search. When the course moves to $\mathrm{NP}$, reductions will ask whether apparently harder search problems can be transformed into each other while preserving polynomial time; the tractable fragments in this chapter serve as the comparison class. The same boundary will also matter when randomized algorithms appear: primality illustrates that a fast probabilistic test and a deterministic polynomial-time decider are different kinds of evidence, while reductions will track which form of efficiency is being preserved. Chapter 3 identifies polynomial time as the course's formal notion of efficient deterministic computation. To understand why NP is defined the way it is, we now move from always-on deterministic choice to nondeterministic branching and verification. # 4. Nondeterminism and $\mathrm{NP}$ Chapters 2 and 3 defined deterministic time classes and treated polynomial time as the class of efficiently decidable languages. This chapter introduces nondeterminism, a model in which a computation may branch into many possible continuations and accepts when at least one branch succeeds. The same class is then reformulated through certificates and verifiers, which is the perspective used in reductions and NP-completeness. The guiding question is not whether nondeterministic machines are realistic hardware. It is whether short, efficiently checkable evidence can certify membership in a language, even when finding that evidence appears hard. ## Branching Computations and Acceptance How should a Turing machine model a search procedure that has many choices, any one of which could lead to success? A deterministic machine has a single next move from each configuration, so it represents search by trying options in some chosen order. A nondeterministic machine instead records all possible local choices as branches of one computation tree. [definition: Nondeterministic Turing Machine] A nondeterministic Turing machine is a Turing machine $N$ with finite state set $Q$, tape alphabet $\Gamma$, and transition relation \begin{align*} \Delta \subseteq Q \times \Gamma \times Q \times \Gamma \times \{L,R,S\}. \end{align*} A computation branch of $N$ on an input $x$ is a finite or infinite sequence of configurations beginning with the initial configuration on $x$, where each successor configuration is obtained by applying one transition in $\Delta$ whose source state and scanned symbol match the current configuration. [/definition] This definition changes the shape of a computation but not the syntax of configurations. Once a computation has many branches, the next question is which branch outcomes should count as success for the whole machine. [definition: Accepting Branch] Let $N$ be a nondeterministic Turing machine and let $x$ be an input string. An accepting branch of $N$ on $x$ is a computation branch that reaches an accepting halting state. The machine $N$ accepts $x$ if there exists an accepting branch of $N$ on $x$. [/definition] Existential acceptance is the central convention. Rejection for decision problems requires that no branch accepts, so a time-bounded theory must also say how long unsuccessful and rejecting branches may run. [definition: Nondeterministic Time] Let $t: \mathbb N \to \mathbb N$ be a function. A nondeterministic Turing machine $N$ runs in time $t(n)$ if there exists a constant $c > 0$ such that, for every input $x$ of length $n$, every branch of $N$ halts within at most $c t(n)$ steps. [/definition] A language $L$ lies in $\operatorname{NTIME}(t(n))$ when some nondeterministic Turing machine running in time $t(n)$ accepts exactly the strings in $L$. The bound is placed on every branch, not on the total number of branches. To isolate the efficiently checkable search problems studied in this course, we now impose the same polynomial scale used for deterministic efficient computation. [definition: Nondeterministic Polynomial Time] The class $\operatorname{NP}$ is \begin{align*} \operatorname{NP} = \bigcup_{k \ge 1} \operatorname{NTIME}(n^k). \end{align*} [/definition] This definition suggests that NP contains problems whose solutions can be found by a polynomial number of nondeterministic choices. The next examples translate familiar search problems into this machine picture. [example: Hamiltonian Cycle By Nondeterministic Guessing] Let $L_{\mathrm{HC}}$ be the language of encodings of finite graphs that contain a Hamiltonian cycle. On input an encoded graph $G=(V,E)$ with $|V|=m$, a nondeterministic machine writes a sequence $(v_1,\dots,v_m)$, where each $v_i$ is intended to be a vertex of $G$. The deterministic check first verifies that every $v_i$ is a valid vertex name. It then checks distinctness by comparing every pair $v_i,v_j$ with $1 \le i < j \le m$; there are \begin{align*} \frac{m(m-1)}{2} \end{align*} such pairs, so this part uses at most quadratic many comparisons in $m$. Finally it checks the $m$ required adjacencies \begin{align*} v_1v_2,\ v_2v_3,\ \dots,\ v_{m-1}v_m,\ v_mv_1. \end{align*} The branch accepts if all vertex names are valid, all $m$ vertices are distinct, and all $m$ listed edges are present in $E$. If an accepting branch exists, then the accepted sequence visits each vertex exactly once and returns to its starting vertex through edges of $G$, so it is a Hamiltonian cycle. Conversely, if $G$ has a Hamiltonian cycle, one branch can guess its cyclic ordering and will pass all the checks. Since $m$ is at most the length of the graph encoding and the verifier performs only pairwise comparisons and edge-membership checks among the guessed vertices, every branch runs in polynomial time in the input length. [/example] The same pattern applies to many combinatorial problems: the nondeterministic steps guess a candidate object, and the deterministic part checks that the candidate satisfies the required local constraints. [example: Clique By Nondeterministic Guessing] For the language of pairs $(G,k)$ such that the graph $G=(V,E)$ has a clique of size at least $k$, a nondeterministic machine guesses a sequence $(v_1,\dots,v_k)$ of $k$ vertex names. The deterministic check first verifies that each $v_i$ is a valid vertex of $G$. It then checks distinctness by comparing $v_i$ and $v_j$ for every pair of indices with $1 \le i < j \le k$. The number of index pairs with $1 \le i < j \le k$ is \begin{align*} (k-1)+(k-2)+\cdots+2+1 = \frac{k(k-1)}{2}. \end{align*} Thus the number of distinctness comparisons is \begin{align*} \binom{k}{2} = \frac{k(k-1)}{2}. \end{align*} For each of these same $\binom{k}{2}$ pairs, the verifier checks whether the edge $v_i v_j$ belongs to $E$. The branch accepts exactly when all guessed names are valid vertices, the $k$ guessed vertices are distinct, and every pair among them is joined by an edge. If an accepting branch exists, then the accepted vertices form a set of $k$ distinct vertices with every pair adjacent, so $G$ has a clique of size $k$ and therefore a clique of size at least $k$. Conversely, if $G$ has a clique of size at least $k$, one branch can guess any $k$ vertices from that clique, and every distinctness and adjacency check will pass. Since an accepting branch has $k \le |V|$, and $|V|$ is at most the length of the input encoding, the $k$ vertex-validity checks together with the $\binom{k}{2}$ distinctness and adjacency checks give polynomial running time on each branch. [/example] These examples also show a possible misconception. Nondeterminism does not mean that an exponentially long object may be written down in one step; the branch itself has only polynomially many transitions when the machine is polynomial-time bounded. [example: Exponentially Long Certificate Fails The Time Bound] Consider a proposed certificate for a graph problem that lists one bit for every subset of the vertex set $V$. If $|V|=m$, then each vertex either belongs to a subset or does not, so the number of subsets is \begin{align*} 2 \cdot 2 \cdots 2 = 2^m, \end{align*} with one factor of $2$ for each of the $m$ vertices. Thus the proposed certificate has at least $2^m$ bits before any separators or auxiliary data are included. A verifier that checks this certificate must at least read the listed bits. Reading one bit takes one step up to a constant factor, so reading all entries takes at least \begin{align*} 2^m \end{align*} bit inspections. This is not bounded by any polynomial in $m$: for a fixed degree $d$, the ratio \begin{align*} \frac{2^m}{m^d} \end{align*} eventually exceeds every constant as $m$ grows, since exponential growth dominates polynomial growth. Therefore this certificate format cannot prove membership in $\operatorname{NP}$, because the verifier viewpoint requires polynomially bounded certificates and polynomial-time checking, while the branch viewpoint requires every nondeterministic branch to have polynomial length. [/example] ## Certificates and Polynomial-Time Verification The machine definition captures guessing operationally, but reductions are easier to write using explicit evidence. Fix a finite alphabet $\Sigma$. The verifier viewpoint asks: if the answer is yes, is there a short string that convinces a deterministic polynomial-time algorithm? [definition: Polynomial-Time Verifier] A polynomial-time verifier is a deterministic Turing machine computing a predicate \begin{align*} V : \Sigma^* \times \Sigma^* \to \{0,1\}. \end{align*} There is a polynomial $q$ such that, on every input pair $(x,c)$, the computation of $V(x,c)$ halts within at most $q(|x|+|c|)$ steps. [/definition] A language $L \subseteq \Sigma^*$ has polynomially bounded certificates for $V$ if there is a polynomial $p$ such that \begin{align*} x \in L \iff \exists c \in \Sigma^* \text{ with } |c| \le p(|x|) \text{ and } V(x,c)=1. \end{align*} The string $c$ is called a certificate or witness for $x$. The certificate is part of the input to the verifier, so its polynomial length bound matters as much as the verifier's running time. The verifier need not find the certificate; it only checks a proposed certificate. [example: Subset Sum Certificates] Let $L_{\mathrm{SS}}$ contain encodings of integers $a_1,\dots,a_m$ and a target integer $T$ for which some subset of the listed integers sums to $T$. A certificate is a bit string $c=c_1c_2\cdots c_m \in \{0,1\}^m$, where $c_i=1$ means that $a_i$ is selected and $c_i=0$ means that $a_i$ is not selected. The certificate has exactly $m$ bits, and the input encoding must already list the $m$ integers, so $m$ is bounded by the input length up to the fixed encoding convention. Given $(a_1,\dots,a_m,T)$ and $c$, the verifier first checks that $c$ has length $m$ and that every symbol of $c$ is either $0$ or $1$. It then initializes $S_0=0$ and, for each $i$ from $1$ to $m$, computes \begin{align*} S_i = S_{i-1}+c_i a_i. \end{align*} The first steps are \begin{align*} S_1 = 0+c_1a_1 = c_1a_1. \end{align*} Then \begin{align*} S_2 = S_1+c_2a_2 = c_1a_1+c_2a_2. \end{align*} Continuing the same recurrence through the last index gives \begin{align*} S_m = c_1a_1+c_2a_2+\cdots+c_ma_m = \sum_{i=1}^m c_i a_i. \end{align*} Since each $c_i$ is either $0$ or $1$, the term $c_i a_i$ equals $a_i$ when $a_i$ is selected and equals $0$ when $a_i$ is not selected. Thus $S_m$ is exactly the sum of the selected integers, and the verifier accepts exactly when \begin{align*} S_m = T. \end{align*} Using binary arithmetic, the verifier performs $m$ additions and multiplications by bits on integers whose bit-length remains bounded by a polynomial in the input length. Therefore the certificate length and the checking time are polynomial in the input length, so subset-sum yes-instances have polynomially checkable certificates. [/example] The verifier formulation separates two different tasks. Search asks for a witness, while decision asks whether a witness exists; NP classifies the decision version by the existence of short checkable witnesses. [example: SAT Certificates] For a Boolean formula $\varphi$ in variables $x_1,\dots,x_m$, a certificate is a truth assignment \begin{align*} a=(a_1,\dots,a_m)\in \{0,1\}^m, \end{align*} where $a_i$ is the proposed truth value of $x_i$. The verifier first checks that the certificate has exactly $m$ bits and that each entry is either $0$ or $1$. It then evaluates $\varphi$ from its inputs upward. Each occurrence of $x_i$ receives the value $a_i$, and each occurrence of $\neg x_i$ receives the value $1-a_i$. For a conjunction gate with input values $u,v\in\{0,1\}$, the output is $1$ exactly when $u=1$ and $v=1$. For a disjunction gate with input values $u,v\in\{0,1\}$, the output is $1$ exactly when at least one of $u$ or $v$ equals $1$. Thus, for example, the clause \begin{align*} x_1 \vee \neg x_3 \vee x_5 \end{align*} has value \begin{align*} a_1 \vee (1-a_3) \vee a_5 \end{align*} under the assignment $a$, and the clause is satisfied exactly when this value is $1$. If $\varphi$ is in conjunctive normal form with clauses $C_1,\dots,C_r$, the verifier computes the value $b_j\in\{0,1\}$ of each clause $C_j$ under $a$. It then evaluates \begin{align*} b_1 \wedge b_2 \wedge \cdots \wedge b_r. \end{align*} The verifier accepts exactly when this final value is $1$, which is equivalent to every clause having value $1$. The certificate length is $m$, and the input formula must already name its variables and contain its symbols, so $m$ is at most the input length. During evaluation, each variable occurrence, negation, and connective in $\varphi$ is processed a bounded number of times. Therefore satisfiable formulas have polynomial-size certificates that can be checked in polynomial time. [/example] A certificate must encode enough information to be checked without hiding an expensive search step inside the verifier. This is why examples normally specify the certificate format, not only its intended meaning. [example: Invalid Certificate For Unsatisfiability] For the language $\mathrm{UNSAT}$ of unsatisfiable Boolean formulas, consider the proposed certificate consisting only of the sentence "no assignment satisfies $\varphi$." If $\varphi$ has variables $x_1,\dots,x_m$, then each variable has two possible truth values, so the number of assignments is \begin{align*} 2 \cdot 2 \cdots 2 = 2^m, \end{align*} with one factor of $2$ for each of the $m$ variables. That sentence does not tell the verifier which finite computation to check. To verify it by the most direct method, the verifier must examine every assignment \begin{align*} (a_1,\dots,a_m)\in\{0,1\}^m \end{align*} and evaluate $\varphi$ under that assignment. Even if evaluating $\varphi$ on one assignment took only one unit of work, the verifier would still have to perform at least \begin{align*} 2^m \end{align*} assignment checks. This lower bound is not polynomial in $m$: for any fixed degree $d$, \begin{align*} \frac{2^m}{m^d} \end{align*} grows without bound as $m$ increases, so no constant multiple of $m^d$ bounds $2^m$ for all sufficiently large $m$. Thus the proposed sentence is not a polynomially checkable certificate for unsatisfiability. This does not prove $\mathrm{UNSAT}\notin\operatorname{NP}$; it only shows that this particular certificate format does not establish membership in $\operatorname{NP}$. [/example] ## Equivalence of the Two Definitions The two definitions of NP look different: one uses branching computation, and the other uses deterministic checking of a second input. The key structural result is that they define the same class, because a nondeterministic branch can be encoded by its sequence of choices, and a verifier can be simulated by guessing its certificate. [quotetheorem:6191] [citeproof:6191] This theorem justifies switching between machine language and certificate language, but both polynomial bounds are essential. If the certificate were allowed to be exponentially long, a witness could contain a complete truth table or exhaustive search transcript, and merely reading it could exceed polynomial time. If the verifier were not polynomial-time bounded, the certificate could be empty and the verifier could perform the original hard search itself. The theorem therefore does not say that witnesses are easy to find; it says that yes-instances have short evidence that can be checked efficiently. In later chapters, reductions will almost always use verifiers and witnesses, while closure properties and hierarchy questions often return to machines. [example: Branch Encoding As A Certificate] Let $N$ be a nondeterministic machine which, on inputs of length $n$, has at most three legal transition choices at each step and halts on every branch within $n^2$ steps. Encode a branch by a word \begin{align*} c=c_1c_2\cdots c_\ell \in \{0,1,2\}^\ell \end{align*} with $\ell \le n^2$, where $c_i$ names the transition choice to use at step $i$. The certificate length is polynomially bounded, since \begin{align*} |c|=\ell \le n^2. \end{align*} A deterministic verifier starts from the initial configuration of $N$ on input $x$. At step $i$, it looks at the current state and scanned tape symbol, lists the legal outgoing transitions in a fixed order, and checks whether $c_i\in\{0,1,2\}$ selects one of those legal transitions. If the selected transition is legal, the verifier applies it and obtains the next configuration; if it is not legal, the verifier rejects. After processing the listed choices, the verifier accepts exactly when the simulated configuration is an accepting halting configuration. The verifier performs at most $\ell$ simulated transitions, and \begin{align*} \ell \le n^2, \end{align*} so the simulation follows only polynomially many steps. Thus the nondeterministic branch has been turned into a polynomial-size certificate whose deterministic check reproduces that branch one transition at a time. [/example] The theorem also explains why NP is insensitive to small implementation details in the nondeterministic model. Changing from one reasonable Turing-machine convention to another changes the encoding of branches and simulations by at most a polynomial overhead. [remark: Witnesses Are Not Unique] An input in an NP language may have many witnesses, one witness, or none. The definition only requires existence of at least one accepting witness for yes-instances and no accepting witness for no-instances. Counting or finding witnesses leads to related complexity questions beyond the present chapter. [/remark] ## The Containment $\operatorname{P} \subseteq \operatorname{NP}$ Where do deterministic polynomial-time problems sit relative to nondeterministic polynomial-time problems? A deterministic polynomial-time algorithm is a special case of a nondeterministic one that never branches, and it also has a verifier that ignores its certificate. [quotetheorem:6172] [citeproof:6172] This containment depends on the polynomial-time hypothesis: if arbitrary deterministic time were allowed, the same argument would place every decidable language into the analogue of NP, erasing the intended distinction between efficient and inefficient decision procedures. The theorem also does not prove $\operatorname{P}=\operatorname{NP}$, because it gives only one inclusion and does not convert a nondeterministic polynomial-time search into a deterministic polynomial-time algorithm. Its role is to set the baseline for later reductions: NP contains the efficiently decidable languages, and the interesting question is how much larger it may be. [remark: The Meaning Of P Versus NP] The question $\operatorname{P} = \operatorname{NP}$ asks whether every language with efficiently checkable yes-certificates also has an efficient deterministic decision algorithm. It is not asking whether solutions can be checked faster than they can be found for every individual instance. The statement concerns worst-case polynomial-time algorithms for entire languages. [/remark] The examples from this chapter provide the basic template for NP membership proofs: state the certificate, bound its length, and describe the deterministic polynomial-time check. This template will be used repeatedly when reductions show that certain NP languages are at least as hard as all the others. Chapter 4 introduced nondeterminism as a way to describe computations that succeed when one branch finds a witness. That witness-based view is exactly what makes polynomial-time reductions powerful, so the next chapter develops reductions as the language for comparing problems. # 5. Polynomial-Time Reductions Polynomial-time reductions are the mechanism that lets us compare decision problems without designing a fresh algorithm for each one. Chapters 3 and 4 defined the classes $P$ and $NP$ in terms of machines and verifiers; this chapter explains how to move instances from one language to another while preserving yes/no answers. The guiding idea is that a fast transformation from $A$ to $B$ makes $A$ no harder than $B$, up to polynomial-time preprocessing. ## Transforming One Decision Problem into Another When two decision problems ask different questions, we need a disciplined way to say that solving one would also solve the other. The transformation must work on every input string, must preserve membership, and must be computable within the same polynomial-time standard used to define $P$ and $NP$. [definition: Polynomial-Time Computable Function] Let $\rho, \tau$ be finite alphabets. A function $f:\rho^* \to \tau^*$ is polynomial-time computable if there is a deterministic Turing machine $M$ and a polynomial $p$ such that, for every $x \in \rho^*$, the machine $M$ halts with $f(x)$ on its output tape within $p(|x|)$ steps. [/definition] The output length of such a function is automatically polynomially bounded, because a Turing machine can write at most one new output symbol per step. Thus polynomial-time transformations cannot hide an exponential-size object inside the output string. [example: Padding Cannot Create Exponential Output] Let $M$ compute $f:\{0,1\}^*\to\{0,1\}^*$ in at most $n^3+5$ steps on every input of length $n$. If $x$ has length $n$, then during those $n^3+5$ steps the machine can write at most one new output symbol per step. If the output tape begins with at most $c$ symbols already present under the fixed starting convention for $M$, then \begin{align*} |f(x)| \le c+1\cdot(n^3+5). \end{align*} Since $c+1\cdot(n^3+5)=n^3+(c+5)$, we get \begin{align*} |f(x)| \le n^3+(c+5). \end{align*} The constant $c$ depends only on the machine model and not on $x$, so the output length is bounded by a polynomial in the original input length. Thus this polynomial-time mapping may increase the instance size, but it cannot produce outputs of length $2^n$ for all sufficiently large $n$. [/example] With this size control in place, we can define the reduction relation used throughout NP-completeness. A weaker informal comparison, such as saying that two problems both involve graphs or both have exponentially many possible certificates, would not explain how an algorithm for one problem can be reused for the other. The reduction must give an explicit preprocessing map whose yes-instances and no-instances match exactly. It is a many-one reduction because a single call to the target decision problem is enough after the instance has been transformed. [definition: Polynomial-Time Many-One Reduction] Let $A \subseteq \rho^*$ and $B \subseteq \tau^*$ be languages. We say that $A$ polynomial-time many-one reduces to $B$, written $A \le_m^p B$, if there exists a polynomial-time computable function $f:\rho^* \to \tau^*$ such that, for every $x \in \rho^*$, \begin{align*} x \in A \iff f(x) \in B. \end{align*} The function $f$ is called a polynomial-time many-one reduction from $A$ to $B$. [/definition] The reduction is part algorithm and part logical equivalence. The algorithmic condition says we can build the new instance efficiently; the equivalence condition says the new instance has exactly the same yes/no answer. [remark: Direction of a Reduction] The notation $A \le_m^p B$ means that an algorithm for $B$ gives an algorithm for $A$. The arrow points from the problem being solved by preprocessing to the problem used as the oracle-like target. [/remark] The direction convention becomes useful only if reductions can be chained. Since NP-completeness proofs often proceed through a sequence of intermediate problems, we need a theorem saying that two correct transformations can be combined into a single correct transformation. [quotetheorem:6193] [citeproof:6193] Transitivity justifies treating $\le_m^p$ as a relative difficulty relation, but both hypotheses are doing real work. For a concrete failure of answer preservation, take $A=\{1\}$, $B=\{0\}$, and let $f(1)=1$; even if $B$ reduces correctly to some $C$, the composed map has already lost the yes-instance of $A$. For a concrete failure of the time bound, suppose a first-stage transformation writes $1^{2^{|x|}}$ and a second-stage decider runs in time linear in its input length; the second stage is polynomial in the intermediate string but exponential in the original input length. The theorem also says only that certified reductions compose; it does not produce either component reduction. The next result gives the operational content: once the target problem is in $P$, the source problem is also in $P$. [quotetheorem:6194] [citeproof:6194] The direction is essential: $A \le_m^p B$ lets an algorithm for $B$ solve $A$, but it does not let an algorithm for $A$ solve $B$. A concrete language-level example is $A=\varnothing$ and $B=\{0\}$ over alphabet $\{0,1\}$. The constant map $f(x)=1$ proves $A \le_m^p B$, and $A$ has a polynomial-time decider that always rejects; however, this decider gives no way to decide whether an input belongs to $B$, since it ignores the distinction between $0$ and all other strings. The assumption $B \in P$ is also essential. For a concrete counterexample, let $B$ be any undecidable language and set $A=B$. The identity map proves $A \le_m^p B$, but the conclusion $A \in P$ is false; the missing ingredient is precisely a polynomial-time decider for the target. Without a fast decider for $B$, preprocessing has nowhere useful to send the input. The result does not prove that $B$ is hard, because even an easy language may have easier languages reducing to it. The contrapositive is often the informal message: if $A$ is believed to require super-polynomial time and $A \le_m^p B$, then $B$ should not be expected to have a polynomial-time algorithm either. This is not a proof of intractability, but it is the basic evidence pattern behind NP-hardness. ## Relative Difficulty and Completeness Reductions become a theory of difficulty when we compare a problem to every problem in a complexity class. Comparing only two isolated problems is too local: a problem might be harder than one toy language and still fail to represent the full class. Conversely, a problem outside the class might be so expressive that everything reduces to it, while still not being a legitimate member of the class being studied. The aim is to identify representative problems whose efficient solvability would collapse the whole class down to polynomial time. [definition: Hardness for a Complexity Class] Let $\mathcal C$ be a class of languages and let $B$ be a language. The language $B$ is $\mathcal C$-hard under polynomial-time many-one reductions if, for every $A \in \mathcal C$, we have $A \le_m^p B$. [/definition] Hardness is an external property: it says that all problems in the class can be transformed into $B$. To single out the problems that genuinely represent the class rather than merely dominate it from outside, we also require the target problem itself to belong to the class. [definition: Completeness for a Complexity Class] Let $\mathcal C$ be a class of languages. A language $B$ is $\mathcal C$-complete under polynomial-time many-one reductions if $B \in \mathcal C$ and $B$ is $\mathcal C$-hard under polynomial-time many-one reductions. [/definition] Completeness combines representation and membership. A complete problem is at least as hard as every problem in the class, while still being a member of that class. [example: What NP-Hard Means Operationally] Suppose $B$ is $NP$-hard and let $D_B$ be a polynomial-time decider for $B$. To see what this means operationally, fix an arbitrary language $A \in NP$. Since $B$ is $NP$-hard, there is a polynomial-time many-one reduction $f_A$ from $A$ to $B$, so for every input string $x$, \begin{align*} x \in A \iff f_A(x) \in B. \end{align*} An algorithm for $A$ can therefore compute $y=f_A(x)$ and then run $D_B$ on $y$; it accepts exactly when $D_B$ accepts $y$. If $f_A$ runs in time $p(n)$ and $D_B$ runs in time $q(m)$, then the reduction output has length at most $p(|x|)+c$ for a fixed machine-model constant $c$. The total running time is bounded by \begin{align*} p(|x|)+q(|f_A(x)|) &\le p(|x|)+q(p(|x|)+c), \end{align*} which is polynomial in $|x|$. Equivalently, this is exactly the content of *Algorithms Transfer Backwards Along Reductions*: from $A \le_m^p B$ and $B \in P$, we get $A \in P$. Since $A \in NP$ was arbitrary, one polynomial-time decider for the $NP$-hard language $B$ would give polynomial-time deciders for every language in $NP$. [/example] This observation is one of the main reasons NP-hardness is treated as evidence of intractability. The obstruction is global: a polynomial-time algorithm for one NP-hard target would not merely solve that target, but would combine with the reductions from every language in $\mathrm{NP}$. The formal statement isolates this consequence so later hardness proofs can use it without repeating the reduction-composition argument. [quotetheorem:6195] [citeproof:6195] The theorem should be read together with the open problem $P$ versus $NP$, and both hypotheses matter. NP-hardness alone does not imply membership in $P$; many NP-hard problems are believed not to have polynomial-time algorithms, and undecidable NP-hard languages cannot be in $P$. Membership in $P$ alone says nothing about $NP$, since simple languages such as $\varnothing$ or $\{0,1\}^*$ are in $P$ but do not encode all NP problems. If $P \ne NP$, no NP-hard language can lie in $P$; conversely, finding one NP-hard language in $P$ would settle the equality $P=NP$. [remark: Hardness Outside NP] A language may be $NP$-hard without being in $NP$. For example, an undecidable language can be NP-hard if every language in $NP$ reduces to it, but such a language is not an NP-complete decision problem because completeness also requires membership in $NP$. [/remark] The practical proof pattern is therefore two-part. To prove a language $B$ is NP-complete, show first that $B \in NP$, then choose a known NP-hard language $A$ and construct a polynomial-time reduction $A \le_m^p B$. ## Graph Reductions: Independent Set, Clique, and Vertex Cover The most useful reductions are not string manipulations for their own sake; they encode one combinatorial structure as another. Graph problems give the standard first examples because the transformations are simple enough to inspect and rich enough to show how yes/no answers are preserved. [definition: Independent Set Decision Problem] The language $\textsc{Independent-Set}$ consists of encodings $\langle G,k\rangle$ such that $G=(V,E)$ is a finite undirected graph, $k \in \mathbb N$, and there exists a subset $S \subseteq V$ with $|S|\ge k$ such that no two distinct vertices in $S$ are adjacent in $G$. [/definition] An independent set is a collection of mutually non-adjacent vertices. To compare it with another graph problem, we introduce the complementary notion: a set of vertices that are all adjacent to each other. [definition: Clique Decision Problem] The language $\textsc{Clique}$ consists of encodings $\langle G,k\rangle$ such that $G=(V,E)$ is a finite undirected graph, $k \in \mathbb N$, and there exists a subset $S \subseteq V$ with $|S|\ge k$ such that every two distinct vertices in $S$ are adjacent in $G$. [/definition] The definition of clique mirrors the preceding definition of independent set, except that it requires every internal pair to be an edge rather than a non-edge. The next theorem is needed because it turns that mirror relationship into a formal polynomial-time reduction, so later hardness information can be transferred from independent set to clique. [quotetheorem:6196] [citeproof:6196] This reduction also illustrates how instance size is tracked. If $G$ has $n$ vertices, then $\overline{G}$ has the same $n$ vertices and at most $\binom n2$ edges, so standard adjacency-list or adjacency-matrix encodings grow by at most a polynomial factor. Each hypothesis has a visible job. If the map forgot to complement the graph, the edgeless graph on two vertices with $k=2$ would be sent to a no-instance of $\textsc{Clique}$ although it is a yes-instance of $\textsc{Independent-Set}$. If malformed input strings were sent to a one-vertex clique with threshold $1$, they would become target yes-instances even though they are not source encodings. If the construction appended an exponentially long certificate to the complemented graph, answer preservation could remain true while polynomial-time computability would fail. The theorem does not by itself prove that $\textsc{Clique}$ is NP-hard; that conclusion requires independent knowledge that $\textsc{Independent-Set}$ is NP-hard or a reduction from some other NP-hard problem. Nor does the reduction identify the fastest algorithm for either problem, since it only transfers upper bounds backwards and hardness evidence forwards. Its role is structural: it shows that the two graph questions have the same obstruction after complementing adjacency. [example: Complementing a Four-Vertex Graph] Let $G=(V,E)$ with $V=\{1,2,3,4\}$ and $E=\{\{1,2\},\{2,3\}\}$. For $S=\{1,3,4\}$, the unordered pairs of distinct vertices in $S$ are \begin{align*} \{1,3\},\quad \{1,4\},\quad \{3,4\}. \end{align*} None of these three pairs is equal to $\{1,2\}$ or $\{2,3\}$, so no two distinct vertices of $S$ are adjacent in $G$. Hence $S$ is an independent set in $G$, and $|S|=3$. The complement graph $\overline{G}$ has the same vertex set, and its edge set contains exactly the unordered pairs of distinct vertices that are not in $E$. The six unordered pairs in $V$ are \begin{align*} \{1,2\},\{1,3\},\{1,4\},\{2,3\},\{2,4\},\{3,4\}, \end{align*} so removing $\{1,2\}$ and $\{2,3\}$ gives \begin{align*} E(\overline{G}) =\{\{1,3\},\{1,4\},\{2,4\},\{3,4\}\}. \end{align*} In particular, the three internal pairs of $S$ all lie in $E(\overline{G})$: \begin{align*} \{1,3\},\{1,4\},\{3,4\}\in E(\overline{G}). \end{align*} Thus $S=\{1,3,4\}$ is a clique of size $3$ in $\overline{G}$. This instance realizes the reduction $\langle G,3\rangle \mapsto \langle \overline{G},3\rangle$: the independent set of size $3$ in $G$ becomes the same vertex set viewed as a clique of size $3$ in the complement graph. [/example] Complementing a graph changes adjacency, but many reductions keep the underlying object and alter the numerical threshold. Vertex cover is the companion problem for independent set because choosing vertices that hit all edges is the same as excluding a set with no internal edge. [definition: Vertex Cover Decision Problem] The language $\textsc{Vertex-Cover}$ consists of encodings $\langle G,t\rangle$ such that $G=(V,E)$ is a finite undirected graph, $t \in \mathbb N$, and there exists a subset $C \subseteq V$ with $|C|\le t$ such that every edge of $G$ has at least one endpoint in $C$. [/definition] A vertex cover hits every edge, so the vertices outside the cover cannot contain both endpoints of any edge. This complement relation gives the next reduction, and the key bookkeeping point is that a small cover of size at most $t$ corresponds to a large independent set of size at least $|V|-t$. [quotetheorem:5832] [citeproof:5832] The parameter change is as important as the graph transformation. The case $t>|V|$ shows why a reduction must be total on encoded inputs rather than only correct on the most interesting parameter range; using the fixed yes-instance also avoids relying on whether $0$ is allowed as a threshold in the chosen convention for $\mathbb N$. A wrong threshold map can break the reduction even on ordinary graphs: on a triangle with $t=2$, $\textsc{Vertex-Cover}$ is yes, but mapping to $\langle G,2\rangle$ would ask for an independent set of size $2$, which does not exist. Malformed inputs must also be handled deliberately; sending a non-encoding to a fixed yes-instance would turn a source no-instance into a target yes-instance. This direction transfers a polynomial-time algorithm for $\textsc{Independent-Set}$ into one for $\textsc{Vertex-Cover}$; it does not, by itself, transfer algorithms the other way. It also preserves the optimisation complement exactly when $0\le t\le |V|$, where covers of size at most $t$ correspond to independent sets of size at least $|V|-t$. A reduction must preserve the threshold in the decision question, not merely relate the underlying optimisation objects. [example: From Cover Size to Independent-Set Size] Let $G=(V,E)$ have $|V|=10$, and suppose the vertex-cover instance is $\langle G,4\rangle$, asking for a vertex cover of size at most $4$. The reduction keeps the graph $G$ and changes the threshold to \begin{align*} |V|-4=10-4=6. \end{align*} Thus the produced independent-set instance is $\langle G,6\rangle$, asking for an independent set of size at least $6$. If $C\subseteq V$ is a vertex cover with $|C|\le 4$, set $S=V\setminus C$. Since $C\subseteq V$, the complement has size \begin{align*} |S|=|V\setminus C|=|V|-|C|. \end{align*} Using $|V|=10$ and $|C|\le 4$ gives \begin{align*} |S|=10-|C|\ge 10-4=6. \end{align*} No edge can have both endpoints in $S$: if $\{u,v\}\in E$ had $u,v\in S$, then neither $u$ nor $v$ would lie in $C$, contradicting that $C$ is a vertex cover. Hence $S$ is an independent set of size at least $6$. Conversely, if $S\subseteq V$ is an independent set with $|S|\ge 6$, set $C=V\setminus S$. Since $S\subseteq V$, \begin{align*} |C|=|V\setminus S|=|V|-|S|. \end{align*} Using $|V|=10$ and $|S|\ge 6$ gives \begin{align*} |C|=10-|S|\le 10-6=4. \end{align*} Every edge has at least one endpoint in $C$: if some edge had both endpoints outside $C$, then both endpoints would lie in $S$, contradicting that $S$ is independent. Therefore covers of size at most $4$ correspond exactly to independent sets of size at least $6$ under the threshold change $t\mapsto |V|-t$. [/example] Together these examples show the reduction method in its standard form: define a mapping on encoded instances, prove answer preservation in both directions, and check that the output size and construction time are polynomial. Later NP-completeness proofs use the same template with more elaborate encodings, but the logical burden remains the same. The reduction framework now lets us transport instances and preserve yes/no answers in polynomial time. With that machinery, we can prove that one language captures the difficulty of all of NP, beginning with SAT. # 6. $\mathrm{NP}$-Completeness and $\mathrm{SAT}$ This chapter turns the definition of NP into a method for proving lower-bound evidence. Building on Chapter 5's reduction framework, the central question is how to compare search problems by efficient transformations: if every problem in NP can be translated into one fixed language, then that language captures the full difficulty of nondeterministic polynomial-time verification. Satisfiability is the first such universal problem, because a Boolean formula can encode a whole accepting computation of a machine by enforcing many small local checks. ## Reductions and Complete Problems When two decision problems look unrelated, a polynomial-time reduction gives a precise way to say that solving one would also solve the other. The problem is to preserve yes-instances and no-instances while allowing only efficient preprocessing. [definition: Polynomial-Time Many-One Reduction] Let $A,B \subseteq \{0,1\}^*$ be languages. A polynomial-time many-one reduction from $A$ to $B$ is a function $f: \{0,1\}^* \to \{0,1\}^*$ such that: 1. there is a deterministic Turing machine computing $f(x)$ in time polynomial in $|x|$; 2. for every $x \in \{0,1\}^*$, \begin{align*} x \in A \iff f(x) \in B. \end{align*} [/definition] We write $A \le_m^p B$ when such a reduction exists. The reduction is useful only because it preserves the answer while keeping the preprocessing efficient. The next question is whether this comparison behaves correctly with the class $\mathrm{P}$. If $B$ already has a polynomial-time decider, then computing $f(x)$ and deciding the resulting instance of $B$ should give a polynomial-time procedure for $A$; the result below records the closure argument that the output length and the composed running time remain polynomial. [quotetheorem:6173] [citeproof:6173] This theorem is the basic engine behind hardness arguments. The polynomial-time hypothesis on the reduction is essential: if $f$ were allowed to take exponential time, then the preprocessing step could already be doing the computationally difficult work, and the conclusion that $A$ has a polynomial-time algorithm would no longer follow. The theorem also has a one-way character: it transfers an efficient algorithm for $B$ back to $A$, but it does not say that $B$ is hard unless many relevant languages reduce to it. This motivates a definition that records when a single target problem is at least as hard as every problem with polynomial-time verifiable certificates. [definition: NP-Hard Language] A language $B \subseteq \{0,1\}^*$ is NP-hard if for every language $A \in \mathrm{NP}$, $A \le_m^p B$. [/definition] NP-hardness is an external comparison with all of NP; it does not require the target language itself to belong to NP. This distinction matters because an undecidable language may be NP-hard without being a meaningful representative of NP's internal difficulty. For decision problems inside NP, the strongest status is completeness: the problem is hard enough to encode every NP verification problem, but still has certificates that can be checked in polynomial time. [definition: NP-Complete Language] A language $B \subseteq \{0,1\}^*$ is NP-complete if: 1. $B \in \mathrm{NP}$; 2. $B$ is NP-hard. [/definition] The definitions are useful because they concentrate the unresolved question $\mathrm{P}$ versus $\mathrm{NP}$ into any complete language. The nontrivial issue is that completeness has two separate parts: membership in $\mathrm{NP}$ and hardness for all of $\mathrm{NP}$. Together they make a complete problem a precise test case for the class equality, because one direction uses the fact that the problem belongs to $\mathrm{NP}$ and the other uses the fact that every $\mathrm{NP}$ problem reduces to it. [quotetheorem:6197] [citeproof:6197] This theorem isolates why complete problems are central rather than merely difficult-looking. Both hypotheses in NP-completeness are used: $B\in\mathrm{NP}$ gives the forward implication from $\mathrm{P}=\mathrm{NP}$, while NP-hardness gives the reverse implication from $B\in\mathrm{P}$. The theorem still does not prove $B\notin\mathrm{P}$; it says that proving or disproving polynomial-time solvability for $B$ is exactly as hard as resolving the whole class equality. The first examples of reductions are often bookkeeping exercises, but they fix the direction of reasoning. To prove that a new problem $C$ is NP-hard after SAT is known NP-complete, we reduce SAT to $C$, not $C$ to SAT. [example: Direction of a Hardness Reduction] Suppose $C$ has a polynomial-time decider $M_C$, and suppose $\mathrm{SAT}\le_m^p C$ by a polynomial-time many-one reduction $f$. On input a Boolean formula $\varphi$, compute $f(\varphi)$, run $M_C$ on $f(\varphi)$, and accept exactly when $M_C$ accepts. For correctness, the reduction property gives \begin{align*} \varphi\in \mathrm{SAT} \iff f(\varphi)\in C, \end{align*} and $M_C$ accepts $f(\varphi)$ exactly when $f(\varphi)\in C$. Thus this composed algorithm accepts exactly the satisfiable formulas. If computing $f(\varphi)$ takes time at most $|\varphi|^a$ and $M_C$ runs in time at most $m^b$ on inputs of length $m$, then the total time is polynomial in $|\varphi|$, since $|f(\varphi)|$ is bounded by a polynomial in $|\varphi|$. Therefore a polynomial-time algorithm for $C$ would give a polynomial-time algorithm for SAT. This is why the hardness reduction must start from the already hard problem and point toward the new target: $\mathrm{SAT}\le_m^p C$ transfers an algorithm for $C$ back to SAT, while $C\le_m^p \mathrm{SAT}$ would only show that $C$ can be solved using SAT as a subroutine. [/example] ## Boolean Satisfiability as a Verification Problem To find a first complete problem, we need a language whose instances can describe arbitrary polynomial-time verification. Boolean formulas are suitable because their variables can stand for small facts about a computation, and their clauses can enforce consistency between neighbouring facts. [definition: Boolean Formula] A Boolean formula over variables $x_1,\dots,x_m$ is an expression built from these variables using the connectives $\neg$, $\wedge$, and $\vee$ together with parentheses. [/definition] The semantic question asks whether some assignment of truth values makes the formula evaluate to true. This gives the language that will later be proved complete, and first we must place it inside NP by identifying its certificate and verifier. [definition: Satisfiability] The language $\mathrm{SAT}$ consists of encodings of Boolean formulas $\varphi$ for which there exists an assignment $a \in \{0,1\}^m$ such that $\varphi(a)=1$. [/definition] A proposed satisfying assignment is short enough to serve as an NP certificate, but certificate length alone is not enough. Before SAT can be used as the central complete problem, we must first verify that it belongs to $\mathrm{NP}$. The theorem below isolates the required verifier: it checks an assignment by evaluating the encoded formula in polynomial time, establishing efficient verifiability before the later Cook-Levin argument proves hardness. [quotetheorem:6198] [citeproof:6198] This membership result uses two special features of SAT: a satisfying assignment has length at most the number of variables, and formula evaluation is polynomial in the input size. If formulas were replaced by exponentially compressed circuits or by an encoding whose evaluation required super-polynomial work, the same certificate idea would not by itself place the language in NP. The result also proves only membership, not hardness; the difficult part is showing that every NP verifier can be encoded as a formula. The representation uses variables for a space-time tableau of the machine. ## Computation Tableaux The Cook-Levin construction begins with a nondeterministic polynomial-time machine $M$ and an input $x$. The problem is to build, in polynomial time, a formula that is satisfiable exactly when some accepting branch of $M$ on $x$ exists. [definition: Computation Tableau] Let $M$ be a Turing machine with tape alphabet $\Gamma$, state set $Q$, start state $q_0$, and accepting state $q_{\mathrm{acc}}$. For a time bound $T$, a computation tableau is a rectangular array of height $T+1$ whose row $t$ encodes the configuration of $M$ at time $t$. [/definition] A row contains the tape contents, the head position, and the current state. We choose a polynomial bound $T=p(|x|)$ for the accepting computation, and choose a tape bound $S$ large enough to cover every position the head can reach in $T$ steps; for a one-tape machine, $S$ may be taken to be $|x|+T+1$. To turn the tableau into a Boolean formula, the next step is to introduce propositional variables for every possible local entry. [definition: Tableau Variables] For each time $0\le t\le T$, tape cell $1\le i\le S$, and symbol-or-state marker $a$ from a finite alphabet $\Delta$, introduce a Boolean variable $X_{t,i,a}$. [/definition] The intended meaning is that $X_{t,i,a}=1$ says cell $i$ at time $t$ contains marker $a$. Here $\Delta$ combines ordinary tape symbols with marked symbols that record the unique position of the head and the current state. [example: Variables for a Short Computation] Consider a machine with tape symbols $0,1,\blank$ and states $q_0,q_1,q_{\mathrm{acc}}$, and examine $T=2$ time steps on three tape cells. The tableau has rows $t=0,1,2$ and cells $i=1,2,3$. If ordinary tape markers are $0,1,\blank$ and head-state markers record a state together with the scanned symbol, then the marker alphabet is \begin{align*} \Delta = \{0,1,\blank\} \cup \{(q,s):q\in\{q_0,q_1,q_{\mathrm{acc}}\},\ s\in\{0,1,\blank\}\}. \end{align*} Thus the variables have the form $X_{t,i,a}$ with \begin{align*} t\in\{0,1,2\},\qquad i\in\{1,2,3\},\qquad a\in\Delta. \end{align*} For example, $X_{0,1,0}=1$ says that row $0$, cell $1$ contains the ordinary tape symbol $0$. Also, $X_{1,2,(q_1,1)}=1$ says that at time $1$ the head is on cell $2$, the scanned symbol is $1$, and the current state is $q_1$. Since \begin{align*} |\Delta| = |\{0,1,\blank\}|+ |\{q_0,q_1,q_{\mathrm{acc}}\}|\cdot|\{0,1,\blank\}| = 3+3\cdot 3 = 12, \end{align*} this toy tableau uses \begin{align*} (T+1)\cdot 3\cdot |\Delta| = 3\cdot 3\cdot 12 = 108 \end{align*} Boolean variables. The point is that a whole computation history is represented by many small yes-or-no facts, one for each possible marker at each time and cell. [/example] The variable family is too permissive by itself: an assignment could mark a cell with several symbols or with none. A satisfying assignment must be forced to describe exactly one marker in each position of the tableau, so the next definition isolates the repeated local constraint that enforces this. [definition: Exactly-One Cell Constraint] For fixed $t$ and $i$, the exactly-one constraint is the conjunction of: \begin{align*} \bigvee_{a\in \Delta} X_{t,i,a} \end{align*} and, for every distinct $a,b\in \Delta$, \begin{align*} \neg X_{t,i,a}\vee \neg X_{t,i,b}. \end{align*} [/definition] The first clause requires some marker to be present, while the pairwise clauses prevent two different markers from occupying the same cell at the same time. Since $\Delta$ is fixed by the machine, only polynomially many such clauses are needed. [example: Local Cell Consistency Clauses] If $\Delta=\{0,1,H\}$ for a toy tableau cell, then exactly-one consistency at position $(t,i)$ consists of one clause saying that some marker is present and three pairwise clauses saying that no two distinct markers are present together: \begin{align*} (X_{t,i,0}\vee X_{t,i,1}\vee X_{t,i,H})\wedge(\neg X_{t,i,0}\vee \neg X_{t,i,1})\wedge(\neg X_{t,i,0}\vee \neg X_{t,i,H})\wedge(\neg X_{t,i,1}\vee \neg X_{t,i,H}). \end{align*} First set \begin{align*} X_{t,i,0}=0,\qquad X_{t,i,1}=0,\qquad X_{t,i,H}=1. \end{align*} The at-least-one clause has value \begin{align*} X_{t,i,0}\vee X_{t,i,1}\vee X_{t,i,H}=0\vee 0\vee 1=1. \end{align*} The three exclusion clauses have values \begin{align*} \neg X_{t,i,0}\vee \neg X_{t,i,1}=1\vee 1=1. \end{align*} \begin{align*} \neg X_{t,i,0}\vee \neg X_{t,i,H}=1\vee 0=1. \end{align*} \begin{align*} \neg X_{t,i,1}\vee \neg X_{t,i,H}=1\vee 0=1. \end{align*} Every conjunct is true, so this assignment satisfies the exactly-one formula. By contrast, set \begin{align*} X_{t,i,0}=1,\qquad X_{t,i,1}=0,\qquad X_{t,i,H}=1. \end{align*} Then the exclusion clause for the pair $0,H$ has value \begin{align*} \neg X_{t,i,0}\vee \neg X_{t,i,H}=0\vee 0=0. \end{align*} Since this false clause is one conjunct of the formula, the whole conjunction is false. Thus the clauses allow a cell to carry exactly one marker and reject an assignment that tries to put two markers in the same tableau position. [/example] ## The Cook-Levin Theorem Having encoded the shape of a tableau, we need to enforce that it starts correctly, evolves according to the transition relation, and eventually accepts. Each requirement must be expressible as a Boolean formula of polynomial size. [quotetheorem:6199] [citeproof:6199] [explanation: Locality in the Tableau Construction] The key technical point is that a Turing-machine transition cannot inspect the whole tableau row. Between times $t$ and $t+1$, a cell far from the head usually stays unchanged, while the cells near the head determine the new state, the written symbol, and the head movement. Therefore the validity of the whole update can be checked by requiring every small window of consecutive cells in two adjacent rows to be one of finitely many allowed patterns. Because the machine $M$ is fixed for the reduction from $A$, the list of allowed local patterns is finite and does not grow with $x$. The formula grows only because the same finite checks are placed at every time and tape position in the polynomial-size tableau. [/explanation] To see what these constraints mean concretely, it helps to isolate the non-local-looking requirements and rewrite them as repeated small clauses or formulas. [example: Initial and Accepting Clauses] Suppose $x=x_1\cdots x_n$ and the tableau has $S$ cells, with $S\ge n$. The initial row is enforced by unit clauses. For the first cell we require \begin{align*} X_{0,1,(q_0,x_1)}. \end{align*} For each input position $2\le i\le n$, we require \begin{align*} X_{0,i,x_i}. \end{align*} For each remaining cell $n<i\le S$, we require \begin{align*} X_{0,i,\blank}. \end{align*} Together with the exactly-one cell constraints, these unit clauses force row $0$ to be exactly \begin{align*} (q_0,x_1),\ x_2,\ x_3,\ \dots,\ x_n,\ \blank,\ \dots,\ \blank. \end{align*} Indeed, if $X_{0,1,(q_0,x_1)}=1$, then every other marker $b\ne(q_0,x_1)$ in cell $1$ is ruled out by the clause \begin{align*} \neg X_{0,1,(q_0,x_1)}\vee \neg X_{0,1,b} = 0\vee \neg X_{0,1,b} = \neg X_{0,1,b}, \end{align*} so the clause is true only when $X_{0,1,b}=0$. The same argument applies to each cell whose initial marker is fixed by a unit clause. If the accepting state is stored together with the scanned symbol, the accepting condition is the single clause \begin{align*} \bigvee_{0\le t\le T}\bigvee_{1\le i\le S}\bigvee_{a\in\Gamma} X_{t,i,(q_{\mathrm{acc}},a)}. \end{align*} This disjunction is true exactly when at least one variable $X_{t,i,(q_{\mathrm{acc}},a)}$ is true, meaning that for some time $t$, some cell $i$, and some scanned symbol $a$, the tableau contains the head-state marker $(q_{\mathrm{acc}},a)$. Thus the initial clauses pin down the starting configuration, while the accepting clause requires an accepting configuration to appear somewhere in the tableau. [/example] The Cook-Levin theorem gives the first NP-complete language. Further NP-completeness proofs usually avoid rebuilding tableaux by reducing from SAT or one of its normal-form variants. ## Conjunctive Normal Form Many reductions are easier to start from formulas with a restricted syntax. The question is whether requiring a formula to be a conjunction of clauses loses the expressive power needed for NP-completeness. [definition: Literal] A literal is either a Boolean variable $x$ or its negation $\neg x$. [/definition] A literal is the smallest syntactic unit that can appear in a clause. To make the clause structure usable in reductions, we now need a definition for formulas assembled as conjunctions of such local disjunctions; this is the standard conjunctive normal form. [definition: Conjunctive Normal Form] A Boolean formula is in conjunctive normal form, or CNF, if it is a conjunction of clauses, where each clause is a disjunction of literals. [/definition] The corresponding satisfiability language restricts SAT to this syntactic form. To use CNF formulas as inputs for reductions, we need to name the decision problem whose instances have this restricted syntax while keeping the same semantic question: does some truth assignment satisfy the formula? The key concern is that a syntactic restriction might have made the problem easier, so the language is separated from the later theorem that proves it remains complete. [definition: CNF-SAT] The language $\mathrm{CNF\text{-}SAT}$ consists of encodings of satisfiable Boolean formulas in conjunctive normal form. [/definition] A direct conversion using distributive laws can cause exponential growth, so membership in NP is not the interesting issue. The real obstruction is whether every satisfiable formula can be transformed into an equisatisfiable CNF formula without losing polynomial size. Auxiliary variables solve this by naming subformulas locally, allowing the reduction to preserve satisfiability while avoiding exponential duplication. [quotetheorem:6200] [citeproof:6200] This proof avoids the main failure mode of direct CNF conversion. Repeatedly distributing $\vee$ over $\wedge$ can duplicate subformulas and produce an exponentially larger formula, which would not give a polynomial-time reduction. The auxiliary variables are bookkeeping devices introduced by the reduction; since satisfiability is existential, extending an assignment to these new variables is the right notion of equivalence. The construction preserves satisfiability rather than literal truth under the original variables alone, which is exactly the preservation required by many-one reductions between decision languages. [example: Tseitin Clauses for a Small Formula] Let $\varphi=(x\vee y)\wedge \neg z$. Introduce variables $u$, $v$, and $w$ intended to satisfy \begin{align*} u=x\vee y,\qquad v=\neg z,\qquad w=u\wedge v. \end{align*} For $u\leftrightarrow (x\vee y)$, use \begin{align*} (\neg u\vee x\vee y)\wedge(u\vee \neg x)\wedge(u\vee \neg y). \end{align*} If $u=1$, then the first clause becomes \begin{align*} \neg u\vee x\vee y=0\vee x\vee y=x\vee y, \end{align*} so satisfying the clause forces $x\vee y=1$. If $x=1$, then the second clause becomes \begin{align*} u\vee \neg x=u\vee 0=u, \end{align*} so satisfying it forces $u=1$. If $y=1$, then the third clause becomes \begin{align*} u\vee \neg y=u\vee 0=u, \end{align*} so satisfying it also forces $u=1$. Thus these three clauses force $u$ to be true exactly when $x\vee y$ is true. For $v\leftrightarrow \neg z$, use \begin{align*} (\neg v\vee \neg z)\wedge(v\vee z). \end{align*} If $v=1$, then the first clause becomes \begin{align*} \neg v\vee \neg z=0\vee \neg z=\neg z, \end{align*} so satisfying it forces $z=0$. If $z=0$, then the second clause becomes \begin{align*} v\vee z=v\vee 0=v, \end{align*} so satisfying it forces $v=1$. Hence these two clauses force $v=\neg z$. For $w\leftrightarrow (u\wedge v)$, use \begin{align*} (\neg w\vee u)\wedge(\neg w\vee v)\wedge(w\vee \neg u\vee \neg v). \end{align*} If $w=1$, then the first clause becomes \begin{align*} \neg w\vee u=0\vee u=u, \end{align*} so satisfying it forces $u=1$. The second clause becomes \begin{align*} \neg w\vee v=0\vee v=v, \end{align*} so satisfying it forces $v=1$. Conversely, if $u=1$ and $v=1$, then the third clause becomes \begin{align*} w\vee \neg u\vee \neg v=w\vee 0\vee 0=w, \end{align*} so satisfying it forces $w=1$. Therefore these three clauses force $w=u\wedge v$. Finally add the unit clause $w$. The full CNF formula is the conjunction of the clauses above together with $w$. Because the unit clause forces $w=1$, the clauses for $w$ force $u=1$ and $v=1$; then the clauses for $u$ and $v$ force $x\vee y=1$ and $\neg z=1$. Conversely, if $(x\vee y)\wedge \neg z=1$, set \begin{align*} u=x\vee y,\qquad v=\neg z,\qquad w=u\wedge v. \end{align*} Then $u=1$, $v=1$, and $w=1$, and each clause displayed above evaluates to $1$. Thus the constructed CNF formula is satisfiable exactly when $\varphi$ is satisfiable. [/example] The chapter ends with the standard template for proving later completeness results. First show the target problem is in NP by describing a polynomial verifier. Then reduce a known NP-complete problem to the target by encoding satisfying assignments as the objects the target problem asks for. Chapter 6 shows how SAT becomes the prototype for NP-completeness by combining membership in NP with polynomial-time reductions from every problem in the class. The next chapter uses that template to generate a broader toolkit of standard NP-complete problems. # 7. Core $\mathrm{NP}$-Complete Problems This chapter turns the abstract definition of NP-completeness into a working toolkit. Chapters 5 and 6 introduced polynomial-time reductions and proved that SAT is NP-complete; here we learn how a small number of standard reductions generate many of the core NP-complete problems used throughout complexity theory and algorithms. The recurring theme is that a reduction must preserve yes-instances and no-instances while keeping the instance size polynomial. ## From $\mathrm{CNF}$-$\mathrm{SAT}$ to $3$-$\mathrm{SAT}$ SAT allows clauses of arbitrary length, but many reductions are easier to start from a formula whose clauses all have exactly three literals. The first question is therefore whether restricting clause width destroys NP-completeness, or whether the expressive power of SAT survives under a fixed local format. [definition: Three CNF Formula] A Boolean formula is in three conjunctive normal form if it is a conjunction of clauses, each clause is a disjunction of exactly three literals. [/definition] This definition isolates the syntactic restriction that later graph gadgets will use: each clause contributes a bounded number of local choices. We next name the decision problem associated to this restricted format, so that it can serve as a source problem for reductions. [definition: 3-SAT] The language $3\text{-}\mathrm{SAT}$ consists of all three CNF formulas that have a satisfying truth assignment. [/definition] The problem is not interesting merely as a special case of SAT; it is useful only if it remains as hard as unrestricted satisfiability. The obstacle is clause length: unrestricted CNF formulas may contain long clauses, while later graph gadgets need a bounded local format. The reduction must therefore replace each long clause by polynomially many three-literal clauses that preserve satisfiability without forcing a particular assignment to the original variables. [quotetheorem:6201] [citeproof:6201] This theorem is the standard entry point for reductions to graph problems, because a three CNF formula has two useful kinds of structure: clauses must each contribute at least one true literal, and contradictory literals cannot both be chosen. The restriction to exactly three literals is doing real work: if clauses had unbounded length, a clause gadget would need unbounded degree or unbounded internal size tied to a single constraint, while if we allowed empty clauses or changed the meaning of repeated literals the padding step for short clauses would no longer be harmless. The theorem also does not say that every syntactic restriction of SAT remains hard; for example, formulas with at most two literals per clause give $2\text{-}\mathrm{SAT}$, which is solvable in polynomial time. The following example makes the auxiliary-variable chain explicit before we use the same choice pattern in graphs. [example: Converting A Long Clause] Consider the clause $(x_1 \vee \neg x_2 \vee x_3 \vee x_4 \vee \neg x_5)$. The three-CNF replacement introduces auxiliary variables $y_1,y_2$ and uses \begin{align*} (x_1 \vee \neg x_2 \vee y_1) \wedge (\neg y_1 \vee x_3 \vee y_2) \wedge (\neg y_2 \vee x_4 \vee \neg x_5). \end{align*} Suppose the first true original literal is $x_3$. Then $x_1$ is false, $\neg x_2$ is false, and $x_3$ is true. Set $y_1=\mathrm{true}$ and $y_2=\mathrm{false}$. The first replacement clause evaluates to \begin{align*} x_1 \vee \neg x_2 \vee y_1 = \mathrm{false}\vee \mathrm{false}\vee \mathrm{true} = \mathrm{true}. \end{align*} The second replacement clause evaluates to \begin{align*} \neg y_1 \vee x_3 \vee y_2 = \mathrm{false}\vee \mathrm{true}\vee \mathrm{false} = \mathrm{true}. \end{align*} The third replacement clause evaluates to \begin{align*} \neg y_2 \vee x_4 \vee \neg x_5 = \mathrm{true}\vee x_4\vee \neg x_5 = \mathrm{true}. \end{align*} Thus this choice of auxiliary variables satisfies all three replacement clauses. Conversely, suppose all five original literals are false: $x_1=\mathrm{false}$, $\neg x_2=\mathrm{false}$, $x_3=\mathrm{false}$, $x_4=\mathrm{false}$, and $\neg x_5=\mathrm{false}$. For the first replacement clause to be true, we need \begin{align*} x_1 \vee \neg x_2 \vee y_1 = \mathrm{false}\vee \mathrm{false}\vee y_1 = y_1, \end{align*} so $y_1=\mathrm{true}$. Then the second replacement clause becomes \begin{align*} \neg y_1 \vee x_3 \vee y_2 = \mathrm{false}\vee \mathrm{false}\vee y_2 = y_2, \end{align*} so $y_2=\mathrm{true}$. With $y_2=\mathrm{true}$, the last replacement clause is \begin{align*} \neg y_2 \vee x_4 \vee \neg x_5 = \mathrm{false}\vee \mathrm{false}\vee \mathrm{false} = \mathrm{false}. \end{align*} So if the original long clause is false, no choice of $y_1,y_2$ can satisfy the whole three-clause replacement. [/example] ## Cliques and Consistent Literal Choices A satisfying assignment for a three CNF formula can be witnessed by choosing one true literal from each clause, with the restriction that we never choose both $x$ and $\neg x$. The clique problem asks for many pairwise compatible vertices, so it is designed to encode this choice-and-consistency pattern. [definition: Clique Decision Problem] The language $\mathrm{CLIQUE}$ consists of all pairs $(G,k)$ where $G=(V,E)$ is an undirected graph and $G$ contains a set $C \subset V$ with $|C| \ge k$ such that every two distinct vertices in $C$ are adjacent. [/definition] A clique is a collection whose members are pairwise mutually compatible. The next question is whether formula satisfiability can be represented purely as the existence of a large compatible collection in a graph. [quotetheorem:6202] [citeproof:6202] [example: A Three Cnf Formula As A Clique Instance] Let \begin{align*} \varphi=(x \vee y \vee z) \wedge (\neg x \vee y \vee w) \wedge (\neg y \vee z \vee \neg w), \end{align*} and call the three clauses $C_1,C_2,C_3$ in order. The clique construction creates one vertex for each literal occurrence: the vertices for $C_1$ are $x,y,z$, the vertices for $C_2$ are $\neg x,y,w$, and the vertices for $C_3$ are $\neg y,z,\neg w$. Edges are placed only between vertices from different clauses, and such an edge is omitted exactly when the two literal labels are complements. Consider the three vertices $x$ from $C_1$, $y$ from $C_2$, and $z$ from $C_3$. They come from three different clauses. The pair $x,y$ is not complementary, so the edge between $x$ from $C_1$ and $y$ from $C_2$ is present. The pair $x,z$ is not complementary, so the edge between $x$ from $C_1$ and $z$ from $C_3$ is present. The pair $y,z$ is not complementary, so the edge between $y$ from $C_2$ and $z$ from $C_3$ is present. Hence every pair among the three chosen vertices is adjacent, so they form a clique of size $3$. The corresponding partial assignment sets $x=\mathrm{true}$, $y=\mathrm{true}$, and $z=\mathrm{true}$; the value of $w$ is irrelevant for this witness. Under this assignment, the first clause evaluates as \begin{align*} C_1=(x \vee y \vee z)=\mathrm{true}\vee \mathrm{true}\vee \mathrm{true}=\mathrm{true}. \end{align*} The second clause evaluates as \begin{align*} C_2=(\neg x \vee y \vee w)=\mathrm{false}\vee \mathrm{true}\vee w=\mathrm{true}. \end{align*} The third clause evaluates as \begin{align*} C_3=(\neg y \vee z \vee \neg w)=\mathrm{false}\vee \mathrm{true}\vee \neg w=\mathrm{true}. \end{align*} The clique records one mutually consistent true literal from each clause. [/example] ## Independent Sets and Vertex Covers The next two graph problems are not new reductions from formulas so much as translations of clique through graph complementation. This section asks how changing the viewpoint from mutual compatibility to mutual non-adjacency produces equivalent NP-complete languages. [definition: Independent Set Decision Problem] The language $\mathrm{INDEPENDENT}\text{-}\mathrm{SET}$ consists of all pairs $(G,k)$ where $G=(V,E)$ is an undirected graph and there is a set $S \subset V$ with $|S| \ge k$ such that no two distinct vertices in $S$ are adjacent. [/definition] An independent set records many choices that do not conflict by edges. The next question is whether the already-hard clique problem becomes this problem after swapping edges and non-edges. [quotetheorem:6203] [citeproof:6203] Independent sets are often paired with vertex covers, because choosing vertices that avoid all edges is equivalent to leaving behind vertices that meet every edge. The complement-graph reduction depends on the graph being represented explicitly enough that all missing edges can be constructed in polynomial time; it would not by itself prove hardness for sparse graph classes that are not closed under complementation. It also does not say that every version of independent set is hard: on bipartite graphs, maximum independent set is polynomial-time equivalent to minimum vertex cover by Konig's theorem. To formulate the paired problem, we now switch from selecting a conflict-free set to selecting a set that hits every edge. [definition: Vertex Cover Decision Problem] The language $\mathrm{VERTEX}\text{-}\mathrm{COVER}$ consists of all pairs $(G,k)$ where $G=(V,E)$ is an undirected graph and there is a set $C \subset V$ with $|C| \le k$ such that every edge of $G$ has at least one endpoint in $C$. [/definition] The complement of a vertex cover is an independent set: if an uncovered edge had both endpoints outside the cover, those endpoints would be adjacent inside the complement set. The next question is whether this complement relation transfers NP-completeness while changing a lower bound into an upper bound. [quotetheorem:6204] [citeproof:6204] The theorem is a useful reminder that reductions may change the optimisation direction: a large independent set becomes a small vertex cover. The exact parameter conversion $k \mapsto n-k$ is necessary; keeping the same parameter would be wrong, since a graph on $100$ vertices with an independent set of size $90$ corresponds to a vertex cover of size $10$, not $90$. The theorem also concerns general graphs, and it does not contradict polynomial algorithms for vertex cover on special classes such as trees and bipartite graphs. A path graph gives a compact illustration of the complement relationship. [example: Independent Set And Vertex Cover Complements] In the path graph $v_1-v_2-v_3-v_4$, the vertex set is $V=\{v_1,v_2,v_3,v_4\}$ and the edge set is $E=\{v_1v_2,\;v_2v_3,\;v_3v_4\}$. Let $S=\{v_1,v_3\}$. The only pair of distinct vertices in $S$ is $v_1,v_3$, and $v_1v_3 \notin E$, so no two vertices of $S$ are adjacent. Hence $S$ is an independent set. The complement of $S$ in $V$ is \begin{align*} V\setminus S=\{v_1,v_2,v_3,v_4\}\setminus \{v_1,v_3\}=\{v_2,v_4\}. \end{align*} Call this set $C=\{v_2,v_4\}$. Each edge of the path has at least one endpoint in $C$: the edge $v_1v_2$ touches $v_2\in C$, the edge $v_2v_3$ touches $v_2\in C$, and the edge $v_3v_4$ touches $v_4\in C$. Therefore $C$ is a vertex cover. Since $n=|V|=4$ and $|S|=2$, the complement has size \begin{align*} |C|=|V\setminus S|=4-2=2=n-|S|. \end{align*} This is exactly the parameter conversion $k\mapsto n-k$ used in the reduction from independent set to vertex cover. [/example] ## Hamiltonian Cycles and Visiting Gadgets Once Unlike clique and vertex cover, Hamiltonian cycle is not a direct reformulation of choosing literals. It asks whether a graph has a cycle that visits every vertex exactly once, so the reduction must force a traversal to encode assignments and clause satisfaction. [definition: Hamiltonian Cycle Decision Problem] The language $\mathrm{HAM}\text{-}\mathrm{CYCLE}$ consists of all graphs $G=(V,E)$ that contain a cycle visiting every vertex in $V$ exactly once. [/definition] This definition imposes a global constraint rather than a local compatibility condition. The next question is whether a single forced traversal can still simulate the local choices and clauses of a Boolean formula. [quotetheorem:6205] [citeproof:6205] This theorem is used mostly as a template for route, scheduling, and path-planning hardness proofs. The important lesson is that reductions need not look algebraic; they may enforce logical constraints by how a path is allowed to pass through a network. The gadget hypotheses matter because a naive construction that merely connects variables to the clauses they satisfy lets a cycle visit both true and false sides of the same variable, or skip an inconvenient clause vertex. The theorem also does not say that Hamiltonian cycle is hard on every graph class: for example, graphs of maximum degree $2$ are just disjoint unions of paths and cycles, where the question is directly checkable. The forward use is therefore to copy the traversal-forces-choice idea, not merely the words "visit every vertex once". [remark: Hamiltonian Path Variant] The Hamiltonian path problem, where the cycle condition is replaced by a path visiting every vertex once, is also NP-complete. It reduces to and from Hamiltonian cycle by adding a small number of vertices and edges to control the endpoints. [/remark] ## Arithmetic Encodings and Subset Sum The graph reductions represent logical consistency with adjacency. Arithmetic reductions instead pack truth choices into decimal or binary digits, arranging that no carries occur so each digit acts as an independent constraint. [definition: Subset Sum Decision Problem] The language $\mathrm{SUBSET}\text{-}\mathrm{SUM}$ consists of all pairs $(a_1,\dots,a_n;t)$ of positive integers and a positive target integer $t$ for which there is a set $I \subset \{1,\dots,n\}$ such that \begin{align*} \sum_{i \in I} a_i = t. \end{align*} [/definition] A subset sum instance can encode a formula by giving each candidate literal a number. The next question is whether exact equality of integer sums can enforce both the choice of truth values and the satisfaction of every clause. [quotetheorem:6206] [citeproof:6206] The no-carry condition is the key technical guardrail. Without it, information from one constraint column could leak into another and break the intended logical interpretation; for example, a carry out of one clause column could falsely compensate for an unsatisfied neighbouring clause. The exact target value also matters: if the target were $3$ with slack digits $1$ and $2$, an unsatisfied clause contributing $0$ could be completed by taking both slack numbers. The next example focuses on the corrected clause-column mechanism. [example: Clause Digits For A Small Constraint] For the clause $(x \vee \neg y \vee z)$, the clause digit receives one unit from each selected truth-choice number that makes the clause true: the number for $x=\mathrm{true}$ contributes $1$, the number for $y=\mathrm{false}$ contributes $1$, and the number for $z=\mathrm{true}$ contributes $1$. The target in this clause digit is $4$, and the two clause-only slack numbers have digits $1$ and $2$. Let $s$ be the contribution from the selected truth-choice numbers in this clause column. Since the clause has three literals, the possible values are $s\in\{0,1,2,3\}$. If exactly one literal is true, then $s=1$, and selecting both slack numbers gives \begin{align*} s+1+2=1+1+2=4. \end{align*} If exactly two literals are true, then $s=2$, and selecting only the slack digit $2$ gives \begin{align*} s+2=2+2=4. \end{align*} If all three literals are true, then $s=3$, and selecting only the slack digit $1$ gives \begin{align*} s+1=3+1=4. \end{align*} Thus every satisfying choice for the clause can make this digit equal the target digit $4$. If all three literals are false, then none of the three truth-choice numbers contributes to this clause digit, so $s=0$. The possible slack totals are \begin{align*} 0,\qquad 1,\qquad 2,\qquad 1+2=3. \end{align*} Adding these to $s=0$ gives only \begin{align*} 0+0=0,\qquad 0+1=1,\qquad 0+2=2,\qquad 0+3=3, \end{align*} and none of these equals $4$. Therefore an unsatisfied clause column cannot be completed to the target, while any clause with at least one true literal can be completed exactly. [/example] ## Partition and Knapsack Decision Problems Subset Sum asks for an exact target. Partition and Knapsack are nearby packing problems, and the question is whether their extra structure makes them easier or whether exact subset selection remains hidden inside them. [definition: Partition Decision Problem] The language $\mathrm{PARTITION}$ consists of all finite lists of positive integers $(a_1,\dots,a_n)$ such that there is a set $I \subset \{1,\dots,n\}$ with \begin{align*} \sum_{i \in I} a_i = \sum_{i \notin I} a_i. \end{align*} [/definition] Partition is a balanced version of Subset Sum rather than a completely different problem. A target equation can be converted into a balance equation by adding carefully chosen extra numbers, so the hardness of exact selection transfers. [quotetheorem:6207] [citeproof:6207] This reduction is sensitive to the two added balancing numbers. If we added only one extra number, the new total could have the wrong parity or allow a partition that uses the extra number to mask the target equation; the pair $A+t$ and $2A-t$ forces the original subset to appear on a specified side. The theorem does not say that all balanced-number problems are hard under unary encoding, since Partition has a pseudo-polynomial dynamic programme. The knapsack decision problem generalises subset sum by giving each item both a weight and a value, so we next ask whether two inequalities can still force exact equality. [definition: Knapsack Decision Problem] The language $\mathrm{KNAPSACK}$ consists of all tuples $((w_1,v_1),\dots,(w_n,v_n);W,V)$ of positive integers for which there is a set $I \subset \{1,\dots,n\}$ satisfying \begin{align*} \sum_{i \in I} w_i \le W, \qquad \sum_{i \in I} v_i \ge V. \end{align*} [/definition] The two inequalities might suggest more flexibility than Subset Sum. The next question is whether equality can be forced by choosing weights and values so that the same selected total is measured twice. [quotetheorem:6208] [citeproof:6208] The reduction embeds Subset Sum without changing the numbers, so it is one of the shortest NP-hardness proofs in the course. The equality $w_i=v_i$ is the crucial hypothesis in the reduction: if values were unrelated to weights, satisfying the value lower bound would not force the selected weights to hit the target exactly. The theorem also concerns the binary-encoded decision problem; it does not rule out pseudo-polynomial dynamic programming algorithms when the numerical bounds are small. A numerical instance shows how the two inequalities collapse to equality. [example: Subset Sum Inside Knapsack] For the Subset Sum instance $(3,5,6,9;14)$, form a Knapsack instance by giving each number the same weight and value. The four items are $(w_1,v_1)=(3,3)$, $(w_2,v_2)=(5,5)$, $(w_3,v_3)=(6,6)$, and $(w_4,v_4)=(9,9)$, and we set \begin{align*} W=14,\qquad V=14. \end{align*} The subset $\{5,9\}$ in the original Subset Sum instance corresponds to selecting item $2$ and item $4$. Their total weight is \begin{align*} w_2+w_4=5+9=14, \end{align*} so the weight constraint holds because \begin{align*} 14 \le W=14. \end{align*} Their total value is \begin{align*} v_2+v_4=5+9=14, \end{align*} so the value constraint holds because \begin{align*} 14 \ge V=14. \end{align*} Thus the same equality $5+9=14$ that solves the Subset Sum instance becomes both the upper-bound weight condition and the lower-bound value condition in the Knapsack instance. [/example] ## How To Use The Core Problems The purpose of this chapter is not to memorise isolated NP-completeness proofs. It is to build a menu of source problems whose structures match future targets: clauses for logical constraints, cliques for compatible choices, independent sets for mutually exclusive choices, vertex covers for hitting constraints, Hamiltonian cycles for forced traversals, and subset sum for exact arithmetic packing. [remark: Choosing A Source Problem] When proving a new problem NP-hard, start by identifying the shape of the target constraints. If the problem asks for selecting compatible objects, try reducing from Clique; if it asks for avoiding conflicts, try Independent Set; if it asks for covering every local obstruction, try Vertex Cover; if it involves exact numerical totals, try Subset Sum or Partition; if it involves visiting states once, try Hamiltonian Cycle. [/remark] The core NP-complete problems now serve as a library of targets and sources for reductions. To use them effectively, we need practice designing mappings, checking correctness, and spotting the right structural feature to encode, which is the subject of the next chapter. # 8. Designing and Checking Reductions This chapter assumes the definitions of decision problems, languages over $\{0,1\}^*$, polynomial-time algorithms, the class $\mathrm{NP}$, NP-hardness, and NP-completeness from Chapters 0 through 7. It is about the craft of reductions rather than the catalogue of NP-complete problems. Earlier chapters introduced polynomial-time many-one reductions as a way to compare decision problems; here we learn how to design them, check them, and find the hidden mistakes that make a proposed proof fail. The guiding theme is that an NP-completeness proof is an algorithmic construction plus a two-direction logical equivalence, and all three parts must be present. ## What a Reduction Proof Must Establish When we claim that a problem $A$ reduces to a problem $B$, what exactly has to be proved? The statement is not merely that instances of $A$ and $B$ look related, nor that solutions to one resemble solutions to the other. A reduction is a computable translation of instances whose yes/no answer is preserved. [definition: Polynomial-Time Many-One Reduction] Let $A,B \subseteq \{0,1\}^*$ be decision problems. A polynomial-time many-one reduction from $A$ to $B$ is a function $f:\{0,1\}^* \to \{0,1\}^*$ such that: 1. There exists a deterministic Turing machine computing $f(x)$ in time polynomial in $|x|$. 2. For every $x \in \{0,1\}^*$, \begin{align*} x \in A \iff f(x) \in B. \end{align*} [/definition] The definition packages two different obligations. The first is constructibility: the output instance must be produced efficiently. The second is correctness: membership in the source problem must be equivalent to membership in the target problem. In practice, reductions are written for graphs, formulae, sets, and schedules rather than raw binary strings, so we need a checklist that translates the formal definition into the proof obligations used in NP-completeness arguments. [remark: Checklist for Polynomial-Time Many-One Reductions] Let $A$ and $B$ be decision problems, with valid instances $I_A$ and $I_B$ and yes-instances $Y_A\subseteq I_A$ and $Y_B\subseteq I_B$. To prove that a proposed map $f:I_A\to \Sigma^*$ gives a polynomial-time many-one reduction $A\le_m^p B$, check the following three obligations. 1. **Validity:** for every $x\in I_A$, the string $f(x)$ is a valid instance of $B$, so $f(x)\in I_B$. 2. **Polynomial construction:** there is a deterministic algorithm that outputs $f(x)$ in time polynomial in $|x|$. 3. **Answer preservation:** for every $x\in I_A$, \begin{align*} x\in Y_A \iff f(x)\in Y_B. \end{align*} Together these conditions are exactly the data required by the formal definition: the map is well-typed on valid instances, efficiently computable, and preserves the yes/no answer in both directions. [/remark] This checklist is useful because different mistakes violate different clauses. The validity condition matters because an output string that does not encode an instance of $B$ cannot be interpreted using the promised semantics of $B$. Polynomial-time constructibility matters because a translation that enumerates all $2^n$ assignments to an $n$-variable formula may preserve the answer while already solving too much work before the target problem is reached. The biconditional matters because a construction that only maps yes-instances of $A$ to yes-instances of $B$ may also map no-instances of $A$ to yes-instances of $B$, giving false positives. The checklist does not say how to find a reduction; it only gives the proof obligations once a candidate map has been proposed, and the next sections isolate the two logical directions and the size estimate separately. [example: Vertex Cover To Set Cover Instance Map] Let $G=(V,E)$ and $k \in \mathbb N$ be an instance of Vertex Cover. Define the Set Cover instance by taking the universe to be \begin{align*} U=E, \end{align*} and, for each vertex $v \in V$, defining its incidence set by \begin{align*} S_v=\{e \in E : e \text{ is incident with } v\}. \end{align*} The family of available sets is \begin{align*} \mathcal S=\{S_v : v \in V\}, \end{align*} and the budget remains $k$. This is a valid Set Cover instance because $U$ is finite, each $S_v$ is a subset of $U=E$, and $\mathcal S$ is a finite family of subsets of $U$. If $G$ is given by its edge list, then for each edge $e=\{u,w\}$ we add $e$ to $S_u$ and to $S_w$. Thus the construction records $|E|$ universe elements, creates $|V|$ sets, and writes at most $2|E|$ incidences between sets and universe elements, so the output size is bounded by a polynomial in the input size. The correspondence of choices is vertex-by-set: for any $C \subseteq V$, choose \begin{align*} \mathcal C=\{S_v : v \in C\}\subseteq \mathcal S. \end{align*} For an edge $e=\{u,w\}\in E=U$, the condition that $C$ covers $e$ is exactly \begin{align*} u \in C \text{ or } w \in C. \end{align*} By the definition of the incidence sets, $e \in S_u$ and $e \in S_w$, so this is equivalent to \begin{align*} e \in \bigcup_{S\in \mathcal C} S. \end{align*} Thus selecting vertices in $G$ becomes selecting their incidence sets in the Set Cover instance, and the numerical budget is unchanged. [/example] The example already suggests the desired equivalence, but a full NP-completeness proof must also prove membership in NP and both directions of the reduction. We now separate those directions because they catch different errors. ## Yes-to-Yes and No-to-No Correctness A proposed reduction often begins with the intuitive sentence "a solution to the old instance gives a solution to the new instance." That proves only half of the equivalence. The harder direction is frequently the reverse: every solution to the constructed target instance must decode into a solution of the original source instance. [definition: Yes-to-Yes Direction] For a reduction $f$ from $A$ to $B$, the yes-to-yes direction is the implication \begin{align*} x \in A \implies f(x) \in B. \end{align*} [/definition] This direction is usually constructive: given a witness for the source instance, transform it into a witness for the target instance. It proves that the construction is not too restrictive, but it does not rule out target solutions that arise for unintended reasons. To exclude those false positives, we also need the direction that preserves no-instances. [definition: No-to-No Direction] For a reduction $f$ from $A$ to $B$, the no-to-no direction is the implication \begin{align*} x \notin A \implies f(x) \notin B. \end{align*} [/definition] Equivalently, the no-to-no direction may be proved by its contrapositive $f(x) \in B \implies x \in A$. This form is often easier, because a target witness can be decoded into a source witness. Once these two directions are available, the informal correctness proof can be converted into the biconditional demanded by a many-one reduction. [quotetheorem:6210] [citeproof:6210] The contrapositive form matters in practice because it forces us to ask whether the target problem has unintended solutions. The yes-to-yes hypothesis alone is not enough: a construction may correctly embed every satisfying assignment while also creating target witnesses for unsatisfiable inputs. The no-to-no hypothesis alone is also not enough, because it could reject all no-instances while failing to represent some yes-instances. Polynomial-time computability and validity remain separate requirements; a logically perfect equivalence produced by an exponential search is not a Karp reduction, and an equivalent string outside the promised encoding of $B$ has no target-instance meaning. The theorem is therefore a correctness criterion for a proposed many-one map, not a substitute for proving membership in $\mathrm{NP}$ or for designing the construction itself. [example: One Implication Is Not Enough] Suppose a proposed map $f$ sends each 3-SAT formula $\varphi$ to a graph $G_\varphi$ for Hamiltonian Path. Assume the proof has established only \begin{align*} \varphi \text{ is satisfiable} \implies G_\varphi \text{ has a Hamiltonian path}. \end{align*} Now suppose there is an unsatisfiable formula $\psi$ such that $G_\psi$ still has a Hamiltonian path because the variable gadget lets the path bypass both truth-value traversals. Then \begin{align*} \psi \text{ is not satisfiable} \end{align*} but \begin{align*} G_\psi \text{ has a Hamiltonian path}. \end{align*} So the reverse implication \begin{align*} G_\varphi \text{ has a Hamiltonian path} \implies \varphi \text{ is satisfiable} \end{align*} is false, since taking $\varphi=\psi$ makes the hypothesis true and the conclusion false. Therefore the biconditional required for a many-one reduction, \begin{align*} \varphi \text{ is satisfiable} \iff G_\varphi \text{ has a Hamiltonian path}, \end{align*} has not been proved and is in fact false for this construction. The map may preserve yes-instances, but it also creates a false positive: a target witness for $G_\psi$ need not decode to any satisfying assignment of $\psi$. [/example] This failure is not a minor presentation issue. It changes the logical statement being proved, so the reduction gives no evidence that Hamiltonian Path is at least as hard as 3-SAT. ## Polynomial-Time Constructibility Even a logically perfect correspondence is not a polynomial-time reduction if the translation is too large or too slow. The complexity class NP is organized around polynomial resources, so the size of the constructed instance must be bounded by a polynomial in the size of the original instance. [definition: Polynomial Blow-Up] Let $\Sigma$ and $\Gamma$ be finite alphabets, and let $I \subseteq \Sigma^*$ be an instance language. A function $f:I \to \Gamma^*$ has polynomial blow-up if there exists a polynomial $p$ such that \begin{align*} |f(x)| \le p(|x|) \end{align*} for every $x \in I$. [/definition] Polynomial blow-up is a size condition, not by itself a running-time guarantee. In ordinary discrete reductions, however, the same description that lists all vertices, edges, clauses, or sets in polynomial quantity usually gives a polynomial-time construction. We therefore need a reusable counting lemma for reductions assembled from many small local pieces. [quotetheorem:6211] [citeproof:6211] This theorem is the routine size estimate behind most gadget reductions. Each hypothesis rules out a different hidden exponential: there must be only polynomially many gadgets, each gadget must have polynomial size, and the edges or constraints connecting them must also be polynomially listable. If a reduction creates one gadget for every truth assignment, the bound on $m(x)$ fails; if it creates one enormous gadget containing a truth table, the bound on gadget size fails; if every pair of exponentially many internal states is wired together, the wiring bound fails. The theorem proves only polynomial blow-up, not full polynomial-time computability: the proof must still explain that the gadgets and their connections can actually be generated in polynomial time. [example: Exponential Truth-Table Construction] Consider a SAT instance $\varphi$ with variables $x_1,\ldots,x_n$. The flawed construction makes one vertex for each truth assignment \begin{align*} \alpha:\{x_1,\ldots,x_n\}\to\{0,1\}. \end{align*} For $x_1$ there are $2$ choices, for $x_2$ there are $2$ choices, and the choices are independent for all $n$ variables, so the number of assignments is \begin{align*} |\{\alpha:\{x_1,\ldots,x_n\}\to\{0,1\}\}|=2\cdot 2\cdots 2=2^n. \end{align*} Thus the vertex list alone contains $2^n$ vertices before any edges are written. Let $N=|\varphi|$ be the length of the input formula. Since a SAT formula can have $n$ variables with length polynomial in $n$, the value $2^n$ is not bounded by a polynomial in $N$ on such inputs. Equivalently, the construction is enumerating all possible assignments before producing the target instance. It may encode satisfiability information, but it does so with exponential blow-up, so the proposed map is not a polynomial-time reduction. [/example] The lesson is that a reduction may not solve the source problem by brute force and hide the answer inside the target instance. The translation must be efficient even on instances whose answer is unknown. ## Gadget Reductions Many NP-completeness proofs use gadgets: small pieces of a target instance designed to simulate variables, clauses, graph adjacencies, or choices in the source instance. The main challenge is to make local constraints compose into a global equivalence without introducing unintended solutions. [definition: Gadget Reduction] A gadget reduction is a polynomial-time many-one reduction in which the target instance is assembled from finitely described components, called gadgets, each enforcing a local condition from the source instance, together with connections between gadgets that enforce consistency across overlapping local choices. [/definition] After defining the intended behaviour of each gadget, the proof has to verify that behaviour in both directions. Local soundness says target solutions induce allowed local states, and local completeness says allowed local states can be extended through the gadget. The next problem is how to turn these local gadget checks into a complete NP-completeness proof rather than a collection of isolated observations. [quotetheorem:6212] [citeproof:6212] The template is deliberately asymmetric in language: completeness builds target witnesses from source witnesses, while soundness decodes source witnesses from target witnesses. Each line is necessary. Without $B \in \mathrm{NP}$, the argument proves NP-hardness but not NP-completeness; without polynomial construction, the proof may smuggle an exponential computation into the translation; without completeness, some yes-instances of $A$ may be lost; without soundness, the target instance may have extra witnesses that do not correspond to any source witness. The template also has a limitation: it proves decision preservation, not uniqueness of witnesses, approximation preservation, or counting preservation. Those stronger properties require additional arguments beyond the NP-completeness checklist. [example: Debugging A 3-SAT To Hamiltonian Path Gadget] In a standard 3-SAT to Hamiltonian Path reduction, the intended invariant is that each variable gadget has exactly two legal full traversals. One traversal encodes setting the variable true, and the other encodes setting it false. Clause gadgets are attached so that the Hamiltonian path can enter the gadget for a clause only through an edge corresponding to an incident literal. The common bug is to prove only the completeness direction. That argument says: if $\varphi$ has a satisfying assignment, choose the matching traversal in each variable gadget; since every clause has at least one satisfied literal, enter each clause gadget through an edge for one such literal; hence the constructed graph $G_\varphi$ has a Hamiltonian path. This proves \begin{align*} \varphi \text{ is satisfiable} \implies G_\varphi \text{ has a Hamiltonian path}. \end{align*} It does not prove the reverse implication. The missing soundness issue appears when a Hamiltonian path can mix the two sides of a variable gadget. For example, suppose the path can enter the true traversal of the $x_i$-gadget, cross to the false traversal, and then leave after using edges from both sides. Then the decoding rule \begin{align*} x_i=\text{true} \text{ if the path uses the true traversal of the } x_i\text{-gadget} \end{align*} is not well-defined, because the same path also uses part of the false traversal. Similarly, the rule \begin{align*} x_i=\text{false} \text{ if the path uses the false traversal of the } x_i\text{-gadget} \end{align*} would give the opposite value from the same path. A single Hamiltonian path would therefore fail to determine a single truth value for $x_i$. The repair is to prove the local soundness statement for every variable gadget: \begin{align*} P \text{ is a Hamiltonian path through the } x_i\text{-gadget} \implies P \text{ uses exactly one prescribed traversal of that gadget}. \end{align*} After this is verified, define the decoded assignment as follows. If $P$ uses the true traversal of the $x_i$-gadget, set $x_i$ to true. If $P$ uses the false traversal of the $x_i$-gadget, set $x_i$ to false. The local soundness statement says that exactly one of these alternatives occurs, so the assignment is well-defined for every variable. Now check the clauses. For each clause gadget visited by $P$, the construction must ensure that its entry edge comes from an incident literal whose variable traversal satisfies that literal. Thus the path's visit to the gadget for a clause $C_j$ gives at least one literal $\ell \in C_j$ that is true under the decoded assignment. Since this holds for every clause gadget, every clause of $\varphi$ has a satisfied literal. Hence \begin{align*} G_\varphi \text{ has a Hamiltonian path} \implies \varphi \text{ is satisfiable}. \end{align*} This turns the informal gadget picture into the needed soundness argument: every Hamiltonian path in the constructed graph decodes to a satisfying assignment, so mixed variable traversals cannot create extra target witnesses. [/example] This debugging pattern is general. Whenever a gadget proof feels too permissive, look for a target solution that combines locally valid moves in a way that has no source interpretation. ## Set Cover From Vertex Cover We now carry out a complete reduction in a setting where the gadgets are incidence sets rather than graph paths. The proof is short, but it demonstrates the full checklist: membership in NP, polynomial construction, yes-to-yes, and no-to-no. [definition: Vertex Cover] An instance of Vertex Cover is a finite graph $G=(V,E)$ and an integer $k \in \mathbb N$. It is a yes-instance if there exists $C \subseteq V$ such that $|C| \le k$ and for every edge $e=\{u,v\} \in E$, at least one of $u$ or $v$ belongs to $C$. [/definition] The target problem asks for a small subfamily of sets covering a universe. This makes edges the natural universe and vertices the natural available sets. [definition: Set Cover] An instance of Set Cover is a finite universe $U$, a family $\mathcal S \subseteq \mathcal P(U)$, and an integer $k \in \mathbb N$. It is a yes-instance if there exists $\mathcal C \subseteq \mathcal S$ such that $|\mathcal C| \le k$ and \begin{align*} \bigcup_{S \in \mathcal C} S = U. \end{align*} [/definition] Set Cover belongs to NP because a proposed subfamily can be checked by marking the elements it covers and comparing with the universe. To prove NP-hardness, we now use the incidence construction previewed earlier and verify that covering edges by chosen sets is the same condition as covering edges by chosen endpoints. This is the first complete reduction proof in the chapter, so each checklist item is included explicitly. [quotetheorem:6213] [citeproof:6213] [example: A Small Vertex Cover Instance] Let $G$ be the path with vertex set $V=\{v_1,v_2,v_3\}$ and edge set $E=\{e_{12},e_{23}\}$, where $e_{12}=\{v_1,v_2\}$ and $e_{23}=\{v_2,v_3\}$, and take $k=1$. Under the Vertex Cover to Set Cover construction, the universe is the edge set: \begin{align*} U=E=\{e_{12},e_{23}\}. \end{align*} For each vertex, the corresponding set contains exactly the edges incident with that vertex. Since only $e_{12}$ is incident with $v_1$, both $e_{12}$ and $e_{23}$ are incident with $v_2$, and only $e_{23}$ is incident with $v_3$, we get \begin{align*} S_{v_1}=\{e_{12}\}. \end{align*} \begin{align*} S_{v_2}=\{e_{12},e_{23}\}. \end{align*} \begin{align*} S_{v_3}=\{e_{23}\}. \end{align*} Thus the available family is \begin{align*} \mathcal S=\{S_{v_1},S_{v_2},S_{v_3}\}. \end{align*} Choose the one-set subfamily $\mathcal C=\{S_{v_2}\}$. Its size is $|\mathcal C|=1\le k$, and its union is \begin{align*} \bigcup_{S\in\mathcal C}S=S_{v_2}=\{e_{12},e_{23}\}=U. \end{align*} So $\mathcal C$ is a Set Cover solution of size at most $1$. The selected set is labelled by $v_2$, so it corresponds to the vertex set \begin{align*} C=\{v_2\}. \end{align*} This vertex set covers $e_{12}$ because $e_{12}=\{v_1,v_2\}$ has endpoint $v_2\in C$, and it covers $e_{23}$ because $e_{23}=\{v_2,v_3\}$ has endpoint $v_2\in C$. In this instance, choosing the middle vertex in the Vertex Cover instance is exactly the same reduction choice as choosing the set containing both edge-elements in the Set Cover instance. [/example] This example also shows why the reduction is not parsimonious in any counting sense for all graphs: different vertex covers can sometimes induce the same relevant covering behaviour when duplicate or redundant sets appear. NP-completeness only needs decision preservation, not preservation of the number of witnesses. ## Parsimonious And Non-Parsimonious Reductions The final refinement asks what else a reduction might preserve beyond the yes/no answer. For decision complexity, preserving existence is enough. For counting problems, where we ask how many witnesses exist, the number of solutions can matter. [definition: Parsimonious Reduction] Let $I_A \subseteq \Sigma^*$ and $I_B \subseteq \Gamma^*$ be instance languages for search or counting versions of decision problems $A$ and $B$, with witness sets $W_A(x)$ for $x \in I_A$ and $W_B(y)$ for $y \in I_B$. A function $f:I_A \to I_B$ is parsimonious if, for every $x \in I_A$, there is a bijection \begin{align*} W_A(x) \longleftrightarrow W_B(f(x)). \end{align*} [/definition] A parsimonious reduction preserves not just whether witnesses exist, but their exact number. This is stronger than what NP-completeness requires and is mainly useful when moving from NP to counting classes such as $\#\mathrm P$. [definition: Non-Parsimonious Reduction] A non-parsimonious reduction is a reduction that preserves the yes/no answer but does not necessarily preserve the number of witnesses. [/definition] This distinction prevents a common overclaim. A valid NP-completeness proof may be non-parsimonious, so it should not automatically be reused as a counting reduction. [example: Decision Preservation Without Counting Preservation] In the Vertex Cover to Set Cover reduction, if the set family is treated as labelled by vertices, then a chosen vertex set $C\subseteq V$ determines the chosen subfamily \begin{align*} \mathcal C_C=\{S_v:v\in C\}. \end{align*} Conversely, a labelled chosen subfamily $\mathcal C\subseteq\mathcal S$ determines \begin{align*} C_{\mathcal C}=\{v\in V:S_v\in\mathcal C\}. \end{align*} For labelled sets, these operations preserve the chosen vertices: starting with $C$, the vertices recovered from $\mathcal C_C$ are exactly \begin{align*} C_{\mathcal C_C}=\{v\in V:S_v\in\{S_u:u\in C\}\}. \end{align*} Because the set labelled $S_v$ is selected precisely when its label $v$ lies in $C$, this becomes \begin{align*} C_{\mathcal C_C}=\{v\in V:v\in C\}=C. \end{align*} Starting with $\mathcal C$, the selected labelled sets recovered from $C_{\mathcal C}$ are \begin{align*} \mathcal C_{C_{\mathcal C}}=\{S_v:v\in\{u\in V:S_u\in\mathcal C\}\}. \end{align*} The condition $v\in\{u\in V:S_u\in\mathcal C\}$ means exactly $S_v\in\mathcal C$, so \begin{align*} \mathcal C_{C_{\mathcal C}}=\{S_v:S_v\in\mathcal C\}=\mathcal C. \end{align*} Thus this labelled version of the Vertex Cover to Set Cover construction preserves the selected vertices exactly. Decision preservation alone is weaker. Suppose a different gadget reduction has source witness set $W_A(x)$ and target witness set \begin{align*} W_B(f(x))=W_A(x)\times\{0,1\}. \end{align*} Each source witness $w\in W_A(x)$ gives two target witnesses, namely $(w,0)$ and $(w,1)$. If $|W_A(x)|=m$, then the product rule for finite sets gives \begin{align*} |W_B(f(x))|=|W_A(x)\times\{0,1\}|=|W_A(x)|\cdot|\{0,1\}|=m\cdot 2=2m. \end{align*} The yes/no answer is still preserved, because $W_A(x)$ is nonempty exactly when $W_A(x)\times\{0,1\}$ is nonempty: \begin{align*} W_A(x)\ne\varnothing \iff W_A(x)\times\{0,1\}\ne\varnothing \iff W_B(f(x))\ne\varnothing. \end{align*} However, when $m>0$ the witness count changes from $m$ to $2m$, so the reduction can be valid for NP-hardness while failing to be parsimonious. [/example] The practical rule is to state only the preservation property that has been proved. For NP-completeness, prove polynomial construction and a biconditional on yes-instances. For counting complexity, add a separate argument that witnesses correspond bijectively or with a controlled multiplicity. Chapter 8 sharpened the craft of reductions by emphasizing how to state and verify exactly what a mapping preserves. With that technique in hand, we can step back from individual proofs and examine the broader question of whether P and NP are actually different. # 9. The $\mathrm{P}$ versus $\mathrm{NP}$ Question Chapters 0 through 8 built the language needed to state NP-completeness: decision problems, polynomial-time reductions, certificates, and complete problems such as SAT. The prerequisites for this chapter are the definitions of P and NP, polynomial-time many-one reductions, and the Cook-Levin theorem that SAT is NP-complete. This chapter steps back from individual reductions and asks what the theory would mean if the central open question went one way or the other. We also distinguish decision problems from their search and optimization companions, because many applications ask for witnesses rather than yes/no answers. ## Stating the $\mathrm{P}$ versus $\mathrm{NP}$ Question The central question begins with a practical asymmetry. For many problems, a proposed solution can be checked quickly, while finding such a solution appears much harder. The class NP formalizes the checking side, and P formalizes efficient decision by deterministic algorithms. [definition: Polynomial-Time Decidable Language] A language $L \subseteq \{0,1\}^*$ is in $\mathrm{P}$ if there exists a deterministic Turing machine $M$ and a constant $c \ge 1$ such that, for every input $x \in \{0,1\}^*$, $M$ halts in at most $|x|^c + c$ steps and accepts exactly when $x \in L$. [/definition] This definition treats a computational problem as a language, so it records only the yes/no version of the problem. Many central problems have a different asymmetry: a yes answer may come with a witness that is easy to check even when no efficient method for finding the witness is known. To formalize that checking viewpoint, the definition must bound both the size of the certificate and the time needed to verify it. [definition: Nondeterministic Polynomial Time] A language $L \subseteq \{0,1\}^*$ is in $\mathrm{NP}$ if there exist a polynomial $p:\mathbb N \to \mathbb N$ and a polynomial-time deterministic verifier \begin{align*} V:\{0,1\}^* \times \{0,1\}^* \to \{0,1\} \end{align*} such that, for every $x \in \{0,1\}^*$, \begin{align*} x \in L \iff \exists y \in \{0,1\}^* \text{ with } |y| \le p(|x|) \text{ and } V(x,y)=1. \end{align*} [/definition] The certificate $y$ may be a satisfying assignment, a clique, a Hamiltonian cycle, or another object whose validity can be checked within polynomial time. The P versus NP question asks whether efficient verification can always be converted into efficient discovery. [definition: The P Versus NP Question] The P versus NP question is the problem of determining whether \begin{align*} \mathrm{P} = \mathrm{NP} \end{align*} or \begin{align*} \mathrm{P} \neq \mathrm{NP}. \end{align*} [/definition] The inclusion $\mathrm{P} \subseteq \mathrm{NP}$ follows from the verifier viewpoint: if a machine decides $L$ in polynomial time, the verifier can ignore the certificate and run that machine. The unknown direction is whether every polynomially checkable certificate relation has a polynomial-time algorithm for deciding the underlying language. To connect this global question to a concrete problem, we use the completeness of SAT as a single test case for all of NP. [quotetheorem:6214] [citeproof:6214] This theorem is the reason NP-completeness is a useful substitute for directly attacking every problem in NP. The hypothesis uses SAT specifically because SAT is NP-complete: every language in NP reduces to it, so a SAT decider can be placed at the end of those reductions. The same conclusion would fail for an arbitrary language in NP; for example, the empty language $\varnothing$ and the singleton language $\{0\}$ are both in P, so polynomial-time algorithms for them give no reason to expect algorithms for all verifiable problems. The theorem also says only that the resulting algorithms have polynomial worst-case running time, not that the exponents, constants, or reductions would make them practical. [example: Consequence of a Polynomial-Time SAT Algorithm] Suppose SAT has a decider $A$ that runs in time $n^5$ on formulas of length $n$. Let $m=|(G,k)|$ be the encoding length of a Clique instance, and suppose the polynomial-time reduction from Clique to SAT outputs a formula $\varphi_{G,k}$ in time at most $a m^d+a$ and with length at most $a m^d+a$, for constants $a\ge 1$ and $d\ge 1$. The reduction has the correctness property \begin{align*} G \text{ has a clique of size at least } k \iff \varphi_{G,k} \text{ is satisfiable}. \end{align*} Therefore the algorithm for Clique is: compute $\varphi_{G,k}$, run $A$ on $\varphi_{G,k}$, and return the same yes/no answer. Its running time is bounded by the reduction time plus the SAT-decider time: \begin{align*} T(m) \le (a m^d+a) + |\varphi_{G,k}|^5. \end{align*} Using $|\varphi_{G,k}| \le a m^d+a$, this gives \begin{align*} T(m) \le (a m^d+a) + (a m^d+a)^5. \end{align*} Since $m\ge 1$ and $d\ge 1$, we have $m^d\ge 1$, so \begin{align*} a m^d+a \le a m^d+a m^d = 2a m^d. \end{align*} Substituting this bound into the previous inequality gives \begin{align*} T(m) \le 2a m^d + (2a m^d)^5. \end{align*} Expanding the fifth power gives \begin{align*} T(m) \le 2a m^d + 32a^5 m^{5d}. \end{align*} Because $m^d\le m^{5d}$ for $m\ge 1$ and $d\ge 1$, we obtain \begin{align*} T(m) \le (2a+32a^5)m^{5d}. \end{align*} Thus Clique would be decidable in polynomial time, with exponent at most $5d$. The point is that a polynomial-time SAT algorithm would not remain isolated: every NP-complete problem with a polynomial-time reduction to SAT would inherit a polynomial-time decision algorithm by the same composition argument. [/example] ## Consequences of Either Answer What would actually change if the equality were resolved? The answer depends on the kind of computational task: decision, search, optimization, security, or formal reasoning. The mathematical statement is about languages, but its effects reach much further because many practical tasks reduce to languages. [remark: Meaning of P Equals NP] The statement $\mathrm{P}=\mathrm{NP}$ would mean that every polynomially verifiable decision problem has a polynomial-time decision algorithm. It would not say that every such algorithm is fast in practice, nor would it determine the polynomial exponent or constant factors. [/remark] The caveat matters because a theoretical polynomial such as $n^{100}$ is not usable at ordinary input sizes. Still, the existence of polynomial-time algorithms would be a major structural discovery, especially when combined with self-reductions that turn decision algorithms into witness-finding algorithms. [remark: Meaning of P Not Equal NP] The statement $\mathrm{P}\neq \mathrm{NP}$ would mean that at least one language in NP has no polynomial-time deterministic decider. Since SAT is NP-complete, it would imply that no NP-complete language is in P. [/remark] This would not prove that every natural problem outside P is NP-complete. It would separate efficient verification from efficient decision, but it would still leave many finer questions about average-case hardness, approximation, and special input families. [explanation: Search Optimization Cryptography and Reasoning] For search problems, the issue is whether a positive instance can be accompanied by an efficiently found witness. SAT asks for a truth assignment, Clique asks for a set of vertices, and Hamiltonian Cycle asks for a cycle. For many NP-complete problems, decision and search are polynomial-time equivalent, so a decision algorithm would also recover witnesses. For optimization problems, the decision version asks whether there is a solution of value at least or at most a threshold. Binary search over the threshold, combined with decision calls, often computes the optimum value; further self-reductions may recover an optimizer. Clique illustrates this pattern: deciding whether a $k$-clique exists can be used to find the maximum clique size, and then to find vertices in such a clique. For cryptography, the most direct threat from $\mathrm{P}=\mathrm{NP}$ is to schemes whose security relies on worst-case NP search hardness. Modern cryptographic security is usually phrased in average-case and probabilistic terms: an encryption or signature scheme must resist random polynomial-time adversaries on typical keys and messages, not only encode a worst-case NP language. A one-way function, for instance, asks for a polynomial-time computable map that is hard to invert on average over its input distribution. Thus a proof that $\mathrm{P}\neq\mathrm{NP}$ would not by itself establish secure cryptography, because worst-case decision hardness does not automatically give average-case inversion hardness. For automated reasoning, SAT and theorem-proving procedures sit close to formal proof search. If NP-complete search problems became polynomial-time solvable with practical bounds, many finite proof-search and constraint-solving tasks would become dramatically more tractable. The theoretical statement alone would still need algorithmic content before changing implementations. [/explanation] The course therefore treats P versus NP as both a formal language question and a guide to the limits of reduction-based evidence. The next section makes the decision-search connection precise in the central case of SAT. ## Search-to-Decision Self-Reductions A decision problem answers whether a witness exists; a search problem asks for the witness itself. For SAT, the formula supplies enough structure that repeated decision queries can pin down a satisfying assignment one variable at a time. This procedure is called a self-reduction because it solves the search version using the decision version of the same problem. [definition: Search SAT] The search version of SAT is a search relation with input set the Boolean formulas and output set $\{0,1\}^* \cup \{\bot\}$. On input a Boolean formula $\varphi$, any satisfying assignment for $\varphi$ is an acceptable output if one exists, and $\bot$ is the required output otherwise. [/definition] Equivalently, Search SAT may be viewed as a multivalued function problem, since a satisfiable formula can have many acceptable satisfying assignments. The search problem has a richer output type than the language SAT, but it is still controlled by polynomial-time access to a decision oracle. The key operation is to test whether satisfiability survives after fixing one variable. [quotetheorem:6215] [citeproof:6215] This proof is constructive enough to be used as an algorithmic template. Its essential hypothesis is not merely that satisfiability is decidable, but that the restricted formulas remain in the same representation with only polynomial growth. A boundary case would be a decision problem whose yes-instances have witnesses but whose input encoding gives no efficient way to impose or test partial commitments; then the variable-fixing argument has no direct analogue. The theorem also does not say that every NP search problem automatically reduces to its decision version without extra structure, only that SAT has a particularly transparent self-reduction. This is why the next comparison uses Clique, where the same principle works but must be phrased in terms of thresholds and vertex choices. [example: Recovering a Satisfying Assignment from a SAT Oracle] Let $\varphi(x_1,x_2,x_3)$ be satisfiable. We recover a satisfying assignment by maintaining the invariant that the currently restricted formula is satisfiable. First query the SAT oracle on $\varphi[x_1:=0]$. If the oracle answers yes, set $b_1=0$, and the restricted formula \begin{align*} \varphi_1=\varphi[x_1:=0] \end{align*} is satisfiable by the meaning of the oracle answer. If the oracle answers no, then $\varphi[x_1:=0]$ is unsatisfiable. Since $\varphi$ is satisfiable, there is some satisfying assignment $(c_1,c_2,c_3)$ for $\varphi$. The value $c_1$ is either $0$ or $1$; the case $c_1=0$ would make $\varphi[x_1:=0]$ satisfiable, contradicting the oracle answer. Hence $c_1=1$, so set $b_1=1$, and \begin{align*} \varphi_1=\varphi[x_1:=1] \end{align*} is satisfiable. Now query the oracle on $\varphi_1[x_2:=0]$. If it answers yes, set $b_2=0$ and put \begin{align*} \varphi_2=\varphi_1[x_2:=0]. \end{align*} If it answers no, then $\varphi_1[x_2:=0]$ is unsatisfiable. Because $\varphi_1$ is satisfiable, some satisfying completion has $x_2=0$ or $x_2=1$; the value $x_2=0$ has been ruled out, so some satisfying completion has $x_2=1$. Set $b_2=1$ and put \begin{align*} \varphi_2=\varphi_1[x_2:=1]. \end{align*} In either case, $\varphi_2$ is satisfiable. Finally query the oracle on $\varphi_2[x_3:=0]$. If it answers yes, set $b_3=0$; if it answers no, then satisfiability of $\varphi_2$ forces the remaining value $b_3=1$. Thus the assignment \begin{align*} (x_1,x_2,x_3)=(b_1,b_2,b_3) \end{align*} satisfies $\varphi$. The oracle has converted the yes/no information into an explicit witness by preserving satisfiability after each variable choice. [/example] The same idea applies to many NP-complete problems, but the details depend on the representation of partial solutions. Clique gives a useful comparison because its decision, search, and optimization versions have different outputs while remaining tightly connected. [definition: Clique Decision Search and Optimization] For a finite undirected graph $G=(V,E)$ and $k \in \mathbb N$, the decision version of Clique asks whether $G$ has a clique of size at least $k$. The search version asks for a clique of size at least $k$ when one exists. The optimization version asks for the maximum integer $\omega(G)$ such that $G$ contains a clique of size $\omega(G)$. [/definition] The decision version is the one placed inside NP as a language. The search and optimization versions are often the forms encountered in applications, where the goal is to exhibit a group of mutually adjacent vertices or find the largest such group. [quotetheorem:6216] [citeproof:6216] The monotonicity of the threshold predicate is the key hypothesis: if a clique of size at least $k$ exists, then the decision answer is also yes for every smaller threshold. A non-monotone predicate such as "does $G$ have a clique of exactly $k$ vertices and no larger clique?" would produce a yes answer at only one threshold, so binary search over $k$ would discard relevant thresholds without justification. Even a linear scan would need a separate rule connecting the pattern of yes/no answers to the objective value. The theorem computes the optimum value, not by itself the vertices of an optimum clique, although the vertices can be recovered by additional oracle queries that test whether an optimum clique survives after deleting or requiring particular vertices. This mirrors the SAT construction: each query asks whether a partial commitment can be extended to a full solution, but the commitments now live in the graph rather than in truth assignments. [example: Comparing Three Clique Versions] Let $G=(V,E)$ have $n$ vertices. The decision version with threshold $k=5$ returns only the truth value of the statement \begin{align*} \exists C\subseteq V \text{ such that } |C|\ge 5 \text{ and every two distinct vertices of } C \text{ are adjacent}. \end{align*} The search version asks for such a set $C$ itself; for example, $\{v_2,v_4,v_7,v_9,v_{11}\}$ is a valid output exactly when each of the ten pairs in this set is an edge of $G$. The optimization version asks for \begin{align*} \omega(G)=\max\{|C|: C\subseteq V \text{ is a clique in }G\}. \end{align*} The decision answers determine $\omega(G)$ because, for each threshold $k$ with $1\le k\le n$, the oracle answers yes on $(G,k)$ exactly when there is a clique $C\subseteq V$ with $|C|\ge k$. By the definition of $\omega(G)$, such a clique exists exactly when $k\le \omega(G)$. Hence the yes thresholds are precisely \begin{align*} \{k\in\{1,\dots,n\}:k\le \omega(G)\}=\{1,\dots,\omega(G)\}. \end{align*} Querying all thresholds $1,\dots,n$ and selecting the largest threshold with answer yes therefore returns exactly $\omega(G)$. Knowing this value still does not display a clique. Suppose $\omega(G)=r$. To recover vertices, maintain a selected set $S$ and a remaining graph $H$ such that $S$ is a clique, every vertex of $H$ is adjacent to every vertex of $S$, and $H$ contains a clique of size $r-|S|$. Initially $S=\varnothing$ and $H=G$, so the invariant says exactly that $G$ contains an $r$-clique. Choose a vertex $v$ of $H$ and query whether $H-v$ has a clique of size $r-|S|$. If the answer is yes, replace $H$ by $H-v$ and keep $S$ unchanged; the invariant is preserved because the needed clique remains in the smaller graph. If the answer is no, then every clique of size $r-|S|$ in $H$ contains $v$. Add $v$ to $S$, replace $H$ by the subgraph induced by the neighbors of $v$ in $H-v$, and reduce the remaining target from $r-|S|$ to $r-|S|-1$. The neighbor restriction is valid because every other vertex in a clique containing $v$ must be adjacent to $v$. Repeating this process until $|S|=r$ produces an $r$-clique. These extra oracle queries are exactly what separates computing the optimum value from producing an optimum witness. [/example] ## $\mathrm{NP}$-Intermediate Problems NP-completeness gives a powerful dichotomy for many natural problems: either the problem is in P, or it is at least as hard as every problem in NP under polynomial-time reductions. The P versus NP question leaves open a third possibility for problems in NP that are neither efficiently decidable nor NP-complete. [Ladner's theorem](/theorems/6217) says that, assuming $\mathrm{P}\neq\mathrm{NP}$, such problems must exist. [definition: NP-Intermediate Language] Assume $\mathrm{P}\neq\mathrm{NP}$. A language $L$ is NP-intermediate if $L \in \mathrm{NP}$, $L \notin \mathrm{P}$, and $L$ is not NP-complete under polynomial-time many-one reductions. [/definition] This is a conditional category: it is empty if $\mathrm{P}=\mathrm{NP}$, and nonempty if $\mathrm{P}\neq\mathrm{NP}$ by the theorem below. The point is not that the construction gives a familiar practical problem, but that the landscape of NP cannot consist only of P and NP-complete languages unless P equals NP. [quotetheorem:6217] [citeproof:6217] Ladner's theorem is often the first warning that NP-completeness is not the only possible explanation for a problem's apparent difficulty. The conditional hypothesis is essential: if $\mathrm{P}=\mathrm{NP}$, then every language in NP is already in P, so there is no room for an NP-intermediate language under this definition. The theorem proves existence by an artificial diagonal construction, not by identifying a familiar computational problem from practice. Some natural candidates, such as graph isomorphism and integer factorization, have historically been discussed as possible intermediate problems, but Ladner's theorem alone does not prove that either lies outside P or fails to be NP-complete. [remark: Conditional Nature of Natural Candidates] Saying that a problem is a candidate for NP-intermediate status does not prove it is outside P or not NP-complete. It means that current reductions and algorithms do not place it cleanly into either class, and Ladner's theorem makes such a middle region compatible with $\mathrm{P}\neq\mathrm{NP}$. [/remark] The chapter closes the first arc of the course. We now have the formal question, the reason SAT is central, the passage from decision to search for core NP-complete problems, and the conditional existence of problems between P and NP-complete. Later topics refine this picture by changing the resource bound, the reduction type, or the computational model. After building the language of completeness and hardness, the course turns to the central open problem: whether every efficiently verifiable problem is also efficiently solvable. The next chapter changes the resource bound from time to space and studies how altering the model changes the landscape around P and NP. # 10. Space, Complementation, and Nondeterministic Space These notes develop the part of computational complexity theory that compares deterministic computation, nondeterministic computation, and the resources needed to simulate one model by another. Chapters 3 through 9 studied polynomial time, nondeterministic polynomial time, reductions, and NP-completeness; here the controlling resource changes from time to working memory. The main question is how much information a computation must remember, rather than how many steps it takes, assuming familiarity with Turing machines, asymptotic notation, and the basic classes $\operatorname{P}$ and $\operatorname{NP}$. Space has a distinctive feature: a machine using little space may run for a very long time, but it can visit only a bounded number of configurations before repeating. That observation turns space-bounded computation into graph reachability in a finite configuration graph. ## Space-Bounded Nondeterminism How should we measure memory when the input itself may be much longer than the workspace? The standard convention is that the read-only input tape is not charged, while work tapes are charged. This separates the memory needed to inspect the input from the memory needed to store intermediate information. [definition: Space-Bounded Decider] Let $f: \mathbb N \to \mathbb N$. A deterministic Turing machine $M$ decides a language $L$ in space $f(n)$ if, for every input $x$ of length $n$, $M$ halts, accepts exactly the strings in $L$, and uses at most $O(f(n))$ cells on its work tapes. [/definition] The deterministic definition measures a single computation path. The next question is how to count memory when a machine is allowed to make choices. Since different branches may explore different certificates, the space bound must constrain every branch rather than only the successful one. [definition: Nondeterministic Space Class] Let $f: \mathbb N \to \mathbb N$. The class $\operatorname{NSPACE}(f(n))$ consists of all languages $L$ decided by a nondeterministic Turing machine $M$ such that, on every input $x$ of length $n$, every computation branch halts, $M$ accepts $x$ iff at least one branch accepts, and every branch uses at most $O(f(n))$ work-tape cells. [/definition] This gives the space analogue of nondeterministic time, with a uniform bound across all branches. The halting condition matters because rejecting inputs should have no accepting branch and no infinite branch. We therefore need names for the polynomially bounded versions, since these are the space classes that will be compared with polynomial time and nondeterministic polynomial time. [definition: Polynomial Space Classes] The classes $\operatorname{PSPACE}$ and $\operatorname{NPSPACE}$ are defined by \begin{align*} \operatorname{PSPACE} &= \bigcup_{k \ge 1} \operatorname{SPACE}(n^k). \end{align*} \begin{align*} \operatorname{NPSPACE} &= \bigcup_{k \ge 1} \operatorname{NSPACE}(n^k). \end{align*} [/definition] These classes are large enough to contain computations that may take exponential time while remembering only polynomially many symbols. The main theme of the chapter is that nondeterminism adds less power to space than it appears to add to time. [example: Depth First Search with Logarithmic Space Counter] Consider directed reachability in a graph $G$ with vertex set $\{1,\dots,N\}$, given by its $N\times N$ adjacency matrix $A$, where $A_{ij}=1$ exactly when there is an edge $i\to j$. A deterministic depth-first search may store a visited bit for each vertex, so its visited array uses $N$ bits. Since the adjacency-matrix input has length $N^2$ bits, this is polynomial in the input length because $N=(N^2)^{1/2}$, but it is not logarithmic in the input length because $\log(N^2)=2\log N$ and $N$ is not bounded by $C\log N$ for any constant $C$ and all large $N$. A nondeterministic machine can instead store one proposed path. It keeps the current vertex $v$, a guessed next vertex $u$, and a counter $c\in\{0,\dots,N-1\}$. Each vertex name needs $\lceil \log_2 N\rceil$ bits, and the counter also needs $\lceil \log_2 N\rceil$ bits, so these registers use at most $3\lceil \log_2 N\rceil=O(\log N)$ bits. At each step, the machine guesses $u\in\{1,\dots,N\}$, computes the adjacency-matrix position for $A_{v,u}$ using the row-and-column indices $v$ and $u$, checks that this bit is $1$, replaces $v$ by $u$, and increments $c$. If $v=t$ at any point, it accepts. If $c=N-1$ and $v\neq t$, that branch rejects. The bound $N-1$ is enough: if $t$ is reachable from $s$, choose a shortest directed path \begin{align*} s=v_0\to v_1\to \cdots \to v_\ell=t. \end{align*} No vertex repeats on this shortest path, because if $v_i=v_j$ with $0\le i<j\le \ell$, then removing the cycle gives \begin{align*} v_0\to\cdots\to v_i=v_j\to v_{j+1}\to\cdots\to v_\ell, \end{align*} a shorter path from $s$ to $t$. Thus the path has at most $N$ distinct vertices, so $\ell\le N-1$. Nondeterminism is useful here because the machine stores only the single path it is guessing, not the full visited set or search frontier. [/example] The example motivates the central complete problem for small-space nondeterminism, but the same idea applies to every space-bounded nondeterministic machine. Its possible instantaneous descriptions form a directed graph. ## Configuration Graphs What finite object represents all possible behaviours of a space-bounded computation? A configuration records the state, head positions, and work-tape contents. If the space bound is small, then the number of such records is bounded even when the computation runs for many steps. [definition: Configuration Graph] Fix a Turing machine $M$ and an input $x$. The configuration graph $G_{M,x}$ is the directed graph whose vertices are configurations of $M$ on input $x$, with an edge $C \to C'$ when $M$ can move from configuration $C$ to configuration $C'$ in one transition. [/definition] The start configuration and the accepting configurations become distinguished vertices or sets of vertices in this graph. For a nondeterministic machine, acceptance means that some accepting configuration is reachable from the start configuration. This translation would be useless without a quantitative bound: if a space-$s(n)$ machine could have arbitrarily many configurations, reachability would not give a finite search problem controlled by the space bound. The logarithmic lower bound in the next theorem is the point at which input-head positions, which range over $n$ cells, can be absorbed into the same exponential estimate as the work tape. [quotetheorem:6218] [citeproof:6218] This bound is the bridge from machines to graphs. The hypothesis $s(n) \ge \log n$ is needed because even a machine using no work tape may have $n$ possible input-head positions, and $n$ is not bounded by $2^{O(s(n))}$ when $s(n)$ is sublogarithmic. The theorem does not say that the computation finishes in $2^{O(s(n))}$ steps unless the machine is forced to halt before repeating configurations; it only bounds the number of distinct instantaneous descriptions. What it gives for the sequel is a finite directed graph in which nondeterministic acceptance becomes ordinary reachability. [example: Log-Space Configurations as Vertices] Let $M$ be a fixed nondeterministic machine using at most $C\log n$ work-tape cells on inputs of length $n$, for some constant $C$. Fix an input $x$ of length $n$. A configuration records the current state, the input-head positions, the work-tape head positions, and the work-tape contents. The state takes constant space because $M$ has a fixed finite set of states. If $M$ has $r$ input heads, then each input-head position is an integer in $\{0,1,\dots,n+1\}$, so one position can be encoded using $\lceil \log_2(n+2)\rceil$ bits. Since $r$ is fixed, all input-head positions together use \begin{align*} r\lceil \log_2(n+2)\rceil=O(\log n) \end{align*} bits. If $M$ has $q$ work-tape heads, then each work-tape head position lies in $\{1,\dots,\lceil C\log n\rceil\}$, so one such position can be encoded using \begin{align*} \left\lceil \log_2\lceil C\log n\rceil\right\rceil=O(\log\log n) \end{align*} bits. Since $q$ is fixed, all work-tape head positions together use \begin{align*} q\left\lceil \log_2\lceil C\log n\rceil\right\rceil=O(\log\log n). \end{align*} Finally, let the work-tape alphabet have size $g$. The work-tape contents are strings of length at most $\lceil C\log n\rceil$ over this alphabet, so there are at most \begin{align*} g^{\lceil C\log n\rceil} \end{align*} possible work-tape contents. One such string can therefore be encoded using \begin{align*} \left\lceil \log_2\left(g^{\lceil C\log n\rceil}\right)\right\rceil \end{align*} bits. Since $\log_2(a^b)=b\log_2 a$ for positive integers $a,b$, this is \begin{align*} \left\lceil \lceil C\log n\rceil\log_2 g\right\rceil=O(\log n), \end{align*} because $C$ and $g$ are constants. Adding the components gives \begin{align*} O(1)+O(\log n)+O(\log\log n)+O(\log n)=O(\log n). \end{align*} Thus each configuration has an $O(\log n)$-bit encoding. Hence the total number of possible encoded configurations is at most \begin{align*} 2^{O(\log n)}=n^{O(1)}, \end{align*} so $G_{M,x}$ has polynomially many vertices. The edge relation is locally checkable in logarithmic space: given two encoded configurations $C$ and $C'$, the tester scans their state fields, head-position fields, and relevant work-tape symbols, then verifies that one transition rule of the fixed machine $M$ transforms $C$ into $C'$. This uses the two $O(\log n)$-bit configuration encodings and only constant extra information about the transition rule, so log-space computations give polynomial-size configuration graphs with compact vertices and log-space decidable edges. [/example] This explains why graph reachability is not merely an example but a normal form for nondeterministic space. To simulate a space-bounded machine deterministically, it suffices to solve reachability in this implicit graph without writing down the whole graph. ## Savitch-Style Reachability How can a deterministic machine decide reachability without storing the entire set of visited vertices? Savitch's idea is to replace search by divide-and-conquer. To decide whether $v$ reaches $w$ in at most $2m$ steps, choose a midpoint $u$ and check whether $v$ reaches $u$ in at most $m$ steps and $u$ reaches $w$ in at most $m$ steps. [definition: Bounded Reachability Predicate] For a directed graph $G=(V,E)$, define the predicate \begin{align*} \operatorname{Reach}_G: V \times V \times \mathbb N \to \{\mathrm{true},\mathrm{false}\} \end{align*} by setting $\operatorname{Reach}_G(a,b,\ell)=\mathrm{true}$ exactly when there is a directed path from $a$ to $b$ of length at most $\ell$. [/definition] The predicate has a recursive structure: a path of length at most $\ell$ either is very short, or has a midpoint after at most half of its steps. A usual breadth-first search or depth-first search would keep a frontier or a visited set of size $\Theta(N)$ in the worst case, so it is not a small-space solution when vertices require only $O(\log N)$ bits individually. The cost of avoiding that visited set is that the deterministic algorithm tries all possible midpoints and recomputes many subproblems. The next result states this trade-off precisely, with the adjacency test separated as its own workspace cost. [quotetheorem:6219] [citeproof:6219] The algorithm may take much more time than ordinary graph search because it repeats subproblems. This is not a polynomial-time reachability algorithm in general: the midpoint loop branches over all $N$ vertices at each recursive level, and the same reachability questions can be recomputed many times. The theorem also depends on compact vertex encodings and a small-space adjacency test. If vertices are not representable in $O(\log N)$ space, then a single recursion frame may already exceed the claimed budget before any midpoint search begins. For example, an implicit graph might name a vertex by an entire subset of $\{1,\dots,N\}$, requiring $\Theta(N)$ bits per endpoint; then storing $a$, $b$, $u$, and the length bound at one level costs linear space. Likewise, if testing $(u,v)\in E$ itself required writing the whole adjacency matrix, the stated space bound would disappear. Its role is therefore specifically space-theoretic: it replaces a linear-size visited set by a logarithmic-depth recursion stack only when individual vertices and local edge tests are compact. [example: Applying the Recursion] Suppose the vertices of $G$ are encoded by numbers in $\{1,\dots,N\}$, and we want to decide whether $s$ reaches $t$. The recursive procedure starts with the question \begin{align*} \operatorname{Reach}_G(s,t,N-1). \end{align*} For $N\ge 2$, it tests every midpoint $u\in\{1,\dots,N\}$ and accepts this call exactly when, for some $u$, \begin{align*} \operatorname{Reach}_G\left(s,u,\left\lceil \frac{N-1}{2}\right\rceil\right) \quad\text{and}\quad \operatorname{Reach}_G\left(u,t,\left\lfloor \frac{N-1}{2}\right\rfloor\right) \end{align*} are both true. The two length bounds cover the original budget because, for the integer $N-1$, \begin{align*} \left\lceil \frac{N-1}{2}\right\rceil+\left\lfloor \frac{N-1}{2}\right\rfloor=N-1. \end{align*} At one recursive level the machine stores the two endpoints of the current subproblem, the length bound, and the current midpoint $u$. Each of these is an integer between $1$ and $N$, or between $0$ and $N-1$ for the length bound, so each field uses at most $\lceil \log_2 N\rceil$ bits. Thus one stack frame uses \begin{align*} O(\log N)+O(\log N)+O(\log N)+O(\log N)=O(\log N) \end{align*} bits. After $d$ recursive splits, every remaining length bound is at most \begin{align*} \left\lceil \frac{N-1}{2^d}\right\rceil. \end{align*} Choosing $d=\lceil \log_2(N-1)\rceil$ gives $2^d\ge N-1$, hence \begin{align*} \left\lceil \frac{N-1}{2^d}\right\rceil\le 1, \end{align*} so the recursion has reached the base cases $\ell=0$ or $\ell=1$. Therefore the stack depth is $O(\log N)$, and the total stack space is \begin{align*} O(\log N)\cdot O(\log N)=O((\log N)^2). \end{align*} The algorithm may reconsider the same midpoint and the same subproblem many times, but it pays for that repetition in time rather than in a stored visited set. [/example] The recursion solves directed reachability in a graph whose vertices are explicitly represented by short strings. A configuration graph for a space-bounded machine is usually implicit, but its vertices are still short descriptions and its edge relation is locally checkable. The possible failure would be an implicit graph whose vertices cannot be enumerated or whose edge relation cannot be checked within the intended space bound; configuration graphs avoid this because a configuration is just a bounded tape snapshot together with state and head positions. We now need to ask what this graph algorithm says about every nondeterministic space computation, not only about reachability instances supplied as graphs. The resulting theorem is the main deterministic simulation of nondeterministic space. [quotetheorem:6220] [citeproof:6220] The theorem is striking because the deterministic simulation pays only a quadratic factor in space. The condition $f(n)\ge \log n$ is used to encode input-head positions and configuration names inside the same asymptotic budget; sublogarithmic space requires separate conventions and is not covered by this statement. Space-constructibility ensures that the simulator can manage the intended space bound and configuration lengths uniformly, rather than relying on an unavailable numerical promise. The theorem does not prove that nondeterminism is useless for space at every exact bound, nor does it give an efficient deterministic time simulation. This is the space-bounded counterpart of the earlier comparison between $\operatorname{P}$ and $\operatorname{NP}$: nondeterministic choices are converted into a deterministic search over configurations, but the conversion spends repeated time rather than a large visited set. Applying it to polynomial bounds gives the main class-level consequence. [quotetheorem:6221] [citeproof:6221] This equality is one of the first signs that space complexity has different closure behaviour from time complexity. The proof uses both parts of the polynomial-space definition: deterministic machines are included among nondeterministic machines, and a quadratic increase in a polynomial bound remains polynomial. It does not identify the exact deterministic and nondeterministic classes at a fixed bound $n^k$, since the argument may move from $n^k$ to $n^{2k}$. The forward lesson is that polynomial space is robust under nondeterministic branching, which prepares the complementation questions that follow. ## Complementation and Space If a nondeterministic machine accepts when some branch succeeds, how can we decide the complement, which asks that no branch succeeds? For deterministic space the answer is immediate only for deciders: once every computation halts with a definite answer, we may swap accepting and rejecting outcomes. If the machine were merely a recogniser that might run forever, reversal would not turn nonacceptance into acceptance because an infinite computation has no final state to reverse. For nondeterministic space the answer is deeper, especially at logarithmic space. Here $\operatorname{NL}$ denotes nondeterministic logarithmic space, and $\operatorname{coNL}$ denotes the class of complements of languages in $\operatorname{NL}$. [quotetheorem:6222] [citeproof:6222] For deterministic machines, the decider hypothesis is essential: without halting on all inputs, the complement machine could wait forever on a string outside $L$ rather than accepting it. The theorem also says nothing about improving the space bound; it preserves the same workspace because the simulation is literally the original computation followed by a reversed final answer. This direct closure is the baseline against which nondeterministic complementation is surprising. For nondeterministic machines, reversing accepting and rejecting states does not complement the language. A machine for $L$ accepts if at least one branch accepts; after swapping states, it accepts if at least one original branch rejects, which is a different condition. [example: Why Swapping Branch Outcomes Fails] Consider a nondeterministic machine $M$ whose computation on input $x$ has exactly two halting branches: branch $B_1$ accepts and branch $B_2$ rejects. By the acceptance convention for nondeterministic deciders, $M$ accepts $x$ exactly when at least one branch accepts, and branch $B_1$ is such a branch. Hence $x$ is accepted by $M$. Now form a machine $M'$ by swapping accepting and rejecting halting outcomes on each branch of $M$. Under this swap, branch $B_1$ becomes rejecting and branch $B_2$ becomes accepting. Therefore $M'$ also has at least one accepting branch on input $x$, namely the image of $B_2$, so $M'$ accepts $x$ as well. Thus branchwise reversal does not complement a nondeterministic computation: in this example both the original machine and the swapped machine accept the same input. [/example] The example shows that complementation for nondeterministic space cannot be obtained by a local change to the machine. The landmark [Immerman-Szelepcsenyi theorem](/theorems/6223) answers the right question instead: non-reachability in a configuration graph can itself be certified within comparable nondeterministic space. In particular, $\operatorname{NL}=\operatorname{coNL}$. The result is therefore a robust closure theorem for standard space classes such as $\operatorname{NL}$ and $\operatorname{NSPACE}(n^k)$, but it should not be read as an exact-bound statement for arbitrary nonconstructible or sublogarithmic functions. [remark: Complementation as a Space Phenomenon] The equality $\operatorname{NL}=\operatorname{coNL}$ has no known analogue saying $\operatorname{NP}=\operatorname{coNP}$. Space-bounded computation allows the configuration graph to be counted and revisited while keeping only a small amount of local information. Time-bounded computation does not permit the same kind of repeated global recomputation without paying for all repetitions. [/remark] The deterministic closure theorem, [Savitch's theorem](/theorems/6220), and Immerman-Szelepcsenyi together form the main lesson of the chapter. Space-bounded computation can trade time for memory, and configuration graphs make this trade precise. These results also connect back to the course's broader hierarchy questions: $\operatorname{PSPACE}$ contains $\operatorname{NP}$ by reusing polynomial time as polynomial space, Savitch's theorem shows that polynomial space is unchanged by nondeterministic branching, and $\operatorname{NL}=\operatorname{coNL}$ gives a low-space complementation theorem with no known time-bounded analogue for $\operatorname{NP}$ versus $\operatorname{coNP}$. Chapter 10 showed that space behaves differently from time, with results such as Savitch's theorem and $\operatorname{NL}=\operatorname{coNL}$ illustrating stronger closure properties under space bounds. That contrast prepares the ground for a finer time-bounded hierarchy, where repeated alternation between existential and universal choices creates the polynomial hierarchy. # 11. The Polynomial Hierarchy After Chapter 10's space-bounded detour, this chapter returns to time-bounded complexity and extends the course's study of $P$, $NP$, $coNP$, and NP-completeness from one witness to several alternating layers of choices. The prerequisite ideas are polynomial-time many-one reductions, Boolean satisfiability, nondeterministic verification, and the complement class $coNP$. The main goals are to define oracle computation, build the classes $\Sigma_k^P$, $\Pi_k^P$, and $\Delta_k^P$, prove bounded quantified satisfiability completeness, and understand why equality between matching existential and universal levels would collapse the whole hierarchy. Conceptually, the chapter asks how much extra power is gained when a proposed certificate must survive every counter-certificate, or when each adversarial move may be followed by a bounded repair certificate. ## Oracle Machines and Relativized Computation How should we model an algorithm that is allowed to use another decision problem as a subroutine at unit cost? Reductions already compare problems by translating instances, but oracle computation gives a more flexible model: the machine may ask many adaptive questions while it runs. This lets us define classes such as $P^A$ and $NP^A$, and it prepares the language needed for higher levels of the hierarchy. [definition: Oracle Turing Machine] An oracle Turing machine is a Turing machine with an additional oracle tape, a query state, and two answer states. For a language $A \subset \{0,1\}^*$ used as an oracle, entering the query state with word $w$ on the oracle tape sends the machine in one step to the yes answer state if $w \in A$, and to the no answer state if $w \notin A$. [/definition] The oracle language is not part of the ordinary input; it is a fixed external resource. A computation can therefore be polynomial time even if the oracle language is undecidable, since the cost model charges one step per query rather than the time needed to decide the query by ordinary means. We therefore need a class that records deterministic polynomial-time computation relative to a chosen oracle language. [definition: Relativized Polynomial Time] For a language $A \subset \{0,1\}^*$, $P^A$ is the class of languages decided by deterministic oracle Turing machines running in time polynomial in the input length with oracle $A$. [/definition] The class $P^A$ measures adaptive deterministic access to $A$, but the hierarchy will also need machines that guess witnesses while consulting an oracle. The nondeterministic version keeps the same oracle access and allows accepting branches, so it combines the verifier viewpoint from NP with the subroutine viewpoint from reductions. [definition: Relativized Nondeterministic Polynomial Time] For a language $A \subset \{0,1\}^*$, $NP^A$ is the class of languages decided by nondeterministic oracle Turing machines running in time polynomial in the input length with oracle $A$. [/definition] These definitions recover the familiar classes when the oracle is harmless. If $A \in P$, then each oracle query can be simulated by a polynomial-time decider for $A$, so $P^A = P$ and $NP^A = NP$ up to polynomial overhead. Interesting relativization begins when $A$ is treated as a powerful problem, such as SAT. [example: Using A SAT Oracle] Let $A=\mathrm{SAT}$, and let $\varphi(x_1,\dots,x_n)$ be the input formula. First ask the oracle whether $\varphi$ is satisfiable. If the answer is no, then $\varphi$ has zero satisfying assignments, so the machine rejects. If the answer is yes, recover one satisfying assignment one variable at a time. Suppose values $a_1,\dots,a_{i-1}$ have already been chosen so that \begin{align*} \varphi(a_1,\dots,a_{i-1},x_i,\dots,x_n) \end{align*} is satisfiable. Ask the SAT oracle whether \begin{align*} \varphi(a_1,\dots,a_{i-1},0,x_{i+1},\dots,x_n) \end{align*} is satisfiable. If yes, set $a_i=0$. If no, set $a_i=1$; this is valid because the previous restricted formula is satisfiable, and the only two possibilities for $x_i$ are $0$ and $1$. After $n$ such queries, the assignment $a=(a_1,\dots,a_n)$ satisfies $\varphi(a)=1$. Now form the formula \begin{align*} \psi(x_1,\dots,x_n) = \varphi(x_1,\dots,x_n) \wedge \bigl((x_1\ne a_1)\vee(x_2\ne a_2)\vee\cdots\vee(x_n\ne a_n)\bigr). \end{align*} An assignment satisfies $\psi$ exactly when it satisfies $\varphi$ and differs from $a$ in at least one coordinate. Ask the SAT oracle whether $\psi$ is satisfiable. If yes, then $\varphi$ has the satisfying assignment $a$ and another satisfying assignment satisfying $\psi$, so it has at least two satisfying assignments. If no, then every satisfying assignment of $\varphi$ equals $a$, so $\varphi$ has exactly one satisfying assignment. The machine uses $1+n+1$ oracle queries, hence polynomially many queries, and the later queries depend on earlier oracle answers. [/example] The example illustrates why oracle access is stronger than a single many-one reduction. The algorithm asks later questions whose content depends on earlier answers, which is the computational pattern behind Delta-classes in the polynomial hierarchy. ## Quantified Boolean Formulas What is the canonical complete problem when a computation alternates between choices made by a prover and choices made by an adversary? Boolean satisfiability captures one existential block, but adding quantifiers over Boolean assignments gives a hierarchy of problems whose syntax mirrors alternating certificates. [definition: Quantified Boolean Formula] A quantified Boolean formula is a formula of the form \begin{align*} Q_1 X_1\, Q_2 X_2\, \cdots\, Q_k X_k\, \varphi(X_1,\dots,X_k), \end{align*} where each $Q_i \in \{\exists, \forall\}$, each $X_i$ is a finite block of Boolean variables, and $\varphi$ is a quantifier-free Boolean formula. [/definition] The truth value is evaluated by reading the quantifiers over all assignments to the corresponding variable blocks. The total number of assignments may be exponential, but each individual assignment has polynomial length when the formula is part of the input. To connect this syntax to fixed complexity levels, we need bounded-alternation versions that record both the number of blocks and the leading quantifier. [definition: Bounded Alternation QSAT] For $k \ge 1$, $\mathrm{QSAT}_k^\exists$ is the language of true quantified Boolean formulas \begin{align*} \exists X_1\, \forall X_2\, \exists X_3\, \cdots\, Q_k X_k\, \varphi(X_1,\dots,X_k), \end{align*} with $k$ alternating quantifier blocks and leading existential quantifier. The language $\mathrm{QSAT}_k^\forall$ is the language of true quantified Boolean formulas with $k$ alternating quantifier blocks and leading universal quantifier. [/definition] For $k=1$, $\mathrm{QSAT}_1^\exists$ is SAT and $\mathrm{QSAT}_1^\forall$ is TAUT, the language of tautological Boolean formulas. For $k=2$, the statement $\exists X\,\forall Y\,\varphi(X,Y)$ says that there is a single first move that works against every reply. [example: Classifying Exist-Forall Fragments] Consider a quantified Boolean formula \begin{align*} \exists X\,\forall Y\,\varphi(X,Y), \end{align*} where $\varphi$ is quantifier-free. It has leading quantifier $\exists$ and exactly two alternating quantifier blocks, first $X$ and then $Y$. This matches the defining pattern of $\mathrm{QSAT}_2^\exists$: \begin{align*} \exists X_1\,\forall X_2\,\theta(X_1,X_2). \end{align*} Taking $X_1=X$, $X_2=Y$, and $\theta=\varphi$, the formula is an instance of the second existential fragment. For the dual form \begin{align*} \forall X\,\exists Y\,\psi(X,Y), \end{align*} the leading quantifier is $\forall$ and there are again two alternating blocks, so it matches $\mathrm{QSAT}_2^\forall$. Its complement is obtained by negating the whole quantified statement: \begin{align*} \neg\bigl(\forall X\,\exists Y\,\psi(X,Y)\bigr)\equiv \exists X\,\neg\bigl(\exists Y\,\psi(X,Y)\bigr). \end{align*} This uses the quantifier-negation law $\neg\forall X\,A(X)\equiv \exists X\,\neg A(X)$. Applying the second quantifier-negation law $\neg\exists Y\,B(Y)\equiv \forall Y\,\neg B(Y)$ gives \begin{align*} \exists X\,\neg\bigl(\exists Y\,\psi(X,Y)\bigr)\equiv \exists X\,\forall Y\,\neg\psi(X,Y). \end{align*} Thus complementing the universal two-block fragment produces an existential two-block fragment with the quantifier-free predicate negated. [/example] This classification is the syntactic model for the semantic classes introduced next. The point of the hierarchy is that machines need not literally receive quantified formulas; any problem with the same alternating certificate structure belongs to the corresponding level. ## The Levels Sigma, Pi, and Delta How do we turn alternating quantifier patterns into machine classes? The first level is NP: guess a polynomial-size certificate and verify it in polynomial time. Higher levels allow blocks of existential and universal polynomial-size strings, with a polynomial-time predicate at the end. [definition: Existential Polynomial Hierarchy Levels] For $k \ge 1$, a language $L$ belongs to $\Sigma_k^P$ if there exist a polynomial $p$ and a polynomial-time decidable Boolean predicate \begin{align*} R:\{0,1\}^* \times (\{0,1\}^*)^k \to \{0,1\} \end{align*} such that \begin{align*} x \in L \iff \exists y_1 \in \{0,1\}^{p(|x|)}\, \forall y_2 \in \{0,1\}^{p(|x|)}\, \exists y_3 \in \{0,1\}^{p(|x|)}\, \cdots\, Q_k y_k \in \{0,1\}^{p(|x|)}\, R(x,y_1,\dots,y_k)=1, \end{align*} where the quantifiers alternate and $Q_k$ is determined by parity. [/definition] The existential levels describe claims that begin with a proposed witness and then face alternating challenges and replies. Their complements naturally begin with a universal block, so we need a dual family of classes to track the reversed pattern directly rather than hiding it behind complementation. [definition: Universal Polynomial Hierarchy Levels] For $k \ge 1$, a language $L$ belongs to $\Pi_k^P$ if there exist a polynomial $p$ and a polynomial-time decidable Boolean predicate \begin{align*} R:\{0,1\}^* \times (\{0,1\}^*)^k \to \{0,1\} \end{align*} such that \begin{align*} x \in L \iff \forall y_1 \in \{0,1\}^{p(|x|)}\, \exists y_2 \in \{0,1\}^{p(|x|)}\, \forall y_3 \in \{0,1\}^{p(|x|)}\, \cdots\, Q_k y_k \in \{0,1\}^{p(|x|)}\, R(x,y_1,\dots,y_k)=1, \end{align*} where the quantifiers alternate and $Q_k$ is determined by parity. [/definition] The classes $\Sigma_k^P$ and $\Pi_k^P$ describe nondeterministic alternation, but deterministic algorithms with access to a lower-level oracle also occur naturally. Such algorithms may ask many adaptive questions to a fixed previous level, so they deserve their own notation rather than being forced into a single quantified formula at every use. For instance, computing the lexicographically first satisfying assignment of a formula, and then using that assignment in a later test, is awkward to express as one static certificate: each queried prefix depends on the answers to earlier satisfiability questions. Oracle access records this adaptive search pattern directly. [definition: Deterministic Polynomial Hierarchy Levels] Set $\Delta_1^P = P$. For $k \ge 2$, define \begin{align*} \Delta_k^P = P^{\Sigma_{k-1}^P}. \end{align*} [/definition] The Delta levels capture polynomial-time computation with lower-level subroutines, while the Sigma and Pi levels capture bounded alternation directly. To state global structural results, we need one class containing every finite amount of polynomially bounded alternation and every corresponding deterministic oracle level. [definition: Polynomial Hierarchy] The polynomial hierarchy is \begin{align*} PH = \bigcup_{k \ge 1} \Sigma_k^P = \bigcup_{k \ge 1} \Pi_k^P = \bigcup_{k \ge 1} \Delta_{k+1}^P. \end{align*} [/definition] The equality of these unions follows from the containments between adjacent levels, not from equality of the individual levels. Each level is contained in the next existential and next universal level by adding a dummy quantifier block. [quotetheorem:6224] [citeproof:6224] These containments explain why the three families define the same union, but the theorem does not say that corresponding levels are equal. The hypotheses in each containment are doing real work. For example, $\Delta_{k+1}^P$ allows only polynomially many queries to a fixed $\Sigma_k^P$ oracle; if the number of adaptive queries were not polynomially bounded, the transcript-compression argument would no longer fit into a single polynomial-size quantified formula. Likewise, the duality statement depends on taking complements of decision languages with the same polynomial bounds, so it does not turn a search task such as outputting a satisfying assignment into the decision problem TAUT. The first boundary to understand is therefore not a higher-level phenomenon at all: it is the base of the hierarchy. The containments above will be used recursively later, but that recursion only has the intended meaning if the first existential and universal levels really are the old classes from the course. The next theorem provides this anchor by identifying $\Sigma_1^P$ with NP and $\Pi_1^P$ with $coNP$; without it, the hierarchy notation would not yet be connected to the verifier and complement classes already studied. [quotetheorem:6225] [citeproof:6225] This theorem confirms that the hierarchy extends the classes already studied rather than replacing them. The hypotheses matter in two ways: witnesses must have polynomial length, and the final predicate must be decidable in polynomial time. If either condition is dropped, the definition no longer describes NP-style verification, since an accepting certificate could hide an exponential object or an undecidable computation. The theorem also does not identify the existential and universal first levels with each other. It says $\Sigma_1^P$ is NP and $\Pi_1^P$ is $coNP$, but it gives no reason for $NP=coNP$. A typical boundary case is SAT versus TAUT: SAT asks for one satisfying assignment, while TAUT asks whether every assignment satisfies the formula. The first-level theorem classifies these problems on opposite sides of the first level; it does not collapse the distinction between finding a witness and ruling out every counterexample. The next levels ask whether an NP-type claim remains true under all possible counter-certificates, or whether every challenge admits a polynomial response. [example: A Sigma Two Minimization Certificate] Assume candidates are encoded by strings of length $p(|I|)$, and let $F(I,z)$ mean that $z$ is a feasible certificate for $I$. Suppose feasibility and the comparisons $C(I,z)\le t$ and $C(I,z)<t$ are decidable in polynomial time. Define \begin{align*} R(I,t,w,u) = F(I,w) \wedge \bigl(C(I,w)\le t\bigr) \wedge \bigl(\neg F(I,u)\vee \neg(C(I,u)<t)\bigr). \end{align*} Since $\neg(C(I,u)<t)$ is the same as $C(I,u)\ge t$, the quantified statement \begin{align*} \exists w\,\forall u\,R(I,t,w,u)=1 \end{align*} expands to \begin{align*} \exists w\,\forall u\, \Bigl( F(I,w) \wedge C(I,w)\le t \wedge \bigl(\neg F(I,u)\vee C(I,u)\ge t\bigr) \Bigr). \end{align*} The first two conjuncts do not depend on $u$, so this is equivalent to \begin{align*} \exists w\, \Bigl( F(I,w) \wedge C(I,w)\le t \wedge \forall u\,\bigl(\neg F(I,u)\vee C(I,u)\ge t\bigr) \Bigr). \end{align*} The universal conjunct says that every certificate $u$ is either infeasible or has cost at least $t$, which is exactly the assertion that no feasible solution has cost less than $t$. Taking $u=w$ also gives $C(I,w)\ge t$, because $F(I,w)$ is true; together with $C(I,w)\le t$, the guessed feasible solution has cost exactly $t$. Thus exact optimality is expressed by one existentially guessed solution and one universal block ruling out every better certificate, so the language has the required $\Sigma_2^P$ form. [/example] The minimization example is typical: proving that a value is optimal is harder than proving that some feasible value exists. The universal block is not an implementation detail; it represents the need to rule out every better candidate. ## Completeness for Bounded Quantifier Alternation Which problems are complete for the higher levels in the same way SAT is complete for NP? Bounded quantified satisfiability gives the standard complete family. The proof is the Cook-Levin idea with an alternating verifier rather than a single nondeterministic witness. [quotetheorem:6226] [citeproof:6226] Completeness makes QSAT the reference problem for the hierarchy. It also gives a practical classification method: put the problem into an alternating certificate form, then compare it to the matching QSAT fragment for hardness. The boundedness of the alternation number is essential. For a fixed $k$, the reduction targets formulas with exactly a fixed finite pattern of quantifier blocks, so the problem matches one level of $PH$. If the number of alternations is part of the input, the resulting language is the full TQBF problem, which is PSPACE-complete rather than complete for a fixed hierarchy level. The leading quantifier is also part of the classification: $\exists X\,\forall Y\,\varphi(X,Y)$ is a $\Sigma_2^P$ instance, while $\forall X\,\exists Y\,\varphi(X,Y)$ is its universal-side analogue. The theorem also does not separate levels: $\mathrm{QSAT}_k^\exists$ being complete for $\Sigma_k^P$ means it is a hardest problem within that level under reductions, but it does not prove that $\Sigma_k^P$ differs from $\Pi_k^P$ or from $\Sigma_{k+1}^P$. [example: A Two-Round Strategy Formula] Let $\varphi(X,Y)$ be a quantifier-free Boolean formula, where an assignment to $X$ represents a proposed design and an assignment to $Y$ represents a test scenario. The formula \begin{align*} \exists X\,\forall Y\,\varphi(X,Y) \end{align*} has two quantifier blocks: the first block is existential and the second block is universal. Thus it has exactly the syntactic form \begin{align*} \exists X_1\,\forall X_2\,\theta(X_1,X_2) \end{align*} by taking $X_1=X$, $X_2=Y$, and $\theta=\varphi$. Therefore it is an instance of $\mathrm{QSAT}_2^\exists$, and it expresses that one fixed design $X$ must make $\varphi(X,Y)$ true for every scenario $Y$. By contrast, the formula \begin{align*} \forall Y\,\exists X\,\varphi(X,Y) \end{align*} has two quantifier blocks with leading universal quantifier. It matches the defining pattern \begin{align*} \forall X_1\,\exists X_2\,\theta(X_1,X_2) \end{align*} by taking $X_1=Y$, $X_2=X$, and $\theta=\varphi$. Hence it is an instance of $\mathrm{QSAT}_2^\forall$, expressing that after a scenario $Y$ is fixed, some possibly scenario-dependent design $X$ survives it. The two formulas differ because \begin{align*} \exists X\,\forall Y\,\varphi(X,Y) \end{align*} requires one design to work for all scenarios, while \begin{align*} \forall Y\,\exists X\,\varphi(X,Y) \end{align*} allows the design to change with the scenario. [/example] The order of quantifiers changes the computational meaning. A single robust design is stronger than a scenario-dependent response, and the hierarchy records that distinction syntactically and computationally. ## Collapse of the Hierarchy What would it mean if two adjacent-looking levels turned out to be the same? The hierarchy is built by adding alternations, so equality at one level lets later alternations be compressed. This is the polynomial-hierarchy analogue of the observation that if $P=NP$, then many apparently harder search and verification tasks lose their separation. [quotetheorem:6227] [citeproof:6227] This theorem is one of the main reasons complexity theorists expect strictness throughout the hierarchy. The hypothesis is a same-level equality: it is not enough to know that both $\Sigma_k^P$ and $\Pi_k^P$ sit inside a higher class, because those containments already follow from the definition of the hierarchy. What drives the collapse is the ability to replace a universal $k$th-level test by an existential $k$th-level test without paying another alternation. A boundary case helps separate the statement from nearby facts. The inclusions $NP \subseteq \Sigma_2^P$ and $coNP \subseteq \Sigma_2^P$ do not collapse $PH$, because they do not identify $NP$ with $coNP$. By contrast, the equality $NP=coNP$ is the case $k=1$, and it would collapse $PH$ to NP. The theorem also does not prove that any such equality holds, nor does it prove that the hierarchy is strict; it is a conditional compression result. Since QSAT with more alternations appears to express genuinely richer games, such collapses are viewed as strong and unlikely consequences. [remark: Relation With PSPACE] Every fixed level of $PH$ is contained in PSPACE because a polynomial-space machine can recursively evaluate the polynomially many quantified bits using depth-first search over assignments. Therefore $PH \subseteq PSPACE$. The full language TQBF, with an unbounded number of alternating quantifier blocks, is PSPACE-complete, so the polynomial hierarchy can be viewed as the bounded-alternation part of quantified Boolean reasoning. [/remark] The chapter closes the deterministic and nondeterministic core of the course by showing how NP generalises to structured alternation. The final lecture turns to randomness, where the resource being controlled is not alternation but probability of error. The polynomial hierarchy extends NP by layering quantified choices into a structured tower of classes. The final chapter then shifts from alternation to randomness, asking how probabilistic computation compares with the deterministic and nondeterministic models developed earlier. # 12. Randomized Complexity Classes Randomization gives algorithms a controlled source of uncertainty: instead of asking for the same computation path on every input, we allow the machine to toss coins and then require a high probability of the right answer. This chapter assumes the deterministic and nondeterministic classes $P$ and $NP$, polynomial-time reductions and verifiers, and the basic probability facts needed for independent trials and expectation. It introduces the standard randomized classes that sit near $P$ and $NP$, separating one-sided error, two-sided error, and zero-error computation. The guiding questions are how to define correctness when an algorithm may err, how to reduce the error probability, and how randomized algorithms function as evidence for membership in complexity classes. ## Probabilistic Turing Machines and Error Conventions A deterministic decider has a single computation path on each input. A randomized decider has many possible computation paths, indexed by coin tosses, so the first task is to decide what probability space is being used and what correctness condition is demanded. [definition: Probabilistic Polynomial-Time Turing Machine] A probabilistic polynomial-time Turing machine is a Turing machine $M$ with a read-once random tape whose cells contain independent fair bits. There is a polynomial $p$ such that, for every input $x \in \{0,1\}^*$ and every random string $r \in \{0,1\}^{p(|x|)}$, the computation $M(x;r)$ halts within $p(|x|)$ steps and outputs either accept or reject. [/definition] The random tape turns acceptance into an event. For a fixed input $x$, the expression $\mathbb P_r[M(x;r) \text{ accepts}]$ means the fraction of random strings $r \in \{0,1\}^{p(|x|)}$ that lead to acceptance. The first useful error convention asks for certainty on no-instances and a noticeable chance of success on yes-instances. [definition: One-Sided Error Classes RP and coRP] A language $L \subseteq \{0,1\}^*$ belongs to $RP$ if there is a probabilistic polynomial-time Turing machine $M$ such that: \begin{align*} x \notin L \implies \mathbb P_r[M(x;r) \text{ accepts}] = 0. \end{align*} \begin{align*} x \in L \implies \mathbb P_r[M(x;r) \text{ accepts}] \ge \frac{1}{2}. \end{align*} The class $coRP$ consists of languages $L$ whose complements lie in $RP$. [/definition] The class $RP$ permits false negatives but forbids false positives. A machine for $coRP$ has the opposite behaviour: on yes-instances it always accepts, while on no-instances it rejects with probability at least $1/2$. [example: Random Witness Search] Let $q(n)$ be the prescribed witness length, and for a fixed input $x$ define $A_x=\{w\in \{0,1\}^{q(|x|)} : V(x,w)\text{ accepts}\}$. Consider the randomized algorithm $M$ that samples $w$ uniformly from $\{0,1\}^{q(|x|)}$, runs $V(x,w)$, and accepts exactly when $V(x,w)$ accepts. Since $V$ runs in polynomial time and $q$ is polynomially bounded, $M$ runs in polynomial time. If $x\notin L$, the hypothesis says there are no accepting witnesses, so $A_x=\varnothing$. Hence \begin{align*} \mathbb P_w[M(x;w)\text{ accepts}]=\mathbb P_w[w\in A_x]. \end{align*} Because $w$ is uniform on $\{0,1\}^{q(|x|)}$, this probability is \begin{align*} \mathbb P_w[w\in A_x]=\frac{|A_x|}{2^{q(|x|)}}. \end{align*} Since $A_x=\varnothing$, we have $|A_x|=0$, and therefore \begin{align*} \mathbb P_w[M(x;w)\text{ accepts}]=\frac{0}{2^{q(|x|)}}=0. \end{align*} If $x\in L$, the density hypothesis says that a uniformly sampled witness is accepting with probability at least $1/2$. Equivalently, \begin{align*} \mathbb P_w[V(x,w)\text{ accepts}]=\frac{|A_x|}{2^{q(|x|)}}\ge \frac{1}{2}. \end{align*} Since $M$ accepts exactly in the event that $V(x,w)$ accepts, this gives \begin{align*} \mathbb P_w[M(x;w)\text{ accepts}]\ge \frac{1}{2}. \end{align*} Thus $M$ has no false positives and accepts every yes-instance with probability at least $1/2$, which is exactly the $RP$ correctness condition. A successful random witness certifies membership, while failure to find one only causes rejection on that trial. [/example] The witness-search example shows the benefit of one-sided error when a successful random choice certifies the answer. Some randomized procedures do not have such certificates; their random choices may produce the wrong answer on either side, so the next class allows two-sided error while still demanding a fixed advantage over guessing. [definition: BPP] A language $L \subseteq \{0,1\}^*$ belongs to $BPP$ if there is a probabilistic polynomial-time Turing machine $M$ such that, for every input $x$, \begin{align*} x \in L \implies \mathbb P_r[M(x;r) \text{ accepts}] \ge \frac{2}{3}. \end{align*} \begin{align*} x \notin L \implies \mathbb P_r[M(x;r) \text{ rejects}] \ge \frac{2}{3}. \end{align*} [/definition] The class $BPP$ is the right home for bounded two-sided error, but it still permits wrong answers. In settings where a wrong answer is unacceptable, the natural question is whether randomness can be used only to decide when to stop. This motivates the zero-error class: the computation may have random running time, but every answer it prints must be correct. [definition: Zero-Error Probabilistic Polynomial Time] A language $L \subseteq \{0,1\}^*$ belongs to $ZPP$ if there is a probabilistic algorithm $M$ such that, for every input $x$, $M$ outputs accept or reject, the output is always the correct decision for whether $x \in L$, and the expected running time of $M$ is bounded by a polynomial in $|x|$. [/definition] A zero-error algorithm is often called a Las Vegas algorithm. It may use a random amount of time, but it never gives the wrong answer; the randomness is used to search efficiently for a certificate that tells the algorithm when it is safe to stop. [example: Zero Error from Reliable Halting] Consider a randomized procedure whose rounds are independent. In each round it samples a candidate certificate, runs a polynomial-time check, and either returns the correct answer or reports ``try again''. Let $T$ be the number of rounds until the first answer is returned, and let $p$ be the probability that one round returns an answer. The hypothesis gives $p \ge 1/3$. For $j \ge 0$, the event $T>j$ means that the first $j$ rounds all reported ``try again''. Since the rounds are independent and each round fails to answer with probability $1-p$, \begin{align*} \mathbb P[T>j]=(1-p)^j. \end{align*} Using the tail-sum formula for a positive integer-valued [random variable](/page/Random%20Variable), \begin{align*} \mathbb E[T]=\sum_{j=0}^{\infty}\mathbb P[T>j]. \end{align*} Substituting the expression for $\mathbb P[T>j]$ gives \begin{align*} \mathbb E[T]=\sum_{j=0}^{\infty}(1-p)^j. \end{align*} Because $0<p\le 1$, this geometric series has ratio $1-p$ with absolute value less than $1$, so \begin{align*} \sum_{j=0}^{\infty}(1-p)^j=\frac{1}{1-(1-p)}. \end{align*} The denominator simplifies as $1-(1-p)=p$, hence \begin{align*} \mathbb E[T]=\frac{1}{p}. \end{align*} Since $p\ge 1/3$, taking reciprocals reverses the inequality and gives \begin{align*} \mathbb E[T]=\frac{1}{p}\le 3. \end{align*} If one round takes at most $q(n)$ steps on inputs of length $n$, then the expected running time is bounded by \begin{align*} \mathbb E[T]q(n)\le 3q(n), \end{align*} which is polynomial in $n$. The algorithm has zero error because every halting round returns the correct answer, and every non-answer only triggers another independent trial. [/example] ## One-Sided Error and Nondeterminism The first structural question is how randomized one-sided algorithms compare with nondeterministic verification. If an $RP$ machine accepts a yes-instance with positive probability and never accepts a no-instance, then an accepting random string behaves like a certificate. [quotetheorem:6228] [citeproof:6228] This containment explains why one-sided randomness is weaker than arbitrary nondeterminism in this direction: a random string that succeeds is a conventional witness. Both hypotheses in the $RP$ definition are doing work. If no-instances were allowed to accept with positive probability, then an accepting random string would no longer be a sound $NP$ certificate; for example, an algorithm that accepts every input with probability $1/2$ would have accepting random strings on both sides of any nontrivial language. If yes-instances only had acceptance probability $0$ or exponentially small probability, the proof would still work when at least one accepting string exists, but the resulting class would no longer capture efficient randomized discovery. The next example isolates this density issue: ordinary $NP$ witnesses may exist without being findable by uniform sampling with noticeable probability. [example: Rare Witnesses and the Gap from NP] Let $\varphi$ be a satisfiable Boolean formula on variables $x_1,\dots,x_n$, and suppose exactly one assignment $\alpha \in \{0,1\}^n$ satisfies it. If an assignment $w$ is sampled uniformly from $\{0,1\}^n$, then the sample space has size $2^n$, and the accepting set has size $1$. Therefore the success probability of direct random search is \begin{align*} \mathbb P_w[\varphi(w)=1]=\frac{|\{w\in\{0,1\}^n:\varphi(w)=1\}|}{|\{0,1\}^n|}. \end{align*} Since exactly one assignment satisfies $\varphi$, this becomes \begin{align*} \mathbb P_w[\varphi(w)=1]=\frac{1}{2^n}=2^{-n}. \end{align*} Satisfiability is still verified in polynomial time: a certificate is an assignment $w\in\{0,1\}^n$, and the verifier evaluates $\varphi(w)$ gate by gate or clause by clause and accepts exactly when the formula is true. Thus the formula has an $NP$ witness, namely the unique satisfying assignment. But the uniform sampler finds that witness with probability $2^{-n}$, not with probability at least $1/2$. Therefore this direct search procedure does not satisfy the $RP$ yes-instance condition. The point is that $NP$ only asks for at least one accepting witness, while this kind of $RP$ search needs a noticeable fraction of random strings to be accepting. [/example] The rare-witness example separates ordinary nondeterministic certificates from certificates that random sampling can locate with noticeable probability. Zero-error computation asks for a different combination: it should be able to certify whichever answer is correct, and this suggests running a one-sided algorithm for the language together with a one-sided algorithm for its complement. [quotetheorem:6229] [citeproof:6229] The equality is useful because it gives two interchangeable views of zero-error computation: expected-time algorithms with no mistakes, or simultaneous one-sided algorithms for the language and its complement. The intersection hypothesis is essential. An $RP$ algorithm for $L$ alone can certify yes-instances but may reject forever on no-instances, so it cannot by itself safely output no; symmetrically, a $coRP$ algorithm alone cannot safely output yes. The expected-time condition is also essential in the reverse direction: a zero-error procedure that eventually halts with probability $1$ but has superpolynomial expected running time cannot be truncated to obtain the constant success probability required for $RP$. This is why later membership proofs for $ZPP$ usually present two complementary one-sided tests, since that format makes both certificates and the expected-time bound visible. ## Randomized Algorithms as Membership Proofs To prove membership in $RP$, $coRP$, $ZPP$, or $BPP$, the algorithm must be accompanied by a probability analysis. The input length controls the running time, while the random choices control the error event. [explanation: Anatomy of a Randomized Class-Membership Proof] A randomized membership proof has four ingredients. First, specify the random experiment, including the number of random bits or the finite sample space. Second, prove the algorithm runs in polynomial time for every outcome of the random choices. Third, prove the exact correctness condition on the side where no error is allowed, if the class has one-sided error. Fourth, bound the error probability on the side where mistakes are allowed. For $BPP$, both sides need probability bounds. For $RP$ and $coRP$, the most important step is usually proving that the algorithm never lies on one side of the language. [/explanation] This template is useful because many randomized algorithms are not introduced with complexity-class terminology at first. Polynomial identity testing is the standard example where a randomized algorithm gives strong evidence of efficient decidability, although the deterministic complexity can be more subtle depending on the representation and field. [example: Polynomial Identity Testing over a Finite Sample] Let $F$ be a field and let $f \in F[x_1,\dots,x_m]$ have total degree at most $d$, given by an arithmetic circuit. Choose a finite set $S \subset F$ with $|S| \ge 2d$ and sample $a=(a_1,\dots,a_m)$ uniformly from $S^m$. This size condition requires the sampling field to contain enough elements; over a smaller finite field, one samples from a suitable extension field and evaluates the same circuit there. The algorithm evaluates $f(a)$ and rejects the identity claim exactly when $f(a)\ne 0$. If $f$ is the zero polynomial, then for every $a\in S^m$ we have \begin{align*} f(a)=0. \end{align*} Thus no sampled point satisfies $f(a)\ne 0$, so \begin{align*} \mathbb P_a[\text{the algorithm rejects}]=0. \end{align*} Now suppose $f$ is nonzero over the field from which the sample points are drawn. By *Schwartz-Zippel*, \begin{align*} \mathbb P_a[f(a)=0]\le \frac{d}{|S|}. \end{align*} The rejection event is the complementary event to $f(a)=0$, because the algorithm rejects exactly when $f(a)\ne 0$. Therefore \begin{align*} \mathbb P_a[\text{the algorithm rejects}]=\mathbb P_a[f(a)\ne 0]. \end{align*} Since $f(a)=0$ and $f(a)\ne 0$ are complementary events, \begin{align*} \mathbb P_a[f(a)\ne 0]=1-\mathbb P_a[f(a)=0]. \end{align*} Combining this with the Schwartz-Zippel bound gives \begin{align*} \mathbb P_a[\text{the algorithm rejects}]\ge 1-\frac{d}{|S|}. \end{align*} If $d>0$, the assumption $|S|\ge 2d$ gives \begin{align*} \frac{d}{|S|}\le \frac{d}{2d}=\frac12. \end{align*} Hence \begin{align*} \mathbb P_a[\text{the algorithm rejects}]\ge 1-\frac12=\frac12. \end{align*} If $d=0$ and $f$ is nonzero, then $f$ is a nonzero constant polynomial, so $f(a)\ne 0$ for every $a\in S^m$ and the rejection probability is $1$. Thus true identities are never rejected, while nonzero polynomials are rejected with probability at least $1/2$. [/example] Interpreted as a language, polynomial identity testing asks whether the circuit computes the zero polynomial. The algorithm above is naturally $coRP$-style for the zero-identity language: it always accepts true identities, and it finds evidence against false identities with constant probability. [remark: Representation Matters] The polynomial must be encoded so that evaluation at sampled points can be performed in time polynomial in the input length. Arithmetic circuits and sparse representations support such evaluation directly, while a dense list of all coefficients changes the relevant input size. Complexity-class statements always depend on the chosen encoding, even when the mathematical problem has a compact informal description. PIT is also a meeting point between algebra and complexity theory: the Schwartz-Zippel lemma supplies the algebraic probability bound, while the search for deterministic PIT algorithms is one of the central derandomization problems for arithmetic circuits. [/remark] ## Amplification by Independent Repetition The constant error probabilities in the definitions would be too weak if they were fixed forever. The central technical fact is that independent repetition drives the error down rapidly while preserving polynomial running time. [quotetheorem:6230] [citeproof:6230] The one-sided theorem uses an OR rule because accepting once is a valid certificate, while repeated rejection is only accumulating evidence. Independence is the hypothesis that turns repeated trials into a product bound: if the same random string were reused each time, the rejection probability on a yes-instance could remain $1/2$ no matter how many repetitions are announced. The polynomial bound on $t$ is equally important, since exponentially many repetitions would reduce error but would no longer define a polynomial-time machine. For two-sided error neither answer is automatically certified by a single run, so the amplified algorithm needs a voting rule that treats acceptances and rejections symmetrically. [quotetheorem:6231] [citeproof:6231] The theorem justifies writing $1/3$, $1/10$, or $2^{-100n}$ almost interchangeably in the definition of $BPP$, provided the target error bound is achievable with polynomially many repetitions. The assumptions explain the limits of this robustness. The initial error must be bounded below $1/2$ by a fixed gap, because majority vote cannot amplify an algorithm that is correct with probability exactly $1/2$; it has no bias to amplify. The repetitions must be independent, since perfectly correlated repeated errors would make the majority wrong with the same probability as a single run. Finally, the number of repetitions must stay polynomially bounded, so the theorem gives exponentially small error in any polynomial target parameter but does not license superpolynomial computation. This robustness is one reason $BPP$ is viewed as a stable complexity class rather than a definition tied to an arbitrary constant. [example: Amplifying Error from One Third to Two to the Minus k] Suppose a $BPP$ algorithm has error at most $1/3$ on every input. Choose an odd integer $m$, run the algorithm independently $m$ times, and output the majority answer. For a fixed input, let $Y_i$ be $1$ if the $i$th run is wrong and $0$ otherwise, and put $Y=\sum_{i=1}^m Y_i$. Then \begin{align*} \mathbb E[Y_i]\le \frac13 \end{align*} for each $i$, so by linearity of expectation, \begin{align*} \mathbb E[Y] =\mathbb E\left[\sum_{i=1}^m Y_i\right] =\sum_{i=1}^m \mathbb E[Y_i] \le \sum_{i=1}^m \frac13 =\frac m3. \end{align*} The majority vote is wrong exactly when more than half the runs are wrong, and since $m$ is odd this event is contained in \begin{align*} Y\ge \frac m2. \end{align*} Because $\mathbb E[Y]\le m/3$, the event $Y\ge m/2$ implies \begin{align*} Y-\mathbb E[Y]\ge \frac m2-\frac m3=\frac m6. \end{align*} By the additive *Chernoff-Hoeffding bound* for independent $0$-$1$ random variables, \begin{align*} \mathbb P\left[Y-\mathbb E[Y]\ge \frac m6\right] \le \exp\left(-\frac{2(m/6)^2}{m}\right). \end{align*} The exponent simplifies as \begin{align*} -\frac{2(m/6)^2}{m} =-\frac{2m^2}{36m} =-\frac m{18}, \end{align*} so the amplified error probability is at most \begin{align*} e^{-m/18}. \end{align*} If $m\ge 18k\log 2$, then \begin{align*} -\frac m{18}\le -k\log 2. \end{align*} Since the exponential function is increasing, \begin{align*} e^{-m/18}\le e^{-k\log 2}. \end{align*} Using $e^{\log 2}=2$, we get \begin{align*} e^{-k\log 2}=(e^{\log 2})^{-k}=2^{-k}. \end{align*} Thus the majority-vote algorithm has error at most $2^{-k}$, and only a linear number of independent repetitions in $k$ is needed to obtain $k$ bits of confidence. [/example] ## Randomness and the Polynomial Hierarchy The final question is how randomized polynomial time relates to the deterministic hierarchy built from alternating quantifiers. The complete answer is not known, but the [Sipser-Gacs-Lautemann theorem](/theorems/6232) gives a striking upper bound: $BPP$ lies surprisingly low in the polynomial hierarchy. This result is usually treated as an advanced optional theorem in a first course, so these notes use it as context rather than developing its proof. [remark: Why the Optional Theorem Matters] The containment suggests that randomness does not appear to give access to the full strength of nondeterminism. It also connects derandomization questions with circuit lower bounds and pseudorandomness: if every efficient randomized computation can be simulated deterministically, then $BPP=P$, while the theorem above gives an unconditional but weaker simulation by limited alternation. [/remark] This final chapter leaves three takeaways that complement Chapter 11's alternation-based view of complexity. One-sided error gives certificates when it succeeds, so $RP \subseteq NP$ and $ZPP$ is exactly the overlap of $RP$ and $coRP$. Two-sided error is robust because majority-vote amplification makes the error exponentially small. Randomized algorithms should therefore be read not only as procedures but also as structured complexity-class membership proofs: runtime, sample space, and error analysis are all part of the classification. ## Beyond and Connected Topics This course is the discrete-complexity entry point for several adjacent Androma notes. [Combinatorial Optimisation](/page/Combinatorial%20Optimisation) develops many of the polynomial-time algorithmic structures that sit on the positive side of the $P$ versus $NP$ boundary: augmenting paths, cuts, matchings, matroids, and linear relaxations give concrete examples where certificates and efficient procedures coincide. [Cambridge IB Optimisation](/page/Cambridge%20IB%20Optimisation) is a gentler companion for optimisation algorithms and dual certificates, while more specialised notes on network flows, matching theory, integer programming, and approximation algorithms are natural next stops after the NP-completeness and reduction chapters. The logic-facing chapters also connect outward. The tableau proof of Cook-Levin and the alternation view of $PH$, $PSPACE$, and $IP$ should be compared with [Model Theory I: Foundations](/page/Model%20Theory%20I%3A%20Foundations), where formal languages, structures, satisfiability, compactness, and definability are studied for their own sake. The shared theme is that syntax becomes mathematical evidence: a formula, a tableau, an accepting branch, or a quantified strategy can encode a finite object whose existence has structural consequences. Finally, the randomized chapter points beyond classical decision problems. Polynomial identity testing links complexity theory to [Polynomial Ring](/page/Polynomial%20Ring) and to algebraic algorithms, because the encoding of a polynomial controls whether evaluation is efficient. Derandomization, pseudorandomness, circuit lower bounds, and probabilistically checkable proofs are standard continuations of these notes. Approximation and hardness of approximation form the optimisation-facing continuation: when exact polynomial-time algorithms are unlikely, the next question is what kind of efficiently checkable guarantee can still be proved. ## References - Michael Sipser, *Introduction to the Theory of Computation*. - Sanjeev Arora and Boaz Barak, *Computational Complexity: A Modern Approach*. - Christos Papadimitriou, *Computational Complexity*. - Oded Goldreich, *P, NP, and NP-Completeness: The Basics of Computational Complexity*.

Created by admin on 6/11/2026 | Last updated on 6/11/2026

What brings you to Androma?

Start with a route through the knowledge graph.

Computational Complexity I: P and NP

Sign in to Androma

Check your inbox

One last step

Computational Complexity I: P and NP

Prerequisites (0/1 completed)

Prerequisites Graph

Rate this page