Constructive mathematics studies mathematics in a way that keeps proofs tied to explicit constructions, computations, and verifiable evidence. Instead of treating existence as automatic, the course asks what it means to prove that an object exists, when a statement is meaningful, and how much classical reasoning can be recovered without losing constructive content. The central viewpoint is that proofs are not just arguments about truth values; they also carry computational and conceptual information.
The course begins with the basic logic of constructive reasoning, moving from intuitionistic propositional and predicate logic to semantic tools such as Heyting algebras and Kripke models. From there it develops the relationship between constructive and classical mathematics by studying decidability, omniscience principles, and the ways classical reasoning can be embedded or recovered inside a constructive framework. Later chapters broaden the view to realizability, where proofs are interpreted as programs, and to constructive structures such as sets, functions, metric spaces, and [real numbers](/page/Real%20Numbers), culminating in Bishop-style analysis and the treatment of choice principles, fan theorems, and compactness.
The chapters are arranged to build gradually from logic to mathematics. The first half establishes the language, semantics, and proof principles of constructive reasoning; the second half applies that framework to arithmetic, programs, analysis, and set-theoretic structures. The final chapters compare constructive and classical mathematics directly and then synthesize the course by showing how to build, inspect, and audit constructive proofs with a clear eye toward meaning, computability, and mathematical strength.
# Introduction
Constructive mathematics studies mathematical statements through the constructions that justify them. In this course, the central question is not only whether a statement is true, but what kind of evidence a proof supplies. The subject begins in logic, passes through computation, and then returns to familiar parts of analysis with a different standard for existence, disjunction, and choice.
Classical mathematics often treats truth as independent of our access to a witness or decision procedure. Constructive mathematics keeps track of that access. This does not make the subject weaker as a working discipline; it changes the information carried by theorems and forces proofs to expose algorithms, bounds, approximations, and refutations.
## What Constructive Mathematics Asks For
The first problem is to say what counts as mathematical evidence. A proof of $\exists x\,P(x)$ should give more than the assertion that a witness cannot fail to exist; it should give a way to obtain such an $x$ together with evidence for $P(x)$. A proof of $P \vee Q$ should say which side holds, not only that the two alternatives cannot both fail.
[definition: Constructive Proof]
A constructive proof of a mathematical judgement is a proof whose evidence determines the computational or structural content required by that judgement.
[/definition]
This definition is intentionally schematic, because the exact form of evidence depends on the judgement. For a numerical existence statement the evidence may be an integer and a verification. For an implication it is a transformation from evidence for the hypothesis to evidence for the conclusion. The opening lectures make this precise through the Brouwer-Heyting-Kolmogorov interpretation.
[example: Existence With A Witness]
To prove $\exists n \in \mathbb N\,(n^2 = 25)$ constructively, take the witness $n = 5$. Since $5 \in \mathbb N$, it remains only to verify the predicate $n^2 = 25$ for this chosen value. By the definition of squaring,
\begin{align*}
5^2 = 5 \cdot 5 = 25.
\end{align*}
Thus the evidence consists of the specific natural number $5$ together with the verification that it satisfies the required equation; this is stronger information than merely knowing that nonexistence would lead to a contradiction.
[/example]
The same issue becomes sharper for disjunction. A proof of a disjunction is expected to identify the verified alternative, so constructive disjunction behaves like a mathematical choice carrying information.
[example: Informative Disjunction]
For the statement $2 = 2 \vee 2 = 3$, the constructive proof uses the left introduction rule for disjunction. The required left-side evidence is the reflexivity proof of equality:
\begin{align*}
2 = 2.
\end{align*}
Therefore the full evidence is the left tag together with this equality proof. It does not merely rule out the simultaneous failure of the two alternatives; it identifies the verified alternative, namely $2 = 2$.
[/example]
## Truth, Refutation, And Negation
The next question is how to understand negative statements when truth is tied to evidence. Constructively, a proof of $\neg P$ is a method that turns any proposed evidence for $P$ into contradiction. Negation is therefore not primitive denial; it is implication into absurdity.
[definition: Constructive Negation]
For a proposition $P$, its constructive negation is $\neg P := P \to \bot$, where $\bot$ denotes absurdity.
[/definition]
After this definition, double negation has a different status from classical logic. A proof of $\neg\neg P$ refutes every refutation of $P$, but it need not supply evidence for $P$ itself. The first test case is excluded middle: the direct disjunction may be unavailable, yet its double negation is derivable.
[quotetheorem:7494]
[citeproof:7494]
This theorem explains why constructive mathematics is not a blanket rejection of classical tautologies. Some classical principles survive under double negation, while the undoubled form may require information that is not available. We therefore need a name for propositions where double-negation information can be turned back into direct evidence, since those propositions behave more classically inside constructive arguments.
[definition: Stable Proposition]
A proposition $P$ is stable if there is a proof of $\neg\neg P \to P$.
[/definition]
Stability records when double-negation information can be converted back into direct evidence. The next question is how stability is obtained in practice rather than assumed. Decidable propositions provide the basic mechanism, because a decision procedure separates the case where $P$ holds from the case where a refutation of $P$ is available.
[quotetheorem:7495]
[citeproof:7495]
The result is a useful warning for later analysis. Equality of natural numbers is decidable, so many arithmetic equalities are stable. Equality of arbitrary real numbers need not be decidable in constructive analysis, so arguments that split into $x = 0$ or $x \ne 0$ require justification.
A practical rule is to invoke stability only after identifying the source of the decision or negative translation. If $P$ is decidable, or if the object under discussion comes with an explicit comparison procedure, then using $\neg\neg P \to P$ is legitimate. If no such procedure has been supplied, the step should be treated as a classical assumption needing proof.
## Logic As A Calculus Of Constructions
A third problem is to make the informal talk of evidence precise enough for proofs about proofs. Natural deduction and sequent calculus provide formal systems in which the introduction and elimination rules specify how evidence is built and used.
[definition: Intuitionistic Validity]
A formula is intuitionistically valid if it has a derivation in intuitionistic natural deduction.
[/definition]
The same formulas are derivable in the sequent calculus LJ, which gives a second formal presentation of intuitionistic validity. Natural deduction mirrors the BHK reading: introduction rules build evidence, while elimination rules consume evidence. This raises the central proof-theoretic question for implication: what does a derivation of $P \to Q$ do with evidence for $P$?
[quotetheorem:4634]
[citeproof:4634]
The theorem is about constructive derivations, not about every classically valid implication. A classical proof of $P \to Q$ may use excluded middle or double-negation elimination in a way that does not describe an effective transformation from evidence for $P$ to evidence for $Q$. This limitation is the reason implication becomes a useful diagnostic: to understand what a proof contributes, ask whether it actually transforms input evidence or merely rules out failure. This perspective prepares the later connection with realizability, where the course asks which transformations are computable and how programs can witness formulas.
[example: A Proof As A Program]
To prove $(P \wedge Q) \to P$, assume evidence $u$ for $P \wedge Q$. By the meaning of conjunction evidence, $u$ consists of two components: a first component $p$ witnessing $P$ and a second component $q$ witnessing $Q$. Thus we may write $u=(p,q)$, where $p:P$ and $q:Q$.
The proof transforms the input $u$ by applying the first projection:
\begin{align*}
\pi_1(u)=\pi_1(p,q)=p.
\end{align*}
Since $p$ is evidence for $P$, this gives an operation sending every piece of evidence for $P \wedge Q$ to evidence for $P$. As a program, the proof is exactly the function $u \mapsto \pi_1(u)$; normalizing the proof removes the temporary pairing language and leaves the computation “return the first component.”
[/example]
## Constructive Analysis As Ordinary Analysis With More Data
The final question of the introductory chapter is how this logical discipline affects analysis. Constructive analysis is not obtained by deleting classical theorems from an analysis course. It rebuilds the subject so that convergence, continuity, completeness, and compactness carry quantitative or witness-producing content.
[definition: Constructive Existence In Analysis]
A constructive existence theorem in analysis is an existence theorem whose proof supplies enough data to approximate, compute, or otherwise construct the object asserted to exist.
[/definition]
This notion will reappear specifically in Chapter 7 on metric completions and locatedness, Chapter 8 on Cauchy reals and [uniform continuity](/page/Uniform%20Continuity), and Chapter 10 on compactness principles. The exact data depend on the theorem: sometimes a modulus of convergence is needed, sometimes a choice of approximants, and sometimes a finite covering argument with explicit selections.
[example: Cauchy Data]
A constructive presentation of a real number can be given by a rational sequence $(q_n)_{n=1}^{\infty}$ together with a modulus $\mu:\mathbb N\to\mathbb N$ such that, for every $k\in\mathbb N$, if $m,n\geq \mu(k)$, then
\begin{align*}
|q_m-q_n|\leq 2^{-k}.
\end{align*}
The modulus is usable information because it turns a requested accuracy into an explicit stage of the sequence.
For example, suppose we want all later terms to be within $1/100$ of one another. Since
\begin{align*}
2^7=128
\end{align*}
and $100<128$, taking reciprocals of positive numbers reverses the inequality:
\begin{align*}
\frac{1}{128}<\frac{1}{100}.
\end{align*}
Thus
\begin{align*}
2^{-7}=\frac{1}{128}<\frac{1}{100}.
\end{align*}
By the defining property of $\mu$, whenever $m,n\geq\mu(7)$, we have
\begin{align*}
|q_m-q_n|\leq 2^{-7}<\frac{1}{100}.
\end{align*}
So the modulus tells us the concrete threshold $\mu(7)$ after which the sequence is $1/100$-Cauchy. Later chapters compare this with classical presentations where convergence may be known without such effective control.
[/example]
The course will repeatedly compare Bishop-style arguments with classical arguments. Bishop's approach aims to preserve ordinary mathematical practice while requiring proofs to be meaningful as constructions. This makes the constructive version of a theorem stronger as information, even when its statement resembles the classical statement.
[remark: Constructive Strength]
A constructive theorem often implies its classical counterpart after forgetting computational content. The reverse implication may fail because a classical proof can establish existence without selecting a witness, choosing a side of a disjunction, or giving an effective bound.
[/remark]
This introductory chapter sets the agenda for the course. First we formalize constructive proof through intuitionistic logic and the BHK interpretation. Then we study proof-theoretic metatheorems and computational interpretations. Finally we build constructive analysis, where the logical distinctions become mathematical tools rather than formal restrictions.
The introduction has now set the stage by showing that constructive mathematics is not a restriction for its own sake, but a shift in what a proof is allowed to mean. The next chapter begins that shift in earnest by asking what a proof actually gives us: not mere truth, but mathematical content that can be used, transformed, and extracted.
# 1. Constructive Proof and Mathematical Meaning
Constructive mathematics begins by asking what a proof gives us. In a classical course, a proof of an existential statement may be accepted even when it gives no usable example; in a constructive course, existence is tied to a construction, and a proof is treated as evidence with mathematical content. Building on the introductory contrast between witnesses, disjunctions, and refutations, this chapter fixes the basic vocabulary: judgements, constructions, evidence, the Brouwer-Heyting-Kolmogorov reading of logical connectives, and the special constructive status of negation.
## Judgements, Constructions, and Evidence
The first question is not whether a formula is true in a model, but what it means to know it. Constructive mathematics separates a proposition from the judgement that we possess evidence for it, and this separation is the source of the later proof-theoretic rules.
[definition: Judgement]
A judgement is an assertion made by a mathematician or formal system that some act, construction, or proposition has been established.
[/definition]
A judgement records an act of asserting, but the course also needs a name for the object or method that makes such an assertion legitimate. This is the role of a construction: it is the mathematical content carried by a proof.
[definition: Construction]
A construction of a proposition $P$ is a mathematical object, procedure, or proof-term accepted as evidence for $P$ under the rules governing $P$.
[/definition]
To define existence constructively, we must say what construction is being demanded rather than merely say that nonexistence leads to conflict. Direct existential evidence is the positive witness-carrying form of existence used throughout the course.
[definition: Direct Existential Evidence]
Direct existential evidence for $\exists x \in A\,P(x)$ consists of an element $a \in A$ together with evidence for $P(a)$.
[/definition]
The preceding definition records the positive form of existence. It should not be confused with a proof of $\neg\forall x \in A\,\neg P(x)$, which refutes the claim that every possible witness fails but may not identify any particular witness.
[example: Existence Without a Witness]
Let $R$ be a decidable predicate on $\mathbb{N}$, and suppose we are given
$h : \neg\forall n\in\mathbb{N}\,\neg R(n)$. The classical inference to $\exists n\in\mathbb{N}\,R(n)$ proceeds by contradiction: assume $u : \neg\exists n\in\mathbb{N}\,R(n)$. For an arbitrary $n\in\mathbb{N}$, any evidence $r : R(n)$ would give the existential witness $(n,r)$, so $u(n,r)$ is absurd. Hence $u$ defines evidence for $\neg R(n)$ for each $n$, and therefore defines evidence for $\forall n\in\mathbb{N}\,\neg R(n)$. Applying $h$ to that universal refutation gives absurdity, so the argument has constructed
\begin{align*}
\neg\neg\exists n\in\mathbb{N}\,R(n).
\end{align*}
The last classical step converts this double-negated existence into $\exists n\in\mathbb{N}\,R(n)$. Constructively, however, direct existential evidence must contain an actual numeral $k\in\mathbb{N}$ together with evidence for $R(k)$, while $h$ only says that a uniform refutation of all $R(n)$ is impossible. Thus the data refutes the universal denial, but it does not by itself display a witness.
[/example]
To avoid ambiguity, these notes reserve existential language for evidence that contains a witness. Negative information remains valuable, but it is tracked as refutability or double-negated existence rather than as ordinary existence.
## The BHK Interpretation of Logical Connectives
Once proof is treated as evidence, the next question is how complex propositions inherit evidence from their parts. The Brouwer-Heyting-Kolmogorov interpretation gives clauses for the first-order logical constants by saying what counts as a construction of each compound proposition.
[definition: BHK Evidence]
Let $P$ and $Q$ be propositions, and let $A(x)$ be a proposition depending on $x$ in a domain $D$. BHK evidence is assigned by the following clauses:
A construction of $P \wedge Q$ consists of a construction of $P$ and a construction of $Q$.
A construction of $P \vee Q$ consists of either a construction of $P$ together with the information that the left disjunct was chosen, or a construction of $Q$ together with the information that the right disjunct was chosen.
A construction of $P \to Q$ is a method transforming any construction of $P$ into a construction of $Q$.
A construction of $\forall x \in D\, A(x)$ is a method which, given any $a \in D$, constructs $A(a)$.
A construction of $\exists x \in D\, A(x)$ consists of an element $a \in D$ together with a construction of $A(a)$.
[/definition]
The notation $P \wedge Q$ is used for conjunction throughout these notes; in type-theoretic presentations the same evidence is often described as a product, because a proof of a conjunction is a pair. The disjunction and existential clauses are the most visible difference from classical practice: they require a selected side or a displayed witness.
[remark: BHK Clauses and Natural Deduction]
The BHK clauses are a definitional assignment of evidence conditions, not an additional theorem about propositions. They line up with the introduction rules for first-order intuitionistic natural deduction: conjunction introduction packages two derivations, disjunction introduction records a chosen side, implication introduction discharges an assumption to give a method, universal introduction gives a uniform construction, and existential introduction supplies a witness together with evidence at that witness. The elimination rules are then read as the corresponding ways of unpacking these forms of evidence.
[/remark]
This interpretation is not merely a slogan about intuitionistic logic. It explains why some classical arguments change their meaning when translated constructively: a proof of $P \vee Q$ must decide which branch holds, and a proof of $\exists x\,A(x)$ must carry a witness. It also marks a limitation of the chapter: the clauses explain what evidence should be, while later formal systems will specify derivations that generate such evidence.
[example: Constructive Conjunction and Disjunction]
Let $P$ be the proposition that $6$ is composite and let $Q$ be the proposition that $7$ is prime. Evidence for $P$ is the factorization
\begin{align*}
6 = 2\cdot 3.
\end{align*}
Here $1<2<6$ and $1<3<6$, so this is a nontrivial factorization of $6$.
Evidence for $Q$ is the verification that no positive divisor of $7$ other than $1$ and $7$ exists. If $d$ is a positive divisor of $7$, then $d \leq 7$. The possible values $d=2,3,4,5,6$ fail because
\begin{align*}
7 = 2\cdot 3 + 1.
\end{align*}
\begin{align*}
7 = 3\cdot 2 + 1.
\end{align*}
\begin{align*}
7 = 4\cdot 1 + 3.
\end{align*}
\begin{align*}
7 = 5\cdot 1 + 2.
\end{align*}
\begin{align*}
7 = 6\cdot 1 + 1.
\end{align*}
Thus the only positive divisors left are $1$ and $7$.
A construction of $P \wedge Q$ packages both pieces of evidence: the factorization $6=2\cdot 3$ and the divisor check for $7$. A construction of $P \vee Q$ may instead choose the left branch and carry only the evidence $6=2\cdot 3$ for $P$, together with the left tag. Thus conjunction stores both pieces of evidence, while disjunction stores one chosen piece of evidence plus the information of which side was chosen.
[/example]
The next connective is implication, where the constructive content is often computational. A proof of $P \to Q$ is not a survey of cases in which $P$ and $Q$ have truth values; it is a reusable operation from evidence for $P$ to evidence for $Q$.
[example: Implication as a Transformation]
Let $P$ be the proposition that an integer $n$ is divisible by $4$, and let $Q$ be the proposition that $n$ is even. Evidence for $P$ is an integer $k$ satisfying
\begin{align*}
n = 4k.
\end{align*}
To transform this evidence into evidence for $Q$, define the new integer witness $\ell = 2k$. Since $k \in \mathbb{Z}$ and integers are closed under multiplication, $\ell \in \mathbb{Z}$. The equality $n=4k$ can be rewritten as
\begin{align*}
n = (2\cdot 2)k.
\end{align*}
By associativity of multiplication,
\begin{align*}
(2\cdot 2)k = 2(2k).
\end{align*}
Hence
\begin{align*}
n = 2(2k) = 2\ell.
\end{align*}
Thus $\ell$ is evidence that $n$ is even. The implication $P \to Q$ is therefore the explicit witness-transforming operation $k \mapsto 2k$.
[/example]
Quantifiers extend the same pattern from propositions to families of propositions. The universal quantifier asks for a uniform construction, while the existential quantifier asks for a particular element together with evidence at that element.
[example: Universal and Existential Evidence]
For the proposition $\forall n \in \mathbb{N}\, \exists m \in \mathbb{N}\,(m > n)$, define a method as follows. Given an arbitrary input $n \in \mathbb{N}$, choose
\begin{align*}
m = n+1.
\end{align*}
Since $\mathbb{N}$ is closed under successor, $n+1 \in \mathbb{N}$. To verify the inequality, use the defining order property of successor: $n < n+1$. Rewriting this in the form required by the existential predicate gives
\begin{align*}
n+1 > n.
\end{align*}
Thus the output for the input $n$ is the pair consisting of the witness $m=n+1$ and the evidence that $m>n$.
Because the input $n$ was arbitrary and the same rule $n \mapsto n+1$ works for every $n \in \mathbb{N}$, this is evidence for the universal statement. The example shows that universal evidence is a uniform construction, while existential evidence at each input is the displayed witness together with its verification.
[/example]
## Negation, Refutability, and Double Negation
The BHK clauses do not list negation as primitive, so the next question is how to read a negative statement. Constructively, to deny $P$ is to transform any proposed evidence for $P$ into absurdity; hence negation is a form of refutation rather than a detached assertion of falsity.
[definition: Absurdity]
Absurdity, written $\bot$, is the proposition with no constructions.
[/definition]
Because $\bot$ has no constructions, it can serve as the target of a refuting method. This motivates constructive negation: a negative proposition is the operation that turns any evidence for the proposition being denied into evidence for $\bot$.
[definition: Constructive Negation]
The negation of a proposition $P$ is the implication
\begin{align*}
\neg P := P \to \bot.
\end{align*}
[/definition]
This definition makes negation an implication, so its evidence is a refuting method. It also explains why double negation is weaker than affirmation: a construction of $\neg\neg P$ transforms every refutation of $P$ into absurdity, but it need not construct $P$.
[definition: Refutable Proposition]
A proposition $P$ is refutable if there is a construction of $\neg P$.
[/definition]
Refutability is positive constructive information: it is a method that defeats any claimed proof of $P$. The next rule says that once absurdity has been obtained, any proposition may be inferred; this is compatible with the BHK reading because there are no actual constructions of $\bot$ to transform.
[quotetheorem:7496]
[citeproof:7496]
The rule does not say that absurdity is ever available. It says that if a contradiction has been constructed, then the surrounding assumptions were inconsistent enough to support any conclusion. Its constructive content is therefore conditional: it explains how to use impossible evidence, not how to produce such evidence.
[example: Refuting an Equality]
In arithmetic, a construction of $\neg(0=1)$ is a method which transforms any proposed evidence of $0=1$ into absurdity. Since $1$ is the successor numeral $S(0)$, proposed evidence $e : 0=1$ is the same as proposed evidence
\begin{align*}
e : 0 = S(0).
\end{align*}
The computation rules for natural numbers distinguish the zero constructor from every successor constructor: $0$ is not of the form $S(k)$ for any $k\in\mathbb{N}$. Therefore the equality comparison of the two normal forms has the following two outputs:
\begin{align*}
0 \longmapsto \text{zero constructor}.
\end{align*}
\begin{align*}
1=S(0) \longmapsto \text{successor constructor}.
\end{align*}
An equality proof between numerals must identify equal normal forms, but these two normal forms have different outer constructors. Applying the zero-versus-successor disjointness rule to $e : 0=S(0)$ therefore produces a construction of $\bot$.
Thus the refuting method is explicit: given $e : 0=1$, rewrite $1$ as $S(0)$ and use the disjointness of the natural-number constructors to obtain absurdity. This shows that a negative proposition can carry direct constructive content: it is not merely the absence of evidence, but a procedure that defeats any claimed evidence.
[/example]
We now test double negation on excluded middle. The theorem of double-negated excluded middle is needed because it pinpoints the constructive boundary: the disjunction cannot be refuted, although a branch still has not been chosen.
[quotetheorem:7494]
[citeproof:7494]
This result is a useful diagnostic. Intuitionistic logic can prove that excluded middle cannot be refuted, but it does not thereby accept excluded middle as a general principle.
[example: No Decision Extracted]
Let $P$ be an arbitrary proposition for which no decision procedure has been supplied. The construction from *Double Negated Excluded Middle* can be written explicitly. To construct $\neg\neg(P \vee \neg P)$, assume
\begin{align*}
h : \neg(P \vee \neg P).
\end{align*}
By the definition of negation, this means
\begin{align*}
h : (P \vee \neg P) \to \bot.
\end{align*}
We first construct evidence for $\neg P$. Given temporary evidence $p : P$, the left injection gives evidence
\begin{align*}
\operatorname{inl}(p) : P \vee \neg P.
\end{align*}
Applying $h$ to this evidence gives
\begin{align*}
h(\operatorname{inl}(p)) : \bot.
\end{align*}
Therefore the rule
\begin{align*}
p \mapsto h(\operatorname{inl}(p))
\end{align*}
is evidence for $\neg P$. The right injection then gives evidence
\begin{align*}
\operatorname{inr}(p \mapsto h(\operatorname{inl}(p))) : P \vee \neg P.
\end{align*}
Applying $h$ once more gives
\begin{align*}
h(\operatorname{inr}(p \mapsto h(\operatorname{inl}(p)))) : \bot.
\end{align*}
Thus the whole construction is the method
\begin{align*}
h \mapsto h(\operatorname{inr}(p \mapsto h(\operatorname{inl}(p)))).
\end{align*}
This method defeats any proposed refutation of $P \vee \neg P$, but it never supplies a stable left tag with evidence for $P$ or a stable right tag with independent evidence for $\neg P$. It shows that $P \vee \neg P$ is not refutable, not that $P$ has been decided.
[/example]
## Decidability, Stability, and Classical-Looking Conclusions
The final question in this chapter is when classical reasoning can be recovered constructively. The key case is decidability: if a proposition comes equipped with evidence for $P \vee \neg P$, then double negation elimination is valid for that proposition.
[definition: Decidable Proposition]
A proposition $P$ is decidable if there is a construction of $P \vee \neg P$.
[/definition]
Decidability is not assumed for all propositions. It is additional structure: either evidence for $P$ is supplied, or a refutation of $P$ is supplied, together with the corresponding tag. We now need a term for propositions where double-negated evidence can be converted back into direct evidence.
[definition: Stable Proposition]
A proposition $P$ is stable if there is a construction of $\neg\neg P \to P$.
[/definition]
Stability measures whether the double-negation gap closes for a particular proposition. Decidability gives a concrete way to close that gap, because a decision reduces the problem to a positive branch and a refuting branch.
[quotetheorem:7497]
[citeproof:7497]
This result shows why constructive mathematics often proves classical consequences after establishing decidability. The issue is not that double negation elimination is banned everywhere, but that it must be justified by the structure of the proposition in question. The decidability hypothesis is sufficient rather than necessary: some propositions are stable for other reasons, but this theorem gives the most common route used in elementary constructive arguments.
[example: Stability Without Decidability]
Let $P$ be any proposition. We show that $\neg P$ is stable, so the goal is to construct
\begin{align*}
\neg\neg(\neg P) \to \neg P.
\end{align*}
By the definition of constructive negation, $\neg\neg(\neg P)$ is the same kind of evidence as $\neg\neg\neg P$, namely a method
\begin{align*}
h : \neg\neg\neg P.
\end{align*}
To construct $\neg P$, assume temporary evidence
\begin{align*}
p : P.
\end{align*}
We must produce absurdity. Define a refutation of $\neg P$ by sending any proposed refutation $r : \neg P$ to the absurdity obtained by applying $r$ to $p$:
\begin{align*}
r(p) : \bot.
\end{align*}
Thus the method
\begin{align*}
r \mapsto r(p)
\end{align*}
is evidence for $\neg\neg P$. Applying $h : \neg\neg\neg P$ to this evidence gives
\begin{align*}
h(r \mapsto r(p)) : \bot.
\end{align*}
Therefore the method
\begin{align*}
p \mapsto h(r \mapsto r(p))
\end{align*}
is evidence for $\neg P$.
So from $h : \neg\neg\neg P$ we have constructed evidence for $\neg P$, which proves that $\neg P$ is stable. This does not decide $P$; it only shows that negative propositions admit double-negation elimination, so decidability is a strong sufficient condition for stability rather than its definition.
[/example]
The most familiar decidable propositions come from finite computation, especially equality of numerals. These cases explain why ordinary arithmetic often feels classical even inside a constructive system.
[example: Equality of Natural Numbers is Stable]
For fixed $m,n \in \mathbb{N}$, compute their equality decision by recursion on their normal forms. The comparison begins with the constructor cases
\begin{align*}
0 = 0 \quad\text{with evidence }\operatorname{refl}_0.
\end{align*}
\begin{align*}
0 \neq S(b) \quad\text{because zero and successor have different outer constructors.}
\end{align*}
\begin{align*}
S(a) \neq 0 \quad\text{because successor and zero have different outer constructors.}
\end{align*}
For two successors, reduce the comparison to their predecessors. If evidence $e:a=b$ has been computed, then applying the successor constructor to both sides gives evidence
\begin{align*}
\operatorname{cong}_S(e):S(a)=S(b).
\end{align*}
If instead a refutation $r:\neg(a=b)$ has been computed, then any proposed evidence $q:S(a)=S(b)$ would imply $a=b$ by injectivity of the successor constructor, so
\begin{align*}
q \mapsto r(S\text{-}\operatorname{inj}(q))
\end{align*}
is evidence for $\neg(S(a)=S(b))$. Thus the finite constructor comparison returns either evidence $p:m=n$ or evidence $r:\neg(m=n)$.
Now assume $h:\neg\neg(m=n)$, meaning
\begin{align*}
h:\neg(m=n)\to \bot.
\end{align*}
Run the equality decision for $m$ and $n$. If it returns the left branch $p:m=n$, output $p$. If it returns the right branch $r:\neg(m=n)$, then applying $h$ to $r$ gives
\begin{align*}
h(r):\bot.
\end{align*}
From this absurdity, the eliminator for $\bot$ yields evidence for $m=n$. Therefore the rule taking $h:\neg\neg(m=n)$ to the result of this two-branch procedure constructs
\begin{align*}
\neg\neg(m=n)\to (m=n).
\end{align*}
The stability comes from finite decidability: the computation of normal forms supplies the branch information needed to turn double-negated equality into equality evidence.
[/example]
The last example revisits a standard classical argument about irrational powers. It is useful because the same headline statement has both a non-informative classical proof and an informative constructive proof.
[example: Irrational Power of an Irrational]
The classical split considers $a=(\sqrt{2})^{\sqrt{2}}$ and uses the dichotomy that $a$ is rational or $a$ is irrational. If $a$ is rational, take $x=\sqrt{2}$ and $y=\sqrt{2}$. The number $\sqrt{2}$ is irrational: if $\sqrt{2}=p/q$ with integers $p,q$ in lowest terms and $q\neq 0$, then $p^2=2q^2$, so $p$ is even; writing $p=2r$ gives $4r^2=2q^2$, hence $q^2=2r^2$, so $q$ is even, contradicting that $p/q$ was in lowest terms. Thus $x$ and $y$ are irrational, and
\begin{align*}
x^y=(\sqrt{2})^{\sqrt{2}}=a.
\end{align*}
In this branch, $x^y$ is rational by the branch assumption.
If $a$ is irrational, take $x=a$ and $y=\sqrt{2}$. Then both $x$ and $y$ are irrational. Since $\sqrt{2}>0$, the usual exponent law for positive real bases gives
\begin{align*}
x^y=\bigl((\sqrt{2})^{\sqrt{2}}\bigr)^{\sqrt{2}}=(\sqrt{2})^{\sqrt{2}\cdot\sqrt{2}}.
\end{align*}
Because $\sqrt{2}\cdot\sqrt{2}=2$, this becomes
\begin{align*}
x^y=(\sqrt{2})^2=2.
\end{align*}
Thus each classical branch produces irrational numbers $x,y$ with rational value $x^y$, but the argument does not decide which pair has been constructed.
An informative constructive version supplies the second pair directly. The number $\sqrt{2}$ is algebraic because it is a root of $T^2-2$, and it is irrational by the parity argument above. Also $\sqrt{2}\neq 0$ and $\sqrt{2}\neq 1$. By *Gelfond-Schneider*, $(\sqrt{2})^{\sqrt{2}}$ is transcendental. Every rational number $p/q$ with $q\neq 0$ is algebraic, since it is a root of $qT-p$, so a transcendental number is irrational. Hence
\begin{align*}
x=(\sqrt{2})^{\sqrt{2}}
\end{align*}
is irrational. With $y=\sqrt{2}$, the same exponent calculation gives
\begin{align*}
x^y=\bigl((\sqrt{2})^{\sqrt{2}}\bigr)^{\sqrt{2}}=(\sqrt{2})^2=2.
\end{align*}
The constructive proof therefore gives explicit witnesses $x=(\sqrt{2})^{\sqrt{2}}$ and $y=\sqrt{2}$, rather than only proving that one of two possible witness-pairs must work.
[/example]
The chapter's guiding lesson is that constructive proof records more than truth preservation. Conjunction stores pairs, disjunction stores choices, implication stores transformations, universal quantification stores uniform methods, existential quantification stores witnesses, and negation stores refutations. These meanings will be formalised in the next chapter through natural deduction and sequent calculi for intuitionistic logic.
Chapter 1 turned proofs into evidence, witnesses, and refutations, and it clarified why constructive meaning is stronger than classical validity alone. The next chapter makes these ideas precise by formalizing intuitionistic reasoning in proof systems that can track exactly how information is built.
# 2. Intuitionistic Propositional and Predicate Logic
This chapter turns the BHK proof-meaning principles from Chapter 1 into formal proof systems. The guiding question is how to recognise, transform, and extract information from intuitionistic proofs without appealing to classical principles such as excluded middle. We first give natural deduction rules, then move to the sequent calculus LJ where proof transformations become more algebraic, and finally use normalization and cut elimination to obtain the disjunction and existence properties proved in this chapter.
## Natural Deduction for Intuitionistic Logic
What rules should a formal proof system have if a proof of a compound statement is supposed to be direct evidence for its construction? Natural deduction answers this by assigning introduction and elimination rules to each connective. The introduction rule describes how evidence is built; the elimination rule describes how evidence may be used.
[definition: Intuitionistic Propositional Formula]
An intuitionistic propositional formula is generated from propositional variables $P,Q,R,\dots$ using $\top$, $\bot$, $\land$, $\lor$, and $\to$. Negation is the abbreviation $\neg A := A \to \bot$.
[/definition]
The definition fixes the grammar on which the proof rules operate. The abbreviation for negation matters because intuitionistic refutation is a function from evidence for $A$ to absurdity, so the next example is needed to separate double-negation introduction from double-negation elimination.
[example: Double Negation Introduction]
Let $A$ be any proposition, and suppose first that we have a proof $a$ of $A$. To prove $\neg\neg A$, we must prove $(A \to \bot) \to \bot$, since $\neg A$ abbreviates $A \to \bot$.
Assume temporarily that $h:\neg A$, so $h:A \to \bot$. Applying implication elimination to $h:A \to \bot$ and $a:A$ gives a proof $h(a):\bot$. Therefore the temporary assumption $h:A \to \bot$ has led to $\bot$, so implication introduction discharges $h$ and yields
\begin{align*}
(A \to \bot) \to \bot.
\end{align*}
Using the abbreviation for negation again, this is $\neg\neg A$. Since the construction started from an arbitrary proof of $A$, implication introduction gives $A \to \neg\neg A$.
Thus intuitionistic logic validates double-negation introduction because a proof of $A$ refutes every proposed refutation of $A$; it does not thereby supply a rule turning an arbitrary proof of $\neg\neg A$ back into direct evidence for $A$.
[/example]
The example exhibits a derivation that depends on a temporary assumption and then discharges it. This bookkeeping is not cosmetic: the constructive content of a proof changes when an assumption is still open, because the proof is then conditional on evidence that has not yet been supplied. Natural deduction therefore needs a judgement form that records both the formula being proved and the assumptions on which the proof still depends. The next definition introduces that judgement form.
[definition: Natural Deduction Judgement]
A natural deduction judgement has the form $\Gamma \vdash A$, where $\Gamma$ is a finite context of assumptions and $A$ is a formula. A derivation of $\Gamma \vdash A$ is built using the intuitionistic natural deduction rules for $\top$, $\bot$, $\land$, $\lor$, and $\to$.
[/definition]
The judgement form tells us how to state rules uniformly, but it does not yet say what each connective contributes. The first connective rules are needed because conjunction is the basic case where evidence is assembled from two independent components and later projected.
[definition: Conjunction Rules]
The conjunction rules of intuitionistic natural deduction are:
\begin{align*}
\frac{\Gamma \vdash A \qquad \Gamma \vdash B}{\Gamma \vdash A \land B}
\end{align*}
\begin{align*}
\frac{\Gamma \vdash A \land B}{\Gamma \vdash A} \qquad \frac{\Gamma \vdash A \land B}{\Gamma \vdash B}.
\end{align*}
[/definition]
The conjunction rules show how evidence can contain several components at once. The shared context in the introduction rule is not a harmless typographical choice: if the two premises depended on incompatible open assumptions, the conclusion would not be a proof under a single stated stock of hypotheses. The elimination rules also do less than prove either conjunct from nothing; they only project information already present in a proof of $A \land B$. The next rule group is needed because disjunctive evidence has a different shape: it contains a choice of side, and every use of it must be prepared for either choice.
[definition: Disjunction Rules]
The disjunction rules of intuitionistic natural deduction are:
\begin{align*}
\frac{\Gamma \vdash A}{\Gamma \vdash A \lor B} \qquad \frac{\Gamma \vdash B}{\Gamma \vdash A \lor B}
\end{align*}
\begin{align*}
\frac{\Gamma \vdash A \lor B \qquad \Gamma,A \vdash C \qquad \Gamma,B \vdash C}{\Gamma \vdash C}.
\end{align*}
[/definition]
The disjunction rules make case analysis depend on an available disjunctive proof. The elimination rule requires both branches to prove the same conclusion $C$ from the same ambient assumptions because a consumer of $A \lor B$ is not allowed to know in advance which side will be supplied. If only the $A$-branch were required, a proof of $A \lor B$ obtained from $B$ would still be unusable for deriving $C$; if the branches had different conclusions, no single conclusion could be returned after the case split. The next rule group is needed because implication is not a pair or a tagged choice; it is the formal mechanism by which a temporary assumption becomes a reusable transformation.
[definition: Implication Rules]
The implication rules of intuitionistic natural deduction are:
\begin{align*}
\frac{\Gamma,A \vdash B}{\Gamma \vdash A \to B}
\end{align*}
\begin{align*}
\frac{\Gamma \vdash A \to B \qquad \Gamma \vdash A}{\Gamma \vdash B}.
\end{align*}
[/definition]
The implication rules explain refutation and conditional construction, but they do not license the converse move from $\neg\neg A$ to $A$. That missing rule is the proof-theoretic place where intuitionistic logic separates itself from classical logic: a continuation that turns every refutation of $A$ into absurdity is not the same as direct evidence for $A$. The boundary cases for propositions with no information and impossible information are therefore needed next, especially the controlled use of absurdity through ex falso.
[definition: Truth and Absurdity Rules]
The truth and absurdity rules of intuitionistic natural deduction are:
\begin{align*}
\frac{}{\Gamma \vdash \top}
\end{align*}
\begin{align*}
\frac{\Gamma \vdash \bot}{\Gamma \vdash A}.
\end{align*}
[/definition]
The propositional rules now cover the connectives, but mathematics also quantifies over objects such as numbers and terms. This prepares us to extend the same evidence-based discipline from propositions to first-order formulas with variables.
[definition: First-Order Intuitionistic Formula]
A first-order intuitionistic formula is generated from atomic formulas $R(t_1,\dots,t_n)$ using the intuitionistic propositional connectives and the quantifiers $\forall x$ and $\exists x$. Free and bound variables are determined by the usual binding rules for quantifiers. For a formula or finite context $E$, write $FV(E)$ for the set of variables occurring free in $E$.
[/definition]
The definition introduces quantifiers, whose constructive meanings are not symmetric. The next rules are needed because universal evidence must be uniform in the quantified variable, and the side condition must prevent dependence on a special hidden choice.
[definition: Universal Quantifier Rules]
The following rules are valid in intuitionistic first-order natural deduction:
\begin{align*}
\frac{\Gamma \vdash A(y)}{\Gamma \vdash \forall x\,A(x)} \quad (y \notin FV(\Gamma))
\end{align*}
\begin{align*}
\frac{\Gamma \vdash \forall x\,A(x)}{\Gamma \vdash A(t)}.
\end{align*}
[/definition]
The universal rules show how to prove statements for arbitrary inputs, and the side condition is essential. If $y$ were allowed to occur freely in an undischarged assumption, then from the assumption $P(y)$ one could infer $\forall x\,P(x)$, turning information about a particular object into a uniform proof about every object. The rules also do not say that every universal statement has a closed proof; they say only how a proof may be introduced once its derivation is independent of the chosen name. The next rule group is needed because existential evidence has the opposite computational shape: it must contain a witness together with a proof about that witness.
[definition: Existential Quantifier Rules]
The following rules are valid in intuitionistic first-order natural deduction:
\begin{align*}
\frac{\Gamma \vdash A(t)}{\Gamma \vdash \exists x\,A(x)}
\end{align*}
\begin{align*}
\frac{\Gamma \vdash \exists x\,A(x) \qquad \Gamma,A(y) \vdash C}{\Gamma \vdash C} \quad (y \notin FV(\Gamma,C)).
\end{align*}
[/definition]
The existential rules explain why proofs of existence are expected to carry data, and again the eigenvariable condition is doing real work. If the witness name $y$ were allowed to occur freely in the conclusion $C$, then from $\exists x\,P(x)$ and the local assumption $P(y)$ one could conclude $P(y)$ outside the scope where $y$ was introduced, pretending that an arbitrary fresh name is the actual witness. The rule permits using the witness locally, but it does not permit the final conclusion to depend on its private name. A useful test case for the whole natural deduction system is Peirce's law, $((P \to Q) \to P) \to P$, because it is classically derivable from excluded middle but has no intuitionistic derivation.
[example: Peirce Law from Excluded Middle]
Assume $P \lor \neg P$, and let $h:(P \to Q)\to P$. We show that $P$ follows by [disjunction elimination](/theorems/4625) on the assumed proof of $P \lor \neg P$.
In the left branch, assume $p:P$. Then the branch conclusion is exactly $P$, witnessed by $p$.
In the right branch, assume $r:\neg P$. Since $\neg P$ abbreviates $P \to \bot$, we have $r:P \to \bot$. To build a proof of $P \to Q$, temporarily assume $p:P$. Implication elimination applied to $r:P \to \bot$ and $p:P$ gives $r(p):\bot$. By the absurdity elimination rule, from $r(p):\bot$ we obtain a proof of $Q$. Discharging the temporary assumption $p:P$ by implication introduction gives a function
\begin{align*}
P \to Q.
\end{align*}
Call this function $f:P \to Q$. Now implication elimination applied to $h:(P \to Q)\to P$ and $f:P \to Q$ gives
\begin{align*}
h(f):P.
\end{align*}
Both branches of the case split on $P \lor \neg P$ therefore derive the same conclusion $P$, so disjunction elimination yields $P$ from the assumptions $P \lor \neg P$ and $h:(P \to Q)\to P$. Discharging $h$ gives
\begin{align*}
((P \to Q)\to P)\to P.
\end{align*}
Thus excluded middle supplies a proof of Peirce's law, and the proof depends exactly on the non-constructive availability of the case split $P \lor \neg P$.
[/example]
This example isolates the issue: intuitionistic logic does not reject case analysis, but it requires evidence for the disjunction being analysed. Without a proof of $P \lor \neg P$, the classical derivation of Peirce's law has no constructive starting point.
## The Sequent Calculus LJ
Natural deduction follows the shape of constructions, but proof transformations can be hard to control locally. The sequent calculus LJ reorganises proofs around sequents, with left rules explaining how to use assumptions and right rules explaining how to prove conclusions. Its defining intuitionistic feature is that each sequent has at most one formula on the right.
[definition: Intuitionistic Sequent]
An intuitionistic sequent has the form $\Gamma \Rightarrow A$, where $\Gamma$ is a finite list of formulas and the succedent is either empty or consists of a single formula $A$.
[/definition]
The list convention makes order and repeated assumptions visible. This is important for the structural theorem below: exchange and contraction are not already hidden in the representation of contexts. The single-conclusion restriction prevents LJ from building classical reasoning into the proof system, and we now specify the proof system whose derivations will later be normalized.
[definition: Sequent Calculus LJ]
Restricted LJ is the sequent calculus for intuitionistic logic whose sequents use finite list contexts, whose succedent has at most one formula, and whose primitive rules consist of identity and the logical left and right rules for $\top$, $\bot$, $\land$, $\lor$, $\to$, $\forall$, and $\exists$. The primitive rules do not include left weakening, left contraction, or left exchange.
[/definition]
The definition names the calculus used for the admissibility arguments below. Some presentations add left weakening, contraction, and exchange as primitive structural rules; here they are kept out of the definition so that their admissibility is a genuine metatheorem. The distinction between left and right rules is the practical reason LJ is useful, and the next example shows how a connective is introduced on the right and consumed on the left.
[example: Left and Right Rules for Implication]
The implication right rule is the rule instance
\begin{align*}
\frac{\Gamma,A \Rightarrow B}{\Gamma \Rightarrow A \to B}.
\end{align*}
Thus a derivation of $\Gamma \Rightarrow A \to B$ is obtained by first deriving $B$ with one additional antecedent assumption $A$, and then moving that open assumption into the succedent as the implication $A \to B$.
The implication left rule is the rule instance
\begin{align*}
\frac{\Gamma \Rightarrow A \qquad \Delta,B \Rightarrow C}{\Gamma,\Delta,A \to B \Rightarrow C}.
\end{align*}
Here the first premise supplies the antecedent $A$, while the second premise says that once $B$ has been obtained, the derivation may continue to $C$ in the context $\Delta$. The conclusion records both pieces of information in one antecedent list: the assumptions from $\Gamma$, the assumptions from $\Delta$, and the available implication $A \to B$. Thus LJ separates proving an implication on the right from applying an implication on the left.
[/example]
The example makes context manipulation visible: formulas may need to be inserted, duplicated, or reordered as a derivation is transformed. This prepares us to distinguish rules that are built into a presentation from rules that can always be simulated.
[definition: Admissible Rule]
A rule is admissible for a proof system if, whenever its premises are derivable, its conclusion is also derivable.
[/definition]
Admissibility is weaker than being a primitive rule, but it is enough for using the rule in metatheoretic arguments. This matters because we need ordinary management of assumptions must be compatible with LJ even when the rules are treated as derived operations.
[quotetheorem:7498]
[proofunderconstruction:7498]
Structural admissibility says that ordinary context manipulations do not add logical strength in this restricted list presentation. The hypotheses are doing separate work. Weakening is needed even for small derivations: from identity $A \Rightarrow A$ we often need the sequent $B,A \Rightarrow A$ in order to use the derivation inside a larger context, but this insertion is not a primitive rule in restricted LJ. Exchange is separate because list order matters syntactically: a rule may deliver $B,A \Rightarrow C$ while the next inference expects $A,B \Rightarrow C$, and the admissibility theorem is what permits the adjacent swap rather than silently treating the two antecedents as the same list. Contraction is separate again: if contraction were primitive, deleting a repeated occurrence would be available by fiat rather than as a consequence of the logical rules; for instance, the derivation shape using two occurrences of $A \to B$ in an implication-left step would no longer test whether the proof can be reorganised with one occurrence. If antecedents were finite multisets, adjacent exchange would not be a rule to prove, since the order of entries would already have been quotiented out. The theorem also does not say that contexts may be treated as sets inside every proof transformation: occurrences still matter during the induction, and a formula may be inserted, swapped, or contracted only through an admissible reconstruction of the derivation. This prepares for cut, where the context bookkeeping is coupled to an intermediate formula rather than only to the order and multiplicity of assumptions.
[definition: Cut Rule]
The cut rule is
\begin{align*}
\frac{\Gamma \Rightarrow A \qquad \Delta,A \Rightarrow C}{\Gamma,\Delta \Rightarrow C}.
\end{align*}
[/definition]
The cut rule corresponds to using a separately proved lemma $A$, so it is useful for building derivations. This matters because final normal forms should not require such intermediate lemmas; removing them is the proof-theoretic source of computational information.
[quotetheorem:4645]
[citeproof:4645]
Cut elimination is the main technical engine of the chapter because it removes formulas introduced only as intermediate lemmas. A concrete obstruction is a derivation of $\Rightarrow B \lor C$ that first proves an unrelated lemma $L$, then cuts $L$ into a second derivation ending in $B \lor C$; before cut elimination, the formula $L$ may occur in the proof even though it is not part of the final sequent. The hypotheses are essential: allowing multiple formulas in the succedent moves to the classical sequent calculus, where the last step of a cut-free proof of a disjunction need not reveal a chosen constructive disjunct in the same way, and allowing cuts leaves unrelated lemmas syntactically present. The theorem also does not say that mathematicians should never use lemmas, nor that the cut-free proof is shorter; cut elimination may greatly expand a derivation. Its value here is that an arbitrary proof can be transformed into one whose formulas are controlled, and the next theorem makes that control precise.
[quotetheorem:4646]
[citeproof:4646]
The subformula theorem explains why cut-free proofs are computationally informative. The cut-free hypothesis is indispensable: with cut, a proof of $\Rightarrow P \to P$ could detour through a proved lemma such as $(Q \to Q)$ and then cut it away, leaving a formula involving $Q$ inside the derivation although $Q$ occurs neither in the antecedent nor in the succedent. The propositional hypothesis in the first sentence is also real; in first-order logic, a proof using $\forall x\,R(x)$ may contain the instance $R(S0)$, which is not a literal printed subformula of the endsequent but is a substitution instance of its quantified body. The theorem does not say that proof search is finite in first-order logic, since term instantiations may still be unbounded, and it does not identify which cut-free derivation should be chosen. What it supplies is the syntactic invariant needed for the disjunction and existence arguments: once unrelated formulas are absent, the final introduction rule of a closed normal proof can be read as data.
## Disjunction and Existence Properties
The formal systems above are not just alternative presentations of intuitionistic logic; they explain why constructive proofs contain information. The next question is what can be read from a closed proof of a disjunction or an existential statement.
[definition: Closed Derivation]
A closed derivation of a formula $A$ is a derivation of $\vdash A$ in natural deduction or a derivation of $\Rightarrow A$ in LJ.
[/definition]
The definition excludes hidden assumptions, which matters because assumptions may already contain the information later extracted. This matters because we need to show that a closed proof of a disjunction contains a closed proof of one chosen side.
[quotetheorem:7499]
[citeproof:7499]
This theorem is a precise version of the BHK clause for disjunction. The closedness hypothesis is essential: from the open assumption $A \lor B$ one can derive $A \lor B$ without having a proof of $A$ or a proof of $B$. The theorem also does not decide which disjunct is derivable from the formula alone; the chosen side is found by inspecting a normalized proof. This is exactly the Curry-Howard reading of a disjunction proof as tagged data, and the next example shows why unrestricted excluded middle would break that reading.
[example: Why Excluded Middle Would Destroy the Disjunction Property]
Suppose, for contradiction, that intuitionistic propositional logic proved $P \lor \neg P$ with no assumptions, where $P$ is an arbitrary propositional variable. By the *[Disjunction Property for Intuitionistic Propositional Logic](/theorems/7499)*, a closed proof of $P \lor \neg P$ would force one of the following alternatives:
\begin{align*}
\vdash P
\end{align*}
or
\begin{align*}
\vdash \neg P.
\end{align*}
Using the abbreviation $\neg P := P \to \bot$, the second alternative is
\begin{align*}
\vdash P \to \bot.
\end{align*}
Neither alternative is available for an arbitrary variable. In a one-node Kripke model where $P$ is not forced, the empty context does not force $P$, so soundness rules out $\vdash P$. In a one-node Kripke model where $P$ is forced, forcing $P \to \bot$ would require that every extension forcing $P$ also force $\bot$; the node itself forces $P$ and does not force $\bot$, so it does not force $P \to \bot$, and soundness rules out $\vdash P \to \bot$.
Thus a uniform intuitionistic proof of $P \lor \neg P$ would make the disjunction property demand a closed proof of either $P$ or its refutation, but arbitrary propositional variables provide neither. This is exactly why unrestricted excluded middle is not intuitionistically valid.
[/example]
The example connects a metatheorem to a familiar rejected classical principle. This matters because we need the same distinction for existential statements behave in the same way, except that the extracted information is a witness term rather than a selected disjunct.
[quotetheorem:7500]
[proofunderconstruction:7500]
The existence theorem should be read as a metatheorem about formal proofs, not as an algorithm that solves every predicate without being given a proof. The formal setting is part of the statement. If the theory had an added axiom $\exists x\,P(x)$ for a new predicate symbol $P$ with no witness term in the proof rules, the conclusion could fail: the axiom would prove an existential statement without supplying a numeral $k$ together with a derivation of $P(\bar{k})$. If the language had extra closed number terms without a canonicity theorem, the last existential introduction might supply a closed term whose numeral value is not available by the stated computation rules. The theorem also does not say that the extracted witness is efficient to compute, unique, or semantically minimal; different proofs of the same existential theorem may normalize to different numerals. The point is narrower and stronger: in ordinary $HA$ with its canonical closed number terms, a normalized proof contains a witness that can be read off, connecting LJ normalization to the Curry-Howard view of proofs as programs and to the witness extraction performed by proof assistants.
[example: Normalizing an Existential Proof to Extract a Witness]
Suppose a normalized arithmetic derivation proves $\exists n\,P(n)$ and its last inference is existential introduction with premise a derivation of $P(\bar{3})$. Since $\bar{3}$ denotes $SSS0$, the final step has the form
\begin{align*}
\frac{\vdash P(SSS0)}{\vdash \exists n\,P(n)}.
\end{align*}
The witness exposed by the normalized proof is therefore $\bar{3}$, and the displayed premise is already the required proof of $P(\bar{3})$.
If the original proof introduced the closed term $\bar{1}+\bar{2}$ instead, the arithmetic defining equations for addition compute it to the same numeral. Since $\bar{1}=S0$ and $\bar{2}=SS0$,
\begin{align*}
\bar{1}+\bar{2}=S0+SS0.
\end{align*}
By the recursive equation $Sx+y=S(x+y)$ for addition,
\begin{align*}
S0+SS0=S(0+SS0).
\end{align*}
By the defining equation $0+y=y$,
\begin{align*}
0+SS0=SS0.
\end{align*}
Substituting this equality into the previous line gives
\begin{align*}
S(0+SS0)=S(SS0)=SSS0=\bar{3}.
\end{align*}
Thus the normalized final existential introduction can be read as supplying the numeral $\bar{3}$ together with a proof of $P(\bar{3})$, even when the pre-normalized proof used the closed term $\bar{1}+\bar{2}$.
[/example]
The disjunction and existence properties summarize the proof-theoretic force of intuitionistic logic. Natural deduction gives the local meaning of the connectives, LJ supplies a normal form theory through cut elimination, and the normal forms expose the choices and witnesses that constructive proofs are required to contain.
Chapter 2 established the proof-theoretic machinery: natural deduction, sequent calculi, and normal forms that expose the constructive content of proofs. The next chapter asks what these same principles look like semantically, through models that interpret intuitionistic truth without collapsing it into two-valued classical logic.
# 3. Semantics: Heyting Algebras and Kripke Models
This chapter gives semantic models for the natural deduction and LJ proof systems developed in Chapter 2. The central question is how implication and negation can be interpreted without treating truth as a fixed two-valued matter. Heyting algebras give the algebraic semantics, while Kripke models give a dynamic semantics in which information grows over time.
The bridge between the two viewpoints is persistence: once a constructive proof has been obtained, later information cannot invalidate it. Algebraically this appears through order-preserving operations and relative pseudocomplements. In Kripke semantics it appears as monotone forcing along an accessibility order.
## Why Implication Needs New Semantics
Classically, an implication $A \to B$ is often identified with the truth table for $\neg A \vee B$. Constructively this loses information, since a proof of $A \to B$ is a method transforming any proof of $A$ into a proof of $B$. A semantic model for intuitionistic logic must therefore interpret implication as a uniform transformation principle rather than as a truth-functional abbreviation.
The algebraic setting begins with a lattice of truth values ordered by strength of information. Meets represent simultaneous evidence, joins represent alternative evidence, and implication is the best truth value which, together with evidence for the antecedent, yields evidence for the consequent. This asks for a lattice in which implication is not primitive decoration but is determined by its interaction with meet.
[definition: Heyting Algebra]
A Heyting algebra is a bounded lattice $(H, \wedge, \vee, 0, 1)$ equipped with a map
\begin{align*}
\to &: H \times H \to H
\end{align*}
such that for all $a,b,c \in H$,
\begin{align*}
a \wedge b \le c \quad \iff \quad a \le b \to c.
\end{align*}
[/definition]
The defining adjunction says that $b \to c$ is the largest amount of information which turns $b$-information into $c$-information when combined by meet. This is the algebraic form of the BHK clause for implication. Since constructive negation is refutation rather than Boolean complementation, the next question is how to express refutation using only implication and absurdity.
[definition: Negation In A Heyting Algebra]
Let $H$ be a Heyting algebra. Negation is the map
\begin{align*}
\neg &: H \to H, & a &\mapsto a \to 0.
\end{align*}
[/definition]
Negation is therefore implication into absurdity. This makes refutation a positive operation: to refute $a$ is to transform evidence for $a$ into the bottom element. The order-theoretic content of this operation is best described by the largest element incompatible with $a$, which leads to pseudocomplements.
[definition: Pseudocomplement]
Let $L$ be a bounded lattice. A pseudocomplement of $a \in L$ is an element $a^* \in L$ such that $a \wedge a^* = 0$ and, for every $b \in L$, if $a \wedge b = 0$, then $b \le a^*$.
[/definition]
In a Heyting algebra, $\neg a$ is the pseudocomplement of $a$. This explains why $a \wedge \neg a = 0$ is valid while $a \vee \neg a = 1$ need not hold: maximal disjointness is not the same as complementarity. To see this distinction in a familiar setting, we now use open sets as truth values.
[example: Open Subsets Of The Real Line]
Let $H$ be the set of open subsets of $\mathbb R$, ordered by inclusion. Meets are intersections, joins are unions, $0=\varnothing$, and $1=\mathbb R$. For open sets $U,V \subset \mathbb R$, define
\begin{align*}
U \to V := \operatorname{int}\big((\mathbb R \setminus U) \cup V\big).
\end{align*}
We verify the Heyting implication condition by showing that this is exactly the largest [open set](/page/Open%20Set) $W$ with $W \cap U \subseteq V$.
Suppose first that $W$ is open and $W \cap U \subseteq V$. If $x \in W$, then either $x \notin U$, in which case $x \in \mathbb R \setminus U$, or $x \in U$, in which case $x \in W \cap U \subseteq V$. Hence
\begin{align*}
W \subseteq (\mathbb R \setminus U) \cup V.
\end{align*}
Since $W$ is open, containment in this set implies containment in its interior:
\begin{align*}
W \subseteq \operatorname{int}\big((\mathbb R \setminus U) \cup V\big)=U \to V.
\end{align*}
Conversely, if $W \subseteq U \to V$, then $W \subseteq (\mathbb R \setminus U) \cup V$. For $x \in W \cap U$, we have $x \in U$ and $x \in (\mathbb R \setminus U) \cup V$; the alternative $x \in \mathbb R \setminus U$ is impossible, so $x \in V$. Therefore
\begin{align*}
W \cap U \subseteq V.
\end{align*}
Thus $W \cap U \subseteq V$ holds exactly when $W \subseteq U \to V$, which is the required adjunction for implication.
Now take $U=(0,1)$. Since negation is implication into $0=\varnothing$,
\begin{align*}
\neg U = U \to \varnothing = \operatorname{int}\big(\mathbb R \setminus (0,1)\big).
\end{align*}
The complement is
\begin{align*}
\mathbb R \setminus (0,1)=(-\infty,0]\cup[1,\infty).
\end{align*}
Its interior is
\begin{align*}
\operatorname{int}\big((-\infty,0]\cup[1,\infty)\big)=(-\infty,0)\cup(1,\infty).
\end{align*}
Therefore
\begin{align*}
\neg(0,1)=(-\infty,0)\cup(1,\infty).
\end{align*}
The meet with $U$ is empty:
\begin{align*}
(0,1)\cap\big((-\infty,0)\cup(1,\infty)\big)=\varnothing.
\end{align*}
But the join is
\begin{align*}
(0,1)\cup\big((-\infty,0)\cup(1,\infty)\big)=\mathbb R \setminus \{0,1\}.
\end{align*}
So $U \vee \neg U \ne 1$, because the boundary points $0$ and $1$ are missing. This shows concretely that intuitionistic negation gives the largest open region disjoint from $U$, not a Boolean complement of $U$.
[/example]
The example shows the geometric content of intuitionistic negation. A point on the boundary of $U$ is not inside $U$, but every neighbourhood of it meets $U$, so no open neighbourhood gives stable evidence against $U$. Having introduced the algebraic operations, we now need to verify that formal intuitionistic derivations always evaluate to the top truth value in every Heyting algebra.
[quotetheorem:7501]
[citeproof:7501]
This theorem says that Heyting algebras are sound truth-value structures for intuitionistic proof. The adjunction in the definition of implication is essential: if a bounded lattice is equipped with an arbitrary binary operation, for instance $a \to b=0$ for all $a,b$, then the derivable formula $P \to P$ can evaluate below $1$. The requirement that $v$ respect the recursive interpretation of connectives is also essential, since soundness is a statement about formulas built from the algebraic operations, not about unrelated assignments to compound expressions. The theorem does not say that every formula with value $1$ in one chosen Heyting algebra is derivable; it only gives preservation of derivability in every such algebra. To refute a proposed intuitionistic principle, it suffices to find a Heyting algebra and a valuation under which that formula has value below $1$.
## Kripke Frames And Persistent Truth
The algebraic semantics compresses information into abstract truth values. Kripke semantics instead keeps the stages of information visible. The guiding question is: if a proposition has been established at a stage of knowledge, what must remain true as the stage is extended?
A semantic stage needs only an ordering of possible information states. The order is not temporal in the physical sense; it records extension of evidence. This motivates the frame on which forcing will be defined.
[definition: Kripke Frame]
A Kripke frame is a partially ordered set $(K, \le)$ whose elements are called nodes or worlds.
[/definition]
The order is read informationally: $w \le u$ means that $u$ is a possible future extension of the information available at $w$. Truth must be monotone along this order. Before compound formulas can be interpreted, we must specify which atomic propositions are already forced at each node and require that atomic truth persists.
[definition: Kripke Model]
A Kripke model for propositional logic is a triple $(K, \le, \Vdash_{\mathrm{At}})$ where $(K,\le)$ is a Kripke frame and
\begin{align*}
\Vdash_{\mathrm{At}} \subseteq K \times \mathrm{At}
\end{align*}
is an atomic forcing relation. The persistence condition is that, for every atomic proposition $P \in \mathrm{At}$ and all $w,u \in K$,
\begin{align*}
w \le u \text{ and } w \Vdash_{\mathrm{At}} P \quad \implies \quad u \Vdash_{\mathrm{At}} P.
\end{align*}
[/definition]
Equivalently, the atomic data is a map from atomic propositions to upward-closed subsets of $K$. The atomic forcing relation gives the base layer of the model, but logic requires recursive clauses for all formulas. These clauses must preserve the constructive readings from the previous chapters, especially the reading of implication as a method that survives every future extension. This motivates the full forcing definition.
[definition: Forcing For Propositional Formulas]
Let $(K,\le,\Vdash_{\mathrm{At}})$ be a Kripke model. The forcing relation on all formulas is the relation
\begin{align*}
\Vdash \subseteq K \times \mathrm{Form}
\end{align*}
defined recursively as follows. For atomic propositions $P \in \mathrm{At}$,
\begin{align*}
w \Vdash P &\iff w \Vdash_{\mathrm{At}} P.
\end{align*}
For compound formulas,
\begin{align*}
w \Vdash A \wedge B &\iff w \Vdash A \text{ and } w \Vdash B,
\end{align*}
\begin{align*}
w \Vdash A \vee B &\iff w \Vdash A \text{ or } w \Vdash B,
\end{align*}
\begin{align*}
w \Vdash A \to B &\iff \text{for all } u \ge w,\ u \Vdash A \implies u \Vdash B,
\end{align*}
\begin{align*}
w \Vdash \bot &\iff \text{false}.
\end{align*}
Negation is defined by $\neg A := A \to \bot$.
[/definition]
The implication clause quantifies over all future nodes because a proof of $A \to B$ must work after any later refinement of information. This clause would be unusable if compound forcing could fail to persist. This matters because we need to show that the recursive forcing clauses preserve monotonicity from atoms to all formulas.
[quotetheorem:7502]
[citeproof:7502]
Persistence is the semantic counterpart of the constructive principle that proofs do not expire. Atomic monotonicity is needed for the induction to start: if $w \le u$, $w \Vdash P$, and $u \nVdash P$ were allowed, persistence would already fail for the atomic formula $P$. The future-node quantifier in the implication clause is also needed, since implication must remain valid after every information extension. The theorem does not say that truth moves backwards from $u$ to $w$; a formula may become forced only after more information is added. It gives a practical test for countermodels: an atom may become true later, but it cannot be made false after it has become true. The simplest countermodel uses exactly this possibility, with $P$ undecided now and true later.
[example: Two Node Refutation Of Excluded Middle]
Let $K=\{w,u\}$ with $w<u$, and define atomic forcing for $P$ by
\begin{align*}
u \Vdash_{\mathrm{At}} P \quad \text{and} \quad w \nVdash_{\mathrm{At}} P.
\end{align*}
This atomic assignment is persistent: the only nontrivial extension from a node forcing $P$ is $u \le u$, and $u \Vdash_{\mathrm{At}} P$ remains true there. By the atomic forcing clause, this gives
\begin{align*}
w \nVdash P.
\end{align*}
We next compute negation at $w$. Since $\neg P$ means $P \to \bot$, the implication clause says that $w \Vdash \neg P$ would mean: for every $v \ge w$, if $v \Vdash P$, then $v \Vdash \bot$. The node $u$ is such a future node, since $u \ge w$, and by construction
\begin{align*}
u \Vdash P.
\end{align*}
But the forcing clause for $\bot$ is false at every node, so
\begin{align*}
u \nVdash \bot.
\end{align*}
Thus the required universal condition for $w \Vdash P \to \bot$ fails at $v=u$, and therefore
\begin{align*}
w \nVdash \neg P.
\end{align*}
Finally, by the forcing clause for disjunction,
\begin{align*}
w \Vdash P \vee \neg P \quad \text{iff} \quad w \Vdash P \text{ or } w \Vdash \neg P.
\end{align*}
Both alternatives have just failed, so
\begin{align*}
w \nVdash P \vee \neg P.
\end{align*}
This two-node model shows that $P$ may be undecided at the present node even though a future extension forces $P$, so excluded middle is not forced at $w$.
[/example]
This countermodel captures the constructive distinction between not yet having a proof of $P$ and having a refutation of $P$. At $w$, the information is compatible with a future proof of $P$, so $\neg P$ cannot be forced. The same model also separates double-negation stability from direct proof.
[example: Refutation Of Double Negation Elimination]
Use the same two-node frame $w<u$, with $u \Vdash_{\mathrm{At}} P$ and $w \nVdash_{\mathrm{At}} P$. By the atomic forcing clause, this means
\begin{align*}
u \Vdash P.
\end{align*}
and
\begin{align*}
w \nVdash P.
\end{align*}
We first show that neither node forces $\neg P$. Since $\neg P$ means $P \to \bot$, the implication clause says that $v \Vdash \neg P$ exactly when every $r \ge v$ with $r \Vdash P$ also satisfies $r \Vdash \bot$. At $u$, taking $r=u$ gives $u \ge u$ and $u \Vdash P$, while the forcing clause for $\bot$ gives
\begin{align*}
u \nVdash \bot.
\end{align*}
Hence
\begin{align*}
u \nVdash \neg P.
\end{align*}
At $w$, taking $r=u$ gives $u \ge w$ and $u \Vdash P$, but again $u \nVdash \bot$. Therefore
\begin{align*}
w \nVdash \neg P.
\end{align*}
Now compute $\neg\neg P$ at $w$. Since $\neg\neg P$ means $\neg P \to \bot$, the implication clause requires that every $v \ge w$ forcing $\neg P$ must force $\bot$. The only nodes above $w$ are $w$ and $u$, and we have just shown
\begin{align*}
w \nVdash \neg P.
\end{align*}
and
\begin{align*}
u \nVdash \neg P.
\end{align*}
Thus there is no node $v \ge w$ with $v \Vdash \neg P$, so the required universal condition holds:
\begin{align*}
w \Vdash \neg\neg P.
\end{align*}
Finally, $w \nVdash \neg\neg P \to P$. Indeed, the implication clause for $\neg\neg P \to P$ at $w$ would require every $v \ge w$ with $v \Vdash \neg\neg P$ to satisfy $v \Vdash P$. Taking $v=w$, we have $w \ge w$ and $w \Vdash \neg\neg P$, but $w \nVdash P$. Therefore
\begin{align*}
w \nVdash \neg\neg P \to P.
\end{align*}
This model separates the failure of every refutation of $P$ from a present proof of $P$: at $w$, double negation holds, but $P$ itself is still not forced.
[/example]
The double-negation example shows that $\neg\neg P$ can mean that every attempted refutation of $P$ breaks down, without supplying a proof of $P$ at the present node. These countermodels are legitimate only because derivable formulas are preserved by every Kripke model, so we need soundness for the forcing semantics.
[quotetheorem:7503]
[proofunderconstruction:7503]
Kripke soundness justifies the use of finite ordered models as diagnostic tools. The phrase "every node" is necessary because a derivation is not allowed to depend on a privileged root; if a formula failed at any node, that node could be regarded as the current state of information. The monotonicity built into Kripke models is also necessary for implication introduction, where assumptions must be transported to future extensions. The theorem does not construct a countermodel for each underivable formula; it only says that any countermodel already found is a valid obstruction to derivability. Thus, if a formula fails at the root of some Kripke model, then it is not intuitionistically derivable.
## Completeness For Intuitionistic Propositional Logic
Soundness explains why derivable formulas hold in every model. Completeness asks the converse question: if a formula is not derivable, can the semantics always detect the failure? For intuitionistic propositional logic, Kripke models are exact enough to do so.
The construction uses a canonical model built from theories. A node represents a deductively closed body of information, and extension is inclusion of information. To make disjunction behave locally at a node, the theories used as nodes must record a choice of disjunct whenever they contain a disjunction.
[definition: Prime Theory]
A prime theory is a set $\Gamma$ of propositional formulas such that $\Gamma$ is closed under intuitionistic derivability, $\bot \notin \Gamma$, and whenever $A \vee B \in \Gamma$, at least one of $A \in \Gamma$ or $B \in \Gamma$ holds.
[/definition]
Primeness is the semantic form of the disjunction property at a world. It makes disjunction forcing local rather than deferred to a later stage. For completeness, we also need to know that whenever a formula is not derivable, there are enough prime theories to witness its failure.
[quotetheorem:7504]
[proofunderconstruction:7504]
This lemma supplies counterworlds for underivable formulas. The hypothesis $\Gamma \nvdash A$ is essential: if $\Gamma \vdash A$, then every deductively closed extension of $\Gamma$ contains $A$, so no prime theory extending $\Gamma$ can omit it. Primeness is needed because intuitionistic disjunction is local in Kripke forcing; without it, a theory could contain $B \vee C$ while containing neither disjunct, breaking the truth lemma for disjunction. The lemma does not make the extension maximal in the Boolean sense, and it need not decide every formula. It is the semantic analogue of extending a consistent classical theory to a maximal consistent one, with primeness replacing Boolean maximality. Once the canonical worlds are available, the remaining task is to prove that membership in a prime theory matches forcing in the canonical Kripke model.
[quotetheorem:7505]
[citeproof:7505]
Completeness confirms that Kripke semantics is not only a source of examples but a full semantics for intuitionistic propositional logic. The restriction to propositional formulas matters: first-order intuitionistic completeness requires additional structure for domains and quantifiers. The use of all Kripke models is also essential; validity in a small selected class of frames may validate extra principles that intuitionistic logic does not prove. The theorem does not say that a failed formula has a unique countermodel, and canonical models are usually larger than the finite countermodels used for quick diagnosis. Together with soundness, completeness gives a precise semantic characterisation of derivability. The two semantic presentations now need to be compared, since later constructive principles can be tested in either algebraic or relational form.
[remark: Algebraic And Kripke Perspectives]
Heyting algebras and Kripke models emphasize different aspects of the same logic. Heyting algebras describe the algebra of truth values and make implication a right adjoint to meet. Kripke models describe growth of information and make implication a universal condition over all future stages.
[/remark]
The next part of the course will use these semantics as a testing ground for further constructive principles. When a proposed classical rule fails, the failure can often be seen either as a missing complement in a Heyting algebra or as a possible future extension in a Kripke model.
The semantic picture in Chapter 3 showed why constructive logic behaves differently from classical logic, and where classical principles can fail. With that testing ground in place, the next chapter studies how classical reasoning can still be used inside constructive mathematics when it is carefully isolated and controlled.
# 4. Classical Logic Inside Constructive Logic
This chapter explains how classical logic can be used inside constructive mathematics without losing constructive content. The guiding question is not whether classical reasoning is valid in general, but which fragments of it preserve the kind of evidence that intuitionistic logic requires. The answer is subtle: proofs of negative conclusions often survive translation, while proofs of positive existence or disjunction may conceal information that constructive mathematics refuses to invent.
Chapters 2 and 3 developed intuitionistic reasoning both proof-theoretically, through natural deduction and LJ, and semantically, through Heyting algebras and Kripke models. We now compare it with classical logic by isolating stable formulas, double-negation translations, and weak forms of excluded middle. The main theme is that classical logic is not a single indivisible principle; it is a hierarchy of principles whose constructive strength depends on the shape of the formulas to which they are applied.
## Negative and Stable Formulas
A first source of tension is proof by contradiction. In classical mathematics, a contradiction obtained from assuming $\neg A$ is enough to infer $A$. Constructively, the same argument only gives $\neg\neg A$, so we need criteria that tell us when this double negation can be removed.
[definition: Negation]
Let $A$ be a proposition. The negation of $A$ is the proposition
\begin{align*}
\neg A := A \to \bot,
\end{align*}
where $\bot$ denotes absurdity.
[/definition]
Negation is constructive because a proof of $\neg A$ is a method that transforms any proof of $A$ into a contradiction. We need to isolate formulas for which double negation carries no extra ambiguity.
[definition: Stable Proposition]
A proposition $A$ is stable if there is a proof of
\begin{align*}
\neg\neg A \to A.
\end{align*}
[/definition]
Stability says that refuting the refutability of $A$ is enough to prove $A$. This is not available for arbitrary $A$, but it holds for many propositions whose content is already negative or decidable.
[example: Stability of Negated Propositions]
Let $B$ be any proposition and put $A := \neg B$, so stability of $A$ means proving $\neg\neg\neg B \to \neg B$. Assume
\begin{align*}
h : \neg\neg\neg B.
\end{align*}
By the definition of negation, this means $h : ((B \to \bot) \to \bot) \to \bot$. To prove $\neg B$, assume an arbitrary proof
\begin{align*}
b : B.
\end{align*}
We must derive $\bot$. From $b$ we define a proof of $\neg\neg B$ as follows: given any $k : \neg B$, that is $k : B \to \bot$, apply $k$ to $b$ and obtain
\begin{align*}
k(b) : \bot.
\end{align*}
Thus the map $k \mapsto k(b)$ has type $\neg B \to \bot$, which is exactly $\neg\neg B$. Applying $h$ to this proof gives
\begin{align*}
h(k \mapsto k(b)) : \bot.
\end{align*}
Therefore every assumed proof $b : B$ leads to absurdity, so we have constructed $\neg B$. Hence $\neg\neg\neg B \to \neg B$, and negated propositions are stable.
[/example]
The example shows why negative conclusions are special: they already have the form of a refutation. We now make precise a syntactic class of formulas built to preserve that behaviour.
[definition: Negative Formula]
Fix a first-order language $\mathcal{L}$. The negative formulas over $\mathcal{L}$ are generated by the grammar
\begin{align*}
N ::= \bot \mid (N \wedge N) \mid (A \to N) \mid \forall x\,N,
\end{align*}
where $A$ ranges over arbitrary first-order $\mathcal{L}$-formulas.
[/definition]
Negation is included because $\neg A$ abbreviates $A \to \bot$. The definition is syntactic, so it depends on how a formula is written. For instance, $\neg\exists x\,P(x)$ is negative, while $\exists x\,\neg P(x)$ is not: the first asks for a refutation of all witnesses, and the second asks for an actual witness. Atomic formulas are not included as positive base cases, because an arbitrary atomic proposition need not be stable intuitionistically.
[quotetheorem:7506]
[citeproof:7506]
This theorem explains why many classical contradiction arguments are constructively acceptable when the final statement is negative. Each clause in the grammar has a specific role. The base $\bot$ is already absurdity, conjunctions reduce stability to the two components, implications $A \to N$ are allowed because the consequent is still negative, and universal quantifiers are allowed because proving $\forall x\,N(x)$ only requires a uniform method for an arbitrary $x$. The variants that fail are exactly the positive ones: adding atomic formulas would give $\neg\neg P \to P$ for an arbitrary atomic $P$, adding disjunction would require choosing a side from $\neg\neg(N_1 \vee N_2)$, and adding existential quantification would require extracting a witness from $\neg\neg\exists x\,N(x)$. This prepares the next result: [Glivenko's theorem](/theorems/7507) can always recover a double-negated propositional conclusion, and the stability theorem tells us when that recovered conclusion can be turned back into the original statement.
[example: Contradiction Versus Witness]
Suppose a classical argument proves that it is false that no $x$ satisfies $P(x)$. Written with $\neg A := A \to \bot$, this conclusion is
\begin{align*}
\neg\neg\exists x\,P(x) = ((\exists x\,P(x) \to \bot) \to \bot).
\end{align*}
Thus the proof has the following constructive shape: given any attempted refutation
\begin{align*}
r : \exists x\,P(x) \to \bot,
\end{align*}
it produces a contradiction
\begin{align*}
h(r) : \bot.
\end{align*}
This is not the same data as a proof of $\exists x\,P(x)$. To prove $\exists x\,P(x)$ constructively, one must give a particular element $a$ and a proof
\begin{align*}
p : P(a).
\end{align*}
Then the [existential introduction rule](/theorems/4639) yields
\begin{align*}
(a,p) : \exists x\,P(x).
\end{align*}
A proof of $\neg\neg\exists x\,P(x)$ supplies only the operation $r \mapsto h(r)$ from refutations to contradictions; it contains no displayed element $a$ and no displayed proof of $P(a)$. In a computational reading, the missing information is exactly the witness requested by the existential quantifier.
[/example]
The practical lesson is that proof by contradiction must be classified by its conclusion. A contradiction proof of $\neg A$, or of a formula built negatively from such claims, can often be retained; a contradiction proof of $A \vee B$ or $\exists x\,P(x)$ usually needs additional constructive information.
## Glivenko's Theorem and Double-Negation Translation
The next question is global: can every classical proof be converted into an intuitionistic proof after adding double negations in the right places? For propositional logic, Glivenko's theorem gives the first answer. It says that classical provability of $A$ implies intuitionistic provability of $\neg\neg A$.
[quotetheorem:7507]
[citeproof:7507]
Glivenko's theorem is sharp in two ways. First, the output is $\neg\neg A$, not $A$. For formulas that are stable, this distinction disappears, which is why classical proofs of stable propositional statements are constructively harmless. Without stability of the conclusion, the theorem does not recover the original statement; applying it to $P \vee \neg P$ gives only $\neg\neg(P \vee \neg P)$, not a constructive decision between $P$ and $\neg P$. This is an explicit counterexample to any version with conclusion $A$: classical propositional logic proves $P \vee \neg P$, while intuitionistic propositional logic does not prove $P \vee \neg P$ for an arbitrary propositional variable $P$.
Second, the theorem is propositional. A plain outer double negation does not protect quantifiers well enough in first-order logic. For example, classical first-order logic proves $\forall x\,(P(x) \vee \neg P(x))$, but intuitionistic logic does not prove $\neg\neg\forall x\,(P(x) \vee \neg P(x))$ in general: in a Kripke model with continually extending information, every node may leave some later predicate value undecided, so no node forces the universal law of excluded middle for all individuals. This gives the corresponding counterexample to dropping the propositional hypothesis. This is why the next translation modifies disjunctions and existential quantifiers inside the formula, rather than merely placing $\neg\neg$ in front of the whole conclusion.
[example: Translating a Classical Contradiction]
Let a classical propositional derivation prove $\bot$ with no open assumptions, possibly using an instance of excluded middle for an auxiliary proposition $P$. By *Glivenko Theorem*, the classical derivation yields an intuitionistic derivation of $\neg\neg\bot$. Expanding negation, this conclusion has type
\begin{align*}
\neg\neg\bot = ((\bot \to \bot) \to \bot).
\end{align*}
Let
\begin{align*}
h : (\bot \to \bot) \to \bot.
\end{align*}
To recover $\bot$, define the identity refutation of absurdity by
\begin{align*}
\operatorname{id}_{\bot} : \bot \to \bot.
\end{align*}
It sends any assumed $u:\bot$ to the same proof $u:\bot$. Applying $h$ to this identity map gives
\begin{align*}
h(\operatorname{id}_{\bot}) : \bot.
\end{align*}
Thus the final double negation disappears because the target is already absurdity: an intuitionistic proof of $\neg\neg\bot$ immediately gives an intuitionistic proof of $\bot$.
With assumptions present, the same reasoning must be applied to the translated sequent rather than only to the conclusion. If the classical derivation has assumptions $\Gamma$ and conclusion $\bot$, the negative translation gives an intuitionistic derivation from $\Gamma^N$ of $\bot^N$, and the defining clause for absurdity gives
\begin{align*}
\bot^N = \bot.
\end{align*}
So assumptions must be translated along with the proof, but no extra work is needed at the final contradiction itself.
[/example]
This example shows why a final double negation is sufficient when the target is already absurdity. Predicate logic introduces a new problem: disjunctions and existential quantifiers can occur inside a formula, not only at the end, so the translation must protect every positive choice point before the proof is interpreted constructively.
[definition: Goedel-Gentzen Negative Translation]
Fix a first-order language $\mathcal{L}$, and let $\operatorname{Form}_{\mathcal{L}}$ be the set of first-order $\mathcal{L}$-formulas. The Goedel-Gentzen negative translation is the function
\begin{align*}
(-)^N : \operatorname{Form}_{\mathcal{L}} \to \operatorname{Form}_{\mathcal{L}},
\end{align*}
sending each formula $A$ to a formula $A^N$, defined recursively as follows. For atomic $P$,
\begin{align*}
P^N := \neg\neg P.
\end{align*}
For absurdity,
\begin{align*}
\bot^N := \bot.
\end{align*}
For conjunction,
\begin{align*}
(A \wedge B)^N := A^N \wedge B^N.
\end{align*}
For implication,
\begin{align*}
(A \to B)^N := A^N \to B^N.
\end{align*}
For disjunction,
\begin{align*}
(A \vee B)^N := \neg(\neg A^N \wedge \neg B^N).
\end{align*}
For universal quantification,
\begin{align*}
(\forall x\,A)^N := \forall x\,A^N.
\end{align*}
For existential quantification,
\begin{align*}
(\exists x\,A)^N := \neg\forall x\,\neg A^N.
\end{align*}
[/definition]
The translation replaces positive choices by their double-negated or refutational versions. A translated disjunction no longer chooses a side, and a translated existential no longer gives a witness. The natural theorem to prove is that these clauses have proof-theoretic force: they translate entire classical derivations into intuitionistic derivations.
[quotetheorem:7508]
[proofunderconstruction:7508]
The theorem does not say that classical proofs contain the same witnesses as constructive proofs. It says that the part of a classical proof expressible in the negative translation is acceptable to intuitionistic logic. The hypotheses matter at both ends of the sequent. If the conclusion were left untranslated, the classical proof of $P \vee \neg P$ would incorrectly yield an intuitionistic proof of excluded middle. If translation were ignored for assumptions and then the original classical conclusion were demanded, the sequent $\neg\neg P \vdash_{\mathrm{CL}} P$ would incorrectly suggest that a constructive proof can eliminate double negation. More explicitly, take $\Gamma=\{\neg\neg P\}$ and conclusion $P$. Classically this sequent is valid, but intuitionistically it is not: a Kripke root may fail to force $P$ while every extension that refutes $P$ is impossible, so the root can force $\neg\neg P$ without forcing $P$. The translation prevents this: it turns the conclusion $P$ into $\neg\neg P$ and keeps the proof inside the information that intuitionistic logic can justify. This prompts a sharper question: when the original statement is already negative, can we remove the translation and recover the statement itself?
[quotetheorem:7509]
[citeproof:7509]
This conservativity result is the formal version of a common working rule: classical reasoning may be used as a shortcut when both the assumptions and the target live in a negative fragment. The conclusion hypothesis cannot be dropped: with no assumptions, classical logic proves $P \vee \neg P$, while intuitionistic logic does not prove that positive disjunction. The assumption hypothesis marks a different risk. A positive assumption such as $\exists x\,P(x)$ contains a witness that its negative translation records only as $\neg\forall x\,\neg P^N(x)$; a constructive argument that uses the actual witness is not reproduced from the translated assumption alone. Likewise, a proof using $P \vee Q$ by case analysis may rely on knowing which branch is present, while $(P \vee Q)^N$ records only that the two translated alternatives cannot both be refuted. The next section studies exactly these weaker classical remnants: principles that preserve some refutational information but stop short of constructive choice.
## De Morgan Principles and Weak Forms of Classical Logic
Classical logic contains many principles that coincide classically but separate constructively. The final task of the chapter is to place some familiar laws into a hierarchy. This prevents the mistake of treating every nonconstructive argument as a full use of excluded middle.
[definition: Law of Excluded Middle]
The law of excluded middle is the schema
\begin{align*}
A \vee \neg A
\end{align*}
for every proposition $A$.
[/definition]
Excluded middle gives a decision without a method of deciding. Constructive mathematics often accepts restricted instances, especially when $A$ is decidable by computation or finite search. A weaker question asks only whether $A$ can be refuted, which motivates a separate principle.
[definition: Weak Excluded Middle]
Weak excluded middle is the schema
\begin{align*}
\neg A \vee \neg\neg A
\end{align*}
for every proposition $A$.
[/definition]
Weak excluded middle does not decide $A$; it decides whether $A$ is refutable. This is weaker than excluded middle but still not intuitionistically derivable in general. Another family of principles asks whether a refutation of a compound statement can be turned into positive information about which component fails.
[definition: De Morgan Principle]
A De Morgan principle is any one of the following schemas, with $A$ and $B$ ranging over propositions and $P(x)$ ranging over predicates:
\begin{align*}
\neg(A \wedge B) \to (\neg A \vee \neg B),
\end{align*}
\begin{align*}
\neg(A \vee B) \to (\neg A \wedge \neg B),
\end{align*}
\begin{align*}
(\neg A \wedge \neg B) \to \neg(A \vee B),
\end{align*}
\begin{align*}
\neg\exists x\,P(x) \to \forall x\,\neg P(x),
\end{align*}
\begin{align*}
\forall x\,\neg P(x) \to \neg\exists x\,P(x),
\end{align*}
or
\begin{align*}
\neg\forall x\,P(x) \to \exists x\,\neg P(x).
\end{align*}
[/definition]
Some De Morgan directions are intuitionistic, while others require choice-like or decision-like information. For instance, intuitionistic logic proves $(\neg A \vee \neg B) \to \neg(A \wedge B)$, but the reverse direction asks us to choose which conjunct fails.
[quotetheorem:7510]
[citeproof:7510]
The converse is not derivable in intuitionistic logic: knowing whether $A$ is refutable need not produce either a proof of $A$ or a refutation of $A$. A clean way to separate the principles is algebraic rather than pictorial. There are intermediate Heyting algebras in which every element $a$ satisfies $\neg a \vee \neg\neg a = 1$, while some element $p$ still satisfies $p \vee \neg p \ne 1$. Interpreting propositions in such a Heyting algebra validates weak excluded middle but refutes excluded middle, so the implication from weak excluded middle to excluded middle cannot be proved intuitionistically. This shows why weak excluded middle is a genuine intermediate principle rather than a disguised form of classical logic. The missing ingredient is positive decision data: excluded middle supplies a branch of $A \vee \neg A$, while weak excluded middle supplies only a branch of $\neg A \vee \neg\neg A$. The distinction matters in analysis, where many propositions are stable under double negation but not decidable by a finite computation, and it prepares the next example by separating valid proof-by-contradiction patterns from those that hide an existential choice.
[example: Proof by Contradiction in Elementary Analysis]
Consider the uniqueness-of-limits argument for a real sequence $(x_n)$. Suppose $x_n \to a$ and $x_n \to b$, and assume for contradiction that $a \ne b$. Put $\varepsilon := |a-b|/3$, so $\varepsilon>0$. By convergence to $a$, choose $N_a$ such that $n \ge N_a$ implies $|x_n-a|<\varepsilon$; by convergence to $b$, choose $N_b$ such that $n \ge N_b$ implies $|x_n-b|<\varepsilon$. For $n \ge \max(N_a,N_b)$, the triangle inequality gives
\begin{align*}
|a-b| \le |a-x_n|+|x_n-b| < \varepsilon+\varepsilon = 2|a-b|/3.
\end{align*}
Since $|a-b|>0$, the inequality $|a-b|<2|a-b|/3$ is impossible. Thus the assumption $a\ne b$ leads to contradiction, so the proof establishes the negative statement that two proposed limits cannot be distinct.
By contrast, if $f$ is continuous on $[0,1]$ with $f(0)<0<f(1)$, a contradiction argument from “there is no $c$ with $f(c)=0$” has conclusion
\begin{align*}
\neg\neg\exists c\in[0,1]\, f(c)=0.
\end{align*}
A constructive proof of the existential conclusion must exhibit a particular real number $c$ and a proof that $f(c)=0$. The double-negated conclusion only transforms any refutation of such a zero into absurdity; it does not itself name the zero. A constructive proof, for example, must supply the locating information produced by a bisection or interval-nesting argument, not merely refute the absence of a zero.
The irrationality proof for $\sqrt2$ is negative again. Assume a rational number $q$ satisfies $q^2=2$. Write $q=m/n$ with integers $m,n$, $n\ne 0$, and $\gcd(m,n)=1$. Then
\begin{align*}
(m/n)^2=2
\end{align*}
gives
\begin{align*}
m^2=2n^2.
\end{align*}
Thus $m^2$ is even. If $m$ were odd, say $m=2r+1$, then
\begin{align*}
m^2=(2r+1)^2=4r^2+4r+1=2(2r^2+2r)+1,
\end{align*}
so $m^2$ would be odd. Hence $m$ is even, so $m=2k$ for some integer $k$. Substituting gives
\begin{align*}
(2k)^2=2n^2,
\end{align*}
hence
\begin{align*}
4k^2=2n^2,
\end{align*}
and therefore
\begin{align*}
n^2=2k^2.
\end{align*}
So $n^2$ is even, and the same parity argument shows that $n$ is even. Both $m$ and $n$ are divisible by $2$, contradicting $\gcd(m,n)=1$. Therefore no rational number squares to $2$, which is a refutation of any proposed rational square root rather than the construction of a positive witness.
[/example]
The hierarchy becomes especially visible around existential quantifiers. Constructively, refuting universal failure is weaker than producing a witness. We need to record this separation in proof-theoretic form, and its proof sketch indicates how Kripke models witness the missing information.
[quotetheorem:7511]
[citeproof:7511]
This result is the semantic counterpart of the computational point made earlier: a double-negated existential is a consistency statement about witnesses, not a witness-producing construction. The boundary is important: the theorem does not say that existence can never be recovered from double-negated existence. If the search space is finite and decidable, if $P$ is stable in the relevant sense, or if extra structure supplies a canonical witness, then an implication from $\neg\neg\exists x\,P(x)$ to $\exists x\,P(x)$ may be constructively provable for that particular situation. What fails is the unrestricted logical schema, where no decision procedure, stability hypothesis, or finite search principle is available. This also explains why classical logic remains useful rather than being discarded: classical arguments often locate durable negative information, and constructive logic then records exactly which positive data has not been produced. In realizability and program extraction, this distinction becomes operational, since a proof of $\exists x\,P(x)$ must compute a witness while a proof of $\neg\neg\exists x\,P(x)$ need only transform refutations into contradictions. Kripke semantics gives the model-theoretic picture of the same gap.
[remark: What Classical Logic Contributes]
Classical logic is best understood constructively as a source of negative information. Through Glivenko's theorem and the Goedel-Gentzen translation, it can certify that certain refutations, impossibility results, and stable consequences are available. It cannot by itself supply the data demanded by a positive disjunction or existential statement.
[/remark]
The chapter leaves us with a usable test for later constructive analysis. When reading a classical proof, identify the final logical form, mark every use of excluded middle or contradiction, and ask whether the conclusion is negative, stable, decidable, or genuinely positive. This classification determines whether the proof can be imported directly, translated by double negation, or must be rebuilt with explicit constructive content.
Chapter 4 identified which classical arguments preserve constructive content and which need translation or rebuilding. The next chapter applies that discipline to arithmetic, where decidability, stability, and omniscience principles sharply separate what can be computed from what must be assumed.
# 5. Arithmetic, Decidability, and Omniscience Principles
This chapter turns the computational reading of proofs toward arithmetic. It assumes the preceding material on intuitionistic logic from Chapters 1 and 2, the semantic tests from Chapter 3, and the Chapter 4 distinction between positive existence and negative or stable information. Natural numbers are the first place where constructive mathematics looks very close to classical mathematics: induction, primitive recursion, and equality all behave well. The difference appears when a statement about a sequence asks us to find information that has not been supplied by a construction. The chapter isolates several omniscience principles, each measuring a different amount of nonconstructive search over binary sequences and natural-number predicates.
## Arithmetic as a Constructive Theory
What arithmetic principles are available when proof still means construction? The basic answer is that the natural numbers may be used with their usual constructors, recursion, and induction, but existential statements over $\mathbb N$ require witnesses and disjunctions require a chosen side.
[definition: Heyting Arithmetic]
Heyting arithmetic, denoted $\mathrm{HA}$, is first-order intuitionistic arithmetic with equality, constant $0$, successor function $S$, symbols for primitive recursive functions, defining equations for those functions, and the induction scheme
\begin{align*}
\bigl(\varphi(0) \wedge \forall n(\varphi(n) \to \varphi(Sn))\bigr) \to \forall n\,\varphi(n)
\end{align*}
for every formula $\varphi(n)$ in the language.
[/definition]
The important point is not that induction is weakened; it is not. What changes is the surrounding logic: proving $\exists n\,\varphi(n)$ means producing a numeral and a proof of $\varphi(n)$, while proving $\varphi \vee \psi$ means providing a proof of one specified disjunct. To understand which numerical functions this theory treats as directly computable, we next isolate the primitive-recursive schemes built into arithmetic.
[definition: Primitive Recursive Function]
A function $f:\mathbb N^k \to \mathbb N$ is primitive recursive if it is obtained from the zero function, successor function, projections, composition, and primitive recursion.
[/definition]
Primitive recursion is the constructive mechanism behind many familiar algorithms. It gives functions whose values are computed by a finite recipe with a known bound on the recursion depth, namely the natural-number argument being recursed over.
[example: Addition by Primitive Recursion]
Define $+:\mathbb N^2\to\mathbb N$ by recursion on the second argument:
\begin{align*}
m+0=m
\end{align*}
and
\begin{align*}
m+S(n)=S(m+n).
\end{align*}
This is primitive recursion because the value at $S(n)$ is obtained from the already computed value at $n$.
We verify commutativity by first proving the two identities needed in the induction. For $0+n=n$, induction on $n$ gives the base case
\begin{align*}
0+0=0
\end{align*}
by the first defining clause. If $0+n=n$, then
\begin{align*}
0+S(n)=S(0+n)
\end{align*}
by the second defining clause, and hence
\begin{align*}
0+S(n)=S(n)
\end{align*}
by the induction hypothesis.
Next, for $S(m)+n=S(m+n)$, induction on $n$ gives the base case
\begin{align*}
S(m)+0=S(m)
\end{align*}
by the first defining clause, while
\begin{align*}
S(m+0)=S(m)
\end{align*}
because $m+0=m$. If $S(m)+n=S(m+n)$, then
\begin{align*}
S(m)+S(n)=S(S(m)+n)
\end{align*}
by the second defining clause, so
\begin{align*}
S(m)+S(n)=S(S(m+n))
\end{align*}
by the induction hypothesis. Also
\begin{align*}
S(m+S(n))=S(S(m+n))
\end{align*}
because $m+S(n)=S(m+n)$. Thus $S(m)+S(n)=S(m+S(n))$.
Now fix $m$ and prove $m+n=n+m$ by induction on $n$. For $n=0$,
\begin{align*}
m+0=m
\end{align*}
and
\begin{align*}
0+m=m
\end{align*}
by the identity $0+k=k$. If $m+n=n+m$, then
\begin{align*}
m+S(n)=S(m+n)
\end{align*}
by definition, hence
\begin{align*}
m+S(n)=S(n+m)
\end{align*}
by the induction hypothesis. By the identity $S(k)+\ell=S(k+\ell)$ with $k=n$ and $\ell=m$,
\begin{align*}
S(n)+m=S(n+m).
\end{align*}
Therefore $m+S(n)=S(n)+m$. The construction and the proof are accepted in $\mathrm{HA}$ because each recursion and induction step refers only to the predecessor already supplied by the natural-number argument.
[/example]
This example shows how arithmetic operations can be justified by finite recursive clauses. The next issue is different: even when a numerical question has a yes-or-no answer for each input, constructive logic asks for a method returning the relevant side. This motivates the formal notion of a decidable predicate.
[definition: Decidable Predicate]
A predicate $P:\mathbb N \to \mathrm{Prop}$ is decidable if
\begin{align*}
\forall n\,\bigl(P(n) \vee \neg P(n)\bigr).
\end{align*}
[/definition]
A decidable predicate supplies a method that, for each input $n$, returns either evidence for $P(n)$ or evidence that $P(n)$ is impossible. We need a basic theorem for the first basic test case: equality of natural numbers, which later search principles rely on when finite pieces of data are inspected.
[quotetheorem:7512]
[citeproof:7512]
This theorem explains why arithmetic can be computational without being classical. The hypotheses matter: both sides are finite natural-number data, built by repeatedly applying $S$ to $0$, so comparison can proceed by simultaneous induction until one side reaches $0$ or both do. The theorem does not say that every arithmetical predicate is decidable; for example, if $a:\mathbb N \to \{0,1\}$ is generated by an open-ended computation, equality lets us decide each finite comparison $a(n)=1$, but it does not decide whether such an $n$ exists somewhere in the infinite sequence. Thus decidable equality is the finite arithmetic base on which the later omniscience principles operate, while the failure of bounded information for infinite searches is exactly what those principles measure.
[example: A Decidable Predicate with an Unresolved Existential]
Let $a:\mathbb N \to \{0,1\}$ be a binary sequence whose values can be computed one at a time. For a fixed $n$, compute $a(n)$. If the computation returns $1$, then $a(n)=1$ is witnessed by reflexivity. If it returns $0$, then $a(n)=1$ would imply $0=1$, contradicting the disjointness of the two values in $\{0,1\}$; hence $\neg(a(n)=1)$. Thus each instance $a(n)=1$ is decidable.
This pointwise decision does not by itself prove either global alternative. After inspecting only $0,1,\dots,N$, the strongest negative information obtained is the finite list
\begin{align*}
a(0)=0,\ a(1)=0,\ \dots,\ a(N)=0.
\end{align*}
That finite list is compatible with a later value
\begin{align*}
a(N+1)=1,
\end{align*}
so it does not produce a proof of $\forall n\,a(n)=0$. It also does not produce a witness for $\exists n\,a(n)=1$, because such a proof must contain a specific natural number $n$ together with evidence that $a(n)=1$. The example separates the constructive decision of each finite input from the stronger global decision of whether a $1$ occurs anywhere in the infinite sequence.
[/example]
## Omniscience Principles for Binary Sequences
Which extra principles would turn pointwise decidability into global information about an infinite binary sequence? Omniscience principles answer this question by postulating various forms of infinite search, each stronger than what constructive proof normally provides.
[definition: Binary Sequence]
A binary sequence is a function $a:\mathbb N \to \{0,1\}$.
[/definition]
Binary sequences are used because, unlike the finite natural-number equalities just discussed, they strip infinite search down to its logical core. Asking whether some $a(n)$ is $1$ is the same as asking whether a decidable predicate has a witness, provided that the predicate has been encoded as bits. The strongest principle considered here decides exactly this infinite yes-or-no question.
[definition: Limited Principle of Omniscience]
The limited principle of omniscience, abbreviated $\mathrm{LPO}$, states that for every binary sequence $a:\mathbb N \to \{0,1\}$,
\begin{align*}
\bigl(\exists n\,a(n)=1\bigr) \vee \bigl(\forall n\,a(n)=0\bigr).
\end{align*}
[/definition]
LPO gives a binary decision for an infinite search problem. It does not merely say that each bit is decidable; it says that the whole sequence can be classified as either containing a $1$ or containing none. The constructive strength of this classification is best seen by unpacking what the positive disjunct contains.
Unpacking the principle, the positive branch of LPO gives existence of an index $n$ with $a(n)=1$, while the negative branch gives a proof that every entry is $0$. Thus the content is a global decision about the whole binary sequence, not a pointwise decision about individual entries.
This shows that LPO is not just a harmless restatement of pointwise decidability. The crucial extra hypothesis is the global disjunction over an infinite sequence: without it, finite inspection can always be compatible both with a later $1$ and with the sequence being identically $0$. Its limitation is also important: LPO decides existence for binary searches, but it does not by itself provide arbitrary classical reasoning for all propositions unless those propositions have first been encoded into such a search. This is why weaker principles are introduced next: they keep part of the search behaviour while avoiding the full all-or-nothing decision supplied by LPO.
The full force of LPO is often too strong for constructive analysis, so weaker principles record less decisive forms of search. The first weakening keeps a global disjunction only under the promise that the sequence has at most one nonzero entry. For instance, if a computation may eventually place a single marker somewhere in a binary sequence but cannot place two, deciding exactly whether a marker exists is stronger than many constructive arguments need. It can be enough to know that all even positions are empty or all odd positions are empty, because that gives a usable separation of alternatives without locating the marker. This motivates a principle that decides an empty parity class rather than finding a witness.
[definition: Lesser Limited Principle of Omniscience]
The lesser limited principle of omniscience, abbreviated $\mathrm{LLPO}$, states that for every binary sequence $a:\mathbb N \to \{0,1\}$ with at most one index $n$ satisfying $a(n)=1$,
\begin{align*}
\bigl(\forall k\,a(2k)=0\bigr) \vee \bigl(\forall k\,a(2k+1)=0\bigr).
\end{align*}
[/definition]
LLPO does not find a $1$. It only decides that one parity class is empty, under the promise that there is at most one $1$ in the whole sequence. The next weakening keeps arbitrary binary sequences but lowers the conclusion to a double-negated alternative.
[definition: Weak Limited Principle of Omniscience]
The weak limited principle of omniscience, abbreviated $\mathrm{WLPO}$, states that for every binary sequence $a:\mathbb N \to \{0,1\}$,
\begin{align*}
\bigl(\forall n\,a(n)=0\bigr) \vee \neg\bigl(\forall n\,a(n)=0\bigr).
\end{align*}
[/definition]
WLPO decides the all-zero property for a binary sequence without necessarily producing a witness to nonzeroness. Equivalently, since $\neg\forall n\,a(n)=0$ is the same as $\neg\neg\exists n\,a(n)=1$ for binary sequences, WLPO may be written as
\begin{align*}
\bigl(\forall n\,a(n)=0\bigr) \vee \neg\neg\exists n\,a(n)=1.
\end{align*}
It gives less information than LPO because the second disjunct is only a double negation of the existence of a $1$, not a natural number $n$ with $a(n)=1$. The next principle focuses on semidecidable searches rather than arbitrary binary alternatives.
[definition: Markov's Principle]
Markov's principle states that for every predicate $P:\mathbb N \to \mathrm{Prop}$ satisfying $\forall n\,\bigl(P(n)\vee\neg P(n)\bigr)$,
\begin{align*}
\neg\neg\exists n\,P(n) \to \exists n\,P(n).
\end{align*}
[/definition]
Markov's principle captures a Russian constructive attitude toward search: if a terminating search cannot be impossible, then the search is accepted as yielding a witness. Bishop-style constructive mathematics usually does not include this as a general axiom.
[example: Binary Sequence with No Known First One]
Let $a:\mathbb N\to\{0,1\}$ be defined from an open computation by
\begin{align*}
a(n)=1 \quad\text{exactly when the computation halts at step } n,
\end{align*}
and by $a(n)=0$ otherwise. For a fixed $n$, run the computation for exactly $n$ steps. If it halts at step $n$, then the defining clause gives $a(n)=1$, witnessed by reflexivity. If it does not halt at step $n$, then the defining clause gives $a(n)=0$; an assumed proof of $a(n)=1$ would identify $0$ with $1$, contradicting the disjointness of the two binary values. Thus each individual proposition $a(n)=1$ is decidable.
The global statement
\begin{align*}
\exists n\,a(n)=1
\end{align*}
amounts to producing a particular step number $n$ together with evidence that the computation halts at that step. The global statement
\begin{align*}
\forall n\,a(n)=0
\end{align*}
amounts to proving that no finite step is a halting step. Finite simulation supplies neither kind of global evidence by itself: after checking $0,1,\dots,N$, the computation may still halt at step $N+1$, and the checked initial segment does not contain a witness either. LPO would decide between these two alternatives for the binary sequence $a$, while Markov's principle would only convert $\neg\neg\exists n\,a(n)=1$ into $\exists n\,a(n)=1$ after that double-negated existence had already been proved.
[/example]
## Sigma-One Search and Markov's Principle
How much classical reasoning is hidden in the move from impossibility of failure to success of a search? Sigma-one predicates provide the exact arithmetic setting: they say that some natural-number witness exists for a decidable relation.
[definition: Sigma-One Predicate]
A predicate $A$ is sigma-one if it has the form
\begin{align*}
A \equiv \exists n\,R(n),
\end{align*}
where $R(n)$ is a decidable predicate on $\mathbb N$.
[/definition]
Sigma-one statements are semidecidable: if a witness exists, exhaustive search can eventually find it. The constructive gap is that knowing it is impossible for no witness to exist is not the same as having a witness. The theorem below records that Markov's principle is precisely the extra rule that closes this gap for sigma-one statements.
[quotetheorem:7513]
[citeproof:7513]
This result is often the practical form of Markov's principle. The sigma-one hypothesis is essential because the proof needs a decidable relation $R(n)$ whose witnesses can be searched for one by one. A proposition such as $\forall n\,a(n)=0$ for a binary sequence $a:\mathbb N \to \{0,1\}$ is not sigma-one in this sense; double negating it does not provide a finite witness that a search could eventually discover. Thus Markov's principle does not make every proposition stable, but it makes existential arithmetic searches stable under double negation, which is exactly the form needed in recursive mathematics and computability-theoretic arguments. This is also one of the entry points to constructive reverse mathematics: different omniscience and search principles can be compared by asking which theorems of analysis or arithmetic become equivalent to them over a constructive base theory.
[example: Search for a Zero of a Computable Integer Sequence]
Let $f:\mathbb N \to \mathbb Z$ be computable, and define $R(n)$ to be the predicate $f(n)=0$. For each fixed $n$, compute the integer $f(n)$. Equality of integers is decidable, so this computation returns either evidence that $f(n)=0$ or evidence that $f(n)\neq 0$; hence
\begin{align*}
\forall n\,\bigl(R(n)\vee\neg R(n)\bigr).
\end{align*}
With this notation, the statement that $f$ has a zero is exactly
\begin{align*}
\exists n\,f(n)=0.
\end{align*}
Since $R(n)$ means $f(n)=0$, this is the same statement as
\begin{align*}
\exists n\,R(n).
\end{align*}
Thus $\exists n\,f(n)=0$ is sigma-one: it has the form $\exists n\,R(n)$ with $R$ decidable.
Now suppose we have
\begin{align*}
\neg\neg\exists n\,f(n)=0.
\end{align*}
Using the identification of $R(n)$ with $f(n)=0$, this is
\begin{align*}
\neg\neg\exists n\,R(n).
\end{align*}
Markov's principle applies to the decidable predicate $R$, so it gives
\begin{align*}
\exists n\,R(n).
\end{align*}
Unfolding $R(n)$ once more yields
\begin{align*}
\exists n\,f(n)=0.
\end{align*}
Thus, under Markov's principle, double-negated evidence that $f$ cannot be everywhere nonzero is enough to obtain an actual natural number $n$ with $f(n)=0$, although the argument need not provide a prior bound on where that $n$ occurs.
[/example]
## Decision, Semidecision, and Search
What is the difference between being able to recognise success and being able to decide the whole problem? Constructive mathematics keeps these notions separate, while classical logic often collapses them through excluded middle or choice principles.
[definition: Decision Procedure]
A decision procedure for a predicate $P(n)$ is an algorithm that, given $n\in\mathbb N$, terminates with either evidence for $P(n)$ or evidence for $\neg P(n)$.
[/definition]
A decision procedure is pointwise. It settles each input, but it need not settle an infinite existential question without a further bound or principle.
[definition: Semidecision Procedure]
A semidecision procedure for a predicate $Q$ is an algorithm that terminates with evidence when $Q$ holds, with no required termination behaviour when $Q$ fails.
[/definition]
Semidecision matches existential search. For $Q\equiv\exists n\,P(n)$ with $P$ decidable, the semidecision procedure checks $P(0),P(1),P(2),\dots$ and stops at the first witness. To compare semidecision with witness production under hypotheses, we need a separate definition of search.
[definition: Search Procedure]
A search procedure for $\exists n\,P(n)$ is an algorithm that returns a natural number $n$ together with evidence for $P(n)$ whenever such a witness is guaranteed by its hypotheses.
[/definition]
The distinction between these three ideas is central to the comparison of constructive traditions. Bishop-style mathematics accepts explicit searches under explicit hypotheses; recursive mathematics emphasises algorithms and semidecision; Russian-style constructivism often accepts Markov's principle for searches over decidable predicates.
[example: Deciding Inputs Without Deciding Existence]
Let $a:\mathbb N\to\{0,1\}$ be computable. For a fixed input $n$, compute $a(n)$. Since $a(n)$ is a binary value, the computation returns either $0$ or $1$. If it returns $1$, then $a(n)=1$ is witnessed by reflexivity. If it returns $0$, then an assumed proof of $a(n)=1$ would give $0=1$, contradicting the disjointness of the two binary values; hence $\neg(a(n)=1)$. Thus each predicate instance $a(n)=1$ is decidable.
The existential statement
\begin{align*}
\exists n\,a(n)=1
\end{align*}
is semidecidable by the following search: compute $a(0)$, then $a(1)$, then $a(2)$, and continue. If at stage $k$ the computation returns $a(k)=1$, the search stops and returns the witness $k$ together with the equality proof $a(k)=1$. If no such $k$ has appeared after checking $0,1,\dots,N$, the available information is only
\begin{align*}
a(0)=0,\ a(1)=0,\ \dots,\ a(N)=0.
\end{align*}
This finite evidence is compatible with
\begin{align*}
a(N+1)=1
\end{align*}
and also compatible with
\begin{align*}
\forall n\,a(n)=0.
\end{align*}
Therefore the scanning procedure need not terminate in the all-zero case. A decision procedure for $\exists n\,a(n)=1$ would have to return either a witness for $\exists n\,a(n)=1$ or evidence for $\neg\exists n\,a(n)=1$; for a binary sequence, the latter is exactly the all-zero information $\forall n\,a(n)=0$. This is the additional global information asserted by LPO.
[/example]
This example separates finite input decisions from global decisions over an infinite domain. The same separation reappears in analysis, where real numbers are given by approximations rather than by finite normal forms. The failure of general trichotomy for real numbers is the analytic analogue of the failure of unrestricted binary sequence search.
[example: Failure of Trichotomy for Arbitrary Real Numbers]
Represent $x$ by the constant rational [Cauchy sequence](/page/Cauchy%20Sequence) $x_N=0$. To see what trichotomy would decide, let $a:\mathbb N\to\{0,1\}$ be a computable binary sequence and form rational approximations
\begin{align*}
y_N=\sum_{k=0}^{N}\frac{a(k)}{2^{k+1}}.
\end{align*}
Each summand satisfies $0\leq a(k)/2^{k+1}$, so $0\leq y_N$ for every $N$. Also, if $M>N$, then
\begin{align*}
y_M-y_N=\sum_{k=N+1}^{M}\frac{a(k)}{2^{k+1}}.
\end{align*}
Since $a(k)\leq 1$, we have
\begin{align*}
0\leq y_M-y_N\leq \sum_{k=N+1}^{M}\frac{1}{2^{k+1}}.
\end{align*}
The finite geometric sum is bounded by its infinite tail:
\begin{align*}
\sum_{k=N+1}^{M}\frac{1}{2^{k+1}}\leq \sum_{k=N+1}^{\infty}\frac{1}{2^{k+1}}=\frac{1}{2^{N+1}}.
\end{align*}
Thus the sequence $(y_N)$ is Cauchy with an explicit tail bound.
If trichotomy were available for arbitrary real numbers, applying it to $x=0$ and $y=\lim_N y_N$ would return one of
\begin{align*}
0<y.
\end{align*}
\begin{align*}
0=y.
\end{align*}
\begin{align*}
y<0.
\end{align*}
The third alternative is impossible because every approximation $y_N$ is nonnegative and the limit of a nonnegative Cauchy sequence cannot be strictly negative. The equality alternative says that no positive bit has appeared anywhere: if some $a(j)=1$, then
\begin{align*}
y_N\geq \frac{a(j)}{2^{j+1}}=\frac{1}{2^{j+1}}
\end{align*}
for every $N\geq j$, contradicting $y=0$. The strict inequality alternative $0<y$ gives positive separation from $0$, and such separation must come from some finite partial sum, hence from some index $j$ with $a(j)=1$.
So trichotomy would turn an infinite stream of approximations into a global decision: either all bits of $a$ are $0$, or a particular positive bit eventually appears. Constructively, finite approximations to real numbers do not contain that global information; every inspected initial segment may still be compatible both with equality and with later strict separation.
[/example]
The contrast with natural numbers is now sharp. Arithmetic data have finite canonical forms, while binary sequences and real numbers may carry information revealed only through ongoing approximation. The closing summary records this boundary before the course moves on to constructive analysis.
[remark: Arithmetic as the Boundary Case]
Natural-number equality is decidable because natural numbers are finite data built from $0$ and successor. Infinite binary sequences and real numbers introduce open-ended information, so pointwise decidability no longer supplies global decisions. Omniscience principles mark the exact points where extra nonconstructive power is being added.
[/remark]
Chapter 5 showed that arithmetic is the first place where constructive limitations become concrete, especially for infinite sequences and real numbers. The next chapter turns from the logic of arithmetic to the interpretation of proofs as programs, making the computational reading of constructive reasoning explicit.
# 6. Realizability and Proofs as Programs
Realizability makes precise the slogan that a constructive proof carries a program. Chapters 1 and 2 treated intuitionistic proofs syntactically through BHK evidence, natural deduction, and sequent calculus, while Chapter 5 introduced the arithmetic setting in which realizers will live. This chapter adds a semantic and computational layer: a proof of a formula is interpreted by a natural number coding an effective procedure, and the clauses for the logical constants explain what information the procedure must produce.
The main theme is that realizability validates intuitionistic arithmetic while exposing extra computational principles that are not theorems of ordinary intuitionistic arithmetic. We first define Kleene realizability for arithmetic formulas, then prove soundness and the [numerical existence property](/theorems/7515), and finally compare this with modified realizability, where extracted programs are organised through witnesses, majorants, and higher-type functionals.
## Realizers for Arithmetic Formulas
What does it mean for a number to be evidence for a formula about natural numbers? The BHK interpretation says that evidence for $A \vee B$ chooses a side, evidence for $A \to B$ transforms evidence for $A$ into evidence for $B$, and evidence for $\exists n\,A(n)$ gives both a number and evidence for the instance. Kleene realizability turns this informal reading into an arithmetical relation $e \Vdash A$, where $e \in \mathbb N$ codes a partial recursive computation.
[definition: Kleene Realizability Relation]
Fix a primitive recursive pairing function $\langle \cdot,\cdot\rangle: \mathbb N\times\mathbb N\to\mathbb N$ with primitive recursive projections $\pi_0,\pi_1:\mathbb N\to\mathbb N$, written $(e)_0=\pi_0(e)$ and $(e)_1=\pi_1(e)$. Fix an effective enumeration $(\varphi_e)_{e\in\mathbb N}$ of partial recursive functions $\varphi_e: \mathbb N\rightharpoonup \mathbb N$. For closed arithmetic formulas, Kleene realizability is a relation $\Vdash\subseteq \mathbb N\times \mathrm{Sent}_{\mathrm{Ar}}$, written $e\Vdash A$, defined recursively on the form of $A$ as follows.
For an atomic sentence $P$, $e \Vdash P$ iff $P$ is true in the standard natural numbers.
For conjunction,
\begin{align*}
e \Vdash A \wedge B
\end{align*}
iff $(e)_0 \Vdash A$ and $(e)_1 \Vdash B$, where $(e)_0,(e)_1$ are the two components decoded from $e$.
For disjunction,
\begin{align*}
e \Vdash A \vee B
\end{align*}
iff either $(e)_0=0$ and $(e)_1 \Vdash A$, or $(e)_0=1$ and $(e)_1 \Vdash B$.
For implication,
\begin{align*}
e \Vdash A \to B
\end{align*}
iff for every $a \in \mathbb N$, if $a \Vdash A$, then $\varphi_e(a)$ is defined and $\varphi_e(a) \Vdash B$.
For falsity, no number realizes $\bot$.
For universal quantification,
\begin{align*}
e \Vdash \forall n\,A(n)
\end{align*}
iff for every $n \in \mathbb N$, $\varphi_e(n)$ is defined and $\varphi_e(n) \Vdash A(n)$.
For existential quantification,
\begin{align*}
e \Vdash \exists n\,A(n)
\end{align*}
iff $(e)_0 \in \mathbb N$ and $(e)_1 \Vdash A((e)_0)$.
[/definition]
The atomic clause is deliberately extensional: atomic truths do not require computational content beyond their truth value. The computational content enters at the logical connectives, where pairing, tagging, and indices for partial recursive functions record how evidence is built and used.
[example: Tagged Output for Disjunction]
Suppose $A$ and $B$ are arithmetic formulas, with $a \Vdash A$ and $b \Vdash B$. Let $e=\langle 0,a\rangle$. Then $(e)_0=0$ and $(e)_1=a$ by the defining property of the pairing projections. By the disjunction clause in Kleene realizability, these two facts together with $a\Vdash A$ give $e\Vdash A\vee B$.
For the reversed disjunction, let $e'=\langle 0,b\rangle$. Then $(e')_0=0$ and $(e')_1=b$, so the same disjunction clause now reads the tag $0$ as choosing the left formula $B$ in $B\vee A$. Since $b\Vdash B$, it follows that $e'\Vdash B\vee A$. The number $0$ does not mean “the formula named $A$”; it means “the left-hand side of the displayed disjunction,” so the realizer records both evidence and the branch in which that evidence is to be used.
[/example]
The tagged disjunction example shows how a connective imposes a data format on realizers. The next connective to isolate is negation, because constructive refutation is not a truth-value complement but a method for turning alleged evidence into contradiction.
[definition: Realizability for Negation]
For an arithmetic formula $A$ and $e \in \mathbb N$,
\begin{align*}
e \Vdash \neg A
\end{align*}
iff $e \Vdash A \to \bot$.
[/definition]
This makes refutation computational: to realize $\neg A$ is to give a uniform procedure showing that every alleged construction of $A$ collapses into impossibility. It also explains why double negation need not return a witness for $A$; a procedure refuting all refutations of $A$ need not contain direct evidence for $A$.
The universal and existential clauses give the simplest examples of proof as program. A proof of a universal statement is interpreted as a function sending an input $n$ to evidence for the instance, while a proof of an existential statement is interpreted as an output pair consisting of a witness and evidence that the witness works.
[example: Realizer for Successor Existence]
Consider $\forall n\,\exists m\,(m>n)$. Fix the primitive recursive function
\begin{align*}
f(n)=\langle n+1,0\rangle.
\end{align*}
Let $e$ be an index for $f$, so for every $n\in\mathbb N$,
\begin{align*}
\varphi_e(n)=\langle n+1,0\rangle.
\end{align*}
By the defining equations for the pairing projections,
\begin{align*}
(\varphi_e(n))_0=n+1
\end{align*}
and
\begin{align*}
(\varphi_e(n))_1=0.
\end{align*}
The atomic sentence $n+1>n$ is true in the standard natural numbers. Hence, by the atomic clause of Kleene realizability, $0\Vdash n+1>n$. Using the existential clause with witness $(\varphi_e(n))_0=n+1$ and realizing component $(\varphi_e(n))_1=0$, we get
\begin{align*}
\varphi_e(n)\Vdash \exists m\,(m>n).
\end{align*}
Since this holds for every $n$ and $\varphi_e(n)$ is defined for every $n$, the universal clause gives
\begin{align*}
e\Vdash \forall n\,\exists m\,(m>n).
\end{align*}
The extracted program is the successor function on witnesses: on input $n$, it outputs $m=n+1$ together with evidence for the true atomic inequality $n+1>n$.
[/example]
This example is intentionally small, but it contains the pattern used throughout arithmetic: the proof supplies a uniform recursive method for producing witnesses. More involved arguments require the realizer to carry intermediate data, not only the final number.
## Soundness and Numerical Existence
Why should every proof in intuitionistic arithmetic have a realizer? The answer is an induction on derivations: each logical rule corresponds to a computable operation on realizers. Introduction rules build data, elimination rules consume data, and induction is handled by primitive recursion.
[quotetheorem:7514]
[citeproof:7514]
The closed-sentence hypothesis avoids an ambiguity: for open formulas realizability must be defined relative to a numerical assignment, and the theorem then has a uniformity clause over all assignments. Without that clause the statement is false as written, because a formula such as $x=0 \vee \neg(x=0)$ has different realizers under different assignments to $x$. The restriction to $\mathrm{HA}$ is also essential. If unrestricted excluded middle were added, then the sentence $P\vee \neg P$ for an undecided arithmetic sentence $P$ would demand a realizability tag choosing either a realizer of $P$ or a realizer of $\neg P$; for arbitrary $P$, no uniform recursive operation can supply such a decision. The boundary is therefore exact: soundness sends intuitionistic derivations to programs, but it neither realizes arbitrary classical principles nor turns every true arithmetic sentence into an $\mathrm{HA}$ derivation. To see what kind of data such a code can contain, it helps to look at a theorem whose constructive proof computes several related numbers rather than a single witness.
[example: Extracting Euclidean Division Data]
A constructive proof of Euclidean division for positive divisors has the form
\begin{align*}
\forall a\,\forall b\,(b>0 \to \exists q\,\exists r\,(a=qb+r \wedge r<b)).
\end{align*}
Fix $a,b\in\mathbb N$ and suppose $c\Vdash b>0$. Since $b>0$ is atomic, this means exactly that $b>0$ is true in the standard natural numbers.
The repeated-subtraction procedure starts with $q_0=0$ and $r_0=a$. While $r_i\ge b$, set
\begin{align*}
q_{i+1}=q_i+1
\end{align*}
and
\begin{align*}
r_{i+1}=r_i-b.
\end{align*}
The invariant is $a=q_i b+r_i$. At $i=0$,
\begin{align*}
q_0b+r_0=0\cdot b+a=a.
\end{align*}
If $a=q_i b+r_i$ and $r_i\ge b$, then
\begin{align*}
q_{i+1}b+r_{i+1}=(q_i+1)b+(r_i-b)=q_i b+b+r_i-b=q_i b+r_i=a.
\end{align*}
Because $b>0$, each subtraction decreases the remainder by at least $1$, so the loop stops after at most $a$ steps. Let $t$ be the stopping stage, and put $q=q_t$ and $r=r_t$. The invariant gives
\begin{align*}
a=qb+r.
\end{align*}
The stopping condition gives
\begin{align*}
r<b.
\end{align*}
The realizer outputs the nested pair
\begin{align*}
\langle q,\langle r,\langle 0,0\rangle\rangle\rangle.
\end{align*}
Its first projection is $q$. Its second projection has first projection $r$ and second projection $\langle 0,0\rangle$. Since $a=qb+r$ and $r<b$ are true atomic sentences, the atomic clause of Kleene realizability gives $0\Vdash a=qb+r$ and $0\Vdash r<b$. Hence $\langle 0,0\rangle$ realizes $a=qb+r\wedge r<b$, the pair $\langle r,\langle 0,0\rangle\rangle$ realizes $\exists r\,(a=qb+r\wedge r<b)$, and the whole output realizes $\exists q\,\exists r\,(a=qb+r\wedge r<b)$. Thus the program contains both the quotient-remainder witnesses and the exact verification structure demanded by the formula.
[/example]
Euclidean division illustrates that realizers carry witnesses in a machine-readable position. This motivates the general numerical existence principle: whenever the conclusion is an existential statement over natural numbers, the realizer's first projection should reveal an actual numeral. The conclusion below is stronger than the external extraction of a number: it asserts that a proof of a particular instance exists inside $\mathrm{HA}$, so the final step is proof-theoretic rather than a direct consequence of realizability soundness alone.
[quotetheorem:7515]
[proofunderconstruction:7515]
Each hypothesis marks a real boundary. The condition on free variables is needed: from $\exists n\,(n=x)$ a witness can be chosen only after an external value for $x$ is fixed, so there is no single numeral $k$ for all assignments. The use of $\mathrm{HA}$ and normalisation is needed as well. If $P$ is an arithmetic sentence not decided by the base theory and $A(n)$ abbreviates $(n=0\wedge P)\vee(n=1\wedge\neg P)$, then classical logic proves $\exists n\,A(n)$ by excluded middle, while a proof of either displayed instance would decide $P$. The conclusion is also deliberately syntactic: it gives an $\mathrm{HA}$ proof of one instance for a closed existential theorem, not a uniform computable choice function for formulas with parameters. This boundary motivates a parallel extraction theorem for disjunctions: when the conclusion is $A\vee B$, the outer tag of the realizer should identify which branch has been constructively obtained.
For a tagged Kleene realizer of $A\vee B$, the first projection is the tag and the second projection is the payload. Tag $0$ carries a realizer of $A$, while tag $1$ carries a realizer of $B$. The extraction is therefore definitional: inspect the tag, then read the payload as evidence for the corresponding branch.
The intuitionistic proof hypothesis is what makes the tag meaningful. If this hypothesis is replaced by classical derivability, excluded middle proves $P\vee\neg P$ for every arithmetic sentence $P$, but a realizer would still have to choose the left tag with evidence for $P$ or the right tag with a procedure refuting every realizer of $P$; an undecided $P$ gives no computable choice of that kind. This example invalidates the logical hypothesis, not the closedness condition.
The closed-formula boundary is separate. For $x=0\vee\neg(x=0)$, the correct tag depends on the value assigned to $x$, so an open formula cannot be assigned a single branch without also specifying the numerical assignment. This example keeps intuitionistic reasoning but invalidates the closed-formula hypothesis. The theorem also has a precise limitation: it extracts a branch from an already supplied realizer, but it does not by itself produce an $\mathrm{HA}$ derivation of that branch, decide whether $A$ or $B$ is true in the standard model, or give a stable branch choice for formulas with parameters. The formal disjunction property needs a separate normalisation argument turning a proof of the disjunction into a proof ending with the corresponding introduction rule. This is a recurring pattern: realizability makes elimination rules computationally explicit by forcing proof objects to carry enough structure for later use, while proof-theoretic conclusions about derivability require an additional syntactic theorem.
## Church's Thesis and the Boundary Between Realizable and Derivable Principles
Realizability is not just a model of theorems of Heyting arithmetic; it also validates some principles that Heyting arithmetic itself does not prove. This distinction matters because it separates computational soundness from derivability inside the base theory.
[definition: Church's Thesis Schema]
Let $T\subseteq \mathbb N^3$ be Kleene's primitive recursive computation predicate, with $T(e,n,s)$ expressing that the program with index $e$ halts on input $n$ with computation code $s$, and let $U:\mathbb N\to\mathbb N$ be the primitive recursive output function reading the value from such a computation code. The Church's Thesis schema for number-theoretic functions is the family of formulas
\begin{align*}
\forall n\,\exists m\,A(n,m) \to \exists e\,\forall n\,\exists s\,(T(e,n,s) \wedge A(n,U(s))),
\end{align*}
where $A(n,m)$ is an arithmetic formula.
[/definition]
Read informally, the schema turns pointwise constructive existence into a single index computing suitable witnesses. Since a realizer for the antecedent is already a uniform index taking $n$ to a code of a witness and its verification, the next question is whether the schema itself has a realizer.
[quotetheorem:7516]
[proofunderconstruction:7516]
The antecedent is a constructive totality statement, so its realizer already contains a single uniform procedure for producing witnesses; without that universal realizer, separate witnesses for separate $n$ would not determine one index $e$. A concrete obstruction is a choice sequence that gives a witness for each input by external choice but no recursive rule for the choices: such data may satisfy pointwise existence externally, but it is not a realizer for $\forall n\,\exists m\,A(n,m)$. The use of arithmetized computation is also necessary. Replacing $T(e,n,s)$ by informal notation $\varphi_e(n)=m$ would no longer be a formula of arithmetic, and replacing it by an arbitrary relation not tied to computation would make the extracted index meaningless. The theorem does not say that every externally total function is recursive; it says that realizability validates the internal schema because realizers for universal-existential statements are uniform programs. Realizability of Church's Thesis might suggest that the schema should be an ordinary theorem of intuitionistic arithmetic. The next result marks the boundary: semantic validity in Kleene's realizability model is stronger than derivability in the base deductive system.
[quotetheorem:7517]
[proofunderconstruction:7517]
The theorem separates two kinds of validity. Kleene realizability validates the schema because the realizability model builds uniform recursive indices into the meaning of universal-existential evidence, whereas derivability in $\mathrm{HA}$ would require the schema to survive every intuitionistic model of arithmetic. The next explanation identifies the semantic obstruction that prevents that stronger conclusion.
[explanation: Metatheoretic Status of Non-Derivability]
A Kripke or sheaf model can validate the axioms and rules of $\mathrm{HA}$ while making the full Church's Thesis schema fail. The key semantic point is that such models may contain internally total number-theoretic functions that are not represented by a single recursive index. If $\mathrm{HA}$ proved the full schema, soundness for all intuitionistic models would force the schema to hold in those countermodels, contradicting their construction.
[/explanation]
The word "ordinary" matters: the result concerns bare $\mathrm{HA}$, not extensions of $\mathrm{HA}$ obtained by adding Church's Thesis as an axiom schema. The full schema is also stronger than any single verified recursive program, since a particular recursive function can be represented in $\mathrm{HA}$ while the universal assertion that all constructive total relations are recursively indexed is not derivable. A concrete Kripke obstruction is a model whose nodes successively reveal longer finite pieces of a total function $f:\mathbb N\to\mathbb N$ chosen to avoid every recursive index: at each finite stage the axioms of $\mathrm{HA}$ are preserved, but no node can force a single index computing all future values of $f$. Sheaf models give the same kind of failure by allowing locally defined continuous choice data with no global recursive index. This shows exactly why the hypotheses matter: realizability validates Church's Thesis in Kleene's computational model, while derivability in $\mathrm{HA}$ would require validity in all intuitionistic models. A statement may be realizable because it matches the computational reading of one model while still being absent from the deductive strength of the base formal system.
## Modified Realizability and Extracted Algorithms
Kleene realizability uses natural numbers as codes for partial recursive functions. For proof mining and higher-type constructive mathematics, this single-code presentation quickly becomes hard to use: a proof of a statement such as $\forall k\,\exists N\,\forall m,n\ge N\,P(k,N,m,n)$ may contain a modulus, verification data, and a transformation of hypotheses, but the Kleene code hides all of these behind projections from one natural number and applications of one partial recursive index. When such codes are composed through implication or nested quantifiers, the reader must repeatedly decode pairs, recover which component is a witness, and distinguish the computational functional from any quantitative bound it happens to compute. Modified realizability separates those roles directly: the witness produced by a proof, the functional transforming assumptions into conclusions, and the majorants controlling the computation are represented at their natural finite types.
[definition: Modified Realizability]
In intuitionistic arithmetic in finite types, modified realizability consists of a recursive type assignment
\begin{align*}
|\cdot|:\mathrm{Form}_{\mathrm{HA}^{\omega}}\to \mathrm{FT}
\end{align*}
from formulas of finite-type arithmetic to finite types, together with relations $r\Vdash_{\mathrm{mr}}A$ for $r:|A|$, uniformly in the free numerical variables of $A$, defined as follows.
For computationally irrelevant atomic formulas $P$, $|P|$ is the unit type and $r\Vdash_{\mathrm{mr}}P$ iff $P$ holds.
For conjunction, $|A\wedge B|=|A|\times |B|$, and $(r_0,r_1)\Vdash_{\mathrm{mr}}A\wedge B$ iff $r_0\Vdash_{\mathrm{mr}}A$ and $r_1\Vdash_{\mathrm{mr}}B$.
For disjunction, $|A\vee B|=\mathbb N\times |A|\times |B|$. A triple $(i,r_0,r_1)$ realizes $A\vee B$ iff either $i=0$ and $r_0\Vdash_{\mathrm{mr}}A$, or $i=1$ and $r_1\Vdash_{\mathrm{mr}}B$.
For implication, $|A\to B|=|A|\to |B|$, and $f\Vdash_{\mathrm{mr}}A\to B$ iff for every $r:|A|$, if $r\Vdash_{\mathrm{mr}}A$, then $f(r)\Vdash_{\mathrm{mr}}B$.
For universal quantification over natural numbers, $|\forall n\,A(n)|$ is the dependent function type $\prod_{n\in\mathbb N}|A(n)|$, and $f\Vdash_{\mathrm{mr}}\forall n\,A(n)$ iff for every $n\in\mathbb N$, $f(n)\Vdash_{\mathrm{mr}}A(n)$.
For existential quantification over natural numbers, $|\exists n\,A(n)|$ is the dependent pair type $\sum_{n\in\mathbb N}|A(n)|$, and $(n,r)\Vdash_{\mathrm{mr}}\exists n\,A(n)$ iff $r\Vdash_{\mathrm{mr}}A(n)$.
[/definition]
In the disjunction clause, the component not selected by the tag is present only to keep all realizers inside the fixed product type $\mathbb N\times |A|\times |B|$; it carries no evidence for the selected branch. The change from numerical codes is not merely cosmetic. Instead of coding everything by natural numbers, modified realizability works closer to the typed structure of the proof, which makes extracted algorithms easier to read and easier to compose with later arguments.
[example: Modified Realizer for Bounded Search]
Fix $N\in\mathbb N$, and read $\exists m\le N\,A(m)$ as $\exists m\,(m\le N\wedge A(m))$. Suppose the decidable search tests $A(0),A(1),\dots,A(N)$ in order and stops at the first $m$ such that $A(m)$ holds. Then for each $i<m$ the recorded decision is the negative branch for $A(i)$, and at stage $m$ the recorded decision is the positive branch with some realizing data $r_m\Vdash_{\mathrm{mr}}A(m)$.
Since $m$ occurs in the list $0,1,\dots,N$, we have $m\le N$. The formula $m\le N$ is atomic, so its modified realizer is the unit value $*$. By the conjunction clause for modified realizability, the pair $(*,r_m)$ realizes
\begin{align*}
m\le N\wedge A(m).
\end{align*}
By the existential clause, the dependent pair
\begin{align*}
(m,(*,r_m))
\end{align*}
realizes
\begin{align*}
\exists m\,(m\le N\wedge A(m)).
\end{align*}
Thus the modified realizer keeps the witness $m$ and the verification data $(*,r_m)$ in their typed positions, while the finite search procedure parameterized by $N$ records the branch history that explains why the computation stopped there.
[/example]
Bounded search shows that extracted programs often come with useful quantitative information. To use that information systematically, we introduce majorants: objects that control the size of witnesses or the running bounds produced by a realizer.
[definition: Majorant]
A majorant for a function $f: \mathbb N \to \mathbb N$ is a function $g: \mathbb N \to \mathbb N$ such that
\begin{align*}
f(n) \le g(n)
\end{align*}
for every $n \in \mathbb N$.
[/definition]
After a majorant is extracted, the exact witness-producing algorithm may become less important than the bound it satisfies. Since modified realizability is meant to support such extraction from proofs, it also needs its own soundness theorem.
[quotetheorem:7518]
[citeproof:7518]
The closedness hypothesis again avoids hidden dependence on external variables; for an open implication such as $P(x)\to \exists n\,A(x,n)$ the extracted object must be a term depending on a realizer for $P(x)$ and on the numerical value of $x$, not a closed term. The finite-type restriction is essential because the theorem is stated inside a typed calculus whose primitive recursion and function spaces determine what counts as an extracted functional; allowing arbitrary set-level functionals would move outside the term model used in the induction. Computational irrelevance is another boundary: for a decidable atomic equation the unit realizer records no algorithm beyond truth, so the theorem should not be read as extracting computational content from every proved atomic fact. The conclusion is consequently limited but useful: it gives a typed realizing term for closed derivations, not necessarily an extensionally optimal algorithm or the sharpest bound. Modified realizability therefore supports a finer extraction workflow. First identify the witnesses promised by existential conclusions; then identify the functionals needed to transform assumptions into those witnesses; finally extract majorants when the proof contains quantitative information.
[example: Witnesses and Majorants in a Convergence Argument]
Assume a constructive proof shows that a sequence $(x_n)_{n\in\mathbb N}$ in a [metric space](/page/Metric%20Space) is Cauchy:
\begin{align*}
\forall k\,\exists N\,\forall m,n\ge N\,d(x_m,x_n)<2^{-k}.
\end{align*}
Under modified realizability, the outer universal quantifier is realized by a function sending each precision input $k$ to realizing data for the existential statement
\begin{align*}
\exists N\,\forall m,n\ge N\,d(x_m,x_n)<2^{-k}.
\end{align*}
Thus, for each $k$, the extracted realizer has the form
\begin{align*}
(M(k),r_k),
\end{align*}
where $M(k)$ is the displayed witness for $N$ and $r_k$ realizes the remaining assertion
\begin{align*}
\forall m,n\ge M(k)\,d(x_m,x_n)<2^{-k}.
\end{align*}
Unpacking that universal statement, whenever $m\ge M(k)$ and $n\ge M(k)$, the verification component $r_k$ establishes
\begin{align*}
d(x_m,x_n)<2^{-k}.
\end{align*}
Now let $G$ be a majorant for $M$. By the definition of majorant, for every $k\in\mathbb N$,
\begin{align*}
M(k)\le G(k).
\end{align*}
The realizer therefore gives the exact modulus value $M(k)$, while the majorant gives a possibly coarser numerical bound on that value. The proof has extracted more than the statement that some $N$ exists: it supplies a function $M$ choosing such an $N$ for each $k$, verification data showing that the choice works, and a quantitative control function $G$ satisfying $M(k)\le G(k)$ for every precision $k$.
[/example]
This completes the passage from proof meaning to program extraction. Kleene realizability gives a concrete numerical semantics for arithmetic proofs; soundness turns derivations into realizers; the numerical existence property explains witness extraction; and modified realizability refines the output into typed algorithms and quantitative bounds.
The same pattern reappears outside arithmetic proof theory. In type theory, the realizability slogan becomes the Curry-Howard reading of propositions as types and proofs as terms. In computability theory, Church's Thesis marks the difference between externally chosen witnesses and witnesses generated by a uniform algorithm. In [numerical analysis](/page/Numerical%20Analysis) and constructive analysis, extracted moduli and majorants are not cosmetic annotations: they become rates of convergence, search bounds, and certified algorithms. Realizability is therefore both a semantics for intuitionistic logic and a bridge from formal proof to executable mathematical information.
Realizability completed the bridge from proof to algorithm by showing how constructive arguments carry executable information. The next chapter uses that bridge to define the basic objects of constructive analysis, where equality, apartness, functions, and metric structure all have to be given with usable data.
# 7. Constructive Sets, Functions, and Metric Spaces
This chapter turns the logical discipline of Chapters 1-6 into the basic objects of analysis: sets with equality, extensional functions, apartness, located subsets, and metric completion. In constructive mathematics, a set is not only a carrier of elements: it must come with the information needed to compare, separate, approximate, and compute with those elements. The guiding question is how far the ordinary language of sets, functions, real numbers, and metric spaces survives when existence means construction rather than mere consistency.
The chapter begins with the constructive treatment of sets and functions, then introduces apartness and locatedness as replacements for classically automatic distinctions. It ends with metric completion and compactness principles, where the constructive content of approximation becomes visible.
## Sets, Predicates, and Extensional Functions
How should we speak about a collection when equality may not be decidable? Classical set theory often hides this issue because every pair of elements either is equal or is not equal. Constructively, the data attached to equality affects what counts as a function, what counts as a subset, and what it means for two mathematical objects to be the same.
A first convention is that a set can be presented either as a type of objects or as a collection equipped with an equality relation. The equality relation must support substitution in all constructions that use the object.
[definition: Set With Equality]
A set with equality is a collection $X$ together with a relation
\begin{align*}
=_{X}:X\times X\to \operatorname{Prop}
\end{align*}
such that, for all $x,y,z \in X$,
\begin{align*}
x =_{X} x, \qquad x =_{X} y \implies y =_{X} x, \qquad x =_{X} y \land y =_{X} z \implies x =_{X} z.
\end{align*}
[/definition]
This definition fixes the ambient equality relation, but it does not yet say how to describe smaller collections inside $X$. This distinction matters because we need constructive subsets are usually specified by evidence for a condition, and that evidence must be compatible with the equality just chosen.
[definition: Subset Predicate]
Let $X$ be a set with equality. A subset of $X$ is a predicate $P:X\to \operatorname{Prop}$ that is extensional: whenever $x =_{X} y$, a proof of $P(x)$ can be transformed into a proof of $P(y)$.
[/definition]
This definition records a common constructive constraint: membership must respect the equality already chosen on the ambient set. The same compatibility problem appears for maps, so this prepares us to exclude rules that depend on a representative rather than on the element represented.
[definition: Extensional Function]
Let $X$ and $Y$ be sets with equality. An extensional function $f: X \to Y$ is an assignment sending each $x \in X$ to an element $f(x) \in Y$ such that
\begin{align*}
x =_{X} y \implies f(x) =_{Y} f(y)
\end{align*}
for all $x,y \in X$.
[/definition]
Extensionality prevents a construction from depending on hidden representatives. This is essential for quotient-like constructions, real numbers represented by Cauchy sequences, and functions defined on completed spaces.
[example: Rational Representatives]
Represent a rational number by a pair $(a,b)\in \mathbb Z\times \mathbb N_{>0}$ with value $a/b$, and declare
\begin{align*}
(a,b)=(c,d) \quad \text{iff} \quad ad=bc.
\end{align*}
The representatives $(1,2)$ and $(2,4)$ are equal because
\begin{align*}
1\cdot 4=4=2\cdot 2.
\end{align*}
However, the rule $F(a,b)=a+b$ gives
\begin{align*}
F(1,2)=1+2=3
\end{align*}
and
\begin{align*}
F(2,4)=2+4=6.
\end{align*}
Since $3\ne 6$ in $\mathbb Z$, equal rational representatives can be sent to unequal integers, so $F$ is not extensional.
By contrast, the rule sending $(a,b)$ to its rational value $a/b$ is extensional: if $(a,b)=(c,d)$, then $ad=bc$, which is exactly the equality condition identifying the rational values $a/b$ and $c/d$. Thus changing representatives may change representative-dependent data such as $a+b$, but it does not change the rational number represented.
[/example]
The example shows why equality data is not bookkeeping. If the intended object is a rational number, operations must be invariant under changing its representative. This matters because we need to verify that predicate-defined subsets have the expected mapping property and therefore behave like genuine subobjects.
[quotetheorem:7519]
[citeproof:7519]
This theorem is the constructive version of the usual universal property of a subset. The extensionality hypothesis is essential: for rational representatives, the predicate "$a+b$ is even" on a representative $(a,b)$ need not survive replacing $(a,b)$ by an equal fraction, so the proposed subset would not have equality inherited from the rational number it claims to describe. The theorem does not say that membership in the subset is decidable; it says only that whenever membership evidence is supplied, it is transported along equality. This prepares the later treatment of located subsets, where predicates again need extra data before they support metric operations such as distance.
## Apartness and Located Subsets
How can we express that two objects are distinct when negated equality may be too weak to compute with? Constructive analysis often needs a positive notion of separation. Apartness supplies this positive information: it says not merely that equality leads to contradiction, but that we have evidence separating the two objects.
[definition: Apartness Relation]
A tight apartness relation on a set with equality $X$ is a binary relation
\begin{align*}
\#\, : X\times X \to \operatorname{Prop}
\end{align*}
such that, for all $x,y,z \in X$,
\begin{align*}
\neg(x \# x).
\end{align*}
\begin{align*}
x \# y \implies y \# x.
\end{align*}
\begin{align*}
x \# y \implies \neg(x =_{X} y).
\end{align*}
\begin{align*}
x =_{X} y \implies \neg(x \# y).
\end{align*}
\begin{align*}
x \# z \implies x \# y \vee y \# z.
\end{align*}
\begin{align*}
\neg(x \# y) \implies x =_{X} y.
\end{align*}
[/definition]
The fifth axiom is the constructive cotransitivity principle. It makes apartness behave like a positive inequality that can be propagated through intermediate points. The two displayed compatibility clauses state that apartness is tight with respect to the equality relation already fixed on $X$: equality rules out positive separation, and failure of positive separation identifies the two elements as equal in $X$.
[example: Apartness on Rational Numbers]
For $q,r\in\mathbb Q$, define $q\# r$ to mean $|q-r|>0$. This is positive separation: if $q\# r$ and $q=r$, then
\begin{align*}
|q-r|=|q-q|=|0|=0,
\end{align*}
contradicting $|q-r|>0$. It is symmetric because
\begin{align*}
|r-q|=|-(q-r)|=|q-r|.
\end{align*}
Also $q\not\# q$, since
\begin{align*}
|q-q|=|0|=0.
\end{align*}
To see cotransitivity, assume $q\# r$, so $|q-r|>0$, and let $s\in\mathbb Q$. Rational equality is decidable, so either $s=q$ or $s\ne q$. If $s\ne q$, then rational trichotomy gives $s<q$ or $q<s$, and in both cases
\begin{align*}
|q-s|>0,
\end{align*}
so $q\# s$. If $s=q$, then
\begin{align*}
|s-r|=|q-r|>0,
\end{align*}
so $s\# r$. Hence for every $s$ we have $q\# s$ or $s\# r$. The point is that rational apartness contains finite comparison data, not merely the negation of equality.
[/example]
For real numbers, apartness is similar in form but different in content. A proof that $x \# y$ must contain a positive rational lower bound on the distance between $x$ and $y$.
[example: Apartness on Constructive Real Numbers]
Let real numbers be represented by regular Cauchy sequences of rationals, so a representative $x=(x_n)$ satisfies $|x_n-x_m|\le 2^{-n}+2^{-m}$ for all $m,n$. Define $x\# y$ to mean that, for compatible representatives $x=(x_n)$ and $y=(y_n)$, there is an index $n\in\mathbb N$ with
\begin{align*}
|x_n-y_n|>2^{-n+1}.
\end{align*}
This is positive separation because equality of the represented real numbers would force the finite gap at stage $n$ to be no larger than the tail error. Indeed, if $x=y$, then for every rational $\eta>0$ we may choose $m$ so large that $|x_m-y_m|<\eta$ and $2^{-m+1}<\eta$. By the triangle inequality and regularity,
\begin{align*}
|x_n-y_n|\le |x_n-x_m|+|x_m-y_m|+|y_m-y_n|.
\end{align*}
The two regularity estimates give
\begin{align*}
|x_n-x_m|\le 2^{-n}+2^{-m}.
\end{align*}
\begin{align*}
|y_m-y_n|\le 2^{-m}+2^{-n}.
\end{align*}
Substituting these three bounds gives
\begin{align*}
|x_n-y_n|\le 2^{-n+1}+2^{-m+1}+\eta.
\end{align*}
Since $2^{-m+1}<\eta$, this yields
\begin{align*}
|x_n-y_n|<2^{-n+1}+2\eta.
\end{align*}
If also $|x_n-y_n|>2^{-n+1}$, choose a positive rational $\eta$ with $2\eta<|x_n-y_n|-2^{-n+1}$, contradicting the last inequality. Thus a witness to $x\# y$ rules out $x=y$.
The data in $x\# y$ is stronger than $\neg(x=y)$: it supplies a concrete finite index $n$ and a rational inequality showing separation at that stage, while a proof of $\neg(x=y)$ need only turn any proposed equality proof into a contradiction.
[/example]
Apartness separates points, but analysis also needs separation between a point and a subset. Before asking for a distance to a subset, we must first rule out the empty case, since the distance from a point to the empty set is not a finite real number. This need motivates a minimal non-emptiness condition.
[definition: Inhabited Subset]
Let $(X,d)$ be a metric space. A subset $A \subset X$ is inhabited if there exists $a \in X$ such that $a \in A$.
[/definition]
Inhabitedness provides a starting point for approximation, but it does not supply an infimum of distances. This distinction matters because constructive proofs often require not only that points of $A$ exist, but that the distance from any external point to $A$ can be computed as a real number.
[definition: Located Subset]
Let $(X,d)$ be a metric space. An inhabited subset $A \subset X$ is located if it is equipped with a located-real-valued distance map
\begin{align*}
\operatorname{dist}_A:X\to \mathbb R_{\ge 0}, \qquad \operatorname{dist}_A(x)=\inf_{a\in A} d(x,a),
\end{align*}
and an approximation operation
\begin{align*}
\operatorname{approx}_A : X\times \mathbb Q_{>0}\to A
\end{align*}
such that, for every $x\in X$ and every rational $\varepsilon>0$,
\begin{align*}
d(x,\operatorname{approx}_A(x,\varepsilon))<\operatorname{dist}_A(x)+\varepsilon.
\end{align*}
[/definition]
We write $\operatorname{dist}(x,A)$ for $\operatorname{dist}_A(x)$ once the located structure on $A$ has been fixed.
Locatedness is automatic in many finite or compact situations, but it is not automatic from inhabitedness alone. The obstruction is that computing the infimum can require deciding whether arbitrarily close candidates exist. Here the distance value is a located real, so it supports the usual constructive [comparison principle](/theorems/4870): if rational bounds $q<r$ are separated, then the distance lies above $q$ or below $r$. The next theorem is needed to turn the located distance into the usable operation of choosing near-minimisers.
[quotetheorem:7520]
[citeproof:7520]
This result is used whenever an argument says “choose a point nearly minimising the distance.” The locatedness hypothesis is doing real work: for the subset of $\mathbb R$ controlled by an undecided proposition in the next example, inhabitedness alone gives a point but does not give an algorithm for near-minimising distance from every external point. The theorem does not say that an exact closest point exists, nor that membership in $A$ is decidable; it only supplies approximants to the distance. This is the metric analogue of the earlier predicate issue: extra structure is required before a subset can support the operations later used in compactness and completion arguments.
[example: A Non-Located Subset as a Warning]
Let $P$ be a proposition not known to be decidable, and define $A_P \subset \mathbb R$ by
\begin{align*}
x\in A_P \quad \text{iff} \quad x=0 \vee (x=1 \wedge P).
\end{align*}
This subset is inhabited because $0=0$, so the left disjunct gives $0\in A_P$.
Assume, for contradiction, that $A_P$ is located, and write
\begin{align*}
r=\operatorname{dist}(1,A_P).
\end{align*}
Since $1/3<2/3$, locatedness of the real number $r$ gives the comparison alternative
\begin{align*}
r<2/3 \quad \text{or} \quad r>1/3.
\end{align*}
In the first case, $r<2/3$. Apply the approximation operation for $A_P$ to the point $1$ with tolerance $1/3$. It gives some $a\in A_P$ such that
\begin{align*}
|1-a|<r+1/3.
\end{align*}
Because $r<2/3$, we have
\begin{align*}
r+1/3<2/3+1/3=1.
\end{align*}
Hence
\begin{align*}
|1-a|<1.
\end{align*}
The membership evidence for $a\in A_P$ is either $a=0$ or $(a=1\wedge P)$. If $a=0$, then
\begin{align*}
|1-a|=|1-0|=|1|=1,
\end{align*}
contradicting $|1-a|<1$. Therefore the first disjunct is impossible, so the membership evidence must give $a=1\wedge P$, and hence gives $P$.
In the second case, $r>1/3$. We show $\neg P$. If $P$ held, then $1=1$ together with $P$ would give $1\in A_P$. Since $r$ is the infimum of the distances from $1$ to points of $A_P$, this particular point gives
\begin{align*}
r\le |1-1|.
\end{align*}
But
\begin{align*}
|1-1|=|0|=0,
\end{align*}
so $r\le 0$, contradicting $r>1/3$. Thus $P$ implies a contradiction, and therefore $\neg P$.
So locatedness of $A_P$ yields $P\vee\neg P$: the case $r<2/3$ gives $P$, and the case $r>1/3$ gives $\neg P$. For an undecided proposition $P$, this decision cannot be supplied constructively, so inhabitedness alone does not make $A_P$ located.
[/example]
The warning example marks a recurring theme: constructive existence of distances, suprema, and infima carries information. Compactness notions are designed to supply enough finite approximation to recover such information.
## Total Boundedness and Constructive Compactness
What should compactness mean when arbitrary open covers are too large to search? In metric constructive analysis, the usable approximation property is [total boundedness](/page/Total%20Boundedness): at every scale, the space can be covered by finitely many balls. Completion then supplies limits of Cauchy approximations.
[definition: Total Boundedness]
A metric space $(X,d)$ is totally bounded if, for every rational $\varepsilon>0$, there exist $n \in \mathbb N$ and points $x_1,\dots,x_n \in X$ such that
\begin{align*}
X = \bigcup_{i=1}^{n} B(x_i,\varepsilon).
\end{align*}
[/definition]
Total boundedness is finite approximability at each resolution. It is weaker than compactness because it does not by itself assert that limiting approximation processes converge in the space.
[example: The Rational Unit Interval]
Let $X=[0,1]\cap \mathbb Q$ with $d(q,r)=|q-r|$. Given a rational $\varepsilon>0$, choose $n\in\mathbb N$ with $1/n<\varepsilon$, and consider the finite rational grid
\begin{align*}
G=\{0,1/n,2/n,\dots,n/n\}.
\end{align*}
For any $x\in X$, the rational order gives an integer $k$ with $0\le k\le n$ and
\begin{align*}
k/n\le x<(k+1)/n
\end{align*}
unless $x=1$, in which case choose $k=n$. If $x=1$, then
\begin{align*}
d(x,k/n)=|1-1|=0<\varepsilon.
\end{align*}
Otherwise,
\begin{align*}
d(x,k/n)=|x-k/n|=x-k/n< (k+1)/n-k/n=1/n<\varepsilon.
\end{align*}
Thus every point of $X$ lies within $\varepsilon$ of one of the finitely many grid points, so $X$ is totally bounded.
The space is not complete. Let $(q_m)$ be a rational sequence in $[0,1]$ converging in the usual real metric to $\sqrt{2}/2$, for instance decimal truncations from below. Since every convergent real sequence is Cauchy, $(q_m)$ is Cauchy with respect to the rational metric. If it converged in $X$ to some rational $q$, then [uniqueness of limits](/theorems/625) in the usual metric would give
\begin{align*}
q=\sqrt{2}/2.
\end{align*}
Hence $q^2=1/2$. Writing $q=a/b$ in lowest terms with $b>0$, this gives
\begin{align*}
(a/b)^2=1/2,
\end{align*}
so
\begin{align*}
2a^2=b^2.
\end{align*}
Thus $b^2$ is even, hence $b$ is even; write $b=2c$. Substitution gives
\begin{align*}
2a^2=(2c)^2=4c^2,
\end{align*}
so
\begin{align*}
a^2=2c^2.
\end{align*}
Thus $a$ is even as well, contradicting that $a/b$ was in lowest terms. Therefore the Cauchy sequence has no limit in $[0,1]\cap\mathbb Q$: finite nets give total boundedness, but they do not supply the missing irrational limit points.
[/example]
The example shows that finite nets alone do not keep all limiting processes inside the space. This prepares us to combine total boundedness with completeness and with explicit finite approximation data, which is the form of compactness used in later continuity arguments.
[definition: Constructively Compact Metric Space]
A metric space $(X,d)$ is constructively compact if it is complete and is equipped with an operation
\begin{align*}
\operatorname{Net}_X:\mathbb Q_{>0}\to \operatorname{List}(X)
\end{align*}
and a covering certificate operation
\begin{align*}
\operatorname{Cover}_X:\{(\varepsilon,x)\in \mathbb Q_{>0}\times X\}\to \{i\in \{1,\dots,|\operatorname{Net}_X(\varepsilon)|\}:d(x,(\operatorname{Net}_X(\varepsilon))_i)<\varepsilon\}.
\end{align*}
For each rational $\varepsilon>0$, if
\begin{align*}
\operatorname{Net}_X(\varepsilon)=(x_1,\dots,x_n),
\end{align*}
then
\begin{align*}
X = \bigcup_{i=1}^{n} B(x_i,\varepsilon).
\end{align*}
[/definition]
The uniform data clause prevents compactness from becoming a non-computational existence assertion. It says that nets can be produced as part of the structure, not merely proved to exist after an unbounded search. Without such data, a classical open-cover formulation can hide decisions about which finite subcover has been found; in computable analysis this is exactly where compact-looking spaces can fail to provide an effective modulus. The next theorem is a finite-net lemma rather than the full classical [compactness theorem](/theorems/2748): finite approximation data converts local continuity information into a single global modulus only when the local data is also presented uniformly on the chosen finite nets.
[quotetheorem:7521]
[citeproof:7521]
This finite-net lemma explains one way constructive compactness can substitute for classical compactness. The uniform finite-net continuity hypothesis is essential: a bare assertion of pointwise continuity gives a radius near each point but need not give any method for extracting one positive radius that works on a whole finite net; computable counterexamples on compact spaces are pointwise continuous without a computable uniform modulus. The theorem also does not say that every pointwise continuous map on a constructively [compact space](/page/Compact%20Space) is uniformly continuous; that stronger classical statement requires additional principles or stronger continuity data. Its forward role is to make extension to completions possible, because completed points are represented by Cauchy data and Cauchy data can be transported through a function only with a uniform modulus.
## Metric Completion by Regular Cauchy Sequences
How do we construct the real line, or the completion of any metric space, without appealing to a completed universe of limit points? The constructive answer is to represent a new point by an explicitly controlled Cauchy approximation. Regular Cauchy sequences give a convenient normal form because their convergence rate is built into the definition.
[definition: Regular Cauchy Sequence]
Let $(X,d)$ be a metric space. A regular Cauchy sequence in $X$ is a function $x:\mathbb N\to X$, written $x_n=x(n)$, such that
\begin{align*}
d(x_n,x_m) \le 2^{-n}+2^{-m}
\end{align*}
for all $m,n \in \mathbb N$.
[/definition]
The chosen error bound is not essential, but it makes later estimates uniform. To form new points from regular sequences, we must decide when two different approximations name the same limit. This prepares us to identify sequences whose termwise distance is forced to vanish at the same controlled rate.
[definition: Completion Equivalence]
Let $(x_n)$ and $(y_n)$ be regular Cauchy sequences in a metric space $(X,d)$. They are equivalent if
\begin{align*}
d(x_n,y_n) \to 0
\end{align*}
with an explicit modulus $M:\mathbb N\to\mathbb N$ such that $n\ge M(k)$ implies $d(x_n,y_n)<2^{-k}$.
[/definition]
This relation removes the dependence on a particular approximating sequence, and the modulus formulation is stable under transitivity: composing two equivalences only changes the requested precision, so each half is asked for a smaller error. This prepares us to turn equivalence classes into a metric space by assigning a distance between two completed points through the limiting distances of their representatives.
[definition: Metric Completion]
Let $(X,d)$ be a metric space. Its metric completion $\widehat X$ is the set of equivalence classes of regular Cauchy sequences in $X$, equipped with the metric
\begin{align*}
\widehat d:\widehat X\times \widehat X\to \mathbb R_{\ge 0}, \qquad
\widehat d([(x_n)],[(y_n)])=\text{ the constructive real represented by }r_k=d(x_{k+2},y_{k+2}), \quad k\in\mathbb N.
\end{align*}
[/definition]
This definition packages a process of approximation as a point, and the explicit shift in the index supplies the modulus needed for the displayed real. Indeed, if $k \le \ell$, the triangle inequality gives
\begin{align*}
|r_k-r_\ell| \le d(x_{k+2},x_{\ell+2})+d(y_{k+2},y_{\ell+2}) \le 2^{-k-1}+2^{-\ell-1}+2^{-k-1}+2^{-\ell-1}\le 2^{-k}+2^{-\ell}.
\end{align*}
Thus $(r_k)$ has the regular modulus required to represent a constructive real. The [equivalence relation](/page/Equivalence%20Relation) is also needed for well-definedness: replacing either representative by an equivalent one changes the represented distance by a sequence tending to $0$. The original space should still sit inside the completion without distorting distances, so the next theorem verifies this basic compatibility.
[quotetheorem:3995]
[citeproof:3995]
The embedding tells us that completion does not alter old distances. The use of constant regular sequences is essential: if $X$ were mapped by arbitrary approximating representatives, equality of old distances would require an additional coherence proof and could fail for badly chosen representatives. The theorem does not say that $\iota$ is onto; the rational unit interval already shows why completion may add new limit points. This prepares the uniqueness theorem by identifying the original space as a dense metric subspace of its completion rather than as a merely set-theoretic subset.
[example: Constructing the Completion of $\mathbb Q$]
Start with $\mathbb Q$ under $d(q,r)=|q-r|$. A point of the completion is represented by a regular Cauchy sequence $(q_n)$ of rationals, meaning
\begin{align*}
|q_n-q_m|\le 2^{-n}+2^{-m}
\end{align*}
for all $m,n$.
For addition, define
\begin{align*}
s_n=q_{n+1}+r_{n+1}.
\end{align*}
Then, using the triangle inequality in $\mathbb Q$,
\begin{align*}
|s_n-s_m|=|(q_{n+1}+r_{n+1})-(q_{m+1}+r_{m+1})|\le |q_{n+1}-q_{m+1}|+|r_{n+1}-r_{m+1}|.
\end{align*}
Regularity of $(q_n)$ and $(r_n)$ gives
\begin{align*}
|q_{n+1}-q_{m+1}|\le 2^{-(n+1)}+2^{-(m+1)}.
\end{align*}
\begin{align*}
|r_{n+1}-r_{m+1}|\le 2^{-(n+1)}+2^{-(m+1)}.
\end{align*}
Substituting these two estimates gives
\begin{align*}
|s_n-s_m|\le 2\cdot 2^{-(n+1)}+2\cdot 2^{-(m+1)}=2^{-n}+2^{-m}.
\end{align*}
Thus $(s_n)$ is again a regular Cauchy sequence, so it represents the sum $[(q_n)]+[(r_n)]$.
For multiplication, first note that regular sequences are rationally bounded: for example
\begin{align*}
|q_n|\le |q_0|+|q_n-q_0|\le |q_0|+2^{-n}+2^0\le |q_0|+2.
\end{align*}
Similarly $|r_n|\le |r_0|+2$. Choose indices far enough that the factor errors are small relative to these two bounds; then
\begin{align*}
|q_i r_i-q_j r_j|=|q_i(r_i-r_j)+r_j(q_i-q_j)|\le |q_i||r_i-r_j|+|r_j||q_i-q_j|.
\end{align*}
The displayed estimate is the reason multiplication uses sufficiently advanced terms rather than the naive termwise product. With these operations, the completion of $\mathbb Q$ is the constructive real line presented by rational approximations.
[/example]
The example constructs a familiar completion, but the construction should not depend on this particular presentation. This matters because we need to justify speaking of the completion rather than a completion by proving that any two dense complete realisations are isometric over the original space.
[quotetheorem:1002]
[citeproof:1002]
The theorem justifies speaking of the completion rather than a completion once the dense approximation operations are included as structure. The data requirement is essential: an isometric embedding into a larger complete space that contains an unrelated extra point would preserve all old distances but would not be determined by $X$, and a proof that every point is approximable would not by itself provide the approximating sequence needed to define $F$. The theorem also does not identify completions by equality of underlying sets; it gives a unique isometry respecting the embedded copy of $X$. This is the bridge to extension theorems, where maps are first defined on $X$ and then forced on all completed points by their action on regular approximations.
## Continuity Principles in Completed Spaces
When a function is defined first on a dense subspace, when does it extend to the completion? The answer is controlled by uniform continuity, because a completed point is a Cauchy approximation rather than a pre-existing limit. This supplies the theorem needed for moving constructions from rational approximations to completed spaces.
[quotetheorem:964]
[citeproof:964]
This extension principle is the practical reason regular Cauchy completions are useful. The uniform modulus hypothesis is essential: a merely pointwise continuous map can send a Cauchy approximation to a sequence whose Cauchy rate is not constructively controlled, so the proposed value at the completed point would not be available as data in $Y$. The theorem does not say that every continuous map on $X$ extends, and it does not create values outside the closure determined by the original approximations. It is the final form of the chapter's theme: constructive analysis works when the objects carry enough finite, uniform information for the required operation.
[remark: Constructive Moral]
The constructive treatment of metric spaces replaces several classical yes-or-no principles with positive data: apartness gives evidence of separation, locatedness gives computable distances to sets, total boundedness gives finite nets, and completion gives limits of regular approximations. These data are not decorative; they are the mechanisms that make analysis possible without unrestricted excluded middle.
[/remark]
Chapter 7 provided the structural language needed for constructive analysis: sets, functions, locatedness, and completion. The next chapter begins the reconstruction of the real numbers themselves, using that language to explain what kind of information a proof about reals must actually carry.
# 8. Constructive Real Numbers and Elementary Analysis
Constructive analysis begins by rebuilding the real numbers with attention to what data a proof carries. The prerequisites are the constructive logic of Chapters 1-4, the arithmetic and omniscience material of Chapter 5, the metric-completion language of Chapter 7, and the countable-choice discussion developed further in Chapter 10. Classically, Cauchy sequences, Dedekind cuts, decimal expansions, and nested intervals all describe the same complete ordered field. Constructively, the equivalence between these presentations may require countable choice, and order statements split into negative and positive forms such as equality, strict order, and apartness. This chapter develops the real line in a way that keeps algorithms, moduli, and located approximations visible.
## Presentations of the Real Numbers
The first problem is how to say that a sequence of rational approximations determines a real number when convergence must include usable information. A bare assertion that a rational sequence is Cauchy says that suitable approximations exist; a constructive real number should also provide enough data to find them.
[definition: Cauchy Real]
A Cauchy real is an equivalence class of pairs $(a,\nu)$, where $a:\mathbb N \to \mathbb Q$ and $\nu:\mathbb N \to \mathbb N$ satisfy
\begin{align*}
\forall k\in \mathbb N\ \forall m,n\ge \nu(k),\quad |a_m-a_n| \le 2^{-k}.
\end{align*}
Two such pairs $(a,\nu)$ and $(b,\mu)$ are equivalent when
\begin{align*}
\forall k\in \mathbb N\ \exists N\in \mathbb N\ \forall m,n\ge N,\quad |a_m-b_n| \le 2^{-k}.
\end{align*}
[/definition]
The modulus $\nu$ is part of the information, not extra decoration. It turns the statement of convergence into a procedure: given a desired precision $2^{-k}$, it gives a stage after which all later approximations agree to that precision.
[example: Rational Numbers as Cauchy Reals]
For $q\in \mathbb Q$, define $a_n=q$ for every $n\in\mathbb N$ and set $\nu(k)=1$. If $m,n\ge \nu(k)$, then $a_m=q$ and $a_n=q$, so
\begin{align*}
|a_m-a_n|=|q-q|=|0|=0\le 2^{-k}.
\end{align*}
Thus $(a,\nu)$ is a Cauchy representative.
For rationals $q,r\in\mathbb Q$, let $a_n=q$ and $b_n=r$ be the corresponding constant representatives. If $q=r$, then for every $k$ and every $m,n\ge 1$,
\begin{align*}
|a_m-b_n|=|q-r|=|0|=0\le 2^{-k}.
\end{align*}
So the two representatives are equivalent. Conversely, suppose the two constant representatives are equivalent. Then for every $k$ there is an $N$ such that, for all $m,n\ge N$,
\begin{align*}
|q-r|=|a_m-b_n|\le 2^{-k}.
\end{align*}
If $q\ne r$, then $d=|q-r|$ is a positive rational, and by the [Archimedean property](/theorems/737) for rationals there is a $k$ with $2^{-k}<d$. The displayed inequality for that $k$ gives $d\le 2^{-k}<d$, a contradiction. Hence $q=r$. Therefore constant rational representatives identify exactly equal rationals, so the rational line embeds into the Cauchy reals without adding any limiting information.
[/example]
This example shows the simplest case, where no genuine limiting process is present. For later estimates it is useful to replace arbitrary moduli by a uniform normal form, so we need to package the convergence rate directly into the sequence.
[definition: Regular Cauchy Sequence]
A regular Cauchy sequence is a sequence $a:\mathbb N \to \mathbb Q$ such that
\begin{align*}
|a_m-a_n| \le 2^{-m}+2^{-n} \quad \text{for all } m,n\in \mathbb N.
\end{align*}
[/definition]
Regular sequences make precision estimates easier because the index itself controls the error. This matters because we need to justify using this convenient presentation without changing the real numbers under discussion.
[quotetheorem:7522]
[citeproof:7522]
The regularisation theorem permits later arguments to work with a standard convergence rate. The hypothesis that the original representative comes with a modulus is essential: a sequence that is merely known to be Cauchy by a negative or non-effective argument need not provide indices from which the subsequence $b_k=a_{\nu(k+1)}$ can be computed. The theorem also does not say that every arbitrary rational sequence can be regularised; it says that a modulus-bearing Cauchy representative can be replaced by an equivalent regular one. Without this distinction, the construction would confuse the existence of a limit with the stronger ability to approximate that limit at a requested rate.
This comparison between rates and existence leads to sequence-free presentations of the real line, where rational order information is primary rather than derived from approximating terms. This motivates a precise definition of Dedekind reals, where locatedness is built into the cut.
[definition: Dedekind Real]
A Dedekind real is a subset $L\subset \mathbb Q$ such that:
1. $L$ is inhabited.
2. $L$ is proper, meaning there exists $q\in \mathbb Q$ with $q\notin L$.
3. If $p\in L$ and $r<p$, then $r\in L$.
4. If $p\in L$, then there exists $r\in \mathbb Q$ with $p<r$ and $r\in L$.
5. If $p<q$, then $p\in L$ or $q\notin L$.
[/definition]
The last condition is locatedness. It does not decide whether an arbitrary rational belongs to $L$; instead, it gives positive information about every rational interval that straddles the cut. The problem is exactly what extra principle lets the sequence and cut presentations become interchangeable.
[quotetheorem:7523]
[citeproof:7523]
This theorem explains why many constructive developments state which real-number presentation they use. The isomorphism is not a free identification inside every constructive foundation: the direction from a cut to a sequence asks for a compatible rational bracket at each precision, and that countable family of choices is exactly where countable choice enters. In realizability and sheaf-style models where such choice principles can fail, one can have a located Dedekind real without a globally chosen Cauchy representative with a modulus. The result also does not provide decidable comparisons or canonical computational names for all cuts; it only says that after adopting countable choice the two ordered-field structures can be matched.
The next remark records the foundational price paid in the reverse direction from cuts to sequences.
[remark: Role of Countable Choice]
The need for countable choice is not a technical nuisance. Constructing a Cauchy approximation from a cut requires making countably many choices of rational brackets, one for each precision. Some constructive foundations accept this principle; others keep the distinction between the two real lines visible.
[/remark]
## Order, Apartness, and Located Intervals
The next problem is that equality and order on constructive reals do not behave like decidable comparisons of rational numbers. To prove $x<y$ is to give a positive gap, while to prove $x=y$ is to give arbitrarily good mutual approximation.
[definition: Equality of Cauchy Reals]
For Cauchy reals $x$ and $y$ represented by regular sequences $a:\mathbb N\to \mathbb Q$ and $b:\mathbb N\to \mathbb Q$, write $x=y$ when
\begin{align*}
\forall k\in \mathbb N\ \exists N\in \mathbb N\ \forall n\ge N,\quad |a_n-b_n| \le 2^{-k}.
\end{align*}
[/definition]
This definition asks for approximation at every precision. A constructive proof of equality is therefore a uniform method for refining the comparison, not a finite comparison that terminates in all cases.
[definition: Strict Order on Cauchy Reals]
For Cauchy reals $x$ and $y$, choose regular representatives $a:\mathbb N\to \mathbb Q$ and $b:\mathbb N\to \mathbb Q$. Write $x<y$ when there exist $k,n\in \mathbb N$ such that
\begin{align*}
\forall m\ge n,\quad a_m+2^{-k}<b_m.
\end{align*}
[/definition]
The relation is independent of the chosen regular representatives, so it is a relation on Cauchy reals rather than on names for Cauchy reals. Strict order is positive information because it exhibits a rational separation that persists along the representatives. Requiring the separation only at a single early approximation would depend on the chosen representative rather than on the real number, since equivalent regular sequences may have misleading initial terms. Since non-equality need not identify which side contains the separation, constructive analysis introduces a separate relation for usable inequality.
[definition: Apartness of Reals]
For real numbers $x$ and $y$, write $x\# y$ when $x<y$ or $y<x$.
[/definition]
Apartness replaces the classical habit of treating inequality as the negation of equality. Constructively, $x\# y$ gives a side and a gap, while $\neg(x=y)$ only says that equality leads to contradiction.
[example: Strict Inequality Gives Data]
Let $x,y$ be represented by regular sequences $(a_n)$ and $(b_n)$, and suppose we are given a witness to $x<y$: numbers $k,n\in\mathbb N$ such that $a_m+2^{-k}<b_m$ for every $m\ge n$. Fix any $m\ge n$. Subtracting $a_m$ from the displayed inequality gives
\begin{align*}
2^{-k}<b_m-a_m.
\end{align*}
Since $2^{-(k+1)}=2^{-1}2^{-k}$ and $2^{-1}<1$, we have
\begin{align*}
2^{-(k+1)}<2^{-k}.
\end{align*}
Combining the two inequalities gives
\begin{align*}
a_m+2^{-(k+1)}<b_m.
\end{align*}
Thus the original witness also gives the weaker but stable rational separation $2^{-(k+1)}$ at every later approximation $m\ge n$. An algorithm can therefore wait until stage $n$, compare the rational approximations there or later, and branch using the certified fact that $x$ lies to the left of $y$ by a known positive margin.
[/example]
The example shows why positive order information can drive computation. For existence theorems, the analogous object is often not a decided point but an interval whose position and length can be controlled at each requested precision; this motivates a precise definition.
[definition: Located Interval]
A located interval is data $(a,b,\lambda,\rho,\delta)$ consisting of real numbers $a,b\in \mathbb R$, endpoint locators $\lambda,\rho:\mathbb N\to \mathbb Q\times \mathbb Q$, and a separation witness $\delta\in \mathbb N$ such that, writing $\lambda(k)=(l_k,u_k)$ and $\rho(k)=(r_k,s_k)$,
\begin{align*}
l_k\le a\le u_k,\qquad u_k-l_k\le 2^{-k}.
\end{align*}
\begin{align*}
r_k\le b\le s_k,\qquad s_k-r_k\le 2^{-k}.
\end{align*}
\begin{align*}
a+2^{-\delta}<b.
\end{align*}
The associated closed interval is
\begin{align*}
[a,b]=\{x\in \mathbb R: a\le x\le b\}.
\end{align*}
[/definition]
Located intervals are the constructive substitute for several classical point-existence conclusions. They provide finite approximations to the object being constructed, and this explains why decidability of exact equality is too much to expect.
[remark: Equality Is Not Decidable]
There is no constructive proof that for all real numbers $x,y$, either $x=y$ or $x\# y$. Such a proof would decide whether every real is zero or apart from zero, which would imply limited omniscience principles incompatible with the intended computational reading of constructive mathematics.
[/remark]
The failure of decidable equality does not prevent numerical computation. The next example illustrates how nested located intervals can compute a real while avoiding any global decision procedure for equality.
[example: Square Roots by Nested Intervals]
Let $c\in\mathbb Q$ with $c>0$, and choose rationals $l_0,u_0$ such that $0<l_0<u_0$ and $l_0^2<c<u_0^2$. Given $l_n,u_n$, set
\begin{align*}
m_n=\frac{l_n+u_n}{2}.
\end{align*}
Since rational order is decidable, exactly one of $m_n^2<c$, $m_n^2=c$, or $c<m_n^2$ holds. If $m_n^2<c$, define $l_{n+1}=m_n$ and $u_{n+1}=u_n$; then
\begin{align*}
l_{n+1}^2=m_n^2<c<u_n^2=u_{n+1}^2.
\end{align*}
If $c<m_n^2$, define $l_{n+1}=l_n$ and $u_{n+1}=m_n$; then
\begin{align*}
l_{n+1}^2=l_n^2<c<m_n^2=u_{n+1}^2.
\end{align*}
If $m_n^2=c$, set all later endpoints equal to $m_n$, giving an exact rational square root.
In the two non-exact cases, the new interval is nested in the old one and has half the length, because
\begin{align*}
u_{n+1}-l_{n+1}=\frac{u_n-l_n}{2}.
\end{align*}
By induction,
\begin{align*}
u_n-l_n=2^{-n}(u_0-l_0).
\end{align*}
Let $p_n=(l_n+u_n)/2$ be the midpoint. Since $l_n\le p_n\le u_n$ and $0<l_n\le u_n\le u_0$, we have
\begin{align*}
|p_n^2-c|\le u_n^2-l_n^2=(u_n-l_n)(u_n+l_n)\le 2^{-n}(u_0-l_0)(2u_0).
\end{align*}
Thus the square error can be made smaller than any requested $2^{-k}$ by choosing $n$ large enough that $2^{-n}(u_0-l_0)(2u_0)\le 2^{-k}$.
The midpoints also form a Cauchy sequence: if $j\ge i$, then both $p_i$ and $p_j$ lie in $[l_i,u_i]$, so
\begin{align*}
|p_i-p_j|\le u_i-l_i=2^{-i}(u_0-l_0).
\end{align*}
Reindex the construction by choosing $N(k)$ with $2^{-N(k)}(u_0-l_0)\le 2^{-k}$ and setting $a_k=p_{N(k+1)}$. Then for $i\le j$,
\begin{align*}
|a_i-a_j|\le 2^{-(i+1)}\le 2^{-i}+2^{-j}.
\end{align*}
So $(a_k)$ is a regular Cauchy representative, and the estimate on $|a_k^2-c|$ shows that its square is $c$; this constructed real is $\sqrt c$.
[/example]
## Sequences, Series, and Moduli of Continuity
Once the real line is built, the next question is which parts of elementary analysis survive with the same statements. The pattern is that convergence and continuity must carry moduli, while existence conclusions often become located or approximate conclusions.
[definition: Convergent Sequence of Reals]
A sequence $x:\mathbb N\to \mathbb R$ converges to a real number $y\in \mathbb R$ with modulus $\nu:\mathbb N\to \mathbb N$ when
\begin{align*}
\forall k\in \mathbb N\ \forall n\ge \nu(k),\quad |x_n-y|\le 2^{-k}.
\end{align*}
[/definition]
A convergence proof with a modulus can be used to compute approximations to the limit. The same idea must be imposed on functions: to compute $f(x)$ from an approximation to $x$, we need a rule converting output precision into input precision.
[definition: Uniform Continuity with Modulus]
Let $X\subset \mathbb R$ and let $f:X\to \mathbb R$. A modulus of uniform continuity for $f$ is a function $\omega:\mathbb N\to \mathbb N$ such that for all $x,y\in X$,
\begin{align*}
|x-y|\le 2^{-\omega(k)} \implies |f(x)-f(y)|\le 2^{-k}.
\end{align*}
[/definition]
The modulus says how accurately the input must be known to compute the output to a requested precision. This is the analytic form of the proof-as-program principle from the earlier chapters.
[example: Lipschitz Functions Have Moduli]
Suppose $f:[a,b]\to \mathbb R$ satisfies $|f(x)-f(y)|\le C|x-y|$ for all $x,y\in [a,b]$, where $C\in \mathbb Q$ and $C\ge 1$. For each $k\in\mathbb N$, choose $\omega(k)\in\mathbb N$ such that
\begin{align*}
C2^{-\omega(k)}\le 2^{-k}.
\end{align*}
Such a choice is possible because the powers of $2$ are unbounded in $\mathbb Q$.
We show that this $\omega$ is a modulus of uniform continuity. Let $x,y\in [a,b]$, and assume
\begin{align*}
|x-y|\le 2^{-\omega(k)}.
\end{align*}
Since $C\ge 0$, multiplying the preceding inequality by $C$ preserves the order:
\begin{align*}
C|x-y|\le C2^{-\omega(k)}.
\end{align*}
By the Lipschitz hypothesis,
\begin{align*}
|f(x)-f(y)|\le C|x-y|.
\end{align*}
Combining these inequalities with the defining choice of $\omega(k)$ gives
\begin{align*}
|f(x)-f(y)|\le 2^{-k}.
\end{align*}
Thus knowing the input within $2^{-\omega(k)}$ is enough to know the output within $2^{-k}$, so the Lipschitz constant gives an explicit conversion from input precision to output precision.
[/example]
The Lipschitz example supplies a modulus from a simple global estimate. For series of functions, the corresponding task is to control all tails uniformly, and a numerical majorant series is the most direct way to obtain that control.
[quotetheorem:7524]
[citeproof:7524]
The theorem is useful because the proof returns the exact modulus needed for [uniform convergence](/page/Uniform%20Convergence). The tail modulus for the majorant is not cosmetic: pointwise bounds $|f_n(x)|\le M_n$ and bare convergence of $\sum M_n$ do not by themselves give an index at which all later function tails are small. Without uniform tail control, partial sums may converge at different rates for different $x$, so there is no algorithm that turns an output precision into a single cutoff valid on all of $X$. The theorem also does not assert effective pointwise convergence from arbitrary convergent numerical majorants; it asserts uniform convergence only when the numerical tail estimates are themselves available as data.
The next example applies it to a familiar [power series](/page/Power%20Series), where the majorant is a geometric series with explicit tails.
[example: A Uniformly Convergent Power Series on a Smaller Interval]
Let $r\in\mathbb Q$ satisfy $0\le r<1$, and consider the series $\sum_{n=0}^\infty x^n$ on $[-r,r]$. For $x\in[-r,r]$ we have $|x|\le r$, so for every $n$,
\begin{align*}
|x^n|=|x|^n\le r^n.
\end{align*}
For $m>N$, the geometric tail satisfies
\begin{align*}
\sum_{n=N+1}^{m} r^n=r^{N+1}(1+r+\cdots+r^{m-N-1}).
\end{align*}
Multiplying by $1-r$ gives
\begin{align*}
(1-r)(1+r+\cdots+r^{m-N-1})=1-r^{m-N}.
\end{align*}
Since $0\le r^{m-N}\le 1$ and $1-r>0$,
\begin{align*}
1+r+\cdots+r^{m-N-1}=\frac{1-r^{m-N}}{1-r}\le \frac{1}{1-r}.
\end{align*}
Therefore
\begin{align*}
\sum_{n=N+1}^{m} r^n\le \frac{r^{N+1}}{1-r}.
\end{align*}
Given $k$, choose $N(k)$ such that
\begin{align*}
\frac{r^{N(k)+1}}{1-r}\le 2^{-k}.
\end{align*}
Then for all $m>N(k)$ and all $x\in[-r,r]$,
\begin{align*}
\left|\sum_{n=N(k)+1}^{m}x^n\right|\le \sum_{n=N(k)+1}^{m}|x^n|\le \sum_{n=N(k)+1}^{m}r^n\le 2^{-k}.
\end{align*}
Thus the partial sums are uniformly Cauchy, so the series converges uniformly by the *[Constructive Weierstrass M-Test](/theorems/7524)*.
Let
\begin{align*}
S_m(x)=\sum_{n=0}^{m}x^n.
\end{align*}
For every finite $m$,
\begin{align*}
(1-x)S_m(x)=(1-x)(1+x+\cdots+x^m).
\end{align*}
Expanding and grouping terms gives
\begin{align*}
(1+x+\cdots+x^m)-(x+x^2+\cdots+x^{m+1})=1-x^{m+1}.
\end{align*}
Hence
\begin{align*}
(1-x)S_m(x)=1-x^{m+1}.
\end{align*}
Since $|x|\le r<1$, we have
\begin{align*}
|x^{m+1}|\le r^{m+1}.
\end{align*}
The same geometric-tail modulus makes $r^{m+1}$ arbitrarily small, so $x^{m+1}$ converges uniformly to $0$. Passing to the uniform limit in
\begin{align*}
(1-x)S_m(x)=1-x^{m+1}
\end{align*}
gives
\begin{align*}
(1-x)S(x)=1.
\end{align*}
Because $x\le r<1$, the rational gap $1-r>0$ shows that $1-x$ is apart from $0$, so division by $1-x$ is legitimate. Therefore
\begin{align*}
S(x)=\frac{1}{1-x}.
\end{align*}
The modulus above gives uniform convergence on the whole smaller interval, and the finite identity for partial sums identifies the uniformly computed limit with $x\mapsto (1-x)^{-1}$.
[/example]
## Constructive Versions of Classical Theorems
The final problem is how to restate familiar theorems so that their conclusions contain data that can be found. Existence of an exact point is often too strong, while existence of arbitrarily small located intervals is the constructive content of the classical argument.
[quotetheorem:7525]
[proofunderconstruction:7525]
The conclusion is weaker in wording than the classical version but stronger computationally. It does not say that every point of the final interval has small image; approximate opposite signs at the endpoints would not justify that. Instead, the constructive content is the nested located bracket, together with endpoint error bounds that tend to zero. Nestedness is essential because the Cauchy real $z$ is extracted from a coherent family of intervals, not from unrelated approximate zeros. If the intervals merely had lengths tending to zero, they could alternate between disjoint regions; for instance, a sequence of rational intervals of length $2^{-k}$ centred alternately near $1/3$ and near $2/3$ has no common Cauchy limit, even though each individual interval is short. The nesting condition is what turns finite approximations into one real number rather than a list of incompatible candidates.
Locatedness of the ambient interval is a separate hypothesis. The bisection algorithm must compute rational approximations to the endpoints and choose rational subdivision points inside the interval. If $a$ and $b$ are only given as abstract reals satisfying $a<b$, a proof of $a<b$ supplies a positive gap but may not provide the rational brackets needed to place a midpoint or certify that the next half-interval lies inside $[a,b]$. A concrete obstruction is an interval whose endpoints are specified only by non-located Dedekind cuts: there may be no procedure which, at precision $2^{-k}$, returns rational brackets for both endpoints, so the first rational bisection step cannot be carried out uniformly. The endpoint locators in the theorem prevent this failure by making every finite subdivision a computable piece of data.
The modulus of uniform continuity is needed because bisection must turn a requested output precision into a finite subdivision schedule and must transfer convergence of the endpoint names to convergence of their images. Without such a modulus, a pointwise [continuous function](/page/Continuous%20Function) can supply, for each fixed $x$, a local radius that depends on information available only after $x$ is chosen; this is the same boundary phenomenon seen in constructive compactness, where local continuity alone need not yield a single global bisection schedule. The strict sign witnesses at the endpoints are also data: for the family $f_t(x)=x-t$ on $[0,1]$, knowing only $f_t(0)\le 0\le f_t(1)$ without a positive margin leaves the case $t=0$ indistinguishable from nearby positive cases unless one can decide whether the endpoint is already a zero. Thus non-strict endpoint signs do not provide the first sign bracket needed by the algorithm. The theorem therefore does not produce a decidable exact point-selection operation for all continuous functions; it produces located intervals and a Cauchy name for a zero when the continuity, endpoint location, nestedness, and sign information are given effectively.
[example: Locating a Zero of $x^2-2$]
Let $f(x)=x^2-2$ on $[1,2]$. If $x,y\in[1,2]$, then
\begin{align*}
|f(x)-f(y)|=|(x^2-2)-(y^2-2)|.
\end{align*}
The constants cancel, so
\begin{align*}
|(x^2-2)-(y^2-2)|=|x^2-y^2|.
\end{align*}
Factoring the difference of squares gives
\begin{align*}
|x^2-y^2|=|(x-y)(x+y)|.
\end{align*}
By multiplicativity of absolute value,
\begin{align*}
|(x-y)(x+y)|=|x-y||x+y|.
\end{align*}
Since $1\le x\le2$ and $1\le y\le2$, we have $2\le x+y\le4$, hence $|x+y|\le4$. Therefore
\begin{align*}
|f(x)-f(y)|\le 4|x-y|.
\end{align*}
Thus $f$ is Lipschitz on $[1,2]$ with rational Lipschitz constant $4$.
This gives an explicit modulus of uniform continuity: set $\omega(k)=k+2$. If $|x-y|\le 2^{-\omega(k)}=2^{-(k+2)}$, then
\begin{align*}
|f(x)-f(y)|\le 4|x-y|\le 4\cdot 2^{-(k+2)}.
\end{align*}
Since $4=2^2$,
\begin{align*}
4\cdot 2^{-(k+2)}=2^2\cdot 2^{-(k+2)}=2^{-k}.
\end{align*}
So $\omega(k)=k+2$ converts an output precision $2^{-k}$ into the input precision needed for bisection.
The endpoint signs have positive margins. At the left endpoint,
\begin{align*}
f(1)=1^2-2=1-2=-1.
\end{align*}
Since $2^{-1}=1/2$,
\begin{align*}
f(1)+2^{-1}=-1+\frac12=-\frac12<0.
\end{align*}
At the right endpoint,
\begin{align*}
f(2)=2^2-2=4-2=2.
\end{align*}
Again $2^{-1}=1/2$, and
\begin{align*}
2^{-1}=\frac12<2=f(2).
\end{align*}
The hypotheses of the preceding constructive [intermediate value theorem](/theorems/180) are therefore satisfied with the located interval $[1,2]$, the modulus $\omega(k)=k+2$, and endpoint sign witnesses $p=q=1$.
The theorem produces nested located intervals $[u_k,v_k]\subset[1,2]$ such that
\begin{align*}
v_k-u_k\le 2^{-k}.
\end{align*}
It also gives
\begin{align*}
f(u_k)\le 2^{-k}.
\end{align*}
and
\begin{align*}
f(v_k)\ge -2^{-k}.
\end{align*}
Their nested lengths determine a common Cauchy real $z\in[1,2]$, and the endpoint estimates converge to
\begin{align*}
f(z)=0.
\end{align*}
Substituting the definition of $f$ gives
\begin{align*}
z^2-2=0.
\end{align*}
Adding $2$ to both sides gives
\begin{align*}
z^2=2.
\end{align*}
Since $z\in[1,2]$, this is the positive square root of $2$. Thus the bisection output is the same real constructed by nested square-root intervals, now obtained as an instance of the constructive [intermediate value theorem](/theorems/629).
[/example]
The bisection example uses compactness through finite interval subdivision rather than through arbitrary open covers. To state the constructive compactness principle behind such arguments, we use finite nets at every precision.
[definition: Totally Bounded Metric Space]
A metric space $(X,d)$ is totally bounded when for every $k\in \mathbb N$ there exists a finite list $x_1,\dots,x_N\in X$ such that
\begin{align*}
X\subset \bigcup_{i=1}^N B(x_i,2^{-k}).
\end{align*}
[/definition]
This definition gives compactness as finite data at every precision. Such finite nets are exactly what is needed to turn local continuity information into a global rule, which leads to a precise theorem on uniform continuity over constructive compact intervals.
[quotetheorem:7526]
[citeproof:7526]
The theorem shows how constructive compactness converts local information into a global modulus. The finite rational cover and its Lebesgue number are part of the usable compactness data: without them, local continuity gives a radius only after a point has been chosen, and there need not be a constructive way to extract one radius that works everywhere. A standard boundary phenomenon is the failure of the fan theorem: in constructive settings where the fan theorem is not accepted, there are pointwise continuous maps on Cantor space with no uniform modulus, and the ternary embedding of Cantor space into $[0,1]$ transfers this obstruction to compact-looking subsets of the real line. In that situation the missing datum is exactly the finite bar, or equivalently the Lebesgue-number information, that would turn pointwise radii into one global radius. The result also does not cover arbitrary pointwise continuous functions on an interval when the proof of continuity supplies no finite cover data. The next example gives the sequential analogue of the same obstruction: bounded monotone sequences may fail to provide enough finite data to locate their limits.
[example: Specker Sequence]
Let $K\subseteq\mathbb N$ be a computably enumerable but non-computable set, and let $e_0,e_1,e_2,\dots$ enumerate $K$ without repetition. Define a rational sequence by
\begin{align*}
s_n=\sum_{i=0}^{n}2^{-2e_i-2}.
\end{align*}
Each $s_n$ is rational. If $n<m$, then
\begin{align*}
s_m-s_n=\sum_{i=n+1}^{m}2^{-2e_i-2}.
\end{align*}
Every term in the last sum is nonnegative, so $s_n\le s_m$. Also,
\begin{align*}
s_n\le \sum_{j=0}^{\infty}2^{-2j-2}.
\end{align*}
Since $2^{-2j-2}=2^{-2}(2^{-2})^j=\frac14\left(\frac14\right)^j$, the geometric sum is
\begin{align*}
\sum_{j=0}^{\infty}2^{-2j-2}=\frac14\cdot\frac{1}{1-\frac14}=\frac14\cdot\frac43=\frac13.
\end{align*}
Thus $(s_n)$ is increasing and bounded above by $\frac13$.
Its supremum is
\begin{align*}
\alpha=\sum_{j\in K}2^{-2j-2}.
\end{align*}
If $\alpha$ were computable with a modulus of approximation, then membership in $K$ would be decidable. Indeed, fix $j$. The possible contribution of the $j$th coded digit is
\begin{align*}
2^{-2j-2}.
\end{align*}
The total contribution of all later coded positions is
\begin{align*}
\sum_{i=j+1}^{\infty}2^{-2i-2}
=2^{-2j-4}\sum_{r=0}^{\infty}4^{-r}
=2^{-2j-4}\cdot\frac43
=\frac13\,2^{-2j-2}.
\end{align*}
So the gap between “$j\in K$” and “$j\notin K$” is larger than the whole possible tail after position $j$. A sufficiently accurate approximation to $\alpha$, together with the finite enumeration information below position $j$, would therefore decide whether the coefficient of $2^{-2j-2}$ is present. That would decide $K$, contradicting the choice of $K$.
Hence this increasing bounded rational sequence has no computable supremum and no modulus of convergence to its supremum. The example shows exactly what fails constructively: monotonicity and boundedness give order control, but they do not by themselves give located limit data.
[/example]
The chapter's main lesson is that constructive elementary analysis is not a weakened copy of classical analysis. It records the computational content already hidden in many classical proofs: rates of convergence, moduli of continuity, finite nets, located intervals, and positive order information. These extra data make theorems suitable for extraction into algorithms while preserving the familiar structure of real analysis.
Chapter 8 rebuilt elementary analysis around constructive real numbers and the data hidden inside classical arguments. The next chapter extends that viewpoint into Bishop-style analysis, where the familiar theorems of calculus and analysis are recovered in a form that remains computationally meaningful.
# 9. Bishop-Style Constructive Analysis
Bishop-style constructive analysis asks how much of ordinary analysis survives when existence claims must carry usable information. The answer is not a rejection of classical analysis, but a disciplined reformulation of it: metric estimates, moduli, Cauchy data, and approximation procedures replace appeals to completed totalities. The chapter assumes the metric spaces and Cauchy completions of Chapter 7 and the constructive real analysis of Chapter 8; it then revisits normed spaces, elementary integration, and convergence with explicit computational data. This chapter follows Bishop's program from its philosophical aim into normed spaces and integration, ending with convergence theorems whose hypotheses contain enough data to compute approximations. Along the way it sits close to computable analysis, type-theoretic proof assistants, and proof mining: all three ask which numerical witnesses a proof contains and how those witnesses can be extracted.
## Bishop's Program and Interpretations
The first question is how a single proof can be meaningful in several foundational settings at once. Bishop's strategy is to write proofs that are acceptable under classical mathematics, recursive mathematics, and intuitionistic mathematics by avoiding principles whose computational content is absent.
[explanation: Bishop's Compatibility Aim]
A Bishop-style theorem is formulated so that its proof can be read classically as an ordinary proof, recursively as an algorithm with correctness proof, and intuitionistically as a construction of the asserted object. This does not mean that every classical theorem is preserved word-for-word. Instead, hypotheses and conclusions are sharpened until the statement describes what data are available and what data are produced.
For instance, a classical statement saying that every pointwise convergent dominated sequence of integrable functions has an integrable limit is too coarse constructively. A Bishop-style version asks for convergence data and an integrable majorant, then concludes $L^1$ convergence with a quantitative bound.
[/explanation]
Constructive analysis therefore treats information that is often implicit in classical proofs as part of the mathematical object. The preceding discussion shows why bare convergence is not enough for later computation: a proof must also say when a desired precision has been reached. We need to record the central kind of numerical information used throughout the chapter.
[definition: Modulus Of Convergence]
Let $(X,d)$ be a metric space and let $(x_n)_{n\ge 1}$ be a sequence in $X$. A modulus of convergence for $(x_n)$ is a function $N:\mathbb N\to\mathbb N$ together with a point $x\in X$ such that for every $k\in\mathbb N$ and every $n\ge N(k)$,
\begin{align*}
d(x_n,x) < 2^{-k}.
\end{align*}
[/definition]
The definition turns convergence into a promise about how far along the sequence one must go to obtain a requested accuracy. In constructive arguments, such a modulus is often the object extracted from the proof rather than an afterthought.
[example: Geometric Error Data]
Let $0<q<1$, let $C>0$, and suppose $d(x_n,x)\le Cq^n$ for every $n$. For each $k\in\mathbb N$, define
\begin{align*}
N(k)=\max\left\{1,\left\lfloor \frac{k\log 2+\log C}{-\log q}\right\rfloor+1\right\}.
\end{align*}
Since $0<q<1$, we have $-\log q>0$, and the definition gives
\begin{align*}
N(k)>\frac{k\log 2+\log C}{-\log q}.
\end{align*}
Multiplying by $-\log q$ gives
\begin{align*}
-N(k)\log q>k\log 2+\log C.
\end{align*}
Equivalently,
\begin{align*}
\log C+N(k)\log q<-k\log 2.
\end{align*}
Therefore
\begin{align*}
\log\left(Cq^{N(k)}\right)<\log\left(2^{-k}\right),
\end{align*}
so $Cq^{N(k)}<2^{-k}$. If $n\ge N(k)$, then
\begin{align*}
q^n=q^{N(k)}q^{n-N(k)}.
\end{align*}
Because $n-N(k)\ge0$ and $0<q<1$, we have $q^{n-N(k)}\le1$, hence $q^n\le q^{N(k)}$. Thus
\begin{align*}
d(x_n,x)\le Cq^n\le Cq^{N(k)}<2^{-k}.
\end{align*}
So this $N$ is a modulus of convergence for $(x_n)$ to $x$: the geometric error estimate gives an explicit index after which the sequence is within any requested accuracy.
[/example]
The example converts a tail estimate into convergence data for one sequence. To use functions in analysis, however, we need data that work uniformly over many inputs rather than only along a selected sequence. This leads from moduli of convergence to moduli of uniform continuity.
[definition: Modulus Of Uniform Continuity]
Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces, and let $f:X\to Y$ be a function. A modulus of uniform continuity for $f$ is a function $\omega:\mathbb N\to\mathbb N$ such that for every $k\in\mathbb N$ and all $x,y\in X$,
\begin{align*}
d_X(x,y)<2^{-\omega(k)} \implies d_Y(f(x),f(y))<2^{-k}.
\end{align*}
[/definition]
A modulus of uniform continuity is stronger than pointwise continuity data because it works uniformly over the whole domain. The remaining question is where such uniform finite control comes from when a space has infinitely many points. Bishop-style compactness answers this by replacing arbitrary open-cover arguments with explicit finite nets.
[quotetheorem:7527]
[proofunderconstruction:7527]
This theorem is the constructive replacement for several classical compactness arguments. Explicit total boundedness is needed because the proof computes with a finite list at a prescribed scale; without such nets there is no finite search space on which to take a maximum. Completeness is needed because finite approximations may produce Cauchy data whose limit otherwise lies outside the space, as happens for Cauchy rational approximations to an irrational real inside $\mathbb Q$. The modulus of uniform continuity is also essential: a merely pointwise continuous function gives no single mesh size that controls all points of $X$, so evaluating it on a finite net need not approximate its range. The sequence version deliberately includes selection data, because choosing an infinite branch through repeated finite partitions is precisely the step that is not automatically constructive. The theorem therefore does not recover arbitrary open-cover compactness, nor does it assert that every sequence has a subsequence extractable without branch-selection data; it gives the finite-net and selection-data form of compactness that supports computation.
[example: Uniform Continuity On The Unit Interval]
Let $f:[0,1]\to\mathbb R$ have modulus of uniform continuity $\omega$, and fix a requested accuracy $2^{-k}$. Choose $m\in\mathbb N$ such that
\begin{align*}
\frac{1}{m}<2^{-\omega(k+1)}.
\end{align*}
Set $t_j=j/m$ for $0\le j\le m$. If $x<1$, choose $j=\lfloor mx\rfloor$, so $j\le mx<j+1$ and therefore
\begin{align*}
0\le x-\frac{j}{m}<\frac{1}{m}.
\end{align*}
Thus $|x-t_j|<2^{-\omega(k+1)}$. If $x=1$, choose $j=m$, and then $|x-t_j|=0<2^{-\omega(k+1)}$. In both cases, the definition of the modulus gives
\begin{align*}
|f(x)-f(t_j)|<2^{-(k+1)}.
\end{align*}
Since $2^{-(k+1)}<2^{-k}$, every value $f(x)$ lies within $2^{-k}$ of one of the finitely many values
\begin{align*}
f(t_0),f(t_1),\ldots,f(t_m).
\end{align*}
Equivalently, if $M=\max_{0\le j\le m} f(t_j)$, then for every $x\in[0,1]$ we have $f(x)<M+2^{-(k+1)}$, while $M=f(t_j)$ for some grid point and hence is an actual value of $f$. The finite grid therefore supplies an explicit finite approximation to the range of $f$, showing that compactness is being used through finite nets rather than through a non-effective cover argument.
[/example]
## Normed Spaces and Fixed Point Arguments
The next problem is to rebuild functional analysis while preserving computational meaning. Normed and Banach spaces remain central, but existence theorems must be examined for the data they require and the data they return.
[definition: Normed Space]
A normed space is a [vector space](/page/Vector%20Space) $X$ over $\mathbb R$ or $\mathbb C$ together with a function $\|\cdot\|_X:X\to\mathbb R$ such that for all $x,y\in X$ and all scalars $\lambda$, the following conditions hold: $\|x\|_X=0 \iff x=0$, $\|\lambda x\|_X=|\lambda|\,\|x\|_X$, and $\|x+y\|_X\le \|x\|_X+\|y\|_X$.
[/definition]
The norm supplies the metric $d(x,y)=\|x-y\|_X$, so constructive questions about approximation can be asked inside vector spaces. Normed spaces alone do not guarantee that controlled limiting processes stay inside the space. To solve equations by iteration, we need to strengthen a normed space by adding an operation that turns Cauchy data into limits.
[definition: Banach Space]
A [Banach space](/page/Banach%20Space) is a normed space $X$ such that every Cauchy sequence in $X$ equipped with a modulus of Cauchy convergence has a limit in $X$ equipped with a modulus of convergence.
[/definition]
This formulation makes completeness into an operation on data: a controlled Cauchy process produces a controlled limit. The most important use of this operation is to prove existence by constructing successive approximations. The Banach contraction argument is the model case because every estimate in the proof is quantitative.
[quotetheorem:270]
[citeproof:270]
The theorem is constructive because the proof tells us how many Picard iterates are needed for any target precision. The strict inequality $q<1$ is essential: the translation $T(x)=x+1$ on $\mathbb R$ is non-expansive up to the wrong constant and has no fixed point, while $T(x)=x$ has every point fixed and gives no uniqueness. Completeness is also essential, since a contraction on an incomplete subspace can generate a Cauchy Picard sequence whose limit belongs only to the completion. Knowing the numerical value of $q$ matters because the modulus is obtained by solving $q^n(1-q)^{-1}d(x_1,x_0)<2^{-k}$; without a usable bound away from $1$, the proof gives existence without an iteration count. This is also the form used in numerical analysis and differential equations, where a theorem is useful only when it can be turned into a stopping rule.
[example: Picard Iteration With Explicit Rate]
Let $X=C([0,1])$ with the sup norm, and define
\begin{align*}
(Tu)(t)=a+\int_0^t F(s,u(s))\,ds.
\end{align*}
Assume $F$ is Lipschitz in its second variable with constant $0\le L<1$ on the relevant range. For $u,v\in X$ and $t\in[0,1]$,
\begin{align*}
|(Tu)(t)-(Tv)(t)|=\left|\int_0^t\bigl(F(s,u(s))-F(s,v(s))\bigr)\,ds\right|.
\end{align*}
By the triangle inequality for the integral and the Lipschitz bound,
\begin{align*}
|(Tu)(t)-(Tv)(t)|\le \int_0^t L|u(s)-v(s)|\,ds.
\end{align*}
Since $|u(s)-v(s)|\le \|u-v\|_\infty$ for every $s\in[0,1]$,
\begin{align*}
|(Tu)(t)-(Tv)(t)|\le Lt\|u-v\|_\infty.
\end{align*}
Because $0\le t\le1$,
\begin{align*}
|(Tu)(t)-(Tv)(t)|\le L\|u-v\|_\infty.
\end{align*}
Taking the supremum over $t\in[0,1]$ gives
\begin{align*}
\|Tu-Tv\|_\infty\le L\|u-v\|_\infty.
\end{align*}
Thus $T$ is a contraction, so the Picard sequence $u_{n+1}=T(u_n)$ converges to the unique fixed point by the *Constructive Banach Contraction Theorem*.
Let $D=\|u_1-u_0\|_\infty$. The contraction estimate gives
\begin{align*}
\|u_{j+1}-u_j\|_\infty=\|Tu_j-Tu_{j-1}\|_\infty\le L\|u_j-u_{j-1}\|_\infty
\end{align*}
for $j\ge1$. Starting from $\|u_1-u_0\|_\infty=D$, induction gives
\begin{align*}
\|u_{j+1}-u_j\|_\infty\le L^jD.
\end{align*}
If $m>n$, then
\begin{align*}
u_m-u_n=(u_m-u_{m-1})+(u_{m-1}-u_{m-2})+\cdots+(u_{n+1}-u_n).
\end{align*}
The triangle inequality therefore gives
\begin{align*}
\|u_m-u_n\|_\infty\le \sum_{j=n}^{m-1}\|u_{j+1}-u_j\|_\infty.
\end{align*}
Using the increment bound,
\begin{align*}
\|u_m-u_n\|_\infty\le D\sum_{j=n}^{m-1}L^j.
\end{align*}
For $0<L<1$,
\begin{align*}
\sum_{j=n}^{m-1}L^j=L^n\frac{1-L^{m-n}}{1-L}.
\end{align*}
Since $0\le L^{m-n}\le1$,
\begin{align*}
\|u_m-u_n\|_\infty\le \frac{L^n}{1-L}D.
\end{align*}
Choose $c\in\mathbb N$ with $D/(1-L)\le 2^c$. Then it is enough to choose $n$ so that $L^n2^c<2^{-k}$, equivalently
\begin{align*}
n>\frac{(k+c)\log 2}{-\log L}.
\end{align*}
For $0<L<1$, the function
\begin{align*}
N(k)=\left\lfloor \frac{(k+c)\log 2}{-\log L}\right\rfloor+1
\end{align*}
therefore satisfies $\|u_m-u_n\|_\infty<2^{-k}$ whenever $m>n\ge N(k)$. If $L=0$, the same conclusion holds with $N(k)=1$, since then $\|u_{j+1}-u_j\|_\infty=0$ for every $j\ge1$. This turns the Picard iteration into an explicit stopping rule for approximating the fixed point.
[/example]
The contraction theorem handles existence by iteration, but functional analysis also needs extension principles. Some classical uses of Hahn-Banach hide a choice of real numbers from non-located sets or intervals. A Bishop-style version must identify the hypotheses under which the extension values can be selected with effective control.
[quotetheorem:7528]
[citeproof:7528]
This version records why constructive Hahn-Banach arguments often carry extra hypotheses. Separability and the dense sequence turn the extension problem into a countable construction; without them, there is no prescribed order in which the extension values are to be computed. The known bound $M$ is needed because each admissible interval is defined by the inequality $|F(e)|\le M\|e\|_X$; without a numerical bound, the extension step has no effective constraint to preserve. Locatedness is the genuinely constructive hypothesis: an interval may be classically non-empty without there being a procedure for selecting a point from it to a requested accuracy. Concretely, when extending $F_n$ from $E_n$ to $E_n+\operatorname{span}(z)$, all possible values of $F_{n+1}(z)$ form the interval $I_n$ cut out by the inequalities above. If the available information only proves that this interval is inhabited, but does not locate it, then a classical proof may assert that some scalar $a$ defines a bounded extension while a constructive proof cannot compute even a rational approximation to $F_{n+1}(z)$. Thus the obstruction is part of the normed-space extension step itself: a bounded extension requires a selected scalar from an admissible interval whose non-emptiness must be computationally witnessed.
[remark: Variants Of Hahn-Banach]
Different constructive schools accept different Hahn-Banach principles. Bishop-style mathematics proves useful separable or located versions, while stronger unrestricted forms may imply non-constructive principles. When using Hahn-Banach constructively, the statement must specify the data that make the extension step effective.
[/remark]
## Integration and Measure in a Bishop Framework
The last problem in this chapter is integration. Classical measure theory often identifies functions up to a.e. equality and proves convergence by extracting subsequences or using non-effective null-set arguments. Bishop-style integration begins with approximable objects and defines convergence through seminorm estimates.
[definition: Simple Function]
Let $(E,\mathcal E,\mu)$ be a [measure space](/page/Measure%20Space). A [simple function](/page/Simple%20Function) is a function $s:E\to\mathbb R$ of the form
\begin{align*}
s=\sum_{i=1}^n a_i\mathbb{1}_{A_i},
\end{align*}
where $a_i\in\mathbb R$ and $A_i\in\mathcal E$ for $1\le i\le n$.
[/definition]
Simple functions are the finite objects from which the integral is built. Once functions are approximated by finite data, we need a way to measure the error of approximation that treats negligible differences as zero. This motivates the $L^1$ seminorm.
[definition: L One Seminorm]
Let $(E,\mathcal E,\mu)$ be a measure space, and let $\mathcal L^1(E)$ denote the real-valued integrable functions on $E$. The $L^1$ seminorm is the map $\|\cdot\|_{L^1}:\mathcal L^1(E)\to\mathbb R$ defined by
\begin{align*}
\|f\|_{L^1}:=\int_E |f|\,d\mu.
\end{align*}
[/definition]
It is a seminorm on raw functions because functions that differ only on a null set have distance zero. Passing to equivalence classes turns it into a norm, but Bishop-style arguments can often work directly with the seminorm and explicit estimates. To prove convergence in this seminorm, we need hypotheses that control the size of all functions in the sequence by a single integrable object.
[definition: Integrable Majorant]
Let $(E,\mathcal E,\mu)$ be a measure space and let $(f_n)$ be a sequence of real-valued [measurable functions](/page/Measurable%20Functions) on $E$. An integrable majorant for $(f_n)$ is an integrable function $g:E\to[0,\infty)$ such that
\begin{align*}
|f_n(x)|\le g(x)
\end{align*}
for every $n\in\mathbb N$ and every $x\in E$ under consideration.
[/definition]
The majorant controls the possible mass of the sequence, while convergence data control where the functions are close. Dominated convergence becomes constructive only when these two controls can be combined quantitatively. The theorem below is the Bishop-style form used in this course.
[quotetheorem:7529]
[citeproof:7529]
The theorem mirrors the classical [dominated convergence theorem](/theorems/4), but the hypotheses reveal the hidden quantitative choices: how the majorant is approximated, where convergence is uniform enough, and how exceptional sets are controlled. Domination is necessary because pointwise convergence alone does not control mass: on $[0,1]$, the functions $f_n=n\mathbb{1}_{(0,1/n)}$ converge pointwise to $0$ but have $\|f_n\|_{L^1}=1$. Integrability of the majorant is necessary because a non-integrable bound gives no small tail estimate to spend in the proof. Effective convergence data are necessary because knowing only that each point eventually settles does not tell us a single index that works on a finite-measure region. Exceptional-set control is the bridge between pointwise and integral information: without a bound on the $g$-weighted size of the bad set, a small set of slow convergence can still carry significant integral mass.
[example: Quantitative Domination In L One]
Let $E=[0,1]$ with [Lebesgue measure](/page/Lebesgue%20Measure), let $g\in L^1(E)$ be non-negative, and suppose $(a_n)$ is a non-negative sequence converging to $0$ with modulus $N$ such that $|f_n(x)-f(x)|\le a_n g(x)$ for every $x\in E$. For each $n$, monotonicity of the integral gives
\begin{align*}
\|f_n-f\|_{L^1}=\int_E |f_n-f|\,d\mu\le \int_E a_n g\,d\mu.
\end{align*}
Since $a_n$ is a constant and $a_n\ge0$, homogeneity of the integral gives
\begin{align*}
\int_E a_n g\,d\mu=a_n\int_E g\,d\mu.
\end{align*}
Because $g\ge0$, the definition of the $L^1$ seminorm gives
\begin{align*}
\int_E g\,d\mu=\|g\|_{L^1}.
\end{align*}
Thus
\begin{align*}
\|f_n-f\|_{L^1}\le a_n\|g\|_{L^1}.
\end{align*}
Choose $c\in\mathbb N$ with $\|g\|_{L^1}\le 2^c$. Given $k\in\mathbb N$, let $n\ge N(k+c)$. Since $N$ is a modulus for $a_n\to0$, we have
\begin{align*}
a_n<2^{-(k+c)}.
\end{align*}
Multiplying by the non-negative number $\|g\|_{L^1}$ and using $\|g\|_{L^1}\le 2^c$ gives
\begin{align*}
a_n\|g\|_{L^1}<2^{-(k+c)}2^c.
\end{align*}
The powers of $2$ combine as
\begin{align*}
2^{-(k+c)}2^c=2^{-k-c+c}=2^{-k}.
\end{align*}
Therefore
\begin{align*}
\|f_n-f\|_{L^1}<2^{-k}.
\end{align*}
So $k\mapsto N(k+c)$ is an explicit modulus for the $L^1$ convergence of $f_n$ to $f$.
[/example]
A second useful pattern is convergence by approximation with simple functions. This is the form most aligned with Bishop's construction of the integral.
[example: Simple Function Approximation And L One Convergence]
Suppose $f_n,f\in L^1(E)$. For each $k\in\mathbb N$, assume we have simple functions $s_{n,k}$ and $s_k$ with
\begin{align*}
\|f_n-s_{n,k}\|_{L^1}<2^{-(k+2)}
\end{align*}
and
\begin{align*}
\|f-s_k\|_{L^1}<2^{-(k+2)}.
\end{align*}
Assume also that, for all $n\ge N(k)$,
\begin{align*}
\|s_{n,k}-s_k\|_{L^1}<2^{-(k+1)}.
\end{align*}
Fix $k$ and let $n\ge N(k)$. Since
\begin{align*}
f_n-f=(f_n-s_{n,k})+(s_{n,k}-s_k)+(s_k-f),
\end{align*}
the triangle inequality for the $L^1$ seminorm gives
\begin{align*}
\|f_n-f\|_{L^1}\le \|f_n-s_{n,k}\|_{L^1}+\|s_{n,k}-s_k\|_{L^1}+\|s_k-f\|_{L^1}.
\end{align*}
Because $\|s_k-f\|_{L^1}=\|f-s_k\|_{L^1}$, the three assumed bounds imply
\begin{align*}
\|f_n-f\|_{L^1}<2^{-(k+2)}+2^{-(k+1)}+2^{-(k+2)}.
\end{align*}
The first and third terms combine as
\begin{align*}
2^{-(k+2)}+2^{-(k+2)}=2\cdot 2^{-(k+2)}=2^{-(k+1)}.
\end{align*}
Therefore
\begin{align*}
2^{-(k+2)}+2^{-(k+1)}+2^{-(k+2)}=2^{-(k+1)}+2^{-(k+1)}=2\cdot 2^{-(k+1)}=2^{-k}.
\end{align*}
Thus
\begin{align*}
\|f_n-f\|_{L^1}<2^{-k}.
\end{align*}
So the same index function $N$ is a modulus for $L^1$ convergence: convergence of the finite simple-function approximants lifts to convergence of the original integrable functions.
[/example]
Bishop-style analysis therefore preserves the familiar architecture of analysis: compactness, completeness, fixed points, linear functionals, and integration. What changes is the bookkeeping of information. A theorem is strongest constructively when it states not only that an object exists, but also what approximation data the proof constructs and how later arguments may use that data.
Chapter 9 showed that much of ordinary analysis survives constructively once existence is accompanied by approximation data and moduli. The next chapter then turns to the principles that support such arguments, examining choice, fan theorems, and compactness as constructive tools rather than background assumptions.
# 10. Choice Principles, Fan Theorems, and Compactness
This chapter studies the constructive status of principles that classical mathematics often treats as background set theory. The main prerequisites are the constructive real numbers of Chapter 8, the metric completions of Chapter 7, the compactness examples from Chapters 8 and 9, and the logic of existence statements from Chapters 1-4. The main issue is that choosing witnesses is computational content: a proof of $\forall n\,\exists x\,P(n,x)$ does not by itself supply a single function $n \mapsto x_n$ unless a choice principle is available. We compare controlled forms of choice with fan-theoretic compactness principles, then use them to understand why compactness has several inequivalent constructive formulations.
## Controlled Choice Principles
The first problem is how to turn many local existence statements into one coherent construction. Constructive mathematics accepts choice principles when their computational meaning is explicit enough: a countable list of problems may be solved by a countable list of witnesses, and a recursive construction may proceed when each next step is available from the current state. The strength of such principles lies in the fact that they are not logical tautologies; they add uniformity to existence proofs.
[definition: Countable Choice]
Countable choice is the principle that for every family of inhabited sets $(A_n)_{n \in \mathbb N}$, there exists a section $a$ of the indexed family such that $a(n) \in A_n$ for every $n \in \mathbb N$.
[/definition]
This principle packages countably many choices into a single object. In Bishop-style analysis it is often used when a construction has already produced, for each precision $n$, some approximation or witness, but has not supplied a named global sequence. The next example motivates this use by showing how real-number representations can require a chosen approximation at every precision.
[example: Choosing Cauchy Approximations]
Suppose $r$ is given by a located cut, and suppose that for every $n \in \mathbb N$ the set
\begin{align*}
A_n=\{q\in \mathbb Q: |q-r|<2^{-n}\}
\end{align*}
is inhabited. Countable choice applied to the family $(A_n)_{n\in\mathbb N}$ gives a function $q:\mathbb N\to\mathbb Q$ with $q(n)\in A_n$ for every $n$, so writing $q_n=q(n)$ gives
\begin{align*}
|q_n-r|<2^{-n}.
\end{align*}
This chosen rational sequence is Cauchy. Indeed, if $m,n\ge N$, then the triangle inequality gives
\begin{align*}
|q_m-q_n|\le |q_m-r|+|r-q_n|.
\end{align*}
Using the defining bounds for $q_m$ and $q_n$,
\begin{align*}
|q_m-r|+|r-q_n|<2^{-m}+2^{-n}.
\end{align*}
Since $m,n\ge N$, we have $2^{-m}\le 2^{-N}$ and $2^{-n}\le 2^{-N}$, hence
\begin{align*}
2^{-m}+2^{-n}\le 2^{-N}+2^{-N}=2^{1-N}.
\end{align*}
Thus $|q_m-q_n|<2^{1-N}$ whenever $m,n\ge N$, and $2^{1-N}$ tends to $0$ as $N$ increases. Countable choice is exactly the step that turns the separate approximants supplied by locatedness into one Cauchy sequence representing the same real number.
[/example]
The example illustrates independent choices indexed by precision. Many constructions are not independent in that way: the subsequent object is constrained by the one already chosen, so we need to isolate the principle that supports iterated successor choices.
[definition: Dependent Choice]
Dependent choice is the principle that if $A$ is an inhabited set and $R \subset A \times A$ satisfies $\forall x \in A\,\exists y \in A\,R(x,y)$, then for every $a_0 \in A$ there exists a sequence $(a_n)_{n \in \mathbb N}$ in $A$ such that $a_0$ is its initial term and $R(a_n,a_{n+1})$ holds for every $n \in \mathbb N$.
[/definition]
Dependent choice is the constructive form of repeatedly applying a one-step extension lemma. This leads to a precise theorem, which records the exact reusable pattern needed later for metric-space approximation arguments.
Applied to a successor relation, dependent choice produces a sequence $(a_n)_{n\in\mathbb N}$ by repeatedly choosing $a_{n+1}$ with $R(a_n,a_{n+1})$. The principle supplies the path through the one-step relation; it does not by itself supply any metric estimate or limiting argument.
The inhabitedness of $A$ is needed because dependent choice starts from an actual state, not merely from a possible ambient space. The totality hypothesis on $R$ is the non-negotiable one-step extension condition; if even one current approximation has no successor, the recursive process can stop. The principle does not assert convergence, boundedness, or uniqueness of the resulting sequence, so a relation that always permits wildly different successors may produce many incompatible paths. This boundary case is exactly why later metric arguments must add estimates after the sequence has been constructed.
This separates sequence production from convergence analysis. The next example motivates this separation in a familiar analytic setting, where the construction gives approximate successors and the metric estimates later prove convergence.
[example: Picard Iterates With Approximate Choices]
Let $T$ have contraction constant $\lambda$ with $0\le \lambda<1$, so $d(T(u),T(v))\le \lambda d(u,v)$ for all $u,v\in X$. Fix $x_0\in X$, and use the state space $X\times \mathbb N$. From a state $(x,n)$ choose a successor $(y,n+1)$ with $d(y,T(x))<2^{-(n+1)}$; dependent choice applied to this successor relation gives a sequence $(x_n)_{n\in\mathbb N}$ such that
\begin{align*}
d(x_{n+1},T(x_n))<2^{-(n+1)}
\end{align*}
for every $n\in\mathbb N$.
We show that the resulting approximate Picard sequence is Cauchy. Put $a_n=d(x_{n+1},x_n)$. For $n\ge 1$, the triangle inequality, the contraction bound, and the two approximation bounds give
\begin{align*}
a_n\le 2^{-(n+1)}+\lambda a_{n-1}+2^{-n}.
\end{align*}
Since $2^{-(n+1)}+2^{-n}=3\cdot 2^{-(n+1)}$, this becomes
\begin{align*}
a_n\le \lambda a_{n-1}+3\cdot 2^{-(n+1)}.
\end{align*}
Iterating the inequality $k$ times gives
\begin{align*}
a_n\le \lambda^n a_0+3\sum_{j=1}^{n}\lambda^{n-j}2^{-(j+1)}.
\end{align*}
The right-hand side tends to $0$: the term $\lambda^n a_0$ tends to $0$ because $\lambda<1$, and the sum is a convolution of the summable geometric sequences $(\lambda^n)$ and $(2^{-n})$.
For $m>n$, repeated use of the triangle inequality gives
\begin{align*}
d(x_m,x_n)\le a_n+a_{n+1}+\cdots+a_{m-1}.
\end{align*}
The displayed estimate on $a_i$ bounds this tail by a tail of a convergent geometric-convolution series, so the right-hand side can be made arbitrarily small by taking $n$ large. Thus $(x_n)$ is Cauchy, and completeness of $X$ supplies a limit. The choice principle constructs the infinite chain of approximate successors; the metric estimates are the separate argument proving that this chain converges.
[/example]
Dependent choice handles genuine branching, while the Picard example also suggests a special case with no ambiguity. This motivates unique choice: when the witness is unique, the extracted function carries less additional choice content.
[definition: Unique Choice]
Unique choice is the principle that if $\forall x \in A\,\exists! y \in B\,P(x,y)$, then there exists a function $f: A \to B$ such that $P(x,f(x))$ for every $x \in A$.
[/definition]
Unique choice is weaker in computational demand than arbitrary choice. Its role is to extract a function from a single-valued relation whose totality has already been proved. The discussion returns to countable choice and motivates why even standard equivalences between real-number presentations can depend on such extraction.
[quotetheorem:7523]
[citeproof:7523]
Locatedness is essential in the first direction because it supplies, for each rational interval of prescribed size, evidence that the cut lies on one side or the other; without it there need not be enough information to choose rational approximants. A standard boundary example is a Dedekind-style cut whose membership is defined using an undecided proposition $P$, for instance by placing the cut at $0$ if $P$ and at $1$ if $\neg P$: without locatedness, asking for a rational approximation within $1/3$ would decide $P$. Countable choice is used only after the separate approximants have been proved to exist, and it is not replaced by uniqueness because many rationals may satisfy the same error bound. In models where countable choice fails, the family of inhabited approximation sets can exist without a single global selector, so the Cauchy sequence need not be constructible from the separate existence proofs. In the reverse direction, the Cauchy modulus is the data that makes the lower and upper cuts located: to compare a rational $q$ with the represented limit, choose a sufficiently late rational approximation and use the modulus to control the remaining tail. The composite maps are checked by showing that, at every precision $2^{-n}$, the reconstructed presentation gives approximations within the same shrinking tolerance, which is the constructive equality relation for these real-number presentations.
This theorem is a useful warning: equivalences between standard classical definitions can hide a choice step. Constructively, the issue is not whether the two presentations describe the same intended continuum, but whether the passage between their data is itself a construction available in the theory.
## Bars and Brouwer's Fan Theorem
The second problem is how to express compactness of infinite binary choice without invoking classical open covers. Cantor space consists of all infinite binary sequences, and finite binary strings represent finite information about such a sequence. A bar is a set of finite strings that every infinite path eventually meets.
[definition: Binary Tree And Cantor Space]
The full binary tree is $2^{<\mathbb N}$, the set of all finite sequences with entries in $\{0,1\}$. Cantor space is $2^{\mathbb N}$, the set of all functions $\alpha: \mathbb N \to \{0,1\}$. For $s \in 2^{<\mathbb N}$ and $\alpha \in 2^{\mathbb N}$, write $s \prec \alpha$ when $s$ is an initial segment of $\alpha$.
[/definition]
Finite strings are the basic observations one can make about an infinite binary sequence. This motivates the notion of a bar, which records a finite observation that every infinite path must eventually satisfy.
[definition: Bar]
A subset $B \subset 2^{<\mathbb N}$ is a bar on Cantor space if for every $\alpha \in 2^{\mathbb N}$ there exists $n \in \mathbb N$ such that $\alpha|_n \in B$.
[/definition]
A bar may meet different paths at different depths. The compactness question is whether these path-dependent depths can be bounded at once, which motivates a precise definition.
[definition: Uniform Bar]
A bar $B \subset 2^{<\mathbb N}$ is uniform if there exists $N \in \mathbb N$ such that for every $\alpha \in 2^{\mathbb N}$ there exists $n \le N$ with $\alpha|_n \in B$.
[/definition]
Uniformity is the compactness content of the binary fan. This slogan needs to become precise in the assertion that decidable pathwise bars on the full fan have a common finite bound.
[quotetheorem:7530]
[proofunderconstruction:7530]
The decidability hypothesis is needed because an arbitrary subset of finite strings may be extensionally a bar without providing a test for membership during a finite search. The full binary branching hypothesis is also doing work: for non-finitely branching trees, pathwise eventual entry need not give a uniform finite depth. The theorem does not say that every pointwise assertion over Cantor space is uniformly bounded; it applies only after the assertion has been encoded as a decidable bar. This restriction is the bridge to topology and computability, where compactness is meaningful only when local information can be detected from finite data.
The fan theorem is not merely a statement about trees. It becomes an analytic compactness principle when functions on Cantor space are read as depending on finite information, so we next make the two relevant continuity notions explicit.
[definition: Pointwise Continuous Map On Cantor Space]
A function $f:2^{\mathbb N}\to \mathbb N$ is pointwise continuous if for every $\alpha \in 2^{\mathbb N}$ there exists $n \in \mathbb N$ such that whenever $\beta \in 2^{\mathbb N}$ satisfies $\beta|_n=\alpha|_n$, then $f(\beta)=f(\alpha)$.
[/definition]
Pointwise continuity gives a modulus depending on the input sequence. Uniform continuity asks for a single modulus working for every input sequence, which is the analytic analogue of a uniform bar bound.
[definition: Uniformly Continuous Map On Cantor Space]
A function $f:2^{\mathbb N}\to \mathbb N$ is uniformly continuous if there exists $N \in \mathbb N$ such that whenever $\alpha,\beta \in 2^{\mathbb N}$ satisfy $\alpha|_N=\beta|_N$, then $f(\alpha)=f(\beta)$.
[/definition]
The bridge from the fan theorem to uniform continuity is to convert local moduli into a bar. This leads to a precise theorem: once the bar has a uniform depth, every function value is determined by an initial segment of that length.
[quotetheorem:7531]
[citeproof:7531]
The decidable-bar hypothesis is the point at which continuity data becomes usable by the fan theorem. Pointwise continuity alone gives, for each path, some finite determining prefix, but it need not make the global set of determining prefixes decidable. The theorem therefore does not claim that all extensional pointwise continuous maps $2^{\mathbb N}\to\mathbb N$ are uniformly continuous in a weak base theory. It says that when the local moduli are organised into a detachable bar, fan compactness turns them into one global modulus.
The theorem explains why finite prefixes are enough uniformly, but a concrete bar helps fix the picture. The next example gives a finite bar where the bound can be read directly.
[example: A Bar With A Uniform Bound]
Let $B \subset 2^{<\mathbb N}$ consist of two kinds of finite binary strings: strings whose first, second, or third entry is $1$, and all strings of length $3$. We show that $B$ is a uniform bar with bound $3$.
Fix an arbitrary $\alpha \in 2^{\mathbb N}$. If one of $\alpha(0),\alpha(1),\alpha(2)$ is equal to $1$, let $n$ be the least index in $\{1,2,3\}$ such that the prefix $\alpha|_n$ already contains that $1$. Then $\alpha|_n \in B$ and $n \le 3$. If none of $\alpha(0),\alpha(1),\alpha(2)$ is equal to $1$, then the prefix $\alpha|_3$ has length $3$, so $\alpha|_3 \in B$ by the second part of the definition of $B$. Thus in every case there exists $n \le 3$ with $\alpha|_n \in B$.
Hence $B$ is not merely a bar but a uniform bar, with uniform bound $N=3$. The two clauses in the definition of $B$ separate the two ways a finite search can stop: either the event “a $1$ has appeared” is discovered before depth $3$, or the fallback depth $3$ is reached.
[/example]
This example is finite enough to inspect directly, but the fan theorem matters when the bar is produced by a proof rather than by an explicit list of terminal strings. The uniform bound is what turns local dependence on finite information into a global continuity modulus.
## Constructive Compactness Principles
The third problem is that classical compactness has several equivalent formulations, while constructive mathematics separates them. Open-cover compactness, [sequential compactness](/page/Sequential%20Compactness), Bolzano-Weierstrass, total boundedness, and completeness carry different computational information. Comparing them reveals which principles require choice, fan-theoretic reasoning, or extra locatedness hypotheses.
[definition: Total Boundedness]
A metric space $(X,d)$ is totally bounded if for every $\varepsilon>0$ there exists a finite list $x_1,\dots,x_m \in X$ such that for every $x \in X$ there exists $i \in \{1,\dots,m\}$ with $d(x,x_i)<\varepsilon$.
[/definition]
Total boundedness gives finite approximation at each scale. It is often the most constructive compactness notion because it explicitly provides finite nets. We now move from finite approximation data to the stronger request for convergent subsequences.
[definition: Sequential Compactness]
A metric space $(X,d)$ is sequentially compact if every sequence $(x_n)_{n \in \mathbb N}$ in $X$ has a convergent subsequence with limit in $X$.
[/definition]
Sequential compactness involves choosing a subsequence, so its constructive strength depends on the available choice and on how the subsequence is found. This motivates the Bolzano-Weierstrass principle as the classical benchmark whose constructive status must be treated with care.
[quotetheorem:171]
Constructively, this principle is not a routine consequence of boundedness alone. It is often replaced by versions for sequences equipped with additional data, or derived under compactness principles strong enough to select nested intervals and subsequences. We turn to the open-cover formulation, where the finite-output requirement is visible in the statement itself.
The obstruction is that a bounded sequence may come with no computable rule for choosing a convergent subsequence. Classical proofs often use compactness or infinite pigeonhole reasoning to pass from infinitely many approximations to one coherent subsequence. Constructively, that passage requires extra data: finite nets with branch-selection information, a fan-theoretic compactness principle, or a choice principle strong enough to coordinate the selections across all scales.
This is why Bolzano-Weierstrass is useful here as a benchmark rather than as an automatic tool. When later arguments need a subsequence, the constructive proof must say how the subsequence is selected and what modulus makes it Cauchy. If the proof instead uses an open cover, the finite subcover must be computed from the cover data. These two forms of compactness agree classically in many settings, but constructively they expose different computational obligations.
[definition: Heine-Borel Compactness For Intervals]
The interval $[a,b] \subset \mathbb R$ is Heine-Borel compact if every open cover of $[a,b]$ has a finite subcover.
[/definition]
Open-cover compactness is powerful because it converts infinitely many local neighbourhoods into finitely many. Constructively, proving this for intervals often requires a fan-theoretic principle, since bisection of an interval behaves like traversal of a binary tree.
[quotetheorem:7532]
[citeproof:7532]
The decidability of the finite-cover bar is needed because the argument must recognise, at a finite stage of bisection, that a subinterval has already been covered by finitely many members of the cover. If the cover is given only by the negative statement that no point is outside it, the construction may have no finite test for membership in $B$, so the fan theorem cannot be applied to that bar. The repeated bisection hypothesis is the place where interval compactness is reduced to binary fan compactness; without a uniform bar bound, each point might be covered at some finite scale without yielding one finite scale for the whole interval. A typical counterpattern is a pointwise bar whose required prefix length depends on an undecided proposition or on an unbounded search; every path is eventually met, but no constructive proof supplies a common depth. The theorem does not cover arbitrary classically specified open covers with no constructive membership or local-radius data. It explains why constructive Heine-Borel results often state precise presentation conditions on covers before invoking a fan principle.
This proof sketch explains why the fan theorem is a compactness principle rather than an isolated tree axiom. It supplies the common depth needed to pass from pointwise local cover information to a finite global cover. This also raises a comparison this with the sequential route through finite nets and completeness.
[quotetheorem:7533]
[citeproof:7533]
Total boundedness alone is not enough here because it gives finite covers at each scale but does not identify which cell contains infinitely many selected sequence terms. Completeness is also used only at the end: before that point the construction has produced a Cauchy subsequence, not its limit. The theorem therefore does not assert that every constructively totally bounded complete space is sequentially compact in a weak base theory; it asserts this under the stated infinitary selection principle. This is the sequential analogue of the fan-theorem argument above, with finite nets replacing dyadic subintervals and infinite pigeonhole choices replacing uniform bar bounds.
The theorem explains the sequential compactness mechanism in terms of nets, choices, and limits. The final example applies the comparison to the unit interval, where finite nets are explicit but open-cover compactness still carries fan-theoretic content.
[example: Comparing Compactness On The Unit Interval]
Given $\varepsilon>0$, choose $m\in\mathbb N$ with $m\ge 1$ and $1/m<\varepsilon$. Consider the finite grid
\begin{align*}
G_m=\{0,1/m,2/m,\dots,m/m\}.
\end{align*}
For any $x\in[0,1]$, the partition intervals $[k/m,(k+1)/m]$ for $0\le k<m$ cover $[0,1]$, so choose $k<m$ with $k/m\le x\le (k+1)/m$. Then $(k+1)/m\in G_m$, and
\begin{align*}
|x-(k+1)/m|=(k+1)/m-x.
\end{align*}
Since $x\ge k/m$, subtracting $x$ from $(k+1)/m$ gives
\begin{align*}
(k+1)/m-x\le (k+1)/m-k/m.
\end{align*}
The right-hand side is
\begin{align*}
(k+1)/m-k/m=((k+1)-k)/m=1/m<\varepsilon.
\end{align*}
Thus every $x\in[0,1]$ lies within $\varepsilon$ of some point of the finite grid $G_m$, so $[0,1]$ is totally bounded.
Completeness is a different assertion: if a Cauchy sequence in $[0,1]$ converges as a real sequence to some limit $x$, then the order bounds are preserved because $0\le x_n\le 1$ for every $n$ and limits in the chosen constructive real presentation respect closed order bounds. Hence the limit still lies in $[0,1]$.
Heine-Borel compactness asks for more than these two metric facts. Given an open cover, one must produce finitely many members of the cover whose union contains all of $[0,1]$; the finite grids above only approximate points at a chosen scale, and completeness only turns Cauchy data into a limit. The extra constructive step is a uniform finite bound on how far a binary subdivision search must go before the cover is detected locally, which is precisely the fan-theoretic content used in constructive proofs of interval compactness.
[/example]
The comparison can be summarised as follows. Total boundedness provides finite approximations scale by scale; completeness turns Cauchy information into limits; sequential compactness asks for subsequences; Heine-Borel compactness asks for finite subcovers; the fan theorem supplies uniform finite bounds for binary branching searches. Classical mathematics identifies these under broad hypotheses, while constructive mathematics tracks the exact construction needed for each implication.
Chapter 10 clarified how compactness, choice, and branching principles behave constructively and why they matter for analysis. The next chapter steps back from the constructive setting to compare it with classical mathematics, separating results that transfer directly from those that require a change in formulation or proof method.
# 11. Comparison with Classical Mathematics
Classical mathematics is not a single opponent of constructive mathematics. Some classical theorems survive unchanged, some keep their mathematical content after a change of formulation, and some rely on principles that have no Bishop-constructive proof. This chapter develops a practical method for comparing the two settings: identify the logical form of a theorem, find where classical principles enter its proof, and decide whether the result should be retained, reformulated, or separated from the constructive theory.
Chapters 1-6 built constructive logic, arithmetic, and realizability, while Chapters 7-10 built constructive analysis, compactness, and choice principles internally. We now use those tools diagnostically. The aim is not to reject classical mathematics, but to understand which parts already carry computational information and which parts conceal nonconstructive choices.
## Sorting Classical Theorems by Constructive Content
The first problem is classification. Given a theorem from ordinary algebra or analysis, should we try to prove it as stated, replace it by a more informative version, or treat it as a genuinely classical principle? The answer depends less on the subject area than on the logical shape of the statement and the data demanded by its conclusion.
A useful first class consists of theorems whose proofs are already constructive after small edits. These tend to have explicit constructions, algorithms, or estimates in their classical proofs. We record the classification used throughout the comparison.
[definition: Constructive Status Of A Classical Theorem]
Let $T$ be a theorem stated in classical mathematics over a fixed constructive background theory. Its constructive status is one of the following three labels:
1. $T$ is constructively valid if the same statement has a proof in the constructive background theory.
2. $T$ is constructively reformulable if there is a constructive statement $T^\prime$ with similar intended mathematical use, and classical mathematics proves $T \iff T^\prime$.
3. $T$ is essentially nonconstructive relative to the background theory if $T$ implies a principle not accepted in that constructive theory.
[/definition]
This definition separates two tasks that are often conflated. A theorem may fail as written because it asks for the wrong kind of information, while a nearby formulation may preserve the part of the result that was actually used in applications.
[example: Three Forms Of Existence]
Let $f:[a,b]\to\mathbb{R}$ be continuous with $f(a)<0<f(b)$. The classical sentence “there is a zero” can mean different things depending on the information demanded from the conclusion. If it means $\exists x\in[a,b]\, f(x)=0$, then a constructive proof must give enough data to approximate such an $x$: for each requested precision, it should tell us how to produce an interval or rational approximation in which a zero is forced.
The weaker statement $\neg\neg\exists x\in[a,b]\, f(x)=0$ does not supply such an approximation procedure. It only says that assuming the nonexistence of a zero leads to contradiction, so it gives negative information about the impossibility of avoiding zeros, not a displayed zero or a converging sequence of approximations.
With extra data, the statement becomes algorithmic. Suppose, for instance, that $f$ comes with a modulus of uniform continuity and an approximate sign procedure. Starting from $[a,b]$, choose the midpoint $m=(a+b)/2$. If the approximate sign data certifies a sign change on $[a,m]$, keep $[a,m]$; otherwise keep $[m,b]$. After $n$ bisections the retained interval has length $(b-a)/2^n$, because each step replaces an interval of length $L$ by one of length $L/2$. Choosing $n$ with $(b-a)/2^n<\varepsilon$ gives an explicit interval of diameter $<\varepsilon$ in which the sign-change data persists. Thus the constructive reformulation does not merely assert that a zero cannot be absent; it gives finite instructions for approximating where a zero must lie.
[/example]
The example shows that a classification has to respect logical polarity: negative claims and positive existence claims behave differently. The next concept isolates the negative situation in which a classical double-negation proof can still be converted into constructive information.
[definition: Stable Proposition]
A proposition $P$ is stable if $\neg\neg P \to P$.
[/definition]
Stability is not automatic for arbitrary assertions, but it is common for negative formulas. This motivates a transfer principle: if a classical argument reaches only the double negation of a stable negative conclusion, no witness has been lost.
[quotetheorem:7534]
[citeproof:7534]
The stability hypothesis is essential. If $N$ is replaced by an arbitrary existential statement $\exists n\in\mathbb{N}\,P(n)$, then the same pattern would turn $\neg\neg\exists n\,P(n)$ into a numerical witness, which would validate Markov-style existence principles not accepted in the background theory. The hypothesis on $\Gamma$ is also essential: if $\Gamma$ contains an undecided instance of excluded middle $P\vee\neg P$, its negative translation may be harmless while the original assumption already supplies a nonconstructive case split. The theorem does not say that every classical proof of a negative-looking statement transfers automatically; it says that once the hypotheses survive negative translation and the conclusion is stable, the remaining double negation can be discharged. This is the diagnostic tool used below when uniqueness survives but existence has to be reformulated.
[example: Uniqueness Survives Better Than Existence]
Let $f:[0,1]\to\mathbb{R}$ be given by $f(x)=x^2$, and fix a real number $y\ge 0$. We show that the uniqueness part of the inverse problem is constructive: if $x_1,x_2\in[0,1]$ and $x_1^2=y=x_2^2$, then $x_1=x_2$.
First subtract the two displayed equalities:
\begin{align*}
x_1^2-x_2^2=y-y=0.
\end{align*}
Factoring the difference of squares gives
\begin{align*}
x_1^2-x_2^2=(x_1-x_2)(x_1+x_2),
\end{align*}
so
\begin{align*}
(x_1-x_2)(x_1+x_2)=0.
\end{align*}
Because $x_1,x_2\in[0,1]$, we have $0\le x_1$ and $0\le x_2$, hence
\begin{align*}
0\le x_1+x_2.
\end{align*}
If $x_1+x_2$ is bounded away from $0$, then multiplication by its reciprocal gives $x_1-x_2=0$. If $x_1+x_2$ is arbitrarily small, then
\begin{align*}
|x_1-x_2|\le |x_1|+|x_2|=x_1+x_2,
\end{align*}
so $|x_1-x_2|$ is arbitrarily small and therefore $x_1=x_2$. Thus in either constructive reading of the nonnegative factor $x_1+x_2$, the two possible nonnegative roots cannot differ.
Existence is different: producing $x$ with $x^2=y$ requires square-root construction data for $y$, not merely the proof that two possible roots would be equal. The reusable audit technique is to split the classical inverse-function claim into the negative uniqueness part and the positive witness-producing part.
[/example]
## Nonconstructive Existence In Algebra And Analysis
The next problem is to understand why some familiar existence theorems resist constructive proof. The obstruction is usually not technical difficulty; it is that the theorem asks for a global object selected from incomplete information. Algebraic maximality, arbitrary bases, weak compactness, and unrestricted choice are standard places where this occurs.
The first algebraic example is the maximal ideal theorem. Classically, it follows from [Zorn's lemma](/theorems/1226) that every nonzero commutative ring with identity has a maximal ideal. Constructively, this conclusion gives too much information about an arbitrary ring, so it must be treated as a choice-dependent theorem rather than a basic algebraic construction.
Here an ideal $I$ in a commutative ring $R$ is an additive subgroup closed under multiplication by arbitrary elements of $R$. It is proper when $I\ne R$, and maximal when it is proper and no strictly larger proper ideal lies between $I$ and $R$. In a Boolean algebra $B$, a prime ideal is a proper ideal $P$ such that $x\wedge y\in P$ forces $x\in P$ or $y\in P$; equivalently, its complement is an ultrafilter, a collection closed upward and under finite meets that decides each Boolean element by containing exactly one of $b$ and $\neg b$. The Boolean-ring reduction sends $B$ to the ring whose addition is symmetric difference and whose multiplication is meet, so a maximal ideal in that ring corresponds to a prime ideal, or ultrafilter complement, in the original Boolean algebra. This reduction is what makes the algebraic maximal-ideal principle strong enough to imply a choice-like Boolean prime ideal principle.
[quotetheorem:7535]
[proofunderconstruction:7535]
The nonzero and identity hypotheses are not cosmetic. The zero ring has no proper ideals, so dropping nonzero would make the maximal-ideal statement false even classically. Dropping the identity also breaks the reduction: the Boolean ring attached to a Boolean algebra uses the top element $1_B$ as the multiplicative identity, and nonunital rings can have maximal one-sided or proper ideals whose behaviour does not encode Boolean prime ideals. The theorem also separates the forward connection from what it does not claim. It does not say that every concrete ring lacks maximal ideals constructively; for example, a finitely generated algebra over a discrete field may have maximal ideals that can be exhibited by calculation. What fails is the unrestricted passage from an arbitrary ring presentation to a globally selected maximal ideal. The constructive response is therefore not to abandon algebraic geometry or commutative algebra, but to replace point-like existence by algebraic or geometric data that can be computed from the ring.
[example: Prime Ideals Versus Formal Spectra]
Let $R$ be a ring with decidable equality and a displayed list of generators. Instead of choosing a prime ideal $P$, the formal spectrum works with basic opens $D(s)$, read as the condition that $s$ is invertible. A finite cover of $D(s)$ by $D(f_1),\dots,D(f_n)$ is recorded by explicit coefficients $a_1,\dots,a_n\in R$ and an exponent $N\ge 1$ such that
\begin{align*}
s^N=a_1f_1+\cdots+a_nf_n.
\end{align*}
This equation is finite algebraic data: if a prime ideal $P$ avoids $s$, then $s^N\notin P$; since $s^N=a_1f_1+\cdots+a_nf_n$, not all $f_i$ can lie in $P$, because otherwise the right-hand side would lie in $P$ by closure of ideals under addition and multiplication by ring elements. Thus the formal cover expresses exactly the finite forcing content of the classical statement “some $D(f_i)$ contains the point $P$,” but it does not choose such a $P$.
Localization is handled the same way. On the basic open $D(s)$ one uses the localized ring $R_s$, whose fractions satisfy
\begin{align*}
\frac{a}{s^m}=\frac{b}{s^n}
\end{align*}
precisely when there is $k\ge 0$ with
\begin{align*}
s^k(s^n a-s^m b)=0.
\end{align*}
So equality of sections is checked by a finite equation in $R$, and restriction from $D(s)$ to $D(sf)$ sends
\begin{align*}
\frac{a}{s^m}
\end{align*}
to the same numerator $a$ viewed with denominator $s^m$ in $R_{sf}$. The formal spectrum therefore keeps the finite equations needed for localization and sheaf arguments, while avoiding the nonconstructive step of selecting a single prime or maximal ideal.
[/example]
This example exposes the same pattern in a different algebraic language: a basis theorem for arbitrary vector spaces also asks for a global maximal object. This records why constructive linear algebra keeps bases as data or as outputs of finite algorithms, rather than as automatic structure.
[quotetheorem:7536]
[citeproof:7536]
The discreteness of $k$ and the condition $0\ne 1$ are needed so that finite linear combinations have decidable coefficients and so that the decoding procedure can identify a nonzero finite support. The proof also shows why the naive quotient collapsing all $e_{i,a}$ to one common generator is insufficient: such a quotient would display a vector in the $i$th summand only by hiding the original set $A_i$, and a basis of that one-dimensional quotient need not reveal any element of $A_i$. Blass's coding avoids this by forcing every basis expansion of the displayed vector to expose a genuine finite list from $A_i$. If discreteness is removed, the step "choose a nonzero coefficient in a finite support" is no longer a finite decision. If the family $(A_i)$ is not inhabited, no choice function can exist even classically, so the theorem would have a false conclusion rather than a hidden choice conclusion. The theorem does not rule out bases when the vector space is finite-dimensional with a displayed spanning list, nor does it rule out using a basis supplied as part of the data. It rules out the automatic classical upgrade from arbitrary vector space to chosen basis; in models of constructive set theory where countable or dependent choice fails, this automatic basis principle fails with it.
This algebraic construction is useful because the nonconstructive step has been isolated: the basis turns a family of nonempty finite-support searches into a global selector. The same pattern appears in analysis with compactness replacing bases. A compactness theorem may look like a statement about limits, but when it is applied uniformly to arbitrary trees or arbitrary sequences it can choose coherent branches through finite approximations. The next theorem makes that hidden branch selection visible.
[quotetheorem:7537]
[citeproof:7537]
The preceding algebraic examples hid choice in maximal objects; this theorem shows the same issue inside a familiar analytic assertion. The interval $[0,1]$ is important because binary subdivision turns subsequence convergence into coherent choices of left or right halves. The decidability and level-witness assumptions are also essential: without decidability the finite prefix search cannot tell which candidate strings belong to the tree, and without displayed level witnesses the phrase "infinite tree" supplies no sequence of nodes to feed into sequential compactness. A finite tree with arbitrarily presented missing levels is a concrete failure of the witness hypothesis, while an undecidable subtree defined by membership in an arbitrary proposition is a concrete failure of the decidability hypothesis. The theorem does not say that no bounded sequence ever has a constructively obtainable convergent subsequence; monotone sequences with located suprema or sequences equipped with an explicit Cauchy-subsequence algorithm are harmless. It says that the unrestricted classical principle for arbitrary sequences carries the strength of Weak König's Lemma. This is why constructive compactness statements below specify the exact finite approximation or subsequence data being used.
## Compactness Notions That Classically Coincide
The comparison problem becomes more subtle when classical mathematics identifies several formulations. In metric spaces, compactness, sequential compactness, total boundedness plus completeness, and open-cover compactness are equivalent under classical assumptions. Constructively, each formulation asks for different information.
[definition: Sequential Compactness]
A metric space $(X,d)$ is sequentially compact if for every sequence $x:\mathbb{N}\to X$ there exist a strictly increasing map $k:\mathbb{N}\to\mathbb{N}$ and a point $x_\infty\in X$ such that the sequence $y:\mathbb{N}\to X$ defined by $y(n)=x(k(n))$ converges to $x_\infty$.
[/definition]
This definition contains existential data twice: the subsequence and the [limit point](/page/Limit%20Point). To separate this from finite approximation, we need to name the compactness property that supplies nets at each scale.
[definition: Total Boundedness]
A metric space $(X,d)$ is totally bounded if for every $\varepsilon>0$ there exist $n\in\mathbb{N}$ and points $x_1,\dots,x_n\in X$ such that
\begin{align*}
X \subset \bigcup_{i=1}^{n} B(x_i,\varepsilon).
\end{align*}
[/definition]
For constructive purposes, total boundedness is strongest when the finite $\varepsilon$-net is given by a procedure depending on $\varepsilon$. This motivates completeness as a separate definition, because finite nets describe approximation at fixed scales while completeness supplies the limit point of a Cauchy approximation process.
[definition: Complete Metric Space]
A metric space $(X,d)$ is complete if every Cauchy sequence in $X$ converges to a point of $X$.
[/definition]
Completeness itself also has an effective version: a Cauchy sequence with a modulus should produce a limit with controlled approximation. The classical theorem below packages these notions together; the constructive audit will then unpack where the package uses choice.
[quotetheorem:316]
[citeproof:316]
Completeness is needed in the final direction because total boundedness alone only constructs Cauchy subsequences; the rational interval $(0,1)\cap\mathbb{Q}$ is totally bounded but a Cauchy sequence converging to an irrational real has no limit in the space. Total boundedness is needed because completeness alone allows spaces such as an infinite discrete [complete metric space](/page/Complete%20Metric%20Space), where no finite $\varepsilon$-net exists for small $\varepsilon$. The theorem is correct as classical mathematics, but it does not say that the equivalence is uniform or algorithmic. Each implication must be audited for hidden selection and hidden decidability, which leads to the constructive separation theorem.
[quotetheorem:7538]
[citeproof:7538]
Completeness and total boundedness are retained in the hypothesis to show that the failure is not caused by missing finite approximations or missing limits of Cauchy sequences separately. If completeness is removed, $(0,1)\cap\mathbb{Q}$ already gives a totally bounded space where Cauchy approximations can leave the space. If total boundedness is removed, an infinite discrete complete metric space has no convergent subsequence for the sequence listing distinct points. The proof deliberately uses the standard presentation of $[0,1]$, rather than a closure $X_T=\overline{S_T}$ of a tree-coded set, because such a closure would require additional located membership or prefix data to be a constructive metric subspace with usable finite covers. The theorem does not deny that complete totally bounded spaces with explicit fan data can be sequentially compact; that extra fan data is precisely what supplies the path choices constructively. Its point is that the classical implication hides a tree-choice principle. This separation is not a defect in the constructive theory: it records which algorithms are actually present, so applications that use only finite nets and completion should state exactly those data.
[example: Located Supremum Instead Of Least Upper Bound]
Let $A\subset\mathbb{R}$ be bounded above. The classical least-upper-bound statement asks for a real number $s$ such that every $a\in A$ satisfies $a\le s$, and every upper bound $t$ of $A$ satisfies $s\le t$. This says what $s$ is order-theoretically, but by itself it does not explain how to approximate $s$ from elements of $A$.
A located supremum keeps the upper-bound condition and adds explicit approximation data. For every $\varepsilon>0$, it gives either evidence that $A$ is empty in the relevant sense, or an element $a_\varepsilon\in A$ such that
\begin{align*}
s-\varepsilon<a_\varepsilon\le s.
\end{align*}
The right inequality $a_\varepsilon\le s$ follows from $s$ being an upper bound of $A$. The left inequality gives a lower approximation to $s$: subtracting $a_\varepsilon$ from each side of $s-\varepsilon<a_\varepsilon$ gives
\begin{align*}
s-a_\varepsilon<\varepsilon.
\end{align*}
Together with $a_\varepsilon\le s$, this yields
\begin{align*}
0\le s-a_\varepsilon<\varepsilon.
\end{align*}
Hence $a_\varepsilon$ determines $s$ up to error $<\varepsilon$ from below. For example, taking $\varepsilon=2^{-N}$ gives an element $a_N\in A$ with
\begin{align*}
0\le s-a_N<2^{-N}.
\end{align*}
Thus the constructive version does more than name the least upper bound: it supplies finite data for approximating it to any requested precision.
[/example]
## Reading Classical Proofs Constructively
The final problem is methodological. Given a classical proof, how do we decide what to keep? The constructive reading proceeds by locating the first point where the proof demands information not supplied by the hypotheses.
[definition: Constructive Audit Of A Proof]
A constructive audit of a classical proof consists of the following steps:
1. Identify each use of excluded middle, proof by contradiction, choice, maximality, compactness, or least upper bounds.
2. Determine whether the relevant proposition is decidable, stable, or negative.
3. Replace nonconstructive choices by explicit data, algorithms, moduli, or weakened conclusions where possible.
4. Record any remaining classical principle required by the argument.
[/definition]
The audit is most useful when it produces a better theorem rather than merely a rejection. The following bisection example is the standard pattern.
[quotetheorem:7539]
[citeproof:7539]
The hypotheses are doing separate jobs. The displayed modulus is necessary because, without it, the algorithm has no way to choose a mesh size from the input; a pointwise continuous function may come with no computable global rate of variation on $[a,b]$. The approximate sign procedure is necessary because exact comparison with $0$ for constructive reals is not decidable; a scan that asks whether $f(t)\le 0$ or $f(t)>0$ can stop at a real whose sign is not determined. The strict endpoint separation is also necessary for this finite scan: with only $f(a)\le 0\le f(b)$, an endpoint may already be $0$, and the overlapping approximate alternatives need not force the procedure to return the side needed by the scan. If both endpoints have the same strict sign, for instance $f(x)=1$ on $[0,1]$, no approximate zero exists. The theorem deliberately does not assert an exact $x$ with $f(x)=0$; that stronger conclusion needs extra locatedness or a construction of a Cauchy sequence $(x_N)$ with $|f(x_N)|\to0$ and a proof that its limit is available. The output is still stronger than a bare classical existence claim for numerical work, because it gives certified approximate zeros at any requested precision.
[example: Auditing A Classical Intermediate Value Proof]
A classical proof of the intermediate value theorem usually begins with
\begin{align*}
A=\{t\in[a,b]: f(t)\le 0\},
\end{align*}
then sets $s=\sup A$, and finally proves $f(s)=0$ by ruling out the two alternatives $f(s)<0$ and $f(s)>0$. The constructive audit asks what data are needed at each step. To form $A$ as a usable search set, one would need to decide, for each $t\in[a,b]$, whether $f(t)\le 0$ holds. For constructive real numbers, exact order comparison with $0$ is not generally decidable, so this step asks for more information than continuity supplies.
The supremum step has the same problem in a different form. A least upper bound $s$ only says that every $t\in A$ satisfies $t\le s$, and that every upper bound $u$ of $A$ satisfies $s\le u$. To use $s$ computationally, one needs located approximation data: for a requested $\varepsilon>0$, one needs either a reason that no relevant element of $A$ exists, or a point $a_\varepsilon\in A$ with
\begin{align*}
s-\varepsilon<a_\varepsilon\le s.
\end{align*}
From this inequality, subtracting $a_\varepsilon$ gives
\begin{align*}
s-a_\varepsilon<\varepsilon,
\end{align*}
and the upper-bound inequality gives
\begin{align*}
0\le s-a_\varepsilon.
\end{align*}
Thus a located supremum would give $0\le s-a_\varepsilon<\varepsilon$, but the classical proof only names $\sup A$ and does not supply these approximants.
The final sign argument also uses a hidden case split. To exclude $f(s)<0$, the proof uses continuity to move slightly to the right of $s$ while keeping $f$ negative, contradicting that $s$ is an upper bound for $A$. To exclude $f(s)>0$, it moves slightly to the left of $s$ while keeping $f$ positive, contradicting that points of $A$ approach $s$ from below. Constructively, this requires deciding between $f(s)<0$, $f(s)=0$, and $f(s)>0$, or at least deciding enough order information to run the two contradiction arguments. The bisection reformulation replaces these three nonconstructive demands by explicit data: approximate sign information replaces decidability of $A$, a modulus of uniform continuity tells how small the intervals must be, and the nested-interval construction produces finite approximations to a zero without first selecting an exact supremum.
[/example]
Proof by contradiction also requires careful reading. Constructively, proving $\neg P$ by assuming $P$ and deriving contradiction is valid. What fails is proving $P$ by assuming $\neg P$ and deriving contradiction, unless $P$ is stable.
[remark: Where Proof By Contradiction Is Harmless]
A contradiction argument is constructive when its conclusion is negative, such as $\neg P$, uniqueness, incompatibility, or impossibility. It becomes suspect when its conclusion asks for positive data, such as $P\vee Q$, $\exists x\,P(x)$, a maximal object, or a convergent subsequence.
[/remark]
This distinction is the main practical lesson of the chapter. Classical proofs should be read as compressed information: some lines encode algorithms, some encode finite searches, and some encode principles that constructive mathematics deliberately keeps visible.
[explanation: The Comparison Workflow]
When faced with a classical theorem, first rewrite the conclusion in logical form. If the conclusion is negative or stable, try a double-negation or negative-translation argument. If the conclusion is existential, ask what data a witness should include and whether the proof supplies it. If a compactness theorem is used, identify whether the application needs a finite net, a convergent subsequence, an open-cover finite subcover, or a located supremum. If a choice principle is used, record the exact family from which choices are made.
The outcome should be a precise constructive statement. Sometimes it is the original theorem. Sometimes it is a reformulation with moduli, locatedness, decidable equality, separability, total boundedness, or explicit generators. Sometimes the audit proves that the classical theorem depends on a nonconstructive principle, which is itself valuable information about the theorem's content.
[/explanation]
The comparison with classical mathematics therefore strengthens rather than narrows mathematical understanding. Constructive mathematics asks every theorem to display the information it uses. Classical mathematics remains a powerful guide, but its proofs must be unpacked before their computational content is visible.
Chapter 11 showed that comparison with classical mathematics is not a rejection of classical methods, but a way to measure their constructive content. The final chapter brings that perspective together by treating constructive proof as something to be audited and built carefully, so that every theorem reveals the information it really uses.
# 12. Synthesis: Building and Auditing Constructive Proofs
## Auditing Proofs for Constructive Content
A constructive audit starts with a practical question: if this proof is accepted, what data could be extracted from it, and where would a reader have to make an undecidable choice? The audit is not a search for stylistic defects. It is a way of locating witnesses, algorithms, moduli, bounds, and logical principles that were used without being named.
[definition: Constructive Proof Audit]
A constructive proof audit of a proof consists of a record of the following items:
1. each existential conclusion and the witness produced for it;
2. each universal-existential conclusion and the algorithm taking inputs to witnesses;
3. each convergence, continuity, compactness, or approximation assertion and the modulus or bound supplied for it;
4. each disjunction and the method deciding which side is proved;
5. each use of excluded middle, Markov's principle, countable choice, dependent choice, or a least-element argument;
6. each passage from negative information to positive data.
[/definition]
The point of the audit is that constructive failures usually have a local source. A proof of $\neg\exists x\,P(x)$ may be constructive even when a nearby proof of $\neg\forall x\,\neg P(x) \to \neg\neg\exists x\,P(x)$ gives no witness. To see why the checklist matters, we first examine a familiar classical pattern where the missing witness is disguised as an order-theoretic move.
[example: Auditing a Least Counterexample Argument]
Suppose a classical proof starts from $\neg\forall n\,P(n)$, rewrites this as $\exists n\,\neg P(n)$, chooses the least such $n_0$, and then derives a contradiction from the fact that $P(m)$ holds for every $m<n_0$. The constructive audit separates the two logical steps that the classical proof has merged: $\neg\forall n\,P(n)$ says that a proof of $\forall n\,P(n)$ would be impossible, while $\exists n\,\neg P(n)$ supplies an actual natural number $n$ together with evidence for $\neg P(n)$.
The implication from positive data to negative data is constructive: if $(n,h)$ witnesses $\exists n\,\neg P(n)$ and $F$ witnesses $\forall n\,P(n)$, then $F(n)$ proves $P(n)$, while $h$ proves $P(n)\to\bot$; applying $h$ to $F(n)$ gives $\bot$, so $\exists n\,\neg P(n)$ implies $\neg\forall n\,P(n)$. The reverse implication is the missing classical move: from $\neg\forall n\,P(n)$ alone there is no displayed number $n$ and no displayed proof of $\neg P(n)$. Thus the least-counterexample proof is constructive only after adding enough data to produce a counterexample, for example a bound $B$ and decidability of each $P(0),\dots,P(B)$, together with the stronger hypothesis $\neg\bigl(P(0)\wedge\cdots\wedge P(B)\bigr)$.
With those bounded decidable hypotheses, one searches the finite list $0,1,\dots,B$ in order. At stage $i$, decidability gives either $P(i)$ or $\neg P(i)$; if $\neg P(i)$ is found, stop and return that $i$. If the search reaches the end with proofs of $P(0),\dots,P(B)$, conjunction introduction gives $P(0)\wedge\cdots\wedge P(B)$, contradicting the bounded negative hypothesis. The repaired proof therefore either avoids the least counterexample and proves $\forall n\,P(n)$ directly by induction, or states this bounded counterexample principle with its decidability and bound made explicit.
[/example]
This example illustrates the most common audit pattern in the course: negative information is cheap, positive data is expensive. We need the opposite kind of proof, where every existential output is carried by a concrete computation from the start.
[quotetheorem:7540]
[citeproof:7540]
This result is a model constructive argument because the output is stronger than the classical existence statement: it gives the gcd and the Bezout coefficients in a single computation. Each hypothesis is doing real work. The condition that $a$ and $b$ are not both zero prevents the degenerate case in which every integer divides both inputs and there is no positive greatest common divisor satisfying the displayed universal property. The divisibility clauses identify $g$ as a gcd, while the Bezout identity supplies extra witness data: it proves not merely that $g$ exists, but that $g$ is explicitly generated by $a$ and $b$.
The audit also records termination as part of the proof, because an algorithm without a termination argument would not yet be a constructive witness for a total function. What the theorem does not say is that every proof of existence of a gcd is equally constructive. For example, a proof that appeals to the least positive element of the set $\{ua+vb : u,v\in\mathbb{Z},\ ua+vb>0\}$ must still explain constructively how such an element is found; the Euclidean algorithm supplies this missing search procedure and its termination measure.
[example: Computing Bezout Data]
For $a=30$ and $b=21$, the Euclidean algorithm first divides $30$ by $21$:
\begin{align*}
30 = 1\cdot 21 + 9.
\end{align*}
Then it divides $21$ by the remainder $9$:
\begin{align*}
21 = 2\cdot 9 + 3.
\end{align*}
Finally it divides $9$ by the remainder $3$:
\begin{align*}
9 = 3\cdot 3 + 0.
\end{align*}
The last non-zero remainder is therefore $3$, so the algorithm sets $g=3$.
To extract the Bezout coefficients, solve the second division identity for $3$:
\begin{align*}
3 = 21 - 2\cdot 9.
\end{align*}
The first division identity gives $9=30-1\cdot 21$, so substituting this expression for $9$ gives
\begin{align*}
3 = 21 - 2(30-1\cdot 21).
\end{align*}
Distributing the factor $-2$ gives
\begin{align*}
3 = 21 - 2\cdot 30 + 2\cdot 21.
\end{align*}
Combining the two multiples of $21$ gives
\begin{align*}
3 = 3\cdot 21 - 2\cdot 30.
\end{align*}
Reordering the terms into the form $g=ua+vb$ gives
\begin{align*}
3 = (-2)\cdot 30 + 3\cdot 21.
\end{align*}
Thus the algorithm returns $g=3$, $u=-2$, and $v=3$. The displayed identity is the witness that $3$ is an integer linear combination of $30$ and $21$, so the gcd is not merely asserted to exist; it is produced together with its Bezout data.
[/example]
The same audit method applies outside arithmetic, using the terminology developed earlier: in analysis the witness is often a modulus from Chapters 8 and 9, while in logic it may be a negative translation from Chapter 4 or a realizer from Chapter 6.
## Quantitative Replacements for Qualitative Analysis
Classical analysis often states that an object exists because a sequence converges, a function is continuous on a compact domain, or a family is uniformly controlled. Constructively, the question is more precise: what finite information determines the requested approximation, and how large must the finite stage be?
[definition: Modulus of Cauchy Convergence]
Let $(x_n)_{n \in \mathbb{N}}$ be a sequence in a metric space $(X,d)$. A modulus of Cauchy convergence for $(x_n)$ is a function $N:\mathbb{N} \to \mathbb{N}$ such that for all $k \in \mathbb{N}$ and all $m,n \ge N(k)$,
\begin{align*}
d(x_m,x_n) \le 2^{-k}.
\end{align*}
[/definition]
This definition packages convergence as usable data. Instead of saying that tails eventually become small, it gives a rule that transforms an error request $2^{-k}$ into a stage after which the tail is small; we need to record how that rule passes from the sequence to its limit.
[quotetheorem:7541]
[citeproof:7541]
The theorem exposes a pattern that appears throughout constructive analysis: a qualitative hypothesis is replaced by a hypothesis carrying a rate, and the conclusion returns a rate of the same kind. The completeness assumption is indispensable: in $\mathbb{Q}$ with the usual metric, a rational Cauchy sequence may have a rational modulus while converging in $\mathbb{R}$ to $\sqrt{2}$, so no point of $\mathbb{Q}$ can be returned. The modulus assumption is also indispensable, because a bare assertion that a sequence is Cauchy gives no procedure for deciding how far out to go for a requested precision. Finally, the chosen completion operation is not cosmetic: constructively, the proof must specify how a limit object is represented and how the representation supports the displayed error estimate.
This does not weaken the result for applications, since numerical and symbolic uses of convergence usually require such estimates. It places the hidden data where later arguments can call it explicitly.
[example: Rewriting Uniform Convergence with a Rate]
Let $(f_n)_{n\in\mathbb{N}}$ be functions $f_n:[0,1]\to\mathbb{R}$. A classical use of uniform convergence says: given $\varepsilon>0$, choose an index $N$ such that for every $x\in[0,1]$ and every $n\ge N$,
\begin{align*}
|f_n(x)-f(x)|<\varepsilon.
\end{align*}
The constructive rewrite stores the choice of index as a function of the requested binary precision: a rate $N:\mathbb{N}\to\mathbb{N}$ satisfying, for every $k\in\mathbb{N}$, every $x\in[0,1]$, and every $n\ge N(k)$,
\begin{align*}
|f_n(x)-f(x)|\le 2^{-k}.
\end{align*}
Suppose a later argument needs error at most $\beta$, and suppose we have chosen an integer $k_\beta$ such that
\begin{align*}
2^{-k_\beta}\le \beta.
\end{align*}
Now take any $x\in[0,1]$ and any $n\ge N(k_\beta)$. Applying the rate with $k=k_\beta$ gives
\begin{align*}
|f_n(x)-f(x)|\le 2^{-k_\beta}.
\end{align*}
Combining this estimate with $2^{-k_\beta}\le\beta$ gives
\begin{align*}
|f_n(x)-f(x)|\le \beta.
\end{align*}
Thus the old phrase "choose $N$ by uniform convergence" has been replaced by the explicit instruction "choose $k_\beta$ with $2^{-k_\beta}\le\beta$, then use the index $N(k_\beta)$." The proof has the same shape as the classical proof, but every appeal to eventual uniform closeness now calls a specified rate.
[/example]
Uniform convergence is only one place where hidden quantitative data appears. Continuity arguments are another: this motivates the definition of a modulus of uniform continuity, the function converting an output tolerance into an input tolerance.
[definition: Modulus of Uniform Continuity]
Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces. A modulus of uniform continuity for a function $f:X\to Y$ is a function $\rho:\mathbb{N} \to \mathbb{N}$ such that for all $k \in \mathbb{N}$ and all $x,y \in X$,
\begin{align*}
d_X(x,y) \le 2^{-\rho(k)} \implies d_Y(f(x),f(y)) \le 2^{-k}.
\end{align*}
[/definition]
The modulus is the constructive substitute for the phrase "given an error, choose a small enough input tolerance." If the proof actually computes that tolerance, then the theorem should state it.
[example: Extracting a Modulus from a Lipschitz Bound]
Let $f:[0,1]\to\mathbb{R}$ satisfy $|f(x)-f(y)|\le L|x-y|$ for all $x,y\in[0,1]$, where $L\ge1$ is known. Choose an integer $c\ge0$ such that $L\le 2^c$, and define $\rho:\mathbb{N}\to\mathbb{N}$ by $\rho(k)=k+c$. We show that $\rho$ is a modulus of uniform continuity.
Fix $k\in\mathbb{N}$ and $x,y\in[0,1]$, and assume
\begin{align*}
|x-y|\le 2^{-\rho(k)}.
\end{align*}
Since $\rho(k)=k+c$, this is
\begin{align*}
|x-y|\le 2^{-(k+c)}.
\end{align*}
Applying the Lipschitz bound gives
\begin{align*}
|f(x)-f(y)|\le L|x-y|.
\end{align*}
Using $|x-y|\le 2^{-(k+c)}$ and $L\ge0$, we get
\begin{align*}
L|x-y|\le L2^{-(k+c)}.
\end{align*}
Using $L\le 2^c$ and $2^{-(k+c)}>0$, we get
\begin{align*}
L2^{-(k+c)}\le 2^c2^{-(k+c)}.
\end{align*}
The powers of $2$ combine as
\begin{align*}
2^c2^{-(k+c)}=2^{c-(k+c)}=2^{-k}.
\end{align*}
Therefore
\begin{align*}
|f(x)-f(y)|\le 2^{-k}.
\end{align*}
Thus $\rho(k)=k+c$ supplies exactly the input precision needed to guarantee output precision $2^{-k}$; the Lipschitz estimate has been converted into explicit modulus data for later approximation arguments.
[/example]
Quantitative statements also clarify how constructive results interact with classical mathematics. A classical reader can ignore the modulus and read the theorem as a usual convergence statement, while a constructive reader keeps the modulus as computational content.
## Negative Translations and Logical Stability
Not every classical proof can be repaired by adding a bound. Some proofs use excluded middle in a way that changes the logical form of the theorem. The synthesis question here is: when can a classical proof be transported into intuitionistic logic by translating its propositions into stable negative form?
[definition: Negative Translation]
A negative translation is an assignment $A \mapsto A^N$ from formulas to formulas such that intuitionistic logic proves $A^N$ is stable under double negation, and classical provability of $A$ yields intuitionistic provability of $A^N$.
[/definition]
After translation, disjunctions and existentials usually lose direct witness content unless they already occur under negative structure. To use this idea responsibly, we need to name the property that lets double-negated information be recovered without adding a general law of excluded middle.
[definition: Stable Proposition]
A proposition $A$ is stable if intuitionistic logic proves
\begin{align*}
\neg\neg A \to A.
\end{align*}
[/definition]
Stability is common for negative propositions such as $\neg B$, and for decidable propositions. Once translations land in stable propositions, the next question is whether translations may be layered without losing the soundness guarantee.
[quotetheorem:7542]
[citeproof:7542]
This result is a logical counterpart of the quantitative audit in analysis. In both cases, the corrected theorem makes explicit what the original proof preserved: either numerical data such as a modulus, or logical data such as stability under double negation. The soundness hypotheses for both component translations are essential. If $(-)^M$ sends every formula to a fixed intuitionistically provable formula such as $0=0$, then it is stable but it no longer preserves the content of classical proofs; composition with another translation cannot repair that loss. Stability is also essential for the advertised conclusion: a translation may be sound into intuitionistic logic while producing formulas that still contain unprotected disjunctions or existentials, and then double-negated information cannot be eliminated at the translated endpoint.
The limitation is therefore sharp. Composition preserves metatheoretic soundness and stability only when each layer already has those properties. It does not turn a negative translation into a witness-extracting interpretation of the original formula.
[example: Double Negated Excluded Middle]
[claim]For every proposition $P$, intuitionistic logic proves $\neg\neg(P\vee\neg P)$.[/claim]
[proof]To prove $\neg\neg(P\vee\neg P)$, assume
\begin{align*}
H:\neg(P\vee\neg P).
\end{align*}
We must derive $\bot$.
First we construct $\neg P$. Assume
\begin{align*}
p:P.
\end{align*}
By left introduction for disjunction, $p$ gives
\begin{align*}
\operatorname{inl}(p):P\vee\neg P.
\end{align*}
Applying $H$ to this disjunction gives
\begin{align*}
H(\operatorname{inl}(p)):\bot.
\end{align*}
Since an arbitrary assumption $p:P$ led to $\bot$, implication introduction gives
\begin{align*}
\lambda p.\,H(\operatorname{inl}(p)):\neg P.
\end{align*}
Call this proof $h:\neg P$.
Now $h$ itself gives the right side of the disjunction, so right introduction gives
\begin{align*}
\operatorname{inr}(h):P\vee\neg P.
\end{align*}
Applying $H$ again gives
\begin{align*}
H(\operatorname{inr}(h)):\bot.
\end{align*}
Thus every assumption $H:\neg(P\vee\neg P)$ leads to $\bot$, so by implication introduction we have
\begin{align*}
\neg\neg(P\vee\neg P).
\end{align*}
[/proof]
The proof does not construct either a proof of $P$ or a proof of $\neg P$ in advance; it shows only that the denial of the disjunction is impossible, so the classical disjunction survives intuitionistically in negative form.
[/example]
The audit lesson is that a negative translation preserves consistency strength and classical consequences, but it should not be mistaken for a construction of the original disjunction or existential. When the course asks for constructive mathematics rather than metatheoretic comparison, the translated theorem is often an intermediate result rather than the final target.
## Stating Theorems for Both Constructive and Classical Use
A final practical problem remains: how should we state constructive theorems so that they expose computational content without becoming unreadable to mathematicians trained on classical formulations? The answer is to state the ordinary mathematical object first, then attach the extra data in the hypotheses or conclusion exactly where the proof uses it.
[definition: Constructively Informative Statement]
A theorem statement is constructively informative if every existential, disjunctive, convergence, continuity, compactness, or choice-dependent assertion in its conclusion is accompanied by the data needed to verify or compute it from the hypotheses.
[/definition]
This definition is not a new logical principle. It is a drafting standard for mathematical communication: the statement should tell the reader which parts of the proof are computationally meaningful.
[example: Constructive Banach-Style Statement]
Let $(X,d)$ be a metric space with a modulus-bearing completion operation, let $T:X\to X$ satisfy the contraction estimate
\begin{align*}
d(Tu,Tv)\le q\,d(u,v)
\end{align*}
for all $u,v\in X$, where $0\le q<1$, and choose a starting point $x_0\in X$. Define the Picard iterates by $x_{n+1}=T(x_n)$, and put $\Delta=d(x_1,x_0)$.
The first step is to make the convergence rate explicit. For $i=0$,
\begin{align*}
d(x_{1},x_0)=q^0\Delta.
\end{align*}
If $d(x_{i+1},x_i)\le q^i\Delta$, then using $x_{i+2}=T(x_{i+1})$, $x_{i+1}=T(x_i)$, and the contraction estimate gives
\begin{align*}
d(x_{i+2},x_{i+1})=d(Tx_{i+1},Tx_i)\le q\,d(x_{i+1},x_i)\le q\,q^i\Delta=q^{i+1}\Delta.
\end{align*}
Thus induction gives $d(x_{i+1},x_i)\le q^i\Delta$ for every $i\in\mathbb{N}$.
Now let $m>n$. Repeated use of the triangle inequality gives
\begin{align*}
d(x_m,x_n)\le d(x_{n+1},x_n)+d(x_{n+2},x_{n+1})+\cdots+d(x_m,x_{m-1}).
\end{align*}
Using the estimate just proved, this becomes
\begin{align*}
d(x_m,x_n)\le (q^n+q^{n+1}+\cdots+q^{m-1})\Delta.
\end{align*}
Factoring out $q^n$ gives
\begin{align*}
q^n+q^{n+1}+\cdots+q^{m-1}=q^n(1+q+\cdots+q^{m-n-1}).
\end{align*}
For $r=m-n$, the finite geometric sum satisfies
\begin{align*}
(1-q)(1+q+\cdots+q^{r-1})=1-q^r.
\end{align*}
Since $0\le q<1$, we have $0\le q^r\le1$, hence $1-q^r\le1$, so
\begin{align*}
1+q+\cdots+q^{r-1}\le (1-q)^{-1}.
\end{align*}
Therefore
\begin{align*}
d(x_m,x_n)\le q^n(1-q)^{-1}\Delta.
\end{align*}
A constructive Banach-style statement now packages this estimate as part of the conclusion: the completion operation applied to the Cauchy sequence $(x_n)$ returns a point $x$, and the same tail bound gives
\begin{align*}
d(x_n,x)\le q^n(1-q)^{-1}d(x_1,x_0).
\end{align*}
The fixed point is obtained from the displayed algorithm $x_{n+1}=T(x_n)$ together with its rate of convergence, so the classical assertion of a unique fixed point is restated with the computational data that the proof actually uses.
[/example]
Good constructive statements often look slightly longer because they separate three roles that classical prose merges: existence of an object, a method for obtaining approximations to it, and a proof that the approximations meet the requested error. The added length pays for itself when the theorem is reused.
[explanation: Audit Principle for Constructive Restatement]
An audit principle is not a single formal theorem until a background proof system, a translation scheme, and a precise notion of classical consequence have been fixed. In these notes it is a disciplined method: read the classical proof line by line and replace each non-constructive move by the data that would make the same move intuitionistically valid. Existential introductions are replaced by recorded witnesses, bounded searches by their decidability and finite bounds, analytic limiting steps by their moduli, and negative classical subclaims by stable translations. After those replacements, forgetting the extra data recovers the classical implication, because witnesses imply existence, moduli imply qualitative convergence or continuity, and stable translations preserve the relevant negative consequences.
[/explanation]
The principle has strict limitations, and those limitations are part of the audit. If a proof uses unbounded search over an undecidable predicate, the audit cannot manufacture an algorithm; it must either weaken the conclusion, add a decidability or boundedness hypothesis, or record the non-constructive principle being assumed. If a compactness argument extracts an exact point but supplies only finite approximations, the constructive restatement should usually conclude approximate data rather than the original exact existential. If a negative translation is used for a positive existential conclusion, the result is generally only a double-negated or translated existence statement, not a witness. Thus the listed hypotheses are indispensable: they are precisely the places where the repaired proof obtains its computational or logical content.
[example: Extreme Value Theorem Audit]
A classical proof of the extreme value theorem for a continuous $f:[0,1]\to\mathbb{R}$ may use compactness to extract a convergent subsequence, excluded middle to compare values, and completeness to pass to a limit. A constructive replacement asks instead for explicit data: a modulus of uniform continuity $\rho$ for $f$, and a procedure which, at each rational grid point $t$, computes rational approximations to $f(t)$ to any requested binary precision.
Fix $k\in\mathbb{N}$, and put $\epsilon=2^{-(k+2)}$. Choose $M\in\mathbb{N}$ with $M>0$ and
\begin{align*}
M^{-1}\le 2^{-\rho(k+2)}.
\end{align*}
Consider the finite grid $t_i=i/M$ for $i=0,\dots,M$. For each $i$, compute a rational $r_i$ such that
\begin{align*}
|r_i-f(t_i)|\le \epsilon.
\end{align*}
Since the list $r_0,\dots,r_M$ is finite and rational-valued, bounded decidable comparison selects an index $j$ such that $r_i\le r_j$ for every $i$. Define
\begin{align*}
x_k=t_j.
\end{align*}
Now let $y\in[0,1]$. Choose a grid point $t_i$ with $|y-t_i|\le M^{-1}$; for instance, take a neighboring point in the grid interval containing $y$. Since $M^{-1}\le 2^{-\rho(k+2)}$, the modulus of uniform continuity gives
\begin{align*}
|f(y)-f(t_i)|\le 2^{-(k+2)}=\epsilon.
\end{align*}
Therefore
\begin{align*}
f(y)\le f(t_i)+\epsilon.
\end{align*}
From $|r_i-f(t_i)|\le\epsilon$, we have
\begin{align*}
f(t_i)\le r_i+\epsilon.
\end{align*}
Combining the previous two inequalities gives
\begin{align*}
f(y)\le r_i+2\epsilon.
\end{align*}
Since $r_i\le r_j$, this gives
\begin{align*}
f(y)\le r_j+2\epsilon.
\end{align*}
From $|r_j-f(t_j)|\le\epsilon$, we have
\begin{align*}
r_j\le f(t_j)+\epsilon.
\end{align*}
Hence
\begin{align*}
f(y)\le f(t_j)+3\epsilon.
\end{align*}
Because $\epsilon=2^{-(k+2)}$,
\begin{align*}
3\epsilon=3\cdot 2^{-(k+2)}\le 4\cdot 2^{-(k+2)}=2^{-k}.
\end{align*}
Since $t_j=x_k$, we conclude
\begin{align*}
f(y)\le f(x_k)+2^{-k}.
\end{align*}
Thus the constructive audit replaces the exact maximizer by the computable approximate maximizer $x_k$: the grid size comes from the modulus, the finite maximum is chosen by decidable rational comparison, and the final error bound is obtained by adding the visible approximation errors.
[/example]
This final example summarizes the course's main practical message. Constructive mathematics is not obtained by deleting classical words from a proof. It is obtained by tracking the information that the proof must supply: witnesses for existence, algorithms for search, moduli for limiting processes, decidability for cases, and stable translations for negative classical reasoning.
## Beyond This Page
Constructive mathematics connects naturally to [Proof Theory](/page/Proof%20Theory), computability, type-theoretic foundations, and analysis. The proof-theoretic side continues with cut elimination, normalization, and realizability interpretations. The analytic side develops continuity and convergence with explicit data, as in [Continuity (Real Analysis)](/page/Continuity%20%28Real%20Analysis%29) and [Continuity (Metric Spaces)](/page/Continuity%20%28Metric%20Spaces%29). These topics reuse the same discipline emphasized here: keep witnesses, moduli, decidability assumptions, and choice principles visible.
For readers using these notes as a bridge into later material, the most useful next audit is to compare each classical theorem with the data its constructive version requires: a witness, a decision procedure, a modulus, a choice principle, or a stable negative translation. That comparison is the recurring thread tying intuitionistic logic to Bishop-style analysis.
## References
Errett Bishop and Douglas Bridges, *Constructive Analysis*.
Anne Sjerp Troelstra and Dirk van Dalen, *Constructivism in Mathematics: An Introduction*.
Michael Beeson, *Foundations of Constructive Mathematics*.
Sara Negri and Jan von Plato, *Structural Proof Theory*.
Contents
- Introduction
- What Constructive Mathematics Asks For
- Truth, Refutation, And Negation
- Logic As A Calculus Of Constructions
- Constructive Analysis As Ordinary Analysis With More Data
- 1. Constructive Proof and Mathematical Meaning
- Judgements, Constructions, and Evidence
- The BHK Interpretation of Logical Connectives
- Negation, Refutability, and Double Negation
- Decidability, Stability, and Classical-Looking Conclusions
- 2. Intuitionistic Propositional and Predicate Logic
- Natural Deduction for Intuitionistic Logic
- The Sequent Calculus LJ
- Disjunction and Existence Properties
- 3. Semantics: Heyting Algebras and Kripke Models
- Why Implication Needs New Semantics
- Kripke Frames And Persistent Truth
- Completeness For Intuitionistic Propositional Logic
- 4. Classical Logic Inside Constructive Logic
- Negative and Stable Formulas
- Glivenko's Theorem and Double-Negation Translation
- De Morgan Principles and Weak Forms of Classical Logic
- 5. Arithmetic, Decidability, and Omniscience Principles
- Arithmetic as a Constructive Theory
- Omniscience Principles for Binary Sequences
- Sigma-One Search and Markov's Principle
- Decision, Semidecision, and Search
- 6. Realizability and Proofs as Programs
- Realizers for Arithmetic Formulas
- Soundness and Numerical Existence
- Church's Thesis and the Boundary Between Realizable and Derivable Principles
- Modified Realizability and Extracted Algorithms
- 7. Constructive Sets, Functions, and Metric Spaces
- Sets, Predicates, and Extensional Functions
- Apartness and Located Subsets
- Total Boundedness and Constructive Compactness
- Metric Completion by Regular Cauchy Sequences
- Continuity Principles in Completed Spaces
- 8. Constructive Real Numbers and Elementary Analysis
- Presentations of the Real Numbers
- Order, Apartness, and Located Intervals
- Sequences, Series, and Moduli of Continuity
- Constructive Versions of Classical Theorems
- 9. Bishop-Style Constructive Analysis
- Bishop's Program and Interpretations
- Normed Spaces and Fixed Point Arguments
- Integration and Measure in a Bishop Framework
- 10. Choice Principles, Fan Theorems, and Compactness
- Controlled Choice Principles
- Bars and Brouwer's Fan Theorem
- Constructive Compactness Principles
- 11. Comparison with Classical Mathematics
- Sorting Classical Theorems by Constructive Content
- Nonconstructive Existence In Algebra And Analysis
- Compactness Notions That Classically Coincide
- Reading Classical Proofs Constructively
- 12. Synthesis: Building and Auditing Constructive Proofs
- Auditing Proofs for Constructive Content
- Quantitative Replacements for Qualitative Analysis
- Negative Translations and Logical Stability
- Stating Theorems for Both Constructive and Classical Use
- Beyond This Page
- References
Constructive Mathematics
Content
Problems
History
Created by admin on 6/18/2026 | Last updated on 6/18/2026
Prerequisites (0/9 completed)
Log in to track your prerequisite progress.
Prerequisites Graph
Interactive dependency map showing prerequisite concepts
Loading dependency graph...
Theorem
Definition
Current
Requires
Rate this page
★
★
★
★
★
Poor
Excellent