This course develops type theory as a unified language for logic, computation, and structured mathematics. It begins with the untyped and simply typed lambda calculus as a foundation for expressing programs and proofs, then moves into the Curry-Howard correspondence, where propositions are viewed as types and proofs as terms. From there, the course expands the basic type formers, showing how products, sums, inductive data, dependent types, identity types, and universes let us represent increasingly rich mathematical structures in a constructive setting.
The chapters are arranged to build this theory in layers. Early material establishes typed syntax and natural deduction, then introduces dependent judgments and Martin-Löf type theory as the core framework. Subsequent chapters study Pi and Sigma types, equality reasoning, and the role of universes in [constructive mathematics](/page/Constructive%20Mathematics), before turning to metatheoretic results such as normalization, canonicity, and consistency. The later chapters connect the theory to proof assistants, categorical semantics, and modern extensions, giving both a working foundation and a broader conceptual view of how type theory functions in logic and in formalized mathematics.
# Introduction
Type theory begins from a deceptively practical question: how can a formal language keep track of what its expressions are allowed to mean before those expressions are evaluated? In set-theoretic foundations, mathematical objects are usually collected into sets after they have been described; in type theory, the classification of an expression is part of the act of forming it. This course studies that shift from classification as an external property to classification as internal syntax.
The central path of the course runs from the simply typed lambda calculus to Martin-Löf dependent type theory. Along the way, the same formal patterns will be read in several ways: terms as programs, types as specifications, propositions as types, proofs as terms, and contexts as structured environments of assumptions. The purpose of this opening chapter is to fix the guiding questions and the basic vocabulary that the later technical chapters will make precise.
## Why Types Belong in the Syntax
What goes wrong if a formal language admits every expression and sorts out meaningfulness only afterwards? Untyped lambda calculus is powerful enough to express computation directly, but it admits terms such as self-application that make uniform reasoning about evaluation and logical interpretation delicate. A type system restricts formation so that many malformed expressions are excluded before reduction begins.
[definition: Type System]
A type system is a collection of syntactic rules assigning types to terms by derivations of judgments of the form $\Gamma \vdash t : A$, where $\Gamma$ is a context, $t$ is a term, and $A$ is a type.
[/definition]
The structural emphasis is important: a type assignment is witnessed by a derivation built from rules, rather than by a later semantic inspection of a finished string. Chapters 1 through 7 will specify the rules for function types, product types, sum types, dependent products, dependent sums, and identity types.
[example: Typing A Function Application]
Assume $\Gamma \vdash f : A \to B$ and $\Gamma \vdash a : A$. The function-elimination rule for $A \to B$ has exactly these two premises: one term of function type $A \to B$ and one argument of type $A$ in the same context $\Gamma$. Applying the rule gives the judgment
\begin{align*}
\Gamma \vdash f(a) : B.
\end{align*}
The argument $a$ is accepted because its type matches the domain $A$ of $f$, and the resulting term has type $B$ because $B$ is the codomain specified by the function type $A \to B$.
[/example]
The example illustrates the discipline that will reappear throughout the course: constructors build canonical elements of a type, while eliminators explain how such elements may be used. Function application is the eliminator for a function type, and lambda abstraction is its constructor.
## Judgments Before Truth Values
If type theory is to serve as a foundation for mathematics, then it needs a notion more primitive than the statement that a proposition is true. The primitive unit is a judgment: an assertion that a term has a type, that two terms are definitionally equal, or that a context is well formed. Truth will enter through the Curry-Howard reading, but the formal system begins with derivability of judgments.
[definition: Typing Judgment]
A typing judgment is an expression $\Gamma \vdash t : A$ asserting that, under the assumptions recorded in the context $\Gamma$, the term $t$ has type $A$.
[/definition]
Typing judgments say which expressions are meaningful in a context, but a type checker also needs to know when two meaningful expressions count as the same during computation. That need leads to a second basic judgment, because equality used by the rules cannot always wait for a separately proved proposition.
[definition: Definitional Equality Judgment]
A definitional equality judgment is an expression $\Gamma \vdash s \equiv t : A$ asserting that, under the context $\Gamma$, the terms $s$ and $t$ are equal by the computation rules of the type theory at type $A$.
[/definition]
Definitional equality is stronger than propositional equality. It is the equality checked by the formal system during type checking, and it includes computation such as beta-reduction.
[example: Beta Computation As Definitional Equality]
Assume $\Gamma, x : A \vdash b : B$ and $\Gamma \vdash a : A$. The introduction rule for $A \to B$ turns the first derivation into
\begin{align*}
\Gamma \vdash \lambda x. b : A \to B.
\end{align*}
Since the argument has the matching input type,
\begin{align*}
\Gamma \vdash a : A,
\end{align*}
the function-elimination rule forms the application
\begin{align*}
\Gamma \vdash (\lambda x. b)(a) : B.
\end{align*}
The beta computation rule for function types says that applying an abstraction to an argument replaces each free occurrence of $x$ in the body by that argument:
\begin{align*}
(\lambda x. b)(a) \equiv b[a/x].
\end{align*}
Thus the definitional equality judgment is
\begin{align*}
\Gamma \vdash (\lambda x. b)(a) \equiv b[a/x] : B.
\end{align*}
In dependent notation the codomain may mention $x$, so substituting $a$ for $x$ also changes the displayed result type from $B$ to $B[a/x]$:
\begin{align*}
\Gamma \vdash (\lambda x. b)(a) \equiv b[a/x] : B[a/x].
\end{align*}
When $B$ does not mention $x$, the substitution leaves the type unchanged, so $B[a/x]$ is just $B$.
[/example]
This example foreshadows why substitution is the main technical operation in the early course. Typing, equality, and computation all have to be stable when terms are substituted for variables.
## Contexts As Structured Assumptions
How should a formal system record the assumptions under which a term or proof is valid? In first-order logic, assumptions are often handled as a background list of formulas. In type theory, a context is itself a syntactic object whose entries may depend on previous entries.
[definition: Context]
A context is a finite ordered list of variable declarations
\begin{align*}
\Gamma = x_1 : A_1,\ x_2 : A_2(x_1),\ \dots,\ x_n : A_n(x_1,\dots,x_{n-1})
\end{align*}
such that each type is well formed in the preceding part of the list.
[/definition]
The ordering matters in dependent type theory because later declarations may mention earlier variables. This is the first place where dependent types depart from the simply typed setting in a structural way.
[example: A Dependent Context]
Let $P$ be a type family over $A$, so an element $x : A$ determines a type $P(x)$. A valid dependent context can be written as
\begin{align*}
A : \mathsf{Type},\ x : A,\ p : P(x).
\end{align*}
The first declaration makes $A$ available as a type. The second declaration is then meaningful because $A$ has already been introduced. The third declaration is meaningful because $P(x)$ can be formed only after the variable $x : A$ is in scope.
If the same entries were reordered as
\begin{align*}
A : \mathsf{Type},\ p : P(x),\ x : A,
\end{align*}
then the declaration $p : P(x)$ would mention $x$ before $x$ had been declared, so that list is not a well-formed context. This is why a dependent context is an ordered structure of assumptions, not merely a set whose entries may be rearranged freely.
[/example]
Once contexts are ordered and dependency-sensitive, weakening, exchange, and substitution have to be formulated with care. Much of the metatheory in the course checks that these structural operations preserve well-formedness and typing.
## The Curry-Howard Reading
Why should a course on types also be a course on logic? The Curry-Howard correspondence says that logical connectives and type formers share the same introduction, elimination, and computation patterns. A proof of a proposition is represented by a term inhabiting the corresponding type.
[definition: Propositions As Types Reading]
The propositions as types reading interprets a proposition $P$ as a type and interprets a proof of $P$ as a term $p : P$.
[/definition]
This reading does not erase the distinction between logic and computation; instead, it explains why the same formal calculus can support both. Constructive logic appears because a proof must provide data of the appropriate type.
[example: Implication As Function Type]
Under the propositions-as-types reading, a proposition $P \implies Q$ is represented by the function type $P \to Q$. Suppose $f : P \to Q$ represents a proof of the implication and $p : P$ represents a proof of the premise. The elimination rule for the function type $P \to Q$ has two required inputs: a term of type $P \to Q$ and a term of its domain type $P$. Applying that rule to $f$ and $p$ yields
\begin{align*}
f(p) : Q.
\end{align*}
Thus modus ponens is not a separate operation added on top of the type theory: it is exactly function application, sending evidence for $P$ through a function that transforms such evidence into evidence for $Q$.
[/example]
The implication example gives only one connective, so the next question is whether the pattern persists across a whole fragment of logic. The course answers this by proving a systematic translation between natural deduction derivations and typed terms.
[quotetheorem:9610]
[citeproof:9610]
The restrictions in the statement matter. The logical system is intuitionistic because an ordinary typed lambda term is constructive data: a term of $P \to Q$ is a procedure transforming evidence for $P$ into evidence for $Q$, and a term of $P \vee Q$ must choose a side together with evidence for that side. Classical principles such as excluded middle or double-negation elimination are therefore not represented by ordinary intuitionistic lambda terms without adding extra operators, control effects, continuations, or axioms.
The fragment also matters. The theorem above does not say that every logical system is already a typed lambda calculus, nor does it identify propositions and types by informal resemblance alone. It says that, after fixing source rules, target type formers, context translation, and equality of terms, derivations and typed terms determine each other in a precise structural way. This is why later chapters introduce each type former by formation, introduction, elimination, and computation rules: those rules specify both its data meaning and its proof-theoretic behaviour.
## Computation And Metatheory
What makes a typed calculus mathematically trustworthy rather than merely suggestive notation? The answer lies in metatheorems about substitution, preservation, progress, normalization, and confluence. These results say that typing interacts coherently with computation.
[quotetheorem:4635]
[citeproof:4635]
The hypotheses are part of the content of the theorem. The well-formedness assumptions rule out malformed contexts and ill-scoped types, while the specific reduction relation ensures that computation follows the typing rules. If reduction were changed to discard an argument of type $A$ and replace it by an unrelated term of type $C$, or if beta-reduction performed substitution without respecting binding, the conclusion could fail. Preservation is therefore not a property of every rewriting system placed on lambda terms; it is a compatibility theorem between this syntax, this typing relation, and this operational semantics.
Preservation also has a precise limitation. It says that a step from a well-typed term remains well typed at the same type, but it does not say that a step exists. It also does not imply termination: a typed system with recursion may preserve types while allowing infinite computation. Type safety therefore needs a second theorem explaining why well-typed closed terms are either finished values or can take a step. This is the role of progress, which connects typing to the possible shapes of values.
[quotetheorem:9611]
[citeproof:9611]
Closedness is essential in the progress statement. An open term such as $x$ may be well typed under the context $x : A$, but it is neither a value introduced by a constructor nor a reducible expression; it is stuck because the environment has not supplied the missing data. The canonical forms lemmas are the bridge from typing to operational behaviour: for example, a closed value of function type must be a lambda abstraction, so an application whose function part has become such a value can use beta-reduction.
The chosen operational semantics is also part of the theorem. Progress for call-by-value refers to the values and evaluation contexts of call-by-value; changing to call-by-name changes which subterm is allowed to reduce first and which terms count as ready to use. Finally, progress alone does not preserve types after a step. It only promises that a closed well-typed term is not stuck at the current moment; preservation supplies the separate guarantee that the next term still has the same type, so the argument can be repeated.
Together, preservation and progress form the standard type-safety argument. They also motivate why the syntax, typing rules, and operational semantics must be designed together rather than independently.
## Dependent Types And Mathematical Structure
What changes when the type of an expression is allowed to depend on the value of another expression? The language can express families of types, predicates over data, and constructions whose output type records an input-dependent specification. This is the central move from simply typed lambda calculus to Martin-Löf type theory.
[definition: Dependent Type]
A dependent type over a type $A$ is a type family $B$ indexed by $A$, commonly written in informal notation as $B : A \to \mathsf{Type}$, assigning to each term $a : A$ a type $B(a)$.
[/definition]
The basic non-dependent function type $A \to B$ is recovered when the output type does not depend on the input. To express universal statements and input-sensitive programs, however, the course needs a function type whose codomain may vary with the argument.
[definition: Dependent Function Type]
Given a type $A$ and a family of types $B$ over $A$, the dependent function type is written
\begin{align*}
\prod_{x : A} B(x),
\end{align*}
and its terms are functions sending each $x : A$ to a term of type $B(x)$.
[/definition]
Dependent function types provide the type-theoretic form of universal quantification. To prove $\forall x : A,\ P(x)$ is to give a function taking each $x : A$ to a proof of $P(x)$.
[example: Polymorphic Identity]
A dependent function may express, in one term, the operation that sends each type to its identity function. Assume a universe $\mathsf{Type}$ and let $A : \mathsf{Type}$. In the extended context $A : \mathsf{Type}, x : A$, the variable rule gives
\begin{align*}
A : \mathsf{Type}, x : A \vdash x : A.
\end{align*}
By function introduction in the variable $x$, this yields
\begin{align*}
A : \mathsf{Type} \vdash \lambda x. x : A \to A.
\end{align*}
Now abstract over the type variable $A$. Since the resulting codomain depends on the chosen input type $A$, the dependent-function introduction rule gives
\begin{align*}
\vdash \lambda A.\lambda x. x : \prod_{A : \mathsf{Type}} A \to A.
\end{align*}
Thus applying the polymorphic identity first to a type $A$ produces the ordinary identity function on $A$, and applying that function to $a : A$ returns a term of the same type $A$.
[/example]
Dependent types are the bridge to proof assistants. A theorem statement becomes a type, and proving the theorem becomes constructing a term inhabiting that type.
## Constructive Mathematics And Proof Assistants
Why do Lean, Agda, and Coq organize mathematics through terms and types rather than through a separate proof language? A dependent type theory can express data, algorithms, propositions, and proofs inside one formal grammar. This makes type checking the central mechanical verification problem.
[definition: Type Checking Problem]
The type checking problem asks whether a proposed judgment $\Gamma \vdash t : A$ is derivable from the rules of the type theory.
[/definition]
In practical proof assistants, definitional equality is the computational engine behind type checking. Normalization and related decidability results explain why checking can often be automated, even when finding a proof remains creative.
[remark: Proof Search Versus Proof Checking]
Finding a term of a specified type may be difficult, because it amounts to finding a proof or constructing a program. Checking a proposed term is usually more constrained, because the rules dictate how its pieces must fit together. This distinction is one reason proof assistants can verify large developments while still requiring human guidance.
[/remark]
The course will not treat proof assistants merely as software tools. Instead, they serve as evidence that the formal metatheory has operational content: normalization, substitution, and definitional equality directly affect what the checker accepts.
## The Shape Of The Course
How do the later chapters build from this introduction? The first part develops lambda calculus and simple types, where substitution, beta-reduction, eta-conversion, preservation, progress, and normalization can be studied in a controlled setting. The second part translates these ideas into natural deduction and the Curry-Howard correspondence.
The middle of the course introduces dependent products, dependent sums, identity types, universes, and inductive types. Each type former will be presented through formation, introduction, elimination, and computation rules. This pattern is the grammar of Martin-Löf type theory.
The final part connects the formal calculus to constructive mathematics, categorical semantics, and modern proof-assistant libraries. By the end, the reader should be able to parse dependent type signatures, recognize the proof content of terms, and understand why metatheorems about syntax are foundational rather than technical decoration.
The foundational distinction between syntax, computation, and typing now has a concrete setting. In the next chapter, that same discipline is applied to the lambda calculus, where functions become the basic language of expression and typing judgments begin to separate meaningful terms from raw strings.
# 1. Lambda Calculus and Typed Syntax
Type theory begins with a compact language for functions. The first task is to separate three layers that are often blurred in informal programming notation: raw syntax, computation by rewriting, and typing judgments that certify which expressions are meaningful. This chapter starts with the untyped lambda calculus because it isolates binding and substitution, then adds simple types to control how terms can be applied, and finally formulates reduction as an operational semantics.
## Untyped Lambda Terms and Conversion
How can a formal language express anonymous functions without relying on a pre-existing notion of named procedures? The lambda calculus answers this by taking variables, abstraction, and application as primitive syntactic forms. The central difficulty is that abstraction binds variables, so the same function may have many different textual presentations.
[definition: Lambda Term]
Let $\textsf{Var}$ be a countably infinite set of variables. The set $\textsf{Tm}$ of untyped lambda terms is generated by the grammar
\begin{align*}
t ::= x \mid \lambda x.t \mid t\,u,
\end{align*}
where $x \in \textsf{Var}$.
[/definition]
The term $\lambda x.t$ is an abstraction: it represents a function whose formal parameter is $x$ and whose body is $t$. The term $t\,u$ is application: it represents applying $t$ to the argument $u$. Parentheses are used to disambiguate, and application associates to the left, so $t\,u\,v$ means $(t\,u)\,v$.
[example: Identity and Self-Application]
The term $\lambda x.x$ is a lambda term because $x$ is a variable and the grammar permits abstraction $\lambda x.t$ whenever $t$ is a term. The application $(\lambda x.x)\,y$ is also a lambda term: $\lambda x.x$ is a term, $y$ is a term, and the grammar permits application $t\,u$ of any two terms $t$ and $u$.
The self-application body $x\,x$ is a term for the same syntactic reason: the left occurrence of $x$ is a term, the right occurrence of $x$ is a term, and therefore $x\,x$ is a term. Hence $\lambda x.x\,x$ is a term by abstraction. Applying this term to itself is again just one use of the application clause:
\begin{align*}
(\lambda x.x\,x)(\lambda x.x\,x) \in \textsf{Tm}.
\end{align*}
Thus the untyped grammar checks only raw syntactic formation, not whether the left-hand term in an application has a function type; that is why self-application is admitted here but later rejected by the simply typed system.
[/example]
The example separates raw syntactic well-formedness from any later typing discipline. To decide which variables in such a term are supplied externally and which are bound internally, we need a recursive bookkeeping device. This bookkeeping is also needed before substitution can be stated without ambiguity.
[definition: Free Variable]
The free-variable operation is the function
\begin{align*}
FV : \textsf{Tm} \to \mathcal{P}(\textsf{Var})
\end{align*}
defined recursively by
\begin{align*}
FV(x) = \{x\}.
\end{align*}
\begin{align*}
FV(t\,u) = FV(t) \cup FV(u).
\end{align*}
\begin{align*}
FV(\lambda x.t) = FV(t) \setminus \{x\}.
\end{align*}
[/definition]
A term is closed when it has no free variables. Closed terms are also called combinators. Free variables tell us what a term depends on, but they do not yet identify when two abstractions differ only by the accidental choice of a bound variable name; that gap motivates a formal equality of binding patterns.
[definition: Alpha-Equivalence]
Alpha-equivalence, written $t \equiv_\alpha u$, is the least congruence on lambda terms identifying terms that differ only by consistent renaming of bound variables, provided no free variable becomes captured.
[/definition]
Alpha-equivalence records that $\lambda x.x$ and $\lambda y.y$ are the same function-forming expression, while $\lambda x.y$ and $\lambda y.y$ are not alpha-equivalent because the second term binds the occurrence of $y$. The problem now is to define capture-avoiding substitution, since applying a function to an argument requires replacement inside a body while preserving which variables are free. Bound names may have to be changed first, so the definition must describe replacement on binding structure rather than on surface strings.
[definition: Capture-Avoiding Substitution]
Capture-avoiding substitution is the operation
\begin{align*}
(-)[-/-] : \textsf{Tm} \times \textsf{Tm} \times \textsf{Var} \to \textsf{Tm}
\end{align*}
written $(t,s,x) \mapsto t[s/x]$, defined modulo alpha-equivalence, where $t[s/x]$ is the result of replacing the free occurrences of $x$ in $t$ by $s$, renaming bound variables in $t$ when necessary to avoid capture of free variables of $s$.
[/definition]
The definition is normally implemented by recursion on $t$, with a freshness condition in the abstraction case. If $t = \lambda y.r$ and $y \ne x$, then substitution proceeds under the binder only after ensuring $y \notin FV(s)$; otherwise, the bound variable $y$ is first renamed to a fresh variable. Before substitution is used as the computation rule for function application, the following example motivates why the capture-avoidance condition is part of the definition rather than an optional convention.
[example: Avoiding Capture]
In the substitution $(\lambda y.x)[y/x]$, the term being inserted is the variable $y$, so $FV(y)=\{y\}$. If we replaced the free occurrence of $x$ inside the body without renaming the binder, we would get
\begin{align*}
\lambda y.x \mapsto \lambda y.y.
\end{align*}
This is wrong for capture-avoiding substitution: the occurrence of $y$ inserted for $x$ was free before substitution, but in $\lambda y.y$ it lies under the binder $\lambda y$ and is therefore bound.
To avoid capture, rename the bound variable first. Choose $z$ with $z \ne y$ and $z \ne x$; then $\lambda y.x \equiv_\alpha \lambda z.x$, since the renamed binder does not bind the body variable $x$. Now substitution proceeds in the renamed body:
\begin{align*}
(\lambda y.x)[y/x] = (\lambda z.x)[y/x].
\end{align*}
\begin{align*}
(\lambda z.x)[y/x] = \lambda z.(x[y/x]).
\end{align*}
\begin{align*}
x[y/x]=y.
\end{align*}
Hence
\begin{align*}
(\lambda y.x)[y/x]=\lambda z.y.
\end{align*}
The result keeps the inserted variable $y$ free, so substitution depends on binding structure rather than textual search and replacement.
[/example]
The example explains why applying a lambda abstraction to an argument cannot be modelled by naive textual replacement. Since capture-avoiding substitution is now available, it motivates the first reduction relation: a function application computes by substituting the argument into the function body.
[definition: Beta-Reduction]
Beta-reduction, written $t \to_\beta u$, is the compatible one-step reduction relation generated by
\begin{align*}
(\lambda x.t)\,s \to_\beta t[s/x].
\end{align*}
The subterm $(\lambda x.t)\,s$ is called a beta-redex.
[/definition]
Beta-reduction may occur inside larger terms because the relation is compatible with abstraction and application. Its reflexive [transitive closure](/theorems/1493) is written $\to_\beta^*$. Beta-reduction captures computation by contraction; the next conversion captures an extensional principle saying when two functions count as the same by their action on an argument.
[definition: Eta-Conversion]
Eta-conversion identifies $\lambda x.t\,x$ with $t$ when $x \notin FV(t)$.
[/definition]
Eta-conversion is not a computation step in every operational presentation, but it is a useful extensional equality principle. It says that wrapping a function in an abstraction that immediately applies it to the new variable does not change its behaviour. With beta-reduction available, the main global question is whether different choices of redex can lead to incompatible results.
[quotetheorem:9612]
[citeproof:9612]
The theorem guarantees uniqueness of normal forms up to alpha-equivalence: if a term reduces to two beta-normal terms, they must be alpha-equivalent. This is a special property of beta-reduction, not a feature of rewrite systems in general. For example, a rewrite system on three symbols with rules $a \to b$ and $a \to c$, and no further rules, is not confluent because $b$ and $c$ cannot be joined. Church-Rosser says that beta-reduction avoids exactly this kind of incompatible branching even though different redex choices are available. It does not say that every term has a normal form; the self-application example above reduces forever. The next examples show that, despite this possible divergence, untyped lambda terms can encode familiar data.
[example: Church Booleans]
The untyped calculus can encode boolean values as selectors. Define
\begin{align*}
\mathsf{true} = \lambda t.\lambda f.t.
\end{align*}
\begin{align*}
\mathsf{false} = \lambda t.\lambda f.f.
\end{align*}
Using alpha-equivalent bound names chosen fresh for the branch terms $u$ and $v$, write
\begin{align*}
\mathsf{if} = \lambda b.\lambda p.\lambda q.b\,p\,q.
\end{align*}
First evaluate the true branch:
\begin{align*}
\mathsf{if}\,\mathsf{true}\,u\,v = (((\lambda b.\lambda p.\lambda q.b\,p\,q)\,\mathsf{true})\,u)\,v.
\end{align*}
\begin{align*}
((\lambda b.\lambda p.\lambda q.b\,p\,q)\,\mathsf{true}) \to_\beta \lambda p.\lambda q.\mathsf{true}\,p\,q.
\end{align*}
\begin{align*}
((\lambda p.\lambda q.\mathsf{true}\,p\,q)\,u) \to_\beta \lambda q.\mathsf{true}\,u\,q.
\end{align*}
\begin{align*}
(\lambda q.\mathsf{true}\,u\,q)\,v \to_\beta \mathsf{true}\,u\,v.
\end{align*}
Now expand $\mathsf{true}$:
\begin{align*}
\mathsf{true}\,u\,v = ((\lambda t.\lambda f.t)\,u)\,v.
\end{align*}
\begin{align*}
(\lambda t.\lambda f.t)\,u \to_\beta \lambda f.u.
\end{align*}
\begin{align*}
(\lambda f.u)\,v \to_\beta u.
\end{align*}
Therefore $\mathsf{if}\,\mathsf{true}\,u\,v \to_\beta^* u$.
The false branch is analogous but returns the second argument:
\begin{align*}
\mathsf{if}\,\mathsf{false}\,u\,v = (((\lambda b.\lambda p.\lambda q.b\,p\,q)\,\mathsf{false})\,u)\,v.
\end{align*}
\begin{align*}
((\lambda b.\lambda p.\lambda q.b\,p\,q)\,\mathsf{false}) \to_\beta \lambda p.\lambda q.\mathsf{false}\,p\,q.
\end{align*}
\begin{align*}
((\lambda p.\lambda q.\mathsf{false}\,p\,q)\,u) \to_\beta \lambda q.\mathsf{false}\,u\,q.
\end{align*}
\begin{align*}
(\lambda q.\mathsf{false}\,u\,q)\,v \to_\beta \mathsf{false}\,u\,v.
\end{align*}
Expanding $\mathsf{false}$ gives
\begin{align*}
\mathsf{false}\,u\,v = ((\lambda t.\lambda f.f)\,u)\,v.
\end{align*}
\begin{align*}
(\lambda t.\lambda f.f)\,u \to_\beta \lambda f.f.
\end{align*}
\begin{align*}
(\lambda f.f)\,v \to_\beta v.
\end{align*}
Hence $\mathsf{if}\,\mathsf{false}\,u\,v \to_\beta^* v$. A Church boolean is therefore exactly a term that selects one of its two continuations.
[/example]
Church booleans show that data can be represented by its eliminator: a boolean is something that chooses between two branches. Natural numbers follow the same pattern, replacing selection by iteration. This prepares Chapter 2's Curry-Howard view that introduction and elimination rules are the structure of a type.
[example: Church Numerals]
A [natural number](/page/Natural%20Number) $n$ is represented by a term that takes a function $f$, takes an input $x$, and applies $f$ exactly $n$ times to $x$:
\begin{align*}
\overline{n}=\lambda f.\lambda x.f^n(x).
\end{align*}
Here $f^0(x)$ means $x$, and $f^{n+1}(x)$ means $f(f^n(x))$. Thus
\begin{align*}
\overline{0}=\lambda f.\lambda x.x.
\end{align*}
\begin{align*}
\overline{2}=\lambda f.\lambda x.f(fx).
\end{align*}
Define the successor combinator by
\begin{align*}
\mathsf{succ}=\lambda n.\lambda f.\lambda x.f\,(n\,f\,x).
\end{align*}
We compute its action on $\overline{n}$ by beta-reduction:
\begin{align*}
\mathsf{succ}\,\overline{n}=(\lambda n.\lambda f.\lambda x.f\,(n\,f\,x))\,\overline{n}.
\end{align*}
\begin{align*}
(\lambda n.\lambda f.\lambda x.f\,(n\,f\,x))\,\overline{n}\to_\beta \lambda f.\lambda x.f\,(\overline{n}\,f\,x).
\end{align*}
Now expand $\overline{n}$ inside the body:
\begin{align*}
\overline{n}\,f\,x=((\lambda f.\lambda x.f^n(x))\,f)\,x.
\end{align*}
\begin{align*}
(\lambda f.\lambda x.f^n(x))\,f\to_\beta \lambda x.f^n(x).
\end{align*}
\begin{align*}
(\lambda x.f^n(x))\,x\to_\beta f^n(x).
\end{align*}
Substituting this reduced subterm back into the successor body gives
\begin{align*}
\lambda f.\lambda x.f\,(\overline{n}\,f\,x)\to_\beta^*\lambda f.\lambda x.f(f^n(x)).
\end{align*}
Since $f^{n+1}(x)$ is defined as $f(f^n(x))$, we have
\begin{align*}
\lambda f.\lambda x.f(f^n(x))=\lambda f.\lambda x.f^{n+1}(x)=\overline{n+1}.
\end{align*}
Therefore $\mathsf{succ}\,\overline{n}\to_\beta^*\overline{n+1}$: arithmetic begins by representing a number as the operation of iterating a function that many times.
[/example]
## Simply Typed Lambda Calculus
The untyped calculus is expressive enough to encode data and computation, but it also admits pathological self-application. The next question is how to impose a discipline that rules out ill-formed applications while preserving function abstraction and application. The simply typed lambda calculus does this by assigning each term a type and checking applications against function types.
[definition: Simple Type]
Fix a set of base types, written $\mathsf{Base}$. The simple types are generated by
\begin{align*}
A,B ::= \alpha \mid A \to B,
\end{align*}
where $\alpha \in \mathsf{Base}$.
[/definition]
A type $A \to B$ classifies functions that accept inputs of type $A$ and return outputs of type $B$. The arrow associates to the right, so $A \to B \to C$ means $A \to (B \to C)$. Types alone classify closed shapes of data and functions, but open terms contain variables; this motivates a separate record of variable assumptions.
[definition: Typing Context]
A typing context $\Gamma$ is a finite list of declarations $x_1:A_1,\dots,x_n:A_n$ with distinct variables. The notation $x:A \in \Gamma$ means that $\Gamma$ assigns type $A$ to $x$.
[/definition]
A context is a local environment of hypotheses. It is the type-theoretic counterpart of a list of assumptions in logic. Contexts motivate the central form of assertion in the type system: a term has a type only relative to the assumptions under which its free variables are interpreted.
[definition: Typing Judgment]
A typing judgment has the form $\Gamma \vdash t : A$, read as "$t$ has type $A$ under context $\Gamma$".
[/definition]
A typing judgment records what we want to prove, but it does not yet say which judgments are derivable. The syntax has exactly three constructors, so the next definition gives one rule for variables, one for abstraction, and one for application. These rules are needed to turn informal type annotations into formal derivations.
[definition: Typing Rules For Simply Typed Lambda Calculus]
The simply typed lambda calculus has the following typing rules:
\begin{align*}
\frac{x:A \in \Gamma}{\Gamma \vdash x:A}
\qquad
\frac{\Gamma,x:A \vdash t:B}{\Gamma \vdash \lambda x.t:A \to B}
\qquad
\frac{\Gamma \vdash t:A \to B \quad \Gamma \vdash u:A}{\Gamma \vdash t\,u:B}.
\end{align*}
[/definition]
The abstraction rule introduces a function type by temporarily extending the context with the input variable. The application rule eliminates a function type by checking that the argument has the required domain type. These rules motivate the following combinator examples, which test the rules on the most basic typed higher-order programs.
[example: Typing The K Combinator]
Let $A$ and $B$ be simple types, and let
\begin{align*}
K=\lambda x.\lambda y.x.
\end{align*}
We show that $K:A\to B\to A$, where $B\to A$ means $B\to A$ inside the right-associated type $A\to(B\to A)$.
Start in the context $x:A,y:B$. Since $x:A \in x:A,y:B$, the variable rule gives
\begin{align*}
x:A,y:B \vdash x:A.
\end{align*}
Applying the abstraction rule to the assumption $y:B$ gives
\begin{align*}
x:A \vdash \lambda y.x:B\to A.
\end{align*}
Applying the abstraction rule again to the assumption $x:A$ gives
\begin{align*}
\varnothing \vdash \lambda x.\lambda y.x:A\to(B\to A).
\end{align*}
Thus
\begin{align*}
\varnothing \vdash K:A\to B\to A.
\end{align*}
The $K$ combinator is therefore the typed constant-function constructor: after receiving an input of type $A$, it returns a function that ignores any input of type $B$ and returns the original $A$-typed value.
[/example]
The $K$ combinator is the constant-function constructor: it keeps the first input and ignores the second. Composition is the next structural operation because it combines two typed functions whose middle types match. Its derivation illustrates how application and abstraction alternate in a typing proof.
[example: Typing The Composition Combinator]
For simple types $A,B,C$, define
\begin{align*}
\mathsf{comp}=\lambda f.\lambda g.\lambda x.f\,(g\,x).
\end{align*}
We show that $\mathsf{comp}$ has type $(B\to C)\to(A\to B)\to A\to C$, where arrows associate to the right.
Work in the context
\begin{align*}
\Gamma=f:B\to C,\ g:A\to B,\ x:A.
\end{align*}
Since $g:A\to B \in \Gamma$ and $x:A \in \Gamma$, the variable rule gives
\begin{align*}
\Gamma \vdash g:A\to B.
\end{align*}
\begin{align*}
\Gamma \vdash x:A.
\end{align*}
By the application rule applied to these two judgments,
\begin{align*}
\Gamma \vdash g\,x:B.
\end{align*}
Also, since $f:B\to C \in \Gamma$, the variable rule gives
\begin{align*}
\Gamma \vdash f:B\to C.
\end{align*}
Applying the application rule again to $\Gamma \vdash f:B\to C$ and $\Gamma \vdash g\,x:B$ gives
\begin{align*}
\Gamma \vdash f\,(g\,x):C.
\end{align*}
Now discharge the assumptions one at a time using the abstraction rule. From
\begin{align*}
f:B\to C,\ g:A\to B,\ x:A \vdash f\,(g\,x):C
\end{align*}
we get
\begin{align*}
f:B\to C,\ g:A\to B \vdash \lambda x.f\,(g\,x):A\to C.
\end{align*}
Applying abstraction to $g:A\to B$ gives
\begin{align*}
f:B\to C \vdash \lambda g.\lambda x.f\,(g\,x):(A\to B)\to A\to C.
\end{align*}
Applying abstraction to $f:B\to C$ gives
\begin{align*}
\varnothing \vdash \lambda f.\lambda g.\lambda x.f\,(g\,x):(B\to C)\to(A\to B)\to A\to C.
\end{align*}
Therefore
\begin{align*}
\varnothing \vdash \mathsf{comp}:(B\to C)\to(A\to B)\to A\to C.
\end{align*}
The composition combinator is typable exactly because the output type $B$ of $g$ matches the input type $B$ of $f$.
[/example]
Composition uses an argument once through $g$ and then feeds the result to $f$. The $S$ combinator is more demanding because it sends the same argument into two different subterms. This makes it a useful example for tracking repeated assumptions in a context.
[example: Typing The S Combinator]
For simple types $A,B,C$, let
\begin{align*}
S=\lambda f.\lambda g.\lambda x.f\,x\,(g\,x).
\end{align*}
We show that $S$ has type $(A\to B\to C)\to(A\to B)\to A\to C$, with arrows associated to the right.
Work in the context
\begin{align*}
\Gamma=f:A\to B\to C,\ g:A\to B,\ x:A.
\end{align*}
Since $f:A\to B\to C \in \Gamma$ and $x:A \in \Gamma$, the variable rule gives
\begin{align*}
\Gamma \vdash f:A\to B\to C.
\end{align*}
\begin{align*}
\Gamma \vdash x:A.
\end{align*}
By the application rule applied to these two judgments,
\begin{align*}
\Gamma \vdash f\,x:B\to C.
\end{align*}
Similarly, since $g:A\to B \in \Gamma$ and $x:A \in \Gamma$, the variable rule gives
\begin{align*}
\Gamma \vdash g:A\to B.
\end{align*}
\begin{align*}
\Gamma \vdash x:A.
\end{align*}
Applying the application rule gives
\begin{align*}
\Gamma \vdash g\,x:B.
\end{align*}
Now $f\,x$ has type $B\to C$ and $g\,x$ has type $B$ in the same context, so one more use of the application rule gives
\begin{align*}
\Gamma \vdash (f\,x)(g\,x):C.
\end{align*}
Since application associates to the left, this is the same term as $f\,x\,(g\,x)$:
\begin{align*}
\Gamma \vdash f\,x\,(g\,x):C.
\end{align*}
Discharge the assumptions using the abstraction rule. From
\begin{align*}
f:A\to B\to C,\ g:A\to B,\ x:A \vdash f\,x\,(g\,x):C
\end{align*}
we obtain
\begin{align*}
f:A\to B\to C,\ g:A\to B \vdash \lambda x.f\,x\,(g\,x):A\to C.
\end{align*}
Then
\begin{align*}
f:A\to B\to C \vdash \lambda g.\lambda x.f\,x\,(g\,x):(A\to B)\to A\to C.
\end{align*}
Finally,
\begin{align*}
\varnothing \vdash \lambda f.\lambda g.\lambda x.f\,x\,(g\,x):(A\to B\to C)\to(A\to B)\to A\to C.
\end{align*}
Therefore
\begin{align*}
\varnothing \vdash S:(A\to B\to C)\to(A\to B)\to A\to C.
\end{align*}
The $S$ combinator is typable because the shared input $x:A$ can be used both as the direct argument to $f$ and as the argument to $g$, whose output has exactly the type expected by the second input of $f\,x$.
[/example]
The examples above build typed terms by repeatedly introducing and eliminating arrow types. Computation, however, substitutes an argument into a function body, so we need a theorem saying that this operation respects typing derivations. Without this result, the beta-rule would have no reason to preserve the type assigned before reduction.
[quotetheorem:9613]
[citeproof:9613]
The lemma explains why beta-reduction preserves types: the body of a well-typed function is typed under the assumption that its parameter has the input type, and the actual argument supplies a term of that type. Both hypotheses are necessary. If $\Gamma,x:A \vdash x:A$ but $\Gamma \vdash s:C$ with $C \ne A$, then replacing $x$ by $s$ cannot produce a term of type $A$ without an additional conversion principle between $A$ and $C$. Capture avoidance is also essential: the naive replacement $(\lambda y.x)[y/x]=\lambda y.y$ changes the free variable $y$ in the substituting term into a bound variable, so it does not preserve the dependency information recorded by the context. The lemma does not say that arbitrary textual replacement is type preserving, nor that substitution is defined without respecting binders. Its role is precisely to bridge the typing rule for abstraction with the beta-step used in preservation.
## Small-Step Semantics and Type Safety
Typing says which programs are well formed, but it does not by itself say how programs run. We now ask for an operational semantics: a deterministic or nondeterministic relation describing individual computation steps. For the simply typed lambda calculus, small-step semantics presents evaluation as repeated local reduction.
[definition: Value]
In the call-by-value simply typed lambda calculus, a value is a lambda abstraction $\lambda x.t$.
[/definition]
Values are terms that have finished evaluating. In the pure calculus with only functions, abstractions are the only final computational forms. To specify not only what can reduce but where the next reduction happens inside a larger term, we introduce evaluation contexts.
[definition: Evaluation Context]
Evaluation contexts for call-by-value lambda calculus are generated by
\begin{align*}
E ::= [\,] \mid E\,t \mid v\,E,
\end{align*}
where $v$ ranges over values.
[/definition]
An evaluation context marks the next position at which a computation step may occur. The grammar evaluates the function position first and then the argument position once the function is already a value. The small-step relation combines this search strategy with the beta-rule for value arguments.
[definition: Small-Step Reduction]
The call-by-value small-step reduction relation $t \to u$ is generated by
\begin{align*}
(\lambda x.t)\,v \to t[v/x].
\end{align*}
\begin{align*}
\frac{t \to u}{E[t] \to E[u]},
\end{align*}
where $v$ is a value and $E$ is an evaluation context.
[/definition]
This semantics is more restrictive than unrestricted beta-reduction because it reduces a beta-redex only when its argument is already a value. The restriction models eager evaluation in functional programming languages. A concrete reduction makes the evaluation-context grammar visible.
[example: A Call-By-Value Reduction]
Consider
\begin{align*}
((\lambda x.x)(\lambda z.z))(\lambda y.y).
\end{align*}
The outer term is not a call-by-value beta-redex, because its function position is the application $(\lambda x.x)(\lambda z.z)$ rather than a value. The evaluation-context grammar therefore selects the function position, with
\begin{align*}
E=[\,](\lambda y.y).
\end{align*}
Since $\lambda z.z$ is a value, the beta rule applies to the selected subterm:
\begin{align*}
(\lambda x.x)(\lambda z.z)\to x[(\lambda z.z)/x].
\end{align*}
The substituted term is just the variable being replaced, so
\begin{align*}
x[(\lambda z.z)/x]=\lambda z.z.
\end{align*}
By closure under the evaluation context $E$, this gives
\begin{align*}
((\lambda x.x)(\lambda z.z))(\lambda y.y)\to(\lambda z.z)(\lambda y.y).
\end{align*}
Now the whole term is a beta-redex, and $\lambda y.y$ is a value:
\begin{align*}
(\lambda z.z)(\lambda y.y)\to z[(\lambda y.y)/z].
\end{align*}
Again the body is exactly the variable being replaced, so
\begin{align*}
z[(\lambda y.y)/z]=\lambda y.y.
\end{align*}
Hence
\begin{align*}
(\lambda z.z)(\lambda y.y)\to\lambda y.y.
\end{align*}
This reduction shows that call-by-value evaluation first reduces the function position until it becomes a value, and only then contracts the resulting beta-redex.
[/example]
The example demonstrates that evaluation changes syntax while leaving the intended result intact. To make that claim type-theoretic rather than informal, one asks whether every permitted computation step preserves the original typing judgment. In the simply typed lambda calculus this is the preservation, or subject reduction, principle: if $\Gamma \vdash t:A$ and $t\to t'$, then $\Gamma \vdash t':A$. This local principle is about the operational semantics just defined, not about arbitrary rewrites.
Preservation is sometimes called subject reduction. It rules out the possibility that a well-typed program begins as a term of type $A$ but later becomes a term of some incompatible type. The typing hypothesis is doing real work: an untyped beta-step such as $(\lambda x.x\,x)(\lambda x.x\,x) \to (\lambda x.x\,x)(\lambda x.x\,x)$ gives no type information to preserve, because the term is not typable in the simply typed system. The reduction hypothesis also matters, since preservation concerns the specified operational semantics rather than arbitrary rewrites that replace a term by an unrelated expression. Preservation alone does not rule out getting stuck before reaching a value, so a second theorem is needed.
[quotetheorem:9611]
[citeproof:9611]
Preservation and progress together form type safety. The closedness assumption in progress is essential: in a nonempty context $x:A \vdash x:A$, the term $x$ is not a value and has no reduction step, because variables do not compute on their own. Thus progress is a theorem about complete programs, not about open terms with unresolved assumptions. A closed well-typed term may diverge only if the language permits nontermination; in the simply typed lambda calculus without recursion, a stronger normalization theorem is available later in the course. The final example returns to self-application and explains exactly how typing prevents the untyped pathology.
[example: Why Self-Application Is Untypable]
Assume, toward a contradiction, that $\lambda x.x\,x$ has some simple type. Then by the abstraction rule there are simple types $A$ and $B$ such that the body is typable under the one-variable context:
\begin{align*}
x:A \vdash x\,x:B.
\end{align*}
For this judgment to be derived by the application rule, there must be a simple type $C$ such that the left occurrence of $x$ has function type $C\to B$ and the right occurrence of $x$ has argument type $C$:
\begin{align*}
x:A \vdash x:C\to B.
\end{align*}
\begin{align*}
x:A \vdash x:C.
\end{align*}
But the context assigns exactly one type to $x$, namely $A$. Therefore the variable rule can type either occurrence of $x$ only with type $A$, so the two displayed judgments force
\begin{align*}
A=C\to B.
\end{align*}
\begin{align*}
A=C.
\end{align*}
Substituting $C=A$ into $A=C\to B$ gives
\begin{align*}
A=A\to B.
\end{align*}
No simple type satisfies this equation: if $|{-}|$ denotes the number of constructors in a simple type, then $|A\to B|=1+|A|+|B|>|A|$. Hence $A$ cannot equal $A\to B$.
Thus $x\,x$ is not typable in any one-variable context assigning a single simple type to $x$, and consequently the raw untyped term $\lambda x.x\,x$ has no simple type in the simply typed lambda calculus.
[/example]
The chapter has now separated syntax, reduction, and typing. Untyped lambda calculus provides the basic language of binding and computation; simple types add a static discipline; small-step semantics explains execution; and the substitution, preservation, progress, and Church-Rosser theorems give the first metatheoretic guarantees that the system behaves coherently.
By the end of the lambda-calculus chapter, computation and typing have been made precise enough to support the first metatheorems. The next step is to interpret those typed terms as proofs, and to see how natural deduction and Curry-Howard identify logical derivations with programs.
# 2. Natural Deduction and Curry-Howard
Natural deduction gives rules for building and using proofs; the typed lambda calculus gives rules for building and using programs. This chapter explains the precise match between these two calculi for intuitionistic propositional logic. The guiding question is: when a proof of a proposition is viewed as a term of a type, what do the introduction, elimination, and computation rules become?
The previous chapter supplied the syntax and metatheory of simply typed lambda terms. We now reinterpret the same syntax logically: function types become implications, products become conjunctions, sums become disjunctions, the unit type becomes truth, and the empty type becomes falsehood. This reinterpretation is not a metaphor; it is an isomorphism between proof construction and typed programming.
## Propositions as Types
The first problem is to identify what data a constructive proof should contain. In classical truth-value semantics, a proposition is assigned a truth value, but natural deduction treats a proposition by its possible proofs. The propositions-as-types reading says that the meaning of a proposition is the type of its proofs.
[definition: Propositional Type Language]
Fix a collection of atomic propositions $P,Q,R,\dots$. The propositional type language is generated by the grammar
\begin{align*}
A,B ::= P \mid A \to B \mid A \times B \mid A + B \mid \mathbf{1} \mid \mathbf{0}.
\end{align*}
[/definition]
The connective $A \to B$ represents implication, $A \times B$ represents conjunction, $A+B$ represents disjunction, $\mathbf{1}$ represents truth, and $\mathbf{0}$ represents falsehood. To state that a proof uses assumptions, the language must also record which proof variables are available and what propositions they prove; this motivates the next definition of a context.
[definition: Proof Term Context]
A proof term context is a finite list
\begin{align*}
\Gamma = x_1:A_1,\dots,x_n:A_n
\end{align*}
of distinct variables equipped with propositional types.
[/definition]
A context records the assumptions currently available. The judgement $\Gamma \vdash t:A$ is read both as the typing judgement “term $t$ has type $A$ under context $\Gamma$” and as the logical judgement “from assumptions $\Gamma$ there is a proof $t$ of proposition $A$”. This double reading motivates a definition naming the interpretation that identifies the proof-theoretic and programming meanings.
[definition: Curry-Howard Reading]
Under the Curry-Howard reading, propositions are identified with types, proofs are identified with terms, assumptions are identified with variables in a context, and proof transformations are identified with term reductions.
[/definition]
This definition packages the slogan, but its content comes from the rules. We first test it on the most familiar logical inference, modus ponens, because it reveals the operational meaning of implication.
[example: Modus Ponens as Application]
Suppose the context contains $f:A\to B$ and $a:A$, so $\Gamma \vdash f:A\to B$ and $\Gamma \vdash a:A$. By the elimination rule for implication, these two judgements give
\begin{align*}
\Gamma \vdash f\,a:B.
\end{align*}
Under the logical reading, $f$ is a proof of the implication $A\to B$, $a$ is a proof of $A$, and the application $f\,a$ is the resulting proof of $B$. Thus the usual modus ponens inference is exactly the application rule for function types; no separate programming construct is needed.
[/example]
This example shows why implication is read as a function type: a proof of $A\to B$ is a method which turns any proof of $A$ into a proof of $B$. The same comparison can be made for every connective, so the next result records the full rule-by-rule correspondence.
[quotetheorem:9610]
[citeproof:9610]
The theorem is the conceptual centre of the chapter, but its hypotheses matter. It is a theorem about the intuitionistic propositional fragment with exactly the listed connectives and their ordinary natural deduction rules. It does not say that every classically valid principle has a closed program: for example, there is no closed term of type $A+ (A\to \mathbf{0})$ in the pure simply typed intuitionistic calculus for an arbitrary atomic proposition $A$. Adding classical principles requires adding new constants, control operators, or axioms, and those additions change the computational interpretation.
This limitation is part of the value of the correspondence. It separates rules that have direct constructive content from principles whose computational meaning needs extra structure. Its practical use is that every proof rule can now be studied by asking how the corresponding program is formed, eliminated, and reduced.
## Constructors and Eliminators
The next question is how each connective is governed by its introduction and elimination behaviour. An introduction rule says how to construct evidence for a proposition. An elimination rule says how evidence for that proposition may be used.
[definition: Implication Term Rules]
For propositions $A$ and $B$, implication is represented by the function type $A\to B$. Its introduction and elimination rules are
\begin{align*}
\frac{\Gamma,x:A \vdash t:B}{\Gamma \vdash \lambda x.t:A\to B},
\qquad
\frac{\Gamma \vdash f:A\to B \quad \Gamma \vdash a:A}{\Gamma \vdash f\,a:B}.
\end{align*}
[/definition]
The introduction rule abstracts over a temporary assumption. The elimination rule applies the resulting proof-function to actual evidence for its input proposition, so the simplest implication proof is the one that returns its assumed input.
[example: Deriving the Identity Proof]
For any proposition $A$, we derive $A\to A$ by first adding a temporary assumption $x:A$. In that one-variable context, the variable rule gives
\begin{align*}
x:A \vdash x:A.
\end{align*}
Applying implication introduction to this judgement discharges the temporary assumption $x:A$ and yields
\begin{align*}
\vdash \lambda x.x : A\to A.
\end{align*}
Thus $\lambda x.x$ is a closed proof of $A\to A$: given evidence $x$ for $A$, it returns exactly that same evidence. This proof does not inspect or transform evidence for $A$, since an arbitrary atomic proposition has no eliminator available here; the whole argument is the implication-introduction pattern of assuming the antecedent and deriving the consequent from that assumption.
[/example]
This example involves one input and one output. Conjunction has a different shape because evidence for $A\wedge B$ is not merely evidence for one proposition transformed into another: it must keep independent evidence for $A$ and evidence for $B$ available at the same time.
The formal rules for conjunction must therefore answer two complementary questions: how a pair of independent proofs is assembled into one proof object, and how a proof object already known to contain both components can be inspected to recover either component. The product rules below record exactly those two operations.
[definition: Conjunction Term Rules]
For propositions $A$ and $B$, conjunction is represented by the product type $A\times B$. Its introduction and elimination rules are
\begin{align*}
\frac{\Gamma \vdash a:A \quad \Gamma \vdash b:B}{\Gamma \vdash (a,b):A\times B},
\qquad
\frac{\Gamma \vdash p:A\times B}{\Gamma \vdash \pi_1(p):A},
\qquad
\frac{\Gamma \vdash p:A\times B}{\Gamma \vdash \pi_2(p):B}.
\end{align*}
[/definition]
Products make the content of a conjunction inspectable. A proof of $A\times B$ may be projected to recover the proof of either conjunct, giving the typed form of conjunction elimination.
[example: Conjunction Elimination as Projection]
Let $p:A\times B$ be an assumption. By the first product-elimination rule, the judgement $p:A\times B \vdash p:A\times B$ gives
\begin{align*}
p:A\times B \vdash \pi_1(p):A.
\end{align*}
By the second product-elimination rule, the same assumption gives
\begin{align*}
p:A\times B \vdash \pi_2(p):B.
\end{align*}
If the product proof was introduced from proofs $a:A$ and $b:B$, then the product-introduction rule forms
\begin{align*}
\Gamma \vdash (a,b):A\times B.
\end{align*}
Applying the first projection gives
\begin{align*}
\Gamma \vdash \pi_1(a,b):A.
\end{align*}
The product beta law reduces this term to the first component:
\begin{align*}
\pi_1(a,b)\to_\beta a.
\end{align*}
Applying the second projection gives
\begin{align*}
\Gamma \vdash \pi_2(a,b):B.
\end{align*}
The second product beta law reduces this term to the second component:
\begin{align*}
\pi_2(a,b)\to_\beta b.
\end{align*}
Thus conjunction elimination is component selection: from $p:A\times B$ either component may be recovered immediately, while from $s:A+B$ neither $A$ nor $B$ can be recovered without a case split.
[/example]
Conjunction stores both sides, but disjunction stores a choice of side. This difference forces a new eliminator: to use evidence for $A\vee B$, we must be prepared for either way that evidence might have been introduced.
[definition: Disjunction Term Rules]
For propositions $A$ and $B$, disjunction is represented by the sum type $A+B$. Its introduction rules are
\begin{align*}
\frac{\Gamma \vdash a:A}{\Gamma \vdash \operatorname{inl}(a):A+B},
\qquad
\frac{\Gamma \vdash b:B}{\Gamma \vdash \operatorname{inr}(b):A+B}.
\end{align*}
The elimination rule is
\begin{align*}
\frac{\Gamma \vdash s:A+B \quad \Gamma,x:A \vdash u:C \quad \Gamma,y:B \vdash v:C}{\Gamma \vdash \operatorname{case}(s;x.u;y.v):C}.
\end{align*}
[/definition]
The eliminator expresses constructive proof by cases: to obtain $C$ from a proof of $A\vee B$, it is necessary to give a method from $A$ to $C$ and a method from $B$ to $C$.
[example: Proof by Cases as Case Analysis]
Assume the context contains $s:A+B$, $f:A\to C$, and $g:B\to C$. To use $s$ by cases, introduce a branch variable $x:A$ for the left case; since $f:A\to C$, implication elimination gives
\begin{align*}
s:A+B,f:A\to C,g:B\to C,x:A \vdash f\,x:C.
\end{align*}
Similarly, in the right branch, introduce $y:B$; since $g:B\to C$, implication elimination gives
\begin{align*}
s:A+B,f:A\to C,g:B\to C,y:B \vdash g\,y:C.
\end{align*}
The sum-elimination rule therefore forms the proof
\begin{align*}
s:A+B,f:A\to C,g:B\to C \vdash \operatorname{case}(s;x.f\,x;y.g\,y):C.
\end{align*}
If the sum evidence was introduced on the left from $a:A$, then substituting $\operatorname{inl}(a)$ for $s$ gives
\begin{align*}
\operatorname{case}(\operatorname{inl}(a);x.f\,x;y.g\,y)\to_\beta (f\,x)[a/x].
\end{align*}
The substitution replaces the branch variable $x$ by $a$, so
\begin{align*}
(f\,x)[a/x]=f\,a.
\end{align*}
If the sum evidence was introduced on the right from $b:B$, then
\begin{align*}
\operatorname{case}(\operatorname{inr}(b);x.f\,x;y.g\,y)\to_\beta (g\,y)[b/y].
\end{align*}
The substitution replaces the branch variable $y$ by $b$, so
\begin{align*}
(g\,y)[b/y]=g\,b.
\end{align*}
Thus [disjunction elimination](/theorems/4625) is computationally case analysis: each injection selects the matching branch and supplies its stored proof to that branch.
[/example]
The proof-by-cases example has two constructors and one eliminator with two branches. Truth and falsehood are the boundary cases where this constructor-eliminator pattern becomes degenerate.
To make these boundary cases usable as propositions, the rules must specify what counts as evidence when no choice or payload is present. Truth needs a canonical piece of evidence carrying no information, while falsehood needs an eliminator explaining what follows from an impossible piece of evidence. The unit and empty type rules provide these two extremes.
[definition: Truth and Falsehood Term Rules]
Truth is represented by the unit type $\mathbf{1}$ with introduction rule
\begin{align*}
\frac{}{\Gamma \vdash \star:\mathbf{1}}.
\end{align*}
Falsehood is represented by the empty type $\mathbf{0}$ with elimination rule
\begin{align*}
\frac{\Gamma \vdash z:\mathbf{0}}{\Gamma \vdash \operatorname{abort}_A(z):A}.
\end{align*}
[/definition]
The rule for $\mathbf{0}$ is ex falso in typed form. It is harmless only because there should be no closed proof term of type $\mathbf{0}$, a fact proved after normalization.
[example: Ex Falso as Empty Elimination]
Let $\Gamma$ be any context and assume $z:\mathbf{0}$ is available. The variable rule first gives the judgement
\begin{align*}
\Gamma,z:\mathbf{0}\vdash z:\mathbf{0}.
\end{align*}
The elimination rule for falsehood says that from any judgement whose conclusion has type $\mathbf{0}$, one may form an eliminator into any proposition $A$. Applying that rule to the displayed judgement yields
\begin{align*}
\Gamma,z:\mathbf{0}\vdash \operatorname{abort}_A(z):A.
\end{align*}
Thus $z$ is not being inspected to extract hidden data of type $A$; rather, $\mathbf{0}$ has no introduction rule, so there is no constructor case for the eliminator to analyze. Operationally, $\operatorname{abort}_A$ is the branch-free eliminator: if evidence for falsehood were present, every target proposition would follow from that impossible case.
[/example]
Taken together, these rules express a design principle: each connective is determined by how proofs of it are introduced and how they are consumed. The next section studies the equations that make introductions and eliminations fit together correctly.
## Beta Laws as Local Proof Reductions
The third problem is to understand when two proofs should count as the same local proof. Natural deduction contains detours: introduce a connective and immediately eliminate it. Beta laws remove these detours, and in the typed calculus they are exactly computation rules.
[definition: Beta Laws]
The beta laws for the propositional type formers are the following reductions:
\begin{align*}
(\lambda x.t)\,a &\to_\beta t[a/x].
\end{align*}
\begin{align*}
\pi_1(a,b) &\to_\beta a.
\end{align*}
\begin{align*}
\pi_2(a,b) &\to_\beta b.
\end{align*}
\begin{align*}
\operatorname{case}(\operatorname{inl}(a);x.u;y.v) &\to_\beta u[a/x].
\end{align*}
\begin{align*}
\operatorname{case}(\operatorname{inr}(b);x.u;y.v) &\to_\beta v[b/y].
\end{align*}
[/definition]
Each law says that eliminating evidence just constructed returns the relevant component of the construction. These are local proof reductions because they simplify a small proof fragment without changing the conclusion.
[example: Reducing a Modus Ponens Detour]
Let $\Gamma,x:A \vdash t:B$ and $\Gamma \vdash a:A$. Implication introduction first discharges the temporary assumption $x:A$, giving
\begin{align*}
\Gamma \vdash \lambda x.t : A\to B.
\end{align*}
Implication elimination then applies this proof-function to the actual proof $a:A$, so
\begin{align*}
\Gamma \vdash (\lambda x.t)\,a : B.
\end{align*}
The function beta law rewrites the application of an abstraction to its body with the argument substituted for the bound variable:
\begin{align*}
(\lambda x.t)\,a \to_\beta t[a/x].
\end{align*}
The substituted term still has type $B$ in the original context,
\begin{align*}
\Gamma \vdash t[a/x] : B.
\end{align*}
Thus the detour “assume $x:A$, prove $t:B$, abstract over $x$, then apply to $a$” reduces to the proof obtained by putting the actual evidence $a$ everywhere the temporary assumption $x$ was used.
[/example]
This example shows a proof changing shape while keeping the same conclusion. That raises a necessary soundness question: if computation rewrites a proof term, could the rewritten term prove a different proposition or fall outside the typing judgment altogether? Beta reduction can serve as proof simplification only if typing is stable under the rewrite, so the proposition assigned to a well-typed proof term is preserved during reduction.
[quotetheorem:4635]
[citeproof:4635]
This theorem says that local proof simplification does not change what has been proved, but the well-typedness hypothesis is indispensable. The statement only concerns reductions that are legal inside the typed calculus, where substitution respects contexts and every eliminator is applied to evidence of the matching type. In the untyped lambda calculus, beta reduction still rewrites terms, but there is no proposition attached to a term for reduction to preserve; for an ill-typed expression such as applying a pair as if it were a function, the logical reading has already broken down. Thus subject reduction is preservation of proof meaning, not a general safety statement about arbitrary syntax.
## Eta Laws as Local Proof Expansions
Beta laws remove unnecessary detours, but there is a dual question: when is a proof determined by how all eliminators observe it? Without such a principle, two proofs can be indistinguishable by every allowed observation and still fail to be definitionally equal. For instance, a function $f:A\to B$ and its expansion $\lambda x.f\,x$ behave the same on every argument, and a product proof $p:A\times B$ has the same projections as $(\pi_1(p),\pi_2(p))$. Eta laws express the principle that these complete observations determine the proof.
[definition: Eta Laws]
The eta laws for implication, conjunction, truth, and disjunction are the following definitional equalities or expansions:
\begin{align*}
f &\equiv \lambda x.f\,x \quad (x \notin FV(f)).
\end{align*}
\begin{align*}
p &\equiv (\pi_1(p),\pi_2(p)).
\end{align*}
\begin{align*}
u &\equiv \star \quad (u:\mathbf{1}).
\end{align*}
\begin{align*}
s &\equiv \operatorname{case}(s;x.\operatorname{inl}(x);y.\operatorname{inr}(y)).
\end{align*}
[/definition]
The eta law for functions says that a function is determined by its action on an arbitrary argument. The eta law for products says that a pair is determined by its two projections. The eta law for sums says that a sum value is determined by case analysis which immediately rebuilds the observed injection.
[example: Expanding a Product Proof]
Let $p:A\times B$. Product elimination gives both component judgements
\begin{align*}
p:A\times B \vdash \pi_1(p):A.
\end{align*}
\begin{align*}
p:A\times B \vdash \pi_2(p):B.
\end{align*}
Product introduction then rebuilds a product proof from exactly those two components:
\begin{align*}
p:A\times B \vdash (\pi_1(p),\pi_2(p)):A\times B.
\end{align*}
The product eta law identifies $p$ with this rebuilt pair,
\begin{align*}
p \equiv (\pi_1(p),\pi_2(p)).
\end{align*}
This gives a concrete comparison method for product proofs. If $p,q:A\times B$ and the two observed components agree,
\begin{align*}
\pi_1(p)\equiv \pi_1(q),
\end{align*}
\begin{align*}
\pi_2(p)\equiv \pi_2(q),
\end{align*}
then product introduction preserves these component equalities, so
\begin{align*}
(\pi_1(p),\pi_2(p))\equiv(\pi_1(q),\pi_2(q)).
\end{align*}
Using eta for $p$ and for $q$ gives
\begin{align*}
p\equiv(\pi_1(p),\pi_2(p))\equiv(\pi_1(q),\pi_2(q))\equiv q.
\end{align*}
Thus a product proof is determined by both projections together; observing only $\pi_1(p)$ would determine the $A$-component but would leave the $B$-component $\pi_2(p)$ unconstrained.
[/example]
Eta is not merely a convenience for equational reasoning; it tells us that the eliminators are complete observations for their type formers. Without eta, two terms might behave identically under all eliminations but fail to be identified by the definitional equality of the theory.
[remark: Beta and Eta as Harmony]
The beta and eta laws together express harmony between introduction and elimination rules. Beta says eliminators do not extract more information than constructors provided. Eta says constructors provide enough information to account for everything eliminators can observe.
[/remark]
This harmony is one reason intuitionistic natural deduction fits typed computation so tightly. It also prepares the ground for dependent type theory, where each new type former will come with formation, introduction, elimination, and computation rules.
## Normal Forms and Consistency
The final question is what the computational interpretation tells us about logic as a whole. If proofs are programs and proof simplification is evaluation, then a logical inconsistency would amount to a closed program of type falsehood.
[definition: Normal Form]
A typed term $t$ is in normal form if there is no term $t'$ such that $t\to_\beta t'$.
[/definition]
Normal forms are proofs with no local detours remaining. In natural deduction language, a normal proof no longer introduces a connective only to eliminate it immediately, so normalization is the global theorem saying every proof can be simplified to such a form. The restriction to simple types is essential: the untyped lambda calculus contains self-application patterns such as $\Omega=(\lambda x.x\,x)(\lambda x.x\,x)$, whose beta reduction repeats forever and has no normal form. Simple types rule out exactly this kind of unrestricted self-application.
[quotetheorem:9614]
[proofunderconstruction:9614]
Normalization is the computational version of cut elimination for this fragment of logic. The theorem depends on the discipline imposed by simple types: every eliminator consumes evidence of a specified type, and the logical relations proof measures terms by the structure of those types. It should not be read as a theorem about arbitrary lambda terms or about languages with unrestricted recursion. In the untyped calculus, the term $\Omega=(\lambda x.x\,x)(\lambda x.x\,x)$ reduces to itself, so proof simplification would not necessarily reach a detour-free representative without the typing restrictions.
[example: Normalizing a Proof of $A\to A$]
We compute the type and the reduction of $(\lambda f.\lambda x.f\,x)(\lambda y.y)$ with $f:A\to A$, $x:A$, and $y:A$. The variable rule gives
\begin{align*}
y:A\vdash y:A.
\end{align*}
By implication introduction,
\begin{align*}
\vdash \lambda y.y:A\to A.
\end{align*}
Also, from $f:A\to A$ and $x:A$, implication elimination gives
\begin{align*}
f:A\to A,x:A\vdash f\,x:A.
\end{align*}
Discharging $x$ and then $f$ gives
\begin{align*}
\vdash \lambda f.\lambda x.f\,x:(A\to A)\to(A\to A).
\end{align*}
Applying this function to $\lambda y.y$ therefore yields
\begin{align*}
\vdash (\lambda f.\lambda x.f\,x)(\lambda y.y):A\to A.
\end{align*}
The first beta step substitutes $\lambda y.y$ for the bound variable $f$:
\begin{align*}
(\lambda f.\lambda x.f\,x)(\lambda y.y)\to_\beta(\lambda x.f\,x)[(\lambda y.y)/f].
\end{align*}
Since the substitution replaces each free occurrence of $f$ in the body and $x$ is a different bound variable,
\begin{align*}
(\lambda x.f\,x)[(\lambda y.y)/f]=\lambda x.(\lambda y.y)\,x.
\end{align*}
The second beta step reduces the application inside the body:
\begin{align*}
(\lambda y.y)\,x\to_\beta y[x/y].
\end{align*}
The substitution $y[x/y]$ replaces the variable $y$ by $x$, so
\begin{align*}
y[x/y]=x.
\end{align*}
Therefore
\begin{align*}
\lambda x.(\lambda y.y)\,x\to_\beta\lambda x.x.
\end{align*}
Thus the full reduction sequence is
\begin{align*}
(\lambda f.\lambda x.f\,x)(\lambda y.y)\to_\beta\lambda x.(\lambda y.y)\,x\to_\beta\lambda x.x.
\end{align*}
The term $\lambda x.x$ has no beta redex in its body, so it is beta-normal; the original proof of $A\to A$ only inserted an administrative function around the identity proof, and normalization removes that layer.
[/example]
This example shows normalization producing a canonical proof of an implication. For falsehood, the consequence is sharper: if a closed proof of $\mathbf{0}$ existed, normalization would produce a closed normal proof of $\mathbf{0}$, so consistency reduces to inspecting the possible normal forms. This is where the empty type differs from every inhabited connective: functions have lambda constructors, products have pair constructors, sums have injections, and truth has $\star$, but falsehood has no constructor at all. Without normalization, a closed term could hide forever behind eliminations and reductions; without the empty type having no constructors, a normal proof of falsehood would no longer be impossible.
[quotetheorem:4633]
[citeproof:4633]
This theorem is the logical payoff of the chapter, and its proof uses both ingredients just highlighted. Normalization turns any alleged closed proof of falsehood into a closed normal proof, and the syntax of $\mathbf{0}$ then rules out such a normal proof because there is no constructor for it. The result is therefore a consistency theorem for the pure simply typed propositional fragment. It does not automatically apply after adding arbitrary axioms, general recursion, or unrestricted classical control operators; each extension needs its own normalization or consistency argument.
[remark: Constructive Meaning of Disjunction and Existence]
Although this chapter only treats propositional disjunction, the same idea later explains dependent existential types. A proof of a constructive disjunction contains a choice of side, and a proof of a dependent existential will contain a witness together with evidence that the witness satisfies the predicate. This is why Curry-Howard is more informative than a truth-table account of connectives.
[/remark]
The chapter has recast the simply typed lambda calculus as intuitionistic propositional logic. Chapter 3 first broadens the simply typed fragment with products, sums, and inductive data; after that, the course moves to dependent types, where propositions may depend on terms and logical quantifiers appear as dependent function and pair types.
The Curry-Howard correspondence shows that proofs and programs share the same structural grammar, but the simply typed fragment still has only a small repertoire of type formers. The next chapter expands that repertoire with products, sums, and inductive data, so the calculus can describe ordinary mathematical objects as well as functions.
# 3. Products, Sums, and Inductive Data
This chapter adds the first genuinely algebraic type formers to the simply typed calculus. Function types describe processes from inputs to outputs; products, sums, unit, empty, natural numbers, lists, and trees describe the shapes of data that those processes consume and produce. The guiding question is: once a type is introduced by its constructors, what is the corresponding elimination principle, and why does that elimination principle determine all functions out of the type?
The chapter therefore continues Chapter 2's Curry-Howard viewpoint, but the emphasis shifts from proof connectives to data. Conjunction becomes binary product, disjunction becomes binary coproduct, truth becomes the unit type, falsehood becomes the empty type, and induction appears as the elimination rule for recursively generated types.
Throughout this chapter, uniqueness principles are stated relative to the equality rules assumed for the type former in question. When uniqueness is judgmental, the relevant eta rule is part of the definitional equality theory. When uniqueness is propositional, the proof uses the corresponding eta principle and, for equality of functions, function extensionality. This convention matters because the beta computation rules alone say how eliminators act on constructors, while eta principles say that the constructors account for all terms of the type.
## Finite Type Formers and Their Eliminators
How should a type theory express that a datum has no information, exactly one piece of information, two pieces of information, or a finite choice of alternatives? The answer is not to describe these types by listing their set-theoretic elements. Instead, each finite type former is specified by introduction rules, elimination rules, computation laws, and a uniqueness principle.
The smallest finite data type with an inhabitant is the unit type. It represents a datum carrying no computational information beyond its existence, so it is the right type for a theorem with no hypotheses or a program input that carries no meaningful parameter.
[definition: Unit Type]
The unit type is a type $\top$ equipped with a canonical term $\ttstar : \top$.
[/definition]
The definition gives the constructor, but a usable type also needs a rule for consuming its terms. Since every term of $\top$ is accounted for by $\ttstar$, a map out of $\top$ should be determined by the value assigned to that constructor.
[quotetheorem:9615]
[citeproof:9615]
This theorem depends on the eta principle because the computation rule alone only tells us what the recursor does to $\ttstar$; it does not by itself say that every element of $\top$ is exhausted by that constructor. The result also does not assert that $\top$ is empty or computationally irrelevant in every semantic model; it asserts that maps out of $\top$ carry no more information than a chosen point of the codomain. Without this uniqueness statement, two functions $\top \to C$ could agree on the constructor yet fail to be identified by the theory, breaking the intended Curry-Howard reading of truth as a proposition with a single proof. The next boundary case is even more extreme: a type with no constructors, which represents an impossible datum and gives the type-theoretic form of ex falso.
[definition: Empty Type]
The empty type is a type $\bot$ with no introduction rule.
[/definition]
Having no constructor means that a term of $\bot$ cannot be analysed by any real branch. This absence is what makes its eliminator useful: a function out of $\bot$ requires no cases, because there is no possible input constructor to handle.
[quotetheorem:9616]
[citeproof:9616]
The hypothesis that $\bot$ has no introduction rule is essential: if even one constructor existed, a map $\bot \to C$ would require a value of $C$ for that constructor. The eliminator does not produce a closed term of every type $C$; it only produces a term of $C$ from an assumed impossible input. This distinction is the type-theoretic difference between explosion from contradiction and global inconsistency. The unit and empty types describe zero-choice endpoints: one constructor or none, while products describe data made from two pieces at the same time.
[definition: Binary Product Type]
For types $A$ and $B$, the binary product type $A \times B$ is equipped with a pairing constructor $(-,-) : A \to B \to A \times B$ and projections $\pi_1 : A \times B \to A$ and $\pi_2 : A \times B \to B$.
[/definition]
The computation laws for products say that the projections recover the entries of a pair:
\begin{align*}
\pi_1(a,b) &= a.
\end{align*}
\begin{align*}
\pi_2(a,b) &= b.
\end{align*}
These laws explain how to use a particular pair. The universal property asks a stronger question: what information is needed to define any function into a product?
[quotetheorem:9617]
[citeproof:9617]
The theorem says that products package two outputs without adding hidden structure, and the eta hypothesis is the part that rules out hidden structure. The projection equations alone only constrain what happens after applying $\pi_1$ and $\pi_2$; without eta, a pair-like object could carry extra information invisible to both projections. The result is therefore a categorical universal property: $A \times B$ is characterized by how maps into it correspond to pairs of maps into $A$ and $B$. A standard way to use this is to combine two computations with the same input into a single computation whose result retains both outputs.
[example: Pairing Two Computations]
Let $f : \mathbb{N}_0 \to \mathbb{N}_0$ be given by $f(n):=2n$, and let $g : \mathbb{N}_0 \to \mathbb{N}_0$ be given by $g(n):=n^2$. The paired map $p:=\langle f,g\rangle : \mathbb{N}_0 \to \mathbb{N}_0 \times \mathbb{N}_0$ is defined by
\begin{align*}
p(n) = (f(n),g(n)) = (2n,n^2).
\end{align*}
The projection equations follow from the product beta laws:
\begin{align*}
\pi_1(p(n)) = \pi_1(f(n),g(n)) = f(n) = 2n.
\end{align*}
\begin{align*}
\pi_2(p(n)) = \pi_2(f(n),g(n)) = g(n) = n^2.
\end{align*}
Now suppose $h : \mathbb{N}_0 \to \mathbb{N}_0 \times \mathbb{N}_0$ has the same two projections, meaning $\pi_1(h(n))=2n$ and $\pi_2(h(n))=n^2$ for every $n : \mathbb{N}_0$. For each $n$, the product eta principle gives
\begin{align*}
h(n) = (\pi_1(h(n)),\pi_2(h(n))).
\end{align*}
Substituting the two projection equalities gives
\begin{align*}
h(n) = (2n,n^2) = p(n).
\end{align*}
Thus $h$ and $p$ agree at every input $n$, so by function extensionality they are equal as functions. This is the reusable proof technique for product-valued functions: prove equality by proving equality after both projections.
[/example]
The product example shows how to combine simultaneous outputs. The next finite connective is different: instead of retaining two components at once, it records a choice of origin, so the eliminator must split into cases.
[definition: Binary Coproduct Type]
For types $A$ and $B$, the binary coproduct type $A + B$ is equipped with injections $\iota_1 : A \to A+B$ and $\iota_2 : B \to A+B$.
[/definition]
The injections explain how to build a sum term from either side, and this raises the central use question for sums: how can a consumer handle a term when it does not know which injection produced it? The answer is a case-analysis principle, and its uniqueness form says that the two branch functions determine the whole function out of $A+B$.
[quotetheorem:9618]
[citeproof:9618]
The result formalises proof by cases and program branching, and the two-injection hypothesis is what makes the branch list complete. If a term of $A+B$ could be formed in some third way, the two handlers $f$ and $g$ would not determine a total consumer. The theorem also does not say that an element of $A+B$ contains data from both sides; it says that every consumer must be prepared for either origin. This is the coproduct universal property dual to the product property above, and it is the point at which proof by cases becomes an elimination rule rather than an informal argument. A useful example is a total function that consumes either a successful output or an error code, with no partiality in the ambient type theory.
[example: Case Analysis on an Error Type]
Let $A$ be a type of successful outputs, let $E$ be a type of error codes, and let $C$ be the result type. Choose a success handler $s : A \to C$ and an error handler $r : E \to C$. By the coproduct eliminator, these determine a function $q := [s,r] : A+E \to C$, and the coproduct beta laws give the two constructor equations:
\begin{align*}
q(\iota_1(a)) = [s,r](\iota_1(a)) = s(a).
\end{align*}
\begin{align*}
q(\iota_2(e)) = [s,r](\iota_2(e)) = r(e).
\end{align*}
Now suppose $h : A+E \to C$ is another function with the same behavior on successes and errors:
\begin{align*}
h(\iota_1(a)) = s(a).
\end{align*}
\begin{align*}
h(\iota_2(e)) = r(e).
\end{align*}
To compare $h$ with $q$, use case analysis on an arbitrary input $x : A+E$, as justified by the coproduct eta principle in *[Uniqueness Principle for Coproducts](/theorems/9618)*. If $x=\iota_1(a)$, then
\begin{align*}
h(x)=h(\iota_1(a))=s(a)=q(\iota_1(a))=q(x).
\end{align*}
If $x=\iota_2(e)$, then
\begin{align*}
h(x)=h(\iota_2(e))=r(e)=q(\iota_2(e))=q(x).
\end{align*}
Thus $h(x)=q(x)$ for every $x:A+E$, so by function extensionality $h=q$. The example shows that a consumer of an error type is total exactly because it supplies one branch for successful outputs and one branch for error codes.
[/example]
Binary sums cover two alternatives, as the error example illustrates. Syntax and programs often need a fixed finite menu of choices, so finite enumerated types package the same branching pattern with any finite list of nullary constructors.
[definition: Finite Enumerated Type]
A finite enumerated type is a type $E$ specified by finitely many nullary constructors $e_1 : E$, $e_2 : E$, ..., $e_n : E$.
[/definition]
Its eliminator is finite pattern matching: to define a function $E \to C$, one gives terms $c_i : C$ for each constructor $e_i$. Booleans are the case $n=2$, with constructors often written $\mathsf{true}$ and $\mathsf{false}$.
## Natural Numbers, Recursion, and Induction
What extra structure is needed to describe data that can be built to arbitrary finite depth? The natural-number type used in this course is the inductive type of finite ordinals, written $\mathbb{N}_0$ to distinguish it from the wiki convention that $\mathbb{N}$ starts at $1$. It has a base constructor and a step constructor, and its eliminators are recursion and induction.
The constructors for natural numbers are finite in number, but they generate infinitely many closed numerals by repeated use of successor. This is the first place where an eliminator must support a recursive clause. Without a structural rule, recursive definitions would be indistinguishable from arbitrary self-reference, so the type former must say exactly which previous value is available at a successor.
[definition: Natural Number Type]
The natural number type $\mathbb{N}_0$ is a type equipped with constructors $0 : \mathbb{N}_0$ and $S : \mathbb{N}_0 \to \mathbb{N}_0$.
[/definition]
The definition gives the two constructors from which numerals are generated. To define functions out of all natural numbers, we need a rule that starts at $0$ and extends through successors in a way that determines every value.
[quotetheorem:9619]
[citeproof:9619]
The theorem permits recursion only because the successor clause receives the recursive value at the predecessor $n$. It does not permit a definition of $f(S(n))$ in terms of $f(S(n))$ itself or in terms of values at larger inputs, which would destroy the constructor-by-constructor justification. The dependence of $s$ on $n$ is useful for definitions whose update rule changes with the stage, while the fixed codomain $C$ keeps this as ordinary recursion rather than induction over a family. Many mathematical statements about natural numbers have result types that vary with the number, so the dependent eliminator is needed to express induction as a term-forming rule.
[quotetheorem:9620]
[citeproof:9620]
The base hypothesis $p_0$ is required because $0$ has no predecessor from which evidence could be inherited. The step hypothesis is required because a proof at $S(n)$ may depend on the already constructed proof at $n$, not merely on the number $n$ itself. This principle does not prove arbitrary facts about $\mathbb{N}_0$ automatically; it reduces such a proof to exactly the base and successor obligations dictated by the constructors. The recursion theorem is not only an abstract rule; it is the mechanism by which arithmetic operations are introduced, with addition as the first example because its recursive clause uses only successor.
[example: Defining Addition by Recursion]
Fix $m : \mathbb{N}_0$. To define addition with left argument $m$, apply primitive recursion on the second argument with target type $C:=\mathbb{N}_0$, base value $c_0:=m$, and step function $s : \mathbb{N}_0 \to \mathbb{N}_0 \to \mathbb{N}_0$ given by $s(n,k):=S(k)$. Thus
\begin{align*}
\mathsf{add}_m:=\operatorname{rec}_{\mathbb{N}_0}(m,s) : \mathbb{N}_0 \to \mathbb{N}_0.
\end{align*}
The base computation rule for primitive recursion gives
\begin{align*}
\mathsf{add}_m(0)=\operatorname{rec}_{\mathbb{N}_0}(m,s)(0)=m.
\end{align*}
For a successor input $S(n)$, the successor computation rule gives
\begin{align*}
\mathsf{add}_m(S(n))=\operatorname{rec}_{\mathbb{N}_0}(m,s)(S(n))=s(n,\operatorname{rec}_{\mathbb{N}_0}(m,s)(n)).
\end{align*}
Since $s(n,k):=S(k)$ and $\mathsf{add}_m=\operatorname{rec}_{\mathbb{N}_0}(m,s)$, this becomes
\begin{align*}
\mathsf{add}_m(S(n))=S(\operatorname{rec}_{\mathbb{N}_0}(m,s)(n))=S(\mathsf{add}_m(n)).
\end{align*}
Writing $m+n$ for $\mathsf{add}_m(n)$, the two defining equations are therefore
\begin{align*}
m+0=\mathsf{add}_m(0)=m.
\end{align*}
\begin{align*}
m+S(n)=\mathsf{add}_m(S(n))=S(\mathsf{add}_m(n))=S(m+n).
\end{align*}
Thus addition is introduced as a term obtained from the eliminator for $\mathbb{N}_0$, with its equations forced by the base and successor computation rules rather than postulated as external arithmetic laws.
[/example]
Once addition has been defined, it can be used inside later recursive clauses. Multiplication is the next layer because its successor case accumulates one more copy of the fixed multiplicand.
[example: Defining Multiplication from Addition]
Fix $m : \mathbb{N}_0$. Define multiplication with left argument $m$ by primitive recursion on the second argument. Use target type $C:=\mathbb{N}_0$, base value $c_0:=0$, and step function $s : \mathbb{N}_0 \to \mathbb{N}_0 \to \mathbb{N}_0$ given by $s(n,k):=k+m$, where addition has already been defined. Thus
\begin{align*}
\mathsf{mul}_m:=\operatorname{rec}_{\mathbb{N}_0}(0,s):\mathbb{N}_0\to\mathbb{N}_0.
\end{align*}
The base computation rule for primitive recursion gives
\begin{align*}
\mathsf{mul}_m(0)=\operatorname{rec}_{\mathbb{N}_0}(0,s)(0)=0.
\end{align*}
For a successor input $S(n)$, the successor computation rule gives
\begin{align*}
\mathsf{mul}_m(S(n))=\operatorname{rec}_{\mathbb{N}_0}(0,s)(S(n))=s(n,\operatorname{rec}_{\mathbb{N}_0}(0,s)(n)).
\end{align*}
Since $s(n,k):=k+m$ and $\mathsf{mul}_m=\operatorname{rec}_{\mathbb{N}_0}(0,s)$, this becomes
\begin{align*}
\mathsf{mul}_m(S(n))=\operatorname{rec}_{\mathbb{N}_0}(0,s)(n)+m=\mathsf{mul}_m(n)+m.
\end{align*}
Writing $m\cdot n$ for $\mathsf{mul}_m(n)$, the defining equations are therefore
\begin{align*}
m\cdot 0=\mathsf{mul}_m(0)=0.
\end{align*}
\begin{align*}
m\cdot S(n)=\mathsf{mul}_m(S(n))=\mathsf{mul}_m(n)+m=(m\cdot n)+m.
\end{align*}
Thus multiplication is repeated addition in the precise sense that the value at $0$ is $0$, and each successor step adds one more copy of the fixed multiplicand $m$ using the previously defined addition operation.
[/example]
The recursive equations for arithmetic suggest a universal property: a natural number is exactly what can be iterated from a starting point. To state this categorically, we package a target type together with its chosen starting point and step map.
[definition: Algebra for the Natural Number Signature]
An algebra for the natural number signature consists of a type $C$, a point $c_0 : C$, and a function $s : C \to C$.
[/definition]
This definition gives the possible targets for iteration, but it does not yet say why the natural numbers are the canonical source of such iterations.
The next formal issue is not another constructor rule, but a universal property: once a target algebra chooses a starting point and a step map, there should be one and only one map out of $\mathbb{N}$ that sends $0$ to the starting point and successor to the chosen step. Without that uniqueness, recursion equations would describe possible functions rather than determine the recursive function. This is the initiality property of the natural numbers as an algebra.
[remark: Initiality of the Natural Number Algebra]
Let $(C,c_0,s)$ be an algebra for the natural number signature, with $c_0:C$ and $s:C\to C$. The initiality property of $\mathbb{N}_0$ says that there is a unique map
\begin{align*}
f:\mathbb{N}_0\to C
\end{align*}
such that
\begin{align*}
f(0)=c_0
\end{align*}
and
\begin{align*}
f(S(n))=s(f(n))
\end{align*}
for every $n:\mathbb{N}_0$.
[/remark]
This property packages recursion as a universal property: to interpret natural numbers in any algebra, choose where $0$ goes and how successor acts. The initiality claim would fail without uniqueness, since many structure-preserving maps could then satisfy the same constructor equations without being identified. It also would fail without the homomorphism equations, because an arbitrary function $\mathbb{N}_0 \to C$ need not respect the chosen point and step of the target algebra. This categorical formulation prepares the list and tree cases, where the same idea appears as folds over richer constructor signatures.
## Lists, Trees, and Structural Recursion
Natural numbers are generated by one base constructor and one unary step constructor. Many data types have several constructors, and some constructors contain recursive occurrences of the type being defined. Lists and trees show how the same introduction-elimination pattern scales to ordinary inductive data.
For lists, the constructors say that a list over $A$ is either empty or obtained by adjoining a head element to a smaller list. This gives the simplest recursive data type whose elements also carry non-recursive payload data.
[definition: List Type]
For a type $A$, the list type $\mathsf{List}(A)$ is equipped with constructors $\mathsf{nil} : \mathsf{List}(A)$ and $\mathsf{cons} : A \to \mathsf{List}(A) \to \mathsf{List}(A)$.
[/definition]
The list definition contributes two constructors, and their shapes dictate the next eliminator. The empty constructor supplies a base branch, while the cons constructor supplies a head, a tail, and the recursive value already computed on that tail.
[remark: List Recursion Principle]
For a type $A$ and a target type $C$, list recursion is specified by a base value $c_{\mathsf{nil}}:C$ and a cons branch
\begin{align*}
c_{\mathsf{cons}}:A\to\mathsf{List}(A)\to C\to C.
\end{align*}
These data determine a function
\begin{align*}
\operatorname{rec}_{\mathsf{List}}(c_{\mathsf{nil}},c_{\mathsf{cons}}):\mathsf{List}(A)\to C
\end{align*}
with computation rules
\begin{align*}
\operatorname{rec}_{\mathsf{List}}(c_{\mathsf{nil}},c_{\mathsf{cons}})(\mathsf{nil})=c_{\mathsf{nil}}
\end{align*}
and
\begin{align*}
\operatorname{rec}_{\mathsf{List}}(c_{\mathsf{nil}},c_{\mathsf{cons}})(\mathsf{cons}(a,\ell))
=c_{\mathsf{cons}}(a,\ell,\operatorname{rec}_{\mathsf{List}}(c_{\mathsf{nil}},c_{\mathsf{cons}})(\ell)).
\end{align*}
[/remark]
The principle relies on the fact that the recursive occurrence in $\mathsf{cons}$ is the tail, so the recursive call is made on a direct sublist. A definition that tried to recurse on a larger list, or on a list obtained by first calling the function being defined, would not be justified by this eliminator. The branch data also show why the head $a$ and the tail $\ell$ are both available in the cons case: the constructor contains both, while only the tail contributes a recursive value. The recursor is most visible when defining functions with an accumulator fixed outside the recursion, such as append by recursion on the first list.
[example: Defining List Append]
Let $A$ be a type and fix a list $r : \mathsf{List}(A)$. To define append with right argument $r$, use the list recursor with target type $C:=\mathsf{List}(A)$, base branch $c_{\mathsf{nil}}:=r$, and cons branch
\begin{align*}
c_{\mathsf{cons}}(a,\ell,k):=\mathsf{cons}(a,k).
\end{align*}
Thus
\begin{align*}
\mathsf{append}_r:=\operatorname{rec}_{\mathsf{List}}(r,c_{\mathsf{cons}}):\mathsf{List}(A)\to\mathsf{List}(A).
\end{align*}
The empty-list computation rule gives
\begin{align*}
\mathsf{append}_r(\mathsf{nil})=\operatorname{rec}_{\mathsf{List}}(r,c_{\mathsf{cons}})(\mathsf{nil})=r.
\end{align*}
For a cons input, the cons computation rule gives
\begin{align*}
\mathsf{append}_r(\mathsf{cons}(a,\ell))=\operatorname{rec}_{\mathsf{List}}(r,c_{\mathsf{cons}})(\mathsf{cons}(a,\ell))=c_{\mathsf{cons}}(a,\ell,\operatorname{rec}_{\mathsf{List}}(r,c_{\mathsf{cons}})(\ell)).
\end{align*}
Using the definition of $c_{\mathsf{cons}}$ and the definition of $\mathsf{append}_r$, this becomes
\begin{align*}
\mathsf{append}_r(\mathsf{cons}(a,\ell))=\mathsf{cons}(a,\operatorname{rec}_{\mathsf{List}}(r,c_{\mathsf{cons}})(\ell))=\mathsf{cons}(a,\mathsf{append}_r(\ell)).
\end{align*}
Writing $\ell++r$ for $\mathsf{append}_r(\ell)$, the defining equations are
\begin{align*}
\mathsf{nil}++r=r.
\end{align*}
\begin{align*}
\mathsf{cons}(a,\ell)++r=\mathsf{cons}(a,\ell++r).
\end{align*}
For example, if $a,b:A$, then repeated use of the cons equation gives
\begin{align*}
\mathsf{cons}(a,\mathsf{cons}(b,\mathsf{nil}))++r=\mathsf{cons}(a,\mathsf{cons}(b,\mathsf{nil})++r).
\end{align*}
\begin{align*}
\mathsf{cons}(a,\mathsf{cons}(b,\mathsf{nil})++r)=\mathsf{cons}(a,\mathsf{cons}(b,\mathsf{nil}++r)).
\end{align*}
\begin{align*}
\mathsf{cons}(a,\mathsf{cons}(b,\mathsf{nil}++r))=\mathsf{cons}(a,\mathsf{cons}(b,r)).
\end{align*}
So append keeps each head of the first list in the same order, recurses only on its tail, and stops by returning the fixed list $r$ at the empty-list case.
[/example]
Lists have one recursive subdatum in their cons constructor. To see how eliminators handle branching recursive structure, the next example uses binary trees, whose node constructor contains two recursive subtrees.
[definition: Binary Tree Type]
For a type $A$ of labels, the binary tree type $\mathsf{Tree}(A)$ is equipped with constructors $\mathsf{leaf} : A \to \mathsf{Tree}(A)$ and $\mathsf{node} : \mathsf{Tree}(A) \to \mathsf{Tree}(A) \to \mathsf{Tree}(A)$.
[/definition]
The recursion principle for binary trees follows the constructor shape. Leaf data are handled directly, while node data require combining the recursively computed values from the left and right subtrees.
[remark: Binary Tree Recursion Principle]
For a type $A$ of labels and a target type $C$, binary-tree recursion is specified by a leaf branch
\begin{align*}
c_{\mathsf{leaf}}:A\to C
\end{align*}
and a node branch
\begin{align*}
c_{\mathsf{node}}:\mathsf{Tree}(A)\to\mathsf{Tree}(A)\to C\to C\to C.
\end{align*}
These data determine a function
\begin{align*}
\operatorname{rec}_{\mathsf{Tree}}(c_{\mathsf{leaf}},c_{\mathsf{node}}):\mathsf{Tree}(A)\to C
\end{align*}
whose node case receives the recursively computed values for the left and right subtrees.
[/remark]
The node equation shows why branching recursion needs more than the list pattern: the branch receives two recursive values, one for each subtree. The principle does not allow recursive calls on arbitrary trees assembled from $t_1$ and $t_2$; it allows exactly the calls supplied by the immediate recursive arguments of the constructor. This restriction is what makes structural recursion well-founded even when the data have branching shape. Counting leaves is a compact example of a tree fold whose node case combines those two recursive values by addition.
[example: Counting Leaves of a Binary Tree]
Take $C := \mathbb{N}$. Define the leaf branch $c_{\mathsf{leaf}} : A \to \mathbb{N}$ by $c_{\mathsf{leaf}}(a):=1$, and define the node branch by
\begin{align*}
c_{\mathsf{node}}(t_1,t_2,u,v):=u+v.
\end{align*}
The tree recursor then gives a function
\begin{align*}
\mathsf{leaves}:=\operatorname{rec}_{\mathsf{Tree}}(c_{\mathsf{leaf}},c_{\mathsf{node}}):\mathsf{Tree}(A)\to\mathbb{N}.
\end{align*}
For a leaf input, the leaf computation rule gives
\begin{align*}
\mathsf{leaves}(\mathsf{leaf}(a))=\operatorname{rec}_{\mathsf{Tree}}(c_{\mathsf{leaf}},c_{\mathsf{node}})(\mathsf{leaf}(a))=c_{\mathsf{leaf}}(a)=1.
\end{align*}
For a node input, the node computation rule gives
\begin{align*}
\mathsf{leaves}(\mathsf{node}(t_1,t_2))=c_{\mathsf{node}}(t_1,t_2,\mathsf{leaves}(t_1),\mathsf{leaves}(t_2)).
\end{align*}
Substituting the definition of $c_{\mathsf{node}}$ yields
\begin{align*}
c_{\mathsf{node}}(t_1,t_2,\mathsf{leaves}(t_1),\mathsf{leaves}(t_2))=\mathsf{leaves}(t_1)+\mathsf{leaves}(t_2).
\end{align*}
Therefore
\begin{align*}
\mathsf{leaves}(\mathsf{node}(t_1,t_2))=\mathsf{leaves}(t_1)+\mathsf{leaves}(t_2).
\end{align*}
Thus each leaf contributes exactly $1$, while a node contributes no new leaf of its own; it only adds the two counts already obtained from its immediate subtrees.
[/example]
## Primitive Recursive Programs as Typed Eliminations
The danger in adding recursion to a type theory is arbitrary self-reference: an unrestricted equation such as $f(x)=f(x)$ or a definition that calls itself on larger data gives no constructor-based explanation of the value being defined. The examples above avoid this danger because each recursive call is justified by an eliminator for the input type. Type theory therefore treats primitive recursive functions not as informal algorithms but as terms obtained by recursion principles.
To make this precise, we separate primitive recursive definitions from arbitrary self-reference. A definition is primitive recursive in the typed sense only when the recursive calls occur in the slots provided by the relevant eliminator.
[definition: Primitive Recursive Definition in Type Theory]
A primitive recursive definition is a term defined from constructors, previously defined terms, and eliminators for inductive types, with recursive calls occurring only in the positions supplied by those eliminators.
[/definition]
This definition is syntactic: it records how a term is built. Its mathematical force comes from the fact that eliminators provide both computation rules and uniqueness principles, so the defining equations are controlled by the type former rather than by an external operational convention.
[quotetheorem:9621]
[citeproof:9621]
This result explains why definitional equations for addition, multiplication, append, and tree folds are not extra axioms. The theorem needs the restriction to eliminator-supplied recursive calls; without it, the defining equations might not determine a total function. It also needs uniqueness, because computation equations alone specify how a candidate behaves on constructors but do not identify all candidates satisfying those equations. In programming-language terms, primitive recursive programs are folds over inductive data, and the fold laws are inherited from the type former rather than imposed from outside.
[remark: Constructors Determine Eliminators]
For each inductive type in this chapter, the eliminator mirrors the constructors. A nullary constructor contributes a base branch, a constructor with non-recursive data contributes parameters to a branch, and each recursive argument contributes an induction hypothesis or recursive value. This constructor-by-constructor pattern is the operational core of inductive data in type theory.
[/remark]
The next chapter uses this lesson in a dependent setting. Once types may depend on terms, products become dependent products, sums become dependent sums, and induction principles become the main mechanism for constructing proofs and programs whose result types vary with their inputs.
Products, sums, and inductive types make the simply typed calculus expressive enough to model structured data and recursive definitions. That prepares the move to dependent type theory, where types can mention terms and the formation of judgments becomes sensitive to context in a new way.
# 4. Dependent Judgments and Martin-Löf Type Theory
This chapter is the point where the syntax stops being merely typed and becomes dependent. In the simply typed lambda calculus, a context records variables together with fixed types; in Martin-Löf type theory, later types in a context may mention earlier variables. That single change forces us to separate several kinds of judgment, to control substitution with care, and to distinguish equality by computation from equality proved internally by a term.
The guiding question is: what must the formal system remember so that expressions like “the type of vectors of length $n$” are meaningful before we know the value of $n$? The answer is a discipline of contexts and judgments. Once those are in place, dependent products, dependent sums, identity types, universes, and inductive families can be introduced without changing the basic metatheoretic shape of the calculus.
## Judgments Before Rules
How can a type theory say that an expression is meaningful before it says whether the expression is inhabited or equal to something else? In dependent type theory, well-formedness is not background grammar alone: it is itself a judgment, and every formation rule must state the context in which it is valid.
[definition: Context]
A context is a finite list
\begin{align*}
\Gamma &= x_1 : A_1,\ x_2 : A_2(x_1),\ \dots,\ x_n : A_n(x_1,\dots,x_{n-1})
\end{align*}
such that each $A_i$ is a type in the preceding context $x_1 : A_1,\dots,x_{i-1}:A_{i-1}$.
[/definition]
The order of entries is part of the data, because later declarations may depend on earlier variables. This differs from the simply typed setting, where exchanging declarations usually has no visible effect. The following example shows why validity of a dependent context must be judged before any term inside it can be typed.
[example: A Dependent Context For Vectors]
Let $\mathsf{Nat}$ be the type of natural numbers, and suppose $\operatorname{Vec}(A,n)$ denotes the type of lists of elements of $A$ whose length is $n$. We verify that the context
\begin{align*}
A : \mathcal U,\ n : \mathsf{Nat},\ v : \operatorname{Vec}(A,n)
\end{align*}
is well formed by checking the declarations from left to right. The first declaration is valid because $\mathcal U$ is a universe, so $A : \mathcal U$ introduces a type variable. In the extended context $A : \mathcal U$, the type $\mathsf{Nat}$ is already formed, so the next declaration $n : \mathsf{Nat}$ is valid. In the further extended context $A : \mathcal U,\ n : \mathsf{Nat}$, both parameters needed to form $\operatorname{Vec}(A,n)$ are in scope: $A$ supplies the element type and $n$ supplies the length index. Hence the third declaration
\begin{align*}
v : \operatorname{Vec}(A,n)
\end{align*}
is meaningful in that preceding context.
The reordered list
\begin{align*}
v : \operatorname{Vec}(A,n),\ A : \mathcal U,\ n : \mathsf{Nat}
\end{align*}
fails the same left-to-right check at its first entry. To declare $v : \operatorname{Vec}(A,n)$, the expression $\operatorname{Vec}(A,n)$ would have to be a type in the empty preceding context, but the names $A$ and $n$ have not yet been declared there. Thus the original order is a valid dependent context, while the reordered list is not; dependency is recorded by the order of declarations, not merely by the collection of variable names.
[/example]
The example separates a well-scoped list of assumptions from an ill-scoped one. Since every later judgment will be made relative to such a list, the calculus needs a judgment that certifies the list itself before asking about types or terms in it. This is the base layer on which the other judgments depend.
[definition: Context Validity Judgment]
The judgment $\Gamma\ \mathsf{ctx}$ asserts that $\Gamma$ is a valid context.
[/definition]
Context validity tells us where it is meaningful to reason, but it does not yet say what counts as an object of study inside that environment. The next question is whether a type expression is well-formed under the assumptions already declared. This is the dependent analogue of checking that a formula has all its variables in scope.
[definition: Type Formation Judgment]
The judgment $\Gamma \vdash A\ \mathsf{type}$ asserts that $A$ is a type in the context $\Gamma$.
[/definition]
Type formation classifies expressions such as $\operatorname{Vec}(A,n)$ as valid types, but a type alone does not provide an inhabitant. To express proofs and programs, the calculus must also say that a term belongs to a type in the same context. This gives the judgment that carries the computational content of the theory.
[definition: Term Formation Judgment]
The judgment $\Gamma \vdash a : A$ asserts that $a$ is a term of type $A$ in the context $\Gamma$.
[/definition]
A term judgment can still be too rigid if the type checker is unable to identify expressions that compute to the same result. For example, a term of a vector type indexed by $1+1$ should be usable where the index $2$ is expected when the arithmetic reduces by computation. The next definition is needed because the typing relation must name this equality by computation, rather than treating it as a theorem requiring a proof term.
[definition: Definitional Equality Judgment]
The judgment $\Gamma \vdash a \equiv b : A$ asserts that $a$ and $b$ are definitionally equal terms of type $A$ in the context $\Gamma$.
[/definition]
There is also a definitional equality judgment for types, written $\Gamma \vdash A \equiv B\ \mathsf{type}$. The system uses this judgment when a term has been assigned one type and must be treated as having another computationally equal type.
[remark: Judgments Are Metatheoretic Assertions]
Judgments are not themselves terms of the theory. A derivation of $\Gamma \vdash a : A$ belongs to the metatheory, while a term $p : \operatorname{Id}_A(a,b)$ belongs to the theory. This distinction is essential when comparing definitional equality with propositional equality later in the chapter.
[/remark]
The four basic judgments are tied together by inference rules. A typical presentation writes rules as fractions whose premises are judgments and whose conclusion is a judgment:
\begin{align*}
\frac{\Gamma\ \mathsf{ctx}}{\Gamma \vdash \mathcal U\ \mathsf{type}},
\qquad
\frac{\Gamma \vdash A\ \mathsf{type}\qquad \Gamma \vdash a : A}{\Gamma \vdash a \equiv a : A}.
\end{align*}
The exact type formers vary between presentations of Martin-Löf type theory, but the structural discipline of judgments and contexts is common to all of them.
## Dependent Contexts And Families Of Types
What changes when a type is allowed to vary with a term? Instead of assigning a single type to a collection of objects, dependent type theory assigns a family of types indexed by elements of another type, and substitution becomes the operation of evaluating that family at a particular index.
[definition: Type Family]
Given a context $\Gamma$ and a type $A$ in that context, a type family over $A$ is a judgment
\begin{align*}
\Gamma,\ x : A \vdash B(x)\ \mathsf{type}.
\end{align*}
[/definition]
The notation $B(x)$ reminds us that $B$ may depend on the variable $x$. It is not a function in the metatheoretic set-theoretic sense; it is a type expression whose well-formedness requires $x : A$.
[example: Finite Sets Indexed By Natural Numbers]
Let $\mathsf{Nat}$ be the type of natural numbers, and let $\operatorname{Fin}(n)$ denote the type of indices strictly smaller than $n$. To say that $\operatorname{Fin}$ is a dependent family is to say that, once an index $n$ has been declared as a natural number, the expression $\operatorname{Fin}(n)$ is a well-formed type:
\begin{align*}
n : \mathsf{Nat} \vdash \operatorname{Fin}(n)\ \mathsf{type}.
\end{align*}
Substituting the numeral $3$ for the index $n$ gives the instance
\begin{align*}
\vdash \operatorname{Fin}(3)\ \mathsf{type}.
\end{align*}
Its intended inhabitants are exactly the three indices $0$, $1$, and $2$, because these are precisely the natural numbers satisfying $0<3$, $1<3$, and $2<3$. Thus a term
\begin{align*}
i : \operatorname{Fin}(3)
\end{align*}
is evidence that $i$ is one of those three allowed positions. Substituting $0$ instead gives
\begin{align*}
\vdash \operatorname{Fin}(0)\ \mathsf{type}.
\end{align*}
There is no natural number $j$ with $j<0$, so the usual introduction rules provide no constructor for a term of $\operatorname{Fin}(0)$. The example shows that the index is not decorative: changing $n$ changes which terms the type can have.
[/example]
Finite sets show that the index can control the available inhabitants of a type. The same mechanism also turns predicates into families of proof types, so the distinction between a proposition and the type of its proofs becomes syntactic rather than informal. This prepares the next example, where a logical predicate is represented by a dependent type.
[example: Predicates As Families]
In the context $n : \mathsf{Nat}$, define a type family $\operatorname{Even}(n)$ whose intended meaning is “proofs that $n$ is even.” Thus the judgment
\begin{align*}
n : \mathsf{Nat} \vdash \operatorname{Even}(n)\ \mathsf{type}
\end{align*}
says that once a natural number $n$ is in scope, the expression $\operatorname{Even}(n)$ is a well-formed type. Substituting the numeral $6$ for $n$ gives the instance
\begin{align*}
\vdash \operatorname{Even}(6)\ \mathsf{type}.
\end{align*}
Since $6$ is even because $6 = 2 \cdot 3$, an inhabitant
\begin{align*}
p : \operatorname{Even}(6)
\end{align*}
is a proof object witnessing that evenness claim, not a Boolean value such as $\mathsf{true}$.
Substituting the numeral $5$ instead gives
\begin{align*}
\vdash \operatorname{Even}(5)\ \mathsf{type}.
\end{align*}
There is no natural number $k$ with $5 = 2k$: if $k \leq 2$, then $2k \leq 4$, while if $k \geq 3$, then $2k \geq 6$. Hence the usual intended interpretation provides no inhabitant of $\operatorname{Even}(5)$. The family $\operatorname{Even}(-)$ therefore represents a predicate by assigning to each index not a truth value, but a type whose terms are proofs at that index.
[/example]
The examples make families meaningful, but using a family requires replacing its formal index by an actual term. This replacement must update not only the target judgment but also any later assumptions whose types depended on the index. The calculus therefore needs a precise substitution operation for dependent contexts.
[definition: Substitution Into A Type Family]
Let $\Gamma, x : A, \Delta \vdash J$ be a judgment whose context contains $x : A$, and let $\Gamma \vdash a : A$. The substituted judgment $\Gamma, \Delta[a/x] \vdash J[a/x]$ is obtained by replacing free occurrences of $x$ by $a$ in the later context entries and in $J$.
[/definition]
The phrase “later context entries” matters. In a dependent context, variables declared before $x$ cannot mention $x$, while variables declared after $x$ may depend on it.
[example: Substituting A Length]
Start with the judgment
\begin{align*}
A : \mathcal U,\ n : \mathsf{Nat},\ v : \operatorname{Vec}(A,n) \vdash v : \operatorname{Vec}(A,n).
\end{align*}
The declaration $A:\mathcal U$ occurs before $n$, so it contains no occurrence of $n$. The declaration $v:\operatorname{Vec}(A,n)$ occurs after $n$, so substituting the numeral $2$ for $n$ replaces the index in that later declaration:
\begin{align*}
\operatorname{Vec}(A,n)[2/n] = \operatorname{Vec}(A,2).
\end{align*}
The term in the judgment is the variable $v$, and $v$ is not the variable being replaced, so
\begin{align*}
v[2/n] = v.
\end{align*}
The type on the right side of the judgment is again updated by the same replacement:
\begin{align*}
\operatorname{Vec}(A,n)[2/n] = \operatorname{Vec}(A,2).
\end{align*}
Therefore, assuming $2:\mathsf{Nat}$, substitution changes the original judgment into
\begin{align*}
A : \mathcal U,\ v : \operatorname{Vec}(A,2) \vdash v : \operatorname{Vec}(A,2).
\end{align*}
The example shows that substitution in a dependent context is not only substitution in the final term: every later declaration whose type mentions the replaced variable must be updated as well.
[/example]
The substitution example shows that context manipulation must preserve derivability, not merely produce plausible-looking syntax. Before proving the full substitution theorem, we need a simpler stability result: adding an assumption that the current derivation does not use should not destroy the derivation. This need motivates the [weakening rule](/theorems/9622), the first structural theorem for dependent judgments.
[quotetheorem:9622]
[citeproof:9622]
Weakening says that unused assumptions may be added. In the dependent setting the freshness and well-formedness conditions prevent the new declaration from accidentally capturing variables or changing the meaning of existing expressions.
For example, if a term $c:C$ has already been derived in $\Gamma$, weakening permits the same term to be used in $\Gamma,n:\mathsf{Nat}$ while defining a family over natural numbers. Likewise, a constant codomain $C$ can be reused inside a dependent product or Sigma construction after a new variable $x:A$ has been introduced. This is why later rules can open a temporary assumption for a binder, an eliminator branch, or an identity-elimination motive without rebuilding every earlier derivation from scratch.
The hypotheses in weakening are all active. If $x$ is not fresh, then extending $x:A$ may shadow or duplicate an existing declaration, so a judgment mentioning $x$ no longer has a unique binding site. If $\Gamma \vdash A\ \mathsf{type}$ is missing, then the extended list may fail to be a context at all; for instance, appending $y:B(z)$ when no earlier variable $z$ is in scope creates an invalid environment. The theorem also does not say that the added assumption may be used to prove new judgments; it only preserves judgments already derivable without that assumption. This limitation leads directly to exchange: once contexts may contain harmless extra assumptions, we must ask when those assumptions may be moved without breaking dependency order.
[quotetheorem:9623]
[citeproof:9623]
Exchange is more restricted than in ordinary first-order logic. The theorem does not say that any two assumptions may be swapped; it says that swapping is permitted exactly when the dependency order remains valid.
The condition $\Gamma \vdash B\ \mathsf{type}$ is what makes the adjacent swap legitimate: it rules out cases such as $\Gamma,x:A,y:C(x)$, where moving $y$ before $x$ would put the type $C(x)$ before the variable it mentions. The validity of $\Delta^\sigma$ is also necessary, because a later declaration may mention both $x$ and $y$ and must still be meaningful after the swap. The theorem does not license arbitrary permutations of a dependent context; it only transports derivations across swaps whose dependency graph remains well ordered. Exchange handles order, but it still leaves assumptions as formal variables. The real purpose of a hypothesis $x:A$ is that it may later be instantiated by a term $a:A$, which leads to substitution through the entire dependent tail of the context.
[quotetheorem:9624]
[citeproof:9624]
The rule is the formal justification for using a dependent family at a chosen index. It also underlies beta-reduction for dependent products and the elimination rules for inductive families.
Each hypothesis in substitution prevents a different failure. The assumption $\Gamma \vdash a:A$ rules out replacing $x:A$ by a term of the wrong type, such as substituting a natural number into a variable expected to range over a universe. The simultaneous update of $\Delta$ is also necessary: from $n:\mathsf{Nat}, v:\operatorname{Vec}(A,n)$, substituting $2$ for $n$ without changing the declaration of $v$ would leave a context containing a free occurrence of the removed variable $n$. The theorem does not say that substitution preserves definitional equality by itself, nor that every syntactic replacement is capture-avoiding; those are handled by the equality and variable-renaming parts of the calculus. Substitution explains how to replace variables by terms, but type checking must also respect computation inside the types produced by that replacement. This need leads to the conversion rule.
[remark: Conversion Rule]
The conversion rule is a primitive typing rule rather than a theorem being proved from earlier rules. In a dependent type theory with typed definitional equality, it has the form: from judgments
\begin{align*}
\Gamma \vdash a : A
\end{align*}
and
\begin{align*}
\Gamma \vdash A \equiv B\ \mathsf{type},
\end{align*}
one may infer
\begin{align*}
\Gamma \vdash a : B.
\end{align*}
[/remark]
Conversion is what allows computation to simplify types during type checking. For example, a vector type indexed by $1+1$ may be used as a vector type indexed by $2$ when those indices are definitionally equal natural number terms.
The type-equality hypothesis is essential. Without $\Gamma \vdash A \equiv B\ \mathsf{type}$, a term $a:A$ cannot be retyped as an element of an unrelated type $B$; for example, a natural number term cannot be used as a proof term merely because both expressions occur in the same context. The rule also does not identify terms of $A$ and $B$ propositionally, and it does not provide a coercion along an arbitrary proof of equality between types. It only says that the type checker may silently replace a type by a definitionally equal one. This is the bridge from the structural rules for contexts to the next distinction: equality by computation versus equality proved internally by a term.
These structural theorems are not optional bookkeeping. They are what make dependent contexts behave like mathematical assumptions rather than a fragile syntax of ordered declarations.
## Definitional Equality And Propositional Equality
When should two expressions count as the same during type checking, and when should their equality require proof? Martin-Löf type theory answers with two equality notions: definitional equality is a metatheoretic judgment used by the checker, while propositional equality is represented by a type inhabited by proof terms.
[definition: Definitional Equality]
Definitional equality is the judgmental relation $\Gamma \vdash a \equiv b : A$ generated by computation rules, eta-principles when included, congruence rules, and structural equality rules such as reflexivity, symmetry, and transitivity.
[/definition]
A definitionally equal pair of terms is interchangeable without supplying a proof term. This is why beta-reduction can happen silently during type checking.
[example: Beta Equality For Functions]
Assume the dependent function is formed in a context where
\begin{align*}
\Gamma,\ x:A \vdash t : B(x).
\end{align*}
Then abstraction gives
\begin{align*}
\Gamma \vdash \lambda x.t : \prod_{x:A} B(x).
\end{align*}
If also $\Gamma \vdash a:A$, the application rule gives
\begin{align*}
\Gamma \vdash (\lambda x.t)(a) : B(a).
\end{align*}
The beta computation rule for dependent functions replaces the bound variable $x$ in the body $t$ by the argument $a$, and replaces $x$ by $a$ in the result type $B(x)$:
\begin{align*}
t[a/x] : B(x)[a/x].
\end{align*}
Since $B(x)[a/x]$ is exactly $B(a)$ by substitution into the type family, the judgmental equality is
\begin{align*}
\Gamma \vdash (\lambda x.t)(a) \equiv t[a/x] : B(a).
\end{align*}
Thus the application is identified with the substituted body by computation itself; no term of an identity type is required before $f(a)$ may be used as an element of $B(a)$.
[/example]
The beta example shows equality that the type checker performs as computation. Mathematics inside the theory also needs to express equalities that are not built into computation, such as an equality obtained by induction or by a previous theorem. For those equalities, the theory introduces a type whose elements are equality proofs.
[definition: Identity Type]
For a type $A$ in context $\Gamma$ and terms $a,b:A$, the identity type is a type
\begin{align*}
\Gamma \vdash \operatorname{Id}_A(a,b)\ \mathsf{type}.
\end{align*}
Its canonical reflexivity term has judgment
\begin{align*}
\Gamma \vdash \operatorname{refl}_a : \operatorname{Id}_A(a,a).
\end{align*}
[/definition]
The identity type says that equality can itself be reasoned about. Unlike definitional equality, a proof $p : \operatorname{Id}_A(a,b)$ is a term that may be passed to functions, analyzed by eliminators, or used as data.
[remark: Definitional Equality Implies Propositional Equality]
If $\Gamma \vdash a \equiv b : A$, then $\operatorname{refl}_a$ can be viewed as a term of $\operatorname{Id}_A(a,b)$ by conversion. Thus judgmental equality gives propositional equality. The converse is not generally valid in intensional Martin-Löf type theory.
[/remark]
The failure of the converse is a design choice with major consequences. It keeps type checking decidable in many systems and leaves room for higher-dimensional interpretations of equality, where different proofs of equality may themselves carry structure.
[example: Equal Lengths Versus Convertible Lengths]
Suppose $m,n : \mathsf{Nat}$, $p : \operatorname{Id}_{\mathsf{Nat}}(m,n)$, and $v : \operatorname{Vec}(A,m)$. The conversion rule would allow us to retype $v$ as an element of $\operatorname{Vec}(A,n)$ only from a judgmental equality of types
\begin{align*}
\operatorname{Vec}(A,m) \equiv \operatorname{Vec}(A,n)\ \mathsf{type}.
\end{align*}
Such a type equality would follow if the indices were judgmentally equal,
\begin{align*}
m \equiv n : \mathsf{Nat},
\end{align*}
by congruence for the type family $\operatorname{Vec}(A,-)$. But the given datum
\begin{align*}
p : \operatorname{Id}_{\mathsf{Nat}}(m,n)
\end{align*}
is a term of an identity type, not a derivation of the judgment $m \equiv n : \mathsf{Nat}$. Therefore conversion alone does not change
\begin{align*}
v : \operatorname{Vec}(A,m)
\end{align*}
into
\begin{align*}
v : \operatorname{Vec}(A,n).
\end{align*}
To use the proof $p$, take the family
\begin{align*}
B(x) = \operatorname{Vec}(A,x)
\end{align*}
over $x : \mathsf{Nat}$. Then
\begin{align*}
B(m) = \operatorname{Vec}(A,m)
\end{align*}
and
\begin{align*}
B(n) = \operatorname{Vec}(A,n).
\end{align*}
Transport along $p$ gives a map
\begin{align*}
\operatorname{transport}_B(p) : \operatorname{Vec}(A,m) \to \operatorname{Vec}(A,n).
\end{align*}
Applying this map to $v : \operatorname{Vec}(A,m)$ produces
\begin{align*}
\operatorname{transport}_B(p)(v) : \operatorname{Vec}(A,n).
\end{align*}
Thus equal lengths proved inside the theory require transport through the family $\operatorname{Vec}(A,-)$, while convertible lengths can be handled silently by definitional conversion.
[/example]
The vector example exposes the gap between knowing an equality by proof and having a type reduce by computation. A term indexed by $m$ cannot be retyped at $n$ without an operation that follows the proof $p$ through the family $\operatorname{Vec}(A,-)$. We need the next definition because that operation is the standard way to move terms across propositional equality in a dependent family.
[definition: Transport]
Let $\Gamma, x:A \vdash B(x)\ \mathsf{type}$ be a type family, and let $a,b:A$. For a proof $p : \operatorname{Id}_A(a,b)$, transport along $p$ is the map
\begin{align*}
\operatorname{transport}_B(p) : B(a) \to B(b).
\end{align*}
[/definition]
Transport is the operational face of substituting equal terms into dependent types. If the equality is definitional, conversion handles the change silently; if the equality is propositional, transport records the proof that justifies the change. This contrast motivates the small theorem showing that definitional equality always supplies a propositional equality proof by reflexivity and conversion.
[quotetheorem:9625]
[citeproof:9625]
This result is small but conceptually important. It says the internal equality type respects the external computation relation.
The definitional-equality hypothesis is necessary because the proof uses conversion on the identity type itself. If $a$ and $b$ are merely propositionally equal by some $p : \operatorname{Id}_A(a,b)$, the term $\operatorname{refl}_a$ still has type $\operatorname{Id}_A(a,a)$, not automatically $\operatorname{Id}_A(a,b)$. The theorem also does not give the converse: an inhabitant of $\operatorname{Id}_A(a,b)$ need not make $a$ and $b$ definitionally equal. This asymmetry is the reason transport remains necessary, and it prepares the later structural summary where conversion is only one admissible operation among several.
The distinction between these equalities explains many proof-assistant experiences. Some rewrites happen by computation and require no visible term; other rewrites require an explicit proof and invoke transport, rewriting, or induction on equality.
## Structural Rules For Dependent Contexts
How do the previous rules fit together into a stable calculus rather than a list of isolated admissibility facts? The answer is that dependent contexts form a structured environment: variables can be introduced, ignored, reordered when dependencies permit, substituted by terms, and converted across judgmental equality.
[definition: Context Extension]
If $\Gamma\ \mathsf{ctx}$ and $\Gamma \vdash A\ \mathsf{type}$, then context extension forms the context $\Gamma, x:A\ \mathsf{ctx}$ for a fresh variable $x$.
[/definition]
Context extension is the syntactic version of moving under an assumption. In dependent type theory, this is also the operation that creates a new possible index for later families. We need a final stability theorem because every later type former will rely on all structural operations working together, not merely on isolated weakening or substitution lemmas.
[quotetheorem:9626]
[citeproof:9626]
This theorem packages the structural rules into the slogan that derivability is invariant under legitimate changes of context. Each word “legitimate” hides a dependency condition, and those conditions are exactly what context validity records.
The separate hypotheses behind the four clauses cannot be dropped. Weakening needs a fresh, well-typed new assumption; exchange needs a dependency-respecting swap; substitution needs a correctly typed term and a substituted tail; conversion needs judgmental equality of types. If any condition fails, the combined stability statement fails as well: for example, substituting into a tail without updating dependent declarations can produce free variables, while exchanging $x:A$ with $y:C(x)$ destroys context validity. The theorem also does not say that every syntactic rearrangement of a derivation is valid, nor that propositional equality can replace definitional equality in type checking. It summarizes exactly the legitimate context changes that later dependent products, sums, universes, and inductive families may rely on.
A useful way to read this theorem is that dependent type theory treats contexts as meaningful mathematical environments. The syntax remembers enough dependency information to let later constructions behave predictably.
[example: Combining Substitution And Conversion]
Let $A : \mathcal U$, $n : \mathsf{Nat}$, and $v : \operatorname{Vec}(A,n)$, so we start from the judgment
\begin{align*}
A : \mathcal U,\ n : \mathsf{Nat},\ v : \operatorname{Vec}(A,n) \vdash v : \operatorname{Vec}(A,n).
\end{align*}
Suppose also that $k : \mathsf{Nat}$, $k \equiv 1+1 : \mathsf{Nat}$, and $1+1 \equiv 2 : \mathsf{Nat}$. By transitivity of definitional equality,
\begin{align*}
k \equiv 2 : \mathsf{Nat}.
\end{align*}
Now apply *Substitution Rule* with the substitution $k/n$. The earlier declaration $A : \mathcal U$ is unchanged because it does not contain $n$. The later declaration for $v$ is updated by replacing the length index:
\begin{align*}
\operatorname{Vec}(A,n)[k/n] = \operatorname{Vec}(A,k).
\end{align*}
The term $v$ itself is unchanged because the substituted variable is $n$, not $v$:
\begin{align*}
v[k/n] = v.
\end{align*}
The type on the right side is updated in the same way:
\begin{align*}
\operatorname{Vec}(A,n)[k/n] = \operatorname{Vec}(A,k).
\end{align*}
Hence substitution gives
\begin{align*}
A : \mathcal U,\ v : \operatorname{Vec}(A,k) \vdash v : \operatorname{Vec}(A,k).
\end{align*}
Since $k \equiv 2 : \mathsf{Nat}$, congruence for the type family $\operatorname{Vec}(A,-)$ gives
\begin{align*}
\operatorname{Vec}(A,k) \equiv \operatorname{Vec}(A,2)\ \mathsf{type}.
\end{align*}
Applying the conversion rule to the judgment $v : \operatorname{Vec}(A,k)$ therefore yields
\begin{align*}
A : \mathcal U,\ v : \operatorname{Vec}(A,k) \vdash v : \operatorname{Vec}(A,2).
\end{align*}
Thus substitution instantiates the dependent length, and conversion then replaces the resulting type by a computationally equal one.
[/example]
The structural rules also explain why type formers in later chapters can be stated locally. To introduce dependent products, for instance, it suffices to say what happens in a context $\Gamma, x:A \vdash B(x)\ \mathsf{type}$; the structural rules guarantee that the construction behaves under substitution and context extension.
## Dependent Judgments As A Foundation For Martin-Löf Type Theory
What has been gained by making judgments and contexts so explicit? We have replaced informal talk about variables and assumptions with a calculus in which every dependency is tracked by the same syntax that assigns types and computes terms.
The practical payoff is visible in the examples. Vectors indexed by length, finite sets indexed by a natural number, and predicates indexed by elements of a type all use the same pattern:
\begin{align*}
\Gamma, x:A \vdash B(x)\ \mathsf{type}.
\end{align*}
Terms of $B(a)$ are obtained by substitution, moved across definitional equalities by conversion, and moved across propositional equalities by transport.
The conceptual payoff is that Martin-Löf type theory supports mathematics where propositions, proofs, sets, and constructions live in one formal language. The next chapters build the main dependent type formers on top of this infrastructure: dependent products express universal quantification and dependent functions, dependent sums express existential quantification and structured pairs, and identity types express equality internal to the theory.
Once types may depend on terms, the ordinary function space becomes the dependent function type. The next chapter develops that idea systematically, showing how Pi types internalize universal quantification and how computation and proof interact in the dependent setting.
# 5. Pi Types and Universal Quantification
Dependent function types are the point where the simply typed function space becomes genuinely dependent. In earlier chapters, a function type $A \to B$ classified programs that take an input of type $A$ and return an output of a fixed type $B$. Here the codomain is allowed to vary with the input, so a term may take $a : A$ and return an element of a type $B(a)$ that depends on that particular $a$.
This chapter develops the formal rules for these types and then reads the same rules through propositions-as-types. The central observation is that a dependent function is also a proof of a universal statement: to prove $\forall a : A, B(a)$ is to give a method that, for an arbitrary $a : A$, constructs evidence for $B(a)$.
## Dependent Function Types and Their Rules
The first problem is how to type a function whose output type depends on its input. In ordinary function types, the target type is fixed before the argument is supplied. For families of types, such as a type $\operatorname{Vec}(A,n)$ of lists of length $n$, the result type may mention the argument itself.
[definition: Type Family Over A Type]
Let $\mathcal{U}$ be a universe and let $A : \mathcal{U}$. A type family over $A$ with values in $\mathcal{U}$ is a map
\begin{align*}
B : A \to \mathcal{U}, \qquad x \mapsto B(x).
\end{align*}
[/definition]
A type family supplies dependent codomains one fibre at a time, but the calculus still needs a single type whose terms choose elements from all fibres coherently. This is the role of the dependent function type: it binds the input variable in the codomain and turns the family into a function space.
[definition: Dependent Function Type]
Let $A$ be a type, and suppose that in the context $x : A$ there is a type $B(x)$. The dependent function type over $A$ with fibre $B$ is written
\begin{align*}
\prod_{x : A} B(x).
\end{align*}
Its terms are dependent functions which send each $x : A$ to a term of type $B(x)$.
[/definition]
The symbol $\prod$ is used because the type behaves like a product indexed by all inputs $x : A$: an element chooses one element from every fibre $B(x)$. The next question is when this expression is a legitimate type rather than just notation.
[explanation: Pi Type Formation Rule]
In an ambient context $\Gamma$, if $\Gamma \vdash A : \mathcal{U}$ and $\Gamma, x : A \vdash B(x) : \mathcal{U}$, then the type theory has the judgement
\begin{align*}
\Gamma \vdash \prod_{x : A} B(x) : \mathcal{U}.
\end{align*}
[/explanation]
This is a primitive rule of the calculus, not a proposition proved inside the calculus. The hypothesis $x : A \vdash B(x) : \mathcal{U}$ is essential: without it, the expression $B(a)$ used after substitution may not be meaningful for an arbitrary $a : A$. The rule also does not assert that the product is nonempty; it only says that the dependent function type is well-formed. The obstruction it removes is syntactic rather than logical: a family such as $\operatorname{Vec}(C,n)$ cannot be used as a codomain until the index variable $n : \mathbb{N}$ has been bound. Once the type has been formed, the next task is to construct its inhabitants.
[explanation: Pi Type Introduction Rule]
In an ambient context $\Gamma$, if $\Gamma, x : A \vdash b(x) : B(x)$, then the type theory has the judgement
\begin{align*}
\Gamma \vdash \lambda x. b(x) : \prod_{x : A} B(x).
\end{align*}
[/explanation]
This is the primitive rule that packages a construction depending on a variable into a dependent function. The extended-context hypothesis is necessary because the body must be typable for an arbitrary input, not merely for a selected example of $A$. The rule does not permit the body to depend on undeclared data, and capture-avoiding binding matters because $x$ is local to the abstraction. Without this rule, a family of terms $b(x)$ would remain a meta-level recipe rather than an object of the type theory. To use such a packaged construction in a proof or program, the calculus needs a rule that extracts the component corresponding to a chosen input.
[explanation: Pi Type Elimination Rule]
In an ambient context $\Gamma$, if
\begin{align*}
\Gamma \vdash f : \prod_{x : A} B(x)
\end{align*}
and $\Gamma \vdash a : A$, then the type theory has the judgement
\begin{align*}
\Gamma \vdash f(a) : B(a).
\end{align*}
[/explanation]
The elimination rule is where dependency is felt most strongly. The hypothesis $a : A$ is required because application also substitutes $a$ into the codomain; if the argument had the wrong type, the expression $B(a)$ would not be a valid fibre. The rule does not say that all fibres $B(x)$ are the same, and it does not erase the index after application. This is exactly why indexed data types are useful in proof assistants: applying a length-preserving vector map at $3$ returns a value whose type still remembers the length $3$.
[example: Polymorphic Identity Function]
Let $\mathcal{U}$ be a universe of small types. We construct a term of
\begin{align*}
\prod_{A : \mathcal{U}} A \to A.
\end{align*}
Assume first that $A : \mathcal{U}$. To build an element of $A \to A$, assume $x : A$. Under this assumption, the variable rule gives
\begin{align*}
x : A.
\end{align*}
Therefore Pi introduction for the constant family with value $A$ gives
\begin{align*}
\lambda x.x : A \to A.
\end{align*}
Since this construction was made for an arbitrary $A : \mathcal{U}$, another use of Pi introduction gives
\begin{align*}
\lambda A.\lambda x.x : \prod_{A : \mathcal{U}} A \to A.
\end{align*}
Now let $C : \mathcal{U}$. Applying the dependent function to $C$ gives
\begin{align*}
(\lambda A.\lambda x.x)(C) = \lambda x.x : C \to C.
\end{align*}
If $c : C$, applying once more gives
\begin{align*}
((\lambda A.\lambda x.x)(C))(c) = (\lambda x.x)(c) = c : C.
\end{align*}
Thus the single dependent function specializes at each type $C$ to the ordinary identity map on that type.
[/example]
The rules above describe how dependent functions are formed, introduced, and used. To make the calculus computational, we also need an equation describing what happens when an introduced function is immediately eliminated.
[remark: Pi Type Beta Computation Rule]
The Pi beta rule is a primitive computation rule of the dependent function type. If
\begin{align*}
\Gamma \vdash A\ \mathsf{type}, \qquad \Gamma,x:A \vdash B(x)\ \mathsf{type}, \qquad \Gamma,x:A \vdash b(x):B(x),
\end{align*}
and $\Gamma \vdash a:A$, then Pi introduction gives
\begin{align*}
\Gamma \vdash \lambda x.\,b(x):\prod_{x:A}B(x),
\end{align*}
and immediate application computes judgmentally:
\begin{align*}
\Gamma \vdash (\lambda x.\,b(x))(a)\equiv b[a/x]:B[a/x].
\end{align*}
Here $b[a/x]$ and $B[a/x]$ denote capture-avoiding substitution of $a$ for $x$.
[/remark]
This is the dependent version of beta reduction. The assumption that $b(x)$ is typed in the context $x : A$ is needed so that substitution produces a term in the specialised fibre $B(a)$, not merely a syntactically similar expression. If a candidate argument $c$ has some unrelated type $C$ rather than type $A$, then $B(c)$ is not a valid fibre of the family $B : A \to \mathcal U$, so the expression $(\lambda x.b(x))(c)$ has no type. Similarly, if $B(x)$ was not first formed under the assumption $x : A$, then after replacing $x$ by $a$ there may be no well-formed type $B(a)$ in which the result could live. The rule does not say that every dependent function is definitionally a lambda abstraction; that stronger direction is the eta or uniqueness principle discussed below. It also does not identify values at different fibres $B(a)$ and $B(a')$, which may be distinct types. In logic, the rule says that proof by universal introduction followed by [universal instantiation](/theorems/4637) reduces to the original proof at the chosen instance.
[example: Applying A Length-Preserving Map]
Suppose $A$ and $C$ are types, and suppose $\operatorname{Vec}(A,n)$ is the type of vectors of length $n$ with entries in $A$. Let
\begin{align*}
F : \prod_{n : \mathbb N} \bigl(\operatorname{Vec}(A,n) \to \operatorname{Vec}(C,n)\bigr).
\end{align*}
We compute the type of applying $F$ at the length $3$. Since $3 : \mathbb N$, Pi elimination instantiates the fibre
\begin{align*}
n \mapsto \operatorname{Vec}(A,n) \to \operatorname{Vec}(C,n)
\end{align*}
at $n := 3$, giving
\begin{align*}
F(3) : \operatorname{Vec}(A,3) \to \operatorname{Vec}(C,3).
\end{align*}
Now if
\begin{align*}
v : \operatorname{Vec}(A,3),
\end{align*}
ordinary function application, equivalently Pi elimination for the constant codomain after the index has been fixed, gives
\begin{align*}
F(3)(v) : \operatorname{Vec}(C,3).
\end{align*}
Thus the same index $3$ used to type the input vector is substituted into the output fibre, so the result type records that the output vector has length $3$ rather than merely being some vector over $C$.
[/example]
The length-preserving example illustrates computation at a specific index: after applying $F$ to $3$, beta-style computation and ordinary application explain the specialised result type. It leaves a different question unresolved. When two dependent functions are compared, it is not enough to know how a freshly introduced lambda abstraction computes; we also need to know whether an arbitrary function $f$ is recovered from the pointwise rule $x \mapsto f(x)$. The eta principle is the rule that supplies this recovery. Without it, or without a suitable function-extensionality principle, a theory may distinguish $f$ from $\lambda x.f(x)$ even though every application has the same result.
[remark: Judgemental Eta Rule for Pi Types]
The judgemental eta rule for Pi types is an optional equality rule of the ambient theory rather than a derived theorem. When it is included, any term
\begin{align*}
\Gamma \vdash f : \prod_{x:A}B(x)
\end{align*}
satisfies the judgmental equality
\begin{align*}
\Gamma \vdash f \equiv \lambda x.\,f(x) : \prod_{x:A}B(x).
\end{align*}
[/remark]
This rule explains when dependent functions can be reasoned about pointwise judgmentally. The hypothesis on the equality theory is essential, because beta computation alone only controls functions already written as lambda abstractions. In intensional Martin-Löf type theory presented without judgemental eta for Pi types, as in proof-assistant kernels that keep conversion deliberately small, the terms $f$ and $\lambda x.f(x)$ need not be definitionally equal; the type checker will not reduce one to the other unless eta is part of the conversion relation. The corresponding equality may still be available propositionally, for example from function extensionality, but then it must be invoked as proof data rather than silently computed. The principle does not identify functions whose values are merely related by some weaker relation; it identifies functions whose applications agree in the equality notion available in the theory.
## Universal Quantification Under Propositions As Types
The next question is why the same type former should also represent the logical quantifier $\forall$. Under propositions-as-types, propositions are types and proofs are terms. A universally quantified proposition therefore needs a type whose inhabitants provide evidence at every possible input.
[definition: Universal Quantification As A Pi Type]
Let $A$ be a type and let $P(x)$ be a proposition, represented as a type, depending on $x : A$. The universal proposition
\begin{align*}
\forall x : A,\ P(x)
\end{align*}
is represented by the Pi type
\begin{align*}
\prod_{x : A} P(x).
\end{align*}
[/definition]
The definition is intentionally not a new rule of syntax. It is an interpretation: the same formation and introduction rules become the rules for proving universal statements, while elimination becomes the act of instantiating a universal proof at a chosen witness.
[quotetheorem:4638]
[citeproof:4638/type-theory]
The word constructive matters here. The hypothesis $p : \prod_{x : A}P(x)$ is stronger than the absence of a counterexample, because it contains a method that works uniformly for every input. The hypothesis $a : A$ is also necessary: without a term $a : A$, the specialised proposition $P(a)$ is not a valid instance of the family being quantified over. Conversely, even when $a : A$ is available, the instance $P(a)$ is not inhabited by the instantiation rule alone; without the universal proof $p$, there is no constructive source for the term $p(a)$. The theorem does not assert that such a universal proof exists, and it does not allow instantiation at objects outside $A$. This is the Curry-Howard reading used by proof assistants: applying a universally quantified proof is the same operation as applying a dependent function.
[example: Every Element Of A Singleton Type Is Its Centre]
Let $A$ be a type and let $a_0 : A$. Define $\operatorname{Sing}(A,a_0)$ to be the dependent pair type whose elements have the form $s=(a,p)$ with $a : A$ and $p : a = a_0$. We show that the type
\begin{align*}
\prod_{s : \operatorname{Sing}(A,a_0)} \pi_1(s) = a_0
\end{align*}
is inhabited.
Assume $s : \operatorname{Sing}(A,a_0)$. By the definition of $\operatorname{Sing}(A,a_0)$, write $s=(a,p)$ with $a : A$ and $p : a=a_0$. The first projection computes to
\begin{align*}
\pi_1(s)=\pi_1(a,p)=a.
\end{align*}
The second projection computes to
\begin{align*}
\pi_2(s)=\pi_2(a,p)=p.
\end{align*}
Since $p : a=a_0$ and $\pi_1(s)=a$ by the projection computation above, the term $\pi_2(s)$ has type
\begin{align*}
\pi_2(s) : \pi_1(s)=a_0.
\end{align*}
Thus, under the assumption $s : \operatorname{Sing}(A,a_0)$, we have constructed a term $\pi_2(s) : \pi_1(s)=a_0$. Pi introduction therefore gives
\begin{align*}
\lambda s.\pi_2(s) : \prod_{s : \operatorname{Sing}(A,a_0)} \pi_1(s)=a_0.
\end{align*}
Instantiating this universal proof at a concrete singleton element $(a,p)$ returns its stored equality proof:
\begin{align*}
(\lambda s.\pi_2(s))(a,p)=\pi_2(a,p)=p : a=a_0.
\end{align*}
So every element of the singleton is equal to its centre because membership in the singleton already carries exactly that equality proof.
[/example]
This example shows the computational content of universal proof. Instantiating the universal proof at a particular singleton element $(a,p)$ returns $p$, the stored equality proof.
[remark: Quantification Over Types]
When the domain of a Pi type is a universe, the same notation expresses polymorphism. A term of type $\prod_{A : \mathcal U} A \to A$ is both a polymorphic program and, under propositions-as-types, a universally quantified construction over all small types in $\mathcal U$.
[/remark]
The universe annotation prevents an unrestricted quantification over all types from becoming syntactically ill behaved. Later chapters use universes to manage this stratification more systematically.
## Non-Dependent Function Types As Constant Pi Types
The final question in this chapter is how the old function type $A \to B$ fits into the dependent system. It should not be a separate primitive if the codomain happens not to depend on the argument. Pi types absorb ordinary function types by taking a constant family.
[definition: Constant Type Family]
Let $\mathcal{U}$ be a universe and let $A : \mathcal{U}$ and $B : \mathcal{U}$. The constant type family over $A$ with value $B$ is the map
\begin{align*}
A \to \mathcal{U}, \qquad x \mapsto B.
\end{align*}
[/definition]
The constant family gives the dependent notation $\prod_{x : A}B$, but students coming from the simply typed calculus expect the arrow notation $A \to B$. The next definition fixes the translation between these notations so the earlier function type becomes a named special case of Pi type formation.
[definition: Non-Dependent Function Type]
Let $A$ and $B$ be types. The non-dependent function type from $A$ to $B$ is written
\begin{align*}
A \to B
\end{align*}
when the codomain $B$ does not depend on the input variable from $A$.
[/definition]
Equivalently, this notation abbreviates the Pi type
\begin{align*}
\prod_{x : A} B,
\end{align*}
where $x$ does not occur free in $B$. The rules for $A \to B$ are therefore special cases of the Pi rules. Lambda abstraction is Pi introduction for a constant family, function application is Pi elimination, and the beta and eta laws are inherited directly.
[example: Ordinary Identity As A Constant-Fibre Pi Term]
For a fixed type $A$, the ordinary function type $A \to A$ is the constant-fibre Pi type
\begin{align*}
A \to A \;=\; \prod_{x : A} A,
\end{align*}
where the displayed $A$ in the codomain does not depend on $x$. To construct an inhabitant, assume $x : A$. The variable rule gives the judgement
\begin{align*}
x : A.
\end{align*}
Since the fibre at every input is again $A$, this judgement has exactly the required fibre type. Pi introduction therefore gives
\begin{align*}
\lambda x.x : \prod_{x : A} A.
\end{align*}
Using the abbreviation for the constant-fibre Pi type, this is the same typing judgement as
\begin{align*}
\lambda x.x : A \to A.
\end{align*}
Now let $a : A$. Pi elimination gives
\begin{align*}
(\lambda x.x)(a) : A.
\end{align*}
By the *Pi Type Computation Rule*, applying the lambda abstraction substitutes $a$ for the bound variable $x$ in the body $x$, so
\begin{align*}
(\lambda x.x)(a)=a : A.
\end{align*}
Thus the constant-fibre Pi term recovers the ordinary identity map on $A$. The same term is not automatically an inhabitant of $\prod_{x : A} B(x)$ for an arbitrary family $B$, because the variable rule only gives $x : A$, while the required body type would be $B(x)$.
[/example]
This special case recovers familiar functions, but it is useful to record the recovery as a metatheoretic comparison because it prevents the course from treating $\to$ and $\prod$ as unrelated primitives. The theorem below packages the formation, introduction, elimination, beta, and eta rules into a single comparison between rule schemes.
[quotetheorem:9628]
[citeproof:9628]
The independence of $B$ from $x$ is the decisive hypothesis. If $B$ genuinely depends on $x$, then application changes the codomain from $B$ to the specialised fibre $B(a)$, so the ordinary arrow rules no longer describe the typing behaviour. The theorem also does not say that dependent functions are reducible to ordinary functions; it only says that ordinary functions are recovered as the constant-fibre case. This special case explains why implication is a non-dependent universal quantifier over proof objects. If $P$ and $Q$ are propositions and $Q$ does not depend on a proof of $P$, then $P \to Q$ is the type of functions taking evidence for $P$ to evidence for $Q$.
## Connections With Semantics And Proof Engineering
The formal rules explain how Pi terms are checked, but they leave a practical question: why does this single type former appear both in semantic models and in the everyday mechanics of proof assistants? The answer is that the same dependency must be tracked at three levels: a semantic family has fibres over a base, a proof term must work uniformly over those fibres, and an implementation must reconstruct enough hidden arguments to apply the term at the intended fibre.
In categorical semantics, a dependent type over $A$ is treated like a family over a base object, and the Pi type corresponds to a dependent product. It packages compatible choices from all fibres and is characterised by a universal property. This is the semantic reason for the product notation and for the close relationship between substitution and application.
In set-based intuition, an element of $\prod_{x : A}B(x)$ resembles a choice function selecting an element of each set $B(x)$. Type theory makes this selection constructive: an inhabitant must be an explicit rule producing the selected element, not merely an assertion that selections exist.
In proof engineering, the same structure appears whenever arguments are implicit. Suppose a theorem has type
\begin{align*}
\prod_{A : \mathcal U}\prod_{x : A} P(A,x) \to Q(A,x).
\end{align*}
If the current goal is $Q(\mathbb N,3)$ and the context contains $p : P(\mathbb N,3)$, a proof assistant can infer the hidden arguments $A := \mathbb N$ and $x := 3$ from the expected target and from the type of $p$. The resulting elaborated term is ordinary Pi elimination applied several times:
\begin{align*}
t(\mathbb N)(3)(p) : Q(\mathbb N,3).
\end{align*}
Implicit-argument inference is therefore not an extra logical rule. It is an implementation layer that inserts the arguments required by the dependent-function type.
The chapter therefore ends where the earlier simply typed theory began, but with a stronger explanation. Ordinary functions are constant dependent functions, universal quantification is dependent function space over propositions, and computation is controlled by the beta and eta principles for Pi types.
Dependent function types clarify how ordinary functions, universal quantifiers, and proof terms fit into one formal pattern. The natural companion is the dependent pair type, which generalizes products and gives a way to package data together with evidence or structure that depends on it.
# 6. Sigma Types, Existentials, and Structured Objects
Dependent pair types extend the product types from Chapter 3 in the dependent setting introduced by Chapter 4, complementing the dependent function types of Chapter 5 by allowing the type of the second component to depend on the value of the first. The prerequisites are ordinary product types, basic identity types, and the idea of a type family $B(x)$ indexed by terms $x:A$. This is the type-theoretic form of a variable-length specification: choose an object, then choose data or evidence whose type is determined by that object. The chapter develops the formal rules for Sigma types, reads them as constructive existential quantifiers, and then uses them to package records, subtypes, fibers, and algebraic structures.
## Dependent Pair Formation and Computation
Ordinary products $A \times B$ solve the problem of storing two independent pieces of data. Many mathematical objects are not built from independent choices: after choosing a natural number $n$, the type of proofs of $n < k$ depends on $n$; after choosing a carrier type $G$, the type of binary operations on $G$ depends on $G$; after choosing $x : A$, the type of proofs that $f(x)=y$ depends on $x$. Sigma types are the primitive dependent product-like construction for such data.
[definition: Sigma Type]
Let $\Gamma$ be a context, let $A\;\mathrm{type}$ in context $\Gamma$, and let $B(x)\;\mathrm{type}$ in context $\Gamma, x:A$. The Sigma type is written
\begin{align*}
\Sigma_{x:A} B(x)\;\mathrm{type}
\end{align*}
in context $\Gamma$.
[/definition]
These notes also use the notation $\sum_{x:A}B(x)$ for the same dependent sum type; the symbols $\Sigma_{x:A}B(x)$ and $\sum_{x:A}B(x)$ are interchangeable here.
The notation should be read as the type of pairs whose first component is $x:A$ and whose second component lies in the fiber $B(x)$. When $B$ is constant in $x$, this specializes to the ordinary product type $A \times B$. This sets up the formation rule; the next question is how a term of such a type is introduced.
[definition: Sigma Pair]
Given $a:A$ and $b:B(a)$, the pair constructor forms
\begin{align*}
\frac{a:A \quad b:B(a)}{(a,b):\Sigma_{x:A}B(x)}.
\end{align*}
[/definition]
Pairing is the introduction rule. The main new feature compared with ordinary products is that the second component cannot be typed until the first component has been chosen. Once pairs can be formed, we need operations that recover the stored data without losing the dependency between the two components.
[definition: Sigma Projections]
For $p:\Sigma_{x:A}B(x)$, the first projection is
\begin{align*}
\pi_1 : \Sigma_{x:A}B(x)\to A, \qquad \pi_1(p):A,
\end{align*}
and the second projection is
\begin{align*}
\pi_2 : \Pi_{p:\Sigma_{x:A}B(x)} B(\pi_1(p)), \qquad \pi_2(p):B(\pi_1(p)).
\end{align*}
[/definition]
The type of $\pi_2(p)$ records the dependency. It is not merely an element of some fixed type $B$; it belongs to the fiber selected by the first projection. Having introduced projections, we need their computation laws, since these laws are what make projections usable after a pair has been constructed.
[remark: Sigma Projection Computation Rules]
For every $a:A$ and every $b:B(a)$, the introduced dependent pair $(a,b):\Sigma_{x:A}B(x)$ computes under projection as
\begin{align*}
\pi_1((a,b)) \equiv a : A
\end{align*}
and
\begin{align*}
\pi_2((a,b)) \equiv b : B(a).
\end{align*}
In a theory with identity types, these judgmental equalities also give propositional equalities $\pi_1((a,b))=a$ and $\pi_2((a,b))=b$ by reflexivity.
[/remark]
These beta rules say that projections recover the data used to build a dependent pair. The hypotheses matter: $b$ must have type $B(a)$, because if $p:\Sigma_{x:A}B(x)$ then $\pi_2(p)$ has type $B(\pi_1(p))$, not a fixed type independent of $p$. For example, from $h:\Sigma_{n:\mathbb N}(n<k)$, the second projection is a proof of $\pi_1(h)<k$; replacing it by a proof of some unrelated inequality would not type-check. The beta rules do not say that every pair is determined by its projections; they only describe what happens immediately after constructing a pair. The converse question is whether the projections determine the original pair, and this is exactly the obstruction addressed by the eta principle.
[quotetheorem:9630]
[citeproof:9630]
The eta principle explains why dependent pairs behave like structured objects rather than opaque tokens. Without eta, two terms with the same first and second projections could still fail to be identified as the same package, so projection data alone would not support record-like reasoning. The theorem does not say that eta is always judgmental: in some intensional theories, rewriting $p$ to $(\pi_1(p),\pi_2(p))$ requires an explicit propositional equality rather than automatic computation. For a concrete failure mode, suppose a definitional-equality checker reduces projections but has no judgmental eta rule for Sigma types. Then $\pi_1(p)$ and $\pi_1((\pi_1(p),\pi_2(p)))$ reduce to the same term, and similarly for the second projection, but the terms $p$ and $(\pi_1(p),\pi_2(p))$ need not be accepted as interchangeable during type checking. This distinction matters later when record expressions are normalized, because judgmental eta can simplify terms during type checking while propositional eta must be transported through by proof.
[example: Bounded Natural Numbers]
Fix $k:\mathbb N$. The type of natural numbers bounded by $k$ is
\begin{align*}
\Sigma_{n:\mathbb N}(n<k).
\end{align*}
By the Sigma-pair rule, to give an element of this type one must first give a natural number $n:\mathbb N$, and then give a term of the proposition-indexed type $n<k$. Thus a typical element has the form $(n,p)$ with $p:n<k$.
For $k=3$, the type becomes
\begin{align*}
\Sigma_{n:\mathbb N}(n<3).
\end{align*}
Since $\mathbb N$ includes $0$, the natural numbers strictly below $3$ are exactly $0$, $1$, and $2$: we have proof terms $p_0:0<3$, $p_1:1<3$, and $p_2:2<3$, while $3<3$ is false and every $n$ with $3\le n$ is not strictly below $3$. Therefore the displayed Sigma type has elements of the forms
\begin{align*}
(0,p_0):\Sigma_{n:\mathbb N}(n<3),
\end{align*}
\begin{align*}
(1,p_1):\Sigma_{n:\mathbb N}(n<3),
\end{align*}
and
\begin{align*}
(2,p_2):\Sigma_{n:\mathbb N}(n<3).
\end{align*}
The first component records the bounded number, and the second component records the certificate that this particular number satisfies the bound.
[/example]
This example shows why the dependency is not decoration. The type of the second component changes with the chosen natural number, since a proof of $1<3$ and a proof of $2<3$ inhabit different proposition-indexed types.
## Elimination and Pattern Matching for Sigma Types
After introducing a way to build dependent pairs, the next question is how to use them. Since the second component depends on the first, an eliminator must expose both components together and preserve the dependency in the resulting type.
[quotetheorem:9631]
[citeproof:9631]
In programming notation, the eliminator is pattern matching: to define $f(p)$, write $p$ as $(x,y)$ and define the result in terms of $x$ and $y$. The motive must be allowed to depend on the whole pair $p$, not merely on its first projection, because the desired output may mention both the witness and the evidence stored with it. A sharp example is the target
\begin{align*}
C(p) := B(\pi_1(p)).
\end{align*}
The function $f(p):= \pi_2(p)$ is typed by Sigma elimination: in the branch for $(x,y)$, the branch term is $y:B(x)$, which is exactly $C((x,y))$. If the branch target were treated as a fixed type independent of the matched pair, the same branch would have to return $y$ in a type not known to be $B(x)$, and the definition would be ill-typed. The theorem does not introduce any new way to construct Sigma elements; it only says that every use of such an element may be reduced to the case of an introduced pair. In proof notation, it is the rule that permits destructing a dependent pair while keeping the witness and its evidence in scope.
[example: Fibers of a Function]
Let $f:A\to B$ and let $y:B$. The fiber of $f$ over $y$ is the dependent pair type
\begin{align*}
\operatorname{fib}_f(y) := \Sigma_{x:A}(f(x)=y).
\end{align*}
Thus, to form an element, one first chooses $x:A$ and then chooses a proof term $p:f(x)=y$; the resulting element is
\begin{align*}
(x,p):\operatorname{fib}_f(y).
\end{align*}
For this introduced pair, the projections recover exactly the two pieces:
\begin{align*}
\pi_1((x,p)) = x.
\end{align*}
\begin{align*}
\pi_2((x,p)) = p.
\end{align*}
The second equality is typed as $\pi_2((x,p)):f(\pi_1((x,p)))=y$, and using $\pi_1((x,p))=x$ this is the same displayed evidence $p:f(x)=y$.
To define a function out of $\operatorname{fib}_f(y)$, let $C(q)$ be the desired output type for $q:\operatorname{fib}_f(y)$. It is enough to give, for arbitrary $x:A$ and $p:f(x)=y$, a term
\begin{align*}
g(x,p):C((x,p)).
\end{align*}
The induced function $F:\Pi_{q:\operatorname{fib}_f(y)}C(q)$ is determined on introduced fiber elements by
\begin{align*}
F((x,p))=g(x,p).
\end{align*}
So a point of the fiber is not merely a candidate preimage; it is a candidate together with the equality certificate needed to use it as [lying over](/theorems/2876) $y$.
[/example]
Fibers are a central example because they combine computation with equality evidence. The first projection extracts the candidate preimage, while the second projection is precisely the certificate that the candidate lies over the chosen point.
This is the type-theoretic version of a total space of a fibration or indexed family. A family $B:A\to \mathrm{Type}$ assigns a fiber $B(x)$ over each base point $x:A$, and $\Sigma_{x:A}B(x)$ collects all fibers into one total type equipped with the projection $\pi_1:\Sigma_{x:A}B(x)\to A$. In geometry and moduli problems the same pattern appears when objects are organized over a base of parameters: a point of the total space is a parameter together with structure [lying over](/theorems/2944) that parameter.
[remark: Pattern Matching Syntax]
Many proof assistants write the eliminator as a pattern match such as `match p with | (x, y) => t`. The type checker inserts the dependency information: inside the branch, $y$ has type $B(x)$, and the branch target is $C((x,y))$.
[/remark]
This syntax hides the eliminator, but it does not change the rule. Sigma elimination is valid because the only way to construct a Sigma element is by pairing.
## Sigma Types as Constructive Existentials
The Curry-Howard reading of products as conjunction suggests asking what type corresponds to existential quantification. In constructive mathematics, proving that there exists an $x:A$ satisfying $P(x)$ requires more than asserting non-emptiness: it requires a witness and evidence that the witness satisfies the predicate.
[definition: Constructive Existential]
For a type $A$ and a family of propositions $P(x)$ indexed by $x:A$, the constructive existential proposition is represented by
\begin{align*}
\exists x:A,\;P(x) \quad := \quad \Sigma_{x:A}P(x).
\end{align*}
[/definition]
The definition identifies existential proof with dependent pair construction. A proof of $\exists x:A, P(x)$ contains a witness $a:A$ and a proof of $P(a)$. Having represented existence as data, we need the corresponding elimination rule: how to use an existential hypothesis in a proof of another proposition.
[quotetheorem:4640]
[citeproof:4640/type-theory]
This is the familiar existential-elimination rule, but the computational content is explicit. The hypothesis is a data-bearing Sigma element, so destructing it gives both the witness and the proof that the witness has the desired property. The rule would be invalid for a merely truncated existential if the target $Q$ asked for data depending on the hidden witness, because truncation deliberately prevents extracting that witness. Thus this theorem does not say that all forms of existence support computation; it applies to constructive existence represented by dependent pairs. This distinction is used below when subtypes and fibers are treated as objects carrying usable certificates.
[example: Extracting Data from an Existence Proof]
Suppose $h:\Sigma_{n:\mathbb N}(n<k)$. By the Sigma projection rules, the first projection has type
\begin{align*}
\pi_1(h):\mathbb N.
\end{align*}
The second projection is typed in the fiber selected by the first projection, so
\begin{align*}
\pi_2(h):\pi_1(h)<k.
\end{align*}
Thus destructing $h$ gives two usable pieces of data: a candidate number $\pi_1(h)$ and a certificate $\pi_2(h)$ that this same candidate is below $k$.
Equivalently, if $h$ is introduced as a pair $(n,p)$ with $n:\mathbb N$ and $p:n<k$, then the Sigma beta rules give
\begin{align*}
\pi_1((n,p))=n.
\end{align*}
and
\begin{align*}
\pi_2((n,p))=p.
\end{align*}
In this case the extracted number is exactly $n$, and the extracted proof is exactly the certificate $p:n<k$. This is why a constructive existence proof can be used in later constructions, such as a bounded search procedure: projection supplies both the natural number to inspect and the proof that it lies among the candidates below $k$.
[/example]
This constructive reading differs from classical existential reasoning when the witness is not computationally available. In type theory, a closed term of $\Sigma_{x:A}P(x)$ carries enough information to compute the witness by projection.
[remark: Propositional Truncation]
Some type theories distinguish the data-bearing Sigma type from a mere existential proposition using propositional truncation. The Sigma type remembers the witness, while the truncated existential records only that some witness exists. This chapter uses Sigma types for the constructive, witness-carrying existential.
[/remark]
The distinction matters when uniqueness or proof irrelevance is not assumed. Two Sigma elements may have different witnesses, or the same witness with different evidence, depending on the equality theory of the proposition family.
## Records, Subtypes, and Mathematical Structures
Mathematical objects are often packages: a carrier together with operations, constants, relations, and laws. The obstacle is that ordinary products can only combine fields whose types are already fixed, while a later field of a structure usually mentions earlier fields. For instance, the multiplication of a group has type $G\to G\to G$ only after the carrier $G$ has been chosen, and the associativity law can only be typed after the multiplication has been chosen. Sigma types provide a uniform way to encode such packages by adding one dependent component at a time.
[definition: Subtype]
Given a type $A$ and a predicate $P:A\to \mathrm{Type}$, the subtype of elements of $A$ satisfying $P$ is
\begin{align*}
\{x:A\mid P(x)\} := \Sigma_{x:A}P(x).
\end{align*}
[/definition]
A subtype is therefore not a subset in an external set-theoretic universe; it is a type of elements equipped with certificates. Forgetting the certificate is the first projection.
[example: Positive Natural Numbers Below a Bound]
Fix $k:\mathbb N$ and consider
\begin{align*}
\Sigma_{n:\mathbb N}\big((1\le n)\times(n<k)\big).
\end{align*}
To construct an element of this Sigma type, one first chooses a natural number $n:\mathbb N$. After that choice, the second component must have type $(1\le n)\times(n<k)$, so it is an ordinary pair of proof terms
\begin{align*}
q=(u,v):(1\le n)\times(n<k),
\end{align*}
where $u:1\le n$ and $v:n<k$. Thus an element has the form
\begin{align*}
(n,(u,v)):\Sigma_{n:\mathbb N}\big((1\le n)\times(n<k)\big).
\end{align*}
For such an introduced element, the first Sigma projection returns the underlying natural number:
\begin{align*}
\pi_1((n,(u,v)))=n.
\end{align*}
The second Sigma projection returns the bundled evidence:
\begin{align*}
\pi_2((n,(u,v)))=(u,v):(1\le n)\times(n<k).
\end{align*}
Projecting this ordinary product evidence then gives separately the positivity proof $u:1\le n$ and the bound proof $v:n<k$. So projection to $\mathbb N$ forgets the certificates, while the second component retains exactly the two proofs needed to use $n$ as a positive natural number below $k$.
[/example]
This subtype example has a fixed list of evidence fields attached to a chosen element. To encode mathematical structures, later fields must depend on earlier fields, so we need the more general view of a record as an iterated dependent sum.
[definition: Iterated Sigma Record]
A record with fields $x:A$, $y:B(x)$, and $z:C(x,y)$ is represented by the iterated Sigma type
\begin{align*}
\Sigma_{x:A}\Sigma_{y:B(x)}C(x,y).
\end{align*}
[/definition]
The nesting order records the dependency order of the fields. Choosing $x$ determines the type of $y$, and choosing both $x$ and $y$ determines the type of $z$. A concrete problem appears when record-manipulation code changes parentheses: a term of type $\Sigma_{x:A}\Sigma_{y:B(x)}C(x,y)$ exposes fields as $(x,(y,z))$, while a term of type $\Sigma_{p:\Sigma_{x:A}B(x)} C(\pi_1(p),\pi_2(p))$ exposes them as $((x,y),z)$. Without an equivalence between these shapes, repackaging a record would require ad hoc coercions every time later fields refer to earlier ones. Once records are nested, we need to know when two different parenthesizations describe the same structured information.
[quotetheorem:9632]
[citeproof:9632]
This equivalence justifies changing parentheses in iterated dependent sums when the dependencies are transported along projections. The hypotheses specify the exact dependency pattern: $C$ may depend on both $x$ and $y$, so the right-hand side must feed it $\pi_1(p)$ and $\pi_2(p)$ rather than treating it as a constant third field. The result does not say that the two parenthesized record types are definitionally equal in every intensional theory; it supplies maps and inverse laws, possibly only up to propositional equality. This matters in formalization because moving between the two encodings may require explicit transports unless the proof assistant has record eta or definitional reassociation built in. It is the formal reason records can be viewed either as nested pairs or as a single package whose earlier fields are available to later ones.
[example: The Type of Groups]
Write $m(x,y)$ for the value of the curried operation $m:G\to G\to G$ at $x:G$ and $y:G$. A group record is built by choosing the fields in dependency order: first a carrier $G:\mathrm{Type}$, then a multiplication $m:G\to G\to G$, then an identity element $e:G$, then an inverse operation $i:G\to G$, and finally a term proving the group laws for those previously chosen data.
The law component may be written as an ordinary product of proof types. For example, the associativity law has type
\begin{align*}
\Pi_{x:G}\Pi_{y:G}\Pi_{z:G}\big(m(m(x,y),z)=m(x,m(y,z))\big).
\end{align*}
The left identity law has type
\begin{align*}
\Pi_{x:G}\big(m(e,x)=x\big).
\end{align*}
The right identity law has type
\begin{align*}
\Pi_{x:G}\big(m(x,e)=x\big).
\end{align*}
The left inverse law has type
\begin{align*}
\Pi_{x:G}\big(m(i(x),x)=e\big).
\end{align*}
The right inverse law has type
\begin{align*}
\Pi_{x:G}\big(m(x,i(x))=e\big).
\end{align*}
Thus $\operatorname{GroupLaws}(G,m,e,i)$ denotes the product of these five proof types, and the type of groups is the iterated Sigma type
\begin{align*}
\Sigma_{G:\mathrm{Type}}\;\Sigma_{m:G\to G\to G}\;\Sigma_{e:G}\;\Sigma_{i:G\to G}\;\operatorname{GroupLaws}(G,m,e,i).
\end{align*}
If $L:\operatorname{GroupLaws}(G,m,e,i)$, then the corresponding group package is the nested dependent pair
\begin{align*}
(G,(m,(e,(i,L)))):\Sigma_{G:\mathrm{Type}}\;\Sigma_{m:G\to G\to G}\;\Sigma_{e:G}\;\Sigma_{i:G\to G}\;\operatorname{GroupLaws}(G,m,e,i).
\end{align*}
Each later field is typed using the earlier fields: $m$ can be typed only after $G$ is chosen, and $L$ can be typed only after $G$, $m$, $e$, and $i$ are all available. An element of the displayed Sigma type is therefore not just a tuple of five unrelated objects; it is a carrier together with operations and law proofs whose types are determined by the preceding choices.
[/example]
This example also shows why records are more than tuples. The type of the multiplication field depends on the carrier, the type of the identity depends on the same carrier, and the laws mention all previous fields.
[remark: Field Names and Projections]
Record syntax in proof assistants gives names to projections, such as `Group.carrier` or `Group.mul`. These are elaborated forms of Sigma projections, often with additional support for implicit arguments and coercions.
[/remark]
The Sigma encoding is therefore the core mathematical idea beneath record notation. Records improve readability, while dependent sums supply the formation, introduction, elimination, and computation principles.
## Conceptual Summary
Sigma types are the dependent analogue of product types. They form pairs $(a,b)$ where the type of $b$ may depend on $a$, and their eliminator is the dependent pattern-matching rule for using such pairs.
Under Curry-Howard, $\Sigma_{x:A}P(x)$ is the constructive existential: a proof consists of a witness and a proof that the witness satisfies the predicate. In mathematical formalization, the same mechanism packages subtypes, fibers, bounded data, records, and structures such as groups. Iterated Sigma types explain why structured objects can be assembled field by field while preserving the dependencies among their components.
Dependent pairs show how mathematical structures can be assembled field by field while preserving their internal dependencies. The next chapter turns to identity types, where equality itself becomes part of the theory and reasoning shifts from external substitution to internal paths and eliminators.
# 7. Identity Types and Equality Reasoning
This chapter studies the identity type, the type-theoretic internal language of equality. It assumes the earlier material on dependent function types, dependent pair types, type families, and the distinction between judgemental formation rules and terms inhabiting types. Earlier chapters introduced dependent types as families of types varying over terms; identity types add the ability to speak, inside the theory, about when two terms of the same type are equal. The central point is that equality is not governed by arbitrary rewriting rules: it is introduced by reflexivity and eliminated by the rule $J$, from which the familiar operations of symmetry, transitivity, substitution, and congruence are derived.
The chapter also separates two notions that often look similar in informal mathematics. Definitional equality is the judgemental equality used by the type checker during computation, while propositional equality is a type whose inhabitants are proofs. This distinction controls what can be reduced by computation and what must be proved by an identity-type argument.
## Identity Types and the Eliminator J
How can a dependent type theory express the proposition that two terms are equal while still treating equality as a type with introduction and elimination rules? The identity type answers this by assigning, to two terms of the same type, a new type whose terms are proofs of their equality.
[definition: Identity Type]
Let $A$ be a type and let $a,b:A$. The identity type of $a$ and $b$ in $A$ is the type
\begin{align*}
\operatorname{Id}_A(a,b).
\end{align*}
A term $p:\operatorname{Id}_A(a,b)$ is called an identification, equality proof, or path from $a$ to $b$.
[/definition]
The notation records the ambient type because equality is typed: two terms may be compared only after they have been placed in a common type. To make this type usable as a logical proposition, we next need its canonical proof in the case where both endpoints are the same term.
[definition: Reflexivity]
For every type $A$ and every term $a:A$, there is a canonical term
\begin{align*}
\operatorname{refl}_a : \operatorname{Id}_A(a,a).
\end{align*}
[/definition]
Reflexivity is the only primitive constructor for identity proofs in intensional Martin-Löf type theory. The next problem is elimination: if every primitive equality proof comes from reflexivity, what principle allows us to use an arbitrary proof $p:\operatorname{Id}_A(a,x)$?
[definition: Identity Eliminator J]
Let $A$ be a type, let $a:A$, and let
\begin{align*}
C(x,p)\ \operatorname{type}
\end{align*}
be a type family depending on $x:A$ and $p:\operatorname{Id}_A(a,x)$. If
\begin{align*}
c:C(a,\operatorname{refl}_a),
\end{align*}
then for every $x:A$ and every $p:\operatorname{Id}_A(a,x)$ there is a term
\begin{align*}
J(c,x,p):C(x,p).
\end{align*}
The computation rule is
\begin{align*}
J(c,a,\operatorname{refl}_a) \equiv c.
\end{align*}
[/definition]
The eliminator $J$ is a primitive rule of the theory, not a theorem proved from earlier type formers. It says that to construct something depending on an equality proof $p:\operatorname{Id}_A(a,x)$, it is enough to handle the reflexive case where $x$ is $a$ and $p$ is $\operatorname{refl}_a$. The hypotheses are deliberately strong: the motive must be allowed to depend on both the endpoint $x$ and the path $p$, because many later constructions need to inspect the equality proof as part of the target type.
The rule also has important limitations. It does not say that every equality proof is judgementally equal to reflexivity, and it does not collapse all terms of $\operatorname{Id}_A(a,b)$ into a single proof. For example, in a theory extended with homotopical models, a type may have non-trivial paths even though $J$ remains valid; $J$ only says that constructions out of paths are determined by the reflexive case in the precise dependent-induction sense. If the reflexive datum $c:C(a,\operatorname{refl}_a)$ is unavailable, the rule gives no term at all: a proposed construction of $C(x,p)$ cannot start from an arbitrary non-reflexive equality proof without first reducing the problem to this base case.
The strength of $J$ is that it converts equality reasoning into a controlled induction principle. This motivates testing the rule on the familiar symmetry operation, where an equality proof must be reversed.
[example: Symmetry From J]
Fix a type $A$, terms $a,b:A$, and a path $p:\operatorname{Id}_A(a,b)$. We define the inverse path by identity elimination on $p$, with fixed left endpoint $a$ and motive
\begin{align*}
C(x,r)\coloneqq \operatorname{Id}_A(x,a)
\end{align*}
for $x:A$ and $r:\operatorname{Id}_A(a,x)$. In the reflexive case, the endpoint is $x\equiv a$ and the path is $r\equiv \operatorname{refl}_a$, so the target type becomes
\begin{align*}
C(a,\operatorname{refl}_a)\equiv \operatorname{Id}_A(a,a).
\end{align*}
The required base term is therefore
\begin{align*}
\operatorname{refl}_a:\operatorname{Id}_A(a,a).
\end{align*}
Applying $J$ to this base term gives, for the original endpoint $b$ and path $p:\operatorname{Id}_A(a,b)$, a term
\begin{align*}
J(\operatorname{refl}_a,b,p):C(b,p).
\end{align*}
Since the motive was defined by $C(x,r)\coloneqq \operatorname{Id}_A(x,a)$, this target is
\begin{align*}
C(b,p)\equiv \operatorname{Id}_A(b,a).
\end{align*}
Thus we may define
\begin{align*}
p^{-1}\coloneqq J(\operatorname{refl}_a,b,p):\operatorname{Id}_A(b,a).
\end{align*}
When $p\equiv \operatorname{refl}_a$, the computation rule for $J$ gives $p^{-1}\equiv \operatorname{refl}_a$, so symmetry is obtained by reducing every path argument to the reflexive case.
[/example]
Symmetry is not an additional axiom of equality. It is a derived operation, and the next familiar equality operation is composition of proofs.
[example: Transitivity From J]
Let $p:\operatorname{Id}_A(a,b)$ and $q:\operatorname{Id}_A(b,c)$. We define the composite path by identity elimination on $q$, keeping the path $p$ fixed. Use the motive
\begin{align*}
C(x,r)\coloneqq \operatorname{Id}_A(a,x)
\end{align*}
for $x:A$ and $r:\operatorname{Id}_A(b,x)$. In the reflexive case, the endpoint is $x\equiv b$ and the path is $r\equiv \operatorname{refl}_b$, so the target type is
\begin{align*}
C(b,\operatorname{refl}_b)\equiv \operatorname{Id}_A(a,b).
\end{align*}
The required base term is exactly
\begin{align*}
p:\operatorname{Id}_A(a,b).
\end{align*}
Applying $J$ to this base term gives, for the original endpoint $c$ and path $q:\operatorname{Id}_A(b,c)$, a term
\begin{align*}
J(p,c,q):C(c,q).
\end{align*}
Since $C(x,r)\coloneqq \operatorname{Id}_A(a,x)$, the target type becomes
\begin{align*}
C(c,q)\equiv \operatorname{Id}_A(a,c).
\end{align*}
Thus define
\begin{align*}
p\mathbin{\cdot}q\coloneqq J(p,c,q):\operatorname{Id}_A(a,c).
\end{align*}
When $q\equiv \operatorname{refl}_b$, the computation rule for $J$ gives
\begin{align*}
p\mathbin{\cdot}\operatorname{refl}_b\equiv p.
\end{align*}
Thus transitivity is obtained by reducing the second path argument to the reflexive case, where the composite is the original path $p$.
[/example]
The preceding examples show how identity types recover the groupoid-like behaviour expected of equality. The next question is how equality interacts with dependent families, where replacing equals by equals may change the type of the term being moved.
## Path Induction and Transport Along Equality
A dependent family $B:A\to\mathsf{Type}$ assigns different types to different points of $A$. If $p:\operatorname{Id}_A(a,b)$ and $u:B(a)$, then $u$ cannot be reused as a term of $B(b)$ without an equality principle that moves terms between fibres.
[quotetheorem:9633]
[citeproof:9633]
Path induction is often the most convenient presentation of identity elimination because it removes the artificial choice of a fixed left endpoint from $J$. The full dependency in $D(x,y,p)$ is necessary: if the target were allowed to depend only on $x$ and $y$, it could not express constructions whose output type mentions the particular path, such as coherence statements comparing transport along different equalities. A concrete case is the statement that transport along the reflexive path acts as the identity on a fibre: the target type must mention the proof $p$ in order to say that $\operatorname{transport}^{B}(p)$ is being compared with the identity map in the special case where $p$ is $\operatorname{refl}_x$. Later coherence laws, such as comparing transport along a composite path with successive transports, also require the motive to remember which path is being used, not only its endpoints.
The principle still has the same boundary as $J$. It does not assert that all paths are reflexivity, and it does not give a construction unless the reflexive case $d(x):D(x,x,\operatorname{refl}_x)$ has already been supplied for every $x:A$. For instance, if one asks for a term of an arbitrary family $D(x,y,p)$ but cannot produce it when $x\equiv y$ and $p\equiv \operatorname{refl}_x$, path induction has no base case to extend. This limitation is exactly what makes the rule an induction principle rather than an unrestricted rewriting command.
The next basic operation needed for dependent substitution is moving a term from one fibre of a family to another along a path in the base. This is not ordinary function application: the term starts in $B(a)$, while the desired target type is $B(b)$, and these fibres may be syntactically different even when $a$ and $b$ are propositionally equal. The missing operation is therefore a controlled coercion whose only primitive source is the equality proof itself, and whose reflexive case must reduce to doing nothing.
[definition: Transport]
Let $A$ be a type, let $B:A\to\mathsf{Type}$ be a dependent type family, and let $p:\operatorname{Id}_A(a,b)$. Transport along $p$ is the function
\begin{align*}
\operatorname{transport}^{B}(p):B(a)\to B(b)
\end{align*}
defined by identity elimination on $p$.
[/definition]
Transport is the dependent version of substitution. We next record the computation rule that explains why transport behaves as the identity when the equality proof is reflexivity.
[quotetheorem:9634]
[citeproof:9634]
Transport is the point at which dependent types make equality reasoning more delicate than ordinary rewriting. The dependent family $B:A\to\mathsf{Type}$ is essential: without a family of fibres there is nowhere for the term to move, and the non-dependent special case reduces instead to ordinary [congruence of functions](/theorems/9635). The path $p:\operatorname{Id}_A(a,b)$ is also essential; if $a$ and $b$ are merely two unrelated indices, a term of $B(a)$ need not have any type-correct interpretation as a term of $B(b)$.
A typical failure occurs with length-indexed vectors. A vector $v:\operatorname{Vec}(C,m)$ cannot be used at type $\operatorname{Vec}(C,n)$ unless the theory is given an equality proof $p:\operatorname{Id}_{\mathbb N}(m,n)$ and transports $v$ along it. The transport lemma does not say that all such transports compute away; only transport along $\operatorname{refl}_a$ reduces judgementally to the original term. For a general path $p$, transport produces a propositional operation whose later behaviour may require additional lemmas.
This motivates a concrete indexed example, since length-indexed data exposes the change of fibre directly.
[example: Transporting Vector Elements Along Equality of Lengths]
Let $\operatorname{Vec}(A,k)$ be the fibre over $k:\mathbb N$, and define the indexed family
\begin{align*}
B(k)\coloneqq \operatorname{Vec}(A,k).
\end{align*}
Given $p:\operatorname{Id}_{\mathbb N}(m,n)$ and $v:\operatorname{Vec}(A,m)$, the definition of transport applied to this family gives a function
\begin{align*}
\operatorname{transport}^{B}(p):B(m)\to B(n).
\end{align*}
Substituting the definition of $B$ into the domain gives
\begin{align*}
B(m)\equiv \operatorname{Vec}(A,m).
\end{align*}
Substituting the definition of $B$ into the codomain gives
\begin{align*}
B(n)\equiv \operatorname{Vec}(A,n).
\end{align*}
Since $v:\operatorname{Vec}(A,m)$ and $\operatorname{Vec}(A,m)\equiv B(m)$, applying the transport function to $v$ gives
\begin{align*}
\operatorname{transport}^{B}(p)(v):B(n).
\end{align*}
Using $B(n)\equiv \operatorname{Vec}(A,n)$, this is exactly
\begin{align*}
\operatorname{transport}^{B}(p)(v):\operatorname{Vec}(A,n).
\end{align*}
In the reflexive case $p\equiv \operatorname{refl}_m$, the computation rule for transport gives
\begin{align*}
\operatorname{transport}^{B}(\operatorname{refl}_m)(v)\equiv v.
\end{align*}
Thus transport along a proof $p:\operatorname{Id}_{\mathbb N}(m,n)$ changes the visible length index from $m$ to $n$, while in the reflexive case it reduces judgementally to the original vector.
[/example]
The transport example illustrates a recurring pattern in proof assistants: after proving that two indices are equal, an object is transported across that equality to satisfy the required type. This motivates the non-dependent special case where equality in the domain of a function produces equality in its codomain.
[quotetheorem:9635]
[citeproof:9635]
Congruence is the internal form of the familiar rule that equal inputs have equal outputs, and each hypothesis has a role. The common domain type $A$ is needed so that $a$ and $b$ can even be compared, the fixed codomain $B$ is needed so that $f(a)$ and $f(b)$ lie in the same identity type, and the path $p$ is the data along which the equality is propagated. Without $p$, a function does not turn unrelated inputs into equal outputs; for example, a function $f:\mathbb N\to\mathbb N$ may send $0$ and $1$ to different values.
The limitation is that $\operatorname{ap}$ applies only to non-dependent functions. If $f:\prod_{x:A}B(x)$ is dependent, then $f(a):B(a)$ and $f(b):B(b)$ usually live in different types, so the expression $\operatorname{Id}(f(a),f(b))$ is not well-typed until one side has been transported along $p$. This is why dependent type theory also uses the dependent congruence operation $\operatorname{apd}$, whose target compares $\operatorname{transport}^{B}(p)(f(a))$ with $f(b)$.
## Equality of Dependent Pairs and Functions
The identity type is primitive, but equality of compound objects must be analysed using the rules for those objects. Dependent pairs and dependent functions are the first places where the limits of intensional equality become visible.
[definition: Dependent Pair Type]
Let $A$ be a type and let $B:A\to\mathsf{Type}$. The dependent pair type is
\begin{align*}
\sum_{x:A} B(x).
\end{align*}
Its terms are pairs $(a,u)$ with $a:A$ and $u:B(a)$.
[/definition]
For ordinary products, equality of pairs is equality of both components. In a dependent pair, the second component lives in a fibre depending on the first component, so we need a theorem that inserts transport into the second-component comparison.
[quotetheorem:9636]
[citeproof:9636]
This theorem explains why equality of dependent pairs is not merely componentwise equality in the naive sense. The first-component path $p$ is indispensable: without it, the second components may lie in different fibres, so there may be no type in which they can be compared. The transport condition on $q$ is equally indispensable, because even after proving $a=b$, the term $u:B(a)$ must be moved into $B(b)$ before it can be compared with $v:B(b)$.
A concrete failure occurs in $\sum_{n:\mathbb N}\operatorname{Vec}(C,n)$. Given $(2,v)$ and $(3,w)$, there is no path $p:\operatorname{Id}_{\mathbb N}(2,3)$ in ordinary natural-number identity, so the theorem cannot manufacture an equality of the pairs. Even when the first components are propositionally equal by some path $p$, the theorem does not say the second components are judgementally equal; it says that the transported first vector must be propositionally equal to the second vector in the target fibre. A useful way to see the point is to return to vectors indexed by length, where the first component equality determines the type in which the second components can be compared.
[example: Equality in a Sigma Type]
Let $A\coloneqq \mathbb N$ and $B(k)\coloneqq \operatorname{Vec}(C,k)$. Then the dependent pair type is
\begin{align*}
\sum_{k:\mathbb N} B(k)
\end{align*}
and substituting the definition of $B$ gives
\begin{align*}
\sum_{k:\mathbb N} B(k)\equiv \sum_{k:\mathbb N}\operatorname{Vec}(C,k).
\end{align*}
Thus a term $(m,v)$ of this type consists of $m:\mathbb N$ and $v:B(m)$, which is the same as
\begin{align*}
v:\operatorname{Vec}(C,m).
\end{align*}
To prove equality of $(m,v)$ and $(n,w)$ in $\sum_{k:\mathbb N}\operatorname{Vec}(C,k)$, first give a path
\begin{align*}
p:\operatorname{Id}_{\mathbb N}(m,n).
\end{align*}
Transport for the family $B(k)\coloneqq \operatorname{Vec}(C,k)$ gives
\begin{align*}
\operatorname{transport}^{B}(p):B(m)\to B(n).
\end{align*}
Since $B(m)\equiv \operatorname{Vec}(C,m)$ and $v:\operatorname{Vec}(C,m)$, applying this function gives
\begin{align*}
\operatorname{transport}^{B}(p)(v):B(n).
\end{align*}
Since $B(n)\equiv \operatorname{Vec}(C,n)$, this has type
\begin{align*}
\operatorname{transport}^{B}(p)(v):\operatorname{Vec}(C,n).
\end{align*}
Now the second component comparison is well-typed: one must give
\begin{align*}
q:\operatorname{Id}_{\operatorname{Vec}(C,n)}(\operatorname{transport}^{B}(p)(v),w).
\end{align*}
The pair equality is then the path determined by the first-component path $p$ together with this transported second-component path $q$.
When $p\equiv \operatorname{refl}_m$, the transport computation rule gives
\begin{align*}
\operatorname{transport}^{B}(\operatorname{refl}_m)(v)\equiv v.
\end{align*}
In that case $n\equiv m$, the target fibre is $\operatorname{Vec}(C,m)$, and the second component condition becomes
\begin{align*}
q:\operatorname{Id}_{\operatorname{Vec}(C,m)}(v,w).
\end{align*}
So in the reflexive case, equality of dependent pairs reduces to ordinary equality of vectors in the same length-indexed fibre.
[/example]
Dependent functions create a different issue. A term $f:\prod_{x:A}B(x)$ is observed by applying it to inputs, but equality in intensional type theory is an identity type at the whole function type.
Before comparing whole dependent functions by an identity type, we need a separate name for the value-by-value comparison that evaluation can actually observe. For each input $x$, the two outputs lie in the same fibre $B(x)$, so the natural observable comparison asks for an identity proof in that fibre. The following definition packages this family of fibrewise equalities.
[definition: Pointwise Equality of Dependent Functions]
Let $A$ be a type, let $B:A\to\mathsf{Type}$, and let $f,g:\prod_{x:A}B(x)$. Pointwise equality of $f$ and $g$ is the type
\begin{align*}
\prod_{x:A}\operatorname{Id}_{B(x)}(f(x),g(x)).
\end{align*}
[/definition]
Pointwise equality gives a target for comparing functions value by value, but it remains to connect this comparison with the ordinary identity type of the function space. The direction available without adding any extensional axiom is evaluation of an existing path: if two functions are already identified, then applying both sides at a fixed input should give identified values in the appropriate fibre. This is the part of function equality that follows from identity elimination alone.
[quotetheorem:9637]
[citeproof:9637]
This direction follows from the ordinary rules of identity types because a path between functions may be eliminated by reducing to reflexivity. Each hypothesis contributes something specific. The type $A$ supplies the inputs at which the functions may be evaluated, the family $B:A\to\mathsf{Type}$ supplies the fibre in which each value equality lives, and the assumption that both $f$ and $g$ have type $\prod_{x:A}B(x)$ ensures that $f(x)$ and $g(x)$ are comparable in the same fibre $B(x)$. The path $p$ is the data that is eliminated; without it, the theorem would be trying to create pointwise equalities between unrelated functions.
Its limitation is that it only moves from a function equality to pointwise equalities; it does not reconstruct a path between functions from pointwise data. In bare intensional Martin-Löf type theory, a term of $\prod_{x:A}\operatorname{Id}_{B(x)}(f(x),g(x))$ is a family of value-level paths, not automatically a term of $\operatorname{Id}_{\prod_{x:A}B(x)}(f,g)$.
The obstruction is not a missing computation step but a missing principle. Pointwise equality records that every evaluation agrees propositionally, while equality of functions is an identity type at the function type itself. Passing back from the former to the latter is the familiar extensionality principle, which must be assumed or derived from stronger foundations such as univalence rather than obtained from $J$ alone.
[remark: Function Extensionality Principle]
Let $A$ be a type, let $B:A\to\mathsf{Type}$, and let $f,g:\prod_{x:A}B(x)$. Function extensionality is the additional principle that a term of
\begin{align*}
\prod_{x:A}\operatorname{Id}_{B(x)}(f(x),g(x))
\end{align*}
may be used to produce a term of
\begin{align*}
\operatorname{Id}_{\prod_{x:A}B(x)}(f,g).
\end{align*}
[/remark]
The course treats function extensionality as an extensional principle rather than as a consequence of $J$. Its assumption is precisely the pointwise family of paths: without a term witnessing $\prod_{x:A}\operatorname{Id}_{B(x)}(f(x),g(x))$, the principle gives no equality of functions. It also does not say that pointwise equal functions become definitionally equal; it produces a propositional equality in the identity type of the function space.
A useful boundary case is given by two lambda terms with different normal forms but propositionally equal outputs at every input. Function extensionality may turn those output equalities into a path between the functions, but the type checker still computes each function by its own definition. Thus the principle strengthens propositional equality of functions while leaving judgemental equality and normalization behaviour unchanged. This distinction motivates a final warning: extensional propositional equalities should not be confused with judgemental computation.
[remark: Definitional Limits of Extensional Reasoning]
Definitional equality is a judgement of the type theory, not a type inhabited by terms. Beta-reduction, eta-laws when included judgementally, and computation rules for eliminators may reduce terms by definitional equality. By contrast, function extensionality, equality of transported objects, and most equalities between compound constructions usually produce propositional equalities. These propositional equalities can be used by transport and rewriting, but they do not by themselves change the judgemental normal forms of terms.
[/remark]
This distinction is a practical guide for formal work. A final example separates having an equality proof from changing the reduction behaviour of the expressions involved.
[example: Propositional Equality Does Not Force Definitional Reduction]
Fix functions $f,g:A\to B$ and a pointwise equality
\begin{align*}
h:\prod_{x:A}\operatorname{Id}_B(f(x),g(x)).
\end{align*}
For each $a:A$, applying the dependent function $h$ to $a$ gives the value-level path
\begin{align*}
h(a):\operatorname{Id}_B(f(a),g(a)).
\end{align*}
By *[Function Extensionality Principle](/theorems/9638)*, the whole family $h$ induces a function-level path
\begin{align*}
p:\operatorname{Id}_{A\to B}(f,g).
\end{align*}
This path $p$ is a term of an identity type; it is not a judgemental equality $f\equiv g$. Therefore the two applications still reduce according to the definitions of $f$ and $g$ separately:
\begin{align*}
f(a)\ \text{reduces by the defining clauses of }f.
\end{align*}
\begin{align*}
g(a)\ \text{reduces by the defining clauses of }g.
\end{align*}
The proof $p$ may be used for transport or rewriting in propositions and dependent types, and applying equality to the input $a$ recovers a propositional equality between outputs, but it does not change the type checker's judgemental reduction rules. Thus pointwise propositional equality, even upgraded to a path $p:\operatorname{Id}_{A\to B}(f,g)$, does not force $f$ and $g$ to have the same definitional normal form.
[/example]
The chapter closes with the operational lesson that identity types give a disciplined internal equality, not an unrestricted replacement for judgemental equality. The eliminator $J$ is the source of ordinary equality reasoning, transport handles dependent substitution, and extensionality principles must be tracked according to whether they are primitive, derived, or assumed.
Identity types complete the basic toolkit by making equality an inhabitant of the theory rather than a meta-level relation alone. With that machinery in place, the course can ask what kinds of types are small, how propositions fit into universes, and why constructive mathematics needs those distinctions.
# 8. Universes, Propositions, and Constructive Mathematics
This chapter explains why dependent type theory does not put all types inside one unrestricted totality. The same syntax that lets propositions be treated as types also forces us to distinguish computational data, logical assertions, and universes of small types. The guiding questions are: how do we speak about types as objects, when are proofs themselves data, and which classical principles can be accepted without changing the constructive meaning of the theory?
## Universes and Universe Levels
If types classify terms, then it is natural to ask whether there is a type whose terms are all types. The answer cannot be an unrestricted yes: a type of all types would allow a version of [Russell's paradox](/theorems/620) inside the judgmental structure. Universes solve this by letting small types be named as terms of a larger type, while preventing the larger type from being one of its own small objects.
[definition: Universe]
A universe is a type $\mathcal U$ whose terms are codes for types, together with a decoding map
\begin{align*}
\operatorname{El} : \mathcal U \to \operatorname{Type}
\end{align*}
assigning to each code $A : \mathcal U$ the type $\operatorname{El}(A)$ that it denotes.
[/definition]
The code/decoding language separates an internal name for a type from the external type it denotes. In informal type theory this is often suppressed, and one writes $A : \mathcal U$ while also treating $A$ as a type.
[example: Small Types In A Universe]
Let $n_{\mathbb N} : \mathcal U_0$ be the code whose decoding is the natural number type:
\begin{align*}
\operatorname{El}(n_{\mathbb N})=\mathbb N.
\end{align*}
Since $\mathcal U_0$ is closed under function types between previously coded small types, applying that closure rule to the two codes $n_{\mathbb N}$ and $n_{\mathbb N}$ gives a new code, call it
\begin{align*}
n_{\mathbb N\to\mathbb N} : \mathcal U_0.
\end{align*}
The decoding rule for the function-type code says that this code denotes the function type from the decoding of the first input code to the decoding of the second input code:
\begin{align*}
\operatorname{El}(n_{\mathbb N\to\mathbb N})=\operatorname{El}(n_{\mathbb N})\to \operatorname{El}(n_{\mathbb N}).
\end{align*}
Substituting $\operatorname{El}(n_{\mathbb N})=\mathbb N$ in the domain and codomain gives
\begin{align*}
\operatorname{El}(n_{\mathbb N\to\mathbb N})=\mathbb N\to\mathbb N.
\end{align*}
Thus $\mathbb N\to\mathbb N$ is small in exactly the sense that it has a code in $\mathcal U_0$; the type itself is obtained by decoding that code.
[/example]
Once a universe is present, the same question reappears for the universe itself. Rather than placing $\mathcal U_0$ inside itself, the theory introduces a hierarchy.
[definition: Cumulative Universe Hierarchy]
A cumulative universe hierarchy is a sequence of universes
\begin{align*}
\mathcal U_0 : \mathcal U_1 : \mathcal U_2 : \cdots
\end{align*}
such that every type in $\mathcal U_i$ is also available as a type in $\mathcal U_{i+1}$.
[/definition]
Cumulativity lets ordinary mathematics avoid constant bookkeeping: a natural number type used at level $0$ may also be used in a construction living at level $1$. The hierarchy still blocks self-membership, because no universe is assigned the judgment $\mathcal U_i : \mathcal U_i$. The next theorem records why this restriction is structural rather than administrative: removing it reintroduces the same kind of self-reference that made naive set comprehension inconsistent.
[quotetheorem:9639]
[citeproof:9639]
The self-membership hypothesis $\mathcal U : \mathcal U$ is essential: an indexed hierarchy $\mathcal U_0 : \mathcal U_1 : \mathcal U_2 : \cdots$ provides the counterexample to the naive conclusion, because each universe is named only from a strictly higher one. The theorem does not say that quantifying over types is forbidden; it says that quantifying over a totality containing the quantifier's own universe breaks consistency. This is why proof assistants such as Coq and Lean track universe constraints even when they infer most levels automatically. The next issue is that once universes keep types apart by size, the theory must also distinguish data-bearing types from proposition-like types.
[remark: Universe Polymorphism]
A definition such as the identity function can be written once while being valid at every universe level. Universe polymorphism records that the same term has instances
\begin{align*}
\operatorname{id}_A : A \to A
\end{align*}
for any $A : \mathcal U_i$, with $i$ left as an implicit level parameter.
[/remark]
## Propositions, Data Types, and Proof Relevance
The propositions-as-types interpretation says that proving a proposition means constructing a term of the corresponding type. This raises a new question: should different proofs of the same proposition be observable as different data, or should propositions remember only whether they are inhabited? Type theory can support both proof-relevant types and proof-irrelevant propositions, but the distinction affects how logic interacts with computation.
[definition: Proof Relevance]
A type $A$ is treated proof-relevantly when two terms $a,b : A$ may carry distinct computational or mathematical information even if both witness the same informal assertion.
[/definition]
Proof relevance is essential for ordinary data types. Two elements of $\mathbb N$ are not interchangeable, and two witnesses of an existential statement may contain different explicit data.
[example: Proof Relevant Existence]
Under the propositions-as-types reading, a term of $\sum_{n:\mathbb N}(n+n=4)$ is a pair whose first component is a number $n:\mathbb N$ and whose second component is a proof of the equality $n+n=4$. For the witness $n=2$, the required equality is obtained by unfolding addition on natural numbers:
\begin{align*}
2+2=(1+1)+2
\end{align*}
\begin{align*}
(1+1)+2=1+(1+2)
\end{align*}
\begin{align*}
1+2=3
\end{align*}
\begin{align*}
1+(1+2)=1+3=4.
\end{align*}
Thus there is a term $(2,p):\sum_{n:\mathbb N}(n+n=4)$, where $p$ is the displayed proof of $2+2=4$. The first component $2$ is recoverable data, not merely evidence that the type is inhabited, so this dependent sum carries more information than a truth value.
[/example]
For many logical propositions, however, the internal identity of proofs is not intended to matter. This motivates a separate class of propositions.
[definition: Proposition]
A proposition is a type $P$ such that for all $p,q : P$ there is an identification $p=q$.
[/definition]
This definition makes propositions behave like truth values rather than data containers: a proposition may have proofs, but its proof terms have no observable internal variation. That property is exactly what is needed when a proof is meant only to certify that an assertion holds, rather than to supply data that later computation may inspect. The next definition names this collapse of proof identity from the point of view of the proofs themselves: instead of asking whether $P$ is a proposition, it asks whether all proofs of $P$ are interchangeable. This terminology matters because proof assistants and homotopy type theory often discuss the same condition under the label of proof irrelevance or h-propositionality.
[definition: Proof Irrelevance]
A proposition $P$ is proof-irrelevant if for all $p,q : P$ there is an identification $p=q$.
[/definition]
Proof irrelevance lets a proposition behave like a mere truth value while preserving the constructive requirement that a proof must still be provided. It also explains why many systems distinguish a universe of propositions from universes of data types.
[example: Unique Proofs Of Truth]
Let $1$ be the unit type, with sole constructor $\star : 1$. To show that $1$ is proof-irrelevant, take arbitrary terms $p,q : 1$ and construct an identification $p=q$.
By the induction principle for the unit type, it is enough to define the required identification in the single constructor case $p\equiv \star$ and $q\equiv \star$. In that case the required type is
\begin{align*}
\star=\star.
\end{align*}
This type is inhabited by reflexivity:
\begin{align*}
\operatorname{refl}_{\star} : \star=\star.
\end{align*}
Unit induction therefore gives, for every $p,q:1$, a term
\begin{align*}
\operatorname{unit\text{-}proof\text{-}irrelevance}(p,q) : p=q.
\end{align*}
Thus every proof of truth is identified with every other proof of truth; the unit type carries no proof-relevant information beyond being inhabited.
[/example]
The distinction between propositions and data becomes most visible in elimination rules. Eliminating from a data type may compute by inspecting its constructors, whereas eliminating from a proposition into data can reveal proof information that proof irrelevance intended to hide.
[remark: Restricted Elimination From Propositions]
In systems with a separate universe of propositions, elimination from a proposition into another proposition is usually unrestricted. Elimination from a proposition into a data type is often restricted unless the proposition has at most one constructor or is otherwise known to contain no computational information.
[/remark]
## Constructive Negation and Decidability
Classical logic treats every proposition as either true or false, but constructive type theory asks for evidence. The central issue is not whether contradiction is meaningful, but whether failure to refute a proposition supplies a proof of it. Negation, decidability, and excluded middle separate these ideas.
[definition: Constructive Negation]
For a proposition $P$, its negation is the function type
\begin{align*}
\neg P := P \to \varnothing,
\end{align*}
where $\varnothing$ is the empty type.
[/definition]
This definition makes a negation proof into a function from evidence for $P$ to contradiction, so it explains why refutation is stronger than absence of evidence. It also motivates the next theorem: whenever evidence for $P$ is already available, the same evidence should defeat every proposed refutation of $P$.
[quotetheorem:9640]
[citeproof:9640]
The input proof $p : P$ is necessary: without it, the same construction has nothing to feed to a proposed refutation $h : P \to \varnothing$. A sharper warning comes from Kripke semantics for intuitionistic logic: at a node where $P$ is not yet forced but every future extension prevents $\neg P$, the node can force $\neg\neg P$ without forcing $P$. Thus double-negation elimination would collapse a genuine constructive distinction between stable refutability information and positive evidence. The theorem therefore does not justify double-negation elimination; it only embeds direct proofs into their double negations. This limitation leads naturally to decidability, where the theory has enough information to choose either $P$ or $\neg P$.
Double-negation introduction is constructive because it transforms direct evidence into evidence against refutation. The reverse implication is a classical principle and is not derivable in basic Martin-Löf type theory.
[example: Refuting Refutation]
Assume $p:P$. By the definition of constructive negation, $\neg P$ is the function type $P\to\varnothing$, so proving $\neg\neg P$ means constructing a term of
\begin{align*}
(P\to\varnothing)\to\varnothing.
\end{align*}
Let $h:\neg P$. Again using the definition of negation, this means
\begin{align*}
h:P\to\varnothing.
\end{align*}
Since $p:P$, function application gives
\begin{align*}
h(p):\varnothing.
\end{align*}
Therefore the lambda term
\begin{align*}
\lambda h.\,h(p):(P\to\varnothing)\to\varnothing
\end{align*}
is a proof of $\neg\neg P$. Thus direct evidence for $P$ refutes every proposed refutation of $P$; it does not produce a decision procedure for arbitrary propositions.
[/example]
The example shows the constructive content of double negation: it builds a refutation of refutations, not a general decision procedure. This motivates isolating those propositions where the theory can actually choose between a proof of $P$ and a proof of $\neg P$, with the choice recorded as explicit coproduct data.
[definition: Decidable Proposition]
A proposition $P$ is decidable if there is a term of the coproduct type
\begin{align*}
P + \neg P.
\end{align*}
[/definition]
The coproduct records which side has been proven. For data types, the analogous question is whether equality of two elements can be decided by a uniform procedure returning either an identification or a refutation.
[definition: Decidable Equality]
A type $A$ has decidable equality if for all $x,y : A$ there is a term of
\begin{align*}
(x=y) + \neg(x=y).
\end{align*}
[/definition]
Decidable equality is a recurring bridge between data and logic: to compare data, the theory must produce either an identification or a refutation of identification. The natural numbers are the basic test case, since their induction principle should give a decision procedure by structural recursion.
[quotetheorem:7512]
[citeproof:7512]
The inductive structure of $\mathbb N$ is essential here: the constructors $0$ and successor give a terminating comparison procedure. A contrasting example is a function type $\mathbb N \to \mathbb N$, where equality is generally not decidable constructively because deciding equality of functions would require comparing infinitely many values and, extensionally, proving they agree everywhere. The theorem therefore does not say that every type has decidable equality; it identifies a class of data types whose constructors support recursive comparison. This prototype is then reused for finite types and indexed finite sets such as $\operatorname{Fin}(n)$.
This theorem is the prototype for recursive decision procedures. The proof is also a program: given two natural numbers, it computes whether they are equal and returns the corresponding evidence.
[example: Decidable Equality On Fin]
Let $\operatorname{Fin}(n)$ be represented as the dependent sum
\begin{align*}
\operatorname{Fin}(n):=\sum_{k:\mathbb N}(k<n).
\end{align*}
Take two elements $x,y:\operatorname{Fin}(n)$, and write them as
\begin{align*}
x=(k,p)
\end{align*}
and
\begin{align*}
y=(\ell,q),
\end{align*}
where $k,\ell:\mathbb N$, $p:k<n$, and $q:\ell<n$. By *Decidability Of Equality For Natural Numbers*, we have a term of
\begin{align*}
(k=\ell)+\neg(k=\ell).
\end{align*}
If the comparison gives $e:k=\ell$, then transporting $p:k<n$ along $e$ gives a proof
\begin{align*}
e_*(p):\ell<n.
\end{align*}
Both $e_*(p)$ and $q$ inhabit the proposition $\ell<n$, so proof irrelevance gives
\begin{align*}
e_*(p)=q.
\end{align*}
The equality $e:k=\ell$, together with this equality of transported bound proofs, gives
\begin{align*}
(k,p)=(\ell,q).
\end{align*}
Thus this case returns the left coproduct injection.
If the comparison gives $r:\neg(k=\ell)$, then assume for contradiction that
\begin{align*}
a:(k,p)=(\ell,q).
\end{align*}
Applying the first projection to equal pairs gives
\begin{align*}
\operatorname{ap}_{\pi_1}(a):k=\ell.
\end{align*}
Applying $r$ to this equality gives
\begin{align*}
r(\operatorname{ap}_{\pi_1}(a)):\varnothing.
\end{align*}
Therefore
\begin{align*}
\lambda a.\,r(\operatorname{ap}_{\pi_1}(a)):\neg((k,p)=(\ell,q)).
\end{align*}
This case returns the right coproduct injection.
So the natural-number comparison on the first components determines equality in $\operatorname{Fin}(n)$: equal first components lift to equal finite elements because the bound proofs are proof-irrelevant, and unequal first components refute equality by projecting any alleged equality back to $\mathbb N$.
[/example]
## Excluded Middle and Choice Principles
The final question is how much classical mathematics can be added to constructive type theory without erasing its computational meaning. Two principles are especially delicate: excluded middle, which decides every proposition, and choice, which turns pointwise existence into a function choosing witnesses. Their interaction depends on whether existence is proof-relevant data or proof-irrelevant truncation.
[definition: Excluded Middle]
Excluded middle is the principle asserting that for every proposition $P$ there is a term of
\begin{align*}
P + \neg P.
\end{align*}
[/definition]
Excluded middle says every proposition is decidable. It is compatible with consistency in many settings, but it changes the constructive reading of proofs because it supplies decisions without algorithms. To compare it with choice, we first state the version of choice that is built into dependent sums.
[definition: Type-Theoretic Choice]
Let $A$ be a type, let $B : A \to \operatorname{Type}$ be a family of types, and let $R(x,y)$ be a type or proposition for $x:A$ and $y:B(x)$. Type-theoretic choice is the map
\begin{align*}
\left(\prod_{x:A} \sum_{y:B(x)} R(x,y)\right)
\to
\sum_{f:\prod_{x:A} B(x)} \prod_{x:A} R(x,f(x))
\end{align*}
that sends $s : \prod_{x:A} \sum_{y:B(x)} R(x,y)$ to the pair whose first component is $f := \lambda x.\,\pi_1(s(x))$ and whose second component is $\lambda x.\,\pi_2(s(x))$.
[/definition]
For dependent sums, this is not an additional axiom: a term already contains a witness for each input. Projecting those witnesses gives the choosing function. In Lean and Coq this is the mathematical content behind pattern matching on an existential in a proof-relevant universe: the first projection supplies the chosen object, and the second projection supplies the verification that it satisfies the predicate.
[example: Propositions As Types Formulation Of Choice]
Suppose $A$ is a type, $B:A\to\operatorname{Type}$ is a family of types, and $R(x,y)$ is a proposition for $x:A$ and $y:B(x)$. Given
\begin{align*}
s:\prod_{x:A}\sum_{y:B(x)}R(x,y),
\end{align*}
we compute the two components of the chosen output explicitly. For each $x:A$, function application gives
\begin{align*}
s(x):\sum_{y:B(x)}R(x,y).
\end{align*}
By the elimination rule for dependent sums, this term has a first projection
\begin{align*}
\pi_1(s(x)):B(x)
\end{align*}
and a second projection
\begin{align*}
\pi_2(s(x)):R(x,\pi_1(s(x))).
\end{align*}
Define
\begin{align*}
f:=\lambda x.\,\pi_1(s(x)).
\end{align*}
Then for each $x:A$, the displayed type of $\pi_1(s(x))$ shows
\begin{align*}
f(x):B(x),
\end{align*}
so
\begin{align*}
f:\prod_{x:A}B(x).
\end{align*}
Now define
\begin{align*}
\rho:=\lambda x.\,\pi_2(s(x)).
\end{align*}
For each $x:A$, substituting $f(x)=\pi_1(s(x))$ into the type of the second projection gives
\begin{align*}
\rho(x):R(x,f(x)).
\end{align*}
Hence
\begin{align*}
\rho:\prod_{x:A}R(x,f(x)).
\end{align*}
Therefore the pair
\begin{align*}
(f,\rho):\sum_{f:\prod_{x:A}B(x)}\prod_{x:A}R(x,f(x))
\end{align*}
is the value of type-theoretic choice at $s$. The construction is just projection from the proof-relevant dependent-sum data already supplied by $s$; no separate witness-extraction axiom is being used.
[/example]
The example shows why proof-relevant choice is computationally harmless: the witnesses are already inside the input term. The tension appears when existence is weakened to a mere proposition, so that witnesses cannot be projected as data. The next theorem explains why adding a strong witness-extraction principle for mere existence can force classical reasoning.
[quotetheorem:1231]
[citeproof:1231]
The quotient or truncation hypothesis is essential: ordinary type-theoretic choice for dependent sums only projects witnesses already present in the input and does not decide arbitrary propositions. As a counterexample to the stronger conclusion, basic Martin-Lof type theory validates dependent-sum choice but does not validate excluded middle. The theorem also does not say that every weak or computational choice principle is classical; it isolates choice from mere existence, where witnesses have been hidden. This distinction connects constructive type theory with constructive set theory and with the design choice in proof assistants to separate data-carrying existence from proof-irrelevant existence.
This result does not say that all choice is forbidden. It says that choice over proof-relevant dependent sums is computational projection, while choice over mere existence can carry classical strength.
[remark: Constructive Reading Of Choice]
The constructive acceptability of a choice principle depends on what its existential hypothesis contains. If the hypothesis is a dependent sum, witnesses are already present. If the hypothesis is a proof-irrelevant or truncated existence statement, extracting witnesses requires extra power.
[/remark]
Universes and propositions explain how type theory can express mathematics without collapsing everything into one undifferentiated totality. The next chapter then asks what the computational content of this system looks like, and why normalization and canonicity are the key reasons the theory remains consistent.
# 9. Normalization, Canonicity, and Consistency
This chapter explains why the computational rules of type theory are strong enough to support mathematical reasoning without collapsing the system. The guiding questions are: do well-typed programs always compute to a final form, what do closed natural-number terms compute to, and why does this rule out a closed proof of contradiction? The answer passes through normalization, then canonicity, and finally consistency.
The previous chapters developed dependent types, identity types, and the distinction between definitional and propositional equality. Here that distinction becomes metatheoretic: definitional computation is controlled by reduction, while propositional equality is represented by terms inside the theory. Normalization says that reduction cannot run forever; canonicity says that closed data of observable types reduce to canonical constructors; consistency says that the empty type has no closed inhabitant.
## Strong Normalization and Computation
A typed calculus is meant to describe terminating constructions, not arbitrary partial programs. The first problem is therefore operational: when a term is well typed, can reduction continue indefinitely, or must computation eventually stop?
[definition: Reduction Sequence]
A reduction sequence from a term $t$ is a sequence of terms $(t_k)_{k \in \mathbb N}$ such that $t_1 = t$ and $t_k \to t_{k+1}$ for every $k \in \mathbb N$ for which $t_{k+1}$ is defined.
[/definition]
This definition records computation as repeated local simplification. The important distinction is between a finite sequence that reaches a term with no next step and an infinite sequence that never reaches a value.
[definition: Strongly Normalizing Term]
A term $t$ is strongly normalizing if there is no infinite reduction sequence from $t$.
[/definition]
Strong normalization is stronger than the assertion that some evaluation strategy terminates. Once this property has been isolated for individual terms, the central metatheoretic question becomes whether typing itself enforces it for every term produced by the formal rules.
[quotetheorem:9641]
[proofunderconstruction:9641]
The theorem is a termination theorem for typed programs, and the typing hypothesis is doing essential work. It does not say that every untyped lambda term terminates; the untyped term $\Omega = (\lambda x. x x)(\lambda x. x x)$ reduces to itself and is excluded by simple typing. Nor does strong normalization say that every term has the same operational behaviour under every evaluation discipline; it says that no possible chain of reductions can run forever, so termination is independent of reduction choices. This is why the result becomes a model for dependent type theory: if typing rules control all computation, then later canonical-forms arguments can inspect final normal terms instead of worrying that computation might diverge before reaching them.
[example: Normalizing a Simply Typed Application]
Let $\operatorname{id}_{\mathsf{Nat}} : \mathsf{Nat} \to \mathsf{Nat}$ be the abstraction $\lambda x. x$, where $x : \mathsf{Nat}$. We compute the closed application $\operatorname{id}_{\mathsf{Nat}}(\mathsf{succ}(\mathsf{zero})) : \mathsf{Nat}$ by first unfolding the abbreviation:
\begin{align*}
\operatorname{id}_{\mathsf{Nat}}(\mathsf{succ}(\mathsf{zero})) = (\lambda x. x)(\mathsf{succ}(\mathsf{zero})).
\end{align*}
The beta-reduction rule replaces the bound variable $x$ in the body $x$ by the argument $\mathsf{succ}(\mathsf{zero})$, so
\begin{align*}
(\lambda x. x)(\mathsf{succ}(\mathsf{zero})) \to x[\mathsf{succ}(\mathsf{zero})/x].
\end{align*}
Since substituting a term for the single occurrence of the variable $x$ in $x$ gives that term itself,
\begin{align*}
x[\mathsf{succ}(\mathsf{zero})/x] = \mathsf{succ}(\mathsf{zero}).
\end{align*}
Thus the whole reduction sequence is
\begin{align*}
\operatorname{id}_{\mathsf{Nat}}(\mathsf{succ}(\mathsf{zero})) \to \mathsf{succ}(\mathsf{zero}).
\end{align*}
The result is already a constructor form of type $\mathsf{Nat}$, so this is the smallest example of a well-typed closed natural-number program reaching a final numeral after finitely many reduction steps.
[/example]
The proof is usually first given for the simply typed lambda calculus because the type structure is small enough to see the induction. Dependent type theory needs a more delicate normalization argument, since types themselves contain terms and reduction occurs inside types as well as programs.
[remark: Normalization Is a Metatheorem]
Strong normalization is not a term inside the object theory in the basic presentation. It is a theorem about the formal system, proved externally by induction and semantic interpretation. This matters because the theory can be used as a foundation only after its metatheoretic behaviour has been controlled.
[/remark]
## Canonicity for Natural Numbers
Termination alone does not identify what final results look like. The next problem is observability: if a closed term has type $\mathsf{Nat}$, must it compute to an actual numeral, or could it get stuck in a non-numeral normal form?
[definition: Numeral]
A numeral is a closed term of type $\mathsf{Nat}$ of the form
\begin{align*}
\mathsf{zero}, \quad \mathsf{succ}(\mathsf{zero}), \quad \mathsf{succ}(\mathsf{succ}(\mathsf{zero})), \quad \dots .
\end{align*}
[/definition]
Numerals are the canonical closed values of the natural-number type. This raises the precise canonicity question: does every closed term of type $\mathsf{Nat}$ reduce to one of these numerals, so that closed natural-number programs always have observable numerical output?
[quotetheorem:9642]
[citeproof:9642]
Canonicity is the bridge from syntax to computation. A closed proof assistant term of type $\mathsf{Nat}$ is not merely known to denote some number semantically; it computes to a visible numeral by the rules of the type theory. The closedness assumption is indispensable: open terms may be normal because computation is suspended on variables, and canonicity makes no claim that such terms are already numerals. The theorem also depends on the exact computational rules of the pure theory; adding opaque constants, axioms, or new reduction principles can change the normal forms available at $\mathsf{Nat}$. Finally, canonicity is not a completeness theorem for propositional equality: equations such as $n+0=n$ may require an internal proof even though every closed natural-number program still computes to a numeral. This distinction prepares the consistency argument, where the same normalization-plus-canonical-forms pattern is applied to a type with no canonical constructors at all.
[example: Reducing a Closed Natural-Number Term]
Let
\begin{align*}
\mathsf{double}(n) := \mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), n).
\end{align*}
Here the zero case is $\mathsf{zero}$, and the successor step sends the previous result $r$ to $\mathsf{succ}(\mathsf{succ}(r))$. We compute $\mathsf{double}(\mathsf{succ}(\mathsf{succ}(\mathsf{zero})))$ by unfolding the definition:
\begin{align*}
\mathsf{double}(\mathsf{succ}(\mathsf{succ}(\mathsf{zero}))) = \mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{succ}(\mathsf{succ}(\mathsf{zero}))).
\end{align*}
The successor computation rule for primitive recursion gives
\begin{align*}
\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{succ}(\mathsf{succ}(\mathsf{zero}))) \to (\lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)))(\mathsf{succ}(\mathsf{zero}))(\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{succ}(\mathsf{zero}))).
\end{align*}
Beta-reducing the two applications of the step function yields
\begin{align*}
(\lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)))(\mathsf{succ}(\mathsf{zero}))(\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{succ}(\mathsf{zero}))) \to \mathsf{succ}(\mathsf{succ}(\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{succ}(\mathsf{zero})))).
\end{align*}
Apply the successor computation rule once more to the remaining recursive call:
\begin{align*}
\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{succ}(\mathsf{zero})) \to (\lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)))(\mathsf{zero})(\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{zero})).
\end{align*}
Beta-reducing this step function application gives
\begin{align*}
(\lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)))(\mathsf{zero})(\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{zero})) \to \mathsf{succ}(\mathsf{succ}(\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{zero}))).
\end{align*}
The zero computation rule for primitive recursion gives
\begin{align*}
\mathsf{rec}_{\mathsf{Nat}}(\mathsf{zero}, \lambda k.\lambda r.\mathsf{succ}(\mathsf{succ}(r)), \mathsf{zero}) \to \mathsf{zero}.
\end{align*}
Substituting this result into the previous expressions, the closed term reduces to
\begin{align*}
\mathsf{succ}(\mathsf{succ}(\mathsf{succ}(\mathsf{succ}(\mathsf{zero})))).
\end{align*}
Thus primitive recursion computes the double of $\mathsf{succ}(\mathsf{succ}(\mathsf{zero}))$ to the numeral $4$, with each successor input contributing two successor constructors to the final normal form.
[/example]
The statement is special to closed terms. An open term $x : \mathsf{Nat} \vdash x : \mathsf{Nat}$ is already normal but is not a numeral, because its value depends on the surrounding context.
[remark: Closedness in Canonicity]
The empty context is essential in the canonicity theorem. Variables are neutral terms, and neutral terms are valid normal forms in non-empty contexts. Canonicity says that once all inputs have been removed, natural-number computation cannot remain suspended on an unknown variable.
[/remark]
## Definitional and Propositional Computation
Canonicity depends on the judgmental reduction rules of the theory, so we must separate two kinds of equality. The problem is that type theory has both computation performed by the kernel and equalities represented by terms that users may prove.
[definition: Definitional Equality]
Definitional equality is the judgment $\Gamma \vdash t \equiv u : A$ generated by the computational conversion rules of the type theory, together with structural rules such as congruence and conversion.
[/definition]
Definitional equality is checked by the type checker as part of deciding whether terms have the required types. This external judgment handles automatic computation, but the internal language still needs a way to state and prove equalities that are not judgmental conversions.
[definition: Propositional Equality]
Propositional equality between $a,b : A$ is the identity type $\operatorname{Id}_A(a,b)$, whose terms are proofs that $a$ and $b$ are equal in the internal language.
[/definition]
The identity type lets the theory reason about equality internally. Unlike definitional equality, a propositional equality may require an explicit proof term and may not cause automatic computation by the kernel.
[example: Definitional Versus Propositional Computation]
Define addition by recursion on the first argument, with computation rules
\begin{align*}
\mathsf{zero}+n \equiv n
\end{align*}
and
\begin{align*}
\mathsf{succ}(k)+n \equiv \mathsf{succ}(k+n).
\end{align*}
The first rule applies immediately to $0+n$, so $0+n$ and $n$ are definitionally equal. By contrast, if $n:\mathsf{Nat}$ is a variable, the expression $n+0$ has no head constructor to inspect: it is not of the form $\mathsf{zero}+0$ and it is not of the form $\mathsf{succ}(k)+0$, so neither defining equation applies judgmentally.
To prove the equality internally, we construct a term
\begin{align*}
p(n):\operatorname{Id}_{\mathsf{Nat}}(n+0,n)
\end{align*}
by induction on $n$. In the base case, the first computation rule gives
\begin{align*}
\mathsf{zero}+0 \equiv 0,
\end{align*}
so reflexivity gives
\begin{align*}
p(\mathsf{zero}) := \mathsf{refl}_{0} : \operatorname{Id}_{\mathsf{Nat}}(0+0,0).
\end{align*}
For the successor step, assume
\begin{align*}
p(k):\operatorname{Id}_{\mathsf{Nat}}(k+0,k).
\end{align*}
The second computation rule gives
\begin{align*}
\mathsf{succ}(k)+0 \equiv \mathsf{succ}(k+0).
\end{align*}
Applying congruence of $\mathsf{succ}$ to $p(k)$ gives
\begin{align*}
\operatorname{ap}_{\mathsf{succ}}(p(k)):\operatorname{Id}_{\mathsf{Nat}}(\mathsf{succ}(k+0),\mathsf{succ}(k)).
\end{align*}
Using the displayed definitional equality to convert the left endpoint, this is the required successor case
\begin{align*}
p(\mathsf{succ}(k)):\operatorname{Id}_{\mathsf{Nat}}(\mathsf{succ}(k)+0,\mathsf{succ}(k)).
\end{align*}
Thus $0+n\equiv n$ is automatic definitional computation, while $n+0=n$ for variable $n$ is witnessed by an explicit induction proof term.
[/example]
This example explains why canonicity is not the assertion that all true equations compute judgmentally. It says that closed natural-number programs compute to numerals; many useful equations between open terms still live as propositional equalities.
[remark: Intensionality]
In intensional type theory, definitional equality is deliberately restrictive. This keeps type checking computationally manageable and leaves richer equality principles to be expressed as types. The price is that some equations familiar from ordinary mathematics require explicit proof terms.
[/remark]
## Consistency from Normalization
The final problem is logical rather than computational. Under Curry--Howard, a closed term of the empty type would be a proof of falsehood, so consistency means that no such term exists.
[definition: Empty Type]
The empty type $\mathsf{Empty}$ is a type with no introduction rules.
[/definition]
Because there is no constructor for $\mathsf{Empty}$, any closed normal inhabitant would have to arise from a neutral term. In the empty context there are no variables, so normalization turns this informal obstruction into a proof.
[quotetheorem:9643]
[citeproof:9643]
This theorem is the proof-theoretic payoff of normalization, but each metatheoretic hypothesis is doing work. Normalization ensures that any alleged proof of falsehood can be reduced to a normal form, preservation ensures that the reduced term still has type $\mathsf{Empty}$, and the canonical-forms lemma rules out closed normal inhabitants of a type with no constructors. The theorem does not show that every extension of the theory is consistent, nor does it settle semantic questions such as whether every intended model validates a proposed axiom. Adding general recursion can destroy normalization, while adding an arbitrary closed axiom can introduce a neutral closed term whose type may lead to contradiction. Thus the result is a consistency theorem for the specified core calculus, not a blanket guarantee for every proof-assistant feature built on top of it.
[example: A Failed Construction of an Empty Inhabitant]
Suppose we try to construct a closed term $\vdash t : \mathsf{Empty}$. Since $\mathsf{Empty}$ has no introduction rules, $t$ cannot be a constructor for $\mathsf{Empty}$. If $t$ were an eliminator expression, then it would have to eliminate some scrutinee already available in the empty context; if that scrutinee reduced to a constructor, the eliminator would take a computation step, and if it were neutral, its head would have to come from a variable or closed constant.
By *Normalization*, any such closed term would reduce in finitely many steps to a normal form $u$. By preservation of typing along reduction, the final term would still satisfy $\vdash u : \mathsf{Empty}$. Now inspect the possible shapes of a closed normal term $u : \mathsf{Empty}$. It cannot be an introduction form, because $\mathsf{Empty}$ has no constructors. It cannot be neutral, because in the empty context there is no variable to serve as the head of a neutral term, and in the pure theory there are no extra closed constants or axioms. Hence there is no possible closed normal form $u : \mathsf{Empty}$, so the attempted construction of $t$ cannot succeed.
[/example]
The same reasoning also explains why adding new axioms is delicate. An axiom is a new closed constant, and if its type can imply $\mathsf{Empty}$, then consistency may be lost even though the original computational rules were well behaved.
[remark: Consistency Is Relative to the Rules]
Consistency is a theorem about a specified type theory. Extending the theory with quotient rules, equality reflection, general recursion, or unrestricted axioms changes the normalization and canonicity arguments. Each extension must be checked against the metatheory it is meant to preserve.
[/remark]
## How the Three Results Fit Together
The chapter has moved from operational termination to observable data and then to logical soundness. Strong normalization says that well-typed computation terminates. Canonicity says that closed natural-number computations terminate specifically at numerals. Consistency says that there is no closed computation whose type is falsehood.
[explanation: Normalization to Canonicity to Consistency]
Normalization supplies the existence of normal forms. Canonical-forms lemmas identify which normal forms are possible at each type. For $\mathsf{Nat}$ this yields numerals, while for $\mathsf{Empty}$ it yields impossibility.
The role of preservation is to keep types stable along reduction. Without preservation, normalizing a term could change the type being analyzed. Without canonical forms, normalization would only give a final syntax tree, not a mathematical interpretation of what that tree represents.
[/explanation]
The practical message for proof assistants is that computation is part of proof checking. Definitional equality performs trusted reduction, propositional equality records internal mathematical arguments, and normalization ensures that the reduction layer behaves as a terminating computation for the core theory covered here.
Normalization and canonicity show that well-typed terms do not merely exist abstractly but compute in a controlled way. The final conceptual step is to see how proof assistants exploit that fact, using a small dependent kernel to check large formal developments reliably.
# 10. Type Theory in Proof Assistants
This chapter turns from type theory as a mathematical calculus to type theory as the trusted core of proof assistants. Lean, Agda, and Coq differ in surface language, automation, and library design, but each ultimately checks a small typed term in a dependent type theory. The central question is how a convenient interactive environment can remain faithful to a compact kernel, so that automation and notation help the user without enlarging the trusted mathematics.
This is also where the Curry-Howard correspondence from Chapter 2 becomes engineering practice. Propositions are represented as types, proofs as programs, and theorem checking as type checking, so proof-assistant architecture sits at the intersection of logic and programming-language semantics. Questions about normalization, reduction, and elaboration are therefore not implementation details alone; they are the operational side of the logical guarantees developed in the preceding chapters.
## Dependent Type Theories as Kernels
A proof assistant must decide what counts as a valid proof. The user writes declarations, tactic scripts, recursive definitions, pattern matches, and heavily abbreviated expressions, but the system's trusted component should check only a small set of judgments. This distinction matters because a tactic may contain a bug, a parser may misread notation, or an elaborator may insert the wrong implicit argument; none of these front-end failures should by itself turn an unfinished argument into an accepted theorem. This section isolates the role of the kernel: it implements the primitive typing, conversion, and reduction rules of the underlying dependent type theory.
[definition: Kernel]
A kernel of a proof assistant is the trusted component that checks declarations by verifying typing judgments of the form $\Gamma \vdash t : A$, definitional equality judgments of the form $\Gamma \vdash t \equiv u : A$, and accepted formation rules for universes, dependent function types, inductive types, and eliminators.
[/definition]
The kernel is not the whole proof assistant; it is the final checker at the boundary between user-facing syntax and formal derivation. To understand the security claim made by a prover, we next need a name for the components whose correctness is assumed rather than rechecked inside the system.
[definition: Trusted Computing Base]
The trusted computing base of a proof assistant is the collection of components whose correctness is required for the system's accepted theorems to be valid in the intended metatheory.
[/definition]
The goal of modern proof assistants is to keep the trusted computing base small. This raises the main soundness question for the architecture: if all accepted declarations pass through a sound kernel, what exactly has been established about the theorems in the library?
[remark: Kernel Soundness Principle]
Assume a proof-assistant kernel $K$ for a dependent type theory $\mathcal T$ is sound in the following sense: whenever $K$ accepts a judgment in a well formed kernel environment $E$, that judgment is derivable in $\mathcal T$ extended by the declarations of $E$. If a finite sequence of environments
\begin{align*}
E_0 \subset E_1 \subset \cdots \subset E_n
\end{align*}
is built only by conservative, kernel-checked declarations, and no theorem declaration is accepted as an unchecked axiom, then a theorem declaration accepted over $E_i$ with checked proof term $p:P$ establishes a derivation of $P$ in $\mathcal T+E_i$.
[/remark]
The hypotheses matter. If the environment could be extended by an unchecked axiom labelled as a theorem, then a later kernel check might faithfully use that constant while the library as a whole no longer represented a derivation from the intended primitive rules. Likewise, if the kernel's conversion or typing algorithm were unsound, a closed term could be accepted at a proposition it does not inhabit. The principle also does not say that the intended foundations are philosophically correct, that the implementation has no bugs outside the kernel, or that the checked proof term is readable; it says only that accepted theorem declarations reduce to derivations in the specified core theory and its conservative environment. This principle explains why proof assistants can safely expose rich interfaces: trust is concentrated at the boundary where surface syntax becomes a kernel term. The next comparison asks what the core theories of Lean, Agda, and Coq share before the chapter turns to the front-end machinery built above them.
[example: Three Kernel Styles]
Lean, Coq, and Agda all use dependent types as kernel-level objects, but they choose different primitive rules around universes, propositions, equality, and computation. Lean is based on a calculus of inductive constructions: it has a proposition universe $\operatorname{Prop}$ with proof irrelevance, so two proofs of the same proposition in $\operatorname{Prop}$ are identified for computational purposes, and it has a cumulative hierarchy $\operatorname{Type}_0,\operatorname{Type}_1,\operatorname{Type}_2,\ldots$ for computational data. Coq is also based on a calculus of inductive constructions, with a persistent separation between computational types and propositions; this separation governs which proof objects can be erased during extraction and which data remain computationally relevant. Agda is closer in style to intensional Martin-Löf type theory: dependent pattern matching is a central programming form, and equality is treated intensionally, so terms are identified by the kernel only when its definitional equality rules justify the identification.
The shared Curry-Howard core can be seen in the identity proof. If $A$ is a proposition or type in the system, then a proof of $A \to A$ is the term
\begin{align*}
\lambda x \mapsto x : A \to A.
\end{align*}
To check this term, the kernel assumes $x:A$ in the context and verifies that the body has type $A$:
\begin{align*}
x:A \vdash x:A.
\end{align*}
By the dependent function introduction rule, this yields
\begin{align*}
\vdash \lambda x \mapsto x : A \to A.
\end{align*}
Thus the same basic proof object exists in all three systems, while the surrounding kernel rules determine whether proofs are erased, which universe $A$ inhabits, and which expressions count as definitionally equal during checking.
[/example]
The example shows that the phrase "dependent type theory" names a family rather than a single formal system. A comparison of proof assistants therefore needs a kernel-level equality judgment before it can explain why two pieces of syntax may count as the same term during checking.
[definition: Definitional Equality]
For each context $\Gamma$ and type $A$, definitional equality is the judgmental equality relation on terms of type $A$ in context $\Gamma$, written $\Gamma \vdash t \equiv u : A$, generated by the computation rules, conversion rules, and reduction rules built into the kernel.
[/definition]
Definitional equality is the equality the checker uses without asking the user for a proof. This makes it central to interactive proving: if two expressions reduce to the same normal form, the kernel may treat them as interchangeable during type checking.
[remark: Propositional Equality Versus Definitional Equality]
Definitional equality is a judgment of the type theory, while propositional equality is an internal type whose inhabitants are proofs. A term of type $x = y$ can be transported across, pattern matched on, and passed as data. A definitional equality has no proof term in the context; it is part of the checker itself.
[/remark]
## Tactics, Terms, and Elaboration
Users rarely write fully explicit kernel terms. A natural proof script may omit type arguments, universe levels, coercions, instance arguments, and intermediate terms. The practical problem is therefore to explain how tactic proofs and concise term proofs can be translated into explicit objects without changing the trusted theory.
[definition: Term Proof]
A term proof is a proof written as an expression whose type is the proposition being proved.
[/definition]
A term proof is close to the Curry-Howard reading of the kernel, because the proof object is visible as an expression. However, many interactive proofs are easier to construct by manipulating goals, so we need a second surface style that still produces a checked term.
[definition: Tactic Proof]
A tactic proof is a proof written as a sequence of commands that transform proof states and eventually construct a term whose type is the target proposition.
[/definition]
Tactics are a user interface for constructing terms. The distinction between term style and tactic style is best seen by proving the same theorem in both forms and observing that both routes end at the same kind of kernel object.
[example: Associativity of Addition in Two Styles]
For the proof assistant's natural-number type $\operatorname{Nat}$, addition is a curried operation $+:\operatorname{Nat}\to\operatorname{Nat}\to\operatorname{Nat}$ defined by recursion on the first argument:
\begin{align*}0+n \equiv n.\end{align*}
\begin{align*}\operatorname{succ}(r)+n \equiv \operatorname{succ}(r+n).\end{align*}
Associativity of addition asks for a proof of $(m+n)+k=m+(n+k)$ for all $m,n,k:\operatorname{Nat}$.
Fix $n,k:\operatorname{Nat}$ and prove the statement by induction on $m$. In the zero case, the left side reduces by the first computation rule for addition:
\begin{align*}(0+n)+k \equiv n+k.\end{align*}
The right side reduces by the same rule:
\begin{align*}0+(n+k)\equiv n+k.\end{align*}
So the zero case is closed by reflexivity of equality at $n+k$.
For the successor case, assume the induction hypothesis
\begin{align*}h:(m+n)+k=m+(n+k).\end{align*}
The left side of the successor goal reduces twice by the second computation rule for addition:
\begin{align*}(\operatorname{succ}(m)+n)+k \equiv \operatorname{succ}(m+n)+k.\end{align*}
\begin{align*}\operatorname{succ}(m+n)+k \equiv \operatorname{succ}((m+n)+k).\end{align*}
The right side reduces once:
\begin{align*}\operatorname{succ}(m)+(n+k)\equiv \operatorname{succ}(m+(n+k)).\end{align*}
Applying congruence of $\operatorname{succ}$ to $h$ gives
\begin{align*}\operatorname{succ}((m+n)+k)=\operatorname{succ}(m+(n+k)).\end{align*}
This proves the successor case.
A term-oriented proof writes exactly this induction as a term built from the eliminator for $\operatorname{Nat}$, reflexivity, and congruence. A tactic-oriented proof performs the same construction by opening induction on $m$, reducing the two addition rules in each branch, and rewriting the successor branch with the induction hypothesis. The surface scripts differ, but the kernel checks the same kind of proof object.
[/example]
This example also shows why tactic scripts should not be confused with foundations. The same kernel proof can be reached through several surface routes, and a system therefore needs a general translation stage from user syntax to explicit core terms.
[definition: Elaboration]
Elaboration is a partial transformation from user-level declarations and expressions in the surface language, including omissions, notation, coercions, tactics, and implicit information, to explicit core declarations and terms in the proof assistant's kernel language.
[/definition]
Elaboration explains how high-level scripts become kernel terms, but the most common omitted pieces deserve a sharper name. In dependent type theory, many missing parameters are types, universe levels, instances, or indices recovered from surrounding information; such an omitted parameter is called an implicit argument. Implicit arguments make dependent type theory usable at scale, since explicit terms would otherwise carry many repeated type and universe parameters.
They also raise a foundational worry: if the elaborator fills gaps, searches for instances, and runs tactics, we must know that this front end has not added a new proof rule. The answer is that elaboration remains conservative when all of its products are checked by the kernel.
[quotetheorem:9644]
[citeproof:9644]
The kernel-checking hypothesis is essential. If an elaborator were allowed to record a theorem without producing a core term, then a faulty tactic could turn an unsolved goal into an accepted declaration and thereby add a new axiom by accident. If a coercion or type-class search procedure guessed an inconsistent hidden proof and the kernel did not recheck it, the error would become part of the trusted mathematics rather than part of an untrusted search procedure. The theorem is conservative in a precise architectural sense: it does not say elaboration is predictable, deterministic, complete, or easy to debug, and it does not prevent proof scripts from becoming fragile when notation or automation changes. It says that successful elaboration contributes no new primitive proof rule unless the kernel itself is extended. This prepares the next topic, because inductive definitions, pattern matching, and recursion checkers are more sophisticated front-end mechanisms whose outputs must again be justified by kernel-level terms and reduction rules.
[remark: Automation and Proof Terms]
Heavy automation can make the generated proof term much larger than the script that produced it. This affects performance, transparency, and maintainability, but not kernel soundness provided the generated term is checked. Many library design choices are therefore tradeoffs between concise scripts, stable proof terms, and fast checking.
[/remark]
## Inductive Types and Computation
Dependent type theory becomes a practical foundation because it supports user-defined data and propositions. The problem is to admit inductive definitions and recursive programs while preserving consistency and normalization properties expected of the theory. Proof assistants solve this by restricting how inductive types, pattern matching, and recursion may be formed.
[definition: Inductive Type]
An inductive type is a type specified by constructors, together with eliminators or pattern-matching principles that define functions out of the type by giving cases for its constructors.
[/definition]
The natural numbers are the guiding example. In proof-assistant libraries this zero-based inductive type is usually written $\operatorname{Nat}$ or `nat`, rather than using set-theoretic notation whose convention elsewhere in these notes starts at $1$. A value of $\operatorname{Nat}$ is generated from $0$ and successor, and functions out of $\operatorname{Nat}$ are defined by giving a value at $0$ and a rule for successors.
[example: Natural Numbers as an Inductive Type]
The inductive type $\operatorname{Nat}$ has constructors $0:\operatorname{Nat}$ and $\operatorname{succ}:\operatorname{Nat}\to\operatorname{Nat}$. Addition is a function
\begin{align*}+:\operatorname{Nat}\to\operatorname{Nat}\to\operatorname{Nat},\end{align*}
defined by recursion on its first argument: for every $n:\operatorname{Nat}$,
\begin{align*}0+n \equiv n,\end{align*}
and for every $m,n:\operatorname{Nat}$,
\begin{align*}\operatorname{succ}(m)+n \equiv \operatorname{succ}(m+n).\end{align*}
For instance, the computation of $\operatorname{succ}(\operatorname{succ}(0))+n$ unfolds one constructor at a time:
\begin{align*}\operatorname{succ}(\operatorname{succ}(0))+n \equiv \operatorname{succ}(\operatorname{succ}(0)+n).\end{align*}
The inner addition again reduces by the successor rule:
\begin{align*}\operatorname{succ}(0)+n \equiv \operatorname{succ}(0+n).\end{align*}
The zero rule gives
\begin{align*}0+n \equiv n,\end{align*}
so substituting these reductions yields
\begin{align*}\operatorname{succ}(\operatorname{succ}(0))+n \equiv \operatorname{succ}(\operatorname{succ}(n)).\end{align*}
Thus addition computes by inspecting the first input, and induction on that same input matches the recursive clauses used in the associativity proof.
[/example]
The natural-number example records only unindexed finite data, so properties such as list length would normally be stated after a definition. Dependent type theory can instead put such an invariant into the type; the standard example is the length-indexed vector $\operatorname{Vec}(A,n)$, an inductive family with constructors $\operatorname{nil}:\operatorname{Vec}(A,0)$ and $\operatorname{cons}:A\to\operatorname{Vec}(A,n)\to\operatorname{Vec}(A,\operatorname{succ}(n))$.
The length appears in the type, so functions over vectors can express size guarantees in their result type. This shifts some run-time checks into type checking.
[example: Map Preserves Vector Length]
Let $f:A\to B$. The length-indexed vector map is defined by recursion on the input vector, with target type depending on the same length index:
\begin{align*}\operatorname{map}_f:\operatorname{Vec}(A,n)\to \operatorname{Vec}(B,n).\end{align*}
For the empty vector, the input has type $\operatorname{Vec}(A,0)$, so the output must have type $\operatorname{Vec}(B,0)$. The constructor $\operatorname{nil}$ has exactly this type, hence
\begin{align*}\operatorname{map}_f(\operatorname{nil}) \equiv \operatorname{nil}:\operatorname{Vec}(B,0).\end{align*}
For the cons case, suppose $a:A$ and $w:\operatorname{Vec}(A,n)$. Then
\begin{align*}\operatorname{cons}(a,w):\operatorname{Vec}(A,\operatorname{succ}(n)).\end{align*}
By the recursive call at the smaller vector $w$, we have
\begin{align*}\operatorname{map}_f(w):\operatorname{Vec}(B,n).\end{align*}
Since $f(a):B$, the vector constructor for $B$ gives
\begin{align*}\operatorname{cons}(f(a),\operatorname{map}_f(w)):\operatorname{Vec}(B,\operatorname{succ}(n)).\end{align*}
Thus the cons clause is
\begin{align*}\operatorname{map}_f(\operatorname{cons}(a,w)) \equiv \operatorname{cons}(f(a),\operatorname{map}_f(w)):\operatorname{Vec}(B,\operatorname{succ}(n)).\end{align*}
The input and output indices are $0$ in the empty case and $\operatorname{succ}(n)$ in the cons case, so the recursive definition itself enforces length preservation through its codomain $\operatorname{Vec}(B,n)$.
[/example]
The vector example illustrates the constructive force of indices. A statement such as "map preserves length" becomes the type of a program, but writing such programs conveniently requires branch syntax that reveals which constructor was used and updates the surrounding type information.
[definition: Pattern Matching]
Pattern matching is a surface syntax for defining functions by cases on the constructors of an inductive type or inductive family.
[/definition]
For non-dependent types, pattern matching resembles case analysis in functional programming. For indexed families, the type of the goal may change from branch to branch because the pattern reveals information about indices.
[remark: Dependent Pattern Matching]
In a vector pattern match, the empty branch forces the length index to be $0$, while the cons branch forces it to be a successor. The elaborator translates this branch-sensitive reasoning into eliminators, equality transports, or primitive pattern-matching terms accepted by the kernel. Different systems expose different amounts of this translation to the user.
[/remark]
## Recursion Checking and Universes
A proof assistant must reject definitions that would allow non-terminating proofs of arbitrary propositions. The challenge is that ordinary programming wants recursive functions, while logical consistency requires a disciplined termination criterion. Structural recursion and guarded corecursion are the main syntactic disciplines used to keep definitions computationally meaningful.
[definition: Structural Recursion]
A recursive definition is structurally recursive when every recursive call is made on an immediate structural subobject of the argument being eliminated, or on a value accepted by the system's structural termination checker as smaller.
[/definition]
Structural recursion is restrictive, but it is predictable and closely aligned with induction. It handles functions that consume finite inductive data; to discuss infinite output such as streams, the dual problem is not termination of the whole object but productivity of each observable layer.
[definition: Guarded Corecursion]
A corecursive definition is guarded when every corecursive call appears underneath a constructor that produces one observable layer of a coinductive object.
[/definition]
Guardedness is the dual discipline for infinite objects such as streams. It permits productive definitions: each finite observation of the output can be computed in finite time. The common role of guardedness and structural recursion is to control when recursive unfolding is allowed inside definitional equality.
[quotetheorem:9645]
[citeproof:9645]
These criteria are intentionally conservative. Without a decrease or productivity requirement, a definition such as $f(n)=f(n)$ could be unfolded forever inside conversion, so type checking would no longer have the normalization behaviour that conversion relies on. That non-normalization problem is distinct from the consistency risk: if the system also allowed an unchecked circular proof constant, for instance a declared proof $p:P$ whose definition expands back to $p$ without a legitimate constructor or eliminator justifying $P$, then the library could contain a theorem accepted by self-reference rather than by a derivation in the kernel theory. The theorem does not guarantee that every terminating program is accepted, nor does it claim that all assistants use the same syntactic test or expose the same coinductive computation rules. Many terminating functions are rejected unless rewritten with an explicit measure, an accessibility proof, or a library recursion principle. The examples below show the practical boundary between immediately structural recursion and recursion that needs extra justification, before the discussion turns to universe levels as a separate guard against self-reference in types.
[example: Accepted and Rejected Recursion]
For factorial, define $\operatorname{fact}:\operatorname{Nat}\to\operatorname{Nat}$ by
\begin{align*}\operatorname{fact}(0)\equiv 1.\end{align*}
\begin{align*}\operatorname{fact}(\operatorname{succ}(n))\equiv \operatorname{succ}(n)\cdot \operatorname{fact}(n).\end{align*}
In the successor clause, the argument being eliminated is $\operatorname{succ}(n)$, and the recursive call is made at its immediate predecessor $n$. Thus the call is on a structural subobject of the input, so the definition satisfies structural recursion.
By contrast, a proposed definition
\begin{align*}f(n)\equiv f(n)\end{align*}
makes its recursive call at exactly the same argument $n$. The input has not changed from $n$ to a predecessor or to any certified smaller value, so unfolding the definition gives
\begin{align*}f(n)\equiv f(n)\equiv f(n),\end{align*}
with no constructor case that decreases the argument. A structural termination checker therefore rejects it.
A definition that calls itself at $\lfloor n/2\rfloor$ is different: the recursive argument is usually smaller, but that fact is arithmetic rather than immediate from the constructor shape of $n$. For $n>0$, one must justify
\begin{align*}\lfloor n/2\rfloor < n.\end{align*}
Indeed, since $n>0$, we have $1\leq n$, hence $n<2n$. Dividing by $2$ gives
\begin{align*}n/2<n,\end{align*}
and since $\lfloor n/2\rfloor\leq n/2$, transitivity gives
\begin{align*}\lfloor n/2\rfloor<n.\end{align*}
So this recursion can be accepted by systems that allow a supplied decrease proof, while it is not accepted by the simplest immediate-subterm test alone.
[/example]
The recursion example controls circularity at the level of computation. A separate circularity problem arises at the level of types themselves: a universe containing itself would recreate paradoxes of self-reference, so proof assistants organize types into levels.
[definition: Universe Hierarchy]
A universe hierarchy is a sequence of universes $\operatorname{Type}_0$, $\operatorname{Type}_1$, $\operatorname{Type}_2$, and so on, arranged so that small types live in lower universes and each universe is itself classified by a higher universe.
[/definition]
Universe constraints are often inferred, but they are not cosmetic. They are part of the formal term that the kernel checks.
[example: Polymorphic Identity and Universe Constraints]
Let the polymorphic identity be the term
\begin{align*}\operatorname{id}_u := \lambda (\alpha:\operatorname{Type}_u)\mapsto \lambda (x:\alpha)\mapsto x.\end{align*}
If $\alpha:\operatorname{Type}_u$ and $x:\alpha$, then the body $x$ has type $\alpha$ in the extended context, so function introduction gives
\begin{align*}\operatorname{id}_u : \{\alpha:\operatorname{Type}_u\}\to \alpha\to \alpha.\end{align*}
For natural numbers, $\operatorname{Nat}:\operatorname{Type}_0$. Instantiating the implicit argument $\alpha$ with $\operatorname{Nat}$ therefore fixes $u=0$, and the specialized term has type
\begin{align*}\operatorname{id}_0\{\operatorname{Nat}\}:\operatorname{Nat}\to\operatorname{Nat}.\end{align*}
For any $n:\operatorname{Nat}$, applying the function unfolds the lambda term:
\begin{align*}\operatorname{id}_0\{\operatorname{Nat}\}(n)\equiv (\lambda (x:\operatorname{Nat})\mapsto x)(n).\end{align*}
Beta reduction gives
\begin{align*}(\lambda (x:\operatorname{Nat})\mapsto x)(n)\equiv n.\end{align*}
For a bundled type of groups whose carrier lies in $\operatorname{Type}_u$, the whole bundled type typically lies in $\operatorname{Type}_{u+1}$, because one field quantifies over a carrier type $G:\operatorname{Type}_u$. If $\operatorname{Group}_u:\operatorname{Type}_{u+1}$ and $H:\operatorname{Group}_u$, then applying the same identity term to groups instantiates its universe parameter with $u+1$:
\begin{align*}\operatorname{id}_{u+1}\{\operatorname{Group}_u\}:\operatorname{Group}_u\to\operatorname{Group}_u.\end{align*}
Thus
\begin{align*}\operatorname{id}_{u+1}\{\operatorname{Group}_u\}(H)\equiv H.\end{align*}
The surface expression is the same identity function, but the kernel checks different universe constraints in the two applications: $u=0$ for $\operatorname{Nat}$ and $u+1$ for the type of bundled groups.
[/example]
The chapter's main lesson is architectural. Proof assistants are powerful because elaboration, tactics, implicit arguments, recursion checkers, and pattern-matching compilers give users a high-level language; they are trustworthy because accepted declarations are reduced to explicit objects checked by a small dependent type-theoretic kernel. This same architecture supports certified programming and extraction, where computational content is obtained from checked definitions, while choices about proof irrelevance and richer equality theories such as homotopy type theory show that kernel design also shapes what kinds of mathematical identity and computation the assistant can express.
Proof assistants make the computational core of type theory operational as a foundation for formalized mathematics. The remaining question is how those syntax-level rules can also be understood semantically, and categorical semantics provides that interpretation in terms of objects, morphisms, and reindexing.
# 11. Categorical Semantics of Type Theory
Categorical semantics provides a second way to read the formal rules developed earlier in the course. Instead of treating contexts, substitutions, and terms as pieces of syntax, we interpret them as objects, morphisms, and sections in a category. The guiding question is: what categorical structure is exactly needed so that the typing rules remain sound under interpretation? This chapter answers that question first for simple types and then for dependent types, where substitution becomes pullback and type formers become adjoints to reindexing.
## Contexts, Substitutions, and Types as Fibrations
The first problem is to identify the categorical shadow of a typing judgment. A context records the variables available to a term, while a substitution replaces those variables by terms in another context. This is already composition-like: substituting from $\theta$ into $\rho$, then into $\tau$, should agree with the composite substitution.
[definition: Category of Contexts]
A category of contexts is a category $\baseC$ whose objects are interpreted contexts and whose morphisms $\theta:\rho\to\tau$ are interpreted substitutions from $\rho$ to $\tau$.
[/definition]
The direction convention is chosen so that a morphism $\theta:\rho\to\tau$ sends data in the source context $\rho$ to data expected by the target context $\tau$. Identity morphisms interpret identity substitutions, and categorical composition interprets successive substitution.
[example: Contexts in Set]
Let a context $\tau=(x_1:A_1,\dots,x_n:A_n)$ be interpreted in $\textbf{Set}$ by the Cartesian product
\begin{align*}
\tau^\flat=A_1\times\cdots\times A_n.
\end{align*}
If $\rho$ is another context, a substitution $\theta:\rho\to\tau$ assigns to each variable $x_i:A_i$ in $\tau$ a function
\begin{align*}
\theta_i:\rho^\flat\to A_i.
\end{align*}
These component functions determine a single function $\theta^\flat:\rho^\flat\to\tau^\flat$ by
\begin{align*}
\theta^\flat(r)=(\theta_1(r),\dots,\theta_n(r)).
\end{align*}
Conversely, any function $u:\rho^\flat\to A_1\times\cdots\times A_n$ has component functions $\pi_i\circ u:\rho^\flat\to A_i$, where $\pi_i$ is the $i$th product projection. Thus tuples of substitution components are exactly functions $\rho^\flat\to\tau^\flat$.
Now let $\phi:\sigma\to\rho$ have components $\phi_j:\sigma^\flat\to R_j$, so that $\phi^\flat:\sigma^\flat\to\rho^\flat$. The composite substitution $\theta\circ\phi:\sigma\to\tau$ has $i$th component
\begin{align*}
(\theta\circ\phi)_i=\theta_i\circ\phi^\flat:\sigma^\flat\to A_i.
\end{align*}
Therefore its associated function sends $s\in\sigma^\flat$ to
\begin{align*}
(\theta\circ\phi)^\flat(s)=((\theta_1\circ\phi^\flat)(s),\dots,(\theta_n\circ\phi^\flat)(s)).
\end{align*}
The ordinary composite $\theta^\flat\circ\phi^\flat$ sends the same $s$ to
\begin{align*}
(\theta^\flat\circ\phi^\flat)(s)=\theta^\flat(\phi^\flat(s))=(\theta_1(\phi^\flat(s)),\dots,\theta_n(\phi^\flat(s))).
\end{align*}
Since $(\theta_i\circ\phi^\flat)(s)=\theta_i(\phi^\flat(s))$ for each $i$, the two functions are equal. So, in this model, composing substitutions is precisely ordinary composition of functions between the interpreted context products.
[/example]
This example captures simple contexts, but dependent types require the available type over a context to vary with the value of the context. To represent such variation, the next definition records not only a total object of elements but also the base context over which those elements are indexed.
[definition: Type over a Context]
Let $\baseC$ be a category of contexts. A type over an object $\tau\in\baseC$ is an object $p:A\to\tau$ of the slice category $\baseC/\tau$.
[/definition]
A type projection describes which dependent elements are available over each point of the context, but a term must choose one such element coherently. The next definition isolates this choice by requiring a map back into the total space whose projection is the original context point.
[definition: Term as Section]
Let $p:A\to\tau$ be a type over $\tau$. A term of type $A$ in context $\tau$ is a morphism $a:\tau\to A$ in $\baseC$ such that $p\circ a=\operatorname{id}_\tau$.
[/definition]
The section condition says that the term really lies over the context in which it is typed. In set-theoretic language, if $A\to\tau$ is a family $(A_t)_{t\in\tau}$, then a section chooses an element $a(t)\in A_t$ for each $t$.
[example: Pullback of a Family]
Let $f:X\to Y$ be a function and let $p:E\to Y$ be a family of sets over $Y$. Write the fiber of $p$ over $y\in Y$ as
\begin{align*}
E_y=\{e\in E:p(e)=y\}.
\end{align*}
The pullback family $f^*E\to X$ has underlying set
\begin{align*}
f^*E=\{(x,e)\in X\times E:p(e)=f(x)\}
\end{align*}
and projection $q:f^*E\to X$ given by
\begin{align*}
q(x,e)=x.
\end{align*}
For a fixed $x\in X$, the fiber of $q$ over $x$ is
\begin{align*}
q^{-1}(x)=\{(x',e)\in f^*E:q(x',e)=x\}.
\end{align*}
Using the definition of $q$, the condition $q(x',e)=x$ is exactly $x'=x$, so
\begin{align*}
q^{-1}(x)=\{(x,e)\in X\times E:p(e)=f(x)\}.
\end{align*}
The map
\begin{align*}
q^{-1}(x)\to E_{f(x)},\qquad (x,e)\mapsto e
\end{align*}
is well-defined because membership in $q^{-1}(x)$ includes the equality $p(e)=f(x)$. Its inverse is
\begin{align*}
E_{f(x)}\to q^{-1}(x),\qquad e\mapsto (x,e),
\end{align*}
which is well-defined because $e\in E_{f(x)}$ means $p(e)=f(x)$. Therefore the fiber of the pulled-back family over $x$ is canonically $E_{f(x)}$. Pulling back along $f$ replaces the original parameter $y\in Y$ by $f(x)$, which is exactly the semantic effect of substituting $f(x)$ for the variable of type $Y$.
[/example]
The pullback example shows how a single substitution acts on one family. To model an entire type theory, these pullbacks must be chosen coherently for every context morphism, so the next definition packages all fibers of types and all substitution operations together. Without such chosen coherence, two syntactically identical iterated substitutions could be interpreted by merely isomorphic but not literally equal pullbacks, and the substitution lemma would no longer be a strict statement. For instance, pulling a family over $Z$ first along $g:Y\to Z$ and then along $f:X\to Y$ gives a pullback over $X$, while pulling it once along $g\circ f$ gives another pullback over $X$; these agree canonically, but a split structure chooses them so that the interpretation respects syntactic associativity on the nose.
[definition: Split Fibration of Types]
A split fibration of types over $\baseC$ consists of a category $\typeE$, a functor $\pi:\typeE\to\baseC$, and for every morphism $\theta:\rho\to\tau$ in $\baseC$ a chosen reindexing functor $\theta^*:\typeE_\tau\to\typeE_\rho$ satisfying $(\operatorname{id}_\tau)^*=\operatorname{id}_{\typeE_\tau}$ and $(\theta\circ\phi)^*=\phi^*\circ\theta^*$. For each $A\in\typeE_\tau$, the object $\theta^*A\in\typeE_\rho$ is equipped with a cartesian morphism $\theta^*A\to A$ lying over $\theta$; these cartesian morphisms are the cartesian lift data of the fibration.
[/definition]
The fiber $\typeE_\tau$ is the category of types over the context $\tau$. Reindexing a family accounts for substituting into a type, but syntax also substitutes into terms.
Once types are interpreted as families, the remaining semantic question is how a term behaves under the same substitution. Since a term is a section of its displayed family, substitution should turn a section over $Y$ into a section of the pulled-back family over $X$. The compatibility to check is that pullback changes not only the ambient family but also the term living in that family.
[quotetheorem:9646]
[citeproof:9646]
The hypotheses here are doing real work: the set-valued family model supplies actual pullbacks, and without pullbacks a substitution $f:X\to Y$ need not produce any type over $X$ from a type over $Y$. A concrete failure occurs in the category $\textbf{Set}_{\neq\varnothing}$ of nonempty sets and functions. Let $Y=\{0,1\}$, let $X=\{*\}$ map to $0\in Y$, and let $E=\{e\}$ map to $1\in Y$. The set-theoretic pullback is empty, so it is not an object of $\textbf{Set}_{\neq\varnothing}$. Thus the family $E\to Y$ cannot be reindexed along $X\to Y$ inside that category. The theorem does not say that every category of contexts supports all dependent type formers; it only identifies the action of substitution once families and pullbacks are available. This result gives the basic dictionary for dependent syntax: contexts are bases, types are families, terms are sections, and substitution is pullback. The remaining issue is to identify which extra categorical operations interpret the type formers.
## Cartesian Closed Categories for Simple Types
Before dependent types enter, simple type theory has two central type formers: products and functions. The problem is to find a categorical setting in which pairing, projections, lambda abstraction, and application are all available with the expected computation laws.
[definition: Cartesian Closed Category]
A cartesian closed category is a category $\baseC$ with a terminal object $1$, binary products $A\times B$, and for every pair of objects $A,B$ an exponential object $B^A$ equipped with a natural bijection
\begin{align*}
\baseC(C\times A,B)\cong\baseC(C,B^A).
\end{align*}
[/definition]
The terminal object interprets the empty product, binary products interpret product types, and exponentials interpret function types. The displayed bijection is currying: maps with two inputs correspond to maps into a function object.
[example: Products and Functions in Set]
In $\textbf{Set}$, the terminal object is any singleton set $1=\{*\}$: for every set $C$, there is exactly one function $C\to 1$, namely the function sending every $c\in C$ to $*$. The product of sets $A$ and $B$ is the Cartesian product $A\times B$ with projections $\pi_A(a,b)=a$ and $\pi_B(a,b)=b$.
We compute the exponential object. Let $B^A$ be the set of functions $A\to B$. Given a function $h:C\times A\to B$, define its curried form $\tilde h:C\to B^A$ by requiring that, for each $c\in C$, $\tilde h(c)$ is the function $A\to B$ given by
\begin{align*}
\tilde h(c)(a)=h(c,a).
\end{align*}
This is well-defined because for fixed $c\in C$ and every $a\in A$, the pair $(c,a)$ lies in $C\times A$, so $h(c,a)\in B$.
Conversely, given a function $k:C\to B^A$, define its uncurried form $\hat k:C\times A\to B$ by
\begin{align*}
\hat k(c,a)=k(c)(a).
\end{align*}
Now start with $h:C\times A\to B$. Its curried form is $\tilde h$, and then uncurrying gives
\begin{align*}
\widehat{\tilde h}(c,a)=\tilde h(c)(a)=h(c,a).
\end{align*}
Thus $\widehat{\tilde h}=h$. Starting with $k:C\to B^A$, currying $\hat k$ gives a function $\widetilde{\hat k}:C\to B^A$ satisfying, for every $c\in C$ and $a\in A$,
\begin{align*}
\widetilde{\hat k}(c)(a)=\hat k(c,a)=k(c)(a).
\end{align*}
Therefore $\widetilde{\hat k}(c)=k(c)$ as functions $A\to B$ for each $c$, so $\widetilde{\hat k}=k$. Hence functions $C\times A\to B$ are in bijection with functions $C\to B^A$. The evaluation map is the uncurried identity on $B^A$, namely
\begin{align*}
\operatorname{ev}:B^A\times A\to B,\qquad \operatorname{ev}(g,a)=g(a).
\end{align*}
This is why, in $\textbf{Set}$, function types are modeled by ordinary sets of functions, currying models lambda abstraction, and evaluation models application.
[/example]
This example explains the syntax of functions: lambda abstraction is currying, while application is evaluation. Product introduction is pairing into a product, and product eliminations are the two projections.
[definition: Interpretation of Simple Types]
Fix a cartesian closed category $\baseC$ and let $\mathsf{Ty}$ be the set of simple types generated from ground types by product and function type formers. An interpretation of simple types consists of a function
\begin{align*}
\llbracket-\rrbracket_{\mathsf{Ty}}:\mathsf{Ty}\to\operatorname{Obj}(\baseC)
\end{align*}
such that $\llbracket A\times B\rrbracket_{\mathsf{Ty}}=\llbracket A\rrbracket_{\mathsf{Ty}}\times\llbracket B\rrbracket_{\mathsf{Ty}}$ and $\llbracket A\to B\rrbracket_{\mathsf{Ty}}=\llbracket B\rrbracket_{\mathsf{Ty}}^{\llbracket A\rrbracket_{\mathsf{Ty}}}$. If $\mathsf{Jdg}$ is the class of derivable judgments $\tau\vdash t:A$, the term part of the interpretation is a function sending each judgment in $\mathsf{Jdg}$ to a morphism
\begin{align*}
\llbracket t\rrbracket:\llbracket\tau\rrbracket\to\llbracket A\rrbracket_{\mathsf{Ty}}
\end{align*}
in $\baseC$, where $\llbracket\tau\rrbracket$ is the product of the interpreted types of the variables in $\tau$.
[/definition]
Given a typing context $\tau=x_1:A_1,\dots,x_n:A_n$, its interpretation is the product $\llbracket\tau\rrbracket=\llbracket A_1\rrbracket\times\dots\times\llbracket A_n\rrbracket$. A term judgment $\tau\vdash t:A$ should then be interpreted as a morphism $\llbracket t\rrbracket:\llbracket\tau\rrbracket\to\llbracket A\rrbracket$. The obstruction is that each syntactic rule demands a corresponding categorical operation: variables require projections, pairing requires products, application requires evaluation, and lambda abstraction requires exponential transposition. A sound interpretation exists only if these operations fit together with the typing rules.
[quotetheorem:9647]
[citeproof:9647]
The hypotheses are necessary because each syntactic constructor asks for a matching categorical operation. The category $\textbf{Grp}$ has finite products, but it is not cartesian closed: for instance, an exponential $K^H$ would make the functor $-\times H$ a left adjoint and hence preserve coproducts, while products with a nontrivial group do not preserve free products in $\textbf{Grp}$. Thus $\textbf{Grp}$ can interpret pairing and projections but has no object that uniformly represents functions $H\to K$, so lambda abstraction has no general semantic target there. The theorem does not claim completeness, nor does it say that two morphisms equal in every cartesian closed category must be syntactically convertible in this particular presentation. It says that the syntax cannot distinguish terms identified by the categorical universal properties. In particular, beta reduction is not an extra operational trick in the model; it is built into the definition of the exponential transpose.
[example: Interpreting Function Application]
Let $\tau\vdash f:A\to B$ and $\tau\vdash a:A$ be interpreted as morphisms
\begin{align*}
\llbracket f\rrbracket:\llbracket\tau\rrbracket\to\llbracket B\rrbracket^{\llbracket A\rrbracket}
\end{align*}
and
\begin{align*}
\llbracket a\rrbracket:\llbracket\tau\rrbracket\to\llbracket A\rrbracket.
\end{align*}
By the universal property of the product, these two morphisms determine the pairing
\begin{align*}
\langle\llbracket f\rrbracket,\llbracket a\rrbracket\rangle:\llbracket\tau\rrbracket\to\llbracket B\rrbracket^{\llbracket A\rrbracket}\times\llbracket A\rrbracket,
\end{align*}
characterized by
\begin{align*}
\pi_1\circ\langle\llbracket f\rrbracket,\llbracket a\rrbracket\rangle=\llbracket f\rrbracket
\end{align*}
and
\begin{align*}
\pi_2\circ\langle\llbracket f\rrbracket,\llbracket a\rrbracket\rangle=\llbracket a\rrbracket.
\end{align*}
The application $f(a)$ is interpreted by composing this pairing with evaluation:
\begin{align*}
\llbracket f(a)\rrbracket=\operatorname{ev}\circ\langle\llbracket f\rrbracket,\llbracket a\rrbracket\rangle:\llbracket\tau\rrbracket\to\llbracket B\rrbracket.
\end{align*}
Equivalently, in the set model, an element $t\in\llbracket\tau\rrbracket$ is first sent to
\begin{align*}
\langle\llbracket f\rrbracket,\llbracket a\rrbracket\rangle(t)=(\llbracket f\rrbracket(t),\llbracket a\rrbracket(t)),
\end{align*}
and evaluation then gives
\begin{align*}
\llbracket f(a)\rrbracket(t)=\operatorname{ev}(\llbracket f\rrbracket(t),\llbracket a\rrbracket(t))=\llbracket f\rrbracket(t)(\llbracket a\rrbracket(t)).
\end{align*}
Thus categorical function elimination is exactly: form the pair consisting of the interpreted function and its interpreted argument, then apply the evaluation morphism.
[/example]
The simple interpretation uses one ambient category and ordinary products. Dependent type theory forces each type to live over a context, so products and functions must be replaced by fiberwise versions that interact with substitution.
## Locally Cartesian Closed Categories and Comprehension Categories
The main problem for dependent types is that a type may depend on a term. If $x:A\vdash B(x)\ \text{type}$, then the semantic object for $B$ should live over the semantic object for $A$, and substituting a term into $B$ should be stable under pullback.
[definition: Locally Cartesian Closed Category]
A locally cartesian closed category is a category $\baseC$ with finite limits such that every slice category $\baseC/\tau$ is cartesian closed.
[/definition]
Finite limits provide pullbacks, so types over contexts can be reindexed. Cartesian closure in each slice provides dependent function spaces internal to each context.
[example: Families of Sets]
Let $X$ be a set. An object of the slice category $\textbf{Set}/X$ is a function $p:E\to X$, which is equivalently the family of fibers
\begin{align*}
E_x=\{e\in E:p(e)=x\}
\end{align*}
indexed by $x\in X$. A morphism from $p:E\to X$ to $q:F\to X$ is a function $u:E\to F$ such that $q\circ u=p$; therefore if $e\in E_x$, then
\begin{align*}
q(u(e))=(q\circ u)(e)=p(e)=x,
\end{align*}
so $u$ restricts to a function $E_x\to F_x$ on each fiber.
The terminal object of $\textbf{Set}/X$ is $\operatorname{id}_X:X\to X$, whose fiber over $x$ is the singleton $\{x\}$. Given $p:E\to X$ and $q:F\to X$, their product in the slice is the pullback
\begin{align*}
E\times_X F=\{(e,f)\in E\times F:p(e)=q(f)\}
\end{align*}
with projection $(e,f)\mapsto p(e)=q(f)$ to $X$. Its fiber over $x$ is
\begin{align*}
(E\times_X F)_x=\{(e,f)\in E\times F:p(e)=x\text{ and }q(f)=x\}=E_x\times F_x.
\end{align*}
Now take two families $p:E\to X$ and $q:F\to X$. Define the fiber of the exponential $F^E\to X$ over $x$ to be the ordinary function set
\begin{align*}
(F^E)_x=\operatorname{Set}(E_x,F_x).
\end{align*}
Thus the total set is
\begin{align*}
F^E=\{(x,\alpha):x\in X\text{ and }\alpha:E_x\to F_x\}
\end{align*}
with projection $(x,\alpha)\mapsto x$. Evaluation over $X$ is the map
\begin{align*}
\operatorname{ev}:F^E\times_X E\to F,\qquad ((x,\alpha),e)\mapsto \alpha(e).
\end{align*}
This is well-defined because $((x,\alpha),e)\in F^E\times_X E$ means $e\in E_x$, and $\alpha:E_x\to F_x$, so $\alpha(e)\in F_x\subseteq F$.
For any family $r:G\to X$, a map $h:G\times_X E\to F$ over $X$ sends each pair $(g,e)$ with $g\in G_x$ and $e\in E_x$ to an element of $F_x$. It therefore determines a map $\tilde h:G\to F^E$ by
\begin{align*}
\tilde h(g)=(r(g),\alpha_g),
\end{align*}
where
\begin{align*}
\alpha_g:E_{r(g)}\to F_{r(g)},\qquad \alpha_g(e)=h(g,e).
\end{align*}
Conversely, a map $k:G\to F^E$ over $X$ has the form $k(g)=(r(g),\beta_g)$ with $\beta_g:E_{r(g)}\to F_{r(g)}$, and it gives
\begin{align*}
\hat k:G\times_X E\to F,\qquad \hat k(g,e)=\beta_g(e).
\end{align*}
Starting with $h$, we get
\begin{align*}
\widehat{\tilde h}(g,e)=\alpha_g(e)=h(g,e).
\end{align*}
Starting with $k$, we get
\begin{align*}
\widetilde{\hat k}(g)=(r(g),\beta_g)=k(g).
\end{align*}
Hence $\textbf{Set}/X$ is cartesian closed, with products and exponentials computed fiber by fiber. Since $\textbf{Set}$ also has finite limits, $\textbf{Set}$ is locally cartesian closed: dependent products and function types of set-indexed families are ordinary products and function sets inside each fiber.
[/example]
The slice viewpoint supplies the local categorical structure, but syntax also has a primitive operation of extending a context by a type. Slice objects alone tell us that a family $A\to\tau$ exists; they do not by themselves identify the syntactic context $\tau.A$ together with its projection back to $\tau$. Without this extra structure, the rule that introduces a fresh variable $x:A$ can lose its semantic anchor: one may know that a family lives over $\tau$, but not which context object represents pairs consisting of an old context and an element of that family. To connect slice objects with this syntactic context extension, the next definition names the display maps that realize $\tau.A$ from $\tau$ and $A$.
[definition: Comprehension Category]
A comprehension category consists of a category of contexts $\baseC$, a fibration $\pi:\typeE\to\baseC$ of types over contexts, and, for each object $\tau\in\baseC$, a comprehension functor
\begin{align*}
\chi_\tau:\typeE_\tau\to\baseC/\tau
\end{align*}
from the fiber of types over $\tau$ to the slice of display maps over $\tau$. For $A\in\typeE_\tau$, write $\chi_\tau(A)=p_A:\tau.A\to\tau$. The reindexing data consist of functors $\theta^*:\typeE_\tau\to\typeE_\rho$ for every morphism $\theta:\rho\to\tau$ in $\baseC$, together with chosen cartesian comparison isomorphisms in $\baseC/\rho$ identifying $\chi_\rho(\theta^*A)$ with the pullback $\theta^*(\chi_\tau(A))$ for every $A\in\typeE_\tau$.
[/definition]
The extended context $\tau.A$ supports a distinguished variable of type $A$ over $\tau$. Dependent functions pose a semantic reversal problem: their introduction rule starts with a term of $B$ in the larger context $\tau.A$, but produces a term of $\Pi_{x:A}B(x)$ back in the smaller context $\tau$. Categorically, this requires an operation that moves suitable families or sections along the display map $\tau.A\to\tau$ in the opposite direction to reindexing, while remaining compatible with substitution.
[quotetheorem:9648]
[citeproof:9648]
The right-adjoint hypothesis is essential because dependent functions require a way to collect terms over the extended context back into the original context. In the family model, if the fibers $B(a)$ are sets, the dependent product has fiber $\prod_{a\in A_t}B(a)$; without such products in the relevant fibers, there may be no object representing all dependent functions at once. A specific non-dependent shadow of this failure appears in $\textbf{Grp}$: taking display maps to include product projections $G\times H\to G$, a right adjoint to reindexing along such a projection would supply an exponential object by $H$, but $\textbf{Grp}$ is not cartesian closed. Beck-Chevalley is also essential: without it, forming $\Pi$ and then substituting can differ from substituting the family first and then forming $\Pi$, which would invalidate the syntactic substitution rule for dependent functions. The theorem does not assert that arbitrary right adjoints automatically interpret a full dependent type theory; they must be attached to the display maps used for context extension and must be coherent with reindexing. The adjoint description mirrors the rule for dependent functions: to build an element of $\Pi_{x:A}B(x)$ in context $\tau$, we build an element of $B(x)$ in the extended context $\tau.A$. The following theorem treats the dual construction for dependent pair types, where the semantic operation remembers the added variable together with its dependent component.
[quotetheorem:9649]
[citeproof:9649]
The left-adjoint hypothesis is necessary because dependent sums require a categorical operation that pushes a family along a display map. In $\textbf{Set}$ this pushforward is the disjoint union of fibers, but in a category without the relevant left adjoints there need not be any object over $\tau$ representing pairs drawn from a type over $\rho$. For a concrete indexed-category failure, take a base category with one nonidentity arrow $0\to 1$, take the fiber over $1$ to be the one-object poset, take the fiber over $0$ to be $(\mathbb N,\le)$, and let reindexing send the unique object over $1$ to $0\in\mathbb N$. This reindexing functor has no left adjoint, since the unique possible functor $\mathbb N\to 1$ would require $n\le 0$ for every $n\in\mathbb N$. Beck-Chevalley is again essential: otherwise substituting into $\sum_{x:A}B(x)$ could fail to match the sum of the substituted family, so the syntactic substitution law for pair types would not be sound. The theorem does not say that dependent sums preserve all further structure, such as identity types or universes; it only supplies the categorical operation corresponding to $\Sigma$-formation, introduction, elimination, and computation. Dependent sums are geometrically simpler than dependent products: they collect all fibers into one total space. The type remembers both the base point and the dependent element over that point.
[example: Dependent Sum as Disjoint Union of Fibers]
Let $A$ be a set and let $B:A\to\textbf{Set}$ be a family. Define
\begin{align*}
\sum_{a\in A}B(a)=\{(a,b):a\in A\text{ and }b\in B(a)\}.
\end{align*}
When this dependent sum is viewed as a family over $A$, its projection is
\begin{align*}
\pi:\sum_{a\in A}B(a)\to A,\qquad \pi(a,b)=a.
\end{align*}
This is well-defined because every element of $\sum_{a\in A}B(a)$ is, by definition, a pair $(a,b)$ with first component $a\in A$.
For a fixed $a_0\in A$, the fiber of $\pi$ over $a_0$ is
\begin{align*}
\pi^{-1}(a_0)=\{(a,b)\in\sum_{a\in A}B(a):\pi(a,b)=a_0\}.
\end{align*}
Using the definition of $\pi$, the condition $\pi(a,b)=a_0$ is exactly $a=a_0$, so
\begin{align*}
\pi^{-1}(a_0)=\{(a_0,b):b\in B(a_0)\}.
\end{align*}
The map
\begin{align*}
B(a_0)\to\pi^{-1}(a_0),\qquad b\mapsto(a_0,b)
\end{align*}
has inverse
\begin{align*}
\pi^{-1}(a_0)\to B(a_0),\qquad (a_0,b)\mapsto b.
\end{align*}
Thus the fiber over $a_0$ is exactly the original set $B(a_0)$, with the tag $a_0$ recording which fiber the element came from. The dependent sum is therefore the disjoint union of the fibers $B(a)$, and the projection remembers the indexing element $a$.
[/example]
The projection in the example is the displayed family over $A$: it records the first component of each dependent pair. If instead we form the total dependent sum along the unique map $A\to 1$, the same set of pairs is viewed as an object over the terminal context, so the base information has been collected into the total object. This distinction between a displayed family over $A$ and its pushforward to $1$ is the semantic difference between remembering the ambient variable and closing the type in the empty context.
The adjunctions for $\Pi$ and $\Sigma$ give the right universal properties in a fixed context, but type theory also allows every construction to be substituted into another context. The possible failure is not the absence of a product or sum in a single fiber; it is that two legitimate semantic routes may produce incompatible objects after a substitution. The next definition states the coherence condition that makes reindexing commute with the adjoints attached to display maps.
The comparison maps in the next definition express the precise test for whether dependent products and sums commute with substitution. They are needed because the same syntactic expression can be interpreted by first changing context and then forming the type former, or by forming the type former first and then changing context.
[definition: Beck-Chevalley Condition]
For a pullback square with vertical maps $\theta':\rho'\to\tau'$ and $\theta:\rho\to\tau$, and horizontal maps $\phi':\rho'\to\rho$ and $\phi:\tau'\to\tau$, the Beck-Chevalley condition for right adjoints states that the canonical comparison $\phi^*\Pi_\theta\to\Pi_{\theta'}(\phi')^*$ is an isomorphism. The Beck-Chevalley condition for left adjoints states that the canonical comparison $\Sigma_{\theta'}(\phi')^*\to\phi^*\Sigma_\theta$ is an isomorphism.
[/definition]
This condition says that substituting first and then forming a dependent product or sum gives the same semantic result as forming it first and then substituting. The next theorem records why this coherence is not decorative: it is exactly what validates the substitution rules for dependent type formers in the categorical interpretation.
[quotetheorem:9650]
[citeproof:9650]
The pullback-square hypothesis is necessary because substitution changes both the base context and the extended context. If the square is not a pullback, the object $\rho'$ need not be the substituted context extension, so comparing the two routes no longer expresses the syntactic substitution rule. The isomorphism condition is also necessary: a mere map $\phi^*\Pi_\theta B\to\Pi_{\theta'}(\phi')^*B$ would give only a one-way comparison, not equality of interpretations up to canonical isomorphism. The theorem does not claim that Beck-Chevalley creates adjoints or type formers; it says that once the adjoints exist, this condition is what makes them stable under substitution.
This is the same coherence principle behind indexed categories, the internal language of locally cartesian closed categories, and many categorical accounts of proof-assistant semantics. In the language of indexed categories, the fibers $\typeE_\tau$ form a pseudofunctor $\baseC^{\mathrm{op}}\to\textbf{Cat}$, and the Beck-Chevalley isomorphisms say that the adjoints to reindexing are compatible with change of base. In a topos or a locally cartesian closed category, these fiberwise adjunctions are part of the categorical logic that lets internal formulas be transported along substitutions. Proof assistants exploit the same architecture: context extension, weakening, substitution, and dependent type formation are implemented syntactically, while categorical semantics explains why their coherence laws have the shape of pullback stability.
The categorical picture now matches the shape of dependent type theory: contexts form a base category, types vary over contexts, substitution is reindexing, terms are sections, and type formers arise from universal constructions in slices. In this sense, categorical semantics is not merely an external model; it is a compressed account of the structural rules that make type theory coherent.
Categorical semantics recasts contexts, substitutions, and type formers in structural terms, showing that the rules of type theory have a coherent mathematical meaning beyond syntax alone. With that perspective established, the course can move outward to extensions that enrich the core theory with new forms of inductive, coinductive, and higher-dimensional structure.
# 12. Extensions and Modern Directions
This chapter surveys extensions of dependent type theory that broaden what the core calculus can express. Earlier chapters developed ordinary inductive types, identity types, and the Curry-Howard reading; here we ask how those ideas scale to indexed data, well-founded trees, infinite observations, quotients, extensional equality principles, and verified programming. The guiding thread is that every extension changes both the objects that can be formed and the proof principles available for reasoning about them.
## Inductive Families, W-Types, Coinduction, and Quotient-Like Constructions
The ordinary inductive types met earlier, such as natural numbers and lists, already support proofs by structural induction. Many mathematical and programming examples require families of types whose members carry an index, such as vectors indexed by length or expressions indexed by their object-language type. The question is how to extend induction so that the index carries logical information without turning every proof into a separate external invariant.
[definition: Inductive Family]
An inductive family over a type $I$ is a family of types $A : I \to \mathcal{U}$ generated by constructors whose target types have the form $A(i)$ for specified indices $i : I$.
[/definition]
The index is part of the type being constructed, so the constructor rules can enforce constraints that would otherwise be propositions about data. This is the mechanism behind length-indexed vectors, typed syntax trees, derivation trees, and intrinsically typed programs.
[example: Vectors Indexed By Length]
Let $A : \mathcal{U}$ be a type, and let length indices range over the internal zero-based type $\operatorname{Nat}$, with constructors $\operatorname{zero}$ and $\operatorname{succ}$. This is distinct from the convention that $\mathbb{N}$ denotes the positive natural numbers. The inductive family $\operatorname{Vec}(A,n)$ has constructors
\begin{align*}
\operatorname{nil} : \operatorname{Vec}(A,\operatorname{zero})
\end{align*}
and
\begin{align*}
\operatorname{cons} : \prod_{n:\operatorname{Nat}} A \to \operatorname{Vec}(A,n) \to \operatorname{Vec}(A,\operatorname{succ}(n)).
\end{align*}
The append operation is specified by the dependent type
\begin{align*}
\operatorname{append} : \prod_{m,n:\operatorname{Nat}} \operatorname{Vec}(A,m) \to \operatorname{Vec}(A,n) \to \operatorname{Vec}(A,m+n).
\end{align*}
We define $\operatorname{append}$ by recursion on the first vector. In the base case, the first vector is $\operatorname{nil}$, so $m$ is $\operatorname{zero}$. For $v:\operatorname{Vec}(A,n)$, the required output type is
\begin{align*}
\operatorname{Vec}(A,\operatorname{zero}+n).
\end{align*}
By the defining equation for addition on $\operatorname{Nat}$,
\begin{align*}
\operatorname{zero}+n \equiv n.
\end{align*}
Thus $v:\operatorname{Vec}(A,n)$ has exactly the required type, so
\begin{align*}
\operatorname{append}(\operatorname{nil},v) := v.
\end{align*}
In the successor case, the first vector has the form $\operatorname{cons}(a,u)$ with $a:A$ and $u:\operatorname{Vec}(A,m)$. For $v:\operatorname{Vec}(A,n)$, the recursive call gives
\begin{align*}
\operatorname{append}(u,v):\operatorname{Vec}(A,m+n).
\end{align*}
Applying $\operatorname{cons}$ to $a$ and this recursive result gives
\begin{align*}
\operatorname{cons}(a,\operatorname{append}(u,v)):\operatorname{Vec}(A,\operatorname{succ}(m+n)).
\end{align*}
The required result type is
\begin{align*}
\operatorname{Vec}(A,\operatorname{succ}(m)+n).
\end{align*}
By the defining equation for addition in the successor case,
\begin{align*}
\operatorname{succ}(m)+n \equiv \operatorname{succ}(m+n).
\end{align*}
Therefore $\operatorname{cons}(a,\operatorname{append}(u,v))$ has the required type, and we set
\begin{align*}
\operatorname{append}(\operatorname{cons}(a,u),v) := \operatorname{cons}(a,\operatorname{append}(u,v)).
\end{align*}
The length statement is not a separate theorem added afterward: it is enforced by the target type $\operatorname{Vec}(A,m+n)$ in each defining clause.
[/example]
After indexed induction, the next extension asks for inductive objects whose branching shape is itself part of the data. This leads to W-types, which are the type-theoretic form of well-founded trees.
[definition: W-Type]
Given a type $A : \mathcal{U}$ and a family $B : A \to \mathcal{U}$, the W-type $W_{a:A} B(a)$ is generated by a constructor
\begin{align*}
\operatorname{sup} : \prod_{a:A} (B(a) \to W_{a:A} B(a)) \to W_{a:A} B(a).
\end{align*}
[/definition]
An element of a W-type is a tree whose root has a label $a:A$, and whose immediate subtrees are indexed by the elements of $B(a)$. To use such trees in proofs and recursive definitions, we need a principle saying that it is enough to prove a property at a root after proving it for all immediate subtrees. The W-type induction theorem gives exactly that rule, with the branching family $B(a)$ controlling the available induction hypotheses.
[quotetheorem:9651]
[citeproof:9651]
This induction principle explains why W-types can represent syntax, ordinals below a universe bound, and well-founded recursive data. The hypothesis is not merely a recursive clause at the root: it must work uniformly for every label $a:A$ and every subtree assignment $f:B(a)\to W$, because those data describe all possible immediate shapes of a W-tree. If the step were supplied only for a particular branching profile, the eliminator would fail on nodes whose child positions were indexed by a different $B(a)$. For instance, if $A$ has labels $\operatorname{leaf}$ and $\operatorname{node}$ with $B(\operatorname{leaf})$ empty and $B(\operatorname{node})$ a two-element type, a proposed recursion that handles only the binary-node case gives no value at $\operatorname{sup}(\operatorname{leaf},f)$. Conversely, a clause that assumes exactly two recursive results cannot be applied to a label whose child-position type has three elements. The theorem also does not justify arbitrary corecursive definitions or definitions that inspect infinitely descending data; every recursive use is confined to an immediate subtree already exposed by the constructor.
[example: Well-Founded Trees]
Let $W = W_{a:A}B(a)$, and suppose each child-position type $B(a)$ is finite, so a finite maximum of natural-number-indexed child heights is available. We define a height function
\begin{align*}
\operatorname{height}:W\to \operatorname{Nat}
\end{align*}
by W-type recursion. For a tree written as $\operatorname{sup}(a,f)$, where $f:B(a)\to W$ assigns one immediate subtree to each child position, the recursive data are the values
\begin{align*}
\operatorname{height}(f(b)):\operatorname{Nat}\quad\text{for each }b:B(a).
\end{align*}
The defining clause is
\begin{align*}
\operatorname{height}(\operatorname{sup}(a,f)):=\operatorname{succ}\left(\max_{b:B(a)}\operatorname{height}(f(b))\right).
\end{align*}
This uses only the immediate subtrees $f(b)$, so it is exactly the kind of recursive definition justified by the *[W-Type Induction Principle](/theorems/9651)*.
For a leaf label $a$ with $B(a)$ empty, the family of child heights has no entries, so its maximum is the chosen empty maximum $\operatorname{zero}$. Hence
\begin{align*}
\operatorname{height}(\operatorname{sup}(a,f))=\operatorname{succ}(\operatorname{zero}).
\end{align*}
For a binary label $a$ with child positions $\ell,r:B(a)$ and no others, the finite maximum is
\begin{align*}
\max_{b:B(a)}\operatorname{height}(f(b))=\max\bigl(\operatorname{height}(f(\ell)),\operatorname{height}(f(r))\bigr),
\end{align*}
so
\begin{align*}
\operatorname{height}(\operatorname{sup}(a,f))=\operatorname{succ}\bigl(\max(\operatorname{height}(f(\ell)),\operatorname{height}(f(r)))\bigr).
\end{align*}
Thus the height counts one node at the root plus the largest height among its immediate subtrees, with the branching type $B(a)$ determining exactly which subtrees are included.
[/example]
Some data, such as streams or infinite transition systems, are not well-founded. The problem there is dual: instead of justifying recursion by reaching smaller subdata, we justify observation by revealing finite approximations of potentially infinite data.
[definition: Coinductive Type]
A record-style coinductive type is a type $C:\mathcal{U}$ specified by a collection of observation maps
\begin{align*}
d_i : C \to F_i(C)
\end{align*}
for specified type expressions $F_i(C)$, together with a guarded or productive corecursion principle for constructing elements whose finite observations are determined by those destructors.
[/definition]
Coinduction replaces induction when the object is defined by how it can be observed rather than by how it was finitely constructed. Categorically, a coinductive type is often understood as a final coalgebra for the functor describing its observations; operationally, type theories enforce guardedness or productivity so that every finite observation can be produced in finite time. The associated proof principle is usually bisimulation: two coinductive objects are equal when their observations match and their later states remain related.
[example: Streams]
For a type $A$, a stream over $A$ is observed by
\begin{align*}
\operatorname{head}:\operatorname{Stream}(A)\to A
\end{align*}
and
\begin{align*}
\operatorname{tail}:\operatorname{Stream}(A)\to\operatorname{Stream}(A).
\end{align*}
The constant stream of ones over $\mathbb{N}$ is the stream $\operatorname{ones}:\operatorname{Stream}(\mathbb{N})$ whose observations are specified by
\begin{align*}
\operatorname{head}(\operatorname{ones})=1
\end{align*}
and
\begin{align*}
\operatorname{tail}(\operatorname{ones})=\operatorname{ones}.
\end{align*}
Thus the first observation returns $1$, and the next state to be observed is again the same stream.
For types $A,B:\mathcal{U}$, stream mapping has type
\begin{align*}
\operatorname{map} : (A\to B)\to \operatorname{Stream}(A)\to\operatorname{Stream}(B).
\end{align*}
Given $f:A\to B$ and $s:\operatorname{Stream}(A)$, we define $\operatorname{map}(f,s)$ by specifying its two observations:
\begin{align*}
\operatorname{head}(\operatorname{map}(f,s)):=f(\operatorname{head}(s)).
\end{align*}
Here $\operatorname{head}(s):A$, so $f(\operatorname{head}(s)):B$, which is exactly the required head type for a stream in $\operatorname{Stream}(B)$. The tail observation is
\begin{align*}
\operatorname{tail}(\operatorname{map}(f,s)):=\operatorname{map}(f,\operatorname{tail}(s)).
\end{align*}
Since $\operatorname{tail}(s):\operatorname{Stream}(A)$, the recursive call has type $\operatorname{Stream}(B)$, which is exactly the required tail type. The recursive occurrence is guarded by the tail observation: to observe the head of $\operatorname{map}(f,s)$ one computes $f(\operatorname{head}(s))$ immediately, and only a later tail observation asks for $\operatorname{map}(f,\operatorname{tail}(s))$. Thus every finite prefix of the mapped stream is produced after finitely many observations of the input stream.
[/example]
Quotient-like constructions raise a different issue. Mathematics often identifies objects up to an [equivalence relation](/page/Equivalence%20Relation), but intensional type theory keeps definitional equality stricter than mathematical equality. A quotient extension must let us pass from representatives to equivalence classes without breaking computation.
[definition: Set-Quotient]
Given a type $A : \mathcal{U}$ and an equivalence relation $R : A \to A \to \mathcal{U}$ whose values are propositions, a set-quotient $A/R$ is a type equipped with a map $q:A\to A/R$ such that $q(a)=q(b)$ whenever $R(a,b)$, and satisfying the corresponding elimination principle for functions constant on $R$-equivalence classes.
[/definition]
The quotient eliminator is the important part: to define a function out of $A/R$, it is not enough to define it on representatives; the definition must respect the relation. In homotopical type theories, quotients are often handled as higher inductive types, where point constructors add representatives and path constructors explicitly add the required equalities. This makes quotienting another instance of the same theme as coinduction: extending the type former changes both the objects available and the eliminator used to reason from them.
[example: Integers As A Quotient]
The integers can be represented by pairs $(m,n):\operatorname{Nat}\times\operatorname{Nat}$, read as the formal difference $m-n$ using the zero-based natural-number object. Define
\begin{align*}
(m,n)\sim(m',n')\quad\text{when}\quad m+n'=m'+n.
\end{align*}
On representatives, addition is proposed by
\begin{align*}
(m,n)+(p,q):=(m+p,n+q).
\end{align*}
To see that this descends to equivalence classes, suppose $(m,n)\sim(m',n')$ and $(p,q)\sim(p',q')$. Thus
\begin{align*}
m+n'=m'+n
\end{align*}
and
\begin{align*}
p+q'=p'+q.
\end{align*}
We must prove
\begin{align*}
(m+p)+(n'+q')=(m'+p')+(n+q),
\end{align*}
because this is exactly the statement that $(m+p,n+q)\sim(m'+p',n'+q')$.
Using associativity and commutativity of addition on $\operatorname{Nat}$, the left-hand side can be regrouped as
\begin{align*}
(m+p)+(n'+q')=(m+n')+(p+q').
\end{align*}
Substituting the two assumed equivalences gives
\begin{align*}
(m+n')+(p+q')=(m'+n)+(p'+q).
\end{align*}
Again using associativity and commutativity of addition on $\operatorname{Nat}$,
\begin{align*}
(m'+n)+(p'+q)=(m'+p')+(n+q).
\end{align*}
Combining these equalities gives
\begin{align*}
(m+p)+(n'+q')=(m'+p')+(n+q).
\end{align*}
Therefore the representative-level formula for addition respects $\sim$, so it defines an addition operation on the quotient type of integer classes.
[/example]
## Function Extensionality, Univalence, and Homotopy Type Theory
Identity types give a syntax of equality internal to type theory, but intensional equality does not automatically validate every equality principle used in ordinary mathematics. The central question of this section is which additional principles can be added consistently, and what mathematical meaning they give to equality.
The first such principle concerns functions. If two functions return equal outputs at every input, ordinary mathematical practice treats them as equal functions. Intensional type theory does not derive this in general, so it is commonly added as an axiom or obtained in richer systems.
[remark: Function Extensionality Schema]
For types $A : \mathcal{U}$ and families $B : A \to \mathcal{U}$, function extensionality is the principle that for any $f,g : \prod_{x:A}B(x)$ there is a map
\begin{align*}
\left(\prod_{x:A} f(x)=g(x)\right) \to f=g.
\end{align*}
[/remark]
This schema is stated for dependent functions; the non-dependent version is the special case in which $B(x)$ is a constant family. Its input is essential: without a term of $\prod_{x:A}f(x)=g(x)$, function extensionality gives no way to identify $f$ and $g$. For example, two functions $A\to B$ that agree only on a chosen subset of $A$ cannot be concluded equal as functions on all of $A$. The schema also does not collapse different proofs of pointwise equality or make functions definitionally equal; it supplies an identity term in the internal equality type, so computation by reduction is still governed by the underlying definitional equality rules.
A concrete boundary case is obtained by taking $A$ to be the boolean type and $B$ to be the internal type $\operatorname{Nat}$. Define $f(\operatorname{true})=\operatorname{zero}$, $f(\operatorname{false})=\operatorname{zero}$, while $g(\operatorname{true})=\operatorname{zero}$ and $g(\operatorname{false})=\operatorname{succ}(\operatorname{zero})$. The functions agree on the subset containing only $\operatorname{true}$, but there is no pointwise equality at $\operatorname{false}$. Function extensionality cannot turn this partial agreement into a path $f=g$, because its premise demands equality at every input of the declared domain.
[remark: Semantic Status Of Function Extensionality]
In the basic intensional rules, evaluation sends a path $f=g$ to pointwise paths $\prod_{x:A}f(x)=g(x)$, but the reverse direction is not derivable in general. Extensional models validate the reverse direction by interpreting functions extensionally, while homotopy type theory validates it through the path structure of dependent products. In this chapter it is therefore treated as a schema or additional principle, not as a theorem proved from the bare identity eliminator.
[/remark]
Function extensionality changes the way equality proofs for programs are written. Instead of constructing a direct identity proof between two functions, it permits a proof by fixing an arbitrary input and comparing outputs.
[example: Equality Of Composed Functions]
Let $f,g : A \to B$ and $h:B\to C$, and suppose
\begin{align*}
p : \prod_{x:A} f(x)=g(x).
\end{align*}
We show that $h\circ f=h\circ g$. Fix $x:A$. From the component
\begin{align*}
p(x):f(x)=g(x)
\end{align*}
congruence of equality under the function $h$ gives
\begin{align*}
\operatorname{ap}_h(p(x)):h(f(x))=h(g(x)).
\end{align*}
By the definition of composition,
\begin{align*}
(h\circ f)(x)\equiv h(f(x)).
\end{align*}
Also by the definition of composition,
\begin{align*}
(h\circ g)(x)\equiv h(g(x)).
\end{align*}
Thus the term $\operatorname{ap}_h(p(x))$ has type
\begin{align*}
(h\circ f)(x)=(h\circ g)(x).
\end{align*}
Since this construction works for every $x:A$, we have
\begin{align*}
\lambda x.\operatorname{ap}_h(p(x)):\prod_{x:A}(h\circ f)(x)=(h\circ g)(x).
\end{align*}
By *Function Extensionality Schema*, this pointwise equality determines a path
\begin{align*}
h\circ f=h\circ g.
\end{align*}
So postcomposition by $h$ preserves equality of functions proved pointwise.
[/example]
Pointwise equality of functions is only the first place where mathematical equality outruns the intensional rules. The next issue concerns equality of types themselves: mathematics often identifies structures when they are isomorphic, while raw type theory distinguishes their presentations. To state the univalence principle, we first isolate the notion of sameness between types that plays the role of isomorphism.
The required notion cannot be a bare pair of functions $A\to B$ and $B\to A$, because the maps must also witness that going there and back recovers the original element up to identity. The quasi-inverse formulation records exactly this inverse behaviour. Other presentations add coherence data explicitly or define equivalence by contractible fibres; these formulations agree in homotopy type theory, but the quasi-inverse version is the most economical for this survey.
[definition: Equivalence Of Types]
For types $A,B:\mathcal{U}$, a quasi-inverse equivalence from $A$ to $B$ consists of a function $f:A\to B$, a function $g:B\to A$, and identity terms
\begin{align*}
\eta &: \prod_{x:A} g(f(x)) = x, &
\varepsilon &: \prod_{y:B} f(g(y)) = y.
\end{align*}
[/definition]
There are several equivalent formulations of equivalence, including quasi-inverses, half-adjoint equivalences, and contractible fibres. The course uses equivalence as the type-theoretic analogue of isomorphism; when coherence matters, the quasi-inverse data above are replaced by an equivalent coherent presentation. The univalence axiom then answers the central question: whether paths between types in a universe are exactly equivalences between those types.
[remark: Univalence Axiom]
For types $A,B:\mathcal{U}$, the univalence axiom says that the canonical map
\begin{align*}
(A=B) \to (A\simeq B)
\end{align*}
from identity of types to equivalence of types is itself an equivalence.
[/remark]
The basic intensional rules construct the displayed canonical map by identity elimination: from a path $p:A=B$, transport along $p$ gives an equivalence $A\simeq B$. Univalence asserts that this map has inverse behaviour, so its reverse direction is not obtained from ordinary identity elimination alone. In homotopy type theory, the axiom is taken as a principle governing universes. Its computational and semantic content is that equivalent types may be identified, and constructions invariant under equivalence can be transported along that identification.
The philosophical shift is that equality of structures becomes sensitive to their correct notion of isomorphism. The universe $\mathcal{U}$ matters: univalence is a statement about paths between types inside a universe and equivalences between those same types. It identifies equivalent types, not arbitrary pieces of data that merely look similar; an equivalence must include the maps and inverse-behaviour data required by the type theory. The resulting path is also not the same as definitional equality, so replacing a type by an equivalent one may require transport rather than ordinary computation by reduction.
For a boundary case, compare the booleans with the internal type $\operatorname{Nat}$. There is a map $\operatorname{Bool}\to\operatorname{Nat}$ sending $\operatorname{false}$ to $\operatorname{zero}$ and $\operatorname{true}$ to $\operatorname{succ}(\operatorname{zero})$, but it has no inverse $\operatorname{Nat}\to\operatorname{Bool}$ satisfying the two identity families required for an equivalence, since values such as $\operatorname{succ}(\operatorname{succ}(\operatorname{zero}))$ cannot be recovered from a boolean. Univalence therefore does not identify these types from the existence of an injection or a coding function alone. The universe hypothesis is also part of the statement: the path lives in a specified universe containing both types, so univalence is not a licence to compare objects that have not first been presented as types in the same universe level.
[example: Isomorphism Versus Equality Of Small Structures]
Let $T$ be the two-element type with constructors $a:T$ and $b:T$. Define
\begin{align*}
e:\operatorname{Bool}\to T
\end{align*}
by $e(\operatorname{true})=a$ and $e(\operatorname{false})=b$, and define
\begin{align*}
r:T\to \operatorname{Bool}
\end{align*}
by $r(a)=\operatorname{true}$ and $r(b)=\operatorname{false}$. For $x:\operatorname{Bool}$, the two cases give
\begin{align*}
r(e(\operatorname{true}))=r(a)=\operatorname{true}
\end{align*}
and
\begin{align*}
r(e(\operatorname{false}))=r(b)=\operatorname{false}.
\end{align*}
Thus $r(e(x))=x$ for every $x:\operatorname{Bool}$. For $y:T$, the two constructor cases give
\begin{align*}
e(r(a))=e(\operatorname{true})=a
\end{align*}
and
\begin{align*}
e(r(b))=e(\operatorname{false})=b.
\end{align*}
Thus $e(r(y))=y$ for every $y:T$, so $e$ and $r$ exhibit an equivalence $\operatorname{Bool}\simeq T$.
By the *Univalence Axiom*, this equivalence determines a path
\begin{align*}
\operatorname{Bool}=T
\end{align*}
in the universe. Any property invariant under equivalence, such as being a type with exactly two elements, can be transported along this path from $\operatorname{Bool}$ to $T$. Constructor-sensitive computation is different: a boolean eliminator reduces by its clauses for $\operatorname{true}$ and $\operatorname{false}$, while the eliminator for $T$ reduces by its clauses for $a$ and $b$. Transport carries terms and properties across the univalent path, but it does not make the constructor names or their reduction rules definitionally identical.
[/example]
Univalence points toward a broader interpretation of identity types. If equality between types behaves like equivalence, then equality between terms should also be read as carrying structured information rather than as a mere syntactic token. This raises the problem of giving identity types a semantics rich enough to account for paths, paths between paths, and higher coherence. The next principle records the homotopy-theoretic answer to that problem.
[remark: Identity As Path Principle]
In the homotopy interpretation of type theory, the identity type $x=y$ of a type $A$ is interpreted as the type of paths from $x$ to $y$ in the space represented by $A$.
[/remark]
The interpretation assigns to each type a homotopy-theoretic object and to each term a point of that object. The formation rule for identity types is read as forming a path object between two points. Reflexivity corresponds to the constant path at a point, while the eliminator $J$ expresses path induction: to prove a property of all paths, it suffices to prove it for constant paths. Higher identity types then interpret homotopies between paths, matching the iterated structure present in spaces.
This path interpretation explains why identity types support symmetry, transitivity, and congruence but may carry more information than mere truth values. Its limitation is that it is an interpretation of the identity rules, not an assertion that every type behaves like an ordinary set. A set-like type has identity types that are propositions, so parallel paths carry no further information; a circle-like type can have nontrivial loops, so an equality $x=x$ may encode more than reflexivity. The principle also does not identify points merely because they lie in the same [connected component](/page/Connected%20Component) unless a path between them is part of the interpreted identity type.
For example, interpret a type as a [topological space](/page/Topological%20Space) with two disconnected points. The two points inhabit the same ambient type, but there is no path between them in the interpreted space, so the identity type between the corresponding terms has no path witness. Even in a [connected space](/page/Connected%20Space), membership in a connected component is weaker than a specified path in the interpretation used for identity. The path principle therefore depends on actual path data, not on a coarse statement that two points belong to the same broad region.
## Verified Mathematics and Certified Programming
The final theme asks why these extensions matter outside the metatheory. Dependent type theory is not only a formal calculus; it is also the foundation of proof assistants and certified programs. The guiding problem is how to express mathematical theorems, program specifications, and correctness proofs in one typed language.
[definition: Certified Program]
A certified program is a program term paired with a type expressing its specification.
[/definition]
The Curry-Howard correspondence makes specifications into types, so a program specification can be treated as an ordinary target type for construction. Type checking then verifies the program and its proof obligations simultaneously, relative to the type theory and axioms in use. Many useful specifications ask not merely for a value, but for a value together with evidence that it has a required property. Dependent pair types provide the standard way to package those two components without separating the program from its certificate.
[definition: Specification As Sigma Type]
Given a type $A:\mathcal{U}$ and a predicate $P:A\to\mathcal{U}$, the dependent pair type
\begin{align*}
\sum_{x:A}P(x)
\end{align*}
represents objects $x:A$ equipped with proofs of $P(x)$.
[/definition]
This pattern is common in verified programming. The first projection is the program or data, and the second projection is the certificate that it meets the stated property.
[example: Verified Sorting Specification]
Let $\operatorname{List}(\mathbb{N})$ be the type of lists of natural numbers. Write $\operatorname{Sorted}(ys)$ for the proposition that the entries of $ys$ occur in nondecreasing order, and write $\operatorname{Perm}(xs,ys)$ for the proposition that $ys$ contains exactly the same entries as $xs$ with the same multiplicities. A certified sorting function has target type
\begin{align*}
\prod_{xs:\operatorname{List}(\mathbb{N})}\sum_{ys:\operatorname{List}(\mathbb{N})}\bigl(\operatorname{Sorted}(ys)\times\operatorname{Perm}(xs,ys)\bigr).
\end{align*}
Thus, for an input $xs$, the first component of the result is the computed list $ys$, and the second component is a pair whose first projection proves $\operatorname{Sorted}(ys)$ and whose second projection proves $\operatorname{Perm}(xs,ys)$.
For insertion sort, the recursive proof proceeds by induction on $xs$. In the empty case, take
\begin{align*}
ys:=[].
\end{align*}
The empty list is sorted by the definition of $\operatorname{Sorted}$, and it is a permutation of itself by the reflexivity clause for $\operatorname{Perm}$, so the certified output is
\begin{align*}
\bigl([],(\operatorname{sorted}_{[]},\operatorname{perm\_refl}_{[]})\bigr):\sum_{ys:\operatorname{List}(\mathbb{N})}\bigl(\operatorname{Sorted}(ys)\times\operatorname{Perm}([],ys)\bigr).
\end{align*}
In the successor case, write the input as $x::xs$. The induction hypothesis supplies a certified sorted version of $xs$:
\begin{align*}
(ys,(s,p)):\sum_{zs:\operatorname{List}(\mathbb{N})}\bigl(\operatorname{Sorted}(zs)\times\operatorname{Perm}(xs,zs)\bigr).
\end{align*}
Here $ys:\operatorname{List}(\mathbb{N})$, $s:\operatorname{Sorted}(ys)$, and $p:\operatorname{Perm}(xs,ys)$. Define the new output list by inserting $x$ into the sorted list $ys$:
\begin{align*}
zs:=\operatorname{insert}(x,ys).
\end{align*}
The insertion-sortedness lemma has type
\begin{align*}
\prod_{x:\mathbb{N}}\prod_{ys:\operatorname{List}(\mathbb{N})}\operatorname{Sorted}(ys)\to \operatorname{Sorted}(\operatorname{insert}(x,ys)).
\end{align*}
Applying it to $x$, $ys$, and $s$ gives
\begin{align*}
\operatorname{insert\_sorted}(x,ys,s):\operatorname{Sorted}(\operatorname{insert}(x,ys)).
\end{align*}
The insertion-permutation lemma has type
\begin{align*}
\prod_{x:\mathbb{N}}\prod_{xs,ys:\operatorname{List}(\mathbb{N})}\operatorname{Perm}(xs,ys)\to \operatorname{Perm}(x::xs,\operatorname{insert}(x,ys)).
\end{align*}
Applying it to $x$, $xs$, $ys$, and $p$ gives
\begin{align*}
\operatorname{insert\_perm}(x,xs,ys,p):\operatorname{Perm}(x::xs,\operatorname{insert}(x,ys)).
\end{align*}
Therefore the certified output for $x::xs$ is
\begin{align*}
\bigl(\operatorname{insert}(x,ys),(\operatorname{insert\_sorted}(x,ys,s),\operatorname{insert\_perm}(x,xs,ys,p))\bigr).
\end{align*}
This term has exactly the required dependent pair type, so the program does not merely return a sorted list; it returns the list together with the two certificates required by the specification.
[/example]
The sorting specification shows how dependent types can express both the computational output and the theorem that the output is correct. This pattern scales from programs to mathematics: a library theorem is a named type, and a verified proof is a term inhabiting that type. The general metatheoretic principle is that type checking itself performs proof verification.
[quotetheorem:9652]
[citeproof:9652]
This theorem is the conceptual reason proof assistants can use kernel checking as a small trusted core. Its hypotheses include the chosen dependent type theory, the typing context $\Gamma$, and any axioms admitted into the environment. If the theory includes univalence, quotient principles, classical choice, or postulated constants, type checking verifies derivability relative to those assumptions; it does not separately prove their consistency. There is also a distinction between elaboration and checking: tactics and automation may synthesize a term, but the trusted claim begins only when the kernel checks the elaborated term against the declared type and definitional equality rules.
A boundary case is a development that declares a constant $c:A$ as an axiom. The kernel may then check $c:A$, and by Curry-Howard this is a proof of $A$ relative to the context containing that axiom. It is not a construction of $A$ from the empty context, nor evidence that the added axiom is consistent with the rest of the theory. Another boundary is an untrusted tactic that prints a proposed proof script: until elaboration produces a term and the kernel checks it against $A$, the script itself is not the verified proof object covered by the theorem.
[remark: Practical Trusted Kernels]
In systems such as Lean, Agda, and Coq, user-facing proof scripts are not themselves the foundation. They elaborate into terms, and a kernel checks those terms against the typing rules. The reliability of the system depends on the kernel, the implementation of definitional equality, and any axioms imported into the development.
[/remark]
Modern type theory therefore sits between logic, programming language theory, category theory, and homotopy theory. Inductive families and W-types give precise data and recursion principles, coinduction handles infinite observation, quotient-like and higher inductive constructions support mathematical identification, and extensional axioms align the internal equality with mathematical practice. The course ends with the guiding lesson that changing the equality principles of type theory changes both its computational behaviour and its foundational interpretation.
## Beyond and Connections
Several natural continuations branch out from this course. For programming language theory, the preservation and progress results lead to richer type systems with products, sums, polymorphism, dependent types, effects, and operational semantics for real languages. For [proof theory](/page/Proof%20Theory), normalization and cut elimination explain why computation can be read as proof simplification. For category theory, initial algebras, final coalgebras, locally cartesian closed categories, and toposes give semantic settings for inductive, coinductive, and dependent types. For homotopy type theory, identity types, function extensionality, univalence, and higher inductive types connect type formation with spaces and paths.
Readers who want a more computational route should next study typed lambda calculi with products, sums, recursive types, and operational equivalence. Readers who want a more foundational route should study Martin-Lof type theory, categorical semantics of dependent type theory, and the basic constructions of homotopy type theory.
## References
- Henk Barendregt, Wil Dekkers, and Richard Statman, *Lambda Calculus with Types*.
- Benjamin C. Pierce, *Types and Programming Languages*.
- Per Martin-Lof, *Intuitionistic Type Theory*.
- The Univalent Foundations Program, *Homotopy Type Theory: Univalent Foundations of Mathematics*.
Contents
- Introduction
- Why Types Belong in the Syntax
- Judgments Before Truth Values
- Contexts As Structured Assumptions
- The Curry-Howard Reading
- Computation And Metatheory
- Dependent Types And Mathematical Structure
- Constructive Mathematics And Proof Assistants
- The Shape Of The Course
- 1. Lambda Calculus and Typed Syntax
- Untyped Lambda Terms and Conversion
- Simply Typed Lambda Calculus
- Small-Step Semantics and Type Safety
- 2. Natural Deduction and Curry-Howard
- Propositions as Types
- Constructors and Eliminators
- Beta Laws as Local Proof Reductions
- Eta Laws as Local Proof Expansions
- Normal Forms and Consistency
- 3. Products, Sums, and Inductive Data
- Finite Type Formers and Their Eliminators
- Natural Numbers, Recursion, and Induction
- Lists, Trees, and Structural Recursion
- Primitive Recursive Programs as Typed Eliminations
- 4. Dependent Judgments and Martin-Löf Type Theory
- Judgments Before Rules
- Dependent Contexts And Families Of Types
- Definitional Equality And Propositional Equality
- Structural Rules For Dependent Contexts
- Dependent Judgments As A Foundation For Martin-Löf Type Theory
- 5. Pi Types and Universal Quantification
- Dependent Function Types and Their Rules
- Universal Quantification Under Propositions As Types
- Non-Dependent Function Types As Constant Pi Types
- Connections With Semantics And Proof Engineering
- 6. Sigma Types, Existentials, and Structured Objects
- Dependent Pair Formation and Computation
- Elimination and Pattern Matching for Sigma Types
- Sigma Types as Constructive Existentials
- Records, Subtypes, and Mathematical Structures
- Conceptual Summary
- 7. Identity Types and Equality Reasoning
- Identity Types and the Eliminator J
- Path Induction and Transport Along Equality
- Equality of Dependent Pairs and Functions
- 8. Universes, Propositions, and Constructive Mathematics
- Universes and Universe Levels
- Propositions, Data Types, and Proof Relevance
- Constructive Negation and Decidability
- Excluded Middle and Choice Principles
- 9. Normalization, Canonicity, and Consistency
- Strong Normalization and Computation
- Canonicity for Natural Numbers
- Definitional and Propositional Computation
- Consistency from Normalization
- How the Three Results Fit Together
- 10. Type Theory in Proof Assistants
- Dependent Type Theories as Kernels
- Tactics, Terms, and Elaboration
- Inductive Types and Computation
- Recursion Checking and Universes
- 11. Categorical Semantics of Type Theory
- Contexts, Substitutions, and Types as Fibrations
- Cartesian Closed Categories for Simple Types
- Locally Cartesian Closed Categories and Comprehension Categories
- 12. Extensions and Modern Directions
- Inductive Families, W-Types, Coinduction, and Quotient-Like Constructions
- Function Extensionality, Univalence, and Homotopy Type Theory
- Verified Mathematics and Certified Programming
- Beyond and Connections
- References
Type Theory
Also known as: Type Theory, typed lambda calculus, dependent type theory, Martin-Lof type theory, constructive type theory, Curry-Howard correspondence, proof theory
Content
Problems
History
Created by admin on 6/22/2026 | Last updated on 6/22/2026
Prerequisites (0/6 completed)
Log in to track your prerequisite progress.
Prerequisites Graph
Interactive dependency map showing prerequisite concepts
Loading dependency graph...
Theorem
Definition
Current
Requires
Rate this page
★
★
★
★
★
Poor
Excellent