[proofplan]
We prove a cut-reduction lemma for a single final cut, ordered by the complexity of the cut formula and then by the heights of the two cut premises. Non-principal cuts are commuted upward across the last inference of one premise, decreasing the height component. Principal cuts are replaced by cuts on immediate subformulas, decreasing the formula-complexity component; in quantifier cases we first rename eigenvariables to avoid capture. Applying this lemma from the bottom upward removes all cuts from an arbitrary LK derivation while preserving the end-sequent.
[/proofplan]
custom_env
admin
[step:Fix the sequent calculus conventions and the reduction measure]
A sequent is a finite multiset expression $\Gamma \Rightarrow \Delta$, where $\Gamma$ and $\Delta$ are finite multisets of first-order formulas. The cut rule is
\begin{align*}
\frac{\Pi \Rightarrow \Lambda, A \qquad A, \Sigma \Rightarrow \Theta}{\Pi, \Sigma \Rightarrow \Lambda, \Theta}.
\end{align*}
For a formula $A$, define its degree $d(A) \in \mathbb{N}$ by $d(P)=0$ for atomic $P$ and
\begin{align*}
d(\neg B)&=d(B)+1,\\
d(B\wedge C)&=d(B)+d(C)+1,\\
d(B\vee C)&=d(B)+d(C)+1,\\
d(B\to C)&=d(B)+d(C)+1,\\
d(\forall x\,B)&=d(B)+1,\\
d(\exists x\,B)&=d(B)+1.
\end{align*}
For an LK derivation $\mathcal D$, let $h(\mathcal D)\in\mathbb N$ be its height, counted as the maximum number of inference instances on a branch. We order triples
\begin{align*}
(d(A), h(\mathcal D_1), h(\mathcal D_2))
\end{align*}
lexicographically, where $A$ is the cut formula and $\mathcal D_1,\mathcal D_2$ are the left and right cut-premise derivations.
A formula occurrence is principal in an inference if it is the formula occurrence introduced by that inference. By repeated renaming of bound variables and eigenvariables, assume throughout that no eigenvariable introduced in one premise occurs free in the other premise or in the final end-sequent.
We use the standard first-order LK rules with finite multiset contexts: initial sequents are $P,\Gamma \Rightarrow \Delta,P$ for atomic formulas $P$; the logical rules are the usual left and right rules for $\neg,\wedge,\vee,\to,\forall,\exists$; and weakening and contraction are available on both sides. We shall use three standard admissibility facts for LK. First, weakening is height-preserving admissible: from a derivation of $\Gamma \Rightarrow \Delta$ one obtains a derivation of $\Gamma,\Gamma_0 \Rightarrow \Delta,\Delta_0$ for any finite multisets $\Gamma_0,\Delta_0$. Second, contraction is admissible on both sides: from $B,B,\Gamma \Rightarrow \Delta$ one obtains $B,\Gamma \Rightarrow \Delta$, and from $\Gamma \Rightarrow \Delta,B,B$ one obtains $\Gamma \Rightarrow \Delta,B$. Third, eigenvariables may be renamed provided freshness side conditions are preserved. In $\forall R$ and $\exists L$, the eigenvariable is required not to occur free in the lower sequent. The substitution lemma for LK says that if a derivation uses eigenvariables fresh for a term $t$, then uniformly substituting $t$ for such an eigenvariable preserves every rule instance. In the local reduction below, primitive structural inferences are treated separately: weakening on a cut occurrence is deleted, contraction not involving the cut occurrence is commuted as an ordinary non-principal rule, and contraction of the cut occurrence is reduced by replacing the contracted cut with the corresponding two smaller-height cuts followed by admissible contraction of the duplicated context.
For a cut instance $C$ whose cut formula is $A$ and whose immediate left and right premise derivations are $\mathcal D_1$ and $\mathcal D_2$, define its cut measure by
\begin{align*}
\mu(C) := (d(A),h(\mathcal D_1),h(\mathcal D_2)) \in \mathbb N^3.
\end{align*}
We order $\mathbb N^3$ lexicographically and order finite multisets of such triples by the induced multiset extension: replacing one element by finitely many strictly smaller elements gives a strictly smaller multiset. This multiset order is well-founded because the lexicographic order on $\mathbb N^3$ is well-founded.
[/step]
custom_env
admin
[step:Prove the local cut-reduction lemma][claim:Final cut reduction]
Let $\Pi,\Lambda,\Sigma,\Theta$ be finite multisets of formulas, let $A$ be a formula, let $\mathcal D_1$ be an LK derivation of $\Pi \Rightarrow \Lambda,A$, and let $\mathcal D_2$ be an LK derivation of $A,\Sigma \Rightarrow \Theta$. Let $C_0$ denote the cut instance with immediate premise derivations $\mathcal D_1$ and $\mathcal D_2$, so that $\mu(C_0)=(d(A),h(\mathcal D_1),h(\mathcal D_2))$. Then there is an LK derivation of $\Pi,\Sigma \Rightarrow \Lambda,\Theta$ such that every newly introduced cut instance $C$ satisfies $\mu(C)<\mu(C_0)$.
[/claim]
[proof]
We prove the claim by lexicographic induction on $(d(A),h(\mathcal D_1),h(\mathcal D_2))$.
First handle the initial-sequent cases. If $\mathcal D_1$ has height $0$, then its end-sequent $\Pi \Rightarrow \Lambda,A$ is an initial sequent $P,\Gamma \Rightarrow \Delta,P$ with distinguished atomic formula $P$. If the displayed occurrence of $A$ is not the distinguished succedent occurrence of that axiom, then the distinguished occurrence of $P$ on the right lies in $\Lambda$, and the matching occurrence of $P$ on the left lies in $\Pi$. Hence the desired conclusion $\Pi,\Sigma \Rightarrow \Lambda,\Theta$ is itself an initial sequent, with the extra formulas $\Sigma$ and $\Theta$ supplied as side context. If the displayed occurrence of $A$ is the distinguished succedent occurrence, then $A=P$ and the matching distinguished antecedent occurrence of $P$ lies in $\Pi$; weaken $\mathcal D_2$ by the remaining formulas of $\Pi$ and by $\Lambda$, and then use contraction on the antecedent occurrence of $A$ to obtain $\Pi,\Sigma \Rightarrow \Lambda,\Theta$. The case $h(\mathcal D_2)=0$ is dual. If the displayed antecedent occurrence of $A$ is not distinguished, the distinguished atomic formula already occurs on both sides outside the displayed $A$, so $\Pi,\Sigma \Rightarrow \Lambda,\Theta$ is an initial sequent. If the displayed antecedent occurrence of $A$ is distinguished, then $A$ occurs in $\Theta$, so weakening and contraction applied to $\mathcal D_1$ give $\Pi,\Sigma \Rightarrow \Lambda,\Theta$. These transformations introduce no cuts.
Now suppose that neither premise derivation is an initial sequent and that the last inference of $\mathcal D_1$ does not introduce the displayed occurrence of $A$ as a principal logical formula. If the last inference is weakening that adds the displayed occurrence of $A$, delete that weakening and obtain the desired conclusion from the remaining derivation by weakening with $\Sigma$ and $\Theta$. If the last inference is contraction on a formula different from the displayed occurrence of $A$, commute the cut above that contraction, apply the induction hypothesis to the smaller premise derivation, and reapply contraction. If the last inference contracts two succedent occurrences of $A$ into the displayed occurrence, let $\mathcal E$ derive $\Pi \Rightarrow \Lambda,A,A$. First cut $\mathcal E$ against $\mathcal D_2$ on one occurrence of $A$; the left premise height has decreased. Then cut the resulting derivation against a fresh copy of $\mathcal D_2$ on the remaining occurrence of $A$; this is the standard contraction reduction, and the duplicated side contexts are finally contracted to obtain $\Pi,\Sigma \Rightarrow \Lambda,\Theta$. Both introduced cuts are justified by the induction hypothesis at the stage where their immediate unreduced cut premise above the contraction is $\mathcal E$, so their left-height component is strictly smaller than $h(\mathcal D_1)$. For any other non-principal logical rule instance $R$ with premises derived by $\mathcal E_1,\dots,\mathcal E_m$, apply the induction hypothesis to each premise of $R$ in which the succedent contains the carried occurrence of $A$, together with $\mathcal D_2$. The cut formula remains $A$, while the left height is smaller than $h(\mathcal D_1)$. Reapply the same rule $R$ below the resulting derivations. The rule $R$ is still a valid LK instance after adding the fixed multiset $\Sigma$ to each antecedent and $\Theta$ to each succedent, because weakening supplies the added context and the side conditions for $\forall R$ and $\exists L$ are preserved by the preliminary eigenvariable renaming.
The case in which the last inference of $\mathcal D_2$ does not introduce the displayed occurrence of $A$ as a principal logical formula is dual. Weakening that adds the displayed antecedent occurrence of $A$ is deleted. Contraction on a formula different from $A$ is commuted and then reapplied. If the last inference contracts two antecedent occurrences of $A$ into the displayed occurrence, cut $\mathcal D_1$ successively against the premise containing the two occurrences, then contract the duplicated side contexts; each introduced cut has the same formula and a strictly smaller right-height component before the contraction is reapplied. For any other non-principal logical inference with premise derivations $\mathcal F_1,\dots,\mathcal F_n$, apply the induction hypothesis to $\mathcal D_1$ and to each premise in which the antecedent contains the carried occurrence of $A$. The cut formula remains $A$, while the right height is smaller than $h(\mathcal D_2)$. Reapply the same inference below the reduced derivations; its side conditions are preserved by the same eigenvariable-renaming convention.
It remains to treat the case in which $A$ is principal in both last inferences. We inspect the outermost connective or quantifier of $A$.
If $A=B\wedge C$, the left premise ends with $\wedge R$ and the right premise ends with either $\wedge L_1$ or $\wedge L_2$. In the $\wedge L_1$ case the relevant derivations have the form
\begin{align*}
\frac{\mathcal E_1:\Pi \Rightarrow \Lambda,B \qquad \mathcal E_2:\Pi \Rightarrow \Lambda,C}{\Pi \Rightarrow \Lambda,B\wedge C}
\qquad
\frac{\mathcal F:B,\Sigma \Rightarrow \Theta}{B\wedge C,\Sigma \Rightarrow \Theta}.
\end{align*}
Replace the displayed cut by the cut of $\mathcal E_1$ against $\mathcal F$ on $B$. Since $d(B)<d(B\wedge C)$, the induction hypothesis gives the required derivation. The $\wedge L_2$ case uses $\mathcal E_2$ and cuts on $C$.
If $A=B\vee C$, the right premise has both branches for $\vee L$ and the left premise ends with either $\vee R_1$ or $\vee R_2$. In the $\vee R_1$ case the final inferences have the form
\begin{align*}
\frac{\mathcal E:\Pi \Rightarrow \Lambda,B}{\Pi \Rightarrow \Lambda,B\vee C}
\qquad
\frac{\mathcal F_1:B,\Sigma \Rightarrow \Theta \qquad \mathcal F_2:C,\Sigma \Rightarrow \Theta}{B\vee C,\Sigma \Rightarrow \Theta}.
\end{align*}
Replace the cut on $B\vee C$ by the cut of $\mathcal E$ against $\mathcal F_1$ on $B$. The $\vee R_2$ case is identical with $C$ and $\mathcal F_2$.
If $A=B\to C$, the left premise ends with $\to R$ and the right premise ends with $\to L$:
\begin{align*}
\frac{\mathcal E:B,\Pi \Rightarrow \Lambda,C}{\Pi \Rightarrow \Lambda,B\to C}
\qquad
\frac{\mathcal F_1:\Sigma \Rightarrow \Theta,B \qquad \mathcal F_2:C,\Sigma \Rightarrow \Theta}{B\to C,\Sigma \Rightarrow \Theta}.
\end{align*}
First cut $\mathcal F_1$ with $\mathcal E$ on $B$, obtaining a derivation of $\Pi,\Sigma \Rightarrow \Lambda,\Theta,C$. Then cut that derivation with $\mathcal F_2$ on $C$, obtaining $\Pi,\Sigma,\Sigma \Rightarrow \Lambda,\Theta,\Theta$. Finally apply contraction on the duplicated contexts. Both new cuts have formulas $B$ and $C$, whose degrees are strictly smaller than $d(B\to C)$.
If $A=\neg B$, the left premise ends with $\neg R$ and the right premise ends with $\neg L$:
\begin{align*}
\frac{\mathcal E:B,\Pi \Rightarrow \Lambda}{\Pi \Rightarrow \Lambda,\neg B}
\qquad
\frac{\mathcal F:\Sigma \Rightarrow \Theta,B}{\neg B,\Sigma \Rightarrow \Theta}.
\end{align*}
Cut $\mathcal F$ against $\mathcal E$ on $B$, producing $\Pi,\Sigma \Rightarrow \Lambda,\Theta$. The degree has decreased from $d(\neg B)$ to $d(B)$.
If $A=\forall x\,B(x)$, the left premise ends with $\forall R$ and the right premise ends with $\forall L$:
\begin{align*}
\frac{\mathcal E:\Pi \Rightarrow \Lambda,B(y)}{\Pi \Rightarrow \Lambda,\forall x\,B(x)}
\qquad
\frac{\mathcal F:B(t),\Sigma \Rightarrow \Theta}{\forall x\,B(x),\Sigma \Rightarrow \Theta},
\end{align*}
where $y$ is an eigenvariable not free in $\Pi,\Lambda$, and $t$ is a term free for $x$ in $B(x)$. By the preliminary renaming convention, $y$ is also fresh for $\Sigma,\Theta,t$. Substitute $t$ for $y$ throughout $\mathcal E$. The substitution lemma for LK rule instances gives a derivation $\mathcal E[t/y]$ of $\Pi \Rightarrow \Lambda,B(t)$. Cutting $\mathcal E[t/y]$ against $\mathcal F$ on $B(t)$ gives the desired conclusion, and $d(B(t))<d(\forall x\,B(x))$.
If $A=\exists x\,B(x)$, the left premise ends with $\exists R$ and the right premise ends with $\exists L$:
\begin{align*}
\frac{\mathcal E:\Pi \Rightarrow \Lambda,B(t)}{\Pi \Rightarrow \Lambda,\exists x\,B(x)}
\qquad
\frac{\mathcal F:B(y),\Sigma \Rightarrow \Theta}{\exists x\,B(x),\Sigma \Rightarrow \Theta},
\end{align*}
where $y$ is an eigenvariable not free in $\Sigma,\Theta$, and $t$ is free for $x$ in $B(x)$. Rename $y$ so that it is fresh for $\Pi,\Lambda,t$. Substitute $t$ for $y$ throughout $\mathcal F$, obtaining a derivation $\mathcal F[t/y]$ of $B(t),\Sigma \Rightarrow \Theta$. Cut $\mathcal E$ against $\mathcal F[t/y]$ on $B(t)$. The new cut formula has smaller degree.
Atomic formulas have no principal logical case. Thus the induction covers all possible final inferences, and the claim follows.
[/proof][/step]
custom_env
admin
[guided]The proof is a local normalisation argument for one cut placed at the bottom of a derivation. Here $\Pi,\Lambda,\Sigma,\Theta$ are finite multisets of formulas, $A$ is the cut formula, $\mathcal D_1$ derives $\Pi \Rightarrow \Lambda,A$, and $\mathcal D_2$ derives $A,\Sigma \Rightarrow \Theta$. We argue by induction on
\begin{align*}
(d(A),h(\mathcal D_1),h(\mathcal D_2)).
\end{align*}
First consider the base case in which one premise is an initial sequent. If $\mathcal D_1$ has height $0$, then $\Pi \Rightarrow \Lambda,A$ is an initial sequent $P,\Gamma \Rightarrow \Delta,P$. There are two different possibilities, and it is important not to confuse them. If the displayed occurrence of $A$ is not the distinguished succedent occurrence of the initial sequent, then the distinguished atomic formula $P$ already occurs in $\Pi$ on the left and in $\Lambda$ on the right. Therefore the desired sequent $\Pi,\Sigma \Rightarrow \Lambda,\Theta$ is itself an initial sequent, with $\Sigma$ and $\Theta$ merely added as side context. If the displayed occurrence of $A$ is the distinguished succedent occurrence, then $A=P$ and the matching occurrence of $P$ occurs in $\Pi$. In that case we weaken $\mathcal D_2$ by the remaining formulas of $\Pi$ and by $\Lambda$, and contraction removes the duplicated antecedent occurrence of $A$, yielding $\Pi,\Sigma \Rightarrow \Lambda,\Theta$. The case $h(\mathcal D_2)=0$ is dual. If the displayed antecedent occurrence of $A$ is not distinguished, the target sequent is already initial; if it is distinguished, then the matching occurrence of $A$ lies in $\Theta$, and weakening plus contraction applied to $\mathcal D_1$ gives the target sequent. No cuts are introduced in any initial-sequent case.
After the initial cases, there are two kinds of reductions.
First, suppose the cut formula is merely carried through the last inference of one premise. For a logical inference in $\mathcal D_1$ whose principal formula is not the displayed occurrence of $A$, the occurrence of $A$ appears unchanged in the relevant premises. We cut $\mathcal D_2$ against those smaller premise derivations. The cut formula is still $A$, but the height of the left premise has decreased. After those smaller cuts have been reduced by the induction hypothesis, we reapply the same last rule; the rule remains valid because LK rules permit the same surrounding contexts to be carried along and eigenvariables were renamed fresh in advance.
Structural rules require a separate check. Weakening that creates the displayed occurrence of $A$ can simply be deleted, because the opposite premise is no longer needed for that occurrence. Contraction on a formula other than $A$ commutes past the cut and is reapplied afterward. If contraction merges two displayed occurrences of $A$, we do not pretend there is a single carried occurrence in the premise. Instead, we cut against the premise containing both occurrences, once for the first occurrence and once for the second, and then contract the duplicated side contexts. These two cuts are the standard contraction reduction; at the point where each is introduced, the premise above the contraction has smaller height than the original contracted derivation, so the induction measure decreases.
The same analysis applies when the cut formula is not principal in the last inference of $\mathcal D_2$. Logical rules commute upward in the dual way, weakening on the displayed occurrence is deleted, contraction away from $A$ is commuted, and contraction of two antecedent occurrences of $A$ is replaced by two cuts followed by contraction. This is the mechanism that moves cuts upward until the cut formula becomes principal on both sides.
Now suppose the cut formula is principal on both sides. Then its outermost connective determines the reduction. If the cut formula is $B\wedge C$, the right-introduction of $B\wedge C$ on the left side has premises deriving both $B$ and $C$ in the succedent. The left-introduction of $B\wedge C$ on the right side uses either $B$ or $C$ in the antecedent. We replace the cut on $B\wedge C$ by the corresponding cut on $B$ or on $C$. Since $B$ and $C$ are immediate subformulas, their degrees are smaller.
For $B\vee C$, the roles are dual. The left-introduction of $B\vee C$ on the right side has one premise for $B$ and one for $C$, while the right-introduction on the left side chooses one disjunct. We keep the matching branch and cut only on that chosen subformula.
For $B\to C$, the standard reduction uses two smaller cuts. The last rules have the shape
\begin{align*}
\frac{B,\Pi \Rightarrow \Lambda,C}{\Pi \Rightarrow \Lambda,B\to C}
\qquad
\frac{\Sigma \Rightarrow \Theta,B \qquad C,\Sigma \Rightarrow \Theta}{B\to C,\Sigma \Rightarrow \Theta}.
\end{align*}
The first smaller cut eliminates $B$: cut $\Sigma \Rightarrow \Theta,B$ against $B,\Pi \Rightarrow \Lambda,C$, producing $\Pi,\Sigma \Rightarrow \Lambda,\Theta,C$. The second smaller cut eliminates $C$: cut this result against $C,\Sigma \Rightarrow \Theta$, producing $\Pi,\Sigma,\Sigma \Rightarrow \Lambda,\Theta,\Theta$. The contraction rule is then applied finitely many times to each formula whose multiplicity was doubled in $\Sigma$ or $\Theta$, yielding $\Pi,\Sigma \Rightarrow \Lambda,\Theta$. Both cuts are on proper subformulas of $B\to C$, so their first measure coordinate is strictly smaller than $d(B\to C)$.
For $\neg B$, the two principal rules move $B$ from one side of the sequent to the other. Cutting the two premises directly on $B$ replaces the cut on $\neg B$ by a strictly simpler cut.
For quantifiers, the only additional issue is variable capture. In the $\forall$ case, the principal rules have the shape
\begin{align*}
\frac{\Pi \Rightarrow \Lambda,B(y)}{\Pi \Rightarrow \Lambda,\forall x\,B(x)}
\qquad
\frac{B(t),\Sigma \Rightarrow \Theta}{\forall x\,B(x),\Sigma \Rightarrow \Theta}.
\end{align*}
The eigenvariable $y$ is chosen fresh. After renaming if necessary, it is fresh for the right premise and for the term $t$. Substituting $t$ for $y$ in the left premise derivation gives a derivation of $\Pi \Rightarrow \Lambda,B(t)$. We then cut on $B(t)$, whose degree is smaller than that of $\forall x\,B(x)$.
The $\exists$ case is dual. The left premise supplies an instance $B(t)$, and the right premise uses an eigenvariable instance $B(y)$. Rename $y$ fresh for the left side and substitute $t$ for $y$ in the right premise derivation. Then cut on $B(t)$.
Atomic cut formulas cannot be principal in a logical inference, because no logical rule introduces an atomic formula as a compound principal formula. Since the initial-sequent cases were handled at the start of the induction, an atomic cut is reduced by commuting it upward until one premise is initial, and that final initial case removes it without introducing any new cut. Every non-principal commutation replaces the original cut by cuts with the same formula and a smaller left or right height coordinate, while every principal logical reduction replaces it by cuts on proper subformulas. Hence every newly introduced cut has measure strictly smaller than the measure of the original final cut. This completes the local cut-reduction lemma.[/guided]
custom_env
admin
[step:Remove all cuts from an arbitrary LK derivation]
We prove by induction on the height of an LK derivation $\mathcal D$ of $\Gamma \Rightarrow \Delta$ that there exists a cut-free derivation of $\Gamma \Rightarrow \Delta$.
If $\mathcal D$ has height $0$, then $\mathcal D$ is an initial sequent derivation and contains no cut.
Assume the result for all derivations of height smaller than $h(\mathcal D)$. If the last inference of $\mathcal D$ is not a cut, apply the induction hypothesis to each premise derivation and reapply the same non-cut inference. This gives a cut-free derivation of the same end-sequent.
If the last inference of $\mathcal D$ is a cut,
\begin{align*}
\frac{\mathcal D_1:\Pi \Rightarrow \Lambda,A \qquad \mathcal D_2:A,\Sigma \Rightarrow \Theta}{\Pi,\Sigma \Rightarrow \Lambda,\Theta},
\end{align*}
then $h(\mathcal D_1)<h(\mathcal D)$ and $h(\mathcal D_2)<h(\mathcal D)$. By the outer induction hypothesis, replace $\mathcal D_1$ and $\mathcal D_2$ by cut-free derivations $\mathcal D_1'$ and $\mathcal D_2'$ of the same premiss sequents. Apply the final cut-reduction claim to $\mathcal D_1'$ and $\mathcal D_2'$. Let $M(\mathcal E)$ denote the finite multiset of measures $\mu(C)$ of all cut instances $C$ occurring in a derivation $\mathcal E$. The local reduction replaces one cut measure by finitely many strictly smaller cut measures and leaves all other cut measures unchanged. Therefore $M(\mathcal E)$ strictly decreases in the multiset extension of the lexicographic order on $\mathbb N^3$. Since this multiset order is well-founded, repeated local reduction terminates. At termination no cut instance remains, so we obtain a cut-free derivation of $\Pi,\Sigma \Rightarrow \Lambda,\Theta$.
Thus every LK derivation can be transformed into a cut-free derivation of the same end-sequent.
[/step]
custom_env
admin