[proofplan]
We prove first that one cycle-canceling augmentation preserves feasibility and flow value. The augmentation is a circulation: it has zero net imbalance at every vertex, so conservation and the $s$-$t$ value are unchanged, while the residual capacity bounds ensure that no original arc violates its lower or upper capacity constraint. Finally, for a terminal flow, we prove optimality directly: the difference between any competing feasible flow of the same value and the terminal flow decomposes into residual directed cycles, and absence of negative residual cycles forces the cost difference to be nonnegative.
[/proofplan]
[step:Write one residual cycle augmentation as a circulation on the original arcs]
Let $f:A\to\mathbb{R}_{\ge 0}$ be a feasible $s$-$t$ flow of value $b$. For an arc $a=(x,y)\in A$, the residual network contains the forward residual arc $(x,y)$ with residual capacity $u(a)-f(a)$ and residual cost $c(a)$ when $u(a)-f(a)>0$, and the backward residual arc $(y,x)$ with residual capacity $f(a)$ and residual cost $-c(a)$ when $f(a)>0$.
Let $\Gamma$ be a directed cycle in the residual network, understood in the standard simple sense: no residual arc occurs more than once along $\Gamma$. Let $\delta>0$ be no larger than the residual capacity of every residual arc in $\Gamma$. Define the signed increment $h:A\to\mathbb{R}$ by
\begin{align*}
h(a)
=
\delta\,N_+(a)-\delta\,N_-(a),
\end{align*}
where $N_+(a)$ is the number of times $\Gamma$ uses the forward residual arc corresponding to $a$, and $N_-(a)$ is the number of times $\Gamma$ uses the backward residual arc corresponding to $a$. Define the augmented flow $f':A\to\mathbb{R}$ by
\begin{align*}
f'(a)=f(a)+h(a)
\end{align*}
for every $a\in A$.
[/step]
[step:Use residual capacity bounds to preserve arc feasibility]
For each $a\in A$, simplicity of $\Gamma$ gives $N_+(a),N_-(a)\in\{0,1\}$. If $N_+(a)=1$, then the forward residual arc of $a$ lies in $\Gamma$, and the choice of $\delta$ gives
\begin{align*}
\delta N_+(a)\le u(a)-f(a).
\end{align*}
If $N_+(a)=0$, the same inequality holds because its left-hand side is $0$. Likewise, if $N_-(a)=1$, then the backward residual arc of $a$ lies in $\Gamma$, and the choice of $\delta$ gives
\begin{align*}
\delta N_-(a)\le f(a),
\end{align*}
while if $N_-(a)=0$ the inequality is immediate. Therefore
\begin{align*}
0\le f(a)-\delta N_-(a)\le f'(a)\le f(a)+\delta N_+(a)\le u(a).
\end{align*}
Thus $0\le f'(a)\le u(a)$ for every $a\in A$.
[/step]
[step:Use the cycle property to preserve conservation and value]
For each vertex $v\in V$, define the net imbalance of a function $q:A\to\mathbb{R}$ at $v$ by
\begin{align*}
\operatorname{div} q(v)
=
\sum_{\substack{a=(v,w)\in A}} q(a)
-
\sum_{\substack{a=(w,v)\in A}} q(a).
\end{align*}
The residual cycle $\Gamma$ enters and leaves every vertex the same number of times, so the increment $h$ satisfies
\begin{align*}
\operatorname{div} h(v)=0
\end{align*}
for every $v\in V$. Since $f'=f+h$, we get
\begin{align*}
\operatorname{div} f'(v)=\operatorname{div} f(v)+\operatorname{div} h(v)=\operatorname{div} f(v)
\end{align*}
for every $v\in V$. Hence flow conservation at all vertices in $V\setminus\{s,t\}$ is preserved, and the value at $s$ remains
\begin{align*}
\operatorname{div} f'(s)=\operatorname{div} f(s)=b.
\end{align*}
Therefore each augmentation produces another feasible $s$-$t$ flow of value $b$.
[guided]
The reason cycle augmentation preserves the value is that the update is not an $s$-$t$ path augmentation. It is a circulation. To make this precise, define, for any function $q:A\to\mathbb{R}$, the net imbalance at a vertex $v\in V$ by
\begin{align*}
\operatorname{div} q(v)
=
\sum_{\substack{a=(v,w)\in A}} q(a)
-
\sum_{\substack{a=(w,v)\in A}} q(a).
\end{align*}
A feasible $s$-$t$ flow of value $b$ has $\operatorname{div} f(v)=0$ for every $v\in V\setminus\{s,t\}$ and $\operatorname{div} f(s)=b$. The increment $h:A\to\mathbb{R}$ created by augmenting around $\Gamma$ records how much flow is added on original arcs and how much is subtracted on original arcs. Since $\Gamma$ is a directed cycle, every time the cycle visits a vertex it contributes one incoming residual arc and one outgoing residual arc. Translating residual arcs back to signed changes on original arcs gives
\begin{align*}
\operatorname{div} h(v)=0
\end{align*}
for every $v\in V$.
Now use additivity of the divergence operator:
\begin{align*}
\operatorname{div} f'(v)
=
\operatorname{div}(f+h)(v)
=
\operatorname{div} f(v)+\operatorname{div} h(v)
=
\operatorname{div} f(v).
\end{align*}
Thus every conservation equation that held for $f$ also holds for $f'$. In particular, the source imbalance remains
\begin{align*}
\operatorname{div} f'(s)=\operatorname{div} f(s)=b,
\end{align*}
so the augmentation does not change the flow value. Combined with the residual capacity bounds, this proves that the new flow is again feasible and has value $b$.
[/guided]
[/step]
[step:Compute the cost change of one negative-cycle augmentation]
Define the cost of a flow $q:A\to\mathbb{R}$ by
\begin{align*}
\operatorname{cost}(q)=\sum_{a\in A} c(a)q(a).
\end{align*}
The residual cost of $\Gamma$ is
\begin{align*}
c_f(\Gamma)=\sum_{a\in A} c(a)N_+(a)-\sum_{a\in A} c(a)N_-(a).
\end{align*}
Using $f'=f+h$, linearity of the finite sum defining cost gives
\begin{align*}
\operatorname{cost}(f')-\operatorname{cost}(f)=\sum_{a\in A} c(a)h(a).
\end{align*}
Substituting the definition $h(a)=\delta N_+(a)-\delta N_-(a)$ gives
\begin{align*}
\sum_{a\in A} c(a)h(a)=\delta\sum_{a\in A} c(a)N_+(a)-\delta\sum_{a\in A} c(a)N_-(a).
\end{align*}
By the definition of $c_f(\Gamma)$, this is
\begin{align*}
\operatorname{cost}(f')-\operatorname{cost}(f)=\delta\,c_f(\Gamma).
\end{align*}
Thus if $\Gamma$ has negative residual cost, the augmentation strictly decreases the total cost.
[/step]
[step:Decompose the difference from any competitor into residual cycles]
Let $f:A\to\mathbb{R}_{\ge 0}$ be a terminal flow, so its residual network has no directed cycle of negative residual cost. Let $g:A\to\mathbb{R}_{\ge 0}$ be any feasible $s$-$t$ flow of value $b$, and define $r:A\to\mathbb{R}$ by
\begin{align*}
r(a)=g(a)-f(a)
\end{align*}
for every $a\in A$.
Since $f$ and $g$ have the same value and both satisfy conservation away from $s,t$, we have
\begin{align*}
\operatorname{div} r(v)=0
\end{align*}
for every $v\in V$. If $r(a)>0$, then $g(a)\le u(a)$ implies
\begin{align*}
0<r(a)\le u(a)-f(a),
\end{align*}
so the forward residual arc of $a$ exists with enough residual capacity. If $r(a)<0$, then $g(a)\ge 0$ implies
\begin{align*}
0<-r(a)\le f(a),
\end{align*}
so the backward residual arc of $a$ exists with enough residual capacity.
Thus $r$ determines a nonnegative circulation in the residual network: send amount $r(a)$ on the forward residual arc of $a$ when $r(a)>0$, and amount $-r(a)$ on the backward residual arc of $a$ when $r(a)<0$. We now justify the cycle decomposition used here. If the residual circulation determined by $r$ is zero, take $m=0$. Otherwise choose a residual arc carrying positive amount and follow outgoing residual arcs of positive amount. At every visited vertex, conservation of the residual circulation implies that positive incoming amount is balanced by positive outgoing amount, so the walk can continue. Since the residual network is finite, some vertex repeats; the segment between the two visits is a directed residual cycle. Let $\alpha>0$ be the minimum carried amount on the arcs of that cycle, subtract $\alpha$ times the cycle circulation, and repeat. Each subtraction removes at least one positive residual arc and never creates a negative amount, so the process terminates after finitely many repetitions. Hence there are directed residual cycles $\Gamma_1,\dots,\Gamma_m$ and numbers $\alpha_1,\dots,\alpha_m>0$ such that the residual circulation determined by $r$ is their weighted sum.
[/step]
[step:Use absence of negative residual cycles to compare costs]
For each cycle $\Gamma_i$, let $c_f(\Gamma_i)$ denote its residual cost. Since $f$ is terminal, the residual network has no negative-cost directed cycle, so
\begin{align*}
c_f(\Gamma_i)\ge 0
\end{align*}
for every $i\in\{1,\dots,m\}$. The cost difference between $g$ and $f$ is the residual cost of the circulation determined by $r$. By the definition of $r$, this first gives
\begin{align*}
\operatorname{cost}(g)-\operatorname{cost}(f)=\sum_{a\in A} c(a)(g(a)-f(a)).
\end{align*}
Using the cycle decomposition of the residual circulation, the same quantity is
\begin{align*}
\operatorname{cost}(g)-\operatorname{cost}(f)=\sum_{i=1}^m \alpha_i c_f(\Gamma_i).
\end{align*}
Since every coefficient $\alpha_i$ is positive and every residual cycle cost satisfies $c_f(\Gamma_i)\ge 0$, we obtain
\begin{align*}
\operatorname{cost}(g)-\operatorname{cost}(f)\ge 0.
\end{align*}
Therefore $\operatorname{cost}(f)\le \operatorname{cost}(g)$ for every feasible $s$-$t$ flow $g$ of value $b$. Hence the terminal flow $f$ is a minimum-cost feasible $s$-$t$ flow of value $b$.
[/step]