[proofplan]
We prove the invariant that after every augmentation the current flow is minimum-cost among flows of its current value, and its residual network has no negative-cost directed cycle. The main exchange argument compares an arbitrary competing flow of the new value with the old optimal flow; their difference decomposes in the old residual network into residual $s$-$t$ paths and directed cycles. Shortestness of the chosen path bounds every path component from below, while the no-negative-cycle hypothesis bounds every cycle component from below. This proves optimality after the augmentation, and then optimality itself rules out negative residual cycles.
[/proofplan]
[step:Define residual costs and the cost of an augmentation]
Let $G=(V,E)$ be the directed network, let $u:E\to \mathbb{R}_{\geq 0}$ be the capacity map, and let $c:E\to \mathbb{R}$ be the cost map. For a function $f:E\to \mathbb{R}_{\geq 0}$, define its net outflow map $\operatorname{bal}_f:V\to \mathbb{R}$ by
\begin{align*}
\operatorname{bal}_f(v)=\sum_{(v,w)\in E} f(v,w)-\sum_{(w,v)\in E} f(w,v).
\end{align*}
A feasible $s$-$t$ flow is a map $f:E\to \mathbb{R}_{\geq 0}$ satisfying
\begin{align*}
0\leq f(e)\leq u(e)
\end{align*}
for every $e\in E$ and $\operatorname{bal}_f(v)=0$ for every $v\in V\setminus\{s,t\}$. Define the value map $|\cdot|$ on feasible $s$-$t$ flows by
\begin{align*}
|f|=\operatorname{bal}_f(s)=\sum_{(s,v)\in E} f(s,v)-\sum_{(v,s)\in E} f(v,s).
\end{align*}
Define the cost map $C$ from feasible $s$-$t$ flows to $\mathbb{R}$ by
\begin{align*}
C(f)=\sum_{e\in E} c(e)f(e).
\end{align*}
Residual arcs are treated as labelled arcs: each residual arc is labelled by its underlying original arc $e\in E$ and by whether it is used in the forward or reverse orientation. Thus parallel residual arcs, or a reverse residual arc with the same ordered endpoints as an original forward arc, remain distinguishable. The residual network $G_f$ has a forward residual arc $e=(v,w)$ with residual capacity $u(e)-f(e)$ and residual cost $c(e)$ whenever $f(e)<u(e)$, and a reverse residual arc $e^{-1}=(w,v)$ with residual capacity $f(e)$ and residual cost $-c(e)$ whenever $f(e)>0$.
Let $\mathcal{P}_f$ denote the set of residual $s$-$t$ paths in $G_f$. Define the residual path-cost map $c_f:\mathcal{P}_f\to \mathbb{R}$ by
\begin{align*}
c_f(P)=\sum_{a\in P} c_f(a),
\end{align*}
where $c_f(a)$ denotes the residual cost of the residual arc $a$. If $f'$ is obtained from $f$ by augmenting an amount $\delta>0$ along $P$, then
\begin{align*}
|f'|=|f|+\delta,
\qquad
C(f')=C(f)+\delta c_f(P).
\end{align*}
Indeed, each forward residual arc on $P$ increases the corresponding original flow by $\delta$, and each reverse residual arc decreases the corresponding original flow by $\delta$, so the change in cost is exactly the sum of the corresponding residual costs multiplied by $\delta$.
[/step]
[step:Decompose the difference between a competitor and the old flow]
Let $f$ be a feasible flow of value $k$, and let $g$ be a feasible flow of value $k+\delta$, where $\delta>0$. Define a non-negative residual flow $h$ on the residual arcs of $G_f$ as follows. For each original arc $e=(v,w)\in E$, if $g(e)\geq f(e)$, place $h(e)=g(e)-f(e)$ on the forward residual arc $e=(v,w)$; if $g(e)<f(e)$, place $h(e^{-1})=f(e)-g(e)$ on the reverse residual arc $e^{-1}=(w,v)$.
This is well-defined because feasibility gives $0\leq f(e),g(e)\leq u(e)$, hence
\begin{align*}
g(e)-f(e)\leq u(e)-f(e)
\end{align*}
in the forward case and
\begin{align*}
f(e)-g(e)\leq f(e)
\end{align*}
in the reverse case. The net imbalance of $h$ is $+\delta$ at $s$, $-\delta$ at $t$, and $0$ at every vertex in $V\setminus\{s,t\}$. Its residual cost satisfies
\begin{align*}
\sum_{a\in E(G_f)} c_f(a)h(a)=C(g)-C(f).
\end{align*}
Because $G_f$ is finite, the usual flow-decomposition procedure applies to $h$: repeatedly extract an $s$-$t$ path carrying positive $h$-mass until no such path remains, and then decompose the remaining circulation into directed cycles. Each extracted path reduces the net imbalance at $s$ by exactly its extracted weight and does not change the imbalance at any vertex in $V\setminus\{s,t\}$. Since the initial imbalance at $s$ is $\delta$ and the process stops only after all positive $s$-$t$ flow has been removed, the sum of the extracted path weights is exactly $\delta$. Thus there exist residual $s$-$t$ paths $Q_1,\dots,Q_m$, positive weights $\alpha_1,\dots,\alpha_m$, directed residual cycles $\Gamma_1,\dots,\Gamma_r$, and positive weights $\beta_1,\dots,\beta_r$ such that
\begin{align*}
\sum_{i=1}^m \alpha_i=\delta
\end{align*}
and
\begin{align*}
C(g)-C(f)=\sum_{i=1}^m \alpha_i c_f(Q_i)+\sum_{j=1}^r \beta_j c_f(\Gamma_j).
\end{align*}
[guided]
The purpose of this step is to convert the comparison between two flows into a comparison between residual paths and residual cycles.
Let $f:E\to\mathbb{R}_{\geq 0}$ be a feasible flow of value $k$, and let $g:E\to\mathbb{R}_{\geq 0}$ be a feasible flow of value $k+\delta$, where $\delta>0$. We compare $g$ with $f$ arc by arc. For an original arc $e=(v,w)\in E$, if $g(e)\geq f(e)$, then $g$ uses $e$ more than $f$ does, so we record the amount $g(e)-f(e)$ on the forward residual arc $e=(v,w)$. If $g(e)<f(e)$, then $g$ uses $e$ less than $f$ does, so we record the amount $f(e)-g(e)$ on the reverse residual arc $e^{-1}=(w,v)$.
This gives a function $h:E(G_f)\to\mathbb{R}_{\geq 0}$. It is feasible in the residual network. In the forward case,
\begin{align*}
0\leq g(e)-f(e)\leq u(e)-f(e),
\end{align*}
because $g(e)\leq u(e)$. In the reverse case,
\begin{align*}
0\leq f(e)-g(e)\leq f(e),
\end{align*}
because $g(e)\geq 0$. Therefore $h$ never exceeds residual capacity.
The flow-balance of $h$ is exactly the difference between the flow-balance of $g$ and the flow-balance of $f$. Since $g$ has value $k+\delta$ and $f$ has value $k$, the residual flow $h$ sends net value $\delta$ from $s$ to $t$. Thus the imbalance of $h$ is $+\delta$ at $s$, $-\delta$ at $t$, and $0$ at every intermediate vertex.
The cost identity follows from the definition of residual costs. If $g(e)\geq f(e)$, the contribution is
\begin{align*}
c(e)(g(e)-f(e)).
\end{align*}
If $g(e)<f(e)$, the reverse residual arc has cost $-c(e)$ and carries $f(e)-g(e)$, so its contribution is
\begin{align*}
-c(e)(f(e)-g(e))=c(e)(g(e)-f(e)).
\end{align*}
Summing over all original arcs gives
\begin{align*}
\sum_{a\in E(G_f)} c_f(a)h(a)=\sum_{e\in E} c(e)(g(e)-f(e))=C(g)-C(f).
\end{align*}
Finally, since the residual network is finite, we decompose $h$. While there is positive flow from $s$ to $t$, choose a directed $s$-$t$ path using only arcs with positive $h$-value, subtract the minimum $h$-value on that path, and record that amount as a path weight. This process terminates because at least one positive arc is removed at every extraction. The total extracted path weight is $\delta$, since all net value from $s$ to $t$ must be removed. What remains has zero imbalance at every vertex, hence is a circulation; decomposing a finite directed circulation by repeatedly following positive arcs until a vertex repeats gives directed cycles. Therefore
\begin{align*}
C(g)-C(f)=\sum_{i=1}^m \alpha_i c_f(Q_i)+\sum_{j=1}^r \beta_j c_f(\Gamma_j),
\end{align*}
with $\sum_{i=1}^m\alpha_i=\delta$.
[/guided]
[/step]
[step:Use shortestness and absence of negative cycles to preserve optimality]
Assume that $f$ is minimum-cost among feasible flows of value $k$, that $G_f$ has no directed cycle of negative residual cost, and that $P$ is a minimum-cost residual $s$-$t$ path in $G_f$. Let $f'$ be obtained from $f$ by augmenting an integral amount $\delta>0$ along $P$.
Let $g$ be any feasible flow of value $k+\delta$. Applying the decomposition from the previous step to $g-f$, we obtain residual $s$-$t$ paths $Q_i$, directed residual cycles $\Gamma_j$, and positive weights $\alpha_i,\beta_j$ with $\sum_i\alpha_i=\delta$. Since $P$ is a minimum-cost residual $s$-$t$ path,
\begin{align*}
c_f(Q_i)\geq c_f(P)
\end{align*}
for every $i$. Since $G_f$ has no negative-cost directed residual cycle,
\begin{align*}
c_f(\Gamma_j)\geq 0
\end{align*}
for every $j$. Therefore the decomposition identity gives
\begin{align*}
C(g)-C(f)=\sum_{i=1}^m \alpha_i c_f(Q_i)+\sum_{j=1}^r \beta_j c_f(\Gamma_j).
\end{align*}
The path and cycle bounds imply
\begin{align*}
C(g)-C(f)\geq \sum_{i=1}^m \alpha_i c_f(P).
\end{align*}
Since $\sum_{i=1}^m\alpha_i=\delta$, this becomes
\begin{align*}
C(g)-C(f)\geq \delta c_f(P).
\end{align*}
Using the augmentation cost identity,
\begin{align*}
C(f')=C(f)+\delta c_f(P).
\end{align*}
Hence $C(g)\geq C(f')$ for every feasible flow $g$ of value $k+\delta$. Thus $f'$ is minimum-cost among feasible flows of value $k+\delta$.
[/step]
[step:Show optimality forbids negative residual cycles]
Let $f'$ be minimum-cost among feasible flows of value $k+\delta$. Suppose, for contradiction, that $G_{f'}$ contains a directed cycle $\Gamma$ with negative residual cost:
\begin{align*}
c_{f'}(\Gamma)<0.
\end{align*}
Let $\varepsilon>0$ be the minimum residual capacity of the arcs of $\Gamma$. Since $\Gamma$ is finite and every arc of $\Gamma$ is residual, $\varepsilon$ is positive. Augmenting $f'$ by $\varepsilon$ around $\Gamma$ preserves flow conservation at every vertex and therefore preserves the value $k+\delta$. Its cost changes by
\begin{align*}
\varepsilon c_{f'}(\Gamma)<0.
\end{align*}
This produces a feasible flow of the same value and strictly smaller cost than $f'$, contradicting the optimality of $f'$. Therefore $G_{f'}$ has no directed cycle of negative residual cost.
[/step]
[step:Induct over the augmentations until value $b$ is reached]
The zero flow has value $0$. Let $g:E\to\mathbb{R}_{\geq 0}$ be any feasible flow of value $0$. Applying the same finite decomposition argument to $g$ relative to the zero flow produces no $s$-$t$ path components, because the value difference is $0$, and decomposes $g$ into directed cycles in the residual network of the zero flow. The algorithmic hypothesis at the first iteration says this residual network has no negative-cost directed cycle, so every cycle component has non-negative cost. Hence $C(g)\geq 0=C(0)$, and the zero flow is minimum-cost among feasible flows of value $0$.
Assume inductively that before an iteration the current flow $f$ is minimum-cost among feasible flows of its value $k$ and that $G_f$ has no negative-cost directed cycle. The algorithm chooses a minimum-cost residual $s$-$t$ path $P$ and augments by a positive integral amount $\delta$ not exceeding the residual capacity of $P$ and not exceeding $b-k$. By the previous two steps, the new flow is minimum-cost among feasible flows of value $k+\delta$ and its residual network again has no negative-cost directed cycle.
Since capacities are integral and each augmentation amount is a positive integer, the flow value increases through integers and never exceeds $b$. When the algorithm stops at value $b$, the induction invariant gives that the final flow is minimum-cost among all feasible $s$-$t$ flows of value $b$. This is exactly the claimed correctness of the successive shortest augmenting path algorithm.
[/step]