[proofplan]
The proof has two parts: pathwise uniqueness and existence. For pathwise uniqueness, we take two solutions $X, X'$ with the same initial condition and driving Brownian motion, localize by stopping at the first exit time $S_n$ from $\{|X| \leq n\} \cap \{|X'| \leq n\}$, estimate $\mathbb{E}[(X_{t \wedge S_n} - X'_{t \wedge S_n})^2]$ using the Lipschitz condition and the Itô isometry, and conclude via Grönwall's lemma that the difference is zero. For existence, we construct a solution via Picard iteration: define $X^0_t = x$ and $X^{k+1} = \Phi(X^k)$, where $\Phi$ is the Itô map $\Phi(Y)_t = x + \int_0^t \sigma(s, Y_s) \, dW_s + \int_0^t b(s, Y_s) \, d\mathcal{L}^1(s)$, show the iterates converge uniformly in $L^2$ using iterated Lipschitz estimates and Doob's maximal inequality, and verify the limit solves the SDE.
[/proofplan]
[step:Set up the framework and reduce to the one-dimensional case]
We prove the theorem for $m = d = 1$ to simplify notation; the multi-dimensional case follows by the same argument with matrix-valued $\sigma$ and vector-valued $b$, replacing absolute values with norms. Let $K > 0$ be the Lipschitz constant:
\begin{align*}
|b(t, x) - b(t, y)| + |\sigma(t, x) - \sigma(t, y)| \leq K|x - y|
\end{align*}
for all $t \geq 0$ and $x, y \in \mathbb{R}$. Let $(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ be a filtered probability space satisfying the usual conditions, and let $W$ be an $(\mathcal{F}_t)$-Brownian motion.
[/step]
[step:Prove pathwise uniqueness by Grönwall's lemma]
Suppose $X$ and $X'$ are two continuous adapted solutions to $\mathcal{E}_x(\sigma, b)$ on the same probability space with the same Brownian motion $W$ and with $X_0 = X'_0 = x$. Define the stopping times
\begin{align*}
S_n = \inf\{t \geq 0 : |X_t| \geq n \text{ or } |X'_t| \geq n\}.
\end{align*}
Since $X$ and $X'$ are continuous, $S_n \uparrow \infty$ a.s. Fix $T > 0$ and $n \geq |x|$. For $t \in [0, T]$:
\begin{align*}
X_{t \wedge S_n} - X'_{t \wedge S_n} = \int_0^{t \wedge S_n} \bigl[\sigma(s, X_s) - \sigma(s, X'_s)\bigr] \, dW_s + \int_0^{t \wedge S_n} \bigl[b(s, X_s) - b(s, X'_s)\bigr] \, d\mathcal{L}^1(s).
\end{align*}
Using the inequality $(u + v)^2 \leq 2u^2 + 2v^2$ for real $u, v$ and taking expectations:
\begin{align*}
\mathbb{E}\bigl[(X_{t \wedge S_n} - X'_{t \wedge S_n})^2\bigr] &\leq 2\,\mathbb{E}\!\left[\left(\int_0^{t \wedge S_n} [\sigma(s, X_s) - \sigma(s, X'_s)] \, dW_s\right)^2\right] \\
&\quad + 2\,\mathbb{E}\!\left[\left(\int_0^{t \wedge S_n} [b(s, X_s) - b(s, X'_s)] \, d\mathcal{L}^1(s)\right)^2\right].
\end{align*}
For the stochastic integral term, the [Itô Isometry](/theorems/2091) gives
\begin{align*}
\mathbb{E}\!\left[\left(\int_0^{t \wedge S_n} [\sigma(s, X_s) - \sigma(s, X'_s)] \, dW_s\right)^2\right] = \mathbb{E}\!\left[\int_0^{t \wedge S_n} |\sigma(s, X_s) - \sigma(s, X'_s)|^2 \, d\mathcal{L}^1(s)\right].
\end{align*}
By the Lipschitz condition, $|\sigma(s, X_s) - \sigma(s, X'_s)|^2 \leq K^2 |X_s - X'_s|^2$, so this is bounded by
\begin{align*}
K^2 \int_0^t \mathbb{E}\bigl[|X_{s \wedge S_n} - X'_{s \wedge S_n}|^2\bigr] \, d\mathcal{L}^1(s).
\end{align*}
For the Lebesgue integral term, the Cauchy-Schwarz inequality in $L^2([0, t \wedge S_n], \mathcal{L}^1)$, together with $t \wedge S_n \leq T$, gives
\begin{align*}
\left(\int_0^{t \wedge S_n} |b(s, X_s) - b(s, X'_s)| \, d\mathcal{L}^1(s)\right)^2 \leq T \int_0^{t \wedge S_n} |b(s, X_s) - b(s, X'_s)|^2 \, d\mathcal{L}^1(s).
\end{align*}
By Lipschitz, this is bounded by $TK^2 \int_0^{t \wedge S_n} |X_s - X'_s|^2 \, d\mathcal{L}^1(s)$. Taking expectations and using Fubini (the integrand is non-negative):
\begin{align*}
\mathbb{E}\!\left[\left(\int_0^{t \wedge S_n} [b(s, X_s) - b(s, X'_s)] \, d\mathcal{L}^1(s)\right)^2\right] \leq TK^2 \int_0^t \mathbb{E}\bigl[|X_{s \wedge S_n} - X'_{s \wedge S_n}|^2\bigr] \, d\mathcal{L}^1(s).
\end{align*}
Combining, and defining $h(t) = \mathbb{E}[(X_{t \wedge S_n} - X'_{t \wedge S_n})^2]$:
\begin{align*}
h(t) \leq 2K^2(1 + T) \int_0^t h(s) \, d\mathcal{L}^1(s).
\end{align*}
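Since $h$ is non-negative, measurable, and bounded on $[0, T]$ (by $4n^2$, because both stopped processes take values in $[-n, n]$), Grönwall's lemma applies. The inequality has no additive constant, so the lemma gives $h(t) \leq 0 \cdot e^{2K^2(1+T)t} = 0$ for all $t \in [0, T]$.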
Therefore $\mathbb{E}[(X_{t \wedge S_n} - X'_{t \wedge S_n})^2] = 0$ for all $t \leq T$, which implies $X_{t \wedge S_n} = X'_{t \wedge S_n}$ a.s. Letting $n \to \infty$ (so $S_n \to \infty$) and then $T \to \infty$, we conclude $X_t = X'_t$ a.s. for all $t \geq 0$. Since both processes are continuous, a single null set works for all $t$: $\mathbb{P}(\sup_{t \geq 0} |X_t - X'_t| > 0) = 0$.
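As a sketch of why the integral inequality forces $h \equiv 0$, one can iterate it by hand, using only the bound $h \leq 4n^2$ on $[0, T]$ (which holds because both stopped processes are bounded by $n$). Writing $C = 2K^2(1+T)$ and $M = 4n^2$ (the symbol $M$ is shorthand introduced here for illustration), successive substitution gives
\begin{align*}
h(t) &\leq C \int_0^t M \, d\mathcal{L}^1(s) = M\,Ct, \\
h(t) &\leq C \int_0^t M\,Cs \, d\mathcal{L}^1(s) = M\,\frac{(Ct)^2}{2}, \\
&\;\;\vdots \\
h(t) &\leq M\,\frac{(Ct)^k}{k!} \longrightarrow 0 \quad (k \to \infty).
\end{align*}
Since the right-hand side tends to zero for every $t \in [0, T]$, $h$ vanishes on $[0, T]$.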
[guided]
The uniqueness argument is a standard application of Grönwall's lemma to the $L^2$ difference of two solutions. The localization by $S_n$ ensures all quantities are well-defined and finite.
Why do we need $S_n$? Without stopping, the Itô isometry requires $\mathbb{E}[\int_0^t |\sigma(s, X_s) - \sigma(s, X'_s)|^2 \, d\mathcal{L}^1(s)] < \infty$, which is not immediately clear since $X$ and $X'$ could be unbounded. By stopping at $S_n$, both $X^{S_n}$ and $(X')^{S_n}$ are bounded by $n$, so the Lipschitz bound gives $|\sigma(s, X_{s \wedge S_n}) - \sigma(s, X'_{s \wedge S_n})|^2 \leq K^2 |X_{s \wedge S_n} - X'_{s \wedge S_n}|^2 \leq 4K^2 n^2$, which is integrable.
The two estimates use different tools:
- The stochastic integral term uses the [Itô Isometry](/theorems/2091), which converts $\mathbb{E}[(\int H \, dW)^2]$ to $\mathbb{E}[\int H^2 \, d\mathcal{L}^1]$.
- The Lebesgue integral term uses the Cauchy-Schwarz inequality, which introduces the factor $T$. This factor is harmless because we work on a fixed interval $[0, T]$.
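After combining, we obtain $h(t) \leq C \int_0^t h(s) \, d\mathcal{L}^1(s)$ with $C = 2K^2(1+T)$. This is the classical Grönwall setup: a non-negative bounded function dominated by a constant multiple of its own integral. Grönwall's lemma states that $h(t) \leq a + C \int_0^t h(s) \, d\mathcal{L}^1(s)$ implies $h(t) \leq a e^{Ct}$; here $a = 0$ (there is no additive term because both solutions start at $x$), so $h \equiv 0$ on $[0, T]$.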
The passage from "a.s. for each fixed $t$" to "a.s. for all $t$ simultaneously" uses continuity: $\{X_t = X'_t \text{ for all } t \in \mathbb{Q}_+\}$ has probability $1$ (countable intersection of probability-$1$ events), and on this event, continuity implies $X_t = X'_t$ for all $t \geq 0$.
[/guided]
[/step]
[step:Construct a solution via Picard iteration]
Fix $T > 0$. Define the Picard iterates recursively:
\begin{align*}
X^0_t &= x, \\
X^{k+1}_t &= x + \int_0^t \sigma(s, X^k_s) \, dW_s + \int_0^t b(s, X^k_s) \, d\mathcal{L}^1(s), \quad k \geq 0.
\end{align*}
Each $X^k$ is a continuous adapted process (by induction, using the continuity of stochastic integrals and Lebesgue integrals of continuous integrands).
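For orientation, here is how the first iterates look in one concrete case. The choice $\sigma(t, y) = y$, $b \equiv 0$ (so $K = 1$) is an illustrative assumption, not part of the theorem; it corresponds to the equation $dX_t = X_t \, dW_t$.
\begin{align*}
X^0_t &= x, \\
X^1_t &= x + \int_0^t x \, dW_s = x\,(1 + W_t), \\
X^2_t &= x + \int_0^t x\,(1 + W_s) \, dW_s = x\left(1 + W_t + \tfrac{1}{2}\bigl(W_t^2 - t\bigr)\right),
\end{align*}
using $\int_0^t W_s \, dW_s = \tfrac{1}{2}(W_t^2 - t)$. These are the first partial sums of the expansion of $x \exp(W_t - t/2)$, the known solution of this equation, so in this example the iteration builds the solution term by term.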
[/step]
[step:Establish the contraction estimate for successive iterates]
We show by induction that for all $k \geq 0$:
\begin{align*}
\mathbb{E}\!\left[\sup_{0 \leq s \leq t} |X^{k+1}_s - X^k_s|^2\right] \leq \frac{(Ct)^k}{k!} \cdot \mathbb{E}\!\left[\sup_{0 \leq s \leq T} |X^1_s - X^0_s|^2\right]
\end{align*}
for a constant $C = C(K, T)$ and all $t \leq T$.
**Base case ($k = 0$).** We need $\mathbb{E}[\sup_{s \leq T} |X^1_s - x|^2] < \infty$. Since $X^1_t - x = \int_0^t \sigma(s, x) \, dW_s + \int_0^t b(s, x) \, d\mathcal{L}^1(s)$, Doob's $L^2$ maximal inequality applied to the martingale part and the Cauchy-Schwarz inequality applied to the drift give
\begin{align*}
\mathbb{E}\!\left[\sup_{s \leq T} |X^1_s - x|^2\right] \leq 8\int_0^T |\sigma(s, x)|^2 \, d\mathcal{L}^1(s) + 2T\int_0^T |b(s, x)|^2 \, d\mathcal{L}^1(s) < \infty,
\end{align*}
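where the finiteness uses the linear growth bound $|\sigma(s, x)| + |b(s, x)| \leq |\sigma(s, 0)| + |b(s, 0)| + K|x|$, whose right-hand side is bounded on $[0, T]$ for fixed $x$ provided $s \mapsto \sigma(s, 0)$ and $s \mapsto b(s, 0)$ are bounded on $[0, T]$ (for instance, when $\sigma$ and $b$ are continuous).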
**Inductive step.** For $k \geq 0$:
\begin{align*}
X^{k+2}_t - X^{k+1}_t = \int_0^t [\sigma(s, X^{k+1}_s) - \sigma(s, X^k_s)] \, dW_s + \int_0^t [b(s, X^{k+1}_s) - b(s, X^k_s)] \, d\mathcal{L}^1(s).
\end{align*}
By estimates parallel to those in the uniqueness step (Doob's $L^2$ maximal inequality with constant $4$ followed by the Itô isometry for the martingale part, Cauchy-Schwarz for the drift part, and the Lipschitz condition throughout):
\begin{align*}
\mathbb{E}\!\left[\sup_{0 \leq u \leq t} |X^{k+2}_u - X^{k+1}_u|^2\right] &\leq 8K^2 \int_0^t \mathbb{E}[|X^{k+1}_s - X^k_s|^2] \, d\mathcal{L}^1(s) + 2TK^2 \int_0^t \mathbb{E}[|X^{k+1}_s - X^k_s|^2] \, d\mathcal{L}^1(s) \\
&\leq C \int_0^t \mathbb{E}\!\left[\sup_{0 \leq u \leq s} |X^{k+1}_u - X^k_u|^2\right] \, d\mathcal{L}^1(s),
\end{align*}
where $C = 2K^2(4 + T)$. Applying the inductive hypothesis:
\begin{align*}
\mathbb{E}\!\left[\sup_{0 \leq u \leq t} |X^{k+2}_u - X^{k+1}_u|^2\right] \leq C \cdot \frac{C^k}{k!} \cdot \mathbb{E}\!\left[\sup_{s \leq T} |X^1_s - x|^2\right] \int_0^t s^k \, d\mathcal{L}^1(s) = \frac{C^{k+1} t^{k+1}}{(k+1)!} \cdot \mathbb{E}\!\left[\sup_{s \leq T} |X^1_s - x|^2\right].
\end{align*}
This completes the induction.
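The two terms in the inductive estimate can be traced as follows. This is a sketch in which $M_t$ and $A_t$ (notation introduced here, not used elsewhere in the proof) denote the stochastic integral and the drift integral in the expression for $X^{k+2}_t - X^{k+1}_t$, and $M$ is assumed to be a square-integrable martingale, which follows from the inductive hypothesis:
\begin{align*}
\sup_{u \leq t} |M_u + A_u|^2 &\leq 2 \sup_{u \leq t} |M_u|^2 + 2 \sup_{u \leq t} |A_u|^2, \\
\mathbb{E}\!\left[\sup_{u \leq t} |M_u|^2\right] &\leq 4\,\mathbb{E}\bigl[M_t^2\bigr] = 4\,\mathbb{E}\!\left[\int_0^t |\sigma(s, X^{k+1}_s) - \sigma(s, X^k_s)|^2 \, d\mathcal{L}^1(s)\right] \leq 4K^2 \int_0^t \mathbb{E}\bigl[|X^{k+1}_s - X^k_s|^2\bigr] \, d\mathcal{L}^1(s), \\
\sup_{u \leq t} |A_u|^2 &\leq t \int_0^t |b(s, X^{k+1}_s) - b(s, X^k_s)|^2 \, d\mathcal{L}^1(s) \leq TK^2 \int_0^t |X^{k+1}_s - X^k_s|^2 \, d\mathcal{L}^1(s).
\end{align*}
Multiplying the last two bounds by $2$ produces the coefficients $8K^2$ and $2TK^2$, whose sum is $C = 2K^2(4 + T)$.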
[guided]
The Picard iteration is a fixed-point argument in the space of adapted continuous processes with the $L^2$-supremum norm. Why does the factorial $k!$ in the denominator appear? This is the key feature that makes Picard iteration work even when the "Lipschitz constant" $C$ of the Itô map exceeds $1$.
Consider the analogous deterministic ODE setting: if $\Phi(x)_t = x_0 + \int_0^t f(s, x_s) \, d\mathcal{L}^1(s)$ with $f$ Lipschitz in $x$ with constant $L$, then
\begin{align*}
\sup_{s \leq t} |\Phi^{k+1}(x)_s - \Phi^k(x)_s| \leq \frac{(Lt)^k}{k!} \sup_{s \leq T} |\Phi(x)_s - x_0|.
\end{align*}
The factorial comes from iterating the integral: $\int_0^t \int_0^{s_1} \cdots \int_0^{s_{k-1}} d\mathcal{L}^1(s_k) \cdots d\mathcal{L}^1(s_1) = t^k/k!$.
In the stochastic case, we get the same structure. The Doob maximal inequality gives us an extra constant (replacing $\mathbb{E}[|\int H \, dW|^2]$ with $\mathbb{E}[\sup |\int H \, dW|^2]$ costs a factor of $4$), but this only affects the constant $C$, not the factorial decay.
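The factorial decay makes the bounds summable regardless of the value of $C$: $\sum_{k=0}^\infty \frac{(CT)^k}{k!} = e^{CT} < \infty$, and the series of square roots $\sum_{k=0}^\infty \frac{(CT)^{k/2}}{(k!)^{1/2}}$, which controls the $L^2$-supremum norms of the increments, converges as well.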
[/guided]
[/step]
[step:Show the Picard iterates converge uniformly in $L^2$ to a solution]
The estimate from the previous step gives
\begin{align*}
\sum_{k=0}^\infty \mathbb{E}\!\left[\sup_{0 \leq s \leq T} |X^{k+1}_s - X^k_s|^2\right]^{1/2} \leq \sum_{k=0}^\infty \frac{(CT)^{k/2}}{(k!)^{1/2}} \cdot \mathbb{E}\!\left[\sup_{s \leq T} |X^1_s - x|^2\right]^{1/2} < \infty,
\end{align*}
since $(CT)^{k/2} / (k!)^{1/2} \to 0$ faster than any geometric sequence. By the Chebyshev inequality,
\begin{align*}
\sum_{k=0}^\infty \mathbb{P}\!\left(\sup_{s \leq T} |X^{k+1}_s - X^k_s| \geq 2^{-k}\right) \leq \sum_{k=0}^\infty 4^k \, \mathbb{E}\!\left[\sup_{s \leq T} |X^{k+1}_s - X^k_s|^2\right] < \infty,
\end{align*}
so by the Borel-Cantelli lemma, $\sup_{s \leq T} |X^{k+1}_s - X^k_s| < 2^{-k}$ for all sufficiently large $k$, a.s. Therefore the telescoping series $X^k = x + \sum_{j=0}^{k-1}(X^{j+1} - X^j)$ converges uniformly on $[0, T]$ a.s. to a continuous adapted limit
\begin{align*}
X_t = \lim_{k \to \infty} X^k_t, \quad t \in [0, T].
\end{align*}
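The two summability claims used above can be checked directly from the contraction estimate. As a sketch, write $a_k = (CT)^{k/2}/(k!)^{1/2}$ and $m = \mathbb{E}[\sup_{s \leq T} |X^1_s - x|^2]$ (both symbols are shorthand introduced here):
\begin{align*}
\frac{a_{k+1}}{a_k} &= \left(\frac{CT}{k+1}\right)^{1/2} \longrightarrow 0 \quad (k \to \infty), \\
\sum_{k=0}^\infty 4^k \, \mathbb{E}\!\left[\sup_{s \leq T} |X^{k+1}_s - X^k_s|^2\right] &\leq m \sum_{k=0}^\infty \frac{(4CT)^k}{k!} = m\, e^{4CT}.
\end{align*}
The first line gives $\sum_k a_k < \infty$ by the ratio test; the second is the bound that the Borel-Cantelli argument requires.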
We verify $X$ solves the SDE. By the Lipschitz condition:
\begin{align*}
|\sigma(s, X^k_s) - \sigma(s, X_s)|^2 \leq K^2 |X^k_s - X_s|^2
\end{align*}
for all $s \in [0, T]$. Taking expectations and integrating:
\begin{align*}
\mathbb{E}\!\left[\int_0^t |\sigma(s, X^k_s) - \sigma(s, X_s)|^2 \, d\mathcal{L}^1(s)\right] \leq K^2 \int_0^t \mathbb{E}\bigl[|X^k_s - X_s|^2\bigr] \, d\mathcal{L}^1(s) \leq K^2 T \, \mathbb{E}\!\left[\sup_{s \leq T} |X^k_s - X_s|^2\right].
\end{align*}
The right-hand side tends to zero because $\mathbb{E}[\sup_{s \leq T} |X^k_s - X_s|^2] \to 0$: by the summability of $\mathbb{E}[\sup_{s \leq T} |X^{k+1}_s - X^k_s|^2]^{1/2}$ established above and the triangle inequality, the iterates $X^k$ are Cauchy in the $L^2$-supremum norm, and their limit in that norm coincides a.s. with the a.s. uniform limit $X$. The [Itô Isometry](/theorems/2091) then gives
\begin{align*}
\mathbb{E}\!\left[\left(\int_0^t \sigma(s, X^k_s) \, dW_s - \int_0^t \sigma(s, X_s) \, dW_s\right)^2\right] = \mathbb{E}\!\left[\int_0^t |\sigma(s, X^k_s) - \sigma(s, X_s)|^2 \, d\mathcal{L}^1(s)\right] \to 0,
\end{align*}
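so the stochastic integrals converge in $L^2$, and hence a.s. along a subsequence. Similarly, $\int_0^t b(s, X^k_s) \, d\mathcal{L}^1(s) \to \int_0^t b(s, X_s) \, d\mathcal{L}^1(s)$ a.s. by dominated convergence on the Lebesgue integral (the Lipschitz bound and uniform convergence $X^k \to X$ a.s. provide the dominator). Passing to the limit in the Picard recursion along this subsequence gives, a.s. for each fixed $t \in [0, T]$ and hence, by continuity of both sides in $t$, a.s. for all $t \in [0, T]$ simultaneously: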
\begin{align*}
X_t = x + \int_0^t \sigma(s, X_s) \, dW_s + \int_0^t b(s, X_s) \, d\mathcal{L}^1(s).
\end{align*}
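The convergence of the drift term can also be seen from a direct bound, without invoking dominated convergence. As a sketch, using only the Lipschitz condition and the a.s. uniform convergence $X^k \to X$ on $[0, T]$:
\begin{align*}
\left|\int_0^t b(s, X^k_s) \, d\mathcal{L}^1(s) - \int_0^t b(s, X_s) \, d\mathcal{L}^1(s)\right| \leq K \int_0^t |X^k_s - X_s| \, d\mathcal{L}^1(s) \leq KT \sup_{s \leq T} |X^k_s - X_s| \longrightarrow 0 \quad \text{a.s.}
\end{align*}
The bound is uniform in $t \in [0, T]$, so the drift integrals converge uniformly on $[0, T]$ a.s.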
[/step]
[step:Extend to a global solution by concatenation]
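The construction above produces a strong solution on $[0, T]$, and $X_T$ is square-integrable by the $L^2$-supremum convergence. Applying the same argument on $[T, 2T]$, $[2T, 3T]$, etc. (each time using $X_T, X_{2T}, \ldots$ as the initial condition, which is square-integrable and measurable with respect to $\mathcal{F}_T, \mathcal{F}_{2T}, \ldots$; the estimates carry over because $C$ depends only on $K$ and $T$), we obtain a solution on each interval that starts where the previous one ends. Concatenating these pieces gives a continuous adapted process that solves the SDE on $[0, \infty)$, that is, a strong solution. Uniqueness of this global solution follows from the pathwise uniqueness established in the second step: any two solutions agree on $[0, T]$ for every $T$, hence on $[0, \infty)$.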
[/step]