[proofplan]
The proof proceeds in three stages. First, we establish Ito's formula for polynomial functions $f$ by induction on degree, using the integration by parts formula as the inductive engine: the base case is constant functions, and the inductive step writes $g(x) = x_k f(x)$ and applies integration by parts to the semimartingales $X^k$ and $f(X)$. Second, we reduce to bounded processes by introducing stopping times $T_n$ that keep both $|X|$ and the total variation of the finite variation part bounded. Third, we approximate a general $C^2$ function, together with its first and second derivatives, by polynomials on compact sets using the Weierstrass approximation theorem, and pass to the limit using the stochastic dominated convergence theorem for the martingale integrals and the classical dominated convergence theorem for the finite variation integrals.
[/proofplan]
[step:Prove the formula for constant functions (base case)]
If $f$ is a constant function, say $f(x) = c$ for all $x \in \mathbb{R}^p$, then $f(X_t) = c = f(X_0)$ for all $t$. All partial derivatives of $f$ vanish identically: $\frac{\partial f}{\partial x_i} = 0$ and $\frac{\partial^2 f}{\partial x_i \partial x_j} = 0$ for all $i, j$. The right-hand side of Ito's formula is therefore $f(X_0) + 0 + 0 = f(X_0) = f(X_t)$, so the formula holds.
[/step]
[step:Prove the inductive step: if the formula holds for $f$, it holds for $g(x) = x_k f(x)$]
Assume Ito's formula holds for some $f \in C^2(\mathbb{R}^p)$, and fix a coordinate index $k \in \{1, \ldots, p\}$. Define $g: \mathbb{R}^p \to \mathbb{R}$ by $g(x) = x_k f(x)$. The process $f(X)$ is a semimartingale by the inductive hypothesis, and $X^k$ is a continuous semimartingale by assumption. Apply the [Integration by Parts](/theorems/2098) formula to the pair $(X^k, f(X))$:
\begin{align*}
X^k_t f(X_t) - X^k_0 f(X_0) = \int_0^t X^k_s \, df(X_s) + \int_0^t f(X_s) \, dX^k_s + \langle X^k, f(X) \rangle_t.
\end{align*}
That is, $g(X_t) - g(X_0) = I_1 + I_2 + I_3$ where $I_1$, $I_2$, $I_3$ denote the three terms on the right.
[step:Expand $I_1 = \int_0^t X^k_s \, df(X_s)$ using the inductive hypothesis]
By the inductive hypothesis, $f(X)$ is a semimartingale with
\begin{align*}
df(X_s) = \sum_{i=1}^p \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac{1}{2}\sum_{i,j=1}^p \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
Substituting into $I_1$:
\begin{align*}
I_1 = \sum_{i=1}^p \int_0^t X^k_s \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac{1}{2}\sum_{i,j=1}^p \int_0^t X^k_s \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
[/step]
[step:Compute $I_3 = \langle X^k, f(X) \rangle_t$ using bilinearity of covariation]
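By the inductive hypothesis, $f(X)$ differs from $\sum_{i=1}^p \int_0^{\cdot} \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s$ by a constant plus a continuous finite variation process (the second-order term). That finite variation process contributes nothing to the covariation, since the covariation of a continuous semimartingale with a continuous finite variation process is zero. By bilinearity and the identity $\langle X^k, \int_0^{\cdot} H_s \, dX^i_s \rangle_t = \int_0^t H_s \, d\langle X^k, X^i \rangle_s$ for continuous adapted integrands $H$,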
\begin{align*}
I_3 = \langle X^k, f(X) \rangle_t = \sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, d\langle X^k, X^i \rangle_s.
\end{align*}
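As a quick sanity check of this covariation formula, here is a one-dimensional instance. It assumes, for illustration only, that $p = 1$, that $X = B$ is a standard Brownian motion (so $\langle B \rangle_s = s$), and that $f(x) = x^2$, for which Ito's formula is taken as already known:
\begin{align*}
\langle B, B^2 \rangle_t = \int_0^t f'(B_s) \, d\langle B \rangle_s = \int_0^t 2 B_s \, ds.
\end{align*}
The finite variation part of $B_t^2 = 2\int_0^t B_s \, dB_s + t$, namely $t$, plays no role in this covariation.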
[/step]
[step:Assemble the terms and verify they match the Ito formula for $g$]
Combining $I_1$, $I_2$, and $I_3$:
\begin{align*}
g(X_t) - g(X_0) &= \sum_{i=1}^p \int_0^t X^k_s \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac{1}{2}\sum_{i,j=1}^p \int_0^t X^k_s \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s \\
&\quad + \int_0^t f(X_s) \, dX^k_s + \sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, d\langle X^k, X^i \rangle_s.
\end{align*}
We now verify that this matches the Ito formula for $g(x) = x_k f(x)$. By the product rule for partial derivatives:
\begin{align*}
\frac{\partial g}{\partial x_i}(x) &= \delta_{ik} f(x) + x_k \frac{\partial f}{\partial x_i}(x), \\
\frac{\partial^2 g}{\partial x_i \partial x_j}(x) &= \delta_{ik} \frac{\partial f}{\partial x_j}(x) + \delta_{jk} \frac{\partial f}{\partial x_i}(x) + x_k \frac{\partial^2 f}{\partial x_i \partial x_j}(x).
\end{align*}
The Ito formula for $g$ asserts
\begin{align*}
g(X_t) - g(X_0) &= \sum_{i=1}^p \int_0^t \frac{\partial g}{\partial x_i}(X_s) \, dX^i_s + \frac{1}{2}\sum_{i,j=1}^p \int_0^t \frac{\partial^2 g}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
Substituting the expressions for $\frac{\partial g}{\partial x_i}$ and $\frac{\partial^2 g}{\partial x_i \partial x_j}$, the first-order term becomes
\begin{align*}
\sum_{i=1}^p \int_0^t \bigl(\delta_{ik} f(X_s) + X^k_s \frac{\partial f}{\partial x_i}(X_s)\bigr) \, dX^i_s = \int_0^t f(X_s) \, dX^k_s + \sum_{i=1}^p \int_0^t X^k_s \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s,
\end{align*}
which matches $I_2$ plus the first sum in $I_1$. The second-order term becomes
\begin{align*}
&\frac{1}{2}\sum_{i,j=1}^p \int_0^t \Bigl(\delta_{ik} \frac{\partial f}{\partial x_j}(X_s) + \delta_{jk} \frac{\partial f}{\partial x_i}(X_s) + X^k_s \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)\Bigr) d\langle X^i, X^j \rangle_s \\
&= \frac{1}{2}\sum_{j=1}^p \int_0^t \frac{\partial f}{\partial x_j}(X_s) \, d\langle X^k, X^j \rangle_s + \frac{1}{2}\sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, d\langle X^i, X^k \rangle_s \\
&\quad + \frac{1}{2}\sum_{i,j=1}^p \int_0^t X^k_s \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
Since $\langle X^i, X^j \rangle = \langle X^j, X^i \rangle$ (covariation is symmetric), the first two sums are identical and combine to $\sum_{i=1}^p \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, d\langle X^k, X^i \rangle_s$, which is $I_3$. The third sum is the second-order part of $I_1$. Therefore the two expressions agree term by term, confirming that Ito's formula holds for $g = x_k f$.
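To see the bookkeeping in the simplest setting, the following sketch specializes to $p = 1$ and $f(x) = x^m$ with $m \geq 2$ (an illustrative choice, not part of the proof), so that $g(x) = x^{m+1}$:
\begin{align*}
I_1 &= m \int_0^t X_s^{m} \, dX_s + \frac{m(m-1)}{2} \int_0^t X_s^{m-1} \, d\langle X \rangle_s, \\
I_2 &= \int_0^t X_s^{m} \, dX_s, \\
I_3 &= m \int_0^t X_s^{m-1} \, d\langle X \rangle_s, \\
I_1 + I_2 + I_3 &= (m+1) \int_0^t X_s^{m} \, dX_s + \frac{(m+1)m}{2} \int_0^t X_s^{m-1} \, d\langle X \rangle_s.
\end{align*}
The coefficients agree with $g'(x) = (m+1)x^m$ and $g''(x) = (m+1)m \, x^{m-1}$, using $\frac{m(m-1)}{2} + m = \frac{(m+1)m}{2}$.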
[guided]
The key observation is that the inductive step works by the product rule: passing from $f$ to $g(x) = x_k f(x)$ introduces exactly one new coordinate factor, and the integration by parts formula accounts for the interaction between $X^k$ and $f(X)$ through the covariation $\langle X^k, f(X) \rangle_t$. This covariation produces the extra first-derivative terms $\frac{\partial f}{\partial x_i}$ that, together with the $\delta_{ik}$ terms from differentiating $x_k f(x)$, assemble into the correct derivatives of $g$.
The algebra is bookkeeping: we have four groups of terms from $I_1 + I_2 + I_3$ (two from $I_1$, one from $I_2$, one from $I_3$), and they must match the first- and second-order terms of the Ito formula for $g$. The matching works because the derivatives of $g(x) = x_k f(x)$ satisfy:
- $\frac{\partial g}{\partial x_i} = \delta_{ik} f + x_k \frac{\partial f}{\partial x_i}$, which captures $I_2$ (the $\delta_{ik}$ term) and the stochastic integral part of $I_1$ (the $x_k \frac{\partial f}{\partial x_i}$ term).
- $\frac{\partial^2 g}{\partial x_i \partial x_j} = \delta_{ik}\frac{\partial f}{\partial x_j} + \delta_{jk}\frac{\partial f}{\partial x_i} + x_k \frac{\partial^2 f}{\partial x_i \partial x_j}$, whose first two summands (after using symmetry of covariation) produce $I_3$, and whose third summand produces the finite variation part of $I_1$.
[/guided]
[/step]
[step:Conclude the formula for all polynomials by induction on degree]
Every monomial $x_1^{a_1} \cdots x_p^{a_p}$ can be built from the constant function $1$ by repeatedly multiplying by coordinate functions $x_k$. The base case (constant functions) was verified in the first step, and the inductive step shows that if Ito's formula holds for $f$, it holds for $x_k f$ for any $k$. By induction on degree, the formula holds for all monomials. Both sides of Ito's formula are linear in $f$, because differentiation is linear and both the stochastic integrals and the Lebesgue-Stieltjes integrals are linear in the integrand. The formula therefore extends to all polynomials $f: \mathbb{R}^p \to \mathbb{R}$.
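As a consistency check on the polynomial case, consider $p = 2$ and $f(x) = x_1 x_2$ (chosen here purely for illustration), which is built from $1$ by multiplying by $x_1$ and then by $x_2$. Its only nonzero derivatives are $\frac{\partial f}{\partial x_1} = x_2$, $\frac{\partial f}{\partial x_2} = x_1$, and $\frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1} = 1$, so Ito's formula reads
\begin{align*}
X^1_t X^2_t &= X^1_0 X^2_0 + \int_0^t X^2_s \, dX^1_s + \int_0^t X^1_s \, dX^2_s + \frac{1}{2}\bigl(\langle X^1, X^2 \rangle_t + \langle X^2, X^1 \rangle_t\bigr) \\
&= X^1_0 X^2_0 + \int_0^t X^2_s \, dX^1_s + \int_0^t X^1_s \, dX^2_s + \langle X^1, X^2 \rangle_t.
\end{align*}
This is the integration by parts formula for the pair $(X^1, X^2)$ again, as it should be.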
[/step]
[/step]
[step:Reduce to bounded semimartingales via stopping times]
Write $X = M + A$ where $M = (M^1, \ldots, M^p)$ is a vector of continuous local martingales and $A = (A^1, \ldots, A^p)$ is a vector of continuous finite variation processes. For each $n \geq 1$, define the stopping time
\begin{align*}
T_n: \Omega &\to [0, \infty] \\
\omega &\mapsto \inf\!\left\{t \geq 0 : |X_t(\omega)| \geq n \text{ or } \sum_{i=1}^p \int_0^t |dA^i_s(\omega)| \geq n\right\}.
\end{align*}
Since $X$ is continuous and each $A^i$ has continuous finite variation paths, we have $T_n(\omega) \uparrow \infty$ for each $\omega$.
The stopped process $X^{T_n}_t = X_{t \wedge T_n}$ is a continuous semimartingale with $|X^{T_n}_t| \leq n$ for all $t$ and $\omega$, and the total variation of its finite variation part is bounded by $n$. Ito's formula for $X^{T_n}$ and the formula for $X$ are related by the stopped-integral identities: if the formula holds for $X^{T_n}$, then
\begin{align*}
f(X_{t \wedge T_n}) = f(X_0) + \sum_{i=1}^p \int_0^{t \wedge T_n} \frac{\partial f}{\partial x_i}(X_s) \, dX^i_s + \frac{1}{2}\sum_{i,j=1}^p \int_0^{t \wedge T_n} \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
Since $T_n \uparrow \infty$, for each fixed $t$ and $\omega$ we have $t \wedge T_n(\omega) = t$ for all sufficiently large $n$, so the identity above gives the formula for all $t$. It therefore suffices to prove the formula under the additional assumption that $|X_t| \leq n$ and $\sum_i \int_0^t |dA^i_s| \leq n$ for all $t$ and $\omega$.
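For a concrete picture of the localization, take $p = 1$ and $X = B$ a standard Brownian motion started at $0$ (an assumption made only for this illustration), so that $A = 0$ and $T_n = \inf\{t \geq 0 : |B_t| \geq n\}$. With $f(x) = x^2$ the stopped identity is
\begin{align*}
B_{t \wedge T_n}^2 = 2\int_0^{t \wedge T_n} B_s \, dB_s + t \wedge T_n.
\end{align*}
On the event $\{T_n > t\}$ this is already the unstopped formula $B_t^2 = 2\int_0^t B_s \, dB_s + t$, and every $\omega$ belongs to this event once $n$ is large enough.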
[guided]
Why do we reduce to bounded processes? The Weierstrass approximation theorem in the next step will approximate $f$ and its derivatives uniformly on compact sets. If $X$ is unbounded, there is no single compact set on which we can work. By stopping at $T_n$, we confine $X$ to the ball $\overline{B}(0, n) \subset \mathbb{R}^p$, and we can approximate $f$ on this ball.
The stopping also controls the finite variation part: bounding $\sum_i \int_0^t |dA^i_s| \leq n$ ensures the finite variation integrals in the Ito formula are uniformly bounded, which is needed for the dominated convergence argument in the next step.
After proving the formula for stopped processes, the extension to the unstopped process is immediate: since $T_n \uparrow \infty$, for any fixed $t$ and $\omega$ we have $t \wedge T_n(\omega) = t$ once $n$ is large enough, so the identity for $f(X_{t \wedge T_n})$ becomes the identity for $f(X_t)$.
[/guided]
[/step]
[step:Approximate $f \in C^2(\mathbb{R}^p)$ by polynomials and pass to the limit]
Assume $|X_t| \leq n$ and $\sum_{i=1}^p \int_0^t |dA^i_s| \leq n$ for all $t$ and $\omega$ (the reduction from the previous step). By the Weierstrass Approximation Theorem, applied to $f$ together with its first and second derivatives, for each $\ell \geq 1$ there exists a polynomial $p_\ell: \mathbb{R}^p \to \mathbb{R}$ such that
\begin{align*}
\sup_{|x| \leq n} \left(|f(x) - p_\ell(x)| + \max_{1 \leq i \leq p} \left|\frac{\partial f}{\partial x_i}(x) - \frac{\partial p_\ell}{\partial x_i}(x)\right| + \max_{1 \leq i,j \leq p} \left|\frac{\partial^2 f}{\partial x_i \partial x_j}(x) - \frac{\partial^2 p_\ell}{\partial x_i \partial x_j}(x)\right|\right) \leq \frac{1}{\ell}.
\end{align*}
Since Ito's formula holds for each polynomial $p_\ell$ (by the first stage of the proof), we have
\begin{align*}
p_\ell(X_t) = p_\ell(X_0) + \sum_{i=1}^p \int_0^t \frac{\partial p_\ell}{\partial x_i}(X_s) \, dX^i_s + \frac{1}{2}\sum_{i,j=1}^p \int_0^t \frac{\partial^2 p_\ell}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
We pass to the limit $\ell \to \infty$ in each term:
**Terms without integrals.** Since $|X_t| \leq n$ and $\sup_{|x| \leq n} |f(x) - p_\ell(x)| \leq 1/\ell$, we have $p_\ell(X_t) \to f(X_t)$ and $p_\ell(X_0) \to f(X_0)$ uniformly in $\omega$.
**Stochastic integral terms.** Write $X^i = M^i + A^i$. The stochastic integral decomposes as
\begin{align*}
\int_0^t \frac{\partial p_\ell}{\partial x_i}(X_s) \, dX^i_s = \int_0^t \frac{\partial p_\ell}{\partial x_i}(X_s) \, dM^i_s + \int_0^t \frac{\partial p_\ell}{\partial x_i}(X_s) \, dA^i_s.
\end{align*}
For the martingale part: the integrand $\frac{\partial p_\ell}{\partial x_i}(X_s)$ converges to $\frac{\partial f}{\partial x_i}(X_s)$ uniformly in $s$ and $\omega$ (since $|X_s| \leq n$), and both are bounded by the constant $K = \sup_{|x| \leq n} |\frac{\partial f}{\partial x_i}(x)| + 1$ for every $\ell \geq 1$ (the approximation error is at most $1/\ell \leq 1$). Since $\int_0^t K^2 \, d\langle M^i \rangle_s = K^2 \langle M^i \rangle_t < \infty$ a.s., the [Stochastic Dominated Convergence Theorem](/theorems/2096) gives
\begin{align*}
\int_0^t \frac{\partial p_\ell}{\partial x_i}(X_s) \, dM^i_s \xrightarrow{\mathbb{P}} \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dM^i_s.
\end{align*}
For the finite variation part: $\frac{\partial p_\ell}{\partial x_i}(X_s) \to \frac{\partial f}{\partial x_i}(X_s)$ uniformly, and the total variation $\int_0^t |dA^i_s| \leq n$, so the classical dominated convergence theorem (with respect to the finite signed measure $dA^i$) gives
\begin{align*}
\int_0^t \frac{\partial p_\ell}{\partial x_i}(X_s) \, dA^i_s \to \int_0^t \frac{\partial f}{\partial x_i}(X_s) \, dA^i_s.
\end{align*}
**Quadratic variation terms.** Since $\frac{\partial^2 p_\ell}{\partial x_i \partial x_j}(X_s) \to \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s)$ uniformly in $s$ and $\omega$, and $\langle X^i, X^j \rangle$ is a continuous finite variation process (so its total variation on $[0, t]$ is finite a.s.), the classical dominated convergence theorem with respect to the measure $d\langle X^i, X^j \rangle_s$ gives
\begin{align*}
\int_0^t \frac{\partial^2 p_\ell}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s \to \int_0^t \frac{\partial^2 f}{\partial x_i \partial x_j}(X_s) \, d\langle X^i, X^j \rangle_s.
\end{align*}
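Taking $\ell \to \infty$ in the identity for $p_\ell$, all terms converge (in probability for the stochastic integrals, pathwise for the rest). Since limits in probability are unique up to null sets, the two sides of Ito's formula for $f$ agree almost surely for each fixed $t$. Both sides are continuous in $t$, so they agree for all $t$ simultaneously outside a single null set. This yields Ito's formula for $f$ and the bounded process $X$.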
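Because the approximation is uniform, the two pathwise limits can also be read off from explicit bounds. The estimates below are a sketch under the assumptions of the reduction step, $|X_s| \leq n$ and $\int_0^t |dA^i_s| \leq n$. The last inequality uses the Kunita-Watanabe inequality, which is not invoked elsewhere in this proof:
\begin{align*}
\left| \int_0^t \Bigl(\frac{\partial p_\ell}{\partial x_i} - \frac{\partial f}{\partial x_i}\Bigr)(X_s) \, dA^i_s \right| &\leq \frac{1}{\ell} \int_0^t |dA^i_s| \leq \frac{n}{\ell}, \\
\left| \int_0^t \Bigl(\frac{\partial^2 p_\ell}{\partial x_i \partial x_j} - \frac{\partial^2 f}{\partial x_i \partial x_j}\Bigr)(X_s) \, d\langle X^i, X^j \rangle_s \right| &\leq \frac{1}{\ell} \int_0^t |d\langle X^i, X^j \rangle_s| \leq \frac{1}{\ell} \sqrt{\langle X^i \rangle_t \, \langle X^j \rangle_t}.
\end{align*}
Both right-hand sides tend to $0$ as $\ell \to \infty$ for every $\omega$ at which the brackets are finite.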
[guided]
The approximation strategy is: we know Ito's formula for polynomials, and we want it for $C^2$ functions. The Weierstrass theorem lets us approximate $f$ and its first two derivatives uniformly on the compact set $\{|x| \leq n\}$, which contains all values of $X$ by the stopping argument. The convergence of the three types of integrals uses different tools:
1. **Stochastic integrals against the local martingale part**: these require the stochastic dominated convergence theorem, because convergence in the Ito integral is convergence in probability, not pathwise. The dominating process is a constant (since all integrands are uniformly bounded on $\{|x| \leq n\}$), and the integrability condition $\int_0^t K^2 \, d\langle M^i \rangle_s < \infty$ holds since $\langle M^i \rangle_t < \infty$ a.s.
2. **Integrals against the finite variation part**: these are ordinary Lebesgue-Stieltjes integrals, and the classical dominated convergence theorem applies because the total variation measure $|dA^i_s|$ has finite total mass $\leq n$.
3. **Integrals against the quadratic variation**: these are also Lebesgue-Stieltjes integrals (since $\langle X^i, X^j \rangle$ is a continuous finite variation process), and again the classical dominated convergence theorem applies.
The Weierstrass approximation theorem for derivatives requires a small remark: the standard theorem approximates only $f$ itself, whereas here a single polynomial $p_\ell$ must approximate $f$, $\frac{\partial f}{\partial x_i}$, and $\frac{\partial^2 f}{\partial x_i \partial x_j}$ simultaneously on the compact set $\overline{B}(0, n)$. This is possible because a $C^2$ function can be approximated on a compact set by polynomials, uniformly together with its derivatives up to order two. One standard route is to multiply $f$ by a compactly supported $C^2$ cutoff equal to $1$ near $\overline{B}(0, n)$ and convolve with polynomial approximate identities; under the convolution the derivatives fall on $f$, so they converge uniformly as well.
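In one dimension the simultaneous approximation can be made explicit for a specific function. As an illustration only, take $p = 1$ and $f(x) = e^x$, and let $q_N(x) = \sum_{m=0}^N \frac{x^m}{m!}$ be its Taylor polynomial (the names $q_N$ and $r$ are introduced here and do not appear elsewhere). Since $q_N' = q_{N-1}$ and $q_N'' = q_{N-2}$, for $r \in \{0, 1, 2\}$ and $N \geq 2$ we get
\begin{align*}
\sup_{|x| \leq n} \bigl| f^{(r)}(x) - q_N^{(r)}(x) \bigr| \leq \sum_{m > N - r} \frac{n^m}{m!} \leq \sum_{m > N - 2} \frac{n^m}{m!} \xrightarrow[N \to \infty]{} 0.
\end{align*}
Choosing $N = N(\ell)$ with $3 \sum_{m > N - 2} \frac{n^m}{m!} \leq \frac{1}{\ell}$ and setting $p_\ell = q_{N(\ell)}$ gives the required bound in this special case.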
[/guided]
[/step]
[step:Conclude that $f(X)$ is a semimartingale]
The Ito formula expresses $f(X_t) - f(X_0)$ as the sum of a stochastic integral against continuous local martingales (which is itself a continuous local martingale) and integrals against continuous finite variation processes (which are continuous and of finite variation). Therefore $f(X)$ is a continuous semimartingale, as claimed.
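For example, assume (for illustration only) that $X = B$ is a standard $p$-dimensional Brownian motion started at $0$, so that $\langle B^i, B^j \rangle_t = \delta_{ij} t$. The choice $f(x) = |x|^2$, with $\frac{\partial f}{\partial x_i} = 2x_i$ and $\frac{\partial^2 f}{\partial x_i \partial x_j} = 2\delta_{ij}$, gives the decomposition
\begin{align*}
|B_t|^2 = 2\sum_{i=1}^p \int_0^t B^i_s \, dB^i_s + p \, t,
\end{align*}
whose first term is a continuous local martingale and whose second term, $p \, t$, is continuous and of finite variation.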
[/step]