[proofplan]
We prove that the stochastic integral $H \cdot M$ of a simple previsible process $H \in \mathcal{E}$ against a continuous square-integrable martingale $M \in \mathcal{M}^2_c$ is itself in $\mathcal{M}^2_c$, and that the Ito isometry $\|H \cdot M\|_{\mathcal{M}^2}^2 = \mathbb{E}[\int_0^\infty H_s^2 \, d\langle M \rangle_s]$ holds. The argument proceeds in three stages: first, we show that each elementary piece $X^i_t = H_{t_{i-1}}(M_{t_i \wedge t} - M_{t_{i-1} \wedge t})$ is a continuous $L^2$-bounded martingale and that their sum $H \cdot M$ lies in $\mathcal{M}^2_c$; second, we compute the quadratic variation $\langle X^i \rangle_t = H_{t_{i-1}}^2 (\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t})$ using the scaling and stopping properties of the bracket; third, we show the cross-brackets $\langle X^i, X^j \rangle = 0$ for $i \neq j$ (orthogonality on disjoint time intervals), sum to obtain $\langle H \cdot M \rangle_t = \int_0^t H_s^2 \, d\langle M \rangle_s$, and take expectations to conclude.
[/proofplan]
[step:Decompose $H \cdot M$ into elementary martingale pieces and verify $H \cdot M \in \mathcal{M}^2_c$]
Let $H \in \mathcal{E}$ be a simple previsible process of the form
\begin{align*}
H_s = \sum_{i=1}^{k} H_{t_{i-1}} \, \mathbb{1}_{(t_{i-1}, t_i]}(s),
\end{align*}
where $0 = t_0 < t_1 < \cdots < t_k$ is a finite partition and each $H_{t_{i-1}}$ is a bounded $\mathcal{F}_{t_{i-1}}$-measurable random variable. Define
\begin{align*}
X^i : [0, \infty) \times \Omega &\to \mathbb{R} \\
t &\mapsto H_{t_{i-1}} (M_{t_i \wedge t} - M_{t_{i-1} \wedge t}),
\end{align*}
so that $(H \cdot M)_t = \sum_{i=1}^k X^i_t$.
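Unwinding the minima $t_i \wedge t$ and $t_{i-1} \wedge t$ makes the shape of each piece explicit. The case split below is only a restatement of the definition of $X^i$, written out for orientation:
\begin{align*}
X^i_t =
\begin{cases}
0, & t \leq t_{i-1}, \\
H_{t_{i-1}} (M_t - M_{t_{i-1}}), & t_{i-1} \leq t \leq t_i, \\
H_{t_{i-1}} (M_{t_i} - M_{t_{i-1}}), & t \geq t_i.
\end{cases}
\end{align*}
In particular $X^i$ vanishes up to time $t_{i-1}$ and is frozen from time $t_i$ onwards.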
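Each $X^i$ is a continuous martingale. For $t_{i-1} \leq s \leq t$ we have $t_{i-1} \wedge t = t_{i-1} \wedge s = t_{i-1}$, so the increment is $X^i_t - X^i_s = H_{t_{i-1}}(M_{t_i \wedge t} - M_{t_i \wedge s})$; the case $s < t_{i-1}$ is similar, since then $X^i_s = 0$ and one conditions first on $\mathcal{F}_{t_{i-1}}$. Since $H_{t_{i-1}}$ is bounded and $\mathcal{F}_{t_{i-1}}$-measurable, hence $\mathcal{F}_s$-measurable, and the stopped process $M^{t_i}$ is a martingale, the conditional expectation satisfies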
\begin{align*}
\mathbb{E}[X^i_t - X^i_s \mid \mathcal{F}_s] = H_{t_{i-1}} \, \mathbb{E}[M_{t_i \wedge t} - M_{t_i \wedge s} \mid \mathcal{F}_s] = 0.
\end{align*}
Continuity of $X^i$ follows from continuity of $M$. Moreover, $X^i$ is bounded in $L^2$:
\begin{align*}
\sup_{t \geq 0} \|X^i_t\|_{L^2}^2 = \sup_{t \geq 0} \mathbb{E}[H_{t_{i-1}}^2 (M_{t_i \wedge t} - M_{t_{i-1} \wedge t})^2] \leq \|H_{t_{i-1}}\|_\infty^2 \cdot 4 \|M\|_{\mathcal{M}^2}^2,
\end{align*}
using the elementary inequality $(a - b)^2 \leq 2a^2 + 2b^2$ together with $\mathbb{E}[M_u^2] \leq \|M\|_{\mathcal{M}^2}^2$ for every $u \geq 0$ (the process $M^2$ is a submartingale). Each $H_{t_{i-1}}$ is bounded, so $H \cdot M = \sum_i X^i$ is a finite sum of continuous $L^2$-bounded martingales, hence $H \cdot M \in \mathcal{M}^2_c$.
[guided]
Why does the martingale property hold for $X^i$? The key is that $H_{t_{i-1}}$ is $\mathcal{F}_{t_{i-1}}$-measurable, hence also $\mathcal{F}_s$-measurable for $s \geq t_{i-1}$. This means $H_{t_{i-1}}$ can be pulled out of the conditional expectation $\mathbb{E}[\cdot \mid \mathcal{F}_s]$, leaving $\mathbb{E}[M_{t_i \wedge t} - M_{t_i \wedge s} \mid \mathcal{F}_s]$, which vanishes by the martingale property of $M$.
For $s < t_{i-1}$ we have $t_i \wedge s = t_{i-1} \wedge s = s$, so $X^i_s = 0$. If also $t \leq t_{i-1}$, then $X^i_t = 0$ and there is nothing to prove. If $t > t_{i-1}$, the tower property and the previous case give $\mathbb{E}[X^i_t \mid \mathcal{F}_s] = \mathbb{E}[\mathbb{E}[X^i_t \mid \mathcal{F}_{t_{i-1}}] \mid \mathcal{F}_s] = \mathbb{E}[X^i_{t_{i-1}} \mid \mathcal{F}_s] = 0 = X^i_s$.
The $L^2$ bound ensures $X^i \in \mathcal{M}^2_c$ (not just a local martingale). Since $\mathcal{M}^2_c$ is a vector space, the finite sum $H \cdot M = \sum_i X^i \in \mathcal{M}^2_c$.
[/guided]
[/step]
[step:Compute the quadratic variation of each piece $\langle X^i \rangle_t = H_{t_{i-1}}^2 (\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t})$]
Consider the process $X^i_t = H_{t_{i-1}}(M_{t_i \wedge t} - M_{t_{i-1} \wedge t})$. Define the difference of stopped martingales $\widetilde{M}^i_t := M_{t_i \wedge t} - M_{t_{i-1} \wedge t}$, so that $X^i = H_{t_{i-1}} \widetilde{M}^i$.
Then $(X^i_t)^2 = H_{t_{i-1}}^2 (\widetilde{M}^i_t)^2$. The quadratic variation $\langle X^i \rangle$ is the unique continuous adapted increasing process starting at $0$ for which $t \mapsto (X^i_t)^2 - \langle X^i \rangle_t$ is a continuous local martingale. We claim $\langle X^i \rangle_t = H_{t_{i-1}}^2 \langle \widetilde{M}^i \rangle_t$.
To verify this, note that
\begin{align*}
(X^i_t)^2 - H_{t_{i-1}}^2 \langle \widetilde{M}^i \rangle_t = H_{t_{i-1}}^2 \left[(\widetilde{M}^i_t)^2 - \langle \widetilde{M}^i \rangle_t \right].
\end{align*}
The bracketed expression $N := (\widetilde{M}^i)^2 - \langle \widetilde{M}^i \rangle$ is a continuous local martingale (by the characterisation of quadratic variation), and it vanishes on $[0, t_{i-1}]$ because $\widetilde{M}^i$ does. Since $H_{t_{i-1}}^2$ is bounded and $\mathcal{F}_{t_{i-1}}$-measurable, the product $H_{t_{i-1}}^2 N$ is again a continuous local martingale: it is adapted because it is zero up to time $t_{i-1}$, and for $s \geq t_{i-1}$ the factor $H_{t_{i-1}}^2$ can be pulled out of conditional expectations given $\mathcal{F}_s$ (after localising $N$). The candidate $H_{t_{i-1}}^2 \langle \widetilde{M}^i \rangle$ is likewise continuous, adapted, increasing and zero at time $0$. By uniqueness of the quadratic variation, $\langle X^i \rangle_t = H_{t_{i-1}}^2 \langle \widetilde{M}^i \rangle_t$.
By the [Quadratic Variation of a Stopped Process](/theorems/2083),
\begin{align*}
\langle \widetilde{M}^i \rangle_t = \langle M^{t_i} - M^{t_{i-1}} \rangle_t = \langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t},
\end{align*}
where the second equality uses bilinearity of the bracket, the stopping property $\langle M^{t_j} \rangle_t = \langle M \rangle_{t_j \wedge t}$, and the cross term $\langle M^{t_i}, M^{t_{i-1}} \rangle_t = \langle M \rangle_{t_{i-1} \wedge t}$. Therefore
\begin{align*}
\langle X^i \rangle_t = H_{t_{i-1}}^2 \left(\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}\right).
\end{align*}
[guided]
The computation reduces to understanding how scaling by a random variable that is already known affects the quadratic variation. If $Z$ is $\mathcal{F}_0$-measurable and bounded, and $L$ is a continuous local martingale, then $\langle ZL \rangle = Z^2 \langle L \rangle$. Why? Because $(ZL)^2 - Z^2 \langle L \rangle = Z^2(L^2 - \langle L \rangle)$, and multiplying a continuous local martingale by a bounded $\mathcal{F}_0$-measurable random variable preserves the local martingale property (the factor $Z^2$ can be pulled out of every conditional expectation $\mathbb{E}[\cdot \mid \mathcal{F}_s]$).
Here, $H_{t_{i-1}}$ is $\mathcal{F}_{t_{i-1}}$-measurable rather than $\mathcal{F}_0$-measurable, but the same argument applies: for $s \geq t_{i-1}$, the variable $H_{t_{i-1}}$ is known and can be treated as a constant. The process $\widetilde{M}^i = M^{t_i} - M^{t_{i-1}}$ is zero before time $t_{i-1}$ and equals $M_t - M_{t_{i-1}}$ for $t_{i-1} \leq t \leq t_i$, so the quadratic variation accumulates only on $[t_{i-1}, t_i]$, where $H_{t_{i-1}}$ is $\mathcal{F}_s$-measurable.
The stopping property $\langle \widetilde{M}^i \rangle_t = \langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}$ follows from the bilinearity of the bracket and the [Quadratic Variation of a Stopped Process](/theorems/2083): $\langle M^{t_i} \rangle_t = \langle M \rangle_{t_i \wedge t}$ and $\langle M^{t_{i-1}} \rangle_t = \langle M \rangle_{t_{i-1} \wedge t}$, and the cross term is $\langle M^{t_i}, M^{t_{i-1}} \rangle_t = \langle M \rangle_{t_{i-1} \wedge t}$ by part (iv) of the [Properties of Covariation](/theorems/2086).
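Written out term by term, the expansion of the bracket of the difference reads as follows; this display only combines the three identities just quoted:
\begin{align*}
\langle M^{t_i} - M^{t_{i-1}} \rangle_t
&= \langle M^{t_i} \rangle_t - 2 \langle M^{t_i}, M^{t_{i-1}} \rangle_t + \langle M^{t_{i-1}} \rangle_t \\
&= \langle M \rangle_{t_i \wedge t} - 2 \langle M \rangle_{t_{i-1} \wedge t} + \langle M \rangle_{t_{i-1} \wedge t} \\
&= \langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}.
\end{align*}
The cross term enters with a factor $2$, which is why it cancels one copy of $\langle M \rangle_{t_{i-1} \wedge t}$ and flips the sign of the other.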
[/guided]
[/step]
[step:Show orthogonality $\langle X^i, X^j \rangle = 0$ for $i \neq j$]
Assume $i < j$ (so $t_{i-1} < t_i \leq t_{j-1} < t_j$). The process $X^i_t = H_{t_{i-1}}(M_{t_i \wedge t} - M_{t_{i-1} \wedge t})$ is constant for $t \geq t_i$ (since $M_{t_i \wedge t} = M_{t_i}$ and $M_{t_{i-1} \wedge t} = M_{t_{i-1}}$ for $t \geq t_i$). The process $X^j_t = H_{t_{j-1}}(M_{t_j \wedge t} - M_{t_{j-1} \wedge t})$ is zero for $t \leq t_{j-1}$.
The covariation $\langle X^i, X^j \rangle$ is a continuous finite variation process, and we claim that it does not move at all:
\begin{align*}
d\langle X^i, X^j \rangle_t = 0 \quad \text{for all } t \geq 0.
\end{align*}
Indeed, $X^i$ is constant on $[t_i, \infty) \supseteq [t_{j-1}, \infty)$ and $X^j$ is zero on $[0, t_{j-1}] \supseteq [0, t_i]$. The covariation can only accumulate where both processes vary, and the intervals $[t_{i-1}, t_i]$ and $[t_{j-1}, t_j]$ on which they do so share at most the single point $t_i = t_{j-1}$. To make this precise, use the approximating sums $\sum_\ell \Delta_\ell X^i \, \Delta_\ell X^j$ from part (iii) of the [Properties of Covariation](/theorems/2086). For a subinterval contained in $[0, t_{j-1}]$ we have $\Delta_\ell X^j = 0$, and for a subinterval contained in $[t_i, \infty)$ we have $\Delta_\ell X^i = 0$. If $t_i < t_{j-1}$, every subinterval of a sufficiently fine partition is of one of these two types. If $t_i = t_{j-1}$, at most one subinterval contains $t_i$ in its interior, and its single product $\Delta_\ell X^i \, \Delta_\ell X^j$ tends to zero as the mesh shrinks, by continuity of the paths. Therefore $\langle X^i, X^j \rangle = 0$.
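The same conclusion can be reached without approximating sums. The sketch below assumes the standard stopping identity $\langle A^T, B \rangle = \langle A, B^T \rangle$ for continuous local martingales $A$, $B$ and a stopping time $T$; the symbols $A$, $B$, $T$ are introduced here only for illustration. Since $X^i$ is frozen after $t_i$ we have $X^i = (X^i)^{t_i}$, and since $X^j$ vanishes on $[0, t_{j-1}] \supseteq [0, t_i]$ we have $(X^j)^{t_i} = 0$, so
\begin{align*}
\langle X^i, X^j \rangle = \langle (X^i)^{t_i}, X^j \rangle = \langle X^i, (X^j)^{t_i} \rangle = \langle X^i, 0 \rangle = 0.
\end{align*}
This version treats the adjacent case $t_i = t_{j-1}$ and the separated case $t_i < t_{j-1}$ uniformly.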
[/step]
[step:Sum the quadratic variations and take expectations to establish the isometry]
By bilinearity of the bracket and the orthogonality $\langle X^i, X^j \rangle = 0$ for $i \neq j$,
\begin{align*}
\langle H \cdot M \rangle_t = \left\langle \sum_{i=1}^k X^i, \sum_{j=1}^k X^j \right\rangle_t = \sum_{i=1}^k \langle X^i \rangle_t = \sum_{i=1}^k H_{t_{i-1}}^2 \left(\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}\right).
\end{align*}
The right-hand side is precisely the Lebesgue--Stieltjes integral of $H^2$ against $d\langle M \rangle$:
\begin{align*}
\sum_{i=1}^k H_{t_{i-1}}^2 \left(\langle M \rangle_{t_i \wedge t} - \langle M \rangle_{t_{i-1} \wedge t}\right) = \int_0^t H_s^2 \, d\langle M \rangle_s,
\end{align*}
since $H^2 = \sum_i H_{t_{i-1}}^2 \mathbb{1}_{(t_{i-1}, t_i]}$ is a step function and the integral of a step function against a Stieltjes measure is the corresponding weighted sum of increments.
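For a single indicator this can be checked directly. With generic endpoints $0 \leq a < b$ (notation used only in this illustration), the set $(a, b] \cap (0, t]$ equals $(a \wedge t, b \wedge t]$, so for each fixed $\omega$
\begin{align*}
\int_0^t \mathbb{1}_{(a, b]}(s) \, d\langle M \rangle_s = \langle M \rangle_{b \wedge t} - \langle M \rangle_{a \wedge t}.
\end{align*}
Taking $(a, b] = (t_{i-1}, t_i]$, multiplying by $H_{t_{i-1}}^2$ (a constant once $\omega$ is fixed) and summing over $i$ gives the displayed identity.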
By the [Quadratic Variation Norm Formula](/theorems/2085) applied to $H \cdot M \in \mathcal{M}^2_c$ (noting $(H \cdot M)_0 = 0$),
\begin{align*}
\|H \cdot M\|_{\mathcal{M}^2}^2 = \mathbb{E}[\langle H \cdot M \rangle_\infty] = \mathbb{E}\!\left[\int_0^\infty H_s^2 \, d\langle M \rangle_s\right].
\end{align*}
This is the Ito isometry for simple processes.
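As a sanity check, consider a small worked example; the choice of $M$ and $H$ here is ours and is not part of the proof. Let $B$ be a standard Brownian motion with its natural filtration and take $M_t = B_{t \wedge 2}$, which lies in $\mathcal{M}^2_c$ with $\langle M \rangle_t = t \wedge 2$. Let $k = 2$, $t_0 = 0$, $t_1 = 1$, $t_2 = 2$, $H_{t_0} = 1$ and $H_{t_1} = \operatorname{sign}(B_1)$, with the convention $\operatorname{sign}(0) = 1$. Assuming the usual convention $\|X\|_{\mathcal{M}^2}^2 = \mathbb{E}[X_\infty^2]$, both sides can be computed by hand:
\begin{align*}
(H \cdot M)_\infty &= B_1 + \operatorname{sign}(B_1)(B_2 - B_1), \\
\mathbb{E}[(H \cdot M)_\infty^2] &= \mathbb{E}[B_1^2] + 2 \, \mathbb{E}[|B_1|] \, \mathbb{E}[B_2 - B_1] + \mathbb{E}[(B_2 - B_1)^2] = 1 + 0 + 1 = 2, \\
\mathbb{E}\!\left[\int_0^\infty H_s^2 \, d\langle M \rangle_s\right] &= 1 \cdot (1 - 0) + \mathbb{E}[\operatorname{sign}(B_1)^2] \cdot (2 - 1) = 2.
\end{align*}
The cross term factorises because $B_2 - B_1$ is independent of $\mathcal{F}_1$, which is the same mechanism that makes the pieces $X^1$ and $X^2$ orthogonal in the general argument.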
[guided]
The final step connects the bracket computation to the $\mathcal{M}^2$ norm. The [Quadratic Variation Norm Formula](/theorems/2085) states that for any $M \in \mathcal{M}^2_c$ with $M_0 = 0$, the $\mathcal{M}^2$ norm satisfies $\|M\|_{\mathcal{M}^2}^2 = \mathbb{E}[\langle M \rangle_\infty]$. We apply this to the process $H \cdot M$, which starts at $0$ and lies in $\mathcal{M}^2_c$ by the first step.
The identity $\langle H \cdot M \rangle_t = \int_0^t H_s^2 \, d\langle M \rangle_s$ is the bracket identity that will later characterise the stochastic integral via part (ii) of the [Ito Isometry](/theorems/2092). Here, for simple processes, it is a direct computation rather than a characterisation.
The name "isometry" comes from viewing the map $H \mapsto H \cdot M$ as a map from $L^2(M)$ (the space of previsible processes with $\mathbb{E}[\int_0^\infty H_s^2 \, d\langle M \rangle_s] < \infty$) to $\mathcal{M}^2_c$. The identity $\|H \cdot M\|_{\mathcal{M}^2} = \|H\|_{L^2(M)}$ says this map preserves norms, hence is an isometry. This is the foundation for extending the stochastic integral from $\mathcal{E}$ to all of $L^2(M)$ by density and completeness.
[/guided]
[/step]