[proofplan]
The strategy is to reduce to non-negative submartingales via the conditional [Jensen inequality](/theorems/515), then exploit a layer-cake representation of the truncated $p$-th moment $\mathbb{E}[(\min\{X_n^*, K\})^p]$. The layer-cake identity converts the $p$-th moment into an [integral](/page/Integral) over level [sets](/page/Set), where [Doob's maximal inequality](/theorems/1158) bounds each level-set probability. Exchanging the order of integration via Fubini's theorem collapses the double integral into a single expectation, and Hölder's inequality with conjugate exponents $p$ and $p/(p-1)$ decouples $X_n$ from the truncated maximal [function](/page/Function). Dividing by the finite truncated norm and sending the truncation parameter $K \to \infty$ via the [monotone convergence theorem](/theorems/509) yields the sharp constant $p/(p-1)$.
[/proofplan]
[step:Reduce to the case of a non-negative submartingale]
If $X = (X_n)_{n \geq 0}$ is a martingale adapted to $(\mathcal{F}_n)_{n \geq 0}$, then the process $|X| = (|X_n|)_{n \geq 0}$ is a non-negative submartingale. To verify this, fix $n \geq 0$. Since $X_n$ is $\mathcal{F}_n$-measurable and $|\cdot|$ is Borel, $|X_n|$ is $\mathcal{F}_n$-measurable. Since $X_n \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ with $p > 1$, we have $\mathbb{E}[|X_n|] \leq \mathbb{E}[|X_n|^p]^{1/p} < \infty$ by [Jensen's Inequality](/theorems/515) applied to the concave function $t \mapsto t^{1/p}$ on $[0, \infty)$. For the submartingale property, the function $\varphi: \mathbb{R} \to \mathbb{R}$, $x \mapsto |x|$ is convex, so the conditional Jensen inequality gives
\begin{align*}
\mathbb{E}[|X_{n+1}| \mid \mathcal{F}_n] \geq |\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]| = |X_n| \quad \text{a.s.}
\end{align*}
where the final equality uses the martingale property $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ a.s. Since $\sup_{k \leq n} |X_k| = X_n^*$ regardless of whether we work with $X$ or $|X|$, and $\|X_n\|_p = \||X_n|\|_p$, it suffices to prove the inequality for non-negative submartingales. We assume henceforth that $X_n \geq 0$ a.s. for all $n \geq 0$ and that $(X_n)_{n \geq 0}$ is a submartingale.
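As a sanity check of the conditional Jensen step, the sketch below uses a hypothetical example that does not come from the source: the simple symmetric random walk $S_{n+1} = S_n \pm 1$ (each sign with probability $1/2$), which is a martingale. The walk is Markov, so $\mathbb{E}[|S_{n+1}| \mid \mathcal{F}_n]$ depends only on the current value $S_n = x$, and the submartingale inequality reduces to a check over integer states. The function names are illustrative. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def cond_exp_abs_next(x: int) -> Fraction:
    """E[|S_{n+1}| | S_n = x] for the simple symmetric random walk (hypothetical example)."""
    return Fraction(abs(x + 1) + abs(x - 1), 2)


def abs_walk_is_submartingale(max_level: int) -> bool:
    """Check E[|S_{n+1}| | S_n = x] >= |x| for every integer state with |x| <= max_level."""
    return all(cond_exp_abs_next(x) >= abs(x) for x in range(-max_level, max_level + 1))


print(abs_walk_is_submartingale(10))  # True
print(cond_exp_abs_next(0))           # 1 (strict inequality at x = 0)
```

The inequality is strict only at $x = 0$. For $|x| \geq 1$ the values $x \pm 1$ share a sign, so $|\cdot|$ acts linearly there and the conditional Jensen inequality holds with equality.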
[guided]
Why must we reduce to submartingales rather than proving the inequality directly for martingales? The proof relies on [Doob's Maximal Inequality](/theorems/1158), which is stated for non-negative submartingales and provides the bound $\lambda \, \mathbb{P}(X_n^* \geq \lambda) \leq \mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}]$. A bound of this form applied to $X$ itself would control only the running maximum $\max_{k \leq n} X_k$, not $\max_{k \leq n} |X_k|$. Passing to $|X|$ makes the running maximum of the process coincide with $X_n^*$. Non-negativity of $X_n$ is also used later, to apply the Tonelli form of Fubini's theorem to the integrand $p \lambda^{p-2} X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}$.
To carry out the reduction, we must verify that $|X|$ satisfies all three submartingale requirements:
**Adaptedness.** Each $X_n$ is $\mathcal{F}_n$-measurable by hypothesis. Since $\varphi: \mathbb{R} \to \mathbb{R}$, $x \mapsto |x|$ is continuous (hence Borel-measurable), the composition $|X_n| = \varphi \circ X_n$ is $\mathcal{F}_n$-measurable.
**Integrability.** We need $\mathbb{E}[|X_n|] < \infty$. Since $p > 1$, by [Jensen's Inequality](/theorems/515) applied to the concave function $t \mapsto t^{1/p}$ on $[0, \infty)$, we obtain $\mathbb{E}[|X_n|] = \mathbb{E}[(|X_n|^p)^{1/p}] \leq (\mathbb{E}[|X_n|^p])^{1/p} = \|X_n\|_p < \infty$.
**Conditional submartingale inequality.** The function $\varphi(x) = |x|$ is convex. The conditional Jensen inequality (applied with $\varphi$ convex, $X_{n+1} \in L^1$, and conditioning on the sub-$\sigma$-algebra $\mathcal{F}_n$) yields
\begin{align*}
\mathbb{E}[|X_{n+1}| \mid \mathcal{F}_n] = \mathbb{E}[\varphi(X_{n+1}) \mid \mathcal{F}_n] \geq \varphi(\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]) = |\mathbb{E}[X_{n+1} \mid \mathcal{F}_n]| = |X_n| \quad \text{a.s.}
\end{align*}
The final equality uses the martingale property $\mathbb{E}[X_{n+1} \mid \mathcal{F}_n] = X_n$ a.s. Since $X_n^* = \sup_{k \leq n} |X_k|$ is the same whether we define it using $X$ or $|X|$, and $\|X_n\|_p = \||X_n|\|_p$, the inequality for $|X|$ as a non-negative submartingale implies the inequality for $X$ as a martingale. We therefore assume $X_n \geq 0$ a.s. for all $n$.
[/guided]
[/step]
[step:Truncate the maximal function and apply the layer-cake representation]
Fix $K \in (0, \infty)$ and define the truncation
\begin{align*}
Y_K: \Omega &\to [0, K] \\
\omega &\mapsto \min\{X_n^*(\omega), K\}.
\end{align*}
Since $0 \leq Y_K \leq K$ a.s., we have $Y_K \in L^p(\Omega, \mathcal{F}, \mathbb{P})$ with $\|Y_K\|_p \leq K < \infty$. We apply the layer-cake identity: for any non-negative random variable $Z$ and $p \geq 1$,
\begin{align*}
Z^p = \int_0^\infty p \lambda^{p-1} \mathbb{1}_{\{Z \geq \lambda\}} \, d\mathcal{L}^1(\lambda).
\end{align*}
Applying this with $Z = Y_K$ and taking expectations, then exchanging the expectation with the [Lebesgue integral](/page/Lebesgue%20Integral) via [Fubini's Theorem](/theorems/513) — the integrand $p \lambda^{p-1} \mathbb{1}_{\{Y_K \geq \lambda\}}$ is non-negative and the [measure spaces](/page/Measure%20Space) $(\Omega, \mathcal{F}, \mathbb{P})$ and $((0, \infty), \mathcal{B}((0, \infty)), \mathcal{L}^1)$ are both $\sigma$-finite — we obtain
\begin{align*}
\mathbb{E}[Y_K^p] &= \int_0^\infty p \lambda^{p-1} \mathbb{P}(Y_K \geq \lambda) \, d\mathcal{L}^1(\lambda) = \int_0^K p \lambda^{p-1} \mathbb{P}(X_n^* \geq \lambda) \, d\mathcal{L}^1(\lambda),
\end{align*}
where the second equality uses the facts that $\{Y_K \geq \lambda\} = \{X_n^* \geq \lambda\}$ for $\lambda \leq K$ and $\mathbb{P}(Y_K \geq \lambda) = 0$ for $\lambda > K$, so the integrand vanishes for $\lambda > K$.
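The truncated layer-cake formula can be checked exactly for a discrete random variable. The sketch below uses a hypothetical finite distribution that does not come from the source. Between consecutive support points $a < b$, $\mathbb{P}(Z \geq \lambda)$ is constant on $(a, b]$, and $\int_a^b p \lambda^{p-1} \, d\lambda = b^p - a^p$. The integral $\int_0^K p \lambda^{p-1} \mathbb{P}(Z \geq \lambda) \, d\lambda$ therefore becomes a finite sum, which should match $\mathbb{E}[\min\{Z, K\}^p]$. The helper names `moment` and `layer_cake` are illustrative.

```python
import math


def moment(values, probs, p):
    """E[Z^p] computed directly from a finite distribution."""
    return sum(w * v ** p for v, w in zip(values, probs))


def layer_cake(values, probs, p, upper=math.inf):
    """Integral over (0, upper] of p * lam^(p-1) * P(Z >= lam), evaluated exactly.

    Between consecutive support points a < b, P(Z >= lam) equals P(Z >= b) on (a, b],
    and the integral of p * lam^(p-1) over (a, b] is b^p - a^p.
    """
    total, prev = 0.0, 0.0
    for v in sorted({u for u in values if u > 0}):
        b = min(v, upper)  # truncation: stop integrating at lam = upper
        if b <= prev:
            break
        tail = sum(w for u, w in zip(values, probs) if u >= v)  # P(Z >= lam) on (prev, b]
        total += (b ** p - prev ** p) * tail
        prev = b
    return total


values, probs = [0.0, 1.0, 2.0, 4.0], [0.1, 0.2, 0.3, 0.4]
p, K = 1.5, 3.0
truncated = [min(v, K) for v in values]  # Y_K = min(Z, K)
print(math.isclose(layer_cake(values, probs, p), moment(values, probs, p)))              # True
print(math.isclose(layer_cake(values, probs, p, upper=K), moment(truncated, probs, p)))  # True
```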
[guided]
The truncation serves a crucial technical purpose. The argument does not assume in advance that the maximal function $X_n^*$ belongs to $L^p$. By working with $Y_K = \min\{X_n^*, K\}$, we ensure all quantities are finite, which permits division by $\|Y_K\|_p^{p-1}$ in a later step. We will send $K \to \infty$ at the end.
The layer-cake identity (also called Cavalieri's principle) is the formula
\begin{align*}
Z^p = \int_0^\infty p \lambda^{p-1} \mathbb{1}_{\{Z \geq \lambda\}} \, d\mathcal{L}^1(\lambda),
\end{align*}
valid pointwise for any non-negative measurable $Z$. This can be verified by computing the right-hand side: for fixed $\omega$, the indicator $\mathbb{1}_{\{Z(\omega) \geq \lambda\}} = 1$ for $\lambda \in [0, Z(\omega)]$ and vanishes otherwise, so the integral equals $\int_0^{Z(\omega)} p \lambda^{p-1} \, d\mathcal{L}^1(\lambda) = Z(\omega)^p$.
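A worked instance makes the pointwise computation concrete. The values $p = 2$ and $Z(\omega) = 3$ are chosen only for illustration:
\begin{align*}
\int_0^\infty 2\lambda \, \mathbb{1}_{\{3 \geq \lambda\}} \, d\mathcal{L}^1(\lambda) = \int_0^3 2\lambda \, d\mathcal{L}^1(\lambda) = \lambda^2 \Big|_{\lambda=0}^{3} = 9 = Z(\omega)^2.
\end{align*}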
Taking expectations and applying [Fubini's Theorem](/theorems/513) to exchange $\mathbb{E}$ and $\int_0^\infty$ — valid because $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space (hence $\sigma$-finite), $((0,\infty), \mathcal{B}((0,\infty)), \mathcal{L}^1)$ is $\sigma$-finite, and the integrand is non-negative — we obtain
\begin{align*}
\mathbb{E}[Y_K^p] = \int_0^\infty p \lambda^{p-1} \mathbb{P}(Y_K \geq \lambda) \, d\mathcal{L}^1(\lambda).
\end{align*}
Since $Y_K = \min\{X_n^*, K\}$, the level set $\{Y_K \geq \lambda\}$ equals $\{X_n^* \geq \lambda\}$ when $\lambda \leq K$, and is empty when $\lambda > K$. The upper [limit](/page/Limit) of integration therefore reduces to $K$.
[/guided]
[/step]
[step:Apply Doob's maximal inequality to bound the level-set probabilities]
Since $(X_n)_{n \geq 0}$ is a non-negative submartingale and $\lambda > 0$, [Doob's Maximal Inequality](/theorems/1158) gives
\begin{align*}
\lambda \, \mathbb{P}(X_n^* \geq \lambda) \leq \mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}].
\end{align*}
Substituting $\mathbb{P}(X_n^* \geq \lambda) \leq \lambda^{-1} \mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}]$ into the layer-cake integral from the previous step:
\begin{align*}
\mathbb{E}[Y_K^p] &\leq \int_0^K p \lambda^{p-1} \cdot \lambda^{-1} \, \mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}] \, d\mathcal{L}^1(\lambda) = \int_0^K p \lambda^{p-2} \, \mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}] \, d\mathcal{L}^1(\lambda).
\end{align*}
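Doob's maximal inequality can be checked exactly on a small example. The sketch below uses a hypothetical example that does not come from the source: $X_k = |S_k|$ for a simple symmetric random walk $S$ started at $0$. By the reduction step, this is a non-negative submartingale. The code enumerates all $2^n$ equally likely paths and compares $\lambda \, \mathbb{P}(X_n^* \geq \lambda)$ with $\mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}]$ at the integer levels $\lambda = 1, \dots, n$, using exact rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def walk_paths(n):
    """Yield all 2^n equally likely paths (S_0, ..., S_n) of a simple symmetric walk, S_0 = 0."""
    for steps in product((-1, 1), repeat=n):
        path, s = [0], 0
        for step in steps:
            s += step
            path.append(s)
        yield path


def doob_maximal_holds(n):
    """Check lam * P(X_n^* >= lam) <= E[X_n; X_n^* >= lam] for X_k = |S_k|, lam = 1..n."""
    paths = list(walk_paths(n))
    w = Fraction(1, len(paths))  # probability of each path
    for lam in range(1, n + 1):
        hit = [path for path in paths if max(abs(s) for s in path) >= lam]
        lhs = lam * w * len(hit)                      # lam * P(X_n^* >= lam)
        rhs = w * sum(abs(path[-1]) for path in hit)  # E[X_n 1_{X_n^* >= lam}]
        if lhs > rhs:
            return False
    return True


print(doob_maximal_holds(10))  # True
```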
[guided]
This is the heart of the argument: the $L^1$-type maximal inequality ([Doob's Maximal Inequality](/theorems/1158)) controls tail probabilities of $X_n^*$, and the layer-cake representation converts these tail bounds into an $L^p$ estimate. The exponent $p-2$ in $\lambda^{p-2}$ arises from $\lambda^{p-1} \cdot \lambda^{-1}$: the layer-cake formula contributes the factor $\lambda^{p-1}$, and the maximal inequality contributes $\lambda^{-1}$ from dividing both sides by $\lambda$. The requirement $p > 1$ ensures $p - 2 > -1$, so the singularity at $\lambda = 0$ is integrable — this is where the hypothesis $p > 1$ is consumed. (For $p = 1$, the exponent would be $-1$, producing a logarithmic divergence, and indeed the $L^1$ maximal inequality does not upgrade to an $L^1$ bound on $X_n^*$.)
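The dichotomy at $p = 1$ becomes explicit when the integral is cut off at a small $\varepsilon > 0$. Here $m > \varepsilon$ stands for a generic positive upper limit, such as $\min\{X_n^*(\omega), K\}$:
\begin{align*}
\int_\varepsilon^m \lambda^{p-2} \, d\mathcal{L}^1(\lambda) &= \frac{m^{p-1} - \varepsilon^{p-1}}{p-1} \;\to\; \frac{m^{p-1}}{p-1} < \infty \quad \text{as } \varepsilon \downarrow 0 \quad (p > 1), \\
\int_\varepsilon^m \lambda^{-1} \, d\mathcal{L}^1(\lambda) &= \log \frac{m}{\varepsilon} \;\to\; \infty \quad \text{as } \varepsilon \downarrow 0 \quad (p = 1).
\end{align*}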
[/guided]
[/step]
[step:Exchange the order of integration via Fubini and evaluate the inner integral]
The integrand $p \lambda^{p-2} X_n(\omega) \mathbb{1}_{\{X_n^*(\omega) \geq \lambda\}}$ is jointly measurable in $(\omega, \lambda)$ and non-negative (since $X_n \geq 0$ and $\lambda^{p-2} > 0$ for $\lambda > 0$). The measure spaces $(\Omega, \mathcal{F}, \mathbb{P})$ and $((0, K), \mathcal{B}((0, K)), \mathcal{L}^1)$ are $\sigma$-finite, so the Tonelli form of [Fubini's Theorem](/theorems/513) permits exchanging the expectation and the Lebesgue integral:
\begin{align*}
\int_0^K p \lambda^{p-2} \, \mathbb{E}[X_n \mathbb{1}_{\{X_n^* \geq \lambda\}}] \, d\mathcal{L}^1(\lambda) &= \mathbb{E}\left[ X_n \int_0^K p \lambda^{p-2} \mathbb{1}_{\{X_n^* \geq \lambda\}} \, d\mathcal{L}^1(\lambda) \right].
\end{align*}
For fixed $\omega \in \Omega$, the indicator $\mathbb{1}_{\{X_n^*(\omega) \geq \lambda\}} = 1$ for $\lambda \in (0, X_n^*(\omega)]$ and vanishes for $\lambda > X_n^*(\omega)$. The inner integral therefore evaluates as
\begin{align*}
\int_0^K p \lambda^{p-2} \mathbb{1}_{\{X_n^*(\omega) \geq \lambda\}} \, d\mathcal{L}^1(\lambda) &= \int_0^{\min\{X_n^*(\omega),\, K\}} p \lambda^{p-2} \, d\mathcal{L}^1(\lambda) = p \cdot \frac{\lambda^{p-1}}{p-1} \Bigg|_{\lambda=0}^{\min\{X_n^*(\omega),\, K\}} = \frac{p}{p-1} \big(\min\{X_n^*(\omega), K\}\big)^{p-1}.
\end{align*}
Substituting back:
\begin{align*}
\mathbb{E}[Y_K^p] \leq \frac{p}{p-1} \, \mathbb{E}\big[X_n \cdot Y_K^{p-1}\big].
\end{align*}
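The estimate $\mathbb{E}[Y_K^p] \leq \frac{p}{p-1} \mathbb{E}[X_n Y_K^{p-1}]$ can be spot-checked by brute force. The sketch below uses a hypothetical example that does not come from the source: $X_k = |S_k|$ for a simple symmetric random walk started at $0$. It enumerates every path, uses floating point with a small tolerance, and tries a few choices of $p$ and $K$. The function name is illustrative.

```python
from itertools import product


def truncated_estimate_holds(n, p, K, tol=1e-9):
    """Check E[Y_K^p] <= p/(p-1) * E[X_n * Y_K^(p-1)] for X_k = |S_k| (hypothetical walk example)."""
    prob = 0.5 ** n  # each of the 2^n paths is equally likely
    lhs = rhs = 0.0
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0  # running_max tracks X_n^* = max_k |S_k|
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        y = min(running_max, K)  # Y_K = min(X_n^*, K)
        lhs += prob * y ** p
        rhs += prob * abs(s) * y ** (p - 1)
    return lhs <= p / (p - 1) * rhs + tol


print(all(truncated_estimate_holds(10, p, K) for p in (1.2, 2.0, 3.5) for K in (1.0, 2.5, 5.0)))  # True
```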
[guided]
After the Fubini exchange, the key observation is that the inner integral is the layer-cake identity once more, now with exponent $p-1$ in place of $p$ and scaled by $p/(p-1)$. For fixed $\omega$, the integration variable $\lambda$ effectively ranges over $(0, \min\{X_n^*(\omega), K\})$; this is where the truncation $K$ interacts with the running maximum $X_n^*$. We integrate:
\begin{align*}
\int_0^{\min\{X_n^*, K\}} p \lambda^{p-2} \, d\mathcal{L}^1(\lambda) = \frac{p}{p-1} \lambda^{p-1} \Bigg|_0^{\min\{X_n^*, K\}} = \frac{p}{p-1} (\min\{X_n^*, K\})^{p-1}.
\end{align*}
The integral converges at $\lambda = 0$ precisely because $p - 2 > -1$ (i.e., $p > 1$). The antiderivative $\lambda^{p-1}/(p-1)$ tends to $0$ as $\lambda \downarrow 0$ because $p - 1 > 0$, so the [boundary](/page/Boundary) term at $\lambda = 0$ vanishes. This confirms again where the exponent restriction enters.
Recalling that $Y_K = \min\{X_n^*, K\}$, the estimate takes the compact form $\mathbb{E}[Y_K^p] \leq \frac{p}{p-1} \mathbb{E}[X_n Y_K^{p-1}]$.
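The special case $p = 2$ illustrates the computation because its arithmetic is simplest. Here $\lambda^{p-2} = 1$ and $p/(p-1) = 2$, so the inner integral and the resulting estimate read
\begin{align*}
\int_0^{\min\{X_n^*, K\}} 2 \, d\mathcal{L}^1(\lambda) = 2 \min\{X_n^*, K\} = 2 Y_K, \qquad \mathbb{E}[Y_K^2] \leq 2 \, \mathbb{E}[X_n Y_K].
\end{align*}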
[/guided]
[/step]
[step:Decouple the right-hand side via Hölder's inequality]
We apply the [Hölder Inequality](/theorems/516) on the measure space $(\Omega, \mathcal{F}, \mathbb{P})$ to the product $X_n \cdot Y_K^{p-1}$, pairing
\begin{align*}
f &:= X_n \in L^p(\Omega, \mathcal{F}, \mathbb{P}), \quad g := Y_K^{p-1} \in L^q(\Omega, \mathcal{F}, \mathbb{P}),
\end{align*}
with conjugate exponents $p$ and $q = p/(p-1)$. The function $f$ belongs to $L^p$ by hypothesis; $g$ belongs to $L^q$ because $\|g\|_q = \|Y_K^{p-1}\|_{p/(p-1)} = (\mathbb{E}[Y_K^p])^{(p-1)/p} = \|Y_K\|_p^{p-1} \leq K^{p-1} < \infty$. Hölder's inequality gives
\begin{align*}
\mathbb{E}[X_n \cdot Y_K^{p-1}] \leq \|X_n\|_p \cdot \|Y_K^{p-1}\|_q = \|X_n\|_p \cdot \|Y_K\|_p^{p-1}.
\end{align*}
Combining with the estimate from the previous step:
\begin{align*}
\|Y_K\|_p^p = \mathbb{E}[Y_K^p] \leq \frac{p}{p-1} \|X_n\|_p \cdot \|Y_K\|_p^{p-1}.
\end{align*}
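The exponent bookkeeping and Hölder's inequality can be spot-checked numerically. The sketch below uses arbitrary non-negative values on a uniform six-point probability space. These are hypothetical data, not a martingale: `x` stands in for $X_n$ and `y` for $Y_K$. The check relies only on the identity $\|Y^{p-1}\|_q = \|Y\|_p^{p-1}$ and on Hölder's inequality, so any non-negative inputs should pass. The helper names are illustrative.

```python
import math
import random


def lp_norm(values, probs, r):
    """(E|V|^r)^(1/r) for a finite distribution."""
    return sum(w * abs(v) ** r for v, w in zip(values, probs)) ** (1 / r)


def holder_checks(p, n_atoms=6, seed=0):
    rng = random.Random(seed)
    q = p / (p - 1)  # conjugate exponent of p
    probs = [1 / n_atoms] * n_atoms
    x = [rng.uniform(0, 3) for _ in range(n_atoms)]  # stand-in for X_n >= 0
    y = [rng.uniform(0, 3) for _ in range(n_atoms)]  # stand-in for Y_K >= 0
    y_pow = [v ** (p - 1) for v in y]                # Y_K^(p-1)
    # ||Y^(p-1)||_q should equal ||Y||_p^(p-1)
    exponent_ok = math.isclose(lp_norm(y_pow, probs, q), lp_norm(y, probs, p) ** (p - 1))
    # Hölder: E[X * Y^(p-1)] <= ||X||_p * ||Y^(p-1)||_q
    lhs = sum(w * a * b for w, a, b in zip(probs, x, y_pow))
    holder_ok = lhs <= lp_norm(x, probs, p) * lp_norm(y_pow, probs, q) + 1e-12
    return exponent_ok, holder_ok


print(holder_checks(2.5))  # (True, True)
```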
[guided]
Why do we pair $X_n$ with exponent $p$ and $Y_K^{p-1}$ with the conjugate $q = p/(p-1)$? The goal is to isolate the $L^p$ norm of $X_n$ on the right-hand side, since this is what appears in the theorem statement. The remaining factor $Y_K^{p-1}$ must then go in the conjugate $L^q$ space. Let us verify the exponent arithmetic: $\|Y_K^{p-1}\|_q^q = \mathbb{E}[Y_K^{(p-1)q}] = \mathbb{E}[Y_K^{(p-1) \cdot p/(p-1)}] = \mathbb{E}[Y_K^p]$, so $\|Y_K^{p-1}\|_q = (\mathbb{E}[Y_K^p])^{1/q} = (\mathbb{E}[Y_K^p])^{(p-1)/p} = \|Y_K\|_p^{p-1}$.
The right-hand side of Hölder's inequality therefore reads $\|X_n\|_p \cdot \|Y_K\|_p^{p-1}$, which has the same $\|Y_K\|_p$ factor as the left-hand side $\|Y_K\|_p^p$. This sets up the critical cancellation in the next step.
[/guided]
[/step]
[step:Divide by the truncated norm and send $K \to \infty$ via the monotone convergence theorem]
Since $0 \leq Y_K \leq K$ and $Y_K \in L^p$, the quantity $\|Y_K\|_p^{p-1}$ is finite. If $\|Y_K\|_p = 0$, then $Y_K = 0$ a.s., so $X_n^* = 0$ a.s. and the inequality holds with both sides equal to zero. If $\|Y_K\|_p > 0$, divide both sides of
\begin{align*}
\|Y_K\|_p^p \leq \frac{p}{p-1} \|X_n\|_p \cdot \|Y_K\|_p^{p-1}
\end{align*}
by $\|Y_K\|_p^{p-1}$ to obtain
\begin{align*}
\|\min\{X_n^*, K\}\|_p \leq \frac{p}{p-1} \|X_n\|_p.
\end{align*}
As $K \to \infty$ along the positive integers, the [sequence](/page/Sequence) $(\min\{X_n^*, K\})_{K \in \mathbb{N}}$ increases monotonically to $X_n^*$ pointwise. By the [Monotone Convergence Theorem](/theorems/509) applied to the non-negative [measurable functions](/page/Measurable%20Functions) $(\min\{X_n^*, K\})^p \nearrow (X_n^*)^p$ on the measure space $(\Omega, \mathcal{F}, \mathbb{P})$,
\begin{align*}
\|X_n^*\|_p = \lim_{K \to \infty} \|\min\{X_n^*, K\}\|_p \leq \frac{p}{p-1} \|X_n\|_p.
\end{align*}
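To see the final inequality in a concrete case, the sketch below computes both sides by enumerating every path of a hypothetical example that does not come from the source: $X_k = |S_k|$ for a simple symmetric random walk started at $0$. The function name `doob_lp_sides` is illustrative, and the results are subject to ordinary floating-point rounding.

```python
from itertools import product


def doob_lp_sides(n, p):
    """Return (||X_n^*||_p, p/(p-1) * ||X_n||_p) for X_k = |S_k|, S a simple symmetric walk."""
    prob = 0.5 ** n  # each path has probability 2^-n
    max_moment = end_moment = 0.0
    for steps in product((-1, 1), repeat=n):
        s, running_max = 0, 0
        for step in steps:
            s += step
            running_max = max(running_max, abs(s))
        max_moment += prob * running_max ** p  # contributes to E[(X_n^*)^p]
        end_moment += prob * abs(s) ** p       # contributes to E[X_n^p]
    return max_moment ** (1 / p), p / (p - 1) * end_moment ** (1 / p)


for p in (1.5, 2.0, 4.0):
    lhs, rhs = doob_lp_sides(12, p)
    print(p, round(lhs, 3), round(rhs, 3), lhs <= rhs)
```

The enumeration grows like $2^n$, so this brute-force check is only practical for small horizons $n$.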
[guided]
The division step requires $0 < \|Y_K\|_p^{p-1} < \infty$. Finiteness is guaranteed by the truncation, and the degenerate case $\|Y_K\|_p = 0$ is handled separately above. Without truncation, we would need to know $\|X_n^*\|_p < \infty$ before we could divide, which would require an additional argument. The truncation-then-limit strategy avoids this dependence.
For the passage $K \to \infty$: define $g_K: \Omega \to [0, \infty)$, $\omega \mapsto (\min\{X_n^*(\omega), K\})^p$. Then $0 \leq g_1 \leq g_2 \leq \cdots$ and $g_K(\omega) \nearrow (X_n^*(\omega))^p$ for every $\omega$. The [Monotone Convergence Theorem](/theorems/509) applies because $(g_K)$ is a non-decreasing sequence of non-negative measurable functions on a measure space, giving
\begin{align*}
\lim_{K \to \infty} \mathbb{E}[g_K] = \mathbb{E}[(X_n^*)^p],
\end{align*}
which is equivalent to $\lim_{K \to \infty} \|Y_K\|_p^p = \|X_n^*\|_p^p$. Taking $p$-th roots (the map $t \mapsto t^{1/p}$ is continuous and monotone increasing on $[0, \infty]$) yields $\|X_n^*\|_p \leq \frac{p}{p-1} \|X_n\|_p$.
Note that the right-hand side $\frac{p}{p-1} \|X_n\|_p$ is independent of $K$, so the bound passes to the limit unchanged. If $\|X_n\|_p = \infty$, the inequality holds trivially. If $\|X_n\|_p < \infty$, the inequality establishes that $X_n^* \in L^p$ with the quantitative bound $\|X_n^*\|_p \leq \frac{p}{p-1} \|X_n\|_p$.
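The monotone passage $K \to \infty$ can be illustrated on a finite example, where the truncation eventually stops mattering. The sketch below uses a hypothetical example that does not come from the source: $X_n^* = \max_{k \leq n} |S_k|$ for a simple symmetric random walk with $n = 10$. It computes $\|\min\{X_n^*, K\}\|_p$ for $K = 1, \dots, 11$. The helper names are illustrative.

```python
from itertools import product


def running_maxima(n):
    """X_n^* = max_k |S_k| on each of the 2^n equally likely paths of a simple symmetric walk."""
    maxima = []
    for steps in product((-1, 1), repeat=n):
        s, m = 0, 0
        for step in steps:
            s += step
            m = max(m, abs(s))
        maxima.append(m)
    return maxima


def truncated_norm(maxima, p, K):
    """||min(X_n^*, K)||_p under the uniform measure on paths."""
    return (sum(min(m, K) ** p for m in maxima) / len(maxima)) ** (1 / p)


maxima = running_maxima(10)
p = 1.5
norms = [truncated_norm(maxima, p, K) for K in range(1, 12)]  # K = 1, ..., 11
full = truncated_norm(maxima, p, float("inf"))                # ||X_10^*||_p
print(all(a <= b + 1e-12 for a, b in zip(norms, norms[1:])))  # True: non-decreasing in K
print(norms[-1] == full)  # True: X_10^* <= 10 < 11, so truncation at K = 11 is inactive
```

On a finite sample space the limit is reached after finitely many steps. In general, the Monotone Convergence Theorem is needed to guarantee convergence.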
[/guided]
[/step]