[proofplan]
We give the von Neumann proof using Hilbert space theory. First reduce to the finite case by covering $X$ with an increasing sequence of sets of finite $(\mu + \nu)$-measure. On each finite piece, the functional $\varphi \mapsto \int \varphi \, d\nu$ is a bounded linear functional on $L^2(\mu + \nu)$, so the Riesz Representation Theorem provides a function $g$ with $\int \varphi \, d\nu = \int \varphi g \, d(\mu + \nu)$. We show $0 \le g < 1$ $(\mu + \nu)$-a.e. using absolute continuity, then algebraically extract the Radon-Nikodym derivative as $f = g/(1 - g)$. The passage from finite to $\sigma$-finite uses monotone convergence. Uniqueness follows from the standard null-set argument.
[/proofplan]
[step:Reduce to the case where $\mu$ and $\nu$ are both finite measures]
Since $\mu$ and $\nu$ are $\sigma$-finite, there exist sequences $(A_k)_{k=1}^\infty$ and $(B_k)_{k=1}^\infty$ in $\mathcal{A}$ with $A_k \uparrow X$, $\mu(A_k) < \infty$, and $B_k \uparrow X$, $\nu(B_k) < \infty$. Define $X_k := A_k \cap B_k$. Then $X_k \uparrow X$, $\mu(X_k) < \infty$, and $\nu(X_k) < \infty$.
It suffices to prove the theorem for the finite measures $\mu_k := \mu|_{X_k}$ and $\nu_k := \nu|_{X_k}$ on the measurable space $(X_k, \mathcal{A} \cap X_k)$; the extension to the $\sigma$-finite case is carried out in a later step. The condition $\nu_k \ll \mu_k$ is inherited: if $A \in \mathcal{A}$ satisfies $\mu_k(A) = \mu(A \cap X_k) = 0$, then $\nu_k(A) = \nu(A \cap X_k) = 0$ since $\nu \ll \mu$.
Until that step, we assume $\mu(X) < \infty$ and $\nu(X) < \infty$, and we construct $f$ on this finite measure space.
[guided]
The reduction to finite measures is standard in measure theory. The Riesz Representation Theorem itself holds on any Hilbert space; finiteness is needed to make the functional $\varphi \mapsto \int \varphi \, d\nu$ bounded on $L^2(\mu + \nu)$, because the estimate relies on constant functions being square-integrable. If $(\mu + \nu)(X) = \infty$, then $\mathbb{1}_X \notin L^2(\mu + \nu)$, and the functional need not be bounded, or even well-defined, on $L^2$.
The construction $X_k = A_k \cap B_k$ ensures that both $\mu$ and $\nu$ are simultaneously finite on $X_k$. This is necessary because we will work with the combined measure $\mu + \nu$, which must also be finite.
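For a concrete illustration (an example chosen here, not taken from the proof), let $X = \mathbb{R}$ with $\mu$ the Lebesgue measure and $\nu(A) := \int_A e^x \, dx$. One possible choice of exhausting sequences is $A_k = [-k, k]$ and $B_k = (-\infty, k]$:
\begin{align*}
\mu(A_k) = 2k, \qquad \nu(B_k) = \int_{-\infty}^{k} e^x \, dx = e^k, \qquad \mu(B_k) = \infty.
\end{align*}
So $B_k$ alone is not a finite piece for $\mu$, whereas $X_k = A_k \cap B_k = [-k, k]$ satisfies $\mu(X_k) = 2k$ and $\nu(X_k) = e^k - e^{-k}$, both finite.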
[/guided]
[/step]
[step:Construct the linear functional on $L^2(X, \mathcal{A}, \mu + \nu)$ and apply the Riesz Representation Theorem]
Define the finite measure $\sigma := \mu + \nu$ on $(X, \mathcal{A})$, so that $\sigma(X) = \mu(X) + \nu(X) < \infty$. The space $L^2(X, \mathcal{A}, \sigma)$ of real-valued functions is a Hilbert space with inner product:
\begin{align*}
(\varphi, \psi)_{L^2(\sigma)} := \int_X \varphi \psi \, d\sigma.
\end{align*}
Define the linear functional:
\begin{align*}
\Lambda: L^2(X, \mathcal{A}, \sigma) &\to \mathbb{R}, \quad \varphi \mapsto \int_X \varphi \, d\nu.
\end{align*}
We verify $\Lambda$ is well-defined and bounded. For any $\varphi \in L^2(\sigma)$, the triangle inequality for integrals and $\nu \le \sigma$ give:
\begin{align*}
|\Lambda(\varphi)| = \left|\int_X \varphi \, d\nu\right| \le \int_X |\varphi| \, d\nu \le \int_X |\varphi| \, d\sigma.
\end{align*}
Applying the Cauchy-Schwarz inequality with respect to $\sigma$ to the pair $|\varphi|$ and $\mathbb{1}_X$ in $L^2(\sigma)$:
\begin{align*}
\int_X |\varphi| \, d\sigma = \int_X |\varphi| \cdot 1 \, d\sigma \le \left(\int_X |\varphi|^2 \, d\sigma\right)^{1/2} \left(\int_X 1 \, d\sigma\right)^{1/2} = \|\varphi\|_{L^2(\sigma)} \cdot \sigma(X)^{1/2}.
\end{align*}
Therefore $|\Lambda(\varphi)| \le \sigma(X)^{1/2} \|\varphi\|_{L^2(\sigma)}$, so $\Lambda$ is a bounded linear functional with $\|\Lambda\| \le \sigma(X)^{1/2}$.
By the Riesz Representation Theorem for Hilbert spaces, there exists a unique $g \in L^2(X, \mathcal{A}, \sigma)$ such that:
\begin{align*}
\int_X \varphi \, d\nu = \Lambda(\varphi) = (\varphi, g)_{L^2(\sigma)} = \int_X \varphi \, g \, d\sigma \quad \text{for every } \varphi \in L^2(\sigma).
\end{align*}
[guided]
The von Neumann strategy is to represent integration against $\nu$ as an inner product in $L^2(\sigma)$. The measure $\sigma = \mu + \nu$ is chosen so that $\nu \le \sigma$, which ensures $\int |\varphi| \, d\nu \le \int |\varphi| \, d\sigma$. Together with $\sigma(X) < \infty$, this is what makes $\Lambda$ bounded.
The bound $|\Lambda(\varphi)| \le \sigma(X)^{1/2} \|\varphi\|_{L^2(\sigma)}$ uses the Cauchy-Schwarz inequality in the form $\int |h| \, d\sigma \le \|h\|_{L^2(\sigma)} \cdot \|\mathbb{1}\|_{L^2(\sigma)}$, which is valid precisely because $\sigma(X) < \infty$ ensures $\mathbb{1}_X \in L^2(\sigma)$. This is where the finiteness assumption is consumed.
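To see why finiteness cannot simply be dropped, consider the following illustrative setting (an assumption made here, not part of the proof): $X = [1, \infty)$ with $\mu = \nu = \lambda$ the Lebesgue measure, so $\sigma = 2\lambda$, and $\varphi(x) = 1/x$. Then
\begin{align*}
\|\varphi\|_{L^2(\sigma)}^2 = 2 \int_1^\infty \frac{dx}{x^2} = 2 < \infty, \qquad \int_X \varphi \, d\nu = \int_1^\infty \frac{dx}{x} = \infty,
\end{align*}
so $\varphi \in L^2(\sigma)$ but $\Lambda(\varphi)$ is not finite. On an infinite measure space the functional may thus fail to be defined on all of $L^2(\sigma)$.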
The Riesz Representation Theorem (for Hilbert spaces, not for $C(K)$ or $L^p$ duality) then gives the representing function $g$. The function $g$ is the key intermediate object: it encodes how $\nu$ relates to $\sigma = \mu + \nu$, and from it we will extract the Radon-Nikodym derivative of $\nu$ with respect to $\mu$.
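As a sanity check in the simplest setting (a hypothetical discrete example, with weights $a_i, b_i$ introduced only for illustration), let $X = \{1, \dots, N\}$ with $\mathcal{A}$ the power set, $\mu(\{i\}) = a_i > 0$ and $\nu(\{i\}) = b_i \ge 0$. Testing the representation with $\varphi = \mathbb{1}_{\{i\}}$ gives:
\begin{align*}
b_i = \int_X \mathbb{1}_{\{i\}} \, d\nu = \int_X \mathbb{1}_{\{i\}} \, g \, d\sigma = g(i) \, (a_i + b_i), \qquad \text{so} \qquad g(i) = \frac{b_i}{a_i + b_i} \in [0, 1).
\end{align*}
In this case $g$ is simply the fraction of the $\sigma$-mass at each point that belongs to $\nu$.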
[/guided]
[/step]
[step:Show that $0 \le g(x) \le 1$ $\sigma$-a.e., and $g < 1$ $\mu$-a.e.]
**Nonnegativity.** For any $A \in \mathcal{A}$, take $\varphi = \mathbb{1}_A \in L^2(\sigma)$ (since $\sigma(X) < \infty$). The representation gives:
\begin{align*}
0 \le \nu(A) = \int_A g \, d\sigma.
\end{align*}
This holds for every $A \in \mathcal{A}$. If $g < 0$ on a set $E := \{g < 0\}$ of positive $\sigma$-measure, then $\int_E g \, d\sigma < 0$, contradicting $\nu(E) \ge 0$. Therefore $g \ge 0$ $\sigma$-a.e.
**Upper bound.** For any $A \in \mathcal{A}$, again using $\varphi = \mathbb{1}_A$ together with $\nu \le \sigma$:
\begin{align*}
\int_A g \, d\sigma = \nu(A) \le \sigma(A) = \int_A 1 \, d\sigma.
\end{align*}
Hence $\int_A (g - 1) \, d\sigma \le 0$ for every $A \in \mathcal{A}$; the subtraction is legitimate because $\sigma(A) < \infty$. Taking $A = \{g > 1\}$: $\int_{\{g > 1\}} (g - 1) \, d\sigma \le 0$. Since $g - 1 > 0$ on $\{g > 1\}$, this forces $\sigma(\{g > 1\}) = 0$. Therefore $g \le 1$ $\sigma$-a.e.
**Strict inequality $g < 1$ $\mu$-a.e.** Let $E := \{g = 1\}$. For every measurable $A \subseteq E$:
\begin{align*}
\nu(A) = \int_A g \, d\sigma = \int_A 1 \, d\sigma = \sigma(A) = \mu(A) + \nu(A).
\end{align*}
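Since $\nu(A) \le \nu(X) < \infty$, we may subtract $\nu(A)$ from both sides to get $\mu(A) = 0$ for every measurable $A \subseteq E$. In particular, $\mu(E) = 0$. Since $\nu \ll \mu$, this gives $\nu(E) = 0$ as well, so $\sigma(E) = 0$. Combined with $g \le 1$ $\sigma$-a.e., we conclude $g < 1$ $(\mu + \nu)$-a.e., and in particular $g < 1$ $\mu$-a.e.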
[guided]
The bounds on $g$ encode the relationship $\nu \le \sigma$. The nonnegativity $g \ge 0$ comes from $\nu \ge 0$, and the upper bound $g \le 1$ comes from $\nu \le \sigma$.
The strict inequality is where absolute continuity $\nu \ll \mu$ enters. On the set $\{g = 1\}$, the identity $\nu(A) = \sigma(A) = \mu(A) + \nu(A)$ forces $\mu(A) = 0$ for every measurable subset. This means $\mu$ assigns zero mass to $\{g = 1\}$, so $g < 1$ $\mu$-a.e.; this part uses only the finiteness of $\nu$. The absolute continuity $\nu \ll \mu$ then gives $\nu(\{g = 1\}) = 0$ too, so $g < 1$ also $\sigma$-a.e.
This strict inequality is essential: in the next step, we will divide by $1 - g$, which requires $g \neq 1$.
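A small example (constructed here for illustration, not part of the proof) shows what goes wrong without absolute continuity. Take $X = \{1, 2\}$, $\mu = \delta_1$ and $\nu = \delta_1 + \delta_2$, so that $\nu(\{2\}) = 1$ while $\mu(\{2\}) = 0$. Then $\sigma(\{1\}) = 2$, $\sigma(\{2\}) = 1$, and the representation forces:
\begin{align*}
g(1) = \frac{\nu(\{1\})}{\sigma(\{1\})} = \frac{1}{2}, \qquad g(2) = \frac{\nu(\{2\})}{\sigma(\{2\})} = 1.
\end{align*}
Here $g < 1$ holds $\mu$-a.e. but not $\sigma$-a.e. The quotient $g/(1 - g)$ equals $1$ at the point $1$ and is undefined at the point $2$, and no density with respect to $\mu$ can recover the mass that $\nu$ places on $\{2\}$.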
[/guided]
[/step]
[step:Extract the Radon-Nikodym derivative $f = g/(1-g)$]
After modifying $g$ on a $\sigma$-null set, which does not affect the representation, we may assume $0 \le g(x) < 1$ for all $x \in X$. For any nonnegative $\mathcal{A}$-measurable function $\varphi \in L^2(\sigma)$, the representation gives:
\begin{align*}
\int_X \varphi \, d\nu = \int_X \varphi \, g \, d\sigma = \int_X \varphi \, g \, d\mu + \int_X \varphi \, g \, d\nu,
\end{align*}
where the last equality uses $\sigma = \mu + \nu$. Rearranging:
\begin{align*}
\int_X \varphi(1 - g) \, d\nu = \int_X \varphi \, g \, d\mu.
\end{align*}
Apply this identity with $\varphi = \mathbb{1}_A \cdot \sum_{j=0}^{n} g^j$ for $A \in \mathcal{A}$ and $n \in \mathbb{N}$. Since $0 \le g < 1$ and $\sigma(X) < \infty$, each such $\varphi$ belongs to $L^2(\sigma)$. We compute:
\begin{align*}
\int_A (1-g) \sum_{j=0}^{n} g^j \, d\nu &= \int_A (1 - g^{n+1}) \, d\nu, \\
\int_A g \sum_{j=0}^{n} g^j \, d\mu &= \int_A \sum_{j=0}^{n} g^{j+1} \, d\mu = \int_A \sum_{j=1}^{n+1} g^j \, d\mu.
\end{align*}
Therefore:
\begin{align*}
\int_A (1 - g^{n+1}) \, d\nu = \int_A \sum_{j=1}^{n+1} g^j \, d\mu.
\end{align*}
As $n \to \infty$: since $0 \le g < 1$ everywhere, $g^{n+1} \to 0$ pointwise. On the left, the [Dominated Convergence Theorem](/theorems/4) applies with dominating function $\mathbb{1}_A$, since $|1 - g^{n+1}| \le 1$ and $\nu(A) \le \nu(X) < \infty$:
\begin{align*}
\lim_{n \to \infty} \int_A (1 - g^{n+1}) \, d\nu = \int_A 1 \, d\nu = \nu(A).
\end{align*}
On the right, the partial sums $\sum_{j=1}^{n+1} g^j$ increase monotonically to $\sum_{j=1}^{\infty} g^j = g/(1-g)$ (as a geometric series with ratio $g \in [0,1)$). By the [Monotone Convergence Theorem](/theorems/509):
\begin{align*}
\lim_{n \to \infty} \int_A \sum_{j=1}^{n+1} g^j \, d\mu = \int_A \sum_{j=1}^{\infty} g^j \, d\mu = \int_A \frac{g}{1-g} \, d\mu.
\end{align*}
Define the $\mathcal{A}$-measurable function:
\begin{align*}
f: X &\to [0, \infty), \quad f := \frac{g}{1 - g}.
\end{align*}
This is well-defined since $0 \le g < 1$ everywhere. We have established:
\begin{align*}
\nu(A) = \int_A f \, d\mu \quad \text{for every } A \in \mathcal{A}.
\end{align*}
[guided]
The algebraic extraction of $f$ from $g$ is the most delicate part of the proof. The representation $\int \varphi \, d\nu = \int \varphi g \, d\sigma$ holds for $\varphi \in L^2(\sigma)$, but we need the identity $\nu(A) = \int_A f \, d\mu$ for a function $f$ that may not be in $L^2$.
The idea is to split $d\sigma = d\mu + d\nu$ and rearrange to get $\int \varphi(1-g) \, d\nu = \int \varphi g \, d\mu$. If we could simply take $\varphi = \mathbb{1}_A / (1-g)$, we would immediately get $\nu(A) = \int_A g/(1-g) \, d\mu$. However, $1/(1-g)$ may not be in $L^2(\sigma)$ (it could be unbounded), so we cannot directly use it as a test function.
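An example of this obstruction (an illustrative choice made here, not taken from the proof): let $X = (0, 1)$ with $\mu$ the Lebesgue measure and $\nu(A) := \int_A x^{-1/2} \, dx$, which is finite since $\nu(X) = 2$. Writing $h(x) = x^{-1/2}$, we have $d\sigma = (1 + h) \, dx$, $g = h/(1 + h)$ and $1/(1 - g) = 1 + h$, so:
\begin{align*}
\int_X \left(\frac{1}{1 - g}\right)^2 d\sigma = \int_0^1 (1 + x^{-1/2})^3 \, dx \ge \int_0^1 x^{-3/2} \, dx = \infty.
\end{align*}
Thus $1/(1 - g) \notin L^2(\sigma)$ in this example, even though both measures are finite.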
The workaround is to approximate $1/(1-g)$ by the partial sums of the geometric series $\sum_{j=0}^n g^j$. Each partial sum is bounded (since $g \le 1$), so $\varphi = \mathbb{1}_A \sum_{j=0}^n g^j \in L^2(\sigma)$. The identity $(1-g)\sum_{j=0}^n g^j = 1 - g^{n+1}$ telescopes, and we pass to the limit $n \to \infty$.
On the left side, $g^{n+1} \to 0$ pointwise (since $0 \le g < 1$), and the [Dominated Convergence Theorem](/theorems/4) applies because $|1 - g^{n+1}| \le 1$ and $\nu(A) < \infty$. On the right side, the partial sums increase monotonically, so the [Monotone Convergence Theorem](/theorems/509) applies. The limit is the geometric series $\sum_{j=1}^\infty g^j = g/(1-g)$.
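The limiting argument can be checked by hand in a discrete setting (a hypothetical example; the weights $a_i, b_i$ are introduced only for illustration). Let $X = \{1, \dots, N\}$ with $\mu(\{i\}) = a_i > 0$ and $\nu(\{i\}) = b_i \ge 0$, so that $g(i) = b_i/(a_i + b_i)$ and $1 - g(i) = a_i/(a_i + b_i)$. For $A = \{i\}$, the two sides of the identity before passing to the limit are:
\begin{align*}
\int_A (1 - g^{n+1}) \, d\nu &= b_i \left(1 - g(i)^{n+1}\right), \\
\int_A \sum_{j=1}^{n+1} g^j \, d\mu &= a_i \cdot \frac{g(i)}{1 - g(i)} \left(1 - g(i)^{n+1}\right) = a_i \cdot \frac{b_i}{a_i} \left(1 - g(i)^{n+1}\right) = b_i \left(1 - g(i)^{n+1}\right).
\end{align*}
Both sides agree for every $n$, and letting $n \to \infty$ yields $b_i = a_i f(i)$ with $f(i) = g(i)/(1 - g(i)) = b_i/a_i$, the expected ratio of point masses.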
[/guided]
[/step]
[step:Extend from finite to $\sigma$-finite measures]
We now return to the general $\sigma$-finite case. Recall the exhaustion $X_k \uparrow X$ with $\mu(X_k) < \infty$ and $\nu(X_k) < \infty$. By the finite case, for each $k$ there exists an $(\mathcal{A} \cap X_k)$-measurable function $f_k: X_k \to [0, \infty)$ with:
\begin{align*}
\nu(A \cap X_k) = \int_{A \cap X_k} f_k \, d\mu \quad \text{for every } A \in \mathcal{A}.
\end{align*}
[claim:Consistency: $f_{k+1} = f_k$ $\mu$-a.e. on $X_k$]
For each $k$, $f_{k+1}|_{X_k} = f_k$ $\mu$-a.e. on $X_k$.
[/claim]
[proof]
For every $A \in \mathcal{A}$ with $A \subseteq X_k$:
\begin{align*}
\int_A f_k \, d\mu = \nu(A \cap X_k) = \nu(A) = \nu(A \cap X_{k+1}) = \int_A f_{k+1} \, d\mu.
\end{align*}
(The second and third equalities hold because $A \subseteq X_k \subseteq X_{k+1}$.) Both integrals are finite, since $\nu(A) \le \nu(X_k) < \infty$, so they may be subtracted. Therefore $\int_A (f_k - f_{k+1}) \, d\mu = 0$ for every measurable $A \subseteq X_k$. Taking $A = \{x \in X_k : f_k(x) > f_{k+1}(x)\}$ gives $f_k \le f_{k+1}$ $\mu$-a.e. on $X_k$. Taking $A = \{x \in X_k : f_{k+1}(x) > f_k(x)\}$ gives $f_{k+1} \le f_k$ $\mu$-a.e. on $X_k$. Hence $f_k = f_{k+1}$ $\mu$-a.e. on $X_k$.
[/proof]
Define $f: X \to [0, \infty)$ by $f(x) := f_k(x)$ for $x \in X_k \setminus X_{k-1}$ (with $X_0 := \varnothing$). Since the sets $X_k \setminus X_{k-1}$ partition $X$, this defines an $\mathcal{A}$-measurable function on all of $X$. By the consistency claim, $f = f_k$ $\mu$-a.e. on $X_k$ for each $k$. Consequently $f = \lim_{k \to \infty} f_k \mathbb{1}_{X_k}$ $\mu$-a.e., where the limit is nondecreasing $\mu$-a.e. (since $f_k = f_{k+1}$ $\mu$-a.e. on $X_k$, we have $f_k \mathbb{1}_{X_k} \le f_{k+1} \mathbb{1}_{X_{k+1}}$ $\mu$-a.e.).
For every $A \in \mathcal{A}$, the [Monotone Convergence Theorem](/theorems/509) gives:
\begin{align*}
\nu(A) = \lim_{k \to \infty} \nu(A \cap X_k) = \lim_{k \to \infty} \int_{A \cap X_k} f_k \, d\mu = \lim_{k \to \infty} \int_A f_k \mathbb{1}_{X_k} \, d\mu = \int_A f \, d\mu.
\end{align*}
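The first equality uses continuity of the measure $\nu$ from below ($A \cap X_k \uparrow A$). The second is the finite case on $X_k$, and the third only rewrites the domain of integration. The last equality is the Monotone Convergence Theorem, which applies because $f_k \mathbb{1}_{X_k} \uparrow f$ $\mu$-a.e. and all functions are nonnegative.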
[guided]
The passage from finite to $\sigma$-finite is a standard exhaustion argument. The key subtlety is consistency: the densities $f_k$ obtained on each $X_k$ must agree on their overlaps. This follows from the uniqueness of the Radon-Nikodym derivative on the finite measure space $(X_k, \mu|_{X_k})$.
The construction of the global function $f$ glues together the local densities. The Monotone Convergence Theorem is applied to the nondecreasing sequence $f_k \mathbb{1}_{X_k}$, which converges to $f$ $\mu$-a.e. This is where $\sigma$-finiteness is essential: without the exhaustion, we could not approximate $\nu(A)$ by $\nu(A \cap X_k)$.
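A concrete instance of the gluing (an example chosen here for illustration): let $X = \mathbb{R}$ with $\mu$ the Lebesgue measure, $\nu(A) := \int_A x^2 \, dx$, and $X_k = [-k, k]$. On each piece one may take $f_k(x) = x^2$, and the glued function is $f(x) = x^2$. For $A = \mathbb{R}$:
\begin{align*}
\nu(A \cap X_k) = \int_{-k}^{k} x^2 \, dx = \frac{2k^3}{3} \;\uparrow\; \infty = \nu(\mathbb{R}) = \int_{\mathbb{R}} f \, d\mu.
\end{align*}
This also shows that in the $\sigma$-finite case the density $f$ need not be $\mu$-integrable, even though each $f_k$ is integrable on $X_k$.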
[/guided]
[/step]
[step:Prove $\mu$-a.e. uniqueness of $f$]
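Suppose $f, \tilde{f}: X \to [0, \infty)$ both satisfy $\nu(A) = \int_A f \, d\mu = \int_A \tilde{f} \, d\mu$ for every $A \in \mathcal{A}$. If $\nu(A) < \infty$, both integrals are finite and may be subtracted:
\begin{align*}
\int_A (f - \tilde{f}) \, d\mu = 0 \quad \text{for every } A \in \mathcal{A} \text{ with } \nu(A) < \infty.
\end{align*}
Define $E^+ := \{x \in X : f(x) > \tilde{f}(x)\}$ and $E^- := \{x \in X : f(x) < \tilde{f}(x)\}$. Let $(X_k)$ be the exhaustion from the first step, so that $\nu(E^+ \cap X_k) \le \nu(X_k) < \infty$. Taking $A = E^+ \cap X_k$ for each $k$: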
\begin{align*}
0 = \int_{E^+ \cap X_k} (f - \tilde{f}) \, d\mu.
\end{align*}
Since $f - \tilde{f} > 0$ on $E^+$ and the integral over $E^+ \cap X_k$ is zero, we have $\mu(E^+ \cap X_k) = 0$ for every $k$. By continuity from below, $\mu(E^+) = \lim_k \mu(E^+ \cap X_k) = 0$. Similarly $\mu(E^-) = 0$. Therefore $f = \tilde{f}$ $\mu$-a.e.
[guided]
The uniqueness argument is the same one used throughout the Radon-Nikodym theory. If two nonnegative functions have the same integral over every measurable set, then their difference has integral zero over every measurable set on which these integrals are finite. Choosing the set where the difference is positive (or negative) forces the difference to be zero a.e.
The $\sigma$-finiteness enters through the exhaustion: we first show $\mu(E^+ \cap X_k) = 0$ on each finite piece (where the integral is well-defined and finite), then pass to the limit. On a finite measure space, this step would be immediate.
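The implication "positive integrand with zero integral forces a null set" can be spelled out as follows (a standard argument, written here with the auxiliary sets $E_{k,m}$ introduced only for this purpose). For $m \in \mathbb{N}$, let $E_{k,m} := \{x \in E^+ \cap X_k : f(x) - \tilde{f}(x) \ge 1/m\}$. Since $f - \tilde{f} > 0$ on $E^+ \cap X_k$:
\begin{align*}
0 = \int_{E^+ \cap X_k} (f - \tilde{f}) \, d\mu \ge \int_{E_{k,m}} (f - \tilde{f}) \, d\mu \ge \frac{1}{m} \, \mu(E_{k,m}).
\end{align*}
Hence $\mu(E_{k,m}) = 0$ for every $m$, and because $E^+ \cap X_k = \bigcup_{m=1}^{\infty} E_{k,m}$, countable subadditivity gives $\mu(E^+ \cap X_k) \le \sum_{m=1}^{\infty} \mu(E_{k,m}) = 0$.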
[/guided]
[/step]