[proofplan]
We compare the joint law of $(X,Y)$ with the measure on $\mathbb R^2$ having density $(x,y)\mapsto f_X(x)f_Y(y)$. If $X$ and $Y$ are independent, these two measures agree on measurable rectangles, hence on all Borel sets, and uniqueness of densities gives the almost everywhere factorization. Conversely, if the density factorization holds almost everywhere, integrating it over rectangles and using the Fubini-Tonelli theorem gives the product formula for probabilities, which is exactly independence.
[/proofplan]
[step:Define the joint law and the product-density measure]
Let $\mu$ denote the joint distribution of $(X,Y)$, so $\mu:\mathcal B(\mathbb R^2)\to[0,1]$ is the [probability measure](/page/Probability%20Measure)
\begin{align*}
\mu(E)=\mathbb P((X,Y)\in E)
\end{align*}
for each $E\in\mathcal B(\mathbb R^2)$. Since $f_{X,Y}$ is a joint density, for every $E\in\mathcal B(\mathbb R^2)$,
\begin{align*}
\mu(E)=\int_E f_{X,Y}(x,y)\,d\mathcal L^2(x,y).
\end{align*}
Define the [measurable function](/page/Measurable%20Function) $g:\mathbb R^2\to[0,\infty)$ by
\begin{align*}
g(x,y)=f_X(x)f_Y(y).
\end{align*}
By the Fubini-Tonelli theorem for non-negative [measurable functions](/page/Measurable%20Functions), $g$ is $\mathcal L^2$-integrable and
\begin{align*}
\int_{\mathbb R^2} g(x,y)\,d\mathcal L^2(x,y)=\left(\int_{\mathbb R} f_X(x)\,d\mathcal L^1(x)\right)\left(\int_{\mathbb R} f_Y(y)\,d\mathcal L^1(y)\right)=1.
\end{align*}
Thus $g$ defines a probability measure $\nu:\mathcal B(\mathbb R^2)\to[0,1]$ by
\begin{align*}
\nu(E)=\int_E g(x,y)\,d\mathcal L^2(x,y).
\end{align*}
[/step]
[step:Prove density factorization from independence]
Assume that $X$ and $Y$ are independent. Let $A,B\in\mathcal B(\mathbb R)$. Since $A\times B\in\mathcal B(\mathbb R^2)$, independence gives
\begin{align*}
\mu(A\times B)=\mathbb P(X\in A,\ Y\in B)=\mathbb P(X\in A)\mathbb P(Y\in B).
\end{align*}
Using the marginal density formulas for $X$ and $Y$,
\begin{align*}
\mathbb P(X\in A)\mathbb P(Y\in B)=\left(\int_A f_X(x)\,d\mathcal L^1(x)\right)\left(\int_B f_Y(y)\,d\mathcal L^1(y)\right).
\end{align*}
By the Fubini-Tonelli theorem applied to the non-negative measurable function $(x,y)\mapsto \mathbb 1_A(x)\mathbb 1_B(y)f_X(x)f_Y(y)$,
\begin{align*}
\left(\int_A f_X(x)\,d\mathcal L^1(x)\right)\left(\int_B f_Y(y)\,d\mathcal L^1(y)\right)=\int_{A\times B} g(x,y)\,d\mathcal L^2(x,y)=\nu(A\times B).
\end{align*}
Therefore $\mu$ and $\nu$ agree on all measurable rectangles $A\times B$ with $A,B\in\mathcal B(\mathbb R)$.
The class of Borel sets $E\in\mathcal B(\mathbb R^2)$ satisfying $\mu(E)=\nu(E)$ is a Dynkin system. It contains the $\pi$-system of measurable rectangles, and this $\pi$-system generates $\mathcal B(\mathbb R^2)$. By the $\pi$-$\lambda$ theorem, $\mu=\nu$ on $\mathcal B(\mathbb R^2)$.
It remains only to translate equality of the induced measures into equality of their densities. Since
\begin{align*}
\int_E f_{X,Y}(x,y)\,d\mathcal L^2(x,y)=\mu(E)=\nu(E)=\int_E g(x,y)\,d\mathcal L^2(x,y)
\end{align*}
for every $E\in\mathcal B(\mathbb R^2)$, uniqueness of non-negative Lebesgue densities gives
\begin{align*}
f_{X,Y}(x,y)=g(x,y)=f_X(x)f_Y(y)
\end{align*}
for $\mathcal L^2$-a.e. $(x,y)\in\mathbb R^2$.
[guided]
Assume that $X$ and $Y$ are independent. The goal is to prove equality of two functions, but densities are best compared through the measures they define. Let $A,B\in\mathcal B(\mathbb R)$. The set $A\times B$ is a Borel subset of $\mathbb R^2$, and the event $\{(X,Y)\in A\times B\}$ is exactly the event $\{X\in A,\ Y\in B\}$. Therefore
\begin{align*}
\mu(A\times B)
=
\mathbb P((X,Y)\in A\times B)
=
\mathbb P(X\in A,\ Y\in B).
\end{align*}
By independence of the random variables,
\begin{align*}
\mathbb P(X\in A,\ Y\in B)
=
\mathbb P(X\in A)\mathbb P(Y\in B).
\end{align*}
Now use the fact that $f_X$ and $f_Y$ are the chosen marginal densities:
\begin{align*}
\mathbb P(X\in A)
=
\int_A f_X(x)\,d\mathcal L^1(x)
\end{align*}
and
\begin{align*}
\mathbb P(Y\in B)
=
\int_B f_Y(y)\,d\mathcal L^1(y).
\end{align*}
Multiplying these two identities gives
\begin{align*}
\mathbb P(X\in A)\mathbb P(Y\in B)
=
\left(\int_A f_X(x)\,d\mathcal L^1(x)\right)
\left(\int_B f_Y(y)\,d\mathcal L^1(y)\right).
\end{align*}
The product of the two one-dimensional integrals is exactly the integral of the product density over the rectangle. Indeed, the function
\begin{align*}
(x,y)\mapsto \mathbb 1_A(x)\mathbb 1_B(y)f_X(x)f_Y(y)
\end{align*}
is non-negative and Borel measurable on $\mathbb R^2$. Hence the Fubini-Tonelli theorem applies and yields
\begin{align*}
\int_{A\times B} g(x,y)\,d\mathcal L^2(x,y)
=
\left(\int_A f_X(x)\,d\mathcal L^1(x)\right)
\left(\int_B f_Y(y)\,d\mathcal L^1(y)\right).
\end{align*}
Combining the previous identities gives
\begin{align*}
\mu(A\times B)=\nu(A\times B)
\end{align*}
for every pair $A,B\in\mathcal B(\mathbb R)$.
Why is equality on rectangles enough? The measurable rectangles form a $\pi$-system, meaning that the intersection of two such rectangles is again such a rectangle. They generate the full Borel $\sigma$-algebra $\mathcal B(\mathbb R^2)$. The class
\begin{align*}
\mathcal D=\{E\in\mathcal B(\mathbb R^2):\mu(E)=\nu(E)\}
\end{align*}
is a Dynkin system because $\mu$ and $\nu$ are finite measures: it contains $\mathbb R^2$, is closed under complements inside $\mathbb R^2$, and is closed under countable disjoint unions by countable additivity. Since $\mathcal D$ contains the rectangle $\pi$-system, the $\pi$-$\lambda$ theorem implies that $\mathcal D=\mathcal B(\mathbb R^2)$. Thus $\mu=\nu$ on all Borel subsets of $\mathbb R^2$.
Finally, both measures are represented by densities with respect to $\mathcal L^2$:
\begin{align*}
\mu(E)=\int_E f_{X,Y}(x,y)\,d\mathcal L^2(x,y)
\end{align*}
and
\begin{align*}
\nu(E)=\int_E g(x,y)\,d\mathcal L^2(x,y)
\end{align*}
for every $E\in\mathcal B(\mathbb R^2)$. Since $\mu=\nu$, these integrals are equal for every Borel set $E$. The uniqueness of non-negative Lebesgue densities then gives
\begin{align*}
f_{X,Y}(x,y)=g(x,y)=f_X(x)f_Y(y)
\end{align*}
for $\mathcal L^2$-a.e. $(x,y)\in\mathbb R^2$.
[/guided]
[/step]
[step:Record the density uniqueness argument used above]
[claim:Non-negative densities with the same integrals are equal almost everywhere]
Let $h,k:\mathbb R^2\to[0,\infty)$ be Borel measurable functions such that
\begin{align*}
\int_E h(x,y)\,d\mathcal L^2(x,y)
=
\int_E k(x,y)\,d\mathcal L^2(x,y)
\end{align*}
for every $E\in\mathcal B(\mathbb R^2)$. Then $h=k$ for $\mathcal L^2$-a.e. $(x,y)\in\mathbb R^2$.
[/claim]
[proof]
For each $n\in\mathbb N$, define the Borel set
\begin{align*}
E_n=\{(x,y)\in\mathbb R^2:h(x,y)\ge k(x,y)+1/n\}.
\end{align*}
On $E_n$ we have $h\ge k+1/n$, so monotonicity and additivity of the [Lebesgue integral](/page/Lebesgue%20Integral) give
\begin{align*}
\int_{E_n} h(x,y)\,d\mathcal L^2(x,y)
\ge
\int_{E_n} k(x,y)\,d\mathcal L^2(x,y)
+
\frac{1}{n}\mathcal L^2(E_n).
\end{align*}
The hypothesis says that the two integrals over $E_n$ are equal. Hence $\mathcal L^2(E_n)=0$. Since
\begin{align*}
\{(x,y)\in\mathbb R^2:h(x,y)>k(x,y)\}
=
\bigcup_{n=1}^{\infty} E_n,
\end{align*}
[countable subadditivity](/theorems/1108) gives
\begin{align*}
\mathcal L^2(\{h>k\})=0.
\end{align*}
Interchanging the roles of $h$ and $k$ gives
\begin{align*}
\mathcal L^2(\{k>h\})=0.
\end{align*}
Therefore $\mathcal L^2(\{h\ne k\})=0$, which is the desired almost everywhere equality.
[/proof]
[/step]
[step:Derive independence from the density factorization]
Assume now that
\begin{align*}
f_{X,Y}(x,y)=f_X(x)f_Y(y)
\end{align*}
for $\mathcal L^2$-a.e. $(x,y)\in\mathbb R^2$. Let $A,B\in\mathcal B(\mathbb R)$. Since changing an integrand on an $\mathcal L^2$-null set does not change its Lebesgue integral,
\begin{align*}
\mathbb P(X\in A,\ Y\in B)=\int_{A\times B} f_{X,Y}(x,y)\,d\mathcal L^2(x,y)=\int_{A\times B} f_X(x)f_Y(y)\,d\mathcal L^2(x,y).
\end{align*}
By the Fubini-Tonelli theorem applied to the non-negative measurable function $(x,y)\mapsto \mathbb 1_A(x)\mathbb 1_B(y)f_X(x)f_Y(y)$,
\begin{align*}
\int_{A\times B} f_X(x)f_Y(y)\,d\mathcal L^2(x,y)=\left(\int_A f_X(x)\,d\mathcal L^1(x)\right)\left(\int_B f_Y(y)\,d\mathcal L^1(y)\right).
\end{align*}
Using the marginal density formulas again,
\begin{align*}
\left(\int_A f_X(x)\,d\mathcal L^1(x)\right)\left(\int_B f_Y(y)\,d\mathcal L^1(y)\right)=\mathbb P(X\in A)\mathbb P(Y\in B).
\end{align*}
Thus
\begin{align*}
\mathbb P(X\in A,\ Y\in B)=\mathbb P(X\in A)\mathbb P(Y\in B)
\end{align*}
for all $A,B\in\mathcal B(\mathbb R)$. This is the definition of independence of the real-valued random variables $X$ and $Y$. Hence $X$ and $Y$ are independent.
[/step]