[proofplan]
We first prove the identity for indicator functions, where it is exactly the defining property of the pushforward measure. Finite linearity then gives the result for nonnegative simple functions. For a general nonnegative measurable function, we approximate it from below by nonnegative simple functions and pass to the limit using the [Monotone Convergence Theorem](/theorems/509) on both measure spaces. Finally, signed and complex-valued functions are reduced to the nonnegative case by decomposing into positive and negative parts, and then into real and imaginary parts.
[/proofplan]
[step:Verify the identity for indicator functions from the definition of pushforward measure]
Let $B\in\mathcal B$, and define the indicator function $\mathbb 1_B:Y\to[0,\infty]$ by $\mathbb 1_B(y)=1$ if $y\in B$ and $\mathbb 1_B(y)=0$ if $y\notin B$. Since $T:X\to Y$ is measurable, $T^{-1}(B)\in\mathcal A$, and the composition $\mathbb 1_B\circ T:X\to[0,\infty]$ is the indicator function $\mathbb 1_{T^{-1}(B)}$.
By the definition of the integral of an indicator function and the definition of the pushforward measure,
\begin{align*}
\int_Y \mathbb 1_B(y)\,d(T_{\#}\mu)(y)=(T_{\#}\mu)(B)=\mu(T^{-1}(B))=\int_X \mathbb 1_{T^{-1}(B)}(x)\,d\mu(x).
\end{align*}
Since $\mathbb 1_{T^{-1}(B)}(x)=\mathbb 1_B(T(x))$ for every $x\in X$, this gives
\begin{align*}
\int_Y \mathbb 1_B(y)\,d(T_{\#}\mu)(y)=\int_X \mathbb 1_B(T(x))\,d\mu(x).
\end{align*}
[/step]
[step:Extend the identity to nonnegative simple functions by finite linearity]
Let $s:Y\to[0,\infty)$ be a nonnegative $\mathcal B$-measurable [simple function](/page/Simple%20Function). Choose a positive integer $m$, coefficients $a_1,\dots,a_m\in[0,\infty)$, and measurable sets $B_1,\dots,B_m\in\mathcal B$ such that
\begin{align*}
s=\sum_{k=1}^{m} a_k\mathbb 1_{B_k}.
\end{align*}
Then $s\circ T:X\to[0,\infty)$ is the nonnegative simple function
\begin{align*}
s\circ T=\sum_{k=1}^{m} a_k\mathbb 1_{T^{-1}(B_k)}.
\end{align*}
Using finite additivity and homogeneity of the nonnegative simple-function integral, together with the indicator-function case proved above, we obtain
\begin{align*}
\int_Y s(y)\,d(T_{\#}\mu)(y)=\sum_{k=1}^{m}a_k(T_{\#}\mu)(B_k).
\end{align*}
By the definition of $T_{\#}\mu$,
\begin{align*}
\sum_{k=1}^{m}a_k(T_{\#}\mu)(B_k)=\sum_{k=1}^{m}a_k\mu(T^{-1}(B_k)).
\end{align*}
Using the definition of the simple-function integral on $X$,
\begin{align*}
\sum_{k=1}^{m}a_k\mu(T^{-1}(B_k))=\int_X s(T(x))\,d\mu(x).
\end{align*}
Therefore
\begin{align*}
\int_Y s(y)\,d(T_{\#}\mu)(y)=\int_X s(T(x))\,d\mu(x).
\end{align*}
[/step]
[step:Pass from simple functions to nonnegative measurable functions by monotone convergence]
Let $f:Y\to[0,\infty]$ be $\mathcal B$-measurable. Since $T:X\to Y$ is $\mathcal A/\mathcal B$-measurable, the composition $f\circ T:X\to[0,\infty]$, defined by $(f\circ T)(x)=f(T(x))$, is $\mathcal A$-measurable.
By the standard simple-function approximation theorem for nonnegative [measurable functions](/page/Measurable%20Functions), there exists a sequence $(s_n)_{n\geq 1}$ of nonnegative $\mathcal B$-measurable simple functions $s_n:Y\to[0,\infty)$ such that $s_n(y)\uparrow f(y)$ for every $y\in Y$. For each integer $n\geq 1$, the simple-function case gives
\begin{align*}
\int_Y s_n(y)\,d(T_{\#}\mu)(y)=\int_X s_n(T(x))\,d\mu(x).
\end{align*}
Moreover $s_n\circ T:X\to[0,\infty)$ is measurable and satisfies $s_n(T(x))\uparrow f(T(x))$ for every $x\in X$.
Applying the Monotone Convergence Theorem to the increasing sequence $(s_n)_{n\geq 1}$ on the [measure space](/page/Measure%20Space) $(Y,\mathcal B,T_{\#}\mu)$ gives
\begin{align*}
\int_Y f(y)\,d(T_{\#}\mu)(y)=\lim_{n\to\infty}\int_Y s_n(y)\,d(T_{\#}\mu)(y).
\end{align*}
Applying the same theorem to the increasing sequence $(s_n\circ T)_{n\geq 1}$ on the measure space $(X,\mathcal A,\mu)$ gives
\begin{align*}
\int_X f(T(x))\,d\mu(x)=\lim_{n\to\infty}\int_X s_n(T(x))\,d\mu(x).
\end{align*}
Since the two integrals are equal for every integer $n\geq 1$, their limits are equal. Hence
\begin{align*}
\int_Y f(y)\,d(T_{\#}\mu)(y)=\int_X f(T(x))\,d\mu(x).
\end{align*}
[guided]
The goal is to pass from simple functions, where the identity has already been proved by finite linearity, to an arbitrary nonnegative measurable function. The bridge is approximation from below. Since $f:Y\to[0,\infty]$ is $\mathcal B$-measurable, the standard simple-function approximation theorem gives a sequence $(s_n)_{n\geq 1}$ of nonnegative $\mathcal B$-measurable simple functions $s_n:Y\to[0,\infty)$ such that $s_n(y)\uparrow f(y)$ for every $y\in Y$.
We must also check that the approximating sequence remains valid after composition with $T$. Because $T:X\to Y$ is $\mathcal A/\mathcal B$-measurable and each $s_n$ is $\mathcal B$-measurable, the composition
\begin{align*}
s_n\circ T:X\to[0,\infty),\qquad (s_n\circ T)(x)=s_n(T(x))
\end{align*}
is $\mathcal A$-measurable for every integer $n\geq 1$. Since $s_n(y)\uparrow f(y)$ pointwise on $Y$, substituting $y=T(x)$ gives
\begin{align*}
s_n(T(x))\uparrow f(T(x))
\end{align*}
for every $x\in X$. Thus $(s_n\circ T)_{n\geq 1}$ is an increasing sequence of nonnegative measurable functions on $(X,\mathcal A,\mu)$.
For each fixed integer $n\geq 1$, the simple-function case already proved applies to $s_n$, so
\begin{align*}
\int_Y s_n(y)\,d(T_{\#}\mu)(y)=\int_X s_n(T(x))\,d\mu(x).
\end{align*}
Now we pass to the limit. The Monotone Convergence Theorem applies on $(Y,\mathcal B,T_{\#}\mu)$ because $(s_n)_{n\geq 1}$ is nonnegative, measurable, and increasing pointwise to $f$. Therefore
\begin{align*}
\int_Y f(y)\,d(T_{\#}\mu)(y)=\lim_{n\to\infty}\int_Y s_n(y)\,d(T_{\#}\mu)(y).
\end{align*}
The same theorem applies on $(X,\mathcal A,\mu)$ because $(s_n\circ T)_{n\geq 1}$ is nonnegative, measurable, and increasing pointwise to $f\circ T$. Hence
\begin{align*}
\int_X f(T(x))\,d\mu(x)=\lim_{n\to\infty}\int_X s_n(T(x))\,d\mu(x).
\end{align*}
The two sequences inside these limits are equal term by term, so their limits are equal. This proves
\begin{align*}
\int_Y f(y)\,d(T_{\#}\mu)(y)=\int_X f(T(x))\,d\mu(x).
\end{align*}
The nonnegativity hypothesis is exactly what permits this monotone convergence argument without any integrability assumption.
[/guided]
[/step]
[step:Reduce real-valued functions to their positive and negative parts]
Let $f:Y\to\mathbb R$ be $\mathcal B$-measurable. Define the positive and negative parts $f^+:Y\to[0,\infty]$ and $f^-:Y\to[0,\infty]$ by
\begin{align*}
f^+(y)=\max\{f(y),0\},\qquad f^-(y)=\max\{-f(y),0\}.
\end{align*}
Then $f=f^+-f^-$ and $|f|=f^++f^-$. The nonnegative case applied to $f^+$ and $f^-$ gives
\begin{align*}
\int_Y f^+(y)\,d(T_{\#}\mu)(y)=\int_X f^+(T(x))\,d\mu(x)
\end{align*}
and
\begin{align*}
\int_Y f^-(y)\,d(T_{\#}\mu)(y)=\int_X f^-(T(x))\,d\mu(x).
\end{align*}
Also $(f\circ T)^+=f^+\circ T$ and $(f\circ T)^-=f^-\circ T$ pointwise on $X$.
If the signed integral of $f$ with respect to $T_{\#}\mu$ is well-defined, then the two nonnegative quantities defining it are not both infinite. By the two identities above, the corresponding positive and negative integrals of $f\circ T$ with respect to $\mu$ have the same values, so the signed integral of $f\circ T$ is well-defined and
\begin{align*}
\int_Y f(y)\,d(T_{\#}\mu)(y)=\int_X f(T(x))\,d\mu(x).
\end{align*}
Conversely, suppose the signed integral of $f\circ T$ with respect to $\mu$ is well-defined. Then the two nonnegative quantities defining that signed integral are not both infinite. Since the nonnegative-case identities identify those quantities with the corresponding positive and negative integrals of $f$ with respect to $T_{\#}\mu$, the signed integral of $f$ with respect to $T_{\#}\mu$ is well-defined and the same subtraction gives the displayed identity. In particular, if $f\in L^1(Y,\mathcal B,T_{\#}\mu)$, then applying the nonnegative case to $|f|$ gives
\begin{align*}
\int_X |f(T(x))|\,d\mu(x)=\int_Y |f(y)|\,d(T_{\#}\mu)(y)<\infty,
\end{align*}
so $f\circ T\in L^1(X,\mathcal A,\mu)$ and the identity holds. Conversely, if $f\circ T\in L^1(X,\mathcal A,\mu)$, then the same displayed equality gives
\begin{align*}
\int_Y |f(y)|\,d(T_{\#}\mu)(y)=\int_X |f(T(x))|\,d\mu(x)<\infty,
\end{align*}
so $f\in L^1(Y,\mathcal B,T_{\#}\mu)$ and the identity holds.
[/step]
[step:Reduce complex-valued functions to real and imaginary parts]
Let $f:Y\to\mathbb C$ be $\mathcal B$-measurable. Define the real and imaginary part functions $\operatorname{Re} f:Y\to\mathbb R$ and $\operatorname{Im} f:Y\to\mathbb R$ by writing
\begin{align*}
f(y)=\operatorname{Re} f(y)+i\operatorname{Im} f(y)
\end{align*}
for each $y\in Y$. Since the coordinate projection maps $\operatorname{Re}:\mathbb C\to\mathbb R$ and $\operatorname{Im}:\mathbb C\to\mathbb R$ are Borel measurable, the compositions $\operatorname{Re} f$ and $\operatorname{Im} f$ are $\mathcal B$-measurable. Under the hypothesis that the corresponding component integrals are well-defined, the real-valued case applies to $\operatorname{Re} f$ and $\operatorname{Im} f$ and gives
\begin{align*}
\int_Y \operatorname{Re} f(y)\,d(T_{\#}\mu)(y)=\int_X \operatorname{Re} f(T(x))\,d\mu(x)
\end{align*}
and
\begin{align*}
\int_Y \operatorname{Im} f(y)\,d(T_{\#}\mu)(y)=\int_X \operatorname{Im} f(T(x))\,d\mu(x).
\end{align*}
Since integration of complex-valued functions is defined componentwise by real and imaginary parts, we obtain
\begin{align*}
\int_Y f(y)\,d(T_{\#}\mu)(y)=\int_X f(T(x))\,d\mu(x).
\end{align*}
If $f\in L^1(Y,\mathcal B,T_{\#}\mu)$, then the nonnegative case applied to $|f|:Y\to[0,\infty)$ gives
\begin{align*}
\int_X |f(T(x))|\,d\mu(x)=\int_Y |f(y)|\,d(T_{\#}\mu)(y)<\infty.
\end{align*}
Thus $f\circ T\in L^1(X,\mathcal A,\mu)$, and the complex-valued identity follows. Conversely, if $f\circ T\in L^1(X,\mathcal A,\mu)$, then the same equality gives
\begin{align*}
\int_Y |f(y)|\,d(T_{\#}\mu)(y)=\int_X |f(T(x))|\,d\mu(x)<\infty.
\end{align*}
Thus $f\in L^1(Y,\mathcal B,T_{\#}\mu)$, and the complex-valued identity follows. This completes the proof.
[/step]