[proofplan]
We prove the unnormalised identity $\mu(T_A^{-1}(E)) = \mu(E)$ for every $E \in \mathcal{B}_A$, then divide by $\mu(A)$. The [Poincaré Recurrence Theorem](/theorems/???) gives $\mu(A_\infty) = 0$, so the extension of $T_A$ by the identity on $A_\infty$ contributes only a $\mu$-null subset to every preimage. On $A_*$ we partition by first-return time and identify the induced preimage as a disjoint union of first-return pieces. To show that the measures of those pieces sum to $\mu(E)$, we repeatedly pull $E$ back under $T$, peel off the points whose first return to $A$ occurs at the next step, and show that the remaining error terms have measures tending to zero.
[/proofplan]
[step:Verify measurability of the induced map and reduce to the recurrent part of $A$]
For $j \in \mathbb{N} \cup \{0\}$, let $T^j : X \to X$ denote the $j$-fold iterate of $T$, with $T^0$ the identity map on $X$. For $C \in \mathcal{B}$, write $T^{-j}(C) := (T^j)^{-1}(C)$; this is preimage notation and does not require $T$ to be invertible.
For each $k \in \mathbb{N}$, the level set
\begin{align*}
\{n_A = k\} = A \cap \bigcap_{j=1}^{k-1} T^{-j}(X \setminus A) \cap T^{-k}(A)
\end{align*}
lies in $\mathcal{B}_A$, where the intersection over $1 \le j \le k-1$ is interpreted as $X$ when $k = 1$. Therefore $n_A$ is $\mathcal{B}_A$-measurable as a map into $\mathbb{N} \cup \{\infty\}$ (with the discrete $\sigma$-algebra), and
\begin{align*}
A_\infty = A \setminus \bigcup_{k=1}^{\infty} \{n_A = k\} \in \mathcal{B}_A, \qquad A_* = A \setminus A_\infty \in \mathcal{B}_A.
\end{align*}
Since $(X, \mathcal{B}, \mu, T)$ is measure-preserving, $\mu(X) < \infty$, and $A \in \mathcal{B}$, the hypotheses of the [Poincaré Recurrence Theorem](/theorems/???) are met, and it yields
\begin{align*}
\mu(A_\infty) = 0.
\end{align*}
For every $E \in \mathcal{B}_A$, the definition of $T_A$ gives
\begin{align*}
T_A^{-1}(E) = \left(\bigcup_{k=1}^{\infty} \{n_A = k\} \cap T^{-k}(E)\right) \dot\cup (E \cap A_\infty),
\end{align*}
where the union is disjoint because the sets $\{n_A = k\}$ are pairwise disjoint and contained in $A_*$, while $E \cap A_\infty \subseteq A_\infty = A \setminus A_*$. Each piece $\{n_A = k\} \cap T^{-k}(E)$ lies in $\mathcal{B}_A$, and $E \cap A_\infty \in \mathcal{B}_A$, so $T_A^{-1}(E) \in \mathcal{B}_A$ as a countable union of sets in $\mathcal{B}_A$. Hence $T_A$ is $\mathcal{B}_A$-measurable. Since $\mu(A_\infty) = 0$, monotonicity gives $\mu(E \cap A_\infty) = 0$, so
\begin{align*}
\mu(T_A^{-1}(E)) = \mu\left(\bigcup_{k=1}^{\infty} \{n_A = k\} \cap T^{-k}(E)\right).
\end{align*}
Replacing the identity on $A_\infty$ by any other $\mathcal{B}_A$-measurable map $A_\infty \to A$ changes $T_A^{-1}(E)$ only by a measurable subset of $A_\infty$, hence only by a $\mu$-null set; the displayed identity, and therefore the entire argument that follows, is unaffected by the choice of extension.
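To see a nonempty but null $A_\infty$ concretely, here is an illustrative example that is not part of the proof; the specific system is an assumption chosen for illustration. Take $X = [0,1)$ with Lebesgue measure $\mu$, the doubling map $T(x) = 2x \bmod 1$ (which preserves $\mu$), and $A = [1/2, 1)$. Reading off binary digits, a point $x \in A$ has $n_A(x) = k$ exactly when its expansion begins with $1$, followed by $k-1$ zeros, followed by $1$, so
\begin{align*}
\{n_A = k\} = \left[\tfrac12 + 2^{-(k+1)},\ \tfrac12 + 2^{-k}\right), \qquad \mu(\{n_A = k\}) = 2^{-(k+1)}, \qquad A_\infty = A \setminus \bigcup_{k=1}^{\infty} \{n_A = k\} = \left\{\tfrac12\right\}.
\end{align*}
The point $\tfrac12$ maps to $0$, which is fixed by $T$ and lies outside $A$, so it never returns. In this example $T_A(\tfrac12) = \tfrac12$ by the identity convention, and $\mu(A_\infty) = 0$, in agreement with the recurrence theorem.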
[guided]
We first record the measurability of all sets that appear in the proof, then isolate the only domain issue in the definition of the induced map.
For $j \in \mathbb{N} \cup \{0\}$, let $T^j : X \to X$ be the $j$-fold iterate of $T$, and for every measurable set $C \in \mathcal{B}$ write $T^{-j}(C) := (T^j)^{-1}(C)$. This is preimage notation; no inverse map for $T$ is being assumed.
For each $k \in \mathbb{N}$, a point $x \in A$ satisfies $n_A(x) = k$ if and only if $T^j(x) \notin A$ for $j = 1, \dots, k-1$ and $T^k(x) \in A$. Hence
\begin{align*}
\{n_A = k\} = A \cap \bigcap_{j=1}^{k-1} T^{-j}(X \setminus A) \cap T^{-k}(A),
\end{align*}
a measurable subset of $A$ (for $k = 1$ the empty intersection is read as $X$). Therefore $A_\infty = A \setminus \bigcup_{k=1}^{\infty} \{n_A = k\} \in \mathcal{B}_A$ and $A_* = A \setminus A_\infty \in \mathcal{B}_A$.
The [Poincaré Recurrence Theorem](/theorems/???) applies because $(X, \mathcal{B}, \mu, T)$ is measure-preserving, $\mu(X) < \infty$, and $A \in \mathcal{B}$. Its conclusion states that almost every point of $A$ returns to $A$, which translates to
\begin{align*}
\mu(A_\infty) = 0.
\end{align*}
We now check measurability of $T_A : A \to A$. By construction $T_A = T^k$ on $\{n_A = k\}$ for each $k \in \mathbb{N}$, and $T_A$ is the identity on $A_\infty$. For $E \in \mathcal{B}_A$,
\begin{align*}
T_A^{-1}(E) = \left(\bigcup_{k=1}^{\infty} \{n_A = k\} \cap T^{-k}(E)\right) \dot\cup (E \cap A_\infty).
\end{align*}
Each piece $\{n_A = k\} \cap T^{-k}(E)$ lies in $\mathcal{B}_A$ because $\{n_A = k\} \in \mathcal{B}_A$ and $T^{-k}(E) \in \mathcal{B}$, and similarly $E \cap A_\infty \in \mathcal{B}_A$. Hence $T_A^{-1}(E) \in \mathcal{B}_A$, and $T_A$ is $\mathcal{B}_A$-measurable.
Finally, $\mu(A_\infty) = 0$ forces $\mu(E \cap A_\infty) = 0$, so
\begin{align*}
\mu(T_A^{-1}(E)) = \mu\left(\bigcup_{k=1}^{\infty} \{n_A = k\} \cap T^{-k}(E)\right).
\end{align*}
Why does the choice of extension not matter? Any $\mathcal{B}_A$-measurable map $A \to A$ that agrees with $x \mapsto T^{n_A(x)}(x)$ on $A_*$ alters $T_A^{-1}(E)$ only by a measurable subset of $A_\infty$, which has $\mu$-measure zero. The displayed identity, and therefore the conclusion of the theorem, is unaffected. The convention 'extend by the identity' in the statement is a definite, measurable choice that makes $T_A : A \to A$ well-defined everywhere on $A$; any other measurable choice yields the same measure-preservation conclusion. The choice must still be made explicitly: when the measure space is not complete, an arbitrary set-theoretic extension to the null set $A_\infty$ need not be $\mathcal{B}_A$-measurable. The explicit extension therefore matters for measurability even though it does not matter for the measure-preservation identity.
[/guided]
[/step]
[step:Partition the recurrent set by first-return times]
For each $k \in \mathbb{N}$, write
\begin{align*}
A_k := \{n_A = k\} = A \cap \bigcap_{j=1}^{k-1} T^{-j}(X \setminus A) \cap T^{-k}(A) \in \mathcal{B}_A,
\end{align*}
where the intersection over $1 \le j \le k-1$ is interpreted as $X$ when $k = 1$. The sets $(A_k)_{k=1}^{\infty}$ are pairwise disjoint and
\begin{align*}
A_* = \bigcup_{k=1}^{\infty} A_k.
\end{align*}
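As a sanity check on the partition, consider the following toy system, which is an assumption introduced only for illustration. Let $X = \mathbb{Z}/5\mathbb{Z}$ with counting measure $\mu$, $T(x) = x + 1 \bmod 5$, and $A = \{0, 1, 3\}$. Then
\begin{align*}
n_A(0) = 1, \quad n_A(1) = 2, \quad n_A(3) = 2, \qquad A_1 = \{0\}, \quad A_2 = \{1, 3\}, \quad A_k = \emptyset \ (k \ge 3),
\end{align*}
so $A_* = A_1 \dot\cup A_2 = A$ and $A_\infty = \emptyset$. The induced map is the cyclic permutation $0 \mapsto 1 \mapsto 3 \mapsto 0$ of $A$.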
[/step]
[step:Express the induced preimage through first-return pieces]
Fix $E \in \mathcal{B}_A$. From Step 1, $\mu(E \cap A_\infty) = 0$, so
\begin{align*}
\mu(T_A^{-1}(E)) = \mu\left(\bigcup_{k=1}^{\infty} A_k \cap T^{-k}(E)\right).
\end{align*}
The sets $(A_k \cap T^{-k}(E))_{k \in \mathbb{N}}$ are pairwise disjoint because the $A_k$ are pairwise disjoint. Countable additivity gives
\begin{align*}
\mu(T_A^{-1}(E)) = \sum_{k=1}^{\infty} \mu(A_k \cap T^{-k}(E)).
\end{align*}
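For a concrete instance of this sum, take the illustrative system $X = \mathbb{Z}/5\mathbb{Z}$ with counting measure $\mu$, $T(x) = x + 1 \bmod 5$, and $A = \{0, 1, 3\}$ (an assumption for illustration only), for which $A_1 = \{0\}$, $A_2 = \{1, 3\}$, and $A_\infty = \emptyset$. With $E = \{1, 3\}$ we have $T^{-1}(E) = \{0, 2\}$ and $T^{-2}(E) = \{4, 1\}$, hence
\begin{align*}
T_A^{-1}(E) = (A_1 \cap T^{-1}(E)) \dot\cup (A_2 \cap T^{-2}(E)) = \{0\} \dot\cup \{1\}, \qquad \mu(T_A^{-1}(E)) = 1 + 1 = 2.
\end{align*}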
[/step]
[step:Peel backward preimages of $E$ until the first return to $A$ appears]
Write $A^c := X \setminus A$. For each $m \in \mathbb{N}$, define the remainder set
\begin{align*}
R_m(E) := \bigcap_{j=0}^{m-1} T^{-j}(A^c) \cap T^{-m}(E).
\end{align*}
The argument rests on the following finite peeling identity.
[claim:The finite peeling identity holds for every $m \in \mathbb{N}$]
For every $m \in \mathbb{N}$,
\begin{align*}
\mu(E) = \sum_{k=1}^{m} \mu(A_k \cap T^{-k}(E)) + \mu(R_m(E)).
\end{align*}
[/claim]
[proof]
For $m = 1$, measure-preservation of $T$ gives $\mu(E) = \mu(T^{-1}(E))$. Since $X = A \dot\cup A^c$,
\begin{align*}
T^{-1}(E) = (A \cap T^{-1}(E)) \dot\cup (A^c \cap T^{-1}(E)).
\end{align*}
Because $E \subseteq A$, we have $A \cap T^{-1}(E) = A_1 \cap T^{-1}(E)$, and the second term is $R_1(E)$. Thus the identity holds for $m = 1$.
Assume the identity holds for some $m \in \mathbb{N}$. Since $R_m(E) \in \mathcal{B}$ and $T$ preserves $\mu$,
\begin{align*}
\mu(R_m(E)) &= \mu(T^{-1}(R_m(E)))\\
&= \mu\left(\bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E)\right),
\end{align*}
using $T^{-1}(T^{-j}(A^c)) = T^{-(j+1)}(A^c)$ and $T^{-1}(T^{-m}(E)) = T^{-(m+1)}(E)$. Partition the last set according to $X = A \dot\cup A^c$:
\begin{align*}
\bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E) &= \left(A \cap \bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E)\right)\\
&\quad \dot\cup \left(A^c \cap \bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E)\right).
\end{align*}
Since $E \subseteq A$, the first set is $A_{m+1} \cap T^{-(m+1)}(E)$, and the second set is $R_{m+1}(E)$. Hence
\begin{align*}
\mu(R_m(E)) = \mu(A_{m+1} \cap T^{-(m+1)}(E)) + \mu(R_{m+1}(E)).
\end{align*}
Substituting this into the identity for $m$ proves the identity for $m + 1$. The result follows by induction.
[/proof]
The sets $(R_m(E))_{m=1}^{\infty}$ are pairwise disjoint. Indeed, if $m < \ell$ and $x \in R_m(E) \cap R_\ell(E)$, then $T^m(x) \in E \subseteq A$ from $x \in R_m(E)$, while $T^m(x) \in A^c$ from $x \in R_\ell(E)$ (since the definition of $R_\ell(E)$ forces $T^j(x) \in A^c$ for $0 \le j \le \ell - 1$, in particular for $j = m$). This is a contradiction. Therefore
\begin{align*}
\sum_{m=1}^{M} \mu(R_m(E)) = \mu\left(\bigcup_{m=1}^{M} R_m(E)\right) \le \mu(X)
\end{align*}
for every $M \in \mathbb{N}$. Since $\mu(X) < \infty$, the non-negative sequence $(\mu(R_m(E)))_{m=1}^{\infty}$ is summable, hence tends to $0$. Letting $m \to \infty$ in the finite peeling identity gives
\begin{align*}
\mu(E) = \sum_{k=1}^{\infty} \mu(A_k \cap T^{-k}(E)).
\end{align*}
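The peeling can be traced by hand in a small case. As an illustrative assumption (not part of the proof), let $X = \mathbb{Z}/5\mathbb{Z}$ with counting measure $\mu$, $T(x) = x + 1 \bmod 5$, and $A = \{0, 1, 3\}$, so that $A^c = \{2, 4\}$, $A_1 = \{0\}$, $A_2 = \{1, 3\}$, and take $E = \{1, 3\}$. Then
\begin{align*}
T^{-1}(E) &= \{0, 2\} = \underbrace{\{0\}}_{A_1 \cap T^{-1}(E)} \dot\cup \underbrace{\{2\}}_{R_1(E)},\\
T^{-1}(R_1(E)) &= \{1\} = \underbrace{\{1\}}_{A_2 \cap T^{-2}(E)} \dot\cup \underbrace{\emptyset}_{R_2(E)},
\end{align*}
so the identity reads $2 = 1 + 1$ for $m = 1$ and $2 = 1 + 1 + 0$ for $m = 2$. The remainder vanishes after two steps here because every point of $A^c$ enters $A$ on the next iterate.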
[guided]
The goal of this step is to prove that the first-return pieces in the induced preimage have total measure $\mu(E)$. We do this by starting from $E$, pulling it back once under $T$, and separating the points according to whether they already lie in $A$. The points lying in $A$ have first return time $1$; the points outside $A$ form an error term. We then repeat the same operation on the error term.
Write $A^c := X \setminus A$. For each $m \in \mathbb{N}$, define
\begin{align*}
R_m(E) := \bigcap_{j=0}^{m-1} T^{-j}(A^c) \cap T^{-m}(E).
\end{align*}
Thus $R_m(E)$ consists of the points whose orbit avoids $A$ at times $0, 1, \dots, m-1$ and whose $m$-th image lies in $E$. We prove by induction on $m$ that
\begin{align*}
\mu(E) = \sum_{k=1}^{m} \mu(A_k \cap T^{-k}(E)) + \mu(R_m(E)).
\end{align*}
For $m = 1$, measure-preservation of $T$ gives
\begin{align*}
\mu(E) = \mu(T^{-1}(E)).
\end{align*}
Now partition $T^{-1}(E)$ according to the measurable decomposition $X = A \dot\cup A^c$:
\begin{align*}
T^{-1}(E) = (A \cap T^{-1}(E)) \dot\cup (A^c \cap T^{-1}(E)).
\end{align*}
Because $E \subseteq A$, any $x \in A \cap T^{-1}(E)$ starts in $A$ and returns to $A$ after one iterate, so this set is exactly $A_1 \cap T^{-1}(E)$. The other set is exactly
\begin{align*}
R_1(E) = A^c \cap T^{-1}(E).
\end{align*}
Therefore
\begin{align*}
\mu(E) = \mu(A_1 \cap T^{-1}(E)) + \mu(R_1(E)).
\end{align*}
Assume now that the identity holds for some $m \in \mathbb{N}$. We peel the remainder $R_m(E)$ once more. Since $R_m(E) \in \mathcal{B}$ and $T$ preserves $\mu$,
\begin{align*}
\mu(R_m(E)) &= \mu(T^{-1}(R_m(E)))\\
&= \mu\left(T^{-1}\left(\bigcap_{j=0}^{m-1} T^{-j}(A^c) \cap T^{-m}(E)\right)\right)\\
&= \mu\left(\bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E)\right).
\end{align*}
The last equality uses the elementary preimage identities $T^{-1}(T^{-j}(A^c)) = T^{-(j+1)}(A^c)$ and $T^{-1}(T^{-m}(E)) = T^{-(m+1)}(E)$.
Now split this pulled-back set according to whether the starting point is in $A$:
\begin{align*}
\bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E) &= \left(A \cap \bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E)\right)\\
&\quad \dot\cup \left(A^c \cap \bigcap_{j=1}^{m} T^{-j}(A^c) \cap T^{-(m+1)}(E)\right).
\end{align*}
The first set consists of points that start in $A$, avoid $A$ for the next $m$ iterates, and then land in $E \subseteq A$ at time $m+1$. Hence it is
\begin{align*}
A_{m+1} \cap T^{-(m+1)}(E).
\end{align*}
The second set consists of points that avoid $A$ at times $0, 1, \dots, m$ and land in $E$ at time $m+1$, so it is
\begin{align*}
R_{m+1}(E).
\end{align*}
Thus
\begin{align*}
\mu(R_m(E)) = \mu(A_{m+1} \cap T^{-(m+1)}(E)) + \mu(R_{m+1}(E)).
\end{align*}
Substituting this equality into the induction hypothesis gives the peeling identity with $m+1$ in place of $m$.
It remains to pass from the finite identity to the infinite one. The remainders are pairwise disjoint: if $m < \ell$ and $x \in R_m(E) \cap R_\ell(E)$, then $x \in R_m(E)$ gives $T^m(x) \in E \subseteq A$, while $x \in R_\ell(E)$ gives $T^m(x) \in A^c$, because $\ell > m$ and the definition of $R_\ell(E)$ requires avoidance of $A$ at time $m$. This contradiction proves disjointness. Hence, for every $M \in \mathbb{N}$,
\begin{align*}
\sum_{m=1}^{M} \mu(R_m(E)) = \mu\left(\bigcup_{m=1}^{M} R_m(E)\right) \le \mu(X).
\end{align*}
Since $\mu(X) < \infty$, the non-negative sequence $\mu(R_m(E))$ is summable and must tend to $0$. Letting $m \to \infty$ in
\begin{align*}
\mu(E) = \sum_{k=1}^{m} \mu(A_k \cap T^{-k}(E)) + \mu(R_m(E))
\end{align*}
gives
\begin{align*}
\mu(E) = \sum_{k=1}^{\infty} \mu(A_k \cap T^{-k}(E)).
\end{align*}
[/guided]
[/step]
[step:Normalise the unnormalised identity to obtain preservation of $\mu_A$]
Combining the induced-preimage decomposition from Step 3 with the infinite peeling identity from Step 4,
\begin{align*}
\mu(T_A^{-1}(E)) = \sum_{k=1}^{\infty} \mu(A_k \cap T^{-k}(E)) = \mu(E).
\end{align*}
Since $0 < \mu(A) \le \mu(X) < \infty$, the normalised measure $\mu_A$ from the statement is well-defined, and dividing by $\mu(A)$ yields
\begin{align*}
\mu_A(T_A^{-1}(E)) = \frac{\mu(T_A^{-1}(E))}{\mu(A)} = \frac{\mu(E)}{\mu(A)} = \mu_A(E).
\end{align*}
Step 1 established that $T_A^{-1}(E) \in \mathcal{B}_A$ for every $E \in \mathcal{B}_A$, so $T_A : A \to A$ is $\mathcal{B}_A$-measurable. Since $E \in \mathcal{B}_A$ was arbitrary, the identity above shows that $T_A$ preserves $\mu_A$, as claimed. Step 1 also shows that the conclusion is independent of the particular measurable extension of $T_A$ from $A_*$ to the null set $A_\infty$.
[/step]