[proofplan]
We decompose the space according to the first time an orbit enters $A$, and decompose $A$ according to the first return time to $A$. Ergodicity, together with $\mu(A)>0$, implies that the set of points which never enter $A$ has measure zero, so the first-entry layers cover $X$ up to a null set and the first-return layers cover $A$ up to a null set. The key identity is that the preimage of the $(k-1)$-st first-entry layer splits disjointly into the $k$-th first-entry layer and the $k$-th first-return layer. Taking measures gives a telescoping formula whose sum is exactly $\int_A n_A(x)\,d\mu(x)=\mu(X)=1$.
[/proofplan]
[step:Define the first return and first entry layers]
Let $\mathbb{N}_0:=\{0,1,2,\ldots\}$, and extend the iterate notation by setting $T^0$ equal to the identity map on $X$.
For each $k\in\mathbb{N}$, define the first-return layer $A_k\in\mathcal{B}$ by
\begin{align*}
A_k:=\{x\in A:n_A(x)=k\}=A\cap\bigcap_{i=1}^{k-1}T^{-i}(A^c)\cap T^{-k}(A).
\end{align*}
For each $m\in\mathbb{N}_0$, define the first-entry layer $C_m\in\mathcal{B}$ by
\begin{align*}
C_0:=A,
\end{align*}
and, for $m\geq 1$,
\begin{align*}
C_m:=\bigcap_{i=0}^{m-1}T^{-i}(A^c)\cap T^{-m}(A).
\end{align*}
Finally define the never-entry set
\begin{align*}
N:=\bigcap_{i=0}^{\infty}T^{-i}(A^c).
\end{align*}
Since $A\in\mathcal{B}$ and $T$ is measurable, all sets $A_k$, $C_m$, and $N$ are measurable. The measurability of $n_A:A\to\mathbb{N}\cup\{\infty\}$ follows because $\{n_A=k\}=A_k\in\mathcal{B}_A$ for every $k\in\mathbb{N}$ and $\{n_A=\infty\}=A\setminus\bigcup_{k\in\mathbb{N}}A_k\in\mathcal{B}_A$. The sets $(C_m)_{m\in\mathbb{N}_0}$ are pairwise disjoint, and
\begin{align*}
X=N\sqcup \bigsqcup_{m=0}^{\infty}C_m.
\end{align*}
Indeed, every point either never enters $A$, or has a unique first time $m\in\mathbb{N}_0$ at which it enters $A$.
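As a small illustration (an example chosen here, not part of the statement), take $X=\mathbb{Z}/4\mathbb{Z}$ with the uniform probability measure $\mu$, the rotation $T(x)=x+1 \bmod 4$, and $A=\{0,1\}$. This rotation preserves $\mu$ and is ergodic, since its only invariant sets are $\emptyset$ and $X$. The return times are $n_A(0)=1$ and $n_A(1)=3$, so the layers are
\begin{align*}
A_1&=\{0\}, & A_2&=\emptyset, & A_3&=\{1\},\\
C_0&=\{0,1\}, & C_1&=\{3\}, & C_2&=\{2\},
\end{align*}
with $A_k=\emptyset$ for $k\geq 4$, $C_m=\emptyset$ for $m\geq 3$, and $N=\emptyset$. In this example $X=C_0\sqcup C_1\sqcup C_2$, in agreement with the decomposition above.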
[/step]
[step:Use ergodicity to make the first entry layers cover $X$ almost everywhere]
We prove that $\mu(N)=0$. If $x\in N$, then $T^{i+1}(x)\in A^c$ for every $i\geq 0$, so $T(x)\in N$; hence $N\subseteq T^{-1}(N)$. Since $T$ preserves $\mu$, we have
\begin{align*}
\mu(T^{-r}(N))=\mu(N)
\end{align*}
for every $r\in\mathbb{N}_0$. Define
\begin{align*}
M:=\bigcup_{r=0}^{\infty}T^{-r}(N).
\end{align*}
Since $N\subseteq T^{-1}(N)$, applying $T^{-r}$ shows that the sets $T^{-r}(N)$ are increasing in $r$, so by [Continuity from Below of Measures](/theorems/???),
\begin{align*}
\mu(M)=\lim_{r\to\infty}\mu(T^{-r}(N))=\mu(N).
\end{align*}
Moreover,
\begin{align*}
T^{-1}(M)=\bigcup_{r=1}^{\infty}T^{-r}(N)=M,
\end{align*}
because $N\subseteq T^{-1}(N)$. Thus $M$ is $T$-invariant. Since $N\subseteq A^c$, we have
\begin{align*}
\mu(M)=\mu(N)\leq \mu(A^c)=1-\mu(A)<1.
\end{align*}
Ergodicity gives $\mu(M)\in\{0,1\}$, and the strict bound forces $\mu(M)=0$, hence $\mu(N)=0$.
Therefore
\begin{align*}
\mu\left(X\setminus \bigsqcup_{m=0}^{\infty}C_m\right)=0.
\end{align*}
Also the set of points in $A$ which never return to $A$ is
\begin{align*}
R:=A\cap\bigcap_{i=1}^{\infty}T^{-i}(A^c)=A\cap T^{-1}(N),
\end{align*}
so
\begin{align*}
\mu(R)\leq \mu(T^{-1}(N))=\mu(N)=0.
\end{align*}
Thus the sets $(A_k)_{k\in\mathbb{N}}$ partition $A$ up to a $\mu$-null set.
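The role of ergodicity can be seen in a hypothetical non-ergodic example, included only as a sanity check. Take $X=\mathbb{Z}/4\mathbb{Z}$ with the uniform probability measure $\mu$, the map $T(x)=x+2 \bmod 4$, and $A=\{0\}$. Then $T$ preserves $\mu$, but
\begin{align*}
N=\{1,3\},\qquad T^{-1}(N)=N,\qquad \mu(N)=\tfrac{1}{2}.
\end{align*}
So $M=N$ is a $T$-invariant set with $\mu(M)\notin\{0,1\}$, and $T$ is not ergodic. Here $C_0=\{0\}$ and $C_1=\{2\}$, so the first-entry layers cover only half of $X$. Correspondingly $n_A(0)=2$ and
\begin{align*}
\int_A n_A(x)\,d\mu(x)=2\cdot\tfrac{1}{4}=\tfrac{1}{2}\neq 1.
\end{align*}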
[guided]
We first show that almost every point eventually enters $A$. The obstruction is the measurable set
\begin{align*}
N=\bigcap_{i=0}^{\infty}T^{-i}(A^c),
\end{align*}
whose points never visit $A$ at any non-negative time. If $x\in N$, then $T^i(x)\in A^c$ for every $i\geq 0$, and therefore $T^{i+1}(x)\in A^c$ for every $i\geq 0$. Equivalently, $T(x)\in N$, so $x\in T^{-1}(N)$. Hence
\begin{align*}
N\subseteq T^{-1}(N).
\end{align*}
To use ergodicity, we need an invariant set. Define
\begin{align*}
M:=\bigcup_{r=0}^{\infty}T^{-r}(N).
\end{align*}
Because $N\subseteq T^{-1}(N)$, applying $T^{-r}$ gives
\begin{align*}
T^{-r}(N)\subseteq T^{-(r+1)}(N)
\end{align*}
for every $r\in\mathbb{N}_0$. Thus the union defining $M$ is increasing. Since $T$ preserves $\mu$, each iterate preserves measure under preimages:
\begin{align*}
\mu(T^{-r}(N))=\mu(N).
\end{align*}
By [Continuity from Below of Measures](/theorems/???),
\begin{align*}
\mu(M)=\lim_{r\to\infty}\mu(T^{-r}(N))=\mu(N).
\end{align*}
Now check invariance. We compute directly:
\begin{align*}
T^{-1}(M)
=T^{-1}\left(\bigcup_{r=0}^{\infty}T^{-r}(N)\right)
=\bigcup_{r=0}^{\infty}T^{-(r+1)}(N)
=\bigcup_{r=1}^{\infty}T^{-r}(N).
\end{align*}
Since $N\subseteq T^{-1}(N)$, the final union is the same as $M$. Therefore
\begin{align*}
T^{-1}(M)=M.
\end{align*}
Ergodicity applies to this invariant measurable set: $\mu(M)\in\{0,1\}$. Also $N\subseteq A^c$, so
\begin{align*}
\mu(M)=\mu(N)\leq \mu(A^c)=1-\mu(A)<1.
\end{align*}
Thus ergodicity forces $\mu(M)=0$, and consequently $\mu(N)=0$.
Since
\begin{align*}
X=N\sqcup \bigsqcup_{m=0}^{\infty}C_m,
\end{align*}
we obtain
\begin{align*}
\mu\left(X\setminus \bigsqcup_{m=0}^{\infty}C_m\right)=0.
\end{align*}
The same null set also gives recurrence to $A$ for almost every point of $A$. The set of points in $A$ which never return after time $0$ is
\begin{align*}
R:=A\cap\bigcap_{i=1}^{\infty}T^{-i}(A^c)=A\cap T^{-1}(N).
\end{align*}
Because $T$ preserves $\mu$ and $\mu(N)=0$,
\begin{align*}
\mu(R)\leq \mu(T^{-1}(N))=\mu(N)=0.
\end{align*}
Therefore the first-return layers $(A_k)_{k\in\mathbb{N}}$ cover $A$ up to a null set.
[/guided]
[/step]
[step:Relate first returns from $A$ to first entries into $A$]
For every $k\in\mathbb{N}$, we claim that
\begin{align*}
T^{-1}(C_{k-1})=C_k\sqcup A_k.
\end{align*}
Indeed, if $x\in T^{-1}(C_{k-1})$, then $T^k(x)\in A$ and $T^i(x)\in A^c$ for every $1\leq i\leq k-1$. If $x\in A$, then $x\in A_k$; if $x\in A^c$, then $x\in C_k$. This proves $T^{-1}(C_{k-1})\subseteq C_k\cup A_k$. The reverse inclusion follows immediately from the definitions of $C_k$ and $A_k$. The union is disjoint because $C_k\subseteq A^c$ while $A_k\subseteq A$.
Taking measures and using that $T$ preserves $\mu$ gives
\begin{align*}
\mu(C_{k-1})=\mu(T^{-1}(C_{k-1}))=\mu(C_k)+\mu(A_k).
\end{align*}
Thus, for every $k\in\mathbb{N}$,
\begin{align*}
\mu(A_k)=\mu(C_{k-1})-\mu(C_k).
\end{align*}
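The identity can be checked by hand in a small example, assumed here purely for illustration: $X=\mathbb{Z}/4\mathbb{Z}$ with the uniform probability measure $\mu$, $T(x)=x+1 \bmod 4$, and $A=\{0,1\}$. In this example $A_1=\{0\}$, $A_2=\emptyset$, $A_3=\{1\}$, $C_0=\{0,1\}$, $C_1=\{3\}$, $C_2=\{2\}$, $C_3=\emptyset$, and $T^{-1}(S)=\{x-1 \bmod 4:x\in S\}$. Therefore
\begin{align*}
T^{-1}(C_0)&=\{3,0\}=C_1\sqcup A_1,\\
T^{-1}(C_1)&=\{2\}=C_2\sqcup A_2,\\
T^{-1}(C_2)&=\{1\}=C_3\sqcup A_3.
\end{align*}
Taking measures gives $\mu(A_1)=\tfrac{1}{2}-\tfrac{1}{4}=\tfrac{1}{4}$, $\mu(A_2)=\tfrac{1}{4}-\tfrac{1}{4}=0$, and $\mu(A_3)=\tfrac{1}{4}-0=\tfrac{1}{4}$, matching the layers listed above.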
[guided]
Fix $k\in\mathbb{N}$. We split the points that land in the first-entry layer $C_{k-1}$ after one application of $T$ according to whether they start in $A$. The set $T^{-1}(C_{k-1})$ consists of all points $x\in X$ such that $T(x)$ first enters $A$ after exactly $k-1$ further iterates. Written out, this means
\begin{align*}
T^k(x)\in A
\end{align*}
and
\begin{align*}
T^i(x)\in A^c
\end{align*}
for every $1\leq i\leq k-1$.
There are exactly two cases, determined by whether $x$ itself lies in $A$. If $x\in A$, then the displayed conditions say that the first positive return of $x$ to $A$ occurs at time $k$, so $x\in A_k$. If $x\in A^c$, then $x$ has not entered $A$ at time $0$, has not entered $A$ at times $1,\ldots,k-1$, and enters at time $k$; hence $x\in C_k$. Therefore
\begin{align*}
T^{-1}(C_{k-1})\subseteq C_k\cup A_k.
\end{align*}
Conversely, if $x\in C_k$, then $T(x)$ first enters $A$ after $k-1$ further iterates, so $T(x)\in C_{k-1}$ and $x\in T^{-1}(C_{k-1})$. If $x\in A_k$, then $x\in A$, avoids $A$ at times $1,\ldots,k-1$, and returns to $A$ at time $k$, so again $T(x)\in C_{k-1}$ and $x\in T^{-1}(C_{k-1})$. Thus
\begin{align*}
T^{-1}(C_{k-1})=C_k\cup A_k.
\end{align*}
The two pieces are disjoint because $C_k\subseteq A^c$ and $A_k\subseteq A$, so
\begin{align*}
T^{-1}(C_{k-1})=C_k\sqcup A_k.
\end{align*}
Now take measures. Since $T$ is measure-preserving,
\begin{align*}
\mu(T^{-1}(C_{k-1}))=\mu(C_{k-1}).
\end{align*}
Since the union is disjoint,
\begin{align*}
\mu(T^{-1}(C_{k-1}))=\mu(C_k)+\mu(A_k).
\end{align*}
Combining the last two identities gives
\begin{align*}
\mu(A_k)=\mu(C_{k-1})-\mu(C_k).
\end{align*}
[/guided]
[/step]
[step:Convert the return time integral into a sum over return layers]
For each $K\in\mathbb{N}$, define the measurable simple function
\begin{align*}
s_K:A&\to[0,\infty)\\
x&\mapsto\sum_{k=1}^{K}k\,\mathbb{1}_{A_k}(x),
\end{align*}
where $\mathbb{1}_{A_k}:A\to\{0,1\}$ denotes the indicator map of $A_k$. The sequence $(s_K)_{K\in\mathbb{N}}$ is nonnegative and nondecreasing. Moreover $s_K(x)\to n_A(x)$ for $\mu$-almost every $x\in A$: each $x\in A_k$ satisfies $s_K(x)=k=n_A(x)$ for all $K\geq k$, and the exceptional set $R=A\setminus\bigcup_{k=1}^{\infty}A_k$ has measure zero. Applying the [Monotone Convergence Theorem](/theorems/???) on the measure space $(A,\mathcal{B}_A,\mu|_{\mathcal{B}_A})$ gives
\begin{align*}
\int_A n_A(x)\,d\mu(x)
=\lim_{K\to\infty}\int_A s_K(x)\,d\mu(x)
=\lim_{K\to\infty}\sum_{k=1}^{K}k\,\mu(A_k)
=\sum_{k=1}^{\infty}k\,\mu(A_k).
\end{align*}
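For an illustration with infinitely many nonempty layers, consider the following standard example, which is an assumption made here and not part of the statement: $X=\{0,1\}^{\mathbb{N}_0}$ with the fair-coin product measure $\mu$, the left shift $(Tx)_i=x_{i+1}$, and $A=\{x\in X:x_0=1\}$. This system is measure-preserving and ergodic, and $\mu(A)=\tfrac{1}{2}$. The first-return layers are cylinder sets:
\begin{align*}
A_k=\{x\in X:x_0=1,\ x_1=\cdots=x_{k-1}=0,\ x_k=1\},\qquad \mu(A_k)=2^{-(k+1)}.
\end{align*}
The truncation $s_3$ of $n_A$ at level $3$ integrates to
\begin{align*}
\int_A s_3(x)\,d\mu(x)=\tfrac{1}{4}+\tfrac{2}{8}+\tfrac{3}{16}=\tfrac{11}{16},
\end{align*}
while the full series gives
\begin{align*}
\int_A n_A(x)\,d\mu(x)=\sum_{k=1}^{\infty}k\,2^{-(k+1)}=\tfrac{1}{2}\sum_{k=1}^{\infty}k\,2^{-k}=\tfrac{1}{2}\cdot 2=1.
\end{align*}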
[guided]
The first-return time is constant on each first-return layer $A_k$, with value $k$. To pass from this layer decomposition to the integral, define for each $K\in\mathbb{N}$ the simple function
\begin{align*}
s_K:A&\to[0,\infty)\\
x&\mapsto\sum_{k=1}^{K}k\,\mathbb{1}_{A_k}(x),
\end{align*}
where $\mathbb{1}_{A_k}:A\to\{0,1\}$ is the indicator map of $A_k$.
Each $s_K$ is measurable because each $A_k$ is measurable. The functions are nonnegative and nondecreasing in $K$. The set
\begin{align*}
R=A\setminus\bigcup_{k=1}^{\infty}A_k
\end{align*}
has $\mu$-measure zero. Every $x\in A\setminus R$ belongs to exactly one $A_k$, so $s_K(x)=k=n_A(x)$ for all $K\geq k$, and therefore
\begin{align*}
\lim_{K\to\infty}s_K(x)=n_A(x).
\end{align*}
Thus $s_K\to n_A$ $\mu$-almost everywhere on $A$.
We apply the [Monotone Convergence Theorem](/theorems/???) to the measure space
\begin{align*}
(A,\mathcal{B}_A,\mu|_{\mathcal{B}_A}).
\end{align*}
The hypotheses are satisfied because $(s_K)$ is a nonnegative nondecreasing sequence of measurable functions and it converges almost everywhere to $n_A$. Therefore
\begin{align*}
\int_A n_A(x)\,d\mu(x)
=\lim_{K\to\infty}\int_A s_K(x)\,d\mu(x).
\end{align*}
Since $s_K$ is a simple function taking the value $k$ on each of the disjoint sets $A_1,\ldots,A_K$ and vanishing elsewhere,
\begin{align*}
\int_A s_K(x)\,d\mu(x)=\sum_{k=1}^{K}k\,\mu(A_k).
\end{align*}
Hence
\begin{align*}
\int_A n_A(x)\,d\mu(x)=\sum_{k=1}^{\infty}k\,\mu(A_k).
\end{align*}
[/guided]
[/step]
[step:Telescope the return layer sum to obtain total mass one]
Define the sequence $(c_m)_{m\in\mathbb{N}_0}$ in $[0,1]$ by
\begin{align*}
c_m:=\mu(C_m).
\end{align*}
From the previous measure identity,
\begin{align*}
\mu(A_k)=c_{k-1}-c_k
\end{align*}
for every $k\in\mathbb{N}$. Since the sets $(C_m)_{m\in\mathbb{N}_0}$ are pairwise disjoint and cover $X$ up to a null set,
\begin{align*}
\sum_{m=0}^{\infty}c_m=\sum_{m=0}^{\infty}\mu(C_m)=\mu(X)=1.
\end{align*}
Also $c_k\leq c_{k-1}$ for every $k\in\mathbb{N}$, because $c_{k-1}-c_k=\mu(A_k)\geq 0$.
For $K\in\mathbb{N}$,
\begin{align*}
\sum_{k=1}^{K}k\,\mu(A_k)
&=\sum_{k=1}^{K}k(c_{k-1}-c_k)\\
&=\sum_{m=0}^{K-1}(m+1)c_m-\sum_{m=1}^{K}m c_m\\
&=\sum_{m=0}^{K-1}c_m-Kc_K.
\end{align*}
It remains to show $Kc_K\to 0$. For each $K\geq 2$, let $q_K:=\lfloor K/2\rfloor$. Since $(c_m)$ is nonincreasing,
\begin{align*}
(K-q_K+1)c_K\leq \sum_{m=q_K}^{K}c_m\leq \sum_{m=q_K}^{\infty}c_m.
\end{align*}
Because $\sum_{m=0}^{\infty}c_m=1$ and $q_K\to\infty$, the tail $\sum_{m=q_K}^{\infty}c_m$ tends to $0$. Since $q_K\leq K/2$, we have $K-q_K+1\geq K/2$, and hence
\begin{align*}
\frac{K}{K-q_K+1}\leq 2.
\end{align*}
We obtain
\begin{align*}
0\leq Kc_K\leq 2\sum_{m=q_K}^{\infty}c_m\to 0.
\end{align*}
Therefore
\begin{align*}
\sum_{k=1}^{\infty}k\,\mu(A_k)
=\lim_{K\to\infty}\sum_{k=1}^{K}k\,\mu(A_k)
=\sum_{m=0}^{\infty}c_m
=1.
\end{align*}
Combining this with the previous step gives
\begin{align*}
\int_A n_A(x)\,d\mu(x)=1.
\end{align*}
Dividing by $\mu(A)>0$ yields
\begin{align*}
\mathbb{E}_{\mu_A}[n_A]=\int_A n_A(x)\,d\mu_A(x)=\frac{1}{\mu(A)}\int_A n_A(x)\,d\mu(x)=\frac{1}{\mu(A)},
\end{align*}
so both forms of the desired conclusion, $\int_A n_A\,d\mu=1$ and $\mathbb{E}_{\mu_A}[n_A]=1/\mu(A)$, hold.
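The telescoping can be checked on the fair-coin shift, an example assumed here for illustration only: $X=\{0,1\}^{\mathbb{N}_0}$ with the fair-coin product measure $\mu$, the left shift $(Tx)_i=x_{i+1}$, and $A=\{x\in X:x_0=1\}$, so that $\mu(A)=\tfrac{1}{2}$ and $\mu(A_k)=2^{-(k+1)}$. The first-entry layers are $C_m=\{x\in X:x_0=\cdots=x_{m-1}=0,\ x_m=1\}$, hence
\begin{align*}
c_m=2^{-(m+1)},\qquad c_{k-1}-c_k=2^{-k}-2^{-(k+1)}=2^{-(k+1)}=\mu(A_k).
\end{align*}
With three terms, the finite telescoping identity reads
\begin{align*}
\sum_{k=1}^{3}k\,\mu(A_k)=c_0+c_1+c_2-3c_3=\tfrac{1}{2}+\tfrac{1}{4}+\tfrac{1}{8}-\tfrac{3}{16}=\tfrac{11}{16}.
\end{align*}
The boundary term at level $j$ is $j\,2^{-(j+1)}\to 0$, and $\sum_{m=0}^{\infty}c_m=1$, so in this example $\mathbb{E}_{\mu_A}[n_A]=1/\mu(A)=2$, the mean waiting time for the next symbol $1$ in a fair coin sequence.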
[guided]
Set
\begin{align*}
c_m:=\mu(C_m)
\end{align*}
for $m\in\mathbb{N}_0$. The identity proved above gives, for every $k\in\mathbb{N}$,
\begin{align*}
\mu(A_k)=c_{k-1}-c_k.
\end{align*}
This is the telescoping mechanism of the proof: return-time mass is expressed as successive losses of first-entry mass.
Because the first-entry layers are disjoint and cover $X$ up to a null set,
\begin{align*}
\sum_{m=0}^{\infty}c_m=\sum_{m=0}^{\infty}\mu(C_m)=\mu(X)=1.
\end{align*}
Also the identity
\begin{align*}
c_{k-1}=c_k+\mu(A_k)
\end{align*}
shows that the sequence $(c_m)$ is nonincreasing.
Now compute the finite partial sums. For $K\in\mathbb{N}$,
\begin{align*}
\sum_{k=1}^{K}k\,\mu(A_k)
&=\sum_{k=1}^{K}k(c_{k-1}-c_k)\\
&=\sum_{k=1}^{K}k c_{k-1}-\sum_{k=1}^{K}k c_k.
\end{align*}
In the first sum, use the index change $m=k-1$; in the second sum, use $m=k$. This gives
\begin{align*}
\sum_{k=1}^{K}k\,\mu(A_k)
&=\sum_{m=0}^{K-1}(m+1)c_m-\sum_{m=1}^{K}m c_m\\
&=\sum_{m=0}^{K-1}c_m-Kc_K.
\end{align*}
The remaining point is to prove that the boundary term $Kc_K$ vanishes in the limit. For each $K\geq 2$, define
\begin{align*}
q_K:=\lfloor K/2\rfloor.
\end{align*}
Since $(c_m)$ is nonincreasing, each of the $K-q_K+1$ terms $c_m$ with $q_K\leq m\leq K$ is at least $c_K$. Hence
\begin{align*}
(K-q_K+1)c_K\leq \sum_{m=q_K}^{K}c_m\leq \sum_{m=q_K}^{\infty}c_m.
\end{align*}
Because the full series $\sum_{m=0}^{\infty}c_m$ converges to $1$ and $q_K\to\infty$, its tails tend to zero:
\begin{align*}
\sum_{m=q_K}^{\infty}c_m\to 0.
\end{align*}
Moreover, $q_K\leq K/2$ gives $K-q_K+1\geq K/2$, so
\begin{align*}
\frac{K}{K-q_K+1}\leq 2.
\end{align*}
Multiplying the preceding estimate by this bounded factor yields
\begin{align*}
0\leq Kc_K\leq 2\sum_{m=q_K}^{\infty}c_m\to 0.
\end{align*}
Taking $K\to\infty$ in the finite telescoping identity gives
\begin{align*}
\sum_{k=1}^{\infty}k\,\mu(A_k)
=\sum_{m=0}^{\infty}c_m
=1.
\end{align*}
From the integral representation already proved,
\begin{align*}
\int_A n_A(x)\,d\mu(x)
=\sum_{k=1}^{\infty}k\,\mu(A_k),
\end{align*}
and therefore
\begin{align*}
\int_A n_A(x)\,d\mu(x)=1.
\end{align*}
Finally, since $\mu_A=\mu/\mu(A)$ on $\mathcal{B}_A$,
\begin{align*}
\mathbb{E}_{\mu_A}[n_A]=\int_A n_A(x)\,d\mu_A(x)=\frac{1}{\mu(A)}\int_A n_A(x)\,d\mu(x)=\frac{1}{\mu(A)}.
\end{align*}
[/guided]
[/step]