[proofplan]
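We work in the discrete case. We expand $\mathbb{E}[\mathbb{E}[X \mid Y]]$ using the definition of conditional expectation, then use the product rule $\mathbb{P}(X = x \mid Y = y)\,\mathbb{P}(Y = y) = \mathbb{P}(X = x, Y = y)$ to rewrite it as a double sum over the joint distribution of $(X, Y)$. Interchanging the order of summation and marginalising over $Y$ reduces the double sum to $\mathbb{E}[X]$.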
[/proofplan]
custom_env
admin
[step:Expand $\mathbb{E}[\mathbb{E}[X \mid Y]]$ using the definition of conditional expectation]
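By the definition of conditional expectation, $\mathbb{E}[X \mid Y = y] = \sum_x x\,\mathbb{P}(X = x \mid Y = y)$ for every $y$ with $\mathbb{P}(Y = y) > 0$. The random variable $\mathbb{E}[X \mid Y]$ takes the value $\mathbb{E}[X \mid Y = y]$ when $Y = y$. Therefore, summing over the values $y$ that $Y$ takes with positive probability,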
\begin{align*}
\mathbb{E}[\mathbb{E}[X \mid Y]] &= \sum_y \mathbb{E}[X \mid Y = y]\,\mathbb{P}(Y = y) \\
&= \sum_y \left(\sum_x x\,\mathbb{P}(X = x \mid Y = y)\right)\mathbb{P}(Y = y).
\end{align*}
[/step]
[step:Multiply through by $\mathbb{P}(Y = y)$ and interchange summation to recover $\mathbb{E}[X]$]
By the definition of conditional probability, $\mathbb{P}(X = x \mid Y = y)\,\mathbb{P}(Y = y) = \mathbb{P}(X = x, Y = y)$ (for those $y$ with $\mathbb{P}(Y = y) > 0$; terms with $\mathbb{P}(Y = y) = 0$ contribute zero to the sum). Substituting:
\begin{align*}
\mathbb{E}[\mathbb{E}[X \mid Y]] &= \sum_y \sum_x x\,\mathbb{P}(X = x, Y = y) \\
&= \sum_x x \sum_y \mathbb{P}(X = x, Y = y) \\
&= \sum_x x\,\mathbb{P}(X = x) \\
&= \mathbb{E}[X],
\end{align*}
where interchanging the order of summation is justified by absolute convergence of the double sum, $\sum_x \sum_y |x|\,\mathbb{P}(X = x, Y = y) = \sum_x |x|\,\mathbb{P}(X = x) = \mathbb{E}[|X|] < \infty$ (finite by hypothesis), and the passage to the third line uses the marginalisation identity $\sum_y \mathbb{P}(X = x, Y = y) = \mathbb{P}(X = x)$.
[/step]
[guided]The key step is recognising that $\mathbb{P}(X = x \mid Y = y) \cdot \mathbb{P}(Y = y)$ is just the joint probability $\mathbb{P}(X = x, Y = y)$. This converts the "conditional then average" computation into a double sum over the joint distribution:
\begin{align*}
\sum_y \sum_x x\,\mathbb{P}(X = x, Y = y).
\end{align*}
At this point, we interchange the order of summation (justified by $\mathbb{E}[|X|] < \infty$, which guarantees absolute convergence of the double sum). The inner sum $\sum_y \mathbb{P}(X = x, Y = y)$ is the marginal probability $\mathbb{P}(X = x)$, since summing the joint probability over all values of $Y$ recovers the marginal of $X$. This leaves $\sum_x x\,\mathbb{P}(X = x) = \mathbb{E}[X]$.
The result is often summarised as "the law of iterated expectation" or "tower property": first condition on $Y$, then average out $Y$, and you recover the unconditional expectation.[/guided]
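As a sanity check, the computation can be traced on a small example. The joint distribution below is hypothetical, chosen only for illustration and not part of the statement being proved: let $\mathbb{P}(X = 1, Y = 0) = \tfrac{1}{4}$, $\mathbb{P}(X = 2, Y = 0) = \tfrac{1}{4}$, $\mathbb{P}(X = 1, Y = 1) = \tfrac{1}{8}$ and $\mathbb{P}(X = 3, Y = 1) = \tfrac{3}{8}$, so that $\mathbb{P}(Y = 0) = \mathbb{P}(Y = 1) = \tfrac{1}{2}$. Conditioning on $Y$ first and then averaging over $Y$ gives
\begin{align*}
\mathbb{E}[X \mid Y = 0] &= 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{2} = \tfrac{3}{2}, \\
\mathbb{E}[X \mid Y = 1] &= 1 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{3}{4} = \tfrac{5}{2}, \\
\mathbb{E}[\mathbb{E}[X \mid Y]] &= \tfrac{3}{2} \cdot \tfrac{1}{2} + \tfrac{5}{2} \cdot \tfrac{1}{2} = 2.
\end{align*}
Computing directly from the marginal of $X$, namely $\mathbb{P}(X = 1) = \tfrac{3}{8}$, $\mathbb{P}(X = 2) = \tfrac{1}{4}$ and $\mathbb{P}(X = 3) = \tfrac{3}{8}$, gives
\begin{align*}
\mathbb{E}[X] &= 1 \cdot \tfrac{3}{8} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{3}{8} = 2,
\end{align*}
so the two computations agree in this instance, as the proof predicts.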