[proofplan]
We prove the operator identity by evaluating both compositions at an arbitrary vector $x\in H$. The inclusion $M\subset N$ implies that any vector orthogonal to $N$ is also orthogonal to $M$, so the residual $x-P_Nx$ of the $N$-projection contributes nothing to the $M$-component of $x$. For the reverse composition, $P_Mx$ already lies in $M\subset N$, so projecting it onto $N$ leaves it unchanged.
[/proofplan]
[step:Record the defining orthogonal decomposition for each projection]
Let $x\in H$ be arbitrary. Since $P_M$ and $P_N$ are the orthogonal projections onto $M$ and $N$, respectively, we have
\begin{align*}
P_Mx\in M
\end{align*}
and
\begin{align*}
x-P_Mx\in M^\perp,
\end{align*}
where
\begin{align*}
M^\perp:=\{h\in H:(h,m)_H=0\text{ for every }m\in M\}.
\end{align*}
Similarly,
\begin{align*}
P_Nx\in N
\end{align*}
and
\begin{align*}
x-P_Nx\in N^\perp,
\end{align*}
where
\begin{align*}
N^\perp:=\{h\in H:(h,n)_H=0\text{ for every }n\in N\}.
\end{align*}
Because $M\subset N$, every vector orthogonal to $N$ is orthogonal to $M$, so
\begin{align*}
N^\perp\subset M^\perp.
\end{align*}
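As an illustration only, under assumptions not made in the statement, take $H=\mathbb{R}^3$ with the standard inner product, $M=\operatorname{span}\{e_1\}$, and $N=\operatorname{span}\{e_1,e_2\}$, where $e_1,e_2,e_3$ denote the standard basis vectors. In this hypothetical setting the two orthogonal complements can be written out explicitly, and the inclusion is visible directly:
\begin{align*}
N^\perp&=\{(0,0,h_3):h_3\in\mathbb{R}\},\\
M^\perp&=\{(0,h_2,h_3):h_2,h_3\in\mathbb{R}\},\\
N^\perp&\subset M^\perp.
\end{align*}
Enlarging the subspace from $M$ to $N$ adds orthogonality constraints, so the complement shrinks.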
[/step]
[step:Show that projecting first onto $N$ preserves the $M$-component]
We prove that $P_M(P_Nx)=P_Mx$. Since $P_Mx\in M$, it is enough to verify that the residual $P_Nx-P_Mx$ is orthogonal to $M$, because the [orthogonal projection](/theorems/437) of a vector onto $M$ is the unique element of $M$ whose residual lies in $M^\perp$.
Let $m\in M$ be arbitrary. Since $x-P_Mx$ lies in $M^\perp$,
\begin{align*}
(x-P_Mx,m)_H=0.
\end{align*}
Also $x-P_Nx\in N^\perp\subset M^\perp$, hence
\begin{align*}
(x-P_Nx,m)_H=0.
\end{align*}
Using linearity of the [inner product](/page/Inner%20Product) in the first argument,
\begin{align*}
(P_Nx-P_Mx,m)_H=((x-P_Mx)-(x-P_Nx),m)_H=(x-P_Mx,m)_H-(x-P_Nx,m)_H=0.
\end{align*}
Since $m\in M$ was arbitrary, $P_Nx-P_Mx\in M^\perp$. Hence $P_Nx=P_Mx+(P_Nx-P_Mx)$, with $P_Mx\in M$ and $P_Nx-P_Mx\in M^\perp$, is the [orthogonal decomposition](/theorems/436) of $P_Nx$ relative to $M$, so
\begin{align*}
P_M(P_Nx)=P_Mx.
\end{align*}
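A concrete check may help here. As a sketch under illustrative assumptions that are not part of the statement, let $H=\mathbb{R}^3$ with the standard inner product, $M=\operatorname{span}\{e_1\}$, $N=\operatorname{span}\{e_1,e_2\}$, and $x=(x_1,x_2,x_3)$. Then
\begin{align*}
P_Nx&=(x_1,x_2,0),\\
P_Mx&=(x_1,0,0),\\
P_Nx-P_Mx&=(0,x_2,0)\in M^\perp,\\
P_M(P_Nx)&=P_M(x_1,x_2,0)=(x_1,0,0)=P_Mx.
\end{align*}
The residual $(0,x_2,0)$ has vanishing first coordinate, which is exactly the orthogonality to $M$ used in the argument above.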
[guided]
We want to understand the effect of first projecting $x$ onto the larger subspace $N$ and then projecting the result onto the smaller subspace $M$. The key point is that the part removed by $P_N$ is orthogonal to all of $N$, and therefore also orthogonal to the smaller subspace $M$.
Let $m\in M$ be arbitrary. From the definition of $P_Mx$, the residual $x-P_Mx$ lies in $M^\perp$, so
\begin{align*}
(x-P_Mx,m)_H=0.
\end{align*}
From the definition of $P_Nx$, the residual $x-P_Nx$ lies in $N^\perp$. Since $M\subset N$, the same residual is orthogonal to every element of $M$, and therefore
\begin{align*}
(x-P_Nx,m)_H=0.
\end{align*}
Now compare the candidate $M$-component $P_Mx$ with the vector being projected, namely $P_Nx$. Their difference is
\begin{align*}
P_Nx-P_Mx=(x-P_Mx)-(x-P_Nx).
\end{align*}
Taking the inner product with the arbitrary vector $m\in M$ and using linearity in the first argument gives
\begin{align*}
(P_Nx-P_Mx,m)_H=((x-P_Mx)-(x-P_Nx),m)_H=(x-P_Mx,m)_H-(x-P_Nx,m)_H=0.
\end{align*}
Because this holds for every $m\in M$, we have $P_Nx-P_Mx\in M^\perp$.
Thus $P_Mx$ lies in $M$, while the residual $P_Nx-P_Mx$ lies in $M^\perp$. This is exactly the defining orthogonal decomposition of $P_Nx$ with respect to $M$. By uniqueness of this decomposition, the orthogonal projection of $P_Nx$ onto $M$ is $P_Mx$, so
\begin{align*}
P_M(P_Nx)=P_Mx.
\end{align*}
[/guided]
[/step]
[step:Show that projecting first onto $M$ produces a vector fixed by $P_N$]
Since $P_Mx\in M$ and $M\subset N$, we have $P_Mx\in N$. Taking $P_Mx$ itself as the candidate for its projection onto $N$, the residual $P_Mx-P_Mx$ is the zero vector, and the zero vector belongs to $N^\perp$ because
\begin{align*}
(0,n)_H=0
\end{align*}
for every $n\in N$. Hence $P_Mx\in N$ and $P_Mx-P_Mx=0\in N^\perp$ give the orthogonal decomposition of $P_Mx$ relative to $N$. Therefore
\begin{align*}
P_N(P_Mx)=P_Mx.
\end{align*}
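For a concrete instance, again under purely illustrative assumptions ($H=\mathbb{R}^3$ with the standard inner product, $M=\operatorname{span}\{e_1\}$, $N=\operatorname{span}\{e_1,e_2\}$, and $x=(x_1,x_2,x_3)$), the vector $P_Mx$ already has zero third coordinate, so the projection onto $N$ has nothing to remove:
\begin{align*}
P_Mx&=(x_1,0,0)\in N,\\
P_N(P_Mx)&=P_N(x_1,0,0)=(x_1,0,0)=P_Mx.
\end{align*}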
[/step]
[step:Conclude the two operator identities]
For the arbitrary vector $x\in H$, the previous two steps give
\begin{align*}
(P_M\circ P_N)(x)=P_M(P_Nx)=P_Mx
\end{align*}
and
\begin{align*}
(P_N\circ P_M)(x)=P_N(P_Mx)=P_Mx.
\end{align*}
Since these equalities hold for every $x\in H$, the maps $P_M\circ P_N$, $P_N\circ P_M$, and $P_M$ are equal as maps from $H$ to $H$. Therefore
\begin{align*}
P_M\circ P_N=P_N\circ P_M=P_M.
\end{align*}
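The identity can also be checked in matrix form. The following is a sketch under assumptions chosen only for illustration: $H=\mathbb{R}^3$ with the standard inner product, $N=\operatorname{span}\{e_1,e_2\}$, and $M=\operatorname{span}\{(1,1,0)\}\subset N$, with $P_M$ and $P_N$ identified with their matrices in the standard basis. In that case
\begin{align*}
P_M&=\frac12\begin{pmatrix}1&1&0\\1&1&0\\0&0&0\end{pmatrix},
&
P_N&=\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix},
\end{align*}
and a direct multiplication gives
\begin{align*}
P_MP_N=\frac12\begin{pmatrix}1&1&0\\1&1&0\\0&0&0\end{pmatrix}=P_NP_M=P_M.
\end{align*}
Right multiplication by $P_N$ deletes the third column of $P_M$ and left multiplication deletes its third row. Both are already zero, reflecting $N^\perp\subset M^\perp$ and $M\subset N$ respectively.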
[/step]