[proofplan]
We define $\pi(v) = \sum_{i=1}^k (e_i, v)\,e_i$ and verify $v - \pi(v) \in W^\perp$, confirming this is the projection from the Orthogonal Decomposition.
The best approximation property follows from Pythagoras: for every $w \in W$, the error $v - w$ decomposes into the orthogonal pieces $(v - \pi(v))$ and $(\pi(v) - w)$, so $\|v - w\|^2 \geq \|v - \pi(v)\|^2$.
[/proofplan]
[step:Define $\pi$ and verify $v - \pi(v) \in W^\perp$]
Define $\pi: V \to W$ by $\pi(v) = \sum_{i=1}^k (e_i, v)\,e_i$.
Then $\pi(v) \in W$ since it is a linear combination of the basis vectors of $W$.
For each $j = 1, \dots, k$:
\begin{align*}
(e_j, v - \pi(v)) = (e_j, v) - \sum_{i=1}^k (e_i, v)\,(e_j, e_i) = (e_j, v) - (e_j, v) = 0.
\end{align*}
Every $w \in W$ is a linear combination of $e_1, \dots, e_k$, so $v - \pi(v)$ is orthogonal to all of $W$; that is, $v - \pi(v) \in W^\perp$.
So the decomposition $v = \pi(v) + (v - \pi(v))$ is the [Orthogonal Decomposition](/theorems/436) of $v$ with respect to $W$.
[guided]
The formula $\pi(v) = \sum_{i=1}^k (e_i, v)\,e_i$ is the sum of the "components" of $v$ along each orthonormal basis vector.
The coefficient $(e_i, v)$ measures how much of $v$ lies in the direction of $e_i$.
To verify $v - \pi(v) \perp W$, it suffices to check orthogonality against the basis vectors $e_1, \dots, e_k$, since every element of $W$ is a linear combination of them.
For each $j$:
\begin{align*}
(e_j, v - \pi(v)) = (e_j, v) - \sum_{i=1}^k (e_i, v)\,(e_j, e_i).
\end{align*}
Orthonormality gives $(e_j, e_i) = \delta_{ji}$, so the sum collapses to $(e_j, v)$, which cancels the first term.
The residual $v - \pi(v)$ is therefore orthogonal to every $e_j$, hence to all of $W$.
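As a small illustration (the space $\mathbb{R}^3$ with the standard dot product, the basis, and the vector $v$ below are assumptions chosen for this example, not part of the theorem), take $e_1 = \tfrac{1}{\sqrt{2}}(1, 1, 0)$, $e_2 = (0, 0, 1)$, $W = \operatorname{span}(e_1, e_2)$, and $v = (3, 1, 2)$:
\begin{align*}
(e_1, v) &= \tfrac{1}{\sqrt{2}}(3 + 1) = 2\sqrt{2}, \qquad (e_2, v) = 2, \\
\pi(v) &= 2\sqrt{2}\,e_1 + 2\,e_2 = (2, 2, 0) + (0, 0, 2) = (2, 2, 2), \\
v - \pi(v) &= (1, -1, 0), \\
(e_1, v - \pi(v)) &= \tfrac{1}{\sqrt{2}}(1 - 1) = 0, \qquad (e_2, v - \pi(v)) = 0.
\end{align*}
In this instance the residual is orthogonal to both basis vectors, as the general computation predicts.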
[/guided]
[/step]
[step:Prove the best approximation property via the Pythagorean theorem]
For any $w \in W$, write $v - w = (v - \pi(v)) + (\pi(v) - w)$.
Since $v - \pi(v) \in W^\perp$ and $\pi(v) - w \in W$ (both $\pi(v)$ and $w$ lie in the subspace $W$), these two vectors are orthogonal.
By the Pythagorean theorem:
\begin{align*}
\|v - w\|^2 = \|v - \pi(v)\|^2 + \|\pi(v) - w\|^2 \geq \|v - \pi(v)\|^2,
\end{align*}
with equality if and only if $\|\pi(v) - w\|^2 = 0$, i.e., $w = \pi(v)$.
[guided]
The best approximation property says that $\pi(v)$ is the closest point in $W$ to $v$.
The proof exploits the orthogonal decomposition: the error $v - w$ splits into a "fixed" part $v - \pi(v)$ (the component of $v$ perpendicular to $W$, which does not depend on $w$) and a "variable" part $\pi(v) - w$ (which lies inside $W$).
Since these two parts are orthogonal, the Pythagorean theorem gives:
\begin{align*}
\|v - w\|^2 = \|v - \pi(v)\|^2 + \|\pi(v) - w\|^2.
\end{align*}
The first term is constant (independent of $w$), and the second term is nonnegative, vanishing if and only if $w = \pi(v)$.
So $\|v - w\|$ is minimised uniquely at $w = \pi(v)$.
This is the geometric content of orthogonal projection: the shortest path from a point to a subspace is the perpendicular one.
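For a concrete check of the Pythagorean identity above (an illustrative example with assumed data, not part of the theorem), take $V = \mathbb{R}^3$ with the standard dot product, $e_1 = \tfrac{1}{\sqrt{2}}(1, 1, 0)$, $e_2 = (0, 0, 1)$, and $v = (3, 1, 2)$, so that $\pi(v) = 2\sqrt{2}\,e_1 + 2\,e_2 = (2, 2, 2)$. Choosing $w = (1, 1, 0) = \sqrt{2}\,e_1 \in W$:
\begin{align*}
\|v - w\|^2 &= \|(2, 0, 2)\|^2 = 8, \\
\|v - \pi(v)\|^2 &= \|(1, -1, 0)\|^2 = 2, \\
\|\pi(v) - w\|^2 &= \|(1, 1, 2)\|^2 = 6,
\end{align*}
and indeed $8 = 2 + 6$. Changing $w$ within $W$ alters only the third quantity, so the squared distance never drops below $2$.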
[/guided]
[/step]