[guided]Define the error vector $e \in V$ by $e := u-u_h$. First we rederive the orthogonality property that makes the comparison possible. Let $v_h \in V_h$ be arbitrary. Since $V_h \subset V$, the continuous variational identity applies to $v_h$, and the discrete variational identity also applies to $v_h$. Hence $a(u,v_h)=F(v_h)$ and $a(u_h,v_h)=F(v_h)$. Subtracting these two equalities and using the bilinearity of $a$ from the theorem hypotheses gives $a(e,v_h)=a(u-u_h,v_h)=a(u,v_h)-a(u_h,v_h)=0$. Thus $a(e,v_h)=0$ for every $v_h \in V_h$.
Now we compare the actual discrete solution $u_h$ with an arbitrary discrete candidate $w_h \in V_h$. This is the key move because the final estimate must hold against the best possible approximation from $V_h$. Since $V_h$ is a linear subspace by hypothesis and $u_h,w_h \in V_h$, the difference $w_h-u_h$ belongs to $V_h$. Applying the orthogonality just proved (Galerkin orthogonality) with $v_h=w_h-u_h$ gives $a(e,w_h-u_h)=0$. Insert $w_h$ into the error: $e=u-u_h=(u-w_h)+(w_h-u_h)$. Linearity of $a$ in the second argument gives $a(e,e)=a(e,u-w_h)+a(e,w_h-u_h)$. The second term vanishes by Galerkin orthogonality, so $a(e,e)=a(e,u-w_h)$.
This identity replaces the unknown error $e$ in the second argument by the arbitrary approximation error $u-w_h$. Coercivity now gives the lower bound $\alpha\|e\|_V^2 \leq a(e,e)$. Using $a(e,e)=a(e,u-w_h)$ and then applying the boundedness estimate for $a$ with the pair $(e,u-w_h)$, we get
\begin{align*}
\alpha\|e\|_V^2 \leq a(e,u-w_h) \leq |a(e,u-w_h)| \leq M\|e\|_V\|u-w_h\|_V.
\end{align*}
Each hypothesis has now played its role: coercivity supplies the lower bound, boundedness supplies the upper bound, and Galerkin orthogonality removes the discrete component $w_h-u_h$.
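It remains to convert this estimate for a fixed $w_h$ into the stated best-approximation estimate. If $\|e\|_V=0$, then $\|u-u_h\|_V=0$, while the right-hand side below is nonnegative because $M \geq 0$, $\alpha>0$, and norms are nonnegative. The estimate therefore holds trivially:
\begin{align*}
\|u-u_h\|_V \leq \frac{M}{\alpha}\inf_{w_h \in V_h}\|u-w_h\|_V.
\end{align*}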
If instead $\|e\|_V>0$, then dropping the middle terms of the chain of inequalities above gives
\begin{align*}
\alpha\|e\|_V^2 \leq M\|e\|_V\|u-w_h\|_V.
\end{align*}
Since $\alpha>0$ and $\|e\|_V>0$, division by $\alpha\|e\|_V$ is valid and yields
\begin{align*}
\|e\|_V \leq \frac{M}{\alpha}\|u-w_h\|_V.
\end{align*}
The element $w_h \in V_h$ was arbitrary, so taking the infimum over all $w_h \in V_h$ gives
\begin{align*}
\|u-u_h\|_V=\|e\|_V \leq \frac{M}{\alpha}\inf_{w_h \in V_h}\|u-w_h\|_V.
\end{align*}
This is exactly the quasioptimality estimate.[/guided]
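
As a sanity check on the constant, and not as part of the proof, the estimate can be worked through on a small hypothetical instance. Every choice below is an assumption made only for illustration. Take $V=\mathbf{R}^2$ with the Euclidean norm, $V_h=\operatorname{span}\{(1,0)^{\top}\}$, and
\begin{align*}
a(u,v) := v^{\top}Au, \qquad A=\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad F(v) := -v_1+v_2.
\end{align*}
Here $a(v,v)=v_1^2+v_2^2=\|v\|_V^2$, so $\alpha=1$, and $A^{\top}A=2I$ gives $|a(u,v)| \leq \sqrt{2}\,\|u\|_V\|v\|_V$, so $M=\sqrt{2}$. The continuous solution is $u=(0,1)^{\top}$, since $Au=(-1,1)^{\top}$. For the discrete solution $u_h=c\,(1,0)^{\top}$, testing with $v_h=(1,0)^{\top}$ gives $a(u_h,v_h)=c$ and $F(v_h)=-1$, hence $u_h=(-1,0)^{\top}$. Then
\begin{align*}
e &= u-u_h = (1,1)^{\top}, & a(e,v_h) &= v_h^{\top}Ae = (1,0)\,(0,2)^{\top} = 0, \\
\|e\|_V &= \sqrt{2}, & \inf_{w_h \in V_h}\|u-w_h\|_V &= \inf_{t \in \mathbf{R}}\sqrt{t^2+1} = 1.
\end{align*}
Galerkin orthogonality holds as expected, and $\|u-u_h\|_V=\sqrt{2}=\frac{M}{\alpha}\cdot 1$, so in this instance the quasioptimality estimate holds with equality.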