[proofplan]
The proof is the standard Galerkin quasioptimality argument for a coercive bounded [bilinear form](/page/Bilinear%20Form) on a real [Hilbert space](/page/Hilbert%20Space). First subtract the continuous and discrete variational equations to obtain Galerkin orthogonality of the error against every discrete [test function](/page/Test%20Function). Then compare the error $u-u_h$ with an arbitrary discrete approximation $w_h \in V_h$: coercivity bounds $a(u-u_h,u-u_h)$ from below by $\alpha\|u-u_h\|_V^2$, while boundedness bounds the same quantity from above by $M\|u-u_h\|_V\|u-w_h\|_V$. Finally, handle the zero-error case separately, divide by the error norm otherwise, and take the infimum over $w_h \in V_h$.
[/proofplan]
[step:Derive Galerkin orthogonality for the error]
Define the error vector $e \in V$ by $e := u-u_h$. Let $v_h \in V_h$ be arbitrary. Since $V_h \subset V$, the continuous identity $a(u,v)=F(v)$ applies to $v=v_h$, while the discrete identity gives $a(u_h,v_h)=F(v_h)$. Because $a$ is linear in its first argument, $a(e,v_h)=a(u-u_h,v_h)=a(u,v_h)-a(u_h,v_h)=F(v_h)-F(v_h)=0$. Thus $a(e,v_h)=0$ for every $v_h \in V_h$.
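The orthogonality identity can be checked numerically in a toy setting. The sketch below is an illustration under assumptions that are not part of the theorem: $V=\mathbb{R}^2$, $V_h=\operatorname{span}\{q\}$, and the hypothetical nonsymmetric form $a(u,v)=u\cdot v+s(u_1v_2-u_2v_1)$. The names `a`, `galerkin_1d`, `S`, `u`, and `q` are chosen only for this example.

```python
# Toy illustration (assumed setting, not from the theorem): V = R^2,
# V_h = span{q}, and a(u, v) = u.v + s*(u1*v2 - u2*v1), which is bilinear
# and nonsymmetric for s != 0.

S = 2.0  # hypothetical skewness parameter


def a(u, v, s=S):
    """Bilinear form a(u, v) = u.v + s*(u1*v2 - u2*v1) on R^2."""
    return u[0] * v[0] + u[1] * v[1] + s * (u[0] * v[1] - u[1] * v[0])


def galerkin_1d(u, q, s=S):
    """Galerkin solution in span{q} for the right-hand side F(v) = a(u, v)."""
    c = a(u, q, s) / a(q, q, s)  # solves a(c*q, q) = F(q)
    return (c * q[0], c * q[1])


u = (1.0, 1.0)  # exact solution; the functional is F(v) = a(u, v)
q = (1.0, 0.0)  # basis vector of the discrete space V_h
u_h = galerkin_1d(u, q)
e = (u[0] - u_h[0], u[1] - u_h[1])  # error e = u - u_h

# Evaluate a(e, v_h) for several discrete test vectors v_h = t*q.
residuals = [a(e, (t * q[0], t * q[1])) for t in (-3.0, 0.5, 1.0, 7.0)]
print("u_h =", u_h, " e =", e, " a(e, v_h) =", residuals)
```

Each entry of `residuals` should vanish up to rounding, which mirrors $a(e,v_h)=0$ for every $v_h \in V_h$.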
[/step]
[step:Compare the error with an arbitrary discrete approximation]
Fix an arbitrary element $w_h \in V_h$. Since $V_h$ is a linear subspace by hypothesis and $u_h,w_h \in V_h$, the vector $w_h-u_h$ belongs to $V_h$. Applying Galerkin orthogonality with $v_h=w_h-u_h$ gives $a(e,w_h-u_h)=0$. Using the identity $e=u-u_h=(u-w_h)+(w_h-u_h)$ and linearity of $a$ in the second variable, we obtain $a(e,e)=a(e,u-w_h)+a(e,w_h-u_h)=a(e,u-w_h)$. Coercivity applied to $e$ gives $\alpha\|e\|_V^2 \leq a(e,e)$.
Combining this with the preceding identity and then applying continuity of $a$ to the pair $(e,u-w_h)$ yields
\begin{align*}
\alpha\|e\|_V^2 \leq a(e,u-w_h) \leq |a(e,u-w_h)| \leq M\|e\|_V\|u-w_h\|_V.
\end{align*}
[guided]
Define the error vector $e \in V$ by $e := u-u_h$. First we rederive the orthogonality property that makes the comparison possible. Let $v_h \in V_h$ be arbitrary. Since $V_h \subset V$, the continuous variational identity applies to $v_h$, and the discrete variational identity also applies to $v_h$. Hence $a(u,v_h)=F(v_h)$ and $a(u_h,v_h)=F(v_h)$. Subtracting these two equalities and using the bilinearity of $a$ from the theorem hypotheses gives $a(e,v_h)=a(u-u_h,v_h)=a(u,v_h)-a(u_h,v_h)=0$. Thus $a(e,v_h)=0$ for every $v_h \in V_h$.
Now we compare the actual discrete solution $u_h$ with an arbitrary discrete candidate $w_h \in V_h$. This is the key move because the final estimate must hold against the best possible approximation from $V_h$. Since $V_h$ is a linear subspace by hypothesis and $u_h,w_h \in V_h$, the difference $w_h-u_h$ belongs to $V_h$. Applying the orthogonality just proved with $v_h=w_h-u_h$ gives $a(e,w_h-u_h)=0$. Insert $w_h$ into the error: $e=u-u_h=(u-w_h)+(w_h-u_h)$. Linearity of $a$ in the second variable gives $a(e,e)=a(e,u-w_h)+a(e,w_h-u_h)$. The second term vanishes by Galerkin orthogonality, so $a(e,e)=a(e,u-w_h)$.
This identity replaces the unknown error $e$ in the second argument by the arbitrary approximation error $u-w_h$. Coercivity now gives the lower bound $\alpha\|e\|_V^2 \leq a(e,e)$. Using $a(e,e)=a(e,u-w_h)$ and then applying the boundedness estimate for $a$ with the pair $(e,u-w_h)$, we get
\begin{align*}
\alpha\|e\|_V^2 \leq a(e,u-w_h) \leq |a(e,u-w_h)| \leq M\|e\|_V\|u-w_h\|_V.
\end{align*}
Each hypothesis has now played its role: coercivity supplies the lower bound, boundedness supplies the upper bound, and Galerkin orthogonality removes the discrete component $w_h-u_h$.
It remains to convert this estimate for a fixed $w_h$ into the stated best-approximation estimate. If $\|e\|_V=0$, then $\|u-u_h\|_V=0$, and since an infimum of norms is nonnegative,
\begin{align*}
\|u-u_h\|_V \leq \frac{M}{\alpha}\inf_{v_h \in V_h}\|u-v_h\|_V.
\end{align*}
If instead $\|e\|_V>0$, then the displayed estimate gives
\begin{align*}
\alpha\|e\|_V^2 \leq M\|e\|_V\|u-w_h\|_V.
\end{align*}
Since $\alpha>0$ and $\|e\|_V>0$, division by $\alpha\|e\|_V$ is valid and yields
\begin{align*}
\|e\|_V \leq \frac{M}{\alpha}\|u-w_h\|_V.
\end{align*}
The element $w_h \in V_h$ was arbitrary, so taking the infimum over all $w_h \in V_h$ gives
\begin{align*}
\|u-u_h\|_V=\|e\|_V \leq \frac{M}{\alpha}\inf_{w_h \in V_h}\|u-w_h\|_V.
\end{align*}
This is exactly the quasioptimality estimate.
[/guided]
[/step]
[step:Divide by the error norm and take the best approximation]
If $\|e\|_V=0$, then $\|u-u_h\|_V=0 \leq \frac{M}{\alpha}\inf_{v_h \in V_h}\|u-v_h\|_V$ because the infimum of norms is nonnegative, so the desired estimate holds.
Assume now that $\|e\|_V>0$. From the estimate obtained for the arbitrary element $w_h \in V_h$,
\begin{align*}
\alpha\|e\|_V^2 \leq M\|e\|_V\|u-w_h\|_V.
\end{align*}
Dividing by the positive number $\alpha\|e\|_V$ gives
\begin{align*}
\|e\|_V \leq \frac{M}{\alpha}\|u-w_h\|_V.
\end{align*}
Since $w_h \in V_h$ was arbitrary, taking the infimum over all $w_h \in V_h$ yields
\begin{align*}
\|u-u_h\|_V=\|e\|_V \leq \frac{M}{\alpha}\inf_{w_h \in V_h}\|u-w_h\|_V.
\end{align*}
This is the asserted quasioptimality estimate.
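As a numerical sanity check of the estimate, the following sketch compares both sides in a toy setting. Everything here is an assumption made for illustration and is not part of the theorem: $V=\mathbb{R}^2$ with the Euclidean norm, $V_h=\operatorname{span}\{q\}$, and the hypothetical form $a(u,v)=u\cdot v+s(u_1v_2-u_2v_1)$, for which $a(v,v)=\|v\|^2$ (so $\alpha=1$) and $|a(u,v)| \leq \sqrt{1+s^2}\,\|u\|\,\|v\|$ (so $M=\sqrt{1+s^2}$). The helper names `a`, `norm`, and `cea_check` are hypothetical.

```python
import math

# Toy illustration (assumed setting, not from the theorem): V = R^2 with the
# Euclidean norm, V_h = span{q}, and a(u, v) = u.v + s*(u1*v2 - u2*v1).
# For this form alpha = 1 and M = sqrt(1 + s^2).


def a(u, v, s):
    """Bilinear form a(u, v) = u.v + s*(u1*v2 - u2*v1) on R^2."""
    return u[0] * v[0] + u[1] * v[1] + s * (u[0] * v[1] - u[1] * v[0])


def norm(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])


def cea_check(u, q, s):
    """Return (error norm, best-approximation error, M/alpha) for V_h = span{q}."""
    # Galerkin coefficient: a(c*q, q) = F(q) with F(v) = a(u, v).
    c = a(u, q, s) / a(q, q, s)
    err = norm((u[0] - c * q[0], u[1] - c * q[1]))
    # Best approximation in the Euclidean norm is the orthogonal projection onto q.
    p = (u[0] * q[0] + u[1] * q[1]) / (q[0] ** 2 + q[1] ** 2)
    best = norm((u[0] - p * q[0], u[1] - p * q[1]))
    ratio = math.sqrt(1.0 + s * s) / 1.0  # M / alpha with alpha = 1
    return err, best, ratio


for s in (0.0, 0.5, 2.0, 10.0):
    err, best, ratio = cea_check((1.0, 1.0), (1.0, 0.0), s)
    # Small relative tolerance, because the two sides can coincide up to rounding.
    assert err <= ratio * best * (1.0 + 1e-9)
    print(f"s={s}: error={err:.6f}, best={best:.6f}, M/alpha={ratio:.6f}")
```

For this sample data the error norm and $\frac{M}{\alpha}$ times the best-approximation error agree up to rounding, which suggests that the constant $\frac{M}{\alpha}$ cannot be improved for general nonsymmetric forms.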
[/step]