[guided]The pivot of the entire representer theorem is here: the reproducing kernel structure forces the orthogonal complement $\mathcal{A}^\perp$ to consist of functions that vanish at the training inputs. Once we know this, the loss term cannot distinguish between $f$ and its projection $g$ onto $\mathcal{A}$.
Fix $f \in \mathcal{H}_\phi$ and decompose $f = g + f^\perp$ as in Step 1, with $g \in \mathcal{A}$ and $f^\perp \in \mathcal{A}^\perp$. Pick any training index $j \in \{1, \ldots, N\}$. The kernel section $k_\phi(\cdot, x^j)$ is one of the spanning vectors of $\mathcal{A} = \operatorname{Span}\{k_\phi(\cdot, x^i) : 1 \le i \le N\}$, so $k_\phi(\cdot, x^j) \in \mathcal{A}$.
By definition, $\mathcal{A}^\perp = \{h \in \mathcal{H}_\phi : (h, g')_{\mathcal{H}_\phi} = 0 \text{ for all } g' \in \mathcal{A}\}$, so every $h \in \mathcal{A}^\perp$ is orthogonal to every element of $\mathcal{A}$. Taking $h = f^\perp$ and $g' = k_\phi(\cdot, x^j)$ gives
\begin{align*}
(f^\perp, k_\phi(\cdot, x^j))_{\mathcal{H}_\phi} = 0.
\end{align*}
Now we apply the **reproducing property** of $k_\phi$: for any $h \in \mathcal{H}_\phi$ and any $z \in \widetilde{\mathcal{C}_p}$,
\begin{align*}
h(z) = (h, k_\phi(\cdot, z))_{\mathcal{H}_\phi}.
\end{align*}
This is the defining property of an RKHS and is what makes evaluation $h \mapsto h(z)$ a continuous linear functional. Apply this with $h = f^\perp$ and $z = x^j$:
\begin{align*}
f^\perp(x^j) = (f^\perp, k_\phi(\cdot, x^j))_{\mathcal{H}_\phi} = 0.
\end{align*}
The first equality is the reproducing property; the second is the orthogonality just established.
This holds for every $j \in \{1, \ldots, N\}$, so $f^\perp$ vanishes on the training input set $\{x^1, \ldots, x^N\}$. Note that $f^\perp$ is generally **not** the zero function: it can be non-zero off the training set. By the reproducing property, $\mathcal{A}^\perp$ is exactly the set of elements of $\mathcal{H}_\phi$ that vanish at every training input, and since $\dim \mathcal{A} \le N$, this subspace contains non-zero functions whenever $\dim \mathcal{H}_\phi > N$.
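The vanishing of $f^\perp$ on the training inputs can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the proof: it uses a one-dimensional Gaussian kernel as a stand-in for $k_\phi$, and the training inputs `x_train`, the centers `z`, and the coefficients `c` are hypothetical values chosen only for illustration. The function $f$ is a finite combination of kernel sections, and its projection $g$ onto $\mathcal{A}$ is found by solving $K\alpha = (f(x^j))_j$, which uses $(f, k(\cdot, x^j)) = f(x^j)$.

```python
import numpy as np

def gauss_kernel(a, b, ell=0.5):
    """Gaussian kernel matrix k(a_i, b_j) for 1-D input arrays a and b.

    Assumption: this kernel stands in for k_phi purely for illustration.
    """
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return np.exp(-(a - b) ** 2 / (2.0 * ell ** 2))

# Hypothetical training inputs x^1..x^N.
x_train = np.array([-1.0, -0.3, 0.2, 0.9, 1.7])

# Hypothetical f in the RKHS: a combination of kernel sections at centers z_m
# that are not training inputs, so f does not lie in A.
z = np.array([-1.5, 0.0, 0.6, 2.2])
c = np.array([1.0, -2.0, 0.5, 1.5])

def f(t):
    return gauss_kernel(t, z) @ c

# Orthogonal projection g = sum_i alpha_i k(., x^i) onto A.
# Normal equations: K alpha = b, with b_j = (f, k(., x^j)) = f(x^j)
# by the reproducing property.
K = gauss_kernel(x_train, x_train)
alpha = np.linalg.solve(K, f(x_train))

def g(t):
    return gauss_kernel(t, x_train) @ alpha

def f_perp(t):
    return f(t) - g(t)

# f_perp should vanish (up to rounding) on the training inputs ...
max_on_train = np.max(np.abs(f_perp(x_train)))
# ... but need not vanish at points off the training set.
t_off = np.array([-2.0, 0.5, 2.5])
max_off_train = np.max(np.abs(f_perp(t_off)))

print("max |f_perp| on training inputs:", max_on_train)
print("max |f_perp| off training inputs:", max_off_train)
```

The first printed value should be at the level of floating-point rounding, while the second should be visibly non-zero, matching the statement that $f^\perp$ vanishes on $\{x^1, \ldots, x^N\}$ without being the zero function.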
Why is this the engine of the proof? Because the loss in the next step depends on $f$ only through the values $f(x^j)$, and we have just shown $f(x^j) = g(x^j) + f^\perp(x^j) = g(x^j) + 0 = g(x^j)$. So the loss cannot tell $f$ from $g$, even though they differ as functions in $\mathcal{H}_\phi$.[/guided]