[guided]The goal of this step is to replace the covariant expression produced by differentiating curvature with an ordinary exterior derivative, because the theorem claims that the difference of Chern--Weil forms is exact.
Work in the same local frame. The curvature matrix $\Omega_t$ satisfies the Bianchi identity
\begin{align*}
d_{\Gamma_t}\Omega_t=0.
\end{align*}
This identity applies because $\Omega_t$ is the curvature of the connection represented by $\Gamma_t$.
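For concreteness, the following sketch shows how the identity reads in one common convention. The convention is an assumption made for illustration, since this step does not fix it: the curvature matrix is taken to be $\Omega_t=d\Gamma_t+\Gamma_t\wedge\Gamma_t$, and $d_{\Gamma_t}$ acts on a matrix-valued $p$-form $\eta$ by adding a graded commutator with $\Gamma_t$.
\begin{align*}
d_{\Gamma_t}\eta&=d\eta+\Gamma_t\wedge\eta-(-1)^{p}\,\eta\wedge\Gamma_t,\\
d\Omega_t&=d\Gamma_t\wedge\Gamma_t-\Gamma_t\wedge d\Gamma_t=\Omega_t\wedge\Gamma_t-\Gamma_t\wedge\Omega_t,\\
d_{\Gamma_t}\Omega_t&=d\Omega_t+\Gamma_t\wedge\Omega_t-\Omega_t\wedge\Gamma_t=0.
\end{align*}
The second line substitutes $d\Gamma_t=\Omega_t-\Gamma_t\wedge\Gamma_t$; the two cubic terms $\Gamma_t\wedge\Gamma_t\wedge\Gamma_t$ cancel. Under a different sign convention for the connection matrix the individual signs may change, but the computation has the same shape.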
Now apply the covariant Leibniz rule to the invariant polarized expression
\begin{align*}
P(a,\Omega_t,\dots,\Omega_t)\in \Omega^{2k-1}(U).
\end{align*}
The first input $a$ has degree $1$, and every curvature input $\Omega_t$ has degree $2$. Moving the derivative past $a$ therefore contributes the sign $(-1)^{1}=-1$, while moving it past any further even-degree factors $\Omega_t$ contributes no additional sign. The rule gives
\begin{align*}
dP(a,\Omega_t,\dots,\Omega_t)=P(d_{\Gamma_t}a,\Omega_t,\dots,\Omega_t)-\sum_{j=2}^{k}P(a,\Omega_t,\dots,d_{\Gamma_t}\Omega_t,\dots,\Omega_t).
\end{align*}
The sum vanishes term by term because $d_{\Gamma_t}\Omega_t=0$. Thus only the derivative of the first input remains.
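Why is the left side the ordinary exterior derivative $d$ rather than a covariant derivative of a matrix-valued form? Once the invariant polynomial has been applied, $P(a,\Omega_t,\dots,\Omega_t)$ is an ordinary scalar-valued form, so $d$ is the only derivative that makes sense on it. Expanding $dP(a,\Omega_t,\dots,\Omega_t)$ with the ordinary Leibniz rule places $da$ and $d\Omega_t$ in the slots, whereas the formula above places $d_{\Gamma_t}a$ and $d_{\Gamma_t}\Omega_t$ there. The two expansions differ exactly by the terms in which one slot is filled by a graded commutator with the connection matrix $\Gamma_t$, so the formula holds once these commutator terms are shown to sum to zero.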
We now spell out the cancellation. Since $P$ is invariant under the adjoint action of $GL(r,\mathbb C)$, its symmetric polarization satisfies the infinitesimal identity
\begin{align*}
\sum_{j=1}^{k}P(Y_1,\dots,[X,Y_j],\dots,Y_k)=0
\end{align*}
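for every $X,Y_1,\dots,Y_k\in\mathfrak{gl}(r,\mathbb C)$. This follows by differentiating at $\lambda=0$ the identity $P(\operatorname{Ad}_{\exp(\lambda X)}Y_1,\dots,\operatorname{Ad}_{\exp(\lambda X)}Y_k)=P(Y_1,\dots,Y_k)$. Now expand $\Gamma_t$, $a$, and $\Omega_t$ as sums of scalar forms times constant matrices and apply this identity to the matrix coefficients, with $X$ taken from $\Gamma_t$ and the $Y_j$ taken from $a$ and $\Omega_t$. Moving the $1$-form coefficient of $\Gamma_t$ to the front costs a sign $-1$ exactly when it passes the odd-degree input $a$, which reproduces the Leibniz signs. Writing $[\Gamma_t,\eta]=\Gamma_t\wedge\eta-(-1)^{p}\eta\wedge\Gamma_t$ for a matrix-valued $p$-form $\eta$, the result is $P([\Gamma_t,a],\Omega_t,\dots,\Omega_t)-\sum_{j=2}^{k}P(a,\Omega_t,\dots,[\Gamma_t,\Omega_t],\dots,\Omega_t)=0$, so the commutator contributions cancel precisely. Hence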
\begin{align*}
dP(a,\Omega_t,\dots,\Omega_t)=P(d_{\Gamma_t}a,\Omega_t,\dots,\Omega_t).
\end{align*}
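As a sanity check, the simplest nontrivial case can be worked out by hand. The sketch below takes $k=2$ and $P(Y_1,Y_2)=\operatorname{tr}(Y_1Y_2)$, and it assumes the convention $\Omega_t=d\Gamma_t+\Gamma_t\wedge\Gamma_t$, $d_{\Gamma_t}a=da+\Gamma_t\wedge a+a\wedge\Gamma_t$, which this step does not itself fix. At the matrix level, infinitesimal invariance is just cyclicity of the trace:
\begin{align*}
\operatorname{tr}([X,Y_1]Y_2)+\operatorname{tr}(Y_1[X,Y_2])=\operatorname{tr}(XY_1Y_2)-\operatorname{tr}(Y_1Y_2X)=0.
\end{align*}
For forms, the ordinary Leibniz rule together with $d\Omega_t=\Omega_t\wedge\Gamma_t-\Gamma_t\wedge\Omega_t$ gives the first two lines below, and expanding $d_{\Gamma_t}a$ gives the third:
\begin{align*}
d\operatorname{tr}(a\wedge\Omega_t)&=\operatorname{tr}(da\wedge\Omega_t)-\operatorname{tr}(a\wedge d\Omega_t)\\
&=\operatorname{tr}(da\wedge\Omega_t)+\operatorname{tr}(a\wedge\Gamma_t\wedge\Omega_t)-\operatorname{tr}(a\wedge\Omega_t\wedge\Gamma_t),\\
\operatorname{tr}(d_{\Gamma_t}a\wedge\Omega_t)&=\operatorname{tr}(da\wedge\Omega_t)+\operatorname{tr}(a\wedge\Gamma_t\wedge\Omega_t)+\operatorname{tr}(\Gamma_t\wedge a\wedge\Omega_t).
\end{align*}
The two right-hand sides agree because the trace is graded cyclic: for matrix-valued forms $\alpha$ and $\beta$ of degrees $p$ and $q$ one has $\operatorname{tr}(\alpha\wedge\beta)=(-1)^{pq}\operatorname{tr}(\beta\wedge\alpha)$, so with $\alpha=\Gamma_t$ and $\beta=a\wedge\Omega_t$ of degree $3$,
\begin{align*}
\operatorname{tr}(\Gamma_t\wedge a\wedge\Omega_t)=-\operatorname{tr}(a\wedge\Omega_t\wedge\Gamma_t).
\end{align*}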
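Under a change of frame, the matrices $a$, $\Omega_t$, and $d_{\Gamma_t}a$ are all conjugated by the same transition matrix. Because $P$ is $\operatorname{Ad}$-invariant, both sides of the identity are therefore independent of the frame, so the local calculations patch together and represent the global formula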
\begin{align*}
dP(A,F_{\nabla^t},\dots,F_{\nabla^t})=P(d_{\nabla^t}A,F_{\nabla^t},\dots,F_{\nabla^t}).
\end{align*}[/guided]