[proofplan]
We prove the formula locally in an arbitrary frame and then use $\operatorname{Ad}$-invariance to see that the resulting forms glue globally. Along the affine path of connections, the curvature varies by the covariant [exterior derivative](/theorems/1525) of the difference form, namely $\frac{\partial}{\partial t}F_{\nabla^t}=d_{\nabla^t}A$. Differentiating the polarized invariant polynomial gives the factor $k$, and the Bianchi identity plus infinitesimal $\operatorname{Ad}$-invariance identifies the derivative with an exterior derivative. Integrating the resulting identity over $t\in[0,1]$ gives the transgression formula.
[/proofplan]
[step:Fix the polarized Chern-Weil convention in a local frame]
Let $U\subset M$ be an [open set](/page/Open%20Set) over which $E$ admits a smooth frame $s=(s_1,\dots,s_r)$, equivalently a trivialization
\begin{align*}
U\times \mathbb C^r\to E|_U,\qquad (x,v)\mapsto \sum_{i=1}^{r}v_i\,s_i(x).
\end{align*}
In this frame, write
\begin{align*}
\Gamma_t\in \Omega^1(U;\mathfrak{gl}(r,\mathbb C))
\end{align*}
for the connection matrix of $\nabla^t$, write
\begin{align*}
a\in \Omega^1(U;\mathfrak{gl}(r,\mathbb C))
\end{align*}
for the matrix of $A$, and write
\begin{align*}
\Omega_t\in \Omega^2(U;\mathfrak{gl}(r,\mathbb C))
\end{align*}
for the curvature matrix of $\nabla^t$. Thus
\begin{align*}
\Gamma_t=\Gamma_0+ta
\end{align*}
and
\begin{align*}
\Omega_t=d\Gamma_t+\Gamma_t\wedge \Gamma_t.
\end{align*}
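The expression $P(\beta_1,\dots,\beta_k)$ for matrix-valued forms $\beta_j\in \Omega^{m_j}(U;\mathfrak{gl}(r,\mathbb C))$ denotes the scalar form obtained by wedging the form parts and evaluating the symmetric polarization of $P$ on the matrix parts. Explicitly, if $\beta_j=\sum_{I}\omega_j^{I}X_j^{I}$ with scalar forms $\omega_j^{I}\in\Omega^{m_j}(U)$ and constant matrices $X_j^{I}\in\mathfrak{gl}(r,\mathbb C)$, then
\begin{align*}
P(\beta_1,\dots,\beta_k)=\sum_{I_1,\dots,I_k}\omega_1^{I_1}\wedge\dots\wedge\omega_k^{I_k}\,P(X_1^{I_1},\dots,X_k^{I_k}).
\end{align*}
If $s'=sg$ is a second frame, with $g$ a smooth $GL(r,\mathbb C)$-valued function on the overlap of their domains, then the matrices of $A$ and $F_{\nabla^t}$ in the new frame are $g^{-1}ag$ and $g^{-1}\Omega_tg$, because $A$ and $F_{\nabla^t}$ are $\operatorname{End}(E)$-valued forms. Since $P$ is $\operatorname{Ad}$-invariant and acts pointwise on the matrix parts, these local expressions are unchanged under change of frame, so the local forms $P(\Omega_t,\dots,\Omega_t)$ and $P(a,\Omega_t,\dots,\Omega_t)$ represent the global forms $P(F_{\nabla^t})$ and $P(A,F_{\nabla^t},\dots,F_{\nabla^t})$.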
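As an illustration, on which the argument does not depend, take $k=2$ and the $\operatorname{Ad}$-invariant polynomial $P(X)=\operatorname{tr}(X^2)$, whose symmetric polarization is $P(X,Y)=\operatorname{tr}(XY)$. Expanding the matrix-valued forms entrywise shows that the convention above reduces to the trace of the matrix wedge product:
\begin{align*}
P(a,\Omega_t)=\operatorname{tr}(a\wedge\Omega_t),\qquad P(\Omega_t,\Omega_t)=\operatorname{tr}(\Omega_t\wedge\Omega_t).
\end{align*}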
[/step]
[step:Compute the curvature variation along the affine path]
For an $\operatorname{End}(E)$-valued $m$-form $\beta$ represented locally by a matrix-valued form $b\in \Omega^m(U;\mathfrak{gl}(r,\mathbb C))$, the covariant exterior derivative $d_{\nabla^t}\beta$ is represented by
\begin{align*}
d_{\Gamma_t}b:=db+\Gamma_t\wedge b-(-1)^m b\wedge \Gamma_t.
\end{align*}
Since $a$ is independent of $t$, differentiating the local curvature formula gives
\begin{align*}
\frac{\partial \Omega_t}{\partial t}=da+a\wedge \Gamma_t+\Gamma_t\wedge a.
\end{align*}
Because $a$ has degree $1$, the preceding formula is precisely
\begin{align*}
\frac{\partial \Omega_t}{\partial t}=d_{\Gamma_t}a.
\end{align*}
Equivalently, in global notation,
\begin{align*}
\frac{\partial}{\partial t}F_{\nabla^t}=d_{\nabla^t}A.
\end{align*}
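For a line bundle ($r=1$) the computation can be checked by hand; this special case serves only as an illustration. Then $\mathfrak{gl}(1,\mathbb C)=\mathbb C$, so $\Gamma_t$ and $a$ are scalar $1$-forms, which anticommute under the wedge product. Hence $\Gamma_t\wedge\Gamma_t=0$ and $a\wedge\Gamma_t+\Gamma_t\wedge a=0$, and
\begin{align*}
\Omega_t=d\Gamma_0+t\,da,\qquad \frac{\partial \Omega_t}{\partial t}=da=d_{\Gamma_t}a.
\end{align*}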
[guided]
The curvature of $\nabla^t$ is easiest to differentiate after choosing a frame, because a connection then becomes a matrix of $1$-forms. In the chosen frame on $U$, the affine path is represented by
\begin{align*}
\Gamma_t=\Gamma_0+ta,
\end{align*}
where $a$ is the matrix of the globally defined form $A=\nabla^1-\nabla^0$. The curvature matrix is
\begin{align*}
\Omega_t=d\Gamma_t+\Gamma_t\wedge \Gamma_t.
\end{align*}
Differentiating with respect to $t$ is legitimate coefficientwise because $\Gamma_t$ depends smoothly and affinely on $t$. We obtain
\begin{align*}
\frac{\partial \Omega_t}{\partial t}=d a+a\wedge \Gamma_t+\Gamma_t\wedge a.
\end{align*}
The order of the two product terms matters because the wedge product of matrix-valued forms also multiplies the matrix parts, and matrices need not commute. Now recall the local formula for the covariant exterior derivative on an $\operatorname{End}(E)$-valued $m$-form represented by $b$:
\begin{align*}
d_{\Gamma_t}b=db+\Gamma_t\wedge b-(-1)^m b\wedge \Gamma_t.
\end{align*}
Here $b=a$ has degree $m=1$, so $-(-1)^1=1$, and therefore
\begin{align*}
d_{\Gamma_t}a=da+\Gamma_t\wedge a+a\wedge \Gamma_t.
\end{align*}
This is exactly the derivative computed above. Translating out of the frame gives the global identity
\begin{align*}
\frac{\partial}{\partial t}F_{\nabla^t}=d_{\nabla^t}A.
\end{align*}
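A second check, using only the definitions above, expands the curvature as a polynomial in $t$. Substituting $\Gamma_t=\Gamma_0+ta$ into $\Omega_t=d\Gamma_t+\Gamma_t\wedge\Gamma_t$ gives
\begin{align*}
\Omega_t=\Omega_0+t\,d_{\Gamma_0}a+t^2\,a\wedge a,
\end{align*}
so $\partial_t\Omega_t=d_{\Gamma_0}a+2t\,a\wedge a$. On the other hand, the local formula for $d_{\Gamma_t}$ with $m=1$ gives
\begin{align*}
d_{\Gamma_t}a=da+(\Gamma_0+ta)\wedge a+a\wedge(\Gamma_0+ta)=d_{\Gamma_0}a+2t\,a\wedge a,
\end{align*}
in agreement with the derivative. Note that $a\wedge a$ need not vanish, since $a$ is matrix-valued.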
[/guided]
[/step]
[step:Differentiate the invariant polynomial evaluated on curvature]
For each $t\in[0,1]$, the polarized form of $P$ gives
\begin{align*}
P(F_{\nabla^t})=P(F_{\nabla^t},\dots,F_{\nabla^t}).
\end{align*}
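By $k$-linearity, the product rule produces $k$ terms, in each of which one entry $F_{\nabla^t}$ is replaced by $\frac{\partial}{\partial t}F_{\nabla^t}$. Since $P$ is symmetric and every entry has even degree $2$, the differentiated entry can be moved to the first slot without a sign, so the $k$ terms coincide. Combined with the curvature variation formula, this gives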
\begin{align*}
\frac{\partial}{\partial t}P(F_{\nabla^t})=kP(d_{\nabla^t}A,F_{\nabla^t},\dots,F_{\nabla^t}).
\end{align*}
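For instance, with the illustrative choice $P(X)=\operatorname{tr}(X^2)$ and $k=2$ (not assumed elsewhere), so that $P(\Omega_t,\Omega_t)=\operatorname{tr}(\Omega_t\wedge\Omega_t)$, the local version of this step reads as follows. For matrix-valued forms $\alpha,\beta$ of degrees $p,q$ one has $\operatorname{tr}(\alpha\wedge\beta)=(-1)^{pq}\operatorname{tr}(\beta\wedge\alpha)$; here both factors have degree $2$, so
\begin{align*}
\frac{\partial}{\partial t}\operatorname{tr}(\Omega_t\wedge\Omega_t)=\operatorname{tr}\Big(\frac{\partial\Omega_t}{\partial t}\wedge\Omega_t\Big)+\operatorname{tr}\Big(\Omega_t\wedge\frac{\partial\Omega_t}{\partial t}\Big)=2\operatorname{tr}(d_{\Gamma_t}a\wedge\Omega_t).
\end{align*}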
[/step]
[step:Convert the covariant derivative term into an exterior derivative]
We claim that, for each $t\in[0,1]$,
\begin{align*}
dP(A,F_{\nabla^t},\dots,F_{\nabla^t})=P(d_{\nabla^t}A,F_{\nabla^t},\dots,F_{\nabla^t}).
\end{align*}
In the local frame on $U$, the Bianchi identity says
\begin{align*}
d_{\Gamma_t}\Omega_t=0.
\end{align*}
The covariant Leibniz rule for an invariant polynomial gives
\begin{align*}
dP(a,\Omega_t,\dots,\Omega_t)=P(d_{\Gamma_t}a,\Omega_t,\dots,\Omega_t)-\sum_{j=2}^{k}P(a,\Omega_t,\dots,d_{\Gamma_t}\Omega_t,\dots,\Omega_t).
\end{align*}
The minus sign before the sum arises because $d$ passes the degree-$1$ input $a$ before reaching the $j$-th slot, while passing each degree-$2$ input $\Omega_t$ contributes no further sign. The rule holds with $d_{\Gamma_t}$, rather than $d$, on the right because the commutator terms that distinguish $d_{\Gamma_t}$ from $d$ cancel in total; this cancellation is the infinitesimal form of the $GL(r,\mathbb C)$-$\operatorname{Ad}$-invariance of $P$. Explicitly, for all $X,Y_1,\dots,Y_k\in\mathfrak{gl}(r,\mathbb C)$,
\begin{align*}
\sum_{j=1}^{k}P(Y_1,\dots,[X,Y_j],\dots,Y_k)=0.
\end{align*}
This identity is applied coefficientwise after expanding the matrix-valued forms in local coordinates, with $X$ ranging over the matrix coefficients of $\Gamma_t$ and the $Y_j$ over those of $a$ and $\Omega_t$. Finally, each summand in the sum vanishes by the Bianchi identity. Hence
\begin{align*}
dP(a,\Omega_t,\dots,\Omega_t)=P(d_{\Gamma_t}a,\Omega_t,\dots,\Omega_t).
\end{align*}
Since this identity is invariant under change of frame, it is the asserted global identity.
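With the illustrative choice $P(X)=\operatorname{tr}(X^2)$ and $k=2$, whose polarization gives $P(a,\Omega_t)=\operatorname{tr}(a\wedge\Omega_t)$, the claim can be verified directly. Here the graded cyclicity $\operatorname{tr}(\alpha\wedge\beta)=(-1)^{pq}\operatorname{tr}(\beta\wedge\alpha)$ for forms of degrees $p,q$ plays the role of infinitesimal $\operatorname{Ad}$-invariance. Writing the Bianchi identity as $d\Omega_t=\Omega_t\wedge\Gamma_t-\Gamma_t\wedge\Omega_t$,
\begin{align*}
d\operatorname{tr}(a\wedge\Omega_t)
&=\operatorname{tr}(da\wedge\Omega_t)-\operatorname{tr}(a\wedge d\Omega_t)\\
&=\operatorname{tr}(da\wedge\Omega_t)-\operatorname{tr}(a\wedge\Omega_t\wedge\Gamma_t)+\operatorname{tr}(a\wedge\Gamma_t\wedge\Omega_t)\\
&=\operatorname{tr}(da\wedge\Omega_t)+\operatorname{tr}(\Gamma_t\wedge a\wedge\Omega_t)+\operatorname{tr}(a\wedge\Gamma_t\wedge\Omega_t)\\
&=\operatorname{tr}(d_{\Gamma_t}a\wedge\Omega_t),
\end{align*}
where the third line moves the $1$-form $\Gamma_t$ past the $3$-form $a\wedge\Omega_t$, which costs the sign $(-1)^{3}=-1$.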
[guided]
The goal of this step is to replace the covariant expression produced by differentiating curvature with an ordinary exterior derivative, because the theorem claims that the difference of Chern-Weil forms is exact.
Work in the same local frame. The curvature matrix $\Omega_t$ satisfies the Bianchi identity
\begin{align*}
d_{\Gamma_t}\Omega_t=0.
\end{align*}
This identity applies because $\Omega_t$ is the curvature of the connection represented by $\Gamma_t$.
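In the conventions fixed above this can be checked directly. Since $\Omega_t$ has degree $m=2$, the local formula reads $d_{\Gamma_t}\Omega_t=d\Omega_t+\Gamma_t\wedge\Omega_t-\Omega_t\wedge\Gamma_t$. Using $d^2=0$ and $\Omega_t=d\Gamma_t+\Gamma_t\wedge\Gamma_t$,
\begin{align*}
d\Omega_t&=d\Gamma_t\wedge\Gamma_t-\Gamma_t\wedge d\Gamma_t,\\
\Gamma_t\wedge\Omega_t-\Omega_t\wedge\Gamma_t&=\Gamma_t\wedge d\Gamma_t-d\Gamma_t\wedge\Gamma_t,
\end{align*}
where the cubic terms $\Gamma_t\wedge\Gamma_t\wedge\Gamma_t$ cancel in the second line. Adding the two lines gives $d_{\Gamma_t}\Omega_t=0$.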
Now apply the covariant Leibniz rule to the invariant polarized expression
\begin{align*}
P(a,\Omega_t,\dots,\Omega_t)\in \Omega^{2k-1}(U).
\end{align*}
The first input $a$ has degree $1$, and every curvature input $\Omega_t$ has degree $2$. Hence the derivative picks up the sign $(-1)^1=-1$ when it passes the first input, and passing each curvature input contributes no further sign because $\Omega_t$ has even degree. The rule gives
\begin{align*}
dP(a,\Omega_t,\dots,\Omega_t)=P(d_{\Gamma_t}a,\Omega_t,\dots,\Omega_t)-\sum_{j=2}^{k}P(a,\Omega_t,\dots,d_{\Gamma_t}\Omega_t,\dots,\Omega_t).
\end{align*}
The sum vanishes term by term because $d_{\Gamma_t}\Omega_t=0$. Thus only the derivative of the first input remains.
Why is the left side the ordinary exterior derivative $d$ rather than a covariant derivative of a matrix-valued form? The expression $P(a,\Omega_t,\dots,\Omega_t)$ is scalar-valued after applying the invariant polynomial. The difference between expanding with $d$ and expanding with $d_{\Gamma_t}$ consists exactly of commutator terms involving the connection matrix $\Gamma_t$.
We now spell out the cancellation. Since $P$ is invariant under the adjoint action of $GL(r,\mathbb C)$, its symmetric polarization satisfies the infinitesimal identity
\begin{align*}
\sum_{j=1}^{k}P(Y_1,\dots,[X,Y_j],\dots,Y_k)=0
\end{align*}
for every $X,Y_1,\dots,Y_k\in\mathfrak{gl}(r,\mathbb C)$. This follows by differentiating at $\lambda=0$ the identity $P(\operatorname{Ad}_{\exp(\lambda X)}Y_1,\dots,\operatorname{Ad}_{\exp(\lambda X)}Y_k)=P(Y_1,\dots,Y_k)$. Applying this formula coefficientwise to the matrix components of $\Gamma_t$, $a$, and $\Omega_t$ cancels precisely the commutator contributions. Hence
\begin{align*}
dP(a,\Omega_t,\dots,\Omega_t)=P(d_{\Gamma_t}a,\Omega_t,\dots,\Omega_t).
\end{align*}
Because $P$ is $\operatorname{Ad}$-invariant, the same identity holds after changing frames, so the local calculation represents the global formula
\begin{align*}
dP(A,F_{\nabla^t},\dots,F_{\nabla^t})=P(d_{\nabla^t}A,F_{\nabla^t},\dots,F_{\nabla^t}).
\end{align*}
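The infinitesimal identity is easy to confirm in the simplest nontrivial case. For the illustrative choice $P(Y_1,Y_2)=\operatorname{tr}(Y_1Y_2)$, the polarization of $P(X)=\operatorname{tr}(X^2)$, cyclicity of the trace gives
\begin{align*}
P([X,Y_1],Y_2)+P(Y_1,[X,Y_2])&=\operatorname{tr}(XY_1Y_2-Y_1XY_2)+\operatorname{tr}(Y_1XY_2-Y_1Y_2X)\\
&=\operatorname{tr}(XY_1Y_2)-\operatorname{tr}(Y_1Y_2X)=0.
\end{align*}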
[/guided]
[/step]
[step:Integrate the derivative identity over the path of connections]
Combining the previous two steps gives, for every $t\in[0,1]$,
\begin{align*}
\frac{\partial}{\partial t}P(F_{\nabla^t})=k\,dP(A,F_{\nabla^t},\dots,F_{\nabla^t}).
\end{align*}
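By the fundamental theorem of calculus, applied coefficientwise (each coefficient of $P(F_{\nabla^t})$ is a polynomial in $t$, because $F_{\nabla^t}$ depends quadratically on $t$), integration over $[0,1]$ with respect to one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) gives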
\begin{align*}
P(F_{\nabla^1})-P(F_{\nabla^0})=\int_{[0,1]}\frac{\partial}{\partial t}P(F_{\nabla^t})\,d\mathcal L^1(t).
\end{align*}
Substituting the derivative identity yields
\begin{align*}
P(F_{\nabla^1})-P(F_{\nabla^0})=k\int_{[0,1]}dP(A,F_{\nabla^t},\dots,F_{\nabla^t})\,d\mathcal L^1(t).
\end{align*}
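The exterior derivative commutes with integration over the compact parameter interval: in local coordinates the coefficients of $P(A,F_{\nabla^t},\dots,F_{\nabla^t})$ are smooth functions of $(t,x)$, polynomial in $t$, so one may differentiate under the integral sign. Therefore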
\begin{align*}
P(F_{\nabla^1})-P(F_{\nabla^0})=d\left(k\int_{[0,1]}P(A,F_{\nabla^t},\dots,F_{\nabla^t})\,d\mathcal L^1(t)\right).
\end{align*}
This is the claimed Chern-Weil homotopy formula.
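To see the formula in a concrete case, suppose, purely for illustration, that $E$ is trivial over $M$, that $\nabla^0$ is the trivial connection, so that $\Gamma_0=0$ and $\Omega_0=0$ in a global frame, and that $P(X)=\operatorname{tr}(X^2)$ with $k=2$, so that $P(a,\Omega_t)=\operatorname{tr}(a\wedge\Omega_t)$. Then $\Gamma_t=ta$ and $\Omega_t=t\,da+t^2\,a\wedge a$, and the primitive evaluates to
\begin{align*}
2\int_{[0,1]}\operatorname{tr}(a\wedge\Omega_t)\,d\mathcal L^1(t)=2\int_0^1\big(t\,\operatorname{tr}(a\wedge da)+t^2\operatorname{tr}(a\wedge a\wedge a)\big)\,dt=\operatorname{tr}\Big(a\wedge da+\tfrac{2}{3}\,a\wedge a\wedge a\Big),
\end{align*}
which is the Chern-Simons form (up to normalization conventions). Since $P(F_{\nabla^0})=0$ here, the formula reduces to $\operatorname{tr}(\Omega_1\wedge\Omega_1)=d\operatorname{tr}\big(a\wedge da+\tfrac{2}{3}\,a\wedge a\wedge a\big)$.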
[/step]