[proofplan]
We prove the identity by testing both tangent vectors on an arbitrary smooth germ at the identity. The curve $t\mapsto \operatorname{Ad}_{\exp(tX)}Y$ is computed by differentiating the conjugation curve $s\mapsto \exp(tX)\exp(sY)\exp(-tX)$ at $s=0$. The Lie bracket with the left-invariant convention is computed from the flows of $X^L$ and $Y^L$, and the two resulting mixed derivatives agree with the positive sign.
[/proofplan]
[step:Represent tangent vectors as derivations on smooth functions]
Let $X,Y\in\mathfrak g$. Let $\mathfrak X(G)=\Gamma(TG)$ denote the real [vector space](/page/Vector%20Space) of smooth vector fields on $G$. For each $g\in G$, let $L_g:G\to G$ denote the left translation map $L_g(h)=gh$. Let $X^L,Y^L\in\mathfrak X(G)$ denote the left-invariant vector fields determined by $X$ and $Y$, that is, $X^L_g=d(L_g)_e(X)$ and $Y^L_g=d(L_g)_e(Y)$ for every $g\in G$; in particular $X^L_e=X$ and $Y^L_e=Y$.
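Let $f:G\to\mathbb R$ be a smooth function. Every smooth germ at $e$ is represented by such a globally defined function, after multiplying a local representative by a bump function supported near $e$, so no generality is lost. We regard a tangent vector $v\in T_eG$ as the derivation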
\begin{align*}
v(f)=df_e(v).
\end{align*}
It is enough to prove that, for every such $f$,
\begin{align*}
\operatorname{ad}_X(Y)(f)=[X^L,Y^L]_e(f),
\end{align*}
because tangent vectors at $e$ are determined by their values on smooth germs at $e$.
[/step]
[step:Compute the infinitesimal adjoint action as a mixed derivative]
Define the one-parameter curve
\begin{align*}
a:\mathbb R\to G
\end{align*}
by $a(t)=\exp(tX)$. For each $t\in\mathbb R$, the curve
\begin{align*}
\gamma_t:\mathbb R\to G
\end{align*}
defined by
\begin{align*}
\gamma_t(s)=C_{a(t)}(\exp(sY))=a(t)\exp(sY)a(t)^{-1}
\end{align*}
satisfies $\gamma_t(0)=e$. Therefore, by the definition of pushforward,
\begin{align*}
\operatorname{Ad}_{a(t)}Y
=
d(C_{a(t)})_e(Y)
=
\left.\frac{d}{ds}\right|_{s=0}a(t)\exp(sY)a(t)^{-1}.
\end{align*}
Applying this tangent vector to $f$ gives
\begin{align*}
(\operatorname{Ad}_{a(t)}Y)(f)
=
\left.\frac{d}{ds}\right|_{s=0}
f(a(t)\exp(sY)a(t)^{-1}).
\end{align*}
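Recall that $\operatorname{ad}_X(Y)=\left.\frac{d}{dt}\right|_{t=0}\operatorname{Ad}_{a(t)}Y$. Evaluation $v\mapsto v(f)=df_e(v)$ is a linear map on the finite-dimensional space $T_eG$, so it commutes with differentiation in $t$. Since the map $(t,s)\mapsto f(a(t)\exp(sY)a(t)^{-1})$ is smooth near $(0,0)$, differentiating with respect to $t$ at $0$ gives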
\begin{align*}
\operatorname{ad}_X(Y)(f)
=
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(a(t)\exp(sY)a(t)^{-1}).
\end{align*}
[guided]
The adjoint action is defined as the differential of conjugation at the identity, so to understand $\operatorname{Ad}_{a(t)}Y$ we must choose a curve through $e$ with initial velocity $Y$. The exponential curve
\begin{align*}
s\mapsto \exp(sY)
\end{align*}
has initial velocity $Y$ at $s=0$. Conjugating this curve by $a(t)=\exp(tX)$ gives the curve
\begin{align*}
\gamma_t:\mathbb R\to G
\end{align*}
defined by
\begin{align*}
\gamma_t(s)=a(t)\exp(sY)a(t)^{-1}.
\end{align*}
Because $\gamma_t(0)=a(t)ea(t)^{-1}=e$, its initial velocity lies in $T_eG$. By the definition of the differential,
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\gamma_t(s)
=
d(C_{a(t)})_e(Y)
=
\operatorname{Ad}_{a(t)}Y.
\end{align*}
Testing this tangent vector against the smooth function $f$ gives
\begin{align*}
(\operatorname{Ad}_{a(t)}Y)(f)
=
\left.\frac{d}{ds}\right|_{s=0}
f(a(t)\exp(sY)a(t)^{-1}).
\end{align*}
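Now the infinitesimal adjoint action $\operatorname{ad}_X(Y)$ is the derivative at $t=0$ of the curve $t\mapsto\operatorname{Ad}_{a(t)}Y$ in $T_eG$, and evaluation on $f$ is a linear map $T_eG\to\mathbb R$, so it commutes with this derivative. Since multiplication, inversion, the exponential map, and $f$ are smooth, the two-variable function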
\begin{align*}
(t,s)\mapsto f(a(t)\exp(sY)a(t)^{-1})
\end{align*}
is smooth near $(0,0)$. Hence the derivative defining $\operatorname{ad}_X(Y)$ is the mixed derivative
\begin{align*}
\operatorname{ad}_X(Y)(f)
=
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(a(t)\exp(sY)a(t)^{-1}).
\end{align*}
This formula is the place where the sign is fixed: the conjugating element $a(t)$ appears on the left and its inverse appears on the right.
[/guided]
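For orientation, the formula can be checked by hand in the matrix case. The following sketch assumes, beyond what the proof requires, that $G$ is a closed subgroup of $\operatorname{GL}_n(\mathbb R)$, so that $\mathfrak g\subseteq M_n(\mathbb R)$, $\exp$ is the matrix exponential, and conjugation is matrix conjugation. Under these assumptions,
\begin{align*}
\operatorname{Ad}_{a(t)}Y
&=
\left.\frac{d}{ds}\right|_{s=0}e^{tX}e^{sY}e^{-tX}
=
e^{tX}Ye^{-tX},\\
\operatorname{ad}_X(Y)
&=
\left.\frac{d}{dt}\right|_{t=0}e^{tX}Ye^{-tX}
=
XY-YX.
\end{align*}
The term $XY$ comes from the factor $a(t)$ on the left and the term $-YX$ from the factor $a(t)^{-1}$ on the right, which matches the placement of $a(t)$ and $a(t)^{-1}$ in the mixed derivative.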
[/step]
[step:Compute the left-invariant bracket by differentiating flows]
For a left-invariant vector field $Z^L$ with $Z\in\mathfrak g$, its flow is the smooth map
\begin{align*}
\Phi^Z:\mathbb R\times G\to G
\end{align*}
given by
\begin{align*}
\Phi^Z_r(g)=g\exp(rZ).
\end{align*}
Indeed, $\Phi^Z_0(g)=g$, and the velocity of $r\mapsto g\exp(rZ)$ at time $r$ is $d(L_{g\exp(rZ)})_e(Z)=Z^L_{g\exp(rZ)}$.
Using the derivation definition of the vector-field bracket, we have
\begin{align*}
[X^L,Y^L]_e(f)=X^L_e(Y^L f)-Y^L_e(X^L f).
\end{align*}
The first term is
\begin{align*}
X^L_e(Y^L f)
=
\left.\frac{d}{dt}\right|_{t=0}
(Y^L f)(\exp(tX)).
\end{align*}
Since the integral curve of $Y^L$ through $\exp(tX)$ is $s\mapsto \Phi^Y_s(\exp(tX))=\exp(tX)\exp(sY)$,
\begin{align*}
(Y^L f)(\exp(tX))
=
\left.\frac{d}{ds}\right|_{s=0}
f(\exp(tX)\exp(sY)).
\end{align*}
Thus
\begin{align*}
X^L_e(Y^L f)
=
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(\exp(tX)\exp(sY)).
\end{align*}
Similarly,
\begin{align*}
Y^L_e(X^L f)
=
\left.\frac{d}{ds}\right|_{s=0}
\left.\frac{d}{dt}\right|_{t=0}
f(\exp(sY)\exp(tX)).
\end{align*}
Therefore
\begin{align*}
[X^L,Y^L]_e(f)
=
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(\exp(tX)\exp(sY))
-
\left.\frac{d}{ds}\right|_{s=0}
\left.\frac{d}{dt}\right|_{t=0}
f(\exp(sY)\exp(tX)).
\end{align*}
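As an illustration only, the two terms can be evaluated explicitly in the matrix case. The sketch assumes that $G$ is a closed subgroup of $\operatorname{GL}_n(\mathbb R)$, so that $Z^L_g=gZ$ and $\exp$ is the matrix exponential, and it takes the hypothetical test function $f(g)=\operatorname{tr}(Ag)$ for a fixed matrix $A\in M_n(\mathbb R)$; neither assumption is part of the proof. Then
\begin{align*}
(Y^Lf)(g)
&=
\left.\frac{d}{ds}\right|_{s=0}\operatorname{tr}(Age^{sY})
=
\operatorname{tr}(AgY),\\
X^L_e(Y^Lf)
&=
\left.\frac{d}{dt}\right|_{t=0}\operatorname{tr}(Ae^{tX}Y)
=
\operatorname{tr}(AXY),\\
Y^L_e(X^Lf)
&=
\left.\frac{d}{ds}\right|_{s=0}\operatorname{tr}(Ae^{sY}X)
=
\operatorname{tr}(AYX),\\
[X^L,Y^L]_e(f)
&=
\operatorname{tr}\bigl(A(XY-YX)\bigr)
=
df_e(XY-YX).
\end{align*}
Since the functionals $\operatorname{tr}(A\,\cdot\,)$ separate the points of $M_n(\mathbb R)$, in this setting the bracket $[X^L,Y^L]_e$ is the matrix commutator $XY-YX$.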
[/step]
[step:Match the two mixed derivatives and obtain the identity]
Define
\begin{align*}
F:\mathbb R^2\to\mathbb R
\end{align*}
by
\begin{align*}
F(t,s)=f(\exp(tX)\exp(sY)\exp(-tX)).
\end{align*}
For each fixed $t$, the $s$-derivative at $s=0$ is exactly
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}F(t,s)
=
(\operatorname{Ad}_{\exp(tX)}Y)(f),
\end{align*}
so
\begin{align*}
\operatorname{ad}_X(Y)(f)
=
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}F(t,s).
\end{align*}
To compare this with the bracket, fix $s$ near $0$ and apply the product rule to the smooth map
\begin{align*}
t\mapsto \exp(tX)\exp(sY)\exp(-tX)
\end{align*}
at $t=0$. More explicitly, define $p_s:G\times G\to G$ by $p_s(g,h)=g\exp(sY)h$, which is smooth because multiplication is smooth. Then the curve above is $t\mapsto p_s(\exp(tX),\exp(-tX))$, and $p_s(e,e)=\exp(sY)$. Since $\frac{d}{dt}|_{t=0}\exp(tX)=X\in T_eG$ and $\frac{d}{dt}|_{t=0}\exp(-tX)=-X\in T_eG$, the chain rule gives its velocity at $t=0$ as
\begin{align*}
d(p_s)_{(e,e)}(X,-X).
\end{align*}
By linearity of the differential on the product tangent space $T_{(e,e)}(G\times G)\cong T_eG\oplus T_eG$, this is the sum of the contributions from $(X,0)$ and from $(0,-X)$. The vector $(X,0)$ is the velocity of $t\mapsto(\exp(tX),e)$, whose image under $p_s$ is $t\mapsto\exp(tX)\exp(sY)$; the vector $(0,X)$ is the velocity of $t\mapsto(e,\exp(tX))$, whose image under $p_s$ is $t\mapsto\exp(sY)\exp(tX)$. Applying $df_{\exp(sY)}$ therefore gives
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}F(t,s)
=
\left.\frac{d}{dt}\right|_{t=0}f(\exp(tX)\exp(sY))
-
\left.\frac{d}{dt}\right|_{t=0}f(\exp(sY)\exp(tX)).
\end{align*}
The functions of $(t,s)$ appearing here are smooth near $(0,0)$, because multiplication, inversion, the exponential map, and $f$ are smooth. Hence equality of mixed partial derivatives gives
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\left.\frac{d}{dt}\right|_{t=0}f(\exp(tX)\exp(sY))
=
\left.\frac{d}{dt}\right|_{t=0}\left.\frac{d}{ds}\right|_{s=0}f(\exp(tX)\exp(sY)).
\end{align*}
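Now differentiate the displayed identity for $\partial_tF(0,s)$ with respect to $s$ at $s=0$. Since $F$ is smooth near $(0,0)$, the two derivatives on the left-hand side may be interchanged. On the right-hand side, the equality of mixed partial derivatives just displayed turns the first term into the first term of the bracket formula from the previous step, and the second term is already in the required order. This gives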
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(\exp(tX)\exp(sY)\exp(-tX))
=
[X^L,Y^L]_e(f).
\end{align*}
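The order of differentiation used here, first in $t$ and then in $s$, can be followed explicitly in the matrix case. As a sketch under the extra assumption that $G$ is a closed subgroup of $\operatorname{GL}_n(\mathbb R)$ with $\exp$ the matrix exponential, the product rule for the three-factor curve reads
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}e^{tX}e^{sY}e^{-tX}
&=
Xe^{sY}-e^{sY}X,\\
\left.\frac{d}{ds}\right|_{s=0}\bigl(Xe^{sY}-e^{sY}X\bigr)
&=
XY-YX.
\end{align*}
The two terms $Xe^{sY}$ and $-e^{sY}X$ are the matrix counterparts of the contributions from $(X,0)$ and $(0,-X)$.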
[guided]
We now compare the conjugation calculation with the bracket calculation. Define $F:\mathbb R^2\to\mathbb R$ by
\begin{align*}
F(t,s)=f(\exp(tX)\exp(sY)\exp(-tX)).
\end{align*}
For fixed $s$, the curve in the argument of $f$ has three factors. The middle factor $\exp(sY)$ is constant in $t$, so we isolate the two moving outer factors. Define $p_s:G\times G\to G$ by $p_s(g,h)=g\exp(sY)h$. Then
\begin{align*}
F(t,s)=f(p_s(\exp(tX),\exp(-tX))).
\end{align*}
The velocities of the two factor curves at $t=0$ are $X\in T_eG$ and $-X\in T_eG$. Hence the chain rule gives the velocity of $t\mapsto p_s(\exp(tX),\exp(-tX))$ at $t=0$ as
\begin{align*}
d(p_s)_{(e,e)}(X,-X).
\end{align*}
Using the vector-space identification $T_{(e,e)}(G\times G)\cong T_eG\oplus T_eG$ and the linearity of $d(p_s)_{(e,e)}$, this tangent vector is the sum of the contribution from $(X,0)$ and the contribution from $(0,-X)$. After applying $df_{\exp(sY)}$, the first contribution is represented by
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}f(\exp(tX)\exp(sY)),
\end{align*}
and the second contribution is
\begin{align*}
-\left.\frac{d}{dt}\right|_{t=0}f(\exp(sY)\exp(tX)).
\end{align*}
Therefore, for every $s$ near $0$,
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}F(t,s)
=
\left.\frac{d}{dt}\right|_{t=0}f(\exp(tX)\exp(sY))
-
\left.\frac{d}{dt}\right|_{t=0}f(\exp(sY)\exp(tX)).
\end{align*}
Now differentiate this identity with respect to $s$ at $s=0$. Because multiplication, inversion, the exponential map, and $f$ are smooth, all two-variable functions involved are smooth near $(0,0)$. Thus the mixed partial derivatives commute. In particular,
\begin{align*}
\left.\frac{d}{ds}\right|_{s=0}\left.\frac{d}{dt}\right|_{t=0}f(\exp(tX)\exp(sY))
=
\left.\frac{d}{dt}\right|_{t=0}\left.\frac{d}{ds}\right|_{s=0}f(\exp(tX)\exp(sY)).
\end{align*}
The bracket computation from the previous step states that
\begin{align*}
[X^L,Y^L]_e(f)
=
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(\exp(tX)\exp(sY))
-
\left.\frac{d}{ds}\right|_{s=0}
\left.\frac{d}{dt}\right|_{t=0}
f(\exp(sY)\exp(tX)).
\end{align*}
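The equality of mixed partial derivatives just established converts the derivative in $s$ of the right-hand side of the identity for $\partial_tF(0,s)$ into this displayed bracket expression. On the left-hand side, $F$ is itself smooth near $(0,0)$, so its derivatives in $s$ and $t$ may also be interchanged. Consequently,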
\begin{align*}
\left.\frac{d}{dt}\right|_{t=0}
\left.\frac{d}{ds}\right|_{s=0}
f(\exp(tX)\exp(sY)\exp(-tX))
=
[X^L,Y^L]_e(f).
\end{align*}
The left-hand side is the value of $\operatorname{ad}_X(Y)$ on $f$, so the two tangent vectors have the same action on the arbitrary smooth germ $f$ at $e$.
[/guided]
Combining this with the computation of the infinitesimal adjoint action yields
\begin{align*}
\operatorname{ad}_X(Y)(f)=[X^L,Y^L]_e(f).
\end{align*}
Since this holds for every smooth germ $f$ at $e$, the two tangent vectors are equal, and by the left-invariant convention $[X,Y]$ is defined as $[X^L,Y^L]_e$:
\begin{align*}
\operatorname{ad}_X(Y)=[X^L,Y^L]_e=[X,Y].
\end{align*}
This proves the theorem.
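As a concrete sanity check of the sign, consider the following example, which is not part of the proof and assumes $G=\operatorname{SL}_2(\mathbb R)$ with the specific choice of $X$ and $Y$ below. Because $X^2=0$, the exponential series terminates:
\begin{align*}
X=\begin{pmatrix}0&1\\0&0\end{pmatrix},\qquad
Y=\begin{pmatrix}0&0\\1&0\end{pmatrix},\qquad
\exp(tX)=\begin{pmatrix}1&t\\0&1\end{pmatrix}.
\end{align*}
Conjugating and differentiating gives
\begin{align*}
\operatorname{Ad}_{\exp(tX)}Y
&=
\begin{pmatrix}1&t\\0&1\end{pmatrix}
\begin{pmatrix}0&0\\1&0\end{pmatrix}
\begin{pmatrix}1&-t\\0&1\end{pmatrix}
=
\begin{pmatrix}t&-t^2\\1&-t\end{pmatrix},\\
\operatorname{ad}_X(Y)
&=
\left.\frac{d}{dt}\right|_{t=0}
\begin{pmatrix}t&-t^2\\1&-t\end{pmatrix}
=
\begin{pmatrix}1&0\\0&-1\end{pmatrix}
=
XY-YX.
\end{align*}
For matrix groups the left-invariant bracket is the commutator $XY-YX$, so the result agrees with the theorem, including the positive sign.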
[/step]