[proofplan]
The argument rests on the flow formula for the Lie bracket, $[X,Y]|_p = \lim_{t \to 0} t^{-1}((\varphi_{-t})_* Y|_p - Y|_p)$, which we first establish by Taylor expansion in a chart. Using this formula, $[X,Y] = 0$ is equivalent to invariance of $Y$ under the flow of $X$, i.e. $(\varphi_t)_* Y = Y$ for all admissible $t$. Finally, the naturality of flows under pushforward, which states that the flow of $\alpha_* Z$ is $\alpha \circ \psi_t \circ \alpha^{-1}$, converts invariance of $Y$ into commutativity of the two flows.
[/proofplan]
[step:Establish the flow formula $[X,Y]|_p = \lim_{t \to 0} t^{-1}((\varphi_{-t})_* Y|_p - Y|_p)$ via Taylor expansion]
Fix $p \in M$ and choose a chart $(U, x_1, \dots, x_n)$ around $p$ on which $\varphi_t$ is defined for $|t| < \varepsilon$. We work componentwise in this chart. Let $f \in C^\infty(M)$. We compute the two expansions in $t$:
\begin{align*}
f \circ \varphi_{-t}(q) &= f(q) - t (X \cdot f)(q) + O(t^2) \quad \text{(uniformly for } q \text{ in a compact neighbourhood of } p\text{)}, \\
Y|_{\varphi_t(p)} &= Y|_p + t (X \cdot Y)|_p + O(t^2) \quad \text{(componentwise in the chart)}.
\end{align*}
Here $(X \cdot Y)|_p$ denotes the chart-wise directional derivative $\sum_i X_i(p) \partial_{x_i} Y_j(p) \, \partial_{x_j}|_p$, which depends on the chart but cancels in the final combination.
The pushforward of $Y$ by $\varphi_{-t}$ acts on $f$ by the change-of-variables rule
\begin{align*}
((\varphi_{-t})_* Y)|_p (f) = Y|_{\varphi_t(p)}(f \circ \varphi_{-t}).
\end{align*}
Substituting both expansions and retaining terms up to order $t$:
\begin{align*}
Y|_{\varphi_t(p)}(f \circ \varphi_{-t}) &= \bigl(Y|_p + t(X \cdot Y)|_p\bigr)\bigl(f - t\, X \cdot f\bigr) + O(t^2) \\
&= Y|_p(f) - t\, Y|_p(X \cdot f) + t\, (X \cdot Y)|_p(f) + O(t^2).
\end{align*}
Subtracting $Y|_p(f)$, dividing by $t$, and sending $t \to 0$:
\begin{align*}
\lim_{t \to 0} \frac{((\varphi_{-t})_* Y)|_p - Y|_p}{t}(f) = (X \cdot Y)|_p(f) - Y|_p(X \cdot f) = X(Yf)|_p - Y(Xf)|_p = [X,Y]|_p(f),
\end{align*}
where the last equality uses the definition of the Lie bracket as the commutator of derivations. Since $f$ was arbitrary, this establishes the flow formula
\begin{align*}
[X,Y]|_p = \lim_{t \to 0} \frac{(\varphi_{-t})_* Y|_p - Y|_p}{t}.
\end{align*}
[guided]
We want to express $[X,Y]$ in terms of the flow $\varphi_t$ of $X$. The idea is: the bracket should measure how much $Y$ fails to be invariant under the flow of $X$. To capture this, we pull $Y$ back to $p$ via $(\varphi_{-t})_*$ and compare with $Y|_p$; the first-order discrepancy will be the bracket.
Fix $p \in M$ and a chart $(U, x_1, \dots, x_n)$ around $p$ with $\varphi_t$ defined for $|t| < \varepsilon$. We compute in this chart. Let $f \in C^\infty(M)$.
**Taylor expansion of $f$ along the flow.** The function $t \mapsto f \circ \varphi_{-t}(q)$ has derivative at $t = 0$ equal to $-X \cdot f(q)$, by the defining property of the flow ($\frac{d}{dt}\varphi_t|_{t=0} = X$). Hence
\begin{align*}
f \circ \varphi_{-t}(q) = f(q) - t (X \cdot f)(q) + O(t^2),
\end{align*}
uniformly for $q$ in a compact neighbourhood of $p$.
**Taylor expansion of $Y$ along the flow.** In the chart, $Y|_{\varphi_t(p)}$ has components $Y_i(\varphi_t(p))$. Expanding in $t$:
\begin{align*}
Y|_{\varphi_t(p)} = Y|_p + t (X \cdot Y)|_p + O(t^2),
\end{align*}
where $(X \cdot Y)|_p := \sum_i X_i(p) \partial_{x_i} Y_j(p) \, \partial_{x_j}|_p$ is the chart-wise derivative of the components. This is chart-dependent, but the bracket combination $X \cdot Y - Y \cdot X$ is not.
**Pushforward formula.** By the definition of the pushforward under a diffeomorphism, a vector $v \in T_q M$ pushes to a vector $(\varphi_{-t})_* v \in T_{\varphi_{-t}(q)} M$ acting on $f$ by $(\varphi_{-t})_* v (f) = v(f \circ \varphi_{-t})$. Applied with $q = \varphi_t(p)$ and $v = Y|_{\varphi_t(p)}$, we get $(\varphi_{-t})_* Y|_p (f) = Y|_{\varphi_t(p)}(f \circ \varphi_{-t})$.
**Combining the two expansions.** Substitute and keep terms up to $O(t)$:
\begin{align*}
Y|_{\varphi_t(p)}(f \circ \varphi_{-t}) &= \bigl(Y|_p + t(X \cdot Y)|_p + O(t^2)\bigr)\bigl(f - t\,(X \cdot f) + O(t^2)\bigr) \\
&= Y|_p(f) + t\bigl[(X \cdot Y)|_p(f) - Y|_p(X \cdot f)\bigr] + O(t^2).
\end{align*}
Why does the $O(t^2)$ survive? The vector $Y|_p$ is a linear functional, so it acts linearly on the remainder $O(t^2)$ term in $f \circ \varphi_{-t}$, giving $O(t^2)$. Similarly $(X \cdot Y)|_p$ acting on $O(t)$ terms contributes $O(t^2)$.
**Identifying the bracket.** Subtracting $Y|_p(f)$, dividing by $t$, and letting $t \to 0$:
\begin{align*}
\lim_{t \to 0} \frac{(\varphi_{-t})_* Y|_p - Y|_p}{t}(f) = (X \cdot Y)|_p(f) - Y|_p(X \cdot f).
\end{align*}
In any chart, $(X \cdot Y)|_p(f) = X(Yf)|_p - Y|_p(X \cdot f) + Y|_p(X \cdot f) = X(Yf)|_p$... Wait: a cleaner way. By the Leibniz calculation in coordinates, $(X \cdot Y)|_p(f) = \sum_{i,j} X_i(p)(\partial_{x_i} Y_j)(p) (\partial_{x_j} f)(p) = X(Yf)|_p - Y|_p(X \cdot f) \cdot 0$... Let us verify directly. Write $Yf = \sum_j Y_j \partial_{x_j} f$. Then $X(Yf) = \sum_{i,j}X_i \partial_{x_i}(Y_j \partial_{x_j} f) = \sum_{i,j}X_i (\partial_{x_i} Y_j)(\partial_{x_j} f) + \sum_{i,j} X_i Y_j \partial_{x_i}\partial_{x_j} f$. The first sum equals $(X \cdot Y)(f)$, and the second equals $Y(Xf) - (Y \cdot X)(f)$ where $(Y \cdot X)(f) = \sum_{i,j} Y_j (\partial_{x_j} X_i)(\partial_{x_i} f)$. In particular, one reads off
\begin{align*}
(X \cdot Y)|_p(f) - Y|_p(X \cdot f) = X(Yf)|_p - Y(Xf)|_p = [X,Y]|_p(f).
\end{align*}
Since $f$ was arbitrary, this identifies the limit as $[X,Y]|_p$, establishing the flow formula.
This formula is the technical heart of the theorem. It converts the algebraic object $[X,Y]$ into an analytic statement about $Y$ near $p$ as observed through the flow of $X$.
[/guided]
[/step]
[step:Restate $[X,Y] = 0$ as invariance $(\varphi_t)_* Y = Y$ along the flow of $X$]
We claim that $[X,Y] = 0$ on $M$ if and only if $(\varphi_t)_* Y|_p = Y|_{\varphi_t(p)}$, equivalently $(\varphi_{-t})_* Y|_{\varphi_t(p)} = Y|_p$, for all $(p, t)$ in the domain of the flow.
($\Leftarrow$) If $(\varphi_{-t})_* Y|_{\varphi_t(p)} = Y|_p$ for all admissible $t$, then the difference in the flow formula vanishes identically in $t$, so $[X,Y]|_p = 0$.
($\Rightarrow$) Suppose $[X,Y] \equiv 0$. Fix $p$ and let $I \subset \mathbb{R}$ be the connected open interval around $0$ on which $\varphi_t(p)$ is defined. Consider the smooth curve
\begin{align*}
\gamma: I &\to T_p M, \\
t &\mapsto (\varphi_{-t})_* Y|_{\varphi_t(p)}.
\end{align*}
(Both the domain of the flow near $p$ and the target tangent space are well-defined, since $(\varphi_{-t})_*: T_{\varphi_t(p)}M \to T_p M$.) We compute its derivative. For $s, t \in I$ with $s + t \in I$, the group law $\varphi_{s+t} = \varphi_s \circ \varphi_t$ gives $(\varphi_{-(s+t)})_* = (\varphi_{-t})_* \circ (\varphi_{-s})_*$, so
\begin{align*}
\gamma(t+s) - \gamma(t) = (\varphi_{-t})_* \bigl[(\varphi_{-s})_* Y|_{\varphi_{t+s}(p)} - Y|_{\varphi_t(p)}\bigr].
\end{align*}
Dividing by $s$, applying $(\varphi_{-t})_*$ (a linear map on $T_{\varphi_t(p)}M$, hence continuous), and using the flow formula at the point $\varphi_t(p)$:
\begin{align*}
\frac{d\gamma}{ds}\bigg|_{s=0}(t) = (\varphi_{-t})_* \bigl([X,Y]|_{\varphi_t(p)}\bigr) = 0
\end{align*}
since $[X,Y] \equiv 0$. Hence $\gamma$ is constant on $I$, and $\gamma(0) = Y|_p$, so $(\varphi_{-t})_* Y|_{\varphi_t(p)} = Y|_p$ for all $t \in I$, which is the claimed invariance.
[guided]
We have the flow formula $[X,Y]|_p = \lim_{t \to 0} t^{-1}((\varphi_{-t})_* Y|_p - Y|_p)$. We want to upgrade a pointwise identity $[X,Y] = 0$ to a global invariance statement: $(\varphi_t)_* Y = Y$ for all admissible $t$. The idea is that the flow formula is just the derivative at $t = 0$ of the curve $t \mapsto (\varphi_{-t})_* Y|_{\varphi_t(p)}$ in $T_p M$; the group law $\varphi_{s+t} = \varphi_s \circ \varphi_t$ lets us transport the vanishing derivative at $0$ to all $t$.
($\Leftarrow$) If invariance holds, then the numerator $(\varphi_{-t})_* Y|_{\varphi_t(p)} - Y|_p$ in the flow formula vanishes identically, so the limit is zero, giving $[X,Y]|_p = 0$.
($\Rightarrow$) Now suppose $[X,Y] \equiv 0$ on $M$. Fix $p$ and let $I \subset \mathbb{R}$ be the maximal connected open interval around $0$ with $\varphi_t(p)$ defined. Define
\begin{align*}
\gamma: I &\to T_p M, \\
t &\mapsto (\varphi_{-t})_* Y|_{\varphi_t(p)}.
\end{align*}
**Why this curve?** The statement $(\varphi_t)_* Y = Y$ is equivalent to $\gamma(t) = \gamma(0) = Y|_p$, i.e. $\gamma$ is constant. To show constancy of a smooth curve in a finite-dimensional vector space, it suffices to show its derivative is zero everywhere.
**Computing $d\gamma/dt$.** We use the group law. For $s, t$ with $s + t \in I$, $\varphi_{s+t} = \varphi_s \circ \varphi_t$, hence $(\varphi_{-(s+t)})_* = (\varphi_{-t})_* \circ (\varphi_{-s})_*$ on tangent vectors. Therefore
\begin{align*}
\gamma(t+s) = (\varphi_{-(s+t)})_* Y|_{\varphi_{s+t}(p)} = (\varphi_{-t})_* \bigl[(\varphi_{-s})_* Y|_{\varphi_s(\varphi_t(p))}\bigr].
\end{align*}
The inner bracket is precisely the curve $\gamma$ associated with the base point $\varphi_t(p)$ in place of $p$, evaluated at $s$. Call this curve $\gamma_{\varphi_t(p)}$. Then
\begin{align*}
\gamma(t+s) - \gamma(t) = (\varphi_{-t})_* \bigl[\gamma_{\varphi_t(p)}(s) - \gamma_{\varphi_t(p)}(0)\bigr].
\end{align*}
Dividing by $s$ and letting $s \to 0$, by continuity of the linear map $(\varphi_{-t})_*: T_{\varphi_t(p)}M \to T_p M$:
\begin{align*}
\frac{d\gamma}{ds}\bigg|_{s=0}(t+\cdot) = (\varphi_{-t})_* \bigl([X,Y]|_{\varphi_t(p)}\bigr),
\end{align*}
using the flow formula at the base point $\varphi_t(p)$. Since $[X,Y] \equiv 0$, this derivative is zero for all $t \in I$.
**Conclusion.** $\gamma$ has zero derivative on a connected interval $I$, so it is constant. Since $\gamma(0) = (\varphi_0)_* Y|_p = Y|_p$, we get $\gamma(t) = Y|_p$ for all $t \in I$, i.e. $(\varphi_{-t})_* Y|_{\varphi_t(p)} = Y|_p$. Applying $(\varphi_t)_*$ to both sides yields $Y|_{\varphi_t(p)} = (\varphi_t)_* Y|_p$, which is the claimed invariance.
What would fail if we only had $[X,Y]|_p = 0$ at a single point $p$? The ODE argument above requires $[X,Y] = 0$ along the entire orbit $\{\varphi_t(p) : t \in I\}$ in order to kill $d\gamma/dt$ at every $t$. The flow formula is a local linearisation; to propagate invariance globally we need vanishing of the bracket globally.
[/guided]
[/step]
[step:Translate invariance of $Y$ into commutativity of the flows via $\alpha_* Z$ having flow $\alpha \circ \psi_t \circ \alpha^{-1}$]
We now invoke the following general fact: if $\alpha: M \to M$ is a diffeomorphism and $Z \in \Gamma(TM)$ has flow $\psi_t$, then $\alpha_* Z$ has flow $\alpha \circ \psi_t \circ \alpha^{-1}$.
We verify this quickly. Define $\rho_t := \alpha \circ \psi_t \circ \alpha^{-1}$. Then $\rho_0 = \operatorname{id}$, $\rho_{s+t} = \rho_s \circ \rho_t$ (group law inherited from $\psi$), and
\begin{align*}
\frac{d}{dt}\bigg|_{t=0} \rho_t(q) = \frac{d}{dt}\bigg|_{t=0} \alpha(\psi_t(\alpha^{-1}(q))) = d\alpha_{\alpha^{-1}(q)} \bigl(Z|_{\alpha^{-1}(q)}\bigr) = (\alpha_* Z)|_q
\end{align*}
by the definition of pushforward. Thus $\rho_t$ is the flow of $\alpha_* Z$ by uniqueness of flows.
Apply this with $\alpha = \varphi_s$ (which is a local diffeomorphism where defined) and $Z = Y$, so that $Z$ has flow $\psi_t$. Then the flow of $(\varphi_s)_* Y$ is $\varphi_s \circ \psi_t \circ \varphi_{-s}$.
By the previous step, $[X,Y] = 0$ if and only if $(\varphi_s)_* Y = Y$ for all admissible $s$. By uniqueness of flows, two vector fields have the same flow iff they are equal. Hence $(\varphi_s)_* Y = Y$ for all $s$ is equivalent to
\begin{align*}
\varphi_s \circ \psi_t \circ \varphi_{-s} = \psi_t \quad \text{for all admissible } (s, t),
\end{align*}
which rearranges to $\varphi_s \circ \psi_t = \psi_t \circ \varphi_s$. This is the commutativity of flows.
Chaining the equivalences:
\begin{align*}
[X,Y] = 0 \iff (\varphi_s)_* Y = Y \text{ for all } s \iff \varphi_s \circ \psi_t = \psi_t \circ \varphi_s \text{ for all } s,t,
\end{align*}
which is the desired conclusion. This completes the proof.
[guided]
We have reduced the theorem to the equivalence "$(\varphi_s)_* Y = Y$ for all $s$" $\iff$ "$\varphi_s \circ \psi_t = \psi_t \circ \varphi_s$". The bridge is a standard fact about how flows transform under pushforward by a diffeomorphism.
**Fact (pushforward of a vector field transforms the flow by conjugation).** If $\alpha: M \to M$ is a diffeomorphism and $Z \in \Gamma(TM)$ has flow $\psi_t$, then the flow of $\alpha_* Z$ is $t \mapsto \alpha \circ \psi_t \circ \alpha^{-1}$.
**Proof of the fact.** Define $\rho_t := \alpha \circ \psi_t \circ \alpha^{-1}$. To show $\rho_t$ is a flow and that it is generated by $\alpha_* Z$:
(i) $\rho_0 = \alpha \circ \operatorname{id} \circ \alpha^{-1} = \operatorname{id}$.
(ii) $\rho_{s+t} = \alpha \circ \psi_{s+t} \circ \alpha^{-1} = \alpha \circ \psi_s \circ \psi_t \circ \alpha^{-1} = (\alpha \circ \psi_s \circ \alpha^{-1})(\alpha \circ \psi_t \circ \alpha^{-1}) = \rho_s \circ \rho_t$.
(iii) The generating vector field at $q$ is $\frac{d}{dt}|_{t=0} \rho_t(q)$. By the chain rule,
\begin{align*}
\frac{d}{dt}\bigg|_{t=0} \rho_t(q) = d\alpha_{\alpha^{-1}(q)}\bigl(\tfrac{d}{dt}\big|_{t=0}\psi_t(\alpha^{-1}(q))\bigr) = d\alpha_{\alpha^{-1}(q)}\bigl(Z|_{\alpha^{-1}(q)}\bigr).
\end{align*}
By the definition of the pushforward, $(\alpha_* Z)|_q := d\alpha_{\alpha^{-1}(q)}(Z|_{\alpha^{-1}(q)})$. Hence the generator of $\rho$ is $\alpha_* Z$. By uniqueness of integral curves, $\rho_t$ is the flow of $\alpha_* Z$.
**Applying the fact.** Let $\alpha = \varphi_s$ (fixed $s$) and $Z = Y$. Then $\varphi_s$ is a local diffeomorphism where defined, and $Y$ has flow $\psi_t$. By the fact, the flow of $(\varphi_s)_* Y$ is $t \mapsto \varphi_s \circ \psi_t \circ \varphi_{-s}$.
**Combining equivalences.** From the previous step, $[X,Y] = 0$ is equivalent to $(\varphi_s)_* Y = Y$ for all admissible $s$. Two vector fields are equal if and only if they generate the same flow (by uniqueness of ODE solutions). Hence $(\varphi_s)_* Y = Y$ is equivalent to the flow of $(\varphi_s)_* Y$ matching the flow of $Y$:
\begin{align*}
\varphi_s \circ \psi_t \circ \varphi_{-s} = \psi_t \quad \text{for all admissible } t.
\end{align*}
Rearranging (compose both sides with $\varphi_s$ on the right): $\varphi_s \circ \psi_t = \psi_t \circ \varphi_s$. This must hold for all $s, t$ in the domain of the respective flows, which is precisely commutativity of $\varphi$ and $\psi$.
**Summary of the chain.**
\begin{align*}
[X,Y] = 0 \;\;\overset{\text{Step 1,2}}{\iff}\;\; (\varphi_s)_* Y = Y \;\;\overset{\text{uniqueness of flows}}{\iff}\;\; \varphi_s \circ \psi_t = \psi_t \circ \varphi_s.
\end{align*}
This is the required biconditional, completing the proof.
[/guided]
[/step]