[guided]We arrive at Step 3 with the expansion from Step 2,
\begin{align*}
R(X, Y) Z &= \nabla_Y \nabla_X Z - \nabla_X \nabla_Y Z + \nabla_{\nabla_X Y} Z - \nabla_{\nabla_Y X} Z,
\end{align*}
and we want to massage it into the operator form $\nabla_{[X,Y]} - [\nabla_X, \nabla_Y]$. What is the obstruction? Two terms with a covariant derivative in the subscript appeared: $\nabla_{\nabla_X Y} Z$ came from $\nabla^2 Z(X, Y)$, and $\nabla_{\nabla_Y X} Z$ came from $\nabla^2 Z(Y, X)$. Neither is tensorial in both arguments: the first differentiates $Y$ and the second differentiates $X$. But their *difference* depends only on the antisymmetric part of the map $(X, Y) \mapsto \nabla_X Y$, namely $\nabla_X Y - \nabla_Y X$, which equals the Lie bracket $[X, Y]$ plus the torsion. So if the connection is torsion-free, the difference should collapse to a single term involving only $[X, Y]$. Let us make this precise.
The Levi-Civita connection is by definition **torsion-free**, meaning that for all $X, Y \in \mathfrak{X}(M)$,
\begin{align*}
\nabla_X Y - \nabla_Y X &= [X, Y].
\end{align*}
This is not something we need to prove here: it is one of the two defining properties of the Levi-Civita connection (the other being metric compatibility), so we invoke it directly from the definition.
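In local coordinates the condition takes a familiar form. As a brief illustration, write $\nabla_{\partial_i} \partial_j = \Gamma^k_{ij} \partial_k$ (the Christoffel symbols $\Gamma^k_{ij}$ and this index ordering are notation assumed here, not introduced above). Since coordinate fields commute, $[\partial_i, \partial_j] = 0$, torsion-freeness applied to $X = \partial_i$, $Y = \partial_j$ reads
\begin{align*}
\nabla_{\partial_i} \partial_j - \nabla_{\partial_j} \partial_i &= \bigl( \Gamma^k_{ij} - \Gamma^k_{ji} \bigr) \partial_k = [\partial_i, \partial_j] = 0,
\end{align*}
so under this notation the condition amounts to symmetry of the Christoffel symbols in their lower indices, $\Gamma^k_{ij} = \Gamma^k_{ji}$.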
Why is this exactly what we need? The connection $\nabla_W Z$ is $C^\infty(M)$-linear in its subscript argument $W$, so in particular it is additive in $W$. Combining additivity with torsion-freeness gives
\begin{align*}
\nabla_{\nabla_X Y} Z - \nabla_{\nabla_Y X} Z &= \nabla_{\nabla_X Y - \nabla_Y X} Z = \nabla_{[X, Y]} Z.
\end{align*}
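The two connection-dependent subscripts have collapsed into a single term $\nabla_{[X, Y]} Z$, exactly as we hoped. Note that this term is not itself tensorial in $X$ and $Y$, since the Lie bracket differentiates both fields; its role is to cancel the non-tensorial part of the commutator that appears below. This is the step where torsion-freeness is consumed; for a connection with non-zero torsion $T(X, Y) := \nabla_X Y - \nabla_Y X - [X, Y]$, the right-hand side would acquire an additional $\nabla_{T(X, Y)} Z$ term.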
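For comparison, here is a sketch of the same step without assuming torsion-freeness. Solving the definition of $T$ for the antisymmetric part gives $\nabla_X Y - \nabla_Y X = [X, Y] + T(X, Y)$, so that
\begin{align*}
\nabla_{\nabla_X Y} Z - \nabla_{\nabla_Y X} Z &= \nabla_{[X, Y] + T(X, Y)} Z = \nabla_{[X, Y]} Z + \nabla_{T(X, Y)} Z.
\end{align*}
Setting $T = 0$ recovers the identity above.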
Now substitute back into the Step 2 expression:
\begin{align*}
R(X, Y) Z &= \nabla_Y \nabla_X Z - \nabla_X \nabla_Y Z + \nabla_{[X, Y]} Z.
\end{align*}
The remaining piece $\nabla_Y \nabla_X Z - \nabla_X \nabla_Y Z$ is, up to sign, the commutator of the operators $\nabla_X$ and $\nabla_Y$ acting on $Z$:
\begin{align*}
\nabla_Y \nabla_X Z - \nabla_X \nabla_Y Z &= -\bigl( \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z \bigr) = -[\nabla_X, \nabla_Y] Z.
\end{align*}
Regrouping,
\begin{align*}
R(X, Y) Z &= \nabla_{[X, Y]} Z - [\nabla_X, \nabla_Y] Z.
\end{align*}
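As a sanity check, one can verify directly that this expression is $C^\infty(M)$-linear in $X$, even though neither term is on its own. The sketch below uses only the Leibniz rule for $\nabla_Y$ and the identity $[fX, Y] = f[X, Y] - (Yf) X$ for $f \in C^\infty(M)$ (a standard fact about the Lie bracket, not derived in this section):
\begin{align*}
\nabla_{[fX, Y]} Z &= f \nabla_{[X, Y]} Z - (Yf) \nabla_X Z, \\
[\nabla_{fX}, \nabla_Y] Z &= f \nabla_X \nabla_Y Z - \nabla_Y (f \nabla_X Z) = f [\nabla_X, \nabla_Y] Z - (Yf) \nabla_X Z, \\
R(fX, Y) Z &= f \nabla_{[X, Y]} Z - f [\nabla_X, \nabla_Y] Z = f R(X, Y) Z.
\end{align*}
The two $(Yf) \nabla_X Z$ terms cancel, which is the sense in which $\nabla_{[X, Y]}$ compensates for the failure of the commutator to be tensorial.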
The minus sign attached to $[\nabla_X, \nabla_Y]$ is a direct consequence of the sign convention $R = -\nabla \circ \nabla$ chosen in this chapter. Under the opposite convention $R = +\nabla \circ \nabla$ (used by Lee and many others), the formula reads $R(X, Y) = \nabla_X \nabla_Y - \nabla_Y \nabla_X - \nabla_{[X, Y]}$, which differs only by an overall sign; the underlying calculation is identical. (do Carmo's definition, $R(X, Y) Z = \nabla_Y \nabla_X Z - \nabla_X \nabla_Y Z + \nabla_{[X, Y]} Z$, agrees with the convention used here.)[/guided]