[guided]We have the tensor-level Leibniz identity from the previous step:
\begin{align*}
\nabla(fX) = df \otimes X + f\,\nabla X. \tag{L}
\end{align*}
Our goal is to take the trace of both sides and recognise each term as a recognisable object — the divergence on the left, and the function-derivative pairing $\langle df, X \rangle$ on the right.
The trace we use is the canonical contraction
\begin{align*}
\operatorname{tr} : T^*M \otimes TM &\to C^\infty(M), \\
\theta \otimes Z &\mapsto \theta(Z),
\end{align*}
extended $C^\infty(M)$-linearly. Why is this the "right" trace? Because under the trace-contraction isomorphism $T^*M \otimes TM \cong \operatorname{End}(TM)$, this contraction is exactly the trace of the corresponding endomorphism. Applying it to (L) and using $C^\infty(M)$-linearity (the function $f$ pulls out of the trace),
\begin{align*}
\operatorname{div}(fX) = \operatorname{tr}\big(\nabla(fX)\big) = \operatorname{tr}(df \otimes X) + f\,\operatorname{tr}(\nabla X).
\end{align*}
The second term is recognisable immediately: by the very definition $\operatorname{div} X := \operatorname{tr}(\nabla X)$, we have $f\,\operatorname{tr}(\nabla X) = f\,\operatorname{div} X$.
The first term requires more thought. What is $\operatorname{tr}(df \otimes X)$? Since trace is defined pointwise, fix $p \in M$ and compute in coordinates. Choose a basis $\{e_i\}_{i=1}^n$ of $T_p M$ with dual basis $\{e^i\}$ of $T_p^* M$, and expand
\begin{align*}
df_p = \sum_i a_i\, e^i, \qquad X(p) = \sum_j b_j\, e_j.
\end{align*}
The endomorphism $df_p \otimes X(p) : T_p M \to T_p M$ acts on $v \in T_p M$ by
\begin{align*}
v \mapsto df_p(v) \, X(p) = \Big(\sum_i a_i\, e^i(v)\Big)\Big(\sum_j b_j\, e_j\Big).
\end{align*}
What is its matrix? Reading off the coefficient of $e_i$ in the image of $e_j$, the $(i,j)$-entry is $b_i a_j$. The trace is the sum of diagonal entries:
\begin{align*}
\operatorname{tr}(df_p \otimes X(p)) = \sum_i b_i a_i = df_p(X(p)) = \langle df, X \rangle(p),
\end{align*}
where we recognise $\sum_i a_i b_i$ as the natural pairing $df_p(X(p))$, written $\langle df, X \rangle(p)$. This is exactly the abstract statement that the trace-contraction pairing $T^*M \otimes TM \cong \operatorname{End}(TM)$ sends the simple tensor $\theta \otimes Z$ to $\theta(Z)$ — here $\theta = df_p$ and $Z = X(p)$. Since $p \in M$ was arbitrary,
\begin{align*}
\operatorname{tr}(df \otimes X) = \langle df, X \rangle.
\end{align*}
Combining the two terms,
\begin{align*}
\operatorname{div}(fX) = \langle df, X \rangle + f\,\operatorname{div} X = f\,\operatorname{div} X + \langle df, X \rangle,
\end{align*}
which is the asserted identity.
A remark on what was and was not used. The argument is purely tensorial and pointwise: nowhere did we invoke metric compatibility ($\nabla g = 0$) or torsion-freeness of $\nabla$. The only property of $\nabla$ used was that it is a derivation on $\mathfrak{X}(M)$ over $C^\infty(M)$ — this is what produced (L) — together with the algebraic identification of the trace of a simple tensor as the pairing. The function-derivative term $\langle df, X \rangle$ here plays the role of the elementary Euclidean expression $X(f)$ from vector calculus.[/guided]