[proofplan]
The divergence on a Riemannian manifold is defined as the trace of the covariant derivative: $\operatorname{div} X = \operatorname{tr}(\nabla X)$, where $\nabla X \in \Gamma(T^*M \otimes TM)$ is regarded as an endomorphism of $TM$ via the canonical isomorphism $T^*M \otimes TM \cong \operatorname{End}(TM)$. The Leibniz rule for $\nabla$ on the product $fX$ produces $\nabla(fX) = df \otimes X + f\,\nabla X$. Taking the pointwise trace and identifying $\operatorname{tr}(df \otimes X) = \langle df, X \rangle$ via the same canonical isomorphism gives the result. The argument is purely tensorial and pointwise.
[/proofplan]
[step:Set up the divergence as the trace of the covariant derivative]
Let $(M, g)$ be a Riemannian manifold, $\nabla$ the Levi-Civita connection, $f \in C^\infty(M)$, and $X \in \mathfrak{X}(M)$. The covariant derivative of $X$ is the $(1,1)$-tensor field
\begin{align*}
\nabla X : \mathfrak{X}(M) &\to \mathfrak{X}(M), \\
Y &\mapsto \nabla_Y X,
\end{align*}
which we regard as a section of $T^*M \otimes TM$, equivalently as a section of $\operatorname{End}(TM)$ via the trace-contraction pairing $T^*M \otimes TM \cong \operatorname{End}(TM)$. The divergence is defined pointwise by
\begin{align*}
\operatorname{div} X := \operatorname{tr}(\nabla X) \in C^\infty(M),
\end{align*}
so $(\operatorname{div} X)(p) = \operatorname{tr}\big( v \mapsto \nabla_v X \big)$ at each $p \in M$, where the trace is taken on the linear map $T_p M \to T_p M$, $v \mapsto (\nabla_v X)(p)$.
[/step]
[step:Apply the Leibniz rule for $\nabla$ to the product $fX$]
The Levi-Civita connection $\nabla$ is a derivation: for any $f \in C^\infty(M)$, $X \in \mathfrak{X}(M)$, and $Y \in \mathfrak{X}(M)$,
\begin{align*}
\nabla_Y(fX) = (Yf)\,X + f\,\nabla_Y X.
\end{align*}
Recognising $Yf = df(Y)$, this rewrites at the level of $(1,1)$-tensors as
\begin{align*}
\nabla(fX) = df \otimes X + f\,\nabla X,
\tag{L}
\end{align*}
where $df \otimes X$ is the section of $T^*M \otimes TM$ acting on $Y \in \mathfrak{X}(M)$ by $(df \otimes X)(Y) = df(Y) \, X$.
[/step]
[step:Take the pointwise trace and identify each term]
Apply the trace, viewed as the canonical contraction $T^*M \otimes TM \to C^\infty(M)$, $\theta \otimes Z \mapsto \theta(Z)$, to both sides of (L). The trace is $C^\infty(M)$-linear, so
\begin{align*}
\operatorname{div}(fX) = \operatorname{tr}\big(\nabla(fX)\big) = \operatorname{tr}(df \otimes X) + f\,\operatorname{tr}(\nabla X).
\end{align*}
The second term is $f\,\operatorname{div} X$ by definition of $\operatorname{div}$.
For the first term, fix $p \in M$ and a basis $\{e_i\}_{i=1}^n$ of $T_p M$ with dual basis $\{e^i\}$ of $T_p^* M$. Writing $df_p = \sum_i a_i\, e^i$ and $X(p) = \sum_j b_j\, e_j$, the endomorphism $df_p \otimes X(p) : T_p M \to T_p M$ acts by
\begin{align*}
v \mapsto df_p(v) \, X(p) = \big(\sum_i a_i\, e^i(v)\big) \big(\sum_j b_j\, e_j\big),
\end{align*}
whose matrix in the basis $\{e_i\}$ has $(i,j)$-entry $b_i a_j$. Its trace is
\begin{align*}
\operatorname{tr}(df_p \otimes X(p)) = \sum_i b_i a_i = df_p(X(p)) = \langle df, X \rangle(p),
\end{align*}
where $\langle df, X \rangle := df(X)$ is the natural pairing of a $1$-form with a vector field. Since $p$ was arbitrary,
\begin{align*}
\operatorname{tr}(df \otimes X) = \langle df, X \rangle.
\end{align*}
Combining,
\begin{align*}
\operatorname{div}(fX) = \langle df, X \rangle + f\,\operatorname{div} X = f\,\operatorname{div} X + \langle df, X \rangle.
\end{align*}
This is the asserted identity.
[guided]
We have the tensor-level Leibniz identity from the previous step:
\begin{align*}
\nabla(fX) = df \otimes X + f\,\nabla X. \tag{L}
\end{align*}
Our goal is to take the trace of both sides and recognise each term as a recognisable object — the divergence on the left, and the function-derivative pairing $\langle df, X \rangle$ on the right.
The trace we use is the canonical contraction
\begin{align*}
\operatorname{tr} : T^*M \otimes TM &\to C^\infty(M), \\
\theta \otimes Z &\mapsto \theta(Z),
\end{align*}
extended $C^\infty(M)$-linearly. Why is this the "right" trace? Because under the trace-contraction isomorphism $T^*M \otimes TM \cong \operatorname{End}(TM)$, this contraction is exactly the trace of the corresponding endomorphism. Applying it to (L) and using $C^\infty(M)$-linearity (the function $f$ pulls out of the trace),
\begin{align*}
\operatorname{div}(fX) = \operatorname{tr}\big(\nabla(fX)\big) = \operatorname{tr}(df \otimes X) + f\,\operatorname{tr}(\nabla X).
\end{align*}
The second term is recognisable immediately: by the very definition $\operatorname{div} X := \operatorname{tr}(\nabla X)$, we have $f\,\operatorname{tr}(\nabla X) = f\,\operatorname{div} X$.
The first term requires more thought. What is $\operatorname{tr}(df \otimes X)$? Since trace is defined pointwise, fix $p \in M$ and compute in coordinates. Choose a basis $\{e_i\}_{i=1}^n$ of $T_p M$ with dual basis $\{e^i\}$ of $T_p^* M$, and expand
\begin{align*}
df_p = \sum_i a_i\, e^i, \qquad X(p) = \sum_j b_j\, e_j.
\end{align*}
The endomorphism $df_p \otimes X(p) : T_p M \to T_p M$ acts on $v \in T_p M$ by
\begin{align*}
v \mapsto df_p(v) \, X(p) = \Big(\sum_i a_i\, e^i(v)\Big)\Big(\sum_j b_j\, e_j\Big).
\end{align*}
What is its matrix? Reading off the coefficient of $e_i$ in the image of $e_j$, the $(i,j)$-entry is $b_i a_j$. The trace is the sum of diagonal entries:
\begin{align*}
\operatorname{tr}(df_p \otimes X(p)) = \sum_i b_i a_i = df_p(X(p)) = \langle df, X \rangle(p),
\end{align*}
where we recognise $\sum_i a_i b_i$ as the natural pairing $df_p(X(p))$, written $\langle df, X \rangle(p)$. This is exactly the abstract statement that the trace-contraction pairing $T^*M \otimes TM \cong \operatorname{End}(TM)$ sends the simple tensor $\theta \otimes Z$ to $\theta(Z)$ — here $\theta = df_p$ and $Z = X(p)$. Since $p \in M$ was arbitrary,
\begin{align*}
\operatorname{tr}(df \otimes X) = \langle df, X \rangle.
\end{align*}
Combining the two terms,
\begin{align*}
\operatorname{div}(fX) = \langle df, X \rangle + f\,\operatorname{div} X = f\,\operatorname{div} X + \langle df, X \rangle,
\end{align*}
which is the asserted identity.
A remark on what was and was not used. The argument is purely tensorial and pointwise: nowhere did we invoke metric compatibility ($\nabla g = 0$) or torsion-freeness of $\nabla$. The only property of $\nabla$ used was that it is a derivation on $\mathfrak{X}(M)$ over $C^\infty(M)$ — this is what produced (L) — together with the algebraic identification of the trace of a simple tensor as the pairing. The function-derivative term $\langle df, X \rangle$ here plays the role of the elementary Euclidean expression $X(f)$ from vector calculus.
[/guided]
[/step]