[proofplan]
We first use the chain rule for total derivatives to identify the total derivative of $f\circ g$ at $a$ as the composition $Df_{g(a)}\circ Dg_a$. Evaluating this [linear map](/page/Linear%20Map) on the standard basis vector $e_i\in\mathbb{R}^m$ turns the total derivative identity into the desired [partial derivative](/page/Partial%20Derivative) identity. The remaining work is to expand $Dg_a(e_i)$ in the standard basis of $\mathbb{R}^k$ and then use linearity of $Df_{g(a)}$.
[/proofplan]
[step:Apply the total derivative chain rule to $f\circ g$]
Define the composition map
\begin{align*}
h:U &\to \mathbb{R}^n
\end{align*}
by $h=f\circ g$. Since $g$ is differentiable at $a$ and $f$ is differentiable at $g(a)$, the chain rule for differentiable maps between Euclidean spaces shows that $h$ is differentiable at $a$ and
\begin{align*}
Dh_a=Df_{g(a)}\circ Dg_a.
\end{align*}
Equivalently, for every vector $v\in\mathbb{R}^m$,
\begin{align*}
Dh_a(v)=Df_{g(a)}(Dg_a(v)).
\end{align*}
(citing a result not yet in the wiki: Chain Rule for Differentiable Maps)
[/step]
[step:Evaluate the derivative identity on the $i$-th coordinate direction]
Fix $i\in\{1,\ldots,m\}$, and let $e_i\in\mathbb{R}^m$ denote the $i$-th standard basis vector. By [citetheorem:7904] applied to the differentiable map $h:U\to\mathbb{R}^n$,
\begin{align*}
\partial_{x_i}h(a)=Dh_a(e_i).
\end{align*}
Using the derivative identity from the previous step, we obtain
\begin{align*}
\partial_{x_i}(f\circ g)(a)=Df_{g(a)}(Dg_a(e_i)).
\end{align*}
[guided]
Fix $i\in\{1,\ldots,m\}$, and let $e_i\in\mathbb{R}^m$ be the vector with $1$ in the $i$-th coordinate and $0$ in all other coordinates. The reason to evaluate on $e_i$ is that, for a differentiable map, the $i$-th partial derivative is exactly the total derivative evaluated on the coordinate direction $e_i$.
We apply [citetheorem:7904] to the map $h:U\to\mathbb{R}^n$ defined by $h=f\circ g$. Its hypothesis is satisfied because the previous step showed that $h$ is differentiable at $a$. Therefore the $i$-th partial derivative of $h$ exists and is given by
\begin{align*}
\partial_{x_i}h(a)=Dh_a(e_i).
\end{align*}
Since $h=f\circ g$, this is
\begin{align*}
\partial_{x_i}(f\circ g)(a)=Dh_a(e_i).
\end{align*}
The total derivative chain rule gives $Dh_a=Df_{g(a)}\circ Dg_a$, so substituting this identity into the preceding formula yields
\begin{align*}
\partial_{x_i}(f\circ g)(a)=Df_{g(a)}(Dg_a(e_i)).
\end{align*}
This converts the desired partial derivative formula into a computation in the linear maps $Dg_a:\mathbb{R}^m\to\mathbb{R}^k$ and $Df_{g(a)}:\mathbb{R}^k\to\mathbb{R}^n$.
[/guided]
[/step]
[step:Expand $Dg_a(e_i)$ in the standard basis of $\mathbb{R}^k$]
For each $r\in\{1,\ldots,k\}$, let $\varepsilon_r\in\mathbb{R}^k$ denote the $r$-th standard basis vector. Since $g:U\to\mathbb{R}^k$ is differentiable at $a$, [citetheorem:7904] applied to $g$ gives
\begin{align*}
Dg_a(e_i)=\partial_{x_i}g(a).
\end{align*}
Because $g=(g_1,\ldots,g_k)$, the vector-valued partial derivative is computed componentwise, so its expansion in the standard basis of $\mathbb{R}^k$ is
\begin{align*}
\partial_{x_i}g(a)=\sum_{r=1}^k \partial_{x_i}g_r(a)\,\varepsilon_r.
\end{align*}
Hence
\begin{align*}
Dg_a(e_i)=\sum_{r=1}^k \partial_{x_i}g_r(a)\,\varepsilon_r.
\end{align*}
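As an illustration only, consider a hypothetical example that is not part of the proof: take $m=k=2$, $U=\mathbb{R}^2$, and $g(x_1,x_2)=(x_1x_2,\;x_1+x_2)$. For $i=1$ and $a=(a_1,a_2)$, the component partial derivatives are $\partial_{x_1}g_1(a)=a_2$ and $\partial_{x_1}g_2(a)=1$, so the expansion above reads
\begin{align*}
Dg_a(e_1)=a_2\,\varepsilon_1+1\,\varepsilon_2=(a_2,\,1).
\end{align*}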
[/step]
[step:Use linearity of $Df_{g(a)}$ to obtain the coordinate formula]
Since $Df_{g(a)}:\mathbb{R}^k\to\mathbb{R}^n$ is a linear map, the expansion of $Dg_a(e_i)$ gives
\begin{align*}
Df_{g(a)}(Dg_a(e_i))=\sum_{r=1}^k \partial_{x_i}g_r(a)\,Df_{g(a)}(\varepsilon_r).
\end{align*}
Applying [citetheorem:7904] to the differentiable map $f:V\to\mathbb{R}^n$ at the point $g(a)$ gives
\begin{align*}
Df_{g(a)}(\varepsilon_r)=\partial_{y_r}f(g(a))
\end{align*}
for every $r\in\{1,\ldots,k\}$. Therefore
\begin{align*}
Df_{g(a)}(Dg_a(e_i))=\sum_{r=1}^k \partial_{x_i}g_r(a)\,\partial_{y_r}f(g(a)).
\end{align*}
Each coefficient $\partial_{x_i}g_r(a)$ is a real scalar and each $\partial_{y_r}f(g(a))$ is a vector in $\mathbb{R}^n$, so writing the scalar to the right of the vector does not change the product. Hence
\begin{align*}
Df_{g(a)}(Dg_a(e_i))=\sum_{r=1}^k \partial_{y_r}f(g(a))\,\partial_{x_i}g_r(a).
\end{align*}
Combining this identity with
\begin{align*}
\partial_{x_i}(f\circ g)(a)=Df_{g(a)}(Dg_a(e_i))
\end{align*}
proves
\begin{align*}
\partial_{x_i}(f\circ g)(a)=\sum_{r=1}^k \partial_{y_r}f(g(a))\,\partial_{x_i}g_r(a).
\end{align*}
Since $i\in\{1,\ldots,m\}$ was arbitrary, the formula holds for every coordinate direction.
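As a sanity check, the formula can be tested on a small hypothetical example, chosen here for illustration and not taken from the statement. Assume $m=k=2$, $n=1$, $U=V=\mathbb{R}^2$, $g(x_1,x_2)=(x_1x_2,\;x_1+x_2)$, and $f(y_1,y_2)=y_1^2+y_2$. At $a=(a_1,a_2)$ we have $g(a)=(a_1a_2,\;a_1+a_2)$, and for $i=1$ the right-hand side of the formula is
\begin{align*}
\sum_{r=1}^2 \partial_{y_r}f(g(a))\,\partial_{x_1}g_r(a)
&=\bigl(2a_1a_2\bigr)\cdot a_2+1\cdot 1\\
&=2a_1a_2^2+1.
\end{align*}
Computing directly from $(f\circ g)(x_1,x_2)=x_1^2x_2^2+x_1+x_2$ gives
\begin{align*}
\partial_{x_1}(f\circ g)(a)=2a_1a_2^2+1,
\end{align*}
which agrees with the value obtained from the formula.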
[/step]