[proofplan]
We must show $g \circ f$ is [differentiable](/page/Derivative) at $a$ with derivative $\beta \circ \alpha$, where $\alpha = Df_a$ and $\beta = Dg_b$. We substitute the differentiability expansions for $f$ and $g$, set $k = f(a + h) - f(a)$, and expand $g(b + k)$ to isolate the linear term $(\beta \circ \alpha)(h)$. The proof reduces to showing the two remaining error terms are $o(|h|)$: one uses the [Lipschitz bound](/theorems/321) on $\beta$, and the other uses the bound $|k| = O(|h|)$, which follows from the expansion of $f$, together with the fact that $k \to \mathbf{0}$ as $h \to \mathbf{0}$ by [continuity](/page/Continuity) of $f$.
[/proofplan]
[step:Write out the differentiability expansions and substitute]
Write $b = f(a)$, $\alpha = Df_a \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)$, $\beta = Dg_b \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^p)$. By [differentiability](/page/Derivative):
\begin{align*}
f(a + h) &= f(a) + \alpha(h) + |h|\varepsilon_1(h), \\
g(b + k) &= g(b) + \beta(k) + |k|\varepsilon_2(k),
\end{align*}
where $\varepsilon_1(h) \to \mathbf{0}$ as $h \to \mathbf{0}$ and $\varepsilon_2(k) \to \mathbf{0}$ as $k \to \mathbf{0}$. Set $k = f(a + h) - f(a) = \alpha(h) + |h|\varepsilon_1(h)$. Then:
\begin{align*}
(g \circ f)(a + h) - (g \circ f)(a) &= \beta(k) + |k|\varepsilon_2(k) \\
&= \beta\bigl(\alpha(h) + |h|\varepsilon_1(h)\bigr) + |k|\varepsilon_2(k) \\
&= (\beta \circ \alpha)(h) + |h|\beta(\varepsilon_1(h)) + |k|\varepsilon_2(k),
\end{align*}
where the last line uses linearity of $\beta$.
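The expansion above can be sanity-checked numerically. The following is a minimal sketch, not part of the proof: the maps `f` and `g`, the base point, and the hand-computed matrices `ALPHA` and `BETA` are all hypothetical choices made for illustration. It computes $\varepsilon_1(h)$ and $\varepsilon_2(k)$ directly from their defining equations and compares the two sides of the final line.

```python
import math

def f(x, y):
    # Hypothetical inner map f: R^2 -> R^2 (chosen for this sketch only).
    return (x * x + y, x * y * y)

def g(u, v):
    # Hypothetical outer map g: R^2 -> R (chosen for this sketch only).
    return u * v + v * v

A = (1.0, 2.0)                      # base point a (an assumption)
B = f(*A)                           # b = f(a)
ALPHA = ((2.0, 1.0), (4.0, 4.0))    # Df_a, differentiated by hand
BETA = (4.0, 11.0)                  # Dg_b as a row vector, differentiated by hand

def alpha(h):
    # Apply the linear map alpha to the vector h.
    return tuple(row[0] * h[0] + row[1] * h[1] for row in ALPHA)

def beta(k):
    # Apply the linear map beta to the vector k.
    return BETA[0] * k[0] + BETA[1] * k[1]

def decomposition(h):
    """Return (lhs, rhs) of the last line of the expansion, for nonzero h."""
    norm_h = math.hypot(*h)
    fa_h = f(A[0] + h[0], A[1] + h[1])
    k = (fa_h[0] - B[0], fa_h[1] - B[1])       # k = f(a + h) - f(a)
    norm_k = math.hypot(*k)
    ah = alpha(h)
    # eps1(h) = (f(a + h) - f(a) - alpha(h)) / |h|
    eps1 = tuple((k[i] - ah[i]) / norm_h for i in range(2))
    # eps2(k) = (g(b + k) - g(b) - beta(k)) / |k|, with eps2(0) = 0.
    # Here g(b + k) is evaluated as g(f(a + h)).
    eps2 = 0.0 if norm_k == 0.0 else (g(*fa_h) - g(*B) - beta(k)) / norm_k
    lhs = g(*fa_h) - g(*B)
    rhs = beta(ah) + norm_h * beta(eps1) + norm_k * eps2
    return lhs, rhs

lhs, rhs = decomposition((0.3, -0.2))
print(lhs, rhs)  # the two values should agree up to rounding
```

The agreement is an algebraic identity rather than an approximation, so it holds for any nonzero $h$ and not only for small ones; the smallness of $h$ matters only in the later steps that control the error terms.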
[/step]
[step:Show the error term $|h|\beta(\varepsilon_1(h))$ is $o(|h|)$]
By the [Lipschitz bound for linear maps](/theorems/321) applied to $\beta$:
\begin{align*}
\bigl||h|\beta(\varepsilon_1(h))\bigr| \leq |h| \cdot \|\beta\| \cdot |\varepsilon_1(h)|.
\end{align*}
Since $|\varepsilon_1(h)| \to 0$ as $h \to \mathbf{0}$, this is $|h| \cdot o(1) = o(|h|)$.
[/step]
[step:Show the error term $|k|\varepsilon_2(k)$ is $o(|h|)$]
First bound $|k|$. By the triangle inequality and the [Lipschitz bound](/theorems/321):
\begin{align*}
|k| = |\alpha(h) + |h|\varepsilon_1(h)| \leq \|\alpha\| \cdot |h| + |h| \cdot |\varepsilon_1(h)| = |h|\bigl(\|\alpha\| + |\varepsilon_1(h)|\bigr).
\end{align*}
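As $h \to \mathbf{0}$, the bound gives $|k| \to 0$, so $k \to \mathbf{0}$ and hence $\varepsilon_2(k) \to \mathbf{0}$ (with the convention $\varepsilon_2(\mathbf{0}) = \mathbf{0}$, which covers any $h$ for which $k = \mathbf{0}$). Therefore:
\begin{align*}
\bigl||k|\varepsilon_2(k)\bigr| \leq |h|\bigl(\|\alpha\| + |\varepsilon_1(h)|\bigr) \cdot |\varepsilon_2(k)| = |h| \cdot o(1) = o(|h|),
\end{align*}
since the factor $\|\alpha\| + |\varepsilon_1(h)|$ stays bounded as $h \to \mathbf{0}$.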
[guided]
The key question is: why does $\varepsilon_2(k) \to \mathbf{0}$ as $h \to \mathbf{0}$? The error function $\varepsilon_2$ satisfies $\varepsilon_2(k) \to \mathbf{0}$ as $k \to \mathbf{0}$, so we need $k \to \mathbf{0}$.
We have $k = \alpha(h) + |h|\varepsilon_1(h)$. The triangle inequality and the [Lipschitz bound](/theorems/321) give $|k| \leq \|\alpha\| \cdot |h| + |h| \cdot |\varepsilon_1(h)| = |h|(\|\alpha\| + |\varepsilon_1(h)|)$. As $h \to \mathbf{0}$, $|\varepsilon_1(h)| \to 0$, so $|k| \leq |h|(\|\alpha\| + o(1)) \to 0$.
This is precisely where [Differentiability Implies Continuity](/theorems/322) enters: $k = f(a + h) - f(a) \to \mathbf{0}$ as $h \to \mathbf{0}$ because differentiable functions are continuous.
The bound also shows $|k| = O(|h|)$: since $|\varepsilon_1(h)| \to 0$, we have $|\varepsilon_1(h)| \leq 1$ for all sufficiently small $|h|$, and then $|k| \leq C|h|$ with the constant $C = \|\alpha\| + 1$.
Combining: $|k| \cdot |\varepsilon_2(k)| \leq C|h| \cdot |\varepsilon_2(k)| = O(|h|) \cdot o(1) = o(|h|)$, which is what we needed to show.
[/guided]
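The bound $|k| \leq (\|\alpha\| + 1)|h|$ for small $|h|$ can be illustrated on a concrete map. This is a minimal sketch under stated assumptions: the map `f`, the base point, and the hand-computed matrix `ALPHA` are hypothetical, and $\|\alpha\|$ is taken to be the operator norm induced by the Euclidean norm, which may differ from the norm used in the [Lipschitz bound](/theorems/321).

```python
import math

def f(x, y):
    # Hypothetical inner map f: R^2 -> R^2 (an assumption of this sketch).
    return (x * x + y, x * y * y)

A = (1.0, 2.0)                      # base point a (an assumption)
ALPHA = ((2.0, 1.0), (4.0, 4.0))    # Df_a, differentiated by hand

def operator_norm_2x2(m):
    """Euclidean operator norm of a 2x2 matrix: the square root of the
    largest eigenvalue of M^T M, via the closed form for symmetric 2x2."""
    (p, q), (r, s) = m
    e = p * p + r * r        # (M^T M)[0][0]
    c = p * q + r * s        # (M^T M)[0][1] = (M^T M)[1][0]
    d = q * q + s * s        # (M^T M)[1][1]
    lam_max = 0.5 * ((e + d) + math.hypot(e - d, 2.0 * c))
    return math.sqrt(lam_max)

C = operator_norm_2x2(ALPHA) + 1.0   # the constant C = ||alpha|| + 1

def k_over_h(h):
    """Return |k| / |h| for k = f(a + h) - f(a), with h nonzero."""
    fa = f(*A)
    fah = f(A[0] + h[0], A[1] + h[1])
    return math.hypot(fah[0] - fa[0], fah[1] - fa[1]) / math.hypot(*h)

# Sample 12 directions at three small radii and record the worst ratio.
worst = max(
    k_over_h((t * math.cos(2 * math.pi * j / 12),
              t * math.sin(2 * math.pi * j / 12)))
    for t in (1e-1, 1e-2, 1e-3)
    for j in range(12)
)
print(worst, C)  # worst observed |k|/|h| next to the constant C
```

Sampling finitely many directions cannot prove the bound; it only shows the observed ratios sitting below $C$, as the estimate predicts once $|\varepsilon_1(h)| \leq 1$.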
[/step]
[step:Combine the estimates to conclude differentiability]
From the preceding steps:
\begin{align*}
(g \circ f)(a + h) - (g \circ f)(a) = (\beta \circ \alpha)(h) + |h|\varepsilon_3(h),
\end{align*}
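where, for $h \neq \mathbf{0}$, $\varepsilon_3(h) = \beta(\varepsilon_1(h)) + \frac{|k|}{|h|}\varepsilon_2(k)$ collects both error terms, and $\varepsilon_3(h) \to \mathbf{0}$ as $h \to \mathbf{0}$ because each term was shown to be $o(|h|)$. The map $\beta \circ \alpha \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$ is linear, so $g \circ f$ is [differentiable](/page/Derivative) at $a$ with $D(g \circ f)_a = \beta \circ \alpha = Dg_{f(a)} \circ Df_a$.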
[/step]