[proofplan]
We prove the result directly from the definitions of Lipschitz continuity and function composition. For arbitrary points $x_1,x_2\in X$, the $M$-Lipschitz estimate for $g$ bounds the distance between $g(f(x_1))$ and $g(f(x_2))$ in $Z$ by $M$ times the distance between $f(x_1)$ and $f(x_2)$ in $Y$. The $L$-Lipschitz estimate for $f$ then bounds that intermediate distance by $L\,d_X(x_1,x_2)$, and chaining the two estimates gives the Lipschitz constant $ML$.
[/proofplan]
[step:Declare the composition as a map from $X$ to $Z$]
Since $f:X\to Y$ and $g:Y\to Z$, the composition
\begin{align*}
g\circ f:X\to Z
\end{align*}
is the map defined by
\begin{align*}
(g\circ f)(x)=g(f(x))
\end{align*}
for every $x\in X$.
[/step]
[step:Apply the two Lipschitz estimates in sequence]
Let $x_1,x_2\in X$ be arbitrary. Since $f:X\to Y$, both $f(x_1)$ and $f(x_2)$ are elements of $Y$. Applying the $M$-Lipschitz property of $g:Y\to Z$ to the points $f(x_1),f(x_2)\in Y$ gives
\begin{align*}
d_Z(g(f(x_1)),g(f(x_2)))\le M\,d_Y(f(x_1),f(x_2)).
\end{align*}
Applying the $L$-Lipschitz property of $f:X\to Y$ to the points $x_1,x_2\in X$ gives
\begin{align*}
d_Y(f(x_1),f(x_2))\le L\,d_X(x_1,x_2).
\end{align*}
Because $M\ge 0$, multiplying the [second inequality](/theorems/2136) by $M$ preserves the inequality:
\begin{align*}
M\,d_Y(f(x_1),f(x_2))\le ML\,d_X(x_1,x_2).
\end{align*}
Chaining the first inequality with this last bound yields
\begin{align*}
d_Z(g(f(x_1)),g(f(x_2)))\le ML\,d_X(x_1,x_2).
\end{align*}
[guided]
We start with arbitrary points $x_1,x_2\in X$ because the definition of being Lipschitz requires an estimate for every pair of points in the domain. The composition $g\circ f$ sends $x\in X$ first to $f(x)\in Y$ and then to $g(f(x))\in Z$, so the distance we must estimate is
\begin{align*}
d_Z((g\circ f)(x_1),(g\circ f)(x_2)).
\end{align*}
By the definition of composition, this is
\begin{align*}
d_Z(g(f(x_1)),g(f(x_2))).
\end{align*}
The map $g:Y\to Z$ is $M$-Lipschitz, meaning that for every $y_1,y_2\in Y$,
\begin{align*}
d_Z(g(y_1),g(y_2))\le M\,d_Y(y_1,y_2).
\end{align*}
We may apply this with $y_1=f(x_1)$ and $y_2=f(x_2)$ because $f$ takes values in $Y$. Therefore,
\begin{align*}
d_Z(g(f(x_1)),g(f(x_2)))\le M\,d_Y(f(x_1),f(x_2)).
\end{align*}
Now we estimate the remaining distance in $Y$. The map $f:X\to Y$ is $L$-Lipschitz, meaning that for every $u_1,u_2\in X$,
\begin{align*}
d_Y(f(u_1),f(u_2))\le L\,d_X(u_1,u_2).
\end{align*}
Applying this with $u_1=x_1$ and $u_2=x_2$ gives
\begin{align*}
d_Y(f(x_1),f(x_2))\le L\,d_X(x_1,x_2).
\end{align*}
Since $M\ge 0$, multiplying this inequality by $M$ preserves the direction of the inequality:
\begin{align*}
M\,d_Y(f(x_1),f(x_2))\le ML\,d_X(x_1,x_2).
\end{align*}
Substituting this bound into the earlier estimate gives
\begin{align*}
d_Z(g(f(x_1)),g(f(x_2)))\le ML\,d_X(x_1,x_2).
\end{align*}
This is precisely the desired Lipschitz estimate for $g\circ f$ at the pair $x_1,x_2$.
[/guided]
[/step]
[step:Conclude the Lipschitz bound for every pair of points]
Using $(g\circ f)(x_i)=g(f(x_i))$ for $i\in\{1,2\}$, the estimate from the previous step becomes
\begin{align*}
d_Z((g\circ f)(x_1),(g\circ f)(x_2))\le ML\,d_X(x_1,x_2).
\end{align*}
Since $x_1,x_2\in X$ were arbitrary, $g\circ f:X\to Z$ is $ML$-Lipschitz.
[/step]
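The bound can also be checked numerically on a concrete instance. The sketch below is only an illustration, not part of the proof: it assumes $X=Y=Z=\mathbb{R}$ with the usual distance, and the maps `f` (with $L=2$) and `g` (with $M=3$), the sample grid, and the helper `max_ratio` are all hypothetical choices made for this example. It measures the largest difference quotient of $g\circ f$ over the sampled pairs and compares it with $ML=6$.

```python
import itertools
import math


def f(x: float) -> float:
    """Hypothetical 2-Lipschitz map on R: |f(a) - f(b)| = 2|a - b|."""
    return 2.0 * x


def g(y: float) -> float:
    """Hypothetical 3-Lipschitz map on R, since |d/dy sin(3y)| <= 3."""
    return math.sin(3.0 * y)


# Lipschitz constants assumed for f and g in this example.
L, M = 2.0, 3.0


def max_ratio(h, points):
    """Largest observed |h(a) - h(b)| / |a - b| over distinct sample pairs."""
    return max(
        abs(h(a) - h(b)) / abs(a - b)
        for a, b in itertools.combinations(points, 2)
    )


def composition(x: float) -> float:
    """The composition g o f, evaluated as g(f(x))."""
    return g(f(x))


# A finite grid of distinct sample points in [-5, 5].
points = [k / 10.0 for k in range(-50, 51)]

observed = max_ratio(composition, points)

# Small tolerance to absorb floating-point rounding.
bound_holds = observed <= M * L + 1e-9
print(f"largest observed ratio: {observed:.6f}, bound ML = {M * L}")
print(f"bound holds on the sample: {bound_holds}")
```

A finite sample can only fail to contradict the estimate; it cannot establish it. The argument above is what guarantees the bound for every pair of points.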