[proofplan]
We show that a smooth map $x \mapsto y(x)$ is canonical if and only if its Jacobian matrix $Jy_x$ is symplectic at every point. The argument linearises the problem: for an infinitesimal displacement, the nonlinear map acts as its Jacobian, and the linear canonical transformation theorem (theorem 1337) characterises when a linear map preserves Hamilton's equations. We verify both directions: the chain rule transforms Hamilton's equations to the new coordinates, producing $\dot{y} = Jy_x \cdot J \cdot (Jy_x)^\top \, \partial \widetilde{H} / \partial y$, which equals $J \, \partial \widetilde{H} / \partial y$ for all $\widetilde{H}$ if and only if $Jy_x \cdot J \cdot (Jy_x)^\top = J$ everywhere.
[/proofplan]
[step:Transform Hamilton's equations under the smooth map $y = y(x)$]
Let $y: \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ be a smooth diffeomorphism. By the chain rule, the time derivative of $y$ along a trajectory $x(t)$ is
\begin{align*}
\dot{y}_i = \sum_{j=1}^{2n} \frac{\partial y_i}{\partial x_j} \dot{x}_j,
\end{align*}
which in matrix notation reads $\dot{y} = Jy_x \cdot \dot{x}$, where $Jy_x$ is the Jacobian matrix with entries $(Jy_x)_{ij} = \partial y_i / \partial x_j$.
Substituting Hamilton's equations $\dot{x} = J \, \partial H / \partial x$:
\begin{align*}
\dot{y} = Jy_x \cdot J \frac{\partial H}{\partial x}.
\end{align*}
[/step]
[step:Express $\partial H / \partial x$ in terms of $\partial \widetilde{H} / \partial y$ via the chain rule]
Define $\widetilde{H}(y) = H(x(y))$, where $x(y)$ denotes the inverse map $y^{-1}$, which exists because $y$ is a diffeomorphism. Then $H(x) = \widetilde{H}(y(x))$, and the chain rule gives
\begin{align*}
\frac{\partial H}{\partial x_i} = \sum_{k=1}^{2n} \frac{\partial \widetilde{H}}{\partial y_k} \cdot \frac{\partial y_k}{\partial x_i} = \sum_{k=1}^{2n} (Jy_x)_{ki} \frac{\partial \widetilde{H}}{\partial y_k},
\end{align*}
so in matrix notation
\begin{align*}
\frac{\partial H}{\partial x} = (Jy_x)^\top \frac{\partial \widetilde{H}}{\partial y}.
\end{align*}
[/step]
[step:Substitute and identify the symplecticity condition]
Substituting into $\dot{y} = Jy_x \cdot J \, \partial H / \partial x$:
\begin{align*}
\dot{y} = Jy_x \cdot J \cdot (Jy_x)^\top \frac{\partial \widetilde{H}}{\partial y}.
\end{align*}
The transformation is canonical if and only if $\dot{y} = J \, \partial \widetilde{H} / \partial y$ for every smooth Hamiltonian $H$. Since $H$ is arbitrary, the gradient $\partial \widetilde{H} / \partial y$ can be prescribed to be any vector at any point (for instance by choosing $\widetilde{H}(y) = v \cdot y$ for an arbitrary $v \in \mathbb{R}^{2n}$). Therefore the equation
\begin{align*}
Jy_x \cdot J \cdot (Jy_x)^\top \frac{\partial \widetilde{H}}{\partial y} = J \frac{\partial \widetilde{H}}{\partial y}
\end{align*}
holds for all vectors $\partial \widetilde{H} / \partial y$ if and only if
\begin{align*}
Jy_x \cdot J \cdot (Jy_x)^\top = J
\end{align*}
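at every point $x$: every point of the domain is the initial condition of some trajectory $x(t)$, so the identity cannot fail anywhere, and $Jy_x$ must be symplectic everywhere. Conversely, if $Jy_x$ is symplectic at every $x$, then the expression for $\dot{y}$ obtained above reduces to $J \, \partial \widetilde{H} / \partial y$ for every $H$, so the map is canonical.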
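The passage from vectors to matrices can be spelled out as follows; the shorthand $M = Jy_x$ and the standard basis vectors $e_k$ of $\mathbb{R}^{2n}$ are notation introduced only for this remark. Fix a point $x_0$ and take $\widetilde{H}(y) = e_k \cdot y$, so that $\partial \widetilde{H} / \partial y = e_k$ everywhere. The canonical condition at $x_0$ then reads
\begin{align*}
M(x_0) \cdot J \cdot M(x_0)^\top e_k = J e_k, \qquad k = 1, \dots, 2n,
\end{align*}
that is, the $k$-th columns of $M(x_0) \cdot J \cdot M(x_0)^\top$ and of $J$ agree. Running over all $k$ gives $M(x_0) \cdot J \cdot M(x_0)^\top = J$.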
[guided]
This proof is the direct generalisation of the linear case (theorem 1337). In the linear case, $y = Ax$ gives $Jy_x = A$ at every point, and the condition reduces to $AJA^\top = J$. For a nonlinear diffeomorphism, the Jacobian $Jy_x$ varies from point to point, so the symplecticity condition must be imposed at each $x$ separately.
The argument hinges on the same ``for every Hamiltonian'' universality as the linear case. A diffeomorphism might happen to preserve the Hamiltonian form of the equations for one particular $H$ without being canonical; being canonical requires that it do so for every $H$ simultaneously. This is equivalent to preserving the symplectic structure $J$ pointwise.
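Note that the condition $Jy_x \cdot J \cdot (Jy_x)^\top = J$ for all $x$ can be restated invariantly: the map $y$ preserves the symplectic 2-form $\omega = \sum_{i=1}^n dq_i \wedge dp_i$. In matrix form, preservation of $\omega$ reads $(Jy_x)^\top \cdot J \cdot Jy_x = J$, which is equivalent to the condition above because the transpose of a symplectic matrix is again symplectic. A smooth diffeomorphism that preserves $\omega$ is called a symplectomorphism. The theorem therefore states that canonical transformations and symplectomorphisms coincide.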
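As an illustration, consider a single degree of freedom ($n = 1$) with coordinates $x = (q, p)$. This sketch assumes the convention $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, which the text above does not fix; the conclusion is the same under the opposite sign. For any $2 \times 2$ matrix $M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$,
\begin{align*}
M J M^\top = \begin{pmatrix} -b & a \\ -d & c \end{pmatrix} \begin{pmatrix} a & c \\ b & d \end{pmatrix} = \begin{pmatrix} 0 & ad - bc \\ -(ad - bc) & 0 \end{pmatrix} = (\det M) \, J,
\end{align*}
so for $n = 1$ the symplecticity condition is simply $\det Jy_x = 1$ at every point. The shear $y(q, p) = (q, \, p + f(q))$, with $f$ an arbitrary smooth function chosen here for illustration, has
\begin{align*}
Jy_x = \begin{pmatrix} 1 & 0 \\ f'(q) & 1 \end{pmatrix}, \qquad \det Jy_x = 1,
\end{align*}
and is therefore canonical even though its Jacobian varies with $q$. By contrast, $y(q, p) = (q + q^3, \, p)$ is a diffeomorphism with $\det Jy_x = 1 + 3q^2$, which equals $1$ only on the line $q = 0$, so it is not canonical.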
[/guided]
[/step]