[proofplan]
We show that a linear map $y = Ax$ is canonical if and only if $AJA^\top = J$. Starting from Hamilton's equations $\dot{x} = J \, \partial H / \partial x$ in the original coordinates, we transform to the new coordinates $y = Ax$ using the chain rule to express $\partial H / \partial x$ in terms of $\partial \widetilde{H} / \partial y$, and substitute into the transformed equation $\dot{y} = A\dot{x}$ to determine when the result has Hamiltonian form $\dot{y} = J \, \partial \widetilde{H} / \partial y$.
[/proofplan]
[step:Express the dynamics in the new coordinates $y = Ax$]
Since $A$ is a constant matrix and $y = Ax$, differentiation with respect to time gives
\begin{align*}
\dot{y} = A \dot{x}.
\end{align*}
By Hamilton's equations in the original coordinates,
\begin{align*}
\dot{x} = J \frac{\partial H}{\partial x},
\end{align*}
so
\begin{align*}
\dot{y} = A J \frac{\partial H}{\partial x}.
\end{align*}
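As a concrete illustration (not part of the original argument), assume one degree of freedom with $x = (q, p)$, the standard matrix $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, and the harmonic oscillator $H(x) = \tfrac12 (q^2 + p^2)$; all three choices are assumptions made here for the example. Then $\partial H / \partial x = x$, and the two equations above become
\begin{align*}
\dot{x} = J x, \qquad \dot{y} = A J x.
\end{align*}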
[/step]
[step:Use the chain rule to express $\partial H / \partial x$ in terms of $\partial \widetilde{H} / \partial y$]
Since $y = Ax$ is a change of coordinates, $A$ is invertible, and we may define $\widetilde{H}(y) = H(A^{-1}y)$, so that $\widetilde{H}(Ax) = H(x)$. The chain rule applied to $H(x) = \widetilde{H}(Ax)$ gives
\begin{align*}
\frac{\partial H}{\partial x_i} = \sum_{j=1}^{2n} \frac{\partial \widetilde{H}}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_i} = \sum_{j=1}^{2n} A_{ji} \frac{\partial \widetilde{H}}{\partial y_j},
\end{align*}
since $y_j = \sum_i A_{ji} x_i$. In matrix notation this reads
\begin{align*}
\frac{\partial H}{\partial x} = A^\top \frac{\partial \widetilde{H}}{\partial y}.
\end{align*}
[guided]
The index computation deserves care. We have $y_j = (Ax)_j = \sum_i A_{ji} x_i$, so $\partial y_j / \partial x_i = A_{ji}$. Applying the multivariate chain rule to $H(x) = \widetilde{H}(y(x))$:
\begin{align*}
\frac{\partial H}{\partial x_i} = \sum_{j=1}^{2n} \frac{\partial \widetilde{H}}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_i} = \sum_{j=1}^{2n} A_{ji} \frac{\partial \widetilde{H}}{\partial y_j}.
\end{align*}
The sum $\sum_j A_{ji} \, \partial \widetilde{H} / \partial y_j$ is precisely the $i$-th component of $A^\top \, \partial \widetilde{H} / \partial y$, since $(A^\top)_{ij} = A_{ji}$.
[/guided]
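As a sanity check on the transpose, the following worked example (an illustration, not part of the original argument) takes a quadratic Hamiltonian $H(x) = \tfrac12 x^\top S x$, where the symmetric matrix $S$ is an assumption introduced here, and verifies the matrix identity directly, using the invertibility of $A$:
\begin{align*}
\widetilde{H}(y) &= H(A^{-1}y) = \tfrac12 \, y^\top A^{-\top} S A^{-1} y, \\
\frac{\partial \widetilde{H}}{\partial y} &= A^{-\top} S A^{-1} y, \\
A^\top \frac{\partial \widetilde{H}}{\partial y} &= S A^{-1} y = S x = \frac{\partial H}{\partial x}.
\end{align*}
Had the chain rule produced $A$ in place of $A^\top$, the last line would read $A A^{-\top} S A^{-1} y$, which does not equal $Sx$ in general.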
[/step]
[step:Substitute and identify the condition for Hamiltonian form]
Substituting $\partial H / \partial x = A^\top \, \partial \widetilde{H} / \partial y$ into $\dot{y} = AJ \, \partial H / \partial x$:
\begin{align*}
\dot{y} = A J A^\top \frac{\partial \widetilde{H}}{\partial y}.
\end{align*}
The transformation $y = Ax$ is canonical if and only if $\dot{y} = J \, \partial \widetilde{H} / \partial y$ for every choice of Hamiltonian $H$. Comparing with the expression above, this is the requirement that
\begin{align*}
A J A^\top \frac{\partial \widetilde{H}}{\partial y} = J \frac{\partial \widetilde{H}}{\partial y}
\end{align*}
at every point, for every smooth $H: M \to \mathbb{R}$. As $H$ ranges over all smooth functions, so does $\widetilde{H} = H \circ A^{-1}$, so the vector $\partial \widetilde{H} / \partial y$ can be made to take any value at any point by choosing $\widetilde{H}$ appropriately. The requirement therefore holds if and only if $(AJA^\top - J)v = 0$ for every vector $v$, that is, if and only if $AJA^\top = J$.
[guided]
The ``for every Hamiltonian'' quantifier is doing essential work here. If we only required the condition to hold for a single specific $H$, we would get a weaker condition: the transformation would need to preserve Hamilton's equations for that particular system, but might fail for others. Canonicity demands that the transformation preserve the Hamiltonian structure itself, not just one particular Hamiltonian flow. This is why the condition $AJA^\top = J$ is purely geometric (a condition on $A$ and $J$) with no reference to any specific $H$.
To see that $\partial \widetilde{H} / \partial y$ can be prescribed to be any vector $v \in \mathbb{R}^{2n}$ at a given point $y_0$, take the linear function $\widetilde{H}(y) = v \cdot y$, which corresponds to the smooth Hamiltonian $H(x) = v \cdot Ax$. Then $\partial \widetilde{H} / \partial y = v$ at every point, in particular at $y_0$. Since $(AJA^\top - J)v = 0$ must hold for every $v \in \mathbb{R}^{2n}$, we conclude $AJA^\top - J = 0$.
[/guided]
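For a concrete instance, consider one degree of freedom ($n = 1$). The sketch below assumes the standard form $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, which the argument above does not fix explicitly, and a generic matrix $A$ with entries $a, b, c, d$ (names chosen here for illustration):
\begin{align*}
A J A^\top
= \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\begin{pmatrix} a & c \\ b & d \end{pmatrix}
= \begin{pmatrix} -b & a \\ -d & c \end{pmatrix}
\begin{pmatrix} a & c \\ b & d \end{pmatrix}
= \begin{pmatrix} 0 & ad - bc \\ -(ad - bc) & 0 \end{pmatrix}
= (\det A) \, J.
\end{align*}
Under these assumptions, the condition $AJA^\top = J$ reduces to $\det A = 1$. The uniform scaling $A = 2I$, for example, gives $AJA^\top = 4J$, so $\dot{y} = 4J \, \partial \widetilde{H} / \partial y$, which is not of the form $\dot{y} = J \, \partial \widetilde{H} / \partial y$ for general $H$.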
[/step]