[proofplan]
We first reduce controllability between arbitrary endpoints to reachability from the origin at a fixed time $T>0$. We then identify the reachable set with the range of the controllability matrix $\begin{bmatrix}B & AB & \cdots & A^{n-1}B\end{bmatrix}$ by proving both inclusions: the reachable set lies in this range by the matrix polynomial identity supplied by Cayley-Hamilton, and the reverse inclusion follows from an orthogonal-complement argument using the analyticity of $t \mapsto B^\top e^{tA^\top}y$. Once the reachable set equals this range, controllability is exactly the assertion that the range is all of $\mathbb{R}^n$, which is equivalent to the controllability matrix having rank $n$.
[/proofplan]
[step:Reduce controllability to reachability from the origin]
Fix $T>0$. Throughout the proof, $\mathcal{L}^1$ denotes one-dimensional [Lebesgue measure](/page/Lebesgue%20Measure) on $(0,T)$. For a control $u \in L^2((0,T);\mathbb{R}^m)$, the endpoint of the solution of $\dot{x}=Ax+Bu$ starting from $x_0 \in \mathbb{R}^n$ is given by the variation-of-constants formula
\begin{align*}
x(T) = e^{TA}x_0 + \int_0^{\!T}e^{(T-s)A}B u(s)\,d\mathcal{L}^1(s).
\end{align*}
Define the time-$T$ reachable subspace from the origin by
\begin{align*}
\mathcal{R}_T := \left\{\int_0^{\!T}e^{(T-s)A}B u(s)\,d\mathcal{L}^1(s) : u \in L^2((0,T);\mathbb{R}^m)\right\} \subseteq \mathbb{R}^n.
\end{align*}
Then the system can be steered from $x_0$ to $x_1$ in time $T$ if and only if
\begin{align*}
x_1 - e^{TA}x_0 \in \mathcal{R}_T.
\end{align*}
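If $\mathcal{R}_T=\mathbb{R}^n$, this condition holds for all $x_0,x_1 \in \mathbb{R}^n$. Conversely, suppose every $x_0$ can be steered to every $x_1$ in time $T$. Since $e^{TA}$ is a [linear map](/page/Linear%20Map), it sends $x_0=0$ to $0$, so this choice of $x_0$ shows that every $x_1 \in \mathbb{R}^n$ lies in $\mathcal{R}_T$. Hence controllability in time $T$ is equivalent to $\mathcal{R}_T=\mathbb{R}^n$, and the pair $(A,B)$ is controllable in every time $T>0$ if and only if $\mathcal{R}_T=\mathbb{R}^n$ for every $T>0$.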
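The reduction can be illustrated numerically. The following is a minimal pure-Python sketch, assuming the illustrative double-integrator matrix $A=\begin{bmatrix}0&1\\0&0\end{bmatrix}$; the helper names `mat_mul`, `mat_exp` and `required_displacement` are hypothetical, and $e^{TA}$ is approximated by a truncated power series (exact here because $A^2=0$). It computes the vector $x_1-e^{TA}x_0$ that must lie in $\mathcal{R}_T$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def mat_exp(A: Matrix, t: float, terms: int = 30) -> Matrix:
    """Partial sum of e^{tA} = sum_j t^j A^j / j! using `terms` terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds t^j A^j / j!
    for j in range(1, terms):
        term = [[entry * t / j for entry in row] for row in mat_mul(term, A)]
        result = [[r + s for r, s in zip(r_row, s_row)]
                  for r_row, s_row in zip(result, term)]
    return result


def required_displacement(A: Matrix, T: float, x0: Vector, x1: Vector) -> Vector:
    """Return x1 - e^{TA} x0; steering x0 to x1 in time T needs this in R_T."""
    E = mat_exp(A, T)
    drift = [sum(E[i][j] * x0[j] for j in range(len(x0))) for i in range(len(E))]
    return [target - d for target, d in zip(x1, drift)]


# Double integrator: A is nilpotent, so the series stops after the linear term.
A = [[0.0, 1.0], [0.0, 0.0]]
print(required_displacement(A, 2.0, x0=[1.0, 1.0], x1=[0.0, 0.0]))  # [-3.0, -1.0]
```

Taking `x0` equal to the zero vector returns `x1` itself, which is the choice used in the converse direction above.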
[/step]
[step:Show every reachable vector lies in the Kalman span]
Define the controllability matrix $K \in \mathbb{R}^{n \times nm}$ by
\begin{align*}
K := \begin{bmatrix}B & AB & A^2B & \cdots & A^{n-1}B\end{bmatrix},
\end{align*}
and define its range
\begin{align*}
W := \operatorname{Range}(K) \subseteq \mathbb{R}^n.
\end{align*}
Equivalently,
\begin{align*}
W = \operatorname{span}\{A^kBv : k \in \{0,\dots,n-1\},\ v \in \mathbb{R}^m\}.
\end{align*}
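Forming $K$ is a purely mechanical block construction. A minimal sketch in plain Python, assuming matrices are stored as lists of rows; the function names and the double-integrator example are illustrative, not part of the proof.

```python
from typing import List

Matrix = List[List[int]]


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def controllability_matrix(A: Matrix, B: Matrix) -> Matrix:
    """Return K = [B, AB, ..., A^{n-1}B] as an n x (n*m) matrix."""
    n = len(A)
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(mat_mul(A, blocks[-1]))  # next block A^k B
    # Row i of K is row i of every block, concatenated left to right.
    return [[entry for block in blocks for entry in block[i]] for i in range(n)]


# Double integrator x1' = x2, x2' = u.
A = [[0, 1], [0, 0]]
B = [[0], [1]]
print(controllability_matrix(A, B))  # [[0, 1], [1, 0]]
```

The columns of the returned matrix are exactly the vectors $A^kBe_j$ whose span is $W$.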
We claim that $\mathcal{R}_T \subseteq W$. By the [Cayley-Hamilton Theorem](/theorems/865) and induction on $j$, every power $A^j$ with $j \ge n$ is a linear combination of $I,A,\dots,A^{n-1}$. Consequently $A^jBv \in W$ for every $j \ge 0$ and every $v \in \mathbb{R}^m$, so every partial sum of the matrix exponential series
\begin{align*}
e^{tA} = \sum_{j=0}^{\infty} \frac{t^jA^j}{j!}
\end{align*}
sends $\operatorname{Range}(B)$ into $W$. The matrix exponential series converges in the finite-dimensional normed space $\mathbb{R}^{n \times n}$, and $W$ is closed because it is a finite-dimensional linear subspace of $\mathbb{R}^n$. Passing to the limit gives
\begin{align*}
\operatorname{Range}(e^{tA}B) \subseteq W.
\end{align*}
Therefore, for every $s \in (0,T)$ and every control value $u(s) \in \mathbb{R}^m$,
\begin{align*}
e^{(T-s)A}Bu(s) \in W.
\end{align*}
For fixed $T>0$, the interval $(0,T)$ has finite $\mathcal{L}^1$-measure. Since $u \in L^2((0,T);\mathbb{R}^m)$, we have $u \in L^1((0,T);\mathbb{R}^m)$ by Hölder's inequality with exponents $2$ and $2$. The map $s \mapsto e^{(T-s)A}B$ is continuous on $[0,T]$, hence bounded in operator norm, so the integrand $s \mapsto e^{(T-s)A}Bu(s)$ is Bochner integrable. Since this integrand takes values in the closed subspace $W$ for $\mathcal{L}^1$-almost every $s \in (0,T)$, its Bochner integral belongs to $W$. Thus every vector in $\mathcal{R}_T$ lies in $W$.
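The only algebraic input here is the Cayley-Hamilton identity $p_A(A)=0$, which rearranges to $A^n=-\sum_{k<n}c_kA^k$ when $p_A(\lambda)=\lambda^n+\sum_{k<n}c_k\lambda^k$. A small exact check in pure Python, as a sketch: the characteristic coefficients are computed with the Faddeev-LeVerrier recursion (a swapped-in algorithm that the proof does not use), and the sample matrix is an arbitrary illustration.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[Fraction]]


def identity(n: int) -> Matrix:
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def char_poly_coeffs(A: Matrix) -> List[Fraction]:
    """Coefficients c_0, ..., c_n of det(lambda*I - A) via Faddeev-LeVerrier."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]  # M_0 = 0
    I = identity(n)
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + c[n - k + 1] * I[i][j] for j in range(n)] for i in range(n)]
        AM = mat_mul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k  # -tr(A M_k) / k
    return c


def poly_at_matrix(c: List[Fraction], A: Matrix) -> Matrix:
    """Evaluate c_0 I + c_1 A + ... + c_n A^n exactly."""
    n = len(A)
    total = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)
    for coeff in c:
        total = [[total[i][j] + coeff * power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, A)
    return total


A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [0, 1, 3], [1, 0, 1]]]
coeffs = char_poly_coeffs(A)
print([int(x) for x in coeffs])  # [-5, 5, -4, 1]
print(all(entry == 0 for row in poly_at_matrix(coeffs, A) for entry in row))  # True
```

For this matrix the identity reads $A^3=4A^2-5A+5I$, so the columns of $A^3B$ lie in the span of the columns of $B$, $AB$ and $A^2B$, for any $B$ with three rows.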
[guided]
We want to prove that no control can produce a terminal displacement outside the span of the columns of
\begin{align*}
K = \begin{bmatrix}B & AB & A^2B & \cdots & A^{n-1}B\end{bmatrix}.
\end{align*}
Let
\begin{align*}
W := \operatorname{Range}(K).
\end{align*}
This means that $W$ consists exactly of finite linear combinations of vectors of the form $A^kBv$, where $k \in \{0,\dots,n-1\}$ and $v \in \mathbb{R}^m$.
The integrand in the reachable vector is $e^{(T-s)A}Bu(s)$, so we need to know where the columns of $e^{tA}B$ live. By the [Cayley-Hamilton Theorem](/theorems/865), the matrix $A$ satisfies its own characteristic polynomial of degree $n$. Consequently every power $A^j$ with $j \ge n$ can be rewritten as a linear combination of $I,A,\dots,A^{n-1}$. The partial sums of the matrix exponential series
\begin{align*}
e^{tA} = \sum_{j=0}^{\infty} \frac{t^jA^j}{j!}
\end{align*}
therefore send $\operatorname{Range}(B)$ into $W$. The matrix exponential series converges in $\mathbb{R}^{n \times n}$, and since $W$ is a finite-dimensional subspace of $\mathbb{R}^n$, it is closed. Therefore the limit of these partial sums also sends $\operatorname{Range}(B)$ into $W$. Hence
\begin{align*}
\operatorname{Range}(e^{tA}B) \subseteq W
\end{align*}
for every $t \in \mathbb{R}$.
Now fix $u \in L^2((0,T);\mathbb{R}^m)$. For each $s \in (0,T)$, the vector $Bu(s)$ lies in $\operatorname{Range}(B)$, and applying $e^{(T-s)A}$ keeps the result inside $W$ by the preceding paragraph:
\begin{align*}
e^{(T-s)A}Bu(s) \in W.
\end{align*}
We have already observed that $W$ is closed. We also need the displayed integral to be a genuine Bochner integral. The interval $(0,T)$ has finite $\mathcal{L}^1$-measure, so $u \in L^2((0,T);\mathbb{R}^m)$ implies $u \in L^1((0,T);\mathbb{R}^m)$ by Hölder's inequality with exponents $2$ and $2$. The map $s \mapsto e^{(T-s)A}B$ is continuous on the compact interval $[0,T]$, hence bounded in operator norm. Thus $s \mapsto e^{(T-s)A}Bu(s)$ is Bochner integrable. A Bochner integral of an integrable map whose values lie almost everywhere in a closed linear subspace lies in that subspace, so
\begin{align*}
\int_0^{\!T}e^{(T-s)A}B u(s)\,d\mathcal{L}^1(s) \in W.
\end{align*}
Since $u$ was arbitrary, this proves $\mathcal{R}_T \subseteq W$.
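The closedness step can be seen concretely through Riemann sums: each midpoint sum is a finite linear combination of vectors $e^{(T-s_i)A}Bu(s_i) \in W$, so it lies in $W$, and the integral of a continuous control is their limit. A minimal sketch, assuming the hypothetical uncontrollable pair $A=\operatorname{diag}(1,2)$, $B=(1,0)^\top$, for which $W=\operatorname{span}\{(1,0)^\top\}$; the midpoint rule and the function name are illustrative choices.

```python
import math
from typing import Callable, List


def riemann_reachable_vector(T: float, u: Callable[[float], float],
                             steps: int = 1000) -> List[float]:
    """Midpoint Riemann sum for int_0^T e^{(T-s)A} B u(s) ds with the
    illustrative pair A = diag(1, 2), B = (1, 0)^T (scalar control, m = 1)."""
    h = T / steps
    total = [0.0, 0.0]
    for i in range(steps):
        s = (i + 0.5) * h
        # e^{(T-s)A} B = (e^{T-s}, 0)^T because A is diagonal and B = e_1.
        column = [math.exp(T - s), 0.0]
        total = [acc + h * c * u(s) for acc, c in zip(total, column)]
    return total


# Every partial sum lies in W = span{e_1}: the second coordinate stays 0.
for control in (math.sin, lambda s: s * s - 3.0, lambda s: 1.0):
    print(riemann_reachable_vector(1.0, control)[1])  # 0.0
```

No choice of control moves the second coordinate, matching $\mathcal{R}_T \subseteq W \neq \mathbb{R}^2$ for this pair.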
[/guided]
[/step]
[step:Show the Kalman span lies in the reachable subspace]
We prove the reverse inclusion by orthogonal complements. Let $y \in \mathbb{R}^n$ satisfy $y \perp \mathcal{R}_T$ with respect to the Euclidean [inner product](/page/Inner%20Product). Define the continuous map
\begin{align*}
g:(0,T) \to \mathbb{R}^m,\qquad s \mapsto B^\top e^{(T-s)A^\top}y.
\end{align*}
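Since $g$ is continuous on the bounded interval $(0,T)$ and extends continuously to $[0,T]$, it is bounded, so $g \in L^2((0,T);\mathbb{R}^m)$. For every $u \in L^2((0,T);\mathbb{R}^m)$, the integral below lies in $\mathcal{R}_T$, so its inner product with $y$ vanishes. Moving the linear functional $z \mapsto y \cdot z$ inside the Bochner integral and using $y \cdot e^{(T-s)A}Bu(s) = \bigl(B^\top e^{(T-s)A^\top}y\bigr) \cdot u(s)$, we obtain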
\begin{align*}
0 = y \cdot \int_0^{\!T}e^{(T-s)A}B u(s)\,d\mathcal{L}^1(s)
= \int_0^{\!T}g(s)\cdot u(s)\,d\mathcal{L}^1(s).
\end{align*}
In particular, $g$ is itself an admissible control. Taking $u=g$ gives $\int_0^{\!T}|g(s)|^2\,d\mathcal{L}^1(s)=0$, so $g$ is the zero element of the [Hilbert space](/page/Hilbert%20Space) $L^2((0,T);\mathbb{R}^m)$. Hence
\begin{align*}
B^\top e^{(T-s)A^\top}y = 0
\end{align*}
for $\mathcal{L}^1$-almost every $s \in (0,T)$. Since $g$ is continuous, $g(s)=0$ for every $s \in (0,T)$.
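The vanishing of $\int_0^T g\cdot u$ for all $u$ can be recast as a quadratic form: choosing $u=g$ gives $\int_0^T|g(s)|^2\,d\mathcal{L}^1(s)=y^\top G_T\,y$ with $G_T:=\int_0^T e^{(T-s)A}BB^\top e^{(T-s)A^\top}\,d\mathcal{L}^1(s)$, a matrix usually called the controllability Gramian (it is not used elsewhere in this proof). A minimal sketch, assuming the double integrator $A=\begin{bmatrix}0&1\\0&0\end{bmatrix}$, $B=(0,1)^\top$, where $e^{(T-s)A}B=(T-s,1)^\top$ and $G_T=\begin{bmatrix}T^3/3&T^2/2\\T^2/2&T\end{bmatrix}$; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(T: float, steps: int = 2000) -> Matrix:
    """Midpoint-rule approximation of G_T for the double integrator."""
    h = T / steps
    G = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(steps):
        s = (i + 0.5) * h
        v = [T - s, 1.0]  # e^{(T-s)A} B
        for p in range(2):
            for q in range(2):
                G[p][q] += h * v[p] * v[q]
    return G


def quadratic_form(G: Matrix, y: List[float]) -> float:
    """y^T G y, i.e. the integral of |g(s)|^2 for g(s) = B^T e^{(T-s)A^T} y."""
    return sum(y[p] * G[p][q] * y[q] for p in range(2) for q in range(2))


T = 1.5
G = gram_matrix(T)
exact = [[T**3 / 3, T**2 / 2], [T**2 / 2, T]]
print(max(abs(G[p][q] - exact[p][q]) for p in range(2) for q in range(2)) < 1e-6)  # True
# det G_T = T^4/12 > 0, so y^T G_T y = 0 forces y = 0: no nonzero y is orthogonal to R_T.
print(quadratic_form(G, [1.0, -1.0]) > 0)  # True
```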
Define
\begin{align*}
h:\mathbb{R} \to \mathbb{R}^m,\qquad t \mapsto B^\top e^{tA^\top}y.
\end{align*}
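Since $h(t)=g(T-t)$ and $T-t \in (0,T)$ whenever $t \in (0,T)$, we have $h(t)=0$ for every $t \in (0,T)$. Each entry of $e^{tA^\top}$ is given by a power series in $t$ that converges for all $t \in \mathbb{R}$, so $h$ is real analytic on $\mathbb{R}$. Since $h$ vanishes on the nonempty open interval $(0,T)$, the identity theorem for real analytic functions gives $h(t)=0$ for every $t \in \mathbb{R}$. Therefore all derivatives of $h$ at $t=0$ vanish. Differentiating the exponential series term by term, for each integer $k \ge 0$,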
\begin{align*}
h^{(k)}(0) = B^\top (A^\top)^k y.
\end{align*}
Thus
\begin{align*}
B^\top (A^\top)^k y = 0
\end{align*}
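for every integer $k \ge 0$, and in particular for every $k \in \{0,\dots,n-1\}$. Since $y \cdot A^kBv = \bigl(B^\top (A^\top)^k y\bigr) \cdot v$, this is equivalent to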
\begin{align*}
y \cdot A^kBv = 0
\end{align*}
for every $k \in \{0,\dots,n-1\}$ and every $v \in \mathbb{R}^m$. Hence $y \perp W$.
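We have shown $\mathcal{R}_T^\perp \subseteq W^\perp$. Both $W$ and $\mathcal{R}_T$ are linear subspaces of $\mathbb{R}^n$ (the latter is the image of the linear map $u \mapsto \int_0^{\!T}e^{(T-s)A}Bu(s)\,d\mathcal{L}^1(s)$), so each equals its double orthogonal complement in the finite-dimensional Hilbert space $\mathbb{R}^n$. Since taking orthogonal complements reverses inclusions, we obtain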
\begin{align*}
W \subseteq \mathcal{R}_T.
\end{align*}
Together with $\mathcal{R}_T \subseteq W$, this proves $\mathcal{R}_T=W$.
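The derivative identity $h^{(k)}(0)=B^\top(A^\top)^ky$ and the duality $y\cdot A^kBv=\bigl(B^\top(A^\top)^ky\bigr)\cdot v$ used above can be checked on small integer examples. A minimal plain-Python sketch; the helper names and the example pairs are illustrative assumptions.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]


def mat_vec(X: Matrix, v: Vector) -> Vector:
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def taylor_coefficients(A: Matrix, B: Matrix, y: Vector) -> List[Vector]:
    """[h^(k)(0) for k = 0..n-1] with h(t) = B^T e^{tA^T} y, via B^T (A^T)^k y."""
    At, Bt = transpose(A), transpose(B)
    coeffs, w = [], list(y)  # w holds (A^T)^k y
    for _ in range(len(A)):
        coeffs.append(mat_vec(Bt, w))
        w = mat_vec(At, w)
    return coeffs


def pairing(A: Matrix, B: Matrix, y: Vector, k: int, v: Vector) -> int:
    """y . (A^k B v), computed directly on the column side."""
    x = mat_vec(B, v)
    for _ in range(k):
        x = mat_vec(A, x)
    return sum(a * b for a, b in zip(y, x))


A = [[0, 1], [0, 0]]
# Uncontrollable input B = e_1: y = e_2 is orthogonal to W, so h vanishes to all orders.
print(taylor_coefficients(A, [[1], [0]], [0, 1]))  # [[0], [0]]
# Controllable input B = e_2: y = e_1 gives h'(0) = 1, so y is not orthogonal to W.
print(taylor_coefficients(A, [[0], [1]], [1, 0]))  # [[0], [1]]
```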
[/step]
[step:Convert equality of subspaces into the rank condition]
For every $T>0$, we have proved
\begin{align*}
\mathcal{R}_T = \operatorname{Range}\begin{bmatrix}B & AB & A^2B & \cdots & A^{n-1}B\end{bmatrix}.
\end{align*}
By the first step, controllability is equivalent to $\mathcal{R}_T=\mathbb{R}^n$ for every $T>0$. Therefore controllability is equivalent to
\begin{align*}
\operatorname{Range}\begin{bmatrix}B & AB & A^2B & \cdots & A^{n-1}B\end{bmatrix} = \mathbb{R}^n.
\end{align*}
For an $n \times nm$ real matrix, its range is all of $\mathbb{R}^n$ if and only if its rank is $n$. Hence $(A,B)$ is controllable if and only if
\begin{align*}
\operatorname{rank}\begin{bmatrix}B & AB & A^2B & \cdots & A^{n-1}B\end{bmatrix} = n.
\end{align*}
This is the desired equivalence.
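The rank condition is directly checkable. A minimal sketch in plain Python, assuming exact rational arithmetic via `fractions.Fraction` so that rank decisions are not affected by floating-point round-off; `controllability_matrix`, `rank` and `is_controllable` are hypothetical helper names, and the example pairs are illustrations.

```python
from fractions import Fraction
from typing import List, Sequence

Matrix = List[List[int]]


def mat_mul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def controllability_matrix(A: Matrix, B: Matrix) -> Matrix:
    """K = [B, AB, ..., A^{n-1}B] as an n x (n*m) matrix."""
    blocks = [B]
    for _ in range(len(A) - 1):
        blocks.append(mat_mul(A, blocks[-1]))
    return [[x for block in blocks for x in block[i]] for i in range(len(A))]


def rank(M: Sequence[Sequence[int]]) -> int:
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_controllable(A: Matrix, B: Matrix) -> bool:
    """Kalman test: rank [B, AB, ..., A^{n-1}B] == n."""
    return rank(controllability_matrix(A, B)) == len(A)


print(is_controllable([[0, 1], [0, 0]], [[0], [1]]))  # True  (double integrator)
print(is_controllable([[0, 1], [0, 0]], [[1], [0]]))  # False (AB = 0)
print(is_controllable([[1, 0], [0, 1]], [[1], [1]]))  # False (K = [[1, 1], [1, 1]])
print(is_controllable([[1, 0], [0, 2]], [[1], [1]]))  # True  (K = [[1, 1], [1, 2]])
```

The third example shows that a single input cannot control two identical modes, while the fourth, with distinct eigenvalues, passes the test.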
[/step]