[proofplan]
Throughout the proof we assume the scalar field is $\mathbb{C}$ (or more generally any splitting field for $\chi_A$), so that $\chi_A$ factors into linear factors and the algebraic multiplicities sum to $n$. The argument runs along the definitional chain: $A$ is diagonalisable iff $\mathbb{C}^n$ admits a basis of eigenvectors of $A$ iff the total count of linearly independent eigenvectors equals $n$. We identify this total count with $\sum_\lambda m_\lambda$, the sum of geometric multiplicities, using the independence of eigenvectors drawn from distinct eigenspaces. Comparing against the identity $\sum_\lambda M_\lambda = n$ and the bound $m_\lambda \le M_\lambda$ turns the condition $\sum_\lambda m_\lambda = n$ into the pointwise equality $m_\lambda = M_\lambda$ for every eigenvalue.
[/proofplan]
<!-- GAP: the identity $\sum_\lambda M_\lambda = n$ requires that $\chi_A$ splits over the scalar field; this holds over $\mathbb{C}$ always but not over $\mathbb{R}$ in general. -->
[step:Count the total number of linearly independent eigenvectors as $\sum_\lambda m_\lambda$]
Let $\lambda_1, \dots, \lambda_r$ be the distinct [eigenvalues](/page/Eigenvalue%20and%20Eigenvector) of $A$ and let $E_{\lambda_k} = \ker(A - \lambda_k I) \subseteq \mathbb{C}^n$ be the eigenspace of $\lambda_k$, so $m_{\lambda_k} = \dim E_{\lambda_k}$. For each $k$, choose a basis $\mathcal{B}_k = \{v_{k,1}, \dots, v_{k,m_{\lambda_k}}\}$ of $E_{\lambda_k}$. The union
\begin{align*}
\mathcal{B} := \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_r
\end{align*}
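is linearly independent. Indeed, group the terms of a nontrivial dependence among its vectors by eigenspace to obtain vectors $w_k \in E_{\lambda_k}$ with $w_1 + \cdots + w_r = 0$. Since each $\mathcal{B}_k$ is linearly independent, any $k$ that contributes a nonzero coefficient has $w_k \ne 0$, so the $w_k$ are not all zero. The nonzero $w_k$ are then eigenvectors for distinct eigenvalues summing to $0$, contradicting the [Linear Independence of Eigenvectors for Distinct Eigenvalues](/theorems/920). Conversely, a linearly independent family of eigenvectors contains at most $m_{\lambda_k}$ vectors from each $E_{\lambda_k}$. Hence the maximum number of linearly independent eigenvectors of $A$ is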
\begin{align*}
|\mathcal{B}| = \sum_{k=1}^r m_{\lambda_k} = \sum_\lambda m_\lambda.
\end{align*}
[guided]
The [eigenvectors](/page/Eigenvalue%20and%20Eigenvector) of $A$ partition naturally by eigenvalue: any eigenvector $v$ with eigenvalue $\lambda$ lives in $E_\lambda = \ker(A - \lambda I)$. So to count linearly independent eigenvectors we collect them eigenspace by eigenspace and then merge the collections.
Let $\lambda_1, \dots, \lambda_r$ enumerate the distinct eigenvalues of $A$. Inside each eigenspace $E_{\lambda_k}$, we can pick a basis $\mathcal{B}_k = \{v_{k,1}, \dots, v_{k,m_{\lambda_k}}\}$, where $m_{\lambda_k} = \dim E_{\lambda_k}$ is the geometric multiplicity — this is the maximum number of linearly independent vectors we can extract from $E_{\lambda_k}$ alone. Now merge:
\begin{align*}
\mathcal{B} := \mathcal{B}_1 \cup \cdots \cup \mathcal{B}_r.
\end{align*}
Is $\mathcal{B}$ still linearly independent after merging? Suppose for contradiction that some nontrivial linear combination of the vectors in $\mathcal{B}$ equals $0$. Group the terms by eigenspace: for each $k$, let $w_k$ be the partial sum of terms drawn from $\mathcal{B}_k$, so $w_k \in E_{\lambda_k}$ and $w_1 + \cdots + w_r = 0$. Because the dependence is nontrivial, some coefficient is nonzero, say on a vector of $\mathcal{B}_j$; since $\mathcal{B}_j$ is linearly independent, $w_j \ne 0$. Now discard the $w_k$ that are zero. The remaining $w_k$ are nonzero vectors from distinct eigenspaces, that is, eigenvectors for distinct eigenvalues, and their sum is $0$. But the [Linear Independence of Eigenvectors for Distinct Eigenvalues](/theorems/920) says that any collection of eigenvectors corresponding to distinct eigenvalues is linearly independent, so such a sum cannot vanish. This is a contradiction, hence $\mathcal{B}$ is linearly independent.
Conversely, any linearly independent family of eigenvectors of $A$ has at most $|\mathcal{B}|$ elements: grouping such a family by eigenvalue gives at most $\dim E_{\lambda_k} = m_{\lambda_k}$ vectors with eigenvalue $\lambda_k$. So the maximum number of linearly independent eigenvectors of $A$ is exactly
\begin{align*}
|\mathcal{B}| = \sum_{k=1}^r m_{\lambda_k} = \sum_\lambda m_\lambda.
\end{align*}
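As a quick illustration of the count, consider the following matrix, chosen here purely as an example (it does not appear in the theorem):
\begin{align*}
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}, \qquad \lambda_1 = 2, \quad \lambda_2 = 3.
\end{align*}
Solving $(A - 2I)v = 0$ forces the second and third coordinates of $v$ to vanish, and solving $(A - 3I)v = 0$ forces the first and second to vanish. Writing $e_1, e_2, e_3$ for the standard basis vectors of $\mathbb{C}^3$, this gives
\begin{align*}
E_2 = \operatorname{span}\{e_1\}, \qquad E_3 = \operatorname{span}\{e_3\}, \qquad \mathcal{B} = \{e_1, e_3\}, \qquad |\mathcal{B}| = m_2 + m_3 = 1 + 1 = 2.
\end{align*}
So for this particular $A$ no family of three linearly independent eigenvectors exists.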
[/guided]
[/step]
[step:Rewrite diagonalisability as the count reaching $n$]
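By definition, $A$ is diagonalisable iff $\mathbb{C}^n$ has a basis consisting of eigenvectors of $A$. Since $\dim \mathbb{C}^n = n$, such a basis exists iff there are $n$ linearly independent eigenvectors of $A$. By the previous step, the maximum number of linearly independent eigenvectors is $\sum_\lambda m_\lambda$, and this number is at most $n$ because a linearly independent subset of $\mathbb{C}^n$ has at most $n$ elements. Therefore: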
\begin{align*}
A \text{ is diagonalisable} \iff \sum_\lambda m_\lambda = n.
\end{align*}
[guided]
Recall the definition: $A$ is diagonalisable if there exists an invertible $P$ and a diagonal $D$ with $A = PDP^{-1}$. Writing $P = [v_1 \mid \cdots \mid v_n]$ column by column, the equation $AP = PD$ says exactly that each column $v_j$ is an eigenvector of $A$ with eigenvalue the $j$-th diagonal entry of $D$. Invertibility of $P$ says the columns form a basis of $\mathbb{C}^n$. Hence
\begin{align*}
A \text{ is diagonalisable} \iff \mathbb{C}^n \text{ has a basis of eigenvectors of } A.
\end{align*}
Since any basis of $\mathbb{C}^n$ has exactly $n$ elements, the existence of a basis of eigenvectors is equivalent to the existence of $n$ linearly independent eigenvectors. By the previous step, the maximum number of linearly independent eigenvectors of $A$ is $\sum_\lambda m_\lambda$, so "$n$ linearly independent eigenvectors exist" is the same as "$\sum_\lambda m_\lambda \ge n$". To upgrade "$\ge n$" to "$= n$" we need the reverse inequality $\sum_\lambda m_\lambda \le n$, which always holds. One way to see this is that $\sum_\lambda m_\lambda$ is the size of a linearly independent subset of $\mathbb{C}^n$. Alternatively, it follows from the bound $m_\lambda \le M_\lambda$ for each eigenvalue $\lambda$ from the [Geometric Multiplicity Bounded by Algebraic Multiplicity](/theorems/919), together with $\sum_\lambda M_\lambda \le n$ (the eigenvalues are roots of $\chi_A$, and a polynomial of degree $n$ has at most $n$ roots counted with multiplicity). Summing gives
\begin{align*}
\sum_\lambda m_\lambda \le \sum_\lambda M_\lambda \le n.
\end{align*}
The refinement $\sum_\lambda M_\lambda = n$ (which requires that $\chi_A$ splits over the scalar field) will be established in the next step, but for the present equivalence only the inequality is needed. Combining:
\begin{align*}
A \text{ is diagonalisable} \iff \sum_\lambda m_\lambda = n.
\end{align*}
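To see the column-by-column reading of $AP = PD$ on a concrete case, take the following $2 \times 2$ matrix (an illustrative choice, not part of the theorem), with eigenvectors $v_1 = (1, 0)^T$ for the eigenvalue $1$ and $v_2 = (1, 1)^T$ for the eigenvalue $2$:
\begin{align*}
A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad P = [v_1 \mid v_2] = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.
\end{align*}
A direct multiplication gives
\begin{align*}
AP = \begin{pmatrix} 1 & 2 \\ 0 & 2 \end{pmatrix} = PD,
\end{align*}
and $\det P = 1 \ne 0$, so the two eigenvectors form a basis of $\mathbb{C}^2$. In this example $m_1 = m_2 = 1$, hence $\sum_\lambda m_\lambda = 2 = n$ and $A$ is diagonalisable.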
[/guided]
[/step]
[step:Pin the equality using $m_\lambda \le M_\lambda$ and $\sum_\lambda M_\lambda = n$]
We combine two facts. First, the algebraic multiplicities sum to $n$: the characteristic polynomial $\chi_A(t) = \det(tI - A)$ has degree $n$, and under the running assumption that the scalar field is $\mathbb{C}$ (or a splitting field for $\chi_A$), $\chi_A$ factors into linear factors (over $\mathbb{C}$ by the [Fundamental Theorem of Algebra](/theorems/347), over a splitting field by definition) as
\begin{align*}
\chi_A(t) = \prod_\lambda (t - \lambda)^{M_\lambda},
\end{align*}
so comparing degrees,
\begin{align*}
\sum_\lambda M_\lambda = \deg \chi_A = n.
\end{align*}
Second, for every eigenvalue $\lambda$ the geometric multiplicity is bounded by the algebraic multiplicity,
\begin{align*}
m_\lambda \le M_\lambda,
\end{align*}
by the [Geometric Multiplicity Bounded by Algebraic Multiplicity](/theorems/919). Summing this bound over all eigenvalues gives
\begin{align*}
\sum_\lambda m_\lambda \le \sum_\lambda M_\lambda = n,
\end{align*}
with equality iff $m_\lambda = M_\lambda$ for every $\lambda$, since a sum of nonnegative terms $M_\lambda - m_\lambda \ge 0$ vanishes iff each term vanishes. Combining with the equivalence from the previous step:
\begin{align*}
A \text{ is diagonalisable} \iff \sum_\lambda m_\lambda = n \iff m_\lambda = M_\lambda \text{ for every eigenvalue } \lambda.
\end{align*}
This is the claim.
[guided]
We now have the equivalence "$A$ diagonalisable $\iff \sum_\lambda m_\lambda = n$" from the previous step and must convert the global condition $\sum_\lambda m_\lambda = n$ into the per-eigenvalue condition $m_\lambda = M_\lambda$. The bridge is the algebraic multiplicity.
The characteristic polynomial of $A$ is $\chi_A(t) = \det(tI - A)$, a monic polynomial of degree $n$. To see this, write $\det(tI - A)$ as a sum over permutations: the identity permutation contributes the product of the diagonal entries $\prod_i (t - A_{ii})$, whose leading term is $t^n$, while every other permutation moves at least two indices and so contributes a term of degree at most $n - 2$. Under our running assumption that the scalar field is $\mathbb{C}$ (or more generally a splitting field for $\chi_A$), $\chi_A$ factors completely into linear factors (over $\mathbb{C}$ this is the [Fundamental Theorem of Algebra](/theorems/347); over a splitting field it holds by definition):
\begin{align*}
\chi_A(t) = \prod_\lambda (t - \lambda)^{M_\lambda},
\end{align*}
where the product ranges over the distinct eigenvalues and $M_\lambda$ is by definition the multiplicity of $\lambda$ as a root. Matching degrees on both sides,
\begin{align*}
n = \deg \chi_A = \sum_\lambda M_\lambda.
\end{align*}
(If the scalar field were $\mathbb{R}$ and $\chi_A$ did not split over $\mathbb{R}$, the argument would break down. For example, $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ has characteristic polynomial $t^2 + 1$, which has no real roots. So $A$ has no real eigenvalues, $\sum_\lambda M_\lambda = 0 \ne n = 2$, and $A$ is not diagonalisable over $\mathbb{R}$, even though the condition $m_\lambda = M_\lambda$ for every eigenvalue holds vacuously. The theorem therefore implicitly assumes that $\chi_A$ splits.)
The second ingredient is the bound, valid for every eigenvalue $\lambda$,
\begin{align*}
m_\lambda \le M_\lambda,
\end{align*}
by the [Geometric Multiplicity Bounded by Algebraic Multiplicity](/theorems/919). This is the only point where the interaction between the eigenspace and the characteristic polynomial is used. Summing over $\lambda$:
\begin{align*}
\sum_\lambda m_\lambda \le \sum_\lambda M_\lambda = n.
\end{align*}
The sum $\sum_\lambda (M_\lambda - m_\lambda)$ is a sum of nonnegative terms — it equals zero iff every term equals zero. Hence
\begin{align*}
\sum_\lambda m_\lambda = n \iff M_\lambda - m_\lambda = 0 \text{ for every } \lambda \iff m_\lambda = M_\lambda \text{ for every } \lambda.
\end{align*}
Chaining with the previous step,
\begin{align*}
A \text{ is diagonalisable} \iff \sum_\lambda m_\lambda = n \iff m_\lambda = M_\lambda \text{ for every eigenvalue } \lambda,
\end{align*}
which is the statement of the theorem.
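Two small matrices, chosen here only for illustration, show both outcomes of the criterion. Both have characteristic polynomial $(t - 2)^2$, so $M_2 = 2$ in each case. For the first,
\begin{align*}
J = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \qquad J - 2I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad m_2 = \dim \ker(J - 2I) = 1 < 2 = M_2,
\end{align*}
so $J$ is not diagonalisable. For the second,
\begin{align*}
S = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \qquad S - 2I = 0, \qquad m_2 = \dim \ker(S - 2I) = 2 = M_2,
\end{align*}
so $S$ is diagonalisable (it is already diagonal).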
[/guided]
[/step]