[proofplan]
We combine [Rouche's theorem](/theorems/357) (which counts zeros with multiplicity) with a perturbation argument showing that, for $w \neq f(a)$ sufficiently close to $f(a)$, no zero of $f - w$ is a multiple zero. The key observation is that a multiple zero of $f - w$ must simultaneously satisfy $f(z) = w$ and $f'(z) = 0$. Since the zeros of $f'$ are isolated, shrinking $r$ leaves $a$ as the only possible zero of $f'$ in $\overline{B(a, r)}$. The solutions of the perturbed equation $f(z) = w$ differ from $a$, so none of them is a zero of $f'$.
[/proofplan]
[step:Set up the local factorisation and count zeros with multiplicity via Rouche's theorem]
Write $f(z) - f(a) = (z - a)^k h(z)$, where $k \geq 1$ is the order of the zero of $f - f(a)$ at $a$, $h$ is holomorphic on $U$, and $h(a) \neq 0$. Choose $r > 0$ small enough that:
1. $\overline{B(a, r)} \subseteq U$,
2. $h(z) \neq 0$ on $\overline{B(a, r)}$,
3. $a$ is the only zero of $f - f(a)$ in $\overline{B(a, r)}$.
Define $\delta = \min_{|z-a|=r} |f(z) - f(a)|$. The minimum is attained on the compact circle and is positive by condition 3, since $f - f(a)$ has no zeros on the circle. By [Rouche's theorem](/theorems/357) (as in the proof of the [Open Mapping Theorem](/theorems/358)), for every $w$ with $|w - f(a)| < \delta$, the function $f - w$ has exactly $k$ zeros in $B(a, r)$, counted with multiplicity.
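As an illustrative numerical check (not part of the proof), the Python sketch below estimates $\delta$ and counts the zeros of $f - w$ in $B(a, r)$ with the argument principle, which, like Rouche's theorem, counts zeros with multiplicity. It assumes the hypothetical example $f(z) = z^2 + z^3$ on $U = \mathbb{C}$ with $a = 0$, so $f(z) - f(0) = z^2(1 + z)$, $k = 2$ and $h(z) = 1 + z$. The radius $r = 0.5$ satisfies conditions 1, 2 and 3, and for it $\delta = 0.25 \cdot 0.5 = 0.125$, attained at $z = -0.5$. The helper names and the sample count `n` are illustrative choices.

```python
import cmath
import math


def f(z):
    # Hypothetical example: f(z) - f(0) = z^2 (1 + z), so k = 2 at a = 0.
    return z**2 + z**3


def circle_points(a, r, n):
    # n equally spaced points on |z - a| = r, starting at a + r.
    return [a + r * cmath.exp(2j * math.pi * s / n) for s in range(n)]


def winding_number(g, a, r, n=4096):
    """Number of zeros of g in |z - a| < r, counted with multiplicity.

    Argument principle, approximated by summing the small changes of arg g
    between consecutive sample points. Assumes g is holomorphic on the closed
    disc, has no zeros on the circle, and n is large enough that arg g
    changes by less than pi between neighbouring samples.
    """
    pts = circle_points(a, r, n)
    total = 0.0
    for s in range(n):
        # phase() of the ratio is the change of arg g along one arc, in (-pi, pi].
        total += cmath.phase(g(pts[(s + 1) % n]) / g(pts[s]))
    return round(total / (2 * math.pi))


def min_on_circle(f, a, r, n=4096):
    # Sampled estimate of delta = min over |z - a| = r of |f(z) - f(a)|.
    fa = f(a)
    return min(abs(f(z) - fa) for z in circle_points(a, r, n))


a, r = 0, 0.5
delta = min_on_circle(f, a, r)
print("delta ~", delta)
for w in (0.1, 0.1j, -0.06 + 0.08j):
    # |w - f(a)| = 0.1 < delta, so Rouche predicts k = 2 zeros of f - w in B(a, r).
    print(w, winding_number(lambda z: f(z) - w, a, r))
```

With `r = 1.5` the same helper reports three zeros for `f` itself, because the larger disc also contains the zero $-1$ of $f - f(a)$; condition 3 rules out exactly this situation.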
[guided]
Why does Rouche's theorem apply? Write $f(z) - w = (f(z) - f(a)) + (f(a) - w)$. On the circle $|z - a| = r$, we have $|f(z) - f(a)| \geq \delta$ (by definition of $\delta$) and $|f(a) - w| = |w - f(a)| < \delta$ (by hypothesis). So $|f(z) - f(a)| > |f(a) - w|$ on the contour, and Rouche's theorem implies that $f - w = (f - f(a)) + (f(a) - w)$ has the same number of zeros as $f - f(a)$ in $|z - a| < r$, counted with multiplicity. The function $f - f(a) = (z-a)^k h(z)$ has exactly $k$ such zeros because $h \neq 0$ on $\overline{B(a,r)}$: its only zero there is $a$, coming from the factor $(z-a)^k$, with multiplicity $k$.
[/guided]
[/step]
[step:Show the $k$ zeros are distinct for $w \neq f(a)$ by avoiding critical points of $f$]
A zero $z^*$ of $f - w$ has multiplicity $\geq 2$ if and only if $f(z^*) = w$ and $f'(z^*) = 0$. Indeed, a holomorphic function has a zero of order $\geq 2$ at $z^*$ exactly when both it and its derivative vanish there, and the derivative of $f - w$ is $(f - w)' = f'$.
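Since $f$ is non-constant, $f' \not\equiv 0$, so the zeros of $f'$ are isolated; in particular, only finitely many of them lie in the compact set $\overline{B(a, r)}$. By shrinking $r$ below the distance from $a$ to the nearest of these zeros other than $a$ (conditions 1, 2 and 3 still hold for the smaller radius, and $\delta$ is recomputed for it), we may assume that $f'$ has no zeros in $\overline{B(a, r)} \setminus \{a\}$.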
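For a concrete picture of the shrinking step, the sketch below assumes the same hypothetical example $f(z) = z^2 + z^3$ with $a = 0$. It finds the critical points of $f$ from $f'(z) = 3z^2 + 2z$ by the quadratic formula and computes a bound on admissible radii: any $r$ strictly below the distance from $a$ to the nearest critical point other than $a$, and to the other zero $-1$ of $f - f(a)$ (which is also the zero of $h$), keeps conditions 1, 2 and 3 and leaves no zero of $f'$ in $\overline{B(a, r)} \setminus \{a\}$. The variable names are illustrative.

```python
import cmath


def quadratic_roots(p, q, s):
    # Roots of p z^2 + q z + s = 0 (p != 0) by the quadratic formula.
    root_disc = cmath.sqrt(q * q - 4 * p * s)
    return [(-q + root_disc) / (2 * p), (-q - root_disc) / (2 * p)]


a = 0
# Hypothetical example f(z) = z^2 + z^3, so f'(z) = 3 z^2 + 2 z.
critical = quadratic_roots(3, 2, 0)  # 0 and -2/3, as complex numbers
# a itself may be a critical point (here k = 2, so f'(a) = 0); only the others matter.
other_critical = [c for c in critical if abs(c - a) > 1e-12]
# -1 is the other zero of f - f(a) = z^2 (1 + z) and the zero of h(z) = 1 + z.
obstacles = other_critical + [-1]
r_bound = min(abs(c - a) for c in obstacles)  # every 0 < r < r_bound works
print(critical, r_bound)
```

Here the bound is $2/3$, coming from the critical point $-2/3$, so a radius such as $r = 0.5$ is admissible.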
Set $\varepsilon = \delta$ (for the final choice of $r$). For $w \in B(f(a), \varepsilon) \setminus \{f(a)\}$, none of the $k$ zeros of $f - w$ in $B(a, r)$ equals $a$, since $f(a) \neq w$. These zeros therefore lie in $B(a, r) \setminus \{a\}$, where $f' \neq 0$, so none of them is a zero of $f'$ and every zero of $f - w$ is simple. Since the $k$ zeros, counted with multiplicity, are all simple, $f - w$ has exactly $k$ distinct zeros in $B(a, r)$.
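The distinctness claim can be checked numerically on the hypothetical example $f(z) = z^2 + z^3$, $a = 0$, $r = 0.5$, for which $\delta = 0.125$. The sketch finds all roots of $f(z) - w = z^3 + z^2 - w$ with the Durand-Kerner iteration (a standard simultaneous polynomial root-finder, swapped in here only to keep the code dependency-free), keeps those in $B(a, r)$, and reports how many there are and the smallest $|f'|$ among them. For each sampled $w$ with $0 < |w| < \delta$ one expects two roots, both with $f' \neq 0$. The helper names are illustrative.

```python
def poly_eval(coeffs, z):
    # Horner's rule; coefficients are listed from the highest degree down.
    acc = 0
    for c in coeffs:
        acc = acc * z + c
    return acc


def durand_kerner(coeffs, iters=500):
    """Approximate all roots of a monic polynomial (coeffs[0] == 1)."""
    n = len(coeffs) - 1
    roots = [(0.4 + 0.9j) ** i for i in range(n)]  # customary distinct starting points
    for _ in range(iters):
        for i in range(n):
            denom = 1
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            # Weierstrass correction, applied in place (Gauss-Seidel style).
            roots[i] -= poly_eval(coeffs, roots[i]) / denom
    return roots


def fprime(z):
    # Derivative of the hypothetical example f(z) = z^2 + z^3.
    return 2 * z + 3 * z**2


a, r = 0, 0.5
for w in (0.1, 0.1j, -0.06 + 0.08j, 1e-4):
    # Roots of f(z) - w = z^3 + z^2 - w that lie in B(a, r).
    zs = [z for z in durand_kerner([1, 1, 0, -w]) if abs(z - a) < r]
    print(w, len(zs), min(abs(fprime(z)) for z in zs))
```

The third root of each cubic lies near $-1$, outside $B(0, 0.5)$, matching the count $k = 2$ from the first step.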
[guided]
Why must a multiple zero of $f - w$ also be a zero of $f'$? If $f(z^*) = w$ and $f - w$ has a zero of order $m \geq 2$ at $z^*$, write $f(z) - w = (z - z^*)^m \tilde{h}(z)$ with $\tilde{h}(z^*) \neq 0$. Differentiating gives $f'(z) = m(z - z^*)^{m-1} \tilde{h}(z) + (z - z^*)^m \tilde{h}'(z)$, and evaluating at $z = z^*$ yields $f'(z^*) = 0$. Conversely, if $f(z^*) = w$ and $f'(z^*) \neq 0$, then $f - w$ vanishes at $z^*$ with non-zero derivative, so the zero is simple. Hence the multiple zeros of $f - w$ are exactly the critical points of $f$ (zeros of $f'$) at which $f$ takes the value $w$.
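The criterion can be seen exactly on the hypothetical example $f(z) = z^2 + z^3$, whose critical points are $0$ and $-2/3$. Taking $w = f(-2/3) = 4/27$ gives $f(z) - w = z^3 + z^2 - 4/27 = (z + 2/3)^2 (z - 1/3)$, so $f - w$ has a double zero at the critical point $-2/3$ and a simple zero at $1/3$, where $f'(1/3) = 1$. The sketch computes the order of a polynomial zero as the number of successive derivatives (starting with the polynomial itself) that vanish there, using exact rational arithmetic from Python's `fractions` module; the helper names are illustrative.

```python
from fractions import Fraction


def derivative(coeffs):
    # Coefficients (highest degree first) of the derivative polynomial.
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def poly_eval(coeffs, z):
    # Horner's rule; coefficients are listed from the highest degree down.
    acc = 0
    for c in coeffs:
        acc = acc * z + c
    return acc


def zero_order(coeffs, z0):
    # Count how many of g, g', g'', ... vanish at z0 (exact arithmetic, no tolerance).
    m = 0
    while coeffs and poly_eval(coeffs, z0) == 0:
        coeffs = derivative(coeffs)
        m += 1
    return m


w = Fraction(4, 27)            # critical value f(-2/3) of f(z) = z^2 + z^3
g = [1, 1, 0, -w]              # f(z) - w = (z + 2/3)^2 (z - 1/3)
fp = derivative([1, 1, 0, 0])  # f'(z) = 3 z^2 + 2 z
for z0 in (Fraction(-2, 3), Fraction(1, 3)):
    # prints "-2/3 2 0" and then "1/3 1 1": (point, order of zero of f - w, f'(point))
    print(z0, zero_order(g, z0), poly_eval(fp, z0))
```

In this example the only critical values are $f(0) = 0 = f(a)$ and $4/27 \approx 0.148$, which exceeds $\delta = 0.125$ for $r = 0.5$; so no $w$ with $0 < |w - f(a)| < \delta$ produces a multiple zero, as the proof predicts.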
The strategy is to arrange that no solution of $f(z) = w$ near $a$ lands on a critical point. Since $f'$ has only isolated zeros, shrinking $r$ removes every critical point from $\overline{B(a, r)}$ except possibly $a$. For $0 < |w - f(a)| < \delta$, the first step places all solutions of $f(z) = w$ in $B(a, r)$, and $w \neq f(a)$ rules out $z = a$; so these solutions lie in $B(a, r) \setminus \{a\}$, where $f' \neq 0$.
[/guided]
[/step]