[proofplan]
We construct a diffeomorphism from a neighbourhood of the zero section in the normal bundle $\nu_{M \subseteq N}$ to a neighbourhood of $M$ in $N$, using the exponential map of a Riemannian metric on $N$. The proof proceeds in four stages: (1) equip $N$ with a Riemannian metric and define the normal bundle as the orthogonal complement of $TM$ in $TN|_M$; (2) define the exponential map $\exp: \nu_{M \subseteq N} \to N$ on a neighbourhood of the zero section; (3) show that $d(\exp)$ along the zero section is the identity on $TN|_M$, so by the inverse function theorem the exponential map is a local diffeomorphism; (4) restrict to a sufficiently small neighbourhood to obtain a global diffeomorphism onto a tubular neighbourhood.
[/proofplan]
[step:Equip $N$ with a Riemannian metric and define the normal bundle]
Let $M \subseteq N$ be a smooth submanifold of codimension $d = \dim(N) - \dim(M)$. Choose a Riemannian metric $g$ on $N$. At each point $x \in M$, the tangent space $T_x N$ decomposes as an orthogonal direct sum
\begin{align*}
T_x N = T_x M \oplus (T_x M)^\perp,
\end{align*}
where the orthogonal complement is taken with respect to $g_x$. The **normal bundle** is the vector bundle
\begin{align*}
\nu_{M \subseteq N} = \bigsqcup_{x \in M} (T_x M)^\perp \to M
\end{align*}
with fiber $(T_x M)^\perp$ over $x$. This is a smooth vector bundle of rank $d$ over $M$, and the Riemannian metric on $N$ restricts to a fiber metric on $\nu_{M \subseteq N}$.
[guided]
The normal bundle can be defined abstractly as the quotient $TN|_M / TM$, but the Riemannian metric provides a canonical splitting: the orthogonal complement $(T_x M)^\perp$ is a preferred complement of $T_x M$ in $T_x N$, giving an isomorphism $\nu_{M \subseteq N} \cong (TM)^\perp$. This concrete realization is essential for the proof because it lets us "push" normal vectors into $N$ via the exponential map.
The choice of Riemannian metric is not canonical, but different choices produce isomorphic normal bundles (the isomorphism type of $\nu_{M \subseteq N}$ depends only on the embedding $M \hookrightarrow N$, not on the metric). The metric is a tool for constructing the tubular neighbourhood, not part of the data.
[/guided]
[/step]
[step:Define the exponential map on the normal bundle]
The Riemannian metric $g$ on $N$ determines a Levi-Civita connection $\nabla$, and hence geodesics. For $x \in M$ and $v \in (T_x M)^\perp \subseteq T_x N$, let $\gamma_v: [0, 1] \to N$ be the geodesic with $\gamma_v(0) = x$ and $\dot{\gamma}_v(0) = v$. Define the exponential map
\begin{align*}
\exp: \nu_{M \subseteq N} &\to N, \quad (x, v) \mapsto \gamma_v(1) = \exp_x(v).
\end{align*}
This map is defined and smooth on a neighbourhood of the zero section in $\nu_{M \subseteq N}$: since $M$ is compact (or by working locally and using a partition of unity), there exists $\varepsilon > 0$ such that $\exp$ is defined on the $\varepsilon$-disk bundle $\{(x, v) \in \nu_{M \subseteq N} : |v|_g < \varepsilon\}$.
On the zero section, $\exp$ restricts to the identity: $\exp(x, 0) = x$ for all $x \in M$.
[/step]
[step:Show $d(\exp)$ is an isomorphism along the zero section]
We compute the differential of $\exp$ at a point $(x, 0)$ of the zero section. The tangent space to the total space of $\nu_{M \subseteq N}$ at $(x, 0)$ decomposes as
\begin{align*}
T_{(x,0)}(\nu_{M \subseteq N}) \cong T_x M \oplus (T_x M)^\perp,
\end{align*}
where $T_x M$ is the tangent space to the zero section (the "horizontal" directions) and $(T_x M)^\perp$ is the tangent space to the fiber (the "vertical" directions).
**On horizontal vectors:** Let $c: (-\delta, \delta) \to M$ be a curve with $c(0) = x$ and $\dot{c}(0) = w \in T_x M$. Then $\exp(c(t), 0) = c(t)$, so $d(\exp)_{(x,0)}(w, 0) = \dot{c}(0) = w$.
**On vertical vectors:** Let $v \in (T_x M)^\perp$ and consider the curve $t \mapsto (x, tv)$ in the fiber. Then $\exp(x, tv) = \exp_x(tv) = \gamma_{tv}(1)$. Since $\gamma_{tv}(1) = \gamma_v(t)$ (by the rescaling property of geodesics), we get $\frac{d}{dt}\big|_{t=0} \exp_x(tv) = \dot{\gamma}_v(0) = v$. Therefore $d(\exp)_{(x,0)}(0, v) = v$.
Combining, $d(\exp)_{(x,0)}$ is the identity map $T_x M \oplus (T_x M)^\perp \to T_x N$. In particular, it is a linear isomorphism.
[guided]
The computation above shows that the exponential map, viewed as a map from the total space of the normal bundle to $N$, has full-rank differential at every point of the zero section. The horizontal directions (along $M$) are preserved because $\exp$ restricts to the identity on the zero section. The vertical directions (normal to $M$) are preserved because the geodesic through $x$ with initial velocity $v$ starts moving in the direction $v$ at time $0$.
This is the crucial step: the inverse function theorem will now give a local diffeomorphism near each point of the zero section. The remaining work is to glue these local diffeomorphisms into a global one.
[/guided]
[/step]
[step:Apply the inverse function theorem to obtain a local diffeomorphism]
Since $d(\exp)_{(x,0)}$ is an isomorphism for every $x \in M$, the inverse function theorem guarantees that for each $x \in M$, there exist open neighbourhoods $U_x$ of $(x, 0)$ in $\nu_{M \subseteq N}$ and $V_x$ of $x$ in $N$ such that
\begin{align*}
\exp|_{U_x}: U_x \xrightarrow{\sim} V_x
\end{align*}
is a diffeomorphism. The exponential map is therefore a local diffeomorphism on a neighbourhood of the zero section.
[/step]
[step:Restrict to a uniform neighbourhood to obtain a global diffeomorphism]
To obtain a global tubular neighbourhood, we must find a single open set $U \subseteq \nu_{M \subseteq N}$ containing the zero section on which $\exp$ is injective.
When $M$ is compact, this follows from a standard compactness argument. Define the set
\begin{align*}
W = \{(x, v) \in \nu_{M \subseteq N} : |v|_g < \varepsilon(x)\}
\end{align*}
where $\varepsilon(x) > 0$ is chosen small enough that $\exp|_{B_x(\varepsilon(x))}$ is a diffeomorphism (here $B_x(\varepsilon)$ denotes the $\varepsilon$-ball in the fiber over $x$). Since $M$ is compact, we can choose a uniform $\varepsilon > 0$ (by taking the minimum of $\varepsilon(x)$ over a finite cover).
We must also ensure global injectivity: $\exp(x_1, v_1) = \exp(x_2, v_2)$ implies $(x_1, v_1) = (x_2, v_2)$. If this fails for all $\varepsilon > 0$, there exist sequences $(x_n, v_n)$ and $(x_n', v_n')$ with $|v_n|, |v_n'| \to 0$, $(x_n, v_n) \neq (x_n', v_n')$, but $\exp(x_n, v_n) = \exp(x_n', v_n')$. By compactness of $M$, pass to a subsequence with $x_n \to x$ and $x_n' \to x'$. Since $\exp(x_n, v_n) = \exp(x_n', v_n')$ and $v_n, v_n' \to 0$, we get $x = x'$. But then for large $n$, both points lie in the domain of the local diffeomorphism at $(x, 0)$, contradicting injectivity. By this contradiction, there exists a uniform $\varepsilon > 0$ for which $\exp$ is globally injective.
The image $\exp(\{|v|_g < \varepsilon\})$ is an open neighbourhood of $M$ in $N$, and $\exp: \{|v|_g < \varepsilon\} \to \exp(\{|v|_g < \varepsilon\})$ is a diffeomorphism. This is the desired tubular neighbourhood.
[guided]
When $M$ is not compact, the argument requires more care: the function $\varepsilon(x)$ can be chosen to vary continuously (it can be made smooth by a partition of unity argument), and one works with the open disk bundle $D_\varepsilon = \{(x,v) : |v| < \varepsilon(x)\}$ for a positive continuous function $\varepsilon: M \to (0, \infty)$. The global injectivity argument still works locally and can be patched together.
The key geometric content of the theorem is that a neighbourhood of $M$ in $N$ "looks like" the normal bundle: the nearby points are parametrized by (base point in $M$, small normal vector), and this parametrization is smooth and bijective. The exponential map provides the precise identification.
[/guided]
[/step]