Inverse Function Theorem (Theorem # 51)
Theorem
Let $U \subseteq \mathbb{R}^n$ be an open set and let $f: U \to \mathbb{R}^n$ be a $C^1$ map. If the [linear map](/page/Linear%20Map) $D_a f$ is invertible (i.e., $\det([D_a f]) \neq 0$) at a point $a \in U$, then:
1. There exist [open sets](/page/Open%20Set) $V \subseteq U$ containing $a$ and $W \subseteq \mathbb{R}^n$ containing $f(a)$ such that $f|_V : V \to W$ is a bijection.
2. The inverse map $g: W \to V$ defined by $g = (f|_V)^{-1}$ is [continuously](/page/Continuity) [differentiable](/page/Derivative) ($C^1$).
3. The derivative of the inverse [function](/page/Function) $g$ at $b = f(a)$ is the inverse of the linear map $D_a f$:
\begin{align*}
D_b g = (D_a f)^{-1}
\end{align*}
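For a concrete one-dimensional instance, consider the following example, chosen here for illustration and not part of the statement. Take $n = 1$, $U = \mathbb{R}$, $f(x) = x + x^2$ and $a = 0$, so that $D_0f = f'(0) = 1$. On $V = (-\tfrac{1}{2}, \infty)$, where $f'(x) = 1 + 2x > 0$, the map $f$ is a bijection onto $W = (-\tfrac{1}{4}, \infty)$. Solving $x^2 + x - y = 0$ for the root lying in $V$ gives
\begin{align*}
g(y) = \frac{-1 + \sqrt{1 + 4y}}{2}, \qquad g'(y) = \frac{1}{\sqrt{1 + 4y}} = \frac{1}{1 + 2g(y)} = \frac{1}{f'(g(y))},
\end{align*}
so in particular $g'(f(0)) = g'(0) = 1 = (D_0f)^{-1}$, as item 3 predicts.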
Calculus
Multivariable Calculus
Discussion
The Inverse Function Theorem states that if a $C^1$ map has an invertible [derivative](/page/Derivative) at a point, then it restricts to a $C^1$-diffeomorphism between an open neighbourhood of that point and an open neighbourhood of its image. The proof uses the [Contraction Mapping Theorem](/theorems/71) to construct the local inverse. This is the foundational result for local coordinate changes, implicit [function](/page/Function) theory, and smooth manifold theory.
Proof
[proofplan]
We reduce to the case where the total derivative at $a$ is the identity by composing $f$ on the left with $(D_af)^{-1}$, and write $f(x) = x + \varphi(x)$ with $D_a\varphi = 0$. Continuity of $x \mapsto D_x\varphi$ localises $\varphi$ to a $\tfrac{1}{2}$-Lipschitz perturbation on a closed ball. Surjectivity onto a neighbourhood of $f(a)$ follows from the [Contraction Mapping Theorem](/theorems/71) applied to the auxiliary map $T_y(x) = y - \varphi(x)$. Injectivity is immediate from the Lipschitz bound. Repeating the surjectivity argument around every point of $V$ shows that the image $W$ is open. The inverse $g$ is shown to be Lipschitz (hence continuous). Its differentiability, with the formula $D_bg = (D_{g(b)}f)^{-1}$, follows from the definition of differentiability and the Lipschitz estimate. Continuity of $b \mapsto D_bg$ is ensured by continuity of matrix inversion on $\mathrm{GL}_n(\mathbb{R})$.
[/proofplan]
[step:Reduce to the case $D_af = \mathrm{Id}$ by post-composing with $(D_af)^{-1}$]
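Set $\alpha := D_af: \mathbb{R}^n \to \mathbb{R}^n$, which is an [invertible linear map](/page/Linear%20Map) by hypothesis. Replace $f$ with $\alpha^{-1} \circ f$. By the chain rule, $D_a(\alpha^{-1} \circ f) = \alpha^{-1} \circ D_af = \mathrm{Id}$.

Since $\alpha^{-1}$ is a linear isomorphism, the conclusions transfer back to $f$. Suppose $\tilde g$ is a $C^1$ inverse of $(\alpha^{-1} \circ f)|_V$ defined on an open set $\tilde W$. Then $g := \tilde g \circ \alpha^{-1}$ is a $C^1$ inverse of $f|_V$ on the open set $W := \alpha(\tilde W)$. By the chain rule,
$$D_bg = D_{\alpha^{-1}(b)}\tilde g \circ \alpha^{-1} = (\alpha^{-1} \circ D_{g(b)}f)^{-1} \circ \alpha^{-1} = (D_{g(b)}f)^{-1}.$$
Hence it suffices to prove the theorem for the replaced map. Henceforth assume $D_af = \mathrm{Id}$.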
Define the perturbation
\begin{align*}
\varphi: U &\to \mathbb{R}^n \\
x &\mapsto f(x) - x,
\end{align*}
so that $f(x) = x + \varphi(x)$ and $D_a\varphi = D_af - \mathrm{Id} = 0$.
[guided]
The goal is to show $f$ is locally invertible near $a$ with a $C^1$ inverse. The hypothesis is that $D_af$ is an invertible linear map, but working with a general invertible linear map adds unnecessary complexity to every estimate. The reduction step eliminates this: by replacing $f$ with $\alpha^{-1} \circ f$ where $\alpha = D_af$, we arrange that $D_af = \mathrm{Id}$. This is legitimate because $\alpha^{-1}$ is a linear diffeomorphism of $\mathbb{R}^n$, so $f$ is a local diffeomorphism if and only if $\alpha^{-1} \circ f$ is.
With this normalisation, write $f(x) = x + \varphi(x)$ where
\begin{align*}
\varphi: U &\to \mathbb{R}^n \\
x &\mapsto f(x) - x.
\end{align*}
Then $D_a\varphi = D_af - \mathrm{Id} = 0$. The map $\varphi$ is the "nonlinear part" of $f$ near $a$. The condition $D_a\varphi = 0$ means this nonlinear part has vanishing first-order approximation at $a$: $\varphi(x) - \varphi(a) = o(\|x - a\|)$ as $x \to a$. So near $a$, $f$ is a small perturbation of the identity. The entire proof now reduces to exploiting this smallness.
[/guided]
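The normalisation can be checked numerically. This minimal sketch uses a hypothetical example map $f(x_1, x_2) = (2x_1 + x_2^2,\; x_1 + 3x_2 + x_1^2)$ at $a = (0, 0)$; the map is not taken from the text. The helpers `jacobian`, `inv2` and `matvec` are also illustrative names introduced here. The sketch approximates $\alpha = D_af$ by central differences, forms $\alpha^{-1} \circ f$, and confirms that the derivative of the normalised map at $a$ is approximately the identity.

```python
# Hypothetical example map (an assumption for illustration, not from the text):
# f(x1, x2) = (2*x1 + x2**2, x1 + 3*x2 + x1**2), with a = (0, 0) and D_a f = [[2, 0], [1, 3]].
def f(x):
    x1, x2 = x
    return (2.0 * x1 + x2 ** 2, x1 + 3.0 * x2 + x1 ** 2)

def jacobian(F, x, h=1e-6):
    """Central-difference approximation of the 2x2 Jacobian: J[i][j] ~ dF_i/dx_j at x."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = F(tuple(xp)), F(tuple(xm))
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def inv2(m):
    """Inverse of an invertible 2x2 matrix given as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(m, v):
    """Matrix-vector product m @ v for a 2x2 matrix m."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

a = (0.0, 0.0)
alpha = jacobian(f, a)            # approximately [[2, 0], [1, 3]]
alpha_inv = inv2(alpha)

def f_tilde(x):
    """The normalised map alpha^{-1} o f; its derivative at a should be the identity."""
    return matvec(alpha_inv, f(x))

print(jacobian(f_tilde, a))       # approximately [[1, 0], [0, 1]]
```

Because $\alpha^{-1}$ is linear, the finite-difference Jacobian of $\alpha^{-1} \circ f$ is exactly $\alpha^{-1}$ times that of $f$. The check therefore mirrors the chain-rule identity $D_a(\alpha^{-1} \circ f) = \alpha^{-1} \circ D_af$.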
[/step]
[step:Localise so that $\varphi$ is $\tfrac{1}{2}$-Lipschitz on $\overline{B}(a,r)$]
Since $f$ is $C^1$, the map $x \mapsto D_x\varphi = D_xf - \mathrm{Id}$ is continuous, and it vanishes at $x = a$. Because $U$ is open, we can therefore choose $r > 0$ such that $\overline{B}(a,r) \subset U$ and
\begin{align*}
\|D_x\varphi\| \leq \tfrac{1}{2} \quad \text{for all } x \in \overline{B}(a,r).
\end{align*}
By the [Mean Value Inequality](/theorems/328), for all $x, x' \in \overline{B}(a,r)$:
\begin{align*}
\|\varphi(x) - \varphi(x')\| \leq \tfrac{1}{2}\|x - x'\|.
\end{align*}
The hypotheses of the [Mean Value Inequality](/theorems/328) are satisfied: $\overline{B}(a,r)$ is convex, $\varphi$ is differentiable on $\overline{B}(a,r)$, and $\|D_x\varphi\| \leq \tfrac{1}{2}$ throughout.
[guided]
Why do we need $\varphi$ to be a contraction? The plan is to construct the inverse of $f$ via the [Contraction Mapping Theorem](/theorems/71), which requires a map with [Lipschitz](/page/Continuity%20(Metric%20Spaces)) constant strictly less than $1$. Since $D_a\varphi = 0$ and $x \mapsto D_x\varphi$ is continuous (because $f$ is $C^1$), we can make $\|D_x\varphi\|$ as small as we like by restricting to a sufficiently small ball around $a$. Choosing $r > 0$ so that $\|D_x\varphi\| \leq \tfrac{1}{2}$ on $\overline{B}(a,r)$, the [Mean Value Inequality](/theorems/328) converts this pointwise derivative bound into the Lipschitz estimate
\begin{align*}
\|\varphi(x) - \varphi(x')\| \leq \sup_{z \in \overline{B}(a,r)} \|D_z\varphi\| \cdot \|x - x'\| \leq \tfrac{1}{2}\|x - x'\|
\end{align*}
for all $x, x' \in \overline{B}(a,r)$. The convexity of $\overline{B}(a,r)$ is needed for the Mean Value Inequality (the line segment from $x$ to $x'$ must remain in the domain). The specific constant $\tfrac{1}{2}$ is not special -- any $\lambda \in (0,1)$ would work -- but $\tfrac{1}{2}$ makes the subsequent estimates clean.
[/guided]
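As a one-dimensional illustration, take $f(x) = x + x^2$ on $\mathbb{R}$ with $a = 0$; this example is chosen here and is not from the text. Here $D_0f = 1$, so no normalisation is needed, $\varphi(x) = x^2$ and $D_x\varphi = 2x$. The localisation can then be carried out by hand:
\begin{align*}
|D_x\varphi| = 2|x| \leq \tfrac{1}{2} &\iff |x| \leq \tfrac{1}{4}, \\
|\varphi(x) - \varphi(x')| = |x + x'|\,|x - x'| &\leq \tfrac{1}{2}|x - x'| \quad \text{for } x, x' \in [-\tfrac{1}{4}, \tfrac{1}{4}].
\end{align*}
So $r = \tfrac{1}{4}$ works. For comparison with the surjectivity step, $V = B(0, \tfrac{1}{4}) = (-\tfrac{1}{4}, \tfrac{1}{4})$ has image $f(V) = (-\tfrac{3}{16}, \tfrac{5}{16})$. This interval does contain the ball $B(f(0), r/2) = (-\tfrac{1}{8}, \tfrac{1}{8})$ guaranteed there.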
[/step]
[step:Prove surjectivity onto a ball around $f(a)$ via the Contraction Mapping Theorem]
Set $V := B(a,r)$. Fix $y \in \mathbb{R}^n$ with $\|y - f(a)\| < r/2$. Define the auxiliary map
\begin{align*}
T_y: \overline{B}(a,r) &\to \mathbb{R}^n \\
x &\mapsto y - \varphi(x).
\end{align*}
We verify that $T_y$ maps $\overline{B}(a,r)$ into itself. For $x \in \overline{B}(a,r)$:
\begin{align*}
\|T_y(x) - a\| &= \|y - \varphi(x) - a\| \\
&\leq \|y - f(a)\| + \|\varphi(a) - \varphi(x)\| \\
&< \tfrac{r}{2} + \tfrac{1}{2}\|x - a\| \\
&\leq \tfrac{r}{2} + \tfrac{r}{2} = r,
\end{align*}
where the second line uses $f(a) = a + \varphi(a)$ and the triangle inequality, and the third uses the hypothesis $\|y - f(a)\| < r/2$ and the contraction estimate from the previous step.
For the contraction property, for $x, x' \in \overline{B}(a,r)$:
\begin{align*}
\|T_y(x) - T_y(x')\| = \|\varphi(x') - \varphi(x)\| \leq \tfrac{1}{2}\|x - x'\|.
\end{align*}
The space $\overline{B}(a,r)$ is a closed subset of the complete metric space $\mathbb{R}^n$, hence complete. By the [Contraction Mapping Theorem](/theorems/71), $T_y$ has a unique fixed point $x_0 \in \overline{B}(a,r)$:
\begin{align*}
x_0 = T_y(x_0) = y - \varphi(x_0),
\end{align*}
which rearranges to $f(x_0) = x_0 + \varphi(x_0) = y$. Since $\|T_y(x_0) - a\| < r$ (strict inequality from the estimate above), the fixed point satisfies $x_0 \in B(a,r) = V$. Therefore $y \in f(V)$, and $W := f(V)$ contains the ball $B(f(a), r/2)$.
[guided]
The idea is to reformulate the equation $f(x) = y$ as a fixed-point problem. Since $f(x) = x + \varphi(x)$, the equation $x + \varphi(x) = y$ rearranges to $x = y - \varphi(x)$, so a solution is precisely a fixed point of the auxiliary map
\begin{align*}
T_y: \overline{B}(a,r) &\to \mathbb{R}^n, \qquad T_y(x) := y - \varphi(x).
\end{align*}
To apply the [Contraction Mapping Theorem](/theorems/71), three hypotheses must be verified: (i) the ambient space $\overline{B}(a,r)$ is a complete metric space, (ii) $T_y$ maps $\overline{B}(a,r)$ into itself, and (iii) $T_y$ is a strict contraction.
Completeness holds because $\overline{B}(a,r)$ is a closed subset of the complete space $\mathbb{R}^n$. For the self-mapping property, note that $f(a) = a + \varphi(a)$, so $a = f(a) - \varphi(a)$. For any $x \in \overline{B}(a,r)$, the triangle inequality gives
\begin{align*}
\|T_y(x) - a\| &= \|y - \varphi(x) - a\| = \|(y - f(a)) + (\varphi(a) - \varphi(x))\| \\
&\leq \|y - f(a)\| + \|\varphi(a) - \varphi(x)\| < \tfrac{r}{2} + \tfrac{1}{2}\|x - a\| \leq \tfrac{r}{2} + \tfrac{r}{2} = r.
\end{align*}
The first term uses the hypothesis $\|y - f(a)\| < r/2$, and the second uses the $\tfrac{1}{2}$-Lipschitz bound $\|\varphi(a) - \varphi(x)\| \leq \tfrac{1}{2}\|x - a\|$ from the previous step. The contraction property follows directly from the same Lipschitz estimate:
\begin{align*}
\|T_y(x) - T_y(x')\| = \|(y - \varphi(x)) - (y - \varphi(x'))\| = \|\varphi(x') - \varphi(x)\| \leq \tfrac{1}{2}\|x - x'\|.
\end{align*}
The [Contraction Mapping Theorem](/theorems/71) now yields a unique fixed point $x_0 \in \overline{B}(a,r)$ satisfying $x_0 = T_y(x_0) = y - \varphi(x_0)$. Rearranging: $f(x_0) = x_0 + \varphi(x_0) = y$. The strict inequality $\|T_y(x_0) - a\| < r$ (not merely $\leq r$) from the self-mapping estimate ensures that $x_0 \in B(a,r) = V$, so the fixed point lies in the open ball, not merely in $\overline{B}(a,r)$.
The constraint $\|y - f(a)\| < r/2$ determines the size of the image neighbourhood $W \supseteq B(f(a), r/2)$. The factor of $2$ arises from the contraction constant $\lambda = \tfrac{1}{2}$: in the self-mapping estimate, the budget $r$ is split between the displacement $\|y - f(a)\| < r/2$ and the perturbation $\|\varphi(a) - \varphi(x)\| \leq \lambda r = r/2$. With a general contraction constant $\lambda \in (0,1)$, the image ball would have radius $r(1 - \lambda)$.
[/guided]
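The fixed-point construction translates directly into Picard iteration. This minimal sketch assumes a hypothetical example $\varphi(x_1, x_2) = (\tfrac{1}{2}x_2^2, \tfrac{1}{2}x_1^2)$ with $f = \mathrm{Id} + \varphi$ and $a = (0, 0)$; it is not from the text. For this example, $D_x\varphi = \begin{pmatrix} 0 & x_2 \\ x_1 & 0 \end{pmatrix}$ has operator norm $\max(|x_1|, |x_2|) \leq \|x\|$, so $r = \tfrac{1}{2}$ works. The function name `solve_by_contraction` and the tolerances are illustrative choices.

```python
import math

# Illustrative example, assumed for this sketch (not from the text):
# phi(x1, x2) = (x2**2 / 2, x1**2 / 2) and f = Id + phi, so a = (0, 0), f(a) = a,
# D_a f = Id, and ||D_x phi|| = max(|x1|, |x2|) <= 1/2 on the closed ball of radius r = 1/2.

def phi(x):
    return (0.5 * x[1] ** 2, 0.5 * x[0] ** 2)

def f(x):
    p = phi(x)
    return (x[0] + p[0], x[1] + p[1])

def solve_by_contraction(y, a=(0.0, 0.0), tol=1e-12, max_iter=200):
    """Approximate the unique fixed point of T_y(x) = y - phi(x) in the closed ball,
    i.e. the solution of f(x) = y, by Picard iteration started at a."""
    x = a
    for _ in range(max_iter):
        p = phi(x)
        x_next = (y[0] - p[0], y[1] - p[1])   # x_next = T_y(x)
        if math.dist(x_next, x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence; check that ||y - f(a)|| < r/2")

y = (0.2, -0.1)               # ||y - f(a)|| = sqrt(0.05) ~ 0.224 < r/2 = 0.25
x0 = solve_by_contraction(y)
print(x0, f(x0))              # f(x0) matches y up to the iteration tolerance
```

Each iteration at least halves the distance to the fixed point, since $\|T_y(x) - x_0\| = \|T_y(x) - T_y(x_0)\| \leq \tfrac{1}{2}\|x - x_0\|$. This is the same estimate that drives the Contraction Mapping Theorem.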
[/step]
[step:Prove injectivity of $f$ on $V$]
Suppose $f(x) = f(x')$ for $x, x' \in V$. Then $x + \varphi(x) = x' + \varphi(x')$, so
\begin{align*}
\|x - x'\| = \|\varphi(x') - \varphi(x)\| \leq \tfrac{1}{2}\|x - x'\|.
\end{align*}
Rearranging gives $\tfrac{1}{2}\|x - x'\| \leq 0$, hence $x = x'$. So $f|_V$ is injective, and it maps onto $W = f(V)$ by definition. Therefore $f|_V: V \to W$ is a bijection.
[/step]
[step:Show that $W = f(V)$ is open]
Let $y_1 = f(x_1) \in W$ with $x_1 \in V$. Since $V$ is open, there exists $\delta > 0$ with $B(x_1, \delta) \subset V$, and then $\overline{B}(x_1, \delta) \subset \overline{B}(a,r)$. Apply the surjectivity argument with $a$ replaced by $x_1$ and $r$ replaced by $\delta$. That argument used only two facts: the $\tfrac{1}{2}$-Lipschitz bound for $\varphi$ on the closed ball, and the identity $f(x_1) = x_1 + \varphi(x_1)$. The Lipschitz bound holds on $\overline{B}(x_1, \delta)$ because it holds on the larger ball $\overline{B}(a,r)$. Hence $B(y_1, \delta/2) \subset f(B(x_1, \delta)) \subset W$. Since every point of $W$ has a neighbourhood contained in $W$, the set $W$ is open.
[/step]
[step:Prove the inverse $g = (f|_V)^{-1}$ is Lipschitz]
For $y = f(x)$ and $y' = f(x')$ in $W$, write $y - y' = (x - x') + (\varphi(x) - \varphi(x'))$ and rearrange using the triangle inequality:
\begin{align*}
\|x - x'\| &\leq \|y - y'\| + \|\varphi(x) - \varphi(x')\| \leq \|y - y'\| + \tfrac{1}{2}\|x - x'\|.
\end{align*}
Subtracting $\tfrac{1}{2}\|x - x'\|$ from both sides and multiplying by $2$:
\begin{align*}
\|g(y) - g(y')\| = \|x - x'\| \leq 2\|y - y'\|.
\end{align*}
Hence $g$ is Lipschitz with constant $2$, and in particular continuous.
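A quick numerical sanity check of this bound is possible under the same hypothetical example as before, $\varphi(x_1, x_2) = (\tfrac{1}{2}x_2^2, \tfrac{1}{2}x_1^2)$ with $a = (0, 0)$ and $r = \tfrac{1}{2}$; this example is an assumption, not from the text. The sketch evaluates $g$ by the fixed-point iteration of the surjectivity step. It does so at random pairs of points in $B(f(a), r/2) = B(0, \tfrac{1}{4}) \subseteq W$ and records the largest observed ratio $\|g(y) - g(y')\| / \|y - y'\|$, which the proof says never exceeds $2$. The helper names, seed and sample size are arbitrary choices.

```python
import math
import random

# Hypothetical example (an assumption, not from the text): phi(x1, x2) = (x2**2 / 2, x1**2 / 2),
# a = (0, 0), r = 1/2, so W contains the ball B(f(a), 1/4) = B(0, 1/4).
def phi(x):
    return (0.5 * x[1] ** 2, 0.5 * x[0] ** 2)

def g(y, tol=1e-13, max_iter=500):
    """Local inverse g = (f|_V)^{-1}, computed by iterating x <- y - phi(x) from a."""
    x = (0.0, 0.0)
    for _ in range(max_iter):
        p = phi(x)
        x_next = (y[0] - p[0], y[1] - p[1])
        if math.dist(x_next, x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

def random_point_in_disc(radius, rng):
    """Uniform sample from the open disc of the given radius about 0 (rejection sampling)."""
    while True:
        p = (rng.uniform(-radius, radius), rng.uniform(-radius, radius))
        if math.hypot(*p) < radius:
            return p

rng = random.Random(0)
worst = 0.0
for _ in range(2000):
    y, y2 = random_point_in_disc(0.25, rng), random_point_in_disc(0.25, rng)
    if y != y2:
        worst = max(worst, math.dist(g(y), g(y2)) / math.dist(y, y2))
print(worst)   # largest observed ratio; the proof guarantees it never exceeds 2
```

Sampling can only illustrate the bound, not prove it. For this particular example the observed ratio stays well below $2$, because $\|D_x\varphi\|$ is smaller than $\tfrac{1}{2}$ on most of the ball.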
[/step]
[step:Show $g$ is $C^1$ with $D_bg = (D_{g(b)}f)^{-1}$]
Fix $b \in W$ and set $a_0 := g(b) \in V$, so that $b = f(a_0)$. Set $\beta := (D_{a_0}f)^{-1} \in \mathcal{L}(\mathbb{R}^n)$. This inverse exists because $\|D_x\varphi\| \leq \tfrac{1}{2}$ gives $\|D_xf - \mathrm{Id}\| \leq \tfrac{1}{2} < 1$, so $D_xf$ is invertible for every $x \in V$ by the Neumann series. For $y \in W$, set $k := y - b$ and $h := g(y) - g(b) = g(y) - a_0$. Then $f(a_0 + h) = b + k$, and differentiability of $f$ at $a_0$ gives
\begin{align*}
k = D_{a_0}f(h) + \|h\|\,\varepsilon(h),
\end{align*}
where $\varepsilon$ is defined for all $h$ with $a_0 + h \in U$ (with $\varepsilon(0) := 0$) and satisfies $\varepsilon(h) \to 0$ as $h \to 0$. Applying $\beta$:
\begin{align*}
\beta(k) = h + \beta(\|h\|\,\varepsilon(h)),
\end{align*}
so the differentiability remainder for $g$ is
\begin{align*}
g(y) - g(b) - \beta(k) = -\beta(\|h\|\,\varepsilon(h)).
\end{align*}
Using the Lipschitz bound $\|h\| = \|g(y) - g(b)\| \leq 2\|k\|$ from the previous step:
\begin{align*}
\frac{\|g(y) - g(b) - \beta(k)\|}{\|k\|} &\leq \frac{\|\beta\| \cdot \|h\| \cdot \|\varepsilon(h)\|}{\|k\|} \leq 2\|\beta\| \cdot \|\varepsilon(h)\| \to 0
\end{align*}
as $k \to 0$ (since $h \to 0$ by continuity of $g$, and $\varepsilon(h) \to 0$ by differentiability of $f$). Therefore $g$ is differentiable at $b$ with $D_bg = \beta = (D_{a_0}f)^{-1} = (D_{g(b)}f)^{-1}$.
For continuity of $b \mapsto D_bg$: the map $b \mapsto D_{g(b)}f$ is continuous (composition of the continuous maps $g: W \to V$ and $x \mapsto D_xf: V \to \mathcal{L}(\mathbb{R}^n)$, the latter being continuous because $f$ is $C^1$). Matrix inversion $A \mapsto A^{-1}$ is continuous on $\mathrm{GL}_n(\mathbb{R})$ (the set of invertible $n \times n$ matrices). Therefore $b \mapsto D_bg = (D_{g(b)}f)^{-1}$ is continuous, and $g$ is $C^1$.
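The invertibility of $D_xf = \mathrm{Id} + D_x\varphi$ used in this step rests on the Neumann series. The identity $(\mathrm{Id} + A)\sum_{j=0}^{N-1}(-A)^j = \mathrm{Id} - (-A)^N$ shows that the partial sums converge to $(\mathrm{Id} + A)^{-1}$ whenever $\|A\| < 1$. This minimal sketch uses an example matrix $A$ standing in for $D_x\varphi$. Specifically, $A$ is $D_x\varphi$ for the hypothetical map $\varphi(x_1, x_2) = (\tfrac{1}{2}x_2^2, \tfrac{1}{2}x_1^2)$ at $x = (0.4, 0.3)$, which has operator norm $0.4 \leq \tfrac{1}{2}$. Both the matrix and the helper names are illustrative assumptions.

```python
# Sketch: invert Id + A via the Neumann series sum_{j>=0} (-A)^j, valid when ||A|| < 1.
# A is an assumed example: D_x phi for phi(x1, x2) = (x2**2 / 2, x1**2 / 2) at x = (0.4, 0.3),
# i.e. [[0, x2], [x1, 0]], whose operator norm is max(0.4, 0.3) = 0.4.

def matmul(p, q):
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matadd(p, q):
    return [[p[i][j] + q[i][j] for j in range(len(p))] for i in range(len(p))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(A, terms=60):
    """Partial sum S = sum_{j=0}^{terms-1} (-A)^j, approximating (Id + A)^{-1}."""
    n = len(A)
    minus_A = [[-A[i][j] for j in range(n)] for i in range(n)]
    S = identity(n)
    power = identity(n)
    for _ in range(1, terms):
        power = matmul(power, minus_A)   # power = (-A)^j
        S = matadd(S, power)
    return S

A = [[0.0, 0.3], [0.4, 0.0]]
S = neumann_inverse(A)
check = matmul(matadd(identity(2), A), S)   # equals Id - (-A)^60 up to rounding
print(check)                                # close to [[1, 0], [0, 1]]
```

The truncation error is $(-A)^{N}$, of norm at most $0.4^{60}$ here. This geometric decay is why the bound $\|D_x\varphi\| \leq \tfrac{1}{2}$ gives invertibility uniformly on the ball.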
[guided]
This is the most delicate step. We must show $g$ is differentiable and compute its derivative. The strategy is to use the definition of differentiability directly: show that $g(y) - g(b) - \beta(y - b)$ is $o(\|y - b\|)$ as $y \to b$, where $\beta = (D_{g(b)}f)^{-1}$ is the candidate derivative.
The key technical input is the Lipschitz estimate $\|h\| \leq 2\|k\|$ from the previous step. Without this, we could not control the ratio $\|h\|/\|k\|$, and the differentiability argument would fail. Here is the computation in detail.
Starting from the differentiability of $f$ at $a_0 = g(b)$: for $h = g(y) - g(b)$ and $k = y - b = f(a_0 + h) - f(a_0)$,
\begin{align*}
k = D_{a_0}f(h) + \|h\|\,\varepsilon(h),
\end{align*}
where $\varepsilon(h) \to 0$ as $h \to 0$. Applying $\beta = (D_{a_0}f)^{-1}$ to both sides:
\begin{align*}
\beta(k) = h + \beta(\|h\|\,\varepsilon(h)).
\end{align*}
Rearranging: $g(y) - g(b) - \beta(k) = h - \beta(k) = -\beta(\|h\|\,\varepsilon(h))$. Taking norms and dividing by $\|k\|$:
\begin{align*}
\frac{\|g(y) - g(b) - \beta(k)\|}{\|k\|} = \frac{\|\beta(\|h\|\,\varepsilon(h))\|}{\|k\|} \leq \|\beta\| \cdot \frac{\|h\|}{\|k\|} \cdot \|\varepsilon(h)\|.
\end{align*}
The Lipschitz bound gives $\|h\|/\|k\| \leq 2$. As $k \to 0$, continuity of $g$ gives $h \to 0$, so $\|\varepsilon(h)\| \to 0$. Therefore the entire expression tends to $0$, confirming $D_bg = \beta = (D_{g(b)}f)^{-1}$.
Why is $D_bg$ continuous in $b$? We need the composition $b \mapsto g(b) \mapsto D_{g(b)}f \mapsto (D_{g(b)}f)^{-1}$ to be continuous. The first arrow is continuous by the Lipschitz estimate. The second is continuous because $f$ is $C^1$. The third arrow, matrix inversion, is continuous on $\mathrm{GL}_n(\mathbb{R})$, which is an open subset of $\mathbb{R}^{n \times n}$. Moreover, $D_{g(b)}f$ stays in $\mathrm{GL}_n(\mathbb{R})$ for all $b \in W$: since $\|D_x\varphi\| \leq \tfrac{1}{2} < 1$, the Neumann series $\sum_{j=0}^\infty (-D_x\varphi)^j$ converges, and its sum is the inverse of $D_xf = \mathrm{Id} + D_x\varphi$. Therefore $b \mapsto D_bg$ is continuous and $g$ is $C^1$.
[/guided]
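The derivative formula can be compared against finite differences. This minimal sketch assumes the hypothetical example $\varphi(x_1, x_2) = (\tfrac{1}{2}x_2^2, \tfrac{1}{2}x_1^2)$, $f = \mathrm{Id} + \varphi$, $a = (0, 0)$; the example is not from the text. For this map, $D_xf = \begin{pmatrix} 1 & x_2 \\ x_1 & 1 \end{pmatrix}$ exactly. The sketch computes $g$ by the fixed-point iteration and compares a central-difference Jacobian of $g$ at a point $b \in B(0, \tfrac{1}{4}) \subseteq W$ with $(D_{g(b)}f)^{-1}$. The helper names, step size and test point are illustrative choices.

```python
import math

# Illustrative example (assumed): phi(x1, x2) = (x2**2 / 2, x1**2 / 2), f = Id + phi, a = (0, 0).
def phi(x):
    return (0.5 * x[1] ** 2, 0.5 * x[0] ** 2)

def g(y, tol=1e-14, max_iter=500):
    """Local inverse of f = Id + phi, via the fixed-point iteration x <- y - phi(x)."""
    x = (0.0, 0.0)
    for _ in range(max_iter):
        p = phi(x)
        x_next = (y[0] - p[0], y[1] - p[1])
        if math.dist(x_next, x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

def Df(x):
    """Exact Jacobian of f at x: Id + D_x phi = [[1, x2], [x1, 1]]."""
    return [[1.0, x[1]], [x[0], 1.0]]

def inv2(m):
    """Inverse of an invertible 2x2 matrix."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def Dg_numeric(b, h=1e-6):
    """Central-difference Jacobian of g at b: J[i][j] ~ dg_i/dy_j."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        e = (h, 0.0) if j == 0 else (0.0, h)
        gp = g((b[0] + e[0], b[1] + e[1]))
        gm = g((b[0] - e[0], b[1] - e[1]))
        for i in range(2):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J

b = (0.15, -0.05)                # a point of B(f(a), 1/4), hence of W
predicted = inv2(Df(g(b)))      # (D_{g(b)} f)^{-1}
observed = Dg_numeric(b)
print(predicted)
print(observed)                  # agrees with `predicted` up to finite-difference error
```

Evaluating $D_xf$ at $g(b)$ rather than at $a$ matters here. At $b \neq f(a)$ the two Jacobians differ, and only $(D_{g(b)}f)^{-1}$ matches the observed derivative.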
[/step]