A derivative is supposed to be a local linear approximation, but differentiability at each point taken separately is not enough to support the geometric and analytic arguments that calculus wants to make. A function can have a derivative at every point while that derivative oscillates without a limit, blows up near a point, or otherwise fails to vary continuously. Then tangent planes exist pointwise, yet the function may not be well approximated in a stable way as the base point moves. The class of $C^1$ functions is the first place where pointwise first-order calculus becomes a robust theory.
The central question is this: when does a function have first derivatives that vary continuously with the point? That extra continuity turns the derivative from a list of unrelated local data into a continuous field of linear maps. It is the condition behind the chain rule in its clean form, inverse and implicit function theorems, change of variables, tangent vector fields, and the passage from local estimates to estimates on compact sets.
[example: A Differentiable Function Whose Derivative Is Not Continuous]
Let $f: \mathbb R \to \mathbb R$ be given by $f(0)=0$ and $f(x)=x^2\sin(1/x)$ for $x \neq 0$. For $x \neq 0$, the product rule and one-variable chain rule give
\begin{align*}
f'(x)=2x\sin(1/x)+x^2\cos(1/x)(-x^{-2})=2x\sin(1/x)-\cos(1/x).
\end{align*}
At $0$, the difference quotient is
\begin{align*}
\frac{f(h)-f(0)}{h}=\frac{h^2\sin(1/h)}{h}=h\sin(1/h)
\end{align*}
for $h \neq 0$. Since $|\sin(1/h)|\le 1$, we have
\begin{align*}
|h\sin(1/h)|\le |h|
\end{align*}
and therefore $h\sin(1/h)\to 0$ as $h\to 0$. Hence $f'(0)=0$.
The derivative is not continuous at $0$. Let $x_k=1/(2\pi k)$ and $y_k=1/((2k+1)\pi)$ for $k\ge 1$. Then $x_k\to 0$ and $y_k\to 0$, while
\begin{align*}
f'(x_k)=2x_k\sin(2\pi k)-\cos(2\pi k)=0-1=-1
\end{align*}
and
\begin{align*}
f'(y_k)=2y_k\sin((2k+1)\pi)-\cos((2k+1)\pi)=0-(-1)=1.
\end{align*}
Thus $f'(x)$ has no limit as $x\to 0$, so $f'$ is not continuous at $0$. The function is differentiable everywhere, but it is not $C^1$ on any neighbourhood of $0$.
[/example]
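The two sequences in the example can be checked numerically. The following is a minimal Python sketch; the helper name `fprime` is introduced here for illustration only, and floating-point rounding means the computed values match $-1$ and $1$ only approximately.

```python
import math

def fprime(x):
    """Derivative of x^2 sin(1/x), with the value 0 at x = 0 from the difference quotient."""
    if x == 0.0:
        return 0.0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

for k in (1, 10, 100, 1000):
    x_k = 1.0 / (2.0 * math.pi * k)          # here cos(1/x_k) = 1
    y_k = 1.0 / ((2.0 * k + 1.0) * math.pi)  # here cos(1/y_k) = -1
    print(k, x_k, fprime(x_k), y_k, fprime(y_k))
```

Both sequences of base points shrink to $0$, yet the printed derivative values stay near $-1$ and $1$ respectively, which is the numerical face of the missing limit.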
This example isolates the missing ingredient. Differentiability says that each point has its own tangent line. The $C^1$ condition says that those tangent lines themselves move continuously. That continuity is not cosmetic: it is exactly what lets local linear approximations be controlled uniformly over nearby points.
## Definition
The one-variable example suggests the right definition, but in analysis the useful form is already multidimensional. A map $f: U \to \mathbb R^m$ on an [open set](/page/Open%20Set) $U \subset \mathbb R^n$ should be called $C^1$ when all first partial derivatives exist and form continuous functions. The extra continuity is what lets the directional rates of change assemble into a derivative field rather than a disconnected collection of pointwise data.
[definition: $C^1$ Function]
Let $U \subset \mathbb R^n$ be open, and let $f: U \to \mathbb R^m$ be a function. The function $f$ is a $C^1$ function on $U$ if each component $f_i: U \to \mathbb R$ has first partial derivatives $\partial_{x_j} f_i: U \to \mathbb R$ for every $1 \le i \le m$ and $1 \le j \le n$, and each function $\partial_{x_j} f_i$ is continuous on $U$.
[/definition]
For scalar-valued functions this says that the gradient exists and is continuous. For vector-valued functions it says that every entry of the first derivative data varies continuously with the point. The next definition is needed because later estimates and convergence statements refer to the entire collection of such maps, not just to the property of one function.
[definition: $C^1(U; \mathbb R^m)$]
Let $U \subset \mathbb R^n$ be open. The space $C^1(U; \mathbb R^m)$ is the set of all $C^1$ functions $f: U \to \mathbb R^m$.
[/definition]
When $m=1$, it is common to write $C^1(U)$ for $C^1(U;\mathbb R)$. If the codomain matters, it should be included explicitly.
A useful way to remember the definition is that $C^1$ asks for continuity one level higher than the original function. Since continuity of first partial derivatives forces continuity of the function itself on open subsets of Euclidean space, the notation $C^1$ fits into the larger hierarchy $C^0, C^1, C^2, \ldots$.
[quotetheorem:327]
This theorem explains why the definition is stated using partial derivatives. Partial derivatives are coordinate-accessible quantities, while differentiability is the coordinate-free first-order approximation. To use the theorem in computations, we need a name for the matrix that stores those coordinate derivatives.
[definition: Jacobian Matrix]
Let $U \subset \mathbb R^n$ be open, let $f \in C^1(U;\mathbb R^m)$, and let $a \in U$. The Jacobian matrix of $f$ at $a$ is the matrix $Jf_a \in \mathbb R^{m \times n}$ with entries
\begin{align*}
(Jf_a)_{ij}=\partial_{x_j}f_i(a).
\end{align*}
[/definition]
The total derivative $Df_a$ is the [linear map](/page/Linear%20Map) $h \mapsto Jf_a h$. The Jacobian matrix is the representation of that map in the standard bases, and it is the object from which determinants and ranks are computed in practice.
[example: A Basic $C^1$ Map]
Let $f: \mathbb R^2 \to \mathbb R^2$ be
\begin{align*}
f(x_1,x_2)=(f_1(x_1,x_2),f_2(x_1,x_2))=(x_1^2x_2,e^{x_1}+\sin x_2).
\end{align*}
We compute its first partial derivatives component by component. For the first component,
\begin{align*}
\partial_{x_1}f_1(x_1,x_2)=\partial_{x_1}(x_1^2x_2)=x_2\partial_{x_1}(x_1^2)=2x_1x_2.
\end{align*}
Also,
\begin{align*}
\partial_{x_2}f_1(x_1,x_2)=\partial_{x_2}(x_1^2x_2)=x_1^2\partial_{x_2}(x_2)=x_1^2.
\end{align*}
For the second component,
\begin{align*}
\partial_{x_1}f_2(x_1,x_2)=\partial_{x_1}(e^{x_1}+\sin x_2)=e^{x_1}+0=e^{x_1}.
\end{align*}
Finally,
\begin{align*}
\partial_{x_2}f_2(x_1,x_2)=\partial_{x_2}(e^{x_1}+\sin x_2)=0+\cos x_2=\cos x_2.
\end{align*}
The functions $2x_1x_2$, $x_1^2$, $e^{x_1}$, and $\cos x_2$ are continuous on $\mathbb R^2$, so by the definition of $C^1$ we have $f \in C^1(\mathbb R^2;\mathbb R^2)$.
At $a=(1,0)$, the Jacobian entries are
\begin{align*}
\partial_{x_1}f_1(1,0)=2\cdot 1\cdot 0=0,\quad \partial_{x_2}f_1(1,0)=1^2=1,\quad \partial_{x_1}f_2(1,0)=e^1=e,\quad \partial_{x_2}f_2(1,0)=\cos 0=1.
\end{align*}
Thus
\begin{align*}
Jf_a=\begin{pmatrix}0&1\\ e&1\end{pmatrix}.
\end{align*}
Equivalently, for $h=(h_1,h_2)\in \mathbb R^2$,
\begin{align*}
Df_a(h)=Jf_a h=(h_2,eh_1+h_2).
\end{align*}
This example shows how the coordinate partial derivatives assemble into the linear map that gives the derivative at a point.
[/example]
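As a hedged cross-check of the computation above, the sketch below compares the Jacobian obtained from the partial derivatives with a central-difference approximation at $a=(1,0)$. The function names (`jacobian_exact`, `jacobian_fd`, `apply`) and the step size are choices made for this illustration, not part of the text.

```python
import math

def f(x1, x2):
    """The map f(x1, x2) = (x1^2 * x2, exp(x1) + sin(x2))."""
    return (x1 * x1 * x2, math.exp(x1) + math.sin(x2))

def jacobian_exact(x1, x2):
    """Jacobian assembled from the partial derivatives computed in the example."""
    return [[2.0 * x1 * x2, x1 * x1],
            [math.exp(x1), math.cos(x2)]]

def jacobian_fd(func, point, step=1e-6):
    """Central differences: entry [i][j] approximates d f_i / d x_j."""
    columns = []
    for j in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[j] += step
        minus[j] -= step
        f_plus, f_minus = func(*plus), func(*minus)
        columns.append([(p - q) / (2.0 * step) for p, q in zip(f_plus, f_minus)])
    # columns[j][i] holds d f_i / d x_j, so transpose into row-major form
    return [list(row) for row in zip(*columns)]

def apply(matrix, h):
    """Matrix-vector product, i.e. the linear map h -> J h."""
    return [sum(m_ij * h_j for m_ij, h_j in zip(row, h)) for row in matrix]

a = (1.0, 0.0)
J = jacobian_exact(*a)
J_fd = jacobian_fd(f, a)
print(J)
print(J_fd)
print(apply(J, [0.3, -0.2]))  # compare with (h2, e*h1 + h2) for h = (0.3, -0.2)
```

The finite-difference matrix is only an approximation; agreement to several digits is what one would expect for a smooth map at this step size.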
This computation is typical: to verify that a concrete elementary map is $C^1$, compute first partial derivatives and check their continuity. For more delicate functions, especially ones defined piecewise, the same test must include the boundary where formulas meet.
## First-Order Approximation
The reason $C^1$ functions dominate first-order multivariable analysis is that their local linear approximations are stable. At a point $a$, differentiability gives an approximation by $f(a)+Df_a(h)$. Continuity of $Df$ says that the linear part changes little as $a$ changes little. That turns isolated tangent planes into a controlled first-order calculus.
To state this precisely, we need the map that sends a point to its derivative. This object packages all tangent maps into one function, so that the continuity demanded by the $C^1$ condition can be expressed without referring to individual matrix entries.
[definition: Derivative Map]
Let $U \subset \mathbb R^n$ be open and let $f \in C^1(U;\mathbb R^m)$. The derivative map of $f$ is the function $Df: U \to \mathcal{L}(\mathbb R^n,\mathbb R^m)$ defined by sending $a \in U$ to $Df_a$.
[/definition]
The coordinate definition is convenient for checking examples, but the derivative-map formulation is the better structural statement. It says that a $C^1$ function is a differentiable map whose best linear approximation depends continuously on the base point.
[quotetheorem:8518]
This characterization lets us speak about $C^1$ maps without constantly returning to coordinates. The next question is quantitative: what kind of error is made when the function is replaced by its tangent approximation? The answer is the [first-order Taylor approximation](/theorems/8520).
[quotetheorem:8520]
The theorem does not merely say that $f$ is differentiable. It identifies $C^1$ functions as the setting where first-order approximations depend continuously on the base point, which is what makes estimates robust.
[example: Linearization of a Nonlinear Map]
Let $f:\mathbb R^2\to\mathbb R$ be given by $f(x_1,x_2)=x_1e^{x_2}$. Its first partial derivatives are
\begin{align*}
\partial_{x_1}f(x_1,x_2)=e^{x_2}
\end{align*}
and
\begin{align*}
\partial_{x_2}f(x_1,x_2)=x_1e^{x_2}.
\end{align*}
Both are continuous on $\mathbb R^2$, so $f\in C^1(\mathbb R^2)$ and
\begin{align*}
\nabla f(x_1,x_2)=(e^{x_2},x_1e^{x_2}).
\end{align*}
At $a=(2,0)$, we have $f(a)=2e^0=2$ and
\begin{align*}
\nabla f(a)=(e^0,2e^0)=(1,2).
\end{align*}
Thus for $h=(h_1,h_2)$ the linear part is
\begin{align*}
Df_a(h)=\nabla f(a)\cdot h=h_1+2h_2,
\end{align*}
so the first-order approximation is
\begin{align*}
f(a+h)=f(2+h_1,h_2)\approx 2+h_1+2h_2.
\end{align*}
The exact error is
\begin{align*}
r(h)=f(a+h)-f(a)-Df_a(h)=(2+h_1)e^{h_2}-2-h_1-2h_2.
\end{align*}
Since $e^t=1+t+t\alpha(t)$ with $\alpha(t)\to 0$ as $t\to 0$, substituting $t=h_2$ gives
\begin{align*}
r(h)=(2+h_1)(1+h_2+h_2\alpha(h_2))-2-h_1-2h_2.
\end{align*}
Expanding the product,
\begin{align*}
r(h)=2+2h_2+2h_2\alpha(h_2)+h_1+h_1h_2+h_1h_2\alpha(h_2)-2-h_1-2h_2.
\end{align*}
Canceling the constant, linear $h_1$, and linear $h_2$ terms leaves
\begin{align*}
r(h)=h_1h_2+(2+h_1)h_2\alpha(h_2).
\end{align*}
For $|h|\le 1$,
\begin{align*}
\frac{|r(h)|}{|h|}\le \frac{|h_1h_2|}{|h|}+\frac{|2+h_1||h_2||\alpha(h_2)|}{|h|}\le |h_2|+3|\alpha(h_2)|.
\end{align*}
The right-hand side tends to $0$ as $h\to 0$, so the error satisfies $|r(h)|/|h|\to 0$. Thus $2+h_1+2h_2$ is the genuine first-order linearization of $f$ at $(2,0)$.
[/example]
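The decay of $|r(h)|/|h|$ can also be observed numerically. The sketch below evaluates the remainder along the diagonal direction $h=(t,t)$, which is an arbitrary choice made for this illustration, and compares it with the bound $|h_2|+3|\alpha(h_2)|$ from the example. The helper names are hypothetical.

```python
import math

def remainder(h1, h2):
    """r(h) = f(a + h) - f(a) - Df_a(h) for f(x1, x2) = x1 * exp(x2) at a = (2, 0)."""
    return (2.0 + h1) * math.exp(h2) - 2.0 - h1 - 2.0 * h2

def alpha(t):
    """alpha(t) = (exp(t) - 1 - t) / t for t != 0, so exp(t) = 1 + t + t * alpha(t)."""
    return (math.expm1(t) - t) / t

rows = []
for k in range(1, 6):
    t = 10.0 ** (-k)
    h1, h2 = t, t  # approach along the diagonal; other directions behave similarly
    ratio = abs(remainder(h1, h2)) / math.hypot(h1, h2)
    bound = abs(h2) + 3.0 * abs(alpha(h2))
    rows.append((t, ratio, bound))
    print(t, ratio, bound)
```

Each printed ratio should sit below its bound and shrink roughly in proportion to $t$, consistent with a remainder of second order.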
The linearization is not only a local computational device. In optimization, it is how the gradient detects first-order changes; in geometry, it is how tangent maps are constructed; in PDE, it is the first step toward weak and classical solutions.
## Algebra and Composition
### Closure Under Algebraic Operations
A class of functions becomes genuinely useful when it survives the operations one wants to perform. If sums, products, and compositions of $C^1$ functions left the class, the definition would be too fragile for calculus. The continuity of first derivatives is exactly strong enough to make the usual algebraic rules closed inside $C^1$.
Before stating closure under algebraic operations, it is helpful to isolate the size measurement that controls both the function and its first derivative on compact sets. On a fixed compact set this is a norm for restrictions to that set; on the larger space $C^1(V;\mathbb R^m)$ it is only a seminorm, because it ignores what happens away from $K$.
[definition: $C^1$ Seminorm on a Compact Set]
Let $K \subset \mathbb R^n$ be compact, and let $V \subset \mathbb R^n$ be open with $K \subset V$. The $C^1$ seminorm on $K$ is the function from $C^1(V;\mathbb R^m)$ to $[0,\infty)$ defined by
\begin{align*}
\|f\|_{C^1(K)}=\sup_{x \in K}|f(x)|+\sup_{x \in K}\|Df_x\|_{\mathrm{op}}.
\end{align*}
[/definition]
This seminorm is finite because continuous functions on compact sets are bounded, and both $f$ and $x \mapsto Df_x$ are continuous on $K$. It becomes a genuine norm after identifying two functions that have the same value and first derivative data on $K$. The next theorem addresses a different practical question: after building new scalar functions by addition, multiplication, or division, do we remain inside $C^1$, and what are the resulting derivatives?
[quotetheorem:8522]
The quotient condition cannot be removed. Division by a function that vanishes at a point can create a singularity there even when the numerator and denominator are smooth everywhere.
[example: Failure of Division at a Zero]
Let $f,g:\mathbb R\to\mathbb R$ be defined by $f(x)=1$ and $g(x)=x$. The derivatives are
\begin{align*}
f'(x)=0
\end{align*}
and
\begin{align*}
g'(x)=1
\end{align*}
for every $x\in\mathbb R$, and the constant functions $0$ and $1$ are continuous on $\mathbb R$. Hence $f$ and $g$ are $C^1$ on $\mathbb R$.
For $x\neq 0$, the quotient is
\begin{align*}
\frac{f(x)}{g(x)}=\frac{1}{x}.
\end{align*}
Its derivative on $\mathbb R\setminus\{0\}$ is
\begin{align*}
\frac{d}{dx}(x^{-1})=-x^{-2}=-\frac{1}{x^2}.
\end{align*}
The obstruction occurs at the zero of $g$. If a function $q:\mathbb R\to\mathbb R$ agreed with $1/x$ for $x\neq 0$ and were continuous at $0$, then the limit $\lim_{x\to 0}q(x)$ would have to exist and equal $q(0)$. But along $x_k=1/k$,
\begin{align*}
q(x_k)=\frac{1}{1/k}=k
\end{align*}
so $q(x_k)\to+\infty$, while along $y_k=-1/k$,
\begin{align*}
q(y_k)=\frac{1}{-1/k}=-k
\end{align*}
so $q(y_k)\to-\infty$. Thus no value assigned at $0$ can make the quotient continuous on $\mathbb R$, let alone $C^1$. The nonvanishing hypothesis in the quotient rule is therefore essential.
[/example]
### Composition and the Chain Rule
Products and quotients keep functions inside the same domain, but composition is what allows one $C^1$ map to feed into another. This is the operation behind coordinate changes and parametrizations, so the derivative rule must preserve both differentiability and continuity of first derivatives.
[quotetheorem:323]
The chain rule is why $C^1$ is the natural minimum regularity for many geometric constructions. It lets tangent vectors be pushed forward through maps and lets coordinate changes preserve differentiability.
[example: A Chain Rule Computation]
Let $f:\mathbb R^2\to\mathbb R^2$ be given by $f(x_1,x_2)=(x_1x_2,x_1+x_2)$, and let $g:\mathbb R^2\to\mathbb R$ be given by $g(y_1,y_2)=y_1^2+\sin y_2$. The component functions of $f$ and $g$ are polynomial or sine combinations, so their first partial derivatives are continuous and both maps are $C^1$.
For $f_1(x_1,x_2)=x_1x_2$,
\begin{align*}
\partial_{x_1}f_1(x_1,x_2)=x_2
\end{align*}
and
\begin{align*}
\partial_{x_2}f_1(x_1,x_2)=x_1.
\end{align*}
For $f_2(x_1,x_2)=x_1+x_2$,
\begin{align*}
\partial_{x_1}f_2(x_1,x_2)=1
\end{align*}
and
\begin{align*}
\partial_{x_2}f_2(x_1,x_2)=1.
\end{align*}
Thus $Jf_{(x_1,x_2)}$ has first row $(x_2,x_1)$ and second row $(1,1)$.
For $g(y_1,y_2)=y_1^2+\sin y_2$,
\begin{align*}
\partial_{y_1}g(y_1,y_2)=2y_1
\end{align*}
and
\begin{align*}
\partial_{y_2}g(y_1,y_2)=\cos y_2.
\end{align*}
Hence $Jg_{(y_1,y_2)}$ is the row $(2y_1,\cos y_2)$. Evaluating this row at $f(x_1,x_2)=(x_1x_2,x_1+x_2)$ gives
\begin{align*}
Jg_{f(x_1,x_2)}=(2x_1x_2,\cos(x_1+x_2)).
\end{align*}
By *Chain Rule for $C^1$ Maps*, $J(g\circ f)_{(x_1,x_2)}=Jg_{f(x_1,x_2)}Jf_{(x_1,x_2)}$. The first entry of this row product is
\begin{align*}
(2x_1x_2)x_2+\cos(x_1+x_2)\cdot 1=2x_1x_2^2+\cos(x_1+x_2).
\end{align*}
The second entry is
\begin{align*}
(2x_1x_2)x_1+\cos(x_1+x_2)\cdot 1=2x_1^2x_2+\cos(x_1+x_2).
\end{align*}
Therefore $J(g\circ f)_{(x_1,x_2)}$ is the row
\begin{align*}
(2x_1x_2^2+\cos(x_1+x_2),2x_1^2x_2+\cos(x_1+x_2)).
\end{align*}
This is the same result one gets by first forming $(g\circ f)(x_1,x_2)=(x_1x_2)^2+\sin(x_1+x_2)$ and then differentiating with respect to $x_1$ and $x_2$.
[/example]
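The row product can be verified at a sample point. This is a small sketch under stated assumptions: the sample point $(0.7,-1.3)$ and the central-difference step are arbitrary, and the function names are introduced only for this check.

```python
import math

def f(x1, x2):
    return (x1 * x2, x1 + x2)

def g(y1, y2):
    return y1 * y1 + math.sin(y2)

def jac_f(x1, x2):
    """Jacobian of f: first row (x2, x1), second row (1, 1)."""
    return [[x2, x1], [1.0, 1.0]]

def jac_g(y1, y2):
    """Jacobian of g as a row (2*y1, cos(y2))."""
    return [2.0 * y1, math.cos(y2)]

def chain_rule_row(x1, x2):
    """Row Jg evaluated at f(x), multiplied by the 2x2 matrix Jf at x."""
    row = jac_g(*f(x1, x2))
    mat = jac_f(x1, x2)
    return [row[0] * mat[0][j] + row[1] * mat[1][j] for j in range(2)]

def composite(x1, x2):
    return g(*f(x1, x2))

def fd_gradient(func, x1, x2, step=1e-6):
    """Central-difference approximation to the two partial derivatives."""
    d1 = (func(x1 + step, x2) - func(x1 - step, x2)) / (2.0 * step)
    d2 = (func(x1, x2 + step) - func(x1, x2 - step)) / (2.0 * step)
    return [d1, d2]

point = (0.7, -1.3)
print(chain_rule_row(*point))
print(fd_gradient(composite, *point))
```

The two printed rows should agree up to the finite-difference error, which mirrors the closing remark of the example about differentiating the composite directly.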
The matrix product in the example is more than a computational convenience. It is the coordinate expression of composing linear approximations.
## Local Control and Mean Value Estimates
### Compact Control of the Derivative
Continuity of the derivative becomes powerful when it is combined with compactness. On a small closed ball inside the domain, the derivative of a $C^1$ function is bounded. That turns a pointwise derivative into a uniform Lipschitz estimate.
To state this local boundedness in a reusable way, we first name the compact containment relation often used in analysis. Estimates near the edge of an open set are fragile: a derivative can become large as the point approaches a missing boundary. Compact containment records that the set being studied has a buffer inside the domain.
[definition: Compact Containment]
Let $U \subset \mathbb R^n$ be open and let $K \subset U$. The set $K$ is compactly contained in $U$, written $K \subset\subset U$, if $\overline{K}$ is compact and $\overline{K} \subset U$.
[/definition]
When $K$ is already closed, this says exactly that $K$ is a compact subset of $U$. In the Euclidean open-set setting, compact containment implies that $K$ has positive distance from $\mathbb R^n\setminus U$. This keeps estimates away from the boundary of the domain. A function can be perfectly $C^1$ on an open set while its derivative becomes unbounded near a missing boundary point. The next theorem turns this boundary-safe compact control into the finite-difference estimate used throughout local analysis.
[quotetheorem:8525]
The convexity assumption ensures that the line segment from $x$ to $y$ stays inside $K$. Without some path condition inside the set, a derivative bound on the set does not directly control differences between points.
[example: Why Compactness Cannot Be Dropped Globally]
Let $f:\mathbb R\to\mathbb R$ be $f(x)=x^2$. Its derivative is
\begin{align*}
f'(x)=2x.
\end{align*}
The function $x\mapsto 2x$ is continuous on $\mathbb R$, so $f\in C^1(\mathbb R)$.
Fix $R>0$ and take $x,y\in[-R,R]$. Then
\begin{align*}
|f(x)-f(y)|=|x^2-y^2|=|(x-y)(x+y)|.
\end{align*}
Using $|ab|=|a||b|$ gives
\begin{align*}
|(x-y)(x+y)|=|x-y||x+y|.
\end{align*}
Since $x,y\in[-R,R]$, we have $|x|\le R$ and $|y|\le R$, so the triangle inequality gives
\begin{align*}
|x+y|\le |x|+|y|\le R+R=2R.
\end{align*}
Therefore
\begin{align*}
|f(x)-f(y)|\le 2R|x-y|.
\end{align*}
Thus $f$ is Lipschitz on each compact interval $[-R,R]$.
On all of $\mathbb R$, however, no constant $L>0$ can satisfy
\begin{align*}
|x^2-y^2|\le L|x-y|
\end{align*}
for every $x,y\in\mathbb R$. If such an $L$ existed, then taking $y=0$ would give
\begin{align*}
|x^2-0^2|\le L|x-0|.
\end{align*}
That is,
\begin{align*}
x^2\le L|x|.
\end{align*}
For $x>0$, this becomes
\begin{align*}
x^2\le Lx.
\end{align*}
Dividing by $x>0$ gives
\begin{align*}
x\le L.
\end{align*}
But choosing $x=L+1$ gives $L+1\le L$, a contradiction. Hence compact local control of the derivative does not imply a global Lipschitz bound on an unbounded domain.
[/example]
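Both halves of the example can be illustrated with a short computation. The sketch below samples difference quotients of $x^2$ on a uniform grid in $[-R,R]$, which only gives a lower estimate of the true Lipschitz constant, and then evaluates the quotient at the pair $(L+1,0)$ used in the contradiction. The grid size and function names are assumptions of this illustration.

```python
def quotient(x, y):
    """Difference quotient |x^2 - y^2| / |x - y| for x != y."""
    return abs(x * x - y * y) / abs(x - y)

def grid_sup(R, n=60):
    """Largest quotient over distinct pairs from a uniform grid on [-R, R]."""
    xs = [-R + 2.0 * R * i / n for i in range(n + 1)]
    return max(quotient(x, y) for x in xs for y in xs if x != y)

for R in (1.0, 10.0, 100.0):
    print(R, grid_sup(R), 2.0 * R)    # sampled value next to the bound 2R

for L in (1.0, 10.0, 1000.0):
    print(L, quotient(L + 1.0, 0.0))  # exceeds the proposed global constant L
```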
This example explains the phrase *locally Lipschitz*. A $C^1$ function is Lipschitz on compact convex pieces of its domain, but it need not be globally Lipschitz on an unbounded domain.
### The Mean Value Principle
There is also a scalar form of the [mean value theorem](/theorems/186) that identifies a derivative value along a segment. The vector-valued version is better expressed as an inequality, but the scalar theorem remains the model for turning derivative information into finite-difference information.
[quotetheorem:186]
The theorem converts information about the derivative into information about finite differences. In higher dimensions, applying the one-variable theorem along line segments gives the local Lipschitz estimate above.
## Inverse and Implicit Problems
### Local Invertibility
Many analytic problems ask whether an equation can be solved locally. If $f(a)=b$, does $f(x)=y$ have a unique nearby solution $x$ for $y$ near $b$? The $C^1$ condition is the standard regularity threshold where the answer is governed by the first derivative.
The [inverse function theorem](/theorems/51) begins from the idea that a nonlinear map should be locally invertible when its linear approximation is invertible. The statement requires the derivative to vary continuously so that nearby linear approximations remain invertible.
[quotetheorem:51]
The theorem does not say that an invertible derivative at one point gives a global inverse. It gives a local inverse near that point. Global injectivity is a separate issue.
[example: Local Invertibility Without Global Invertibility]
Let $f:\mathbb R\to\mathbb R$ be $f(x)=x^3-x$. Since $x^3-x$ is a polynomial, its derivative is
\begin{align*}
f'(x)=3x^2-1.
\end{align*}
The derivative $x\mapsto 3x^2-1$ is continuous on $\mathbb R$, so $f\in C^1(\mathbb R)$.
At a point $a\in\mathbb R$,
\begin{align*}
f'(a)=3a^2-1.
\end{align*}
Thus $f'(a)=0$ exactly when
\begin{align*}
3a^2-1=0.
\end{align*}
Equivalently,
\begin{align*}
a^2=\frac{1}{3}.
\end{align*}
Hence
\begin{align*}
a=\pm \frac{1}{\sqrt 3}.
\end{align*}
Therefore, if $a\neq \pm 1/\sqrt 3$, then $f'(a)\neq 0$, and the *Inverse Function Theorem* gives a neighbourhood $V$ of $a$ such that $f|_V$ has a $C^1$ local inverse.
This local conclusion is not global injectivity. Indeed,
\begin{align*}
f(-1)=(-1)^3-(-1)=-1+1=0.
\end{align*}
Also,
\begin{align*}
f(0)=0^3-0=0.
\end{align*}
And
\begin{align*}
f(1)=1^3-1=0.
\end{align*}
The three distinct points $-1$, $0$, and $1$ have the same image, so $f$ is not injective on $\mathbb R$. Thus nonzero derivative at a point controls local invertibility near that point, while global invertibility requires information about the function on the whole domain.
[/example]
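A local inverse can be evaluated numerically even though no global inverse exists. The sketch below uses Newton's method started at the base point $a$. This is one convenient numerical technique chosen for illustration, not the construction used in the theorem, and the name `local_inverse`, the base point $a=2$, and the target value are assumptions of the example.

```python
def f(x):
    return x ** 3 - x

def fprime(x):
    return 3.0 * x * x - 1.0

def local_inverse(y, a, steps=50, tol=1e-14):
    """Newton iteration for f(x) = y started at the base point a.

    Only a sketch: convergence is expected when y is close to f(a) and f'(a) != 0.
    """
    x = a
    for _ in range(steps):
        dx = (f(x) - y) / fprime(x)
        x -= dx
        if abs(dx) < tol:
            break
    return x

a = 2.0                       # f'(2) = 11, so the theorem applies near a
y = 6.25                      # a value close to f(2) = 6
x = local_inverse(y, a)
print(x, f(x))                # x stays near 2 and f(x) reproduces y
print(f(-1.0), f(0.0), f(1.0))  # three distinct preimages of 0
```

Starting the iteration from a different base point can land on a different branch of the inverse, which is the computational counterpart of the failure of global injectivity.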
### Solving Equations by Graphs
The inverse function theorem solves equations of the form $y=f(x)$ by locally reversing the map $f$. A closely related principle treats equations rather than maps: if $F(x,y)=0$, can the variable $y$ be solved as a $C^1$ function of the variable $x$? The obstruction is whether changing $y$ actually changes the value of $F$ to first order. In the real $C^1$ setting, the needed nondegeneracy condition is that the derivative in the variables being solved for is invertible.
We will use this real implicit-function principle in the elementary form needed here: near a point where $F(a,b)=0$ and the [partial derivative](/page/Partial%20Derivative) with respect to $y$ is nonzero, the solution set of $F(x,y)=0$ can be written locally as the graph $y=g(x)$ of a $C^1$ function. This is one reason the $C^1$ condition is central in geometry. It turns nondegenerate level sets into graphs and eventually into manifolds.
[example: The Unit Circle as a Local $C^1$ Graph]
Let $F:\mathbb R^2\to\mathbb R$ be defined by $F(x,y)=x^2+y^2-1$. Its first partial derivatives are
\begin{align*}
\partial_xF(x,y)=2x
\end{align*}
and
\begin{align*}
\partial_yF(x,y)=2y.
\end{align*}
Both coordinate functions $(x,y)\mapsto 2x$ and $(x,y)\mapsto 2y$ are continuous on $\mathbb R^2$, so $F\in C^1(\mathbb R^2)$.
Let $(a,b)$ lie on the unit circle, so
\begin{align*}
F(a,b)=0.
\end{align*}
Equivalently,
\begin{align*}
a^2+b^2-1=0.
\end{align*}
Thus
\begin{align*}
a^2+b^2=1.
\end{align*}
If $b\neq 0$, then
\begin{align*}
\partial_yF(a,b)=2b\neq 0.
\end{align*}
By *[Implicit Function Theorem](/theorems/52)*, there are neighbourhoods $V$ of $a$ and $W$ of $b$, and a function $\varphi\in C^1(V)$, such that for $(x,y)\in V\times W$,
\begin{align*}
F(x,y)=0 \iff y=\varphi(x).
\end{align*}
Substituting $y=\varphi(x)$ into $F(x,y)=0$ gives
\begin{align*}
x^2+\varphi(x)^2-1=0.
\end{align*}
Hence
\begin{align*}
x^2+\varphi(x)^2=1.
\end{align*}
So near any point of the unit circle with nonzero $y$-coordinate, the circle is locally the graph of a $C^1$ function of $x$.
At a point where $b=0$, the equation $a^2+b^2=1$ becomes
\begin{align*}
a^2=1.
\end{align*}
Therefore $a=1$ or $a=-1$. Also
\begin{align*}
\partial_yF(a,0)=2\cdot 0=0.
\end{align*}
In fact the circle cannot be locally represented there as a graph $y=\varphi(x)$ over an open interval around $a$: if $a=1$, every open interval around $1$ contains some $x>1$, and then for every real $y$,
\begin{align*}
x^2+y^2-1>1+y^2-1=y^2\ge 0,
\end{align*}
so $F(x,y)>0$. Thus no real $y$ satisfies $x^2+y^2=1$ for such $x$. The same argument applies at $a=-1$ using points $x<-1$, for which again $x^2>1$.
However, at these same points one can solve for $x$ as a function of $y$, because
\begin{align*}
\partial_xF(a,0)=2a
\end{align*}
and $a=\pm 1$, so
\begin{align*}
2a\neq 0.
\end{align*}
By *Implicit Function Theorem*, the unit circle is locally a $C^1$ graph in the form $x=\psi(y)$ near $(1,0)$ and near $(-1,0)$. The failure is therefore not a singularity of the circle, but only a failure of the chosen graph direction.
[/example]
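For the upper half of the circle the local graph is explicit, which allows a numerical check. The sketch below takes $\varphi(x)=\sqrt{1-x^2}$ and compares a finite-difference derivative with the value $-\partial_xF/\partial_yF$ obtained by differentiating $F(x,\varphi(x))=0$ with the chain rule. That formula is a standard consequence not stated in the text above, and the function names and sample points are assumptions of this illustration.

```python
import math

def F(x, y):
    return x * x + y * y - 1.0

def phi(x):
    """Upper-branch solution y = sqrt(1 - x^2), valid for |x| < 1."""
    return math.sqrt(1.0 - x * x)

def phi_prime_implicit(x):
    """Implicit differentiation: phi'(x) = -F_x / F_y evaluated at (x, phi(x))."""
    y = phi(x)
    return -(2.0 * x) / (2.0 * y)

def phi_prime_fd(x, step=1e-6):
    """Central-difference approximation to phi'(x)."""
    return (phi(x + step) - phi(x - step)) / (2.0 * step)

for x in (-0.5, 0.0, 0.3, 0.8):
    print(x, F(x, phi(x)), phi_prime_implicit(x), phi_prime_fd(x))
```

As $x$ approaches $\pm 1$ the denominator $2\varphi(x)$ tends to $0$, which matches the breakdown of the graph description $y=\varphi(x)$ at the points $(\pm 1,0)$.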
The example shows that the failure of one chosen derivative does not mean the set is singular. It means that a different coordinate may be needed.
## Regularity, Boundaries, and Function Spaces
### Boundary Regularity
The phrase $C^1$ can refer to functions on open sets, compact sets, closures of domains, or manifolds, and the meaning changes slightly with the ambient setting. The open-set definition is the basic one because derivatives are local. [Boundary regularity](/theorems/99) requires an extension or one-sided interpretation.
When a domain has boundary, analysts often need the function and its first derivatives to extend continuously to the boundary. This is the right condition for boundary values and classical boundary conditions, so it deserves notation separate from the open-domain space.
[definition: $C^1(\overline{U};\mathbb R^m)$]
Let $U \subset \mathbb R^n$ be open, and let $f: \overline{U} \to \mathbb R^m$ be a function. The function $f$ belongs to $C^1(\overline{U};\mathbb R^m)$ if there exists an open set $V \subset \mathbb R^n$ with $\overline{U} \subset V$ and a function $F \in C^1(V;\mathbb R^m)$ such that $F|_{\overline{U}}=f$.
[/definition]
The extension viewpoint keeps boundary regularity tied to the open-set definition. It says that the function is the restriction of a genuinely $C^1$ function on an open neighbourhood of the closure.
[example: A Function Smooth Inside but Not $C^1$ Up to the Boundary]
Let $U=(0,1)$ and define $f:[0,1]\to\mathbb R$ by $f(x)=\sqrt{x}$. For $x\in(0,1)$, we have $f(x)=x^{1/2}$, so the one-variable power rule gives
\begin{align*}
f'(x)=\frac{1}{2}x^{-1/2}=\frac{1}{2\sqrt{x}}.
\end{align*}
The function $x\mapsto 1/(2\sqrt{x})$ is continuous on $(0,1)$, hence $f|_U\in C^1(U)$.
We show that $f$ is not $C^1$ up to the boundary. Suppose, for contradiction, that there were an open set $V\subset\mathbb R$ with $[0,1]\subset V$ and a function $F\in C^1(V)$ such that $F|_{[0,1]}=f$. Since $V$ is open and $0\in V$, there is $\varepsilon>0$ such that $(-\varepsilon,\varepsilon)\subset V$. Since $F'$ is continuous at $0$, there is $\delta>0$ such that $|F'(x)-F'(0)|<1$ whenever $|x|<\delta$. Therefore, for such $x$,
\begin{align*}
|F'(x)|\le |F'(0)|+1.
\end{align*}
On the other hand, for every $x\in(0,\min\{1,\varepsilon,\delta\})$, the functions $F$ and $f$ agree on a neighbourhood of $x$ inside $(0,1)$, so
\begin{align*}
F'(x)=f'(x)=\frac{1}{2\sqrt{x}}.
\end{align*}
Choose integers $k$ large enough that $x_k=1/k^2$ lies in $(0,\min\{1,\varepsilon,\delta\})$. Then
\begin{align*}
|F'(x_k)|=\frac{1}{2\sqrt{1/k^2}}=\frac{k}{2}.
\end{align*}
The numbers $k/2$ are unbounded as $k\to\infty$, contradicting the bound $|F'(x)|\le |F'(0)|+1$ near $0$. Hence no such $C^1$ extension exists, so $f|_U\in C^1(U)$ but $f\notin C^1(\overline U)$.
[/example]
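The blow-up of the derivative along $x_k=1/k^2$ is easy to tabulate. This is a minimal sketch; the list of values of $k$ is an arbitrary choice for illustration.

```python
import math

def fprime(x):
    """Derivative of sqrt(x) for x in (0, 1)."""
    return 1.0 / (2.0 * math.sqrt(x))

values = [(k, fprime(1.0 / k ** 2)) for k in (1, 10, 100, 1000, 10000)]
for k, value in values:
    print(k, value)  # grows like k / 2 as the sample point approaches 0
```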
### Position in the Regularity Scale
This distinction matters in boundary value problems. Classical boundary conditions use values on $\partial U$, while differential equations inside $U$ use derivatives in the interior. To compare $C^1$ with stronger classes, we also need the standard endpoint of the finite differentiability scale: smoothness.
[definition: Smooth Function]
Let $U \subset \mathbb R^n$ be open and let $f: U \to \mathbb R^m$. The function $f$ is smooth if all partial derivatives of all orders exist on $U$ and are continuous on $U$.
[/definition]
Smooth functions are the most flexible class for formal calculations, but many natural solutions of differential equations are not smooth. Since smoothness includes first derivatives among all higher derivatives, the next theorem records the basic inclusion that lets every smooth construction be used in $C^1$ contexts.
[quotetheorem:8526]
The inclusion is simple, but it is strict: $C^1$ is a genuinely lower regularity threshold than smoothness, as the next example shows.
[example: A $C^1$ Function That Is Not $C^2$]
Let $g:\mathbb R\to\mathbb R$ be defined by $g(x)=x|x|$. For $x>0$, $|x|=x$, so
\begin{align*}
g(x)=x\cdot x=x^2
\end{align*}
and therefore
\begin{align*}
g'(x)=2x=2|x|.
\end{align*}
For $x<0$, $|x|=-x$, so
\begin{align*}
g(x)=x(-x)=-x^2
\end{align*}
and therefore
\begin{align*}
g'(x)=-2x=2|x|.
\end{align*}
At $0$, the difference quotient is
\begin{align*}
\frac{g(h)-g(0)}{h}=\frac{h|h|-0}{h}=|h|
\end{align*}
for $h\neq 0$, and $|h|\to 0$ as $h\to 0$. Hence $g'(0)=0=2|0|$, so $g'(x)=2|x|$ for every $x\in\mathbb R$.
The function $x\mapsto 2|x|$ is continuous on $\mathbb R$, so $g\in C^1(\mathbb R)$. To test whether $g$ is $C^2$ at $0$, compute the derivative of $g'$ at $0$ from the difference quotient. For $h>0$,
\begin{align*}
\frac{g'(h)-g'(0)}{h}=\frac{2|h|-0}{h}=\frac{2h}{h}=2.
\end{align*}
For $h<0$,
\begin{align*}
\frac{g'(h)-g'(0)}{h}=\frac{2|h|-0}{h}=\frac{2(-h)}{h}=-2.
\end{align*}
The one-sided limits of this quotient are different, so $g''(0)$ does not exist. Thus $g$ is $C^1$ but not $C^2$, showing that one continuous derivative does not force two derivatives.
[/example]
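The one-sided behaviour of the second difference quotient can be reproduced directly. The sketch below uses the formula $g'(x)=2|x|$ derived in the example; the function names are introduced for illustration.

```python
def g(x):
    return x * abs(x)

def gprime(x):
    """g'(x) = 2|x|, as derived in the example."""
    return 2.0 * abs(x)

def second_quotient(h):
    """Difference quotient (g'(h) - g'(0)) / h for h != 0."""
    return (gprime(h) - gprime(0.0)) / h

for k in range(1, 6):
    h = 10.0 ** (-k)
    # first quotient g(h)/h tends to 0; second quotients stay at +2 and -2
    print(h, g(h) / h, second_quotient(h), second_quotient(-h))
```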
Examples of this kind guard against a common misconception, namely that a function with one continuous derivative must have further derivatives. No higher differentiability is automatic.
## Approximation and Density
In applications, $C^1$ functions are often reached by approximation. A rough function is smoothed by convolution, a smooth function is used as a test object, and then an estimate is passed to a limit. This is why $C^1$ belongs not only to classical calculus but also to modern analysis.
The standard source of smooth approximations is a mollifier. It averages a function over a small ball without changing it too much where the function is already continuous.
[definition: Standard Mollifier]
A [standard mollifier](/page/Standard%20Mollifier) on $\mathbb R^n$ is a function $\eta: \mathbb R^n \to \mathbb R$ such that $\eta \in C_c^\infty(B(0,1))$, $\eta \ge 0$, and
\begin{align*}
\int_{\mathbb R^n} \eta(x)\,d\mathcal L^n(x)=1.
\end{align*}
For $\varepsilon>0$, the rescaled mollifier is the function $\eta_\varepsilon: \mathbb R^n \to \mathbb R$ defined by
\begin{align*}
\eta_\varepsilon(x)=\varepsilon^{-n}\eta(x/\varepsilon).
\end{align*}
[/definition]
Mollifiers are designed so that convolution with $\eta_\varepsilon$ replaces a function by a local average at scale $\varepsilon$. As $\varepsilon$ shrinks, the average sees less of the surrounding region. We next name the smoothed function produced by this averaging operation.
[definition: Mollification]
Let $U \subset \mathbb R^n$ be open, let $f \in L^1_{\mathrm{loc}}(U;\mathbb R^m)$, let $\eta$ be a standard mollifier, and let $\eta_\varepsilon$ be its rescaling for some $\varepsilon>0$. Define
\begin{align*}
U_\varepsilon=\{x\in U:B(x,\varepsilon)\subset U\}.
\end{align*}
The mollification of $f$ at scale $\varepsilon$ is the function $f_\varepsilon: U_\varepsilon \to \mathbb R^m$ defined componentwise by
\begin{align*}
(f_\varepsilon)_i(x)=\int_U \eta_\varepsilon(x-y)f_i(y)\,d\mathcal L^n(y)
\end{align*}
for $1 \le i \le m$.
[/definition]
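The definition can be tried out numerically in one dimension. The sketch below makes several assumptions that are not fixed by the text: it takes $U=\mathbb R$ (so $U_\varepsilon=\mathbb R$), uses the common bump profile $\exp(-1/(1-t^2))$ as the mollifier, normalises it numerically, and approximates the integral with a midpoint rule after the substitution $z=x-y$. All function names are hypothetical.

```python
import math

def bump(t):
    """Unnormalised bump exp(-1/(1 - t^2)) on (-1, 1), extended by zero."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def midpoint(func, a, b, n=4000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    width = (b - a) / n
    return width * sum(func(a + (i + 0.5) * width) for i in range(n))

BUMP_MASS = midpoint(bump, -1.0, 1.0)  # numerical normalising constant

def eta_eps(z, eps):
    """Rescaled mollifier eps^{-1} * eta(z / eps) in one dimension."""
    return bump(z / eps) / (eps * BUMP_MASS)

def mollify(f, x, eps):
    """f_eps(x) as the integral of eta_eps(z) * f(x - z) over (-eps, eps)."""
    return midpoint(lambda z: eta_eps(z, eps) * f(x - z), -eps, eps)

eps = 0.1
print(midpoint(lambda z: eta_eps(z, eps), -eps, eps))  # total mass, close to 1
print(mollify(abs, 0.5, eps))  # a point farther than eps from the kink of |x|
print(mollify(abs, 0.0, eps))  # the kink itself
```

The quadrature is only an approximation of the integral in the definition, so the printed numbers should be read as estimates rather than exact values.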
The point of restricting to $U_\varepsilon$ is to avoid averaging outside the domain. Near the boundary, one either works on smaller subdomains or extends the function first. The next theorem explains why this averaging operation is useful: as the scale shrinks, it recovers continuous functions uniformly on compact subsets.
[quotetheorem:8529]
This theorem is one of the bridges from classical to weak analysis. It says that smooth functions can approximate continuous functions locally uniformly, which lets estimates first be proved for smooth objects and then transferred.
[quotetheorem:8532]
For $C^1$ functions, approximation occurs at the level of both values and first derivatives. That is the natural topology of the $C^1$ class on compact subsets.
[example: Mollifying an Absolute Value Cusp]
Let $f:\mathbb R\to\mathbb R$ be $f(x)=|x|$, and let $\eta$ be an even standard mollifier on $\mathbb R$. For $\varepsilon>0$, write $\eta_\varepsilon(t)=\varepsilon^{-1}\eta(t/\varepsilon)$ and define
\begin{align*}
f_\varepsilon(x)=(\eta_\varepsilon*f)(x)=\int_{\mathbb R}\eta_\varepsilon(z)|x-z|\,dz.
\end{align*}
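After the change of variables $y=x-z$, the same integral reads $f_\varepsilon(x)=\int_{\mathbb R}\eta_\varepsilon(x-y)|y|\,dy$. Since $\eta_\varepsilon$ is smooth and compactly supported and $|y|$ is locally integrable, derivatives in $x$ may be passed onto the kernel under the integral sign, so $f_\varepsilon$ is smooth.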
The smoothing changes $f$ only near the cusp at $0$. Because $\eta$ is supported in $(-1,1)$, the function $\eta_\varepsilon$ is supported in $(-\varepsilon,\varepsilon)$. If $x\ge \varepsilon$, then every $z$ in the support of $\eta_\varepsilon$ satisfies $x-z\ge 0$, hence $|x-z|=x-z$. Therefore
\begin{align*}
f_\varepsilon(x)=\int_{\mathbb R}\eta_\varepsilon(z)(x-z)\,dz=x\int_{\mathbb R}\eta_\varepsilon(z)\,dz-\int_{\mathbb R}z\eta_\varepsilon(z)\,dz.
\end{align*}
The first integral is $1$ by the normalization of the mollifier. The second integral is $0$ because $z\eta_\varepsilon(z)$ is odd and compactly supported. Thus
\begin{align*}
f_\varepsilon(x)=x=f(x)
\end{align*}
for every $x\ge \varepsilon$. Similarly, if $x\le -\varepsilon$, then $x-z\le 0$ on the support of $\eta_\varepsilon$, so $|x-z|=-(x-z)$ and the same two integrals give
\begin{align*}
f_\varepsilon(x)=-x=f(x).
\end{align*}
Thus mollification modifies $|x|$ only inside the interval $(-\varepsilon,\varepsilon)$.
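At the cusp itself, the substitution $z=\varepsilon t$ gives
\begin{align*}
f_\varepsilon(0)=\int_{\mathbb R}\eta_\varepsilon(z)|z|\,dz=\varepsilon\int_{\mathbb R}\eta(t)|t|\,dt>0,
\end{align*}
so $f_\varepsilon$ lies strictly above $f(0)=0$ there, by an amount proportional to $\varepsilon$.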
The mollified function is even, since
\begin{align*}
f_\varepsilon(-x)=\int_{\mathbb R}\eta_\varepsilon(z)|-x-z|\,dz=\int_{\mathbb R}\eta_\varepsilon(w)|x-w|\,dw=f_\varepsilon(x),
\end{align*}
where the substitution $w=-z$ uses the evenness of $\eta_\varepsilon$. Since $f_\varepsilon$ is smooth and even, its derivative at $0$ satisfies
\begin{align*}
f_\varepsilon'(0)=0.
\end{align*}
So the corner of $|x|$ is replaced by a smooth rounded transition across $(-\varepsilon,\varepsilon)$.
Finally, fix $\delta>0$. If $0<\varepsilon<\delta$, then every $x\in[\delta,1]$ satisfies $x\ge\varepsilon$, so the computation above gives
\begin{align*}
f_\varepsilon(x)=x=f(x)
\end{align*}
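for all $x\in[\delta,1]$. The identity in fact holds on $[\varepsilon,\infty)$, which contains a neighbourhood of $[\delta,1]$, so it may be differentiated on $[\delta,1]$ to give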
\begin{align*}
f_\varepsilon'(x)=1=f'(x)
\end{align*}
there. Hence on any interval $[\delta,1]$ away from the cusp, $f_\varepsilon$ converges to $f$ together with its derivative; in fact, for all sufficiently small $\varepsilon$, the equality is exact on that interval.
[/example]
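The claims of this example can be checked numerically. The Python sketch below assumes a specific even bump kernel with support radius `RADIUS` less than $1$ and a midpoint-rule quadrature; the names `mollified_abs` and `slope_at_zero` are hypothetical helpers introduced for illustration.

```python
import math

RADIUS = 0.5  # assumed support radius of the even kernel eta, strictly less than 1


def bump(t):
    """Unnormalized even smooth bump supported in [-RADIUS, RADIUS]."""
    if abs(t) >= RADIUS:
        return 0.0
    return math.exp(-1.0 / (RADIUS**2 - t**2))


def midpoint(g, a, b, n=4000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


MASS = midpoint(bump, -RADIUS, RADIUS)  # numerical normalizing constant


def mollified_abs(x, eps):
    """Approximate f_eps(x) = integral of eta_eps(z) * |x - z| dz."""
    return midpoint(lambda z: bump(z / eps) / (MASS * eps) * abs(x - z), -eps, eps)


def slope_at_zero(eps, h=1e-4):
    """Central-difference approximation of f_eps'(0)."""
    return (mollified_abs(h, eps) - mollified_abs(-h, eps)) / (2.0 * h)


if __name__ == "__main__":
    eps = 0.1
    for x in (-0.5, -eps, 0.0, eps, 0.5):
        print(f"x={x:+.2f}: f_eps(x) ~ {mollified_abs(x, eps):.8f}, |x| = {abs(x):.8f}")
    print("f_eps'(0) ~", slope_at_zero(eps))
```

In this sketch the values agree with $|x|$ once $|x|\ge\varepsilon$, the value at $0$ is positive and scales linearly with $\varepsilon$, and the central difference at $0$ is zero up to rounding, matching the evenness argument.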
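The example shows both the strength and the limitation of smoothing. Mollification rounds off the corner, but $f_\varepsilon'$ can converge locally uniformly to $f'$ only on sets where $f'$ exists and is continuous: here $f_\varepsilon'(0)=0$ for every $\varepsilon$, while $f'$ jumps from $-1$ to $1$ across $0$.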
## Beyond and Connected Topics
The $C^1$ condition is the entry point into several larger theories. In multivariable analysis, it is the regularity class used for the inverse and implicit function theorems, change of variables, and the differential geometry of regular level sets. The natural continuation is [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology), where differentiability and compactness interact systematically.
In analysis of functions, $C^1$ sits beside [uniform convergence](/page/Uniform%20Convergence), differentiability under limits, and approximation by smoother functions. The page [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions) develops this broader viewpoint, especially the distinction between pointwise, uniform, and derivative-level convergence.
Functional analysis replaces finite-dimensional derivative estimates with normed-space maps and bounded linear operators. The derivative map $Df: U \to \mathcal L(\mathbb R^n,\mathbb R^m)$ is a finite-dimensional preview of this language. For that direction, see [Cambridge II Linear Analysis](/page/Cambridge%20II%20Linear%20Analysis).
In PDE and geometric measure theory, classical $C^1$ regularity is often too strong. Weak derivatives, Sobolev spaces, and [functions of bounded variation](/page/Functions%20of%20Bounded%20Variation) preserve enough derivative information to solve problems where classical derivatives may fail. A later-stage continuation is [Geometric Measure Theory III: BV Functions and Sets of Finite Perimeter](/page/Geometric%20Measure%20Theory%20III%3A%20BV%20Functions%20and%20Sets%20of%20Finite%20Perimeter), where derivative information is encoded by measures rather than continuous functions.
## References
Androma, [Cambridge IB Analysis and Topology](/page/Cambridge%20IB%20Analysis%20and%20Topology).
Androma, [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions).
Androma, [Cambridge II Linear Analysis](/page/Cambridge%20II%20Linear%20Analysis).
Androma, [Geometric Measure Theory III: BV Functions and Sets of Finite Perimeter](/page/Geometric%20Measure%20Theory%20III%3A%20BV%20Functions%20and%20Sets%20of%20Finite%20Perimeter).
Spivak, *Calculus on Manifolds* (1965).
Rudin, *Principles of Mathematical Analysis* (1976).
Evans, *Partial Differential Equations* (1998).
$C^1$ Function
Also known as: C1 function, C-one function, continuously differentiable function, continuously differentiable map, differentiable function with continuous derivative, C^1 Function