Analysis is the rigorous foundation of calculus — the discipline that transforms Newton and Leibniz's brilliant but logically shaky arguments into precise mathematics. The course answers three questions that pervade all of mathematics:
1. **When does an infinite process have a well-defined result?** Sequences, [series](/page/Series), and limits formalise the idea of "approaching" a value. The completeness of $\mathbb{R}$ — the fact that it has no "gaps" — is the single axiom that makes this work, and its absence in $\mathbb{Q}$ causes the entire theory to collapse.
2. **What can we deduce from local regularity?** Continuity (no jumps) and differentiability (existence of tangent lines) are local conditions: they describe behaviour near a single point. Yet combined with the [topology](/page/Topology) of closed bounded intervals, they yield powerful global conclusions: continuous [functions](/page/Function) attain their extrema ([Extreme Value Theorem](/theorems/182)) and skip no intermediate values ([Intermediate Value Theorem](/theorems/180)), while sufficiently differentiable functions admit polynomial approximations with explicit error control (Taylor's theorem).
3. **How do we define area rigorously?** The [Riemann integral](/page/Riemann%20Integral) replaces Archimedes' exhaustion arguments with a precise limit of sums, then the [Fundamental Theorem of Calculus](/theorems/632) reveals that integration and differentiation are inverse operations — the deepest connection in elementary analysis.
The course builds a chain: [sequences](/page/Sequence) and [limits](/page/Limit) → [continuity](/page/Continuity%20(Real%20Analysis)) → differentiability → [power series](/page/Power%20Series) → integration. Each link depends on the previous one, and completeness of $\mathbb{R}$ runs through everything like a thread.
# Limits and Convergence
Imagine watching a particle move along a line, its position at second $n$ given by $1/n$. At $n=1$ it sits at $1$; at $n=2$ it reaches $1/2$; at $n=1000$ it hovers near $0.001$. We sense the particle "approaches" zero—but what does this intuition mean mathematically? In the 17th century, Newton and Leibniz manipulated such limiting processes with brilliant but logically shaky arguments. It took two centuries for Cauchy and Weierstrass to replace geometric intuition with precise language. Their $\epsilon$-$N$ definition transformed analysis from a collection of clever tricks into a rigorous discipline capable of handling pathological examples that defy visualization.
This section develops the theory of limits from first principles, revealing how a single idea—the precise quantification of "arbitrarily close"—unlocks the behavior of sequences, infinite sums, and eventually functions. We begin with sequences because they provide the simplest arena for studying convergence, yet already expose deep connections to the completeness of the real numbers.
## Essential definitions
A sequence is not merely a list of numbers—it is a function whose domain is the natural numbers. This functional perspective matters: it forces us to treat sequences as objects with precise domains and codomains, avoiding the informal "list" mentality that obscures subtle issues.
[definition: Sequence]
A sequence in a set $T$ is a function with domain $\mathbb{N}$ and codomain $T$:
\begin{align*}
a: \mathbb{N} &\to T \\
n &\mapsto a_n
\end{align*}
We denote such a sequence by $(a_n)_{n=1}^{\infty}$ or simply $(a_n)$.
[/definition]
To discuss convergence we need distance. On the real line, distance is absolute difference; in the complex plane, it is Euclidean distance. These metrics transform vague notions of "closeness" into quantifiable relationships.
[definition: Euclidean Metric Real]
The Euclidean metric on $\mathbb{R}$ is the function:
\begin{align*}
d: \mathbb{R} \times \mathbb{R} &\to [0,\infty) \\
(x,y) &\mapsto |x-y|
\end{align*}
[/definition]
[definition: Euclidean Metric Complex]
The Euclidean metric on $\mathbb{C}$ is the function:
\begin{align*}
d: \mathbb{C} \times \mathbb{C} &\to [0,\infty) \\
(z,w) &\mapsto |z-w| = \sqrt{(\operatorname{Re}(z-w))^2 + (\operatorname{Im}(z-w))^2}
\end{align*}
[/definition]
With distance defined, we formalize convergence. The key insight, due to Cauchy, is that "approaching a limit" means surviving *every* tolerance test. No matter how small a radius $\epsilon$ we choose for the neighborhood around the candidate limit $a$, *all* terms from some index onward must remain inside that neighborhood.
[definition: Limit Of Sequence]
Let $(a_n)_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$ or $\mathbb{C}$. We say $(a_n)$ converges to $a$ (written $a_n \to a$ as $n \to \infty$) if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that:
\begin{align*}
|a_n - a| < \epsilon \quad \text{for all } n \geq N
\end{align*}
When this limit exists, we write $\lim_{n \to \infty} a_n = a$.
[/definition]
[motivation]
Why this particular formulation? Earlier mathematicians used infinitesimals ("infinitely small" quantities) to describe limits. But infinitesimals lacked rigorous foundation. Cauchy's genius was to eliminate them entirely: instead of saying "the difference becomes infinitesimal," he demanded verification for *every* positive $\epsilon$, however small. This quantifier structure ($\forall \epsilon > 0 \; \exists N \in \mathbb{N}$) is the engine of analysis: it converts the vague notion of closeness into verifiable inequalities. Crucially, it works in abstract spaces where geometric intuition fails, making it the cornerstone of modern analysis.
[/motivation]
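The quantifier structure can be made concrete for the sequence $a_n = 1/n$ with candidate limit $0$. The following is a minimal sketch, not a proof: the function names `witness_index` and `check_tolerance` are hypothetical, and exact rational arithmetic is assumed so that no rounding interferes. Given $\epsilon$, it produces an explicit $N$ and spot-checks finitely many later terms.

```python
from fractions import Fraction
from math import floor


def witness_index(eps: Fraction) -> int:
    """Return an N with 1/n < eps for all n >= N (for the sequence a_n = 1/n)."""
    # 1/n < eps  <=>  n > 1/eps, so any integer strictly above 1/eps works.
    return floor(1 / eps) + 1


def check_tolerance(eps: Fraction, how_many: int = 1000) -> bool:
    """Spot-check |a_n - 0| < eps for the first `how_many` indices n >= N."""
    N = witness_index(eps)
    return all(abs(Fraction(1, n) - 0) < eps for n in range(N, N + how_many))


for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(3, 7)):
    print(eps, witness_index(eps), check_tolerance(eps))
```

The finite check only illustrates the definition; the inequality $n > 1/\epsilon$ in the comment is what actually guarantees the bound for every $n \geq N$.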
Real sequences enjoy additional structure through ordering. A general sequence may oscillate (like $(-1)^n$), but monotonic sequences, those that only increase or only decrease, behave remarkably well when bounded. This special behavior reveals a profound truth about $\mathbb{R}$: it contains no "gaps."
[definition: Increasing Sequence]
A sequence $(a_n)_{n=1}^{\infty}$ of real numbers is increasing if:
\begin{align*}
a_n \leq a_{n+1} \quad \text{for all } n \in \mathbb{N}
\end{align*}
[/definition]
[definition: Decreasing Sequence]
A sequence $(a_n)_{n=1}^{\infty}$ of real numbers is decreasing if:
\begin{align*}
a_n \geq a_{n+1} \quad \text{for all } n \in \mathbb{N}
\end{align*}
[/definition]
[definition: Monotonic Sequence]
A sequence $(a_n)_{n=1}^{\infty}$ of real numbers is monotonic if it is either increasing or decreasing.
[/definition]
Consider the sequence of decimal approximations to $\sqrt{2}$: $1, 1.4, 1.41, 1.414, \dots$. It is increasing and bounded above by $2$, yet has no limit in $\mathbb{Q}$. In $\mathbb{R}$, however, it *must* converge—this is not a theorem but an axiom, the bedrock of real analysis.
[definition: Fundamental Axiom Real Numbers]
Every increasing sequence of real numbers that is bounded above converges to a real number. Equivalently, every decreasing sequence of real numbers that is bounded below converges to a real number.
[/definition]
This axiom distinguishes $\mathbb{R}$ from $\mathbb{Q}$ and guarantees that bounded monotonic sequences cannot "escape" to a missing limit point. It enables the supremum concept, which precisely characterizes the least upper bound of a set—often the limit of an increasing sequence.
[definition: Supremum]
Let $S \subseteq \mathbb{R}$ be non-empty and bounded above. A real number $K$ is the supremum of $S$ (written $K = \sup S$) if:
\begin{align*}
&\text{(i) } x \leq K \quad \text{for all } x \in S \\
&\text{(ii) for every } \epsilon > 0 \text{ there exists } x \in S \text{ such that } x > K - \epsilon
\end{align*}
[/definition]
The supremum is not merely an upper bound—it is the *least* upper bound, approached arbitrarily closely by elements of $S$. For an increasing bounded sequence $(a_n)$, the limit equals $\sup \{a_n : n \in \mathbb{N}\}$, linking sequence convergence directly to set-theoretic bounds.
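The decimal approximations to $\sqrt{2}$ give a concrete increasing bounded sequence whose supremum is not rational. As an illustrative sketch (the helper name `sqrt2_truncation` is an assumption, and exact integer arithmetic is used throughout), one can generate the truncations and watch their squares approach $2$ from below:

```python
from fractions import Fraction
from math import isqrt


def sqrt2_truncation(n: int) -> Fraction:
    """Largest rational k / 10**n whose square is at most 2."""
    # isqrt(2 * 10**(2n)) is floor(sqrt(2) * 10**n), computed in exact integer arithmetic.
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)


terms = [sqrt2_truncation(n) for n in range(8)]
gaps = [2 - t * t for t in terms]  # how far each square falls short of 2

for t, g in zip(terms, gaps):
    print(t, float(g))
```

Every term is rational, the sequence is increasing and bounded above by $2$, and the gaps shrink towards $0$ without ever reaching it, which is consistent with the limit being $\sup\{a_n\} = \sqrt{2} \notin \mathbb{Q}$.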
## Properties of limits
The $\epsilon$-$N$ definition, while precise, is cumbersome for routine calculations. Fortunately, limits respect algebraic operations—a fact that transforms convergence questions into mechanical verifications once basic properties are established. These properties are not mere conveniences; they reflect the compatibility of the limit operation with the field structure of $\mathbb{R}$ and $\mathbb{C}$.
[quotetheorem:170]
[citeproof:170]
[example: Limit Of Reciprocal Sequence]
Consider the sequence $(a_n)_{n=1}^{\infty}$ defined by $a_n = \frac{1}{n}$. This sequence is decreasing and bounded below by $0$. By the Fundamental Axiom of real numbers, it converges to some limit $a \in \mathbb{R}$. Consider the subsequence $(a_{2n})_{n=1}^{\infty}$ where $a_{2n} = \frac{1}{2n}$. By property (ii) of limits, $a_{2n} \to a$. By property (v), $a_{2n} = \frac{1}{2} \cdot \frac{1}{n} \to \frac{1}{2}a$. By [uniqueness of limits](/theorems/625) (property (i)), $a = \frac{1}{2}a$, which implies $a = 0$. Therefore $\lim_{n \to \infty} \frac{1}{n} = 0$.
[/example]
Boundedness alone does not guarantee convergence—consider $a_n = (-1)^n$, which oscillates forever between $-1$ and $1$. Yet even this pathological sequence contains convergent subsequences: the even terms converge to $1$, the odd terms to $-1$. This observation generalizes profoundly: *every* bounded sequence in $\mathbb{R}$ contains a convergent subsequence. This is the [Bolzano-Weierstrass theorem](/theorems/628), reflecting the compactness of closed bounded intervals — a topological property with far-reaching consequences.
[quotetheorem:171]
[citeproof:171]
[motivation]
Visualize infinitely many points confined to a bounded interval $[-K, K]$. Intuition suggests clustering must occur—infinitely many points cannot remain uniformly separated without violating boundedness. The bisection method formalizes this geometric insight: repeatedly divide the interval containing infinitely many terms, selecting the half that still contains infinitely many terms. The endpoints of these nested intervals converge to a common limit point $x$, and we extract one sequence term from each interval to construct a subsequence converging to $x$. This argument fails in $\mathbb{Q}$ (e.g., rational approximations to $\sqrt{2}$ have no rational cluster point), revealing again that completeness is essential.
[/motivation]
## Cauchy sequences
The $\epsilon$-$N$ definition requires prior knowledge of the limit value $a$. When analyzing series $\sum_{j=1}^{\infty} a_j$, we examine partial sums $S_N = \sum_{j=1}^{N} a_j$ whose limiting behavior must be established *without* knowing the sum in advance. This practical limitation motivates an intrinsic convergence criterion based solely on the relative behavior of sequence terms—Cauchy's profound insight.
[definition: Cauchy Sequence]
A sequence $(a_n)_{n=1}^{\infty}$ in $\mathbb{R}$ is a Cauchy sequence if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that:
\begin{align*}
|a_n - a_m| < \epsilon \quad \text{for all } n,m \geq N
\end{align*}
[/definition]
Cauchy sequences exhibit "self-convergence": terms eventually become arbitrarily close to each other regardless of proximity to any external limit point. This property is necessary for convergence — if terms approach $a$, they must approach each other. More profoundly, in $\mathbb{R}$ it is also *sufficient* — a fact reflecting the [completeness of the real numbers](/page/Cauchy%20Sequence) that fails catastrophically in $\mathbb{Q}$.
[quotetheorem:172]
[citeproof:172]
This equivalence is the workhorse of analysis. When studying power series or [Fourier series](/page/Fourier%20Series), we rarely know the limit in advance; instead, we verify the Cauchy property of partial sums. The proof reveals deep structure: Cauchy sequences are bounded (hence admit a convergent subsequence by Bolzano-Weierstrass), and the Cauchy property forces the entire sequence to converge to that subsequence's limit. Completeness is indispensable—without it, Cauchy sequences could "converge" to missing points, as rational approximations to $\sqrt{2}$ demonstrate.
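To see the Cauchy property at work without reference to a limit, consider the partial sums of $\sum 1/j^2$. For $n > m \geq N$ one has $S_n - S_m < \sum_{j>m} \frac{1}{j(j-1)} \leq \frac{1}{N}$. The sketch below (hypothetical helper names, floating-point arithmetic, and a finite window of indices, so it is a numerical illustration only) compares the largest observed gap with that bound.

```python
from itertools import accumulate


def partial_sums(terms):
    """Return the list of partial sums S_1, S_2, ... of a finite list of terms."""
    return list(accumulate(terms))


def max_gap(sums, N):
    """Largest |S_n - S_m| over all computed indices n, m >= N (1-based)."""
    window = sums[N - 1:]
    return max(window) - min(window)


M = 5000  # number of terms computed; an arbitrary illustrative cutoff
S = partial_sums([1.0 / (j * j) for j in range(1, M + 1)])
for N in (10, 100, 1000):
    print(N, max_gap(S, N), 1.0 / N)
```

No knowledge of the value of the sum is used anywhere: the criterion looks only at differences between terms of the sequence.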
## Series
An infinite series $\sum_{j=1}^{\infty} a_j$ is not a sum in the ordinary sense—addition is a binary operation, defined only for finitely many terms. Instead, a series is shorthand for the limiting behavior of its partial sums. This perspective unifies series with sequence theory: convergence questions reduce to analyzing the sequence $(S_N)_{N=1}^{\infty}$ where $S_N = \sum_{j=1}^{N} a_j$.
[definition: Series Convergence]
Let $(a_j)_{j=1}^{\infty}$ be a sequence in $\mathbb{R}$. The series $\sum_{j=1}^{\infty} a_j$ converges to $S \in \mathbb{R}$ if the sequence of partial sums $(S_N)_{N=1}^{\infty}$ defined by:
\begin{align*}
S: \mathbb{N} &\to \mathbb{R} \\
N &\mapsto S_N = \sum_{j=1}^{N} a_j
\end{align*}
converges to $S$. We write $\sum_{j=1}^{\infty} a_j = S$.
[/definition]
The harmonic series $\sum_{j=1}^{\infty} \frac{1}{j}$ illustrates a subtle danger: its terms tend to zero, yet the series diverges. This shows that $a_j \to 0$ is necessary but *not sufficient* for convergence. What matters is whether partial sums form a Cauchy sequence—whether the "tail" $\sum_{j=N+1}^{M} a_j$ becomes arbitrarily small as $N,M \to \infty$.
[definition: Absolute Convergence]
A series $\sum_{j=1}^{\infty} a_j$ converges absolutely if $\sum_{j=1}^{\infty} |a_j|$ converges.
[/definition]
[definition: Conditional Convergence]
A series $\sum_{j=1}^{\infty} a_j$ converges conditionally if it converges but does not converge absolutely.
[/definition]
Absolute convergence is a robust property guaranteeing rearrangement invariance—a striking contrast with conditional convergence, where Riemann showed rearrangements can alter the sum or induce divergence. This dichotomy reveals a deep truth: absolute convergence reflects genuine "smallness" of terms, while conditional convergence relies on delicate cancellation that rearrangements can destroy.
[quotetheorem:178]
[citeproof:178]
## Convergence tests
Practical analysis demands tools to determine convergence without computing partial sums explicitly. These tests exploit monotonicity, comparison with known series, or asymptotic behavior—each revealing different facets of convergence.
[quotetheorem:173]
[citeproof:173]
The comparison test reduces convergence questions to bounding—a fundamental technique in analysis. Its power emerges when comparing to geometric series $\sum r^j$ (convergent for $|r|<1$) or $p$-series $\sum 1/j^p$.
[quotetheorem:174]
[citeproof:174]
The ratio test measures asymptotic decay rate. When $L < 1$, terms decay faster than a geometric series with ratio $r \in (L,1)$; when $L > 1$, terms grow in magnitude, violating the necessary condition $a_j \to 0$.
[quotetheorem:175]
[citeproof:175]
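The root test often succeeds where the ratio test fails, particularly for series with irregular term behavior. Whenever the ratios $|a_{j+1}/a_j|$ converge to a limit $L$, the roots $|a_j|^{1/j}$ converge to the same $L$, so the root test is at least as strong as the ratio test; it can also remain decisive when the ratios oscillate and have no limit.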
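A small example makes the contrast visible. The series with terms $a_j = 2^{-j + (-1)^j}$ is chosen here purely for illustration and does not come from the text above. Its consecutive ratios alternate between $1/8$ and $2$, so they have no limit, while the $j$-th roots settle near $1/2$. The sketch works with exponents rather than the terms themselves to avoid floating-point underflow.

```python
def exponent(j: int) -> int:
    """Exponent e(j) in the term a_j = 2 ** e(j), with e(j) = -j + (-1)**j."""
    return -j + (-1) ** j


def ratio(j: int) -> float:
    """a_{j+1} / a_j, computed from exponents to avoid underflow."""
    return 2.0 ** (exponent(j + 1) - exponent(j))


def root(j: int) -> float:
    """a_j ** (1/j), computed from exponents to avoid underflow."""
    return 2.0 ** (exponent(j) / j)


for j in (10, 11, 1000, 1001):
    print(j, ratio(j), root(j))
```

Since $a_j \leq 2 \cdot 2^{-j}$, the series does converge by comparison, in agreement with what the roots suggest.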
[quotetheorem:176]
[citeproof:176]
Cauchy's condensation test transforms series convergence into a question about dyadic subsequences. By grouping terms in blocks of size $2^k$, it exploits monotonicity to create sharp upper and lower bounds. This technique reveals the precise threshold for $p$-series convergence.
[example: Harmonic Series Divergence]
Consider the harmonic series $\sum_{j=1}^{\infty} \frac{1}{j}$. The sequence $(\frac{1}{j})_{j=1}^{\infty}$ is decreasing and non-negative. Applying the Cauchy condensation test:
\begin{align*}
\sum_{k=0}^{\infty} 2^k \cdot \frac{1}{2^k} = \sum_{k=0}^{\infty} 1
\end{align*}
diverges. Therefore $\sum_{j=1}^{\infty} \frac{1}{j}$ diverges, despite terms tending to zero. Grouping terms as $(1) + (\frac{1}{2}) + (\frac{1}{3}+\frac{1}{4}) + (\frac{1}{5}+\cdots+\frac{1}{8}) + \cdots$ reveals that each block is at least $\frac{1}{2}$ (for $k \geq 1$, the block ending at $\frac{1}{2^k}$ has $2^{k-1}$ terms, each at least $\frac{1}{2^k}$), forcing partial sums beyond any bound.
[/example]
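The block-grouping argument can be checked exactly for the first few blocks. This is a minimal sketch with hypothetical helper names, using exact fractions; it confirms that each dyadic block contributes at least $\frac{1}{2}$, so that $S_{2^k} \geq 1 + \frac{k}{2}$.

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Exact partial sum S_n = 1 + 1/2 + ... + 1/n."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))


def block(k: int) -> Fraction:
    """Sum of the k-th dyadic block 1/(2**(k-1) + 1) + ... + 1/2**k, for k >= 1."""
    return harmonic(2 ** k) - harmonic(2 ** (k - 1))


for k in range(1, 9):
    print(k, float(block(k)), float(harmonic(2 ** k)), 1 + k / 2)
```

The growth is slow (doubling the number of terms adds only about $\ln 2$), which is why numerical experiments alone can give the false impression that the partial sums level off.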
[example: P-Series Convergence]
Consider $\sum_{j=1}^{\infty} \frac{1}{j^p}$ for $p > 0$. The sequence $(\frac{1}{j^p})_{j=1}^{\infty}$ is decreasing and non-negative. Applying the Cauchy condensation test:
\begin{align*}
\sum_{k=0}^{\infty} 2^k \cdot \frac{1}{(2^k)^p} = \sum_{k=0}^{\infty} 2^{k(1-p)}
\end{align*}
This is a geometric series with ratio $2^{1-p}$. It converges if and only if $2^{1-p} < 1$, i.e., $1-p < 0$, or $p > 1$. Therefore $\sum_{j=1}^{\infty} \frac{1}{j^p}$ converges if and only if $p > 1$—a fundamental threshold governing series behavior.
[/example]
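The condensation argument also yields a quantitative bound: for $p > 1$, grouping gives $\sum_{j} j^{-p} \leq \sum_{k} 2^{k(1-p)} = \frac{1}{1 - 2^{1-p}}$. The sketch below is illustrative only (function names are assumptions, and floating-point partial sums stand in for the infinite series); it compares partial sums with this geometric bound.

```python
def condensed_bound(p: float) -> float:
    """Sum of the geometric series sum_k 2**(k*(1-p)), valid only for p > 1."""
    if p <= 1:
        raise ValueError("the condensed series diverges for p <= 1")
    return 1.0 / (1.0 - 2.0 ** (1.0 - p))


def p_partial_sum(p: float, n: int) -> float:
    """Partial sum 1 + 1/2**p + ... + 1/n**p."""
    return sum(1.0 / j ** p for j in range(1, n + 1))


for p in (1.1, 1.5, 2.0, 3.0):
    print(p, p_partial_sum(p, 10_000), condensed_bound(p))
```

As $p$ decreases to $1$ the bound $\frac{1}{1 - 2^{1-p}}$ blows up, matching the divergence at the threshold $p = 1$.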
## Alternating series
Series with alternating signs exhibit special convergence behavior through cancellation. The alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$ converges despite the harmonic series diverging—a striking demonstration that sign structure can rescue convergence when absolute convergence fails.
[definition: Alternating Series]
A series $\sum_{j=1}^{\infty} a_j$ is alternating if $a_j = (-1)^{j-1} b_j$ or $a_j = (-1)^j b_j$ where $(b_j)_{j=1}^{\infty}$ is a sequence of non-negative real numbers.
[/definition]
[quotetheorem:177]
[citeproof:177]
The alternating series test exploits monotonic decay to bound error precisely. With the sign convention $a_j = (-1)^{j-1} b_j$, the even-indexed partial sums $S_{2N}$ form an increasing sequence bounded above, and the odd-indexed partial sums $S_{2N+1}$ form a decreasing sequence bounded below; both converge to the same limit because their difference $S_{2N+1} - S_{2N} = b_{2N+1} \to 0$. The error bound $|S - S_N| \leq b_{N+1}$ provides practical truncation estimates, essential for numerical computation.
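The truncation estimate $|S - S_N| \leq b_{N+1}$ can be observed numerically for the alternating harmonic series. The sketch below relies on the standard value $S = \ln 2$ for this series, a fact not derived in this section, and uses a hypothetical helper name with floating-point arithmetic.

```python
import math


def alternating_harmonic_partial(N: int) -> float:
    """Partial sum S_N = 1 - 1/2 + 1/3 - ... + (-1)**(N-1) / N."""
    return math.fsum((-1) ** (j - 1) / j for j in range(1, N + 1))


S = math.log(2)  # known value of the alternating harmonic series (assumed here)
for N in (1, 2, 10, 100, 1000):
    error = abs(S - alternating_harmonic_partial(N))
    print(N, error, 1 / (N + 1))
```

In practice the bound tells us how many terms to keep: to guarantee an error below a tolerance $\tau$, it is enough to choose $N$ with $b_{N+1} \leq \tau$.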
[example: Alternating Harmonic Series]
Consider $\sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$. The sequence $(\frac{1}{j})_{j=1}^{\infty}$ is decreasing and tends to $0$. By the alternating series test, the series converges. However, $\sum_{j=1}^{\infty} \frac{1}{j}$ diverges, so the convergence is conditional. Remarkably, rearranging terms can produce *any* real number as the sum—a phenomenon impossible for absolutely convergent series.
[/example]
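The rearrangement phenomenon can be simulated with a greedy strategy: add unused positive terms while the running total is at or below the target, and unused negative terms while it is above. This is a sketch of the idea rather than a proof (the function name and the step count are assumptions); because both the positive and the negative parts diverge, the procedure never runs out of terms in either direction.

```python
def rearranged_partial_sum(target: float, steps: int) -> float:
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ... aimed at `target`.

    Uses each positive term 1/(2k-1) and each negative term -1/(2k) at most once,
    keeping the positive terms and the negative terms in their original order.
    """
    total = 0.0
    next_odd = 1   # denominator of the next unused positive term
    next_even = 2  # denominator of the next unused negative term
    for _ in range(steps):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total


for target in (2.0, 0.0, -1.0):
    print(target, rearranged_partial_sum(target, 100_000))
```

After the running total first crosses the target, it stays within the size of the most recently used term, and those sizes tend to zero; this is the mechanism that drives the rearranged partial sums to the chosen value.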
The theory of limits and convergence forms the foundation of analysis. From the precise $\epsilon$-$N$ definition to the powerful [Cauchy criterion](/page/Cauchy%20Sequence), each concept resolves limitations of its predecessor while revealing deeper structure in the real numbers. This progression — from sequences to series, from absolute to conditional convergence — exemplifies analysis itself: replacing intuition with rigour, then discovering richer truths than intuition alone could provide.
Sequences describe the behaviour of discrete processes: what happens at step $n$ as $n \to \infty$. But most of analysis concerns *functions* — objects that vary continuously over intervals, not just at integer steps. The next chapter asks: when does a function respect the limiting structure we have built? When does $x_n \to a$ guarantee $f(x_n) \to f(a)$? The answer — [continuity](/page/Continuity%20(Real%20Analysis)) — is the bridge from discrete to continuous analysis.
# Continuity
Picture a particle moving along a line without teleporting: its position changes smoothly, never jumping instantaneously from one location to another. This intuitive notion of "no jumps" underlies continuity, yet translating it into rigorous mathematics proved surprisingly subtle. In the 18th century, Euler regarded a function as continuous when it was expressible by a single analytic formula, and well into the 19th century it was widely believed that a continuous function must be differentiable except at isolated points. This belief collapsed when Weierstrass constructed a function continuous everywhere but differentiable nowhere, a pathological example that shattered geometric intuition and forced analysts to rebuild continuity from first principles.
The modern $\epsilon$-$\delta$ definition, pioneered by Cauchy and refined by Weierstrass, eliminates reliance on formulas or visualization. Instead, it quantifies the idea that small changes in input produce small changes in output, a principle that survives even when functions behave bizarrely. This section develops continuity systematically, revealing how a simple local property (continuity at a point) combines with compactness to yield powerful global consequences (boundedness, attainment of extrema). The journey exposes a profound truth: continuity alone cannot guarantee differentiability, yet it suffices for Riemann integrability on closed bounded intervals and for the intermediate value property.
## The $\epsilon$-$\delta$ definition
Continuity concerns behavior *at a point*. A function $f$ is continuous at $a$ if values $f(x)$ remain close to $f(a)$ whenever $x$ stays sufficiently near $a$. Crucially, "sufficiently near" depends on how close we demand $f(x)$ to be to $f(a)$—a dependency captured precisely by quantifiers.
[definition: Continuity At Point]
Let $E \subseteq \mathbb{R}$, let $f: E \to \mathbb{R}$ be a function, and let $a \in E$. We say $f$ is continuous at $a$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that:
\begin{align*}
|x - a| < \delta \quad \text{and} \quad x \in E \quad \text{implies} \quad |f(x) - f(a)| < \epsilon
\end{align*}
We say $f$ is continuous on $E$ if it is continuous at every point $a \in E$.
[/definition]
[motivation]
Why this asymmetric quantifier structure ($\forall \epsilon > 0 \; \exists \delta > 0$)? Earlier mathematicians spoke of "infinitesimal changes" in $x$ producing "infinitesimal changes" in $f(x)$, but infinitesimals lacked logical foundation. Cauchy's insight was to replace them with a verifiable relationship: for *any* tolerance $\epsilon$ on the output, we can find a corresponding tolerance $\delta$ on the input that guarantees the output stays within $\epsilon$ of $f(a)$. The dependency $\delta = \delta(\epsilon, a)$ is essential—different points may require different $\delta$ values even for the same $\epsilon$, a subtlety that later motivates uniform continuity.
[/motivation]
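The dependence of $\delta$ on both $\epsilon$ and the point can be made explicit for a simple function. Taking $f(x) = x^2$ as an illustrative choice, the sketch below returns one valid $\delta$ (not the largest possible) and spot-checks it on sample points. The names `delta_for_square` and `spot_check` are hypothetical, and the sampling is a sanity check, not a proof.

```python
def delta_for_square(a: float, eps: float) -> float:
    """A delta that works for f(x) = x**2 at the point a (one valid choice, not the largest)."""
    # If |x - a| < delta <= 1 then |x + a| < 2|a| + 1, so
    # |x**2 - a**2| = |x - a| * |x + a| < delta * (2|a| + 1) <= eps.
    return min(1.0, eps / (2.0 * abs(a) + 1.0))


def spot_check(a: float, eps: float, samples: int = 1001) -> bool:
    """Sample points strictly inside (a - delta, a + delta) and test |f(x) - f(a)| < eps."""
    delta = delta_for_square(a, eps)
    for i in range(1, samples):
        x = a - delta + 2.0 * delta * i / samples
        if abs(x * x - a * a) >= eps:
            return False
    return True


for a in (0.0, 1.0, 10.0, 100.0):
    print(a, delta_for_square(a, 0.1), spot_check(a, 0.1))
```

For a fixed $\epsilon$ the returned $\delta$ shrinks as $|a|$ grows, which is exactly the point-dependence recorded in the notation $\delta(\epsilon, a)$.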
The definition requires $a \in E$ so that $f(a)$ is defined. When $a$ is an isolated point of $E$ (e.g., $E = \{0\} \cup [1,2]$ and $a=0$), continuity holds automatically: for any $\delta \leq 1$ the only point of $E$ within $\delta$ of $a$ is $a$ itself, so $|f(x) - f(a)| = 0 < \epsilon$ and the implication is true. Continuity becomes substantive only at limit points of $E$, where sequences of distinct points approach $a$.
[definition: Limit Point]
A point $a \in \mathbb{R}$ is a limit point of $E \subseteq \mathbb{R}$ if for every $\delta > 0$ there exists $x \in E$ with $0 < |x - a| < \delta$.
[/definition]
At limit points, continuity connects naturally to limits of sequences—a perspective often more intuitive for computation.
[quotetheorem:179]
[citeproof:179]
This characterization transforms continuity verification into sequence manipulation—a technique invaluable for proving discontinuity (by exhibiting a single sequence violating the condition) and for establishing continuity of composite functions.
[example: Dirichlet Function]
Define $f: \mathbb{R} \to \mathbb{R}$ by:
\begin{align*}
f(x) =
\begin{cases}
1 & \text{if } x \in \mathbb{Q} \\
0 & \text{if } x \notin \mathbb{Q}
\end{cases}
\end{align*}
At any $a \in \mathbb{R}$, select a sequence of rationals $(r_n)$ and irrationals $(s_n)$ both converging to $a$ (possible because $\mathbb{Q}$ and $\mathbb{R} \setminus \mathbb{Q}$ are dense). Then $f(r_n) \to 1$ but $f(s_n) \to 0$. Since $1 \neq 0$, $f$ cannot be continuous at $a$. The Dirichlet function is discontinuous everywhere—a stark reminder that continuity is not generic.
[/example]
[example: Thomae Function]
Define $f: (0,1) \to \mathbb{R}$ by:
\begin{align*}
f(x) =
\begin{cases}
\frac{1}{q} & \text{if } x = \frac{p}{q} \text{ in lowest terms with } p,q \in \mathbb{N} \\
0 & \text{if } x \notin \mathbb{Q}
\end{cases}
\end{align*}
At irrational $a$, given $\epsilon > 0$, only finitely many rationals in $(0,1)$ have denominator $q \leq 1/\epsilon$. Choose $\delta > 0$ small enough to exclude these finitely many rationals from $(a-\delta, a+\delta)$. Then $|x-a| < \delta$ implies either $x$ irrational (so $f(x)=0$) or $x=p/q$ with $q > 1/\epsilon$ (so $f(x) < \epsilon$). Hence $|f(x)-f(a)| = |f(x)| < \epsilon$, proving continuity at irrationals.
At rational $a = p/q$, consider a sequence of irrationals $(x_n)$ converging to $a$. Then $f(x_n) = 0 \to 0$ but $f(a) = 1/q > 0$, so $f(x_n) \not\to f(a)$. Thus $f$ is discontinuous at rationals. The Thomae function is continuous precisely on the irrationals—demonstrating that continuity sets can be highly irregular yet dense.
[/example]
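The choice of $\delta$ at an irrational point can be computed explicitly, since only finitely many rationals carry a value of at least $\epsilon$. The following sketch uses hypothetical helper names and a floating-point stand-in for the irrational point $a = \sqrt{2} - 1$ (an assumption, since a float is itself rational). It lists those rationals and measures the distance from $a$ to the nearest one.

```python
from fractions import Fraction
from math import floor, sqrt


def thomae(x: Fraction) -> Fraction:
    """Value of the Thomae function at a rational x in (0, 1)."""
    # Fraction stores x in lowest terms, so the denominator is the q in the definition.
    return Fraction(1, x.denominator)


def large_value_rationals(eps: Fraction) -> list:
    """All rationals p/q in (0, 1) in lowest terms with q <= 1/eps, i.e. with f(p/q) >= eps."""
    q_max = floor(1 / eps)
    found = {Fraction(p, q) for q in range(2, q_max + 1) for p in range(1, q)}
    return sorted(found)


def delta_at(a: float, eps: Fraction) -> float:
    """Distance from a to the nearest rational where the Thomae function is >= eps."""
    # If no such rational exists, any delta works; 1.0 is returned as a placeholder.
    return min((abs(a - float(r)) for r in large_value_rationals(eps)), default=1.0)


a = sqrt(2) - 1  # floating-point approximation to an irrational point of (0, 1)
for eps in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 10), Fraction(1, 50)):
    print(eps, len(large_value_rationals(eps)), delta_at(a, eps))
```

As $\epsilon$ shrinks, more rationals must be excluded and the computed $\delta$ decreases, but it stays positive because the excluded set is finite.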
## Algebra of continuous functions
Continuity respects algebraic operations—a consequence of limit properties for sequences. This enables construction of complex continuous functions from simple building blocks.
[quotetheorem:197]
[citeproof:197]
Polynomials emerge as continuous functions through repeated application: constant functions and the identity $x \mapsto x$ are continuous; sums and products preserve continuity. Rational functions inherit continuity wherever their denominator is non-zero. This algebraic closure makes continuous functions abundant—yet abundance does not imply tameness, as Weierstrass's nowhere-differentiable example demonstrates.
## The Intermediate Value Theorem
Continuity enables a profound global property absent in discontinuous functions: a continuous function on an interval cannot "skip" values. If it takes values $A$ and $B$, it must take every intermediate value. This Intermediate Value Property seems intuitively obvious for curves drawn without lifting a pen—but rigor demands proof from the $\epsilon$-$\delta$ definition alone.
[quotetheorem:180]
[citeproof:180]
[motivation]
Why does this fail for discontinuous functions? Consider $f(x) = \operatorname{sgn}(x)$ on $[-1,1]$: $f(-1) = -1$, $f(1) = 1$, yet $f$ takes only the values $-1$, $0$, and $1$, so it never equals $\tfrac{1}{2}$. The jump discontinuity at $0$ creates a "gap" in the range. Continuity prevents such gaps, but proving this requires the completeness of $\mathbb{R}$. Assuming $f(a) < \lambda < f(b)$, the standard proof constructs $c = \sup \{ x \in [a,b] : f(x) < \lambda \}$ and uses continuity to show $f(c) = \lambda$. Without completeness (e.g., in $\mathbb{Q}$), the supremum might not exist in the domain, allowing continuous functions to skip values; for instance, $x \mapsto x^2 - 2$ on $\mathbb{Q} \cap [1,2]$ changes sign without ever vanishing. In this sense the Intermediate Value Theorem reflects the connectedness of intervals in $\mathbb{R}$.
[/motivation]
The theorem guarantees existence of roots: if $f(a) < 0 < f(b)$, some $c \in (a,b)$ satisfies $f(c) = 0$. This underpins numerical root-finding methods (bisection, regula falsi) and justifies defining inverse functions for strictly monotonic continuous functions. Crucially, the Intermediate Value Property does *not* characterize continuity—there exist discontinuous functions satisfying it (e.g., derivatives of differentiable functions, by Darboux's theorem)—but continuity *implies* the property on intervals.
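The bisection method mentioned above is a direct algorithmic reading of the theorem. A minimal sketch, assuming a continuous `f` with $f(a) < 0 < f(b)$ (the function name, tolerance, and iteration cap are illustrative choices):

```python
def bisect_root(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Approximate a root of a continuous f on [a, b], assuming f(a) < 0 < f(b)."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    lo, hi = a, b
    # Invariant: f(lo) < 0 <= f(hi), so the IVT places a root in [lo, hi].
    for _ in range(200):  # cap on halvings, in case tol is below float resolution
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(bisect_root(lambda x: x * x - 2.0, 1.0, 2.0))
```

Each halving preserves the sign change across the bracketing interval, so the theorem applies afresh at every step. Continuity is what makes the sign change meaningful: for a discontinuous function the bracket can shrink onto a jump rather than a root.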
## Boundedness and attainment of extrema
Continuity on a *closed bounded interval* yields stronger global properties than on arbitrary domains. The function $f(x) = 1/x$ on $(0,1]$ is continuous but unbounded; $f(x) = x$ on $(0,1)$ is continuous and bounded but attains neither its supremum nor infimum. Closing and bounding the domain eliminates these pathologies—a consequence of compactness interacting with continuity.
[quotetheorem:181]
[citeproof:181]
[quotetheorem:182]
[citeproof:182]
[motivation]
Why closed and bounded? The proof uses the Bolzano-Weierstrass theorem: if $f$ were unbounded, we could construct a sequence $(x_n)$ with $|f(x_n)| \to \infty$; by Bolzano-Weierstrass, $(x_n)$ has a convergent subsequence $x_{n_j} \to c \in [a,b]$; continuity forces $f(x_{n_j}) \to f(c)$, contradicting $|f(x_{n_j})| \to \infty$. The closedness of $[a,b]$ ensures the limit point $c$ lies in the domain—without it, the subsequence might converge to a [boundary](/page/Boundary) point outside the domain (e.g., $x_n = 1/n$ in $(0,1]$ converges to $0 \notin (0,1]$). Boundedness ensures the sequence has a convergent subsequence at all. This interplay between domain topology and function behavior exemplifies how analysis unifies local properties (continuity) with global structure (compactness).
[/motivation]
These theorems justify optimization on closed intervals: a continuous objective function on $[a,b]$ always attains its maximum and minimum, so the extrema that calculus-based optimization searches for are guaranteed to exist. The same compactness of $[a,b]$ also yields uniform continuity, a strengthening essential for integration theory.
## Uniform continuity
Pointwise continuity allows $\delta$ to depend on both $\epsilon$ *and* the point $a$. For $f(x) = x^2$ on $\mathbb{R}$, the identity $|f(x)-f(a)| = |x-a|\,|x+a|$ shows that $\delta = \min\{1, \epsilon/(2|a|+1)\}$ suffices to achieve $|f(x)-f(a)| < \epsilon$, and that any admissible $\delta$ must shrink as $|a|$ grows. This dependency causes trouble when approximating integrals or composing functions. Uniform continuity eliminates point-dependence: a single $\delta$ works *simultaneously* for all points in the domain.
[definition: Uniform Continuity]
Let $E \subseteq \mathbb{R}$ and $f: E \to \mathbb{R}$. We say $f$ is uniformly continuous on $E$ if for every $\epsilon > 0$ there exists $\delta > 0$ such that for all $x,y \in E$:
\begin{align*}
|x - y| < \delta \quad \text{implies} \quad |f(x) - f(y)| < \epsilon
\end{align*}
[/definition]
Every uniformly continuous function is continuous, but the converse can fail on unbounded or non-closed domains. The function $f(x) = 1/x$ on $(0,1)$ is continuous but not uniformly continuous: near $0$, arbitrarily small changes in $x$ produce large changes in $f(x)$. Remarkably, continuity on a *closed bounded interval* guarantees uniform continuity, a profound consequence of compactness.
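The failure for $f(x) = 1/x$ on $(0,1)$ can be witnessed by explicit pairs of points. In the sketch below (the pair $x_n = \frac{1}{n+1}$, $y_n = \frac{1}{n+2}$ is an illustrative choice, and exact fractions are assumed), the inputs get arbitrarily close while the outputs always differ by exactly $1$, so no single $\delta$ can serve $\epsilon = 1$.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """The function f(x) = 1/x on (0, 1)."""
    return 1 / x


def witness_pair(n: int):
    """Points x_n = 1/(n+1) and y_n = 1/(n+2), both in (0, 1) for n >= 1."""
    return Fraction(1, n + 1), Fraction(1, n + 2)


for n in (1, 10, 100, 1000):
    x, y = witness_pair(n)
    print(n, float(abs(x - y)), abs(f(x) - f(y)))
```

Negating the definition is what makes this a valid counterexample: it suffices to exhibit one $\epsilon$ for which every $\delta$ is defeated by some pair of points.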
[quotetheorem:183]
[citeproof:183]
[motivation]
The proof exploits compactness through the Heine-Borel property. Fix $\epsilon > 0$. For each $c \in [a,b]$, continuity provides $\delta_c > 0$ such that $|x-c| < \delta_c$ implies $|f(x)-f(c)| < \epsilon/2$. The intervals $(c - \delta_c/2, c + \delta_c/2)$ form an open cover of $[a,b]$; by Heine-Borel, finitely many suffice: $(c_i - \delta_{c_i}/2, c_i + \delta_{c_i}/2)$ for $i=1,\dots,N$. Set $\delta = \min \{ \delta_{c_1}/2, \dots, \delta_{c_N}/2 \} > 0$. For any $x,y \in [a,b]$ with $|x-y| < \delta$, the point $x$ lies within $\delta_{c_i}/2$ of some center $c_i$, so $|y - c_i| \leq |y-x| + |x-c_i| < \delta_{c_i}$ as well; hence $|f(x)-f(y)| \leq |f(x)-f(c_i)| + |f(c_i)-f(y)| < \epsilon$. This finite subcover argument, impossible on non-compact domains, transforms local continuity into global uniform control.
[/motivation]
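Uniform continuity ensures that Riemann sums converge to the same value as the mesh of the partition tends to zero, whatever partitions and sample points are chosen, making integration of continuous functions on closed intervals well-defined. Uniformly continuous functions also map Cauchy sequences to Cauchy sequences, a property that can fail for merely continuous functions on non-closed domains: $f(x) = 1/x$ sends the Cauchy sequence $(1/n)$ in $(0,1]$ to the unbounded sequence $(n)$.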
## Pathologies and the limits of continuity
Continuity alone cannot guarantee differentiability. Weierstrass's famous example:
\begin{align*}
W(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x)
\end{align*}
with $0 < a < 1$, $b$ an odd integer, and $ab > 1 + \frac{3\pi}{2}$, is continuous everywhere (by the Weierstrass M-test) but differentiable nowhere. The high-frequency oscillations at every scale prevent tangent lines from forming—yet the function remains continuous because oscillation *amplitude* decays geometrically.
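The role of the geometric decay of amplitudes can be seen in the partial sums. The sketch below uses the illustrative parameters $a = 0.5$, $b = 13$ (an assumption chosen to satisfy the stated conditions, written `A` and `B` in the code) and compares partial sums with the M-test tail bound $\sum_{n > N} a^n = \frac{a^{N+1}}{1-a}$. For large $n$ the floating-point evaluation of $\cos(b^n \pi x)$ loses accuracy, but each term still has magnitude at most $a^n$, so the bound remains valid for the computed values.

```python
import math

A, B = 0.5, 13  # illustrative parameters: 0 < A < 1, B odd, A * B = 6.5 > 1 + 3*pi/2


def weierstrass_partial(x: float, N: int) -> float:
    """Partial sum W_N(x) = sum_{n=0}^{N} A**n * cos(B**n * pi * x)."""
    return math.fsum(A ** n * math.cos(B ** n * math.pi * x) for n in range(N + 1))


def tail_bound(N: int) -> float:
    """M-test bound on |W(x) - W_N(x)|: sum_{n>N} A**n = A**(N+1) / (1 - A)."""
    return A ** (N + 1) / (1.0 - A)


for x in (0.0, 0.1, 0.37):
    print(x, weierstrass_partial(x, 5), weierstrass_partial(x, 10), tail_bound(5))
```

The bound is independent of $x$, which is what uniform convergence means here and why the limit inherits continuity from the partial sums, even though term-by-term differentiation multiplies the $n$-th amplitude by $b^n$ and destroys convergence.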
Conversely, differentiability implies continuity—a one-way street revealing that continuity is strictly weaker than differentiability. The absolute value function $f(x) = |x|$ is continuous everywhere but not differentiable at $0$, demonstrating that corners obstruct differentiability while preserving continuity.
These pathologies teach humility: geometric intuition fails for functions defined by infinite processes. Yet continuity remains robust—sufficient for integration, intermediate value properties, and approximation by polynomials ([Weierstrass approximation theorem](/theorems/480)). It forms the minimal regularity needed for much of analysis, a foundation upon which stronger properties (differentiability, smoothness) are built—but never assumed.
Continuity transforms the local $\epsilon$-$\delta$ condition into global behaviour through domain structure. On arbitrary [sets](/page/Set), continuity offers little; on intervals, it guarantees intermediate values; on closed bounded intervals, it ensures boundedness, extrema attainment, and uniform control. This progression — from local definition to global consequence — exemplifies analysis itself: precise local conditions, combined with completeness and compactness, yield powerful global theorems that intuition alone could never reliably predict.
Continuity tells us that a function has no jumps — its values change "gradually." But *how* gradually? The next chapter asks whether the *rate* of change is itself well-defined. A continuous function need not have a tangent line at every point (consider $|x|$ at $0$), and Weierstrass's nowhere-differentiable function shows that "gradual" change can be so erratic that no instantaneous rate of change exists anywhere. [Differentiability](/page/Derivative) is the condition that separates well-behaved functions from merely continuous ones.
# Differentiability
The derivative captures instantaneous rate of change—the slope of a tangent line, the velocity of a moving particle at an exact moment, the marginal cost in economics. Newton and Leibniz independently discovered this concept in the 17th century, launching the calculus revolution. Yet their methods relied on "infinitesimals," quantities smaller than any positive real number yet non-zero—a notion that troubled even its creators. Bishop Berkeley famously mocked them as "ghosts of departed quantities." It took 150 years for Cauchy and Weierstrass to replace infinitesimals with rigorous limits, transforming the derivative from a heuristic tool into a precisely defined mathematical object.
The modern definition frames differentiability as the existence of a *linear approximation*: near a point $a$, the function $f$ behaves like the line $f(a) + f'(a)(x-a)$, with error vanishing faster than $|x-a|$. This perspective reveals differentiability as a *stronger* property than continuity—while continuity requires $f(x) \to f(a)$ as $x \to a$, differentiability demands that the *rate* of this convergence follows a precise linear pattern. This extra structure enables powerful local-to-global principles (Mean Value Theorem) and systematic approximation (Taylor series), yet remains fragile: functions can be continuous everywhere yet differentiable nowhere, and differentiability at a point provides no guarantee of differentiability nearby.
## The definition of the derivative
Differentiability concerns the limiting behavior of difference quotients. Geometrically, the difference quotient $\frac{f(x)-f(a)}{x-a}$ represents the slope of the secant line through $(a,f(a))$ and $(x,f(x))$. Differentiability means these secant slopes approach a definite limit as $x \to a$—the slope of the unique tangent line.
[definition: Derivative At Point]
Let $E \subseteq \mathbb{R}$, let $f: E \to \mathbb{R}$, and let $a \in E$ be a limit point of $E$. We say $f$ is differentiable at $a$ if the limit:
\begin{align*}
\lim_{x \to a} \frac{f(x) - f(a)}{x - a}
\end{align*}
exists as a finite real number. When this limit exists, we denote it by $f'(a)$ and call it the derivative of $f$ at $a$. Equivalently, $f$ is differentiable at $a$ if there exists $L \in \mathbb{R}$ such that:
\begin{align*}
\lim_{h \to 0} \frac{f(a+h) - f(a) - Lh}{h} = 0
\end{align*}
where $h$ ranges over values with $a+h \in E$ and $h \neq 0$.
[/definition]
[motivation]
The equivalent formulation $f(a+h) = f(a) + Lh + o(h)$ reveals the essence of differentiability: $f$ admits a linear approximation near $a$ with error $o(h)$ (vanishing faster than $|h|$). This perspective generalizes naturally to higher dimensions, where the derivative becomes a linear transformation approximating the function. The requirement that $a$ be a limit point ensures the difference quotient is defined for points arbitrarily close to $a$—isolated points admit no meaningful notion of derivative.
[/motivation]
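As a numerical illustration of the linear-approximation formulation (a sketch only, not part of the formal development), the snippet below takes the illustrative choice $f(x) = x^3$ at $a = 1$, where $L = f'(1) = 3$. It tabulates the secant slope and the scaled error $(f(a+h) - f(a) - Lh)/h$, which here equals $3h + h^2$ and should shrink with $h$. The function and helper names are assumptions made for this example.

```python
def f(x):
    # Illustrative choice of function; any differentiable f would do.
    return x ** 3

a, L = 1.0, 3.0  # f'(1) = 3 for f(x) = x^3

def secant_slope(h):
    # Difference quotient (f(a+h) - f(a)) / h.
    return (f(a + h) - f(a)) / h

def scaled_error(h):
    # (f(a+h) - f(a) - L*h) / h, which tends to 0 exactly when f'(a) = L.
    return (f(a + h) - f(a) - L * h) / h

steps = [10.0 ** (-k) for k in range(1, 6)]
slopes = [secant_slope(h) for h in steps]
errors = [scaled_error(h) for h in steps]

for h, s, e in zip(steps, slopes, errors):
    print(f"h={h:.0e}  secant slope={s:.6f}  scaled error={e:.2e}")
```

The printed scaled errors decay roughly in proportion to $h$, which is what the $o(h)$ condition requires.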
Differentiability implies continuity—a fundamental relationship showing that smoothness requires at least basic regularity.
[quotetheorem:184]
[citeproof:184]
The converse fails dramatically. The absolute value function $f(x) = |x|$ is continuous everywhere but not differentiable at $0$—the left and right difference quotients approach $-1$ and $1$, preventing a unique tangent slope. More strikingly, Weierstrass constructed functions continuous everywhere yet differentiable nowhere, proving that continuity provides no guarantee of local linear structure.
[example: Absolute Value Function]
Define $f: \mathbb{R} \to \mathbb{R}$ by $f(x) = |x|$. At $a = 0$:
\begin{align*}
\lim_{h \to 0^+} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1 \\
\lim_{h \to 0^-} \frac{f(0+h) - f(0)}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1
\end{align*}
Since the one-sided limits differ, $f'(0)$ does not exist. Geometrically, the graph has a "corner" at $0$—no unique tangent line exists.
[/example]
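The two one-sided limits in this example can be checked numerically. The sketch below (helper name hypothetical) evaluates the right and left difference quotients of $|x|$ at $0$ for a few step sizes; it illustrates the computation and is not a substitute for the limit argument.

```python
def one_sided_quotients(f, a, steps):
    # Right quotients use a + h, left quotients use a - h, with h > 0.
    right = [(f(a + h) - f(a)) / h for h in steps]
    left = [(f(a - h) - f(a)) / (-h) for h in steps]
    return right, left

steps = [10.0 ** (-k) for k in range(1, 8)]
right, left = one_sided_quotients(abs, 0.0, steps)

print("right quotients:", right)
print("left quotients: ", left)
```

Every right quotient equals $1$ and every left quotient equals $-1$, regardless of step size, so no single limit can exist.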
## Algebra of derivatives
Differentiation respects algebraic operations, enabling systematic computation. These rules follow directly from limit properties applied to difference quotients.
[quotetheorem:198]
[citeproof:198]
[motivation]
The chain rule deserves special attention—it transforms differentiation of composites into multiplication of derivatives, a deceptively simple result with profound consequences. Its proof requires care: when $g(x) = g(a)$ for $x$ near $a$, the standard difference quotient manipulation fails. The resolution involves defining an auxiliary function that handles the case $g(x) = g(a)$ separately, ensuring the limit exists regardless of whether $g$ is locally constant. This subtlety explains why early calculus texts often stated the chain rule with unnecessary hypotheses like "$g(x) \neq g(a)$ for $x \neq a$ near $a$"—a restriction eliminated by rigorous limit analysis.
[/motivation]
These rules, combined with derivatives of basic functions ($x^n$, $\sin x$, $\cos x$, $e^x$, $\log x$), enable differentiation of complex expressions. Yet computation alone reveals little about the deeper meaning of derivatives—this requires global theorems connecting local behavior to overall function structure.
## Rolle's theorem and the Mean Value Theorem
Rolle's theorem states an intuitively obvious fact: a differentiable function that returns to its starting value must have a horizontal tangent somewhere in between. Despite its simplicity, it underpins the most powerful results in differential calculus.
[quotetheorem:185]
[citeproof:185]
[motivation]
Why does this require proof? Geometric intuition suggests it must be true, but intuition fails for pathological functions. The proof exploits the Maximum-Minimum Theorem: $f$ attains a maximum and minimum on $[a,b]$. If both occur at endpoints, then since $f(a) = f(b)$ the function is constant and $f' = 0$ everywhere. Otherwise, an interior extremum occurs at some $c \in (a,b)$; differentiability forces $f'(c) = 0$ (the difference quotients from left and right have opposite signs, so their common limit must be zero). This argument reveals a deep connection: interior local extrema of differentiable functions can occur only where the derivative vanishes, a principle driving optimization theory. The converse fails, since $f(x) = x^3$ has $f'(0) = 0$ but no extremum at $0$.
[/motivation]
Rolle's theorem generalizes to the Mean Value Theorem (MVT), which compares a function's average rate of change over an interval to its instantaneous rate at some interior point.
[quotetheorem:186]
[citeproof:186]
The MVT transforms local derivative information into global function behavior. If $f' = 0$ everywhere on an interval, the MVT forces $f$ to be constant, proving that antiderivatives are unique up to additive constants. If $f' \geq 0$, the MVT implies $f$ is non-decreasing (and strictly increasing if $f' > 0$); if $|f'| \leq M$, it yields the Lipschitz bound $|f(x)-f(y)| \leq M|x-y|$. These consequences make the MVT the workhorse of differential calculus.
[motivation]
The standard proof applies Rolle's theorem to $g(x) = f(x) - \ell(x)$, where $\ell(x)$ is the secant line through $(a,f(a))$ and $(b,f(b))$. Since $g(a) = g(b) = 0$, Rolle's theorem provides $c \in (a,b)$ with $g'(c) = 0$, i.e., $f'(c) = \ell'(c) = \frac{f(b)-f(a)}{b-a}$. This elegant reduction shows how a simple extremal principle (Rolle) generates profound global information (MVT). Crucially, the hypotheses—continuity on $[a,b]$ and differentiability on $(a,b)$—are minimal: discontinuities or non-differentiable points can invalidate the conclusion, as $f(x) = |x|$ on $[-1,1]$ demonstrates ($\frac{f(1)-f(-1)}{1-(-1)} = 0$ but $f'(x) \neq 0$ for all $x \neq 0$).
[/motivation]
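The MVT asserts existence of a point $c$ without saying how to find it. For a concrete function one can locate such a point numerically. The sketch below uses the illustrative choice $f(x) = x^3$ on $[0,2]$, where the average slope is $4$, and solves $f'(c) = 4$ by bisection. It assumes $f'$ is continuous and changes sign relative to the average slope across the interval, which holds for this example but not in general; all names are hypothetical.

```python
import math

def f(x):
    return x ** 3

def df(x):
    # Derivative of f, supplied by hand for this example.
    return 3 * x ** 2

a, b = 0.0, 2.0
average_slope = (f(b) - f(a)) / (b - a)

def bisect(g, lo, hi, tol=1e-12):
    # Assumes g is continuous with g(lo) and g(hi) of opposite signs.
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2

c = bisect(lambda x: df(x) - average_slope, a, b)
print(f"c = {c:.9f}, f'(c) = {df(c):.9f}, average slope = {average_slope}")
```

For this example the exact answer is $c = 2/\sqrt{3}$, since $3c^2 = 4$.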
Cauchy's Mean Value Theorem extends the MVT to pairs of functions, enabling L'Hôpital's rule for indeterminate forms.
[quotetheorem:187]
[citeproof:187]
## Taylor's theorem
The derivative provides a first-order (linear) approximation to a function. Higher-order derivatives enable polynomial approximations of arbitrary degree—Taylor polynomials—that capture increasingly subtle local behavior. Taylor's theorem quantifies the error in these approximations, revealing when infinite series representations converge to the original function.
[definition: Taylor Polynomial]
Let $f: E \to \mathbb{R}$ where $E \subseteq \mathbb{R}$ contains a neighborhood of $a$. Suppose $f$ is $n$ times differentiable at $a$. The $n$th Taylor polynomial of $f$ at $a$ is:
\begin{align*}
T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x-a)^k
\end{align*}
[/definition]
Taylor's theorem states that $f(x) = T_n(x) + R_n(x)$ where the remainder $R_n(x)$ vanishes faster than $(x-a)^n$ as $x \to a$. Several remainder formulas, valid under different hypotheses, serve different purposes.
[quotetheorem:188]
[citeproof:188]
[quotetheorem:199]
[citeproof:199]
[quotetheorem:189]
[citeproof:189]
[motivation]
The Lagrange remainder resembles the next term in the series with the derivative evaluated at an intermediate point, making it ideal for bounding errors when derivative magnitudes are known. The Cauchy remainder can succeed where the Lagrange form is too weak, for instance in showing that the Taylor series of $\log(1+x)$ converges to the function for $x$ close to $-1$. The integral remainder, while requiring stronger hypotheses, enables precise asymptotic analysis and connects differentiation to integration via the Fundamental Theorem of Calculus. All three forms reveal a profound truth: smoothness (existence of high-order derivatives) yields polynomial approximation with explicit error control, a principle underpinning [numerical analysis](/page/Numerical%20Analysis), physics perturbation theory, and the definition of analytic functions.
[/motivation]
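To see the Lagrange form used as an error bound, consider the illustrative case $f = \exp$ at $a = 0$. Every derivative is $\exp$, so $|R_n(x)| \leq e^{|x|} |x|^{n+1}/(n+1)!$. The sketch below compares this bound with the actual error, using the library function `math.exp` as the reference value. The point $x = 1.5$ and the helper names are assumptions of the example.

```python
import math

def taylor_exp(x, n):
    # T_n(x) = sum_{k=0}^{n} x^k / k!  (Taylor polynomial of exp at 0).
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def lagrange_bound(x, n):
    # |R_n(x)| <= e^{|x|} |x|^{n+1} / (n+1)!, since |exp(xi)| <= e^{|x|}
    # for every xi between 0 and x.
    return math.exp(abs(x)) * abs(x) ** (n + 1) / math.factorial(n + 1)

x = 1.5
rows = [
    (n, abs(math.exp(x) - taylor_exp(x, n)), lagrange_bound(x, n))
    for n in range(1, 11)
]

for n, err, bound in rows:
    print(f"n={n:2d}  actual error={err:.3e}  Lagrange bound={bound:.3e}")
```

The actual error stays below the bound at every degree, and both shrink rapidly because the factorial in the denominator outgrows $|x|^{n+1}$.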
When the remainder $R_n(x) \to 0$ as $n \to \infty$ for all $x$ in some interval, $f$ equals its Taylor series there and is called *analytic*. Not all smooth functions are analytic: the classic example
\begin{align*}
f(x) =
\begin{cases}
e^{-1/x^2} & \text{if } x \neq 0 \\
0 & \text{if } x = 0
\end{cases}
\end{align*}
has $f^{(k)}(0) = 0$ for all $k$, so its Taylor series is identically zero—yet $f(x) > 0$ for $x \neq 0$. This function is smooth but not analytic at $0$, demonstrating that infinite differentiability does not guarantee convergence of the Taylor series to the function.
## L'Hôpital's rule
Indeterminate forms like $0/0$ or $\infty/\infty$ arise frequently in limit calculations. L'Hôpital's rule leverages Cauchy's Mean Value Theorem to resolve them by comparing derivative ratios.
[quotetheorem:200]
[citeproof:200]
[quotetheorem:201]
[citeproof:201]
[motivation]
Despite its utility, L'Hôpital's rule is often misapplied. It requires the *derivative ratio* limit to exist: if $\lim f'(x)/g'(x)$ fails to exist, the rule says nothing about $\lim f(x)/g(x)$. For example, $\lim_{x \to \infty} \frac{x+\sin x}{x}$ equals $1$ directly, but differentiating numerator and denominator gives $\frac{1+\cos x}{1}$, which oscillates and has no limit as $x \to \infty$. Moreover, the rule applies only to genuine indeterminate forms: $\lim_{x \to 0} \frac{x+1}{x+2} = \frac{1}{2}$, whereas the derivative ratio is identically $1$. The rule's proof for $0/0$ extends $f$ and $g$ continuously to $a$ by setting $f(a)=g(a)=0$, then applies Cauchy's MVT to $[a,x]$ for $x > a$, yielding $\frac{f(x)}{g(x)} = \frac{f'(c_x)}{g'(c_x)}$ for some $c_x \in (a,x)$; as $x \to a^+$, $c_x \to a^+$, forcing the limit equality. The $\infty/\infty$ case requires a more delicate argument using reciprocal transformations.
[/motivation]
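The failure mode for $\frac{x + \sin x}{x}$ as $x \to \infty$ can be observed numerically. In the sketch below (names hypothetical) the original ratio settles near $1$, while the derivative ratio $1 + \cos x$ keeps returning to values near $2$ and near $0$ along two different sequences tending to infinity.

```python
import math

def ratio(x):
    # The original quotient f(x)/g(x) with f(x) = x + sin x, g(x) = x.
    return (x + math.sin(x)) / x

def derivative_ratio(x):
    # The quotient f'(x)/g'(x) = (1 + cos x) / 1.
    return (1 + math.cos(x)) / 1

xs = [10.0 ** k for k in range(1, 8)]
ratios = [ratio(x) for x in xs]

# Sample the derivative ratio along even and odd multiples of pi.
at_even = [derivative_ratio(2 * k * math.pi) for k in range(1, 6)]
at_odd = [derivative_ratio((2 * k + 1) * math.pi) for k in range(1, 6)]

print("f/g   :", [f"{r:.7f}" for r in ratios])
print("f'/g' at 2k*pi    :", at_even)
print("f'/g' at (2k+1)*pi:", at_odd)
```

Two subsequences with different limits are enough to show that $\lim_{x \to \infty} f'(x)/g'(x)$ does not exist, so the rule is silent here even though the original limit is $1$.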
## Pathologies and the fragility of differentiability
Differentiability is remarkably fragile compared to continuity. While continuous functions form a robust class closed under uniform limits (by the [Uniform Limit Theorem](/theorems/258)), differentiable functions are not: the sequence $f_n(x) = \sqrt{x^2 + 1/n}$ converges uniformly to $|x|$ on $\mathbb{R}$, yet each $f_n$ is smooth while the limit is not differentiable at $0$.
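The uniform convergence claimed here follows from the identity $\sqrt{x^2 + 1/n} - |x| = \frac{1/n}{\sqrt{x^2 + 1/n} + |x|} \leq \frac{1}{\sqrt{n}}$, with equality at $x = 0$. The sketch below checks this on a finite grid; the grid size and radius are arbitrary assumptions, and a grid maximum only approximates the supremum over $\mathbb{R}$.

```python
import math

def f_n(x, n):
    return math.sqrt(x * x + 1.0 / n)

def sup_error_on_grid(n, points=2001, radius=10.0):
    # Maximum of f_n(x) - |x| over a symmetric grid that contains x = 0.
    grid = [-radius + 2 * radius * i / (points - 1) for i in range(points)]
    return max(f_n(x, n) - abs(x) for x in grid)

ns = [1, 10, 100, 1000, 10000]
errors = [sup_error_on_grid(n) for n in ns]

for n, e in zip(ns, errors):
    print(f"n={n:5d}  grid sup error={e:.6f}  1/sqrt(n)={1 / math.sqrt(n):.6f}")
```

The worst-case gap sits at the corner $x = 0$ and shrinks like $1/\sqrt{n}$, so the smooth functions $f_n$ approach $|x|$ uniformly while the limit still fails to be differentiable there.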
Weierstrass's nowhere-differentiable continuous function shattered 19th-century intuition:
\begin{align*}
W(x) = \sum_{n=0}^{\infty} a^n \cos(b^n \pi x)
\end{align*}
with $0 < a < 1$, $b$ an odd integer, and $ab > 1 + \frac{3\pi}{2}$. The series [converges uniformly](/page/Uniform%20Convergence) (by the Weierstrass M-test with $M_n = a^n$), so $W$ is continuous. Yet the high-frequency oscillations at every scale—amplified by the factor $b^n$—prevent any tangent line from forming. The proof examines difference quotients at scales matching the oscillation periods, showing they oscillate without approaching a limit.
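A small experiment conveys the flavour of the argument at the single point $x = 0$. The sketch below takes the illustrative parameters $a = 1/2$, $b = 13$ (so $ab = 6.5 > 1 + \frac{3\pi}{2}$) and a truncated sum of 12 terms, then evaluates difference quotients at the scales $h = b^{-m}$. Since $\cos(b^n \pi b^{-m}) = -1$ whenever $n \geq m$, each such term contributes $-2a^n$ to $W(h) - W(0)$, and the quotient grows like $(ab)^m$ in magnitude. The truncation level, parameters, and names are assumptions, and $m$ must stay below the truncation level because each partial sum is itself smooth. This illustrates divergence at one point only and is not a proof of nowhere-differentiability.

```python
import math
from fractions import Fraction

A, B = 0.5, 13   # a = 1/2, b = 13: b is odd and a*b = 6.5 > 1 + 3*pi/2
TERMS = 12       # truncation level of the series (an assumption of this sketch)

def partial_sum(x):
    # Truncated Weierstrass sum; x is an exact Fraction so that b^n * x
    # can be reduced modulo the period 2 without rounding error.
    total = 0.0
    for n in range(TERMS):
        t = (B ** n * x) % 2
        total += A ** n * math.cos(math.pi * float(t))
    return total

def quotient_at_zero(m):
    # Difference quotient (W(h) - W(0)) / h at the scale h = b^(-m).
    h = Fraction(1, B ** m)
    return (partial_sum(h) - partial_sum(Fraction(0))) / float(h)

quotients = [quotient_at_zero(m) for m in range(1, 7)]

for m, q in enumerate(quotients, start=1):
    print(f"m={m}  h=13^-{m}  quotient={q:.4e}")
```

The quotients are negative and grow in magnitude by roughly a factor of $ab$ at each step, so they cannot approach a finite limit as $h \to 0^+$.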
This pathology teaches a profound lesson: continuity describes *position* regularity, while differentiability describes *velocity* regularity. A particle can move continuously along a path without ever having a well-defined instantaneous velocity—its motion can be so erratic at every scale that no tangent direction emerges. Differentiability imposes strict local structure that continuity alone cannot guarantee.
Differentiability transforms local linear approximation into global function behaviour through the Mean Value Theorem. Yet this power comes at a cost: differentiability is easily destroyed by corners, cusps, or infinite oscillation. The [derivative](/page/Derivative) exists only where a function's graph admits a unique tangent line — a stringent requirement that fails even for continuous functions. This fragility makes differentiability a precious property, one that enables Taylor approximation, optimisation, and differential equations, yet demands careful verification at each point. The journey from Newton's intuitive fluxions to Weierstrass's rigorous limits reveals analysis at its best: replacing geometric intuition with precise language that survives even when intuition fails.
Taylor's theorem shows that a function with $n$ derivatives can be approximated by a polynomial of degree $n$, with quantitative error bounds. The natural question is: what happens when $n \to \infty$? If $f$ is infinitely differentiable, does the Taylor polynomial converge to $f$ as the degree grows? This leads to *power series* — infinite polynomials that represent functions like $e^x$, $\sin x$, and $\log(1+x)$ exactly within their [radius of convergence](/theorems/273). Power series bridge algebra and analysis: they behave like polynomials (supporting term-by-term calculus) while capturing transcendental functions that resist algebraic description.
# Power Series
A power series is an infinite polynomial—an expression of the form $\sum_{n=0}^{\infty} a_n (x-c)^n$ that may converge for some values of $x$ and diverge for others. This simple object bridges algebra and analysis: it behaves like a polynomial (supporting term-by-term differentiation and integration) while representing transcendental functions like $e^x$, $\sin x$, and $\log(1+x)$ that resist algebraic description. The 18th century saw mathematicians manipulate power series with astonishing success yet shaky foundations—Euler famously derived correct results from divergent series through formal manipulation. It fell to Cauchy and Abel in the 19th century to establish rigorous convergence criteria, revealing that power series possess a hidden geometric structure: they converge precisely within an interval (or disk in $\mathbb{C}$) centered at $c$, with behavior at the boundary requiring separate analysis.
This section develops the theory of power series as the natural habitat for analytic functions—those locally representable by convergent power series. We discover that within their interval of convergence, power series inherit the best properties of polynomials (smoothness, term-wise calculus operations) while transcending polynomial limitations through infinite degree. The radius of convergence emerges as a fundamental invariant encoding the distance to the nearest singularity in the complex plane—a profound connection between real analysis and complex function theory that anticipates the full power of complex analysis.
## Definition and radius of convergence
A power series centered at $c \in \mathbb{R}$ is a series whose terms involve powers of $(x-c)$ with constant coefficients.
[definition: Power Series]
A power series centered at $c \in \mathbb{R}$ is an expression of the form:
\begin{align*}
\sum_{n=0}^{\infty} a_n (x - c)^n
\end{align*}
where $(a_n)_{n=0}^{\infty}$ is a sequence of real numbers called the coefficients.
[/definition]
For each fixed $x$, the series becomes a numerical series $\sum a_n (x-c)^n$ whose convergence depends on $x$. Remarkably, convergence occurs precisely on an interval symmetric about $c$—a consequence of the geometric decay/growth of $(x-c)^n$.
[quotetheorem:202]
[citeproof:202]
[motivation]
Why symmetric intervals? Consider $x_1 \neq c$ with $|x_1 - c| = r$ where the series converges. For any $x_2$ with $|x_2 - c| < r$, we have $|a_n (x_2-c)^n| = |a_n (x_1-c)^n| \cdot |(x_2-c)/(x_1-c)|^n \leq M \rho^n$ where $\rho = |x_2-c|/|x_1-c| < 1$ and $M$ bounds $|a_n (x_1-c)^n|$ (the terms of a convergent series tend to zero, hence are bounded). The geometric series $\sum M \rho^n$ converges, so $\sum a_n (x_2-c)^n$ converges absolutely by comparison. This "domination" argument shows convergence propagates inward from any convergent point, forcing symmetry about $c$. Divergence propagates outward similarly: if the series diverges at $x_1$, it must diverge at all $x_2$ with $|x_2-c| > |x_1-c|$, else convergence at $x_2$ would force convergence at $x_1$ by the same domination argument.
[/motivation]
The radius $R$ can be computed explicitly using the ratio or root tests applied to the numerical series at fixed $x$.
[quotetheorem:203]
[citeproof:203]
[quotetheorem:204]
[citeproof:204]
[example: Geometric Series]
The geometric series $\sum_{n=0}^{\infty} x^n$ has $a_n = 1$, so $|a_n|^{1/n} = 1$ and $R = 1$. It converges to $\frac{1}{1-x}$ for $|x| < 1$ and diverges for $|x| \geq 1$. This simplest power series underpins all others through formal manipulation and serves as the prototype for radius of convergence behavior.
[/example]
[example: Exponential Series]
The series $\sum_{n=0}^{\infty} \frac{x^n}{n!}$ has $\left| \frac{a_{n+1}}{a_n} \right| = \frac{1}{n+1} \to 0$, so $R = \infty$. It converges for all $x \in \mathbb{R}$—a consequence of factorial growth dominating any exponential $|x|^n$.
[/example]
[example: Logarithmic Series]
The series $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} x^n$ has $\left| \frac{a_{n+1}}{a_n} \right| = \frac{n}{n+1} \to 1$, so $R = 1$. It converges for $|x| < 1$ and at $x = 1$ (alternating harmonic series) but diverges at $x = -1$ (where it becomes the negative of the harmonic series). This illustrates that boundary behavior ($|x-c| = R$) must be analyzed separately: convergence may hold at some boundary points and fail at others.
[/example]
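The ratio computations in the three examples above can be reproduced exactly with rational arithmetic. The sketch below evaluates the reciprocal ratio $|a_n / a_{n+1}|$, which tends to $R$ whenever the limit exists (including $R = \infty$). The helper and coefficient names are hypothetical, and a single finite $n$ only suggests the limiting value.

```python
from fractions import Fraction
from math import factorial

def ratio_estimate(coeff, n):
    # |a_n / a_{n+1}|: the reciprocal of the ratio used in the ratio test.
    return abs(coeff(n) / coeff(n + 1))

def geometric(n):
    return Fraction(1)

def exponential(n):
    return Fraction(1, factorial(n))

def logarithmic(n):
    # Coefficients (-1)^(n-1) / n, defined for n >= 1.
    return Fraction((-1) ** (n - 1), n)

for name, coeff in [("geometric", geometric),
                    ("exponential", exponential),
                    ("logarithmic", logarithmic)]:
    estimates = [ratio_estimate(coeff, n) for n in (10, 100, 1000)]
    print(name, [float(e) for e in estimates])
```

The geometric estimates are constantly $1$, the logarithmic ones are $(n+1)/n \to 1$, and the exponential ones are $n + 1$, which grows without bound, matching $R = 1$, $R = 1$, and $R = \infty$.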
## Analytic functions and term-wise calculus
Within the open interval of convergence $(c-R, c+R)$, power series behave like "infinite polynomials": they are infinitely differentiable, and calculus operations commute with infinite summation.
[quotetheorem:205]
[citeproof:205]
[quotetheorem:206]
[citeproof:206]
[quotetheorem:207]
[citeproof:207]
[motivation]
Why do differentiation and integration preserve the radius of convergence? For differentiation, the Cauchy-Hadamard formula gives:
\begin{align*}
\limsup_{n \to \infty} |n a_n|^{1/n} = \limsup_{n \to \infty} n^{1/n} |a_n|^{1/n} = \limsup_{n \to \infty} |a_n|^{1/n}
\end{align*}
since $n^{1/n} \to 1$. Thus the derived series has identical radius $R$. Integration introduces factors $1/(n+1)$ whose $n$th roots also tend to $1$, preserving $R$. Crucially, while the radius remains unchanged, boundary behavior may differ: the series $\sum_{n=1}^{\infty} \frac{x^n}{n}$ diverges at $x=1$, but its term-wise integral $\sum_{n=1}^{\infty} \frac{x^{n+1}}{n(n+1)}$ converges at $x=1$—integration can improve boundary convergence, while differentiation may worsen it.
[/motivation]
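The boundary contrast at $x = 1$ can be made concrete with exact partial sums. The integrated series telescopes, $\sum_{n=1}^{N} \frac{1}{n(n+1)} = 1 - \frac{1}{N+1}$, while the harmonic partial sums satisfy the classical bound $H_{2^k} \geq 1 + k/2$. The sketch below checks both with rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction

def harmonic_partial(N):
    # Partial sum at x = 1 of sum x^n / n.
    return sum(Fraction(1, n) for n in range(1, N + 1))

def integrated_partial(N):
    # Partial sum at x = 1 of the term-wise integral sum x^(n+1) / (n(n+1)).
    return sum(Fraction(1, n * (n + 1)) for n in range(1, N + 1))

for k in range(0, 9, 2):
    N = 2 ** k
    print(f"N={N:4d}  harmonic={float(harmonic_partial(N)):.4f}  "
          f"integrated={float(integrated_partial(N)):.6f}")
```

The integrated partial sums approach $1$, whereas the harmonic partial sums exceed any fixed bound once $k$ is large enough.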
These theorems justify defining fundamental transcendental functions through power series—bypassing geometric or limit-based definitions to obtain immediate analyticity.
[definition: Exponential Function]
The exponential function $\exp: \mathbb{R} \to \mathbb{R}$ is defined by:
\begin{align*}
\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}
\end{align*}
This series has radius of convergence $R = \infty$, so $\exp$ is defined for all real $x$.
[/definition]
From the series definition, we derive the functional equation $\exp(x+y) = \exp(x)\exp(y)$ through Cauchy products of absolutely convergent series, then define $e = \exp(1)$ and $e^x = \exp(x)$. Term-wise differentiation yields $\frac{d}{dx} e^x = e^x$ with $e^0 = 1$, characterizing the exponential as the unique solution of the initial value problem $y' = y$, $y(0) = 1$.
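The series definition is directly computable. The sketch below sums the first 40 terms (an arbitrary truncation chosen for this example) and compares the result with the library function `math.exp`, then spot-checks the functional equation at one pair of points. A numerical check of this kind illustrates the identities but does not prove them.

```python
import math

def exp_series(x, terms=40):
    # Partial sum of sum x^n / n!, building each term from the previous one.
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)
    return total

for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"x={x:5.1f}  series={exp_series(x):.12f}  math.exp={math.exp(x):.12f}")

# Spot check of exp(x + y) = exp(x) * exp(y) at x = 0.7, y = 1.1.
lhs = exp_series(0.7 + 1.1)
rhs = exp_series(0.7) * exp_series(1.1)
print(f"exp(x+y)={lhs:.12f}  exp(x)exp(y)={rhs:.12f}")
```

Updating `term` by the factor $x/(n+1)$ avoids computing large powers and factorials separately, which keeps the arithmetic stable for moderate $|x|$.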
[definition: Sine And Cosine Functions]
The sine and cosine functions are defined by:
\begin{align*}
\sin x &= \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} \\
\cos x &= \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}
\end{align*}
Both series have radius of convergence $R = \infty$.
[/definition]
Term-wise differentiation gives $\sin' x = \cos x$ and $\cos' x = -\sin x$, with initial conditions $\sin 0 = 0$, $\cos 0 = 1$. The Pythagorean identity $\sin^2 x + \cos^2 x = 1$ follows by differentiating the left-hand side to obtain zero, so that it is constant, and then evaluating at $x=0$. Euler's formula $e^{ix} = \cos x + i \sin x$ emerges naturally by substituting $ix$ into the exponential series and separating real/imaginary parts.
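The two series can be summed numerically to spot-check the Pythagorean identity. The sketch below truncates each series at 30 terms (an assumption adequate for the moderate $|x|$ sampled here) and compares against `math.sin` and `math.cos`; the function names are hypothetical.

```python
import math

def sin_series(x, terms=30):
    # Terms (-1)^n x^(2n+1) / (2n+1)!, each obtained from the previous one.
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def cos_series(x, terms=30):
    # Terms (-1)^n x^(2n) / (2n)!, each obtained from the previous one.
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

samples = [-3.0, -1.0, 0.0, 0.5, 2.0, 3.0]
for x in samples:
    s, c = sin_series(x), cos_series(x)
    print(f"x={x:5.1f}  sin={s:+.10f}  cos={c:+.10f}  sin^2+cos^2={s*s + c*c:.12f}")
```

Up to rounding, the last column is $1$ at every sample point, consistent with the identity derived above.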
[definition: Natural Logarithm]
For $|x| < 1$, define:
\begin{align*}
\log(1+x) = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n}
\end{align*}
This series has radius of convergence $R = 1$ and converges at $x = 1$ (alternating harmonic series) but diverges at $x = -1$. Term-wise differentiation yields $\frac{d}{dx} \log(1+x) = \frac{1}{1+x}$ for $|x| < 1$, justifying the logarithmic interpretation. The full logarithm function on $(0,\infty)$ is constructed by analytic continuation using functional equations.
[/definition]
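The difference between interior and boundary convergence shows up clearly in computation. The sketch below sums 50 terms of the series at $x = 0.5$ and at $x = 1$, comparing with `math.log`. At $x = 1$ the alternating series estimate bounds the error by the first omitted term, $1/51$. The sample points and names are assumptions of this example.

```python
import math

def log1p_series(x, terms):
    # Partial sum of sum_{n>=1} (-1)^(n-1) x^n / n.
    return sum((-1) ** (n - 1) * x ** n / n for n in range(1, terms + 1))

inside_error = abs(log1p_series(0.5, 50) - math.log(1.5))
boundary_error = abs(log1p_series(1.0, 50) - math.log(2.0))

print(f"error at x = 0.5 after 50 terms: {inside_error:.3e}")
print(f"error at x = 1.0 after 50 terms: {boundary_error:.3e}")
```

Inside the interval the error decays geometrically and is already at rounding level, while at the boundary point it is still of order $10^{-2}$.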
## Analytic continuation and the identity theorem
A function $f: I \to \mathbb{R}$ defined on an open interval $I$ is analytic at $c \in I$ if there exists a power series $\sum a_n (x-c)^n$ with positive radius of convergence that equals $f(x)$ for all $x$ in some neighborhood of $c$. It is analytic on $I$ if analytic at every point of $I$.
Polynomials, rational functions (away from poles), $\exp$, $\sin$, $\cos$, and $\log$ (on $(0,\infty)$) are analytic. Remarkably, *every* analytic function is infinitely differentiable—but the converse fails: the smooth non-analytic function
\begin{align*}
f(x) =
\begin{cases}
e^{-1/x^2} & \text{if } x \neq 0 \\
0 & \text{if } x = 0
\end{cases}
\end{align*}
has $f^{(k)}(0) = 0$ for all $k$, so its Taylor series at $0$ is identically zero yet $f(x) > 0$ for $x \neq 0$. This demonstrates that infinite differentiability does not guarantee local power series representation—analyticity is strictly stronger than smoothness.
Analytic functions possess a rigidity absent in merely smooth functions: their local behavior determines global structure through the identity theorem.
[quotetheorem:208]
[citeproof:208]
[motivation]
Why does this fail for smooth functions? The non-analytic function $f(x) = e^{-1/x^2}$ for $x>0$ and $f(x)=0$ for $x\leq 0$ is smooth everywhere and agrees with the zero function on $(-\infty,0]$, yet differs for $x>0$. The set of agreement has limit point $0$, but $f$ is not analytic at $0$. For analytic functions, agreement on a sequence of distinct points $x_n \to c$ forces all derivatives to match at $c$ (by continuity of derivatives and term-wise differentiation), so the Taylor series at $c$ coincides for $f$ and $g$. Since both functions equal their Taylor series in a neighborhood of $c$, they agree near $c$. Repeating this argument propagates agreement throughout the interval $I$, a "domino effect" impossible for non-analytic functions. This rigidity makes analytic functions exceptionally well-behaved: the power series expansion at a single point determines the entire function on $I$.
[/motivation]
The identity theorem explains why power series representations are unique: if $\sum a_n (x-c)^n = \sum b_n (x-c)^n$ on an interval containing $c$, then $a_n = b_n$ for all $n$. It also justifies formal power series manipulations—if two analytic functions satisfy a functional equation on a small interval, the identity theorem extends it globally.
## Complex perspective and singularities
Though developed for real analysis, power series reveal their deepest structure when viewed through complex analysis. The radius of convergence $R$ equals the distance from $c$ to the nearest singularity of the complex extension of $f$.
[example: Geometric Series Revisited]
The geometric series $\sum x^n$ converges for $|x| < 1$ because its sum $1/(1-x)$ has a pole (singularity) at $x = 1$ in the complex plane—distance $1$ from the center $c = 0$.
[/example]
[example: Logarithmic Series Revisited]
The series $\sum (-1)^{n-1} x^n / n$ converges for $|x| < 1$ because $\log(1+x)$ has a branch point singularity at $x = -1$ in the complex plane—distance $1$ from $c = 0$. The convergence at $x = 1$ is a real-axis phenomenon not predicted by complex distance alone.
[/example]
This complex perspective explains why $\sum x^n/n^2$ has radius $1$ despite converging at both endpoints: the dilogarithm function $\operatorname{Li}_2(x)$ has a singularity at $x = 1$ (logarithmic branch point), setting $R = 1$. It also clarifies why some series have infinite radius: $\exp(z)$, $\sin z$, and $\cos z$ are entire functions with no singularities in the finite complex plane.
Power series transform analysis by providing a unified framework for transcendental functions. Within their disk of convergence, they combine algebraic manipulability with analytic power — supporting term-wise calculus while representing functions beyond polynomial reach. The radius of convergence encodes geometric information about singularities, the identity theorem imposes global rigidity from local data, and complex analysis reveals hidden structure invisible on the real line alone. From Euler's bold manipulations to Cauchy's rigorous foundations, power series exemplify analysis at its most fruitful: infinite processes tamed by precise convergence criteria to yield profound mathematical insight.
The course has now built functions from three directions: continuity (no jumps), differentiability (existence of tangent lines), and power series (convergent infinite polynomials). All three concern *local* structure — what happens near a point. The final chapter turns to a *global* operation: integration, which accumulates function values over an entire interval. The [Riemann integral](/page/Riemann%20Integral) defines area rigorously as a limit of sums, and the Fundamental Theorem of Calculus reveals that this global operation is the inverse of the local operation of differentiation — closing the circle that began with limits of sequences.
# Integration
Integration joins limits, continuity, and differentiation as the fourth core concept of analysis, formalizing the ancient geometric notion of "area under a curve." While Archimedes computed areas using exhaustion arguments, and Newton and Leibniz developed the powerful but logically shaky Fundamental Theorem linking integration to antidifferentiation, it fell to Riemann in the 19th century to provide a rigorous foundation based solely on limits of sums. His insight was deceptively simple: approximate the area by rectangles whose heights sample the function's values, then refine the partition until upper and lower approximations converge to a common value. This approach transforms geometric intuition into precise arithmetic, yet reveals surprising subtleties: functions with infinitely many discontinuities may still be integrable, while unbounded functions, however tame they appear, are never Riemann integrable.
The Riemann integral's elegance lies in its constructive nature: it requires no prior knowledge of antiderivatives, making it suitable for defining integrals of functions whose primitives remain unknown. Yet its limitations—failure to integrate certain bounded functions, incompatibility with limit operations—motivated Lebesgue's revolutionary measure-theoretic approach in the 20th century. For our purposes, Riemann integration provides the perfect balance: sufficient for calculus applications while exposing the delicate interplay between continuity, boundedness, and limiting processes that defines real analysis.
## The Riemann integral: partitions and sums
To define integration rigorously, we replace vague notions of "area" with precise arithmetic approximations. Given a bounded function $f: [a,b] \to \mathbb{R}$, we partition the interval into subintervals and construct rectangles whose heights capture $f$'s behavior on each subinterval. Two natural choices emerge: rectangles reaching the supremum of $f$ on each subinterval (overestimating area) and those reaching the infimum (underestimating area). When these approximations converge to the same value as partitions refine, we declare $f$ integrable.
[definition: Partition]
A partition (or dissection) $\mathcal{D}$ of the interval $[a,b]$ with $a < b$ is a finite set of points:
\begin{align*}
\mathcal{D} = \{ x_0, x_1, \dots, x_n \} \subseteq [a,b]
\end{align*}
satisfying $a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b$. The mesh size of $\mathcal{D}$ is:
\begin{align*}
\|\mathcal{D}\| = \max_{1 \leq j \leq n} (x_j - x_{j-1})
\end{align*}
[/definition]
Partitions provide the scaffolding for approximating area. Refining a partition—adding more points—typically improves approximation accuracy, a principle formalized by the refinement lemma.
[definition: Upper and Lower Sums]
Let $f: [a,b] \to \mathbb{R}$ be bounded and $\mathcal{D} = \{x_0, \dots, x_n\}$ a partition of $[a,b]$. Define:
\begin{align*}
M_j &= \sup \{ f(x) : x \in [x_{j-1}, x_j] \} \\
m_j &= \inf \{ f(x) : x \in [x_{j-1}, x_j] \}
\end{align*}
The upper sum $S(f,\mathcal{D})$ and lower sum $s(f,\mathcal{D})$ are:
\begin{align*}
S(f,\mathcal{D}) &= \sum_{j=1}^{n} M_j (x_j - x_{j-1}) \\
s(f,\mathcal{D}) &= \sum_{j=1}^{n} m_j (x_j - x_{j-1})
\end{align*}
[/definition]
[motivation]
Why [supremum and infimum](/page/Supremum%20and%20Infimum) rather than maximum and minimum? Continuous functions attain extrema on closed intervals, but discontinuous functions may not. The supremum/infimum formulation accommodates arbitrary bounded functions while preserving the geometric interpretation: $S(f,\mathcal{D})$ represents the total area of rectangles covering the region under $f$, while $s(f,\mathcal{D})$ represents rectangles contained within it. When $f$ is continuous, $M_j$ and $m_j$ become maximum and minimum values, recovering the intuitive picture. The gap $S(f,\mathcal{D}) - s(f,\mathcal{D})$ quantifies oscillation—large gaps indicate wild variation that may prevent [integrability](/page/Integral).
[/motivation]
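Suprema and infima over subintervals are not computable for an arbitrary bounded function, but for an increasing function they sit at the endpoints: $M_j = f(x_j)$ and $m_j = f(x_{j-1})$. Under that assumption the sketch below computes upper and lower sums exactly for the illustrative choice $f(x) = x^2$ on $[0,1]$ with uniform partitions. The helper names are hypothetical, and the endpoint shortcut is valid only for increasing $f$.

```python
from fractions import Fraction

def uniform_partition(a, b, n):
    # Points a = x_0 < x_1 < ... < x_n = b with equal spacing (b - a) / n.
    return [a + (b - a) * Fraction(j, n) for j in range(n + 1)]

def upper_lower_sums_increasing(f, partition):
    # For increasing f: sup on [x_{j-1}, x_j] is f(x_j), inf is f(x_{j-1}).
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    return upper, lower

def square(x):
    return x * x

for n in (1, 10, 100):
    D = uniform_partition(Fraction(0), Fraction(1), n)
    S, s = upper_lower_sums_increasing(square, D)
    print(f"n={n:3d}  lower={float(s):.5f}  upper={float(S):.5f}  gap={S - s}")
```

For a uniform partition the gap telescopes to $(f(b) - f(a))(b-a)/n$, here $1/n$, and the two sums bracket the value $1/3$ from below and above.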
Refining a partition never increases this oscillation gap, a crucial monotonicity property.
[quotetheorem:192]
[citeproof:192]
This monotonicity implies that upper sums decrease and lower sums increase under refinement, suggesting convergence to limiting values.
[definition: Upper and Lower Integrals]
Let $f: [a,b] \to \mathbb{R}$ be bounded. The upper integral and lower integral are:
\begin{align*}
I^*(f) &= \inf \{ S(f,\mathcal{D}) : \mathcal{D} \text{ is a partition of } [a,b] \} \\
I_*(f) &= \sup \{ s(f,\mathcal{D}) : \mathcal{D} \text{ is a partition of } [a,b] \}
\end{align*}
[/definition]
By the Refinement Lemma, $I^*(f) \geq I_*(f)$ always holds. Integrability occurs precisely when equality holds—when upper and lower approximations squeeze to a common value.
[definition: Riemann Integrable]
A bounded function $f: [a,b] \to \mathbb{R}$ is Riemann integrable if $I^*(f) = I_*(f)$. In this case, the Riemann integral is:
\begin{align*}
\int_a^b f(x) \,d\mathcal{L}^1(x) = I^*(f) = I_*(f)
\end{align*}
[/definition]
This definition captures the essence of integration: the area exists when overestimates and underestimates converge to the same value. Yet verifying integrability directly from this definition is cumbersome. A practical criterion emerges from the oscillation gap.
[quotetheorem:193]
[citeproof:193]
This criterion transforms integrability into a concrete approximation problem: can we partition the domain so that $f$'s oscillation on each subinterval becomes negligible in aggregate?
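A worked instance, assuming the criterion takes its usual form (for every $\epsilon > 0$ there is a partition $\mathcal{D}$ with $S(f,\mathcal{D}) - s(f,\mathcal{D}) < \epsilon$): take $f(x) = x$ on $[0,1]$ and the uniform partition $\mathcal{D}_n$ with $x_j = j/n$. Since $f$ is increasing, $M_j = j/n$ and $m_j = (j-1)/n$, so:
\begin{align*}
S(f,\mathcal{D}_n) &= \sum_{j=1}^{n} \frac{j}{n} \cdot \frac{1}{n} = \frac{n+1}{2n} \\
s(f,\mathcal{D}_n) &= \sum_{j=1}^{n} \frac{j-1}{n} \cdot \frac{1}{n} = \frac{n-1}{2n} \\
S(f,\mathcal{D}_n) - s(f,\mathcal{D}_n) &= \frac{1}{n}
\end{align*}
Choosing $n > 1/\epsilon$ makes the gap smaller than $\epsilon$. Moreover $\frac{n-1}{2n} \leq I_*(f) \leq I^*(f) \leq \frac{n+1}{2n}$ for every $n$, which forces $\int_0^1 x \,d\mathcal{L}^1(x) = \frac{1}{2}$.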
## Integrability of continuous and monotonic functions
Continuity provides precisely the regularity needed for integrability. On a closed bounded interval, continuity implies uniform continuity—a global constraint on oscillation that enables precise partition control.
[quotetheorem:195]
[citeproof:195]
Monotonic functions, though potentially discontinuous, also admit integration—their oscillation is confined to jump discontinuities that become negligible under refinement.
[quotetheorem:194]
[citeproof:194]
These theorems reveal a profound truth: integrability requires neither continuity nor monotonicity, but rather controlled oscillation. The Dirichlet function (1 on rationals, 0 on irrationals) fails integrability because its oscillation remains maximal on every subinterval. Yet Thomae's function—discontinuous at rationals but continuous at irrationals—achieves integrability through diminishing oscillation near discontinuities.
[example: Thomae Function Is Integrable]
Define $f: [0,1] \to \mathbb{R}$ by:
\begin{align*}
f(x) =
\begin{cases}
\frac{1}{q} & \text{if } x = \frac{p}{q} \text{ in lowest terms with } q > 0 \\
0 & \text{if } x \text{ is irrational}
\end{cases}
\end{align*}
This function is discontinuous at every rational but continuous at every irrational. To show integrability:
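Given $\epsilon > 0$, choose an integer $N \geq 2$ with $1/N < \epsilon/2$. The set $B = \{ x \in [0,1] : f(x) \geq 1/N \}$ contains only rationals $p/q$ in lowest terms with $q \leq N$. There are two such points with $q = 1$ (namely $0$ and $1$) and at most $q - 1$ for each $2 \leq q \leq N$, so $|B| \leq 2 + N(N-1)/2 \leq N^2$; in particular $B$ is finite.
Construct a partition $\mathcal{D}$ in which the points of $B$ are covered by at most $N^2$ subintervals, each of length at most $\epsilon/(2N^2)$; this is possible because $B$ is finite, so each of its points can be enclosed in its own short subinterval. On subintervals meeting $B$, $0 \leq f \leq 1$; on all other subintervals, $0 \leq f < 1/N < \epsilon/2$. Then: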
\begin{align*}
S(f,\mathcal{D}) - s(f,\mathcal{D}) &\leq \sum_{\text{intervals with } B} 1 \cdot \frac{\epsilon}{2N^2} + \sum_{\text{other intervals}} \frac{\epsilon}{2} \cdot (x_j - x_{j-1}) \\
&\leq N^2 \cdot \frac{\epsilon}{2N^2} + \frac{\epsilon}{2} \cdot 1 = \epsilon
\end{align*}
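Since $\epsilon$ was arbitrary, $f$ is integrable. Every subinterval contains an irrational, so $m_j = 0$ and $s(f,\mathcal{D}) = 0$ for every partition; hence $I_*(f) = 0$ and $\int_0^1 f = 0$.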
[/example]
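The shrinking upper sums of Thomae's function can be checked with exact rational arithmetic. The sketch below uses uniform partitions $x_j = j/n$ (an assumption made for simplicity, not the partition built in the example) and the fact that the supremum of $f$ on a subinterval is $1/q$ for the smallest denominator $q$ of any rational in it. The names `thomae_sup` and `thomae_upper_sum` are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor


def thomae_sup(left: Fraction, right: Fraction) -> Fraction:
    """Supremum of Thomae's function on [left, right], a subinterval of [0, 1].

    The supremum is 1/q for the smallest q >= 1 such that some integer p
    satisfies left <= p/q <= right. Minimality of q forces p/q to be in
    lowest terms. The loop terminates because the endpoints are rational.
    """
    q = 1
    while ceil(left * q) > floor(right * q):
        q += 1
    return Fraction(1, q)


def thomae_upper_sum(n: int) -> Fraction:
    """Upper sum S(f, D_n) for the uniform partition x_j = j/n of [0, 1]."""
    width = Fraction(1, n)
    return sum(
        (thomae_sup(Fraction(j - 1, n), Fraction(j, n)) * width for j in range(1, n + 1)),
        Fraction(0),
    )


if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        print(n, float(thomae_upper_sum(n)))
```

Every lower sum is $0$, because each subinterval contains an irrational, so the printed values are also the gaps $S(f,\mathcal{D}_n) - s(f,\mathcal{D}_n)$.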
## Properties of the integral
Once integrability is established, the integral inherits algebraic structure from the sums that define it.
[quotetheorem:209]
[citeproof:209]
These properties transform the integral into a linear functional on the space of integrable functions—a cornerstone of functional analysis.
## The Fundamental Theorem of Calculus
The deepest connection in elementary analysis links integration and differentiation. While Riemann integration defines area independently of antiderivatives, the Fundamental Theorem reveals that for continuous functions, integration *constructs* antiderivatives and differentiation *recovers* integrands.
[quotetheorem:190]
[citeproof:190]
[motivation]
This theorem resolves a historical paradox: Newton and Leibniz treated integration as antidifferentiation, yet Riemann defined it through sums without reference to derivatives. The Fundamental Theorem bridges these perspectives—it shows that Riemann's area construction *automatically* produces antiderivatives for continuous functions. Geometrically, $F(x)$ accumulates area from $a$ to $x$; the rate of accumulation at $x$ equals the current height $f(x)$—a profound connection between static area and dynamic rate of change.
[/motivation]
The converse direction completes the circle: antidifferentiation computes areas.
[quotetheorem:191]
[citeproof:191]
Together, these theorems justify the Newton-Leibniz formula $\int_a^b f = F(b) - F(a)$ where $F' = f$, while grounding it in rigorous limit theory. They also solve initial value problems: for continuous $f$, the unique solution to $y' = f(x)$, $y(a) = y_0$ is $y(x) = y_0 + \int_a^x f(t) \,d\mathcal{L}^1(t)$.
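A quick numerical sanity check of the Newton-Leibniz formula, offered as a sketch rather than a proof: the midpoint sum below is one particular Riemann sum (it lies between the lower and upper sums for the same partition), and it is compared with $F(b) - F(a)$ for $f = \cos$, $F = \sin$ on $[0, \pi/2]$. The function name `midpoint_sum` and the choice of test function are assumptions made for illustration.

```python
import math
from typing import Callable


def midpoint_sum(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Riemann sum of f on [a, b] over n equal subintervals, sampled at midpoints."""
    width = (b - a) / n
    return sum(f(a + (j + 0.5) * width) for j in range(n)) * width


if __name__ == "__main__":
    a, b = 0.0, math.pi / 2
    exact = math.sin(b) - math.sin(a)  # F(b) - F(a) with F = sin, F' = cos
    for n in (10, 100, 1000):
        approx = midpoint_sum(math.cos, a, b, n)
        print(n, approx, abs(approx - exact))
```

The printed differences shrink as $n$ grows, consistent with the sums converging to $F(b) - F(a) = 1$.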
## Integration techniques
The Fundamental Theorem enables powerful computational methods derived from differentiation rules.
[quotetheorem:210]
[citeproof:210]
[quotetheorem:211]
[citeproof:211]
These techniques extend calculus's computational reach while preserving rigorous foundations.
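Two standard instances, assuming the techniques in question are integration by parts (from the product rule) and substitution (from the chain rule). By parts with $u = x$ and $v' = e^x$:
\begin{align*}
\int_0^1 x e^x \,d\mathcal{L}^1(x) &= \big[ x e^x \big]_0^1 - \int_0^1 e^x \,d\mathcal{L}^1(x) \\
&= e - (e - 1) = 1
\end{align*}
By substitution with $u = x^2$, so that $u' = 2x$ and the limits $0, 1$ map to $0, 1$:
\begin{align*}
\int_0^1 2x \cos(x^2) \,d\mathcal{L}^1(x) &= \int_0^1 \cos u \,d\mathcal{L}^1(u) \\
&= \sin 1 - \sin 0 = \sin 1
\end{align*}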
## Improper integrals
Riemann integration requires bounded functions on bounded intervals. Yet many applications involve unbounded domains or singularities—$\int_0^\infty e^{-x} \,d\mathcal{L}^1(x)$ or $\int_0^1 x^{-1/2} \,d\mathcal{L}^1(x)$. Improper integrals extend integration through limiting processes.
[definition: Improper Integral on an Infinite Interval]
Let $f: [a,\infty) \to \mathbb{R}$ be Riemann integrable on $[a,R]$ for all $R > a$. If $\lim_{R \to \infty} \int_a^R f(x) \,d\mathcal{L}^1(x)$ exists as a finite real number, we define:
\begin{align*}
\int_a^\infty f(x) \,d\mathcal{L}^1(x) = \lim_{R \to \infty} \int_a^R f(x) \,d\mathcal{L}^1(x)
\end{align*}
and say the improper integral converges. Otherwise, it diverges.
[/definition]
[definition: Improper Integral at a Singularity]
Let $f: (a,b] \to \mathbb{R}$ be unbounded near $a$ but Riemann integrable on $[\delta,b]$ for all $\delta \in (a,b)$. If $\lim_{\delta \to a^+} \int_\delta^b f(x) \,d\mathcal{L}^1(x)$ exists as a finite real number, we define:
\begin{align*}
\int_a^b f(x) \,d\mathcal{L}^1(x) = \lim_{\delta \to a^+} \int_\delta^b f(x) \,d\mathcal{L}^1(x)
\end{align*}
[/definition]
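Applying the two definitions to the integrals mentioned at the start of this section, and evaluating each proper integral by the Fundamental Theorem of Calculus:
\begin{align*}
\int_0^\infty e^{-x} \,d\mathcal{L}^1(x) &= \lim_{R \to \infty} \big[ -e^{-x} \big]_0^R = \lim_{R \to \infty} (1 - e^{-R}) = 1 \\
\int_0^1 x^{-1/2} \,d\mathcal{L}^1(x) &= \lim_{\delta \to 0^+} \big[ 2x^{1/2} \big]_\delta^1 = \lim_{\delta \to 0^+} (2 - 2\sqrt{\delta}) = 2
\end{align*}
Both converge. By contrast, $\int_\delta^1 x^{-1} \,d\mathcal{L}^1(x) = -\log \delta \to \infty$ as $\delta \to 0^+$, so $\int_0^1 x^{-1} \,d\mathcal{L}^1(x)$ diverges.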
[motivation]
Improper integrals reveal subtle distinctions between series and integrals. While convergence of $\sum a_n$ implies $a_n \to 0$, an improper integral may converge without the integrand tending to zero. Consider triangular spikes of height 1 and base width $1/n^2$ centered at each integer $n \geq 1$, with the integrand zero elsewhere: each spike has area $1/(2n^2)$, so the total area $\frac{1}{2}\sum 1/n^2$ converges, while the integrand keeps returning to the value 1. The difference is that the integral weighs each value by the length of the set on which it is taken, whereas every term of a series contributes in full.
[/motivation]
The integral test connects series and improper integrals for monotonic functions.
[quotetheorem:196]
[citeproof:196]
This test justifies defining the Euler-Mascheroni constant $\gamma = \lim_{n \to \infty} (1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n)$, a fundamental constant appearing in number theory and analysis.
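The convergence of $1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n$ can be observed numerically. The sketch below is only an illustration in floating point, not a proof; the function name `harmonic_minus_log` is hypothetical.

```python
import math


def harmonic_minus_log(n: int) -> float:
    """Return H_n - log(n), where H_n = 1 + 1/2 + ... + 1/n."""
    harmonic = math.fsum(1.0 / k for k in range(1, n + 1))
    return harmonic - math.log(n)


if __name__ == "__main__":
    for k in range(1, 6):
        n = 10 ** k
        print(n, harmonic_minus_log(n))
```

The values decrease as $n$ grows and settle near $0.5772$, the known decimal expansion of $\gamma$.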
Integration completes our journey through the core concepts of real analysis. From the $\epsilon$-$N$ and $\epsilon$-$\delta$ definitions of limits to Riemann's partition-based integral, each concept builds upon the last, transforming geometric intuition into rigorous arithmetic. The Fundamental Theorem of Calculus stands as the crowning achievement, unifying differentiation and integration on the foundation of the completeness of $\mathbb{R}$, while improper integrals extend this framework to unbounded domains and unbounded integrands. Yet limitations remain: the Dirichlet function resists Riemann integration despite boundedness, and limit operations under the integral sign require additional hypotheses. These shortcomings motivated Lebesgue's measure-theoretic integration at the start of the 20th century, a more general theory in which such limit operations are better behaved. For now, Riemann integration provides a solid foundation: sufficient for calculus applications while exposing the delicate interplay between continuity, boundedness, and limiting processes that defines real analysis.