Convex geometry studies the shape, structure, and quantitative behavior of convex sets in Euclidean space. The course develops the basic language of convexity and then moves toward the central geometric tools that make the subject powerful: separation by hyperplanes, support functions, Minkowski addition, polarity, and the ways convex bodies encode dual descriptions of the same object. Along the way, it connects geometric intuition with quantitative invariants such as volume, mixed volume, and distance between bodies.
The chapters are arranged to build from foundational definitions to deeper structural results. After introducing convex sets and convex bodies, the course establishes separation theorems and supporting hyperplanes, then uses support functions and the Hausdorff metric to measure and compare bodies. Minkowski addition and polarity lead naturally to gauge functions and duality, while the middle chapters develop the main inequalities of the subject, including Brunn-Minkowski, Steiner’s formula, and the Minkowski and Alexandrov-Fenchel inequalities. The later chapters show how these ideas control approximation by ellipsoids and connect convex geometry with probability, revealing the breadth of the theory and its applications.
# 1. Convex Sets and Convex Bodies
This opening chapter fixes the language in which the rest of the course is written. Building on the introductory question about the smallest convex set containing a point configuration, convex geometry studies sets that contain the line segment between any two of their points, but many of the useful ideas require a little more structure: affine hulls locate the correct ambient dimension, relative interiors avoid false boundary phenomena, and convex hulls let us pass from point configurations to bodies. The central finiteness principles in this chapter are Caratheodory's theorem, Radon's theorem, and Helly's theorem, which explain why convexity in $\mathbb R^n$ is controlled by subsets of size at most $n+2$.
## Convexity in the Correct Affine Space
Given a set $C \subset \mathbb R^n$, the first question is: when does knowing two points of $C$ force us to know all points between them? This segment condition is the basic local rule for convexity, and it is deliberately invariant under affine changes of coordinates.
[definition: Line Segment]
Let $x,y \in \mathbb R^n$. The line segment joining $x$ and $y$ is
\begin{align*}
[x,y] := \{(1-t)x + ty : 0 \le t \le 1\}.
\end{align*}
[/definition]
The parameter $t$ records how far along the segment we have moved from $x$ to $y$. The definition includes the endpoints, so a segment is a compact subset of $\mathbb R^n$.
To turn segments into a property of a whole set, we ask for a closure rule: whenever the set contains two endpoints, it must contain every interpolated point between them. This is the basic condition that prevents gaps along straight paths and makes linear inequalities, averages, and geometric hulls behave predictably.
[definition: Convex Set]
A set $C \subset \mathbb R^n$ is convex if for all $x,y \in C$ and all $t \in [0,1]$, one has
\begin{align*}
(1-t)x + ty \in C.
\end{align*}
[/definition]
Convexity should be read as a closure property under averaging with non-negative coefficients summing to $1$. The two-point definition is the primitive operation, but repeated averaging is what appears in constructions such as centres of mass, barycentric coordinates, and finite mixtures.
A finite configuration usually generates many points at once, not just the points on one chosen segment. We therefore need notation for assigning non-negative weights to several points while keeping the total weight equal to one; this records exactly the algebra of finite averaging without committing to an order of successive segment constructions.
[definition: Convex Combination]
Let $x_1,\dots,x_m \in \mathbb R^n$. A convex combination of $x_1,\dots,x_m$ is a point of the form
\begin{align*}
\sum_{i=1}^m \lambda_i x_i,
\end{align*}
where $\lambda_i \ge 0$ for every $i$ and $\sum_{i=1}^m \lambda_i = 1$.
[/definition]
The definition is useful only if it really captures everything convexity forces from finitely many points. Starting with pairwise segments, one must know that repeated averaging never leaves a convex set, and conversely that closure under all finite convex combinations already implies ordinary two-point convexity. The following criterion makes that equivalence precise, turning a geometric closure condition into an algebraic test on coefficients.
[quotetheorem:4081]
[citeproof:4081]
The criterion lets us replace repeated segment arguments by a single statement about coefficients. The finite hypothesis matters: convexity alone says nothing about arbitrary infinite averages or limits unless a topological assumption such as closedness is also present. The theorem also does not say that convex representations are unique; uniqueness is special to simplices and will fail for most convex bodies. Its forward role is to turn convexity into coefficient bookkeeping, which is the language needed for convex hulls, Caratheodory's theorem, and Radon's theorem.
[example: Convexity Of A Half-Space]
Let $a \in \mathbb R^n$ with $a \ne 0$ and let $\alpha \in \mathbb R$. We show that the half-space
\begin{align*}
H := \{x \in \mathbb R^n : a \cdot x \le \alpha\}
\end{align*}
is convex. Take $x,y \in H$ and $0 \le t \le 1$. Since $x,y \in H$, we have $a\cdot x \le \alpha$ and $a\cdot y \le \alpha$. By bilinearity of the dot product,
\begin{align*}
a \cdot ((1-t)x+ty)
&= a\cdot((1-t)x) + a\cdot(ty)\\
&= (1-t)(a\cdot x) + t(a\cdot y).
\end{align*}
Because $0 \le t \le 1$, both coefficients $1-t$ and $t$ are non-negative, so multiplying the inequalities $a\cdot x \le \alpha$ and $a\cdot y \le \alpha$ gives
\begin{align*}
(1-t)(a\cdot x) &\le (1-t)\alpha,\\
t(a\cdot y) &\le t\alpha.
\end{align*}
Adding these two inequalities,
\begin{align*}
(1-t)(a\cdot x)+t(a\cdot y)
&\le (1-t)\alpha+t\alpha\\
&= \alpha.
\end{align*}
Therefore $a \cdot ((1-t)x+ty)\le \alpha$, so $(1-t)x+ty\in H$. Thus $H$ is convex: a single linear inequality cuts out a region closed under taking line segments.
[/example]
A second question appears as soon as a convex set lies in a lower-dimensional plane. If $C$ is a triangle in $\mathbb R^3$, then its natural interior is two-dimensional, not three-dimensional.
[definition: Affine Hull]
Let $A \subset \mathbb R^n$. The affine hull $\operatorname{aff}(A)$ is the smallest affine subspace of $\mathbb R^n$ containing $A$.
[/definition]
The affine hull is the correct ambient space for intrinsic statements about $A$. If $A$ consists of points lying in a plane, then boundary and interior should usually be interpreted inside that plane.
This leads to a modified notion of interior that ignores directions unavailable inside the affine hull. It lets lower-dimensional convex sets have genuine interior points relative to their own span, which is essential for separating boundary phenomena from artifacts of the surrounding space.
[definition: Relative Interior]
Let $C \subset \mathbb R^n$. The relative interior of $C$, denoted $\operatorname{relint}(C)$, is the interior of $C$ taken in the subspace topology on $\operatorname{aff}(C)$.
[/definition]
Some convex sets have enough compactness to behave like geometric objects with extrema, while unbounded examples require separate treatment. The term convex body is used in these notes for non-empty compact convex sets; when full dimension matters, we will say explicitly that the body has non-empty interior in the ambient space.
[definition: Convex Body]
A convex body in $\mathbb R^n$ is a non-empty compact convex set $K \subset \mathbb R^n$.
[/definition]
Full-dimensional convex bodies are the special case with non-empty interior in the ambient space. Thus a solid ball in $\mathbb R^3$ is full-dimensional, while a filled triangle in a plane in $\mathbb R^3$ is still a compact convex body but has empty ordinary interior; its natural interior is relative to its affine hull.
[example: Relative Interior Of A Triangle In Space]
Let $p_1,p_2,p_3\in\mathbb R^3$ be non-collinear, and let
\begin{align*}
T=\{\lambda_1p_1+\lambda_2p_2+\lambda_3p_3:\lambda_i\ge 0,\ \lambda_1+\lambda_2+\lambda_3=1\}
\end{align*}
be their filled triangle. Since the points are non-collinear, the vectors $p_2-p_1$ and $p_3-p_1$ are linearly independent, so the affine hull of $T$ is the plane
\begin{align*}
P=p_1+\operatorname{span}\{p_2-p_1,p_3-p_1\}.
\end{align*}
Indeed, if $\lambda_1+\lambda_2+\lambda_3=1$, then
\begin{align*}
\lambda_1p_1+\lambda_2p_2+\lambda_3p_3
&=(1-\lambda_2-\lambda_3)p_1+\lambda_2p_2+\lambda_3p_3\\
&=p_1+\lambda_2(p_2-p_1)+\lambda_3(p_3-p_1),
\end{align*}
so every point of $T$ lies in $P$, and $P$ is the smallest affine plane containing the three non-collinear points.
Inside this plane, use coordinates
\begin{align*}
(s,t)\longmapsto p_1+s(p_2-p_1)+t(p_3-p_1).
\end{align*}
Under this affine coordinate map, the triangle corresponds to
\begin{align*}
\{(s,t)\in\mathbb R^2:s\ge 0,\ t\ge 0,\ s+t\le 1\}.
\end{align*}
Its interior in $\mathbb R^2$ is
\begin{align*}
\{(s,t)\in\mathbb R^2:s>0,\ t>0,\ s+t<1\}.
\end{align*}
Writing $\lambda_2=s$, $\lambda_3=t$, and $\lambda_1=1-s-t$, this becomes
\begin{align*}
\operatorname{relint}(T)
=\{\lambda_1p_1+\lambda_2p_2+\lambda_3p_3:\lambda_i>0,\ \lambda_1+\lambda_2+\lambda_3=1\}.
\end{align*}
As a subset of $\mathbb R^3$, however, $T$ has empty ordinary interior. If $n=(p_2-p_1)\times(p_3-p_1)$, then $n\ne 0$ and every point of $P$ satisfies $(x-p_1)\cdot n=0$. For any $x\in T$ and any $\varepsilon>0$, the point
\begin{align*}
x+\frac{\varepsilon}{2}\frac{n}{\|n\|}
\end{align*}
has distance $\varepsilon/2$ from $x$ but is not in $P$, hence is not in $T$. Thus no three-dimensional open ball around a point of $T$ lies inside $T$, while the relative interior records the genuine two-dimensional interior of the triangle in its own affine plane.
[/example]
## Convex Hulls and Finite Generation
If a set $A$ is not convex, the next question is: what is the smallest convex set forced by $A$? This construction converts point clouds into polygons, polytopes, and higher-dimensional convex objects.
[definition: Convex Hull]
Let $A \subset \mathbb R^n$. The convex hull of $A$, denoted $\operatorname{conv}(A)$, is the intersection of all convex subsets of $\mathbb R^n$ that contain $A$.
[/definition]
Because intersections of convex sets are convex, this definition produces a convex set, but the definition is not yet computational. An intersection over all convex supersets is hard to inspect directly, while a point in a hull should be described by explicit weights on points of $A$. The issue is whether the abstract minimal convex set is exactly the set of all finite convex combinations drawn from $A$.
[quotetheorem:4082]
[citeproof:4082]
The theorem identifies two complementary viewpoints on the same object. The intersection definition is canonical and proves minimality, while the finite-combination description is usable for calculations and membership certificates. It also explains why convex hulls of infinite sets still have finite witnesses for individual points: each point in the hull is built from some finite list of generators, even though no single finite list need describe the whole hull.
This description may still use many points, and without a dimension-dependent reduction it gives no uniform certificate for membership. Caratheodory's theorem supplies exactly that reduction: in $\mathbb R^n$ we never need more than $n+1$ points.
[quotetheorem:4083]
[citeproof:4083]
The ambient dimension hypothesis is the heart of the theorem: in an affine subspace of dimension $d$, the sharper bound is $d+1$, while without finite dimension there is no uniform finite bound of this kind. The theorem does not give uniqueness of the chosen points or coefficients, and it does not identify the smallest possible number for a particular point. It is the bridge from convex geometry to finite combinatorics: membership in a possibly infinite convex hull can be certified by at most $n+1$ generators, a fact used repeatedly in separation, Helly-type results, and optimisation.
[example: Convex Hull Of Four Planar Points]
Let $A=\{(0,0),(1,0),(0,1),(1,1)\}\subset \mathbb R^2$. We first identify its convex hull with the square $[0,1]^2$. Each point of $A$ lies in $[0,1]^2$, and $[0,1]^2$ is convex because if $x=(x_1,x_2),y=(y_1,y_2)\in [0,1]^2$ and $0\le t\le 1$, then
\begin{align*}
0\le (1-t)x_i+ty_i \le (1-t)\cdot 1+t\cdot 1=1
\end{align*}
for $i=1,2$. Hence $\operatorname{conv}(A)\subset [0,1]^2$. Conversely, if $(u,v)\in [0,1]^2$, then
\begin{align*}
(u,v)
&=(1-u)(1-v)(0,0)+u(1-v)(1,0)+(1-u)v(0,1)+uv(1,1).
\end{align*}
The four coefficients are non-negative because $0\le u,v\le 1$, and their sum is
\begin{align*}
(1-u)(1-v)+u(1-v)+(1-u)v+uv
&=(1-v)((1-u)+u)+v((1-u)+u)\\
&=(1-v)\cdot 1+v\cdot 1\\
&=1.
\end{align*}
Thus every point of $[0,1]^2$ is a convex combination of points of $A$, so $[0,1]^2\subset \operatorname{conv}(A)$.
For example,
\begin{align*}
\frac{1}{3}(1,0)+\frac{2}{3}(0,1)
&=\left(\frac{1}{3},0\right)+\left(0,\frac{2}{3}\right)\\
&=\left(\frac{1}{3},\frac{2}{3}\right),
\end{align*}
with coefficients $\frac13,\frac23\ge 0$ and $\frac13+\frac23=1$. Thus *Caratheodory's theorem* gives an upper bound, but this point uses only two generators in $\mathbb R^2$. The point $(1/2,1/2)$ also has multiple convex representations:
\begin{align*}
\frac12(0,0)+\frac12(1,1)
&=(0,0)+\left(\frac12,\frac12\right)
=\left(\frac12,\frac12\right),\\
\frac12(1,0)+\frac12(0,1)
&=\left(\frac12,0\right)+\left(0,\frac12\right)
=\left(\frac12,\frac12\right).
\end{align*}
So convex-hull membership can be certified by coefficients, but those coefficients need not be unique.
[/example]
The planar picture shows how convex hulls fill in a polygonal region. In three dimensions the same finite-combination definition produces solids rather than planar polygons.
[example: Convex Hull In Three Dimensions]
Let
\begin{align*}
T=\operatorname{conv}\{0,e_1,e_2,e_3\}
\quad\text{and}\quad
q=(1,1,1),
\qquad
A=\{0,e_1,e_2,e_3,q\}.
\end{align*}
We show that
\begin{align*}
\operatorname{conv}(A)=\{(1-s)z+sq:0\le s\le 1,\ z\in T\}.
\end{align*}
If $z\in T$, then $z=\lambda_0 0+\lambda_1e_1+\lambda_2e_2+\lambda_3e_3$ with $\lambda_i\ge 0$ and $\sum_{i=0}^3\lambda_i=1$. Hence
\begin{align*}
(1-s)z+sq
&=(1-s)\lambda_0 0+(1-s)\lambda_1e_1+(1-s)\lambda_2e_2+(1-s)\lambda_3e_3+s q,
\end{align*}
and the coefficients are non-negative with sum
\begin{align*}
(1-s)(\lambda_0+\lambda_1+\lambda_2+\lambda_3)+s
&=(1-s)\cdot 1+s\\
&=1.
\end{align*}
Thus every such point belongs to $\operatorname{conv}(A)$. Conversely, if $x\in\operatorname{conv}(A)$, then
\begin{align*}
x=\lambda_0 0+\lambda_1e_1+\lambda_2e_2+\lambda_3e_3+\mu q
\end{align*}
with all coefficients non-negative and $\lambda_0+\lambda_1+\lambda_2+\lambda_3+\mu=1$. Put $s=\mu$. If $s=1$, then $x=q=(1-s)0+sq$ with $0\in T$. If $s<1$, define
\begin{align*}
z=\frac{\lambda_0}{1-s}0+\frac{\lambda_1}{1-s}e_1+\frac{\lambda_2}{1-s}e_2+\frac{\lambda_3}{1-s}e_3.
\end{align*}
The coefficients of $z$ are non-negative and have sum
\begin{align*}
\frac{\lambda_0+\lambda_1+\lambda_2+\lambda_3}{1-s}
&=\frac{1-\mu}{1-s}\\
&=1,
\end{align*}
so $z\in T$, and
\begin{align*}
(1-s)z+sq
&=\lambda_0 0+\lambda_1e_1+\lambda_2e_2+\lambda_3e_3+\mu q\\
&=x.
\end{align*}
The point $(1/2,1/2,1/2)$ lies on the segment from $0$ to $q$ because
\begin{align*}
\frac12 0+\frac12(1,1,1)
&=(0,0,0)+\left(\frac12,\frac12,\frac12\right)\\
&=\left(\frac12,\frac12,\frac12\right).
\end{align*}
It is not in the old tetrahedron $T$: every point of $T$ has the form
\begin{align*}
\lambda_1e_1+\lambda_2e_2+\lambda_3e_3=(\lambda_1,\lambda_2,\lambda_3)
\end{align*}
with $\lambda_i\ge 0$ and $\lambda_1+\lambda_2+\lambda_3\le 1$, whereas
\begin{align*}
\frac12+\frac12+\frac12=\frac32>1.
\end{align*}
More generally, any point $(r,r,r)$ with $r>1/3$ cannot be represented using only $0,e_1,e_2,e_3$, since its coordinate sum is $3r>1$. Thus the added vertex $(1,1,1)$ genuinely enlarges the tetrahedron, while *Caratheodory's theorem* still guarantees that each point of this three-dimensional convex hull can be represented using at most four of the five listed generators.
[/example]
## Basic Operations Preserving Convexity
Once convex sets are available, we need to know which constructions keep us inside the category. The operations used throughout the course are intersections, products, affine images, projections, translations, and dilations.
[quotetheorem:4084]
[citeproof:4084]
The family may be infinite, and this is important because many convex sets are best described by infinitely many supporting inequalities. The theorem does not say that intersections preserve non-empty interior, openness, or compactness: nested slabs can collapse to a lower-dimensional set, and unbounded intersections can remain unbounded. It explains why half-spaces are so important: any intersection of linear inequalities is convex, and polyhedra and polytopes will later be studied through this lens.
Intersections are only one way of building new convex sets. We also need operations that move sets between spaces or combine coordinates, because examples often arise as products, images, and shadows of simpler convex bodies. The obstruction to check is whether taking a segment before or after such an operation gives compatible results.
[quotetheorem:4085]
[citeproof:4085]
The hypotheses in the product statement cannot be dropped: if one factor is non-convex, then the product contains non-convex slices. Linearity is also essential in the image statement, or more generally affinity; arbitrary nonlinear maps can bend a segment into a curve and destroy convexity of the image. The result does not claim that inverse images under arbitrary maps are convex, but it does show that translations, dilations, coordinate projections, and changes of linear coordinates preserve the convex category.
Translations and dilations are affine versions of the same principle. If $a\in\mathbb R^n$ and $\lambda\ge 0$, then $C+a:=\{x+a:x\in C\}$ and $\lambda C:=\{\lambda x:x\in C\}$ are convex whenever $C$ is convex.
A recurring task is to pass from a high-dimensional convex set to the lower-dimensional shadow seen on a chosen subspace. To make that operation precise, we name the linear maps that keep the chosen subspace fixed while discarding complementary information.
[definition: Projection]
Let $V \le \mathbb R^n$ be a linear subspace. A projection onto $V$ is a linear map $P:\mathbb R^n\to V$ satisfying $P(v)=v$ for all $v\in V$.
[/definition]
Since a projection is a linear map, projections preserve convexity. This simple observation is the geometric source of many shadows and sections arguments later in the course.
[example: Projecting A Cylinder]
Let
\begin{align*}
C=B(0,1)\times[0,2]
=\{(u,r)\in\mathbb R^2\times\mathbb R:\|u\|\le 1,\ 0\le r\le 2\}.
\end{align*}
We first verify that $C$ is convex. Take $(u,r),(v,s)\in C$ and $0\le t\le 1$. Since $\|u\|\le 1$ and $\|v\|\le 1$, the triangle inequality and homogeneity of the Euclidean norm give
\begin{align*}
\|(1-t)u+tv\|
&\le \|(1-t)u\|+\|tv\|\\
&=(1-t)\|u\|+t\|v\|\\
&\le (1-t)\cdot 1+t\cdot 1\\
&=1.
\end{align*}
Also, from $0\le r,s\le 2$ and $0\le t\le 1$,
\begin{align*}
0
&=(1-t)\cdot 0+t\cdot 0\\
&\le (1-t)r+ts\\
&\le (1-t)\cdot 2+t\cdot 2\\
&=2.
\end{align*}
Thus
\begin{align*}
(1-t)(u,r)+t(v,s)=((1-t)u+tv,(1-t)r+ts)\in C,
\end{align*}
so the cylinder is convex.
Now let $P_{12}:\mathbb R^2\times\mathbb R\to\mathbb R^2$ be $P_{12}(u,r)=u$. If $(u,r)\in C$, then $\|u\|\le 1$, so $P_{12}(u,r)=u\in B(0,1)$. Conversely, if $u\in B(0,1)$, then $(u,0)\in C$ because $0\in[0,2]$, and $P_{12}(u,0)=u$. Hence
\begin{align*}
P_{12}(C)=B(0,1).
\end{align*}
Similarly, for $P_3(u,r)=r$, every $(u,r)\in C$ has $r\in[0,2]$, while every $r\in[0,2]$ occurs as $P_3(0,r)$ because $0\in B(0,1)$. Therefore
\begin{align*}
P_3(C)=[0,2].
\end{align*}
The projections keep convexity but discard different coordinates: one shadow is the two-dimensional base disc, and the other is the one-dimensional height interval.
[/example]
## Standard Families of Convex Sets
Which examples should be kept in mind while learning the general theory? The recurring models are simplices, cubes, cross-polytopes, Euclidean balls, ellipsoids, cylinders, cones, and polytopes.
[definition: Simplex]
Let $p_0,\dots,p_n\in\mathbb R^n$ be affinely independent. The simplex with vertices $p_0,\dots,p_n$ is
\begin{align*}
\operatorname{conv}\{p_0,\dots,p_n\}.
\end{align*}
[/definition]
A simplex is the smallest full-dimensional polytope: in $\mathbb R^1$ it is a segment, in $\mathbb R^2$ a triangle, and in $\mathbb R^3$ a tetrahedron.
After simplices, the next finite model allows any finite list of generating points. This captures convex sets whose shape is determined by vertices rather than by curved boundary data, and it gives the class that later connects convex hulls with systems of linear inequalities.
[definition: Polytope]
A polytope in $\mathbb R^n$ is the convex hull of a finite subset of $\mathbb R^n$.
[/definition]
Finite hulls describe bounded shapes built from points, but later local geometry also needs sets of directions emanating from a point. Such a set should remain stable under non-negative rescaling as well as under convex averaging, which leads to the cone notion.
[definition: Convex Cone]
A set $C\subset \mathbb R^n$ is a convex cone if $C$ is convex and $\lambda x\in C$ for every $x\in C$ and every $\lambda\ge 0$.
[/definition]
Cones are usually not convex bodies because they are often unbounded. They are nevertheless central because tangent cones and normal cones describe local behaviour near boundary points.
[example: Cube Cross-Polytope And Ball]
In $\mathbb R^n$, the cube, cross-polytope, and Euclidean unit ball are respectively
\begin{align*}
Q&=[-1,1]^n=\{x\in\mathbb R^n:-1\le x_i\le 1\text{ for }1\le i\le n\},\\
P&=\left\{x\in\mathbb R^n:\sum_{i=1}^n |x_i|\le 1\right\},\\
B&=B(0,1)=\{x\in\mathbb R^n:\|x\|\le 1\}.
\end{align*}
The cube is convex: if $x,y\in Q$ and $0\le t\le 1$, then for each coordinate $i$,
\begin{align*}
-1
&=(1-t)(-1)+t(-1)\\
&\le (1-t)x_i+ty_i\\
&\le (1-t)\cdot 1+t\cdot 1\\
&=1,
\end{align*}
so $(1-t)x+ty\in Q$.
The cross-polytope is exactly the convex hull of $\{\pm e_i:1\le i\le n\}$. First, if
\begin{align*}
x=\sum_{i=1}^n \alpha_i e_i+\sum_{i=1}^n \beta_i(-e_i),
\end{align*}
where $\alpha_i,\beta_i\ge 0$ and $\sum_i\alpha_i+\sum_i\beta_i=1$, then the $i$th coordinate is $x_i=\alpha_i-\beta_i$, and hence
\begin{align*}
\sum_{i=1}^n |x_i|
&=\sum_{i=1}^n |\alpha_i-\beta_i|\\
&\le \sum_{i=1}^n(\alpha_i+\beta_i)\\
&=1.
\end{align*}
Thus every convex combination of the points $\pm e_i$ lies in $P$. Conversely, take $x\in P$. For each $i$, write
\begin{align*}
x_i^+=\max\{x_i,0\},
\qquad
x_i^-=\max\{-x_i,0\}.
\end{align*}
Then $x_i=x_i^+-x_i^-$ and $|x_i|=x_i^++x_i^-$. Let
\begin{align*}
r=1-\sum_{i=1}^n |x_i|.
\end{align*}
Since $x\in P$, we have $r\ge 0$. For $n\ge 1$, using $e_1$ and $-e_1$ to absorb the leftover coefficient gives
\begin{align*}
x
&=\sum_{i=1}^n x_i^+ e_i+\sum_{i=1}^n x_i^-(-e_i)+\frac r2 e_1+\frac r2(-e_1),
\end{align*}
because $\frac r2 e_1+\frac r2(-e_1)=0$. The coefficients are non-negative, and their sum is
\begin{align*}
\sum_{i=1}^n x_i^++\sum_{i=1}^n x_i^-+\frac r2+\frac r2
&=\sum_{i=1}^n |x_i|+r\\
&=\sum_{i=1}^n |x_i|+\left(1-\sum_{i=1}^n |x_i|\right)\\
&=1.
\end{align*}
So $x$ is a convex combination of the points $\pm e_i$.
The ball is convex by the triangle inequality and homogeneity of the Euclidean norm: if $x,y\in B$ and $0\le t\le 1$, then
\begin{align*}
\|(1-t)x+ty\|
&\le \|(1-t)x\|+\|ty\|\\
&=(1-t)\|x\|+t\|y\|\\
&\le (1-t)\cdot 1+t\cdot 1\\
&=1.
\end{align*}
Thus the cube is governed by coordinate inequalities, the cross-polytope by the $\ell^1$ inequality $\sum_i|x_i|\le 1$, and the Euclidean ball by the quadratic norm inequality $\sum_i x_i^2\le 1$. These three models show the main contrast: coordinate-flat faces for the cube, vertex-generated geometry for the cross-polytope, and smooth rotational symmetry for the Euclidean ball.
[/example]
The ball is the symmetric reference body. Ellipsoids are what that reference body becomes after an invertible linear change of coordinates.
[example: Ellipsoid As A Linear Image Of A Ball]
Let $A\in \mathbb R^{n\times n}$ be invertible and let $c\in\mathbb R^n$. Write
\begin{align*}
E=c+A(B(0,1))
=\{c+Au:u\in\mathbb R^n,\ \|u\|\le 1\}.
\end{align*}
We verify directly that $E$ is convex. Take $x,y\in E$ and $0\le t\le 1$. Then there are $u,v\in B(0,1)$ such that
\begin{align*}
x=c+Au,
\qquad
y=c+Av.
\end{align*}
Using distributivity and linearity of $A$,
\begin{align*}
(1-t)x+ty
&=(1-t)(c+Au)+t(c+Av)\\
&=(1-t)c+(1-t)Au+tc+tAv\\
&=((1-t)+t)c+A((1-t)u)+A(tv)\\
&=c+A((1-t)u+tv).
\end{align*}
It remains to check that $(1-t)u+tv\in B(0,1)$. Since $\|u\|\le 1$ and $\|v\|\le 1$, the triangle inequality and homogeneity of the Euclidean norm give
\begin{align*}
\|(1-t)u+tv\|
&\le \|(1-t)u\|+\|tv\|\\
&=(1-t)\|u\|+t\|v\|\\
&\le (1-t)\cdot 1+t\cdot 1\\
&=1.
\end{align*}
Thus $(1-t)u+tv\in B(0,1)$, so $(1-t)x+ty\in E$. Hence the ellipsoid is convex.
When $A$ is diagonal with positive entries, say
\begin{align*}
A=\operatorname{diag}(a_1,\dots,a_n),
\qquad
a_i>0,
\end{align*}
a point $x\in E$ has the form
\begin{align*}
x=c+Au
=(c_1+a_1u_1,\dots,c_n+a_nu_n)
\end{align*}
with $\sum_{i=1}^n u_i^2\le 1$. Equivalently,
\begin{align*}
u_i=\frac{x_i-c_i}{a_i}
\end{align*}
for each $i$, so $x\in E$ exactly when
\begin{align*}
\sum_{i=1}^n \left(\frac{x_i-c_i}{a_i}\right)^2\le 1.
\end{align*}
Along the $i$th coordinate direction, taking $u=\pm e_i$ gives the two boundary points
\begin{align*}
c+A e_i=c+a_i e_i,
\qquad
c+A(-e_i)=c-a_i e_i.
\end{align*}
These points lie a distance $a_i$ from the centre $c$ in the $\pm e_i$ directions, so the diagonal entries are precisely the semi-axis lengths.
[/example]
These standard bodies also preview duality: cubes and cross-polytopes will later exchange roles under polarity.
[remark: Polarity Preview]
For centrally symmetric convex bodies containing $0$ in their interiors, polarity reverses containment and exchanges some vertex and facet data. Under the standard Euclidean polarity, the cube $[-1,1]^n$ is dual to the cross-polytope $\{x: \sum_i |x_i|\le 1\}$, while the Euclidean unit ball is self-dual. Chapter 5 makes this precise after Chapter 3 introduces support functions.
[/remark]
## Radon and Helly Finiteness Principles
Convexity has a striking finite character in $\mathbb R^n$: large families of points or sets are controlled by small subfamilies. Radon's theorem is the point-configuration form, and Helly's theorem is the intersection form.
[quotetheorem:4086]
[citeproof:4086]
The number $n+2$ is sharp: the $n+1$ vertices of a simplex in $\mathbb R^n$ cannot be split into two parts whose convex hulls meet. The theorem does not describe a unique partition; symmetric configurations can have several Radon partitions, and degenerate configurations can have more. Its main role is combinatorial: affine dependence among points becomes an intersection statement for convex hulls, and this is the engine behind Helly-type theorems.
[example: Radon Partitions In The Plane]
Let $p_1,p_2,p_3,p_4\in\mathbb R^2$. In the first configuration, suppose $p_4$ lies in the triangle with vertices $p_1,p_2,p_3$. Then there are coefficients $\lambda_1,\lambda_2,\lambda_3\ge 0$ with
\begin{align*}
\lambda_1+\lambda_2+\lambda_3=1
\end{align*}
such that
\begin{align*}
p_4=\lambda_1p_1+\lambda_2p_2+\lambda_3p_3.
\end{align*}
Thus $p_4\in\operatorname{conv}\{p_1,p_2,p_3\}$, while also $p_4\in\operatorname{conv}\{p_4\}$ because
\begin{align*}
p_4=1\cdot p_4.
\end{align*}
Therefore
\begin{align*}
\operatorname{conv}\{p_4\}\cap \operatorname{conv}\{p_1,p_2,p_3\}
\end{align*}
contains $p_4$, so $\{p_4\}$ and $\{p_1,p_2,p_3\}$ form a Radon partition.
In the second configuration, suppose $p_1,p_2,p_3,p_4$ are the vertices of a convex quadrilateral in cyclic order. Let the diagonals $[p_1,p_3]$ and $[p_2,p_4]$ meet at $z$. Then for some $s,t\in[0,1]$,
\begin{align*}
z&=(1-t)p_1+tp_3,\\
z&=(1-s)p_2+sp_4.
\end{align*}
Since $1-t,t\ge 0$ and $(1-t)+t=1$, the first equation gives
\begin{align*}
z\in\operatorname{conv}\{p_1,p_3\}.
\end{align*}
Since $1-s,s\ge 0$ and $(1-s)+s=1$, the second equation gives
\begin{align*}
z\in\operatorname{conv}\{p_2,p_4\}.
\end{align*}
Hence
\begin{align*}
z\in \operatorname{conv}\{p_1,p_3\}\cap \operatorname{conv}\{p_2,p_4\},
\end{align*}
so the two opposite-vertex pairs form a Radon partition. In both cases, the partition is witnessed by an explicit common point of the two convex hulls.
[/example]
Radon's theorem is the combinatorial engine behind a stronger intersection principle. The question is how many local intersection checks are needed to force a common point for an entire finite family of convex sets. In dimension $n$, the right threshold is controlled by subfamilies of size at most $n+1$, reflecting the same finite-dimensional dependence that appeared in Caratheodory's theorem.
[quotetheorem:4087]
[citeproof:4087]
The phrase "at most $n+1$" is necessary: if the family has fewer than $n+1$ sets, requiring intersections only for subfamilies of exactly $n+1$ members would be vacuous. Convexity is also essential, since finite families of non-convex sets can have all small intersections non-empty while the total intersection is empty. The theorem does not give a point constructively, but it converts a global intersection problem into finitely many bounded-size checks; in $\mathbb R^2$, pairwise intersection is not enough, while triplewise intersection is enough.
[example: Three Discs Versus Pairwise Intersections]
Take the three closed unit discs
\begin{align*}
D_1&=\{x\in\mathbb R^2:\|x-(0,0)\|\le 1\},\\
D_2&=\{x\in\mathbb R^2:\|x-(2,0)\|\le 1\},\\
D_3&=\{x\in\mathbb R^2:\|x-(1,\sqrt 3)\|\le 1\}.
\end{align*}
They are convex because each is a translate of the Euclidean unit ball, and the ball is convex by the triangle inequality and homogeneity of the Euclidean norm.
The discs intersect pairwise. Indeed,
\begin{align*}
\|(1,0)-(0,0)\|&=1,
&
\|(1,0)-(2,0)\|&=1,
\end{align*}
so $(1,0)\in D_1\cap D_2$. Also,
\begin{align*}
\left\|\left(\frac12,\frac{\sqrt3}{2}\right)-(0,0)\right\|^2
&=\left(\frac12\right)^2+\left(\frac{\sqrt3}{2}\right)^2
=\frac14+\frac34
=1,\\
\left\|\left(\frac12,\frac{\sqrt3}{2}\right)-(1,\sqrt3)\right\|^2
&=\left(-\frac12\right)^2+\left(-\frac{\sqrt3}{2}\right)^2
=\frac14+\frac34
=1,
\end{align*}
so $\left(\frac12,\frac{\sqrt3}{2}\right)\in D_1\cap D_3$. Similarly,
\begin{align*}
\left\|\left(\frac32,\frac{\sqrt3}{2}\right)-(2,0)\right\|^2
&=\left(-\frac12\right)^2+\left(\frac{\sqrt3}{2}\right)^2
=1,\\
\left\|\left(\frac32,\frac{\sqrt3}{2}\right)-(1,\sqrt3)\right\|^2
&=\left(\frac12\right)^2+\left(-\frac{\sqrt3}{2}\right)^2
=1,
\end{align*}
so $\left(\frac32,\frac{\sqrt3}{2}\right)\in D_2\cap D_3$.
However, the three discs have no common point. If $x\in D_1\cap D_2$, then by the triangle inequality,
\begin{align*}
2
&=\|(2,0)-(0,0)\|\\
&\le \|x-(0,0)\|+\|x-(2,0)\|\\
&\le 1+1\\
&=2.
\end{align*}
Thus equality holds throughout, so both distances are equal to $1$ and $x$ lies on the line segment between $(0,0)$ and $(2,0)$; hence $x=(1,0)$. But
\begin{align*}
\|(1,0)-(1,\sqrt3)\|
&=\|(0,-\sqrt3)\|\\
&=\sqrt3\\
&>1,
\end{align*}
so $(1,0)\notin D_3$. Therefore
\begin{align*}
D_1\cap D_2\cap D_3=\varnothing.
\end{align*}
This shows why pairwise intersection is not enough in $\mathbb R^2$: by *Helly Theorem In Euclidean Space*, the correct finite test is intersection of every triple.
[/example]
## Compact Convex Bodies as the Main Objects
The last question in the chapter is why the course repeatedly restricts attention to compact convex sets with interior. Compactness supplies maxima and minima for continuous functions, while non-empty interior excludes lower-dimensional degeneracies.
[quotetheorem:4088]
[citeproof:4088]
The compactness assumption is essential: the convex hull of a non-compact set such as an open ball is not compact. The theorem does not assert that convex hulls preserve interior or dimension; points lying on a line still generate only a segment. Its role is to justify a recurring passage in convex geometry and optimisation: compact input data can be replaced by a compact convex feasible region on which continuous functions attain extrema.
[example: From A Point Cloud To A Convex Body]
Let $A=\{a_1,\dots,a_m\}\subset\mathbb R^2$ be finite, and suppose $A$ contains three non-collinear points $p_1,p_2,p_3$. By the definition of convex hull,
\begin{align*}
\operatorname{conv}(A)
=\left\{\sum_{i=1}^m \lambda_i a_i:\lambda_i\ge 0,\ \sum_{i=1}^m\lambda_i=1\right\}.
\end{align*}
This set is convex because if
\begin{align*}
x=\sum_{i=1}^m \lambda_i a_i,
\qquad
y=\sum_{i=1}^m \mu_i a_i,
\end{align*}
with $\lambda_i,\mu_i\ge 0$ and $\sum_i\lambda_i=\sum_i\mu_i=1$, then for $0\le t\le 1$,
\begin{align*}
(1-t)x+ty
&=(1-t)\sum_{i=1}^m \lambda_i a_i+t\sum_{i=1}^m \mu_i a_i\\
&=\sum_{i=1}^m\bigl((1-t)\lambda_i+t\mu_i\bigr)a_i.
\end{align*}
Each coefficient satisfies
\begin{align*}
(1-t)\lambda_i+t\mu_i\ge 0,
\end{align*}
and their sum is
\begin{align*}
\sum_{i=1}^m\bigl((1-t)\lambda_i+t\mu_i\bigr)
&=(1-t)\sum_{i=1}^m\lambda_i+t\sum_{i=1}^m\mu_i\\
&=(1-t)\cdot 1+t\cdot 1\\
&=1.
\end{align*}
Hence $(1-t)x+ty\in\operatorname{conv}(A)$.
The set is compact because it is the image of the simplex
\begin{align*}
\Delta_{m-1}=\left\{(\lambda_1,\dots,\lambda_m)\in\mathbb R^m:\lambda_i\ge 0,\ \sum_{i=1}^m\lambda_i=1\right\}
\end{align*}
under the continuous map
\begin{align*}
(\lambda_1,\dots,\lambda_m)\longmapsto \sum_{i=1}^m\lambda_i a_i.
\end{align*}
The simplex $\Delta_{m-1}$ is closed and bounded in $\mathbb R^m$, hence compact, so its continuous image $\operatorname{conv}(A)$ is compact.
It remains to see that the interior is non-empty. Since $p_1,p_2,p_3$ are non-collinear, the vectors
\begin{align*}
u=p_2-p_1,
\qquad
v=p_3-p_1
\end{align*}
are linearly independent. The triangle
\begin{align*}
T=\{\lambda_1p_1+\lambda_2p_2+\lambda_3p_3:\lambda_i\ge 0,\ \lambda_1+\lambda_2+\lambda_3=1\}
\end{align*}
lies in $\operatorname{conv}(A)$ because $p_1,p_2,p_3\in A$. Its barycentre is
\begin{align*}
b=\frac13p_1+\frac13p_2+\frac13p_3
=p_1+\frac13u+\frac13v.
\end{align*}
For any real $\alpha,\beta$ with $|\alpha|+|\beta|<\frac13$, the point
\begin{align*}
b+\alpha u+\beta v
&=p_1+\left(\frac13+\alpha\right)u+\left(\frac13+\beta\right)v\\
&=\left(1-\left(\frac13+\alpha\right)-\left(\frac13+\beta\right)\right)p_1
+\left(\frac13+\alpha\right)p_2
+\left(\frac13+\beta\right)p_3\\
&=\left(\frac13-\alpha-\beta\right)p_1
+\left(\frac13+\alpha\right)p_2
+\left(\frac13+\beta\right)p_3.
\end{align*}
The three coefficients are positive because
\begin{align*}
\frac13+\alpha>0,
\qquad
\frac13+\beta>0,
\qquad
\frac13-\alpha-\beta>0,
\end{align*}
and their sum is
\begin{align*}
\left(\frac13-\alpha-\beta\right)+\left(\frac13+\alpha\right)+\left(\frac13+\beta\right)=1.
\end{align*}
Thus a small two-dimensional neighbourhood of $b$ lies inside $T$, so $\operatorname{conv}(A)$ has non-empty interior. Therefore $\operatorname{conv}(A)$ is a compact convex polygon with non-empty interior, hence a convex body in $\mathbb R^2$.
If all points of $A$ lie on one line $L$, then every convex combination of points of $A$ also lies in $L$, because affine lines are closed under expressions
\begin{align*}
(1-t)x+ty
\end{align*}
with $x,y\in L$ and $0\le t\le 1$. Hence $\operatorname{conv}(A)\subset L$, so it has empty ordinary interior in $\mathbb R^2$. In that case the convex hull is a compact segment or a point: it is a convex body, but not a full-dimensional one in $\mathbb R^2$. This is why affine hulls and relative interiors are needed before making statements that depend on interior.
[/example]
After the first chapter's finite-dimensional toolkit, the natural next step is to ask what a convex set reveals through its supporting directions. Chapter 2 answers this by studying separation by hyperplanes, showing how geometric structure can be detected and often recovered from linear functionals.
# 2. Separation and Supporting Hyperplanes
After Chapter 1's finite-dimensional toolkit for convex sets, this chapter studies the first major analytic tool in convex geometry: separating a convex set from points or from other convex sets by affine hyperplanes. The guiding question is how much of a convex set can be recovered from the linear inequalities it satisfies. We pass from basic convexity to the language of half-spaces, supporting hyperplanes, exposed faces, and normal cones, then use these ideas to describe the faces and extreme points of familiar convex bodies.
## Linear Inequalities and Hyperplanes
How can a subset of $\mathbb R^n$ be detected by linear measurements? Write $(\mathbb R^n)^*$ for the dual space, meaning the vector space of all linear maps $u:\mathbb R^n\to\mathbb R$. A non-zero linear functional $u \in (\mathbb R^n)^*$ assigns a scalar height $u(x)$ to each point $x$, and affine hyperplanes are the level sets of such measurements. The associated half-spaces are the simplest convex sets, and separation theorems say that they are numerous enough to distinguish convex sets.
[definition: Affine Hyperplane]
Let $u \in (\mathbb R^n)^* \setminus \{0\}$ and let $\alpha \in \mathbb R$. The affine hyperplane with normal functional $u$ and level $\alpha$ is
\begin{align*}
H(u,\alpha) := \{x \in \mathbb R^n : u(x)=\alpha\}.
\end{align*}
[/definition]
The two closed sides of $H(u,\alpha)$ are the corresponding closed half-spaces. Replacing $u$ by $-u$ exchanges the two sides, so the orientation is part of the notation rather than an intrinsic property of the hyperplane.
To use hyperplanes as tests for containment and separation, we need notation for the two inequality regions they bound. These regions are the basic building blocks for linear descriptions of convex sets, because a single linear measurement decides on which side of the hyperplane a point lies.
[definition: Closed Half-Space]
Let $u \in (\mathbb R^n)^* \setminus \{0\}$ and $\alpha \in \mathbb R$. The closed positive and negative half-spaces determined by $H(u,\alpha)$ are
\begin{align*}
H^+(u,\alpha) &:= \{x \in \mathbb R^n : u(x) \ge \alpha\}, \\
H^-(u,\alpha) &:= \{x \in \mathbb R^n : u(x) \le \alpha\}.
\end{align*}
[/definition]
Half-spaces are the simplest convex sets defined by one linear inequality. In the plane they are easy to visualize as one side of a line.
[example: Half-Spaces In The Plane]
In $\mathbb R^2$, let $u(x_1,x_2)=x_1+2x_2$ and take $\alpha=3$. By the definition of $H(u,\alpha)$,
\begin{align*}
H(u,3)
&=\{(x_1,x_2)\in\mathbb R^2:u(x_1,x_2)=3\}\\
&=\{(x_1,x_2)\in\mathbb R^2:x_1+2x_2=3\}.
\end{align*}
Solving the equation for $x_2$ gives
\begin{align*}
x_1+2x_2&=3,\\
2x_2&=3-x_1,\\
x_2&=\frac{3-x_1}{2},
\end{align*}
so the hyperplane is the affine line of slope $-\frac12$ and vertical intercept $\frac32$.
The negative closed half-space is
\begin{align*}
H^-(u,3)
&=\{(x_1,x_2)\in\mathbb R^2:u(x_1,x_2)\le 3\}\\
&=\{(x_1,x_2)\in\mathbb R^2:x_1+2x_2\le 3\}.
\end{align*}
Since $2>0$, this inequality can be rewritten without reversing the sign:
\begin{align*}
x_1+2x_2&\le 3,\\
2x_2&\le 3-x_1,\\
x_2&\le \frac{3-x_1}{2}.
\end{align*}
Thus $H^-(u,3)$ is the closed region on or below the slanted line $x_2=(3-x_1)/2$. The point $(0,0)$ lies in this half-space because $u(0,0)=0\le3$, while $(0,2)$ does not because $u(0,2)=4>3$. This shows that a half-space is determined by a linear measurement, not by being horizontal or vertical in the coordinate axes.
[/example]
This example also shows how separation data changes. Varying $\alpha$ translates the line without changing its normal direction, while replacing $u$ by a positive scalar keeps the same geometric half-space after rescaling the level. Replacing $u$ by $-u$ reverses the inequality and swaps the chosen side.
The next structural question is whether these linear regions are compatible with convexity itself. Separation arguments will only be useful if a half-space is a convex set and if its boundary hyperplane behaves predictably under convex combinations.
[quotetheorem:4089]
[citeproof:4089]
This elementary calculation is the algebraic source of separation. The non-zero hypothesis on $u$ prevents the level set from being all of $\mathbb R^n$ or empty, so it ensures that the hyperplane is a genuine codimension-one object. Convexity of half-spaces is what makes linear inequalities compatible with convex hulls: once every point of a set satisfies an inequality, every convex combination satisfies it too. The limitation is that one inequality records only one direction of information; the later separation theorems explain when enough such inequalities can distinguish a convex set from points or other sets.
## Supporting Hyperplanes and Exposed Faces
When does a hyperplane touch a convex set without cutting through it? The answer is encoded by the maximum or minimum value of a linear functional on the set. For compact convex sets, such extrema exist, so every direction produces a supporting hyperplane.
[definition: Supporting Hyperplane]
Let $C \subset \mathbb R^n$ be convex. An affine hyperplane $H(u,\alpha)$ supports $C$ if $C \subset H^-(u,\alpha)$ and $C \cap H(u,\alpha) \ne \varnothing$, or if $C \subset H^+(u,\alpha)$ and $C \cap H(u,\alpha) \ne \varnothing$.
[/definition]
The supporting condition says that the set lies on one side and has contact with the boundary hyperplane. For compact $C$, the value $\alpha$ is usually chosen as $\max_{x\in C}u(x)$ or $\min_{x\in C}u(x)$.
A supporting hyperplane leaves open which part of $C$ is responsible for the contact. To connect support with boundary geometry, we need a named object for the entire contact set cut out by one supporting direction, not just for the hyperplane itself.
[definition: Exposed Face]
Let $C \subset \mathbb R^n$ be convex and let $u \in (\mathbb R^n)^* \setminus \{0\}$. If $u$ attains its maximum on $C$, the exposed face of $C$ in direction $u$ is
\begin{align*}
F(C,u) := \{x \in C : u(x)=\sup_{y\in C}u(y)\}.
\end{align*}
[/definition]
For non-compact sets the supremum may be infinite or may not be attained, so the definition is most useful for compact convex sets and for directions where the supremum is attained. Exposed faces are convex because equality in a linear inequality is preserved along segments.
[quotetheorem:4090]
[citeproof:4090]
Compactness is the hypothesis that turns a direction into an actual point of contact. If $C$ is not compact, a linear functional may have a finite supremum without attaining it, as happens for an open ball with any non-zero linear functional. If $C$ is unbounded, the supremum may be infinite in some directions, so there may be no supporting hyperplane with that normal at all. The contact set produced here is the first example of an exposed face, and collecting all directions that expose a given point leads to the normal cone.
[definition: Normal Cone]
Let $C \subset \mathbb R^n$ be convex and let $x_0 \in C$. The normal cone of $C$ at $x_0$ is
\begin{align*}
N_C(x_0) := \{u \in (\mathbb R^n)^* : u(x-x_0) \le 0 \text{ for all } x \in C\}.
\end{align*}
[/definition]
Thus $u\in N_C(x_0)$ means that $x_0$ maximises the linear functional $u$ over $C$. The zero functional is included, so $N_C(x_0)$ is always non-empty; boundary points are recognised by the presence of non-zero normal functionals.
[example: Normal Cones Of A Square]
Let $C=[-1,1]^2\subset\mathbb R^2$, and identify the functional $u$ with a vector $(a,b)$ by writing
\begin{align*}
u(x_1,x_2)=a x_1+b x_2.
\end{align*}
At $x_0=(1,1)$, the condition $(a,b)\in N_C(1,1)$ means
\begin{align*}
a(x_1-1)+b(x_2-1)\le 0
\end{align*}
for every $(x_1,x_2)\in[-1,1]^2$. Since $x_1-1\le0$ and $x_2-1\le0$ on $C$, every pair $a\ge0$, $b\ge0$ satisfies this inequality. Conversely, testing the inequality at $(-1,1)$ gives
\begin{align*}
a(-1-1)+b(1-1)&\le0,\\
-2a&\le0,\\
a&\ge0,
\end{align*}
and testing it at $(1,-1)$ gives
\begin{align*}
a(1-1)+b(-1-1)&\le0,\\
-2b&\le0,\\
b&\ge0.
\end{align*}
Thus
\begin{align*}
N_C(1,1)=\{(a,b):a\ge0,\ b\ge0\}.
\end{align*}
Now let $x_0=(1,t)$ with $-1<t<1$. The normal cone condition becomes
\begin{align*}
a(x_1-1)+b(x_2-t)\le0
\end{align*}
for every $(x_1,x_2)\in C$. Testing at $(1,1)$ gives
\begin{align*}
b(1-t)\le0,
\end{align*}
and since $1-t>0$, this implies $b\le0$. Testing at $(1,-1)$ gives
\begin{align*}
b(-1-t)\le0,
\end{align*}
and since $-1-t<0$, this implies $b\ge0$. Hence $b=0$. With $b=0$, the condition is
\begin{align*}
a(x_1-1)\le0
\end{align*}
for every $x_1\in[-1,1]$. Testing at $x_1=-1$ gives
\begin{align*}
-2a\le0,
\end{align*}
so $a\ge0$, and this condition is sufficient because $x_1-1\le0$ on $[-1,1]$. Therefore
\begin{align*}
N_C(1,t)=\{(a,0):a\ge0\}.
\end{align*}
Finally, let $x_0=(s,t)$ be an interior point, so $-1<s<1$ and $-1<t<1$. Choose $\varepsilon>0$ small enough that $s\pm\varepsilon\in[-1,1]$ and $t\pm\varepsilon\in[-1,1]$. The normal cone condition at $(s,t)$ gives
\begin{align*}
a((s+\varepsilon)-s)+b(t-t)&\le0,\\
a\varepsilon&\le0,
\end{align*}
and also
\begin{align*}
a((s-\varepsilon)-s)+b(t-t)&\le0,\\
-a\varepsilon&\le0.
\end{align*}
Since $\varepsilon>0$, these two inequalities force $a=0$. Applying the same argument to $(s,t+\varepsilon)$ and $(s,t-\varepsilon)$ forces $b=0$. Hence
\begin{align*}
N_C(s,t)=\{0\}.
\end{align*}
The vertex has a two-dimensional normal cone, a relative interior point of an edge has a one-dimensional normal cone, and an interior point has only the zero normal, so the normal cones record the local combinatorics of the square.
[/example]
## Weak and Strong Separation
Given two disjoint convex sets, when can a hyperplane be placed between them? There are two answers. Weak separation allows the two sets to touch the separating hyperplane, while strong separation inserts a positive gap between the two linear ranges.
[definition: Weak Separation]
Two sets $A,B\subset\mathbb R^n$ are weakly separated by a hyperplane if there exist $u\in(\mathbb R^n)^*\setminus\{0\}$ and $\alpha\in\mathbb R$ such that
\begin{align*}
u(a) \le \alpha \le u(b)
\end{align*}
for all $a\in A$ and $b\in B$.
[/definition]
Weak separation permits the boundary case where the two sets have the same extremal value in the separating direction. This is useful for tangent convex sets, but it does not provide a quantitative margin.
For many applications, merely ordering the two sets along one linear functional is not enough; one needs room to perturb points or inequalities without destroying separation. Strong separation records this stability by requiring a genuine interval between the projected values of the two sets.
[definition: Strong Separation]
Two sets $A,B\subset\mathbb R^n$ are strongly separated if there exist $u\in(\mathbb R^n)^*\setminus\{0\}$ and real numbers $\alpha<\beta$ such that
\begin{align*}
u(a) \le \alpha < \beta \le u(b)
\end{align*}
for all $a\in A$ and $b\in B$.
[/definition]
Strong separation is equivalent to a positive distance between the images $u(A)$ and $u(B)$ on the real line. In finite-dimensional convex geometry, compactness is a common hypothesis that supplies the positive gap.
A different source of strictness is openness. When one convex set contains a small neighbourhood around each of its points, separation can use that room to force a strict inequality on the open side. In finite-dimensional Euclidean space this is the separation principle needed here: if a non-empty open convex set is disjoint from another convex set, then a non-zero linear functional can place the open set strictly on one side of a level and the other set on the opposite side. The openness hypothesis is what creates the strict inequality; without it, two closed convex sets can approach each other without leaving a positive gap.
Compactness supplies a second finite-dimensional route to strict separation. If $A$ is compact convex, $B$ is closed convex, and $A\cap B=\varnothing$, then the distance between $A$ and $B$ is positive after minimizing over $A-B$; a separating hyperplane for the difference set gives a strong separating inequality for the original sets. Closedness of $B$ is used to keep the difference set closed, and compactness of $A$ prevents the two sets from approaching each other only at infinity. The following example shows that closedness and convexity alone do not force such a margin.
[example: Failure Of Strong Separation Without Compactness]
In $\mathbb R^2$, let
\begin{align*}
A=\{(x,y):y\ge e^x\}
\qquad\text{and}\qquad
B=\{(x,0):x\in\mathbb R\}.
\end{align*}
The set $A$ is closed because it is the preimage of $[0,\infty)$ under the continuous function $(x,y)\mapsto y-e^x$, and $B$ is a closed affine line. The set $A$ is convex because if $(x_1,y_1),(x_2,y_2)\in A$ and $t\in[0,1]$, then
\begin{align*}
ty_1+(1-t)y_2
&\ge t e^{x_1}+(1-t)e^{x_2}\\
&\ge e^{t x_1+(1-t)x_2},
\end{align*}
where the last inequality is the convexity of the exponential function. The set $B$ is convex because
\begin{align*}
t(x_1,0)+(1-t)(x_2,0)=(tx_1+(1-t)x_2,0)\in B.
\end{align*}
They are disjoint since every point of $A$ has $y\ge e^x>0$, while every point of $B$ has $y=0$.
Weak separation is given by the functional $u(x,y)=-y$ and the level $\alpha=0$. If $(x,y)\in A$, then
\begin{align*}
u(x,y)=-y\le -e^x<0=\alpha,
\end{align*}
and if $(x,0)\in B$, then
\begin{align*}
u(x,0)=0=\alpha.
\end{align*}
Thus $u(a)\le0\le u(b)$ for every $a\in A$ and $b\in B$.
We now show that no strong separation is possible. Suppose, for contradiction, that a non-zero linear functional
\begin{align*}
u(x,y)=px+qy
\end{align*}
and numbers $\alpha<\beta$ satisfy
\begin{align*}
u(a)\le\alpha<\beta\le u(b)
\end{align*}
for all $a\in A$ and $b\in B$. For $b=(x,0)\in B$ this gives
\begin{align*}
\beta\le u(x,0)=px
\end{align*}
for every $x\in\mathbb R$. If $p>0$, then choosing $x<\beta/p$ gives $px<\beta$, impossible. If $p<0$, then choosing $x>\beta/p$ gives $px<\beta$, also impossible. Hence $p=0$, so $u(x,y)=qy$ and $\beta\le0$.
Since $u$ is non-zero, $q\ne0$. If $q>0$, then for every $Y\ge1$ the point $(0,Y)\in A$, so
\begin{align*}
qY=u(0,Y)\le\alpha.
\end{align*}
Choosing $Y>\alpha/q$ contradicts this. Hence $q<0$. For each $x\in\mathbb R$, the boundary point $(x,e^x)\in A$ gives
\begin{align*}
q e^x=u(x,e^x)\le\alpha.
\end{align*}
As $x\to-\infty$, we have $e^x\to0$, and because $q<0$,
\begin{align*}
q e^x\to0.
\end{align*}
Therefore $\alpha\ge0$. Combining this with $\alpha<\beta\le0$ gives
\begin{align*}
0\le\alpha<\beta\le0,
\end{align*}
which is impossible. Thus the two closed convex sets are weakly separated but not strongly separated; the obstruction is that the graph of $e^x$ approaches the line $y=0$ at infinity without ever meeting it.
[/example]
This counterexample is a warning about infinite directions. Closed convex sets can approach each other at infinity, and separation by a positive margin requires a hypothesis that rules out this escape.
## Faces, Extreme Points, and Exposed Points
Which parts of a convex set behave like genuine boundary pieces rather than accidental subsets? Faces are subsets that contain every segment of the set whose relative interior they meet. Extreme points are the zero-dimensional faces, while exposed points are those singled out by a supporting hyperplane.
[definition: Face]
Let $C\subset\mathbb R^n$ be convex. A convex subset $F\subset C$ is a face of $C$ if whenever $x,y\in C$, $t\in(0,1)$, and $tx+(1-t)y\in F$, then $x,y\in F$.
[/definition]
The face condition rules out subsets that merely sit on the boundary but cut through a genuine segment of $C$. The smallest possible non-empty faces are singletons, and these are the points that cannot be decomposed inside the convex set.
For many arguments the face itself is not the main object; what matters is whether a point can be written as the interior of a non-trivial segment lying in $C$. If it can, then the point is not a true corner of the set, even if it lies on the boundary. The point-level condition below packages exactly this obstruction to decomposition, so that vertices of polytopes and their analogues in arbitrary convex sets can be discussed without repeatedly naming the singleton face.
[definition: Extreme Point]
Let $C\subset\mathbb R^n$ be convex. A point $x_0\in C$ is an extreme point of $C$ if $\{x_0\}$ is a face of $C$.
[/definition]
Indecomposability is an intrinsic condition, while optimization asks for a stronger kind of detectability. For supporting hyperplanes and linear programming, we need the points that are uniquely selected by maximizing one non-zero linear functional.
[definition: Exposed Point]
Let $C\subset\mathbb R^n$ be convex. A point $x_0\in C$ is an exposed point of $C$ if there exists $u\in(\mathbb R^n)^*\setminus\{0\}$ such that
\begin{align*}
\{x_0\}=\{x\in C:u(x)=\sup_{y\in C}u(y)\}.
\end{align*}
[/definition]
The definitions now raise a natural comparison question: does optimization always produce a face, and in the singleton case does it always produce an extreme point? The answer explains the precise relationship between exposed objects and the broader face structure.
[quotetheorem:4091]
[citeproof:4091]
The hypothesis that the supremum is attained is essential: without a non-empty maximising set there is no candidate face. Convexity of $C$ is used twice, once to know that the intersection with the supporting hyperplane is convex and once to interpret the segment condition inside $C$. This theorem explains why exposed points are extreme, but it also marks a limitation: faces need not be exposed for arbitrary compact convex sets. Polytopes, introduced in Chapter 1 as finite convex hulls, are the main class where the converse becomes true.
[example: Faces Of A Simplex]
Let $\Delta=\operatorname{conv}\{v_0,\dots,v_n\}\subset\mathbb R^n$ be an $n$-simplex with affinely independent vertices. Every point $x\in\Delta$ has a unique barycentric expression
\begin{align*}
x=\lambda_0v_0+\cdots+\lambda_nv_n,
\qquad
\lambda_i\ge0,
\qquad
\lambda_0+\cdots+\lambda_n=1.
\end{align*}
For a subset $I\subset\{0,\dots,n\}$, the set
\begin{align*}
\Delta_I:=\operatorname{conv}\{v_i:i\in I\}
\end{align*}
is a face of $\Delta$. Indeed, suppose $x,y\in\Delta$, $t\in(0,1)$, and
\begin{align*}
tx+(1-t)y\in \Delta_I.
\end{align*}
Write
\begin{align*}
x=\sum_{i=0}^n a_iv_i,
\qquad
y=\sum_{i=0}^n b_iv_i,
\end{align*}
with $a_i,b_i\ge0$ and $\sum_i a_i=\sum_i b_i=1$. Then
\begin{align*}
tx+(1-t)y
&=t\sum_{i=0}^n a_iv_i+(1-t)\sum_{i=0}^n b_iv_i\\
&=\sum_{i=0}^n\bigl(ta_i+(1-t)b_i\bigr)v_i.
\end{align*}
Since $tx+(1-t)y\in\Delta_I$, uniqueness of barycentric coordinates gives
\begin{align*}
ta_j+(1-t)b_j=0
\end{align*}
for every $j\notin I$. Because $t>0$, $1-t>0$, and $a_j,b_j\ge0$, this forces $a_j=0$ and $b_j=0$ for every $j\notin I$. Hence $x,y\in\Delta_I$, so $\Delta_I$ is a face.
Conversely, let $F$ be a non-empty face of $\Delta$, and let
\begin{align*}
I=\{i:v_i\in F\}.
\end{align*}
Since $F$ is convex, $\Delta_I\subset F$. If $x\in F$, write
\begin{align*}
x=\sum_{i=0}^n\lambda_iv_i
\end{align*}
with barycentric coordinates. Whenever $\lambda_k>0$, either $\lambda_k=1$ and $x=v_k$, or
\begin{align*}
x=\lambda_kv_k+(1-\lambda_k)\left(\sum_{i\ne k}\frac{\lambda_i}{1-\lambda_k}v_i\right),
\end{align*}
where the point in parentheses lies in $\Delta$ because its coefficients are non-negative and sum to
\begin{align*}
\sum_{i\ne k}\frac{\lambda_i}{1-\lambda_k}
=\frac{1-\lambda_k}{1-\lambda_k}
=1.
\end{align*}
The face condition then gives $v_k\in F$, so $k\in I$. Thus every vertex appearing with positive barycentric coefficient in $x$ lies in $I$, and therefore $x\in\Delta_I$. Hence $F=\Delta_I$.
Now take a non-empty proper subset $I\subsetneq\{0,\dots,n\}$. Define the affine function
\begin{align*}
\phi_I(x)=\sum_{i\in I}\lambda_i(x),
\end{align*}
where $\lambda_i(x)$ denotes the $i$th barycentric coordinate of $x$. On $\Delta$ we have $0\le\phi_I(x)\le1$, and
\begin{align*}
\phi_I(x)=1
\end{align*}
holds exactly when $\lambda_j(x)=0$ for every $j\notin I$, equivalently when $x\in\Delta_I$. Writing $\phi_I(x)=u_I(x)+c$ with $u_I$ linear, the same maximisers are obtained from $u_I$, because adding the constant $c$ does not change which points maximise the function. Thus each non-empty proper face $\Delta_I$ is exposed.
The normal cone along the relative interior of $\Delta_I$ consists exactly of those linear functionals $u$ for which
\begin{align*}
u(v_i)=M \quad \text{for all } i\in I,
\qquad
u(v_j)\le M \quad \text{for all } j\notin I,
\end{align*}
where $M=\max_{0\le k\le n}u(v_k)$. If all inequalities for $j\notin I$ are strict, then the exposed face is exactly $\Delta_I$; if equality also holds at some outside vertex, the exposed face enlarges to include that vertex. Thus the faces of a simplex are precisely the convex hulls of chosen vertices, and the normal cone records which vertices share the maximal linear height.
[/example]
## Face Lattices of Polytopes
How do the boundary pieces of a polytope fit together? The set of all faces is ordered by inclusion, and this ordered structure is the face lattice. For polytopes, it is finite and encodes vertices, edges, facets, and the whole polytope in one object.
[definition: Polytope]
A polytope in $\mathbb R^n$ is the convex hull of finitely many points in $\mathbb R^n$.
[/definition]
Finiteness is the feature that distinguishes polytopes from general compact convex bodies in this section. It implies that the boundary decomposes into finitely many combinatorial pieces rather than a continuum of curved tangent points.
Listing the faces alone loses the incidence information that makes a polytope combinatorial. The next object records containment among all boundary pieces, so that vertices, edges, facets, and the whole polytope can be studied through one ordered structure.
[definition: Face Lattice]
Let $P\subset\mathbb R^n$ be a polytope. The face lattice of $P$ is the collection of all faces of $P$, ordered by inclusion.
[/definition]
The empty face is often included by convention, and $P$ itself is the largest face. Meets are intersections of faces, while joins are the smallest faces containing a given union.
For the face lattice to connect back to supporting hyperplanes, its order-theoretic pieces must be recoverable by linear optimization.
The next question is whether the combinatorial notion of a proper face matches the analytic notion of an exposed face. If every non-empty proper face can be selected by a single supporting functional, then the finite face lattice can be studied using linear objectives rather than separate ad hoc descriptions of each boundary piece.
[quotetheorem:4092]
[citeproof:4092]
This theorem is special to polytopes. Non-emptiness of the face is needed because an exposed face is a maximising set and hence cannot be empty on a non-empty compact polytope. Properness is needed because the whole polytope is exposed only by the zero functional, which is excluded in the definition of exposed face. For curved or partly flat convex bodies, some faces may not be produced by a single supporting hyperplane, and the distinction between exposed and non-exposed faces becomes meaningful. The theorem is therefore a bridge from analytic separation to the finite combinatorics of the face lattice.
[example: Face Lattice Of A Square]
Let $P=[-1,1]^2$. Its four vertices are
\begin{align*}
(-1,-1),\quad (-1,1),\quad (1,-1),\quad (1,1),
\end{align*}
and its four edges are
\begin{align*}
E_R&=\{(1,t):-1\le t\le1\},&
E_L&=\{(-1,t):-1\le t\le1\},\\
E_T&=\{(s,1):-1\le s\le1\},&
E_B&=\{(s,-1):-1\le s\le1\}.
\end{align*}
Together with $\varnothing$ and $P$ itself, these are the faces in the face lattice of the square.
For example, the right edge is exposed by $u(x_1,x_2)=x_1$. Since every $(x_1,x_2)\in P$ satisfies $-1\le x_1\le1$, we have
\begin{align*}
u(x_1,x_2)=x_1\le1.
\end{align*}
Equality holds exactly when $x_1=1$, so
\begin{align*}
F(P,u)
&=\{(x_1,x_2)\in P:u(x_1,x_2)=1\}\\
&=\{(x_1,x_2)\in[-1,1]^2:x_1=1\}\\
&=\{(1,t):-1\le t\le1\}\\
&=E_R.
\end{align*}
Similarly, the vertex $(1,1)$ is exposed by $v(x_1,x_2)=x_1+x_2$. For $(x_1,x_2)\in P$,
\begin{align*}
x_1&\le1,\\
x_2&\le1,
\end{align*}
so adding the two inequalities gives
\begin{align*}
v(x_1,x_2)=x_1+x_2\le2.
\end{align*}
If equality holds, then
\begin{align*}
x_1+x_2=2.
\end{align*}
Because $x_1\le1$ and $x_2\le1$, the only way their sum can be $2$ is
\begin{align*}
x_1=1,\qquad x_2=1.
\end{align*}
Hence
\begin{align*}
F(P,v)
&=\{(x_1,x_2)\in P:x_1+x_2=2\}\\
&=\{(1,1)\}.
\end{align*}
The inclusion order records, for instance,
\begin{align*}
\{(1,1)\}\subset E_R\subset P
\qquad\text{and}\qquad
\{(1,1)\}\subset E_T\subset P.
\end{align*}
The normal cones grow in the opposite direction: an interior point has only the zero normal, a relative interior point of an edge such as $(1,t)$ with $-1<t<1$ has the ray $\{(a,0):a\ge0\}$, and the vertex $(1,1)$ has the quadrant $\{(a,b):a\ge0,\ b\ge0\}$. Thus smaller faces correspond to larger collections of supporting directions.
[/example]
For polytopes, faces are flat pieces cut out by supporting directions. The Euclidean ball has no flat sides, so its faces collapse to boundary points.
[example: Faces Of The Euclidean Ball]
Let $B=\{x\in\mathbb R^n:|x|\le1\}$, and let $x_0\in B$ satisfy $|x_0|=1$. We show first that $x_0$ is exposed by the functional
\begin{align*}
u(x)=x_0\cdot x.
\end{align*}
For any $x\in B$,
\begin{align*}
0&\le |x-x_0|^2\\
&=(x-x_0)\cdot(x-x_0)\\
&=|x|^2-2x_0\cdot x+|x_0|^2\\
&=|x|^2-2u(x)+1.
\end{align*}
Hence
\begin{align*}
2u(x)&\le |x|^2+1\\
&\le 1+1\\
&=2,
\end{align*}
so $u(x)\le1$. At $x=x_0$,
\begin{align*}
u(x_0)=x_0\cdot x_0=|x_0|^2=1,
\end{align*}
so the maximum value of $u$ on $B$ is $1$. If $u(x)=1$, then the inequalities above force
\begin{align*}
|x|^2=1
\end{align*}
and
\begin{align*}
|x-x_0|^2=|x|^2-2u(x)+1=1-2+1=0.
\end{align*}
Thus $x=x_0$, and therefore
\begin{align*}
F(B,u)=\{x_0\}.
\end{align*}
So every boundary point of the Euclidean ball is an exposed point.
Now let $F$ be a non-empty proper face of $B$. If $z\in F$ satisfies $|z|<1$, then for any $y\in B$ choose $t\in(0,1)$ with
\begin{align*}
t\le \frac{1-|z|}{2}.
\end{align*}
Define
\begin{align*}
w=\frac{z-ty}{1-t}.
\end{align*}
Then
\begin{align*}
|w|
&=\left|\frac{z-ty}{1-t}\right|\\
&\le \frac{|z|+t|y|}{1-t}\\
&\le \frac{|z|+t}{1-t}\\
&\le 1,
\end{align*}
because $|z|+t\le 1-t$. Hence $w\in B$, and
\begin{align*}
z=ty+(1-t)w.
\end{align*}
Since $z\in F$ and $F$ is a face, this forces $y\in F$. As $y\in B$ was arbitrary, $F=B$, contradicting that $F$ is proper. Therefore every point of a proper face lies on the boundary sphere $|x|=1$.
If a proper face $F$ contained two distinct points $x,y$ with $|x|=|y|=1$, then by convexity of $F$ the midpoint $(x+y)/2$ would lie in $F$. Its norm satisfies
\begin{align*}
\left|\frac{x+y}{2}\right|^2
&=\frac{1}{4}(x+y)\cdot(x+y)\\
&=\frac{1}{4}\bigl(|x|^2+2x\cdot y+|y|^2\bigr)\\
&=\frac{1}{2}(1+x\cdot y).
\end{align*}
Also
\begin{align*}
0<|x-y|^2
&=(x-y)\cdot(x-y)\\
&=|x|^2-2x\cdot y+|y|^2\\
&=2-2x\cdot y,
\end{align*}
so $x\cdot y<1$. Hence
\begin{align*}
\left|\frac{x+y}{2}\right|^2<1,
\end{align*}
which would put an interior point in $F$, impossible for a proper face. Thus every non-empty proper face is a singleton boundary point.
Finally, identify linear functionals with vectors by $v(x)=v\cdot x$. For $|x_0|=1$,
\begin{align*}
v\in N_B(x_0)
\end{align*}
means
\begin{align*}
v\cdot(x-x_0)\le0
\end{align*}
for every $x\in B$, equivalently
\begin{align*}
v\cdot x\le v\cdot x_0
\end{align*}
for every $x\in B$. If $v=\lambda x_0$ with $\lambda\ge0$, then for every $x\in B$ the computation above gives $x_0\cdot x\le1=x_0\cdot x_0$, so
\begin{align*}
v\cdot x=\lambda x_0\cdot x\le \lambda=\lambda x_0\cdot x_0=v\cdot x_0.
\end{align*}
Thus $\lambda x_0\in N_B(x_0)$.
Conversely, if $v\in N_B(x_0)$, write
\begin{align*}
v=\lambda x_0+w,
\qquad
w\cdot x_0=0,
\end{align*}
where $\lambda=v\cdot x_0$. Testing the normal-cone inequality at $x=-x_0$ gives
\begin{align*}
v\cdot(-x_0)&\le v\cdot x_0,\\
-\lambda&\le\lambda,
\end{align*}
so $\lambda\ge0$. If $w\ne0$, then $x=w/|w|\in B$ and $x\cdot x_0=0$, so
\begin{align*}
v\cdot x
&=(\lambda x_0+w)\cdot\frac{w}{|w|}\\
&=\lambda x_0\cdot\frac{w}{|w|}+w\cdot\frac{w}{|w|}\\
&=0+|w|\\
&>|0|\\
&=v\cdot x_0-\lambda.
\end{align*}
Since $v\cdot x_0=\lambda$, this gives $v\cdot x>|0|$ but not yet a contradiction unless compared to $\lambda$. Instead take, for sufficiently small $\varepsilon>0$,
\begin{align*}
x=\sqrt{1-\varepsilon^2}\,x_0+\varepsilon\frac{w}{|w|}.
\end{align*}
Then
\begin{align*}
|x|^2
&=(1-\varepsilon^2)|x_0|^2
+2\varepsilon\sqrt{1-\varepsilon^2}\,x_0\cdot\frac{w}{|w|}
+\varepsilon^2\left|\frac{w}{|w|}\right|^2\\
&=1-\varepsilon^2+0+\varepsilon^2\\
&=1,
\end{align*}
so $x\in B$. The normal-cone inequality gives
\begin{align*}
\lambda\sqrt{1-\varepsilon^2}+\varepsilon |w|
&=v\cdot x\\
&\le v\cdot x_0\\
&=\lambda.
\end{align*}
Rearranging,
\begin{align*}
\varepsilon |w|\le \lambda\bigl(1-\sqrt{1-\varepsilon^2}\bigr).
\end{align*}
For $\varepsilon>0$, divide by $\varepsilon$:
\begin{align*}
|w|
&\le \lambda\frac{1-\sqrt{1-\varepsilon^2}}{\varepsilon}\\
&=\lambda\frac{\varepsilon^2}{\varepsilon(1+\sqrt{1-\varepsilon^2})}\\
&=\lambda\frac{\varepsilon}{1+\sqrt{1-\varepsilon^2}}.
\end{align*}
Letting $\varepsilon\to0^+$ gives $|w|\le0$, so $w=0$. Therefore
\begin{align*}
N_B(x_0)=\{\lambda x_0:\lambda\ge0\}.
\end{align*}
The Euclidean ball has no positive-dimensional proper faces: its proper faces are exactly the exposed singleton boundary points, and each such point has a one-dimensional outward normal cone.
[/example]
## Compact Convex Sets from Extreme Points
If supporting hyperplanes reveal the boundary, can the whole body be rebuilt from its most economical boundary points? In finite dimension the answer is yes for compact convex sets: the extreme points generate the set by convex hull. This is the finite-dimensional form of Minkowski's theorem, also called the Krein-Milman theorem in this setting.
[quotetheorem:4093]
[citeproof:4093]
Compactness is essential: a closed half-line has one extreme point but is not the convex hull of that point alone. Finite dimensionality is also part of the theorem's force, since infinite-dimensional compactness and closed convex hulls require more delicate functional-analytic hypotheses. Convexity is needed for the statement to be meaningful as a reconstruction by convex combinations, and non-emptiness avoids the separate convention for the convex hull of the empty set. Minkowski's theorem says that compact convex geometry in finite dimension is generated by points that cannot be split further. For polytopes this reduces to the familiar statement that a polytope is the convex hull of its vertices; for a Euclidean ball it says that the closed ball is the convex hull of its unit sphere.
[example: Extreme Points In The Main Examples]
For the square $Q=[-1,1]^2$, we show that the extreme points are exactly
\begin{align*}
(-1,-1),\quad (-1,1),\quad (1,-1),\quad (1,1).
\end{align*}
Let $v=(1,1)$. If
\begin{align*}
v=t(x_1,x_2)+(1-t)(y_1,y_2),
\qquad 0<t<1,
\end{align*}
with $(x_1,x_2),(y_1,y_2)\in Q$, then the first coordinate gives
\begin{align*}
1=tx_1+(1-t)y_1.
\end{align*}
Since $x_1\le1$ and $y_1\le1$, we have
\begin{align*}
tx_1+(1-t)y_1\le t\cdot1+(1-t)\cdot1=1.
\end{align*}
Equality can hold only if $x_1=1$ and $y_1=1$. The second coordinate gives the same conclusion:
\begin{align*}
1=tx_2+(1-t)y_2
\end{align*}
forces $x_2=1$ and $y_2=1$. Hence both points are $(1,1)$. The same argument, with the relevant signs changed, proves that the other three vertices are extreme.
Conversely, if $p=(a,b)\in Q$ is not a vertex, then at least one coordinate lies strictly between $-1$ and $1$. If $-1<a<1$, choose $\varepsilon>0$ with $a-\varepsilon\ge-1$ and $a+\varepsilon\le1$. Then
\begin{align*}
p
&=(a,b)\\
&=\frac12(a-\varepsilon,b)+\frac12(a+\varepsilon,b),
\end{align*}
and the two points on the right are distinct points of $Q$, so $p$ is not extreme. If instead $a=\pm1$ and $-1<b<1$, the same argument in the second coordinate gives
\begin{align*}
(a,b)=\frac12(a,b-\varepsilon)+\frac12(a,b+\varepsilon),
\end{align*}
with two distinct points of $Q$. Thus the only extreme points of the square are its four vertices.
For an $n$-simplex $\Delta=\operatorname{conv}\{v_0,\dots,v_n\}$ with affinely independent vertices, the extreme points are exactly $v_0,\dots,v_n$. If
\begin{align*}
v_k=tx+(1-t)y,
\qquad 0<t<1,
\end{align*}
with $x,y\in\Delta$, write
\begin{align*}
x=\sum_{i=0}^n a_i v_i,
\qquad
y=\sum_{i=0}^n b_i v_i,
\end{align*}
where $a_i,b_i\ge0$ and $\sum_i a_i=\sum_i b_i=1$. Then
\begin{align*}
v_k
&=t\sum_{i=0}^n a_i v_i+(1-t)\sum_{i=0}^n b_i v_i\\
&=\sum_{i=0}^n\bigl(ta_i+(1-t)b_i\bigr)v_i.
\end{align*}
By uniqueness of barycentric coordinates in a simplex,
\begin{align*}
ta_k+(1-t)b_k=1,
\qquad
ta_j+(1-t)b_j=0 \quad (j\ne k).
\end{align*}
Since all coefficients are non-negative and $t,1-t>0$, the equations for $j\ne k$ force $a_j=b_j=0$. Hence $a_k=b_k=1$, so $x=y=v_k$. Thus each defining vertex is extreme.
If $x\in\Delta$ is not one of the vertices, write
\begin{align*}
x=\sum_{i=0}^n\lambda_i v_i,
\qquad
\lambda_i\ge0,
\qquad
\sum_{i=0}^n\lambda_i=1.
\end{align*}
Since $x$ is not a vertex, at least two coefficients are positive. Choose $k$ with $0<\lambda_k<1$. Then
\begin{align*}
x
&=\lambda_k v_k+(1-\lambda_k)\left(\sum_{i\ne k}\frac{\lambda_i}{1-\lambda_k}v_i\right).
\end{align*}
The point in parentheses lies in $\Delta$, because its coefficients are non-negative and
\begin{align*}
\sum_{i\ne k}\frac{\lambda_i}{1-\lambda_k}
=\frac{1-\lambda_k}{1-\lambda_k}
=1.
\end{align*}
It is not equal to $v_k$, so $x$ is a non-trivial convex combination and is not extreme. Therefore the extreme points of the simplex are precisely its defining vertices.
For the Euclidean ball
\begin{align*}
B(0,1)=\{x\in\mathbb R^n:|x|\le1\},
\end{align*}
every interior point is not extreme. If $|x|<1$, choose $\varepsilon>0$ with $|x|+\varepsilon\le1$, and choose any unit vector $e$. Then
\begin{align*}
|x+\varepsilon e|&\le |x|+\varepsilon\le1,\\
|x-\varepsilon e|&\le |x|+\varepsilon\le1,
\end{align*}
so both points lie in $B(0,1)$, and
\begin{align*}
x=\frac12(x+\varepsilon e)+\frac12(x-\varepsilon e).
\end{align*}
The two points are distinct, so $x$ is not extreme.
Now let $|x|=1$ and suppose
\begin{align*}
x=ty+(1-t)z,
\qquad
0<t<1,
\qquad
y,z\in B(0,1).
\end{align*}
Then
\begin{align*}
1
&=|x|^2\\
&=|ty+(1-t)z|^2\\
&=t^2|y|^2+(1-t)^2|z|^2+2t(1-t)y\cdot z.
\end{align*}
Also
\begin{align*}
0\le |y-z|^2=|y|^2-2y\cdot z+|z|^2,
\end{align*}
so
\begin{align*}
2y\cdot z\le |y|^2+|z|^2.
\end{align*}
Substituting this into the previous expansion gives
\begin{align*}
1
&\le t^2|y|^2+(1-t)^2|z|^2+t(1-t)(|y|^2+|z|^2)\\
&=t|y|^2+(1-t)|z|^2\\
&\le t+(1-t)\\
&=1.
\end{align*}
Thus equality holds throughout, so
\begin{align*}
t(1-|y|^2)+(1-t)(1-|z|^2)=0.
\end{align*}
Since $t>0$, $1-t>0$, and $1-|y|^2,1-|z|^2\ge0$, this forces
\begin{align*}
|y|=|z|=1.
\end{align*}
Equality in
\begin{align*}
2y\cdot z\le |y|^2+|z|^2
\end{align*}
also gives
\begin{align*}
|y-z|^2=|y|^2-2y\cdot z+|z|^2=0,
\end{align*}
so $y=z$. Hence $x=y=z$, and $x$ is extreme. Therefore the extreme points of the Euclidean ball are exactly the boundary points $|x|=1$.
[/example]
Once separation and supporting hyperplanes are available, convex sets can be encoded analytically rather than only geometrically. Chapter 3 turns to support functions and the Hausdorff metric, giving a framework for comparing compact convex bodies and passing to limits in a controlled way.
# 3. Support Functions and the Hausdorff Metric
Building on Chapter 2's supporting hyperplanes and separation theorems, this chapter develops the analytic language used to study compact convex sets in Euclidean space. The course assumes compactness in $\mathbb R^n$ and elementary normed-space ideas such as uniform convergence. The central goal is to replace geometric statements about supporting hyperplanes by function-theoretic statements about support functions, then use that dictionary to understand convergence of convex bodies.
## Measuring a Convex Body in a Direction
What numerical data should a convex body expose if we want to remember all of its supporting hyperplanes? A direction $u \in \mathbb R^n$ determines the linear functional $x \mapsto \langle x,u\rangle$, and maximizing this functional over a compact convex set gives the position of the supporting hyperplane orthogonal to $u$.
[definition: Support Function]
Let $K \subset \mathbb R^n$ be a non-empty compact convex set. The support function of $K$ is the function $h_K: \mathbb R^n \to \mathbb R$ defined by
\begin{align*}
h_K(u) = \sup_{x \in K} \langle x,u\rangle.
\end{align*}
[/definition]
Since $K$ is compact, the supremum is attained for every $u \in \mathbb R^n$. The hyperplane
\begin{align*}
H(K,u)=\{x \in \mathbb R^n : \langle x,u\rangle = h_K(u)\}
\end{align*}
is a supporting hyperplane when $u \ne 0$, and $K$ lies in the closed half-space $\langle x,u\rangle \le h_K(u)$.
[example: Euclidean Ball]
Let $K=\overline{B}(a,r)\subset \mathbb R^n$, with $a\in\mathbb R^n$ and $r\ge 0$. For $u\in\mathbb R^n$, every $x\in K$ can be written as $x=a+y$ with $|y|\le r$, so
\begin{align*}
\langle x,u\rangle
&=\langle a+y,u\rangle\\
&=\langle a,u\rangle+\langle y,u\rangle\\
&\le \langle a,u\rangle+|y|\,|u|\\
&\le \langle a,u\rangle+r|u|.
\end{align*}
Thus $h_K(u)\le \langle a,u\rangle+r|u|$.
If $u\ne 0$, choose $y=r u/|u|$. Then $|y|=r$ and
\begin{align*}
\langle a+y,u\rangle
&=\left\langle a+\frac{r u}{|u|},u\right\rangle\\
&=\langle a,u\rangle+\frac{r}{|u|}\langle u,u\rangle\\
&=\langle a,u\rangle+\frac{r}{|u|}|u|^2\\
&=\langle a,u\rangle+r|u|.
\end{align*}
If $u=0$, then $\langle x,0\rangle=0$ for every $x\in K$, and the same formula gives $\langle a,0\rangle+r|0|=0$. Therefore
\begin{align*}
h_K(u)=\langle a,u\rangle+r|u|.
\end{align*}
For the unit ball $B_2^n=\overline{B}(0,1)$, this becomes $h_{B_2^n}(u)=|u|$, so the support value in direction $u$ is exactly the Euclidean length of $u$.
[/example]
The formula for a ball suggests that support functions should behave predictably when directions are rescaled or added. Before using them as analytic representatives of convex bodies, we need the basic inequalities that every such supremum of linear functions must satisfy.
[quotetheorem:4094]
[citeproof:4094]
A function with these two properties is often called sublinear. Thus every compact convex set gives a finite sublinear function on $\mathbb R^n$. The theorem does not say that every sublinear function is the support function of a compact body: a growth condition is also needed to prevent the recovered set from being unbounded. Compactness is used to keep the support function finite and to ensure that supporting hyperplanes touch the body, while convexity is not needed for the two inequalities themselves. For instance, a non-convex compact set and its convex hull have the same support function, so these functions cannot remember non-convex features.
[example: Box]
Let
\begin{align*}
K=\prod_{i=1}^n [-a_i,a_i], \qquad a_i \ge 0,
\end{align*}
and write $u=(u_1,\dots,u_n)$. For any $x=(x_1,\dots,x_n)\in K$, each coordinate satisfies $-a_i\le x_i\le a_i$, so $|x_i|\le a_i$. Hence
\begin{align*}
\langle x,u\rangle
&=\sum_{i=1}^n x_i u_i\\
&\le \sum_{i=1}^n |x_i u_i|\\
&=\sum_{i=1}^n |x_i|\,|u_i|\\
&\le \sum_{i=1}^n a_i |u_i|.
\end{align*}
Taking the supremum over $x\in K$ gives
\begin{align*}
h_K(u)\le \sum_{i=1}^n a_i |u_i|.
\end{align*}
To attain this bound, define $x\in K$ coordinatewise by
\begin{align*}
x_i=
\begin{cases}
a_i, & u_i\ge 0,\\
-a_i, & u_i<0.
\end{cases}
\end{align*}
Then $x_i\in[-a_i,a_i]$ for every $i$, and
\begin{align*}
x_i u_i
&=
\begin{cases}
a_i u_i, & u_i\ge 0,\\
-a_i u_i, & u_i<0
\end{cases}\\
&=a_i|u_i|.
\end{align*}
Therefore
\begin{align*}
\langle x,u\rangle
&=\sum_{i=1}^n x_i u_i\\
&=\sum_{i=1}^n a_i|u_i|.
\end{align*}
Combining the upper bound with this attaining point gives
\begin{align*}
h_K(u)=\sum_{i=1}^n a_i |u_i|.
\end{align*}
Thus each coordinate contributes independently to the support value, and the flat facets of the box appear analytically as absolute values in the support function.
[/example]
The box example also previews the role of dual norms: the support function of a centrally symmetric body is a norm precisely when the body is the unit ball of the dual norm. A segment is the simplest degenerate version of this phenomenon, because it measures only the component of $u$ in one direction.
[example: Line Segment]
Let
\begin{align*}
K=[a,b]=\{(1-t)a+tb:0\le t\le 1\}
\end{align*}
for $a,b\in\mathbb R^n$. Fix $u\in\mathbb R^n$. For a point $x=(1-t)a+tb\in K$,
\begin{align*}
\langle x,u\rangle
&=\langle (1-t)a+tb,u\rangle\\
&=(1-t)\langle a,u\rangle+t\langle b,u\rangle\\
&=\langle a,u\rangle+t(\langle b,u\rangle-\langle a,u\rangle).
\end{align*}
If $\langle b,u\rangle-\langle a,u\rangle\ge 0$, then this expression is largest on $0\le t\le 1$ when $t=1$, and the maximum value is $\langle b,u\rangle$. If $\langle b,u\rangle-\langle a,u\rangle<0$, then it is largest when $t=0$, and the maximum value is $\langle a,u\rangle$. Therefore
\begin{align*}
h_K(u)=\max\{\langle a,u\rangle,\langle b,u\rangle\}.
\end{align*}
For the symmetric segment $K=[-v,v]$, substitute $a=-v$ and $b=v$:
\begin{align*}
h_K(u)
&=\max\{\langle -v,u\rangle,\langle v,u\rangle\}\\
&=\max\{-\langle v,u\rangle,\langle v,u\rangle\}\\
&=|\langle v,u\rangle|.
\end{align*}
Thus a symmetric segment records only the component of $u$ in the direction of $v$, measured in absolute value.
[/example]
## Characterizing Which Functions Are Support Functions
Which functions arise from convex bodies? Positive homogeneity and subadditivity are necessary, but compactness of the body also forces continuity and a linear growth bound. Conversely, a finite sublinear function determines a closed convex set by half-spaces, and boundedness is the condition that this set is compact.
[definition: Sublinear Function]
A function $p:\mathbb R^n \to \mathbb R$ is sublinear if for all $u,v \in \mathbb R^n$ and all $\lambda \ge 0$,
\begin{align*}
p(\lambda u) &= \lambda p(u),\\
p(u+v) &\le p(u)+p(v).
\end{align*}
[/definition]
Sublinearity implies convexity of $p$ as a function on $\mathbb R^n$, because
\begin{align*}
p((1-t)u+tv)\le (1-t)p(u)+tp(v)
\end{align*}
for $0\le t\le 1$. In finite dimensions, a finite sublinear function is continuous, so its values on the unit sphere control its values everywhere.
The support function construction raises a converse problem: which sublinear functions actually arise from compact convex sets? The obstruction is growth. A sublinear function can encode supporting half-spaces consistently while still allowing an unbounded intersection unless its size is controlled linearly.
[quotetheorem:4095]
[citeproof:4095]
This theorem is the analytic form of the supporting-half-space philosophy. A convex body is the simultaneous solution to all inequalities imposed by its support function. The growth bound is the compactness condition: without it, a sublinear function may define an unbounded closed convex set rather than a convex body. The sublinearity condition is the compatibility condition among all supporting half-spaces; arbitrary continuous data on the sphere need not extend to a consistent family of half-spaces. This is the first point where convex analysis enters the course in a structural way, because support functions are exactly finite sublinear functions with the correct boundedness.
[example: Simplex]
Let $K=\operatorname{conv}\{v_0,\dots,v_m\}\subset \mathbb R^n$, and fix $u\in\mathbb R^n$. Every point $x\in K$ can be written as
\begin{align*}
x=\sum_{j=0}^m \lambda_j v_j,
\qquad
\lambda_j\ge 0,
\qquad
\sum_{j=0}^m \lambda_j=1.
\end{align*}
By linearity of the inner product in the first variable,
\begin{align*}
\langle x,u\rangle
&=\left\langle \sum_{j=0}^m \lambda_j v_j,u\right\rangle\\
&=\sum_{j=0}^m \lambda_j \langle v_j,u\rangle.
\end{align*}
If
\begin{align*}
M=\max_{0\le j\le m}\langle v_j,u\rangle,
\end{align*}
then $\langle v_j,u\rangle\le M$ for every $j$, so
\begin{align*}
\langle x,u\rangle
&=\sum_{j=0}^m \lambda_j \langle v_j,u\rangle\\
&\le \sum_{j=0}^m \lambda_j M\\
&=M\sum_{j=0}^m \lambda_j\\
&=M.
\end{align*}
Taking the supremum over $x\in K$ gives $h_K(u)\le M$.
Conversely, choose an index $j_0$ such that
\begin{align*}
\langle v_{j_0},u\rangle=M.
\end{align*}
Since $v_{j_0}\in K$, the definition of the support function gives
\begin{align*}
h_K(u)
&=\sup_{x\in K}\langle x,u\rangle\\
&\ge \langle v_{j_0},u\rangle\\
&=M.
\end{align*}
Therefore
\begin{align*}
h_K(u)=\max_{0\le j\le m}\langle v_j,u\rangle.
\end{align*}
Thus the support function of a simplex is the maximum of finitely many linear functions, and the directions for which a fixed vertex attains this maximum form the corresponding cone in the normal fan.
[/example]
The preceding example is the polyhedral model: finitely many exposed points produce a piecewise-linear support function. Smooth strictly convex bodies behave differently, and the ellipsoid gives the standard quadratic example.
[example: Ellipsoid]
Let $A\in \mathbb R^{n\times n}$ be symmetric positive definite and set
\begin{align*}
K=\{x\in \mathbb R^n: x^\top A^{-1}x\le 1\}.
\end{align*}
For $x,y\in\mathbb R^n$, define
\begin{align*}
(x,y)_A=x^\top A^{-1}y.
\end{align*}
Since $A^{-1}$ is symmetric positive definite, this is an inner product. For any $x\in K$ and $u\in\mathbb R^n$,
\begin{align*}
\langle x,u\rangle
&=x^\top u\\
&=x^\top A^{-1}Au\\
&=(x,Au)_A\\
&\le |(x,Au)_A|\\
&\le (x,x)_A^{1/2}(Au,Au)_A^{1/2}\\
&=(x^\top A^{-1}x)^{1/2}\bigl((Au)^\top A^{-1}(Au)\bigr)^{1/2}\\
&\le \bigl(u^\top A^\top A^{-1}Au\bigr)^{1/2}\\
&=\bigl(u^\top A u\bigr)^{1/2},
\end{align*}
where the fourth inequality is the *Cauchy-Schwarz inequality* for the inner product $(\cdot,\cdot)_A$, and the last equality uses $A^\top=A$ and $A^{-1}A=I$. Hence
\begin{align*}
h_K(u)\le (u^\top A u)^{1/2}.
\end{align*}
If $u=0$, then $\langle x,0\rangle=0$ for every $x\in K$, and the formula gives $(0^\top A0)^{1/2}=0$. Now suppose $u\ne 0$. Since $A$ is positive definite, $u^\top A u>0$, so
\begin{align*}
x_u=\frac{Au}{(u^\top A u)^{1/2}}
\end{align*}
is well-defined. This point lies in $K$ because
\begin{align*}
x_u^\top A^{-1}x_u
&=\left(\frac{Au}{(u^\top A u)^{1/2}}\right)^\top
A^{-1}
\left(\frac{Au}{(u^\top A u)^{1/2}}\right)\\
&=\frac{(Au)^\top A^{-1}(Au)}{u^\top A u}\\
&=\frac{u^\top A^\top A^{-1}A u}{u^\top A u}\\
&=\frac{u^\top A u}{u^\top A u}\\
&=1.
\end{align*}
At this point,
\begin{align*}
\langle x_u,u\rangle
&=\left\langle \frac{Au}{(u^\top A u)^{1/2}},u\right\rangle\\
&=\frac{(Au)^\top u}{(u^\top A u)^{1/2}}\\
&=\frac{u^\top A^\top u}{(u^\top A u)^{1/2}}\\
&=\frac{u^\top A u}{(u^\top A u)^{1/2}}\\
&=(u^\top A u)^{1/2}.
\end{align*}
Combining the upper bound with this attaining point gives
\begin{align*}
h_K(u)=(u^\top A u)^{1/2}.
\end{align*}
Thus the ellipsoid has a quadratic support function, and the supporting point in a non-zero direction $u$ is obtained by applying $A$ to $u$ and normalizing to the boundary of $K$.
[/example]
## Recovering the Body from Supporting Half-Spaces
If the support function gives all supporting half-spaces, why does their intersection give back exactly the original body? The answer is separation: any point outside a compact convex set can be strictly separated from it, and the separating hyperplane supplies a direction in which the support inequality fails.
[quotetheorem:4096]
[citeproof:4096]
The theorem says that no supporting half-space can be omitted from the full directional family without potentially enlarging the recovered body. Compactness is used through strong separation; for non-closed or non-convex sets, the same intersection recovers a closed convex hull rather than the original set. The restriction to $S^{n-1}$ is only a normalization, because positive homogeneity stores all radial information. The recovery formula also turns set inclusion into an inequality between functions. This is often the simplest way to compare convex bodies.
[quotetheorem:4097]
[citeproof:4097]
This result is useful because it replaces a set-theoretic containment problem by a scalar inequality on the sphere. The compact convex hypotheses are still essential: for arbitrary compact sets, the support function compares convex hulls, not the sets themselves. The theorem also explains why many operations on convex bodies are easier to study after translating them into operations on support functions.
[remark: Translations and Linear Images]
For $a\in \mathbb R^n$, the translate $K+a$ satisfies
\begin{align*}
h_{K+a}(u)=h_K(u)+\langle a,u\rangle.
\end{align*}
If $T:\mathbb R^n\to \mathbb R^m$ is linear, then
\begin{align*}
h_{T K}(u)=h_K(T^\top u), \qquad u\in \mathbb R^m.
\end{align*}
These formulas are useful because they reduce many computations to standard-position examples.
[/remark]
## The Hausdorff Metric
How should we measure that two convex bodies are close? Pointwise comparison of sets is too rigid, while volume can miss thin protrusions. The Hausdorff distance measures the largest distance needed to move from either set to the other, and for convex bodies it has an exact expression through support functions.
[definition: Parallel Body]
Let $K\subset \mathbb R^n$ be non-empty and compact, and let $\varepsilon\ge 0$. The outer parallel body of $K$ at radius $\varepsilon$ is
\begin{align*}
K_\varepsilon=K+\overline{B}(0,\varepsilon)=\{x+y:x\in K, |y|\le \varepsilon\}.
\end{align*}
[/definition]
The set $K_\varepsilon$ consists of all points at Euclidean distance at most $\varepsilon$ from $K$. For convex $K$, it is again compact and convex.
Parallel bodies give a way to test whether one set lies within a uniform error of another. To make this into a symmetric notion of closeness, we require each set to fit inside a parallel enlargement of the other and then minimize the required radius.
[definition: Hausdorff Distance]
Let $\mathcal C(\mathbb R^n)$ denote the set of non-empty compact subsets of $\mathbb R^n$. The Hausdorff distance is the map
\begin{align*}
d_H:\mathcal C(\mathbb R^n)\times \mathcal C(\mathbb R^n)&\to \mathbb R_{\ge 0},\\
(K,L)&\mapsto \inf\{\varepsilon\ge 0: K\subset L_\varepsilon \text{ and } L\subset K_\varepsilon\}.
\end{align*}
[/definition]
This definition is symmetric by construction and vanishes exactly when the two compact sets are equal. In applications, the inclusion form is often more useful than the equivalent pointwise distance form.
[example: Two Balls]
Let $K=\overline{B}(a,r)$ and $L=\overline{B}(b,s)$ with $r,s\ge 0$. By the support-function formula for Euclidean balls,
\begin{align*}
h_K(u)&=\langle a,u\rangle+r|u|,\\
h_L(u)&=\langle b,u\rangle+s|u|.
\end{align*}
For $u\in S^{n-1}$, we have $|u|=1$, so
\begin{align*}
h_K(u)-h_L(u)
&=\langle a,u\rangle+r-\langle b,u\rangle-s\\
&=\langle a-b,u\rangle+(r-s).
\end{align*}
Set $c=a-b$ and $\alpha=r-s$. For every $u\in S^{n-1}$,
\begin{align*}
|\langle c,u\rangle+\alpha|
&\le |\langle c,u\rangle|+|\alpha|\\
&\le |c|\,|u|+|\alpha|\\
&=|c|+|\alpha|,
\end{align*}
where the second inequality is Cauchy-Schwarz.
It remains to see that this upper bound is attained. If $c=0$, then
\begin{align*}
|\langle c,u\rangle+\alpha|
&=|\alpha|
=|c|+|\alpha|
\end{align*}
for every $u\in S^{n-1}$. If $c\ne 0$ and $\alpha\ge 0$, choose $u=c/|c|$. Then
\begin{align*}
|\langle c,u\rangle+\alpha|
&=\left|\left\langle c,\frac{c}{|c|}\right\rangle+\alpha\right|\\
&=\left|\frac{\langle c,c\rangle}{|c|}+\alpha\right|\\
&=\left|\frac{|c|^2}{|c|}+\alpha\right|\\
&=|c|+\alpha\\
&=|c|+|\alpha|.
\end{align*}
If $c\ne 0$ and $\alpha<0$, choose $u=-c/|c|$. Then
\begin{align*}
|\langle c,u\rangle+\alpha|
&=\left|\left\langle c,-\frac{c}{|c|}\right\rangle+\alpha\right|\\
&=\left|-\frac{\langle c,c\rangle}{|c|}+\alpha\right|\\
&=\left|-|c|+\alpha\right|\\
&=|c|-\alpha\\
&=|c|+|\alpha|.
\end{align*}
Therefore
\begin{align*}
\sup_{u\in S^{n-1}}|h_K(u)-h_L(u)|
&=\sup_{u\in S^{n-1}}|\langle a-b,u\rangle+(r-s)|\\
&=|a-b|+|r-s|.
\end{align*}
By *Hausdorff Distance via Support Functions*,
\begin{align*}
d_H(K,L)=|a-b|+|r-s|.
\end{align*}
Thus the Hausdorff distance between two Euclidean balls splits into the distance between their centres and the absolute change in their radii.
[/example]
## Uniform Convergence of Support Functions
Why does the Hausdorff metric fit support functions so well? Enlarging a convex body by an $\varepsilon$-ball adds exactly $\varepsilon$ to its support function on the unit sphere. This turns geometric containment up to error into a uniform inequality.
[quotetheorem:4098]
[citeproof:4098]
Minkowski addition is therefore linearized by support functions. The compactness assumption ensures that the separate suprema are attained, though the equality can also be proved by approximation without naming maximizers. The special case of adding a ball is the bridge to the Hausdorff metric, because it converts geometric thickening into adding the constant function $\varepsilon$ on $S^{n-1}$.
This bridge suggests a precise metric question. If two convex bodies have nearly the same support function in every unit direction, then every supporting half-space for one should be close to a supporting half-space for the other. Conversely, a small Hausdorff error should perturb each directional support value by at most that error. The next result states that these two measurements are exactly the same.
[quotetheorem:4099]
[citeproof:4099]
This result says that the support-function transform preserves distances, provided the space of functions is given the uniform norm on $S^{n-1}$. Convexity is the reason the formula is exact: for a non-convex compact set, the support function only records the convex hull, so the right-hand side could be zero while the Hausdorff distance between the original sets is positive. Compactness keeps the Hausdorff distance finite and ensures the support functions are continuous on the sphere. The theorem is also the first approximation principle of the chapter: convergence of convex bodies can now be checked by uniform convergence of ordinary functions.
[quotetheorem:4100]
[citeproof:4100]
The image is not all of $C(S^{n-1})$: the homogeneous extension must satisfy the global subadditivity inequalities, which are invisible if one checks only pointwise continuity on the sphere. This restriction is what distinguishes convex-body convergence from arbitrary uniform approximation of functions. In Banach-space language, centrally symmetric convex bodies correspond to norms through their polar bodies, and this theorem is the metric version of that duality.
The embedding is a useful change of language. Convergence of convex bodies in the Hausdorff metric is precisely uniform convergence of the corresponding support functions on the sphere.
[example: Converging Polytopes to a Ball]
Let $m\ge 3$, and write the vertices of the regular inscribed $m$-gon as
\begin{align*}
v_j=(\cos(2\pi j/m),\sin(2\pi j/m)),
\qquad j=0,\dots,m-1.
\end{align*}
Thus $K_m=\operatorname{conv}\{v_0,\dots,v_{m-1}\}$. If $u\in S^1$, write $u=(\cos\theta,\sin\theta)$. For any $x\in K_m$, there are coefficients $\lambda_j\ge 0$ with $\sum_{j=0}^{m-1}\lambda_j=1$ and
\begin{align*}
x=\sum_{j=0}^{m-1}\lambda_j v_j.
\end{align*}
Hence
\begin{align*}
\langle x,u\rangle
&=\left\langle \sum_{j=0}^{m-1}\lambda_j v_j,u\right\rangle\\
&=\sum_{j=0}^{m-1}\lambda_j\langle v_j,u\rangle\\
&\le \sum_{j=0}^{m-1}\lambda_j\max_{0\le k\le m-1}\langle v_k,u\rangle\\
&=\max_{0\le k\le m-1}\langle v_k,u\rangle.
\end{align*}
Since each vertex $v_k$ lies in $K_m$, the reverse inequality is attained by choosing a vertex at which the maximum occurs. Therefore
\begin{align*}
h_{K_m}(u)=\max_{0\le j\le m-1}\langle v_j,u\rangle.
\end{align*}
Now
\begin{align*}
\langle v_j,u\rangle
&=\cos(2\pi j/m)\cos\theta+\sin(2\pi j/m)\sin\theta\\
&=\cos(\theta-2\pi j/m).
\end{align*}
Choose $j=j(\theta)$ so that the circular distance
\begin{align*}
\delta_m(\theta)=\left|\theta-\frac{2\pi j}{m}\right|
\end{align*}
is minimal, taking angles modulo $2\pi$. The $m$ vertex angles divide the circle into arcs of length $2\pi/m$, so
\begin{align*}
0\le \delta_m(\theta)\le \frac{\pi}{m}.
\end{align*}
Since $m\ge 3$, we have $\pi/m\le \pi/3$, and $\cos t$ is decreasing on $[0,\pi/3]$. Thus the nearest vertex gives the largest inner product, and
\begin{align*}
h_{K_m}(u)=\cos \delta_m(\theta).
\end{align*}
For the unit disk $B_2^2$, the Euclidean ball computation gives $h_{B_2^2}(u)=|u|=1$ on $S^1$. Therefore
\begin{align*}
0\le h_{B_2^2}(u)-h_{K_m}(u)
&=1-\cos\delta_m(\theta)\\
&=2\sin^2\left(\frac{\delta_m(\theta)}{2}\right)\\
&\le 2\left(\frac{\delta_m(\theta)}{2}\right)^2\\
&=\frac{\delta_m(\theta)^2}{2}\\
&\le \frac{\pi^2}{2m^2}.
\end{align*}
Taking the supremum over $u\in S^1$ gives
\begin{align*}
\sup_{u\in S^1}|h_{K_m}(u)-h_{B_2^2}(u)|
\le \frac{\pi^2}{2m^2}\to 0.
\end{align*}
By *Hausdorff Distance via Support Functions*, this means $d_H(K_m,B_2^2)\to 0$. The convergence need not be monotone in $m$, because the usual sequence of regular inscribed polygons is not nested.
[/example]
This example illustrates why support functions are especially effective for approximation by polytopes. A polytope gives a maximum of finitely many linear functions, so polytope approximation becomes uniform approximation of a support function by finite maxima. The final remark records the boundary of this method.
[remark: Why Convexity Matters]
The Hausdorff distance is defined for all non-empty compact sets, but support functions only see convex hulls: $h_K=h_{\operatorname{conv} K}$. Thus the isometric description by support functions belongs to convex geometry, not arbitrary compact-set geometry. This is why the recovery theorem reconstructs the compact convex body rather than the original non-convex set.
[/remark]
Support functions already hint that convex bodies behave like linear objects under addition and scaling. Chapter 4 makes that idea explicit through Minkowski addition, developing the mixed linear structure that lets convex geometry interact with algebra in a systematic way.
# 4. Minkowski Addition and Mixed Linear Structure
Minkowski addition is the operation that treats convex bodies as objects which can be added, scaled, and offset. The guiding question is how much of ordinary linear algebra survives when vectors are replaced by compact convex sets. Building on Chapter 3's support-function dictionary, this chapter explains why support functions are the right coordinates for this operation. It also shows why geometric constructions such as thickening a body by a ball have polynomial-looking behaviour.
## Adding Convex Bodies
What should it mean to add two shapes? For points the answer is vector addition; for sets the natural construction is to add all possible pairs of points. The first issue is whether this operation stays inside the class of convex bodies.
[definition: Minkowski Sum]
The Minkowski addition operation on subsets of $\mathbb R^n$ is the map
\begin{align*}
+ : \mathcal P(\mathbb R^n) \times \mathcal P(\mathbb R^n) &\to \mathcal P(\mathbb R^n), \\
(A,B) &\mapsto A+B,
\end{align*}
where
\begin{align*}
A+B := \{a+b : a \in A,\ b \in B\}.
\end{align*}
[/definition]
The definition applies to arbitrary sets, but it is most useful for convex bodies because convexity and compactness are stable under this operation. Without a stability theorem, adding two well-behaved shapes could leave the class in which the course is working.
Minkowski addition is only one half of the linear-looking notation used for convex bodies. Mixed expressions such as $sK+tL$ also require a set-level version of scalar dilation, so the scalar operation must be fixed before any algebraic rules can be stated.
[definition: Scalar Multiple Of A Set]
For each $\lambda \in \mathbb R$, scalar multiplication by $\lambda$ on subsets of $\mathbb R^n$ is the map
\begin{align*}
S_\lambda : \mathcal P(\mathbb R^n) &\to \mathcal P(\mathbb R^n), \\
A &\mapsto \lambda A,
\end{align*}
where
\begin{align*}
\lambda A := \{\lambda a : a \in A\}.
\end{align*}
[/definition]
For $\lambda \ge 0$ and convex $A$, scalar multiplication behaves like dilation about the origin. If $x_0 \in \mathbb R^n$, then $A+x_0$ is shorthand for $A+\{x_0\}$ and represents translation.
Before these operations can be used as basic algebra on convex bodies, we must know that they do not leave the category.
The obstruction is that set operations can easily destroy compactness or convexity outside the convex-body setting. The stability result needed here is simultaneous: after adding bodies or dilating by a non-negative scalar, non-emptiness, compactness, and convexity should all remain intact.
[quotetheorem:4101]
[citeproof:4101]
This theorem lets us regard convex bodies as forming a cone-like structure under addition and non-negative scalar multiplication. Each hypothesis is doing work: non-emptiness prevents the empty set from entering, compactness prevents unbounded sums such as a half-line plus a segment, and convexity ensures that sums of intermediate points remain in the sum. For arbitrary closed convex sets, compactness can fail; for arbitrary compact sets, convexity can fail, for instance the sum of two disconnected finite sets need not be convex. The first examples therefore focus on bodies where the operation behaves like a geometric version of vector addition.
[example: Sum Of Two Disks]
In $\mathbb R^2$, let $K=\overline{B}(a,r)$ and $L=\overline{B}(b,s)$ with $r,s \ge 0$. We show that
\begin{align*}
K+L=\overline{B}(a+b,r+s).
\end{align*}
First take $z\in K+L$. Then $z=x+y$ for some $x\in K$ and $y\in L$, so $|x-a|\le r$ and $|y-b|\le s$. Hence, by the triangle inequality,
\begin{align*}
|z-(a+b)|
&=|(x+y)-(a+b)|\\
&=|(x-a)+(y-b)|\\
&\le |x-a|+|y-b|\\
&\le r+s.
\end{align*}
Thus $z\in \overline{B}(a+b,r+s)$, so $K+L\subseteq \overline{B}(a+b,r+s)$.
Conversely, take $z\in \overline{B}(a+b,r+s)$ and write
\begin{align*}
w=z-(a+b).
\end{align*}
Then $|w|\le r+s$. If $r+s=0$, then $r=s=0$, so $|w|\le 0$ gives $w=0$ and hence
\begin{align*}
z=a+b\in \{a\}+\{b\}=K+L.
\end{align*}
If $r+s>0$, define
\begin{align*}
x=a+\frac{r}{r+s}w,\qquad y=b+\frac{s}{r+s}w.
\end{align*}
Then
\begin{align*}
x+y
&=a+b+\frac{r}{r+s}w+\frac{s}{r+s}w\\
&=a+b+\frac{r+s}{r+s}w\\
&=a+b+w\\
&=z.
\end{align*}
Also,
\begin{align*}
|x-a|
&=\left|\frac{r}{r+s}w\right|\\
&=\frac{r}{r+s}|w|\\
&\le \frac{r}{r+s}(r+s)\\
&=r,
\end{align*}
and similarly
\begin{align*}
|y-b|
&=\left|\frac{s}{r+s}w\right|\\
&=\frac{s}{r+s}|w|\\
&\le \frac{s}{r+s}(r+s)\\
&=s.
\end{align*}
Therefore $x\in K$, $y\in L$, and $z=x+y\in K+L$. This proves the reverse inclusion, so the Minkowski sum of the two closed disks is exactly the closed disk centered at $a+b$ with radius $r+s$.
[/example]
The disk example suggests the correct intuition: Minkowski addition combines directions and adds the available extent in each direction. The obstruction is that a set is not described just by its radius in one direction unless some convexity is present. For general bodies, cancellation is delicate because adding another set may fill holes or blur separated pieces.
[remark: Cancellation Phenomena]
If $K+M=L+M$ for convex bodies $K,L,M \subset \mathbb R^n$, then support functions will imply $K=L$. For arbitrary non-convex compact sets, equality after adding the same set can lose information because the operation fills gaps and convexifies directional data after taking convex hulls.
[/remark]
The safe way to recover linear behaviour is to encode a convex body by directional data rather than by its individual points. Segment sums are the simplest non-round examples where this encoding can still be read directly from generators.
[example: Segment Sums And Zonotopes]
Let $v_1,\dots,v_m \in \mathbb R^n$ and set
\begin{align*}
S_i=[-v_i,v_i]=\{t_i v_i:-1\le t_i\le 1\}.
\end{align*}
We compute the Minkowski sum
\begin{align*}
Z=S_1+\cdots+S_m.
\end{align*}
If $z\in Z$, then by the definition of Minkowski sum there are points $x_i\in S_i$ such that
\begin{align*}
z=x_1+\cdots+x_m.
\end{align*}
Since $x_i\in S_i$, there is a scalar $t_i\in[-1,1]$ with $x_i=t_i v_i$. Substituting these expressions gives
\begin{align*}
z
&=x_1+\cdots+x_m\\
&=t_1v_1+\cdots+t_mv_m\\
&=\sum_{i=1}^m t_i v_i,
\end{align*}
with $-1\le t_i\le 1$ for every $i$.
Conversely, if $t_1,\dots,t_m\in[-1,1]$, then each $t_i v_i\in S_i$, so the definition of Minkowski sum gives
\begin{align*}
\sum_{i=1}^m t_i v_i \in S_1+\cdots+S_m=Z.
\end{align*}
Therefore
\begin{align*}
Z=\left\{\sum_{i=1}^m t_i v_i:-1\le t_i\le 1\right\}.
\end{align*}
Equivalently, if $T:\mathbb R^m\to\mathbb R^n$ is the linear map
\begin{align*}
T(t_1,\dots,t_m)=\sum_{i=1}^m t_i v_i,
\end{align*}
then
\begin{align*}
Z=T([-1,1]^m).
\end{align*}
Since $[-1,1]^m$ is a convex polytope and $T$ is linear, $Z$ is a convex polytope. It is centrally symmetric because if
\begin{align*}
z=\sum_{i=1}^m t_i v_i\in Z,
\end{align*}
then $-1\le -t_i\le 1$ for every $i$, and hence
\begin{align*}
-z
&=-\sum_{i=1}^m t_i v_i\\
&=\sum_{i=1}^m (-t_i)v_i\\
&\in Z.
\end{align*}
Thus $Z$ is the centrally symmetric convex polytope generated by adding the segments in the directions $v_1,\dots,v_m$; such a polytope is called a zonotope.
[/example]
This example is the bridge between polytope geometry and linear algebra: a zonotope records a finite list of directions and lengths, then sums their contributions. It also points to the limitation of pointwise descriptions of $K+L$: different pairs of points may produce the same boundary point, so the boundary of the sum is not obtained by simply adding boundaries. A directional measurement avoids this ambiguity.
## Support Functions As Linear Coordinates
How can a nonlinear-looking operation on sets become linear? The answer is to measure a convex body by its supporting value in each direction. Addition of points then turns into addition of suprema, and the supremum is exactly the operation that remembers the farthest supporting hyperplane in that direction.
Recall from Chapter 3 that for a non-empty compact convex set $K$, the support function is
\begin{align*}
h_K(u):=\sup_{x\in K} u\cdot x.
\end{align*}
Because $K$ is compact, the supremum is attained for every $u$. The value $h_K(u)$ is the supporting value in direction $u$.
[quotetheorem:4102]
[citeproof:4102]
This is the main reason support functions are central in convex geometry: they turn Minkowski addition into addition of functions. Non-empty compactness matters because the support value should be finite and attained; for unbounded sets it may be $+\infty$ in some directions. The restriction $\lambda \ge 0$ matters because negative scaling reverses directions, giving $h_{\lambda K}(u)=|\lambda|h_K(-u)$ when $\lambda<0$. Support functions also identify the closed convex hull of a compact set rather than the original non-convex set, so this coordinate system is faithful precisely in the convex setting.
[example: Square Plus Disk]
Let $Q=[-1,1]^2$ and let $B=\overline{B}(0,r)\subset \mathbb R^2$ with $r\ge 0$. For $u=(u_1,u_2)$, we first compute the two support functions from the definition. A point $x\in Q$ has the form $x=(x_1,x_2)$ with $-1\le x_1,x_2\le 1$, so
\begin{align*}
u\cdot x
&=u_1x_1+u_2x_2\\
&\le |u_1||x_1|+|u_2||x_2|\\
&\le |u_1|+|u_2|.
\end{align*}
This upper bound is attained by taking $x_i=1$ if $u_i\ge 0$ and $x_i=-1$ if $u_i<0$. Hence
\begin{align*}
h_Q(u)=|u_1|+|u_2|.
\end{align*}
For the disk, if $x\in B$, then $|x|\le r$, and the Cauchy-Schwarz inequality gives
\begin{align*}
u\cdot x\le |u|\,|x|\le r|u|.
\end{align*}
If $u\ne 0$, the point $x=r u/|u|$ belongs to $B$ and satisfies
\begin{align*}
u\cdot x
&=u\cdot \frac{r u}{|u|}\\
&=\frac{r}{|u|}(u\cdot u)\\
&=\frac{r}{|u|}|u|^2\\
&=r|u|.
\end{align*}
If $u=0$, both sides are $0$. Therefore
\begin{align*}
h_B(u)=r|u|.
\end{align*}
By *Support Functions Linearize Minkowski Addition*,
\begin{align*}
h_{Q+B}(u)
&=h_Q(u)+h_B(u)\\
&=|u_1|+|u_2|+r|u|.
\end{align*}
Thus adding the disk increases every supporting value by the radial amount $r|u|$; geometrically, $Q+B$ is the square with each side pushed outward by distance $r$ and each corner filled by a quarter-circle of radius $r$.
[/example]
Support functions also explain cancellation. Since a convex body is the intersection of all closed half-spaces determined by its support function, equality of support functions forces equality of bodies.
The remaining question is whether equality after adding the same body can be detected at the level of support functions. Because addition becomes ordinary addition of support functions, cancellation should follow once the support function is known to determine the original convex body.
[quotetheorem:4103]
[citeproof:4103]
The result shows the precise role of convexity: the support function remembers the closed convex hull of a compact set, and for convex bodies that is the original set. This is why cancellation is not a purely set-theoretic fact about arbitrary compact sets. The next question is whether the same coordinates also control limiting processes, since convex bodies are often handled by approximating them with simpler polytopes.
## Continuity In Hausdorff Distance
If two shapes are close, are their Minkowski sums close? This is important because convex bodies are often approximated by polytopes, and operations should respect such approximations.
[definition: Hausdorff Distance]
The Hausdorff distance on non-empty compact subsets of $\mathbb R^n$ is the map
\begin{align*}
d_H : \mathcal K(\mathbb R^n) \times \mathcal K(\mathbb R^n) &\to [0,\infty), \\
(A,B) &\mapsto d_H(A,B),
\end{align*}
where $\mathcal K(\mathbb R^n)$ denotes the set of non-empty compact subsets of $\mathbb R^n$ and
\begin{align*}
d_H(A,B):=\max\left\{\sup_{a\in A}\operatorname{dist}(a,B),\ \sup_{b\in B}\operatorname{dist}(b,A)\right\}.
\end{align*}
[/definition]
For general compact sets, the Hausdorff distance tracks the actual set rather than only its convex hull. Support functions cannot do that for non-convex sets, since a set and its convex hull have the same support function. For convex bodies, this obstruction disappears, and the Hausdorff distance can be read directly from support functions.
By the Hausdorff distance formula for support functions proved in Chapter 3, geometric convergence of convex bodies is the same as uniform convergence of their support functions on the unit sphere. Convexity is essential here: replacing a compact set by its support function would otherwise erase holes and disconnected components before any distance comparison is made. The payoff is that continuity of Minkowski addition becomes an inequality for functions, which is exactly the setting where linear estimates are most transparent.
The natural stability question is whether small errors in two summands can accumulate unpredictably after Minkowski addition. Support functions show that no hidden geometric instability occurs: addition of bodies becomes ordinary addition of functions. Therefore the Hausdorff error of the sum should be controlled by the sum of the two input errors.
[quotetheorem:4105]
[citeproof:4105]
This continuity result justifies computing sums by approximation, for example replacing smooth bodies by inscribed or circumscribed polytopes. The estimate is also sharp in spirit: each input error can propagate directly into the sum, so no cancellation should be expected at the level of Hausdorff distance. We now apply the same operation to the most important fixed summand, the Euclidean unit ball, which turns Minkowski addition into geometric thickening.
## Parallel Bodies And Outer Offsets
What happens when a convex body is thickened by a fixed radius in every direction? Minkowski addition with a Euclidean ball formalises this operation and creates the parallel body.
[definition: Parallel Body]
For each $t\ge 0$, the outer parallel body operation at distance $t$ is the map
\begin{align*}
P_t : \mathcal C^n &\to \mathcal C^n, \\
K &\mapsto P_t(K),
\end{align*}
where $\mathcal C^n$ denotes the set of convex bodies in $\mathbb R^n$ and
\begin{align*}
P_t(K):=K+tB(0,1).
\end{align*}
[/definition]
The support function records the offset by adding the same radial term in every direction:
\begin{align*}
h_{K+tB(0,1)}(u)=h_K(u)+t|u|.
\end{align*}
For $|u|=1$, every supporting hyperplane is shifted outward by distance $t$.
[illustration:convex-polygon-parallel-body]
[example: Polygon Offset By A Ball]
Let $P\subset \mathbb R^2$ be a convex polygon, let $B(0,1)$ be the Euclidean unit disk, and let $t>0$. We first identify
\begin{align*}
P+tB(0,1)=\{z\in \mathbb R^2:\operatorname{dist}(z,P)\le t\}.
\end{align*}
Indeed, if $z\in P+tB(0,1)$, then $z=p+tu$ for some $p\in P$ and $|u|\le 1$, so
\begin{align*}
|z-p|=|tu|=t|u|\le t,
\end{align*}
and hence $\operatorname{dist}(z,P)\le t$. Conversely, if $\operatorname{dist}(z,P)\le t$, then there is $p\in P$ with $|z-p|\le t$ because $P$ is compact. If $t>0$, set $u=(z-p)/t$. Then $|u|\le 1$ and
\begin{align*}
z=p+t u\in P+tB(0,1).
\end{align*}
Now write the side lengths of $P$ as $\ell_1,\dots,\ell_m$, and let $\alpha_1,\dots,\alpha_m$ be the exterior angles at the vertices, measured in radians. Convexity implies that the outward normal direction turns once around the circle while traversing the boundary, so
\begin{align*}
\alpha_1+\cdots+\alpha_m=2\pi.
\end{align*}
The part of $P+tB(0,1)$ adjacent to the $i$th edge and outside $P$ is a rectangle of length $\ell_i$ and width $t$, hence has area
\begin{align*}
\ell_i t.
\end{align*}
Summing over the edges gives the total strip area
\begin{align*}
\sum_{i=1}^m \ell_i t
&=t\sum_{i=1}^m \ell_i\\
&=t\,\operatorname{Per}(P).
\end{align*}
At the $i$th vertex, the remaining outside piece is a circular sector of radius $t$ and angle $\alpha_i$. Its area is the fraction $\alpha_i/(2\pi)$ of the area of the full disk of radius $t$, so
\begin{align*}
\mathcal L^2(\text{sector}_i)
&=\frac{\alpha_i}{2\pi}\cdot \pi t^2\\
&=\frac{1}{2}\alpha_i t^2.
\end{align*}
The total sector contribution is therefore
\begin{align*}
\sum_{i=1}^m \frac{1}{2}\alpha_i t^2
&=\frac{1}{2}t^2\sum_{i=1}^m \alpha_i\\
&=\frac{1}{2}t^2(2\pi)\\
&=\pi t^2.
\end{align*}
Thus the offset is obtained by adjoining edge rectangles and vertex sectors to $P$, giving
\begin{align*}
\mathcal L^2(P+tB(0,1))
=\mathcal L^2(P)+t\,\operatorname{Per}(P)+\pi t^2.
\end{align*}
The formula shows exactly how the area increase splits into linear edge contributions and a quadratic corner contribution.
[/example]
The preceding example shows that a polygonal offset has separate contributions from area, edges, and rounded corners. To turn this computation into a principle rather than a polygon-specific formula, we need a result that replaces the missing edges and vertices of a general convex body with intrinsic coefficients. The issue is whether the same kind of polynomial expansion survives without a combinatorial boundary decomposition.
[quotetheorem:4106]
[citeproof:4106]
This formula says that the area of an outer offset is a quadratic polynomial in the offset radius. Its coefficients encode intrinsic geometric data: area, perimeter, and the constant $\pi$ coming from the unit ball. The result is special to convex bodies in this clean form; for non-convex sets, offset regions can overlap or fill gaps, so the same coefficient interpretation can fail. It is the first visible case of mixed-volume theory, where polynomial coefficients become geometric invariants.
[example: Offset Of A Rectangle]
Let $R=[0,a]\times[0,b]$ with $a,b>0$, let $B=B(0,1)$, and let $t\ge 0$. The rectangle has area
\begin{align*}
\mathcal L^2(R)=ab.
\end{align*}
The offset $R+tB$ consists of $R$, four rectangular strips along the sides, and four quarter-disks of radius $t$ at the corners.
The two horizontal sides have length $a$, so their two strips have total area
\begin{align*}
at+at=2at.
\end{align*}
The two vertical sides have length $b$, so their two strips have total area
\begin{align*}
bt+bt=2bt.
\end{align*}
Thus the total strip contribution is
\begin{align*}
2at+2bt
&=2(a+b)t.
\end{align*}
Each corner contributes a quarter of a disk of radius $t$. Since a disk of radius $t$ has area $\pi t^2$, the four corner pieces have total area
\begin{align*}
4\cdot \frac{1}{4}\pi t^2
&=\pi t^2.
\end{align*}
Adding the original rectangle, the side strips, and the corner pieces gives
\begin{align*}
\mathcal L^2(R+tB)
&=ab+2at+2bt+\pi t^2\\
&=ab+2(a+b)t+\pi t^2.
\end{align*}
The formula separates the area increase into the linear contribution from the four sides and the quadratic contribution from rounding the four corners.
[/example]
In three dimensions the same principle gives a cubic expansion, with volume, surface area, mean-width-type data, and the volume of the unit ball appearing as coefficients. The full mixed-volume theory developed later explains these coefficients systematically.
[remark: From Parallel Bodies To Mixed Volumes]
The expression $K+tB$ is the first instance of a mixed linear structure: geometry is encoded by polynomial dependence on scalar coefficients in Minkowski combinations. Later chapters replace the single ball $B$ by several convex bodies $K_1,\dots,K_m$ and study the coefficients of
\begin{align*}
\mathcal L^n(t_1K_1+\cdots+t_mK_m).
\end{align*}
Those coefficients are mixed volumes, and the linearity of support functions is the analytic mechanism behind the theory.
[/remark]
With Minkowski addition in hand, the next question is how to pass to a dual description. Chapter 5 introduces polarity, gauge functions, and duality, translating convex bodies into inequalities for linear functionals and preparing the ground for volumetric arguments.
# 5. Polarity, Gauge Functions, and Duality
After support functions and Minkowski addition have translated convex bodies into directional inequalities, polarity turns a convex body into the set of all linear functionals whose value on the body is at most one. This chapter explains why the origin matters, how gauges and radial functions encode the same geometry from opposite directions, and how support functions become gauges after passing to the polar. The guiding theme is order-reversing duality: large sets have small polars, intersections become convex hulls, and linear maps turn into adjoint maps.
## Bodies Seen from the Origin
How should we record the size of a convex body when the origin is the point from which all directions are measured? A support function looks outward in all dual directions, while a radial function looks along rays from the origin. Polarity links these two measurements, but only after the position of the origin has been fixed.
[definition: Polar Body]
Polarity is the map
\begin{align*}
(\cdot)^\circ : \mathcal P(\mathbb R^n) \to \mathcal P(\mathbb R^n)
\end{align*}
defined as follows. For $K \subset \mathbb R^n$, the polar of $K$ is
\begin{align*}
K^\circ := \{y \in \mathbb R^n : x \cdot y \le 1 \text{ for all } x \in K\}.
\end{align*}
[/definition]
The definition is asymmetric in appearance because it uses the level $1$ and the ambient origin. Translating $K$ changes $K^\circ$ in a substantial way; polarity is therefore a construction for bodies whose relation to $0$ is part of the data.
[definition: Gauge Function]
Let $K \subset \mathbb R^n$ be a convex set with $0 \in K$. The gauge of $K$ is the function $p_K: \mathbb R^n \to [0,\infty]$ defined by
\begin{align*}
p_K(x) := \inf\{\lambda > 0 : x \in \lambda K\}.
\end{align*}
[/definition]
When $K$ is absorbing, the gauge is finite everywhere. If $K$ is closed, bounded, convex, centrally symmetric, and has non-empty interior, then $p_K$ is the norm whose closed unit ball is $K$. Boundedness is essential: an unbounded centrally symmetric convex set with non-empty interior may have zero gauge in some non-zero directions, so the gauge can be degenerate rather than a norm.
The gauge records how much a vector must be rescaled to enter $K$, but this inverse viewpoint can hide the actual geometry of a ray.
For many questions about polarity and support functions, the natural datum is not the scaling needed to pull a vector into $K$, but the endpoint reached by traveling outward from the origin in a fixed direction. This motivates a separate function that measures the maximal distance for which the ray remains inside the body.
[definition: Radial Function]
Let $K \subset \mathbb R^n$ be a convex body with $0 \in K$. The radial function of $K$ is the function $\rho_K: \mathbb R^n \setminus \{0\} \to [0,\infty]$ defined by
\begin{align*}
\rho_K(u) := \sup\{t \ge 0 : tu \in K\}.
\end{align*}
[/definition]
For directions $u$ with $\rho_K(u)>0$, the gauge and radial function are reciprocal along the same ray:
\begin{align*}
p_K(u)=\frac{1}{\rho_K(u)}.
\end{align*}
Thus the gauge measures how much a vector must be scaled down to enter $K$, while the radial function measures how far one can travel in that direction while remaining in $K$.
[example: Gauge of a Euclidean Ball]
Let $K=\overline{B}(0,r)=\{z\in\mathbb R^n: |z|\le r\}$ with $r>0$. For $\lambda>0$,
\begin{align*}
x\in \lambda K
&\Longleftrightarrow x=\lambda z \text{ for some } z\in K\\
&\Longleftrightarrow x=\lambda z \text{ for some } z\in\mathbb R^n \text{ with } |z|\le r\\
&\Longleftrightarrow \left|\frac{x}{\lambda}\right|\le r\\
&\Longleftrightarrow |x|\le \lambda r\\
&\Longleftrightarrow \frac{|x|}{r}\le \lambda .
\end{align*}
Therefore the set of admissible $\lambda$ in the gauge definition is
\begin{align*}
\{\lambda>0:x\in\lambda K\}
=
\left\{\lambda>0:\lambda\ge \frac{|x|}{r}\right\},
\end{align*}
so
\begin{align*}
p_K(x)=\inf\left\{\lambda>0:\lambda\ge \frac{|x|}{r}\right\}=\frac{|x|}{r}.
\end{align*}
For the radial function, if $u\ne 0$, then
\begin{align*}
tu\in K
&\Longleftrightarrow |tu|\le r\\
&\Longleftrightarrow t|u|\le r \qquad (t\ge 0)\\
&\Longleftrightarrow t\le \frac{r}{|u|}.
\end{align*}
Hence
\begin{align*}
\rho_K(u)=\sup\left\{t\ge 0:t\le \frac{r}{|u|}\right\}=\frac{r}{|u|}.
\end{align*}
For the open ball $B(0,r)$, the corresponding membership condition is $|x|<\lambda r$ rather than $|x|\le \lambda r$, but the infimum is still $|x|/r$. Thus the Euclidean ball has gauge equal to the Euclidean norm scaled by $1/r$, and its radial function records the reciprocal distance to the boundary along each ray.
[/example]
## Inclusion Reversal and the Bipolar Theorem
What information is lost when we replace $K$ by all inequalities $x\cdot y\le 1$ valid on $K$? The first answer is that polarity reverses containment. The deeper answer is the bipolar theorem: under the right closedness and convexity hypotheses, taking the polar twice returns the original body.
[quotetheorem:4107]
[citeproof:4107]
This order reversal is the source of many dual identities. The hypothesis $K\subset L$ is exactly what allows every inequality valid on $L$ to be reused on $K$; without containment there is no reason for either polar to contain the other. For instance, in $\mathbb R$, the sets $[-1,1]$ and $[2,3]$ are not nested, and their polars are $[-1,1]$ and $(-\infty,1/3]$, which are not nested in the corresponding reverse way. The theorem does not say that polarity preserves size numerically, only that it converts inclusion into reverse inclusion, which is the mechanism behind the convex-hull and intersection identities below.
[quotetheorem:4108]
[citeproof:4108]
The theorem says that closed convex sets containing the origin are exactly the sets determined by their polar inequalities. Each hypothesis has a role: without convexity, the second polar cannot recover non-convex dents, and without closedness it recovers the closure rather than the original set. More generally, for an arbitrary set $A\subset\mathbb R^n$, the bipolar is the closed convex hull of $A\cup\{0\}$. The origin condition is not cosmetic: without it, the normalisation by the level $1$ no longer captures the set correctly, as the next example shows.
[example: Failure After Translation]
Let $K=[1,2]\subset\mathbb R$. By the definition of the polar,
\begin{align*}
K^\circ
&=\{y\in\mathbb R:xy\le 1 \text{ for every } x\in[1,2]\}.
\end{align*}
If $y>1/2$, then choosing $x=2$ gives $xy=2y>1$, so $y\notin K^\circ$. If $y\le 1/2$, then for every $x\in[1,2]$ we have $x\ge 1>0$ and $x\le 2$. When $y\ge 0$,
\begin{align*}
xy\le 2y\le 1,
\end{align*}
and when $y<0$,
\begin{align*}
xy<0\le 1.
\end{align*}
Thus
\begin{align*}
K^\circ=(-\infty,1/2].
\end{align*}
Now compute the bipolar:
\begin{align*}
(K^\circ)^\circ
&=\{x\in\mathbb R:xy\le 1 \text{ for every } y\le 1/2\}.
\end{align*}
If $x<0$, then taking $y$ negative with $y<1/x$ gives $xy>1$, because dividing by the negative number $x$ reverses the inequality; hence $x\notin(K^\circ)^\circ$. If $x\ge 0$, then for every $y\le 1/2$,
\begin{align*}
xy\le x\cdot \frac12=\frac{x}{2}.
\end{align*}
Therefore the condition $xy\le 1$ for all $y\le 1/2$ is equivalent to
\begin{align*}
\frac{x}{2}\le 1,
\end{align*}
that is, $0\le x\le 2$. Hence
\begin{align*}
(K^\circ)^\circ=[0,2].
\end{align*}
The bipolar fills in the interval from the origin to $K$, so it recovers $[0,2]$ rather than the translated interval $[1,2]$; the missing hypothesis is that the original set should contain the origin.
[/example]
## Support Functions and Gauges Under Duality
Can support functions and gauges be regarded as the same construction viewed in dual spaces? The support function of $K$ records the largest value of a linear functional on $K$, while the gauge of $K^\circ$ asks how much that functional must be scaled to enter the polar. These two descriptions match exactly.
For the support function introduced in Chapter 3, now allowing the extended value $+\infty$ when $K$ is unbounded, write
\begin{align*}
h_K(y):=\sup_{x\in K} x\cdot y.
\end{align*}
The polar is the unit sublevel set of the support function:
\begin{align*}
K^\circ=\{y\in\mathbb R^n:h_K(y)\le 1\}.
\end{align*}
This observation turns many geometric statements about polars into analytic statements about homogeneous convex functions.
[quotetheorem:4109]
[citeproof:4109]
For centrally symmetric convex bodies, this is exactly the usual duality of norms. The assumption $0\in K$ is needed because both the gauge and the polar use dilation from the origin; translating $K$ changes the gauge and polar in a way not detected by the original support function alone. If $K$ is not absorbing, then $p_K$ may take the value $\infty$, so the identity should be read in the extended-valued sense rather than as an equality of finite norms.
When $K$ is the closed unit ball of a norm, polarity should produce another unit ball. To state this familiar norm-theoretic form of the same geometry, we name the norm obtained by maximizing linear functionals over the original unit ball.
[definition: Dual Norm]
Let $\|\cdot\|$ be a norm on $\mathbb R^n$. The dual norm is the map $\|\cdot\|_*: \mathbb R^n \to [0,\infty)$ defined by
\begin{align*}
y \mapsto \|y\|_*:=\sup\{x\cdot y: \|x\|\le 1\}.
\end{align*}
[/definition]
The dual norm definition becomes concrete on the standard cube: maximizing a linear functional over coordinate bounds produces the $\ell^1$ expression.
[example: Cube and Cross-Polytope]
Let $K=[-1,1]^n$. Fix $y=(y_1,\dots,y_n)\in\mathbb R^n$. If $x=(x_1,\dots,x_n)\in K$, then $|x_i|\le 1$ for each $i$, so
\begin{align*}
x\cdot y
&=\sum_{i=1}^n x_i y_i\\
&\le \sum_{i=1}^n |x_i y_i|\\
&=\sum_{i=1}^n |x_i|\,|y_i|\\
&\le \sum_{i=1}^n |y_i|.
\end{align*}
Hence, if $\sum_{i=1}^n |y_i|\le 1$, then $x\cdot y\le 1$ for every $x\in[-1,1]^n$.
Conversely, define $s=(s_1,\dots,s_n)$ by
\begin{align*}
s_i=
\begin{cases}
1,& y_i\ge 0,\\
-1,& y_i<0.
\end{cases}
\end{align*}
Then $s\in[-1,1]^n$, and
\begin{align*}
s\cdot y
&=\sum_{i=1}^n s_i y_i\\
&=\sum_{\{i:y_i\ge 0\}} y_i+\sum_{\{i:y_i<0\}}(-y_i)\\
&=\sum_{\{i:y_i\ge 0\}} |y_i|+\sum_{\{i:y_i<0\}} |y_i|\\
&=\sum_{i=1}^n |y_i|.
\end{align*}
Therefore, if $x\cdot y\le 1$ for every $x\in[-1,1]^n$, applying this to $x=s$ gives $\sum_{i=1}^n |y_i|\le 1$. Thus
\begin{align*}
[-1,1]^n{}^\circ
&=\{y\in\mathbb R^n:x\cdot y\le 1\text{ for every }x\in[-1,1]^n\}\\
&=\left\{y\in\mathbb R^n:\sum_{i=1}^n |y_i|\le 1\right\}.
\end{align*}
So the polar of the $\ell^\infty$ unit ball is the $\ell^1$ unit ball, also called the cross-polytope.
[/example]
The cube example is polyhedral: finitely many coordinate bounds become finitely many extremal sign choices in the support function. For the Euclidean ball, the corresponding computation is smooth and governed by Cauchy--Schwarz.
[example: Polar of an Ellipsoid]
Let $A\in\mathbb R^{n\times n}$ be symmetric positive definite and set
\begin{align*}
E_A:=\{x\in\mathbb R^n:x^\top A x\le 1\}.
\end{align*}
We compute the largest value of $x\cdot y$ on $E_A$. If $y=0$, then $x\cdot y=0$ for every $x$, and the formula below gives $0=(0^\top A^{-1}0)^{1/2}$. Now assume $y\ne 0$ and put $v=A^{-1}y$. Since $A$ is positive definite,
\begin{align*}
y^\top A^{-1}y
&=y^\top v\\
&=(Av)^\top v\\
&=v^\top A v\\
&>0.
\end{align*}
For any $x\in E_A$ and any real number $s$,
\begin{align*}
0
&\le (x-sv)^\top A(x-sv)\\
&=x^\top A x-sv^\top A x-sx^\top A v+s^2v^\top A v\\
&=x^\top A x-2s\,x^\top A v+s^2v^\top A v\\
&=x^\top A x-2s\,x^\top y+s^2y^\top A^{-1}y.
\end{align*}
Taking
\begin{align*}
s=\frac{x^\top y}{y^\top A^{-1}y}
\end{align*}
gives
\begin{align*}
0
&\le x^\top A x
-2\frac{(x^\top y)^2}{y^\top A^{-1}y}
+\frac{(x^\top y)^2}{(y^\top A^{-1}y)^2}\,y^\top A^{-1}y\\
&=x^\top A x-\frac{(x^\top y)^2}{y^\top A^{-1}y}.
\end{align*}
Thus
\begin{align*}
(x^\top y)^2
&\le (x^\top A x)(y^\top A^{-1}y)\\
&\le y^\top A^{-1}y,
\end{align*}
because $x^\top A x\le 1$. Hence
\begin{align*}
x\cdot y=x^\top y\le (y^\top A^{-1}y)^{1/2}
\end{align*}
for every $x\in E_A$.
This upper bound is attained by
\begin{align*}
x_0=\frac{A^{-1}y}{(y^\top A^{-1}y)^{1/2}}.
\end{align*}
Indeed,
\begin{align*}
x_0^\top A x_0
&=\frac{(A^{-1}y)^\top A(A^{-1}y)}{y^\top A^{-1}y}\\
&=\frac{y^\top A^{-1}AA^{-1}y}{y^\top A^{-1}y}\\
&=\frac{y^\top A^{-1}y}{y^\top A^{-1}y}\\
&=1,
\end{align*}
so $x_0\in E_A$, and
\begin{align*}
x_0\cdot y
&=x_0^\top y\\
&=\frac{(A^{-1}y)^\top y}{(y^\top A^{-1}y)^{1/2}}\\
&=\frac{y^\top A^{-1}y}{(y^\top A^{-1}y)^{1/2}}\\
&=(y^\top A^{-1}y)^{1/2}.
\end{align*}
Therefore
\begin{align*}
\sup_{x^\top A x\le 1}x\cdot y=(y^\top A^{-1}y)^{1/2}.
\end{align*}
By the definition of the polar,
\begin{align*}
E_A^\circ
&=\{y\in\mathbb R^n:x\cdot y\le 1\text{ for every }x\in E_A\}\\
&=\left\{y\in\mathbb R^n:\sup_{x\in E_A}x\cdot y\le 1\right\}\\
&=\{y\in\mathbb R^n:(y^\top A^{-1}y)^{1/2}\le 1\}\\
&=\{y\in\mathbb R^n:y^\top A^{-1}y\le 1\}.
\end{align*}
Thus polarity sends the centred ellipsoid with defining matrix $A$ to the centred ellipsoid with defining matrix $A^{-1}$.
[/example]
## Polar Identities for Operations
How does polarity interact with the operations used earlier in the course, such as linear images and intersections? Since a polar is defined by inequalities, the answer is governed by how inequalities transform under adjoints and by how simultaneous systems of inequalities combine.
[quotetheorem:4110]
[citeproof:4110]
This identity is especially useful for projections and sections. The map must be linear and origin-compatible: for an affine map $x\mapsto Tx+a$, the extra translation contributes an additional term $a\cdot y$ to the defining inequality, so the displayed formula no longer applies unchanged. The transpose appears because inequalities are tested by pairing points with linear functionals, and moving $T$ across this pairing gives $T^\top$. Thus polarity converts geometry in the primal space into constraints in the dual space, a viewpoint used throughout optimization and separating-hyperplane arguments.
Polarity also reverses the logical operations used to assemble sets. For a point to lie in the polar of a union, it must satisfy the defining inequality on each piece at once. Since linear inequalities cannot distinguish a set from its convex hull, the correct statement involves the convex hull on one side and an intersection of polars on the other.
The point of the identity is to make the logical quantifier visible. A functional that is bounded by $1$ on every point of $K\cup L$ is bounded by $1$ on $K$ and on $L$ separately, and no extra information is gained by passing to convex combinations. This is the precise mechanism by which a union in the primal space becomes an intersection in the polar space.
[quotetheorem:4111]
[citeproof:4111]
The theorem depends on convex hulls because linear inequalities are preserved under convex averaging. It does not say that the polar of an arbitrary union is simply an intersection after any nonlinear operation; the convex hull enters precisely because linear inequalities cannot distinguish a set from its convex hull. This is the first half of the union/intersection dictionary: after the bipolar theorem is applied, the same idea converts intersections into closed convex hulls of polars.
The reverse operation is subtler. Intersecting primal bodies imposes fewer pointwise constraints than belonging to either body separately, so the polar should grow rather than shrink. The growth is not usually a plain union: convex combinations and closure enter because the bipolar theorem returns closed convex sets containing the origin.
The theorem below records the completed dual operation. It asks how to describe all functionals that are bounded on both bodies at once, using only the separate polar data of $K$ and $L$. The obstruction is that adding polar bodies produces convex mixtures of supporting inequalities, while closedness must be restored at the end.
[quotetheorem:4112]
[citeproof:4112]
The closure in the intersection formula is not a decoration; convex hulls of closed sets need not be closed, and the bipolar theorem naturally returns closed convex sets. The assumptions that $K$ and $L$ are closed, convex, and contain $0$ ensure that applying polarity twice returns the original intersection rather than its closed convex hull with the origin adjoined. The identity does not extend verbatim to arbitrary translated sets, because polarity is tied to the origin and translations alter the level-one inequalities. The chapter therefore ends with a dictionary: containment reverses, support functions become gauges, matrices invert for centred ellipsoids, cubes become cross-polytopes, and linear maps become transposed constraints. These rules will be used later when volume inequalities, extremal ellipsoids, and optimization dual certificates are expressed through dual bodies.
The dual viewpoint clarifies structure, but volume measures how that structure changes quantitatively. Chapter 6 takes up volume and the Brunn-Minkowski theory, asking how Lebesgue measure reacts to addition, scaling, and affine normalization of convex bodies.
# 6. Volume, Brunn-Minkowski, and Consequences
Volume turns the algebra of Minkowski addition into numerical inequalities. Chapters 3 through 5 developed support functions, Minkowski addition, and polarity; here the central question is how Lebesgue volume reacts to those operations. The main theorem is the Brunn-Minkowski inequality, which says that the $n$-th root of volume is superadditive under Minkowski addition. Its consequences link convex geometry with isoperimetry and with the functional theory of log-concavity.
## Measuring Convex Bodies
What should a size functional on convex bodies do under translation, linear change of variables, and dilation? Lebesgue volume is the basic answer: it is invariant under translations, homogeneous under scalar dilation, and transforms by determinants under linear maps. These properties make it compatible with the affine structure used throughout convex geometry.
[definition: Convex Body Volume]
Let $\mathcal K^n$ denote the class of convex bodies in $\mathbb R^n$. The volume functional is the map
\begin{align*}
|\cdot|: \mathcal K^n \to [0,\infty), \qquad K \mapsto |K|:=\mathcal L^n(K),
\end{align*}
where $\mathcal L^n$ denotes $n$-dimensional Lebesgue measure.
[/definition]
Compactness and convexity are not needed for Lebesgue measure to exist, but they are useful here because convex bodies are measurable, bounded, and geometrically stable under the operations of the course. The notation $|K|$ is reserved for $n$-dimensional volume in the ambient space; if $K$ lies in a lower-dimensional affine subspace, then $|K|=0$ as a subset of $\mathbb R^n$.
Before volume can be used as an affine invariant, we need its basic transformation rules under the operations already used for convex bodies. The essential question is how translation, scalar dilation, and an invertible linear map change the measured size of a body.
[quotetheorem:4113]
[citeproof:4113]
This theorem is the affine bookkeeping behind many later normalisations. The compact convex-body hypotheses keep the objects bounded and measurable, so every term in the displayed identities is finite and well defined. For arbitrary nonmeasurable sets, $|K|$ need not exist, and for unbounded measurable sets the identities may become statements about $\infty$ rather than useful formulae. The theorem also says only how volume changes under a single affine operation; it does not describe how volume behaves under nonlinear operations such as Minkowski addition. For instance, ellipsoids have the same volume behaviour as balls once written as linear images of balls.
[example: Volume Of An Ellipsoid]
Let $B(0,1) \subset \mathbb R^n$ be the Euclidean unit ball, let $A \in \mathbb R^{n \times n}$ be invertible, and let
\begin{align*}
E=A B(0,1)+x_0.
\end{align*}
By translation invariance from *Volume Under Affine Operations*,
\begin{align*}
|E|
&=|A B(0,1)+x_0|\\
&=|A B(0,1)|.
\end{align*}
Applying the determinant transformation formula from *Volume Under Affine Operations* to the linear map $A$ gives
\begin{align*}
|A B(0,1)|
&=|\det A|\,|B(0,1)|.
\end{align*}
Combining the two displayed identities,
\begin{align*}
|E|=|\det A|\,|B(0,1)|.
\end{align*}
If $A=\operatorname{diag}(a_1,\dots,a_n)$ with $a_i>0$, then
\begin{align*}
A B(0,1)
=\left\{(a_1y_1,\dots,a_ny_n): y_1^2+\cdots+y_n^2\le 1\right\}.
\end{align*}
Writing $x_i=a_i y_i$, this is equivalently
\begin{align*}
A B(0,1)
=\left\{x\in \mathbb R^n:\frac{x_1^2}{a_1^2}+\cdots+\frac{x_n^2}{a_n^2}\le 1\right\},
\end{align*}
so the semi-axis lengths are $a_1,\dots,a_n$. Since
\begin{align*}
\det A=a_1\cdots a_n,
\end{align*}
the volume is
\begin{align*}
|E|=(a_1\cdots a_n)|B(0,1)|.
\end{align*}
Thus an ellipsoid has the same volume as the unit ball multiplied by the determinant of the linear map that produces it.
[/example]
The most important interaction in this chapter is not volume under a single linear map, but volume under the addition of two sets. Minkowski addition usually enlarges sets in a way that is nonlinear in volume, so the correct scale is $|K|^{1/n}$ rather than $|K|$. The failure of volume itself to have the right scale is already visible from $K+K=2K$: if a superadditive inequality used $|K|$ directly, it would compare $2^n|K|$ with $2|K|$, which reflects dimension rather than a length-like size. Taking the $n$-th root converts volume back to a quantity homogeneous of degree one.
## The Brunn-Minkowski Inequality
How large must the Minkowski sum $K+L$ be, given only the volumes of $K$ and $L$? The Brunn-Minkowski inequality gives a sharp lower bound. It is one of the central structural results of convex geometry because it converts addition of sets into concavity of a scalar quantity.
Recall from Chapter 4 that for subsets $A,B \subset \mathbb R^n$,
\begin{align*}
A+B := \{a+b : a \in A,\ b \in B\}, \qquad tA := \{ta : a \in A\}\quad(t\ge 0).
\end{align*}
Minkowski addition preserves convex bodies.
[quotetheorem:4114]
[citeproof:4114]
The theorem says that $K \mapsto |K|^{1/n}$ behaves like a concave size under Minkowski addition, but its hypotheses are doing real work. Nonemptiness avoids the meaningless expression $|\varnothing|^{1/n}+|L|^{1/n}$ on the right while $\varnothing+L=\varnothing$ on the left. Compactness keeps the sets bounded and Lebesgue measurable, so the displayed volumes are finite and the sum remains in the same clean geometric category. The inequality is sharp, and the equality cases explain why homothetic sets are the correct geometric model.
[definition: Homothetic Convex Bodies]
Convex bodies $K,L \subset \mathbb R^n$ are homothetic if there exist $a>0$ and $x_0 \in \mathbb R^n$ such that
\begin{align*}
L=aK+x_0.
\end{align*}
[/definition]
The definition identifies the geometric model in which Minkowski addition merely rescales and translates shape. The next issue is whether this model is exactly what accounts for equality in Brunn-Minkowski for full-dimensional convex bodies, rather than just a source of examples.
[quotetheorem:4115]
[citeproof:4115]
The reverse direction is the part most often used in computations. The nonempty-interior hypothesis rules out lower-dimensional degeneracies: two line segments in $\mathbb R^2$ have zero area, so equality can hold for volume reasons without expressing the genuine full-dimensional equality geometry. The theorem is also deliberately stated for convex bodies, because equality for arbitrary measurable sets has additional measure-theoretic qualifications and is not just a naive homothety statement. Geometrically, equality means that Minkowski addition has produced no extra spreading beyond dilation and translation, which is why balls, cubes, and boxes with proportional side lengths attain equality.
[example: Homothetic Balls]
Let $K=B(0,r)$ and $L=B(x_0,s)$ in $\mathbb R^n$, with $r,s>0$. We first verify the Minkowski sum. If $z\in K+L$, then $z=u+v$ for some $u\in B(0,r)$ and $v\in B(x_0,s)$. Hence
\begin{align*}
|z-x_0|
&=|u+v-x_0|\\
&=|u+(v-x_0)|\\
&\le |u|+|v-x_0|\\
&\le r+s,
\end{align*}
so $z\in B(x_0,r+s)$.
Conversely, let $z\in B(x_0,r+s)$. If $z=x_0$, then $z=0+x_0$ with $0\in B(0,r)$ and $x_0\in B(x_0,s)$. If $z\ne x_0$, set
\begin{align*}
u=\frac{r}{r+s}(z-x_0),
\qquad
v=x_0+\frac{s}{r+s}(z-x_0).
\end{align*}
Then
\begin{align*}
|u|
&=\frac{r}{r+s}|z-x_0|
\le r,\\
|v-x_0|
&=\frac{s}{r+s}|z-x_0|
\le s,
\end{align*}
so $u\in B(0,r)$ and $v\in B(x_0,s)$. Also
\begin{align*}
u+v
&=\frac{r}{r+s}(z-x_0)+x_0+\frac{s}{r+s}(z-x_0)\\
&=x_0+\frac{r+s}{r+s}(z-x_0)\\
&=z.
\end{align*}
Therefore
\begin{align*}
K+L=B(x_0,r+s).
\end{align*}
By *Volume Under Affine Operations*, translation invariance and homogeneity give, for every $a>0$,
\begin{align*}
|B(x_0,a)|
&=|B(0,a)|\\
&=|aB(0,1)|\\
&=a^n|B(0,1)|.
\end{align*}
Applying this with $a=r+s$, $a=r$, and $a=s$ gives
\begin{align*}
|K+L|^{1/n}
&=|B(x_0,r+s)|^{1/n}\\
&=\left((r+s)^n|B(0,1)|\right)^{1/n}\\
&=(r+s)|B(0,1)|^{1/n},
\end{align*}
while
\begin{align*}
|K|^{1/n}+|L|^{1/n}
&=|B(0,r)|^{1/n}+|B(x_0,s)|^{1/n}\\
&=\left(r^n|B(0,1)|\right)^{1/n}
+\left(s^n|B(0,1)|\right)^{1/n}\\
&=r|B(0,1)|^{1/n}+s|B(0,1)|^{1/n}\\
&=(r+s)|B(0,1)|^{1/n}.
\end{align*}
Thus
\begin{align*}
|K+L|^{1/n}=|K|^{1/n}+|L|^{1/n},
\end{align*}
so these homothetic balls attain equality in the Brunn-Minkowski inequality.
[/example]
Balls show the equality mechanism in its most symmetric form. Boxes scaled by one common factor have the same algebraic equality pattern, with side lengths replacing radii.
[example: Homothetic Boxes]
Let
\begin{align*}
K&=\prod_{i=1}^n [0,a_i], & L&=\prod_{i=1}^n [b_i,b_i+\lambda a_i],
\end{align*}
where $a_i>0$ and $\lambda>0$. Since
\begin{align*}
\lambda K+b
&=\left\{\lambda x+b:x_i\in[0,a_i]\text{ for every }i\right\}\\
&=\left\{y:y_i=\lambda x_i+b_i,\ x_i\in[0,a_i]\text{ for every }i\right\}\\
&=\prod_{i=1}^n [b_i,b_i+\lambda a_i],
\end{align*}
we have $L=\lambda K+b$, where $b=(b_1,\dots,b_n)$.
We compute the Minkowski sum coordinate by coordinate. If $z\in K+L$, then $z=x+y$ with $x_i\in[0,a_i]$ and $y_i\in[b_i,b_i+\lambda a_i]$, so for each $i$,
\begin{align*}
b_i\le z_i=x_i+y_i\le a_i+(b_i+\lambda a_i)=b_i+(1+\lambda)a_i.
\end{align*}
Thus
\begin{align*}
K+L\subseteq \prod_{i=1}^n [b_i,b_i+(1+\lambda)a_i].
\end{align*}
Conversely, if $z_i\in[b_i,b_i+(1+\lambda)a_i]$, set
\begin{align*}
x_i=\frac{1}{1+\lambda}(z_i-b_i),
\qquad
y_i=b_i+\frac{\lambda}{1+\lambda}(z_i-b_i).
\end{align*}
Then
\begin{align*}
0\le x_i\le a_i,
\qquad
b_i\le y_i\le b_i+\lambda a_i,
\end{align*}
and
\begin{align*}
x_i+y_i
&=\frac{1}{1+\lambda}(z_i-b_i)+b_i+\frac{\lambda}{1+\lambda}(z_i-b_i)\\
&=b_i+\frac{1+\lambda}{1+\lambda}(z_i-b_i)\\
&=z_i.
\end{align*}
Hence
\begin{align*}
K+L=\prod_{i=1}^n [b_i,b_i+(1+\lambda)a_i].
\end{align*}
The volume of an axis-parallel box is the product of its side lengths, so
\begin{align*}
|K|
&=\prod_{i=1}^n a_i,\\
|L|
&=\prod_{i=1}^n \lambda a_i
=\lambda^n\prod_{i=1}^n a_i
=\lambda^n |K|,\\
|K+L|
&=\prod_{i=1}^n (1+\lambda)a_i
=(1+\lambda)^n\prod_{i=1}^n a_i
=(1+\lambda)^n|K|.
\end{align*}
Taking $n$-th roots gives
\begin{align*}
|L|^{1/n}
&=(\lambda^n|K|)^{1/n}
=\lambda |K|^{1/n},\\
|K+L|^{1/n}
&=((1+\lambda)^n|K|)^{1/n}
=(1+\lambda)|K|^{1/n}.
\end{align*}
Therefore
\begin{align*}
|K+L|^{1/n}
&=(1+\lambda)|K|^{1/n}\\
&=|K|^{1/n}+\lambda |K|^{1/n}\\
&=|K|^{1/n}+|L|^{1/n}.
\end{align*}
Thus these boxes attain equality because $L$ is obtained from $K$ by one common dilation factor and a translation.
[/example]
The proportionality of every side length is essential. If different directions are scaled by different factors, Minkowski addition introduces extra area rather than merely dilating the original rectangle.
[example: Strictness For Non-Homothetic Rectangles]
In $\mathbb R^2$, let $K=[0,a]\times[0,b]$ and $L=[0,c]\times[0,d]$, with $a,b,c,d>0$. We first compute the Minkowski sum. If $z=(z_1,z_2)\in K+L$, then $z=x+y$ with $x\in K$ and $y\in L$, so
\begin{align*}
0\le z_1=x_1+y_1\le a+c,
\qquad
0\le z_2=x_2+y_2\le b+d.
\end{align*}
Thus $K+L\subseteq[0,a+c]\times[0,b+d]$. Conversely, if $z\in[0,a+c]\times[0,b+d]$, set
\begin{align*}
x_1=\frac{a}{a+c}z_1,\quad y_1=\frac{c}{a+c}z_1,
\qquad
x_2=\frac{b}{b+d}z_2,\quad y_2=\frac{d}{b+d}z_2.
\end{align*}
Then $0\le x_1\le a$, $0\le y_1\le c$, $0\le x_2\le b$, and $0\le y_2\le d$, while
\begin{align*}
x_1+y_1
&=\frac{a}{a+c}z_1+\frac{c}{a+c}z_1
=z_1,\\
x_2+y_2
&=\frac{b}{b+d}z_2+\frac{d}{b+d}z_2
=z_2.
\end{align*}
Hence $z=x+y$ with $x\in K$ and $y\in L$, so
\begin{align*}
K+L=[0,a+c]\times[0,b+d].
\end{align*}
The areas are
\begin{align*}
|K|=ab,\qquad |L|=cd,\qquad |K+L|=(a+c)(b+d).
\end{align*}
Therefore the Brunn-Minkowski inequality in this case reads
\begin{align*}
\sqrt{(a+c)(b+d)} \ge \sqrt{ab}+\sqrt{cd}.
\end{align*}
Since both sides are nonnegative, this is equivalent to the squared inequality
\begin{align*}
(a+c)(b+d)\ge \left(\sqrt{ab}+\sqrt{cd}\right)^2.
\end{align*}
Expanding both sides gives
\begin{align*}
(a+c)(b+d)
&=ab+ad+bc+cd,\\
\left(\sqrt{ab}+\sqrt{cd}\right)^2
&=ab+2\sqrt{ab}\sqrt{cd}+cd\\
&=ab+2\sqrt{abcd}+cd.
\end{align*}
Thus the difference between the left side and the squared right side is
\begin{align*}
ab+ad+bc+cd-\left(ab+2\sqrt{abcd}+cd\right)
&=ad+bc-2\sqrt{abcd}\\
&=(\sqrt{ad})^2-2\sqrt{ad}\sqrt{bc}+(\sqrt{bc})^2\\
&=(\sqrt{ad}-\sqrt{bc})^2\\
&\ge 0.
\end{align*}
Equality holds exactly when $\sqrt{ad}=\sqrt{bc}$, equivalently $ad=bc$, equivalently
\begin{align*}
\frac{a}{b}=\frac{c}{d}.
\end{align*}
This is precisely the condition that $L$ is obtained from $K$ by one common dilation factor, so non-homothetic rectangles make the inequality strict.
[/example]
## Minkowski Interpolation And Concavity
What happens if we continuously move from one convex body to another using Minkowski combinations? Brunn-Minkowski says that volume to the power $1/n$ is concave along this interpolation. This is the form in which the theorem is most useful for variational arguments.
[definition: Minkowski Interpolation]
For convex bodies $K,L \subset \mathbb R^n$ and $t \in [0,1]$, the Minkowski interpolation from $K$ to $L$ is
\begin{align*}
K_t := (1-t)K+tL.
\end{align*}
[/definition]
The set $K_t$ is a convex body for each $t \in [0,1]$. At $t=0$ it is $K$, at $t=1$ it is $L$, and for intermediate $t$ it averages the bodies using the linear structure of the ambient space rather than a pointwise parametrisation of their boundaries.
This interpolation becomes useful once a geometric quantity can be controlled along the whole path.
The central problem is that volume itself does not behave linearly under Minkowski interpolation. The useful replacement is the volume radius, whose behaviour along the path is rigid enough to compare endpoints and intermediate bodies by a one-variable concavity principle.
[quotetheorem:4116]
[citeproof:4116]
This result is often called concavity of the volume radius. Convexity ensures that every interpolation set is again a convex body, so the path remains inside the same geometric category. Without convexity, the set $(1-t)A+tB$ may fill gaps or join separated components in ways that are not captured by the endpoints alone, and equality behaviour becomes much less rigid. The theorem also concerns the volume radius rather than volume itself: ordinary volume can be convex, concave, or neither along set addition depending on dimension and shape. Its main use is to turn geometric interpolation into a one-variable concavity statement that can be differentiated or compared.
[example: Interpolating Between Rectangles]
Let $K=[0,a]\times[0,b]$ and $L=[0,c]\times[0,d]$ in $\mathbb R^2$, where $a,b,c,d>0$ and $t\in[0,1]$. We compute the interpolation set coordinate by coordinate. Since
\begin{align*}
(1-t)K
&=[0,(1-t)a]\times[0,(1-t)b],\\
tL
&=[0,tc]\times[0,td],
\end{align*}
the same rectangle-sum argument as for axis-parallel boxes gives
\begin{align*}
(1-t)K+tL
=
[0,(1-t)a+tc]\times[0,(1-t)b+td].
\end{align*}
Thus
\begin{align*}
|(1-t)K+tL|
&=((1-t)a+tc)((1-t)b+td).
\end{align*}
Set
\begin{align*}
A(t)&=(1-t)a+tc=a+(c-a)t,\\
B(t)&=(1-t)b+td=b+(d-b)t,
\end{align*}
so the volume radius is
\begin{align*}
F(t)=|(1-t)K+tL|^{1/2}=\sqrt{A(t)B(t)}.
\end{align*}
Let $p=c-a$ and $q=d-b$. Then $A'(t)=p$, $B'(t)=q$, and, with $G(t)=A(t)B(t)$,
\begin{align*}
G'(t)&=A'(t)B(t)+A(t)B'(t)\\
&=pB(t)+qA(t),\\
G''(t)&=pB'(t)+qA'(t)\\
&=pq+qp\\
&=2pq.
\end{align*}
Since $F(t)=G(t)^{1/2}$ and $G(t)>0$,
\begin{align*}
F'(t)
&=\frac{G'(t)}{2G(t)^{1/2}},\\
F''(t)
&=\frac{G''(t)}{2G(t)^{1/2}}-\frac{(G'(t))^2}{4G(t)^{3/2}}\\
&=\frac{2G(t)G''(t)-(G'(t))^2}{4G(t)^{3/2}}\\
&=\frac{4pqA(t)B(t)-(pB(t)+qA(t))^2}{4(A(t)B(t))^{3/2}}\\
&=-\frac{(pB(t)-qA(t))^2}{4(A(t)B(t))^{3/2}}\\
&\le 0.
\end{align*}
Therefore $F$ is concave on $[0,1]$, matching the volume-radius concavity predicted by *Concavity Of Volume Radius*.
The concavity is linear exactly when $F''(t)=0$ for every $t$, which is equivalent to
\begin{align*}
pB(t)-qA(t)=0
\end{align*}
for every $t$. Expanding,
\begin{align*}
pB(t)-qA(t)
&=(c-a)(b+(d-b)t)-(d-b)(a+(c-a)t)\\
&=(c-a)b+(c-a)(d-b)t-(d-b)a-(d-b)(c-a)t\\
&=(c-a)b-(d-b)a\\
&=bc-ab-ad+ab\\
&=bc-ad.
\end{align*}
Hence linearity occurs exactly when $bc=ad$, equivalently
\begin{align*}
\frac{a}{b}=\frac{c}{d}.
\end{align*}
This is precisely the condition that the two rectangles have the same aspect ratio; otherwise the interpolation has strictly concave volume radius.
[/example]
## The Isoperimetric Inequality From Brunn-Minkowski
Why is the Euclidean ball the shape of least boundary area among bodies with fixed volume? The isoperimetric inequality is a surface-area consequence of Brunn-Minkowski. The bridge is to thicken a set by a small ball and compare the first-order change in volume.
[definition: Outer Parallel Body]
The outer parallel body operation is the map
\begin{align*}
\mathcal K^n \times [0,\infty) &\to \mathcal K^n,\\
(K,r) &\mapsto K+rB(0,1).
\end{align*}
[/definition]
The volume of $K+rB(0,1)$ records how much space is swept out by moving a ball of radius $r$ around $K$. A direct boundary-area formula would require smoothness, and convex bodies may have corners or flat faces where classical surface parametrisations are inconvenient. The limiting definition below measures surface area through the first-order growth of volume, which remains stable under convex approximation.
[illustration:outer-parallel-layer]
[definition: Surface Area Of A Convex Body]
The surface area functional is the map
\begin{align*}
S: \mathcal K^n \to [0,\infty)
\end{align*}
defined by
\begin{align*}
S(K):=\lim_{r \downarrow 0}\frac{|K+rB(0,1)|-|K|}{r}.
\end{align*}
[/definition]
The definition turns boundary size into the first-order change of volume under outer parallel expansion. With that interpretation in place, the natural question is whether Brunn-Minkowski controls this first variation strongly enough to force the sharp lower bound for surface area at fixed volume.
[quotetheorem:4117]
[citeproof:4117]
This proof shows that isoperimetry is not an isolated boundary phenomenon. It is a differential consequence of the additive inequality for volume. Convexity is used to make the parallel-body derivative well behaved and to keep the surface-area functional within the convex-geometric framework of the chapter. The theorem does not by itself treat arbitrary rough measurable sets, where boundary size may depend on the chosen notion of perimeter and may even be infinite. The surface-area hypothesis is therefore not cosmetic: without a meaningful first variation of volume, the displayed inequality has no well-defined left-hand side.
[example: The Ball Has The Sharp Constant]
Let $K=B(0,R)$ with $R>0$. Since $B(0,R)=RB(0,1)$, homogeneity from *Volume Under Affine Operations* gives
\begin{align*}
|K|
&=|B(0,R)|\\
&=|RB(0,1)|\\
&=R^n|B(0,1)|.
\end{align*}
For $r>0$, the Minkowski sum of concentric balls is
\begin{align*}
B(0,R)+rB(0,1)=B(0,R+r),
\end{align*}
because the same radial decomposition used for sums of balls gives both inclusions. Therefore
\begin{align*}
|K+rB(0,1)|
&=|B(0,R+r)|\\
&=(R+r)^n|B(0,1)|.
\end{align*}
Using the definition of surface area through outer parallel bodies,
\begin{align*}
S(K)
&=\lim_{r\downarrow 0}\frac{|K+rB(0,1)|-|K|}{r}\\
&=\lim_{r\downarrow 0}\frac{(R+r)^n|B(0,1)|-R^n|B(0,1)|}{r}\\
&=|B(0,1)|\lim_{r\downarrow 0}\frac{(R+r)^n-R^n}{r}.
\end{align*}
The last limit is the derivative of $x\mapsto x^n$ at $x=R$, so
\begin{align*}
S(K)=nR^{n-1}|B(0,1)|.
\end{align*}
Now compute both sides of the isoperimetric inequality:
\begin{align*}
S(K)^n
&=\left(nR^{n-1}|B(0,1)|\right)^n\\
&=n^n R^{n(n-1)}|B(0,1)|^n,
\end{align*}
while
\begin{align*}
n^n|B(0,1)|\,|K|^{n-1}
&=n^n|B(0,1)|\left(R^n|B(0,1)|\right)^{n-1}\\
&=n^n|B(0,1)|\,R^{n(n-1)}|B(0,1)|^{n-1}\\
&=n^n R^{n(n-1)}|B(0,1)|^n.
\end{align*}
Thus
\begin{align*}
S(K)^n=n^n|B(0,1)|\,|K|^{n-1},
\end{align*}
so balls attain equality and the constant in the isoperimetric inequality cannot be improved.
[/example]
## The Prekopa-Leindler Viewpoint
Can Brunn-Minkowski be seen as a statement about functions rather than sets? The Prekopa-Leindler inequality gives a functional analogue in which indicator functions of sets are replaced by nonnegative functions. It is a foundational link between convex geometry, log-concave functions, and probability.
[definition: Log-Concave Function]
A function $f: \mathbb R^n \to [0,\infty)$ is log-concave if
\begin{align*}
f((1-t)x+ty) \ge f(x)^{1-t} f(y)^t
\end{align*}
for all $x,y \in \mathbb R^n$ and all $t \in [0,1]$.
[/definition]
When $f=e^{-V}$ with $V: \mathbb R^n \to (-\infty,\infty]$ convex, this condition is exactly convexity of $V$ written multiplicatively. Indicator functions of convex sets are log-concave for $t \in (0,1)$: if $x,y \in K$, then $(1-t)x+ty \in K$, and if either point is outside $K$, the right-hand side is $0$.
To extend Brunn-Minkowski beyond indicators, we need an inequality whose hypothesis compares function values at interpolated points and whose conclusion compares integrals. This is the functional replacement for controlling the volume of Minkowski combinations of sets.
[quotetheorem:4118]
[citeproof:4118]
This inequality is imported here rather than developed in the chapter. Standard derivations use induction on dimension, optimal transport, or reduction to Brunn-Minkowski through level sets. For the present course, its role is to show that Brunn-Minkowski is one member of a larger family of concavity principles. The next result records the simplest passage back from functions to sets: replace functions by indicators and interpret the pointwise hypothesis as convex closure under Minkowski interpolation.
[quotetheorem:4119]
[citeproof:4119]
This functional viewpoint becomes important in high-dimensional convex geometry. Probability measures with log-concave densities behave like softened convex bodies: their level sets are convex, their marginals remain log-concave, and many concentration phenomena are governed by the same concavity mechanism.
[example: Gaussian Density Is Log-Concave]
Let $f: \mathbb R^n \to (0,\infty)$ be the standard Gaussian density
\begin{align*}
f(x)=(2\pi)^{-n/2}e^{-|x|^2/2}.
\end{align*}
Set
\begin{align*}
C=\frac n2\log(2\pi),
\qquad
V(x)=\frac{|x|^2}{2}+C.
\end{align*}
Then
\begin{align*}
e^{-V(x)}
&=e^{-|x|^2/2-C}\\
&=e^{-|x|^2/2}e^{-(n/2)\log(2\pi)}\\
&=e^{-|x|^2/2}(2\pi)^{-n/2}\\
&=f(x),
\end{align*}
so $f=e^{-V}$.
We verify the log-concavity inequality. Let $x,y\in\mathbb R^n$ and $t\in[0,1]$. Expanding the squared norm,
\begin{align*}
|(1-t)x+ty|^2
&=\langle (1-t)x+ty,(1-t)x+ty\rangle\\
&=(1-t)^2|x|^2+2t(1-t)\langle x,y\rangle+t^2|y|^2.
\end{align*}
Also,
\begin{align*}
(1-t)|x|^2+t|y|^2-|(1-t)x+ty|^2
&=(1-t)|x|^2+t|y|^2\\
&\quad-\left((1-t)^2|x|^2+2t(1-t)\langle x,y\rangle+t^2|y|^2\right)\\
&=t(1-t)|x|^2-2t(1-t)\langle x,y\rangle+t(1-t)|y|^2\\
&=t(1-t)|x-y|^2\\
&\ge 0.
\end{align*}
Therefore
\begin{align*}
|(1-t)x+ty|^2\le (1-t)|x|^2+t|y|^2.
\end{align*}
Adding the constant $C=(1-t)C+tC$ and dividing the quadratic terms by $2$ gives
\begin{align*}
V((1-t)x+ty)
&=\frac{|(1-t)x+ty|^2}{2}+C\\
&\le (1-t)\frac{|x|^2}{2}+t\frac{|y|^2}{2}+(1-t)C+tC\\
&=(1-t)V(x)+tV(y).
\end{align*}
Multiplying by $-1$ reverses the inequality, and exponentiating preserves it:
\begin{align*}
f((1-t)x+ty)
&=e^{-V((1-t)x+ty)}\\
&\ge e^{-(1-t)V(x)-tV(y)}\\
&=\left(e^{-V(x)}\right)^{1-t}\left(e^{-V(y)}\right)^t\\
&=f(x)^{1-t}f(y)^t.
\end{align*}
Thus the Gaussian density is log-concave, so Gaussian measure belongs to the same concavity framework as uniform measure on convex bodies, even though its support is all of $\mathbb R^n$.
[/example]
The chapter's main message is that volume is not merely a measure of size. Under Minkowski addition it has a concavity structure, and that structure controls equality, interpolation, isoperimetry, and functional inequalities.
The volume inequalities of Chapter 6 become more precise once mixed volumes are introduced. Chapter 7 turns the geometry of Minkowski addition into polynomial invariants, leading to the Steiner formula and the numerical data carried by convex bodies under parallel enlargement.
# 7. Mixed Volumes and the Steiner Formula
This chapter builds directly on Chapters 3 and 4: support functions linearize Minkowski addition, and Hausdorff convergence lets us pass from polytopes to general convex bodies. The goal in this chapter is to turn the geometric operation of adding convex bodies into computable numerical invariants. The main obstruction is that volume is nonlinear under Minkowski addition: usually $\operatorname{Vol}_n(K+L)$ is not determined by $\operatorname{Vol}_n(K)$ and $\operatorname{Vol}_n(L)$ alone. Mixed volumes solve this by recording all polynomial interaction terms at once.
## Volume as a Polynomial under Minkowski Addition
What information is contained in the volume of a Minkowski linear combination $t_1K_1+\cdots+t_mK_m$? For a single body, scaling gives $\operatorname{Vol}_n(tK)=t^n\operatorname{Vol}_n(K)$. For several bodies, the interaction terms record how the bodies face each other in different directions.
[definition: Minkowski Linear Combination]
Let $\mathcal K^n$ denote the class of convex bodies in $\mathbb R^n$. For fixed $m\in\mathbb N$ and $t=(t_1,\dots,t_m)\in[0,\infty)^m$, the Minkowski linear combination map is
\begin{align*}
\Phi_t:(\mathcal K^n)^m&\to \mathcal K^n,\\
\Phi_t(K_1,\dots,K_m)&=t_1K_1+\cdots+t_mK_m,
\end{align*}
where
\begin{align*}
t_1K_1+\cdots+t_mK_m
=\{t_1x_1+\cdots+t_mx_m : x_i\in K_i \text{ for }1\le i\le m\}.
\end{align*}
[/definition]
The restriction $t_i\ge 0$ is not cosmetic: allowing negative coefficients would usually destroy convex-body structure because there is no inverse operation for Minkowski addition inside $\mathcal K^n$. The support function viewpoint from earlier chapters gives
\begin{align*}
h_{t_1K_1+\cdots+t_mK_m}(u)=t_1h_{K_1}(u)+\cdots+t_mh_{K_m}(u),
\end{align*}
so Minkowski addition is linear after passing to support functions.
Since the bodies combine linearly at the support-function level, the next question is how ordinary volume responds to the coefficients $t_i$. The basic structural fact is that volume is not linear but becomes a homogeneous polynomial in those coefficients.
[quotetheorem:4120]
[citeproof:4120]
The hypotheses that the $K_i$ are convex bodies matter in two ways: convexity gives support functions and stable Minkowski sums, while compactness ensures finite volume and Hausdorff approximation by polytopes. The theorem does not say that volume is additive under Minkowski addition; the mixed terms are precisely the correction to that false expectation. It also does not identify the coefficients geometrically yet, only proves that they exist and are intrinsic. The next definition names these coefficients so that later inequalities and variation formulas can be stated multilinearly.
[definition: Mixed Volume]
Let $\mathcal K^n$ denote the class of convex bodies in $\mathbb R^n$. The mixed volume is the symmetric functional
\begin{align*}
V:(\mathcal K^n)^n\to \mathbb R
\end{align*}
defined by requiring that, for every finite family $K_1,\dots,K_m\in\mathcal K^n$,
\begin{align*}
\operatorname{Vol}_n(t_1K_1+\cdots+t_mK_m)
=\sum_{i_1,\dots,i_n=1}^{m} V(K_{i_1},\dots,K_{i_n})t_{i_1}\cdots t_{i_n}
\end{align*}
for all $t_1,\dots,t_m\ge 0$.
[/definition]
Because the sum is over ordered $n$-tuples, multinomial factors are already built into the expansion. For example, if only two bodies $K,L$ are present, then
\begin{align*}
\operatorname{Vol}_n(sK+tL)
=\sum_{j=0}^{n}\binom{n}{j}V(\underbrace{K,\dots,K}_{n-j},\underbrace{L,\dots,L}_{j})s^{n-j}t^j.
\end{align*}
[example: Mixed Area of Rectangles]
In $\mathbb R^2$, let
\begin{align*}
R=[0,a]\times[0,b],\qquad S=[0,c]\times[0,d],
\end{align*}
where $a,b,c,d>0$, and let $s,t\ge 0$. Since
\begin{align*}
sR&=[0,sa]\times[0,sb],\\
tS&=[0,tc]\times[0,td],
\end{align*}
their Minkowski sum is
\begin{align*}
sR+tS
&=\{(x_1+y_1,x_2+y_2):0\le x_1\le sa,\ 0\le x_2\le sb,\ 0\le y_1\le tc,\ 0\le y_2\le td\}\\
&=[0,sa+tc]\times[0,sb+td].
\end{align*}
Therefore its area is the product of its side lengths:
\begin{align*}
\operatorname{Area}(sR+tS)
&=(sa+tc)(sb+td)\\
&=sa\cdot sb+sa\cdot td+tc\cdot sb+tc\cdot td\\
&=s^2ab+st\,ad+st\,bc+t^2cd\\
&=s^2ab+st(ad+bc)+t^2cd.
\end{align*}
By the definition of mixed area in dimension $2$, the same polynomial must have the form
\begin{align*}
\operatorname{Area}(sR+tS)
&=s^2V(R,R)+stV(R,S)+tsV(S,R)+t^2V(S,S)\\
&=s^2V(R,R)+2stV(R,S)+t^2V(S,S),
\end{align*}
using symmetry of mixed volume in the second line. Comparing the coefficient of $st$ in the two displayed expansions gives
\begin{align*}
2V(R,S)=ad+bc,
\end{align*}
and hence
\begin{align*}
V(R,S)=\frac{ad+bc}{2}.
\end{align*}
Thus the mixed area records the two cross-pairings: the horizontal side of $R$ with the vertical side of $S$, and the horizontal side of $S$ with the vertical side of $R$.
[/example]
## Structural Properties of Mixed Volumes
Which features of ordinary volume survive after polarising it into mixed volume? Since mixed volume is defined as a coefficient of a volume polynomial, the basic properties of volume under translation, inclusion, and scaling become algebraic rules for $V$.
[quotetheorem:4121]
[citeproof:4121]
The hypotheses in this theorem are stronger than formal algebra would suggest: monotonicity depends on positivity of mixed area measures, not on a general rule that pointwise inequalities of polynomials imply coefficientwise inequalities. The statement also uses non-negative dilations because the Minkowski combination $aL+bM$ must remain a convex body. The normalization tells us that mixed volume is a genuine extension of volume, not a different invariant. Multilinearity makes it useful: once $K$ is decomposed into simpler Minkowski summands, mixed volumes can often be computed from the summands.
[remark: Positivity]
Mixed volumes of convex bodies are non-negative. This is compatible with monotonicity because every convex body contains at least one point and point bodies contribute zero unless all arguments together span $\mathbb R^n$ in the relevant mixed sense.
[/remark]
The warning in this remark is that positivity is not the same as strict positivity. Lower-dimensional bodies can have zero ordinary volume, but they still contribute to mixed volume when their directions complement one another. Segments are the cleanest test case because the mixed volume reduces to a determinant.
[example: Segments and Parallelepipeds]
Let $I_i=[0,v_i]\subset\mathbb R^n$, where $v_1,\dots,v_n\in\mathbb R^n$, and let $t_1,\dots,t_n\ge 0$. Since
\begin{align*}
t_iI_i
&=\{t_ix: x\in [0,v_i]\}\\
&=\{t_i(s_iv_i):0\le s_i\le 1\}\\
&=\{s_it_iv_i:0\le s_i\le 1\},
\end{align*}
their Minkowski sum is
\begin{align*}
t_1I_1+\cdots+t_nI_n
&=\left\{x_1+\cdots+x_n:x_i\in t_iI_i\right\}\\
&=\left\{\sum_{i=1}^{n}s_it_iv_i:0\le s_i\le 1\right\}.
\end{align*}
Let $A$ be the linear map whose $i$-th column is $t_iv_i$. The displayed set is $A([0,1]^n)$. If $v_1,\dots,v_n$ are linearly dependent, then $A([0,1]^n)$ lies in a proper affine subspace, so its $n$-dimensional volume is $0$, and also
\begin{align*}
\det(t_1v_1,\dots,t_nv_n)=t_1\cdots t_n\det(v_1,\dots,v_n)=0.
\end{align*}
If $v_1,\dots,v_n$ are linearly independent, the change-of-variables formula gives
\begin{align*}
\operatorname{Vol}_n(A([0,1]^n))
&=|\det A|\,\operatorname{Vol}_n([0,1]^n)\\
&=|\det(t_1v_1,\dots,t_nv_n)|\cdot 1\\
&=|t_1\cdots t_n\det(v_1,\dots,v_n)|\\
&=t_1\cdots t_n|\det(v_1,\dots,v_n)|.
\end{align*}
Thus in all cases,
\begin{align*}
\operatorname{Vol}_n(t_1I_1+\cdots+t_nI_n)
=t_1\cdots t_n|\det(v_1,\dots,v_n)|.
\end{align*}
On the other hand, the ordered mixed-volume expansion gives
\begin{align*}
\operatorname{Vol}_n(t_1I_1+\cdots+t_nI_n)
=\sum_{i_1,\dots,i_n=1}^{n}V(I_{i_1},\dots,I_{i_n})t_{i_1}\cdots t_{i_n}.
\end{align*}
The monomial $t_1\cdots t_n$ occurs exactly when $(i_1,\dots,i_n)$ is a permutation of $(1,\dots,n)$. There are $n!$ such permutations, and symmetry of mixed volume makes all corresponding coefficients equal to $V(I_1,\dots,I_n)$. Therefore the coefficient of $t_1\cdots t_n$ in the ordered expansion is
\begin{align*}
n!V(I_1,\dots,I_n).
\end{align*}
Comparing this coefficient with the coefficient $|\det(v_1,\dots,v_n)|$ in the volume formula gives
\begin{align*}
n!V(I_1,\dots,I_n)=|\det(v_1,\dots,v_n)|,
\end{align*}
so
\begin{align*}
V(I_1,\dots,I_n)=\frac{1}{n!}|\det(v_1,\dots,v_n)|.
\end{align*}
The factor $n!$ appears because the ordered polynomial expansion counts the same fully mixed monomial once for each ordering of the $n$ segment arguments.
[/example]
## Quermassintegrals and Parallel Bodies
How does the volume of a convex body change when it is thickened by a Euclidean ball? This question turns mixed volumes into geometric coefficients: the first-order term is surface area, and the higher-order terms are averaged curvature-type quantities.
Recall from Chapter 4 that the outer parallel body of $K$ at radius $r\ge 0$ is
\begin{align*}
K+rB=\{x+ry:x\in K,\ y\in B\},
\end{align*}
where $B=B(0,1)$ is the Euclidean unit ball.
The set $K+rB$ consists of all points whose distance from $K$ is at most $r$. It is again a convex body, and its support function is $h_K+r$ on the unit sphere. Without the ball, there is no canonical notion of thickening: adding a cube, simplex, or arbitrary convex body would produce anisotropic expansions with different coefficients.
[illustration:convex-body-distance-offset]
[definition: Quermassintegral]
Let $\mathcal K^n$ denote the class of convex bodies in $\mathbb R^n$, and let $B=B(0,1)$. For each $0\le j\le n$, the $j$-th quermassintegral is the functional
\begin{align*}
W_j:\mathcal K^n&\to\mathbb R,\\
W_j(K)&=V(\underbrace{K,\dots,K}_{n-j},\underbrace{B,\dots,B}_{j}).
\end{align*}
[/definition]
Thus $W_0(K)=\operatorname{Vol}_n(K)$ and $W_n(K)=\operatorname{Vol}_n(B)$. The intermediate values interpolate between volume and the volume of the Euclidean ball. Their role is to record how much volume is created when $K$ is thickened uniformly in every direction. The key question is whether the entire function $r\mapsto \operatorname{Vol}_n(K+rB)$ is controlled by these finitely many coefficients, rather than by more complicated features of the boundary.
[quotetheorem:4122]
[citeproof:4122]
The hypothesis $r\ge 0$ is essential because $K+rB$ is an outer thickening; negative $r$ would correspond to erosion, which can change combinatorial type and need not be governed by the same polynomial down to arbitrary radius. The use of the Euclidean ball is also essential for the classical intrinsic coefficients $W_j(K)$; replacing $B$ by another convex body gives a relative mixed-volume expansion instead. The Steiner formula is a precise version of the idea that a thickened body gains volume from its interior, boundary, edges, and higher-codimension features. For nonsmooth bodies these features are not literal smooth curvatures, but the polynomial coefficients still exist and behave continuously under approximation.
[quotetheorem:4123]
[citeproof:4123]
This theorem explains why the first mixed coefficient is not merely formal: it is the infinitesimal rate at which volume changes under uniform thickening. The definition of $S(K)$ through first variation is important for nonsmooth convex bodies, where a classical smooth surface integral may not exist. The statement does not identify the higher coefficients with curvatures in full generality; that requires additional regularity or the language of curvature measures. In the plane, however, the first variation has the familiar interpretation as perimeter.
[example: Recovering Perimeter from a Planar Parallel Body]
Let $K\subset\mathbb R^2$ be a convex body and let $B$ be the unit disk. Applying the *Steiner Formula* in dimension $2$ gives
\begin{align*}
\operatorname{Area}(K+rB)
&=\sum_{j=0}^{2}\binom{2}{j}W_j(K)r^j\\
&=\binom{2}{0}W_0(K)+\binom{2}{1}W_1(K)r+\binom{2}{2}W_2(K)r^2\\
&=W_0(K)+2W_1(K)r+W_2(K)r^2.
\end{align*}
By the definition of the quermassintegrals,
\begin{align*}
W_0(K)&=V(K,K)=\operatorname{Area}(K),\\
W_2(K)&=V(B,B)=\operatorname{Area}(B)=\pi,
\end{align*}
so
\begin{align*}
\operatorname{Area}(K+rB)
=\operatorname{Area}(K)+2W_1(K)r+\pi r^2.
\end{align*}
The theorem *Surface Area as a Mixed Volume Coefficient* gives $S(K)=2W_1(K)$ in dimension $2$, and planar surface area is perimeter, so
\begin{align*}
\operatorname{Per}(K)=2W_1(K).
\end{align*}
Substituting this into the preceding expansion yields
\begin{align*}
\operatorname{Area}(K+rB)
=\operatorname{Area}(K)+\operatorname{Per}(K)r+\pi r^2.
\end{align*}
For the rectangle $K=[0,a]\times[0,b]$, we have
\begin{align*}
\operatorname{Area}(K)&=ab,\\
\operatorname{Per}(K)&=a+b+a+b=2(a+b).
\end{align*}
Therefore
\begin{align*}
\operatorname{Area}(K+rB)
&=ab+2(a+b)r+\pi r^2.
\end{align*}
Geometrically, this separates into the original rectangle of area $ab$, two horizontal strips of total area $2ar$, two vertical strips of total area $2br$, and four quarter-disks of radius $r$ whose total area is $\pi r^2$.
[/example]
## Reading Mixed Coefficients Geometrically
What should the higher quermassintegrals be thought of as? In dimension two there are only three coefficients: area, half-perimeter, and disk area. In dimension three, the same expansion separates volume, surface area, integrated mean-width type data, and the ball volume.
[example: The Three-Dimensional Steiner Expansion]
Let $K\subset\mathbb R^3$ be a convex body and let $B$ be the unit ball. Applying the *Steiner Formula* with $n=3$ gives
\begin{align*}
\operatorname{Vol}_3(K+rB)
&=\sum_{j=0}^{3}\binom{3}{j}W_j(K)r^j\\
&=\binom{3}{0}W_0(K)+\binom{3}{1}W_1(K)r+\binom{3}{2}W_2(K)r^2+\binom{3}{3}W_3(K)r^3\\
&=W_0(K)+3W_1(K)r+3W_2(K)r^2+W_3(K)r^3.
\end{align*}
The first coefficient is
\begin{align*}
W_0(K)=V(K,K,K)=\operatorname{Vol}_3(K),
\end{align*}
and the theorem *Surface Area as a Mixed Volume Coefficient* gives
\begin{align*}
S(K)=3V(K,K,B)=3W_1(K).
\end{align*}
For the last coefficient,
\begin{align*}
W_3(K)=V(B,B,B)=\operatorname{Vol}_3(B).
\end{align*}
Since the horizontal slice of the unit ball at height $z\in[-1,1]$ is a disk of radius $\sqrt{1-z^2}$, its area is $\pi(1-z^2)$, so
\begin{align*}
\operatorname{Vol}_3(B)
&=\int_{-1}^{1}\pi(1-z^2)\,dz\\
&=\pi\left[z-\frac{z^3}{3}\right]_{-1}^{1}\\
&=\pi\left(\left(1-\frac{1}{3}\right)-\left(-1+\frac{1}{3}\right)\right)\\
&=\pi\left(\frac{2}{3}+\frac{2}{3}\right)\\
&=\frac{4\pi}{3}.
\end{align*}
Thus
\begin{align*}
\operatorname{Vol}_3(K+rB)
=\operatorname{Vol}_3(K)+S(K)r+3W_2(K)r^2+\frac{4\pi}{3}r^3.
\end{align*}
In dimension three, the Steiner coefficients separate the volume term, the surface-area term, the second-order curvature term $3W_2(K)$, and the universal ball-volume term.
[/example]
This calculation also explains why mixed volumes are not decorative coefficients: they isolate the interaction terms that ordinary volume hides.
[remark: Why Mixed Volumes Matter]
Mixed volumes turn a nonlinear invariant, volume, into a symmetric multilinear object. This is the mechanism behind later inequalities: Brunn--Minkowski, Minkowski's first inequality, and Alexandrov--Fenchel can all be read as positivity or concavity statements about the coefficients introduced here. The Steiner formula is the first major application because it extracts geometric boundary data from the same polynomial.
[/remark]
Mixed volumes provide coefficients, but the deeper question is what inequalities those coefficients satisfy. Chapter 8 studies the Minkowski and Alexandrov-Fenchel inequalities, which govern the strongest concavity and positivity phenomena in the Brunn-Minkowski theory.
# 8. Minkowski and Alexandrov-Fenchel Inequalities
This chapter studies the strongest volume inequalities that arise from the mixed-volume polynomial. Chapters 6 and 7 introduced Brunn--Minkowski, the Steiner formula, and mixed volumes; here the focus shifts to the inequalities satisfied by the coefficients that appear when several convex bodies are added at once. Minkowski's first inequality is the first bridge between Brunn--Minkowski and mixed volumes, while the Alexandrov--Fenchel inequality is the deep structural result from which many classical inequalities follow.
## From Brunn--Minkowski to Minkowski's First Inequality
What does the Brunn--Minkowski inequality say about the first-order change of volume under Minkowski addition? If $K,L \subset \mathbb R^n$ are convex bodies, then the function $t \mapsto V(K+tL)^{1/n}$ is concave on $[0,\infty)$ in the sense forced by Brunn--Minkowski. Differentiating this concavity at $t=0$ leads to an inequality for the mixed volume $V(K[n-1],L)$.
[definition: Mixed Volume Notation]
Let $K_1,\dots,K_m \subset \mathbb R^n$ be convex bodies and let $i_1+\cdots+i_m=n$. The notation
\begin{align*}
V(K_1[i_1],\dots,K_m[i_m])
\end{align*}
means the mixed volume in which $K_j$ appears $i_j$ times.
[/definition]
Thus $V(K[n])=V(K)$ is ordinary $n$-dimensional volume, and $V(K[n-1],L)$ is the coefficient measuring the first variation of $V(K+tL)$ at $t=0$.
To turn this notation into a usable differential statement, one needs a precise link between the polynomial expansion of $V(K+tL)$ and the derivative at the origin. The issue is normalization: the coefficient of $t$ in the mixed-volume polynomial differs from the derivative by the factor coming from the $n$ possible positions of $L$ among the arguments. The next formula fixes that convention before inequalities are applied.
[quotetheorem:4124]
[citeproof:4124]
The formula turns a geometric infinitesimal question into a coefficient extraction problem. It says that the first-order expansion of volume under the perturbation $K+tL$ is governed by the mixed volume with one copy of $L$ and $n-1$ copies of $K$. This interpretation is important because it separates the analytic operation of differentiating from the geometric content of the coefficient. Minkowski's first inequality is the resulting lower bound on this first coefficient.
[quotetheorem:4125]
[citeproof:4125]
[example: Euclidean Balls in Minkowski First Inequality]
Let $K=B(0,r)$ and $L=B(0,s)$ in $\mathbb R^n$, with $r,s>0$, and write $\omega_n=V(B(0,1))$. For $t\ge 0$, Minkowski addition of concentric Euclidean balls gives
\begin{align*}
K+tL
&=B(0,r)+tB(0,s) \\
&=B(0,r)+B(0,ts) \\
&=B(0,r+ts),
\end{align*}
because the sum of two vectors of lengths at most $r$ and $ts$ has length at most $r+ts$, and every vector of length at most $r+ts$ can be split into parallel vectors of lengths at most $r$ and $ts$.
Therefore
\begin{align*}
V(K+tL)
&=V(B(0,r+ts)) \\
&=\omega_n(r+ts)^n \\
&=\omega_n\sum_{j=0}^{n}\binom{n}{j}r^{n-j}(ts)^j \\
&=\sum_{j=0}^{n}\binom{n}{j}t^j\omega_n r^{n-j}s^j.
\end{align*}
The mixed-volume polynomial is
\begin{align*}
V(K+tL)=\sum_{j=0}^{n}\binom{n}{j}t^jV(K[n-j],L[j]).
\end{align*}
Comparing the coefficient of $t$ in the two displayed expansions gives
\begin{align*}
nV(K[n-1],L)=n\omega_n r^{n-1}s,
\end{align*}
so
\begin{align*}
V(K[n-1],L)=\omega_n r^{n-1}s.
\end{align*}
Also $V(K)=\omega_n r^n$ and $V(L)=\omega_n s^n$, hence
\begin{align*}
V(K[n-1],L)^n
&=(\omega_n r^{n-1}s)^n \\
&=\omega_n^n r^{n(n-1)}s^n,
\end{align*}
while
\begin{align*}
V(K)^{n-1}V(L)
&=(\omega_n r^n)^{n-1}(\omega_n s^n) \\
&=\omega_n^{n-1}r^{n(n-1)}\omega_n s^n \\
&=\omega_n^n r^{n(n-1)}s^n.
\end{align*}
Thus $V(K[n-1],L)^n=V(K)^{n-1}V(L)$, exactly as expected since the two concentric balls are homothetic.
[/example]
The first inequality can be read as a mixed-volume form of the isoperimetric principle. When $L=B(0,1)$, the quantity $nV(K[n-1],B(0,1))$ is the surface area term identified in Chapter 7's Steiner formula, so the inequality recovers the route from Brunn--Minkowski to the classical isoperimetric inequality developed in Chapter 6.
## The Alexandrov--Fenchel Inequality
Minkowski's first inequality controls the mixed volume when all but one of the entries agree. What happens when two entries vary while the remaining $n-2$ entries are fixed? The Alexandrov--Fenchel inequality answers this and is the central convex-geometric strengthening of the Cauchy--Schwarz inequality.
[quotetheorem:4126]
[citeproof:4126]
Alexandrov--Fenchel is deeper than Brunn--Minkowski because it controls the mixed-volume form itself, not only the one-parameter volume radius. After freezing $n-2$ entries, the inequality says that the remaining two slots obey a rigid quadratic log-concavity law. This is why so many coefficient inequalities for parallel bodies and mixed volumes can be recovered from it.
[remark: Quadratic Mixed-Volume Analogy]
One useful way to remember the shape of Alexandrov--Fenchel is to freeze $K_3,\dots,K_n$ and view $V(\,\cdot\,,\,\cdot\,,K_3,\dots,K_n)$ as a symmetric two-variable mixed-volume pairing. The theorem says that the mixed term controls the two diagonal terms in a log-concavity inequality. This is an analogy for the algebraic form of the inequality, not a claim that the mixed-volume pairing behaves like an ordinary positive definite inner product.
[/remark]
The inequality becomes especially transparent in the plane, where mixed area can be compared directly with ordinary area.
[example: Planar Mixed Area Inequality]
In dimension $2$, the Alexandrov--Fenchel inequality has no fixed auxiliary bodies, so it reads
\begin{align*}
V(K,L)^2 \ge V(K)V(L).
\end{align*}
Take parallel rectangles $K$ and $L$ with side lengths $(a,b)$ and $(c,d)$, where $a,b,c,d>0$. Up to translation, write
\begin{align*}
K=[0,a]\times[0,b],
\qquad
L=[0,c]\times[0,d].
\end{align*}
For $t\ge 0$,
\begin{align*}
K+tL
&=([0,a]\times[0,b])+([0,tc]\times[0,td]) \\
&=[0,a+tc]\times[0,b+td],
\end{align*}
because addition in each coordinate gives $[0,a]+[0,tc]=[0,a+tc]$ and $[0,b]+[0,td]=[0,b+td]$. Hence
\begin{align*}
V(K+tL)
&=(a+tc)(b+td) \\
&=ab+a(td)+(tc)b+(tc)(td) \\
&=ab+t(ad+bc)+t^2cd.
\end{align*}
The mixed-area polynomial in dimension $2$ is
\begin{align*}
V(K+tL)=V(K)+2tV(K,L)+t^2V(L).
\end{align*}
Since $V(K)=ab$ and $V(L)=cd$, comparison of the coefficient of $t$ gives
\begin{align*}
2V(K,L)=ad+bc,
\end{align*}
so
\begin{align*}
V(K,L)=\frac{ad+bc}{2}.
\end{align*}
Therefore the mixed-area inequality becomes
\begin{align*}
\left(\frac{ad+bc}{2}\right)^2 &\ge abcd.
\end{align*}
Multiplying by $4$ gives the equivalent inequality
\begin{align*}
(ad+bc)^2 &\ge 4abcd.
\end{align*}
The difference between the two sides is
\begin{align*}
(ad+bc)^2-4abcd
&=a^2d^2+2abcd+b^2c^2-4abcd \\
&=a^2d^2-2abcd+b^2c^2 \\
&=(ad-bc)^2 \\
&\ge 0.
\end{align*}
Thus the planar mixed-area inequality for parallel rectangles is exactly the elementary inequality $(ad+bc)^2\ge 4abcd$, with equality precisely when $ad=bc$.
[/example]
The planar case is useful because it shows the shape of the general theorem without the analytic machinery. The expression $V(K,L)$ behaves like a geometric inner product, but its inequality reflects concavity of area under Minkowski addition.
## Mixed Discriminants and Ellipsoids
Why do ellipsoids provide a concrete test case for the mixed-volume inequalities? An ellipsoid can be encoded by a positive definite matrix, and the mixed volumes of aligned ellipsoidal data are closely related to mixed discriminants. This gives a finite-dimensional algebraic model for Alexandrov--Fenchel.
[definition: Mixed Discriminant]
Let $A_1,\dots,A_n$ be real symmetric $n\times n$ matrices. The mixed discriminant $D(A_1,\dots,A_n)$ is the coefficient of $t_1\cdots t_n$ in
\begin{align*}
\det(t_1A_1+\cdots+t_nA_n).
\end{align*}
[/definition]
The definition is multilinear and symmetric in the matrices. When all entries are equal, $D(A,\dots,A)=n!\det A$ under this coefficient convention.
The mixed discriminant is useful because positive semidefinite matrices behave like an algebraic shadow of convex bodies. The determinant is the analogue of volume, and the coefficient of $t_1\cdots t_n$ plays the role of a mixed volume. Alexandrov--Fenchel therefore predicts a quadratic log-concavity inequality for these coefficients, but now the statement can be made entirely in matrix language.
[quotetheorem:4127]
[citeproof:4127]
[example: Diagonal Mixed Discriminants]
Suppose $A_j=\operatorname{diag}(a_{j1},\dots,a_{jn})$ with all $a_{ji}\ge 0$. Since sums of diagonal matrices are diagonal, we have
\begin{align*}
t_1A_1+\cdots+t_nA_n
&=\operatorname{diag}\left(\sum_{j=1}^n t_j a_{j1},\dots,\sum_{j=1}^n t_j a_{jn}\right).
\end{align*}
The determinant of a diagonal matrix is the product of its diagonal entries, so
\begin{align*}
\det(t_1A_1+\cdots+t_nA_n)
&=\prod_{i=1}^n\left(\sum_{j=1}^n t_j a_{ji}\right).
\end{align*}
To obtain the coefficient of $t_1\cdots t_n$, one must choose from the $i$th factor a term $t_{\sigma(i)}a_{\sigma(i),i}$, and each variable $t_1,\dots,t_n$ must be chosen exactly once. Thus $\sigma$ ranges over all permutations of $\{1,\dots,n\}$, and
\begin{align*}
D(A_1,\dots,A_n)
&=\sum_{\sigma\in S_n}\prod_{i=1}^n a_{\sigma(i),i}.
\end{align*}
This is the permanent of the matrix $(a_{ji})$, up to the harmless convention of indexing rows by $j$ and columns by $i$.
For example, in dimension $2$, let
\begin{align*}
A=\operatorname{diag}(a_1,a_2),
\qquad
B=\operatorname{diag}(b_1,b_2).
\end{align*}
Then
\begin{align*}
\det(t_1A+t_2B)
&=(t_1a_1+t_2b_1)(t_1a_2+t_2b_2) \\
&=t_1^2a_1a_2+t_1t_2a_1b_2+t_1t_2b_1a_2+t_2^2b_1b_2,
\end{align*}
so
\begin{align*}
D(A,B)=a_1b_2+b_1a_2.
\end{align*}
Similarly,
\begin{align*}
\det(t_1A+t_2A)
&=(t_1+t_2)^2a_1a_2 \\
&=t_1^2a_1a_2+2t_1t_2a_1a_2+t_2^2a_1a_2,
\end{align*}
hence $D(A,A)=2a_1a_2$, and likewise $D(B,B)=2b_1b_2$. The inequality
\begin{align*}
D(A,B)^2\ge D(A,A)D(B,B)
\end{align*}
therefore becomes
\begin{align*}
(a_1b_2+b_1a_2)^2
&\ge (2a_1a_2)(2b_1b_2) \\
&=4a_1a_2b_1b_2.
\end{align*}
The difference between the two sides is
\begin{align*}
(a_1b_2+b_1a_2)^2-4a_1a_2b_1b_2
&=a_1^2b_2^2+2a_1a_2b_1b_2+b_1^2a_2^2-4a_1a_2b_1b_2 \\
&=a_1^2b_2^2-2a_1a_2b_1b_2+b_1^2a_2^2 \\
&=(a_1b_2-b_1a_2)^2 \\
&\ge 0.
\end{align*}
Thus, for diagonal matrices, mixed discriminants become permanental coefficients, and the Alexandrov--Fenchel inequality becomes an explicit log-concavity inequality between sums of products.
[/example]
This algebraic example is not a substitute for the geometric theorem, but it is a reliable way to remember its form. Mixed volumes generalise determinant-like coefficients, and Alexandrov--Fenchel says that these coefficients satisfy a robust log-concavity law.
## Log-Concavity of Quermassintegrals
The next question is what Alexandrov--Fenchel says when most entries are Euclidean balls. This specialisation produces the quermassintegrals, the coefficients in the Steiner formula for parallel bodies. Their log-concavity is one of the main usable consequences of Alexandrov--Fenchel.
[definition: Quermassintegrals]
Let $K \subset \mathbb R^n$ be a convex body and let $B=B(0,1)$. The quermassintegrals $W_0(K),\dots,W_n(K)$ are defined by the Steiner formula
\begin{align*}
V(K+tB)=\sum_{j=0}^{n}\binom{n}{j}W_j(K)t^j, \qquad t\ge 0.
\end{align*}
[/definition]
Thus $W_0(K)=V(K)$, $nW_1(K)$ is the surface area when $K$ has sufficiently regular boundary, and $W_n(K)=V(B)$. In mixed-volume notation,
\begin{align*}
W_j(K)=V(K[n-j],B[j]).
\end{align*}
Since each $W_j(K)$ is a mixed volume with copies of the Euclidean ball inserted, Alexandrov--Fenchel should impose concavity constraints on the whole Steiner coefficient sequence. The useful form is not ordinary concavity but log-concavity, because the mixed-volume inequality compares the square of one coefficient with its two neighbours. This gives a numerical regularity theorem for parallel-body volumes.
[quotetheorem:4128]
[citeproof:4128]
[example: Quermassintegrals of a Ball]
Let $K=B(0,r)$ with $r>0$, and let $B=B(0,1)$. For $t\ge 0$,
\begin{align*}
K+tB
&=B(0,r)+tB(0,1) \\
&=B(0,r)+B(0,t) \\
&=B(0,r+t),
\end{align*}
because if $\|x\|\le r$ and $\|y\|\le t$, then $\|x+y\|\le r+t$, and conversely every $z$ with $\|z\|\le r+t$ can be written as
\begin{align*}
z=\frac{r}{r+t}z+\frac{t}{r+t}z,
\end{align*}
where the two summands have norms at most $r$ and $t$ respectively.
Therefore
\begin{align*}
V(K+tB)
&=V(B(0,r+t)) \\
&=\omega_n(r+t)^n \\
&=\omega_n\sum_{j=0}^{n}\binom{n}{j}r^{n-j}t^j \\
&=\sum_{j=0}^{n}\binom{n}{j}\omega_n r^{n-j}t^j.
\end{align*}
The Steiner formula writes the same volume as
\begin{align*}
V(K+tB)=\sum_{j=0}^{n}\binom{n}{j}W_j(K)t^j.
\end{align*}
Comparing the coefficient of $t^j$ in the two polynomial expansions gives
\begin{align*}
\binom{n}{j}W_j(K)=\binom{n}{j}\omega_n r^{n-j},
\end{align*}
and since $\binom{n}{j}\ne 0$,
\begin{align*}
W_j(K)=\omega_n r^{n-j}.
\end{align*}
For $1\le j\le n-1$, this gives
\begin{align*}
W_j(K)^2
&=(\omega_n r^{n-j})^2 \\
&=\omega_n^2 r^{2n-2j},
\end{align*}
while
\begin{align*}
W_{j-1}(K)W_{j+1}(K)
&=(\omega_n r^{n-(j-1)})(\omega_n r^{n-(j+1)}) \\
&=(\omega_n r^{n-j+1})(\omega_n r^{n-j-1}) \\
&=\omega_n^2 r^{(n-j+1)+(n-j-1)} \\
&=\omega_n^2 r^{2n-2j}.
\end{align*}
Thus $W_j(K)^2=W_{j-1}(K)W_{j+1}(K)$ for every $1\le j\le n-1$, exactly the equality pattern produced by the fact that $K$ and $B$ are homothetic balls.
[/example]
The log-concavity theorem says that the sequence of intrinsic volume coefficients has no internal dips on a logarithmic scale. This is the coefficient-level shadow of Alexandrov--Fenchel.
## Generalized Isoperimetric Inequalities
How do the preceding coefficient inequalities recover and extend the classical isoperimetric inequality? The classical inequality compares volume and surface area; the generalized inequalities compare all neighbouring quermassintegrals. Alexandrov--Fenchel supplies the log-concavity needed to interpolate between the endpoints $W_0(K)$ and $W_n(K)$.
[quotetheorem:4129]
[citeproof:4129]
Taking $i=0$ and $j=1$ recovers the usual isoperimetric inequality in the form relating $V(K)$ and surface area. Larger values of $j$ compare volume with mean width, total mean curvature, and the higher intrinsic volumes.
[example: Recovering the Classical Isoperimetric Inequality]
Let $S(K)=nW_1(K)$ denote surface area and $V(K)=W_0(K)$. Applying the generalized isoperimetric inequality with $i=0$ and $j=1$ gives
\begin{align*}
\left(\frac{W_0(K)}{W_0(B)}\right)^{1/n}
&\le
\left(\frac{W_1(K)}{W_1(B)}\right)^{1/(n-1)}.
\end{align*}
For the unit ball $B=B(0,1)$, we have $W_0(B)=V(B)=\omega_n$ and $W_1(B)=\omega_n$, while $W_0(K)=V(K)$ and $W_1(K)=S(K)/n$. Substituting these identities gives
\begin{align*}
\left(\frac{V(K)}{\omega_n}\right)^{1/n}
&\le
\left(\frac{S(K)/n}{\omega_n}\right)^{1/(n-1)} \\
&=
\left(\frac{S(K)}{n\omega_n}\right)^{1/(n-1)}.
\end{align*}
Both sides are nonnegative, so raising to the power $n(n-1)$ preserves the inequality:
\begin{align*}
\left(\left(\frac{V(K)}{\omega_n}\right)^{1/n}\right)^{n(n-1)}
&\le
\left(\left(\frac{S(K)}{n\omega_n}\right)^{1/(n-1)}\right)^{n(n-1)}.
\end{align*}
Using $(x^\alpha)^\beta=x^{\alpha\beta}$ for $x\ge 0$, this becomes
\begin{align*}
\left(\frac{V(K)}{\omega_n}\right)^{n-1}
&\le
\left(\frac{S(K)}{n\omega_n}\right)^n.
\end{align*}
Expanding the two sides gives
\begin{align*}
\frac{V(K)^{n-1}}{\omega_n^{n-1}}
&\le
\frac{S(K)^n}{n^n\omega_n^n}.
\end{align*}
Multiplying by $n^n\omega_n^n$ yields
\begin{align*}
n^n\omega_n V(K)^{n-1}
&\le
S(K)^n,
\end{align*}
or equivalently
\begin{align*}
S(K)^n \ge n^n\omega_n V(K)^{n-1}.
\end{align*}
Thus the generalized isoperimetric inequality recovers the classical isoperimetric inequality, with equality for Euclidean balls.
[/example]
The hierarchy of inequalities shows why Alexandrov--Fenchel is often treated as the master inequality of the Brunn--Minkowski theory. Brunn--Minkowski controls volume under addition, Minkowski's first inequality controls the first mixed coefficient, and Alexandrov--Fenchel controls the whole family of mixed coefficients.
The mixed-volume inequalities point toward a canonical way to normalize arbitrary convex bodies. Chapter 9 uses John ellipsoids to compare a general body with a Euclidean ball after an affine change of coordinates, linking extremal geometry to approximation.
# 9. John Ellipsoid and Approximation by Euclidean Balls
This chapter returns from mixed-volume inequalities to affine normalisation: it explains how an arbitrary convex body can be approximated, after an affine change of coordinates, by a Euclidean ball. It uses the earlier material on convex bodies, supporting hyperplanes, affine maps, and Euclidean volume. The central object is the maximum-volume ellipsoid contained in the body, called the John ellipsoid, and the main theorem says that the quality of this approximation is controlled only by the dimension, with a better constant for centrally symmetric bodies.
## The Maximum-Volume Inscribed Ellipsoid
Given a convex body $K \subset \mathbb R^n$, the first problem is to find the best ellipsoid lying inside $K$. A largest Euclidean ball is not the right affine invariant object: after stretching coordinates, a ball becomes an ellipsoid, and the numerical radius of the largest ball can change dramatically. Ellipsoids are exactly affine images of the Euclidean ball, so maximizing the volume of an inscribed ellipsoid is the affine version of asking for the largest round object that fits inside a possibly very non-round body.
For John ellipsoids the relevant case is a full-dimensional convex body: a non-empty compact convex set $K \subset \mathbb R^n$ with non-empty interior. Compactness ensures finite volume and prevents maximizing ellipsoids from escaping, while non-empty interior gives room for an inscribed ellipsoid of positive volume. Convexity is also essential: without it, a set could contain large pieces far apart from each other while admitting no single ellipsoid that reflects its global shape. These hypotheses make the maximum-volume problem geometric rather than pathological.
[definition: Ellipsoid]
An ellipsoid in $\mathbb R^n$ is a set of the form
\begin{align*}
E = a + T B(0,1),
\end{align*}
where $a \in \mathbb R^n$ and $T: \mathbb R^n \to \mathbb R^n$ is an invertible linear map.
[/definition]
The volume of this ellipsoid is $\mathcal L^n(E) = |\det T|\mathcal L^n(B(0,1))$. Thus maximizing the volume of an inscribed ellipsoid is equivalent to maximizing $|\det T|$ among affine images $a+TB(0,1)$ contained in $K$.
A convex body may contain many inscribed ellipsoids with different centers, axes, and volumes, so there is no canonical choice until one imposes an optimization criterion. Volume is the affine-invariant quantity that singles out the best-fitting inscribed ellipsoid, provided the supremum is actually attained. The term below names such a maximizing ellipsoid, separating the definition of the object from the later existence and uniqueness theorem.
[definition: John Ellipsoid]
Let $K \subset \mathbb R^n$ be a convex body. A John ellipsoid of $K$ is an ellipsoid $E \subset K$ such that
\begin{align*}
\mathcal L^n(E) = \sup\{\mathcal L^n(F) : F \subset K \text{ is an ellipsoid}\}.
\end{align*}
[/definition]
The definition raises two separate issues: a largest-volume inscribed ellipsoid must actually exist, and it must be unique if it is to serve as a canonical approximation to $K$. Existence is not automatic from the definition alone, because one has to rule out degenerating ellipsoids and control maximizing sequences inside the compact body. Uniqueness is equally important, since otherwise symmetries of $K$ would not determine a single normalized position.
[quotetheorem:4130]
[citeproof:4130]
The theorem is not a constructive algorithm for finding the ellipsoid; it says that the optimization problem has a well-defined answer. Compactness prevents maximizing sequences from escaping to infinity, non-empty interior rules out the degenerate zero-volume case, and convexity ensures that the limit of admissible ellipsoids remains controlled by supporting hyperplanes. Uniqueness is the feature that lets the John ellipsoid behave canonically under symmetries and affine changes of coordinates. This leads to the next structural fact: applying an invertible affine map to $K$ sends its John ellipsoid to the John ellipsoid of the image.
[quotetheorem:4131]
[citeproof:4131]
Because of affine equivariance, the main theorem can be proved after normalizing the John ellipsoid to be $B(0,1)$. The invertibility of the affine map is essential: a non-invertible map can collapse volume, destroy interior, and send many different ellipsoids to the same lower-dimensional set. The affine nature is also essential; a nonlinear map need not send ellipsoids to ellipsoids, so it does not preserve the optimization problem. For example, a shear or dilation changes the coordinates but preserves the class of ellipsoids, while a quadratic map does not. The normalized situation is called John position.
[definition: John Position]
A convex body $K \subset \mathbb R^n$ is in John position if its John ellipsoid is the Euclidean unit ball $B(0,1)$.
[/definition]
In John position, the geometry of $K$ is encoded by the points where $K$ touches the unit ball.
To state the optimality conditions precisely, we need a name for the active boundary points where the containment $B(0,1) \subset K$ is tight. These are the points at which supporting hyperplanes can obstruct infinitesimal attempts to enlarge or move the ball.
[definition: Contact Point]
Let $K \subset \mathbb R^n$ be a convex body with $B(0,1) \subset K$. A contact point of $K$ with $B(0,1)$ is a point $u \in \partial K \cap \partial B(0,1)$.
[/definition]
At a contact point $u$, the hyperplane $\{x \in \mathbb R^n : u \cdot x = 1\}$ supports the unit ball.
The remaining question is how many such active constraints are needed, and how they must be arranged, to certify that the unit ball is the maximal-volume inscribed ellipsoid. John's characterization answers this by converting the geometry of contact hyperplanes into a finite algebraic decomposition. In the statement, $u\otimes u$ denotes the rank-one linear operator $x\mapsto (u\cdot x)u$, equivalently the matrix $uu^\top$ in Euclidean coordinates.
[quotetheorem:4132]
[citeproof:4132]
The identity decomposition is the algebraic heart of the theorem. John position is necessary here: if the unit ball is merely contained in $K$ but is not the maximal-volume inscribed ellipsoid, the active supporting hyperplanes need not balance translation and linear deformation. Contact points record exactly where the containment constraint blocks first-order attempts to enlarge or move the ball. The equations say that these blocking constraints are not concentrated in a lower-dimensional set; they span all directions through the identity matrix. Taking traces gives $\sum_i c_i = n$, since $|u_i|=1$ for every contact point.
[illustration:john-position-contact-hyperplanes]
[example: John Ellipsoid of an Ellipsoid]
Let $K=a+TB(0,1)$, where $T$ is invertible. By definition, $K$ is itself an ellipsoid, and it is contained in $K$ because every set is contained in itself. If $F\subset K$ is any ellipsoid, then monotonicity of Euclidean volume gives
\begin{align*}
\mathcal L^n(F)\le \mathcal L^n(K).
\end{align*}
Since the ellipsoid $K$ is one of the admissible ellipsoids in the supremum
\begin{align*}
\sup\{\mathcal L^n(F):F\subset K \text{ is an ellipsoid}\},
\end{align*}
we also have
\begin{align*}
\sup\{\mathcal L^n(F):F\subset K \text{ is an ellipsoid}\}\ge \mathcal L^n(K).
\end{align*}
The two inequalities force the supremum to equal $\mathcal L^n(K)$, and the admissible ellipsoid attaining it is $K$ itself. Therefore the John ellipsoid of $K$ is $K$.
Under the affine map $A(x)=T^{-1}(x-a)$, one has
\begin{align*}
A(K)
&= \{T^{-1}(x-a):x\in a+TB(0,1)\}\\
&= \{T^{-1}(a+Ty-a):y\in B(0,1)\}\\
&= \{y:y\in B(0,1)\}\\
&= B(0,1).
\end{align*}
Thus an ellipsoid is already in its own John position after the natural affine normalization sending it to the Euclidean unit ball.
[/example]
## John Theorem for Symmetric and General Bodies
The main question is how much larger $K$ can be than its maximum-volume inscribed ellipsoid. The answer is dimension-dependent and sharp in order: after normalizing $E=B(0,1)$, every centrally symmetric body lies inside $\sqrt n B(0,1)$, and every convex body lies inside $nB(0,1)$.
[definition: Centrally Symmetric Convex Body]
A convex body $K \subset \mathbb R^n$ is centrally symmetric about $0$ if
\begin{align*}
x \in K \implies -x \in K.
\end{align*}
[/definition]
Centrally symmetric bodies have no preferred direction of translation, which is why their John approximation is stronger.
After putting such a body in John position, the remaining question is how far a point of $K$ can extend from the unit ball. Symmetry converts every supporting contact direction into a constraint in the opposite direction as well, so the body is controlled like the unit ball of a norm rather than a one-sided convex set. This two-sided control is what permits a $\sqrt n$ enlargement instead of an order-$n$ enlargement.
[quotetheorem:4133]
[citeproof:4133]
The symmetry step explains exactly where the constant improves: it turns the one-sided supporting inequalities $u_i \cdot x \le 1$ into two-sided estimates $|u_i \cdot x| \le 1$. The theorem does not say that $K$ is close to a Euclidean ball in its original coordinates; the affine normalization is part of the statement. Symmetry is also genuinely needed for the $\sqrt n$ constant: a regular simplex is highly unbalanced around any contact facet and reaches the larger order-$n$ general bound. Thus the symmetric theorem is a statement about norm balls, where every direction and its negative are present on equal footing.
Without central symmetry, the contact inequalities control the body only from the supporting side. A point of $K$ can lie far in the opposite direction from many contact points, so the previous two-sided estimate is no longer available. The general theorem asks how much containment remains from John position alone, and the answer is a weaker but still dimension-controlled dilation.
[quotetheorem:4134]
[citeproof:4134]
The general estimate is weaker because the body may be much longer in directions opposite its supporting contact points. The proof still uses more than scalar averaging: the lower bound $u_i\cdot x\ge -|x|$ comes from the fact that the $u_i$ are unit vectors in the same Euclidean space as $x$. The theorem does not identify which directions are extremal, nor does it imply that most of $K$ is near the ball. A regular simplex shows that the factor $n$ cannot be replaced by a universal constant independent of $n$, and in fact attains the correct sharp behaviour for the general nonsymmetric case.
[example: John Ellipsoid of a Cube]
Consider the cube $K=[-1,1]^n$, and let $E$ be its John ellipsoid. The cube is invariant under every signed permutation matrix $Q$, meaning $QK=K$. By *Affine Equivariance of the John Ellipsoid* and uniqueness of the John ellipsoid, $Q(E)$ is again the John ellipsoid of $K$, so $Q(E)=E$ for every signed permutation matrix $Q$.
Write
\begin{align*}
E=\{x\in \mathbb R^n:(x-a)^T S^{-1}(x-a)\le 1\},
\end{align*}
where $a\in\mathbb R^n$ and $S$ is positive definite. The equality $Q(E)=E$ forces $Qa=a$ and $QSQ^T=S$ for every signed permutation matrix $Q$. Taking $Q$ to be the sign change in the $j$th coordinate gives $(Qa)_j=-a_j$, so $a_j=-a_j$ and hence $a_j=0$. Since this holds for every $j$, we have $a=0$.
Now write $S=(s_{ij})$. If $Q$ changes the sign of the $i$th coordinate and leaves the others fixed, then for $i\ne j$,
\begin{align*}
(QSQ^T)_{ij}=(-1)s_{ij}= -s_{ij}.
\end{align*}
Since $QSQ^T=S$, this gives $s_{ij}=-s_{ij}$, hence $s_{ij}=0$ for $i\ne j$. Thus $S$ is diagonal. If $P$ is a permutation matrix exchanging coordinates $i$ and $j$, then $PSP^T=S$ gives $s_{ii}=s_{jj}$. Therefore all diagonal entries are equal, say $s_{ii}=r^2$, and
\begin{align*}
E=\{x\in\mathbb R^n: |x|^2\le r^2\}=B(0,r).
\end{align*}
The ball $B(0,1)$ is contained in $K$ because if $|x|\le 1$, then for every coordinate,
\begin{align*}
|x_j|\le |x|\le 1.
\end{align*}
Conversely, if $B(0,r)\subset K$, then $re_j\in K$ for each coordinate vector $e_j$, so
\begin{align*}
|r|=|(re_j)_j|\le 1,
\end{align*}
and therefore $r\le 1$. Hence the largest inscribed Euclidean ball is $B(0,1)$, so the John ellipsoid of $[-1,1]^n$ is $B(0,1)$.
By *John Theorem for Centrally Symmetric Bodies*, the normalized inclusion is
\begin{align*}
[-1,1]^n\subset B(0,\sqrt n).
\end{align*}
This is sharp for the cube: if $v=(\varepsilon_1,\dots,\varepsilon_n)$ with each $\varepsilon_j\in\{-1,1\}$, then
\begin{align*}
|v|^2=\varepsilon_1^2+\cdots+\varepsilon_n^2=1+\cdots+1=n,
\end{align*}
so $|v|=\sqrt n$.
[/example]
The cube is controlled by coordinate slabs. The cross-polytope has a different symmetry, so its John ellipsoid is detected through sign and coordinate balance rather than through facets alone.
[example: John Ellipsoid of a Cross-Polytope]
Let
\begin{align*}
K=\{x\in\mathbb R^n: |x_1|+\cdots+|x_n|\le 1\}.
\end{align*}
If $Q$ is a signed permutation matrix, then $Q$ only permutes coordinates and changes signs, so
\begin{align*}
| (Qx)_1|+\cdots+|(Qx)_n|
=
|x_1|+\cdots+|x_n|.
\end{align*}
Hence $QK=K$. By *Affine Equivariance of the John Ellipsoid* and uniqueness of the John ellipsoid, $Q(E)=E$ for every signed permutation matrix $Q$.
Write
\begin{align*}
E=\{x\in\mathbb R^n:(x-a)^T S^{-1}(x-a)\le 1\},
\end{align*}
where $a\in\mathbb R^n$ and $S$ is positive definite. The equality $Q(E)=E$ forces $Qa=a$ and $QSQ^T=S$ for every signed permutation matrix $Q$. Taking $Q$ to change only the sign of the $j$th coordinate gives
\begin{align*}
(Qa)_j=-a_j.
\end{align*}
Since $Qa=a$, we get $a_j=-a_j$, hence $a_j=0$. This holds for every $j$, so $a=0$.
Now let $S=(s_{ij})$. If $Q$ changes the sign of the $i$th coordinate and leaves the others fixed, then for $i\ne j$,
\begin{align*}
(QSQ^T)_{ij}=(-1)s_{ij}=-s_{ij}.
\end{align*}
Since $QSQ^T=S$, this gives $s_{ij}=-s_{ij}$, so $s_{ij}=0$ for $i\ne j$. Thus $S$ is diagonal. If $P$ is a permutation matrix exchanging coordinates $i$ and $j$, then $PSP^T=S$ gives $s_{ii}=s_{jj}$. Therefore $S=r^2 I_n$ for some $r>0$, and
\begin{align*}
E=\{x\in\mathbb R^n: |x|^2\le r^2\}=B(0,r).
\end{align*}
It remains to find the largest $r$ for which $B(0,r)\subset K$. For any $x\in\mathbb R^n$, Cauchy-Schwarz gives
\begin{align*}
|x_1|+\cdots+|x_n|
&= (1,\dots,1)\cdot (|x_1|,\dots,|x_n|)\\
&\le |(1,\dots,1)|\, |(|x_1|,\dots,|x_n|)|\\
&= \sqrt{1+\cdots+1}\,\sqrt{|x_1|^2+\cdots+|x_n|^2}\\
&= \sqrt n\, |x|.
\end{align*}
Thus if $|x|\le 1/\sqrt n$, then
\begin{align*}
|x_1|+\cdots+|x_n|\le \sqrt n\,|x|\le \sqrt n\cdot \frac{1}{\sqrt n}=1,
\end{align*}
so $B(0,1/\sqrt n)\subset K$.
Conversely, suppose $B(0,r)\subset K$. Let
\begin{align*}
v=\left(\frac r{\sqrt n},\dots,\frac r{\sqrt n}\right).
\end{align*}
Then
\begin{align*}
|v|^2
&=\left(\frac r{\sqrt n}\right)^2+\cdots+\left(\frac r{\sqrt n}\right)^2\\
&=n\cdot \frac{r^2}{n}\\
&=r^2,
\end{align*}
so $v\in B(0,r)\subset K$. Therefore
\begin{align*}
1
&\ge |v_1|+\cdots+|v_n|\\
&=\frac r{\sqrt n}+\cdots+\frac r{\sqrt n}\\
&=n\cdot \frac r{\sqrt n}\\
&=r\sqrt n.
\end{align*}
Hence $r\le 1/\sqrt n$. The largest inscribed Euclidean ball is therefore $B(0,1/\sqrt n)$, and since the John ellipsoid has already been forced by symmetry to be a Euclidean ball, the John ellipsoid of the cross-polytope is $B(0,1/\sqrt n)$.
[/example]
The simplex is the asymmetric test case: unlike the cube or cross-polytope, its extremal ellipsoid reflects barycentric rather than coordinate symmetry.
[example: John Ellipsoid of a Simplex]
Translate the regular simplex so that its barycentre is $0$, and write its vertices as $v_0,\dots,v_n \in \mathbb R^n$. Since the simplex is regular, all vertices have the same Euclidean length, say $|v_k|=R$, and the barycentre condition gives
\begin{align*}
v_0+\cdots+v_n=0.
\end{align*}
For a regular simplex the off-diagonal inner products are all equal. If $v_i\cdot v_j=\alpha$ for $i\ne j$, then taking the inner product of $\sum_{j=0}^n v_j=0$ with $v_i$ gives
\begin{align*}
0
&=v_i\cdot \sum_{j=0}^n v_j\\
&=v_i\cdot v_i+\sum_{\substack{0\le j\le n\\ j\ne i}}v_i\cdot v_j\\
&=R^2+n\alpha.
\end{align*}
Hence $\alpha=-R^2/n$, so
\begin{align*}
v_i\cdot v_j=-\frac{R^2}{n}\qquad (i\ne j).
\end{align*}
The facet opposite $v_i$ is the convex hull of the vertices $v_j$ with $j\ne i$. For every such vertex,
\begin{align*}
v_i\cdot v_j=-\frac{R^2}{n},
\end{align*}
so that facet lies in the hyperplane
\begin{align*}
\left\{x\in\mathbb R^n:v_i\cdot x=-\frac{R^2}{n}\right\}.
\end{align*}
The vertex $v_i$ itself satisfies $v_i\cdot v_i=R^2>-R^2/n$, so the simplex is contained in the half-space
\begin{align*}
v_i\cdot x\ge -\frac{R^2}{n}.
\end{align*}
Equivalently, with
\begin{align*}
u_i=-\frac{v_i}{R},
\end{align*}
the supporting inequality is
\begin{align*}
u_i\cdot x\le \frac{R}{n}.
\end{align*}
Thus the distance from $0$ to every facet is $R/n$, so the insphere is
\begin{align*}
E=B\left(0,\frac{R}{n}\right).
\end{align*}
To see that this insphere is the John ellipsoid, use the symmetries of the regular simplex. Every symmetry fixes the barycentre and permutes the vertices. By *Affine Equivariance of the John Ellipsoid* and uniqueness of the John ellipsoid, the John ellipsoid is fixed by all these symmetries. The corresponding linear action on the barycentric hyperplane has no preferred direction, so the only invariant ellipsoid centred at $0$ is a Euclidean ball. Since any inscribed Euclidean ball centred at $0$ must satisfy all facet inequalities, its radius is at most the distance $R/n$ to the facets. Therefore the largest such ball is exactly $B(0,R/n)$, and this is the John ellipsoid.
Finally, the vertices show that the factor $n$ in the general John theorem is attained. Since each vertex satisfies
\begin{align*}
|v_i|=R=n\cdot \frac{R}{n},
\end{align*}
the simplex is contained in $nE=B(0,R)$, and its vertices lie on the boundary of $nE$. Thus for a regular simplex the inclusion $K\subset nE$ is sharp in the vertex directions.
[/example]
## Banach-Mazur Distance and Ellipsoidal Approximation
John theorem says that every finite-dimensional normed geometry is within controlled distance of Euclidean geometry. The remaining difficulty is that convex bodies may look very different before choosing coordinates, so ordinary Hausdorff distance or direct radius comparison is too rigid. The Banach-Mazur distance packages the affine question by measuring how well two convex bodies can be matched by affine transformations and dilations.
[definition: Banach-Mazur Distance of Convex Bodies]
Let $K,L \subset \mathbb R^n$ be convex bodies. The Banach-Mazur distance between $K$ and $L$ is
\begin{align*}
d_{BM}(K,L) = \inf\{\lambda \ge 1 : \exists A \text{ invertible affine with } A(K) \subset L \subset \lambda A(K)\}.
\end{align*}
[/definition]
For centrally symmetric bodies it is common to restrict to invertible linear maps and bodies centred at $0$. In that setting the definition agrees with the Banach-Mazur distance between the associated normed spaces.
Once the distance is defined, the basic problem is to estimate how far an arbitrary convex body can be from the Euclidean ball after the best affine change of coordinates. John theorem supplies exactly this universal comparison, with different dimension dependence in the symmetric and non-symmetric cases.
[quotetheorem:4135]
[citeproof:4135]
These bounds are often the first approximation step in high-dimensional convex geometry: put $K$ in John position, prove an estimate for bodies between $B(0,1)$ and $nB(0,1)$, and then transfer the result back by affine invariance. The hypotheses matter because Banach-Mazur distance is designed for full-dimensional convex bodies; without interior, the affine image may live in a lower-dimensional subspace and the comparison with $B(0,1)$ breaks down. The estimate is also coarse: it gives a worst-case affine approximation and does not preserve volume distribution, face structure, or finer metric invariants. The cube and regular simplex show that the same sharp dimension dependence visible in John theorem is already present in Banach-Mazur distance from the Euclidean ball. This connection is one entry point to finite-dimensional normed spaces and the local theory of Banach spaces, where convex bodies are studied through their unit balls.
[example: Norms Compared by John Position]
Let $K \subset \mathbb R^n$ be a centrally symmetric convex body with $0 \in \operatorname{int} K$, and define its Minkowski functional by
\begin{align*}
\|x\|_K=\inf\{t>0:x\in tK\}.
\end{align*}
After a linear change of variables putting $K$ in John position, *John Theorem for Centrally Symmetric Bodies* gives
\begin{align*}
B(0,1)\subset K\subset B(0,\sqrt n).
\end{align*}
We first prove the upper bound. If $x=0$, then $0\in tK$ for every $t>0$, so $\|0\|_K=0\le |0|$. If $x\ne 0$, then
\begin{align*}
\left|\frac{x}{|x|}\right|=1,
\end{align*}
so $x/|x|\in B(0,1)\subset K$. Multiplying by $|x|$ gives
\begin{align*}
x\in |x|K.
\end{align*}
Thus $|x|$ is one admissible value in the infimum defining $\|x\|_K$, and hence
\begin{align*}
\|x\|_K\le |x|.
\end{align*}
For the lower bound, let $t>0$ be such that $x\in tK$. Since $K\subset B(0,\sqrt n)$, scaling gives
\begin{align*}
tK\subset tB(0,\sqrt n)=B(0,t\sqrt n).
\end{align*}
Therefore $x\in B(0,t\sqrt n)$, which means
\begin{align*}
|x|\le t\sqrt n.
\end{align*}
Dividing by $\sqrt n$ gives
\begin{align*}
\frac{1}{\sqrt n}|x|\le t.
\end{align*}
This holds for every admissible $t$ in the definition of $\|x\|_K$, so taking the infimum over all such $t$ yields
\begin{align*}
\frac{1}{\sqrt n}|x|\le \|x\|_K.
\end{align*}
Combining the two estimates,
\begin{align*}
\frac{1}{\sqrt n}|x|\le \|x\|_K\le |x|.
\end{align*}
Thus the geometric containment of $K$ between two Euclidean balls becomes a two-sided comparison between the norm with unit ball $K$ and the Euclidean norm.
[/example]
This example records the dimensional sharpness of John's theorem: the factor cannot be removed by a better choice of coordinates in general.
[remark: Sharpness and Dimension Dependence]
The cube shows that the symmetric constant $\sqrt n$ is attained for the inclusion of $K$ in its John ellipsoid after normalization. The regular simplex shows that the general constant $n$ is attained. Thus John theorem gives dimension-dependent constants of the correct order in both cases.
[/remark]
The main lesson is that ellipsoids provide the correct affine substitute for Euclidean balls. A convex body may have many incompatible widths in its original coordinates, but in John position those widths are controlled by the contact point decomposition of the identity.
Once a convex body has been put in a well-controlled affine position, it becomes natural to study it probabilistically. Chapter 10 shifts to uniform measure on convex bodies and asks how geometric regularity is reflected in random points, averages, and concentration phenomena.
# 10. Convex Bodies and Probability
This chapter brings the preceding geometric theory into contact with probability. A convex body is now viewed not only through its support function, mixed volumes, or ellipsoids, but also as a probability space carrying its uniform measure. The main questions are how to normalise a body by its first and second moments, why convexity produces log-concavity after projection, and what high dimension says about the fluctuations of natural functions.
## Uniform Measure and Isotropic Position
How should a convex body be placed in space if translations and linear changes of variables are allowed? Chapter 9 used affine maps to simplify geometry by putting a body near its John ellipsoid. Without a moment normalisation, the same body can be stretched in one direction or translated far from the origin, and probabilistic quantities such as coordinate variances then reflect the chosen coordinates rather than the intrinsic geometry. For probabilistic questions the corresponding normalisation is moment-based: move the barycenter to the origin and make the covariance matrix a scalar multiple of the identity.
[definition: Uniform Measure on a Convex Body]
Let $K \subset \mathbb R^n$ be a convex body with non-empty interior. The uniform probability measure on $K$ is the Borel probability measure $\mu_K$ defined by
\begin{align*}
\mu_K(A) = \frac{\mathcal L^n(A \cap K)}{\mathcal L^n(K)}
\end{align*}
for every Borel set $A \subset \mathbb R^n$.
[/definition]
The uniform measure converts geometric averages over $K$ into expectations. If $X \sim \mu_K$, then integration over $K$ may be written as expectation with respect to $X$.
To turn this probabilistic viewpoint into a coordinate normalisation, we need the first two moments of the uniform distribution. The first moment records where the body is centered, while the second centered moment records how its mass spreads in each direction.
[definition: Barycenter and Covariance Matrix]
Let $K \subset \mathbb R^n$ be a convex body and let $X \sim \mu_K$. The barycenter of $K$ is the vector $b(K) \in \mathbb R^n$ given by
\begin{align*}
b(K) = \mathbb E[X] = \frac{1}{\mathcal L^n(K)} \int_K x\,d\mathcal L^n(x).
\end{align*}
The covariance matrix of $K$ is the positive semidefinite symmetric matrix $\operatorname{Cov}(K) \in \operatorname{Sym}_n^+$ given by
\begin{align*}
\operatorname{Cov}(K) = \mathbb E[(X-b(K)) \otimes (X-b(K))].
\end{align*}
[/definition]
For a direction $u \in \mathbb R^n$, the quadratic form of the covariance matrix is the variance of the coordinate projection:
\begin{align*}
u^\top \operatorname{Cov}(K)u = \mathbb E[((X-b(K))\cdot u)^2].
\end{align*}
Thus the covariance matrix records the average squared width of $K$ in every direction, weighted by volume rather than by support planes.
The desired normal form is the one in which these averaged widths no longer prefer any direction. After also fixing the volume and moving the barycenter to the origin, this gives the moment analogue of placing a body in a canonical affine position.
[definition: Isotropic Convex Body]
A convex body $K \subset \mathbb R^n$ with $\mathcal L^n(K)=1$ is isotropic if
\begin{align*}
b(K) &= 0, & \operatorname{Cov}(K) &= L_K^2 I_n
\end{align*}
for some constant $L_K>0$.
[/definition]
The number $L_K$ is the isotropic constant of $K$ in this normalisation. Some authors allow arbitrary volume and absorb the volume factor into the definition; the present convention separates volume-one scaling from moment normalisation.
The existence question is whether every full-dimensional convex body can actually be moved into this normal form by an invertible affine map. The obstruction would be a degenerate covariance matrix or an affine constraint that prevents equalising all directional variances without collapsing the body.
[quotetheorem:4136]
[citeproof:4136]
The non-empty interior hypothesis is essential: if $K$ lies in a proper affine subspace, then some non-zero linear functional is constant on $K$, so the covariance matrix is singular and no invertible affine map can make it a positive multiple of $I_n$. The affine map must be invertible because isotropic position is a normalisation of the same full-dimensional body, not a projection that discards degenerate directions. Isotropic position is not unique, because orthogonal maps preserve the defining conditions, and the theorem does not assert any dimension-free bound for $L_K$; that boundedness question is precisely the slicing problem introduced later. Thus the result is best understood as a coordinate normal form, analogous to putting a positive definite quadratic form into standard Euclidean shape.
[example: Isotropic Normalisation of the Cube]
Let $Q=[-1,1]^n$ and let $X\sim\mu_Q$. Since
\begin{align*}
\mathcal L^n(Q)=2^n,
\end{align*}
the density of $X$ is $2^{-n}\mathbf 1_Q$. For each coordinate,
\begin{align*}
\mathbb E[X_i]
&=\frac{1}{2^n}\int_{[-1,1]^n}x_i\,dx_1\cdots dx_n\\
&=\frac{1}{2^n}\left(\int_{-1}^1 x_i\,dx_i\right)\prod_{k\ne i}\left(\int_{-1}^1 dx_k\right)\\
&=\frac{1}{2^n}\cdot 0\cdot 2^{n-1}
=0,
\end{align*}
so $b(Q)=0$. For $i\ne j$,
\begin{align*}
\mathbb E[X_iX_j]
&=\frac{1}{2^n}\int_{[-1,1]^n}x_ix_j\,dx_1\cdots dx_n\\
&=\frac{1}{2^n}\left(\int_{-1}^1 x_i\,dx_i\right)\left(\int_{-1}^1 x_j\,dx_j\right)\prod_{k\ne i,j}\left(\int_{-1}^1 dx_k\right)\\
&=\frac{1}{2^n}\cdot 0\cdot 0\cdot 2^{n-2}
=0,
\end{align*}
while
\begin{align*}
\mathbb E[X_i^2]
&=\frac{1}{2^n}\int_{[-1,1]^n}x_i^2\,dx_1\cdots dx_n\\
&=\frac{1}{2^n}\left(\int_{-1}^1 x_i^2\,dx_i\right)\prod_{k\ne i}\left(\int_{-1}^1 dx_k\right)\\
&=\frac{1}{2^n}\left(\frac{x_i^3}{3}\Big|_{-1}^1\right)2^{n-1}\\
&=\frac{1}{2^n}\cdot \frac{2}{3}\cdot 2^{n-1}
=\frac{1}{3}.
\end{align*}
Because the barycenter is $0$, these second moments are the covariance entries, and therefore
\begin{align*}
\operatorname{Cov}(Q)=\frac{1}{3}I_n.
\end{align*}
The volume-one cube is $\widetilde Q=\frac12 Q=[-\frac12,\frac12]^n$, since
\begin{align*}
\mathcal L^n(\widetilde Q)=\left(\frac12\right)^n\mathcal L^n(Q)=\left(\frac12\right)^n2^n=1.
\end{align*}
If $\widetilde X=\frac12X$, then
\begin{align*}
b(\widetilde Q)&=\mathbb E[\widetilde X]=\frac12\mathbb E[X]=0,\\
\operatorname{Cov}(\widetilde Q)
&=\mathbb E[\widetilde X\otimes \widetilde X]
=\mathbb E\left[\left(\frac12X\right)\otimes\left(\frac12X\right)\right]
=\frac14\operatorname{Cov}(Q)
=\frac{1}{12}I_n.
\end{align*}
Thus the cube already has isotropic covariance directions; passing to volume one only changes the scalar, giving $L_{\widetilde Q}=1/\sqrt{12}$ in this normalisation.
[/example]
The cube illustrates that product symmetry can make the covariance computation diagonal. Rotational symmetry gives an even stronger simplification for the Euclidean ball.
[example: Isotropic Normalisation of the Euclidean Ball]
Let $B_2^n=B(0,1)\subset\mathbb R^n$ and let $X\sim\mu_{B_2^n}$. For each coordinate $i$, the sign-change map $(x_1,\ldots,x_i,\ldots,x_n)\mapsto(x_1,\ldots,-x_i,\ldots,x_n)$ preserves $B_2^n$ and Lebesgue measure, while it changes $x_i$ to $-x_i$. Hence
\begin{align*}
\mathbb E[X_i]
&=\frac{1}{\mathcal L^n(B_2^n)}\int_{B_2^n}x_i\,d\mathcal L^n(x)\\
&=-\frac{1}{\mathcal L^n(B_2^n)}\int_{B_2^n}x_i\,d\mathcal L^n(x)
=-\mathbb E[X_i],
\end{align*}
so $\mathbb E[X_i]=0$ for every $i$ and therefore $b(B_2^n)=0$.
For $i\ne j$, the same sign-change in the $i$-th coordinate preserves $B_2^n$ and sends $x_ix_j$ to $-x_ix_j$, so
\begin{align*}
\mathbb E[X_iX_j]=-\mathbb E[X_iX_j]=0.
\end{align*}
Permuting coordinates preserves $B_2^n$ and Lebesgue measure, so the diagonal second moments are all equal. Thus there is a number $a>0$ such that
\begin{align*}
\operatorname{Cov}(B_2^n)=aI_n.
\end{align*}
Since $b(B_2^n)=0$,
\begin{align*}
\mathbb E[|X|^2]
&=\mathbb E[X_1^2+\cdots+X_n^2]\\
&=\mathbb E[X_1^2]+\cdots+\mathbb E[X_n^2]\\
&=a+\cdots+a
=na.
\end{align*}
Using polar coordinates, with the surface-area factor cancelling between numerator and denominator,
\begin{align*}
\mathbb E[|X|^2]
&=\frac{\int_{B_2^n}|x|^2\,d\mathcal L^n(x)}{\mathcal L^n(B_2^n)}\\
&=\frac{\int_0^1 r^2r^{n-1}\,dr}{\int_0^1 r^{n-1}\,dr}\\
&=\frac{\int_0^1 r^{n+1}\,dr}{\int_0^1 r^{n-1}\,dr}\\
&=\frac{\frac{1}{n+2}}{\frac{1}{n}}
=\frac{n}{n+2}.
\end{align*}
Comparing $na=\mathbb E[|X|^2]$ gives
\begin{align*}
na=\frac{n}{n+2},
\end{align*}
and hence
\begin{align*}
a=\frac{1}{n+2}.
\end{align*}
Therefore
\begin{align*}
\operatorname{Cov}(B_2^n)=\frac{1}{n+2}I_n.
\end{align*}
Let $\kappa_n=\mathcal L^n(B_2^n)$ and set $\widetilde B=\kappa_n^{-1/n}B_2^n$. Then
\begin{align*}
\mathcal L^n(\widetilde B)
&=(\kappa_n^{-1/n})^n\mathcal L^n(B_2^n)\\
&=\kappa_n^{-1}\kappa_n
=1.
\end{align*}
If $\widetilde X=\kappa_n^{-1/n}X$, then
\begin{align*}
b(\widetilde B)
&=\mathbb E[\widetilde X]
=\kappa_n^{-1/n}\mathbb E[X]
=0,
\end{align*}
and
\begin{align*}
\operatorname{Cov}(\widetilde B)
&=\mathbb E[\widetilde X\otimes \widetilde X]\\
&=\mathbb E[(\kappa_n^{-1/n}X)\otimes(\kappa_n^{-1/n}X)]\\
&=\kappa_n^{-2/n}\mathbb E[X\otimes X]\\
&=\frac{\kappa_n^{-2/n}}{n+2}I_n.
\end{align*}
Thus the Euclidean ball is isotropic after volume-one scaling, with
\begin{align*}
L_{\widetilde B}=\frac{\kappa_n^{-1/n}}{\sqrt{n+2}}.
\end{align*}
Rotational symmetry makes the covariance scalar, and the radial integral determines the scalar exactly.
[/example]
## Log-Concave Measures and Marginals
Why do convex bodies give probability measures that remain well behaved after projection? For an arbitrary density, projecting can create several separated peaks or deep valleys, so the marginal need not retain any convex-geometric structure. The key point is that convexity of sets becomes log-concavity of densities and marginal densities. This is the analytic form of the same phenomenon behind Brunn-Minkowski: intermediate slices have at least the expected geometric size.
Recall from Chapter 6 that a function $f:\mathbb R^n \to [0,\infty)$ is log-concave if
\begin{align*}
f((1-t)x+ty) \ge f(x)^{1-t} f(y)^t
\end{align*}
for all $x,y \in \mathbb R^n$ and all $t \in [0,1]$. The indicator function $\mathbb{1}_K$ of a convex set $K$ is log-concave by the segment condition.
[definition: Log-Concave Measure]
A Borel probability measure $\mu$ on $\mathbb R^n$ is log-concave if for all nonempty compact sets $A,B\subset\mathbb R^n$ and all $t\in[0,1]$,
\begin{align*}
\mu((1-t)A+tB)\ge \mu(A)^{1-t}\mu(B)^t.
\end{align*}
When $\mu$ has a density $f$ with respect to Lebesgue measure on its affine support, this is equivalent to log-concavity of $f$ on that support.
[/definition]
Uniform measures on convex bodies are the first examples: their densities are constant multiples of $\mathbf 1_K$. Gaussian measures also fit this framework because the logarithm of the Gaussian density is a concave quadratic function up to an additive constant.
[example: Uniform Measures as Log-Concave Measures]
Let $K \subset \mathbb R^n$ be a convex body. By the definition of the uniform measure on $K$, its density with respect to Lebesgue measure is
\begin{align*}
f_K(x)=\frac{1}{\mathcal L^n(K)}\mathbf 1_K(x).
\end{align*}
We verify log-concavity directly. Fix $x,y\in\mathbb R^n$ and $t\in[0,1]$.
If $x,y\in K$, then convexity gives $(1-t)x+ty\in K$, so
\begin{align*}
f_K((1-t)x+ty)
&=\frac{1}{\mathcal L^n(K)},\\
f_K(x)^{1-t}f_K(y)^t
&=\left(\frac{1}{\mathcal L^n(K)}\right)^{1-t}
\left(\frac{1}{\mathcal L^n(K)}\right)^t\\
&=\left(\frac{1}{\mathcal L^n(K)}\right)^{(1-t)+t}
=\frac{1}{\mathcal L^n(K)}.
\end{align*}
Thus equality holds in the log-concavity inequality in this case.
If at least one of $x,y$ is not in $K$, then at least one of $f_K(x)$ and $f_K(y)$ is $0$, and hence
\begin{align*}
f_K(x)^{1-t}f_K(y)^t=0
\end{align*}
with the usual endpoint convention when $t=0$ or $t=1$. Since $f_K((1-t)x+ty)\ge 0$, we again have
\begin{align*}
f_K((1-t)x+ty)\ge f_K(x)^{1-t}f_K(y)^t.
\end{align*}
Therefore $f_K$ is log-concave, so the uniform measure on every convex body is a log-concave probability measure.
[/example]
The central structural question is whether this property survives when some coordinates are forgotten. Prekopa's theorem answers this by turning the geometry of fibres into an analytic statement about marginal densities.
[quotetheorem:4137]
[citeproof:4137]
The theorem says that log-concavity is stable under marginalisation, which is a strong restriction because marginalisation usually smooths and mixes densities without preserving convexity of level sets. For arbitrary densities the conclusion fails: two separated bumps can project to a marginal with two peaks and a dip between them. Prekopa does not by itself give Gaussian-type concentration or tail bounds; it only preserves the one-dimensional convexity structure of the logarithm of the density. In geometric language, if a convex body is projected onto a lower-dimensional subspace, then the density of the projected uniform measure is log-concave, and later concentration estimates start from this preserved structure.
The measure-theoretic version packages this projection operation as a statement about probability measures. Instead of starting with a function on a product space and integrating out variables, one begins with a log-concave probability measure and asks whether its pushforward is still log-concave. This formulation is deliberately measure-theoretic: a non-injective linear map can produce a measure supported on a lower-dimensional affine subspace, where log-concavity should be read on that support rather than as full-dimensional Lebesgue density.
[quotetheorem:4138]
[citeproof:4138]
The conclusion is therefore stronger than a statement about full-dimensional densities: it survives projections and other linear maps even when the image measure is singular in the ambient space. When the map has full rank onto its target, the statement can be read in the familiar density language; when the image lies in a lower-dimensional subspace, the same log-concavity is interpreted on that affine support. This result prepares the geometric slice-volume examples below, where the relevant marginal is full-dimensional and its density is literally the volume of a fibre.
[example: Slice Volumes of a Convex Body]
Let $K \subset \mathbb R^{n+m}$ be a convex body, and for $x\in\mathbb R^n$ define the fibre
\begin{align*}
K_x=\{y\in\mathbb R^m:(x,y)\in K\}.
\end{align*}
Then the slice-volume function is
\begin{align*}
g(x)=\mathcal L^m(K_x).
\end{align*}
If $X=(U,V)$ is uniformly distributed on $K$, then the density of $X$ is
\begin{align*}
f(u,v)=\frac{1}{\mathcal L^{n+m}(K)}\mathbf 1_K(u,v).
\end{align*}
The marginal density of $U$ is therefore
\begin{align*}
h(x)
&=\int_{\mathbb R^m} f(x,y)\,d\mathcal L^m(y)\\
&=\int_{\mathbb R^m}\frac{1}{\mathcal L^{n+m}(K)}\mathbf 1_K(x,y)\,d\mathcal L^m(y)\\
&=\frac{1}{\mathcal L^{n+m}(K)}
\int_{\mathbb R^m}\mathbf 1_{\{y:(x,y)\in K\}}(y)\,d\mathcal L^m(y)\\
&=\frac{1}{\mathcal L^{n+m}(K)}\mathcal L^m(K_x)\\
&=\frac{g(x)}{\mathcal L^{n+m}(K)}.
\end{align*}
Thus $g=\mathcal L^{n+m}(K)h$, so multiplying the marginal density by the positive constant $\mathcal L^{n+m}(K)$ does not change log-concavity.
Since $K$ is convex, its indicator $\mathbf 1_K$ is log-concave, and therefore the density $f=\mathcal L^{n+m}(K)^{-1}\mathbf 1_K$ is log-concave as well. By *Prekopa Theorem*, the marginal density $h$ is log-concave on its support. Hence for $x_0,x_1$ in the projection of $K$ and $t\in[0,1]$,
\begin{align*}
g((1-t)x_0+tx_1)
&=\mathcal L^{n+m}(K)h((1-t)x_0+tx_1)\\
&\ge \mathcal L^{n+m}(K)h(x_0)^{1-t}h(x_1)^t\\
&=\mathcal L^{n+m}(K)
\left(\frac{g(x_0)}{\mathcal L^{n+m}(K)}\right)^{1-t}
\left(\frac{g(x_1)}{\mathcal L^{n+m}(K)}\right)^t\\
&=\mathcal L^{n+m}(K)
\frac{g(x_0)^{1-t}g(x_1)^t}
{(\mathcal L^{n+m}(K))^{(1-t)+t}}\\
&=g(x_0)^{1-t}g(x_1)^t.
\end{align*}
So the volumes of parallel slices of a convex body vary log-concavely across the projection.
[/example]
The preceding example is analytic, because it treats slice volume as a density. The geometric question is why convex slices should have log-concave volume in the first place. The theorem below supplies the mechanism: convexity forces intermediate fibres to contain Minkowski combinations of endpoint fibres, and Brunn-Minkowski then turns that containment into concavity of fibre-volume roots.
[quotetheorem:4139]
[citeproof:4139]
This result explains why Brunn-Minkowski is a geometric origin of log-concavity. Convexity is essential: for an arbitrary measurable set, the fibre over a midpoint can be tiny or empty even when the endpoint fibres have large measure, because the set need not contain the segments joining the endpoint fibres. The theorem also gives only concavity of the $m$-th root of fibre volume, not detailed information about the shape of individual fibres or sharp tail estimates for the marginal. Prekopa extends the same principle from indicator functions of convex sets to general log-concave densities.
[quotetheorem:4140]
[citeproof:4140]
Borell lemma is a basic moment-comparison principle for log-concave measures. The log-concavity hypothesis is doing real work: for a probability measure with heavy tails, a high moment may be infinite even when a lower moment is finite, so no constant depending only on $p$ and $q$ can exist. The conclusion is not a concentration inequality by itself, because it compares moments of a seminorm rather than giving a direct exponential tail bound. Its role in the course is to let estimates proved at one moment level be transferred to other moment levels uniformly over the dimension and over the particular log-concave measure.
## Concentration on Spheres and Convex Bodies
What changes when the dimension is large? High-dimensional convex geometry is governed by concentration: many Lipschitz functions are close to their typical value on most of the space. The sphere is the clean model case, and convex bodies motivate harder analogues such as the thin-shell and slicing problems.
[definition: Lipschitz Function on a Metric Space]
Let $(X,d)$ be a metric space. A function $f:X\to\mathbb R$ is $L$-Lipschitz if
\begin{align*}
|f(x)-f(y)| \le Ld(x,y)
\end{align*}
for all $x,y \in X$.
[/definition]
For $S^{n-1} \subset \mathbb R^n$, the metric may be the Euclidean distance inherited from $\mathbb R^n$ or the geodesic distance on the sphere. These metrics are comparable at the scale relevant for concentration estimates, so the course states the principle in a Euclidean Lipschitz form.
With the Lipschitz condition fixed, the key high-dimensional question is how large the exceptional set can be where such a function differs noticeably from its typical value. Levy's lemma gives the model answer on the sphere: deviations are exponentially small once measured at the natural scale set by the Lipschitz constant and the dimension.
[quotetheorem:4141]
[citeproof:4141]
Levy lemma is the fundamental concentration theorem for the sphere. The Lipschitz hypothesis is essential, because a function with sharp spikes on small caps can have large deviations on those caps without violating measurability or boundedness. The estimate is specific to the highly symmetric spherical measure and does not automatically transfer to every convex body; isotropic log-concave measures require separate arguments and generally weaker bounds. In these notes the lemma supplies the model for high-dimensional concentration: deviations of a stable observable occur at scale $L/\sqrt n$.
[example: Concentration of a Coordinate on the Sphere]
Let $X$ be uniformly distributed on $S^{n-1}$ and set $f(x)=x_1$. For $x,y\in S^{n-1}$,
\begin{align*}
|f(x)-f(y)|
&=|x_1-y_1|\\
&\le \left((x_1-y_1)^2+\cdots+(x_n-y_n)^2\right)^{1/2}\\
&=|x-y|,
\end{align*}
so $f$ is $1$-Lipschitz with respect to Euclidean distance. The reflection
\begin{align*}
R(x_1,x_2,\ldots,x_n)=(-x_1,x_2,\ldots,x_n)
\end{align*}
preserves $S^{n-1}$ and the rotation-invariant probability measure, and sends the set $\{x:x_1>0\}$ onto $\{x:x_1<0\}$. Since the equator $\{x:x_1=0\}$ has spherical measure $0$, we get
\begin{align*}
\sigma_{n-1}(\{x:x_1>0\})
&=\sigma_{n-1}(\{x:x_1<0\}),\\
\sigma_{n-1}(\{x:x_1>0\})+\sigma_{n-1}(\{x:x_1<0\})
&=1,
\end{align*}
and therefore each of these two hemispheres has measure $1/2$. Thus $0$ is a median of $f$.
Applying *Levy Lemma* with $L=1$ and $m_f=0$ gives, for every $t>0$,
\begin{align*}
\mathbb P(|X_1|\ge t)
&=\sigma_{n-1}(\{x\in S^{n-1}:|f(x)-0|\ge t\})\\
&\le C\exp(-cnt^2).
\end{align*}
This says that deviations of one coordinate occur on the scale $t\sim n^{-1/2}$: indeed, if $t=A/\sqrt n$, then
\begin{align*}
\mathbb P\left(|X_1|\ge \frac{A}{\sqrt n}\right)
&\le C\exp\left(-cn\frac{A^2}{n}\right)\\
&=C\exp(-cA^2).
\end{align*}
The exact one-dimensional density is
\begin{align*}
p_n(s)=c_n(1-s^2)^{(n-3)/2},\qquad -1\le s\le 1,
\end{align*}
where $c_n$ is the normalising constant chosen so that $\int_{-1}^1p_n(s)\,ds=1$. This has the same scale: when $s=A/\sqrt n$,
\begin{align*}
(1-s^2)^{(n-3)/2}
&=\left(1-\frac{A^2}{n}\right)^{(n-3)/2},
\end{align*}
which is comparable to a Gaussian factor in $A$ for fixed $A$ and large $n$. Thus a typical coordinate of a uniformly chosen point on $S^{n-1}$ has size of order $n^{-1/2}$.
[/example]
The sphere example concerns a fixed coordinate on a highly symmetric space. For convex bodies, the analogous question is often radial: after isotropic normalisation, does most mass sit near one Euclidean radius?
[example: Euclidean Norm in an Isotropic Convex Body]
Let $K \subset \mathbb R^n$ be isotropic with volume $1$, and let $X\sim\mu_K$. By isotropicity,
\begin{align*}
b(K)&=0, &
\operatorname{Cov}(K)&=L_K^2I_n.
\end{align*}
Since $b(K)=\mathbb E[X]=0$, the covariance matrix is
\begin{align*}
\operatorname{Cov}(K)
&=\mathbb E[(X-b(K))\otimes (X-b(K))]\\
&=\mathbb E[X\otimes X].
\end{align*}
Taking the $i$-th diagonal entry gives
\begin{align*}
\mathbb E[X_i^2]
&=(\operatorname{Cov}(K))_{ii}\\
&=(L_K^2I_n)_{ii}
=L_K^2.
\end{align*}
Therefore
\begin{align*}
\mathbb E[|X|^2]
&=\mathbb E[X_1^2+\cdots+X_n^2]\\
&=\mathbb E[X_1^2]+\cdots+\mathbb E[X_n^2]\\
&=L_K^2+\cdots+L_K^2\\
&=nL_K^2.
\end{align*}
Thus the root-mean-square Euclidean radius is $\sqrt{\mathbb E[|X|^2]}=\sqrt n\,L_K$. The thin-shell question asks whether $|X|$ is sharply concentrated around this scale for every isotropic convex body, with bounds independent of the particular shape of $K$. This shows how isotropic position turns a geometric question about $K$ into a probabilistic fluctuation question for the random variable $|X|$.
[/example]
The two main open problems below separate the scale of an isotropic body from the fluctuations around that scale. Slicing asks whether the scale itself can be universally bounded, while thin-shell asks whether the radial distribution is universally narrow once the scale is fixed.
[definition: Slicing Problem]
The slicing problem asks whether there exists a universal constant $C>0$ such that every isotropic convex body $K \subset \mathbb R^n$ of volume $1$ satisfies $L_K \le C$, independently of $n$ and $K$.
[/definition]
The slicing problem is also called the hyperplane conjecture. It is equivalent to asking whether every volume-one convex body has a hyperplane section of volume bounded below by a universal positive constant after the correct affine normalisation.
[definition: Thin-Shell Problem]
The thin-shell problem asks for dimension-free control on the fluctuations of $|X|$ when $X$ is uniformly distributed on an isotropic convex body $K \subset \mathbb R^n$.
[/definition]
A typical formulation seeks bounds for
\begin{align*}
\mathbb E\left[\left(\frac{|X|}{\sqrt n L_K}-1\right)^2\right]
\end{align*}
or for the probability that $|X|$ deviates from $\sqrt n L_K$. The problem measures whether mass in an isotropic convex body lies in a thin annulus.
[remark: Relation Between Slicing and Concentration]
Slicing controls the scale $L_K$, while thin-shell estimates control radial fluctuation once that scale is fixed. Both problems ask whether affine normalisation reveals universal high-dimensional behaviour across all convex bodies. This is why isotropic position, log-concavity, and concentration appear together in modern convex geometry.
[/remark]
The chapter ends by shifting the viewpoint from exact convex-geometric identities to probabilistic structure. Uniform measures on convex bodies form a central class of log-concave measures, Prekopa explains why projections preserve this class, and concentration inequalities describe the high-dimensional behaviour that remains after isotropic normalisation.
Contents
- 1. Convex Sets and Convex Bodies
- Convexity in the Correct Affine Space
- Convex Hulls and Finite Generation
- Basic Operations Preserving Convexity
- Standard Families of Convex Sets
- Radon and Helly Finiteness Principles
- Compact Convex Bodies as the Main Objects
- 2. Separation and Supporting Hyperplanes
- Linear Inequalities and Hyperplanes
- Supporting Hyperplanes and Exposed Faces
- Weak and Strong Separation
- Faces, Extreme Points, and Exposed Points
- Face Lattices of Polytopes
- Compact Convex Sets from Extreme Points
- 3. Support Functions and the Hausdorff Metric
- Measuring a Convex Body in a Direction
- Characterizing Which Functions Are Support Functions
- Recovering the Body from Supporting Half-Spaces
- The Hausdorff Metric
- Uniform Convergence of Support Functions
- 4. Minkowski Addition and Mixed Linear Structure
- Adding Convex Bodies
- Support Functions As Linear Coordinates
- Continuity In Hausdorff Distance
- Parallel Bodies And Outer Offsets
- 5. Polarity, Gauge Functions, and Duality
- Bodies Seen from the Origin
- Inclusion Reversal and the Bipolar Theorem
- Support Functions and Gauges Under Duality
- Polar Identities for Operations
- 6. Volume, Brunn-Minkowski, and Consequences
- Measuring Convex Bodies
- The Brunn-Minkowski Inequality
- Minkowski Interpolation And Concavity
- The Isoperimetric Inequality From Brunn-Minkowski
- The Prekopa-Leindler Viewpoint
- 7. Mixed Volumes and the Steiner Formula
- Volume as a Polynomial under Minkowski Addition
- Structural Properties of Mixed Volumes
- Quermassintegrals and Parallel Bodies
- Reading Mixed Coefficients Geometrically
- 8. Minkowski and Alexandrov-Fenchel Inequalities
- From Brunn--Minkowski to Minkowski's First Inequality
- The Alexandrov--Fenchel Inequality
- Mixed Discriminants and Ellipsoids
- Log-Concavity of Quermassintegrals
- Generalized Isoperimetric Inequalities
- 9. John Ellipsoid and Approximation by Euclidean Balls
- The Maximum-Volume Inscribed Ellipsoid
- John Theorem for Symmetric and General Bodies
- Banach-Mazur Distance and Ellipsoidal Approximation
- 10. Convex Bodies and Probability
- Uniform Measure and Isotropic Position
- Log-Concave Measures and Marginals
- Concentration on Spheres and Convex Bodies
Prerequisites (0/2 completed)
Log in to track your prerequisite progress.
Prerequisites Graph
Interactive dependency map showing prerequisite concepts
Loading dependency graph...
Theorem
Definition
Current
Requires
Rate this page
★
★
★
★
★
Poor
Excellent