Convexity is the geometry of not losing points while averaging. If two feasible designs are allowed but the straight path between them leaves the feasible region, then interpolation can create a forbidden design even before any optimisation or analysis begins.
[example: Two Feasible Points with a Forbidden Midpoint]
Let $S=D_- \cup D_+$, where
\begin{align*}
D_-=\{p\in\mathbb{R}^2:\|p-(-2,0)\|_2\leq 1\}
\end{align*}
and
\begin{align*}
D_+=\{p\in\mathbb{R}^2:\|p-(2,0)\|_2\leq 1\}.
\end{align*}
The point $x=(-2,0)$ lies in $D_-$ because
\begin{align*}
\|x-(-2,0)\|_2=\|(0,0)\|_2=0\leq 1,
\end{align*}
and the point $y=(2,0)$ lies in $D_+$ because
\begin{align*}
\|y-(2,0)\|_2=\|(0,0)\|_2=0\leq 1.
\end{align*}
Thus $x,y\in S$. Their midpoint is
\begin{align*}
\frac{x+y}{2}=\frac{(-2,0)+(2,0)}{2}=\frac{(0,0)}{2}=(0,0).
\end{align*}
This midpoint is not in $D_-$, since
\begin{align*}
\|(0,0)-(-2,0)\|_2=\|(2,0)\|_2=2>1,
\end{align*}
and it is not in $D_+$, since
\begin{align*}
\|(0,0)-(2,0)\|_2=\|(-2,0)\|_2=2>1.
\end{align*}
Therefore $(0,0)\notin S$, even though it is the average of two points of $S$. This shows that a set can contain two locally solid pieces while still failing to contain the interpolation forced by averaging.
[/example]
The failure in the example is not about smoothness, dimension, or coordinates. It is about whether the set contains each line segment joining two of its points, and that single requirement becomes the organizing principle for [convex geometry](/page/Convex%20Geometry).
## Definition
The basic question is how to formalize the phrase "no gaps along straight interpolation." A definition should mention only the segment joining two selected points, because longer finite averages can then be built from repeated two-point averages. This local segment test is the smallest condition that still controls the global averaging behavior of the set.
[definition: Convex Set]
Let $V$ be a real [vector space](/page/Vector%20Space). A convex set in $V$ is a subset $C \subset V$ such that for every $x,y \in C$ and every $t \in [0,1]$, the point
\begin{align*}
(1-t)x + ty
\end{align*}
belongs to $C$.
[/definition]
The definition says that every chord whose endpoints lie in $C$ lies entirely inside $C$. The parameter $t$ records position along the chord, so convexity is invariant under affine changes of coordinates rather than depending on a chosen origin.
The two-point condition is not enough notation for arguments that average several points at once. We need a way to record finite averaging without introducing an order of pairwise operations, while still preventing cancellation by negative coefficients and preserving affine invariance.
[definition: Convex Combination]
Let $V$ be a real vector space. A convex combination of points $x_1,\ldots,x_m \in V$ is a point of the form
\begin{align*}
\sum_{i=1}^m \lambda_i x_i,
\end{align*}
where $\lambda_i \geq 0$ for each $i$ and
\begin{align*}
\sum_{i=1}^m \lambda_i = 1.
\end{align*}
[/definition]
The weights behave like masses attached to the points. Negative weights would allow cancellation and directions outside the geometric span of the selected points, while the condition that the weights sum to $1$ keeps the construction affine.
There is a small gap between the definition of convexity and the way convex combinations are used: convexity only mentions two points, while a convex combination may involve many. The obstruction is that a finite weighted average is not visibly produced in one step from the definition; it must be assembled by repeatedly joining intermediate points.
To use convexity in calculations, we need to know that this repeated interpolation never leaves the set. Once that is established, checking membership in a convex set can be reduced to expressing a point as a finite convex combination of known points.
[quotetheorem:10092]
This result turns convexity into a finite [algebraic closure](/page/Algebraic%20Closure) property. In practice, it lets us prove that a point remains feasible by writing it as a weighted average of known feasible points.
[example: The Disk Is Convex]
Let $C=\{x\in\mathbb{R}^2:\|x\|_2\leq 1\}$. To show that $C$ is convex, take $x,y\in C$ and $t\in[0,1]$; we must prove that $(1-t)x+ty\in C$.
By the triangle inequality for the Euclidean norm,
\begin{align*}
\|(1-t)x+ty\|_2\leq \|(1-t)x\|_2+\|ty\|_2.
\end{align*}
Since $t\in[0,1]$, both $1-t$ and $t$ are nonnegative, so absolute homogeneity gives
\begin{align*}
\|(1-t)x\|_2+\|ty\|_2=(1-t)\|x\|_2+t\|y\|_2.
\end{align*}
Because $x,y\in C$, we have $\|x\|_2\leq 1$ and $\|y\|_2\leq 1$, hence
\begin{align*}
(1-t)\|x\|_2+t\|y\|_2\leq (1-t)\cdot 1+t\cdot 1.
\end{align*}
Finally,
\begin{align*}
(1-t)\cdot 1+t\cdot 1=1-t+t=1.
\end{align*}
Therefore $\|(1-t)x+ty\|_2\leq 1$, so $(1-t)x+ty\in C$. Every segment between two points of the disk remains inside the disk, and hence the disk is convex.
[/example]
The same computation works for balls in any [normed vector space](/page/Normed%20Vector%20Space) because it used only homogeneity and the triangle inequality. Convexity is therefore not a peculiarity of Euclidean circles; it is built into the geometry of norms.
The opposite question is just as important: given an arbitrary set, what is the smallest convex world that contains it? This construction is needed whenever a finite sample, a collection of constraints, or a set of extreme possibilities must be enlarged until averaging becomes legal.
[definition: Convex Hull]
Let $V$ be a real vector space and let $A \subset V$. The convex hull of $A$, denoted $\operatorname{conv}(A)$, is the intersection of all convex subsets of $V$ that contain $A$.
[/definition]
The intersection definition makes existence automatic because $V$ itself is convex and contains $A$. The more useful description is constructive: take all possible finite convex combinations of points of $A$.
The intersection definition is clean but not directly computable: it tells us what must contain the hull, not how to list its points. To use convex hulls in examples and calculations, we need to know that no points are missing from the finite convex combinations and no extra points have been added.
[quotetheorem:10093]
The theorem gives a practical recipe for building convex sets from raw data. Once the vertices or samples are known, the remaining points are exactly the weighted averages allowed by the convexity rule.
## Segments, Hulls, and Polytopes
### Line Segments as Atomic Convex Sets
A line segment is the smallest nonempty convex object determined by two points, so it is the atom from which the whole theory is assembled. Studying segments first also separates affine geometry from metric geometry: no distance or angle is required.
[definition: Line Segment]
Let $V$ be a real vector space and let $x,y \in V$. The line segment from $x$ to $y$ is
\begin{align*}
[x,y] = \{(1-t)x + ty : t \in [0,1]\}.
\end{align*}
[/definition]
With this notation, a set is convex exactly when it contains $[x,y]$ for every pair $x,y$ in the set. This reformulation is often the fastest way to detect both convexity and failure of convexity in examples.
When several convex restrictions are imposed simultaneously, feasibility means lying in all of the corresponding sets. The possible obstruction is that combining constraints might destroy convexity, so we need a closure principle showing that common solutions still respect line segments.
[quotetheorem:4084]
This theorem is why convex feasible regions can be built from many separate convex constraints. Half-spaces, balls, norm inequalities, and affine equations may be intersected without leaving the convex category.
Intersections handle adding constraints, but they do not explain what happens when the same convex region is viewed through a different set of variables. Projections, coordinate changes, translations, and affine slices all move points while preserving the operation of taking weighted averages. To make those operations part of the theory rather than an informal geometric habit, we need to name the maps that respect straight-line interpolation.
[definition: Affine Map]
Let $V$ and $W$ be real vector spaces. A map $T: V \to W$ is affine if there exist a [linear map](/page/Linear%20Map) $L: V \to W$ and a vector $b \in W$ such that
\begin{align*}
T(x) = Lx + b
\end{align*}
for every $x \in V$.
[/definition]
Affine maps are translations of linear maps. From this standard form, they preserve two-point affine combinations:
\begin{align*}
T((1-t)x+ty) = (1-t)T(x)+tT(y)
\end{align*}
for every $x,y \in V$ and $t \in [0,1]$. Thus an affine map sends line segments to line segments with the same parameter. This is useful only if it lets convexity survive the operations analysts and geometers actually perform: projecting a feasible region to fewer variables, pulling back a constraint from the target space, or changing coordinates before solving a problem. The next theorem records those two directions separately, because images and inverse images are used in different ways.
[quotetheorem:10094]
This theorem is the mechanism behind projections of feasible regions, affine slices such as $Ax=b$, and changes of coordinates. It keeps the elementary theory from being tied to one chosen representation of the same geometric set.
### Polyhedra and Polytopes
Finite-dimensional convex geometry often begins with linear inequalities rather than with curved bodies. A linear inequality selects one side of an affine hyperplane, so finite systems of inequalities create regions with flat faces.
[definition: Half-Space]
Let $V$ be a finite-dimensional real vector space, let $\ell: V \to \mathbb{R}$ be a nonconstant affine functional, and let $\alpha \in \mathbb{R}$. The subset
\begin{align*}
H = \{x \in V : \ell(x) \leq \alpha\}
\end{align*}
is a closed half-space.
[/definition]
The nonconstant condition excludes the degenerate inequalities that produce all of $V$ or $\varnothing$. Half-spaces are convex because affine functionals preserve line segments. They are the basic constraints in linear programming and the local models for the flat faces of a convex body.
A single half-space models one linear inequality, but linear optimisation usually imposes many such inequalities at once. We therefore need a name for the regions whose entire shape is determined by finitely many flat-sided constraints.
[definition: Polyhedron]
Let $V$ be a finite-dimensional real vector space. A polyhedron in $V$ is an intersection of finitely many closed half-spaces.
[/definition]
A bounded polyhedron has only a finite amount of room, and its boundary is assembled from flat pieces. Such sets are the geometric form of feasible regions in finite linear optimisation.
Polyhedra can be unbounded, so finite inequalities alone still include strips, cones, and half-lines that do not behave like finite-faced solids. Those examples have flat sides, but they do not have the finite extent expected of a polygon or a solid polyhedron.
The next distinction separates general feasible regions from the finite solid shapes that behave like polygons and three-dimensional polyhedra. To single out those compact finite-inequality objects, we impose boundedness and give them their own name, since they are the sets whose vertex-and-face structure is studied in combinatorial convexity.
[definition: Polytope]
Let $V$ be a finite-dimensional real vector space. A polytope in $V$ is a bounded polyhedron.
[/definition]
Polytopes are the finite-dimensional convex sets that can be drawn with vertices, edges, and faces. They provide the bridge between combinatorial geometry and analytic convexity.
[example: A Triangle as a Convex Hull]
Let $a,b,c \in \mathbb{R}^2$ be non-collinear. By *Convex Hull by Convex Combinations*, every point of $\operatorname{conv}\{a,b,c\}$ is a finite convex combination of points chosen from $\{a,b,c\}$. If such a combination is written as $\sum_{j=1}^m \mu_j p_j$, where $p_j\in\{a,b,c\}$, $\mu_j\geq 0$, and $\sum_{j=1}^m\mu_j=1$, set
\begin{align*}
\lambda_1=\sum_{j:p_j=a}\mu_j,\quad \lambda_2=\sum_{j:p_j=b}\mu_j,\quad \lambda_3=\sum_{j:p_j=c}\mu_j.
\end{align*}
Then $\lambda_1,\lambda_2,\lambda_3\geq 0$ and
\begin{align*}
\lambda_1+\lambda_2+\lambda_3=\sum_{j=1}^m\mu_j=1.
\end{align*}
Grouping the terms with the same vertex gives
\begin{align*}
\sum_{j=1}^m\mu_j p_j=\lambda_1a+\lambda_2b+\lambda_3c.
\end{align*}
Conversely, whenever $\lambda_i\geq 0$ and $\lambda_1+\lambda_2+\lambda_3=1$, the point $\lambda_1a+\lambda_2b+\lambda_3c$ is itself a convex combination of points of $\{a,b,c\}$. Hence
\begin{align*}
\operatorname{conv}\{a,b,c\}=\{\lambda_1a+\lambda_2b+\lambda_3c:\lambda_i\geq 0,\ \lambda_1+\lambda_2+\lambda_3=1\}.
\end{align*}
The coefficients $\lambda_1,\lambda_2,\lambda_3$ are the barycentric coordinates of the point. They are unique: if
\begin{align*}
\lambda_1a+\lambda_2b+\lambda_3c=\mu_1a+\mu_2b+\mu_3c
\end{align*}
and both triples have sum $1$, then subtracting and using $\lambda_3-\mu_3=-(\lambda_1-\mu_1)-(\lambda_2-\mu_2)$ gives
\begin{align*}
(\lambda_1-\mu_1)(a-c)+(\lambda_2-\mu_2)(b-c)=0.
\end{align*}
Since $a,b,c$ are non-collinear, the vectors $a-c$ and $b-c$ are linearly independent, so $\lambda_1=\mu_1$, $\lambda_2=\mu_2$, and then $\lambda_3=\mu_3$. If $\lambda_3=0$, the point is $\lambda_1a+\lambda_2b$ with $\lambda_1+\lambda_2=1$, so it lies on the edge $[a,b]$; the other two edges are obtained by setting $\lambda_1=0$ or $\lambda_2=0$. Points with all three coefficients positive avoid all three edges, so they are exactly the interior points of the triangle.
[/example]
This example shows how inequalities on coefficients encode geometric position. It also hints at the broader fact that finite convex hulls are exactly polytopes in finite-dimensional spaces.
The triangle example used vertices, while the definition of a polytope used inequalities and boundedness. These are a priori different finite descriptions: one builds a set from points, while the other cuts it out by linear constraints. Without a bridge between them, results about convex hulls and results about linear inequalities would apply to apparently different classes of objects.
The central question is whether these two descriptions actually determine the same shapes. We need to know both that every finite list of vertices produces a bounded intersection of half-spaces, and that every bounded finite system of linear inequalities can be recovered from finitely many vertices.
[quotetheorem:10095]
The theorem explains why pictures of polytopes can alternate between vertex lists and systems of inequalities. Algorithms for convex optimisation exploit this dual description, choosing the representation that makes the current step simpler.
## Convex Functions and Epigraphs
### Convexity Above a Graph
Convex sets are not only subsets of the original space; they also encode inequalities for functions. The geometric trick is to place a function inside one higher dimension and look at the region above its graph.
[definition: Epigraph]
Let $C \subset V$ be a subset of a real vector space and let $f: C \to \mathbb{R}$ be a function. The epigraph of $f$ is
\begin{align*}
\operatorname{epi}(f)
= \{(x,r) \in C \times \mathbb{R} : f(x) \leq r\}.
\end{align*}
[/definition]
The epigraph remembers every upper bound on the function value at every point. If the region above the graph is convex, then averaging inputs and heights respects the graph in exactly the way convex analysis needs.
The epigraph criterion is geometric, but many arguments with functions require an inequality that can be checked directly at two input points. The scalar condition must say that averaging inputs does no worse than averaging the corresponding function values.
[definition: Convex Function]
Let $C$ be a convex subset of a real vector space $V$. A function $f: C \to \mathbb{R}$ is convex if for every $x,y \in C$ and every $t \in [0,1]$,
\begin{align*}
f((1-t)x+ty) \leq (1-t)f(x)+t f(y).
\end{align*}
[/definition]
The inequality says that the graph of $f$ lies below each chord connecting two graph points. This is the analytic form of the geometric statement that the epigraph contains line segments.
The quoted theorem below uses the standard convex-analysis convention of an extended-real-valued function
\begin{align*}
f:\mathbb{R}^n\to(-\infty,+\infty].
\end{align*}
This means that $f$ is allowed to take the value $+\infty$. Its effective domain is $\{x\in\mathbb{R}^n:f(x)<+\infty\}$, and convexity is the same [Jensen inequality](/theorems/515) interpreted in the extended-real order. This convention packages the finite-domain situation above into a function on all of $\mathbb{R}^n$: points outside the intended domain can be assigned the value $+\infty$.
With that convention, the epigraph criterion says exactly that a convexity inequality is equivalent to the geometry of the region above the graph. The forward implication says that if the values of $f$ obey the chord inequality, then any two points lying above the graph can be averaged without falling below the graph. The reverse implication reads the same fact backward: apply convexity of the epigraph to points sitting on the graph and recover the Jensen inequality from the height coordinate. Thus the theorem below is the bridge between the analytic definition of a [convex function](/page/Convex%20Function) and the convex-set language developed earlier in the chapter.
[quotetheorem:6668]
The theorem turns a function into a geometric object without changing the essential inequality. This viewpoint is especially useful when minimization is studied through supporting hyperplanes and sublevel sets.
[example: Squaring on the Real Line]
Let $f:\mathbb{R}\to\mathbb{R}$ be given by $f(u)=u^2$. To prove convexity, fix $x,y\in\mathbb{R}$ and $t\in[0,1]$; we compare the weighted average $(1-t)f(x)+tf(y)$ with $f((1-t)x+ty)$.
First expand the square:
\begin{align*}
((1-t)x+ty)^2=((1-t)x)^2+2((1-t)x)(ty)+(ty)^2.
\end{align*}
Using multiplication in $\mathbb{R}$, this is
\begin{align*}
((1-t)x+ty)^2=(1-t)^2x^2+2t(1-t)xy+t^2y^2.
\end{align*}
Therefore
\begin{align*}
(1-t)x^2+ty^2-((1-t)x+ty)^2=(1-t)x^2+ty^2-(1-t)^2x^2-2t(1-t)xy-t^2y^2.
\end{align*}
Collecting the $x^2$ and $y^2$ coefficients gives
\begin{align*}
(1-t)x^2-(1-t)^2x^2=t(1-t)x^2.
\end{align*}
Similarly,
\begin{align*}
ty^2-t^2y^2=t(1-t)y^2.
\end{align*}
Substituting these two identities into the difference gives
\begin{align*}
(1-t)x^2+ty^2-((1-t)x+ty)^2=t(1-t)x^2-2t(1-t)xy+t(1-t)y^2.
\end{align*}
Factoring out $t(1-t)$ gives
\begin{align*}
(1-t)x^2+ty^2-((1-t)x+ty)^2=t(1-t)(x^2-2xy+y^2).
\end{align*}
Since $x^2-2xy+y^2=(x-y)^2$, we obtain
\begin{align*}
(1-t)x^2+ty^2-((1-t)x+ty)^2=t(1-t)(x-y)^2.
\end{align*}
Now $t\in[0,1]$ implies $t\geq 0$ and $1-t\geq 0$, and $(x-y)^2\geq 0$, so
\begin{align*}
t(1-t)(x-y)^2\geq 0.
\end{align*}
Hence
\begin{align*}
((1-t)x+ty)^2\leq (1-t)x^2+ty^2.
\end{align*}
This is exactly the convexity inequality for $f(u)=u^2$, so squaring is convex on $\mathbb{R}$; geometrically, its epigraph is the region above the parabola.
[/example]
This computation is small, but it is the model for many convexity proofs: compare an average of function values with the function value at an average. The nonnegative error term measures how much curvature the function has.
### Sublevel Sets
When a function is minimized, the sets of points whose value is below a threshold describe the feasible levels of the problem. Convexity of the function forces these levels to have no gaps along segments.
[definition: Sublevel Set]
Let $C$ be a set and let $f: C \to \mathbb{R}$. For $\alpha \in \mathbb{R}$, the $\alpha$-sublevel set of $f$ is
\begin{align*}
\{x \in C : f(x) \leq \alpha\}.
\end{align*}
[/definition]
Sublevel sets are the regions swept out by moving a horizontal line upward across the graph of a function. In optimisation, they describe which points are at least as good as a specified tolerance.
A sublevel set is useful for optimization only if it preserves the no-gaps geometry of convexity. The possible failure is clear: two points may both satisfy the same value bound, but a point between them could rise above the allowed level if the function bends the wrong way.
This raises the basic compatibility question between convex functions and convex feasible regions: when a function is controlled by chords, do its value-bounded regions automatically inherit convexity? The following result records that bridge from an analytic inequality on function values to the geometric shape of every sublevel set.
[quotetheorem:10096]
The converse is not valid for ordinary convexity; functions with convex sublevel sets are called quasiconvex. This distinction is important because level-set geometry can be weaker than chord control of the graph.
[example: A Convex Set from an Inequality]
Let $f:\mathbb{R}^n\to\mathbb{R}$ be the Euclidean norm, $f(x)=\|x\|_2$, and fix $r\geq 0$. Its $r$-sublevel set is
\begin{align*}
B_r=\{x\in\mathbb{R}^n:\|x\|_2\leq r\},
\end{align*}
the closed ball of radius $r$ centered at the origin. We show that $B_r$ is convex.
Take $x,y\in B_r$ and $t\in[0,1]$. Since $x,y\in B_r$, we have
\begin{align*}
\|x\|_2\leq r
\end{align*}
and
\begin{align*}
\|y\|_2\leq r.
\end{align*}
By the triangle inequality for the Euclidean norm,
\begin{align*}
\|(1-t)x+ty\|_2\leq \|(1-t)x\|_2+\|ty\|_2.
\end{align*}
Since $t\in[0,1]$, both $1-t$ and $t$ are nonnegative, so absolute homogeneity gives
\begin{align*}
\|(1-t)x\|_2=(1-t)\|x\|_2
\end{align*}
and
\begin{align*}
\|ty\|_2=t\|y\|_2.
\end{align*}
Substituting these identities,
\begin{align*}
\|(1-t)x+ty\|_2\leq (1-t)\|x\|_2+t\|y\|_2.
\end{align*}
Using $\|x\|_2\leq r$ and $\|y\|_2\leq r$,
\begin{align*}
(1-t)\|x\|_2+t\|y\|_2\leq (1-t)r+tr.
\end{align*}
Finally,
\begin{align*}
(1-t)r+tr=r-tr+tr=r.
\end{align*}
Therefore
\begin{align*}
\|(1-t)x+ty\|_2\leq r,
\end{align*}
so $(1-t)x+ty\in B_r$. Thus every segment joining two points of the closed ball remains inside the ball, and the inequality $\|x\|_2\leq r$ defines a convex set.
[/example]
This example shows how convex sets and convex functions produce one another. A norm creates balls, balls create constraints, and constraints create feasible regions for minimization problems.
## Separation and Support
### Affine Span and Relative Interior
Many convex sets live naturally in a lower-dimensional affine plane even when the ambient space is larger. To state separation and support results accurately, we need language for the smallest affine world containing the set.
[definition: Affine Hull]
Let $V$ be a real vector space and let $A \subset V$. The affine hull of $A$, denoted $\operatorname{aff}(A)$, is the intersection of all affine subspaces of $V$ that contain $A$.
[/definition]
For a line segment in the plane, the affine hull is the whole line through the segment rather than the whole plane. Relative notions of interior and boundary are measured inside this smaller affine hull.
Ordinary interior depends on the ambient space, so a segment in $\mathbb{R}^2$ would have empty interior even though it plainly has interior points along its own line. The same issue appears for polygons lying in planes inside higher-dimensional spaces and for feasible regions forced into an affine constraint set.
To avoid confusing these lower-dimensional convex sets with genuinely boundary-only objects, interior must be measured inside the affine hull. We therefore need a version of interior that ignores irrelevant ambient directions and records openness only within the affine world where the convex set actually lives.
[definition: Relative Interior]
Let $V$ be a finite-dimensional real vector space and let $C \subset V$ be convex. The relative interior of $C$, denoted $\operatorname{relint}(C)$, is the interior of $C$ with respect to the topology on $\operatorname{aff}(C)$ inherited from $V$.
[/definition]
Relative interior is the correct interior for faces, segments, polygons in planes, and lower-dimensional feasible regions. It keeps boundary language compatible with the affine dimension of the set being studied.
[example: Relative Interior of a Segment]
Let $a,b \in \mathbb{R}^2$ with $a \neq b$, and set $C=[a,b]$. Write $d=b-a$, so $d\neq 0$ and
\begin{align*}
C=\{a+td:0\leq t\leq 1\}.
\end{align*}
The affine hull of $C$ is the line
\begin{align*}
\operatorname{aff}(C)=\{a+sd:s\in\mathbb{R}\}.
\end{align*}
As a subset of $\mathbb{R}^2$, the segment has empty interior. Indeed, take any point $z=a+td\in C$ and any $\varepsilon>0$. If $d=(d_1,d_2)$, let $n=(-d_2,d_1)$. Then
\begin{align*}
d\cdot n=d_1(-d_2)+d_2d_1=0.
\end{align*}
Also $n\neq 0$, because $d\neq 0$. The point
\begin{align*}
w=z+\frac{\varepsilon}{2\|n\|_2}n
\end{align*}
satisfies
\begin{align*}
\|w-z\|_2=\left\|\frac{\varepsilon}{2\|n\|_2}n\right\|_2=\frac{\varepsilon}{2}<\varepsilon.
\end{align*}
But $w\notin C$: if $w=a+sd$ for some $s$, then
\begin{align*}
w-z=(a+sd)-(a+td)=(s-t)d,
\end{align*}
while the definition of $w$ gives
\begin{align*}
w-z=\frac{\varepsilon}{2\|n\|_2}n.
\end{align*}
Taking dot products with $n$ would give
\begin{align*}
0=(s-t)d\cdot n=\frac{\varepsilon}{2\|n\|_2}n\cdot n=\frac{\varepsilon}{2\|n\|_2}\|n\|_2^2>0,
\end{align*}
a contradiction. Thus every Euclidean ball around every point of $C$ contains points outside $C$, so $C$ has empty interior in $\mathbb{R}^2$.
Inside $\operatorname{aff}(C)$, the relative interior is
\begin{align*}
\{a+td:0<t<1\}=\{(1-t)a+tb:0<t<1\}.
\end{align*}
For $0<t<1$, choose $\delta>0$ with $\delta<t$ and $\delta<1-t$. If $|s-t|<\delta$, then
\begin{align*}
0<t-\delta<s<t+\delta<1,
\end{align*}
so every nearby point $a+sd$ of the affine line remains in $C$. At $t=0$, every relative neighborhood contains points $a+sd$ with $s<0$, and those are not in $C$; at $t=1$, every relative neighborhood contains points $a+sd$ with $s>1$, and those are not in $C$. Hence the endpoints $a$ and $b$ form the relative boundary, while the points with $0<t<1$ are exactly the points with room on both sides along the line.
[/example]
This example prevents a common misreading of separation theorems. The relevant interior may be measured in the affine hull, especially when convex sets are constrained by equalities.
### Hyperplanes That Witness Geometry
If two convex sets do not meet, geometry often supplies a linear inequality that puts one set strictly on one side of a level and the other set on the opposite side. Such inequalities are the reason convex sets can be studied through dual variables, normals, and supporting planes.
[definition: Separating Hyperplane]
Let $V$ be a finite-dimensional real vector space, let $C \subset V$, and let $p \in V \setminus C$. A separating hyperplane between $p$ and $C$ is an affine hyperplane
\begin{align*}
\{x \in V : \ell(x)=\alpha\}
\end{align*}
such that $\ell(p)>\alpha$ and $\ell(x)\leq \alpha$ for every $x \in C$, where $\ell: V \to \mathbb{R}$ is a nonconstant affine functional.
[/definition]
The nonconstant condition is essential: the linear part of $\ell$ must be nonzero, otherwise the level set cannot be a genuine hyperplane. With that understood, the hyperplane certifies nonmembership by a single scalar inequality. This certificate is stronger than saying that $p$ is outside, because it gives a direction in which the set and the point are ordered.
After defining separating hyperplanes, the real question is whether such certificates actually exist. A broad separation theorem works in a real normed vector space $X$ for two disjoint convex sets when one of them is open. Here $X^*$ denotes the [dual space](/page/Dual%20Space) of continuous linear functionals $X\to\mathbb{R}$. In finite-dimensional Euclidean statements, $(\mathbb{R}^n)^*$ similarly denotes the space of linear functionals on $\mathbb{R}^n$.
[quotetheorem:974]
Separation converts geometry into inequalities. In the theorem, the functional $f$ is the separating direction and the scalar $\alpha$ is the separating level: every point of $A$ lies strictly below that level, while every point of $B$ lies at or above it. In optimisation, such a functional becomes a normal direction or multiplier.
Separation concerns disjoint sets, but boundary geometry also asks for one-sided certificates attached to a convex set itself. Instead of prescribing the contact point in advance, one can prescribe a nonzero linear direction and ask where that direction is maximized on the set. A hyperplane at such a maximizer supports the set because all other points have no larger value in that direction.
[definition: Supporting Hyperplane]
Let $C \subset V$ be a convex subset of a finite-dimensional real vector space and let $x_0 \in C$. A supporting hyperplane to $C$ at $x_0$ is an affine hyperplane
\begin{align*}
\{x \in V : \ell(x)=\ell(x_0)\}
\end{align*}
such that $\ell(x)\leq \ell(x_0)$ for every $x \in C$, where $\ell: V \to \mathbb{R}$ is a nonconstant affine functional.
[/definition]
A supporting hyperplane is a local certificate that all of $C$ lies on one side of a boundary plane. The nonconstant condition again rules out degenerate constant level sets, so the support is supplied by a real normal direction. Smooth convex bodies have tangent supporting planes, while polytopes have supporting planes along faces.
For compact convex sets in $\mathbb{R}^n$, every nonzero functional $u\in(\mathbb{R}^n)^*$ has a maximum on the set. The next theorem turns that maximum into a supporting hyperplane: the contact point is chosen by the direction $u$, and the half-space inequality is simply the statement that no point of $C$ has larger $u$-value.
[quotetheorem:4090]
Supporting hyperplanes are the geometric ancestors of first-order optimality conditions. At a minimizer over a convex region, the objective tries to decrease in directions that the feasible set blocks.
## Extreme Points and Structure
### Points That Cannot Be Split
Convex sets are made from averages, so it is natural to ask which points cannot themselves be expressed as genuine averages of other points in the set. These indecomposable points often carry the combinatorial structure of the whole set.
[definition: Extreme Point]
Let $C$ be a convex subset of a real vector space. A point $x \in C$ is an extreme point of $C$ if whenever
\begin{align*}
x=(1-t)y+tz
\end{align*}
with $y,z \in C$ and $t \in (0,1)$, then $y=x$ and $z=x$.
[/definition]
An extreme point is a point that cannot be written as a nondegenerate point of a segment lying in $C$. Vertices of polygons are the guiding example, but the definition also applies to infinite-dimensional convex sets.
[example: Extreme Points of a Square]
Let $C=[-1,1]^2\subset\mathbb{R}^2$. We show first that each corner is extreme. Fix a corner $v=(\varepsilon_1,\varepsilon_2)$, where each $\varepsilon_i$ is either $1$ or $-1$, and suppose
\begin{align*}
v=(1-t)y+tz
\end{align*}
with $y=(y_1,y_2)\in C$, $z=(z_1,z_2)\in C$, and $t\in(0,1)$. For each coordinate $i$, the equality gives
\begin{align*}
\varepsilon_i=(1-t)y_i+tz_i.
\end{align*}
If $\varepsilon_i=1$, then $y_i,z_i\leq 1$, so
\begin{align*}
(1-t)y_i+tz_i\leq (1-t)\cdot 1+t\cdot 1=1.
\end{align*}
Equality can hold only if $y_i=1$ and $z_i=1$: if $y_i<1$, then
\begin{align*}
(1-t)y_i+tz_i\leq (1-t)y_i+t\cdot 1<(1-t)\cdot 1+t\cdot 1=1,
\end{align*}
and if $z_i<1$, then
\begin{align*}
(1-t)y_i+tz_i\leq (1-t)\cdot 1+tz_i<(1-t)\cdot 1+t\cdot 1=1.
\end{align*}
If $\varepsilon_i=-1$, then $y_i,z_i\geq -1$, so the same argument with inequalities reversed shows $y_i=-1$ and $z_i=-1$. Thus every coordinate of $y$ and $z$ equals the corresponding coordinate of $v$, so $y=v$ and $z=v$. Hence all four corners $(\pm 1,\pm 1)$ are extreme points of $C$.
A non-corner point on an edge is not extreme. For example, let $p=(s,1)$ with $-1<s<1$. Set
\begin{align*}
t=\frac{s+1}{2}.
\end{align*}
Since $-1<s<1$, we have $0<t<1$. Also
\begin{align*}
(1-t)(-1,1)+t(1,1)=(-1+t+t,\ 1-t+t)=(2t-1,\ 1).
\end{align*}
Using $t=(s+1)/2$,
\begin{align*}
2t-1=2\cdot\frac{s+1}{2}-1=s+1-1=s.
\end{align*}
Therefore
\begin{align*}
p=(1-t)(-1,1)+t(1,1),
\end{align*}
where $(-1,1)$ and $(1,1)$ are distinct points of $C$. So $p$ is a genuine interior point of a segment lying in $C$, not an extreme point. The other three edges are handled in the same way by holding the appropriate coordinate equal to $1$ or $-1$.
[/example]
The square illustrates that extreme points are not the same as boundary points. Edges contain many boundary points, but only the corners resist being split into a segment inside the square.
The structural question is whether these indecomposable points are merely special boundary points or whether they determine the whole compact convex set. Without such a result, extreme points would describe only isolated features rather than a usable reconstruction principle.
[quotetheorem:4093]
For polytopes, this theorem reduces to the familiar statement that the polytope is the convex hull of its vertices. For curved compact convex bodies, the set of extreme points can be infinite, but the finite-dimensional theorem still says that the body is their convex hull.
### Faces as Exposed Subsets
When a supporting hyperplane touches a convex set, the contact region may be a single point, an edge, or a higher-dimensional piece. These contact regions are the faces of the set and organize its boundary.
[definition: Face]
Let $C$ be a convex subset of a real vector space. A convex subset $F \subset C$ is a face of $C$ if whenever $x,y \in C$, $t \in (0,1)$, and
\begin{align*}
(1-t)x+ty \in F,
\end{align*}
then $x \in F$ and $y \in F$.
[/definition]
The definition says that a face contains every whole segment whose relative interior touches it. This prevents a face from cutting through the middle of a segment in the ambient convex set.
In computations, faces usually appear as the set of points where some linear functional is as large as possible. The point that needs justification is that such an exposed contact set cannot slice through a segment of $C$; it must satisfy the intrinsic segment condition in the definition of a face.
[quotetheorem:4091]
This theorem explains why optimizing a linear functional over a convex set selects a face rather than an arbitrary subset. Linear programming uses exactly this mechanism when an optimal solution lies on a vertex, edge, or higher-dimensional face.
## Beyond and Connected Topics
Convex sets become a working language in optimisation once objective functions and constraints are placed into the same geometry. The Androma page [Cambridge IB Optimisation](/page/Cambridge%20IB%20Optimisation) is a natural continuation because it studies feasible regions, linear programming, and multiplier ideas that rely on separation and supporting hyperplanes.
The analytic side of convexity appears when compactness, [weak convergence](/page/Weak%20Convergence), and lower semicontinuity enter the picture. The page [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions) gives a broader environment for function spaces and limiting arguments, while convex sets supply the stable domains on which minimization problems behave well.
Convexity also reaches complex and differential geometry through pseudoconvexity, hulls, and boundary behavior. The Androma page [Several Complex Variables V: CR Geometry and Boundary Behavior](/page/Several%20Complex%20Variables%20V%3A%20CR%20Geometry%20and%20Boundary%20Behavior) is a later-stage direction where the simple segment condition is replaced by geometry adapted to holomorphic functions.
## References
Androma, [Cambridge IB Optimisation](/page/Cambridge%20IB%20Optimisation).
Androma, [Cambridge IA Differential Equations](/page/Cambridge%20IA%20Differential%20Equations).
Androma, [Cambridge II Analysis of Functions](/page/Cambridge%20II%20Analysis%20of%20Functions).
Androma, [Several Complex Variables V: CR Geometry and Boundary Behavior](/page/Several%20Complex%20Variables%20V%3A%20CR%20Geometry%20and%20Boundary%20Behavior).
R. Tyrrell Rockafellar, *Convex Analysis* (1970).
Rolf Schneider, *Convex Bodies: The Brunn-Minkowski Theory* (2014).
Stephen Boyd and Lieven Vandenberghe, *Convex Optimization* (2004).
Convex Set
Also known as: Convex subset, Convex region, Convex body, Convex domain, Convexity