Hahn-Banach Separation Theorem — Statement & Proof

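Theorem

Let $X$ be a real normed space and let $A, B \subset X$ be nonempty, disjoint, convex sets with $A$ open. Then there exist $f \in X^*$ and $\alpha \in \mathbb{R}$ such that $f(a) < \alpha \le f(b)$ for all $a \in A$ and $b \in B$.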


Proof

[proofplan] We reduce the separation of two disjoint convex sets to the separation of a single convex set from the origin, then use the Minkowski functional (gauge) of a translate of that set as a sublinear majorant for the [Hahn-Banach Theorem](/theorems/879). The key construction is to form the open convex set $C = A - B$, translate it to $D = C - x_0$ for some $x_0 \in C$ so that $0 \in D$, apply Hahn-Banach to extend a one-dimensional functional dominated by the gauge of $D$, and then unwind the translation to obtain the separating hyperplane. [/proofplan]

[step:Reduce to separating a single open convex set from the origin] Since $A$ is open, the set \begin{align*} C := A - B := \{a - b : a \in A, \, b \in B\} \end{align*} is open in $X$: for each fixed $b \in B$, the map $a \mapsto a - b$ is a translation, so $A - b$ is open, and $C = \bigcup_{b \in B}(A - b)$ is a union of [open sets](/page/Open%20Set). Moreover, $C$ is convex: given $c_1 = a_1 - b_1$ and $c_2 = a_2 - b_2$ in $C$ and $t \in (0,1)$, \begin{align*} tc_1 + (1-t)c_2 = \bigl(ta_1 + (1-t)a_2\bigr) - \bigl(tb_1 + (1-t)b_2\bigr) \in A - B = C, \end{align*} using the convexity of both $A$ and $B$. Since $A \cap B = \varnothing$, we have $0 \notin C$.

[guided] The idea is to convert the problem of separating two sets $A$ and $B$ into the problem of separating a single set $C$ from the origin. Why? Because the Hahn-Banach theorem gives us control over functionals dominated by sublinear functions, and the Minkowski functional of a convex set containing the origin provides exactly such a function, but only relative to the origin. So we need to arrange for the origin to be the "point to separate from."

Form the Minkowski difference $C := A - B = \{a - b : a \in A, \, b \in B\}$. We must verify three properties:

**Openness.** For each $b \in B$, the translate $A - b = \{a - b : a \in A\}$ is open (translations are homeomorphisms in a [normed space](/page/Normed%20Vector%20Space)).
Then $C = \bigcup_{b \in B}(A - b)$ is a union of open sets, hence open.

**Convexity.** Take $c_1 = a_1 - b_1$ and $c_2 = a_2 - b_2$ in $C$ and $t \in (0,1)$. Then \begin{align*} tc_1 + (1-t)c_2 = \bigl(ta_1 + (1-t)a_2\bigr) - \bigl(tb_1 + (1-t)b_2\bigr). \end{align*} By the convexity of $A$, the element $ta_1 + (1-t)a_2$ lies in $A$. By the convexity of $B$, the element $tb_1 + (1-t)b_2$ lies in $B$. So the difference lies in $A - B = C$.

**$0 \notin C$.** If $0 = a - b$ for some $a \in A$, $b \in B$, then $a = b \in A \cap B$, contradicting $A \cap B = \varnothing$.

We have now reduced the problem: finding $f \in X^*$ and $\alpha \in \mathbb{R}$ with $f(a) < \alpha \le f(b)$ for all $a \in A$, $b \in B$ reduces to finding $f \in X^*$ with $f(c) < 0$ for all $c \in C$. Indeed, such an $f$ satisfies $f(a) < f(b)$ for all $a \in A$, $b \in B$, and one then sets $\alpha = \sup_{a \in A} f(a)$; the final step uses the openness of $A$ to make the first inequality strict. In the next step, we further translate $C$ so that it contains the origin, in order to define its Minkowski functional. [/guided] [/step]

[step:Construct the Minkowski functional of a translated copy of $C$] Fix any element $x_0 \in C$ (which exists since $A$ and $B$ are nonempty). Define the translated set \begin{align*} D := C - x_0 = \{c - x_0 : c \in C\}. \end{align*} Then $D$ is open, convex, and $0 \in D$ (since $0 = x_0 - x_0$ with $x_0 \in C$). Define the Minkowski functional of $D$: \begin{align*} p: X &\to \mathbb{R} \\ x &\mapsto \inf\{t > 0 : x \in tD\}. \end{align*} Since $D$ is open, convex, and contains the origin, $p$ is well-defined (the infimum is finite for every $x \in X$, since $D$ is absorbing by openness at $0$), positively homogeneous ($p(\lambda x) = \lambda p(x)$ for all $\lambda > 0$), and subadditive ($p(x_1 + x_2) \le p(x_1) + p(x_2)$). Moreover, because $D$ is open, \begin{align*} x \in D \iff p(x) < 1. \end{align*} Since $0 \notin C$, we have $-x_0 \notin D$ (for if $-x_0 \in D$, then $-x_0 = c - x_0$ for some $c \in C$, giving $c = 0 \in C$, a contradiction). Hence, by the characterisation above, $p(-x_0) \ge 1$.
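The definition $p(x) = \inf\{t > 0 : x \in tD\}$ can be explored numerically. The following is a minimal sketch, not part of the proof: it approximates the gauge by bisection, given only a membership test for $D$. The function name `gauge`, the parameters `hi` and `tol`, and the concrete set (the open disc of radius $2$ centred at $(-1, 0)$ in $\mathbb{R}^2$, with $x_0 = (-2, 0)$) are assumptions chosen for illustration.

```python
def gauge(in_D, x, hi=1e6, tol=1e-9):
    """Approximate p(x) = inf{t > 0 : x in t*D} by bisection.

    Assumes D is convex and contains 0, so {t > 0 : x/t in D} is an
    upward-closed ray, and that x/hi already lies in D.
    """
    if all(c == 0.0 for c in x):
        return 0.0  # 0 lies in t*D for every t > 0

    def in_tD(t):
        return in_D(tuple(c / t for c in x))  # x in t*D  <=>  x/t in D

    if not in_tD(hi):
        raise ValueError("hi is too small for this x")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_tD(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical example: D = open disc of radius 2 centred at (-1, 0).
def in_D(y):
    return (y[0] + 1.0) ** 2 + y[1] ** 2 < 4.0


minus_x0 = (2.0, 0.0)  # -x_0 for x_0 = (-2, 0); this point is not in D
p_minus_x0 = gauge(in_D, minus_x0)  # equals 2 in exact arithmetic

u, v = (0.5, 1.0), (-1.5, 0.25)
p_u, p_v = gauge(in_D, u), gauge(in_D, v)
p_sum = gauge(in_D, (u[0] + v[0], u[1] + v[1]))  # compare with p_u + p_v
p_3u = gauge(in_D, (3 * u[0], 3 * u[1]))         # compare with 3 * p_u
```

In this example `p_minus_x0` is at least $1$, matching $p(-x_0) \ge 1$, and the values `p_sum` and `p_3u` can be compared against `p_u + p_v` and `3 * p_u` to see subadditivity and positive homogeneity on sample points.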
[guided] We now need to introduce a sublinear function that the Hahn-Banach theorem can work with. The Minkowski functional (gauge) is the standard tool for this, but it requires the set to contain the origin. Since $0 \notin C$, we translate: pick any $x_0 \in C$ and set $D = C - x_0$.

**$D$ contains the origin:** $0 = x_0 - x_0 \in C - x_0 = D$.

**$D$ is open and convex:** translations preserve both properties.

Define the Minkowski functional $p: X \to \mathbb{R}$ by $p(x) = \inf\{t > 0 : x \in tD\}$. We verify the key properties:

**$p$ is finite everywhere:** Since $D$ is open and $0 \in D$, there exists $r > 0$ with $B(0, r) \subset D$. Clearly $p(0) = 0$. For any $x \in X$ with $x \neq 0$, we have $x \in tD$ whenever $t > \|x\|/r$ (because then $\|x/t\| < r$, so $x/t \in B(0, r) \subset D$), so $p(x) \le \|x\|/r < \infty$.

**Positive homogeneity:** For $\lambda > 0$, $p(\lambda x) = \inf\{t > 0 : \lambda x \in tD\} = \inf\{t > 0 : x \in (t/\lambda)D\} = \lambda \inf\{s > 0 : x \in sD\} = \lambda p(x)$.

**Subadditivity:** Given $x_1, x_2 \in X$ and $\varepsilon > 0$, pick $t_1, t_2 > 0$ with $x_1 \in t_1 D$, $x_2 \in t_2 D$, $t_1 < p(x_1) + \varepsilon$, $t_2 < p(x_2) + \varepsilon$. Write $x_1 = t_1 d_1$, $x_2 = t_2 d_2$ with $d_1, d_2 \in D$. Then $x_1 + x_2 = (t_1 + t_2)\bigl(\frac{t_1}{t_1+t_2}d_1 + \frac{t_2}{t_1+t_2}d_2\bigr)$. The convex combination $\frac{t_1}{t_1+t_2}d_1 + \frac{t_2}{t_1+t_2}d_2$ lies in $D$ by convexity, so $x_1 + x_2 \in (t_1 + t_2)D$. Hence $p(x_1 + x_2) \le t_1 + t_2 < p(x_1) + p(x_2) + 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, $p(x_1 + x_2) \le p(x_1) + p(x_2)$.

**Characterisation of $D$:** If $x \in D$, then $x \in 1 \cdot D$, so $p(x) \le 1$. In fact $p(x) < 1$: since $D$ is open and $x \in D$, there exists $\delta > 0$ with $B(x, \delta) \subset D$. Because $x/(1-\eta) \to x$ as $\eta \to 0^+$, we have $x/(1-\eta) \in B(x, \delta) \subset D$ for sufficiently small $\eta \in (0,1)$, i.e., $x \in (1-\eta)D$, giving $p(x) \le 1 - \eta < 1$.
Conversely, if $p(x) < 1$, then there exists $t \in (0,1)$ with $x \in tD$, i.e., $x/t \in D$. Since $0 \in D$ and $D$ is convex, $x = t(x/t) + (1-t) \cdot 0 \in D$.

**$p(-x_0) \ge 1$:** If $p(-x_0) < 1$, then $-x_0 \in D = C - x_0$, so $-x_0 + x_0 = 0 \in C$, contradicting $0 \notin C$. [/guided] [/step]

[step:Apply the Hahn-Banach theorem to extend a functional dominated by $p$] Consider the one-dimensional subspace $W := \operatorname{span}\{-x_0\} = \{-\lambda x_0 : \lambda \in \mathbb{R}\}$ (note that $x_0 \neq 0$ because $0 \notin C$) and define the linear functional \begin{align*} g: W &\to \mathbb{R} \\ -\lambda x_0 &\mapsto \lambda p(-x_0). \end{align*} We verify $g(w) \le p(w)$ for all $w \in W$. Write $w = -\lambda x_0$, so $g(w) = \lambda p(-x_0)$.

- If $\lambda > 0$: $p(w) = p(-\lambda x_0) = \lambda p(-x_0) = g(w)$, so $g(w) = p(w)$.
- If $\lambda \le 0$: $g(w) = \lambda p(-x_0) \le 0 \le p(w)$, since $p \ge 0$.

In both cases $g(w) \le p(w)$. By the [Hahn-Banach Theorem](/theorems/879), applied with the sublinear functional $p$ (which is positively homogeneous and subadditive), there exists a linear functional $\tilde{g}: X \to \mathbb{R}$ extending $g$ with $\tilde{g}(x) \le p(x)$ for all $x \in X$.

[guided] We need to build a linear functional on all of $X$ that separates $C$ from the origin. The Hahn-Banach theorem extends functionals from subspaces, so we start on a one-dimensional subspace. The natural choice is $W = \operatorname{span}\{-x_0\}$, because $-x_0 \notin D$ (equivalently, $p(-x_0) \ge 1$), so a functional "pointing in the $-x_0$ direction" will be at least $1$ at $-x_0$ and has a chance of staying below $1$ on $D$.

Define $g(-\lambda x_0) = \lambda p(-x_0)$.

**$g$ is linear:** This is immediate from the definition: $g(-(\lambda_1 + \lambda_2)x_0) = (\lambda_1 + \lambda_2)p(-x_0) = g(-\lambda_1 x_0) + g(-\lambda_2 x_0)$, and $g(-(\mu\lambda)x_0) = \mu\lambda p(-x_0) = \mu g(-\lambda x_0)$.
**$g$ is dominated by $p$:** For $w = -\lambda x_0 \in W$:

- When $\lambda > 0$: $g(w) = \lambda p(-x_0) = p(-\lambda x_0) = p(w)$ by positive homogeneity.
- When $\lambda = 0$: $g(0) = 0 = p(0)$.
- When $\lambda < 0$: $g(w) = \lambda p(-x_0) < 0 \le p(w)$ since $p(-x_0) \ge 1 > 0$ and $\lambda < 0$.

The [Hahn-Banach Theorem](/theorems/879) applies: $p$ is positively homogeneous and subadditive (verified in the previous step), $W$ is a subspace of the real vector space $X$, and $g \le p$ on $W$. The conclusion gives a linear extension $\tilde{g}: X \to \mathbb{R}$ with $\tilde{g}|_W = g$ and $\tilde{g}(x) \le p(x)$ for all $x \in X$. [/guided] [/step]

[step:Show $\tilde{g}$ is continuous and separates $C$ from the origin] **$\tilde{g}$ separates $D$ from $\{-x_0\}$:** For any $d \in D$, we have $p(d) < 1$ (by the characterisation $x \in D \iff p(x) < 1$), so $\tilde{g}(d) \le p(d) < 1$. On the other hand, $\tilde{g}(-x_0) = g(-x_0) = p(-x_0) \ge 1$. Hence \begin{align*} \tilde{g}(d) < 1 \le \tilde{g}(-x_0) \quad \text{for all } d \in D. \end{align*}

**$\tilde{g}$ is continuous:** Since $D$ is open and $0 \in D$, there exists $r > 0$ with $B(0, r) \subset D$. For any $x \in X$ with $\|x\| < r$, we have $x \in D$, so $\tilde{g}(x) \le p(x) < 1$. Since $-x \in B(0, r) \subset D$ as well, $\tilde{g}(-x) < 1$, i.e., $-\tilde{g}(x) < 1$. Therefore $|\tilde{g}(x)| < 1$ whenever $\|x\| < r$, and by scaling $|\tilde{g}(y)| \le \|y\|/r$ for all $y \in X$, which gives $\|\tilde{g}\|_{X^*} \le 1/r < \infty$. Hence $\tilde{g} \in X^*$.

**Separation of $C$ from $0$:** Translating back, for any $c \in C$ we have $c - x_0 \in D$, so $\tilde{g}(c - x_0) < 1$, i.e., $\tilde{g}(c) < 1 + \tilde{g}(x_0)$ by linearity. From $\tilde{g}(-x_0) = g(-x_0) = p(-x_0) \ge 1$, we obtain $\tilde{g}(x_0) = -\tilde{g}(-x_0) \le -1$. Hence \begin{align*} \tilde{g}(c) < 1 + \tilde{g}(x_0) \le 1 - 1 = 0. \end{align*} Thus $\tilde{g}(c) < 0$ for all $c \in C = A - B$.
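The domination $\tilde{g} \le p$ and the resulting sign $\tilde{g} < 0$ on $C$ can be sanity-checked on a concrete planar example. The sketch below is illustrative only: it assumes $C$ is the open disc of radius $2$ centred at $(-3, 0)$ and $x_0 = (-2, 0)$, so that $D$ is the open disc of radius $2$ centred at $(-1, 0)$, and it takes the first-coordinate functional as one possible extension $\tilde{g}$. Hahn-Banach only asserts that some extension exists; the names `p`, `g_ext`, and `sample_C` are hypothetical.

```python
import math
import random

# Hypothetical data: C = open disc of radius 2 centred at (-3, 0), x_0 = (-2, 0),
# so D = C - x_0 is the open disc of radius 2 centred at (-1, 0).
CENTRE_D, RADIUS = (-1.0, 0.0), 2.0
X0 = (-2.0, 0.0)


def p(x):
    """Closed-form gauge of the disc D: the nonnegative root t of |x - t*c| = R*t."""
    xc = x[0] * CENTRE_D[0] + x[1] * CENTRE_D[1]
    k = RADIUS ** 2 - (CENTRE_D[0] ** 2 + CENTRE_D[1] ** 2)  # > 0 because 0 is in D
    xx = x[0] ** 2 + x[1] ** 2
    return (-xc + math.sqrt(xc * xc + k * xx)) / k


def g_ext(x):
    """Candidate extension of g (an assumption for this example): first coordinate."""
    return x[0]


rng = random.Random(0)
minus_x0 = (-X0[0], -X0[1])

# g_ext agrees with g on W = span{-x_0}, where g(-lam * x_0) = lam * p(-x_0).
agrees_on_W = all(
    math.isclose(
        g_ext((lam * minus_x0[0], lam * minus_x0[1])),
        lam * p(minus_x0),
        abs_tol=1e-12,
    )
    for lam in (-2.0, -0.5, 0.0, 1.0, 3.0)
)

# Domination g_ext <= p, checked on random points of the plane.
points = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(10_000)]
dominated = all(g_ext(x) <= p(x) + 1e-9 for x in points)


def sample_C():
    """Random point of C = D + x_0 (distance < 2 from the centre (-3, 0))."""
    r = RADIUS * math.sqrt(rng.random())
    th = rng.uniform(0.0, 2.0 * math.pi)
    return (-3.0 + r * math.cos(th), r * math.sin(th))


# Strict negativity of g_ext on sampled points of C.
negative_on_C = all(g_ext(sample_C()) < 0.0 for _ in range(10_000))
```

A random check of this kind is no substitute for the argument above; it only confirms that the inequalities behave as claimed on this one example.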
[guided] **Separation on $D$.** For $d \in D$, the characterisation $D = \{x \in X : p(x) < 1\}$ gives $p(d) < 1$. Since $\tilde{g}(d) \le p(d)$, we conclude $\tilde{g}(d) < 1$. Meanwhile, $\tilde{g}(-x_0) = g(-x_0) = 1 \cdot p(-x_0) \ge 1$.

**[Continuity](/page/Continuity) of $\tilde{g}$.** A linear functional on a normed space is continuous if and only if it is bounded on some neighbourhood of the origin. Since $D$ is open and $0 \in D$, there exists $r > 0$ with $B(0, r) \subset D$. For $\|x\| < r$, both $x$ and $-x$ lie in $B(0, r) \subset D$, so $\tilde{g}(x) \le p(x) < 1$ and $\tilde{g}(-x) \le p(-x) < 1$, giving $|\tilde{g}(x)| < 1$. By homogeneity, applying this to $x = s\,y/\|y\|$ with $0 < s < r$ and letting $s \to r$, we get $|\tilde{g}(y)| \le \|y\|/r$ for all $y \in X$, so $\tilde{g} \in X^*$ with $\|\tilde{g}\|_{X^*} \le 1/r$.

**Back to $C$.** For any $c \in C$, $c - x_0 \in D$, so $\tilde{g}(c - x_0) < 1$, i.e., $\tilde{g}(c) < 1 + \tilde{g}(x_0)$. From $\tilde{g}(-x_0) \ge 1$ we get $\tilde{g}(x_0) \le -1$, and therefore \begin{align*} \tilde{g}(c) < 1 + \tilde{g}(x_0) \le 1 + (-1) = 0. \end{align*} So $\tilde{g}$ is strictly negative on $C = A - B$: for every $a \in A$ and $b \in B$, $\tilde{g}(a - b) < 0$, i.e., $\tilde{g}(a) < \tilde{g}(b)$. [/guided] [/step]

[step:Extract the separating hyperplane for $A$ and $B$] Set $f := \tilde{g} \in X^*$. From the previous step, $f(a) < f(b)$ for all $a \in A$, $b \in B$. Define \begin{align*} \alpha := \sup_{a \in A} f(a). \end{align*} Then $f(a) \le \alpha$ for all $a \in A$, and $f(b) \ge \alpha$ for all $b \in B$ (since $f(b) > f(a)$ for every $a \in A$, taking the supremum gives $f(b) \ge \alpha$). In particular $\alpha$ is finite, because $B$ is nonempty.

It remains to show that the inequality is strict on $A$: $f(a) < \alpha$ for all $a \in A$. Fix $a_0 \in A$. Since $A$ is open, there exists $\delta > 0$ with $a_0 + \delta v \in A$ for all $v \in X$ with $\|v\| \le 1$. Since $f \not\equiv 0$ (otherwise $f(a) = 0 = f(b)$, contradicting $f(a) < f(b)$), there exists $v_0 \in X$ with $\|v_0\| \le 1$ and $f(v_0) > 0$.
Then $a_0 + \delta v_0 \in A$ and \begin{align*} f(a_0 + \delta v_0) = f(a_0) + \delta f(v_0) > f(a_0). \end{align*} Therefore $\alpha \ge f(a_0 + \delta v_0) > f(a_0)$, confirming $f(a_0) < \alpha$. Combining: $f(a) < \alpha \le f(b)$ for all $a \in A$ and $b \in B$.

[guided] We have established $f(a) < f(b)$ for all $a \in A$, $b \in B$. To produce the constant $\alpha$ from the theorem statement, set $\alpha = \sup_{a \in A} f(a)$.

**$\alpha \le f(b)$ for all $b \in B$:** For any $b \in B$, the inequality $f(a) < f(b)$ holds for every $a \in A$. Taking the supremum over $a \in A$ gives $\alpha = \sup_{a \in A} f(a) \le f(b)$.

**$f(a) < \alpha$ for all $a \in A$:** This is where the openness of $A$ is used a second time. Fix $a_0 \in A$. Since $A$ is open, there exists $\delta > 0$ with $B(a_0, \delta) \subset A$. The functional $f$ is not identically zero: if it were, then $f(a) = 0 = f(b)$ for all $a, b$, contradicting $f(a) < f(b)$ (recall $A$ and $B$ are nonempty). Pick $v_0 \in X$ with $\|v_0\| = 1$ and $f(v_0) > 0$ (since $f \neq 0$ there is some $u$ with $f(u) \neq 0$; replacing $u$ by $-u$ if necessary and normalising gives such a $v_0$). Then $a_0 + \tfrac{\delta}{2} v_0 \in B(a_0, \delta) \subset A$, and \begin{align*} f\bigl(a_0 + \tfrac{\delta}{2} v_0\bigr) = f(a_0) + \tfrac{\delta}{2} f(v_0) > f(a_0). \end{align*} Hence $\alpha \ge f\bigl(a_0 + \tfrac{\delta}{2} v_0\bigr) > f(a_0)$, giving the strict inequality $f(a_0) < \alpha$.

The conclusion is: there exist $f \in X^*$ and $\alpha \in \mathbb{R}$ with $f(a) < \alpha \le f(b)$ for all $a \in A$, $b \in B$. [/guided] [/step]
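To see the final inequality $f(a) < \alpha \le f(b)$ in a concrete case, the sketch below takes, purely as an assumed example, $A$ the open unit disc at the origin and $B$ the closed unit disc centred at $(3, 0)$ in $\mathbb{R}^2$, with $f$ the first-coordinate functional. The helper `sample_disc` and all constants are hypothetical, and the sampling only probes the inequalities rather than proving them.

```python
import math
import random

# Hypothetical data: A = open unit disc at the origin, B = closed unit disc at (3, 0).
CENTRE_A, CENTRE_B, RADIUS = (0.0, 0.0), (3.0, 0.0), 1.0


def f(x):
    """Separating functional assumed for this example: the first coordinate."""
    return x[0]


# For a disc, sup f = f(centre) + radius * ||f||, and here ||f|| = 1.
alpha = f(CENTRE_A) + RADIUS * 1.0

rng = random.Random(1)


def sample_disc(centre, radius):
    """Random point at distance < radius from centre (rng.random() lies in [0, 1))."""
    r = radius * math.sqrt(rng.random())
    th = rng.uniform(0.0, 2.0 * math.pi)
    return (centre[0] + r * math.cos(th), centre[1] + r * math.sin(th))


samples_A = [sample_disc(CENTRE_A, RADIUS) for _ in range(10_000)]
samples_B = [sample_disc(CENTRE_B, RADIUS) for _ in range(10_000)]  # interior points of B

strict_on_A = all(f(a) < alpha for a in samples_A)
weak_on_B = all(alpha <= f(b) for b in samples_B)

# The openness argument: moving a_0 a little in a direction v_0 with f(v_0) > 0
# stays inside A and strictly increases f, so f(a_0) cannot equal the supremum.
a0, v0 = (0.9, 0.0), (1.0, 0.0)
delta = 0.5 * (RADIUS - math.hypot(a0[0], a0[1]))  # half the distance to the boundary
a1 = (a0[0] + delta * v0[0], a0[1] + delta * v0[1])
```

Here `alpha` is $1$, the supremum of $f$ over $A$ is not attained, and `a1` plays the role of the point $a_0 + \delta v_0$ from the strictness argument.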

Hahn-Banach Separation Theorem (Theorem # 974)
