Carathéodory's Theorem — Statement & Proof

Carathéodory's Theorem (Theorem # 4083)

Theorem

Let $n \in \mathbb{N}$, let $A \subseteq \mathbb{R}^n$, and let $x \in \operatorname{conv}(A)$. Then there exist points $a_1,\dots,a_{n+1} \in A$ and coefficients $\lambda_1,\dots,\lambda_{n+1} \in [0,\infty)$ such that \begin{align*} \sum_{i=1}^{n+1} \lambda_i = 1 \end{align*} and \begin{align*} x = \sum_{i=1}^{n+1} \lambda_i a_i. \end{align*}

Proof

[proofplan] Because $x \in \operatorname{conv}(A)$, it has a representation as a finite convex combination of points of $A$. If more than $n+1$ points occur with positive coefficient, those points are affinely dependent in $\mathbb{R}^n$. An affine dependence gives a direction in coefficient space along which the represented point and the total coefficient sum remain fixed. Moving as far as possible in that direction eliminates at least one positive coefficient without creating a negative coefficient; repeating this finite reduction gives a convex representation using at most $n+1$ points, which we then pad with zero coefficients if necessary. [/proofplan]

[step:Start with a finite convex representation and discard zero coefficients] Since $x \in \operatorname{conv}(A)$, there exist an integer $m \in \mathbb{N}$, points $b_1,\dots,b_m \in A$, and coefficients $\mu_1,\dots,\mu_m \in [0,\infty)$ such that \begin{align*} \sum_{i=1}^m \mu_i = 1 \end{align*} and \begin{align*} x = \sum_{i=1}^m \mu_i b_i. \end{align*}

Discard every index $i$ for which $\mu_i = 0$. After relabelling, we obtain an integer $r \in \mathbb{N}$, points $c_1,\dots,c_r \in A$, and coefficients $\theta_1,\dots,\theta_r \in (0,\infty)$ such that \begin{align*} \sum_{i=1}^r \theta_i = 1 \end{align*} and \begin{align*} x = \sum_{i=1}^r \theta_i c_i. \end{align*}

If $r \leq n+1$, the desired representation follows by setting $a_i = c_i$ and $\lambda_i = \theta_i$ for $1 \leq i \leq r$, and then choosing any already used point of $A$ for the remaining $a_i$ and setting the remaining $\lambda_i$ equal to $0$. Thus it remains to show that any representation with $r > n+1$ positive coefficients can be shortened. [/step]

[step:Produce a nonzero affine relation among more than $n+1$ points] Assume $r > n+1$. The $r-1$ vectors \begin{align*} c_1 - c_r,\ c_2 - c_r,\ \dots,\ c_{r-1} - c_r \end{align*} belong to the $n$-dimensional [vector space](/page/Vector%20Space) $\mathbb{R}^n$. Since $r-1 > n$, these vectors are linearly dependent. Hence there exist scalars $\alpha_1,\dots,\alpha_{r-1} \in \mathbb{R}$, not all zero, such that \begin{align*} \sum_{i=1}^{r-1} \alpha_i(c_i - c_r) = 0. \end{align*} Define \begin{align*} \alpha_r := -\sum_{i=1}^{r-1} \alpha_i. \end{align*} Then the scalars $\alpha_1,\dots,\alpha_r$ are not all zero and satisfy \begin{align*} \sum_{i=1}^r \alpha_i = 0 \end{align*} and \begin{align*} \sum_{i=1}^r \alpha_i c_i = 0. \end{align*}

[guided] We need a direction in which to change the coefficients without changing either the represented point $x$ or the fact that the coefficients sum to $1$. Such a direction is exactly an affine relation among the points $c_1,\dots,c_r$.

Because $r > n+1$, we have $r-1 > n$. The vectors \begin{align*} c_1 - c_r,\ c_2 - c_r,\ \dots,\ c_{r-1} - c_r \end{align*} are $r-1$ vectors in the $n$-dimensional vector space $\mathbb{R}^n$, so they are linearly dependent. Therefore there are [real numbers](/page/Real%20Numbers) $\alpha_1,\dots,\alpha_{r-1}$, not all zero, with \begin{align*} \sum_{i=1}^{r-1} \alpha_i(c_i - c_r) = 0. \end{align*}

Now define the missing coefficient by \begin{align*} \alpha_r := -\sum_{i=1}^{r-1} \alpha_i. \end{align*} This definition gives \begin{align*} \sum_{i=1}^r \alpha_i = 0. \end{align*} Expanding the linear dependence gives \begin{align*} 0 &= \sum_{i=1}^{r-1} \alpha_i(c_i - c_r) \\ &= \sum_{i=1}^{r-1} \alpha_i c_i - \left(\sum_{i=1}^{r-1} \alpha_i\right)c_r \\ &= \sum_{i=1}^{r-1} \alpha_i c_i + \alpha_r c_r \\ &= \sum_{i=1}^r \alpha_i c_i. \end{align*} Thus $\alpha_1,\dots,\alpha_r$ give a nonzero affine relation: their sum is $0$, and the same coefficients applied to the points also sum to $0$. [/guided] [/step]

[step:Move along the affine relation until one coefficient vanishes] Since $\alpha_1,\dots,\alpha_r$ are not all zero and \begin{align*} \sum_{i=1}^r \alpha_i = 0, \end{align*} at least one $\alpha_i$ is positive. Define the nonempty index set \begin{align*} I_+ := \{i \in \{1,\dots,r\} : \alpha_i > 0\}. \end{align*} Define \begin{align*} t_* := \min_{i \in I_+} \frac{\theta_i}{\alpha_i}. \end{align*} Because each $\theta_i > 0$ and each $\alpha_i > 0$ for $i \in I_+$, we have $t_* > 0$.

For $1 \leq i \leq r$, define new coefficients \begin{align*} \theta_i' := \theta_i - t_* \alpha_i. \end{align*} If $\alpha_i > 0$, then $t_* \leq \theta_i/\alpha_i$, so $\theta_i' \geq 0$. If $\alpha_i \leq 0$, then $\theta_i' = \theta_i - t_*\alpha_i \geq \theta_i > 0$. Therefore $\theta_i' \geq 0$ for every $i$.

By the definition of $t_*$, there is an index $j \in I_+$ such that \begin{align*} t_* = \frac{\theta_j}{\alpha_j}. \end{align*} For this index, \begin{align*} \theta_j' = \theta_j - t_* \alpha_j = 0. \end{align*} Thus at least one coefficient has been eliminated.

[guided] The affine relation gives a way to perturb the coefficient vector. For a real parameter $t$, the candidate new coefficients are \begin{align*} \theta_i(t) := \theta_i - t\alpha_i \end{align*} for $1 \leq i \leq r$. We want to choose $t > 0$ as large as possible while keeping all coefficients nonnegative.

First, there must be at least one positive $\alpha_i$. If all $\alpha_i \leq 0$, then \begin{align*} \sum_{i=1}^r \alpha_i = 0 \end{align*} would force every $\alpha_i = 0$, contradicting that the affine relation is nonzero. Therefore the index set \begin{align*} I_+ := \{i \in \{1,\dots,r\} : \alpha_i > 0\} \end{align*} is nonempty.

Define \begin{align*} t_* := \min_{i \in I_+} \frac{\theta_i}{\alpha_i}. \end{align*} This is a minimum over a finite nonempty set of positive real numbers, since $\theta_i > 0$ and $\alpha_i > 0$ for $i \in I_+$. Hence $t_* > 0$.

Now define \begin{align*} \theta_i' := \theta_i - t_*\alpha_i \end{align*} for each $i$. We check nonnegativity in two cases. If $\alpha_i > 0$, then the definition of $t_*$ gives \begin{align*} t_* \leq \frac{\theta_i}{\alpha_i}, \end{align*} so \begin{align*} \theta_i' = \theta_i - t_*\alpha_i \geq 0. \end{align*} If $\alpha_i \leq 0$, then subtracting $t_*\alpha_i$ increases the coefficient or leaves it unchanged: \begin{align*} \theta_i' = \theta_i - t_*\alpha_i \geq \theta_i > 0. \end{align*} Thus all new coefficients are nonnegative.

Finally, because the minimum defining $t_*$ is attained, there exists $j \in I_+$ with \begin{align*} t_* = \frac{\theta_j}{\alpha_j}. \end{align*} For this index, \begin{align*} \theta_j' = \theta_j - t_*\alpha_j = 0. \end{align*} So the perturbation has removed at least one point from the positive part of the convex combination. [/guided] [/step]

[step:Verify that the shortened coefficients still represent $x$] The total coefficient sum is preserved because \begin{align*} \sum_{i=1}^r \theta_i' &= \sum_{i=1}^r (\theta_i - t_*\alpha_i) \\ &= \sum_{i=1}^r \theta_i - t_*\sum_{i=1}^r \alpha_i \\ &= 1. \end{align*} The represented point is preserved because \begin{align*} \sum_{i=1}^r \theta_i' c_i &= \sum_{i=1}^r (\theta_i - t_*\alpha_i)c_i \\ &= \sum_{i=1}^r \theta_i c_i - t_*\sum_{i=1}^r \alpha_i c_i \\ &= x. \end{align*} Since at least one $\theta_i'$ equals $0$, discarding all zero coefficients gives a convex representation of $x$ using strictly fewer than $r$ points of $A$. [/step]

[step:Iterate the reduction and pad the final representation to length $n+1$] Starting from the representation with $r$ positive coefficients, repeat the shortening procedure whenever the current number of positive coefficients is greater than $n+1$. Each shortening reduces the number of positive coefficients by at least one, and this number is a positive integer, so the process terminates after finitely many steps. We obtain an integer $s \leq n+1$, points $d_1,\dots,d_s \in A$, and coefficients $\rho_1,\dots,\rho_s \in (0,\infty)$ such that \begin{align*} \sum_{i=1}^s \rho_i = 1 \end{align*} and \begin{align*} x = \sum_{i=1}^s \rho_i d_i. \end{align*}

If $s = n+1$, set $a_i := d_i$ and $\lambda_i := \rho_i$ for all $1 \leq i \leq n+1$. If $s < n+1$, set $a_i := d_i$ and $\lambda_i := \rho_i$ for $1 \leq i \leq s$, and for $s < i \leq n+1$ set $a_i := d_1$ and $\lambda_i := 0$. Then $a_1,\dots,a_{n+1} \in A$, each $\lambda_i \geq 0$, \begin{align*} \sum_{i=1}^{n+1} \lambda_i = 1, \end{align*} and \begin{align*} x = \sum_{i=1}^{n+1} \lambda_i a_i. \end{align*} This proves the theorem. [/step]
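
The proof is constructive, so the reduction can be run on concrete data. The following is a minimal Python sketch of the steps above, not a reference implementation. The function names `affine_dependence` and `caratheodory_reduce` are hypothetical, and it assumes points are given as tuples of integers or `Fraction` values with nonnegative weights summing to $1$. Exact rational arithmetic is used so that the test "coefficient equals $0$" is reliable. The affine relation is found by Gauss-Jordan elimination, which is one possible way to produce the scalars $\alpha_i$ and is not prescribed by the proof.

```python
from fractions import Fraction


def affine_dependence(points):
    """Find alpha, not all zero, with sum(alpha) == 0 and sum(alpha_i * c_i) == 0.

    Hypothetical helper: solves the homogeneous system by Gauss-Jordan
    elimination and returns None if the points are affinely independent.
    """
    r, n = len(points), len(points[0])
    # One equation per coordinate, plus one equation for sum(alpha) == 0.
    rows = [[Fraction(p[k]) for p in points] for k in range(n)]
    rows.append([Fraction(1)] * r)
    pivots = []  # pivots[i] is the pivot column of reduced row i
    for col in range(r):
        row = len(pivots)
        if row == len(rows):
            break
        hit = next((i for i in range(row, len(rows)) if rows[i][col] != 0), None)
        if hit is None:
            continue
        rows[row], rows[hit] = rows[hit], rows[row]
        pivot = rows[row][col]
        rows[row] = [v / pivot for v in rows[row]]
        for i in range(len(rows)):
            if i != row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[row])]
        pivots.append(col)
    free = next((c for c in range(r) if c not in pivots), None)
    if free is None:
        return None
    # Set one free variable to 1, the others to 0, and read off the pivot variables.
    alpha = [Fraction(0)] * r
    alpha[free] = Fraction(1)
    for i, col in enumerate(pivots):
        alpha[col] = -rows[i][free]
    return alpha


def caratheodory_reduce(points, weights):
    """Shorten a convex combination to at most n + 1 points with positive weight."""
    pairs = [(tuple(p), Fraction(w)) for p, w in zip(points, weights) if w != 0]
    pts = [p for p, _ in pairs]
    theta = [w for _, w in pairs]
    n = len(pts[0])
    while len(pts) > n + 1:
        alpha = affine_dependence(pts)  # exists because len(pts) > n + 1
        # t_star is the largest step that keeps every coefficient nonnegative.
        t_star = min(th / a for th, a in zip(theta, alpha) if a > 0)
        theta = [th - t_star * a for th, a in zip(theta, alpha)]
        keep = [i for i, th in enumerate(theta) if th != 0]
        pts = [pts[i] for i in keep]
        theta = [theta[i] for i in keep]
    return pts, theta


# Example: the centre (1, 1) of a square, written with all four corners.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
pts, theta = caratheodory_reduce(square, [Fraction(1, 4)] * 4)
print(pts, theta)
```

In this example the elimination produces the relation $\alpha = (-1, 1, -1, 1)$ and $t_* = 1/4$, so a single pass removes two corners at once and leaves $(0,0)$ and $(2,2)$ with weight $1/2$ each. The sketch returns only the points with positive weight; padding to exactly $n+1$ points with zero coefficients, as in the last step of the proof, is left out.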
