Jensen's Inequality and Risk Reduction by Conditional Averaging (Theorem #6296)

Proof

[proofplan] The proof reduces [Jensen's inequality](/theorems/9) to the existence of a supporting affine minorant for a finite convex function at the barycenter of the distribution. We first isolate the smallest affine subspace carrying the random vector and use closed finite-dimensional separation to show that the barycenter is a relative interior point of the closed convex support. Restricting the loss to the part of this support lying in $C$ gives a supporting affine lower bound on the values actually assumed by $Z$, and integrating that affine lower bound gives [Jensen's inequality](/theorems/1977). The risk comparison is obtained by applying this [Jensen inequality](/theorems/515) conditionally to the regular conditional law of the randomized report. [/proofplan] [step:Place the barycenter in the relative interior of the essential convex support] Let $\mu := \mathbb{P} \circ Z^{-1}$ be the pushforward law of $Z$ on $C$, let $S \subset C$ be the support of the measure $\mu$ in the relative topology of $C$, and define \begin{align*} K := \overline{\operatorname{conv}(S)} \end{align*} where the closure is taken in the finite-dimensional ambient [vector space](/page/Vector%20Space). The relative topology on $C$ is second countable because $C$ lies in a finite-dimensional normed space. Hence $C\setminus S$ is a countable union of relatively open $\mu$-null sets, so $\mu(S)=1$. Therefore $Z$ takes values in $S \subset K$ $\mathbb{P}$-almost surely. Define the expectation vector \begin{align*} m := \mathbb{E}[Z]. \end{align*} We claim that $m$ lies in the relative interior of $K$ inside the affine hull $\operatorname{aff}(K)$. First, $m \in K$. Suppose instead that $m \notin K$. 
Since $K$ is closed and convex in the finite-dimensional affine space $\operatorname{aff}(K \cup \{m\})$ and $m\notin K$, the finite-dimensional strong separating hyperplane theorem gives a linear functional $\lambda: \operatorname{aff}(K \cup \{m\})-m \to \mathbb{R}$ and a number $\delta>0$ such that \begin{align*} \lambda(x-m) \geq \delta \end{align*} for every $x \in K$. Since $Z$ takes values in $K$ almost surely, $\lambda(Z-m) \geq \delta$ almost surely. Because $Z$ is integrable, $\lambda(Z-m)$ is integrable, and linearity of expectation gives \begin{align*} \mathbb{E}[\lambda(Z-m)] = \lambda(\mathbb{E}[Z]-m) = 0, \end{align*} contradicting $\mathbb{E}[\lambda(Z-m)] \geq \delta$. Thus $m \in K$. Now suppose that $m \notin \operatorname{relint}(K)$. Since $m \in K$, it is a relative boundary point of the closed convex set $K$ inside $\operatorname{aff}(K)$. The finite-dimensional [Supporting Hyperplane Theorem](/theorems/2551), applied in $\operatorname{aff}(K)$, gives a nonzero linear functional $\lambda: \operatorname{aff}(K)-m \to \mathbb{R}$ such that $\lambda(x-m) \geq 0$ for every $x \in K$ and such that this inequality is strict at some point of $K$. Since $Z$ takes values in $K$ almost surely, \begin{align*} \lambda(Z-m) \geq 0 \end{align*} almost surely. Because $Z$ is integrable, $\lambda(Z-m)$ is integrable, and linearity of expectation gives \begin{align*} \mathbb{E}[\lambda(Z-m)] = \lambda(\mathbb{E}[Z]-m) = 0. \end{align*} A non-negative integrable [random variable](/page/Random%20Variable) with expectation $0$ vanishes almost surely, so $\lambda(Z-m)=0$ almost surely. The proper affine hyperplane $H := \{x \in \operatorname{aff}(K): \lambda(x-m)=0\}$ is therefore a closed set with $\mu(H \cap C)=1$. Its complement in $C$ is a relatively open $\mu$-null set, so by the definition of support it contains no point of $S$, that is, $S \subset H$. Since $H$ is closed and convex, it contains $\overline{\operatorname{conv}(S)}=K$, contradicting the strict inequality at some point of $K$. Therefore $m \in \operatorname{relint}(K)$. 
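As a quick sanity check on this step, the following minimal Python sketch treats a finitely supported law on the real line, where $K=\operatorname{conv}(S)$ is the interval between the smallest and largest atoms. The helper names `barycenter` and `in_relative_interior` and the sample atoms and masses are illustrative assumptions, not part of the proof.

```python
from fractions import Fraction

def barycenter(points, weights):
    """Exact mean m = E[Z] of a finitely supported law on the real line."""
    return sum(w * x for x, w in zip(points, weights))

def in_relative_interior(points, weights):
    """Check m in relint(K), where K = conv(S) = [min S, max S].

    If S is a single point, aff(K) is that point and relint(K) = K.
    Otherwise relint(K) is the open interval (min S, max S).
    """
    support = [x for x, w in zip(points, weights) if w > 0]
    m = barycenter(points, weights)
    lo, hi = min(support), max(support)
    if lo == hi:
        return m == lo
    return lo < m < hi

# Illustrative law (an assumption): atoms 0, 1, 3 with masses 1/2, 1/4, 1/4.
points = [Fraction(0), Fraction(1), Fraction(3)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
m = barycenter(points, weights)
print(m, in_relative_interior(points, weights))

# Degenerate law: a point mass, where K = {2} and relint(K) = {2}.
print(in_relative_interior([Fraction(2)], [Fraction(1)]))
```

In the first example the barycenter is $1$, strictly inside $(0,3)$. In the degenerate example the affine hull collapses to a single point, which is why the argument works with the relative interior rather than the interior.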
[guided] The role of this step is to rule out a boundary obstruction. Convex functions need not have supporting affine minorants at arbitrary boundary points, but they do have such minorants at relative interior points of their domain. We therefore work only in the affine space actually seen by the distribution of $Z$. Let $\mu := \mathbb{P} \circ Z^{-1}$ denote the pushforward law of the random vector $Z$ on $C$. Let $S \subset C$ be the support of the measure $\mu$ in the relative topology of $C$, and define \begin{align*} K := \overline{\operatorname{conv}(S)}, \end{align*} where the closure is taken in the finite-dimensional ambient vector space. This closed convex set records the affine directions seen by the distribution, while the later integration only uses values on $S \subset C$. Why is the measure concentrated on $S$? The relative topology on $C$ is second countable because $C$ is a subspace of a finite-dimensional [normed vector space](/page/Normed%20Vector%20Space). By definition of support, every point of $C\setminus S$ has a relatively open neighbourhood of $\mu$-measure $0$. Choosing these neighbourhoods from a countable base shows that $C\setminus S$ is contained in a countable union of $\mu$-null relatively open sets. Hence $\mu(S)=1$, and therefore $Z(\omega) \in S \subset K$ for $\mathbb{P}$-almost every $\omega \in \Omega$. Define \begin{align*} m := \mathbb{E}[Z]. \end{align*} We prove that $m$ is a relative interior point of $K$ in the affine space $\operatorname{aff}(K)$. First, $m$ belongs to $K$. If $m \notin K$, then $K$ is a closed convex subset of the finite-dimensional affine space $\operatorname{aff}(K \cup \{m\})$, so the finite-dimensional strong separating hyperplane theorem gives a linear functional \begin{align*} \lambda: \operatorname{aff}(K \cup \{m\})-m \to \mathbb{R} \end{align*} and a number $\delta>0$ such that \begin{align*} \lambda(x-m) \geq \delta \end{align*} for all $x \in K$. 
Applying this to $Z(\omega)$ gives $\lambda(Z-m)\geq \delta$ almost surely. The random variable $\lambda(Z-m)$ is integrable because $Z$ is integrable and $\lambda$ is linear on a finite-dimensional vector space. Taking expectations gives \begin{align*} \mathbb{E}[\lambda(Z-m)] = \lambda(\mathbb{E}[Z]-m)=0, \end{align*} which contradicts $\mathbb{E}[\lambda(Z-m)]\geq\delta$. Hence $m \in K$. Now suppose that $m$ is not in the relative interior of $K$. Since $m\in K$, it is a relative boundary point of the closed convex set $K$ inside $\operatorname{aff}(K)$. The finite-dimensional [Supporting Hyperplane Theorem](/theorems/2551), applied inside $\operatorname{aff}(K)$, gives a nonzero linear functional \begin{align*} \lambda: \operatorname{aff}(K)-m \to \mathbb{R} \end{align*} such that $\lambda(x-m) \geq 0$ for all $x \in K$, with strict inequality somewhere on $K$. Applying this to the random point $Z(\omega)$ gives \begin{align*} \lambda(Z-m) \geq 0 \end{align*} almost surely. The random variable $\lambda(Z-m)$ is integrable because $Z$ is integrable and $\lambda$ is linear on a finite-dimensional vector space. Taking expectations and using the defining property $m=\mathbb{E}[Z]$, we obtain \begin{align*} \mathbb{E}[\lambda(Z-m)] = \lambda(\mathbb{E}[Z]-m) = 0. \end{align*} Thus $\lambda(Z-m)$ is non-negative and has expectation $0$, so it is $0$ almost surely. Consequently the affine hyperplane \begin{align*} \{x \in \operatorname{aff}(K): \lambda(x-m)=0\} \end{align*} has full $\mu$-measure. Its complement in $C$ is a relatively open $\mu$-null set, so by the definition of support it contains no point of $S$; that is, $S$ is contained in the hyperplane. Since this hyperplane is closed and convex, every point of $K=\overline{\operatorname{conv}(S)}$ also lies in that hyperplane. This contradicts the strict inequality somewhere on $K$. Hence $m \in \operatorname{relint}(K)$. [/guided] [/step] [step:Support the convex loss by an affine minorant at the barycenter] Define the convex set \begin{align*} D := K \cap C. 
\end{align*} Since $S \subset C$ and $C$ is convex, $\operatorname{conv}(S)\subset C$, hence $\operatorname{conv}(S)\subset D\subset K$. In finite dimensions, taking closure does not change relative interior of a convex set, so \begin{align*} \operatorname{relint}(\operatorname{conv}(S))=\operatorname{relint}(K) \end{align*} inside $\operatorname{aff}(K)$. The preceding step gives $m\in\operatorname{relint}(K)$, hence $m\in\operatorname{relint}(\operatorname{conv}(S))$. Therefore some relative neighbourhood of $m$ in $\operatorname{aff}(K)$ is contained in $\operatorname{conv}(S)\subset D$, and so $m\in\operatorname{relint}(D)$ inside $\operatorname{aff}(K)$. Since $\ell$ is a finite-valued convex function on $C$, its restriction to $D$ is finite-valued and convex. Because $m\in\operatorname{relint}(D)$, the finite-dimensional subgradient theorem for finite convex functions gives a linear functional \begin{align*} \alpha: \operatorname{aff}(K)-m \to \mathbb{R} \end{align*} such that \begin{align*} \ell(x) \geq \ell(m) + \alpha(x-m) \end{align*} for every $x \in D$. This theorem applies in the finite-dimensional affine space $\operatorname{aff}(K)$ to the finite convex function $\ell|_D:D\to\mathbb{R}$ at the relative interior point $m$; it does not require $D$ to be closed or $\ell|_D$ to be lower semicontinuous on the boundary. [guided] We now use the closed convex set $K$ only to identify the correct affine directions. Because $\ell$ is defined on $C$, we restrict to the convex set \begin{align*} D := K \cap C. \end{align*} This set contains $S$, so it contains the values assumed by $Z$ almost surely, and it contains $m$ by the hypothesis $\mathbb E[Z]\in C$. The point that needs proof is that $m$ is not moved to the boundary by intersecting with $C$. Since $S\subset C$ and $C$ is convex, every finite convex combination of points of $S$ still lies in $C$; equivalently, \begin{align*} \operatorname{conv}(S)\subset C. 
\end{align*} Also $\operatorname{conv}(S)\subset K$ by the definition of $K$, hence \begin{align*} \operatorname{conv}(S)\subset D\subset K. \end{align*} In a finite-dimensional affine space, a convex set and its closure have the same relative interior. Applying this to $\operatorname{conv}(S)$ inside $\operatorname{aff}(K)$ gives \begin{align*} \operatorname{relint}(\operatorname{conv}(S))=\operatorname{relint}(K). \end{align*} The previous step proved $m\in\operatorname{relint}(K)$, so $m\in\operatorname{relint}(\operatorname{conv}(S))$. Thus there is a relative neighbourhood of $m$ in $\operatorname{aff}(K)$ contained in $\operatorname{conv}(S)$. Since $\operatorname{conv}(S)\subset D$, the same neighbourhood is contained in $D$, and therefore \begin{align*} m\in\operatorname{relint}(D) \end{align*} inside $\operatorname{aff}(K)$. We now build the affine lower bound. The restricted function $\ell|_D:D\to\mathbb{R}$ is finite-valued and convex because $\ell:C\to\mathbb{R}$ is finite-valued and convex. We use the finite-dimensional subgradient theorem for finite convex functions: if a finite convex function is defined on a convex subset of a finite-dimensional affine space, then at every relative interior point of its domain it has a subgradient. The hypotheses are now verified. The ambient affine space is $\operatorname{aff}(K)$, the domain $D$ is convex, the function $\ell|_D$ is finite-valued and convex, and the point $m$ lies in $\operatorname{relint}(D)$ by the preceding paragraph. Therefore there exists a linear functional \begin{align*} \alpha: \operatorname{aff}(K)-m \to \mathbb{R} \end{align*} such that, for every $x\in D$, \begin{align*} \ell(x) \geq \ell(m)+\alpha(x-m). \end{align*} This is exactly the affine lower bound needed for integration, because $Z$ takes values in $S\subset D$ almost surely. 
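To make the affine lower bound concrete, here is a minimal Python sketch under illustrative assumptions: $C=\mathbb{R}$, the loss is $\ell(x)=x^2$, the law has three atoms, and the subgradient is taken to be $\alpha = 2m$ (the derivative of $x^2$ at $m$). None of these choices come from the theorem; they only show how the minorant sits below the loss on the support and integrates to $\ell(m)$.

```python
from fractions import Fraction

def loss(x):
    """Illustrative convex loss ell(x) = x^2 on C = R (an assumption)."""
    return x * x

# Illustrative law (an assumption): atoms 0, 1, 3 with masses 1/2, 1/4, 1/4.
points = [Fraction(0), Fraction(1), Fraction(3)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

m = sum(w * x for x, w in zip(points, weights))  # barycenter E[Z]
alpha = 2 * m                                    # a subgradient of x^2 at m

def minorant(x):
    """Affine lower bound ell(m) + alpha * (x - m)."""
    return loss(m) + alpha * (x - m)

# The minorant lies below the loss at every atom of the support.
gaps = [loss(x) - minorant(x) for x in points]

# Integrating the minorant returns ell(m), which yields Jensen's inequality.
expected_loss = sum(w * loss(x) for x, w in zip(points, weights))
expected_minorant = sum(w * minorant(x) for x, w in zip(points, weights))
print(gaps, expected_minorant, expected_loss)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the identity $\mathbb{E}[\ell(m)+\alpha(Z-m)]=\ell(m)$ holds without rounding error.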
The point of using the subgradient theorem rather than a direct supporting hyperplane theorem for the epigraph is that $D$ need not be closed and $\ell|_D$ need not be lower semicontinuous at boundary points of $D$; relative interior is the condition that guarantees a subgradient without those [boundary regularity](/theorems/99) assumptions. [/guided] [/step] [step:Integrate the affine minorant to prove Jensen's inequality] Since $Z \in S\subset D$ almost surely, the affine minorant gives \begin{align*} \ell(Z) \geq \ell(m) + \alpha(Z-m) \end{align*} almost surely. The random variable $\ell(Z)$ is integrable by hypothesis, and $\alpha(Z-m)$ is integrable because $Z$ is integrable. Taking expectations yields \begin{align*} \mathbb{E}[\ell(Z)] \geq \mathbb{E}[\ell(m) + \alpha(Z-m)] = \ell(m) + \alpha(\mathbb{E}[Z]-m) = \ell(m). \end{align*} Since $m=\mathbb{E}[Z]$, this proves \begin{align*} \ell(\mathbb{E}[Z]) \leq \mathbb{E}[\ell(Z)]. \end{align*} [/step] [step:Apply Jensen conditionally to compare risks] For the consequence, let $(\Omega,\mathcal F,\mathbb P)$ be the probability space, let $\mathcal G\subset\mathcal F$ be the conditioning sub-$\sigma$-algebra, let $(T,\mathcal T)$ be the standard Borel truth-value space, let $A:(\Omega,\mathcal F)\to(C,\mathcal B(C))$ be the randomized report, and let $B:(\Omega,\mathcal G)\to(T,\mathcal T)$ be the true value. Since $C$ is a convex subset of a finite-dimensional real vector space, $C$ is Borel in its relative topology and is a standard Borel space with its Borel $\sigma$-algebra $\mathcal B(C)$. Let \begin{align*} L:C\times T &\to \mathbb R \end{align*} be the Borel loss map from the statement. Fix a norm $|\cdot|$ on the finite-dimensional ambient vector space containing $C$; integrability of $C$-valued random vectors is understood with respect to this norm, and this notion is independent of the chosen norm in finite dimensions. 
For $\mathbb{P}$-almost every $\omega \in \Omega$, define the convex function $\ell_\omega:C\to\mathbb{R}$ by letting $\ell_\omega(a)=L(a,B(\omega))$ for $a \in C$. Because $B$ is $\mathcal{G}$-measurable, $B(\omega)$ is fixed when conditioning on $\mathcal{G}$. Let $\kappa_\omega$ denote a regular conditional law of $A$ given $\mathcal{G}$. On the full-measure set where the identity map $\operatorname{id}_C:C\to C$ is $\kappa_\omega$-integrable, define its barycenter by \begin{align*} \bar A(\omega) = \int_C a \, d\kappa_\omega(a) \in C. \end{align*} Componentwise, after choosing any basis of the finite-dimensional ambient vector space, this integral is a version of $\mathbb{E}[A\mid\mathcal{G}](\omega)$. We verify the integrability hypotheses pointwise on a full-measure set. Since $A$ is integrable, the conditional first moment function \begin{align*} \omega \mapsto \int_C |a| \, d\kappa_\omega(a) \end{align*} is a version of $\mathbb{E}[|A|\mid \mathcal{G}](\omega)$ and is finite for $\mathbb{P}$-almost every $\omega$. Since $C$ and the truth-value space are standard Borel spaces and $B$ is $\mathcal{G}$-measurable, the regular conditional law exists and the following kernel identity is valid. It first holds for indicators $\varphi(a,b)=\mathbb{1}_{E}(a)\mathbb{1}_{F}(b)$ with $E\subset C$ and $F\subset T$ Borel, because $\kappa_\omega$ is a regular conditional law of $A$ given $\mathcal{G}$ and $\mathbb{1}_{F}(B)$ is $\mathcal{G}$-measurable. The [Monotone Class Theorem](/theorems/4925) then extends it to every non-negative Borel map $\varphi$ on the report-truth product space $C\times T$: \begin{align*} \mathbb{E}[\varphi(A,B)\mid\mathcal{G}](\omega)=\int_C \varphi(a,B(\omega))\,d\kappa_\omega(a). \end{align*} For integrable $\varphi$, the same identity follows by applying the non-negative case to the positive and negative parts of $\varphi$. 
Applying this with $\varphi(a,b)=|L(a,b)|$ and using the integrability of $L(A,B)$ shows that the conditional loss moment function \begin{align*} \omega \mapsto \int_C |L(a,B(\omega))| \, d\kappa_\omega(a) \end{align*} is a version of $\mathbb{E}[|L(A,B)|\mid \mathcal{G}](\omega)$ and is finite for $\mathbb{P}$-almost every $\omega$. On the intersection of these two full-measure sets, the identity map is $\kappa_\omega$-integrable, the displayed integral defines the conditional barycenter $\bar A(\omega)$, and all hypotheses of the Jensen inequality already proved are satisfied for the probability measure $\kappa_\omega$ and the convex function $\ell_\omega$. Applying Jensen's inequality gives \begin{align*} L(\bar A(\omega),B(\omega)) \leq \int_C L(a,B(\omega)) \, d\kappa_\omega(a) \end{align*} for almost every $\omega$. The right-hand side is a version of the [conditional expectation](/page/Conditional%20Expectation) $\mathbb{E}[L(A,B)\mid \mathcal{G}](\omega)$. Therefore \begin{align*} L(\bar A,B) \leq \mathbb{E}[L(A,B)\mid \mathcal{G}] \end{align*} almost surely. The assumed integrability of $L(\bar A,B)$ and $L(A,B)$ makes both expectations below well-defined. Taking expectations and using monotonicity of the expectation, \begin{align*} \mathbb{E}[L(\bar A,B)] \leq \mathbb{E}[\mathbb{E}[L(A,B)\mid \mathcal{G}]]. \end{align*} By the [Tower Property of Conditional Expectation](/theorems/1150), \begin{align*} \mathbb{E}[\mathbb{E}[L(A,B)\mid \mathcal{G}]] = \mathbb{E}[L(A,B)]. \end{align*} Combining the last two displays gives $\mathbb{E}[L(\bar A,B)] \leq \mathbb{E}[L(A,B)]$, which is precisely the asserted risk comparison. [/step]
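The risk comparison can be checked exactly on a small finite probability space. The sketch below is only an illustration under stated assumptions: $\Omega$ has four equally likely outcomes, $\mathcal{G}$ is generated by a two-block partition, the loss is the squared error $L(a,b)=(a-b)^2$, and the names `conditional_mean`, `risk`, and `squared_loss` are hypothetical helpers introduced here.

```python
from fractions import Fraction

def conditional_mean(values, probs, blocks):
    """Return E[values | G] pointwise, where G is generated by the partition `blocks`."""
    out = [None] * len(values)
    for block in blocks:
        mass = sum(probs[i] for i in block)
        mean = sum(probs[i] * values[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out

def risk(report, truth, probs, loss):
    """Expected loss E[L(report, truth)] on a finite probability space."""
    return sum(p * loss(a, b) for a, b, p in zip(report, truth, probs))

def squared_loss(a, b):
    """Illustrative loss (an assumption), convex in the report a for each fixed b."""
    return (a - b) ** 2

probs = [Fraction(1, 4)] * 4            # uniform law on four outcomes
blocks = [[0, 1], [2, 3]]               # partition generating G
truth = [Fraction(0), Fraction(0), Fraction(1), Fraction(1)]     # B, constant on blocks
report = [Fraction(-1), Fraction(2), Fraction(0), Fraction(3)]   # randomized report A

averaged = conditional_mean(report, probs, blocks)   # bar A = E[A | G]
risk_raw = risk(report, truth, probs, squared_loss)
risk_avg = risk(averaged, truth, probs, squared_loss)
print(averaged, risk_avg, risk_raw)
```

Here the truth is constant on each block, mirroring the $\mathcal{G}$-measurability of $B$, and replacing the report by its block average lowers the risk from $5/2$ to $1/4$.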
