Moreau Decomposition Theorem — Statement & Proof

Theorem

Edit Issues Pull Requests Attributions Admin

Discussion

Proof

[proofplan] We prove the identity by comparing the optimality conditions for the two proximal minimization problems. First, the primal proximal point satisfies a subgradient inclusion. Next, the defining inequality for the convex conjugate shows that this inclusion is equivalent to a dual subgradient inclusion. Finally, that dual inclusion is exactly the optimality condition for the dual proximal point, and the decomposition follows from the definition of the auxiliary vector. [/proofplan] [step:Translate the primal proximal minimizer into a subgradient inclusion] Fix $v \in \mathbb{R}^n$. Let $f^*:\mathbb{R}^n \to (-\infty,\infty]$ denote the convex conjugate of $f$, meaning \begin{align*} f^*(u) := \sup_{y \in \mathbb{R}^n}\{\langle u,y\rangle-f(y)\} \end{align*} for $u \in \mathbb{R}^n$. For any proper convex function $g:\mathbb{R}^n \to (-\infty,\infty]$ and any point $a \in \mathbb{R}^n$ with $g(a)<\infty$, let $\partial g(a) \subset \mathbb{R}^n$ denote the subdifferential, namely the set of all $p \in \mathbb{R}^n$ such that $g(y) \geq g(a)+\langle p,y-a\rangle$ for every $y \in \mathbb{R}^n$. Define \begin{align*} x := \operatorname{prox}_{\lambda f}(v) \in \mathbb{R}^n. \end{align*} By the definition of the proximal map, $x$ is the unique minimizer of the function $\Phi:\mathbb{R}^n \to (-\infty,\infty]$ defined by \begin{align*} \Phi(y) := f(y) + \frac{1}{2\lambda}|y-v|^2. \end{align*} Existence follows from the following finite-dimensional argument. The function $\Phi$ is lower semicontinuous because $f$ is lower semicontinuous and the quadratic term is continuous. To justify coercivity of $\Phi$, use the affine minorant property for proper closed convex functions in finite-dimensional Euclidean space, obtained by applying the supporting-hyperplane separation theorem to the closed convex epigraph of $f$: there exist $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $f(y) \geq \langle a,y\rangle+b$ for every $y \in \mathbb{R}^n$. Hence the quadratic term dominates the affine lower bound as $|y| \to \infty$, so $\Phi(y) \to \infty$ as $|y| \to \infty$. Choose $R>0$ so that $\Phi(y)>\Phi(y_0)$ whenever $|y|>R$, where $y_0 \in \mathbb{R}^n$ is any point with $f(y_0)<\infty$. Let $\overline{B}(0,R) := \{y \in \mathbb{R}^n : |y| \leq R\}$ denote the closed Euclidean ball of radius $R$ centered at $0$. The restriction of $\Phi$ to the compact ball $\overline{B}(0,R)$ attains its minimum because its sublevel sets are closed, so $\Phi$ has a minimizer on $\mathbb{R}^n$. Uniqueness follows from strict convexity: $\Phi$ is strictly convex because $f$ is convex while $y \mapsto |y-v|^2/(2\lambda)$ is $1/\lambda$-strongly convex. We claim that \begin{align*} \frac{v-x}{\lambda} \in \partial f(x). \end{align*} Fix $y \in \mathbb{R}^n$. If $f(y)=\infty$, then the desired subgradient inequality at $y$ is immediate. Hence assume $f(y)<\infty$. For $t \in (0,1)$, define $z_t := x+t(y-x) \in \mathbb{R}^n$. Since $x$ minimizes $\Phi$, \begin{align*} f(x) + \frac{1}{2\lambda}|x-v|^2 \leq f(z_t) + \frac{1}{2\lambda}|z_t-v|^2. \end{align*} By convexity of $f$, \begin{align*} f(z_t) \leq (1-t)f(x)+t f(y). \end{align*} Combining these two inequalities and subtracting $f(x)+\frac{1}{2\lambda}|x-v|^2$ gives \begin{align*} 0 \leq t\bigl(f(y)-f(x)\bigr)+\frac{1}{2\lambda}\bigl(|x-v+t(y-x)|^2-|x-v|^2\bigr). \end{align*} Using the Euclidean square expansion, \begin{align*} |x-v+t(y-x)|^2-|x-v|^2 = 2t\langle x-v,y-x\rangle+t^2|y-x|^2. \end{align*} Therefore \begin{align*} 0 \leq t\left(f(y)-f(x)+\left\langle \frac{x-v}{\lambda},y-x\right\rangle\right)+\frac{t^2}{2\lambda}|y-x|^2. \end{align*} Dividing by $t>0$ and letting $t \downarrow 0$ yields \begin{align*} f(y) \geq f(x)+\left\langle \frac{v-x}{\lambda},y-x\right\rangle. \end{align*} Since this holds for every $y \in \mathbb{R}^n$, this is precisely the definition of \begin{align*} \frac{v-x}{\lambda} \in \partial f(x). \end{align*} [guided] The proximal point $x$ is defined by minimizing $f$ plus a quadratic penalty that keeps the minimizer near $v$. In the language of the proximal map, this means that $x$ is the minimizer of the function $\Phi:\mathbb{R}^n \to (-\infty,\infty]$ defined by \begin{align*} \Phi(y) := f(y) + \frac{1}{2\lambda}|y-v|^2. \end{align*} The minimizer exists by a direct finite-dimensional compactness argument, and it is unique because $\Phi$ is strictly convex. We verify the ingredients. The function $\Phi$ is lower semicontinuous because $f$ is lower semicontinuous and the quadratic penalty is continuous. The remaining point for existence is coercivity. We use the affine minorant property for proper closed convex functions in finite-dimensional Euclidean space, obtained by applying the supporting-hyperplane separation theorem to the closed convex epigraph of $f$: there are $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that $f(y) \geq \langle a,y\rangle+b$ for every $y \in \mathbb{R}^n$. Therefore the quadratic term in $\Phi$ dominates this affine lower bound as $|y| \to \infty$, so $\Phi(y) \to \infty$ as $|y| \to \infty$. Choose $R>0$ so that $\Phi(y)>\Phi(y_0)$ whenever $|y|>R$, where $y_0 \in \mathbb{R}^n$ is any point with $f(y_0)<\infty$. Let $\overline{B}(0,R) := \{y \in \mathbb{R}^n : |y| \leq R\}$ denote the closed Euclidean ball of radius $R$ centered at $0$. On the compact ball $\overline{B}(0,R)$, lower semicontinuity implies that the sublevel sets of $\Phi$ are closed, so $\Phi$ attains its minimum there; the choice of $R$ makes this a global minimum on $\mathbb{R}^n$. Finally, $\Phi$ is strictly convex because $f$ is convex and $y \mapsto |y-v|^2/(2\lambda)$ is $1/\lambda$-strongly convex, so this global minimizer is unique. The goal is to turn the minimality of $x$ for $\Phi$ into the subgradient inequality for $f$ at $x$. Fix a comparison point $y \in \mathbb{R}^n$. If $f(y)=\infty$, then the inequality \begin{align*} f(y) \geq f(x)+\left\langle \frac{v-x}{\lambda},y-x\right\rangle \end{align*} holds immediately in the extended-real sense. Thus assume $f(y)<\infty$. Instead of comparing $x$ directly to $y$, compare $x$ to points on the line segment from $x$ to $y$. For each $t \in (0,1)$, define $z_t := x+t(y-x) \in \mathbb{R}^n$. Since $x$ minimizes $\Phi$, \begin{align*} f(x) + \frac{1}{2\lambda}|x-v|^2 \leq f(z_t) + \frac{1}{2\lambda}|z_t-v|^2. \end{align*} Convexity of $f$ gives \begin{align*} f(z_t) \leq (1-t)f(x)+t f(y). \end{align*} Substituting this convexity bound into the minimality inequality gives \begin{align*} f(x) + \frac{1}{2\lambda}|x-v|^2 \leq (1-t)f(x)+t f(y)+\frac{1}{2\lambda}|x-v+t(y-x)|^2. \end{align*} After subtracting $f(x)+\frac{1}{2\lambda}|x-v|^2$, we obtain \begin{align*} 0 \leq t\bigl(f(y)-f(x)\bigr)+\frac{1}{2\lambda}\bigl(|x-v+t(y-x)|^2-|x-v|^2\bigr). \end{align*} Now expand the square in the direction $y-x$: \begin{align*} |x-v+t(y-x)|^2-|x-v|^2 = 2t\langle x-v,y-x\rangle+t^2|y-x|^2. \end{align*} Substitution yields \begin{align*} 0 \leq t\left(f(y)-f(x)+\left\langle \frac{x-v}{\lambda},y-x\right\rangle\right)+\frac{t^2}{2\lambda}|y-x|^2. \end{align*} Because $t>0$, we may divide by $t$: \begin{align*} 0 \leq f(y)-f(x)+\left\langle \frac{x-v}{\lambda},y-x\right\rangle+\frac{t}{2\lambda}|y-x|^2. \end{align*} Letting $t \downarrow 0$ removes the last term and gives \begin{align*} f(y) \geq f(x)+\left\langle \frac{v-x}{\lambda},y-x\right\rangle. \end{align*} This holds for every $y \in \mathbb{R}^n$, which is exactly the definition of the subdifferential inclusion \begin{align*} \frac{v-x}{\lambda} \in \partial f(x). \end{align*} [/guided] [/step] [step:Convert the primal subgradient inclusion into a dual subgradient inclusion] Define \begin{align*} u := \frac{v-x}{\lambda} \in \mathbb{R}^n. \end{align*} From the previous step, $u \in \partial f(x)$, meaning that for every $y \in \mathbb{R}^n$, \begin{align*} f(y) \geq f(x) + \langle u, y-x\rangle. \end{align*} Equivalently, \begin{align*} \langle u,y\rangle - f(y) \leq \langle u,x\rangle - f(x) \end{align*} for every $y \in \mathbb{R}^n$. Therefore, by the definition of the convex conjugate, \begin{align*} f^*(u) = \langle u,x\rangle - f(x). \end{align*} Now let $w \in \mathbb{R}^n$. By the definition of $f^*(w)$, \begin{align*} f^*(w) \geq \langle w,x\rangle - f(x). \end{align*} Using $f^*(u)=\langle u,x\rangle-f(x)$, this becomes \begin{align*} f^*(w) \geq f^*(u) + \langle x, w-u\rangle. \end{align*} Since this holds for every $w \in \mathbb{R}^n$, we have \begin{align*} x \in \partial f^*(u). \end{align*} [guided] The point of this step is to translate the primal optimality information into the dual language of the convex conjugate. Define \begin{align*} u := \frac{v-x}{\lambda} \in \mathbb{R}^n. \end{align*} The previous step proved $u \in \partial f(x)$, which means that for every $y \in \mathbb{R}^n$, \begin{align*} f(y) \geq f(x) + \langle u, y-x\rangle. \end{align*} Rearranging this inequality gives \begin{align*} \langle u,y\rangle - f(y) \leq \langle u,x\rangle - f(x). \end{align*} Thus the expression whose supremum defines $f^*(u)$ is maximized at $y=x$, so \begin{align*} f^*(u) = \langle u,x\rangle - f(x). \end{align*} Now fix $w \in \mathbb{R}^n$. The definition of the convex conjugate gives the lower bound \begin{align*} f^*(w) \geq \langle w,x\rangle - f(x). \end{align*} Using the identity for $f^*(u)$, we rewrite the right-hand side as \begin{align*} \langle w,x\rangle - f(x) = f^*(u) + \langle x,w-u\rangle. \end{align*} Therefore \begin{align*} f^*(w) \geq f^*(u) + \langle x,w-u\rangle \end{align*} for every $w \in \mathbb{R}^n$. This is exactly the subdifferential inequality for $f^*$ at $u$, with subgradient $x$, and hence \begin{align*} x \in \partial f^*(u). \end{align*} [/guided] [/step] [step:Identify the dual proximal minimizer] We show that $u$ is the proximal point of the scaled conjugate at the scaled vector. Since $f^*:\mathbb{R}^n \to (-\infty,\infty]$ has already been defined, introduce $h:\mathbb{R}^n \to (-\infty,\infty]$ by \begin{align*} h(w) := \frac{1}{\lambda}f^*(w). \end{align*} Thus $h$ is the scaled function customarily denoted $f^*/\lambda$. Define $\Psi:\mathbb{R}^n \to (-\infty,\infty]$ by \begin{align*} \Psi(w) := \frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2. \end{align*} The subgradient inclusion $x \in \partial f^*(u)$ says that for every $w \in \mathbb{R}^n$, \begin{align*} f^*(w) \geq f^*(u) + \langle x, w-u\rangle. \end{align*} Multiplying by $1/\lambda$ gives \begin{align*} \frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{x}{\lambda}, w-u\right\rangle. \end{align*} Since the definition of $u$ gives \begin{align*} \frac{x}{\lambda} = \frac{v}{\lambda} - u. \end{align*} Thus \begin{align*} \frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{v}{\lambda}-u, w-u\right\rangle. \end{align*} Equivalently, \begin{align*} \frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2 \geq \frac{1}{\lambda}f^*(u) + \frac{1}{2}\left|u-\frac{v}{\lambda}\right|^2 + \frac{1}{2}|w-u|^2. \end{align*} Therefore $\Psi(w)\geq \Psi(u)$ for every $w \in \mathbb{R}^n$, so by the definition of the proximal map, \begin{align*} u = \operatorname{prox}_{f^*/\lambda}(v/\lambda). \end{align*} [guided] We now show that the vector $u$ is not merely related to the dual problem, but is exactly its proximal minimizer. For the scaled conjugate \begin{align*} h(w) := \frac{1}{\lambda}f^*(w), \end{align*} the proximal objective at the point $v/\lambda$ is the function $\Psi:\mathbb{R}^n \to (-\infty,\infty]$ defined by \begin{align*} \Psi(w) := \frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2. \end{align*} From the previous step, $x \in \partial f^*(u)$, so for every $w \in \mathbb{R}^n$, \begin{align*} f^*(w) \geq f^*(u) + \langle x, w-u\rangle. \end{align*} Multiplying by $1/\lambda$ gives \begin{align*} \frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{x}{\lambda}, w-u\right\rangle. \end{align*} Because of the defining relation between $u$, $x$, and $v$, we have \begin{align*} \frac{x}{\lambda} = \frac{v}{\lambda}-u. \end{align*} Substituting this identity yields \begin{align*} \frac{1}{\lambda}f^*(w) \geq \frac{1}{\lambda}f^*(u) + \left\langle \frac{v}{\lambda}-u, w-u\right\rangle. \end{align*} Adding the quadratic term and completing the square gives \begin{align*} \frac{1}{\lambda}f^*(w) + \frac{1}{2}\left|w-\frac{v}{\lambda}\right|^2 \geq \frac{1}{\lambda}f^*(u) + \frac{1}{2}\left|u-\frac{v}{\lambda}\right|^2 + \frac{1}{2}|w-u|^2. \end{align*} The final nonnegative term shows $\Psi(w) \geq \Psi(u)$ for every $w \in \mathbb{R}^n$. Hence $u$ minimizes the dual proximal objective, and by the definition of the proximal map, \begin{align*} u = \operatorname{prox}_{f^*/\lambda}(v/\lambda). \end{align*} [/guided] [/step] [step:Substitute the dual proximal point into the decomposition] By the definition of $u$, \begin{align*} v = x + \lambda u. \end{align*} The first step defined \begin{align*} x=\operatorname{prox}_{\lambda f}(v), \end{align*} and the previous step proved \begin{align*} u = \operatorname{prox}_{f^*/\lambda}(v/\lambda). \end{align*} Substituting these two identities into \begin{align*} v=x+\lambda u \end{align*} gives \begin{align*} v = \operatorname{prox}_{\lambda f}(v) + \lambda\operatorname{prox}_{f^*/\lambda}(v/\lambda). \end{align*} Since $v \in \mathbb{R}^n$ was arbitrary, the Moreau decomposition holds for every $v \in \mathbb{R}^n$. [guided] The last step is only substitution, but it is where the two proximal identities are assembled. By definition, \begin{align*} u := \frac{v-x}{\lambda} \end{align*} so multiplying by $\lambda$ and adding $x$ gives \begin{align*} v = x + \lambda u. \end{align*} The first proximal problem defined \begin{align*} x = \operatorname{prox}_{\lambda f}(v), \end{align*} and the dual proximal step proved \begin{align*} u = \operatorname{prox}_{f^*/\lambda}(v/\lambda). \end{align*} Substituting these two identities into \begin{align*} v=x+\lambda u \end{align*} gives \begin{align*} v = \operatorname{prox}_{\lambda f}(v) + \lambda\operatorname{prox}_{f^*/\lambda}(v/\lambda). \end{align*} Because the argument began with an arbitrary vector $v \in \mathbb{R}^n$, this proves the Moreau decomposition for every $v \in \mathbb{R}^n$. [/guided] [/step]

What brings you to Androma?

Start with a route through the knowledge graph.

Moreau Decomposition Theorem (Theorem # 6699)

Discussion

Proof

Explore Further

Sign in to Androma

Check your inbox

One last step

Moreau Decomposition Theorem (Theorem # 6699)

Discussion

Proof

Explore Further