Calculus of Variations II: Direct Methods

Also known as: Direct Methods in the Calculus of Variations, Direct Method, Variational Direct Methods, Existence Methods for Variational Problems, Lower Semicontinuity and Coercivity

This course develops the direct method in the [calculus of variations](/page/Calculus%20of%20Variations) as a systematic framework for proving existence of minimizers for variational problems that lie beyond classical Euler-Lagrange theory. The focus is on functionals defined on infinite-dimensional function spaces, where minimizing sequences typically converge only weakly and minimizers are sought without assuming smooth critical points or explicit solvability. The course emphasizes the structural ingredients that make existence theory work: coercivity, compactness, and lower semicontinuity, together with the ways these properties interact in scalar problems, constrained problems, and models from nonlinear elasticity.

The chapters build from the basic philosophy of variational minimization to the technical tools needed to make it rigorous. Early chapters explain why weak topologies are natural, how compactness is recovered from coercive bounds, and why lower semicontinuity is the decisive criterion for passing to limits. From there, the course treats Tonelli-type existence theorems, obstacle and inequality constraints, and relaxation as a way to repair ill-posed problems. Later chapters move to the multidimensional setting, where quasiconvexity, polyconvexity, and minors become the right notions for weak lower semicontinuity and elastic energy minimization.

The final part of the course studies the limits of the direct method and what comes after existence: Lavrentiev gaps, failure of density, and regularity as a separate question from existence. The synthesis chapter ties these themes together by comparing the different hypotheses under which the direct method succeeds, clarifying which functional-analytic and structural assumptions are needed in each setting, and showing how the theory changes as one moves from scalar problems to nonlinear vector-valued models.
# Introduction

This course is about existence: given a functional $I$ on an infinite-dimensional class of admissible functions, when does the variational problem
\begin{align*} \inf_{u \in \mathcal A} I[u] \end{align*}
have a minimizer? Classical calculus of variations often begins by deriving Euler-Lagrange equations for smooth critical points, but the direct method starts from minimizing sequences and asks whether compactness and lower semicontinuity are strong enough to pass to a limit. The guiding theme is that existence is a structural question about the functional, the topology, and the constraint class, not just about solving a differential equation.

The first course in the calculus of variations introduced functionals, first variations, and Euler-Lagrange equations. This second course shifts the emphasis to weak topologies, Sobolev spaces, convexity conditions, relaxation, and the modern existence theory for integral functionals. The final part connects these ideas to nonlinear elasticity, where Chapters 7-9 replace ordinary convexity by quasiconvexity and the more checkable sufficient condition of polyconvexity.

We use standard Sobolev notation from the start. The symbol $d\mathcal L^n$ denotes integration with respect to $n$-dimensional [Lebesgue measure](/page/Lebesgue%20Measure). The space $H^1(U)$ is $W^{1,2}(U)$, $H^1_0(U)$ is the closure of compactly supported smooth functions in $H^1(U)$, and $W^{1,p}_0(U)$ has the analogous meaning in $W^{1,p}(U)$. Weak derivatives are distributional derivatives that belong to the stated $L^p$ space. On sufficiently regular boundaries, the trace operator records Sobolev boundary values in spaces such as $H^{1/2}(\partial U)$, so boundary conditions are understood in the trace sense rather than pointwise.

## What the Direct Method Tries to Prove

The central problem is to turn the formal statement "minimize $I[u]$ over $\mathcal A$" into an existence theorem.
In finite dimensions the Weierstrass theorem gives a model: compactness gives a convergent subsequence, and continuity passes the value of the function to the limit. In infinite dimensions neither compactness nor continuity usually survives in the norm topology, so the method replaces them with weak compactness and weak lower semicontinuity. [definition: Minimizer] Let $X$ be a set, let $\mathcal A \subset X$, and let $I: \mathcal A \to \mathbb R \cup \{+\infty\}$. An element $u \in \mathcal A$ is a minimizer of $I$ over $\mathcal A$ if \begin{align*} I[u] = \inf_{v \in \mathcal A} I[v]. \end{align*} [/definition] The definition records the target, but it gives no method for finding $u$. In an infinite-dimensional admissible class the infimum may be approached by better and better competitors even when no best competitor is visible. This creates the first object that an existence proof can actually build: not the unknown minimizer, but a controlled list of admissible competitors whose energies converge down to the infimum. The next definition names that list so the later compactness argument has a precise sequence to extract a subsequence from, and the lower semicontinuity argument has a precise limit to test against. [definition: Minimizing Sequence] Let $I: \mathcal A \to \mathbb R \cup \{+\infty\}$ be bounded below on $\mathcal A$. A sequence $(u_k)_{k=1}^{\infty}$ in $\mathcal A$ is a minimizing sequence for $I$ over $\mathcal A$ if \begin{align*} I[u_k] \to \inf_{v \in \mathcal A} I[v]. \end{align*} [/definition] A minimizing sequence is usually easy to obtain from the definition of the infimum. The difficulty is that it may oscillate, concentrate, escape to infinity, lose boundary conditions, or converge only in a topology too weak for the functional to be continuous; this motivates a theorem isolating exactly what compactness and semicontinuity must provide. 
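The definition of a minimizing sequence can be made concrete with a toy computation. The sketch below is only an illustration under stated assumptions: the functional $I[x]=e^x$ on $\mathcal A=\mathbb R$, the sequence $x_k=-k$, and the helper names `energy` and `minimizing_sequence` are choices made here for the example. It shows energies decreasing toward the infimum $0$ while no term of the sequence attains it, which is the "escape to infinity" behaviour mentioned above.

```python
import math


def energy(x: float) -> float:
    """Toy functional I[x] = exp(x) on the admissible class A = R (illustrative)."""
    return math.exp(x)


def minimizing_sequence(n_terms: int) -> list[float]:
    """Illustrative minimizing sequence x_k = -k for k = 1, ..., n_terms."""
    return [-float(k) for k in range(1, n_terms + 1)]


xs = minimizing_sequence(30)
energies = [energy(x) for x in xs]

# The energies decrease toward the infimum 0 ...
is_decreasing = all(e_next < e for e, e_next in zip(energies, energies[1:]))
# ... but every term stays strictly above it, so no term is a minimizer.
stays_positive = all(e > 0.0 for e in energies)

print(f"I[x_1]  = {energies[0]:.3e}")
print(f"I[x_30] = {energies[-1]:.3e}")
print("decreasing:", is_decreasing, "| strictly positive:", stays_positive)
```

The computation only samples finitely many terms, so it illustrates the definition rather than proving anything about the infimum.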
[quotetheorem:8724] [citeproof:8724] This template is the whole course in compressed form, but it is also deliberately limited. It does not construct a minimizing sequence beyond the infimum argument, it does not identify the minimizer uniquely, and it does not give regularity or an Euler-Lagrange equation. It says only that if compactness, admissibility of the limit, and lower semicontinuity are already available, then existence follows. Each hypothesis excludes a specific failure mode. If compactness fails, a minimizing sequence can escape to infinity; for instance $I[x]=e^x$ on $\mathbb R$ has infimum $0$ but no minimizer. If the admissible class is not closed under the convergence used, the limit may solve the wrong problem; for instance minimizing $x^2$ on $(0,1)$ produces sequences converging to $0\notin (0,1)$. If lower semicontinuity fails, compactness can still produce a limit without producing a minimizer: on $X=[-1,1]$, set $I[0]=1$ and $I[x]=|x|$ for $x\ne 0$. Then $X$ is compact and $\inf_X I=0$, approached by $x_k=1/k$, but the infimum is not attained because the only possible limiting point is assigned energy $1$. Each later chapter supplies hypotheses under which these obstructions are ruled out: compactness of minimizing sequences, closedness of the admissible class, and lower semicontinuity of the energy. [example: Dirichlet Energy With Boundary Data] [claim]The Dirichlet energy attains its infimum on $\mathcal A=g+H^1_0(U)$.[/claim] [proof]Choose a minimizing sequence $(u_k)$ in $\mathcal A$, and write $u_k=g+w_k$ with $w_k\in H^1_0(U)$. Since $(u_k)$ is minimizing, the numbers $I[u_k]$ are bounded above along the tail, so there is $M<\infty$ such that \begin{align*} \int_U |\nabla u_k|^2\,d\mathcal L^n \le M \end{align*} for all sufficiently large $k$. By Poincare's inequality applied to $w_k=u_k-g$, \begin{align*} \|u_k-g\|_{L^2(U)}^2 \le C_P^2\|\nabla u_k-\nabla g\|_{L^2(U)}^2. 
\end{align*}
Adding $\|\nabla u_k-\nabla g\|_{L^2(U)}^2$ to both sides of this Poincare estimate gives
\begin{align*} \|u_k-g\|_{H^1(U)}^2 \le (C_P^2+1)\|\nabla u_k-\nabla g\|_{L^2(U)}^2. \end{align*}
Using $|a-b|^2\le 2|a|^2+2|b|^2$ pointwise with $a=\nabla u_k$ and $b=\nabla g$, and integrating over $U$, gives
\begin{align*} \|\nabla u_k-\nabla g\|_{L^2(U)}^2 \le 2\|\nabla u_k\|_{L^2(U)}^2+2\|\nabla g\|_{L^2(U)}^2. \end{align*}
Combining the last two estimates with the bound $\|\nabla u_k\|_{L^2(U)}^2\le M$ gives
\begin{align*} \|u_k-g\|_{H^1(U)}^2 \le 2(C_P^2+1)\bigl(M+\|\nabla g\|_{L^2(U)}^2\bigr) \end{align*}
for all sufficiently large $k$. Thus $(u_k-g)$, and hence $(u_k)$, is bounded in $H^1(U)$. Since $H^1(U)$ is reflexive, a subsequence satisfies $u_{k_j}\rightharpoonup u$ in $H^1(U)$. The differences $u_{k_j}-g$ lie in the closed linear subspace $H^1_0(U)$, so weak closedness of closed convex sets gives $u-g\in H^1_0(U)$, hence $u\in\mathcal A$. Finally, $\nabla u_{k_j}\rightharpoonup\nabla u$ in $L^2(U)$, so weak lower semicontinuity of the Hilbert norm gives
\begin{align*} I[u]=\|\nabla u\|_{L^2(U)}^2 \le \liminf_{j\to\infty}\|\nabla u_{k_j}\|_{L^2(U)}^2=\inf_{v\in\mathcal A} I[v]. \end{align*}
Since $u\in\mathcal A$, the reverse inequality $\inf_{\mathcal A}I\le I[u]$ is automatic, so $I[u]=\inf_{\mathcal A}I$.[/proof] This example displays the direct method in its basic Sobolev form: energy control gives compactness, the boundary condition survives weak limits, and convexity of the square norm supplies the lower semicontinuity needed to pass to the minimizer. [/example]

## Why Euler-Lagrange Equations Are Not Enough

The next problem is to understand why solving the Euler-Lagrange equation is not the same thing as solving the minimization problem. A stationary point may fail to minimize, a minimizer may not be smooth enough for the classical equation, and an equation may have many weak solutions without identifying the one selected by an energy principle.

[definition: First Variation] Let $X$ be a [vector space](/page/Vector%20Space), let $\mathcal A \subset X$, and let $I: \mathcal A \to \mathbb R$.
If $u \in \mathcal A$ and $v \in X$ are such that $u + \varepsilon v \in \mathcal A$ for all sufficiently small $\varepsilon \in \mathbb R$, the first variation of $I$ at $u$ in the direction $v$ is \begin{align*} \delta I[u;v] = \frac{d}{d\varepsilon}\Big|_{\varepsilon = 0} I[u + \varepsilon v] \end{align*} when this derivative exists. [/definition] The first variation is the differential test used by the Euler-Lagrange method, but passing that test does not identify a global minimizer. This motivates the definition of a stationary point, the class of candidates detected by vanishing first variation. [definition: Stationary Point] Let $X$ be a vector space, let $\mathcal A \subset X$, and let $I: \mathcal A \to \mathbb R$. An element $u \in \mathcal A$ is a stationary point of $I$ with respect to an admissible class of variations $V \subset X$ if \begin{align*} \delta I[u;v] = 0 \end{align*} for every $v \in V$ for which the first variation is defined. [/definition] Stationarity is local and differential, while minimality is global and order-theoretic. Direct methods prove the existence of minimizers first; Euler-Lagrange equations are then derived as additional information when the minimizer lies in a regime where variations are legitimate. [example: Stationary Point Which Is Not a Minimizer] Consider $I:\mathbb R\to\mathbb R$ given by $I[x]=x^3$ on the admissible class $\mathcal A=[-1,1]$. For any direction $v\in\mathbb R$, the variation $0+\varepsilon v$ lies in $[-1,1]$ whenever $|\varepsilon|\le 1/|v|$ if $v\ne 0$, and for all $\varepsilon$ if $v=0$. Its first variation at $0$ is \begin{align*} \delta I[0;v]=\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} I[\varepsilon v]. \end{align*} Since $I[\varepsilon v]=(\varepsilon v)^3=\varepsilon^3v^3$, we get \begin{align*} \frac{d}{d\varepsilon}(\varepsilon^3v^3)=3\varepsilon^2v^3. \end{align*} Evaluating at $\varepsilon=0$ gives \begin{align*} \delta I[0;v]=3\cdot 0^2\cdot v^3=0. 
\end{align*}
Thus $0$ is stationary for all admissible interior variations. However,
\begin{align*} I[-1]=(-1)^3=-1. \end{align*}
Also,
\begin{align*} I[0]=0^3=0. \end{align*}
Therefore $I[-1]=-1<0=I[0]$, so $0$ is not a minimizer on $[-1,1]$. The example shows that vanishing first variation is only a stationarity condition, not a global minimality condition. [/example]

## The Three Structural Inputs

The direct method succeeds when three structural questions have satisfactory answers. First, are minimizing sequences bounded in a space with useful compactness? Second, is the admissible class closed under the convergence produced by that compactness? Third, is the energy of the limit no larger than the limit inferior of the energies along the sequence?

[definition: Coercive Functional] Let $X$ be a [normed vector space](/page/Normed%20Vector%20Space) with norm $\|\cdot\|_X$, let $\mathcal A \subset X$, and let $I: \mathcal A \to \mathbb R \cup \{+\infty\}$. The functional $I$ is coercive on $\mathcal A$ if \begin{align*} I[u_k] \to +\infty \end{align*} for every sequence $(u_k)_{k=1}^{\infty}$ in $\mathcal A$ with $\|u_k\|_X \to \infty$. [/definition]

Coercivity converts energy control into norm control. In reflexive spaces, boundedness is the entry point to weak compactness, but compactness alone does not pass an energy inequality to the limit; this motivates lower semicontinuity in the [weak topology](/page/Weak%20Topology).

[definition: Sequential Weak Lower Semicontinuity] Let $X$ be a [Banach space](/page/Banach%20Space), let $\mathcal A \subset X$, and let $I: \mathcal A \to \mathbb R \cup \{+\infty\}$. The functional $I$ is sequentially weakly lower semicontinuous on $\mathcal A$ if, whenever $u_k \rightharpoonup u$ in $X$ with $u_k,u \in \mathcal A$, one has \begin{align*} I[u] \le \liminf_{k \to \infty} I[u_k]. \end{align*} [/definition]

Lower semicontinuity is the direction of semicontinuity compatible with minimization: the energy may drop in the limit, but it cannot jump up.
This motivates the Coercive Reflexive Direct Method theorem, which combines coercivity, weak compactness, weak closedness, and lower semicontinuity into a single existence result. [quotetheorem:8725] [citeproof:8725] This theorem is the basic existence machine for the first half of the course, and its assumptions are not decorative. Coercivity prevents minimizing sequences from escaping to infinity; without it, the model $I[x]=e^x$ on $\mathbb R$ has no minimizer. Reflexivity supplies [weak sequential compactness](/theorems/214) of bounded sequences; in non-reflexive spaces such as $L^1$, bounded sequences need not have weakly convergent subsequences, so bounded energy alone may not produce a limit. Sequential weak closedness keeps the limiting object admissible; the concrete problem of minimizing $x^2$ over the [open set](/page/Open%20Set) $(0,\infty)\subset\mathbb R$ has minimizing sequences $x_k\downarrow 0$, but the limit $0$ is not admissible and no minimizer exists. Sequential weak lower semicontinuity is the energy inequality that turns convergence into minimality; without it, weak limits may have strictly larger energy than the limiting infimum. The rest of the theory refines these assumptions so that the theorem applies to the functionals arising in PDE, geometry, and elasticity. The functional-analysis input is weak compactness, while the variational input is lower semicontinuity, and much of the course is about recognizing those two structures in concrete problems. [example: Coercivity From Poincare Inequality] Let $U \subset \mathbb R^n$ be bounded, let $\mathcal A=H^1_0(U)$, and fix $f\in L^2(U)$. For \begin{align*} I[u]=\int_U |\nabla u|^2\,d\mathcal L^n-\int_U fu\,d\mathcal L^n, \end{align*} we show that $I$ is coercive on $H^1_0(U)$. Let $u\in H^1_0(U)$. By the *[Cauchy-Schwarz inequality](/theorems/432)*, \begin{align*} \left|\int_U fu\,d\mathcal L^n\right|\le \|f\|_{L^2(U)}\|u\|_{L^2(U)}. 
\end{align*} By *Poincare's inequality*, there is a constant $C_P>0$ such that \begin{align*} \|u\|_{L^2(U)}\le C_P\|\nabla u\|_{L^2(U)}. \end{align*} Therefore \begin{align*} -\int_U fu\,d\mathcal L^n\ge -\left|\int_U fu\,d\mathcal L^n\right|\ge -C_P\|f\|_{L^2(U)}\|\nabla u\|_{L^2(U)}. \end{align*} Substituting this into the definition of $I$ gives \begin{align*} I[u]\ge \|\nabla u\|_{L^2(U)}^2-C_P\|f\|_{L^2(U)}\|\nabla u\|_{L^2(U)}. \end{align*} Again using Poincare's inequality, \begin{align*} \|u\|_{H^1(U)}^2=\|u\|_{L^2(U)}^2+\|\nabla u\|_{L^2(U)}^2\le (C_P^2+1)\|\nabla u\|_{L^2(U)}^2. \end{align*} Hence \begin{align*} \|\nabla u\|_{L^2(U)}^2\ge \frac{1}{C_P^2+1}\|u\|_{H^1(U)}^2. \end{align*} Also $\|\nabla u\|_{L^2(U)}\le \|u\|_{H^1(U)}$, so the previous lower bound becomes \begin{align*} I[u]\ge \frac{1}{C_P^2+1}\|u\|_{H^1(U)}^2-C_P\|f\|_{L^2(U)}\|u\|_{H^1(U)}. \end{align*} If $t=\|u\|_{H^1(U)}$, this has the form \begin{align*} I[u]\ge \frac{1}{C_P^2+1}t^2-C_P\|f\|_{L^2(U)}t. \end{align*} The quadratic term dominates the linear term: for \begin{align*} t\ge 2C_P(C_P^2+1)\|f\|_{L^2(U)}, \end{align*} we have \begin{align*} C_P\|f\|_{L^2(U)}t\le \frac{1}{2(C_P^2+1)}t^2, \end{align*} and therefore \begin{align*} I[u]\ge \frac{1}{2(C_P^2+1)}t^2. \end{align*} Thus $I[u]\to +\infty$ whenever $\|u\|_{H^1(U)}\to\infty$ in $H^1_0(U)$. In particular, any sequence in $H^1_0(U)$ with bounded energy is bounded in $H^1(U)$, which is exactly the compactness input needed before extracting weakly convergent subsequences. [/example] ## Integral Functionals and Convexity Conditions The main examples in this course are integral functionals, and the key question is which structural assumptions on the integrand imply lower semicontinuity. For scalar convex problems, ordinary convexity in the gradient variable is often enough. For vector-valued maps, especially in elasticity, convexity is too restrictive and must be replaced by weaker notions adapted to gradients. 
[definition: Integral Functional] Let $U \subset \mathbb R^n$ be open, let $m \in \mathbb N$, let $\mathcal A \subset W^{1,p}(U;\mathbb R^m)$, and let $L: U \times \mathbb R^m \times \mathbb R^{m \times n} \to \mathbb R \cup \{+\infty\}$. The associated integral functional is the map $I: \mathcal A \to \mathbb R \cup \{+\infty\}$ defined by \begin{align*} I[u] = \int_U L(x,u(x),\nabla u(x))\,d\mathcal L^n. \end{align*} [/definition] The gradient dependence is where most of the analysis happens. Weak convergence in $W^{1,p}$ gives weak convergence of gradients in $L^p$, but nonlinear expressions of gradients need not behave continuously under weak limits. A typical obstruction is oscillation: gradients may alternate rapidly between two values $a$ and $b$, converge weakly to their average, and yet the energy density may prefer the separate phases to the average. This is exactly the mechanism behind microstructure in materials, and it motivates convexity as the first structural condition on the integrand. [definition: Convex Function] Let $V$ be a real vector space and let $f: V \to \mathbb R \cup \{+\infty\}$. The function $f$ is convex if, for all $x,y \in V$ and all $t \in [0,1]$, \begin{align*} f(tx + (1-t)y) \le t f(x) + (1-t)f(y). \end{align*} [/definition] Convexity is the first lower semicontinuity principle because weak limits interact well with supporting hyperplanes. Before specializing to Sobolev energies, it is useful to isolate the functional-analytic core: a convex lower semicontinuous extended-real functional on a Banach space cannot drop under weak convergence. The point of the next result is to separate the weak-limit argument from the special structure of integral energies. Once an energy has been recognized as a convex lower semicontinuous functional on a Banach space, weak lower semicontinuity follows from a general principle rather than from a new compactness argument each time. 
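The two-phase oscillation described above can be probed with a very small computation. The sketch below is an illustration only, under assumptions introduced here: the slopes $a=-1$, $b=1$, the convex density $|\rho|^2$, the nonconvex double-well density $\min\{|\rho-a|^2,|\rho-b|^2\}$, and the helper name `two_phase_gap` are all choices made for the example. It compares the average energy of the two phases with the energy of the averaged slope; a nonnegative gap is the convexity inequality, and a negative gap means the density prefers the separate phases to their average.

```python
def two_phase_gap(density, a: float, b: float, t: float = 0.5) -> float:
    """Return t*f(a) + (1-t)*f(b) - f(t*a + (1-t)*b) for a density f."""
    mean_slope = t * a + (1.0 - t) * b
    return t * density(a) + (1.0 - t) * density(b) - density(mean_slope)


def convex_density(rho: float) -> float:
    """Convex model density F(rho) = |rho|^2."""
    return abs(rho) ** 2


def double_well(rho: float, a: float = -1.0, b: float = 1.0) -> float:
    """Illustrative nonconvex density min{|rho - a|^2, |rho - b|^2}."""
    return min(abs(rho - a) ** 2, abs(rho - b) ** 2)


# Gap for the specific pair of slopes a = -1, b = 1 with equal volume fractions.
gap_convex = two_phase_gap(convex_density, -1.0, 1.0)
gap_double_well = two_phase_gap(double_well, -1.0, 1.0)

# Convexity inequality checked on a small deterministic grid of slopes and weights.
slopes = [i / 4 for i in range(-8, 9)]
weights = [j / 10 for j in range(11)]
convex_gaps_ok = all(
    two_phase_gap(convex_density, a, b, t) >= -1e-12
    for a in slopes for b in slopes for t in weights
)

print("convex gap:", gap_convex)
print("double-well gap:", gap_double_well)
print("convexity inequality holds on grid:", convex_gaps_ok)
```

A grid check of this kind cannot establish convexity; it only shows, for these particular values, the sign difference that separates the convex density from the double-well density.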
[quotetheorem:986] [citeproof:986] The theorem explains why convexity is powerful in variational problems. Weak convergence gives fewer test functions than norm convergence, but convex lower semicontinuous functionals can still be recovered from their affine supporting information, so their sublevel sets remain compatible with weak limits. In applications, one first checks that the energy under study really defines such a convex lower semicontinuous functional on the relevant Banach space; only then can this abstract result be applied. For integral energies depending on gradients, that verification becomes a separate analytic problem. In the scalar convex case, the energy density can often be shown to generate a weakly lower semicontinuous Sobolev functional. Vector-valued gradients require more care because gradients are matrices constrained by compatibility, and ordinary convexity is too strong for many elasticity models. Later chapters introduce quasiconvexity and polyconvexity as conditions designed around this constraint. [example: Convex Gradient Energy] Let $1<p<\infty$, let $U\subset\mathbb R^n$ be bounded and open, and define $F(\rho)=|\rho|^p$ for $\rho\in\mathbb R^n$. We first verify the convexity condition needed for the lower semicontinuity theorem. If $\rho,\sigma\in\mathbb R^n$ and $t\in[0,1]$, then the triangle inequality and homogeneity of the Euclidean norm give \begin{align*} |t\rho+(1-t)\sigma|\le t|\rho|+(1-t)|\sigma|. \end{align*} Since $s\mapsto s^p$ is increasing on $[0,\infty)$, this implies \begin{align*} |t\rho+(1-t)\sigma|^p\le \bigl(t|\rho|+(1-t)|\sigma|\bigr)^p. \end{align*} By convexity of the scalar function $s\mapsto s^p$ on $[0,\infty)$, \begin{align*} \bigl(t|\rho|+(1-t)|\sigma|\bigr)^p\le t|\rho|^p+(1-t)|\sigma|^p. \end{align*} Therefore \begin{align*} F(t\rho+(1-t)\sigma)\le tF(\rho)+(1-t)F(\sigma), \end{align*} so $F$ is convex. 
The function $F$ is continuous, hence lower semicontinuous and Borel measurable, and it satisfies the lower bound \begin{align*} F(\rho)=|\rho|^p\ge 0\ge -1\cdot(1+|\rho|^p). \end{align*} Thus *[Lower Semicontinuity for Convex Scalar Integral Functionals](/theorems/8738)* applies to \begin{align*} I[u]=\int_U |\nabla u|^p\,d\mathcal L^n. \end{align*} Consequently, whenever $u_k\rightharpoonup u$ in $W^{1,p}(U)$, one has \begin{align*} \int_U |\nabla u|^p\,d\mathcal L^n\le \liminf_{k\to\infty}\int_U |\nabla u_k|^p\,d\mathcal L^n. \end{align*} Together with boundary conditions defining a weakly closed admissible class, this gives precisely the lower semicontinuity input needed in the direct method for $p$-energy minimization. [/example]

## Nonattainment, Relaxation, and the Shape of the Course

The final introductory problem is what to do when the direct method fails. A minimizing sequence may develop oscillations, so that its weak limit has strictly larger energy than the limit of the energies along the sequence, or the infimum may be approached only outside the original admissible class. Relaxation modifies the functional, and where necessary the admissible class, so that the limiting behaviour of minimizing sequences is represented correctly.

[definition: Relaxed Functional] Let $X$ be a [topological space](/page/Topological%20Space), let $\mathcal A \subset X$, and let $I: \mathcal A \to \mathbb R \cup \{+\infty\}$. The relaxed functional of $I$ with respect to convergence in $X$ is the function $\overline I: \mathcal A \to \mathbb R \cup \{+\infty\}$ defined by \begin{align*} \overline I[u] = \inf\left\{\liminf_{k \to \infty} I[u_k] : u_k \in \mathcal A,\ u_k \to u \text{ in } X\right\}. \end{align*} [/definition]

Relaxation is not a technical afterthought. It identifies the energy actually seen by minimizing sequences and explains why convex envelopes, quasiconvex envelopes, and Young-measure descriptions appear in modern variational theory.
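The appearance of convex envelopes can be illustrated numerically in the one-dimensional scalar setting, where standard results (not proved here, and valid only under suitable growth assumptions) identify the relaxed density with the convex envelope of the original density. The sketch below is a grid approximation under assumptions made for this example: the double-well density $\min\{|\rho+1|^2,|\rho-1|^2\}$, the grid on $[-2,2]$, and the helper name `lower_convex_envelope` are illustrative choices, and the envelope is computed as the lower convex hull of the sampled points.

```python
def double_well(rho: float, a: float = -1.0, b: float = 1.0) -> float:
    """Illustrative nonconvex density min{|rho - a|^2, |rho - b|^2}."""
    return min((rho - a) ** 2, (rho - b) ** 2)


def lower_convex_envelope(xs, ys):
    """Largest convex function below the sample points, evaluated on the grid.

    Assumes xs is strictly increasing. Uses the lower half of the monotone
    chain convex hull and interpolates linearly between hull vertices.
    """
    hull = []  # indices of the lower-hull vertices
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (xs[i1] - xs[i0]) * (ys[i] - ys[i0]) \
                - (ys[i1] - ys[i0]) * (xs[i] - xs[i0])
            if cross <= 0.0:
                hull.pop()  # vertex i1 lies on or above the chord from i0 to i
            else:
                break
        hull.append(i)

    envelope = list(ys)
    for i0, i1 in zip(hull, hull[1:]):
        for j in range(i0, i1 + 1):
            t = (xs[j] - xs[i0]) / (xs[i1] - xs[i0])
            envelope[j] = (1.0 - t) * ys[i0] + t * ys[i1]
    return envelope


# Grid on [-2, 2] with spacing 0.01; index 200 corresponds to rho = 0.
xs = [(i - 200) / 100 for i in range(401)]
ys = [double_well(x) for x in xs]
env = lower_convex_envelope(xs, ys)

print("F(0)        =", ys[200])
print("envelope(0) =", env[200])
```

On this grid the envelope vanishes between the two wells and agrees with the original density outside them, which matches the picture of an effective density that charges nothing for slopes reachable by mixing the two preferred values.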
[example: Oscillation And Loss Of Pointwise Gradient Information] Let $1<p<\infty$, let $U=(0,1)$, and choose two distinct slopes $a,b\in\mathbb R$. Set \begin{align*} c=\frac{a+b}{2}. \end{align*} For each $k\in\mathbb N$, define $r_k$ on every interval $(j/k,(j+1)/k)$ by setting $r_k=a$ on the first half of the interval and $r_k=b$ on the second half. Then $r_k$ alternates between $a$ and $b$ on intervals of length $1/(2k)$, and define \begin{align*} u_k(x)=\int_0^x r_k(t)\,d\mathcal L^1(t). \end{align*} On each full interval $(j/k,(j+1)/k)$, the integral of $r_k-c$ is \begin{align*} \frac{1}{2k}(a-c)+\frac{1}{2k}(b-c)=\frac{1}{2k}\left(a-\frac{a+b}{2}\right)+\frac{1}{2k}\left(b-\frac{a+b}{2}\right)=0. \end{align*} For an arbitrary $x\in(0,1)$, only one incomplete interval can remain after the full intervals before $x$, so \begin{align*} \left|u_k(x)-cx\right|=\left|\int_0^x (r_k(t)-c)\,d\mathcal L^1(t)\right|\le \frac{|a-c|+|b-c|}{2k}=\frac{|a-b|}{2k}. \end{align*} Therefore $u_k\to u$ uniformly, where $u(x)=cx$. The derivatives satisfy $u_k'=r_k$. To see the weak limit of the derivatives, first test against a function $\varphi$ that is constant on each interval $(j/k,(j+1)/k)$ with value $\varphi_j$. Then \begin{align*} \int_0^1 (r_k-c)\varphi\,d\mathcal L^1=\sum_{j=0}^{k-1}\varphi_j\int_{j/k}^{(j+1)/k}(r_k-c)\,d\mathcal L^1=0. \end{align*} For a general $\varphi\in L^{p'}(0,1)$, approximate $\varphi$ in $L^{p'}$ by step functions and use Holder's inequality with the uniform bound $\|r_k-c\|_{L^p(0,1)}=|a-b|/2$. Hence $r_k\rightharpoonup c$ in $L^p(0,1)$, and therefore $u_k\rightharpoonup u$ in $W^{1,p}(0,1)$. Now take the nonconvex density \begin{align*} F(\rho)=\min\{|\rho-a|^p,|\rho-b|^p\}. \end{align*} Since $r_k$ takes only the values $a$ and $b$, we have \begin{align*} F(r_k(x))=0 \end{align*} for almost every $x\in(0,1)$. Thus \begin{align*} \int_0^1 F(u_k')\,d\mathcal L^1=\int_0^1 F(r_k)\,d\mathcal L^1=0. 
\end{align*} The weak limit has derivative $u'=c$, and \begin{align*} F(c)=\min\left\{\left|\frac{a+b}{2}-a\right|^p,\left|\frac{a+b}{2}-b\right|^p\right\}=\left(\frac{|a-b|}{2}\right)^p. \end{align*} Therefore \begin{align*} \int_0^1 F(u')\,d\mathcal L^1=\left(\frac{|a-b|}{2}\right)^p>0. \end{align*} The weak limit remembers only the averaged slope $c$, while the energies of $u_k$ remember that the microscopic slopes were exactly the preferred values $a$ and $b$; relaxation replaces $F$ by an effective density that records this limiting oscillation cost. [/example] These notes follow the logical dependencies of the direct method. We begin with examples showing why classical minimization fails, then develop weak compactness and coercivity, then prove lower semicontinuity theorems for convex and Sobolev integral functionals. The later chapters study relaxation, quasiconvexity, polyconvexity, and applications to nonlinear elasticity, where the existence theory becomes sensitive to determinant constraints, orientation preservation, and the geometry of deformations. The final results in this chapter point beyond the classical finite-dimensional picture: the existence theory for variational problems is driven less by critical points and more by weak compactness and lower semicontinuity in infinite-dimensional spaces. The next chapter begins by isolating the compactness mechanism that makes minimizing sequences controllable in that setting. # 1. Variational Problems Beyond Classical Critical Points The direct method begins from a mismatch between the variational problems that arise in analysis and the classical picture of finding a critical point of a smooth finite-dimensional function. In finite dimensions, compactness and continuity often turn bounded minimization problems into existence theorems. 
In infinite-dimensional function spaces, minimizing sequences can oscillate, concentrate, escape boundary constraints, or converge only weakly, so the main task is to choose a setting in which the variational problem has enough compactness and enough lower semicontinuity. This chapter sets up that shift in viewpoint. We first isolate the ways classical minimization can fail, then introduce the Sobolev and trace framework in which boundary-value minimization problems are posed, and finally distinguish minimizers from stationary points. The recurring theme is that existence is not a consequence of differentiating the functional; it is a compactness statement plus a closedness statement plus a lower-semicontinuity statement. ## Failure of Classical Minimization What breaks when a finite-dimensional minimization argument is transported to a function space? The finite-dimensional Weierstrass theorem says that a [continuous function](/page/Continuous%20Function) attains its minimum on a compact set, so nonattainment must come from losing compactness, losing closedness of the admissible class, or using a topology in which the functional is not lower semicontinuous. Direct methods are built by identifying and repairing exactly these failures. [quotetheorem:7620] [citeproof:7620] This result displays the entire architecture of the direct method, but each hypothesis is doing real work. If $K$ is not compact, the function $f(x)=e^x$ on $\mathbb R$ has infimum $0$ and no minimizer; if $K$ is empty, there is no admissible point to minimize over; if continuity is replaced by a discontinuous function, for instance $f(0)=1$ and $f(x)=x$ on $K=[0,1]$ for $x>0$, the infimum $0$ is not attained. The theorem also says nothing about infinite-dimensional closed bounded sets, because finite-dimensional compactness is the ingredient that the proof uses. 
In Sobolev spaces, the convergence extracted from a minimizing sequence is often weak convergence, so the later direct method has to replace continuity by weak lower semicontinuity and compactness by weak compactness. [example: Finite Dimensional Quadratic Minimization] Let $A\in \mathbb R^{n\times n}$ be symmetric and positive definite, and let $b\in \mathbb R^n$. Since $y\mapsto y\cdot Ay$ is continuous and positive on the compact unit sphere, there is $\alpha>0$ such that $y\cdot Ay\ge \alpha$ whenever $|y|=1$, hence $x\cdot Ax\ge \alpha |x|^2$ for every $x\in\mathbb R^n$. Therefore \begin{align*} f(x)=\frac12 x\cdot Ax-b\cdot x\ge \frac{\alpha}{2}|x|^2-|b||x|. \end{align*} If $|x|\ge 4|b|/\alpha$, then $|b||x|\le \frac{\alpha}{4}|x|^2$, so \begin{align*} f(x)\ge \frac{\alpha}{4}|x|^2. \end{align*} Thus $f(x)\to\infty$ as $|x|\to\infty$. Choose $R$ so large that $f(x)>f(0)$ for $|x|>R$. Any minimizer over the closed ball $\overline B(0,R)$ is then a minimizer over all of $\mathbb R^n$, and such a minimizer exists by *Weierstrass Theorem in Finite Dimensions*. We now identify it. For $x,h\in\mathbb R^n$, \begin{align*} f(x+h)-f(x)=\frac12(x+h)\cdot A(x+h)-b\cdot(x+h)-\frac12 x\cdot Ax+b\cdot x. \end{align*} Expanding the quadratic term gives \begin{align*} f(x+h)-f(x)=\frac12 x\cdot Ah+\frac12 h\cdot Ax+\frac12 h\cdot Ah-b\cdot h. \end{align*} Since $A$ is symmetric, $x\cdot Ah=h\cdot Ax$, so \begin{align*} f(x+h)-f(x)=(Ax-b)\cdot h+\frac12 h\cdot Ah. \end{align*} Taking $h=\varepsilon v$ gives \begin{align*} \frac{f(x+\varepsilon v)-f(x)}{\varepsilon}=(Ax-b)\cdot v+\frac{\varepsilon}{2}v\cdot Av. \end{align*} Hence the Euler equation at a minimizer is $(Ax-b)\cdot v=0$ for every $v\in\mathbb R^n$, which is equivalent to $Ax=b$. Since $A$ is positive definite, it is invertible, so the only stationary candidate is $x_*=A^{-1}b$. Finally, this candidate is indeed the unique minimizer. 
For $y=x-x_*$ and $b=Ax_*$, \begin{align*} f(x_*+y)=\frac12(x_*+y)\cdot A(x_*+y)-Ax_*\cdot(x_*+y). \end{align*} Expanding and using symmetry, \begin{align*} f(x_*+y)=\frac12 x_*\cdot Ax_*+x_*\cdot Ay+\frac12 y\cdot Ay-x_*\cdot Ax_*-x_*\cdot Ay. \end{align*} The linear terms cancel, so \begin{align*} f(x_*+y)=f(x_*)+\frac12 y\cdot Ay. \end{align*} Positive definiteness gives $y\cdot Ay>0$ for $y\ne 0$, so $f(x)>f(x_*)$ whenever $x\ne x_*$. The compactness argument gives existence, while the Euler equation and positivity identify the minimizer uniquely. [/example] This example hides a finite-dimensional compactness input: closed bounded sets are compact. To understand why direct methods need weak topologies, we need a theorem showing that this compactness input disappears in infinite-dimensional normed spaces. [quotetheorem:8726] [citeproof:8726] The failure is not a technical inconvenience; it is the central reason weak topologies enter the subject. In finite-dimensional normed spaces, Heine-Borel restores the Weierstrass strategy because closed bounded sets are compact, so the theorem above is a genuinely infinite-dimensional obstruction. The obstruction is also tied to the norm topology: in a reflexive Banach space, closed bounded sets are weakly compact, while in nonreflexive spaces even weak compactness can fail. A bounded Sobolev sequence may not converge strongly, but for $1<p<\infty$ reflexivity gives weakly convergent subsequences in $W^{1,p}$. The rest of the course develops conditions under which weak convergence is enough to solve the minimization problem, while keeping track of cases where oscillation or concentration survives in the limit. [example: Oscillating Sequence Without Strong Compactness] On $U=(0,2\pi)$, set $u_k(x)=\sin(kx)$. Its $L^2$ norm is independent of $k$ because \begin{align*} \|u_k\|_{L^2(U)}^2=\int_0^{2\pi}\sin^2(kx)\,dx=\int_0^{2\pi}\frac{1-\cos(2kx)}{2}\,dx=\pi-\frac{\sin(4\pi k)-\sin(0)}{4k}=\pi. 
\end{align*} Thus $\|u_k\|_{L^2(U)}=\sqrt{\pi}$ for every $k$, so no subsequence can converge strongly to $0$ in $L^2(U)$, since strong convergence to $0$ would force the norms to converge to $0$. For a continuously differentiable [test function](/page/Test%20Function) $\varphi\in C^1([0,2\pi])$, [integration by parts](/theorems/210) gives \begin{align*} \int_0^{2\pi}\varphi(x)\sin(kx)\,dx=\frac{-\varphi(2\pi)\cos(2\pi k)+\varphi(0)\cos(0)}{k}+\frac{1}{k}\int_0^{2\pi}\varphi'(x)\cos(kx)\,dx. \end{align*} Since $\cos(2\pi k)=\cos(0)=1$, this becomes \begin{align*} \int_0^{2\pi}\varphi(x)\sin(kx)\,dx=\frac{\varphi(0)-\varphi(2\pi)}{k}+\frac{1}{k}\int_0^{2\pi}\varphi'(x)\cos(kx)\,dx. \end{align*} The right-hand side tends to $0$ as $k\to\infty$, because the first term is a constant divided by $k$ and the second is bounded in absolute value by $k^{-1}\|\varphi'\|_{L^1(U)}$. For a general $\psi\in L^2(U)$ and $\eta>0$, choose $\varphi\in C^1([0,2\pi])$ with $\|\psi-\varphi\|_{L^2(U)}<\eta$, using density of $C^1([0,2\pi])$ in $L^2(U)$. Then the Cauchy-Schwarz inequality gives \begin{align*} \left|\int_U(\psi-\varphi)u_k\,d\mathcal L^1\right|\le \|\psi-\varphi\|_{L^2(U)}\|u_k\|_{L^2(U)}<\eta\sqrt{\pi}. \end{align*} Since $\int_U\varphi u_k\,d\mathcal L^1\to 0$ and $\eta>0$ is arbitrary, $\int_U\psi u_k\,d\mathcal L^1\to 0$. Hence $u_k\rightharpoonup 0$ weakly in $L^2(U)$, while the sequence does not converge strongly to $0$; the high-frequency oscillation disappears under testing, but its $L^2$ size remains fixed. [/example] Another route to nonattainment is that the infimum is approached only along sequences that leave every compact subset of the admissible class. The following example isolates loss of compactness at infinity before we introduce any Sobolev-space structure. [example: One Dimensional Nonattainment] Consider \begin{align*} \inf\{e^x:x\in \mathbb R\}. \end{align*} For every $x\in\mathbb R$, the exponential satisfies $e^x>0$, so $0$ is a lower bound for the set $\{e^x:x\in\mathbb R\}$. 
To see that it is the greatest lower bound, take $x_k=-k$. Then \begin{align*} e^{x_k}=e^{-k}=\frac{1}{e^k}. \end{align*} Since $e>1$, the sequence $e^k$ tends to $\infty$, hence $e^{-k}\to 0$. Therefore \begin{align*} \inf\{e^x:x\in\mathbb R\}=0. \end{align*} No point attains this value, because $e^x>0$ for every real $x$. The minimizing sequence $x_k=-k$ also escapes every compact subset of $\mathbb R$: if $K\subset\mathbb R$ is compact, then $K\subset[-M,M]$ for some $M>0$, and for every integer $k>M$ one has $x_k=-k<-M$, so $x_k\notin K$. Thus the infimum is approached only by running off to infinity, which is the basic loss-of-compactness mechanism in this example. [/example] This example shows that the admissible class matters as much as the formula for the energy. We now need terminology for a different admissible-class failure: even when a rougher class is meant to be the closure of a smoother one, the two infima may not agree. [definition: Lavrentiev Gap] Let $X\subset Y$ be admissible classes for a functional $I:Y\to (-\infty,\infty]$. There is a Lavrentiev gap between $X$ and $Y$ if \begin{align*} \inf_{u\in X} I[u] > \inf_{u\in Y} I[u]. \end{align*} [/definition] A Lavrentiev gap means that minimizing over smoother functions gives the wrong value compared with the natural energy class. Chapter 6 treats relaxation and Chapter 7 treats quasiconvexity partly because the naive class of classical competitors may not be closed under the compactness available for the problem. [remark: Meaning of a Gap] The definition does not assert that either infimum is attained. It compares two variational problems whose admissible classes are nested. In applications, $X$ might be a class of smooth maps satisfying boundary data, while $Y$ is a Sobolev class with the same trace; the gap says that approximation by smooth maps fails at the level of energy. [/remark] ## Sobolev Classes and Boundary Constraints How should we choose the space in which a variational problem lives? 
The space must encode the quantities appearing in the energy, the boundary data required by the model, and the compactness needed to pass to a limit. For first-order integral functionals, Sobolev spaces are the natural setting because they control both the function and its weak gradient. [definition: Sobolev Space] Let $U\subset \mathbb R^n$ be open, let $1\le p\le \infty$, and let $m\in \mathbb N$. The Sobolev space $W^{1,p}(U;\mathbb R^m)$ consists of all $u\in L^p(U;\mathbb R^m)$ whose first weak derivatives $\partial_i u$ exist and belong to $L^p(U;\mathbb R^m)$ for $1\le i\le n$. For $1\le p<\infty$, its norm is \begin{align*} \|u\|_{W^{1,p}(U)} = \left(\|u\|_{L^p(U)}^p + \sum_{i=1}^n \|\partial_i u\|_{L^p(U)}^p\right)^{1/p}. \end{align*} For $p=\infty$, the norm is \begin{align*} \|u\|_{W^{1,\infty}(U)} = \|u\|_{L^\infty(U)} + \sum_{i=1}^n \|\partial_i u\|_{L^\infty(U)}. \end{align*} [/definition] This definition supplies the minimum regularity needed to interpret energies depending on $u$ and $\nabla u$. It also makes the admissible class a Banach space, and for $1<p<\infty$ it gives access to weak compactness through reflexivity. [example: Dirichlet Energy with Fixed Boundary Data] Let $U\subset \mathbb R^n$ be bounded and Lipschitz, let $g\in H^1(U)$, and consider \begin{align*} I[u]=\frac12\int_U |\nabla u|^2\,d\mathcal L^n \end{align*} among functions with boundary value $g$. The Sobolev way to encode that boundary condition is to write each admissible function as \begin{align*} u=g+v\quad\text{with }v\in H^1_0(U), \end{align*} so the admissible class is $g+H^1_0(U)$. For such a function $u=g+v$, the gradients satisfy \begin{align*} \nabla v=\nabla u-\nabla g. \end{align*} Using $|a-b|^2\le 2|a|^2+2|b|^2$ pointwise with $a=\nabla u$ and $b=\nabla g$, we get \begin{align*} \|\nabla v\|_{L^2(U)}^2=\int_U |\nabla u-\nabla g|^2\,d\mathcal L^n\le 2\int_U |\nabla u|^2\,d\mathcal L^n+2\int_U |\nabla g|^2\,d\mathcal L^n. 
\end{align*} Since $I[u]=\frac12\int_U|\nabla u|^2\,d\mathcal L^n$, this becomes \begin{align*} \|\nabla v\|_{L^2(U)}^2\le 4I[u]+2\|\nabla g\|_{L^2(U)}^2. \end{align*} By *Poincare Inequality for Zero Trace Functions*, there is $C_U>0$ such that \begin{align*} \|v\|_{L^2(U)}\le C_U\|\nabla v\|_{L^2(U)}. \end{align*} Therefore an energy bound on $u$ gives a bound on both $\|\nabla v\|_{L^2(U)}$ and $\|v\|_{L^2(U)}$, hence a bound on $\|v\|_{H^1(U)}$. Finally, \begin{align*} \|u\|_{H^1(U)}=\|g+v\|_{H^1(U)}\le \|g\|_{H^1(U)}+\|v\|_{H^1(U)}. \end{align*} Thus the fixed boundary datum removes the uncontrolled constant directions: after subtracting $g$, the Dirichlet energy controls the full $H^1$ size through the zero-trace estimate. [/example] The Dirichlet example uses boundary values for Sobolev functions, but such values cannot generally be read pointwise on $\partial U$. We therefore need a trace-based constraint class that records boundary data in a form stable under Sobolev convergence. [definition: Trace Constraint Class] Let $U\subset \mathbb R^n$ be a bounded Lipschitz domain and let $T:H^1(U)\to H^{1/2}(\partial U)$ denote the trace operator. For boundary data $g\in H^1(U)$, the affine trace class with boundary value $g$ is \begin{align*} \mathcal A_g = \{u\in H^1(U): Tu = Tg\}. \end{align*} [/definition] The class $\mathcal A_g$ is affine rather than linear, because the boundary value is prescribed. On sufficiently regular boundaries the trace may also be viewed in $L^2(\partial U)$ through the usual embedding, but the natural trace space for $H^1(U)$ is $H^{1/2}(\partial U)$. To estimate functions in this class, we need to subtract the boundary datum and work in the corresponding homogeneous space. [definition: Zero Trace Sobolev Space] Let $U\subset \mathbb R^n$ be open. The space $H^1_0(U)$ is the closure of $C_c^\infty(U)$ in the $H^1(U)$ norm. 
[/definition] For bounded Lipschitz domains, $H^1_0(U)$ agrees with the Sobolev functions whose trace vanishes on the boundary. The next estimate is needed because the Dirichlet energy controls only $\nabla v$, while the $H^1$ norm also contains $\|v\|_{L^2(U)}$. [quotetheorem:76] [citeproof:76] Poincare's inequality turns gradient control into full $H^1$ control on homogeneous constrained classes. Some boundedness of $U$ is needed here: on $\mathbb R^n$, for instance, compactly supported functions can spread out so that their $L^2$ mass is large compared with their gradient. The zero trace condition is also essential, since nonzero constants on a bounded domain have zero gradient but nonzero $L^2$ norm. The inequality does not give compactness by itself and does not identify a minimizer; it gives the coercive estimate needed before weak compactness can be applied. We now transfer that estimate back to affine boundary-value classes, since minimization problems usually prescribe nonzero boundary data. [quotetheorem:8727] [citeproof:8727] This result is not yet an existence theorem, since boundedness in $H^1$ gives weak compactness rather than strong compactness. The fixed boundary datum removes the constant-shift directions that the Dirichlet energy cannot see; without a trace or mean constraint, $u\mapsto u+c$ leaves the gradient term unchanged while the $H^1$ norm can diverge. The bounded Lipschitz hypothesis is also part of the mechanism, because it gives a well-behaved trace operator and the Poincare inequality used in the proof. Coercivity here is therefore a property of the pair consisting of the energy and the admissible class, not of the integrand alone. To see why the constraint is essential, we compare it with the same energy over an unconstrained class. 
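A discrete sketch can make the constant-shift mechanism concrete before the formal comparison. The code below is only an illustration under stated assumptions: the interval $(0,1)$, the grid size, the profile $\sin(\pi x)$, the shift values, and the helper names `dirichlet_energy` and `l2_norm` are hypothetical choices, and finite differences stand in for weak derivatives.

```python
import math

def dirichlet_energy(u, h):
    """Finite-difference stand-in for the integral of |u'|^2 over (0, 1)."""
    return sum(((u[i + 1] - u[i]) / h) ** 2 for i in range(len(u) - 1)) * h

def l2_norm(u, h):
    """Riemann-sum approximation of the L^2(0, 1) norm."""
    return math.sqrt(sum(value * value for value in u) * h)

n = 200            # number of grid cells (illustrative assumption)
h = 1.0 / n

# A profile with zero boundary values, playing the role of v in H^1_0(0, 1).
v = [math.sin(math.pi * i * h) for i in range(n + 1)]
v[0] = 0.0
v[-1] = 0.0

# Shift by constants: the energy is unchanged, the L^2 norm grows with the shift.
results = {}
for shift in (0.0, 10.0, 100.0):
    shifted = [value + shift for value in v]
    results[shift] = (dirichlet_energy(shifted, h), l2_norm(shifted, h))
    print(shift, results[shift])
```

In this sketch the energy column stays essentially constant while the norm column grows with the shift, which is the discrete analogue of losing coercivity when no boundary or mean constraint is imposed. For the unshifted zero-boundary profile, the norm is controlled by the square root of the energy, in the spirit of the Poincare estimate.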
[example: Noncoercive Quadratic Energy] Let $U\subset \mathbb R^n$ be bounded with $\mathcal L^n(U)>0$, and consider \begin{align*} I[u]=\int_U |\nabla u|^2\,d\mathcal L^n \end{align*} over all $u\in H^1(U)$, with no boundary condition and no mean constraint. For each integer $k\ge 1$, set $u_k(x)=k$. Since $u_k$ is constant, its weak derivatives satisfy $\partial_i u_k=0$ for every $1\le i\le n$, hence \begin{align*} I[u_k]=\int_U |\nabla u_k|^2\,d\mathcal L^n=\int_U 0\,d\mathcal L^n=0. \end{align*} The integrand $|\nabla u|^2$ is nonnegative for every $u\in H^1(U)$, so $I[u]\ge 0$ for every admissible $u$. Therefore $\inf_{H^1(U)} I=0$, and each $u_k$ is a minimizer. However, the $H^1$ norms of these minimizers are unbounded. Indeed, \begin{align*} \|u_k\|_{L^2(U)}^2=\int_U |k|^2\,d\mathcal L^n=k^2\mathcal L^n(U). \end{align*} Since $\nabla u_k=0$, the $H^1$ norm gives \begin{align*} \|u_k\|_{H^1(U)}^2=\|u_k\|_{L^2(U)}^2+\|\nabla u_k\|_{L^2(U)}^2=k^2\mathcal L^n(U). \end{align*} Because $\mathcal L^n(U)>0$, this tends to $\infty$ as $k\to\infty$. Thus the same energy can have minimizing sequences that are unbounded in $H^1(U)$; coercivity depends on the admissible class, not only on the formula for the energy. [/example] ## Minimizers and Stationary Points Once the problem has been placed in a Sobolev class, what should count as a solution? Classical calculus often finds candidates by differentiating the functional and solving the Euler-Lagrange equation. Direct methods instead first seek a minimizer, then derive stationarity as a consequence when differentiability and admissible variations permit it. [definition: Minimizer] Let $X$ be a set, let $\mathcal A\subset X$, and let $I:\mathcal A\to (-\infty,\infty]$. An element $u_*\in \mathcal A$ is a minimizer of $I$ over $\mathcal A$ if \begin{align*} I[u_*]\le I[u]\quad\text{for all }u\in \mathcal A. \end{align*} [/definition] The definition is global: it compares $u_*$ with every admissible competitor. 
It does not require differentiability of $I$, a linear structure on $\mathcal A$, or an Euler-Lagrange equation. [definition: First Variation] Let $X$ be a normed vector space, let $\mathcal A\subset X$, let $I:\mathcal A\to \mathbb R$, and let $u\in \mathcal A$. For an admissible direction $v\in X$ such that $u+\varepsilon v\in \mathcal A$ for all sufficiently small $\varepsilon\in \mathbb R$, the first variation of $I$ at $u$ in direction $v$ is \begin{align*} \delta I[u;v]=\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} I[u+\varepsilon v], \end{align*} when this derivative exists. [/definition] The first variation detects infinitesimal stationarity along admissible curves. A minimizer has vanishing first variation under suitable differentiability assumptions, but the converse usually needs convexity or additional structure. [definition: Stationary Point] Let $X$ be a normed vector space, let $\mathcal A\subset X$, let $I:\mathcal A\to \mathbb R$, and for $u\in\mathcal A$ set \begin{align*} \mathcal V_u=\{v\in X: u+\varepsilon v\in\mathcal A\text{ for all sufficiently small }\varepsilon\in\mathbb R\}. \end{align*} An element $u\in \mathcal A$ is a stationary point of $I$ if $\delta I[u;v]=0$ for every $v\in\mathcal V_u$ for which the first variation exists. [/definition] Stationarity is therefore a necessary condition rather than an existence principle. To connect minimization with the Euler-Lagrange viewpoint, one must justify why an actual minimizer has zero first variation in every admissible two-sided direction. The obstruction is directional. A first variation is computed along a line $u_*+\varepsilon v$, and a minimizer only rules out energy-decreasing perturbations that stay inside the admissible class. When both signs of $\varepsilon$ are admissible, a nonzero derivative would point downhill in one of the two directions, contradicting minimality. The formal statement below isolates exactly these hypotheses before Euler-Lagrange equations are used later. 
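The directional argument can be checked numerically on a finite-dimensional quadratic energy $f(x)=\frac12 x\cdot Ax-b\cdot x$. The following is a minimal sketch, assuming a hypothetical $2\times 2$ matrix `A`, vector `b`, test direction, and helper names chosen only for illustration; the minimizer $(0.2, 0.6)$ is the solution of $Ax=b$ for these particular values.

```python
A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric positive definite (illustrative choice)
b = [1.0, 2.0]

def energy(x):
    """f(x) = 1/2 x.Ax - b.x for the 2x2 data above."""
    ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return 0.5 * (x[0] * ax[0] + x[1] * ax[1]) - (b[0] * x[0] + b[1] * x[1])

def move(x, v, eps):
    """Point on the line x + eps * v."""
    return [x[0] + eps * v[0], x[1] + eps * v[1]]

def difference_quotient(x, v, eps):
    """(f(x + eps v) - f(x)) / eps, the quantity whose limit is the first variation."""
    return (energy(move(x, v, eps)) - energy(x)) / eps

minimizer = [0.2, 0.6]      # solves A x = b for the data above
other_point = [1.0, 0.0]    # a point where A x - b is nonzero
direction = [1.0, -1.0]

# At the minimizer the quotient shrinks with eps and both signs of eps raise the energy.
q_plus = difference_quotient(minimizer, direction, 1e-3)
q_minus = difference_quotient(minimizer, direction, -1e-3)

# At the other point the quotient stays away from zero, so one sign of eps goes downhill.
q_other = difference_quotient(other_point, direction, 1e-3)
downhill = energy(move(other_point, direction, -0.1)) < energy(other_point)

print(q_plus, q_minus, q_other, downhill)
```

The design choice here is to compare the two signs of $\varepsilon$ explicitly: at the minimizer both perturbations increase the energy and the quotient tends to zero, while at a point with nonzero derivative one of the two directions lowers the energy.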
[quotetheorem:8728] [citeproof:8728] The result explains why Euler-Lagrange equations appear after minimizers are known. The two-sided variation hypothesis matters: if only $\varepsilon\ge 0$ were admissible, a minimizer would give a one-sided inequality for the derivative rather than the equation $\delta I[u_*;v]=0$. Differentiability at $0$ is also essential, since nondifferentiable convex functionals lead instead to variational inequalities or subdifferential conditions. The theorem does not say that every stationary point minimizes the functional, and it does not provide compactness for constructing $u_*$. Later nonlinear problems use the same implication after existence has been secured by weak compactness and lower semicontinuity. [example: Stationary Point That Is Not a Minimizer] For $f:\mathbb R\to\mathbb R$ given by $f(x)=x^3$, we compute the first derivative from the difference quotient: \begin{align*} \frac{f(0+h)-f(0)}{h}=\frac{h^3-0^3}{h}=h^2 \end{align*} for $h\ne 0$. Since $h^2\to 0$ as $h\to 0$, we have $f'(0)=0$. The point $0$ is nevertheless not a local minimizer. For every $\delta>0$, choose $x=-\delta/2$. Then $|x|=\delta/2<\delta$, but \begin{align*} f(x)=\left(-\frac{\delta}{2}\right)^3=-\frac{\delta^3}{8}<0=f(0). \end{align*} Thus every neighborhood of $0$ contains points with strictly smaller value than $f(0)$. On the other side, if $x>0$, then $f(x)=x^3>0=f(0)$. The vanishing derivative at $0$ is only a stationarity condition; it does not by itself imply even local minimality. [/example] For the Dirichlet energy, the two viewpoints meet: the direct method gives the minimizer, and the first variation gives the weak Laplace equation. This pattern is the prototype for later nonlinear elliptic problems. [example: Euler Equation for the Dirichlet Energy] Let $U\subset \mathbb R^n$ be bounded and Lipschitz, let $g\in H^1(U)$, and consider \begin{align*} I[u]=\frac12\int_U |\nabla u|^2\,d\mathcal L^n \end{align*} over $\mathcal A_g=g+H^1_0(U)$. 
Fix $u\in\mathcal A_g$ and $v\in H^1_0(U)$. Since $u=g+w$ for some $w\in H^1_0(U)$, we have \begin{align*} u+\varepsilon v=g+(w+\varepsilon v)\in g+H^1_0(U)=\mathcal A_g \end{align*} for every $\varepsilon\in\mathbb R$, because $H^1_0(U)$ is a vector space. We compute the first variation along this admissible line. By linearity of weak derivatives, \begin{align*} \nabla(u+\varepsilon v)=\nabla u+\varepsilon \nabla v. \end{align*} Pointwise in $U$, \begin{align*} |\nabla u+\varepsilon\nabla v|^2=|\nabla u|^2+2\varepsilon\nabla u\cdot\nabla v+\varepsilon^2|\nabla v|^2. \end{align*} Therefore \begin{align*} I[u+\varepsilon v]=\frac12\int_U |\nabla u|^2\,d\mathcal L^n+\varepsilon\int_U \nabla u\cdot\nabla v\,d\mathcal L^n+\frac{\varepsilon^2}{2}\int_U |\nabla v|^2\,d\mathcal L^n. \end{align*} Subtracting $I[u]$ gives \begin{align*} I[u+\varepsilon v]-I[u]=\varepsilon\int_U \nabla u\cdot\nabla v\,d\mathcal L^n+\frac{\varepsilon^2}{2}\int_U |\nabla v|^2\,d\mathcal L^n. \end{align*} For $\varepsilon\ne 0$, \begin{align*} \frac{I[u+\varepsilon v]-I[u]}{\varepsilon}=\int_U \nabla u\cdot\nabla v\,d\mathcal L^n+\frac{\varepsilon}{2}\int_U |\nabla v|^2\,d\mathcal L^n. \end{align*} Since $\nabla v\in L^2(U;\mathbb R^n)$, the last integral is finite, so the second term tends to $0$ as $\varepsilon\to 0$. Hence \begin{align*} \delta I[u;v]=\int_U \nabla u\cdot\nabla v\,d\mathcal L^n. \end{align*} If $u$ minimizes $I$ over $\mathcal A_g$, then *[Minimizers Have Vanishing First Variation](/theorems/8728)* gives $\delta I[u;v]=0$ for every $v\in H^1_0(U)$. Thus \begin{align*} \int_U \nabla u\cdot\nabla v\,d\mathcal L^n=0 \end{align*} for every $v\in H^1_0(U)$. This is the weak form of $-\Delta u=0$ in $U$, while the condition $u\in g+H^1_0(U)$ encodes the boundary value $g$. [/example] The direct method can now be summarized as a sequence of requirements rather than a single theorem. First choose a topology in which minimizing sequences have convergent subsequences. 
Next verify that the admissible class is closed under that convergence. Finally prove the energy is lower semicontinuous, so the limit of a minimizing sequence has energy no larger than the infimum. [definition: Direct Method Framework] Let $X$ be a topological space, let $\mathcal A\subset X$, and let $I:\mathcal A\to (-\infty,\infty]$. A direct-method existence argument consists of the following data: a minimizing sequence $(u_k)$ in $\mathcal A$, a convergent subsequence $u_{k_j}\to u_*$ in $X$, closedness of $\mathcal A$ under this convergence, and lower semicontinuity of $I$ along the subsequence. [/definition] The definition is a checklist for the rest of the course. Coercivity produces bounded minimizing sequences, weak compactness extracts subsequences, weak closedness preserves constraints, and weak lower semicontinuity passes the energy inequality to the limit. [quotetheorem:3105] [citeproof:3105] This abstract theorem is deliberately spare, and each assumption excludes a standard failure mode. Nonemptiness prevents the variational problem from being vacuous, boundedness below rules out minimizing sequences with energy tending to $-\infty$, compactness prevents a minimizing sequence from escaping without a limit point, and closedness of $\mathcal A$ prevents the limit from losing the constraint. Lower semicontinuity is the hypothesis that turns convergence of competitors into an energy inequality; without it, a convergent minimizing sequence may land at a point whose energy jumps upward. The theorem also does not identify which topology to use or prove that an integral functional has the required lower semicontinuity. The mathematical content in applications lies in proving these hypotheses for Sobolev spaces and integral functionals, using weak compactness from functional analysis, compact embeddings, convexity and quasiconvexity, relaxation when lower semicontinuity fails, and applications such as nonlinear elasticity. 
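The role of lower semicontinuity in the checklist above can be seen in a one-dimensional toy problem. The sketch below is an illustration only: the two energies, the sequence $x_k=1/k$, and the helper `tail_infimum` are hypothetical choices, not part of the course's theorems. Both energies share the same minimizing sequence and the same limit point, but only the lower semicontinuous one attains its infimum there.

```python
def energy_lsc(x: float) -> float:
    """Lower semicontinuous energy: the value at the limit does not jump upward."""
    return x * x

def energy_jump(x: float) -> float:
    """Same formula away from 0, but the value jumps up to 1 at the limit point 0."""
    return 1.0 if x == 0.0 else x * x

def tail_infimum(energy, sequence, tail_start):
    """Smallest energy along the tail of a finite sample of the sequence."""
    return min(energy(x) for x in sequence[tail_start:])

# x_k = 1/k converges to 0 and its energies tend to the infimum 0 in both problems.
sequence = [1.0 / k for k in range(1, 1001)]
limit_point = 0.0

tail_lsc = tail_infimum(energy_lsc, sequence, 900)
tail_jump = tail_infimum(energy_jump, sequence, 900)

# The energy inequality at the limit holds only for the lower semicontinuous energy.
print(energy_lsc(limit_point) <= tail_lsc)
print(energy_jump(limit_point) <= tail_jump)
```

In the second problem the infimum $0$ is not attained on $[-1,1]$ even though the minimizing sequence converges in a compact set, so compactness and closedness alone do not close the argument.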
We have now seen why minimizing sequences can escape the classical framework and why weak convergence is the right replacement. The next step is to identify when bounded energy actually produces a weakly convergent subsequence, which is the role of coercivity and weak compactness. # 2. Weak Compactness and Coercivity The first chapter showed why existence cannot be reduced to solving Euler-Lagrange equations: minimizing sequences may drift away from classical compactness, and stationary points may not minimize. The direct method replaces pointwise or uniform compactness by weak compactness in a Banach space. This chapter develops the compactness side of the method: how [coercivity bounds minimizing sequences](/theorems/8730), how reflexivity gives weakly convergent subsequences, and how compact embeddings recover strong convergence for terms of lower order. ## Weak and Weak-* Convergence The central compactness question is this: if a sequence is bounded in the natural energy space, what kind of subsequential limit can we extract? Norm convergence is too much to expect in most Sobolev spaces, but weak convergence preserves enough linear information to pass to many convex or lower semicontinuous quantities. [definition: Weak Convergence] Let $X$ be a Banach space. A sequence $(u_k)_{k=1}^{\infty}$ in $X$ converges weakly to $u \in X$, written $u_k \rightharpoonup u$ in $X$, if \begin{align*} f(u_k) \to f(u) \end{align*} for every $f \in X^*$. [/definition] Weak convergence records convergence after testing against every bounded linear functional. It is weaker than norm convergence, but it is adapted to variational problems because many energy estimates control norms while the Euler-Lagrange information appears through duality. [example: Oscillating Sine Sequence] Let $u_k(x)=\sin(kx)$ on $(0,2\pi)$. 
First, \begin{align*} \|u_k\|_{L^2(0,2\pi)}^2=\int_0^{2\pi}\sin^2(kx)\,dx=\int_0^{2\pi}\frac{1-\cos(2kx)}{2}\,dx=\pi, \end{align*} because $\int_0^{2\pi}\cos(2kx)\,dx=0$ for every integer $k\ge 1$. Hence $\|u_k\|_{L^2(0,2\pi)}=\sqrt{\pi}$, so the sequence is bounded. We show that $u_k\rightharpoonup 0$ in $L^2(0,2\pi)$. Since $L^2(0,2\pi)$ is a [Hilbert space](/page/Hilbert%20Space), it is enough to prove \begin{align*} \int_0^{2\pi}\sin(kx)\,\varphi(x)\,dx\to 0 \end{align*} for every $\varphi\in L^2(0,2\pi)$. First take a trigonometric monomial. For $m\ge 0$, \begin{align*} \int_0^{2\pi}\sin(kx)\cos(mx)\,dx=\frac12\int_0^{2\pi}\sin((k+m)x)\,dx+\frac12\int_0^{2\pi}\sin((k-m)x)\,dx=0, \end{align*} where the second term is also $0$ when $k=m$, because then its integrand is $\sin(0)=0$. Similarly, \begin{align*} \int_0^{2\pi}\sin(kx)\sin(mx)\,dx=\frac12\int_0^{2\pi}\cos((k-m)x)\,dx-\frac12\int_0^{2\pi}\cos((k+m)x)\,dx. \end{align*} This equals $\pi$ if $k=m$ and equals $0$ if $k\ne m$. Therefore, for any fixed trigonometric polynomial $P$, the integral $\int_0^{2\pi}\sin(kx)P(x)\,dx$ is $0$ for all sufficiently large $k$, because only finitely many sine modes occur in $P$. Now let $\varphi\in L^2(0,2\pi)$ and let $\varepsilon>0$. By [density of trigonometric polynomials](/theorems/1219) in $L^2(0,2\pi)$, choose $P$ with $\|\varphi-P\|_{L^2}<\varepsilon/\sqrt{\pi}$. For all sufficiently large $k$, \begin{align*} \left|\int_0^{2\pi}\sin(kx)\varphi(x)\,dx\right|=\left|\int_0^{2\pi}\sin(kx)(\varphi(x)-P(x))\,dx\right|\le \|\sin(kx)\|_{L^2}\|\varphi-P\|_{L^2}<\varepsilon, \end{align*} by Cauchy-Schwarz. Thus $\int_0^{2\pi}\sin(kx)\varphi(x)\,dx\to 0$ for every $\varphi\in L^2(0,2\pi)$, which proves $u_k\rightharpoonup 0$ in $L^2(0,2\pi)$. The weak limit records the vanishing of all fixed test averages, even though the oscillations themselves do not disappear in norm. 
[/example] This example shows that weak convergence may erase oscillation, which is useful when minimizing sequences develop fine-scale behaviour. It also raises a second compactness problem: in dual spaces, boundedness often gives compactness only after weakening the topology further, so the next definition introduces weak* convergence. [definition: Weak Star Convergence] Let $X$ be a Banach space. A sequence $(f_k)_{k=1}^{\infty}$ in $X^*$ converges weak* to $f \in X^*$, written $f_k \overset{*}{\rightharpoonup} f$ in $X^*$, if \begin{align*} f_k(x) \to f(x) \end{align*} for every $x \in X$. [/definition] Weak* convergence is weaker than weak convergence on $X^*$ because it tests only against points of $X$, not against all elements of $X^{**}$. The reason this definition is needed is that dual unit balls have a [compactness theorem](/theorems/2748) in this topology. [quotetheorem:212] The theorem is quoted as a foundational compactness result from functional analysis. Its proof uses the embedding of the dual unit ball into the product space $\prod_{x \in X}\overline{B}(0,\|x\|_X)$, followed by Tychonoff compactness. The hypotheses matter in two ways. First, the compactness is in the [weak* topology](/page/Weak*%20Topology) coming from a specified predual $X$, so it does not say that bounded sets in $X^*$ are compact for the norm topology, or even for the ordinary weak topology of $X^*$ in general. For example, in an infinite-dimensional Hilbert space $H$ an orthonormal sequence $(e_k)$ lies in the closed unit ball but has no norm-convergent subsequence, since $\|e_k-e_j\|=\sqrt{2}$ for $k\ne j$. Second, Banach-Alaoglu is fundamentally a compactness theorem for nets; [sequential compactness](/page/Sequential%20Compactness) requires extra metrizability or separability hypotheses, such as restricting to weak* compact balls in the dual of a [separable space](/page/Separable%20Space). 
This distinction is why direct-method arguments in reflexive Sobolev spaces usually prefer weak compactness of bounded sequences, while weak* compactness becomes essential in measure-valued limits, $L^{\infty}$ bounds, BV compactness, and relaxation. [example: Weak Star Compactness in Bounded Densities] Let $\Omega\subset\mathbb R^n$ be open, and let $(f_k)$ be a sequence in $L^{\infty}(\Omega)$ with $\|f_k\|_{L^{\infty}(\Omega)}\le M$ for every $k$. For each $k$, define a bounded linear functional on $L^1(\Omega)$ by \begin{align*}\Lambda_k(\varphi)=\int_{\Omega} f_k(x)\varphi(x)\,d\mathcal L^n(x).\end{align*} For every $\varphi\in L^1(\Omega)$, \begin{align*}|\Lambda_k(\varphi)|\le \int_{\Omega}|f_k(x)|\,|\varphi(x)|\,d\mathcal L^n(x)\le M\int_{\Omega}|\varphi(x)|\,d\mathcal L^n(x)=M\|\varphi\|_{L^1}.\end{align*} Thus $\|\Lambda_k\|_{(L^1)^*}\le M$, so $(\Lambda_k)$ lies in the closed radius-$M$ ball of $(L^1(\Omega))^*$. By *Banach-Alaoglu*, this ball is weak* compact, hence there is a weak* convergent subnet $\Lambda_{k_\alpha}\overset{*}{\rightharpoonup}\Lambda$. Using the identification $(L^1(\Omega))^*\simeq L^{\infty}(\Omega)$, write $\Lambda(\varphi)=\int_{\Omega} f\varphi\,d\mathcal L^n$ for some $f\in L^{\infty}(\Omega)$ with $\|f\|_{L^\infty}\le M$. The weak* convergence statement is exactly \begin{align*}\int_{\Omega}f_{k_\alpha}(x)\varphi(x)\,d\mathcal L^n(x)\to \int_{\Omega}f(x)\varphi(x)\,d\mathcal L^n(x)\quad\text{for every }\varphi\in L^1(\Omega).\end{align*} To obtain a subsequence rather than a subnet, use that $L^1(\Omega)$ is separable, which holds for Lebesgue measure on every open subset of $\mathbb R^n$. Let $(\varphi_j)_{j=1}^{\infty}$ be dense in $L^1(\Omega)$. For each fixed $j$, the scalar sequence $\Lambda_k(\varphi_j)$ is bounded since \begin{align*}|\Lambda_k(\varphi_j)|\le M\|\varphi_j\|_{L^1}.\end{align*} Bolzano-Weierstrass gives a subsequence on which $\Lambda_k(\varphi_1)$ converges; from that subsequence choose a further subsequence on which $\Lambda_k(\varphi_2)$ converges, and continue. 
The diagonal subsequence, still denoted $(\Lambda_{k_\ell})$, makes $\Lambda_{k_\ell}(\varphi_j)$ converge for every fixed $j$. If $\varphi\in L^1(\Omega)$ and $\varepsilon>0$, choose $j$ with $\|\varphi-\varphi_j\|_{L^1}<\varepsilon/(3M)$ when $M>0$. Then for $\ell,m$ large enough, \begin{align*}|\Lambda_{k_\ell}(\varphi)-\Lambda_{k_m}(\varphi)|\le |\Lambda_{k_\ell}(\varphi-\varphi_j)|+|\Lambda_{k_\ell}(\varphi_j)-\Lambda_{k_m}(\varphi_j)|+|\Lambda_{k_m}(\varphi_j-\varphi)|<\varepsilon.\end{align*} Hence $\Lambda_{k_\ell}(\varphi)$ converges for every $\varphi\in L^1(\Omega)$, which is weak* convergence along a subsequence. This is why bounded coefficients may converge weakly against every fixed test density even when no strong $L^\infty$ convergence is available. [/example] For Sobolev minimization, the strongest compactness usually comes not from weak* compactness but from bounded sequences in the primal space. Boundedness alone is not compactness in a general Banach space, so the direct method needs a structural condition ensuring that closed bounded sets have weak compactness available in the original space rather than only after passing to a larger dual object. [definition: Reflexive Banach Space] Let $X$ be a Banach space, and let $J:X \to X^{**}$ be the canonical isometric embedding defined by $J(x)(f)=f(x)$ for $f \in X^*$. The space $X$ is reflexive if $J$ is surjective. [/definition] Reflexivity is useful because it aligns bounded sets with weak compactness. Direct methods work with minimizing sequences rather than arbitrary nets, so we also need a theorem that converts weak compactness into a sequential statement. The next result supplies exactly that bridge between the topology of weak compactness and the subsequences used in existence proofs. [quotetheorem:987] The course uses this theorem as a compactness principle rather than proving it. 
Its significance is specifically Banach-space theoretic: in arbitrary topological spaces, compactness need not be detected by sequences, so a compactness theorem stated with open covers or nets would not automatically justify the subsequence extraction used in variational proofs. Eberlein-Smulian says that weak compactness in Banach spaces is compatible with the sequential language of minimizing sequences. The theorem does not itself make a bounded sequence compact. It must be combined with a reason that the [bounded set](/page/Bounded%20Set) under consideration is weakly compact; reflexivity is the standard reason in Sobolev direct methods. Outside this Banach weak-compactness setting, sequences can fail to detect compactness: the space $[0,\omega_1]$ of countable ordinals together with the first uncountable ordinal is compact in its order topology, while the subset $[0,\omega_1)$ is sequentially closed but not closed. Thus sequential tests do not characterize closedness or compactness in arbitrary topological spaces. To apply sequential compactness to integral functionals, we therefore need to know that the standard energy spaces are reflexive. [quotetheorem:8729] [citeproof:8729] The restriction $1<p<\infty$ is not cosmetic. It is exactly the range in which the duality theory of $L^p$ gives reflexivity and hence weak compactness of bounded sequences. The openness of $\Omega$ is part of the Sobolev setup: weak derivatives are defined distributionally using test functions compactly supported in $\Omega$, and the closed-subspace argument works because the [distributional derivative](/page/Distributional%20Derivative) relation is stable under $L^p$ limits. Reflexivity does not say that bounded sequences converge strongly, nor does it identify the limit of a minimizing sequence as a minimizer. It supplies only the weakly convergent subsequence; lower semicontinuity and weak closedness of the admissible class are separate requirements. 
At the endpoints $p=1$ and $p=\infty$, the direct method often needs measures, BV compactness, weak* compactness, or relaxation instead of reflexive weak compactness. [example: Endpoint Failure of Reflexivity] For $u_k=k\mathbb{1}_{(0,1/k)}$ in $L^1(0,1)$, its norm is \begin{align*} \|u_k\|_{L^1(0,1)}=\int_0^1 k\mathbb{1}_{(0,1/k)}(x)\,d\mathcal L^1(x)=\int_0^{1/k}k\,dx=k\cdot \frac1k=1. \end{align*} Thus $(u_k)$ is bounded in $L^1(0,1)$. We show that it has no weakly convergent subsequence in $L^1(0,1)$. Suppose that $u_{k_j}\rightharpoonup f$ in $L^1(0,1)$. For each fixed $\delta\in(0,1)$, the function $\mathbb{1}_{(0,\delta)}$ belongs to $L^\infty(0,1)$, so weak convergence gives \begin{align*} \int_0^1 u_{k_j}(x)\mathbb{1}_{(0,\delta)}(x)\,d\mathcal L^1(x)\to \int_0^\delta f(x)\,d\mathcal L^1(x). \end{align*} Once $k_j>1/\delta$, we have $(0,1/k_j)\subset(0,\delta)$, and therefore \begin{align*} \int_0^1 u_{k_j}(x)\mathbb{1}_{(0,\delta)}(x)\,d\mathcal L^1(x)=\int_0^{1/k_j}k_j\,dx=1. \end{align*} Hence $\int_0^\delta f(x)\,d\mathcal L^1(x)=1$ for every $\delta\in(0,1)$. But $f\in L^1(0,1)$ implies absolute continuity of the [Lebesgue integral](/page/Lebesgue%20Integral), so \begin{align*} \left|\int_0^\delta f(x)\,d\mathcal L^1(x)\right|\le \int_0^\delta |f(x)|\,d\mathcal L^1(x)\to 0 \end{align*} as $\delta\downarrow 0$, a contradiction. Thus no subsequence of $(u_k)$ converges weakly in $L^1(0,1)$, and by *Eberlein-Smulian* this rules out reflexivity of $L^1(0,1)$. The same sequence explains the informal statement that the missing limit is a Dirac mass. If $\varphi\in C([0,1])$, then \begin{align*} \int_0^1 u_k(x)\varphi(x)\,d\mathcal L^1(x)-\varphi(0)=k\int_0^{1/k}(\varphi(x)-\varphi(0))\,dx. \end{align*} Taking absolute values gives \begin{align*} \left|k\int_0^{1/k}(\varphi(x)-\varphi(0))\,dx\right|\le k\cdot \frac1k \sup_{0<x<1/k}|\varphi(x)-\varphi(0)|=\sup_{0<x<1/k}|\varphi(x)-\varphi(0)|, \end{align*} which tends to $0$ by continuity of $\varphi$ at $0$. 
Therefore $u_k$ converges against continuous test functions to the functional $\varphi\mapsto\varphi(0)$, namely the Dirac mass at $0$, not integration against an $L^1$ function. For $L^\infty(0,1)$, use the duality $(L^1(0,1))^*\simeq L^\infty(0,1)$. If $L^\infty(0,1)$ were reflexive, then the dual of $L^1(0,1)$ would be reflexive, and the standard implication “$X^*$ reflexive implies $X$ reflexive” would force $L^1(0,1)$ to be reflexive. This contradicts the bounded sequence above. Thus $L^\infty(0,1)$ is not reflexive: bounded $L^\infty$ sequences have weak* convergent subsequences when tested against the predual $L^1$, but this is weaker than weak compactness, which tests against the full dual $(L^\infty(0,1))^*$. [/example]

## Coercive Functionals and Minimizing Sequences

Once weak compactness is available, the next question is how a variational problem produces bounded sequences in the first place. Coercivity is the condition that prevents minimizing sequences from escaping to infinity in the admissible space. [definition: Minimizing Sequence] Let $X$ be a Banach space, let $\mathcal A \subset X$, and let $I:\mathcal A \to (-\infty,\infty]$. A sequence $(u_k)_{k=1}^{\infty}$ in $\mathcal A$ is a minimizing sequence for $I$ over $\mathcal A$ if \begin{align*} I[u_k] \to \inf_{v \in \mathcal A} I[v]. \end{align*} [/definition] A minimizing sequence is what remains when a minimizer is not yet known. The direct method extracts from such a sequence a weakly convergent subsequence by proving boundedness, and then passes to the limit in the energy. [definition: Coercive Functional] Let $X$ be a Banach space, let $\mathcal A \subset X$, and let $I:\mathcal A \to (-\infty,\infty]$. The functional $I$ is coercive on $\mathcal A$ if, for every sequence $(u_k)_{k=1}^{\infty}$ in $\mathcal A$ with $\|u_k\|_X \to \infty$, one has \begin{align*} I[u_k] \to \infty. \end{align*} [/definition] Coercivity is a growth condition measured in the topology where compactness will be used.
Its immediate job is to prevent a minimizing sequence from escaping to infinite norm. The next compactness step requires this intuition as a precise boundedness lemma. Once the infimum is finite and the sequence really has finite energies approaching it, coercivity rules out unbounded norms because escape to infinity would force the energies to diverge to $+\infty$. [quotetheorem:8730] [citeproof:8730] This short lemma is the first compactness step in most existence proofs, but each hypothesis has a role. The assumption $\inf_{\mathcal A} I<\infty$ says that the variational problem has at least one admissible competitor with finite energy and gives the minimizing sequence a finite target value. If instead $\inf_{\mathcal A} I=-\infty$, the definition of a minimizing sequence would force $I[u_k]\to-\infty$, so coercivity at large norm would no longer give the contradiction used in the proof. The eventual finite-energy condition avoids terms equal to $\infty$ obscuring the limiting comparison with the infimum. Boundedness alone still does not produce a minimizer. It only places the minimizing sequence inside a ball of the energy space; reflexivity and Eberlein-Smulian are then needed to extract a weakly convergent subsequence, and later weak lower semicontinuity is needed to pass the inequality to the limit. Thus coercivity is the bridge from variational information to compactness information. [example: Coercivity of the Dirichlet Integral] Let $\Omega \subset \mathbb R^n$ be bounded, let $1<p<\infty$, and define on $W^{1,p}_0(\Omega)$ \begin{align*} I[u]=\frac{1}{p}\int_{\Omega}|\nabla u|^p\,d\mathcal L^n. \end{align*} We show that $I$ is coercive with respect to the usual Sobolev norm. By *Poincare's inequality*, there is a constant $C_{\Omega}>0$ such that \begin{align*} \|u\|_{L^p(\Omega)}\le C_{\Omega}\|\nabla u\|_{L^p(\Omega)} \end{align*} for every $u\in W^{1,p}_0(\Omega)$. 
The usual Sobolev norm satisfies \begin{align*} \|u\|_{W^{1,p}(\Omega)}^p=\|u\|_{L^p(\Omega)}^p+\|\nabla u\|_{L^p(\Omega)}^p. \end{align*} Using the Poincare estimate in the first term gives \begin{align*} \|u\|_{W^{1,p}(\Omega)}^p\le C_{\Omega}^p\|\nabla u\|_{L^p(\Omega)}^p+\|\nabla u\|_{L^p(\Omega)}^p. \end{align*} Hence \begin{align*} \|u\|_{W^{1,p}(\Omega)}^p\le (C_{\Omega}^p+1)\|\nabla u\|_{L^p(\Omega)}^p. \end{align*} Rearranging, \begin{align*} \|\nabla u\|_{L^p(\Omega)}^p\ge \frac{1}{C_{\Omega}^p+1}\|u\|_{W^{1,p}(\Omega)}^p. \end{align*} Since \begin{align*} I[u]=\frac{1}{p}\|\nabla u\|_{L^p(\Omega)}^p, \end{align*} we obtain \begin{align*} I[u]\ge \frac{1}{p(C_{\Omega}^p+1)}\|u\|_{W^{1,p}(\Omega)}^p. \end{align*} Thus whenever $\|u_k\|_{W^{1,p}(\Omega)}\to\infty$, the lower bound forces $I[u_k]\to\infty$. Hence the Dirichlet integral is coercive on $W^{1,p}_0(\Omega)$; the zero boundary condition removes the constant directions that would otherwise have zero gradient energy. [/example] Coercivity may fail when the admissible class does not remove directions along which the energy is constant. The next example explains why boundary conditions or quotienting by constants are structural, not merely technical. [example: Noncoercivity Without Boundary Conditions] Let $\Omega \subset \mathbb R^n$ be a bounded, connected, nonempty open set, let $1\le p<\infty$, and define \begin{align*} I[u]=\int_{\Omega}|\nabla u|^p\,d\mathcal L^n \end{align*} on $W^{1,p}(\Omega)$. For the constant functions $u_k(x)=k$, the [weak derivative](/page/Weak%20Derivative) is $\nabla u_k=0$, so \begin{align*} I[u_k]=\int_{\Omega}|0|^p\,d\mathcal L^n=0. \end{align*} On the other hand, the usual Sobolev norm satisfies \begin{align*} \|u_k\|_{W^{1,p}(\Omega)}^p=\|u_k\|_{L^p(\Omega)}^p+\|\nabla u_k\|_{L^p(\Omega)}^p. \end{align*} Substituting $u_k(x)=k$ and $\nabla u_k=0$ gives \begin{align*} \|u_k\|_{W^{1,p}(\Omega)}^p=\int_{\Omega}|k|^p\,d\mathcal L^n+0=k^p\mathcal L^n(\Omega).
\end{align*} Since $\Omega$ is nonempty and open, $\mathcal L^n(\Omega)>0$, hence \begin{align*} \|u_k\|_{W^{1,p}(\Omega)}=k\,\mathcal L^n(\Omega)^{1/p}\to\infty. \end{align*} Thus there is a sequence with $\|u_k\|_{W^{1,p}(\Omega)}\to\infty$ but $I[u_k]=0$ for every $k$, so $I[u_k]$ does not tend to $\infty$. Therefore $I$ is not coercive on $W^{1,p}(\Omega)$. The obstruction is exactly the constant direction: the gradient energy cannot see additive constants, while on $W^{1,p}_0(\Omega)$ this direction is removed and *Poincare's inequality* controls the full Sobolev norm by the gradient norm. [/example]

## Compact Embeddings and Lower-Order Terms

Weak convergence is enough for convex gradient terms, but many variational integrals also contain terms depending on $u$ itself. The problem is that weak convergence $u_k \rightharpoonup u$ in $W^{1,p}$ does not generally imply pointwise or $L^p$ convergence. Compact embeddings supply strong convergence in lower-order spaces. [definition: Compact Embedding] Let $X$ and $Y$ be Banach spaces with a continuous embedding $X \hookrightarrow Y$. The embedding is compact, written $X \hookrightarrow\hookrightarrow Y$, if every bounded sequence in $X$ has a subsequence converging strongly in $Y$. [/definition] Compact embeddings are stronger than continuous embeddings because they convert boundedness into strong subsequential convergence in a weaker norm. The Sobolev version is the compactness result needed to pass lower-order terms to the limit. [quotetheorem:8731] This compactness theorem belongs to Sobolev space theory and is quoted in this course. Its role here is to recover strong convergence of $u_k$ from weak compactness of $u_k$ in $W^{1,p}$, but only in lower-order norms. The theorem does not make bounded sets compact in $W^{1,p}(\Omega)$ itself, and the boundedness and [boundary regularity](/theorems/99) assumptions prevent mass from escaping through rough geometry or to infinity.
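As a numerical illustration of "strong convergence only in lower-order norms", the following minimal sketch uses a hypothetical test family $u_k(x)=\sin(k\pi x)/(k\pi)$ in $H^1_0(0,1)$, which is not taken from the quoted theorem. The helper names and the midpoint quadrature are assumptions made for illustration. The sketch suggests that $\|u_k\|_{L^2}\to 0$ while $\|u_k'\|_{L^2}$ stays at $\sqrt{1/2}$, so the sequence is bounded in $H^1_0$ and converges strongly in $L^2$ but not in $H^1_0$.

```python
import math

def midpoint_integral(f, a, b, n=20000):
    """Approximate the integral of f over (a, b) by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def l2_norm(f):
    """L^2(0,1) norm of f, computed by quadrature."""
    return math.sqrt(midpoint_integral(lambda x: f(x) ** 2, 0.0, 1.0))

def u(k):
    # Hypothetical test family u_k(x) = sin(k*pi*x)/(k*pi); it vanishes at 0 and 1.
    return lambda x: math.sin(k * math.pi * x) / (k * math.pi)

def du(k):
    # Derivative u_k'(x) = cos(k*pi*x).
    return lambda x: math.cos(k * math.pi * x)

def norms(k):
    """Return the pair (||u_k||_{L^2}, ||u_k'||_{L^2})."""
    return l2_norm(u(k)), l2_norm(du(k))

if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        value_norm, gradient_norm = norms(k)
        print(k, value_norm, gradient_norm)
```

The lower-order norm decays like $1/k$, while the gradient norm does not decay at all. This is the behaviour the compact embedding permits: compactness in $L^2$, with no compactness claimed in the energy norm.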
The strict inequality $q<p^*$ is also essential. At the critical exponent $p^*$ the Sobolev embedding is continuous but generally not compact, because concentrating or rescaling sequences can keep their critical $L^{p^*}$ size while losing strong convergence. In direct-method applications, Rellich-Kondrachov is therefore used for genuinely subcritical lower-order terms. [example: Compactness for Zero Trace Sobolev Spaces] [claim]If $\Omega \subset \mathbb R^n$ is bounded with Lipschitz boundary, $1<p<\infty$, and $(u_k)$ is bounded in $W^{1,p}_0(\Omega)$, then a subsequence converges weakly in $W^{1,p}_0(\Omega)$ and strongly in each compactly embedded lower-order space.[/claim] [proof]Since $(u_k)$ is bounded, there is $R>0$ such that \begin{align*} \|u_k\|_{W^{1,p}(\Omega)}\le R \end{align*} for every $k$. The space $W^{1,p}(\Omega)$ is reflexive by *Reflexivity of Lebesgue and Sobolev Spaces*, and $W^{1,p}_0(\Omega)$ is a closed linear subspace of $W^{1,p}(\Omega)$, so $W^{1,p}_0(\Omega)$ is reflexive. Hence the closed ball \begin{align*} \{v\in W^{1,p}_0(\Omega):\|v\|_{W^{1,p}(\Omega)}\le R\} \end{align*} is weakly compact, and *[Eberlein-Smulian Theorem](/theorems/987)* gives a subsequence, still denoted $(u_k)$, and some $u\in W^{1,p}_0(\Omega)$ such that \begin{align*} u_k\rightharpoonup u \quad \text{in } W^{1,p}_0(\Omega). \end{align*} If $1<p<n$ and $1\le q<p^*=\frac{np}{n-p}$, then *[Rellich Kondrachov Compactness Theorem](/theorems/8731)* gives a further subsequence and some $v\in L^q(\Omega)$ such that \begin{align*} u_k\to v \quad \text{in } L^q(\Omega). \end{align*} The embedding $W^{1,p}_0(\Omega)\hookrightarrow L^q(\Omega)$ is continuous, so for every $\psi\in (L^q(\Omega))^*$ the map \begin{align*} w\mapsto \psi(w) \end{align*} is a bounded linear functional on $W^{1,p}_0(\Omega)$. Therefore weak convergence in $W^{1,p}_0(\Omega)$ gives \begin{align*} \psi(u_k)\to \psi(u). 
\end{align*} Strong convergence in $L^q(\Omega)$ gives \begin{align*} \psi(u_k)\to \psi(v). \end{align*} Since limits of scalar sequences are unique, $\psi(u)=\psi(v)$ for every $\psi\in (L^q(\Omega))^*$, hence $u=v$ in $L^q(\Omega)$. Thus \begin{align*} u_k\to u \quad \text{in } L^q(\Omega) \end{align*} for every $1\le q<p^*$ after passing to a subsequence. When $p=n$, the same argument uses the compact embedding $W^{1,n}(\Omega)\hookrightarrow\hookrightarrow L^q(\Omega)$ for every $1\le q<\infty$. When $p>n$, it uses the compact embedding into $C^0(\bar{\Omega})$, or more sharply into $C^{0,\alpha}(\bar{\Omega})$ for every $0<\alpha<1-n/p$.[/proof] Thus bounded zero-trace Sobolev sequences have the compactness needed in the direct method: weak convergence remains available for gradient terms, while compact embedding gives strong convergence for lower-order terms. [/example] Strong convergence is especially useful when an integral contains a continuous potential term. We now package weak compactness and compact embedding into the exact lemma used in the direct method. [quotetheorem:8732] [citeproof:8732] This lemma packages the compactness half of the direct method, and its assumptions correspond to three separate obstructions. Reflexivity prevents bounded minimizing sequences from disappearing without a weak subsequential limit: in $L^1(0,1)$, the concentrating sequence $u_k=k\mathbb{1}_{(0,1/k)}$ is bounded but has no weakly convergent subsequence in $L^1(0,1)$. Sequential weak closedness prevents the limit from leaving the admissible class: in a Hilbert space, the open unit ball is not weakly closed, and the sequence $(1-1/k)e_1$ is admissible but converges weakly and strongly to the boundary point $e_1$. 
Compact embedding adds strong convergence in a weaker space, which is exactly what lower-order nonlinearities often require; without compactness, an orthonormal sequence in an infinite-dimensional Hilbert space is bounded and weakly converges to $0$, but it has no strongly convergent subsequence. The lemma does not prove existence of a minimizer by itself. It gives a candidate limit and the convergences available along a subsequence; the next chapter will combine this with weak lower semicontinuity to prove that the candidate actually attains the infimum. In PDE existence arguments, this is the point where compactness and variational inequalities begin to interact. [example: Passing a Lower-Order Term to the Limit] Let $\Omega \subset \mathbb R^n$ be bounded with Lipschitz boundary, let $1<p<n$, and take $1\le r<p^*=\frac{np}{n-p}$. Suppose $u_k\rightharpoonup u$ in $W^{1,p}_0(\Omega)$. After passing to a subsequence, *Rellich Kondrachov Compactness Theorem* gives \begin{align*} u_k\to u \quad \text{in } L^r(\Omega). \end{align*} We show that this strong lower-order convergence is enough to pass to the potential term when $F:\mathbb R\to\mathbb R$ is continuous and satisfies $|F(s)|\le C(1+|s|^r)$. Strong convergence in $L^r(\Omega)$ implies [convergence in measure](/page/Convergence%20in%20Measure): for every $\delta>0$, \begin{align*} \mathcal L^n(\{x\in\Omega:|u_k(x)-u(x)|>\delta\})\le \delta^{-r}\|u_k-u\|_{L^r(\Omega)}^r\to 0. \end{align*} Since $F$ is continuous, $F(u_k)\to F(u)$ in measure. To apply Vitali convergence, it remains to check uniform integrability. The inequality \begin{align*} |F(u_k(x))|\le C+C|u_k(x)|^r \end{align*} shows that it is enough to control the integrals of $|u_k|^r$ on small measurable sets. For every measurable $E\subset\Omega$ and every $k$, \begin{align*} \int_E |u_k|^r\,d\mathcal L^n=\int_E |(u_k-u)+u|^r\,d\mathcal L^n. 
\end{align*} Using $|a+b|^r\le 2^{r-1}(|a|^r+|b|^r)$ gives \begin{align*} \int_E |u_k|^r\,d\mathcal L^n\le 2^{r-1}\int_E |u_k-u|^r\,d\mathcal L^n+2^{r-1}\int_E |u|^r\,d\mathcal L^n. \end{align*} For large $k$, the first term is small because $\|u_k-u\|_{L^r}^r\to0$. For the finitely many remaining indices, absolute continuity of the Lebesgue integral makes $\int_E |u_k|^r$ small when $\mathcal L^n(E)$ is small. The same absolute continuity applies to $\int_E |u|^r$. Hence the family $(|u_k|^r)$ is uniformly integrable, and therefore $(F(u_k))$ is uniformly integrable. By *[Vitali convergence theorem](/theorems/950)*, convergence in measure plus uniform integrability gives \begin{align*} \|F(u_k)-F(u)\|_{L^1(\Omega)}\to 0. \end{align*} Consequently, \begin{align*} \left|\int_{\Omega}F(u_k)\,d\mathcal L^n-\int_{\Omega}F(u)\,d\mathcal L^n\right|\le \int_{\Omega}|F(u_k)-F(u)|\,d\mathcal L^n\to 0. \end{align*} Thus \begin{align*} \int_{\Omega}F(u_k)\,d\mathcal L^n \to \int_{\Omega}F(u)\,d\mathcal L^n. \end{align*} Compact embedding supplies exactly the strong convergence needed for this lower-order potential term, while the weak convergence in $W^{1,p}_0(\Omega)$ alone would not give this conclusion. [/example] Chapter 2 showed how coercivity and compactness produce a weak limit, but a weak limit is useful only if the functional behaves well under that convergence. The next chapter takes up that issue directly by asking when energies are lower semicontinuous in weak topologies.

# 3. Lower Semicontinuity in Weak Topologies

Chapter 2 supplied the compactness side of the direct method: coercivity turns a minimizing sequence into a weakly convergent subsequence in a suitable Banach space. This chapter supplies the second half: the passage from a weak limit to an actual minimizer. The central question is when a functional respects weak limits in the correct inequality direction, so that the energy of the limit does not exceed the limiting energy of the minimizing sequence, even though that sequence converges only weakly.
Weak lower semicontinuity is the variational substitute for continuity. In finite-dimensional minimization, compactness and continuity are enough; in Sobolev spaces, compactness is usually weak and the functional must be stable under weak convergence. The chapter first isolates the abstract principle, then proves the convex mechanism behind weak lower semicontinuity, and finally adds compact lower-order perturbations.

## Sequential Lower Semicontinuity and the Direct Method

The direct method starts from a minimizing sequence, extracts a convergent subsequence, and then tries to compare the energy of the limit with the limiting infimum. The question is therefore not whether $I[u_k]$ converges to $I[u]$, but whether the energy of the limit is no larger than the limit inferior of the energies along the sequence. [definition: Sequential Lower Semicontinuity] Let $X$ be a topological space and let $I:X\to(-\infty,\infty]$. The functional $I$ is sequentially lower semicontinuous on $X$ if for every sequence $(u_k)_{k=1}^{\infty}$ in $X$ with $u_k\to u$ in $X$, \begin{align*} I[u]\le \liminf_{k\to\infty} I[u_k]. \end{align*} [/definition] For the direct method the topology in this definition is usually the weak topology of a Banach space. The inequality says that energy may drop at the limit, but it may not jump upward along the convergent sequence. This is exactly the direction needed for minimization. [definition: Sequential Weak Lower Semicontinuity] Let $X$ be a Banach space and let $I:X\to(-\infty,\infty]$. The functional $I$ is sequentially weakly lower semicontinuous if for every sequence $(u_k)_{k=1}^{\infty}$ in $X$ with $u_k\rightharpoonup u$ in $X$, \begin{align*} I[u]\le \liminf_{k\to\infty} I[u_k]. \end{align*} [/definition] Sequential weak lower semicontinuity is paired with weak compactness. In reflexive spaces, bounded sequences have weakly convergent subsequences, so coercivity supplies the subsequence and weak lower semicontinuity shows that the weak limit has energy no larger than the infimum.
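The statement that energy may drop at a weak limit can be checked numerically on a standard oscillating family. The following is a minimal sketch under stated assumptions: the sequence $u_k(x)=\sin(k\pi x)$ in $L^2(0,1)$, the energy $I[u]=\int_0^1 u^2\,dx$, and the single test function $g(x)=x$ are all illustrative choices, and the function names are hypothetical. Testing against one $g$ only illustrates weak convergence to $0$; it does not prove it.

```python
import math

def midpoint_integral(f, a, b, n=20000):
    """Approximate the integral of f over (a, b) by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def oscillation(k):
    # Illustrative sequence u_k(x) = sin(k*pi*x) on (0, 1).
    return lambda x: math.sin(k * math.pi * x)

def pairing(k, g):
    """Integral of u_k * g over (0, 1): the action of u_k on the test function g."""
    u_k = oscillation(k)
    return midpoint_integral(lambda x: u_k(x) * g(x), 0.0, 1.0)

def energy(f):
    """Convex energy I[f], the integral of f^2 over (0, 1)."""
    return midpoint_integral(lambda x: f(x) ** 2, 0.0, 1.0)

if __name__ == "__main__":
    g = lambda x: x  # one fixed test function, chosen for illustration
    for k in (1, 4, 16, 64):
        print(k, pairing(k, g), energy(oscillation(k)))
    print("energy of the weak limit 0:", energy(lambda x: 0.0))
```

The pairings shrink with $k$ while the energies stay near $1/2$, and the candidate weak limit $0$ has energy $0$. The inequality $I[u]\le\liminf_k I[u_k]$ holds strictly here, which is the permitted direction.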
The next theorem records the abstract existence argument so that later sections can focus on verifying its hypotheses for concrete energies. [quotetheorem:7620] [citeproof:7620] The theorem separates existence into checkable hypotheses, and each hypothesis rules out a real failure mode. Without coercivity, the functional $I[u]=e^{-\|u\|_X}$ on $X$ has infimum $0$ but no minimizer; without weak closedness, minimizing over an open ball can drive the minimum to the boundary without attaining it; without weak lower semicontinuity, the weak limit of a minimizing sequence can have larger energy than the limiting infimum. The theorem also does not assert uniqueness, regularity, or an Euler-Lagrange equation: those require additional convexity, differentiability, and PDE estimates. The rest of this chapter is therefore devoted to verifying the lower semicontinuity hypothesis in the Sobolev settings where Chapter 2 already supplied the compactness part. [example: Dirichlet Energy With Forcing] Let $\Omega\subset\mathbb R^n$ be bounded, let $f\in H^{-1}(\Omega)=(H^1_0(\Omega))^*$, and consider \begin{align*} I[u]=\frac{1}{2}\int_\Omega |\nabla u|^2\,d\mathcal L^n-f(u) \end{align*} on $H^1_0(\Omega)$. By *Poincare's inequality*, there is $C_0>0$ such that $\|u\|_{H^1_0}\le C_0\|\nabla u\|_{L^2}$ for all $u\in H^1_0(\Omega)$. Since $f$ is bounded linear, \begin{align*} |f(u)|\le \|f\|_{H^{-1}}\|u\|_{H^1_0}\le C_0\|f\|_{H^{-1}}\|\nabla u\|_{L^2}. \end{align*} Using $ab\le \frac14 a^2+b^2$ with $a=\|\nabla u\|_{L^2}$ and $b=C_0\|f\|_{H^{-1}}$, we get \begin{align*} I[u]\ge \frac12\|\nabla u\|_{L^2}^2-C_0\|f\|_{H^{-1}}\|\nabla u\|_{L^2}\ge \frac14\|\nabla u\|_{L^2}^2-C_0^2\|f\|_{H^{-1}}^2. \end{align*} Thus $I[u]\to\infty$ whenever $\|u\|_{H^1_0}\to\infty$, so $I$ is coercive. Now suppose $u_k\rightharpoonup u$ in $H^1_0(\Omega)$. Since $f\in (H^1_0)^*$, weak convergence gives \begin{align*} f(u_k)\to f(u). 
\end{align*} For the gradient term, weak convergence gives $\int_\Omega \nabla u_k\cdot \nabla u\,d\mathcal L^n\to \int_\Omega |\nabla u|^2\,d\mathcal L^n$, and Cauchy-Schwarz gives \begin{align*} \|\nabla u\|_{L^2}^2=\lim_{k\to\infty}\int_\Omega \nabla u_k\cdot \nabla u\,d\mathcal L^n\le \left(\liminf_{k\to\infty}\|\nabla u_k\|_{L^2}\right)\|\nabla u\|_{L^2}. \end{align*} If $\|\nabla u\|_{L^2}>0$, division gives $\|\nabla u\|_{L^2}\le \liminf_k\|\nabla u_k\|_{L^2}$, and the case $\|\nabla u\|_{L^2}=0$ is immediate. Squaring this inequality and combining it with $f(u_k)\to f(u)$ gives \begin{align*} I[u]\le \liminf_{k\to\infty} I[u_k]. \end{align*} The hypotheses of the *[Direct Method Template](/theorems/8724)* are satisfied with $\mathcal A=H^1_0(\Omega)$, so $I$ has a minimizer $u\in H^1_0(\Omega)$. For every $\varphi\in H^1_0(\Omega)$ and every $t\in\mathbb R$, \begin{align*} I[u+t\varphi]-I[u]=t\left(\int_\Omega \nabla u\cdot \nabla\varphi\,d\mathcal L^n-f(\varphi)\right)+\frac{t^2}{2}\int_\Omega |\nabla\varphi|^2\,d\mathcal L^n. \end{align*} Since $t=0$ minimizes this quadratic polynomial in $t$, its linear coefficient is $0$, hence \begin{align*} \int_\Omega \nabla u\cdot \nabla\varphi\,d\mathcal L^n=f(\varphi)\quad\text{for every }\varphi\in H^1_0(\Omega). \end{align*} This is exactly the weak formulation of $-\Delta u=f$ with zero boundary data. [/example] This example already displays the two kinds of terms that occur throughout the course. The leading convex term behaves well under weak convergence, while the linear lower-order term is continuous under weak convergence because it is a bounded functional.

## Convexity and Weak Lower Semicontinuity

The main structural source of weak lower semicontinuity is convexity. Weak convergence tests only against continuous linear functionals, so convex sets and convex functionals are compatible with weak topology through separation and averaging. [definition: Convex Functional] Let $X$ be a real vector space and let $I:X\to(-\infty,\infty]$.
The functional $I$ is convex if for all $u,v\in X$ and all $t\in[0,1]$, \begin{align*} I[tu+(1-t)v]\le tI[u]+(1-t)I[v]. \end{align*} [/definition] Convexity alone is algebraic, so it must be combined with a topological condition. The next lemma provides the bridge: weak limits can be approximated in norm by convex combinations of the original sequence. [quotetheorem:216] [citeproof:216] [Mazur's lemma](/theorems/216) converts weak convergence into strong convergence after averaging, and the averaging is essential. In an infinite-dimensional Hilbert space, an orthonormal sequence $(e_k)$ satisfies $e_k\rightharpoonup 0$ but $\|e_k-0\|_H=1$ for every $k$, so weak convergence alone does not provide norm convergence of the original sequence. What the lemma gives instead is strong convergence of convex averages, and this is exactly the form compatible with convex functionals; for nonconvex functionals, the energy of an average is not controlled by the energies of the terms being averaged, so the argument breaks down. The next theorem makes this mechanism precise. [quotetheorem:986] [citeproof:986] The convex functional theorem handles energies depending on a single weakly convergent Banach-space variable, but its hypotheses are not cosmetic. On an infinite-dimensional Hilbert space, the functional $I[u]=-\|u\|_H^2$ is strongly continuous but concave, and along an orthonormal sequence $e_k\rightharpoonup 0$ it gives $I[0]=0>-1=\liminf_k I[e_k]$, so weak lower semicontinuity fails. Strong continuity cannot simply be removed either: convex functionals need not be continuous in infinite dimensions (a discontinuous linear functional is convex but not lower semicontinuous), and the proof needs norm convergence of Mazur averages to pass from $v_j\to u$ to $I[v_j]\to I[u]$. Integral functionals introduce two variables: $u_k$, which may converge strongly after compactness, and $\nabla u_k$, which is controlled only weakly. This motivates a lower semicontinuity theorem where convexity is imposed in the gradient variable and the lower-order variable is controlled by a normal-integrand hypothesis.
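The averaging mechanism behind Mazur's lemma can be seen concretely for an orthonormal sequence. The following is a minimal sketch, assuming we model $e_1,\dots,e_N$ as the coordinate vectors of $\mathbb R^N$ and use the equal-weight average $(e_1+\dots+e_N)/N$ as one particular convex combination. The function names are hypothetical, and the lemma itself allows other weights.

```python
import math

def basis_vector(k, dim):
    """Coordinate vector e_k in R^dim (0-indexed)."""
    return [1.0 if i == k else 0.0 for i in range(dim)]

def average(vectors):
    """Equal-weight convex combination of the given vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(x * x for x in v))

def mazur_average_norm(n):
    """Norm of (e_1 + ... + e_n)/n, computed in R^n."""
    vectors = [basis_vector(k, n) for k in range(n)]
    return norm(average(vectors))

if __name__ == "__main__":
    for n in (1, 4, 25, 100):
        # Each e_k has norm 1, while the average has norm 1/sqrt(n).
        print(n, norm(basis_vector(0, n)), mazur_average_norm(n))
```

Each $e_k$ stays at distance $1$ from the weak limit $0$, while the averages have norm $1/\sqrt N\to 0$. For the concave functional $I[u]=-\|u\|_H^2$, the same averages have energy $-1/N$, which is above the energy $-1$ of every term. This is why the convexity inequality cannot be dropped.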
[quotetheorem:8733] [citeproof:8733] The theorem explains why the direct method for first-order integral functionals usually asks for convexity in $\nabla u$, not in $u$. Its assumptions also mark the boundary of the result. If $P\mapsto F(P)$ is not convex, weak lower semicontinuity can fail: in one dimension, oscillating gradients with values near $-1$ and $1$ may converge weakly to $0$, and an energy density with wells at $\pm1$ can have limiting energy strictly below the energy forced at the weak limit. The theorem also does not cover critical concentration phenomena, sign-changing integrands without a lower bound, or arbitrary discontinuous dependence on $u$; those cases require quasiconvexity, concentration-compactness, or sharper compactness hypotheses. Weak convergence gives only $\nabla u_k\rightharpoonup\nabla u$ in $L^p$, while compact embeddings often provide strong convergence of $u_k$ in lower-order spaces. [example: The $p$-Dirichlet Energy] Let $1<p<\infty$, let $\Omega\subset\mathbb R^n$ be bounded, and let $f\in W^{-1,p'}(\Omega)=(W^{1,p}_0(\Omega))^*$. On $W^{1,p}_0(\Omega)$ define \begin{align*} I[u]=\frac{1}{p}\int_\Omega |\nabla u|^p\,d\mathcal L^n-f(u). \end{align*} By *Poincare's inequality*, there is $C_0>0$ such that $\|u\|_{W^{1,p}_0}\le C_0\|\nabla u\|_{L^p}$ for every $u\in W^{1,p}_0(\Omega)$. Since $f$ is bounded linear, \begin{align*} |f(u)|\le \|f\|_{W^{-1,p'}}\|u\|_{W^{1,p}_0}\le C_0\|f\|_{W^{-1,p'}}\|\nabla u\|_{L^p}. \end{align*} Set $A=C_0\|f\|_{W^{-1,p'}}$ and $r=\|\nabla u\|_{L^p}$. [Young's inequality](/theorems/244) with $p'=\frac{p}{p-1}$, $x=2^{-1/p}r$, and $y=2^{1/p}A$ gives \begin{align*} Ar=xy\le \frac{x^p}{p}+\frac{y^{p'}}{p'}=\frac{r^p}{2p}+\frac{2^{p'/p}A^{p'}}{p'}. \end{align*} Therefore \begin{align*} I[u]\ge \frac{1}{p}\|\nabla u\|_{L^p}^p-A\|\nabla u\|_{L^p}\ge \frac{1}{2p}\|\nabla u\|_{L^p}^p-\frac{2^{p'/p}A^{p'}}{p'}. 
\end{align*} If $\|u\|_{W^{1,p}_0}\to\infty$, then $\|\nabla u\|_{L^p}\to\infty$ by the Poincare bound, and the last inequality shows that $I[u]\to\infty$. Thus $I$ is coercive. Now suppose $u_k\rightharpoonup u$ in $W^{1,p}_0(\Omega)$. The gradient map is bounded linear from $W^{1,p}_0(\Omega)$ to $L^p(\Omega;\mathbb R^n)$, so $\nabla u_k\rightharpoonup\nabla u$ in $L^p(\Omega;\mathbb R^n)$. The map $P\mapsto |P|^p$ is convex, and hence the functional $g\mapsto \frac1p\int_\Omega |g|^p\,d\mathcal L^n$ is weakly lower semicontinuous by the *Weak Lower Semicontinuity of Convex Continuous Functionals*. Thus \begin{align*} \frac1p\int_\Omega |\nabla u|^p\,d\mathcal L^n\le \liminf_{k\to\infty}\frac1p\int_\Omega |\nabla u_k|^p\,d\mathcal L^n. \end{align*} Also, because $f\in (W^{1,p}_0)^*$ and $u_k\rightharpoonup u$, \begin{align*} f(u_k)\to f(u). \end{align*} Combining these two facts gives \begin{align*} I[u]\le \liminf_{k\to\infty} I[u_k]. \end{align*} The hypotheses of the *Direct Method Template* are therefore satisfied with $\mathcal A=W^{1,p}_0(\Omega)$, so $I$ has a minimizer $u\in W^{1,p}_0(\Omega)$. For $\varphi\in W^{1,p}_0(\Omega)$ and $t\in\mathbb R$, minimality gives $I[u]\le I[u+t\varphi]$. For each $x\in\Omega$, differentiating $t\mapsto |\nabla u(x)+t\nabla\varphi(x)|^p/p$ at $t=0$ gives $|\nabla u(x)|^{p-2}\nabla u(x)\cdot\nabla\varphi(x)$, and the standard first-variation formula for the convex $p$-power energy yields \begin{align*} 0=\frac{d}{dt}\bigg|_{t=0} I[u+t\varphi]=\int_\Omega |\nabla u|^{p-2}\nabla u\cdot\nabla\varphi\,d\mathcal L^n-f(\varphi). \end{align*} Hence \begin{align*} \int_\Omega |\nabla u|^{p-2}\nabla u\cdot\nabla\varphi\,d\mathcal L^n=f(\varphi)\quad\text{for every }\varphi\in W^{1,p}_0(\Omega). \end{align*} This is the weak formulation of $-\Delta_p u=f$ with zero boundary data. [/example] This example is the nonlinear analogue of the Hilbert-space Dirichlet energy. 
The convexity is no longer quadratic, but the direct method uses the same compactness and lower semicontinuity pattern.

## Strong Lower-Order Perturbations and Compact Embeddings

Many energies are not purely convex in the full Sobolev variable. The leading part controls $\nabla u$, while lower-order terms depend on $u$ itself and may be nonconvex. The question is when those lower-order terms are continuous along the same minimizing sequences that converge weakly in $W^{1,p}$. [definition: Strongly Continuous Perturbation Along Weakly Convergent Sequences] Let $X$ and $Y$ be Banach spaces with a compact embedding $X\hookrightarrow Y$. A functional $J:Y\to\mathbb R$ defines a strongly continuous lower-order perturbation on $X$ if $J$ is continuous with respect to $\|\cdot\|_Y$ and $J$ is evaluated on $X$ through the embedding $X\hookrightarrow Y$. [/definition] The definition packages a common situation: weak convergence in the energy space plus compactness gives strong convergence in a lower-order space. The perturbation need only be continuous in that lower-order topology. [quotetheorem:8734] [citeproof:8734] This stability result is the standard way to include potentials, reaction terms, and source terms that are compact relative to the principal coercive energy. Compactness is the essential extra input: in $H^1_0(0,1)$, weak convergence implies strong convergence in $L^2(0,1)$, but on an unbounded domain such as $\mathbb R^n$ translations can satisfy $u_k\rightharpoonup 0$ in $H^1(\mathbb R^n)$ while $\|u_k\|_{L^q}$ stays constant, so an $L^q$ potential need not be continuous along the sequence. The theorem therefore does not say that every continuous perturbation preserves weak lower semicontinuity; it says this only on bounded weakly convergent sequences for perturbations that are continuous in a topology obtained compactly from the energy space. The compact embedding must match the growth of the perturbation.
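The loss of compactness by translation on an unbounded domain can be illustrated numerically. The following is a minimal sketch under stated assumptions: dimension $n=1$, exponent $q=2$, a Gaussian profile $\varphi(x)=e^{-x^2}$ translated to $u_k(x)=\varphi(x-k)$, and a truncation of $\mathbb R$ to the window $[-40,40]$. These choices and the function names are hypothetical. Pairing against one fixed function only illustrates $u_k\rightharpoonup 0$; it does not prove it.

```python
import math

WINDOW = 40.0  # truncation of R; assumes Gaussian tails outside [-40, 40] are negligible

def midpoint_integral(f, a, b, n=40000):
    """Approximate the integral of f over (a, b) by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def bump(x):
    # Illustrative profile phi(x) = exp(-x^2).
    return math.exp(-x * x)

def translate(k):
    """Translated profile u_k(x) = phi(x - k)."""
    return lambda x: bump(x - k)

def l2_norm_squared(f):
    """Squared L^2 norm of f over the truncated line."""
    return midpoint_integral(lambda x: f(x) ** 2, -WINDOW, WINDOW)

def pairing(f, g):
    """L^2 pairing of f and g over the truncated line."""
    return midpoint_integral(lambda x: f(x) * g(x), -WINDOW, WINDOW)

if __name__ == "__main__":
    for k in (0, 2, 5, 10):
        u_k = translate(k)
        # The norm is translation invariant; the pairing with a fixed function decays.
        print(k, l2_norm_squared(u_k), pairing(u_k, bump))
```

The squared norm stays at $\sqrt{\pi/2}$ for every $k$, while the pairing with the fixed profile decays rapidly. A potential term continuous in $L^2$ would therefore see a nonzero value along the sequence, even though the candidate weak limit is $0$.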
[example: Semilinear Energy With Subcritical Potential] Let $\Omega\subset\mathbb R^n$ be bounded with boundary regularity sufficient for the compact Sobolev embedding, let $1<p<n$, and let $1\le q<p^*=np/(n-p)$. On $W^{1,p}_0(\Omega)$ consider \begin{align*} I[u]=\frac{1}{p}\int_\Omega |\nabla u|^p\,d\mathcal L^n-\int_\Omega G(x,u)\,d\mathcal L^n, \end{align*} where $G$ is a Caratheodory function satisfying \begin{align*} |G(x,s)|\le a(x)+C|s|^q \end{align*} with $a\in L^1(\Omega)$ and $C>0$. By the compact Sobolev embedding, $W^{1,p}_0(\Omega)\hookrightarrow L^q(\Omega)$ compactly because $q<p^*$. Suppose $u_k\rightharpoonup u$ in $W^{1,p}_0(\Omega)$. The compact embedding gives \begin{align*} u_k\to u\quad\text{strongly in }L^q(\Omega) \end{align*} after passing to the relevant subsequence in the liminf argument. The gradient map is bounded linear from $W^{1,p}_0(\Omega)$ to $L^p(\Omega;\mathbb R^n)$, so \begin{align*} \nabla u_k\rightharpoonup \nabla u\quad\text{in }L^p(\Omega;\mathbb R^n). \end{align*} Since $P\mapsto |P|^p/p$ is convex and continuous, the leading term is weakly lower semicontinuous by *Weak Lower Semicontinuity of Convex Continuous Functionals*: \begin{align*} \frac1p\int_\Omega |\nabla u|^p\,d\mathcal L^n\le \liminf_{k\to\infty}\frac1p\int_\Omega |\nabla u_k|^p\,d\mathcal L^n. \end{align*} Assume, as supplied by the usual Nemytskii-continuity hypothesis for this growth class, that \begin{align*} u_k\to u\text{ in }L^q(\Omega)\quad\Longrightarrow\quad \int_\Omega G(x,u_k)\,d\mathcal L^n\to \int_\Omega G(x,u)\,d\mathcal L^n. \end{align*} Then \begin{align*} -\int_\Omega G(x,u_k)\,d\mathcal L^n\to -\int_\Omega G(x,u)\,d\mathcal L^n. \end{align*} Combining the liminf inequality for the leading term with the convergence of the potential term gives \begin{align*} I[u]\le \liminf_{k\to\infty} I[u_k]. \end{align*} The growth bound also shows exactly what coercivity must overcome. 
If $S_q>0$ is a Sobolev constant with \begin{align*} \|u\|_{L^q}\le S_q\|\nabla u\|_{L^p}, \end{align*} then \begin{align*} \int_\Omega G(x,u)\,d\mathcal L^n\le \int_\Omega |G(x,u)|\,d\mathcal L^n\le \|a\|_{L^1}+C\|u\|_{L^q}^q\le \|a\|_{L^1}+C S_q^q\|\nabla u\|_{L^p}^q. \end{align*} Hence \begin{align*} I[u]\ge \frac1p\|\nabla u\|_{L^p}^p-\|a\|_{L^1}-C S_q^q\|\nabla u\|_{L^p}^q. \end{align*} Thus, whenever the positive $p$-growth term dominates this negative potential bound, for instance when $q<p$, the right-hand side tends to $+\infty$ as $\|\nabla u\|_{L^p}\to\infty$. Since Poincare's inequality gives an equivalent norm on $W^{1,p}_0(\Omega)$ through $\|\nabla u\|_{L^p}$, this is coercivity. The hypotheses of the *Direct Method Template* are then satisfied with $\mathcal A=W^{1,p}_0(\Omega)$, so $I$ attains its minimum. [/example] The restriction $q<p^*$ is the subcritical condition. At the critical exponent, compactness may fail, and minimizing sequences can lose mass through concentration; this is why critical problems require additional tools beyond the basic direct method. [remark: Coercivity After Perturbation] Weak lower semicontinuity is not enough for existence if coercivity is destroyed. A lower-order perturbation may be continuous along weakly convergent bounded sequences and still make the energy unbounded below. Existence requires both the lower semicontinuity mechanism of this chapter and the coercive bounds from the previous chapter. [/remark] The chapter's conclusion is that lower semicontinuity is not a separate miracle but a structural consequence of convexity, weak topology, and compact embeddings. The direct method works when the leading part is convex and coercive in the weakly controlled variables, while the remaining terms are either weakly continuous or continuous in the norm of a lower-order space into which the energy space embeds compactly.
Once compactness is available, existence reduces to a lower semicontinuity question: can the energy at the weak limit be no larger than along the minimizing sequence? The next chapter turns the abstract machinery from Chapters 2 and 3 into Tonelli's existence theorem for scalar integral functionals.

# 4. Tonelli's Existence Theorem for Scalar Integral Functionals

This chapter turns the compactness machinery of Chapter 2 and the lower semicontinuity machinery of Chapter 3 into a usable existence theorem for scalar integral functionals. The main object is an energy of the form $I[u]=\int_U f(x,u,\nabla u)\,d\mathcal L^n$, where the unknown $u$ is scalar-valued and the gradient variable lies in $\mathbb R^n$. The direct method asks for three ingredients: compactness of minimizing sequences, closedness of the admissible class under weak convergence, and weak lower semicontinuity of the energy. Tonelli's theorem packages these requirements into hypotheses on measurability, convexity, growth, and coercivity.

## Measurable Integrands and Integral Functionals

The first problem is definitional but not merely formal: if the integrand depends on $x$, on the value $u(x)$, and on the gradient $\nabla u(x)$, then we need hypotheses ensuring that $x \mapsto f(x,u(x),\nabla u(x))$ is measurable for Sobolev maps. Continuity in all variables would be too restrictive for applications with rough coefficients, while pure measurability would not interact well with approximation and lower semicontinuity. [definition: Caratheodory Integrand] Let $U \subset \mathbb R^n$ be open. A function $f:U\times \mathbb R\times \mathbb R^n \to (-\infty,\infty]$ is a Caratheodory integrand if: 1. for every $(s,\xi)\in \mathbb R\times \mathbb R^n$, the map $x\mapsto f(x,s,\xi)$ is $\mathcal L^n$-measurable on $U$; 2. for $\mathcal L^n$-a.e. $x\in U$, the map $(s,\xi)\mapsto f(x,s,\xi)$ is continuous on $\mathbb R\times \mathbb R^n$. 
[/definition] The definition separates the rough variable $x$ from the fibre variables $(s,\xi)$. The next question is whether this separation is strong enough to make the variational energy measurable once $s$ and $\xi$ are replaced by $u(x)$ and $\nabla u(x)$. [quotetheorem:8735] [citeproof:8735] The theorem justifies writing the energy as an extended real number whenever $u\in W^{1,p}(U)$ and the positive and negative parts of the composed integrand do not both have infinite integral. The measurability hypothesis in $x$ is essential: without it, even a constant Sobolev function could produce a nonmeasurable composition $x\mapsto f(x,u(x),\nabla u(x))$. The fibre continuity assumption is also doing real work, because it allows measurable approximations of $(u,\nabla u)$ to pass through $f$ by pointwise limits; arbitrary separately defined functions need not have this stability. This result only proves measurability of the composed integrand, not integrability, coercivity, or lower semicontinuity. The next step is to name the functional whose minimizers the direct method will seek. [definition: Scalar Integral Functional] Let $1\le p<\infty$, let $U\subset\mathbb R^n$ be open, and let $f:U\times\mathbb R\times\mathbb R^n\to(-\infty,\infty]$ be an integrand for which $x\mapsto f(x,u(x),\nabla u(x))$ is $\mathcal L^n$-measurable for every $u\in W^{1,p}(U)$. Define \begin{align*} D(I):=\{u\in W^{1,p}(U):\int_U f(x,u(x),\nabla u(x))\,d\mathcal L^n \text{ is well-defined in }(-\infty,\infty]\}. \end{align*} The associated scalar integral functional is the map $I:D(I)\to(-\infty,\infty]$ given by \begin{align*} I[u] := \int_U f(x,u(x),\nabla u(x))\,d\mathcal L^n. \end{align*} [/definition] This definition gives the object to minimize, but continuity in $(s,\xi)$ is stronger than lower semicontinuity arguments require. To include indicator-type constraints and nonsmooth convex densities, we next record the lower-semicontinuous measurable class of integrands. [definition: Normal Integrand] Let $U\subset\mathbb R^n$ be open. A function $f:U\times\mathbb R\times\mathbb R^n\to(-\infty,\infty]$ is a normal integrand if: 1. 
for $\mathcal L^n$-a.e. $x\in U$, the map $(s,\xi)\mapsto f(x,s,\xi)$ is lower semicontinuous; 2. the epigraph-valued map \begin{align*} x\mapsto \{(s,\xi,t)\in\mathbb R\times\mathbb R^n\times\mathbb R:f(x,s,\xi)\le t\} \end{align*} is measurable as a multifunction from $U$ into the closed subsets of $\mathbb R^{n+2}$ equipped with the Effros Borel structure. [/definition] Caratheodory integrands are the most common normal integrands encountered in these notes. The normal integrand formulation is useful because lower semicontinuity survives limits and because many constrained problems are encoded by allowing $f=\infty$ outside an admissible set. [example: Rough Weighted Quadratic Energy] Let $U\subset\mathbb R^n$ be bounded, choose a measurable representative of $a\in L^\infty(U)$, and assume $a(x)\ge a_0>0$ for a.e. $x\in U$. Define \begin{align*} f(x,s,\xi)=a(x)|\xi|^2+s^2. \end{align*} For fixed $(s,\xi)\in\mathbb R\times\mathbb R^n$, the map $x\mapsto a(x)|\xi|^2+s^2$ is measurable because $a$ is measurable and $|\xi|^2,s^2$ are constants. For a.e. fixed $x\in U$, the map $(s,\xi)\mapsto a(x)|\xi|^2+s^2$ is continuous, since it is the sum of the polynomial $s\mapsto s^2$ and the quadratic map $\xi\mapsto a(x)|\xi|^2$. Thus $f$ is a Caratheodory integrand. If $u\in H^1(U)$, then $u\in L^2(U)$ and $\nabla u\in L^2(U;\mathbb R^n)$, so \begin{align*} |a(x)||\nabla u(x)|^2+|u(x)|^2\le \|a\|_{L^\infty(U)}|\nabla u(x)|^2+|u(x)|^2 \end{align*} for a.e. $x\in U$. The right-hand side is integrable, hence the associated energy is the finite quantity \begin{align*} I[u]=\int_U a(x)|\nabla u|^2+u^2\,d\mathcal L^n. \end{align*} The lower bound $a(x)\ge a_0$ also gives \begin{align*} I[u]\ge \int_U a_0|\nabla u|^2+u^2\,d\mathcal L^n, \end{align*} so the rough coefficient still controls the quadratic gradient energy. 
This example shows that coefficient-dependent variational problems need measurability in $x$, not continuity in $x$: bounded measurable weights already give a well-defined Sobolev energy. [/example]

## Growth, Coercivity, and Boundary-Value Classes

Once the energy is meaningful, the next question is whether minimizing sequences have weakly convergent subsequences. This is not a consequence of convexity alone. The functional must control the Sobolev norm strongly enough, and the boundary or constraint class must be stable under the weak topology. [definition: p-Growth Bounds] Let $1<p<\infty$, let $U\subset\mathbb R^n$ be open, and let $f:U\times\mathbb R\times\mathbb R^n\to\mathbb R$ be an integrand. We say that $f$ satisfies two-sided $p$-growth if there exist constants $\alpha>0$, $\beta\ge0$, $c\ge0$, and functions $a\in L^1(U)$, $b\in L^1(U)$ such that for a.e. $x\in U$ and all $(s,\xi)\in\mathbb R\times\mathbb R^n$, \begin{align*} \alpha |\xi|^p - c|s|^p - a(x) \le f(x,s,\xi) \le \beta(1+|s|^p+|\xi|^p)+b(x). \end{align*} [/definition] The upper bound gives finiteness on natural Sobolev classes, while the lower bound is the starting point for coercivity. To turn gradient control into full Sobolev control, the direct method next needs admissible classes with boundary information. [definition: Dirichlet Boundary Class] Let $U\subset\mathbb R^n$ be a bounded Lipschitz domain, let $1<p<\infty$, and let $g\in W^{1,p}(U)$. The Dirichlet boundary class with trace $g$ is \begin{align*} \mathcal A_g := g+W^{1,p}_0(U)=\{u\in W^{1,p}(U):u-g\in W^{1,p}_0(U)\}. \end{align*} [/definition] This class is affine rather than linear, but its usefulness depends on what happens after taking weak limits. A bounded minimizing sequence may converge weakly to some $u$, and the direct method fails at once if that limit has lost the prescribed trace. 
The obstruction is that weak convergence does not visibly preserve pointwise boundary values, since Sobolev functions are represented by equivalence classes and their boundary data are encoded through traces. The admissibility question is therefore whether membership in $g+W^{1,p}_0(U)$ survives weak limits in $W^{1,p}(U)$. The formal result below provides that weak closedness step, so a compactness argument cannot leave the Dirichlet class. [quotetheorem:8736] [citeproof:8736] Weak closedness keeps the limit admissible, but it does not produce the limit. This hypothesis cannot be omitted: a constraint set that is not closed, such as the open ball $\{u\in W^{1,p}(U):\|u-g\|_{W^{1,p}}<1\}$, may contain a convergent minimizing sequence whose limit lies on the boundary of the ball and is no longer admissible. The theorem also uses that the boundary class is closed in the Sobolev norm; replacing $W^{1,p}_0(U)$ by a merely dense subspace such as $C_c^\infty(U)$ would lose closedness. The next definition isolates the condition ensuring that minimizing sequences cannot escape to infinity in the ambient Banach space. [definition: Coercive Functional on an Admissible Class] Let $X$ be a Banach space, let $\mathcal A\subset X$, and let $I:\mathcal A\to(-\infty,\infty]$. The functional $I$ is coercive on $\mathcal A$ if for every sequence $(u_k)\subset\mathcal A$ with $\|u_k\|_X\to\infty$, we have $I[u_k]\to\infty$. [/definition] For Dirichlet classes, coercivity is usually proved from a lower $p$-growth estimate in $\nabla u$. The possible obstruction is that gradient bounds alone do not control additive constants, so a sequence could keep nearly the same gradient energy while its $L^p$ norm drifts to infinity unless the boundary condition anchors it. [quotetheorem:8737] [citeproof:8737] The Dirichlet condition is not a cosmetic assumption in this coercivity argument. 
If only $\int_U |\nabla u|^p\,d\mathcal L^n$ is controlled on all of $W^{1,p}(U)$, the constants $u_k\equiv k$ have zero gradient energy and unbounded $L^p$ norm. A boundary condition, a mean-zero constraint, or a coercive lower-order term is needed to rule out this additive-constant escape. Not all admissible sets are pure boundary classes. Obstacles and unilateral constraints occur when minimizers are required to lie above a prescribed function, and their weak closedness is another stability input. [example: Obstacle-Type Admissible Class] Let $U\subset\mathbb R^n$ be bounded, let $1<p<\infty$, let $g\in W^{1,p}(U)$, and let $\psi\in W^{1,p}(U)$ satisfy $\psi\le g$ in the trace sense on $\partial U$. Consider \begin{align*} \mathcal K_{g,\psi}:=\{u\in g+W^{1,p}_0(U):u\ge \psi \text{ a.e. in }U\}. \end{align*} We verify that this obstacle class is convex and weakly closed in $W^{1,p}(U)$. If $u,v\in\mathcal K_{g,\psi}$ and $\theta\in[0,1]$, then \begin{align*} \theta u+(1-\theta)v-g=\theta(u-g)+(1-\theta)(v-g). \end{align*} Since $u-g,v-g\in W^{1,p}_0(U)$ and $W^{1,p}_0(U)$ is a linear subspace, the right-hand side belongs to $W^{1,p}_0(U)$. Also, because $u\ge\psi$ and $v\ge\psi$ a.e., \begin{align*} \theta u+(1-\theta)v\ge \theta\psi+(1-\theta)\psi=\psi \end{align*} a.e. in $U$. Hence $\theta u+(1-\theta)v\in\mathcal K_{g,\psi}$, so $\mathcal K_{g,\psi}$ is convex. Now suppose $u_k\in\mathcal K_{g,\psi}$ and $u_k\rightharpoonup u$ in $W^{1,p}(U)$. Since $u_k-g\rightharpoonup u-g$ and $W^{1,p}_0(U)$ is a norm-closed convex subset of $W^{1,p}(U)$, it is weakly closed; therefore $u-g\in W^{1,p}_0(U)$. It remains to keep the obstacle inequality. By *Mazur's lemma*, for each $j$ there are indices $N_j\ge j$ and numbers $\lambda_{j,k}\ge0$ with \begin{align*} \sum_{k=j}^{N_j}\lambda_{j,k}=1 \end{align*} such that \begin{align*} v_j:=\sum_{k=j}^{N_j}\lambda_{j,k}u_k\to u \end{align*} strongly in $W^{1,p}(U)$, hence also strongly in $L^p(U)$. 
Since every $u_k\ge\psi$ a.e., \begin{align*} v_j-\psi=\sum_{k=j}^{N_j}\lambda_{j,k}(u_k-\psi)\ge0 \end{align*} a.e. in $U$. Strong convergence in $L^p(U)$ gives a subsequence $v_{j_m}$ with $v_{j_m}(x)\to u(x)$ for a.e. $x\in U$. Passing to the pointwise limit in $v_{j_m}(x)\ge\psi(x)$ gives $u(x)\ge\psi(x)$ for a.e. $x\in U$. Thus $u\in\mathcal K_{g,\psi}$, so the class is weakly closed. The obstacle inequality is therefore stable under weak limits in exactly the way required by the direct method: the affine boundary condition survives weak convergence, and the unilateral constraint survives through convex averaging and almost-everywhere convergence. [/example]

## Lower Semicontinuity for Convex Scalar Integrands

After compactness has produced a weak limit, the decisive question is whether the energy of the limit is no larger than the limiting infimum of the energies. For integral functionals depending on gradients, the structural condition giving weak lower semicontinuity in the scalar case is convexity in the gradient variable. The reason is visible already for oscillating gradients: weak convergence can average rapidly alternating slopes, and a nonconvex density can assign a lower cost to the oscillations than to their weak average. Thus convexity is not just a technical convenience; it is the condition that prevents microscopic mixing from lowering the limiting energy. [definition: Convexity in the Gradient Variable] Let $U\subset\mathbb R^n$ be open. An integrand $f:U\times\mathbb R\times\mathbb R^n\to(-\infty,\infty]$ is convex in the gradient variable if for a.e. $x\in U$, every $s\in\mathbb R$, every $\xi,\eta\in\mathbb R^n$, and every $\theta\in[0,1]$, \begin{align*} f(x,s,\theta\xi+(1-\theta)\eta)\le \theta f(x,s,\xi)+(1-\theta)f(x,s,\eta). \end{align*} [/definition] Convexity is applied pointwise in $x$ and $s$, but the convergence available from compactness is weak convergence in $W^{1,p}$. 
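The oscillation mechanism described above can be made concrete in one dimension. The sketch below is an illustration under stated assumptions, not part of the source: the double-well density $(\xi^2-1)^2$, the sawtooth sequence on $[0,1]$, and all function names are hypothetical choices. The sawtooth $u_k$ has slopes $\pm1$ and amplitude $1/(2k)$, so $u_k\to0$ uniformly while staying bounded in $W^{1,p}(0,1)$, hence $u_k\rightharpoonup0$ for $1<p<\infty$. The double-well energy is $0$ along the sequence but $1$ at the limit, whereas the convex density $\xi^2$ gives $1$ along the sequence and $0$ at the limit.

```python
def double_well(xi):
    """Nonconvex density with wells at slopes +1 and -1."""
    return (xi * xi - 1.0) ** 2


def quadratic(xi):
    """Convex comparison density."""
    return xi * xi


def sawtooth(k):
    """Slopes and cell width of a sawtooth on [0, 1] with k teeth.

    The slopes alternate between +1 and -1, and u(0) = 0."""
    h = 1.0 / (2 * k)
    slopes = [1.0 if i % 2 == 0 else -1.0 for i in range(2 * k)]
    return slopes, h


def energy(density, slopes, h):
    """Integral of density(u') for a piecewise-linear u with equal cells."""
    return sum(density(s) for s in slopes) * h


def sup_norm(slopes, h):
    """Maximum of |u| at the nodes, with u rebuilt from its slopes."""
    u, largest = 0.0, 0.0
    for s in slopes:
        u += s * h
        largest = max(largest, abs(u))
    return largest


if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        slopes, h = sawtooth(k)
        print(k, sup_norm(slopes, h),
              energy(double_well, slopes, h), energy(quadratic, slopes, h))
    # Energies of the limit u = 0, which has slope 0 on all of [0, 1].
    print("limit:", energy(double_well, [0.0], 1.0), energy(quadratic, [0.0], 1.0))
```

Only the nonconvex density lets the limit have strictly larger energy than the sequence, which is the failure of weak lower semicontinuity that convexity in the gradient variable rules out.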
The obstruction is that weakly convergent gradients need not converge pointwise or strongly, so one must know that oscillations cannot reduce the limiting integral below the value at the weak limit. [quotetheorem:8738] [citeproof:8738] The theorem separates two roles played by compactness: weak compactness controls gradients, while strong convergence of $u_k$ controls the value variable. Convexity in $\xi$ is necessary for this mechanism; nonconvex wells can be defeated by gradients that oscillate between preferred slopes while converging weakly to their average. The nonnegativity and growth assumptions keep the integral from losing mass through uncontrolled negative parts and ensure that the value dependence is compatible with $L^p$ convergence. If the integrand is independent of $s$, the strong convergence assumption on $u_k$ is unnecessary, but when $f$ depends on $s$ it cannot be replaced by weak convergence alone without additional compactness or continuity hypotheses. [example: Weighted p-Growth Energies] Let $1<p<\infty$, let $U\subset\mathbb R^n$ be bounded, let $a\in L^\infty(U)$ satisfy $a(x)\ge a_0>0$ for a.e. $x\in U$, and let $V:U\times\mathbb R\to[0,\infty)$ be a Caratheodory function with \begin{align*} V(x,s)\le C(1+|s|^p) \end{align*} for a.e. $x\in U$ and every $s\in\mathbb R$. Define \begin{align*} f(x,s,\xi)=a(x)|\xi|^p+V(x,s). \end{align*} For fixed $(s,\xi)$, the map $x\mapsto a(x)|\xi|^p+V(x,s)$ is measurable because $a$ is measurable, $x\mapsto V(x,s)$ is measurable, and $|\xi|^p$ is constant. For a.e. fixed $x$, the map $(s,\xi)\mapsto a(x)|\xi|^p+V(x,s)$ is continuous because $\xi\mapsto |\xi|^p$ is continuous and $s\mapsto V(x,s)$ is continuous. Hence $f$ is a Caratheodory integrand. For a.e. $x$ with $a(x)\ge0$, the gradient section is convex. 
Indeed, if $\theta\in[0,1]$ and $\xi,\eta\in\mathbb R^n$, then by the triangle inequality and convexity of $t\mapsto t^p$ on $[0,\infty)$, \begin{align*} |\theta\xi+(1-\theta)\eta|^p\le (\theta|\xi|+(1-\theta)|\eta|)^p\le \theta|\xi|^p+(1-\theta)|\eta|^p. \end{align*} Multiplying by $a(x)\ge0$ and adding the term $V(x,s)$ gives \begin{align*} f(x,s,\theta\xi+(1-\theta)\eta)\le \theta f(x,s,\xi)+(1-\theta)f(x,s,\eta). \end{align*} The same hypotheses give the upper $p$-growth estimate \begin{align*} f(x,s,\xi)\le \|a\|_{L^\infty(U)}|\xi|^p+C(1+|s|^p)\le C_f(1+|s|^p+|\xi|^p), \end{align*} which holds, for instance, with $C_f=\max\{\|a\|_{L^\infty(U)},C\}$. The lower bound is \begin{align*} f(x,s,\xi)=a(x)|\xi|^p+V(x,s)\ge a_0|\xi|^p, \end{align*} since $V\ge0$. Now let $u_k\rightharpoonup u$ in $W^{1,p}(U)$ and $u_k\to u$ in $L^p(U)$. The convexity in $\xi$, the Caratheodory property, the nonnegativity of $f$, and the growth bound give \begin{align*} \int_U f(x,u,\nabla u)\,d\mathcal L^n\le \liminf_{k\to\infty}\int_U f(x,u_k,\nabla u_k)\,d\mathcal L^n. \end{align*} For the lower-order term separately, strong $L^p$ convergence implies convergence in measure, so a subsequence satisfies $u_{k_j}(x)\to u(x)$ for a.e. $x$. At those points, continuity of $s\mapsto V(x,s)$ gives \begin{align*} V(x,u_{k_j}(x))\to V(x,u(x)). \end{align*} The estimate \begin{align*} 0\le V(x,u_{k_j}(x))\le C(1+|u_{k_j}(x)|^p) \end{align*} makes the family $V(\cdot,u_{k_j})$ uniformly integrable, because $u_{k_j}\to u$ strongly in $L^p(U)$ and $U$ is bounded. Vitali's convergence theorem then gives $\int_U V(x,u_{k_j})\,d\mathcal L^n\to\int_U V(x,u)\,d\mathcal L^n$. Thus the value dependence is controlled by strong convergence, while the gradient dependence is controlled by convexity and weak convergence. This is the model scalar energy for heterogeneous media with a rough positive weight and a lower-order potential. [/example]

## Tonelli's Direct Method Theorem

We can now assemble the direct method. The theorem below is the scalar Tonelli-type existence result for convex integral functionals with $p$-growth. 
Its hypotheses are not decorative: each corresponds to one of the three direct-method steps. [quotetheorem:8739] [citeproof:8739] This theorem is the template for the rest of the course, and each hypothesis protects one step of the proof. If $p=1$, reflexivity of $W^{1,p}(U)$ is lost and bounded minimizing sequences need not have weakly convergent subsequences in the same space. If coercivity fails, constants or concentrating sequences can escape every bounded Sobolev set; if weak closedness fails, the weak limit may violate the constraint; if convexity fails, oscillating gradients can destroy lower semicontinuity. Later chapters weaken or replace convexity when the unknown is vector-valued, because convexity in the full gradient becomes too restrictive for nonlinear elasticity, minimal-surface-type energies, and variational inequalities with nonlinear constraints. [example: Minimal Surface Integrand Under Convexity Restrictions] Let $U\subset\mathbb R^n$ be bounded and define the nonparametric area density by \begin{align*} f(x,s,\xi)=\sqrt{1+|\xi|^2}. \end{align*} For fixed $(s,\xi)$ this is constant as a function of $x$, hence measurable, and for fixed $x$ the map $(s,\xi)\mapsto \sqrt{1+|\xi|^2}$ is continuous. Thus it is a Caratheodory integrand. We verify convexity in the gradient variable. For $\theta\in[0,1]$ and $\xi,\eta\in\mathbb R^n$, view $(1,\xi)$ and $(1,\eta)$ as vectors in $\mathbb R^{n+1}$. Then \begin{align*} f(x,s,\theta\xi+(1-\theta)\eta)=|(1,\theta\xi+(1-\theta)\eta)|. \end{align*} Since \begin{align*} (1,\theta\xi+(1-\theta)\eta)=\theta(1,\xi)+(1-\theta)(1,\eta), \end{align*} the triangle inequality and homogeneity of the Euclidean norm give \begin{align*} |(1,\theta\xi+(1-\theta)\eta)|\le \theta |(1,\xi)|+(1-\theta)|(1,\eta)|. \end{align*} Because $|(1,\xi)|=\sqrt{1+|\xi|^2}$ and $|(1,\eta)|=\sqrt{1+|\eta|^2}$, this becomes \begin{align*} f(x,s,\theta\xi+(1-\theta)\eta)\le \theta f(x,s,\xi)+(1-\theta)f(x,s,\eta). 
\end{align*} Thus the area density has the convexity structure used in the lower semicontinuity step for scalar integral functionals. Its growth, however, is linear rather than $p$-growth for any $p>1$. For $t=|\xi|\ge0$, \begin{align*} \sqrt{1+t^2}\le 1+t \end{align*} because $1+t^2\le 1+2t+t^2=(1+t)^2$. Hence \begin{align*} f(x,s,\xi)\le 1+|\xi|. \end{align*} In particular, for any unit vector $e\in\mathbb R^n$ and $k>0$, \begin{align*} \frac{f(x,s,ke)}{k^p}=\frac{\sqrt{1+k^2}}{k^p}\le \frac{1+k}{k^p}=k^{-p}+k^{1-p}. \end{align*} Since $p>1$, the right-hand side tends to $0$ as $k\to\infty$, so no positive multiple of $|\xi|^p$ can be bounded above by this density at large gradients. Therefore the $p$-growth coercivity hypothesis in the Tonelli theorem above is not satisfied by the unregularized area functional \begin{align*} I[u]=\int_U \sqrt{1+|\nabla u|^2}\,d\mathcal L^n. \end{align*} The example separates the two issues: convexity supplies the lower semicontinuity mechanism, while existence in a reflexive Sobolev space still requires an additional restriction or regularization that restores coercivity. [/example]

## Existence with $p$-Growth and Constrained Classes

The practical use of Tonelli's theorem is not limited to pure Dirichlet problems. Many variational problems include convex constraints, obstacle inequalities, or affine side conditions, and the same proof works whenever the admissible class is weakly closed and the energy is coercive on that class. [quotetheorem:8740] [citeproof:8740] This final formulation is the working version used in applications: prove coercivity, check weak closedness of the constraints, and verify sequential weak lower semicontinuity, often by the convex-integrand theorem above. 
On bounded Lipschitz domains, a bounded weakly convergent sequence in $W^{1,p}(U)$ has a subsequence converging strongly in $L^p(U)$, so the lower semicontinuity theorem supplies hypothesis 3 for Caratheodory integrands with the stated convexity and growth assumptions. The order of the argument matters because lower semicontinuity is only useful after compactness has produced a candidate limit. Noncoercivity allows minimizing sequences to leave every bounded Sobolev set, nonclosed constraints can lose the admissibility condition at the weak limit, and failure of lower semicontinuity allows the weak limit to have larger energy than the limiting infimum. [example: Obstacle Problem with p-Laplacian Energy] Let $U\subset\mathbb R^n$ be bounded and Lipschitz, let $1<p<\infty$, let $g\in W^{1,p}(U)$, and let $\psi\in W^{1,p}(U)$ be compatible with the boundary data, so the obstacle class \begin{align*} \mathcal K_{g,\psi}=\{u\in g+W^{1,p}_0(U):u\ge\psi\text{ a.e. in }U\} \end{align*} is nonempty. For $h\in L^{p'}(U)$, where $1/p+1/p'=1$, consider \begin{align*} I[u]=\int_U \frac{1}{p}|\nabla u|^p-hu\,d\mathcal L^n. \end{align*} The integrand \begin{align*} f(x,s,\xi)=\frac{1}{p}|\xi|^p-h(x)s \end{align*} is measurable in $x$ for fixed $(s,\xi)$ because $h$ is measurable, and it is continuous in $(s,\xi)$ for a.e. fixed $x$ because $\xi\mapsto |\xi|^p$ and $s\mapsto -h(x)s$ are continuous. The gradient section is convex: for $\theta\in[0,1]$ and $\xi,\eta\in\mathbb R^n$, the triangle inequality and convexity of $t\mapsto t^p$ on $[0,\infty)$ give \begin{align*} |\theta\xi+(1-\theta)\eta|^p\le (\theta|\xi|+(1-\theta)|\eta|)^p\le \theta|\xi|^p+(1-\theta)|\eta|^p. \end{align*} Dividing by $p$ and adding the unchanged affine term $-h(x)s$ gives \begin{align*} f(x,s,\theta\xi+(1-\theta)\eta)\le \theta f(x,s,\xi)+(1-\theta)f(x,s,\eta). \end{align*} The energy is coercive on the Dirichlet class. Write $u=g+w$ with $w\in W^{1,p}_0(U)$. 
By Poincare's inequality, there is $C_P>0$ such that \begin{align*} \|w\|_{L^p(U)}\le C_P\|\nabla w\|_{L^p(U)}. \end{align*} Since $\nabla w=\nabla u-\nabla g$, the triangle inequality gives \begin{align*} \|u\|_{L^p(U)}\le \|g\|_{L^p(U)}+C_P\|\nabla u\|_{L^p(U)}+C_P\|\nabla g\|_{L^p(U)}. \end{align*} Holder's inequality gives \begin{align*} \left|\int_U hu\,d\mathcal L^n\right|\le \|h\|_{L^{p'}(U)}\|u\|_{L^p(U)}. \end{align*} Therefore, with $A=\|\nabla u\|_{L^p(U)}$ and $B=\|h\|_{L^{p'}(U)}(\|g\|_{L^p(U)}+C_P\|\nabla g\|_{L^p(U)})$, \begin{align*} I[u]\ge \frac{1}{p}A^p-C_P\|h\|_{L^{p'}(U)}A-B. \end{align*} Because $p>1$, the term $A^p/p$ dominates the linear term in $A$, so $I[u]\to\infty$ whenever $\|\nabla u\|_{L^p(U)}\to\infty$. The displayed $L^p$ estimate also shows that unboundedness of $\|u\|_{W^{1,p}(U)}$ in $g+W^{1,p}_0(U)$ forces $\|\nabla u\|_{L^p(U)}\to\infty$, hence $I$ is coercive on $\mathcal K_{g,\psi}$. If $u_k\rightharpoonup u$ in $W^{1,p}(U)$, then the convex integral \begin{align*} u\mapsto \int_U \frac{1}{p}|\nabla u|^p\,d\mathcal L^n \end{align*} is weakly lower semicontinuous by the convex-integrand lower semicontinuity theorem, while \begin{align*} u\mapsto \int_U hu\,d\mathcal L^n \end{align*} is weakly continuous because it is a bounded linear functional on $L^p(U)$. Thus \begin{align*} I[u]\le \liminf_{k\to\infty} I[u_k]. \end{align*} The class $\mathcal K_{g,\psi}$ is weakly closed by the obstacle-class argument above: the affine boundary condition is weakly closed, and the inequality $u\ge\psi$ is preserved by Mazur convex combinations and almost-everywhere limits. Hence coercivity gives a bounded minimizing sequence, reflexivity of $W^{1,p}(U)$ gives a weakly convergent subsequence, weak closedness keeps the limit in $\mathcal K_{g,\psi}$, and lower semicontinuity makes that limit a minimizer. 
This example is the prototype for variational inequalities: the convex gradient energy supplies lower semicontinuity, the Dirichlet condition supplies coercivity, and the unilateral obstacle constraint survives weak limits. [/example] The scalar Tonelli theory shows how coercivity, compactness, and lower semicontinuity combine to produce minimizers under concrete hypotheses. The next chapter extends this direct-method logic to problems with constraints, where admissibility is no longer determined by boundary data alone.

# 5. Constraints, Obstacles, and Variational Inequalities

Chapters 2-4 established the compactness and lower semicontinuity mechanisms behind the direct method. We now add constraints that are not equality constraints: the admissible functions may be required to lie above an obstacle, remain inside a box, or satisfy a unilateral sign condition. These constraints usually destroy the possibility of writing a classical Euler-Lagrange equation everywhere, but convexity replaces equality of first variations by a variational inequality. The central question is: if the admissible class is a closed convex subset of a Sobolev space, how much of the direct method survives? The answer is that weak closedness of convex sets, coercivity, and lower semicontinuity still give existence, while first-order optimality becomes a one-sided condition against all admissible competitors. This chapter develops that principle and applies it to obstacle-type problems.

## Closed Convex Constraint Sets in Sobolev Spaces

The basic direct method asks for a weakly convergent subsequence of a minimizing sequence and then needs the limit to remain admissible. With equality boundary data this was handled by trace continuity; for unilateral or pointwise constraints, the admissible class is often a convex set rather than an affine space. The problem is to identify hypotheses under which convex constraints are stable under weak convergence. 
[definition: Closed Convex Constraint Set] Let $X$ be a Banach space. A subset $K \subset X$ is a closed convex constraint set if $K$ is nonempty, norm-closed in $X$, and for every $u,v \in K$ and every $t \in [0,1]$, the element $tu+(1-t)v$ belongs to $K$. [/definition] Convexity is not cosmetic here. In infinite-dimensional spaces, closed sets need not be weakly closed, so norm-closedness by itself is not enough for the direct method. Convexity is the extra structure that makes closedness compatible with weak limits. [quotetheorem:985] [citeproof:985] This result is the constraint analogue of trace stability. The closedness hypothesis is needed because weak limits cannot be expected to remain in a set that is not even closed in norm. The convexity hypothesis is also essential: in an infinite-dimensional Hilbert space, the unit sphere is norm-closed but not weakly closed, since an orthonormal sequence converges weakly to $0$ while every term has norm $1$. The theorem does not say that arbitrary nonlinear constraints are stable under weak convergence; it identifies convexity as the structural condition that allows the direct method to pass to the limit. We next check that the order constraints used in obstacle problems really fit this closed convex framework. [example: Nonnegative Sobolev Functions] Let $1<p<\infty$ and let $U \subset \mathbb R^n$ be open. We verify that \begin{align*} K=\{u\in W^{1,p}(U):u\ge 0\text{ a.e. in }U\} \end{align*} is a closed convex subset of $W^{1,p}(U)$. If $u,v\in K$ and $t\in[0,1]$, then for a.e. $x\in U$ we have $u(x)\ge 0$ and $v(x)\ge 0$, so \begin{align*} tu(x)+(1-t)v(x)\ge t\cdot 0+(1-t)\cdot 0=0. \end{align*} Thus $tu+(1-t)v\in K$, proving convexity. To prove norm-closedness, suppose $u_j\in K$ and $u_j\to u$ in $W^{1,p}(U)$. Then $u_j\to u$ in $L^p(U)$, because the $W^{1,p}$ norm controls the $L^p$ norm. From $L^p$ convergence, choose a subsequence $u_{j_k}$ such that $u_{j_k}(x)\to u(x)$ for a.e. $x\in U$. 
For those $x$, each $u_{j_k}(x)\ge 0$, and the order is preserved under limits: \begin{align*} u(x)=\lim_{k\to\infty}u_{j_k}(x)\ge 0. \end{align*} Hence $u\ge 0$ a.e. in $U$, so $u\in K$. Therefore $K$ is norm-closed and convex. Consequently, by *Closed Convex Sets Are Weakly Closed*, any weak limit in $W^{1,p}(U)$ of admissible nonnegative functions remains nonnegative. [/example] The nonnegative cone is the simplest unilateral constraint, but obstacle problems require comparison with a prescribed lower barrier rather than comparison with zero. The next definition packages that shifted cone so that it can be intersected with boundary conditions and used as an admissible class. [definition: Obstacle Constraint Set] Let $U \subset \mathbb R^n$ be open, let $1<p<\infty$, and let $\psi \in W^{1,p}(U)$. The obstacle constraint set associated with $\psi$ is \begin{align*} K_\psi = \{u \in W^{1,p}(U) : u \ge \psi \text{ a.e. in } U\}. \end{align*} [/definition] The set $K_\psi$ is a translate of the nonnegative cone, so it is closed and convex. In boundary value problems one usually intersects this set with an affine trace class, and the intersection remains closed and convex. [example: Box Constraints In Sobolev Spaces] Let $a,b \in W^{1,p}(U)$ satisfy $a \le b$ a.e., and define \begin{align*} K_{a,b}=\{u \in W^{1,p}(U): a \le u \le b \text{ a.e. in } U\}. \end{align*} The set is nonempty because $a\in K_{a,b}$: the assumptions give $a\le a$ everywhere and $a\le b$ a.e. To prove convexity, take $u,v\in K_{a,b}$ and $t\in[0,1]$. On a full-measure subset of $U$, we have \begin{align*} a(x)\le u(x)\le b(x) \quad \text{and} \quad a(x)\le v(x)\le b(x). \end{align*} Multiplying the lower inequalities by the nonnegative numbers $t$ and $1-t$ gives \begin{align*} ta(x)\le tu(x) \quad \text{and} \quad (1-t)a(x)\le (1-t)v(x). \end{align*} Adding these inequalities gives \begin{align*} a(x)=ta(x)+(1-t)a(x)\le tu(x)+(1-t)v(x). 
\end{align*} Similarly, \begin{align*} tu(x)\le tb(x) \quad \text{and} \quad (1-t)v(x)\le (1-t)b(x), \end{align*} so \begin{align*} tu(x)+(1-t)v(x)\le tb(x)+(1-t)b(x)=b(x). \end{align*} Thus $a\le tu+(1-t)v\le b$ a.e., so $tu+(1-t)v\in K_{a,b}$. To prove norm-closedness, suppose $u_j\in K_{a,b}$ and $u_j\to u$ in $W^{1,p}(U)$. Since $\|w\|_{L^p(U)}\le \|w\|_{W^{1,p}(U)}$, we also have $u_j\to u$ in $L^p(U)$. By the standard subsequence principle for $L^p$ convergence, there is a subsequence $u_{j_k}$ such that $u_{j_k}(x)\to u(x)$ for a.e. $x\in U$. For those $x$, each $u_{j_k}$ satisfies \begin{align*} a(x)\le u_{j_k}(x)\le b(x). \end{align*} Taking the limit along $k$ preserves the two inequalities: \begin{align*} a(x)\le \lim_{k\to\infty}u_{j_k}(x)=u(x)\le b(x). \end{align*} Hence $u\in K_{a,b}$, so $K_{a,b}$ is norm-closed. Therefore, for $1<p<\infty$, *Closed Convex Sets Are Weakly Closed* implies that $K_{a,b}$ is weakly closed in $W^{1,p}(U)$; box constraints are therefore stable under weak limits and can serve as admissible classes for upper and lower material bounds. [/example] These examples show that convex constraints are compatible with Sobolev weak convergence. The direct method then needs a theorem that combines this stability with the analytic assumptions already developed earlier in the course. [quotetheorem:8741] [citeproof:8741] This theorem is the constrained version of the direct method. Each hypothesis has a distinct failure mode: without reflexivity a bounded minimizing sequence need not have a weakly convergent subsequence, without closed convexity the weak limit can leave the admissible class, without coercivity the sequence can escape to infinity, and without weak lower semicontinuity the limiting energy can jump upward in the wrong direction. The theorem does not give uniqueness, regularity, or an Euler-Lagrange equation; it gives existence only. 
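The existence theorem is not constructive, but a finite-dimensional analogue shows what a constrained minimizer looks like. The following is a minimal sketch under loud assumptions: it treats a one-dimensional discretization of the quadratic energy $\int \frac12|u'|^2-fu$ on $[0,1]$ with zero boundary data and an obstacle $u\ge\psi$, and it uses projected Gauss-Seidel sweeps, a numerical technique swapped in for illustration and not taken from the text. The grid, the parabolic obstacle, and the function names are hypothetical. The pointwise `max` is a coordinate-wise computational device in the discrete problem, not a projection of an unconstrained Sobolev minimizer.

```python
def energy(u, f, h):
    """Discrete version of the integral of |u'|^2 / 2 - f u on a uniform grid."""
    grad = sum((u[i + 1] - u[i]) ** 2 for i in range(len(u) - 1)) / (2.0 * h)
    load = h * sum(fi * ui for fi, ui in zip(f, u))
    return grad - load


def projected_gauss_seidel(psi, f, h, sweeps=2000):
    """Approximately minimize the discrete energy over u >= psi, u = 0 at both ends.

    Each update minimizes the energy in one coordinate subject to the obstacle,
    so an admissible iterate stays admissible and the energy does not increase
    (up to rounding). Assumes psi <= 0 at the two endpoints."""
    n = len(psi)
    u = [max(p, 0.0) for p in psi]  # admissible start: above both psi and 0
    u[0] = u[-1] = 0.0              # Dirichlet data g = 0
    for _ in range(sweeps):
        for i in range(1, n - 1):
            free = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            u[i] = max(psi[i], free)
    return u


if __name__ == "__main__":
    n = 41
    h = 1.0 / (n - 1)
    xs = [i * h for i in range(n)]
    psi = [0.25 - 4.0 * (x - 0.5) ** 2 for x in xs]  # negative near the boundary
    f = [0.0] * n
    start = [max(p, 0.0) for p in psi]
    u = projected_gauss_seidel(psi, f, h)
    print("start energy:", energy(start, f, h))
    print("final energy:", energy(u, f, h))
```

The sketch mirrors the roles of the hypotheses only loosely: the admissible set is closed and convex, the quadratic energy is coercive once the endpoints are pinned, and every iterate is an admissible competitor.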
The obstacle problem below is the model application, where the admissible class is closed and convex but not an affine space of free variations.

## Projection-Free Existence for Obstacle Problems

Obstacle problems ask for an energy minimizer among functions constrained to lie above a prescribed function. In a Hilbert space one might try to project an unconstrained minimizer onto the constraint set, but this is rarely compatible with nonlinear energies or Sobolev boundary data. The direct method gives existence without constructing projections. [definition: Classical Obstacle Admissible Class] Let $U \subset \mathbb R^n$ be bounded and open, let $\psi \in H^1(U)$, and let $g \in H^1(U)$ with $g \ge \psi$ on the boundary in the trace sense. The classical obstacle admissible class is \begin{align*} K_{\psi,g}=\{u \in H^1(U): u-g \in H^1_0(U),\ u\ge \psi \text{ a.e. in } U\}. \end{align*} [/definition] This class combines an affine boundary condition with a unilateral interior condition. The assumption that it is nonempty is part of the problem data; for instance, $g$ itself is admissible when $g\ge \psi$ a.e. in $U$. [example: Classical Obstacle Problem] Let $U \subset \mathbb R^n$ be bounded and open, let $f\in L^2(U)$, and assume $K_{\psi,g}$ is nonempty. We compute why the direct method applies to \begin{align*} I[u]=\int_U \left(\frac{1}{2}|\nabla u|^2-fu\right)\,d\mathcal L^n \end{align*} on $K_{\psi,g}$. Take $u\in K_{\psi,g}$ and write $u=g+w$ with $w\in H^1_0(U)$. By *Poincare's inequality*, there is a constant $C_U>0$ such that \begin{align*} \|w\|_{L^2(U)}\le C_U\|\nabla w\|_{L^2(U)}. \end{align*} Since $\nabla w=\nabla u-\nabla g$, the triangle inequality gives \begin{align*} \|\nabla w\|_{L^2(U)}\le \|\nabla u\|_{L^2(U)}+\|\nabla g\|_{L^2(U)}. \end{align*} Therefore \begin{align*} \|u\|_{L^2(U)}\le \|w\|_{L^2(U)}+\|g\|_{L^2(U)}\le C_U\|\nabla u\|_{L^2(U)}+C_U\|\nabla g\|_{L^2(U)}+\|g\|_{L^2(U)}. 
\end{align*} Using Cauchy-Schwarz on the forcing term, \begin{align*} \left|\int_U fu\,d\mathcal L^n\right|\le \|f\|_{L^2(U)}\|u\|_{L^2(U)}. \end{align*} Combining this with the previous estimate gives \begin{align*} I[u]\ge \frac{1}{2}\|\nabla u\|_{L^2(U)}^2-C_U\|f\|_{L^2(U)}\|\nabla u\|_{L^2(U)}-\|f\|_{L^2(U)}\bigl(C_U\|\nabla g\|_{L^2(U)}+\|g\|_{L^2(U)}\bigr). \end{align*} Young's inequality gives \begin{align*} C_U\|f\|_{L^2(U)}\|\nabla u\|_{L^2(U)}\le \frac{1}{4}\|\nabla u\|_{L^2(U)}^2+C_U^2\|f\|_{L^2(U)}^2. \end{align*} Hence \begin{align*} I[u]\ge \frac{1}{4}\|\nabla u\|_{L^2(U)}^2-C_U^2\|f\|_{L^2(U)}^2-\|f\|_{L^2(U)}\bigl(C_U\|\nabla g\|_{L^2(U)}+\|g\|_{L^2(U)}\bigr). \end{align*} The earlier $L^2$ estimate also shows that $\|u\|_{H^1(U)}\to\infty$ along $K_{\psi,g}$ forces $\|\nabla u\|_{L^2(U)}\to\infty$, so this lower bound proves coercivity on $K_{\psi,g}$. The admissible set is closed and convex because it is the intersection of the affine closed condition $u-g\in H^1_0(U)$ with the closed convex obstacle condition $u\ge \psi$ a.e. The term $u\mapsto -\int_U fu\,d\mathcal L^n$ is weakly continuous on $H^1(U)$ since weak convergence in $H^1(U)$ implies weak convergence in $L^2(U)$, while $u\mapsto \frac{1}{2}\|\nabla u\|_{L^2(U)}^2$ is weakly lower semicontinuous by convexity of the squared Hilbert norm. Thus $I$ is weakly lower semicontinuous on the nonempty closed convex set $K_{\psi,g}$, and the constrained direct method gives a minimizer without constructing any projection or first solving an Euler-Lagrange equation. [/example] This example contains the whole existence argument, but the course needs it as a reusable theorem with explicit hypotheses on the domain, data, admissible class, and functional. The theorem below records exactly what the projection-free direct method supplies before any variational inequality is derived. [quotetheorem:6454] [citeproof:6454] The hypotheses are doing real work here. 
Boundedness of $U$ and Poincare's inequality prevent the fixed-boundary perturbation $w=u-g$ from drifting by large constants, nonemptiness prevents the theorem from minimizing over the empty set, and the compatibility of $g$ with the obstacle is what makes that nonemptiness plausible. The theorem does not identify the contact set, prove smoothness, or say that the minimizer solves $-\Delta u=f$ everywhere. To extract the PDE content we must use only variations that remain above the obstacle, which leads to a variational inequality rather than a free Euler-Lagrange equation. [example: Unilateral Membrane Problem] A stretched membrane over $U$ with boundary height $g$ and rigid floor $\psi$ is modeled by minimizing \begin{align*} E[u]=\frac12\int_U |\nabla u|^2\,d\mathcal L^n \end{align*} over the admissible class $K_{\psi,g}$. If $u$ is a minimizer and $v\in K_{\psi,g}$, then convexity of $K_{\psi,g}$ makes $u+t(v-u)\in K_{\psi,g}$ for every $t\in[0,1]$. Since $u$ minimizes $E$, the function $\phi(t)=E[u+t(v-u)]$ satisfies $\phi(t)\ge \phi(0)$ for $t\ge0$. Expanding the energy along this admissible chord gives \begin{align*} \phi(t)=\frac12\int_U |\nabla u+t\nabla(v-u)|^2\,d\mathcal L^n. \end{align*} Using $|A+tB|^2=|A|^2+2tA\cdot B+t^2|B|^2$ with $A=\nabla u$ and $B=\nabla(v-u)$, \begin{align*} \phi(t)=\frac12\int_U |\nabla u|^2\,d\mathcal L^n+t\int_U \nabla u\cdot\nabla(v-u)\,d\mathcal L^n+\frac{t^2}{2}\int_U |\nabla(v-u)|^2\,d\mathcal L^n. \end{align*} Thus \begin{align*} \frac{\phi(t)-\phi(0)}{t}=\int_U \nabla u\cdot\nabla(v-u)\,d\mathcal L^n+\frac{t}{2}\int_U |\nabla(v-u)|^2\,d\mathcal L^n. \end{align*} Because $\phi(t)-\phi(0)\ge0$ for $t>0$, letting $t\downarrow0$ gives the variational inequality \begin{align*} \int_U \nabla u\cdot\nabla(v-u)\,d\mathcal L^n\ge0 \end{align*} for every $v\in K_{\psi,g}$. 
On a subregion where $u>\psi$, small perturbations $v=u+\varepsilon\eta$ and $v=u-\varepsilon\eta$ with compactly supported $\eta$ remain above the floor when $\varepsilon>0$ is small. Substituting $v=u+\varepsilon\eta$ gives \begin{align*} \varepsilon\int_U \nabla u\cdot\nabla\eta\,d\mathcal L^n\ge0, \end{align*} so $\int_U \nabla u\cdot\nabla\eta\,d\mathcal L^n\ge0$. Substituting $v=u-\varepsilon\eta$ gives \begin{align*} -\varepsilon\int_U \nabla u\cdot\nabla\eta\,d\mathcal L^n\ge0, \end{align*} so $\int_U \nabla u\cdot\nabla\eta\,d\mathcal L^n\le0$. Hence $\int_U \nabla u\cdot\nabla\eta\,d\mathcal L^n=0$ there, which is the weak form of $-\Delta u=0$. At contact points, downward perturbations may violate $u\ge\psi$, so the equality of first variations is replaced on the full domain by the one-sided inequality above. [/example]

## Variational Inequalities As First-Order Optimality Conditions

For unconstrained differentiable minimization, first-order optimality says that the derivative vanishes in every direction. On a convex constraint set, only directions pointing from the minimizer toward admissible competitors are allowed. The resulting condition is a variational inequality. [definition: Variational Inequality] Let $X$ be a Banach space, let $K\subset X$ be convex, and let $A:K\to X^*$ be an operator. A solution of the variational inequality associated with $(A,K)$ is an element $u\in K$ such that \begin{align*} A(u)(v-u) \ge 0 \qquad \text{for every } v\in K. \end{align*} [/definition] When $A$ is the first variation of a functional, this condition says that every admissible chord leaving $u$ has nonnegative [directional derivative](/page/Directional%20Derivative). The next theorem justifies the definition by deriving it from constrained minimization, and it also explains why convex energies can be recovered from their variational inequalities. 
[quotetheorem:8742] [citeproof:8742] The convexity of $K$ is essential because it makes the chord $u+t(v-u)$ admissible; for a nonconvex set, a minimizer can have no admissible line segment in the direction of another competitor. Differentiability is also essential for this formulation, since nonsmooth energies lead instead to subdifferential inequalities. The theorem does not prove existence and does not turn the inequality into a PDE by itself; it only translates an already-known minimizer into a first-order condition. Linear elliptic inequalities are often written through a [bilinear form](/page/Bilinear%20Form), so we next isolate the boundedness and coercivity assumptions that make the Hilbert space theory work. [definition: Coercive Bilinear Form] Let $H$ be a Hilbert space. A bilinear form $B:H\times H\to \mathbb R$ is bounded and coercive if there are constants $M>0$ and $\alpha>0$ such that $|B[u,v]| \le M\|u\|_H\|v\|_H$ for every $u,v\in H$, and $B[u,u] \ge \alpha\|u\|_H^2$ for every $u\in H$. [/definition] Coercivity replaces compactness in the Hilbert space variational inequality theorem. It forces uniqueness and supplies the estimate needed to construct the solution, while closed convexity controls admissibility. [quotetheorem:88] [citeproof:88] Stampacchia's theorem is especially useful for linear elliptic problems with unilateral constraints, but the Hilbert space and bilinear form must be chosen so that coercivity is true. Closed convexity is needed for the metric projection, and coercivity is needed for uniqueness; if $B[u,v]=\int_U \nabla u\cdot\nabla v\,d\mathcal L^n$ is placed on all of $H^1(U)$, constants lie in the kernel and coercivity fails. The theorem does not automatically handle noncoercive Neumann-type forms or nonlinear energies. For obstacle problems with fixed boundary data, the usual remedy is to shift by the boundary datum and work on $H^1_0(U)$, where Poincare's inequality supplies the missing coercivity. 
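The contrast between the full space and the zero-trace subspace can be checked on a grid. The sketch below is a hypothetical one-dimensional finite-difference analogue on $(0,1)$, not part of the theorem: `dirichlet_energy` and `l2_norm_sq` are illustrative names for discrete versions of $B[v,v]$ and $\|v\|_{L^2}^2$. A constant grid function has zero energy but nonzero norm, while grid functions vanishing at both endpoints satisfy the elementary discrete Poincare bound $B[v,v]\ge 2\|v\|_{L^2}^2$.

```python
import random


def dirichlet_energy(v, h):
    """Discrete analogue of B[v, v] = int |v'|^2 dx on a uniform grid of step h."""
    return sum((v[i + 1] - v[i]) ** 2 for i in range(len(v) - 1)) / h


def l2_norm_sq(v, h):
    """Riemann-sum analogue of int v^2 dx."""
    return h * sum(x * x for x in v)


N = 50          # number of grid cells (assumed for illustration)
h = 1.0 / N

# A constant lies in the kernel of the form: zero energy, nonzero norm.
const = [1.0] * (N + 1)
energy_const = dirichlet_energy(const, h)
norm_const = l2_norm_sq(const, h)

# Random grid functions with zero boundary values: the energy controls the norm.
random.seed(0)
ratios = []
for _ in range(200):
    v = [0.0] + [random.uniform(-1.0, 1.0) for _ in range(N - 1)] + [0.0]
    ratios.append(dirichlet_energy(v, h) / l2_norm_sq(v, h))
min_ratio = min(ratios)

print("constant: energy =", energy_const, " norm^2 =", norm_const)
print("zero boundary values: smallest energy/norm^2 ratio =", min_ratio)
```

The constant $2$ is not optimal; it comes from Cauchy-Schwarz applied to $v_i=\sum_{k<i}(v_{k+1}-v_k)$, which is already enough to exhibit coercivity on the zero-boundary class and its failure once constants are admitted.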
[example: Linear Elliptic Obstacle Inequality] Let $U\subset\mathbb R^n$ be bounded, assume $K_{\psi,g}\subset H^1(U)$ is nonempty, and write every admissible function as $v=g+y$ with $y\in H^1_0(U)$. The shifted admissible set is \begin{align*} \widetilde K=\{y\in H^1_0(U): y+g\ge \psi \text{ a.e. in }U\}. \end{align*} It is nonempty because $K_{\psi,g}$ is nonempty. It is convex because if $y_1+g\ge\psi$ and $y_2+g\ge\psi$, then for $t\in[0,1]$, \begin{align*} (ty_1+(1-t)y_2)+g=t(y_1+g)+(1-t)(y_2+g)\ge t\psi+(1-t)\psi=\psi. \end{align*} It is closed in $H^1_0(U)$ by the same a.e. subsequence argument used for Sobolev order constraints. On $H^1_0(U)$ define \begin{align*} B[z,w]=\int_U \nabla z\cdot\nabla w\,d\mathcal L^n \end{align*} and \begin{align*} F(w)=\int_U fw\,d\mathcal L^n-\int_U \nabla g\cdot\nabla w\,d\mathcal L^n. \end{align*} With the equivalent norm $\|w\|_{H^1_0(U)}=\|\nabla w\|_{L^2(U)}$, Cauchy-Schwarz gives \begin{align*} |B[z,w]|\le \|\nabla z\|_{L^2(U)}\|\nabla w\|_{L^2(U)}=\|z\|_{H^1_0(U)}\|w\|_{H^1_0(U)}. \end{align*} Also \begin{align*} B[z,z]=\int_U |\nabla z|^2\,d\mathcal L^n=\|z\|_{H^1_0(U)}^2, \end{align*} so $B$ is coercive with coercivity constant $1$ in this norm. For $F$, Cauchy-Schwarz and *Poincare's inequality* give \begin{align*} \left|\int_U fw\,d\mathcal L^n\right|\le \|f\|_{L^2(U)}\|w\|_{L^2(U)}\le C_U\|f\|_{L^2(U)}\|\nabla w\|_{L^2(U)}. \end{align*} Similarly, \begin{align*} \left|\int_U \nabla g\cdot\nabla w\,d\mathcal L^n\right|\le \|\nabla g\|_{L^2(U)}\|\nabla w\|_{L^2(U)}. \end{align*} Therefore \begin{align*} |F(w)|\le \bigl(C_U\|f\|_{L^2(U)}+\|\nabla g\|_{L^2(U)}\bigr)\|w\|_{H^1_0(U)}, \end{align*} so $F$ is a bounded linear functional on $H^1_0(U)$. By *Stampacchia Variational Inequality Theorem*, there is a unique $z\in \widetilde K$ such that \begin{align*} B[z,y-z]\ge F(y-z) \end{align*} for every $y\in\widetilde K$. 
Expanding the definitions of $B$ and $F$ gives \begin{align*} \int_U \nabla z\cdot\nabla(y-z)\,d\mathcal L^n\ge \int_U f(y-z)\,d\mathcal L^n-\int_U \nabla g\cdot\nabla(y-z)\,d\mathcal L^n. \end{align*} Adding $\int_U \nabla g\cdot\nabla(y-z)\,d\mathcal L^n$ to both sides gives \begin{align*} \int_U (\nabla z+\nabla g)\cdot\nabla(y-z)\,d\mathcal L^n\ge \int_U f(y-z)\,d\mathcal L^n. \end{align*} Now set $u=g+z$ and $v=g+y$. Then $v-u=y-z$ and $\nabla u=\nabla g+\nabla z$, so the inequality becomes \begin{align*} \int_U \nabla u\cdot\nabla(v-u)\,d\mathcal L^n\ge \int_U f(v-u)\,d\mathcal L^n. \end{align*} This holds for every $v\in K_{\psi,g}$, because $v\in K_{\psi,g}$ is equivalent to $v=g+y$ with $y\in\widetilde K$. Thus the shifted Hilbert-space variational inequality is exactly the obstacle inequality in the original variables: competitors keep the same boundary trace and stay above the obstacle, so equality of first variations survives only in noncontact regions while the full domain retains the one-sided inequality. [/example] The final form of the obstacle problem is often written as complementarity: an equation holds where the obstacle is inactive, an inequality holds everywhere, and the reaction vanishes away from contact. This is the bridge from variational inequalities back to PDE language. [quotetheorem:6455] [citeproof:6455] The extra regularity assumptions are not decorative. Without continuous, or at least quasi-continuous, representatives, the pointwise sets $\{u>\psi\}$ and $\{u=\psi\}$ are not canonical for arbitrary $H^1$ functions; without the measure representation of $-\Delta u-f$, the phrase "support of the reaction" has to be interpreted purely distributionally. The theorem does not prove that such regularity always holds, nor does it locate the free boundary. It explains how, once the regularity framework is available, the variational inequality becomes a PDE with an unknown nonnegative reaction supported where the constraint is active. 
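A discrete one-dimensional analogue makes the three complementarity conditions concrete. The following is a minimal sketch under stated assumptions: $U=(0,1)$, $f=0$, zero boundary data, a hypothetical parabolic obstacle, and a projected Gauss-Seidel iteration chosen here only for illustration (the name `solve_obstacle` is not from the text). It computes the gap $u-\psi$ and the discrete reaction $-u''$, which should be nonnegative and should vanish wherever the gap is positive.

```python
def solve_obstacle(psi, sweeps=8000):
    """Projected Gauss-Seidel for the discrete problem
    minimize (1/2) * sum (u[i+1] - u[i])^2 / h
    subject to u[i] >= psi[i] at interior nodes and u = 0 at both endpoints."""
    n = len(psi) - 1
    u = [0.0] + [max(0.0, psi[i]) for i in range(1, n)] + [0.0]
    for _ in range(sweeps):
        for i in range(1, n):
            # Unconstrained update (average of neighbours), then projection
            # onto the one-sided constraint u[i] >= psi[i].
            u[i] = max(psi[i], 0.5 * (u[i - 1] + u[i + 1]))
    return u


N = 40
h = 1.0 / N
xs = [i * h for i in range(N + 1)]
# Hypothetical obstacle: positive bump in the middle, negative at the boundary.
psi = [0.5 - 8.0 * (x - 0.5) ** 2 for x in xs]
u = solve_obstacle(psi)

# Interior-node quantities: gap u - psi and discrete reaction -u''.
gap = [u[i] - psi[i] for i in range(1, N)]
reaction = [-(u[i - 1] - 2.0 * u[i] + u[i + 1]) / h ** 2 for i in range(1, N)]

contact_nodes = [xs[i] for i in range(1, N) if gap[i - 1] == 0.0]
print("contact nodes:", contact_nodes)
print("min gap:", min(gap), " min reaction:", min(reaction))
print("max |reaction * gap|:", max(abs(r * g) for r, g in zip(reaction, gap)))
```

In this discrete model the solution is affine between the boundary and the contact nodes, and the reaction is concentrated on the nodes where $u=\psi$; the location of those nodes is an output of the iteration, not an input.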
[example: Contact And Noncontact Regions] Suppose $U=(0,1)$, $f=0$, and $\psi$ is a smooth obstacle lying below compatible boundary data, so the minimizer $u\in K_{\psi,g}$ satisfies the obstacle variational inequality. By *Complementarity Formulation Of The Obstacle Problem*, the reaction is \begin{align*}-u''\ge 0\end{align*} in the distributional sense, and the reaction is supported on the contact set $\{x:u(x)=\psi(x)\}$. Let $I=(\alpha,\beta)$ be an open interval on which $u>\psi$. Since $I$ is contained in the noncontact region, the reaction vanishes on $I$, so \begin{align*}-u''=0\end{align*} there in the distributional sense. Equivalently, for every $\eta\in C_c^\infty(I)$, \begin{align*}\int_I u'(x)\eta'(x)\,dx=0.\end{align*} Thus $u''=0$ on $I$. If $u$ is represented classically on this interval, integrating $u''=0$ gives \begin{align*}u'(x)=c\end{align*} for a constant $c$, and integrating once more gives \begin{align*}u(x)=cx+d\end{align*} for constants $c,d$. Hence the minimizer is affine on every interval where it stays strictly above the obstacle. On an interval where $u=\psi$, the constraint is active. If the reaction has a density $r$, then \begin{align*}r=-u''\end{align*} because $f=0$, while $u=\psi$ gives \begin{align*}r=-\psi''\end{align*} on that contact interval. The condition $r\ge 0$ means that this reaction pushes upward exactly where the unconstrained affine profile would otherwise violate $u\ge\psi$. The free boundary is the set separating the affine noncontact intervals from the contact region, and its location is determined by the minimization problem rather than prescribed in advance. [/example] The main lesson of the chapter is that convex constraints preserve the direct method while changing the form of the Euler-Lagrange condition. Existence still comes from weak compactness, coercivity, and lower semicontinuity, but optimality is expressed by inequalities against all admissible competitors. 
Obstacle problems are the model case: the minimizer exists by projection-free compactness arguments, satisfies a variational inequality, and admits a complementarity interpretation as a PDE with a unilateral reaction. The direct method now has the tools to handle constrained minimization, but constraints change the variational landscape: admissible sets may be closed under weak limits even when they are not linear or smooth. The next chapter studies what happens when the problem itself must be modified by relaxation or convexification.

# 6. Relaxation and Convexification

This chapter studies what remains of a variational problem when minimizing sequences refuse to converge strongly. In the previous chapters, coercivity and weak lower semicontinuity gave existence of minimizers by passing to weak limits. Here the weak limit may lose the fine oscillations that lower the energy, so the original functional is replaced by a relaxed functional that records the limiting cost seen by all weakly convergent sequences. The guiding question is: if a functional has no minimizer because minimizing sequences oscillate, what is the correct variational problem whose minimizers describe the macroscopic limit? In the scalar convexification regime, the answer is governed by convex envelopes. Young measures then give a language for diagnosing which oscillations are present and why Jensen-type inequalities are the correct lower semicontinuity mechanism.

## Relaxed Functionals and Lower Semicontinuous Envelopes

A minimizing sequence usually gives only weak convergence in a Sobolev space. The first problem is to define a replacement functional that has the same infimum as the original problem but behaves well under the convergence supplied by compactness. [definition: Sequential Relaxation] Let $X$ be a topological space, let $F:X\to (-\infty,\infty]$, and let $\tau$ be a convergence structure on $X$. 
The sequential relaxation of $F$ with respect to $\tau$ is the functional $\overline{F}:X\to[-\infty,\infty]$ defined by \begin{align*} \overline{F}(u)=\inf\{\liminf_{k\to\infty}F(u_k):u_k\xrightarrow{\tau}u\}. \end{align*} [/definition] This definition turns every possible recovery of the limit $u$ into a candidate cost. To justify replacing $F$ by $\overline F$, the next theorem records the two facts needed by the direct method: the minimum value is unchanged, and recovery sequences for relaxed minimizers are genuine minimizing sequences for the original functional. [definition: Lower Semicontinuous Envelope] Let $X$ be a topological space and let $F:X\to(-\infty,\infty]$. The lower semicontinuous envelope of $F$ is \begin{align*} F^{\mathrm{lsc}}(u)=\sup\{G(u):G\le F\text{ on }X,\ G\text{ is lower semicontinuous}\}. \end{align*} [/definition] The envelope viewpoint is useful because existence theorems apply to lower semicontinuous functionals. The sequential formula is useful because it tells us how to compute the value by looking at actual approximating sequences. The following theorem is the compatibility check that lets both viewpoints serve the original minimization problem. [quotetheorem:8743] [citeproof:8743] This theorem explains why relaxation does not change the minimum value. The hypothesis on constant sequences is what guarantees $\overline F\le F$: without it, the original competitor $u$ might not be admissible as a sequence converging to itself. For instance, if the chosen convergence structure declared no sequence to converge to a given point $u_0$, then the infimum defining $\overline F(u_0)$ would be taken over an empty class and could fail to compare with $F(u_0)$. The theorem also does not say that every relaxed value is attained by a recovery sequence; it only says that when such a sequence exists at a relaxed minimizer, it recovers a genuine minimizing sequence for the original problem. 
Relaxation therefore changes the class of admissible macroscopic states and records the cost of unresolved small-scale behaviour, but the construction of recovery sequences remains a separate and usually substantial step. [example: Relaxing A Jump Cost] Let $X=L^1(0,1)$ and define $F(u)=0$ when $u(x)\in\{0,1\}$ for a.e. $x$, and $F(u)=\infty$ otherwise. We compute the sequential relaxation for weak-* convergence in $L^\infty(0,1)$ along sequences bounded between $0$ and $1$. First suppose $u_k\in\{0,1\}$ a.e. and $u_k\rightharpoonup^\ast u$ in $L^\infty(0,1)$. For every nonnegative $\varphi\in L^1(0,1)$, \begin{align*} 0\le \int_0^1 u_k(x)\varphi(x)\,dx\le \int_0^1\varphi(x)\,dx. \end{align*} Passing to the weak-* limit gives \begin{align*} 0\le \int_0^1 u(x)\varphi(x)\,dx\le \int_0^1\varphi(x)\,dx. \end{align*} Testing this with $\varphi=\mathbf 1_{\{u<0\}}$ and then with $\varphi=\mathbf 1_{\{u>1\}}$ shows $0\le u\le1$ a.e. Hence no weak-* limit of admissible zero-cost states can lie outside the interval constraint. Conversely, let $0\le u\le1$ a.e. Approximate $u$ in $L^1$ by interval-step functions $s_j=\sum_i\theta_{j,i}\mathbf 1_{I_{j,i}}$ with $0\le\theta_{j,i}\le1$. On each interval $I_{j,i}$, divide $I_{j,i}$ into many equal subintervals and set $v_{j,k}=1$ on the first proportion $\theta_{j,i}$ of each small subinterval and $v_{j,k}=0$ on the rest. Then $v_{j,k}\in\{0,1\}$ a.e., and for every fixed test function $\varphi\in L^1(0,1)$ the averaging on each shrinking cell gives \begin{align*} \lim_{k\to\infty}\int_0^1 v_{j,k}(x)\varphi(x)\,dx=\int_0^1 s_j(x)\varphi(x)\,dx. \end{align*} Choosing a diagonal sequence $u_j=v_{j,k(j)}$ and using $s_j\to u$ in $L^1$ gives $u_j\rightharpoonup^\ast u$ in $L^\infty(0,1)$. Since every $u_j$ is $\{0,1\}$-valued, $F(u_j)=0$ for all $j$, so the relaxed cost of every $u$ with $0\le u\le1$ a.e. is $0$. 
Therefore \begin{align*} \overline F(u)=0\text{ if }0\le u\le1\text{ a.e., and }\overline F(u)=\infty\text{ otherwise.} \end{align*} The relaxation has replaced the nonconvex pointwise constraint $\{0,1\}$ by its convex hull $[0,1]$, because fine oscillations between the two pure states are visible weakly only through their local volume fraction. [/example]

## Convex Envelopes in Scalar Integral Problems

The next question is computational: given an integral functional with a nonconvex density, what formula describes the relaxed density? In scalar-gradient problems the answer is obtained by replacing the density by its convex envelope. [definition: Convex Envelope] Let $f:\mathbb R^m\to(-\infty,\infty]$. The convex envelope $f^{**}$ of $f$ is \begin{align*} f^{**}(\xi)=\sup\{\ell(\xi):\ell:\mathbb R^m\to\mathbb R\text{ affine and }\ell\le f\text{ on }\mathbb R^m\}. \end{align*} [/definition] The notation $f^{**}$ comes from the Legendre-Fenchel biconjugate. For lower semicontinuous proper functions with suitable growth, this biconjugate is the lower semicontinuous convex envelope. The forward relaxation question is how this pointwise convexification changes the integral functional itself. In scalar Sobolev problems, minimizing sequences can form fine mixtures of slopes, so the theorem identifies the relaxed energy density seen under weak convergence. [quotetheorem:8744] [citeproof:8744] The theorem is the scalar prototype for relaxation. In the vectorial setting of Chapter 7 the corresponding replacement is not generally the convex envelope; quasiconvexity becomes the correct condition, so scalar convexification is best treated as a special case rather than as the final theory. [example: Convex Envelope Of A Double Well] Consider $f:\mathbb R\to[0,\infty)$ given by $f(s)=(s^2-1)^2$. Since \begin{align*} f(1)=(1^2-1)^2=0 \end{align*} and \begin{align*} f(-1)=((-1)^2-1)^2=0, \end{align*} the two wells are at $s=-1$ and $s=1$. 
Define \begin{align*} g(s)=0\text{ for }-1\le s\le1,\qquad g(s)=f(s)\text{ for }|s|\ge1. \end{align*} For $|s|\ge1$, \begin{align*} f'(s)=4s(s^2-1) \end{align*} and \begin{align*} f''(s)=12s^2-4\ge 8, \end{align*} so $f$ is convex on $(-\infty,-1]$ and on $[1,\infty)$. Also \begin{align*} f'(-1)=4(-1)((-1)^2-1)=0 \end{align*} and \begin{align*} f'(1)=4(1)(1^2-1)=0, \end{align*} which matches the slope of the flat part $g=0$ on $[-1,1]$. Hence $g$ is convex and $g\le f$. Now let $h$ be any convex function with $h\le f$. Since $f(-1)=f(1)=0$, we have $h(-1)\le0$ and $h(1)\le0$. For $-1\le s\le1$, write \begin{align*} s=\frac{1-s}{2}(-1)+\frac{1+s}{2}(1). \end{align*} Convexity gives \begin{align*} h(s)\le \frac{1-s}{2}h(-1)+\frac{1+s}{2}h(1)\le0=g(s). \end{align*} For $|s|\ge1$, the inequality $h\le f$ gives $h(s)\le f(s)=g(s)$. Therefore every convex minorant of $f$ lies below $g$, so $f^{**}=g$. Thus $f^{**}(s)=0$ on $[-1,1]$ and $f^{**}(s)=f(s)$ for $|s|\ge1$. The flat interval records zero-cost mixing: if $s\in[-1,1]$ and $\theta=(s+1)/2$, then \begin{align*} \theta\cdot 1+(1-\theta)(-1)=2\theta-1=s \end{align*} and \begin{align*} \theta f(1)+(1-\theta)f(-1)=0. \end{align*} So gradients oscillating between the two wells can produce any average slope in $[-1,1]$ while paying zero relaxed density. [/example] This flat part of the relaxed density is the analytic signature of microstructure. It says that the macroscopic gradient does not determine a single microscopic gradient; it determines an average over gradients selected by a fine-scale pattern. [example: Nonattainment For A Double-Well Gradient Energy] Let $U=(0,1)$, impose $u(0)=u(1)=0$, and define \begin{align*} F(u)=\int_0^1 ((u'(x))^2-1)^2\,dx. \end{align*} For each $n\ge1$, split $(0,1)$ into $n$ periods of length $1/n$. 
On the period $[m/n,(m+1)/n]$, set \begin{align*} u_n'(x)=1\text{ on }\left[m/n,m/n+1/(2n)\right),\qquad u_n'(x)=-1\text{ on }\left[m/n+1/(2n),(m+1)/n\right), \end{align*} and choose $u_n(0)=0$. Then each period has zero net change, since \begin{align*} \int_{m/n}^{(m+1)/n}u_n'(x)\,dx=\int_{m/n}^{m/n+1/(2n)}1\,dx+\int_{m/n+1/(2n)}^{(m+1)/n}(-1)\,dx=\frac{1}{2n}-\frac{1}{2n}=0. \end{align*} Hence $u_n(m/n)=0$ for every $m$, in particular $u_n(0)=u_n(1)=0$. Also $0\le u_n(x)\le1/(2n)$, so $u_n\to0$ strongly in $L^p(0,1)$. The derivatives converge weakly to $0$ in $L^p(0,1)$ for $1<p<\infty$. Indeed, if $\varphi\in C^1([0,1])$, then on the $m$-th period \begin{align*} \int_{m/n}^{(m+1)/n}u_n'(x)\varphi(x)\,dx=\int_{m/n}^{m/n+1/(2n)}\varphi(x)\,dx-\int_{m/n+1/(2n)}^{(m+1)/n}\varphi(x)\,dx. \end{align*} With $h=1/(2n)$ and $a=m/n$, this equals \begin{align*} \int_a^{a+h}\bigl(\varphi(x)-\varphi(x+h)\bigr)\,dx. \end{align*} Since $\varphi(x+h)-\varphi(x)=\int_x^{x+h}\varphi'(t)\,dt$, we get \begin{align*} \left|\int_a^{a+h}\bigl(\varphi(x)-\varphi(x+h)\bigr)\,dx\right|\le\int_a^{a+h}\int_x^{x+h}|\varphi'(t)|\,dt\,dx\le h^2\|\varphi'\|_\infty. \end{align*} Summing over $n$ periods gives \begin{align*} \left|\int_0^1u_n'(x)\varphi(x)\,dx\right|\le n h^2\|\varphi'\|_\infty=\frac{1}{4n}\|\varphi'\|_\infty\to0. \end{align*} Density of $C^1([0,1])$ in $L^{p'}(0,1)$ and the uniform bound $|u_n'|\le1$ extend this convergence to all $L^{p'}$ test functions, so $u_n\rightharpoonup0$ in $W^{1,p}(0,1)$. For every $n$ and almost every $x$, $u_n'(x)\in\{-1,1\}$, so \begin{align*} ((u_n'(x))^2-1)^2=(1-1)^2=0. \end{align*} Therefore \begin{align*} F(u_n)=\int_0^1 0\,dx=0. \end{align*} The weak limit is $u=0$, whose derivative is $u'=0$, and hence \begin{align*} F(0)=\int_0^1 ((0)^2-1)^2\,dx=\int_0^1 1\,dx=1. \end{align*} Thus the original energy is not weakly lower semicontinuous at this oscillating sequence: the sequence has zero energy, but its weak limit has energy $1$. 
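The computation above can be checked numerically. The sketch below, with illustrative helper names, evaluates $F$ on the sawtooth by a midpoint rule and computes the pairing $\int_0^1u_n'(x)\varphi(x)\,dx$ for the particular test function $\varphi(x)=x$, for which $\|\varphi'\|_\infty=1$ and the bound $1/(4n)$ is attained.

```python
def sawtooth_slope(x, n):
    """Slope of u_n: +1 on the first half of each period of length 1/n, -1 on the second."""
    t = (x * n) % 1.0
    return 1.0 if t < 0.5 else -1.0


def double_well(s):
    """Energy density f(s) = (s^2 - 1)^2."""
    return (s * s - 1.0) ** 2


def energy(slope_fn, cells=10000):
    """Midpoint-rule approximation of F(u) = int_0^1 ((u')^2 - 1)^2 dx, given u'."""
    h = 1.0 / cells
    return h * sum(double_well(slope_fn((k + 0.5) * h)) for k in range(cells))


def pairing_with_x(n):
    """Value of int_0^1 u_n'(x) * x dx, integrated exactly on each half-period."""
    h = 1.0 / (2 * n)
    total = 0.0
    for m in range(n):
        a = m / n
        plus_part = ((a + h) ** 2 - a ** 2) / 2.0             # slope +1 on [a, a+h)
        minus_part = ((a + 2 * h) ** 2 - (a + h) ** 2) / 2.0  # slope -1 on [a+h, a+2h)
        total += plus_part - minus_part
    return total


for n in (1, 4, 16, 64):
    f_n = energy(lambda x, n=n: sawtooth_slope(x, n))
    print(n, f_n, abs(pairing_with_x(n)), 1.0 / (4 * n))

print("energy of the weak limit u = 0:", energy(lambda x: 0.0))
```

The sawtooth energies stay at zero while the pairing decays like $1/(4n)$, and the zero function has energy $1$, which is the numerical face of the lower semicontinuity failure.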
The relaxed density replaces the double well by its convex envelope. Since $0$ is the average \begin{align*} 0=\frac12\cdot1+\frac12\cdot(-1) \end{align*} and the two well values satisfy \begin{align*} f(1)=((1)^2-1)^2=0,\qquad f(-1)=(((-1)^2)-1)^2=0, \end{align*} the convexified density has value $0$ at slope $0$. The relaxed energy therefore assigns the weak limit $u=0$ the value $0$, recording the cost of the unresolved oscillation rather than the cost of the averaged derivative. [/example]

## Young Measures as a Diagnostic for Oscillation

Weak convergence records only averages, so it cannot say how a minimizing sequence splits its mass among different wells. Young measures add the missing diagnostic: at almost every point, they record the probability distribution of limiting microscopic gradients. [definition: Young Measure Generated By Gradients] Let $U\subset\mathbb R^n$ be open and let $(v_k)$ be a sequence of measurable maps $v_k:U\to\mathbb R^m$. A parametrized family $(\nu_x)_{x\in U}$ of probability measures on $\mathbb R^m$ is a Young measure generated by $(v_k)$ if, for every continuous function $g:\mathbb R^m\to\mathbb R$ with suitable growth, \begin{align*} g(v_k)\rightharpoonup \left(x\mapsto\int_{\mathbb R^m}g(\xi)\,d\nu_x(\xi)\right) \end{align*} weakly in the corresponding $L^1$ space. [/definition] For gradients, the barycentre of the Young measure is constrained by the weak limit. If $\nabla u_k\rightharpoonup\nabla u$, then the average microscopic gradient must be $\nabla u(x)$ for a.e. $x$. The next theorem turns that barycentre constraint into the [Jensen inequality](/theorems/515) that underlies convex lower semicontinuity. [quotetheorem:8745] [citeproof:8745] This inequality is the local mechanism behind weak lower semicontinuity for convex densities. Convexity is essential: for a concave function the Jensen inequality goes in the opposite direction, and for a nonconvex function no one-sided bound is available in general. 
For example, if $\nu=\frac12\delta_{-1}+\frac12\delta_1$ and $\varphi(s)=s^2$, then Jensen's inequality reads $\varphi(0)=0\le1=\int\varphi\,d\nu$, and it is strict; but for the nonconvex double-well $f(s)=(s^2-1)^2$, the same measure gives $f(0)=1$ while $\int f\,d\nu=0$, so the inequality fails. Thus equality holds for affine integrands or for Dirac measures, which correspond to no oscillation, while a strict inequality measures the energy that oscillation can hide from the weak limit. [example: Two-Well Young Measure] Let $a,b\in\mathbb R^n$ and let $0<\theta<1$. Suppose the gradients $\nabla u_k$ alternate on finer and finer stripes between the two values $a$ and $b$, using the value $a$ on a fraction $\theta$ of each period and the value $b$ on the remaining fraction $1-\theta$. The generated homogeneous Young measure is \begin{align*} \nu_x=\theta\delta_a+(1-\theta)\delta_b. \end{align*} Indeed, for every continuous test function $g:\mathbb R^n\to\mathbb R$ compatible with the growth of the sequence, \begin{align*} \int_{\mathbb R^n}g(\xi)\,d\nu_x(\xi)=\theta g(a)+(1-\theta)g(b). \end{align*} Applying this identity to the coordinate functions $g_i(\xi)=\xi_i$ gives the barycentre component by component: \begin{align*} \int_{\mathbb R^n}\xi_i\,d\nu_x(\xi)=\theta a_i+(1-\theta)b_i. \end{align*} Therefore \begin{align*} \int_{\mathbb R^n}\xi\,d\nu_x(\xi)=\theta a+(1-\theta)b. \end{align*} Thus any weak gradient limit produced by this oscillation has average gradient $\theta a+(1-\theta)b$. If a density $f$ satisfies $f(a)=0$ and $f(b)=0$, then the Young-measure energy density is \begin{align*} \int_{\mathbb R^n}f(\xi)\,d\nu_x(\xi)=\theta f(a)+(1-\theta)f(b)=\theta\cdot0+(1-\theta)\cdot0=0. \end{align*} The pointwise density at the weak limit is instead \begin{align*} f\bigl(\theta a+(1-\theta)b\bigr). \end{align*} This value can be positive when $f$ is nonconvex, so the Young measure records zero microscopic cost even though the averaged gradient may lie away from the wells. 
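The barycentre and energy computations for a two-atom measure are short enough to script. The sketch below uses illustrative names (`barycentre`, `integrate`) and a hypothetical choice $a=(1,0)$, $b=(-1,0)$, $\theta=1/4$, together with the convex density $|\xi|^2$ and the nonconvex density $(|\xi|^2-1)^2$, which vanishes at both atoms. The Jensen gap $\int g\,d\nu-g(\bar\xi)$, with $\bar\xi$ the barycentre, is nonnegative for the convex density and negative for the two-well density.

```python
def barycentre(atoms, weights):
    """Mean of a finitely supported probability measure sum_j w_j * delta_{atom_j}."""
    dim = len(atoms[0])
    return tuple(sum(w * a[i] for a, w in zip(atoms, weights)) for i in range(dim))


def integrate(g, atoms, weights):
    """Integral of g against the same discrete measure."""
    return sum(w * g(a) for a, w in zip(atoms, weights))


def sq_norm(xi):
    """Convex density |xi|^2."""
    return sum(c * c for c in xi)


def two_well(xi):
    """Nonconvex density (|xi|^2 - 1)^2, zero on the unit sphere."""
    return (sq_norm(xi) - 1.0) ** 2


theta = 0.25                       # volume fraction of the value a (assumed)
a, b = (1.0, 0.0), (-1.0, 0.0)     # the two gradient values (assumed)
atoms, weights = [a, b], [theta, 1.0 - theta]

centre = barycentre(atoms, weights)
convex_gap = integrate(sq_norm, atoms, weights) - sq_norm(centre)
well_gap = integrate(two_well, atoms, weights) - two_well(centre)

print("barycentre:", centre)
print("Jensen gap for |xi|^2:", convex_gap)
print("Jensen gap for (|xi|^2 - 1)^2:", well_gap)
```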
[/example]

## Existence for the Relaxed Problem

After identifying the relaxed functional, the final question is whether it has minimizers. This is where the direct method returns: relaxation is designed to restore lower semicontinuity without losing the infimum. [quotetheorem:8746] [citeproof:8746] The minimizer of the relaxed problem is the macroscopic state selected by the variational problem. It may not minimize the original functional, but the relaxation theorem supplies recovery sequences whose fine-scale oscillations realize the same limiting cost. [remark: Meaning Of A Relaxed Minimizer] A relaxed minimizer should be read together with at least one recovery sequence. The function $u$ gives the coarse deformation, concentration, or phase field, while the recovery sequence describes the microscopic pattern that the original energy favours. In scalar convexification problems this microscopic information is often encoded by mixtures between wells, and Young measures record those mixtures without committing to a particular sequence. [/remark] The chapter therefore completes a loop in the direct method. Weak compactness produces limits, lower semicontinuity may fail for the original functional, relaxation repairs the failure, and convexification or Young-measure diagnostics explain what has been added to the model. We have seen that failure of lower semicontinuity can force a variational problem to be replaced by its relaxed version. The next chapter sharpens this idea in the vector-valued setting by identifying quasiconvexity as the condition that replaces ordinary convexity.

# 7. Quasiconvexity and Weak Lower Semicontinuity

Building on the scalar convexity and relaxation results of Chapters 4 and 6, this chapter addresses the structural condition that replaces convexity for vector-valued variational problems. 
In the scalar case, convexity of the integrand in the gradient variable is the right condition for weak lower semicontinuity, but systems allow oscillations in gradients that are invisible to weak convergence. Morrey's quasiconvexity tests an integrand against compactly supported gradient perturbations, and it is designed to detect exactly those oscillations that can arise in Sobolev minimizing sequences. ## Morrey Quasiconvexity The problem is to identify which pointwise condition on an integrand $f(A)$ prevents a minimizing sequence $u_j \rightharpoonup u$ from lowering the energy by developing fine-scale oscillations in $\nabla u_j$. Convexity rules out all averaging gains, but for maps $u: \Omega \subset \mathbb R^n \to \mathbb R^m$ the oscillations compatible with gradients have more structure than arbitrary oscillations in $\mathbb R^{m \times n}$. Morrey quasiconvexity tests only those perturbations that are actual gradients. [definition: Morrey Quasiconvex Function] Let $f: \mathbb R^{m \times n} \to \mathbb R$ be Borel measurable and locally bounded. The function $f$ is quasiconvex if, for every $A \in \mathbb R^{m \times n}$, every bounded open set $D \subset \mathbb R^n$, and every $\varphi \in W^{1,\infty}_0(D;\mathbb R^m)$, \begin{align*} f(A)|D| \le \int_D f(A + \nabla \varphi(x))\,d\mathcal L^n(x). \end{align*} [/definition] The set $D$ is only a testing domain. By scaling and tiling arguments, it is enough to test on one fixed bounded Lipschitz domain, such as the unit cube, when the integrability hypotheses allow the standard approximation steps. The definition says that an affine map $x \mapsto Ax$ cannot have its average energy lowered by replacing it with another map having the same boundary trace. [example: Convex Integrands Are Quasiconvex] Let $f:\mathbb R^{m\times n}\to \mathbb R$ be convex, fix $A\in\mathbb R^{m\times n}$, and let $\varphi\in W^{1,\infty}_0(D;\mathbb R^m)$. Extend $\varphi$ by zero outside $D$. 
For each component $\alpha$ and direction $i$, the distributional derivative of this compactly supported extension has integral zero, so \begin{align*} \int_D \partial_i\varphi_\alpha(x)\,d\mathcal L^n(x)=0. \end{align*} Thus, entry by entry, \begin{align*} \int_D \nabla\varphi(x)\,d\mathcal L^n(x)=0. \end{align*} It follows that the average of the perturbed gradient is exactly $A$: \begin{align*} \frac{1}{|D|}\int_D (A+\nabla\varphi(x))\,d\mathcal L^n(x)=\frac{1}{|D|}\left(A|D|+0\right)=A. \end{align*} Applying *[Jensen's inequality](/theorems/9)* to the probability measure $|D|^{-1}\mathcal L^n|_D$ gives \begin{align*} f(A)=f\left(\frac{1}{|D|}\int_D (A+\nabla\varphi(x))\,d\mathcal L^n(x)\right)\le \frac{1}{|D|}\int_D f(A+\nabla\varphi(x))\,d\mathcal L^n(x). \end{align*} Multiplying by $|D|$ yields \begin{align*} f(A)|D|\le \int_D f(A+\nabla\varphi(x))\,d\mathcal L^n(x). \end{align*} This is precisely the quasiconvexity inequality, so every convex integrand is quasiconvex; the only averaging allowed here is averaging along gradients with fixed boundary trace. [/example] Convexity is sufficient but too restrictive in vectorial problems. In nonlinear elasticity, for instance, physically natural energies depend on determinants and minors of deformation gradients, and these expressions are not usually convex in all matrix entries. The next example records the basic source of such nonconvex integrands. [example: Determinant As A Null Lagrangian] Let $m=n$, let $f(A)=\det A$, and set $u(x)=Ax+\varphi(x)$ with $\varphi\in W^{1,\infty}_0(D;\mathbb R^n)$. We show that the determinant term has the same integral as the affine comparison map. First take $\varphi\in C_c^\infty(D;\mathbb R^n)$; the general $W^{1,\infty}_0$ case follows by approximation in $W^{1,p}$ for every finite $p$ and continuity of the polynomial $A\mapsto \det A$. 
Expanding the determinant of $\nabla u=A+\nabla\varphi$ along its first row gives \begin{align*} \det\nabla u=\sum_{i=1}^n \partial_i u_1\,(\operatorname{cof}\nabla u)_{1i}, \end{align*} where $(\operatorname{cof}\nabla u)_{1i}$ is $(-1)^{1+i}$ times the determinant of the matrix obtained from $\nabla u$ by deleting the first row and the $i$-th column. These cofactors do not involve $u_1$, and they satisfy the Piola identity \begin{align*} \sum_{i=1}^n \partial_i(\operatorname{cof}\nabla u)_{1i}=0. \end{align*} Indeed, differentiating the $i$-th cofactor in the direction $i$ produces determinants in which one of the remaining columns $\partial_j u$, $j\ne i$, is replaced by $\partial_i\partial_j u$. Since mixed partials commute, the term indexed by $(i,j)$ and the term indexed by $(j,i)$ contain the same columns, and the alternating sign of the determinant makes them cancel in pairs. Note that the cancellation occurs only after summing over $i$; a single derivative $\partial_i(\operatorname{cof}\nabla u)_{1i}$ need not vanish. Consequently \begin{align*} \det\nabla u=\sum_{i=1}^n \partial_i\bigl(u_1\,(\operatorname{cof}\nabla u)_{1i}\bigr). \end{align*} The same identity for the affine map $x\mapsto Ax$ reads $\det A=\sum_{i=1}^n\partial_i\bigl((Ax)_1(\operatorname{cof}A)_{1i}\bigr)$. Subtracting the two identities gives \begin{align*} \det(A+\nabla\varphi)-\det A=\sum_{i=1}^n\partial_i F_i,\qquad F_i=u_1\,(\operatorname{cof}\nabla u)_{1i}-(Ax)_1(\operatorname{cof}A)_{1i}. \end{align*} Outside the support of $\varphi$ we have $u(x)=Ax$ and $\nabla u=A$, so $F$ vanishes there and $F\in C_c^\infty(D;\mathbb R^n)$. Hence [integration by parts](/theorems/2098) yields \begin{align*} \int_D \sum_{i=1}^n\partial_i F_i\,d\mathcal L^n=0. \end{align*} Therefore \begin{align*} \int_D \det(A+\nabla\varphi(x))\,d\mathcal L^n=\int_D \det A\,d\mathcal L^n=\det(A)|D|. \end{align*} Thus both $\det A$ and $-\det A$ satisfy the quasiconvexity inequality with equality. The determinant is therefore a null Lagrangian, although it is not convex for $n\ge 2$. [/example] This equality phenomenon is more rigid than ordinary quasiconvexity: the perturbation test never changes the integral at all. 
Such terms create a special difficulty in recognizing convexity conditions, because they may be visibly nonconvex while remaining invisible to every compactly supported perturbation of an affine map. [definition: Null Lagrangian] A function $f: \mathbb R^{m\times n}\to \mathbb R$ is a null Lagrangian if, for every bounded Lipschitz open set $D\subset \mathbb R^n$, every $A\in \mathbb R^{m\times n}$, and every $\varphi\in W^{1,\infty}_0(D;\mathbb R^m)$, \begin{align*} \int_D f(A+\nabla\varphi(x))\,d\mathcal L^n(x)=f(A)|D|. \end{align*} [/definition] Null Lagrangians sit on the boundary between rigidity and flexibility: they are quasiconvex, but they do not create coercivity because their oscillation test is always neutral. The relevant examples are linear combinations of minors of the matrix variable, which are central in elasticity because they encode changes of length, area, and volume. ## Rank-One Convexity And Necessary Conditions Quasiconvexity is hard to verify directly because the test functions range over many compactly supported gradients. A useful way to extract necessary consequences is to test against very simple oscillations: gradients that switch between two matrices whose difference has rank one. These are the oscillations created by laminates, where the deformation changes slope across parallel layers while remaining continuous. [definition: Rank-One Convex Function] A function $f: \mathbb R^{m\times n}\to \mathbb R$ is rank-one convex if, for every $A\in \mathbb R^{m\times n}$ and every matrix $B\in \mathbb R^{m\times n}$ with $\operatorname{rank} B\le 1$, the function $t\mapsto f(A+tB)$ is convex on every interval on which it is defined. [/definition] Rank-one convexity is a one-dimensional convexity condition along the directions that can arise from jumps in gradients of continuous piecewise affine maps. The next theorem is needed because it turns the difficult quasiconvex test into a necessary condition that can be checked on every rank-one line. 
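The rank-one convexity definition can be probed numerically through second differences of $t\mapsto f(A+tB)$. The following is a minimal sketch for $f=\det$ on $2\times 2$ matrices; the helper names, the matrix $A$, and the two directions are illustrative assumptions rather than part of the course text. Along a rank-one direction the restriction is affine, so the second difference should vanish, while along the rank-two direction $\operatorname{diag}(1,-1)$ it is negative, which shows that the determinant is not convex.

```python
# Sketch: second differences of t -> f(A + tB) for f = det on 2x2 matrices.
# All helper names and sample matrices are illustrative assumptions.

def det2(M):
    """Determinant of a 2x2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def outer(a, xi):
    """Rank-one matrix a (x) xi with entries a[alpha] * xi[i]."""
    return [[a[r] * xi[c] for c in range(2)] for r in range(2)]

def line_value(f, A, B, t):
    """Evaluate f(A + tB)."""
    return f([[A[r][c] + t * B[r][c] for c in range(2)] for r in range(2)])

def second_difference(f, A, B, t, h=0.5):
    """f(A+(t+h)B) - 2 f(A+tB) + f(A+(t-h)B); nonnegative if the restriction is convex."""
    return (line_value(f, A, B, t + h)
            - 2.0 * line_value(f, A, B, t)
            + line_value(f, A, B, t - h))

A = [[1.0, 2.0], [-3.0, 0.5]]
rank_one = outer([1.0, -2.0], [3.0, 1.0])   # rank at most one
full_rank = [[1.0, 0.0], [0.0, -1.0]]       # rank two, det = -1

d_rank_one = second_difference(det2, A, rank_one, t=0.3)
d_full_rank = second_difference(det2, A, full_rank, t=0.3)
print(d_rank_one, d_full_rank)
```

A vanishing second difference at sample points is of course only a consistency check; it does not prove rank-one convexity of a general integrand.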
[quotetheorem:8747] [citeproof:8747] The laminate argument shows why rank-one directions are forced on us: two constant gradients can be pasted across a flat interface only when their difference has rank at most one. This compatibility condition is the Hadamard jump condition in this setting. The continuity hypothesis is used when the ideal laminate is cut off near the boundary: the corrected gradients differ from the pure two-gradient pattern on a small set, and continuity lets that small set have negligible energetic effect. Without a regularity hypothesis of this kind, changing a finite integrand on a thin family of matrices can destroy pointwise convexity along a rank-one line while leaving many integral tests unchanged, so the laminate argument no longer gives a pointwise conclusion. The theorem also does not give the converse: rank-one convexity is only a necessary condition, and in genuinely vectorial dimensions there are rank-one convex integrands that are not quasiconvex. Its role in the course is therefore diagnostic rather than decisive, preparing the later lower semicontinuity theorem where quasiconvexity itself is the relevant sufficient condition. [example: Rank-One Laminate Test] Let $B=a\otimes \xi$, so $B_{\alpha i}=a_\alpha \xi_i$. Let $h:\mathbb R\to\mathbb R$ be periodic and piecewise affine, with $h'=s$ on a proportion $\lambda$ of each period and $h'=t$ on the remaining proportion $1-\lambda$. For \begin{align*} \varphi_k(x)=\frac{1}{k}a\,h(k\,\xi\cdot x), \end{align*} the $\alpha$-th component is \begin{align*} (\varphi_k)_\alpha(x)=\frac{1}{k}a_\alpha h(k\,\xi\cdot x). \end{align*} At every point where $h$ is differentiable at $k\,\xi\cdot x$, the chain rule gives \begin{align*} \partial_i(\varphi_k)_\alpha(x)=\frac{1}{k}a_\alpha h'(k\,\xi\cdot x)\,k\xi_i=a_\alpha \xi_i h'(k\,\xi\cdot x). \end{align*} Thus, entry by entry, \begin{align*} \nabla\varphi_k(x)=(a\otimes \xi)h'(k\,\xi\cdot x)=B\,h'(k\,\xi\cdot x). 
\end{align*} On the layers where $h'(k\,\xi\cdot x)=s$, this gives $\nabla\varphi_k=sB$, and on the layers where $h'(k\,\xi\cdot x)=t$, this gives $\nabla\varphi_k=tB$. Since $h$ is periodic and piecewise affine, it is bounded; writing $M=\sup_{\tau\in\mathbb R}|h(\tau)|$, we have \begin{align*} |\varphi_k(x)|\le \frac{|a|M}{k}. \end{align*} Therefore $\varphi_k\to 0$ uniformly. In distributional form, for every test function $\eta\in C_c^\infty(D)$, \begin{align*} \int_D \partial_i(\varphi_k)_\alpha\,\eta\,d\mathcal L^n=-\int_D (\varphi_k)_\alpha\,\partial_i\eta\,d\mathcal L^n. \end{align*} The right-hand side tends to $0$ because $\|\varphi_k\|_{L^\infty(D)}\le |a|M/k$, so the oscillating gradients converge weakly to $0$ even though they keep switching between the two rank-one states $sB$ and $tB$. This is the basic laminate pattern: fine layers remain visible in the energy density but disappear under weak convergence. [/example] The implication from quasiconvexity to rank-one convexity gives a fast obstruction to weak lower semicontinuity. If an integrand fails convexity on a rank-one line, then layered test functions can beat the affine comparison map. [quotetheorem:8748] [citeproof:8748] This theorem is necessary rather than sufficient as stated because it assumes lower semicontinuity of the integral functional and derives a pointwise testing inequality. Continuity of $f$ is needed to pass from oscillating tiled perturbations to the exact pointwise inequality after boundary and approximation errors are removed; without it, values on exceptional matrix sets can obstruct the pointwise conclusion. The lower $p$-growth bound prevents the negative part of the energy from becoming non-integrable along the tiled sequence, since an integrand with very large negative wells can make the comparison functional ill-defined or equal to $-\infty$ on admissible Sobolev maps. 
Boundedness of $\Omega$ keeps the affine comparison map and the growth term integrable on the whole domain; on unbounded domains, the same formula may fail to define a finite functional even for affine maps. To obtain a usable existence theorem, we need a converse under hypotheses that control both growth from above and below, so that weak convergence gives enough compactness and uniform integrability to localise the quasiconvex inequality. ## Lower Semicontinuity Of Vectorial Integral Functionals The main existence question is now precise: when does quasiconvexity guarantee that an integral functional is weakly lower semicontinuous on Sobolev spaces? The answer requires growth assumptions. Without them, weak convergence of gradients does not give enough uniform integrability to pass from local quasiconvex tests to the global lower bound. [definition: Standard p-Growth Integrand] Let $1<p<\infty$. A continuous function $f:\mathbb R^{m\times n}\to \mathbb R$ has standard $p$-growth if there exist constants $c_1,c_2,c_3>0$ such that, for every $A\in \mathbb R^{m\times n}$, \begin{align*} c_1|A|^p-c_2 \le f(A) \le c_3(1+|A|^p). \end{align*} [/definition] The lower bound gives coercive control of gradients in a minimizing sequence, while the upper bound allows the integrand to be approximated locally and prevents concentration from being hidden in the energy. This motivates the following theorem, which is the global lower semicontinuity statement obtained by combining quasiconvexity with these growth estimates. [quotetheorem:8749] [citeproof:8749] This theorem is the central replacement for the scalar convexity result. It says that, once coercivity gives weak compactness, quasiconvexity is the condition that prevents loss of energy through gradient microstructure. The restriction $1<p<\infty$ is tied to reflexivity and the decomposition tools used in the proof; at $p=1$, concentration phenomena can occur and lower semicontinuity requires additional hypotheses. 
The upper $p$-growth bound is not cosmetic: without it, a weakly convergent sequence can carry small regions of very large gradients whose energetic contribution is not controlled by the $W^{1,p}$ bound. The lower coercive bound is what turns bounded energy into bounded gradients, so dropping it weakens the direct method even if lower semicontinuity itself remains meaningful for a particular functional. The theorem also assumes that $f$ depends only on the gradient and not explicitly on $x$ or $u$; more general integral functionals need corresponding measurability, continuity, and growth hypotheses. The next structural issue is that many important nonconvex terms are minors of $\nabla u$, so their behaviour under weak convergence has to be understood separately. [example: Direct Method With A Quasiconvex Energy] Let $\Omega\subset\mathbb R^n$ be bounded and Lipschitz, let $1<p<\infty$, let $g\in W^{1,p}(\Omega;\mathbb R^m)$, and define \begin{align*} \mathcal A_g=g+W^{1,p}_0(\Omega;\mathbb R^m). \end{align*} Assume that $f$ is continuous, quasiconvex, and satisfies standard $p$-growth: for some $c_1,c_2,c_3>0$, \begin{align*} c_1|A|^p-c_2\le f(A)\le c_3(1+|A|^p)\quad\text{for every }A\in\mathbb R^{m\times n}. \end{align*} For \begin{align*} I[u]=\int_\Omega f(\nabla u(x))\,d\mathcal L^n(x), \end{align*} we show that $I$ attains its minimum on $\mathcal A_g$. First, $\mathcal A_g$ is nonempty because $g\in\mathcal A_g$. The upper $p$-growth bound gives \begin{align*} I[g]\le c_3\int_\Omega (1+|\nabla g(x)|^p)\,d\mathcal L^n(x)<\infty, \end{align*} so $\inf_{\mathcal A_g} I<\infty$. The lower $p$-growth bound gives, for every $u\in\mathcal A_g$, \begin{align*} I[u]\ge c_1\int_\Omega |\nabla u(x)|^p\,d\mathcal L^n(x)-c_2|\Omega|\ge -c_2|\Omega|, \end{align*} so the infimum is finite. Let $(u_j)$ be a minimizing sequence, and choose $C$ such that $I[u_j]\le C$ for all large $j$. Then \begin{align*} c_1\|\nabla u_j\|_{L^p(\Omega)}^p-c_2|\Omega|\le I[u_j]\le C. 
\end{align*} Hence \begin{align*} \|\nabla u_j\|_{L^p(\Omega)}^p\le \frac{C+c_2|\Omega|}{c_1}. \end{align*} Writing $v_j=u_j-g\in W^{1,p}_0(\Omega;\mathbb R^m)$, Poincare's inequality gives \begin{align*} \|v_j\|_{L^p(\Omega)}\le C_P\|\nabla v_j\|_{L^p(\Omega)}. \end{align*} Since $\nabla v_j=\nabla u_j-\nabla g$, \begin{align*} \|v_j\|_{L^p(\Omega)}\le C_P\bigl(\|\nabla u_j\|_{L^p(\Omega)}+\|\nabla g\|_{L^p(\Omega)}\bigr). \end{align*} Therefore $(v_j)$ is bounded in $W^{1,p}_0(\Omega;\mathbb R^m)$, and $(u_j)=(g+v_j)$ is bounded in $W^{1,p}(\Omega;\mathbb R^m)$. Because $1<p<\infty$, the Sobolev space $W^{1,p}(\Omega;\mathbb R^m)$ is reflexive. Passing to a subsequence, there is $u\in W^{1,p}(\Omega;\mathbb R^m)$ such that \begin{align*} u_j\rightharpoonup u\quad\text{in }W^{1,p}(\Omega;\mathbb R^m). \end{align*} The subspace $W^{1,p}_0(\Omega;\mathbb R^m)$ is closed and convex, hence weakly closed, so from $u_j-g\in W^{1,p}_0(\Omega;\mathbb R^m)$ we get $u-g\in W^{1,p}_0(\Omega;\mathbb R^m)$. Thus $u\in\mathcal A_g$. By the *Acerbi-Fusco Lower Semicontinuity Theorem*, \begin{align*} I[u]\le \liminf_{j\to\infty} I[u_j]. \end{align*} Since $(u_j)$ is minimizing, \begin{align*} \liminf_{j\to\infty} I[u_j]=\inf_{w\in\mathcal A_g} I[w]. \end{align*} Because $u\in\mathcal A_g$, the opposite inequality also holds: \begin{align*} \inf_{w\in\mathcal A_g} I[w]\le I[u]. \end{align*} Combining the two inequalities gives \begin{align*} I[u]=\inf_{w\in\mathcal A_g} I[w]. \end{align*} Thus the quasiconvex energy has a minimizer with boundary data $g$; coercivity gives compactness of minimizing sequences, and quasiconvexity supplies the lower semicontinuity needed to pass to the weak limit. [/example] The direct method example uses only lower semicontinuity, but many variational energies also contain determinant or minor terms. To justify passing to the limit in those terms, we need a separate weak continuity principle for minors of gradients. 
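As a numerical illustration of what such a weak continuity principle asserts, the sketch below uses the hypothetical sequence $u_k(x,y)=(x+\sin(2\pi k y)/k,\;y+\sin(2\pi k x)/k)$ on the unit square, which converges uniformly to the identity while its gradients keep oscillating. The weight $\phi(x,y)=x(1-x)y(1-y)$ is an assumed stand-in for a smooth compactly supported test function, and the midpoint quadrature and function name are likewise illustrative choices. The weighted integral of $\det\nabla u_k$ should approach that of the limit, whereas the weighted integral of $|\nabla u_k|^2$ should not.

```python
import math

def weighted_averages(k, grid=200):
    """Midpoint-rule integrals over (0,1)^2 of phi*det(grad u_k) and phi*|grad u_k|^2.

    Assumed sequence: u_k(x, y) = (x + sin(2 pi k y)/k, y + sin(2 pi k x)/k), so
    grad u_k = [[1, gy], [gx, 1]] with gy = 2 pi cos(2 pi k y), gx = 2 pi cos(2 pi k x).
    phi(x, y) = x(1-x)y(1-y) vanishes on the boundary of the square.
    """
    h = 1.0 / grid
    int_det = 0.0
    int_energy = 0.0
    for i in range(grid):
        x = (i + 0.5) * h
        gx = 2.0 * math.pi * math.cos(2.0 * math.pi * k * x)
        for j in range(grid):
            y = (j + 0.5) * h
            gy = 2.0 * math.pi * math.cos(2.0 * math.pi * k * y)
            phi = x * (1.0 - x) * y * (1.0 - y)
            int_det += phi * (1.0 - gx * gy) * h * h
            int_energy += phi * (2.0 + gx * gx + gy * gy) * h * h
    return int_det, int_energy

# Values for the weak limit u(x, y) = (x, y): det = 1 and |grad u|^2 = 2.
limit_det = 1.0 / 36.0
limit_energy = 2.0 / 36.0

for k in (1, 2, 4):
    det_k, energy_k = weighted_averages(k)
    print(k, det_k - limit_det, energy_k - limit_energy)
```

In this sketch the determinant gap should shrink quickly as $k$ grows, while the energy gap stays of order one, consistent with the quadratic energy being only lower semicontinuous along the sequence.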
[quotetheorem:8750] [citeproof:8750] [Weak continuity of minors](/theorems/8750) explains why null Lagrangians are compatible with weak lower semicontinuity despite their nonconvexity. The condition $p\ge k$ is the natural integrability threshold because a $k$-minor is a product of $k$ first derivatives; if $p<k$, the minors need not even be uniformly bounded in $L^1_{\mathrm{loc}}$, so distributional compactness can fail. The distinction between $p>k$ and $p=k$ is also essential: for $p>k$, the bound in $L^{p/k}$ gives reflexive weak compactness, while at $p=k$ the target is $L^1$ and concentration can appear unless uniform integrability is imposed. A standard failure mode at the borderline is a sequence whose gradients concentrate on smaller and smaller sets while preserving bounded $L^k$ norm, producing a defect measure in the minors rather than convergence to the minor of the weak limit. In applications to elasticity, this theorem is one reason polyconvexity is easier to verify than quasiconvexity, since polyconvex energies are convex functions of all minors and the minors have this compensated compactness property. Quasiconvexity remains the precise condition for lower semicontinuity under the hypotheses above, while weak continuity of minors supplies a usable mechanism for important examples. [remark: Hierarchy Of Convexity Conditions] For continuous integrands on $\mathbb R^{m\times n}$, convexity implies quasiconvexity, and [quasiconvexity implies rank-one convexity](/theorems/8747). When $m=1$ or $n=1$ the three notions coincide. The first converse fails whenever $m,n\ge 2$, as the determinant shows. The second converse fails when $m\ge 3$ and $n\ge 2$ by Šverák's counterexample, while the case $m=2$ remains open. Thus the hierarchy is \begin{align*} \text{convex} \implies \text{quasiconvex} \implies \text{rank-one convex}. \end{align*} [/remark] The chapter's conclusion is that the direct method in vectorial problems has two separate tasks. Coercivity and weak compactness produce candidates for minimizers, while quasiconvexity supplies the lower semicontinuity needed to pass to the limit in the energy. 
Rank-one convexity and weak continuity of minors are practical tests and structural consequences, but Morrey quasiconvexity is the condition that governs the variational problem itself. Quasiconvexity identifies the right lower semicontinuity condition for vector-valued gradient integrals, but it is difficult to verify directly. The next chapter turns to polyconvexity, a stronger condition built from minors that is especially useful in elasticity. # 8. Polyconvexity and Minors Chapter 7 treated quasiconvexity as the right lower semicontinuity condition for vectorial integral functionals, but quasiconvexity is hard to verify directly. This chapter introduces polyconvexity, a stronger condition built from the minors of the gradient matrix. The point is that minors have special weak continuity properties in Sobolev spaces, so convexity in these minors gives a usable route to existence theorems for nonlinear elasticity and related vector-valued problems. ## Convexity in the Minors The central question is how to impose convexity without forcing the energy density to be convex in the full gradient. For vector-valued maps $u: U \subset \mathbb R^n \to \mathbb R^m$, the determinant and lower-order minors encode orientation, volume change, and area change. Polyconvexity keeps convexity after these geometric quantities are added as independent variables. [definition: Minors of a Matrix] Let $A \in \mathbb R^{m \times n}$. For $1 \le k \le \min\{m,n\}$, a $k$-minor of $A$ is the determinant of a $k \times k$ submatrix obtained by selecting $k$ rows and $k$ columns of $A$. [/definition] Fix once and for all an ordering of these minors, and let \begin{align*} N=\sum_{k=1}^{\min\{m,n\}}\binom{m}{k}\binom{n}{k}. \end{align*} The associated lifted-minor map is \begin{align*} M:\mathbb R^{m\times n}\to \mathbb R^N, \qquad A\mapsto M(A), \end{align*} where $M(A)$ is the vector containing every minor of $A$ in the chosen order. 
It includes the entries of $A$ as the $1$-minors, and when $m=n$ it includes $\det A$ as the unique $n$-minor. In dimension $n=m=3$, the $2$-minors are equivalently recorded by the cofactor matrix $\operatorname{cof} A$. [example: Minors in Three Dimensions] Let $A=\operatorname{diag}(a,b,c)$, so $A_{11}=a$, $A_{22}=b$, $A_{33}=c$, and all off-diagonal entries are $0$. The $1$-minors are exactly these nine entries. For the cofactor matrix, using $(\operatorname{cof}A)_{ij}=(-1)^{i+j}\det A_{\widehat i,\widehat j}$, the diagonal entries are \begin{align*} (\operatorname{cof}A)_{11}=bc-0\cdot 0=bc, \qquad (\operatorname{cof}A)_{22}=ac-0\cdot 0=ac, \qquad (\operatorname{cof}A)_{33}=ab-0\cdot 0=ab. \end{align*} The off-diagonal cofactors vanish; for instance, \begin{align*} (\operatorname{cof}A)_{12}=-(0\cdot c-0\cdot 0)=0, \qquad (\operatorname{cof}A)_{13}=0\cdot 0-b\cdot 0=0, \end{align*} and the same calculation gives $(\operatorname{cof}A)_{21}=(\operatorname{cof}A)_{23}=(\operatorname{cof}A)_{31}=(\operatorname{cof}A)_{32}=0$. Hence \begin{align*} \operatorname{cof}A=\operatorname{diag}(bc,ac,ab). \end{align*} Finally, the determinant expansion over permutations has only one nonzero term, because every non-identity permutation selects at least one off-diagonal entry. Therefore \begin{align*} \det A=A_{11}A_{22}A_{33}=abc. \end{align*} Thus $M(A)$ records the coordinate stretches $a,b,c$, the coordinate area stretch factors $bc,ac,ab$, and the volume stretch factor $abc$. [/example] Ordinary convexity in $A$ is often too restrictive for elasticity, especially because determinant and cofactor terms encode geometric quantities that are nonlinear in the gradient. The preceding example shows the variables that should be added to the gradient: all minors, including the entries of $A$, cofactors, and determinants. The next definition formalizes the workable substitute, requiring convexity only after the density is viewed as a function of these enlarged minor variables. 
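The lifted-minor map can be made concrete with a short computation. The sketch below enumerates all $k$-minors of a small matrix and reproduces the diagonal example; the function names and the particular ordering (by order $k$, then row subset, then column subset) are assumptions chosen for illustration, since the text only requires that some ordering be fixed. It returns unsigned minors, which differ from cofactor entries by the sign $(-1)^{i+j}$.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    k = len(M)
    if k == 1:
        return M[0][0]
    total = 0
    for j in range(k):
        sub = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(sub)
    return total

def lifted_minors(A):
    """Return M(A) as a list: all k-minors, ordered by k, then rows, then columns.

    The ordering is an illustrative assumption; any fixed ordering works.
    """
    m, n = len(A), len(A[0])
    out = []
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                out.append(det([[A[r][c] for c in cols] for r in rows]))
    return out

# diag(a, b, c) with a = 2, b = 3, c = 5
A = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]
M_A = lifted_minors(A)
two_minors = M_A[9:18]                      # entries 9..17 are the nine 2-minors

print(len(M_A))                             # N = 9 + 9 + 1 = 19
print(sorted(v for v in two_minors if v))   # [6, 10, 15], i.e. ab, ac, bc
print(M_A[-1])                              # 30 = abc
```

For a $3\times 3$ matrix the count agrees with $N=\binom31\binom31+\binom32\binom32+\binom33\binom33=19$.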
[definition: Polyconvex Function] Let $W: \mathbb R^{m \times n} \to \mathbb R \cup \{+\infty\}$. Let $N$ be the total number of minors of matrices in $\mathbb R^{m \times n}$, including the $1$-minors. The function $W$ is polyconvex if there exists a convex function $G: \mathbb R^N \to \mathbb R \cup \{+\infty\}$ such that \begin{align*} W(A)=G(M(A)) \end{align*} for every $A \in \mathbb R^{m \times n}$. [/definition] Polyconvexity is weaker than convexity in $A$, because the map $A \mapsto M(A)$ is nonlinear. It is still strong enough to interact well with weak convergence, since each minor is a null Lagrangian in a sense made precise below. [example: A Nonconvex Polyconvex Density] In the three-dimensional vectorial elasticity case, consider deformation gradients $A \in \mathbb R^{3 \times 3}$ and \begin{align*} W(A)=|A|^p+|\operatorname{cof}A|^q+h(\det A), \end{align*} where $p,q\ge 1$ and $h:\mathbb R\to\mathbb R\cup\{+\infty\}$ is convex. Define \begin{align*} G(F,C,d)=|F|^p+|C|^q+h(d) \end{align*} on $\mathbb R^{3\times 3}\times\mathbb R^{3\times 3}\times\mathbb R$. The maps $F\mapsto |F|^p$ and $C\mapsto |C|^q$ are convex because $|\cdot|$ is a norm and $t\mapsto t^p$, $t\mapsto t^q$ are convex and increasing on $[0,\infty)$. Since $h$ is convex and the sum of convex functions is convex, $G$ is convex. Also \begin{align*} W(A)=G(A,\operatorname{cof}A,\det A), \end{align*} so $W$ is polyconvex. This construction does not force convexity as a function of $A$. To see this concretely, take $p=q=1$ and $h(t)=t$, which is convex. On the affine line $A=tI$, the cofactor of $tI$ is $t^2I$ and $\det(tI)=t^3$, so \begin{align*} W(tI)=|tI|+|t^2I|+\det(tI)=\sqrt{3}|t|+\sqrt{3}t^2+t^3. \end{align*} At $t=-2,-3,-4$ this gives \begin{align*} W(-2I)=2\sqrt{3}+4\sqrt{3}-8=6\sqrt{3}-8. \end{align*} \begin{align*} W(-3I)=3\sqrt{3}+9\sqrt{3}-27=12\sqrt{3}-27. \end{align*} \begin{align*} W(-4I)=4\sqrt{3}+16\sqrt{3}-64=20\sqrt{3}-64. 
\end{align*} Convexity along this line would require \begin{align*} W(-3I)\le \frac{W(-2I)+W(-4I)}{2}=13\sqrt{3}-36. \end{align*} But \begin{align*} (12\sqrt{3}-27)-(13\sqrt{3}-36)=9-\sqrt{3}>0, \end{align*} so the convexity inequality fails. Thus convexity in the lifted variables $(A,\operatorname{cof}A,\det A)$ is genuinely weaker than convexity in the original matrix variable $A$. [/example] This example is the model for nonlinear elasticity. A stored-energy density may penalise compression through $\det A$, penalise surface distortion through $\operatorname{cof} A$, and still remain accessible to the direct method. ## Weak Continuity of Minors The key analytic problem is that weak convergence $u_j \rightharpoonup u$ in $W^{1,p}$ does not imply pointwise convergence of $\nabla u_j$. The direct method therefore needs expressions in $\nabla u_j$ that pass to the limit weakly. Minors have this property because they can be written in divergence form when applied to gradients. [quotetheorem:8751] [citeproof:8751] This theorem is the prototype: the determinant behaves better than a general degree-$n$ polynomial in the gradient because it is a null Lagrangian. The space $W^{1,n}$ is tied to the degree of the determinant: the product of $n$ first derivatives is then integrable, and the integration-by-parts formula can be interpreted distributionally. Below this natural exponent, determinant sequences may concentrate, so distributional limits can acquire defect measures rather than remaining the determinant of the weak limit. Even at the correct exponent, the conclusion is only distributional convergence; it does not give pointwise convergence, strong $L^1$ convergence, or automatic lower semicontinuity for arbitrary nonlinear functions of $\det \nabla u_j$. The same mechanism applies to every minor, with the Sobolev exponent adjusted to the order of the minor. 
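The null-Lagrangian property behind this discussion can be checked by quadrature in a simple case. The sketch below assumes $D=(0,1)^2$, a hand-picked perturbation $\varphi=(\sin(\pi x)\sin(2\pi y),\;x^2(1-x)\,y(1-y)^2)$ vanishing on $\partial D$, and a sample matrix $A$; these choices, the function name, and the midpoint rule are illustrative assumptions. The integral of $\det(A+\nabla\varphi)$ should match $\det(A)|D|$ up to quadrature error, while the convex density $|A+\nabla\varphi|^2$ should integrate to strictly more than $|A|^2|D|$.

```python
import math

def integrate_over_square(A, grid=200):
    """Midpoint-rule integrals of det(A + grad phi) and |A + grad phi|^2 over (0,1)^2.

    Assumed perturbation (vanishing on the boundary of the square):
      phi_1 = sin(pi x) sin(2 pi y),  phi_2 = x^2 (1 - x) y (1 - y)^2.
    Rows of the gradient correspond to components, columns to derivatives.
    """
    h = 1.0 / grid
    int_det = 0.0
    int_sq = 0.0
    for i in range(grid):
        x = (i + 0.5) * h
        for j in range(grid):
            y = (j + 0.5) * h
            p = math.pi * math.cos(math.pi * x) * math.sin(2 * math.pi * y)      # d_x phi_1
            q = 2 * math.pi * math.sin(math.pi * x) * math.cos(2 * math.pi * y)  # d_y phi_1
            r = (2 * x - 3 * x * x) * y * (1 - y) ** 2                            # d_x phi_2
            s = x * x * (1 - x) * (1 - y) * (1 - 3 * y)                           # d_y phi_2
            m11, m12 = A[0][0] + p, A[0][1] + q
            m21, m22 = A[1][0] + r, A[1][1] + s
            int_det += (m11 * m22 - m12 * m21) * h * h
            int_sq += (m11 ** 2 + m12 ** 2 + m21 ** 2 + m22 ** 2) * h * h
    return int_det, int_sq

A = [[1.0, 2.0], [-1.0, 3.0]]
int_det, int_sq = integrate_over_square(A)
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
norm_sq_A = sum(a * a for row in A for a in row)

print(int_det - det_A)      # expected to be near zero, up to quadrature error
print(int_sq - norm_sq_A)   # expected to be clearly positive
```

The contrast between the two printed gaps reflects the difference between a null Lagrangian, for which the perturbation test is neutral, and a strictly convex density, for which every nontrivial perturbation costs energy.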
[quotetheorem:8750] [citeproof:8750] The condition $p \ge k$ is not cosmetic: a $k$-minor is a product of $k$ first derivatives, so for $p<k$ these products need not even be uniformly bounded in $L^1$ along a bounded $W^{1,p}$ sequence. Concentration examples can make the minors converge only after adding a singular defect, which is invisible in the weak limit of the gradients. Distributional convergence is also weaker than weak convergence in $L^1$; it tests only against smooth compactly supported functions and gives no uniform integrability by itself. For lower semicontinuity one therefore needs convexity in the minor variables together with growth assumptions that keep the minors in a weakly compact function space. The next example records the concrete three-dimensional case used most often in elasticity. [example: Cofactors and Determinants in Elasticity] Let $u_j \rightharpoonup u$ in $W^{1,p}(U;\mathbb R^3)$. For each $x$, the matrix $\nabla u_j(x)$ has entries $\partial_\alpha u_j^i(x)$, and each cofactor entry is a signed $2$-minor. For example, \begin{align*} (\operatorname{cof}\nabla u_j)_{11}=\partial_2u_j^2\,\partial_3u_j^3-\partial_3u_j^2\,\partial_2u_j^3. \end{align*} Similarly, \begin{align*} (\operatorname{cof}\nabla u_j)_{12}=-(\partial_1u_j^2\,\partial_3u_j^3-\partial_3u_j^2\,\partial_1u_j^3). \end{align*} The remaining seven entries have the same form: each is $(-1)^{a+b}$ times the determinant of a $2\times 2$ submatrix of $\nabla u_j$. Therefore, if $p\ge 2$, *Weak Continuity of Minors* applied with $k=2$ gives, for every $\phi\in C_c^\infty(U)$ and every pair of indices $a,b\in\{1,2,3\}$, \begin{align*} \int_U \phi\,(\operatorname{cof}\nabla u_j)_{ab}\,d\mathcal L^3 \to \int_U \phi\,(\operatorname{cof}\nabla u)_{ab}\,d\mathcal L^3. \end{align*} This is exactly distributional convergence of each entry of $\operatorname{cof}\nabla u_j$ to the corresponding entry of $\operatorname{cof}\nabla u$. 
The determinant is the unique $3$-minor of $\nabla u_j$. Expanding by permutations gives \begin{align*} \det\nabla u_j=\sum_{\sigma\in S_3}\operatorname{sgn}(\sigma)\,\partial_1u_j^{\sigma(1)}\,\partial_2u_j^{\sigma(2)}\,\partial_3u_j^{\sigma(3)}. \end{align*} Thus, if $p\ge 3$, *Weak Continuity of Minors* applied with $k=3$ gives \begin{align*} \int_U \phi\,\det\nabla u_j\,d\mathcal L^3 \to \int_U \phi\,\det\nabla u\,d\mathcal L^3 \end{align*} for every $\phi\in C_c^\infty(U)$. Hence $\det\nabla u_j\to\det\nabla u$ in the sense of distributions. The cofactor entries are degree-$2$ products of first derivatives, while the determinant is a degree-$3$ product, so an energy depending on $(\nabla u,\operatorname{cof}\nabla u,\det\nabla u)$ needs integrability hypotheses matching those degrees. [/example] The weak continuity result explains why determinants and cofactors appear naturally in existence theory. They are nonlinear expressions, but they retain enough weak stability to act like additional weak variables. ## From Polyconvexity to Quasiconvexity Quasiconvexity was introduced because it is the natural lower semicontinuity condition for integral functionals under weak Sobolev convergence. Polyconvexity is useful only if it implies quasiconvexity, since otherwise it would not connect back to the lower semicontinuity theory already developed. [definition: Quasiconvex Function] Let $W: \mathbb R^{m \times n} \to \mathbb R$ be Borel measurable and locally bounded. The function $W$ is quasiconvex if for every $A \in \mathbb R^{m \times n}$, every bounded open set $V \subset \mathbb R^n$, and every $\varphi \in W^{1,\infty}_0(V;\mathbb R^m)$, \begin{align*} W(A) \le \frac{1}{\mathcal L^n(V)}\int_V W(A+\nabla \varphi(x))\,d\mathcal L^n(x). \end{align*} [/definition] This condition says that oscillatory compactly supported perturbations cannot lower the average energy of an affine map. 
The remaining question is why convexity in the larger minor variable should force this averaging inequality in the original gradient variable. The answer is that minors have fixed averages under compactly supported gradient perturbations, so [Jensen's inequality](/theorems/1977) becomes available. [quotetheorem:8753] [citeproof:8753] The finiteness hypothesis matters here because Jensen's inequality is being applied to an ordinary convex function with finite averaged values; extended-valued constraints such as $h(t)=+\infty$ for $t\le 0$ require a separate admissibility argument to ensure the perturbations remain in the effective domain. The theorem also does not characterize quasiconvexity: many quasiconvex functions are not polyconvex, so polyconvexity is a usable sufficient condition rather than the sharp one. The null-Lagrangian identity for minors is the bridge between the two theories, because it says that compactly supported gradient perturbations preserve the average lifted variable $M(A+\nabla\varphi)$. This result places polyconvexity inside the hierarchy of convexity notions used for vectorial variational problems. The inclusions are strict in general, so polyconvexity is a sufficient condition rather than a characterization of lower semicontinuity. [remark: Convexity Hierarchy] For integral densities depending on matrices, convexity implies polyconvexity, [polyconvexity implies quasiconvexity](/theorems/8753), and quasiconvexity implies rank-one convexity. None of the reverse implications holds in full generality. Rank-one convexity is often easier to test through line restrictions $t \mapsto W(A+t a\otimes b)$, but it is too weak on its own for the direct method. [/remark] The hierarchy clarifies the tradeoff. Convexity is simple but too restrictive for elasticity, while quasiconvexity is sharp but difficult to check. Polyconvexity occupies the practical middle ground. [example: Comparing the Four Conditions] Assume $p\ge 1$. 
The map $A\mapsto |A|^p$ is convex because, for $0\le \lambda\le 1$, \begin{align*} |\lambda A+(1-\lambda)B|\le \lambda |A|+(1-\lambda)|B| \end{align*} by the triangle inequality and homogeneity of the Frobenius norm, and then \begin{align*} |\lambda A+(1-\lambda)B|^p\le \bigl(\lambda |A|+(1-\lambda)|B|\bigr)^p\le \lambda |A|^p+(1-\lambda)|B|^p \end{align*} because $t\mapsto t^p$ is convex and increasing on $[0,\infty)$. Since the $1$-minors are the entries of $A$, this same function is polyconvex: take the convex lifted function $G$ that depends only on the $1$-minors, so that $G(M(A))=|A|^p$. It is quasiconvex by *Polyconvexity Implies Quasiconvexity*, and it is rank-one convex because restricting a convex function to any affine line $A+t\,a\otimes b$ gives a convex function of $t$. A density of the form $A\mapsto |A|^p+h(\det A)$ with convex $h$ is polyconvex, since it equals $G(A,\operatorname{cof}A,\det A)$ with \begin{align*} G(F,C,d)=|F|^p+h(d), \end{align*} and this $G$ is convex in the lifted variables. It need not be convex in $A$: in dimension $3$, take $p=1$ and $h(t)=t$. Along $A=tI$, \begin{align*} W(tI)=|tI|+\det(tI)=\sqrt{3}|t|+t^3. \end{align*} Thus \begin{align*} W(-2I)=2\sqrt{3}-8,\qquad W(-3I)=3\sqrt{3}-27,\qquad W(-4I)=4\sqrt{3}-64. \end{align*} Convexity would require \begin{align*} W(-3I)\le \frac{W(-2I)+W(-4I)}{2}=3\sqrt{3}-36, \end{align*} but \begin{align*} W(-3I)-(3\sqrt{3}-36)=9>0. \end{align*} So this polyconvex density fails even midpoint convexity along the line $t\mapsto tI$. Finally, Šverák's counterexample shows that rank-one convexity does not imply quasiconvexity when the target dimension is at least $3$, so testing only the rank-one lines $t\mapsto A+t\,a\otimes b$ cannot certify weak lower semicontinuity in general. [/example] This comparison explains why existence theorems in elasticity are often formulated with polyconvexity. It is strong enough for proofs and flexible enough to encode determinant constraints. ## Polyconvex Lower Semicontinuity and Existence We now return to the direct method.
For an admissible class $\mathcal A \subset W^{1,p}(U;\mathbb R^m)$, the functional is \begin{align*} I: \mathcal A \to \mathbb R \cup \{+\infty\}, \qquad I[u]=\int_U W(x,u(x),\nabla u(x))\,d\mathcal L^n(x). \end{align*} The aim is to minimize $I$ over $\mathcal A$, usually with prescribed boundary data. Polyconvexity supplies weak lower semicontinuity once the growth assumptions control the relevant minors. [quotetheorem:8754] [citeproof:8754] Each hypothesis has a separate role. Convexity in the lifted minor variables gives lower semicontinuity, but it only applies after the minors have enough compactness to converge weakly in an integral space. If uniform integrability fails, a sequence can concentrate determinant mass on smaller and smaller sets: the gradients may remain weakly bounded while $\det \nabla u_j$ develops a singular concentration, and the convex integral theorem cannot identify the limit as $\det \nabla u$. The coercive lower bound controls the $W^{1,p}$ norm and prevents minimizing sequences from escaping to infinity, while weak closedness of $\mathcal A$ is not part of lower semicontinuity itself but is needed later to keep the weak limit admissible. In three-dimensional elasticity the determinant term usually needs separate control beyond the $p$-growth of $|\nabla u|$ when $p<3$. [example: Stored Energy in Nonlinear Elasticity] Let $U \subset \mathbb R^3$ be the reference configuration and let $u \in W^{1,p}(U;\mathbb R^3)$ satisfy the prescribed boundary data. For $A \in \mathbb R^{3\times 3}$, consider \begin{align*} W(A)=\alpha |A|^p+\beta |\operatorname{cof}A|^q+h(\det A), \end{align*} where $\alpha,\beta>0$, $p,q>1$, and $h$ is convex with $h(t)\to+\infty$ as $t\downarrow 0$. This energy is polyconvex because it can be written as a convex function of the lifted variables. Define \begin{align*} G(F,C,d)=\alpha |F|^p+\beta |C|^q+h(d) \end{align*} for $(F,C,d)\in \mathbb R^{3\times 3}\times\mathbb R^{3\times 3}\times\mathbb R$. 
Since $|\cdot|$ is a norm and $r\mapsto r^p$, $r\mapsto r^q$ are convex and increasing on $[0,\infty)$, the maps $F\mapsto |F|^p$ and $C\mapsto |C|^q$ are convex. Multiplication by the positive constants $\alpha,\beta$ preserves convexity, and adding the convex function $h(d)$ preserves convexity. Hence $G$ is convex, and \begin{align*} W(A)=G(A,\operatorname{cof}A,\det A). \end{align*} Thus $W$ is convex in the minors of $A$, not necessarily in $A$ itself. The term $\beta|\operatorname{cof}A|^q$ penalises area distortion because $\operatorname{cof}A$ records the signed $2$-minors of $A$, while $h(\det A)$ penalises volume collapse because $\det A$ records the signed volume stretch. The condition $h(t)\to+\infty$ as $t\downarrow 0$ means that along any sequence with $\det A_j>0$ and $\det A_j\to 0$, one has $h(\det A_j)\to+\infty$, so finite-energy deformations cannot approach volume collapse without paying unbounded determinant energy. If the admissible class is weakly closed and the exponents give the compactness required for the minors, then *[Ball Polyconvex Sequential Weak Lower Semicontinuity Theorem](/theorems/8754)* applies to the representation above and gives weak lower semicontinuity of \begin{align*} I[u]=\int_U W(\nabla u(x))\,d\mathcal L^3(x). \end{align*} The lower bound \begin{align*} W(A)\ge \alpha |A|^p \end{align*} gives coercive control of the gradient part, up to the boundary condition and the usual Poincare inequality. Therefore a minimizing sequence has a weakly convergent subsequence in $W^{1,p}$, weak closedness keeps the limit admissible, and *[Polyconvex Direct Method Existence Criterion](/theorems/8755)* yields a minimizer. [/example] This is the main reason polyconvexity appears in the direct method: it gives the lower semicontinuity component. To turn that component into an actual minimizer, the direct method still needs the compactness component from coercivity and the admissibility component from weak closedness. 
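The structure of the stored-energy example can be checked on concrete matrices. The following sketch is illustrative only: the parameter values $\alpha=\beta=1$, $p=q=2$ and the barrier $h(t)=1/t$ for $t>0$, $h(t)=+\infty$ for $t\le 0$, are assumptions chosen for the demonstration, and the helper names are hypothetical. It evaluates $W(A)=G(A,\operatorname{cof}A,\det A)$, checks the lower bound $W(A)\ge\alpha|A|^p$ (valid here because this $h$ is nonnegative), and shows that $W$ is not convex in $A$: the midpoint of two finite-energy matrices can have zero determinant and hence infinite energy.

```python
import math

def cof(A):
    """Cofactor matrix of a 3x3 matrix: the signed 2x2 minors."""
    C = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            c = [k for k in range(3) if k != j]
            minor = A[r[0]][c[0]] * A[r[1]][c[1]] - A[r[0]][c[1]] * A[r[1]][c[0]]
            C[i][j] = (-1) ** (i + j) * minor
    return C

def det(A):
    """Determinant by cofactor expansion along the first row."""
    C = cof(A)
    return sum(A[0][j] * C[0][j] for j in range(3))

def frob(A):
    """Frobenius norm."""
    return math.sqrt(sum(a * a for row in A for a in row))

def h(t):
    """Assumed barrier: convex on (0, inf), +inf for t <= 0."""
    return 1.0 / t if t > 0 else math.inf

def G(F, C, d, alpha=1.0, beta=1.0, p=2.0, q=2.0):
    """Convex function of the lifted variables (F, C, d)."""
    return alpha * frob(F) ** p + beta * frob(C) ** q + h(d)

def W(A):
    """Polyconvex density W(A) = G(A, cof A, det A)."""
    return G(A, cof(A), det(A))

def diag(a, b, c):
    return [[a, 0.0, 0.0], [0.0, b, 0.0], [0.0, 0.0, c]]

I3 = diag(1.0, 1.0, 1.0)
B = diag(-1.0, -1.0, 1.0)            # det B = 1, so B has finite energy
mid = [[0.5 * (I3[i][j] + B[i][j]) for j in range(3)] for i in range(3)]

print("W(I) =", W(I3), " W(B) =", W(B), " W(midpoint) =", W(mid))
print("lower bound holds at I:", W(I3) >= frob(I3) ** 2)
for t in (1.0, 0.1, 0.01):
    print("t =", t, " W(diag(t,1,1)) =", W(diag(t, 1.0, 1.0)))
```

The last loop compresses one direction only; the energy grows like $1/t$ because $\det\operatorname{diag}(t,1,1)=t$, which is the determinant blow-up described in the example.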
[quotetheorem:8755] [citeproof:8755] This criterion is the direct method in its polyconvex form, and every assumption is doing visible work. Nonemptiness ensures there is an infimum over an actual class rather than a vacuous problem. Coercivity supplies bounded minimizing sequences, while reflexivity of $W^{1,p}$ for $1<p<\infty$ supplies weakly convergent subsequences. Weak closedness is needed because boundary conditions, determinant constraints, or injectivity-type restrictions may fail to survive weak limits unless the admissible class is chosen carefully. The conclusion is only existence of a minimizer; it does not prove uniqueness, regularity, injectivity, or physical stability of that minimizer. ## What Polyconvexity Does Not Solve The final issue is to understand the limitations of the condition. Polyconvexity is a practical sufficient hypothesis, but it does not remove all modelling and analytic difficulties. In elasticity, constraints such as injectivity, noninterpenetration of matter, and positivity of the determinant require additional structure. [remark: Orientation Preservation] A condition such as $\det \nabla u>0$ a.e. is not automatically weakly closed under weak convergence in $W^{1,p}$. Polyconvex energies often include a term $h(\det \nabla u)$ with $h(t)=+\infty$ for $t\le 0$ or with $h(t)\to +\infty$ as $t\downarrow 0$ to discourage orientation reversal or collapse. The resulting admissible class must still be chosen with care. [/remark] This warning connects the present chapter with the broader existence theory. Polyconvexity handles weak lower semicontinuity of the energy, but the admissible class and coercive bounds carry the remaining modelling constraints. [example: Determinant Constraint and Coercivity] Consider the determinant penalty \begin{align*} h(t)=t^{-s}\quad\text{for }t>0,\qquad h(t)=+\infty\quad\text{for }t\le 0, \end{align*} with $s>0$. 
If \begin{align*} \int_U h(\det\nabla u)\,d\mathcal L^3<\infty, \end{align*} then the set $E=\{x\in U:\det\nabla u(x)\le 0\}$ must have $\mathcal L^3(E)=0$, because $h(\det\nabla u)=+\infty$ on $E$. Thus finite energy forces $\det\nabla u>0$ a.e. It also quantitatively penalises near-collapse. For $0<\varepsilon<1$, set \begin{align*} E_\varepsilon=\{x\in U:0<\det\nabla u(x)\le \varepsilon\}. \end{align*} On $E_\varepsilon$ one has $(\det\nabla u)^{-s}\ge \varepsilon^{-s}$, so \begin{align*} \int_U h(\det\nabla u)\,d\mathcal L^3\ge \int_{E_\varepsilon}(\det\nabla u)^{-s}\,d\mathcal L^3\ge \varepsilon^{-s}\mathcal L^3(E_\varepsilon). \end{align*} Hence if the determinant part of the energy is at most $C$, then \begin{align*} \mathcal L^3(E_\varepsilon)\le C\varepsilon^s. \end{align*} This determinant penalty still does not control the full gradient. For example, let \begin{align*} A_k=\operatorname{diag}(k,k^{-1},1). \end{align*} Then \begin{align*} \det A_k=k\cdot k^{-1}\cdot 1=1, \end{align*} so $h(\det A_k)=h(1)=1$. But the Frobenius norm satisfies \begin{align*} |A_k|^2=k^2+k^{-2}+1, \end{align*} and therefore \begin{align*} |A_k|^p=(k^2+k^{-2}+1)^{p/2}\to+\infty. \end{align*} Thus affine maps $u_k(x)=A_kx$ have uniformly bounded determinant penalty on a bounded domain, while their $L^p$ gradient norms diverge. A coercive term such as $|\nabla u|^p$ is therefore needed to obtain compactness in $W^{1,p}$. [/example] The chapter's main message is therefore structural. Minors are weakly continuous in the right Sobolev regimes, convexity in minors implies quasiconvexity, and Ball's theorem turns these facts into an existence theorem. Polyconvexity is not the sharp lower semicontinuity condition, but it is the condition that makes nonlinear geometric energies tractable by the direct method. Polyconvexity gives a practical route from weak continuity of minors to lower semicontinuity. 
The next chapter applies that route to nonlinear elasticity, where determinant constraints, orientation preservation, and coercive growth become part of the existence problem. # 9. Direct Methods in Nonlinear Elasticity This chapter applies the direct method to nonlinear elasticity, where the unknown is a deformation of an elastic body rather than a scalar function. The new difficulty is geometric: a physically admissible deformation should preserve orientation and should not allow matter to interpenetrate. The convexity hypotheses from Chapters 4 and 7 are too rigid for elasticity, so the chapter relies on the polyconvex framework of Chapter 8, which supplies the tools used here: minors of the deformation gradient, determinant constraints, and energies that blow up as local volume collapse is approached. ## Deformation Maps and Finite-Strain Energies The first question is what class of maps can represent deformations of an elastic body. In the small-strain theory the displacement may be the primary unknown, but finite elasticity works with the actual placement map $u:\bar{\Omega}\to \mathbb R^n$. This makes the gradient $\nabla u$ a local linear approximation to the deformation, and the sign and size of $\det \nabla u(x)$ carry physical information about orientation and local volume change. Let $\Omega\subset \mathbb R^n$ be a bounded Lipschitz domain representing the reference configuration of an elastic body. A deformation is usually sought in an affine Sobolev class with prescribed boundary values, for example $u\in W^{1,p}(\Omega;\mathbb R^n)$ with $u=u_0$ on a prescribed boundary part in the trace sense. The first formal object is therefore the Sobolev map that records the placement of each material point. [definition: Deformation] Let $\Omega\subset \mathbb R^n$ be open and let $1\le p\le \infty$. A deformation of $\Omega$ is a map $u\in W^{1,p}(\Omega;\mathbb R^n)$.
[/definition] For $p>n$, Sobolev embedding gives a continuous representative, so pointwise injectivity and boundary values can be discussed directly. For $p\le n$, deformations may be discontinuous or may fail to have classical pointwise behaviour. Since elastic matter should not reverse orientation locally, the next admissibility condition is expressed through the a.e. sign of the Jacobian determinant. [definition: Orientation Preserving Deformation] Let $\Omega\subset \mathbb R^n$ be open and let $u\in W^{1,p}(\Omega;\mathbb R^n)$ with $p\ge n$. The deformation $u$ is orientation preserving if \begin{align*} \det Ju_x>0 \end{align*} for $\mathcal L^n$-a.e. $x\in \Omega$. [/definition] The strict inequality is physically natural, but it is analytically delicate. Weak limits do not generally preserve strict pointwise inequalities, so the admissible class may fail to be weakly closed unless the energy penalizes approach to zero determinant or a weaker closed condition is used. [example: Collapse of Orientation Under Weak Limits] Let $\Omega=(0,1)$ and define $u_k(x)=x/k$. For every $x\in(0,1)$, the weak derivative is \begin{align*} u_k'(x)=\frac{1}{k}, \end{align*} so $u_k'(x)>0$ everywhere. Thus each $u_k$ is orientation preserving in the one-dimensional sense. We now verify the convergence to the constant map $u=0$. If $1\le p<\infty$, then \begin{align*} \|u_k-0\|_{L^p(0,1)}^p=\int_0^1 \left|\frac{x}{k}\right|^p\,dx=\frac{1}{k^p}\int_0^1 x^p\,dx=\frac{1}{k^p(p+1)}. \end{align*} Hence \begin{align*} \|u_k-0\|_{L^p(0,1)}=\frac{1}{k}(p+1)^{-1/p}\to 0. \end{align*} For the derivatives, \begin{align*} \|u_k'-0\|_{L^p(0,1)}^p=\int_0^1 \left|\frac{1}{k}\right|^p\,dx=\frac{1}{k^p}, \end{align*} so \begin{align*} \|u_k'-0\|_{L^p(0,1)}=\frac{1}{k}\to 0. \end{align*} If $p=\infty$, then \begin{align*} \|u_k\|_{L^\infty(0,1)}=\sup_{0<x<1}\frac{x}{k}=\frac{1}{k} \end{align*} and \begin{align*} \|u_k'\|_{L^\infty(0,1)}=\frac{1}{k}, \end{align*} so the same conclusion holds. 
Therefore $u_k\to 0$ strongly in $W^{1,p}(0,1)$, and strong convergence implies weak convergence because every continuous linear functional sends norm-convergent sequences to convergent scalar sequences. The weak limit $u=0$ has derivative \begin{align*} u'(x)=0 \end{align*} for a.e. $x\in(0,1)$, so the strict positivity condition $u_k'>0$ is lost in the limit. Even in one dimension, positivity of the Jacobian is not weakly closed; in higher dimensions the same failure appears as loss of the strict inequality $\det Ju_x>0$. [/example] The example shows why finite elasticity cannot rely on the strict determinant constraint alone when passing to weak limits. The energy density must have a structural condition that still gives weak lower semicontinuity while retaining the determinant as a visible variable. This leads to polyconvexity, where convexity is imposed after enlarging the gradient variable to include all minors. [definition: Polyconvex Energy Density] Let $n,m\in \mathbb N$, and let $N=\sum_{r=1}^{\min\{m,n\}}\binom{m}{r}\binom{n}{r}$. Define the minors map \begin{align*} M:\mathbb R^{m\times n}\to \mathbb R^N \end{align*} by listing all minors of orders $1,\dots,\min\{m,n\}$ in a fixed order. A Borel function $W:\mathbb R^{m\times n}\to (-\infty,\infty]$ is polyconvex if there exists a convex function $G:\mathbb R^N\to (-\infty,\infty]$ such that \begin{align*} W(F)=G(M(F)) \end{align*} for every $F\in \mathbb R^{m\times n}$. [/definition] Polyconvexity is designed to interact with weak convergence in Sobolev spaces because minors of gradients have special weak continuity properties. For elasticity in dimension $n=3$, the relevant list is $F$, $\operatorname{cof}F$, and $\det F$. To formulate the variational problem, this stored-energy density is integrated over the reference body and combined with external loading. 
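The minors map and the count $N$ from the definition of polyconvexity can be made concrete. The sketch below uses hypothetical helper names and is not part of the formal development: it lists all minors of a small matrix in one fixed order (by order $r$, then by row and column index sets) and compares the length of the list with the formula for $N$. It lists unsigned minors, whereas $\operatorname{cof}F$ stores the $2\times 2$ minors with cofactor signs and in a different arrangement; the two lists differ by an invertible linear change of variables.

```python
from itertools import combinations
from math import comb

def det(M):
    """Determinant of a small square matrix by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )

def minors_map(F):
    """All minors of F, ordered by size r, then by row subset, then by column subset."""
    m, n = len(F), len(F[0])
    out = []
    for r in range(1, min(m, n) + 1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                out.append(det([[F[i][j] for j in cols] for i in rows]))
    return out

def minor_count(m, n):
    """N = sum over r of C(m, r) * C(n, r)."""
    return sum(comb(m, r) * comb(n, r) for r in range(1, min(m, n) + 1))

F = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]
M = minors_map(F)
print("N(3,3) =", minor_count(3, 3), " len(M(F)) =", len(M))
print("1-minors:", M[:9])
print("2-minors:", M[9:18])
print("3-minor :", M[18])
```

For $m=n=3$ the list has $9+9+1=19$ entries, matching the description of $M(F)$ as the triple $(F,\operatorname{cof}F,\det F)$.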
[definition: Finite-Strain Elastic Energy] Let $\Omega\subset \mathbb R^n$ be bounded and open, and let $\mathcal A\subset W^{1,p}(\Omega;\mathbb R^n)$ be an admissible class of deformations. Let \begin{align*} W:\Omega\times \mathbb R^n\times \mathbb R^{n\times n}\to (-\infty,\infty] \end{align*} be a stored-energy density and let $\ell:\mathcal A\to \mathbb R$ be a loading functional. The finite-strain elastic energy associated to $W$ and $\ell$ is the functional $I:\mathcal A\to (-\infty,\infty]$ defined by \begin{align*} I[u]=\int_\Omega W(x,u(x),Ju_x)\,d\mathcal L^n(x)-\ell[u]. \end{align*} [/definition] The term $W(x,u,Ju_x)$ measures internal stored energy, while $\ell[u]$ represents body forces and surface loads. The direct method will work when the internal energy controls the Sobolev norm strongly enough and is lower semicontinuous under weak convergence. [example: Neo-Hookean-Type Polyconvex Energy] In three dimensions, let $a,b>0$ and $p,q>1$, let $h:(0,\infty)\to\mathbb R$ be convex with $h(t)\to\infty$ as $t\downarrow 0$, and consider the density \begin{align*} W(F)=a|F|^p+b|\operatorname{cof}F|^q+\tilde h(\det F), \end{align*} where the extended determinant term is \begin{align*} \tilde h(t)=h(t)\text{ for }t>0,\quad \tilde h(t)=\infty\text{ for }t\le 0. \end{align*} For $F\in\mathbb R^{3\times 3}$, the minors map can be written as \begin{align*} M(F)=(F,\operatorname{cof}F,\det F), \end{align*} where $F$ records the $1\times 1$ minors, $\operatorname{cof}F$ records the signed $2\times 2$ minors, and $\det F$ records the $3\times 3$ minor. Set \begin{align*} G(A,B,t)=a|A|^p+b|B|^q+\tilde h(t) \end{align*} for $(A,B,t)\in\mathbb R^{3\times 3}\times\mathbb R^{3\times 3}\times\mathbb R$. Since $p>1$ and $q>1$, the maps $A\mapsto |A|^p$ and $B\mapsto |B|^q$ are convex, and $\tilde h$ is convex because it is the convex function $h$ on the convex domain $(0,\infty)$ and is $\infty$ outside that domain. Therefore $G$ is convex as a sum of convex functions with positive coefficients. For every $F$, \begin{align*} G(M(F))=G(F,\operatorname{cof}F,\det F)=a|F|^p+b|\operatorname{cof}F|^q+\tilde h(\det F)=W(F). \end{align*} Thus $W$ is polyconvex.
The term $a|F|^p$ penalizes stretching of line elements, $b|\operatorname{cof}F|^q$ penalizes area distortion, and $\tilde h(\det F)$ makes $W(F)=\infty$ when $\det F\le 0$ while forcing the energy to diverge as $\det F\downarrow 0$. [/example] This model is simple enough to show the structure of the theory and rich enough to include the determinant barrier. It also foreshadows the main compactness question: whether a minimizing sequence can lose volume, fold over itself, or concentrate energy in singular regions. ## Determinant Constraints and Coercivity The next problem is to obtain compactness while ruling out unphysical collapse. A coercive growth condition such as $W(F)\ge c|F|^p-C$ controls the $W^{1,p}$ norm, but it does not prevent $\det F$ from approaching zero or changing sign. Nonlinear elasticity therefore combines Sobolev coercivity with determinant restrictions and blow-up near singular matrices. [definition: Determinant Barrier] Let $W:\mathbb R^{n\times n}\to (-\infty,\infty]$. The density $W$ has a determinant barrier if \begin{align*} W(F)=\infty \quad \text{whenever } \det F\le 0, \end{align*} and, for every sequence $(F_k)$ with $\det F_k>0$ and $\det F_k\to 0$, \begin{align*} W(F_k)\to \infty. \end{align*} [/definition] The first part enforces orientation preservation at finite energy, while the second part penalizes compression to zero volume. In minimization, this barrier controls degeneracy but does not by itself give weak compactness. We therefore need a coercive estimate showing that bounded energy forces bounded Sobolev norm for minimizing sequences. [quotetheorem:8756] [citeproof:8756] This theorem supplies weak compactness when $1<p<\infty$, and each hypothesis has a specific role. The condition $p>1$ is what gives reflexivity of $W^{1,p}$; at $p=1$ bounded sequences need not have weakly convergent subsequences in $W^{1,1}$. 
The trace condition on a positive boundary part rules out loss of compactness by translations and rigid motions, while the coercive lower bound prevents gradients from escaping to infinity along a minimizing sequence. The subcritical loading assumption is also essential: a load with $p$-growth and the wrong sign can cancel the internal coercivity and make the energy unbounded below. Even with these hypotheses, the theorem says nothing by itself about whether the weak limit remains physically admissible. The determinant condition is not a convex constraint in $Ju_x$, so weak closure must be handled using the special algebraic structure of minors. [quotetheorem:8750] [citeproof:8750] Weak continuity of minors is the technical reason polyconvexity replaces convexity. The gradient structure is essential here: arbitrary weakly convergent matrix fields do not have weakly continuous determinants, because the cancellations used in the integration-by-parts argument depend on the equality of mixed partial derivatives. The threshold $p\ge r$ is also meaningful, since below it the products forming $r\times r$ minors need not even have enough integrability to define stable distributional limits. This theorem does not prove injectivity, preserve strict determinant positivity, or rule out cavitation; it only supplies the compactness of the algebraic quantities that appear in a polyconvex integrand. It also explains why determinant constraints are more tractable when they are encoded through the energy density rather than imposed as an isolated open condition. [example: Determinant Blow-Up Prevents Uniform Compression] Let $\Omega\subset \mathbb R^n$ have $0<\mathcal L^n(\Omega)<\infty$, and consider the affine compression $u_t(x)=tx$ for $t>0$. For each component, $(u_t)_i(x)=t x_i$, so \begin{align*} \frac{\partial (u_t)_i}{\partial x_j}=t\delta_{ij}. \end{align*} Hence $Ju_t=tI_n$. 
Since $tI_n$ is diagonal with diagonal entries $t,\dots,t$, its determinant is \begin{align*} \det Ju_t=\det(tI_n)=t\cdot t\cdots t=t^n. \end{align*} Its Frobenius norm satisfies \begin{align*} |Ju_t|^2=|tI_n|^2=\sum_{i=1}^n\sum_{j=1}^n t^2\delta_{ij}=nt^2, \end{align*} so \begin{align*} |Ju_t|^p=(nt^2)^{p/2}=n^{p/2}t^p. \end{align*} For $W(F)=|F|^p+h(\det F)$, the total stored energy is therefore \begin{align*} \int_\Omega W(Ju_t)\,d\mathcal L^n=\int_\Omega \left(n^{p/2}t^p+h(t^n)\right)\,d\mathcal L^n. \end{align*} Because the integrand is constant in $x$, this becomes \begin{align*} \int_\Omega W(Ju_t)\,d\mathcal L^n=\mathcal L^n(\Omega)\left(n^{p/2}t^p+h(t^n)\right). \end{align*} As $t\downarrow 0$, we have $n^{p/2}t^p\to 0$ and $t^n\downarrow 0$, so the assumption $h(s)\to\infty$ as $s\downarrow 0$ gives \begin{align*} \mathcal L^n(\Omega)\left(n^{p/2}t^p+h(t^n)\right)\to\infty. \end{align*} By contrast, the pure gradient energy would be \begin{align*} \int_\Omega |Ju_t|^p\,d\mathcal L^n=\mathcal L^n(\Omega)n^{p/2}t^p\to 0. \end{align*} Thus the determinant term penalizes uniform collapse to zero volume, while the term $|F|^p$ alone would energetically favor this compression. [/example] The example addresses local volume collapse, but finite energy and positive determinant still do not guarantee global injectivity. A map may preserve orientation locally and yet overlap distant parts of the body. The next condition compares the Jacobian-counted volume with the measure of the image to exclude such self-interpenetration. [definition: Ciarlet-Necas Condition] Let $\Omega\subset \mathbb R^n$ be bounded and open, let $p>n$, and let $u\in W^{1,p}(\Omega;\mathbb R^n)$ with $\det Ju_x>0$ for $\mathcal L^n$-a.e. $x\in\Omega$. The map $u$ satisfies the Ciarlet-Necas condition if \begin{align*} \int_\Omega \det Ju_x\,d\mathcal L^n(x)\le \mathcal L^n(u(\Omega)). \end{align*} [/definition] Together with the [area formula](/theorems/3075), this condition expresses a.e. 
injectivity of $u$ up to negligible sets. In many elasticity courses this condition is stated rather than proved, because the proof uses the area formula and fine properties of Sobolev mappings. Its role in the direct method is to add a global exclusion of self-interpenetration to the local condition $\det Ju_x>0$. The inequality direction may look surprising at first. The area formula counts multiplicity, so if a deformation overlaps two regions onto the same part of space, then $\int_\Omega \det Ju_x\,d\mathcal L^n(x)$ counts that overlapped volume more than once, while $\mathcal L^n(u(\Omega))$ counts it once. ## Existence Under Polyconvex Hypotheses We now ask for a complete existence theorem. The direct method needs three ingredients: bounded minimizing sequences, a weakly closed admissible class, and weak lower semicontinuity of the energy. Ball's theorem combines these ingredients under hypotheses tailored to finite elasticity. [definition: Elastic Admissible Class] Let $\Omega\subset \mathbb R^n$ be bounded and Lipschitz, let $u_0\in W^{1,p}(\Omega;\mathbb R^n)$, and let $\Gamma_D\subset \partial\Omega$ have positive surface measure. An elastic admissible class is a set of the form \begin{align*} \mathcal A=\{u\in W^{1,p}(\Omega;\mathbb R^n): u=u_0 \text{ on } \Gamma_D,\ \det Ju_x>0 \text{ a.e.},\ u \text{ satisfies the chosen non-interpenetration condition}\}. \end{align*} [/definition] This definition packages the modelling choices. The remaining question is whether these choices are compatible with the direct method: do coercivity, polyconvexity, determinant blow-up, and weak closure together force the existence of a minimizer? The main existence theorem answers that question by assembling the compactness and lower-semicontinuity inputs proved above. [quotetheorem:8757] [citeproof:8757] The theorem is the finite-elasticity version of the direct method, but each hypothesis carries modelling content. 
Polyconvexity gives lower semicontinuity; without a convexity condition on minors, oscillating gradients can lower the limiting energy and destroy existence. Coercivity gives compactness; without it, minimizing sequences may run off through large gradients or rigid motions, especially when boundary data do not anchor the body. The determinant barrier encodes the physical prohibition against local volume collapse, but it does not by itself guarantee global injectivity, which is why weak closure of the admissible class is stated separately. Weak continuity of the load is also needed, since a load that is only strongly continuous may fail to pass to the weak limit selected by compactness. The conclusion is an existence result in a Sobolev class, not a smoothness theorem and not a global invertibility theorem unless those properties are built into the admissible class and shown to be closed. [example: Prescribed Boundary Deformation of an Elastic Body] Let $\Omega\subset \mathbb R^3$ be bounded and Lipschitz, and prescribe the affine boundary deformation $u_0(x)=Ax+b$ with $\det A>0$. Since $u_0$ is affine, each component has weak derivative \begin{align*} \frac{\partial (u_0)_i}{\partial x_j}=A_{ij}. \end{align*} Thus $Ju_0=A$ a.e. in $\Omega$, and \begin{align*} \det Ju_0=\det A>0. \end{align*} Because $\Omega$ is bounded, $u_0\in W^{1,p}(\Omega;\mathbb R^3)$ for every finite $p$, and its trace on $\partial\Omega$ is the prescribed boundary value $u_0$. If the non-interpenetration condition is the Ciarlet-Necas condition, then the affine map is injective because $\det A>0$ implies $A$ is invertible. Moreover, \begin{align*} \int_\Omega \det Ju_0\,d\mathcal L^3=(\det A)\mathcal L^3(\Omega). \end{align*} By the change-of-variables formula for the invertible affine map $x\mapsto Ax+b$, \begin{align*} \mathcal L^3(u_0(\Omega))=(\det A)\mathcal L^3(\Omega). 
\end{align*} Hence $u_0$ satisfies the Ciarlet-Necas inequality with equality, so the admissible class with trace $u_0$ is nonempty. For \begin{align*} W(F)=a|F|^p+b_0|\operatorname{cof}F|^q+h(\det F), \end{align*} the stored energy of the affine deformation is \begin{align*} \int_\Omega W(Ju_0)\,d\mathcal L^3=\int_\Omega \left(a|A|^p+b_0|\operatorname{cof}A|^q+h(\det A)\right)\,d\mathcal L^3. \end{align*} The integrand is constant in $x$, so \begin{align*} \int_\Omega W(Ju_0)\,d\mathcal L^3=\mathcal L^3(\Omega)\left(a|A|^p+b_0|\operatorname{cof}A|^q+h(\det A)\right). \end{align*} This quantity is finite whenever $h(\det A)<\infty$. Under the polyconvexity, coercivity, determinant-barrier, weak load-continuity, and weak-closure hypotheses of *Ball Existence Theorem in Nonlinear Elasticity*, the energy therefore attains its minimum on the admissible class. The minimizer is an equilibrium deformation with the prescribed affine boundary placement, and the boundary condition anchors the body while the determinant and non-interpenetration assumptions rule out local volume collapse and admissible self-overlap. [/example] The boundary condition is not a minor technicality: it anchors the body and removes rigid-motion loss of compactness. It also determines whether the admissible class is nonempty, since incompatible boundary data can force folding or compression. [example: Cavitation as Loss of Regularity] Let $B=B(0,1)\subset\mathbb R^n$ and fix $a>0$. For $x\ne 0$, write $r=|x|$ and define the radial map \begin{align*} u(x)=\phi(r)\frac{x}{r},\qquad \phi(r)=(r^n+a^n)^{1/n}. \end{align*} Then \begin{align*} \phi'(r)=\frac{1}{n}(r^n+a^n)^{1/n-1}nr^{n-1}=r^{n-1}(r^n+a^n)^{(1-n)/n}. \end{align*} Since $\phi(r)^n=r^n+a^n$, this can be written as \begin{align*} \phi'(r)=\frac{r^{n-1}}{\phi(r)^{n-1}}. \end{align*} For a radial deformation, the stretch in the radial direction is $\lambda_r=\phi'(r)$, and each of the $n-1$ tangential stretches is $\lambda_t=\phi(r)/r$. 
Hence \begin{align*} \det Ju_x=\lambda_r\lambda_t^{n-1}=\frac{r^{n-1}}{\phi(r)^{n-1}}\left(\frac{\phi(r)}{r}\right)^{n-1}=1 \end{align*} for every $x\ne 0$. Thus $u$ is orientation preserving a.e. and does not collapse volume locally. The formula for $\phi$ shows where regularity is lost. As $r\downarrow 0$, \begin{align*} |u(x)|=\phi(r)=(r^n+a^n)^{1/n}\to a. \end{align*} Along the ray $x=re$ with fixed $|e|=1$, we have \begin{align*} u(re)=\phi(r)e\to ae, \end{align*} so different directions $e$ give different limiting values on the sphere of radius $a$. No value assigned at $x=0$ can make $u$ continuous there. The gradient nevertheless remains integrable in the Sobolev classes with $p<n$. The radial stretch is harmless, since $\phi(r)\ge r$ gives $\lambda_r=(r/\phi(r))^{n-1}\le 1$. Since $\phi(r)\le (1+a^n)^{1/n}$ for $0<r<1$, the tangential stretch satisfies \begin{align*} \lambda_t^p=\left(\frac{\phi(r)}{r}\right)^p\le (1+a^n)^{p/n}r^{-p}. \end{align*} In polar coordinates, \begin{align*} \int_{B(0,1)} r^{-p}\,d\mathcal L^n=\omega_{n-1}\int_0^1 r^{n-1-p}\,dr, \end{align*} which is finite exactly when $n-p>0$. Thus for $p<n$ the deformation can belong to $W^{1,p}$ while still opening a cavity at the centre. This is the point of the example: Sobolev existence and a.e. orientation preservation can coexist with a non-classical deformation that creates an internal surface, so finite-energy minimizers need not be smooth equilibria. [/example] Cavitation is one reason nonlinear elasticity is not merely a routine application of quasiconvexity. The direct method gives minimizers in a weak class, but regularity and physical interpretation may require additional analysis of singularities, injectivity, and the boundary deformation. [remark: Weak Closure and Strict Positivity] The condition $\det Ju_x>0$ is an open pointwise condition, so it is not generally weakly closed. In existence theorems, strict positivity is recovered through finite energy and determinant blow-up, or the admissible class is formulated with closed distributional and global invertibility conditions.
This distinction separates formal physical constraints from constraints that survive weak compactness. [/remark] The chapter closes the existence part of the course by showing how the [abstract direct method](/theorems/3105) adapts to a model where geometry is part of the unknown. Earlier chapters supplied weak compactness and lower semicontinuity; nonlinear elasticity adds the algebra of minors, determinant barriers, and global injectivity conditions. The resulting theorem is powerful precisely because it balances all three demands in a concrete model: compactness, lower semicontinuity, and physical admissibility. The next chapter turns to limits of this strategy, especially Lavrentiev phenomena and failures of energy-controlled approximation. # 10. Lavrentiev Phenomenon and Limits of the Direct Method After the existence results of Chapters 3-9, this chapter turns to a limitation: the direct method gives a disciplined route from compactness and lower semicontinuity to existence, but it does not say that all reasonable admissible classes give the same infimum. The chapter studies what can go wrong when the energy sees singular behaviour, nonstandard growth, or a topology in which smooth functions are not dense. The main warning is the Lavrentiev phenomenon: minimizing over a Sobolev class can give a strictly smaller value than minimizing over smooth competitors with the same boundary data. The point is not that smooth functions are unimportant. Rather, smooth competitors are reliable only when the analytic structure of the problem allows approximation without increasing the energy. The examples in this chapter mark the boundary between the classical direct method and the finer theory of relaxation.
## Gaps Between Smooth and Sobolev Admissible Classes Suppose a variational problem is first written for smooth maps because the Euler-Lagrange equation is easiest to derive there. If the direct method later produces a Sobolev minimizer, the central question is whether the Sobolev minimum is the same number as the smooth minimum. A negative answer means that the weak solution found by compactness is not approximable by classical competitors at the level of energy. [definition: Lavrentiev Gap] Let $U \subset \mathbb R^n$ be open, let $1\le p<\infty$, let $\mathcal A \subset W^{1,p}(U)$ be an admissible class incorporating prescribed boundary data, and let $I:\mathcal A\to(-\infty,\infty]$ be a functional. Let $\mathcal A_{\mathrm{sm}} \subset \mathcal A$ denote the subclass of admissible functions that are smooth in $U$ and satisfy the same boundary condition. A Lavrentiev gap occurs when \begin{align*} \inf_{u \in \mathcal A} I[u] < \inf_{u \in \mathcal A_{\mathrm{sm}}} I[u]. \end{align*} [/definition] This definition isolates a comparison between two closures of the same variational problem. The Sobolev class is natural for compactness, while the smooth class is natural for classical variation and approximation. A gap says that taking the closure of the admissible class and taking the infimum of the energy do not commute. [remark: Interpretation of the Gap] A Lavrentiev gap is not a failure of lower semicontinuity by itself. It is a failure of energy-controlled density: every smooth sequence converging to a Sobolev competitor may pay extra energy in the limit. The direct method may still produce a minimizer in $\mathcal A$, but that minimizer is invisible from the smooth variational problem. [/remark] The simplest way to create such a gap is not merely to put a large weight at a point. A large weight usually makes concentration near that point more expensive. 
The classical mechanism is subtler: the integrand is designed to vanish along a singular Sobolev curve, while regular competitors cannot follow that curve without smoothing the singularity and paying a definite positive cost. [example: Mania One-Dimensional Gap Mechanism] Let $U=(0,1)$ and set \begin{align*} L(x,u,u')=(u(x)^3-x)^2|u'(x)|^6 . \end{align*} The singular competitor $u_0(x)=x^{1/3}$ belongs to $W^{1,1}(0,1)$ because $u_0'(x)=\frac13 x^{-2/3}$ for $x>0$ and \begin{align*} \int_0^1 |u_0'(x)|\,dx=\frac13\int_0^1 x^{-2/3}\,dx=1 . \end{align*} It satisfies $u_0(0)=0$ and $u_0(1)=1$, while its derivative is not bounded near $0$ since $\frac13 x^{-2/3}\to\infty$ as $x\downarrow0$. Along this curve, \begin{align*} u_0(x)^3-x=(x^{1/3})^3-x=x-x=0 , \end{align*} so \begin{align*} L(x,u_0,u_0')=0^2|u_0'(x)|^6=0 \end{align*} for every $x\in(0,1)$. Now let $u\in C^1([0,1])$ satisfy $u(0)=0$ and $u(1)=1$. Since $u'(0)$ is finite, \begin{align*} u(x)=u'(0)x+o(x) \end{align*} as $x\downarrow0$, and therefore \begin{align*} \frac{|u(x)|}{x^{1/3}}=|u'(0)|x^{2/3}+\frac{o(x)}{x^{1/3}}\to0 . \end{align*} Thus, for all sufficiently small $x>0$, \begin{align*} |u(x)|\le \frac12 x^{1/3}. \end{align*} Cubing gives \begin{align*} u(x)^3\le |u(x)|^3\le \frac18 x , \end{align*} and hence \begin{align*} u(x)^3-x\le -\frac78 x . \end{align*} So a $C^1$ curve initially lies a definite distance below the zero-energy curve $u^3=x$, whereas $u(1)^3-1=0$. To reach the endpoint it must erase this defect, and the energy measures that erasure through the product $(u^3-x)^2|u'|^6$. The Sobolev curve has zero energy because the vanishing factor $u^3-x$ is exact; a $C^1$ curve has finite initial slope, cannot match the singular behaviour of $x^{1/3}$ at $0$, and the resulting lower-bound estimate is the prototype of a Lavrentiev gap. [/example] This example gives the correct mechanism behind the classical construction. 
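As a rough numerical sanity check of that mechanism (not a proof of the gap), the following Python sketch evaluates the integrand of the example with a midpoint rule, once along the singular curve $u_0(x)=x^{1/3}$ and once along the affine competitor $u(x)=x$. The function names and the quadrature resolution are assumptions of this sketch. For $u(x)=x$ the integrand reduces to $x^2(1-x^2)^2$, whose exact integral over $(0,1)$ is $8/105$.

```python
def lagrangian(x, u, du):
    """Integrand L(x, u, u') = (u^3 - x)^2 |u'|^6 from the example."""
    return (u ** 3 - x) ** 2 * abs(du) ** 6


def midpoint_energy(u, du, n=2000):
    """Midpoint-rule approximation of the energy on (0, 1).

    Midpoints never hit x = 0, where the derivative of u0 is unbounded.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += lagrangian(x, u(x), du(x)) * h
    return total


def u0(x):
    """Singular Sobolev competitor u0(x) = x^(1/3)."""
    return x ** (1.0 / 3.0)


def du0(x):
    """Derivative u0'(x) = x^(-2/3) / 3 for x > 0."""
    return x ** (-2.0 / 3.0) / 3.0


def u_affine(x):
    """Affine C^1 competitor with the same endpoint values."""
    return x


def du_affine(x):
    return 1.0


energy_singular = midpoint_energy(u0, du0)            # zero up to rounding
energy_affine = midpoint_energy(u_affine, du_affine)  # close to 8/105

if __name__ == "__main__":
    print(energy_singular, energy_affine, 8.0 / 105.0)
```

The affine competitor is only one $C^1$ curve, so its positive energy illustrates the cost of leaving the graph $u^3=x$; it says nothing by itself about the infimum over all $C^1$ competitors.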
The singular Sobolev minimizer is cheap because the integrand vanishes on its graph, not because a transition is compressed into a region with a large weight. The next theorem records the form of the example used in the course. [quotetheorem:8758] [citeproof:8758] The endpoint regularity hypothesis is doing real work: $C^1$ competitors have finite initial slope, whereas the zero-energy Sobolev curve has derivative behaving like $x^{-2/3}$ near $0$. If the smooth comparison class were replaced by the full Sobolev class, the competitor $u_0(x)=x^{1/3}$ would remove the gap at once. If the endpoint condition at $0$ were removed, constant competitors near $0$ could avoid the singular initial matching that drives the estimate. Nonnegativity makes the Sobolev infimum exactly $0$; without it, adding a negative well such as $-(1+|u'|)^{-1}$ would obscure the comparison by allowing negative values unrelated to approximation. Convexity in $u'$ shows that the gap is not caused by nonconvex gradient wells: a nonconvex double-well integrand can have gaps for the separate reason that oscillating Sobolev sequences lower the relaxed energy. The theorem therefore has a precise limitation. It says that continuity, nonnegativity, and convexity in the gradient do not by themselves give energy-controlled density; it does not say that every convex integral functional has a gap. To understand when the obstruction is absent, we next look at density itself. ## Density Failure in Weighted Sobolev Spaces Approximation by smooth functions is usually taken for granted in first courses on Sobolev spaces. The question here is what changes when the norm contains a weight that degenerates or blows up. Since variational energies often define the relevant topology, density in the ordinary $W^{1,p}$ norm is not enough. [definition: Weighted Sobolev Space] Let $U\subset\mathbb R^n$ be open, let $1\le p<\infty$, and let $w:U\to(0,\infty)$ be measurable with $w<\infty$ $\mathcal L^n$-a.e. 
The weighted Sobolev space $W^{1,p}(U,w)$ consists of functions $u\in L^p(U)$ whose weak gradient $\nabla u$ is an $L^1_{\mathrm{loc}}(U;\mathbb R^n)$ vector field satisfying \begin{align*} \int_U w(x)|\nabla u(x)|^p\,d\mathcal L^n(x)<\infty. \end{align*} It is equipped with the norm \begin{align*} \|u\|_{W^{1,p}(U,w)} = \|u\|_{L^p(U)} + \left(\int_U w(x)|\nabla u(x)|^p\,d\mathcal L^n(x)\right)^{1/p}. \end{align*} [/definition] This is the convention used in this chapter: the function itself is controlled in the ordinary unweighted $L^p$ norm, while only the gradient is measured with the weight. It should be read as an energy-space convention for the examples below, not as the most general theorem about weighted Sobolev spaces. For arbitrary positive finite measurable weights, completeness, extension, trace, and density properties may require additional hypotheses on $w$ and on $U$. Other texts write weighted Sobolev spaces with $\int_U w(|u|^p+|\nabla u|^p)\,d\mathcal L^n$ instead; those spaces can have different density properties when $w$ degenerates or blows up. The weight changes the cost of approximation. If $w$ is very small, gradients can oscillate or concentrate with little energy; if $w$ is very large near a set, smooth approximations may be unable to pass through that set without paying a large gradient cost. The definition keeps $w$ positive and finite almost everywhere so that the weak derivative remains an ordinary distributional derivative; weights that vanish or become infinite on positive-measure sets are better treated as degenerate energies or constrained problems rather than as this normed Sobolev space. Density is therefore a theorem that must be proved from both the geometry of $U$ and the behaviour of $w$, and the next result gives a warning case in which the expected density statement is false. [quotetheorem:8759] [citeproof:8759] The example separates several hypotheses that are often hidden in the phrase "smooth functions are dense." 
The failure is not caused by a jump across an interior hypersurface; such a jump would not belong to $W^{1,p}$ because its distributional derivative contains a surface measure. Instead, the obstruction is capacitary: the two components touch at a boundary point that smooth-up-to-the-boundary functions must identify. If $p<n$, the same point has zero $W^{1,p}$ capacity and a cutoff in a ball of radius $r$ has energy of order $r^{n-p}$, so the displayed obstruction disappears. If the comparison class is changed from $C^\infty(\bar U)$ to $C^\infty(U)\cap W^{1,p}(U)$, Meyers-Serrin density restores approximation on arbitrary open sets. If the domain is Lipschitz and the weight belongs to a suitable Muckenhoupt class, standard extension and mollification arguments again give density. Weighted Lavrentiev examples use the same principle with the weight, rather than the boundary contact, creating positive capacity for a set that approximation must cross. Since the direct method controls the energy norm, not merely the background Sobolev norm, this stronger density statement is the relevant one. [example: Comparison of Ordinary and Weighted Approximation] Let $U=(-1,1)$, fix $1\le p<\infty$, and take the weight $w(x)=|x|^{-\alpha}$ with $\alpha>0$. Choose a nonzero $\eta\in C_c^\infty((-1,1))$ such that $\eta'$ is supported away from $0$, and define \begin{align*} u_k(x)=k^{-1+1/p-\alpha/p}\eta(kx). \end{align*} Then $u_k\in C_c^\infty((-1,1))$ and $u_k\to0$ in the ordinary $W^{1,p}(-1,1)$ norm. Indeed, \begin{align*} \int_{-1}^{1}|u_k(x)|^p\,dx = k^{-p+1-\alpha}\int_{-1}^{1}|\eta(kx)|^p\,dx. \end{align*} With $y=kx$, so that $dx=dy/k$, this becomes \begin{align*} \int_{-1}^{1}|u_k(x)|^p\,dx = k^{-p-\alpha}\int_{-1}^{1}|\eta(y)|^p\,dy\to0. 
\end{align*} Also \begin{align*} u_k'(x)=k^{-1+1/p-\alpha/p}\cdot k\eta'(kx) = k^{1/p-\alpha/p}\eta'(kx), \end{align*} and therefore \begin{align*} \int_{-1}^{1}|u_k'(x)|^p\,dx = k^{1-\alpha}\int_{-1}^{1}|\eta'(kx)|^p\,dx = k^{-\alpha}\int_{-1}^{1}|\eta'(y)|^p\,dy\to0. \end{align*} The weighted gradient term behaves differently. Using the same substitution $y=kx$, \begin{align*} \int_{-1}^{1}|x|^{-\alpha}|u_k'(x)|^p\,dx = \int_{-1}^{1}|x|^{-\alpha}k^{1-\alpha}|\eta'(kx)|^p\,dx. \end{align*} Since $|x|^{-\alpha}=k^\alpha|y|^{-\alpha}$ after $y=kx$, this gives \begin{align*} \int_{-1}^{1}|x|^{-\alpha}|u_k'(x)|^p\,dx = \int_{-1}^{1}|y|^{-\alpha}|\eta'(y)|^p\,dy. \end{align*} The last number is positive and finite because $\eta'$ is nonzero and supported away from $0$. Thus $u_k\to0$ in the ordinary Sobolev norm, but the weighted gradient contribution does not tend to $0$. The example shows that ordinary Sobolev approximation does not control approximation in the weighted energy norm when the weight penalizes gradients near the concentration point. [/example] The example explains why Lavrentiev gaps are not merely pathologies of strange boundary data. They arise when the natural topology of the energy is stronger, or differently placed, than the topology in which smooth functions are usually dense. This leads to nonstandard growth, where the exponent itself changes across the domain. ## Lavrentiev Phenomenon for Nonstandard Growth Many modern energies do not have a single power growth condition. They may behave like $|\nabla u|^p$ in one region and like $|\nabla u|^q$ in another, or have an exponent depending on the point. The direct method still asks for coercivity and lower semicontinuity, but density becomes a separate structural condition. [definition: Variable Exponent Energy] Let $U\subset\mathbb R^n$ be open, let $p:U\to[1,\infty)$ be measurable, and let $X$ be an admissible class contained in the variable exponent Sobolev space $W^{1,p(\cdot)}(U)$. 
A variable exponent energy is a functional $I:X\to(-\infty,\infty]$ of the form \begin{align*} I[u]=\int_U F(x,\nabla u(x))\,d\mathcal L^n(x), \end{align*} where $F:U\times\mathbb R^n\to[0,\infty]$ is measurable in $x$, continuous in $\xi$, and satisfies a two-sided $p(x)$-growth estimate: there are constants $0<c\le C<\infty$ and functions $a,b\in L^1(U)$ with $a,b\ge0$ such that \begin{align*} c|\xi|^{p(x)}-a(x)\le F(x,\xi)\le C(1+|\xi|^{p(x)})+b(x) \end{align*} for all $\xi\in\mathbb R^n$ and for $\mathcal L^n$-a.e. $x\in U$. [/definition] The definition above only specifies the integral energy. Structural results for $W^{1,p(\cdot)}$ usually impose more on the exponent, such as an essential upper bound $p^+<\infty$, and reflexive compactness arguments often require $p^-:=\operatorname*{ess\,inf}_U p>1$. Approximation theorems may also ask for log-Hölder type continuity or related modular estimates. The exponent controls both compactness and approximation. If $p(x)$ changes too abruptly, smoothing a function mixes regions with different gradient costs. A transition that is cheap on the Sobolev side may become expensive after mollification because the smoothed gradient is sampled in the higher-growth region. [example: Discontinuous Exponent Warning] Let $U=(-1,1)$ and define \begin{align*} p(x)=2\mathbb 1_{(-1,0)}(x)+4\mathbb 1_{(0,1)}(x). \end{align*} For an absolutely continuous $u$, the modular separates into two different gradient costs: \begin{align*} \int_U |u'(x)|^{p(x)}\,dx=\int_{-1}^{0}|u'(x)|^2\,dx+\int_{0}^{1}|u'(x)|^4\,dx. \end{align*} Thus a derivative profile that is affordable on the left can become expensive if smoothing moves part of it to the right. For example, choose $\beta$ with $\frac14\le\beta<\frac12$ and, near $0$ on the left, let \begin{align*} u'(x)=(-x)^{-\beta}\quad\text{for }-\frac12<x<0, \end{align*} and take $u'(x)=0$ for $0<x<\frac12$. 
The left contribution is finite because \begin{align*} \int_{-1/2}^{0}|u'(x)|^2\,dx=\int_{-1/2}^{0}(-x)^{-2\beta}\,dx=\int_{0}^{1/2}t^{-2\beta}\,dt<\infty \end{align*} since $2\beta<1$. The right contribution is zero on $(0,\frac12)$ because $u'=0$ there. But if a symmetric mollification averages across $0$, then for small positive $x$ the derivative of the smoothed function contains averages of $(-y)^{-\beta}$ over negative $y$ close to $0$. A model size estimate is \begin{align*} u_\varepsilon'(x)\approx \varepsilon^{-1}\int_{-\varepsilon}^{0}(-y)^{-\beta}\,dy=\varepsilon^{-1}\int_{0}^{\varepsilon}t^{-\beta}\,dt=\frac{1}{1-\beta}\varepsilon^{-\beta}. \end{align*} On an interval of length comparable to $\varepsilon$ on the right, the fourth-power cost then has size \begin{align*} \int_{0}^{c\varepsilon}|u_\varepsilon'(x)|^4\,dx\approx c\varepsilon\left(\frac{1}{1-\beta}\varepsilon^{-\beta}\right)^4=\frac{c}{(1-\beta)^4}\varepsilon^{1-4\beta}. \end{align*} When $\beta=\frac14$, this remains bounded away from $0$; when $\beta>\frac14$, it grows as $\varepsilon\downarrow0$. This calculation does not prove a Lavrentiev gap for the displayed modular by itself, because the gap also depends on the boundary data, admissible class, and allowed approximation procedure. It shows the diagnostic obstruction: smoothing near a jump in $p(x)$ can transfer gradients from a lower-growth region into a higher-growth region, so discontinuity of the exponent must be controlled before density or energy convergence is asserted. [/example] The warning example shows the mechanism that prevention theorems are designed to exclude, but it is not itself a verified gap. It leaves open the central approximation question: under which hypotheses can every Sobolev competitor be replaced by smooth zero-boundary perturbations without changing the energy? 
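Before turning to that question, the size estimate from the warning example can be tabulated directly. The short Python sketch below evaluates the model cost $\frac{c}{(1-\beta)^4}\varepsilon^{1-4\beta}$ for a few values of $\varepsilon$. The helper names and the choice $c=1$ are illustrative assumptions, and the numbers reproduce only the heuristic estimate, not an actual mollification.

```python
def model_right_cost(eps, beta, c=1.0):
    """Heuristic fourth-power cost on (0, c*eps) from the example.

    The smoothed derivative is modelled by eps**(-beta) / (1 - beta).
    """
    smoothed_slope = eps ** (-beta) / (1.0 - beta)
    return c * eps * smoothed_slope ** 4


def cost_table(beta, eps_values=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Return (eps, cost) pairs for a fixed exponent beta."""
    return [(eps, model_right_cost(eps, beta)) for eps in eps_values]


if __name__ == "__main__":
    for beta in (0.25, 0.4):
        print(beta, cost_table(beta))
```

With $\beta=\frac14$ every entry equals $(4/3)^4$ up to rounding, while for $\beta=0.4$ the entries grow as $\varepsilon$ decreases, matching the two cases distinguished in the example.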
A standard repair is to assume ordinary $p$-growth together with enough continuity of the integrand on bounded ranges of the function and gradient variables; then truncation controls the admissible perturbation, mollification controls $\nabla u$, and the integral functional is continuous along the resulting strong $W^{1,p}$ approximations. [quotetheorem:8760] [citeproof:8760] Each hypothesis repairs a specific approximation obstruction. The bounded open set $U$ fixes the Sobolev space and the zero-boundary perturbation class $W^{1,p}_0(U)$, while the datum $u_0\in W^{1,p}(U)$ fixes the affine admissible class being approximated by functions of the form $u_0+\varphi$ with $\varphi\in C_c^\infty(U)$. Continuity of $F$ prevents small changes in $(x,u,\nabla u)$ from producing uncontrolled jumps in the integrand along mollified approximants. The upper $p$-growth bound gives uniform integrability of the energies along strongly convergent Sobolev approximations; without a bound of this kind, smoothing can move gradient mass into a region where the energy cost is much larger, as in the preceding variable-growth warning. The theorem is therefore an energy-density approximation result, not a lower-semicontinuity theorem and not a convexity criterion. It does not cover discontinuous variable exponents, singular weights, or super-$p$ growth, which is why those cases can still exhibit Lavrentiev gaps. Once this approximation property is known, the direct method and the classical smooth formulation are aligned. [remark: Role of Log-Hölder Type Conditions] In variable exponent problems, assumptions such as log-Hölder continuity of $p(x)$ are often used to recover good approximation and modular estimates. Their role is to prevent abrupt jumps in the growth rate from being exploited by Sobolev competitors. The exact condition depends on the function space and the boundary class, but the principle is that the exponent must vary slowly enough for mollification to behave predictably. 
[/remark] The absence of a gap is therefore an approximation theorem, not a formal consequence of convexity. In applications, it must be checked at the same level of seriousness as coercivity and lower semicontinuity. ## What the Direct Method Proves The direct method is sometimes summarized as coercivity plus lower semicontinuity. This slogan is useful for existence, but it hides the distinction between existence in a relaxed Sobolev class and fidelity to the original variational problem. The final task of this chapter is to separate these conclusions. [quotetheorem:8761] [citeproof:8761] The assumptions are exactly the compactness package needed for this argument, and each has a concrete failure mode. Reflexivity supplies weakly convergent subsequences; in $X=L^1(0,1)$, a minimizing sequence for $I[u]=\int_0^1 xu(x)\,dx$ over nonnegative $u$ with $\|u\|_{L^1}=1$, such as $u_k=k\mathbb 1_{(0,1/k)}$ with $I[u_k]=\frac{1}{2k}$, concentrates near $0$ and has no weakly convergent subsequence in $L^1$. Weak closedness prevents the limit from leaving the admissible class; in $X=\mathbb R$, the continuous functional $I[x]=x$ on the admissible set $\mathcal A=(0,1)$ has infimum $0$, approached by $x_k=1/k$, but the [limit point](/page/Limit%20Point) $0$ is not admissible. Coercivity turns minimizing sequences into bounded sequences; for $I[x]=e^x$ on $X=\mathbb R$, the infimum is $0$ but the minimizing sequence $x_k=-k$ escapes to infinity. Lower semicontinuity passes the inequality to the limit; for $I[x]=|x|$ when $x\ne0$ and $I[0]=1$ on $\mathbb R$, the functional is coercive with infimum $0$, but the minimizing sequence $x_k=1/k\to0$ has $\liminf I[x_k]=0<I[0]$, so the infimum is not attained. This theorem gives a minimizer in the class and topology used in its hypotheses. It does not identify that minimizer with a limit of smooth competitors, does not prove regularity, and does not guarantee that the Euler-Lagrange equation derived on smooth variations captures the full minimization problem. 
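The three failure modes on $\mathbb R$ can be replayed numerically. This Python sketch is only an illustration under stated assumptions: the function names are hypothetical, and for the lower semicontinuity case it uses the coercive functional $I[x]=|x|$ for $x\ne0$ with $I[0]=1$, so that the only missing hypothesis is lower semicontinuity at $0$.

```python
import math


def I_open_interval(x):
    """I[x] = x on the admissible set (0, 1), which is not closed."""
    return x


def I_not_coercive(x):
    """I[x] = exp(x) on R: bounded below but not coercive."""
    return math.exp(x)


def I_not_lsc(x):
    """I[x] = |x| for x != 0 and I[0] = 1: coercive, not lsc at 0."""
    return 1.0 if x == 0 else abs(x)


# Minimizing sequences for the three problems (k starts at 2 so that
# 1/k lies strictly inside the open interval (0, 1)).
ks = range(2, 32)
seq_open = [1.0 / k for k in ks]      # converges to 0, outside (0, 1)
seq_escape = [-float(k) for k in ks]  # unbounded: escapes to -infinity
seq_jump = [1.0 / k for k in ks]      # converges to 0, where I jumps up

values_open = [I_open_interval(x) for x in seq_open]
values_escape = [I_not_coercive(x) for x in seq_escape]
values_jump = [I_not_lsc(x) for x in seq_jump]

if __name__ == "__main__":
    print(values_open[-1], values_escape[-1], values_jump[-1], I_not_lsc(0.0))
```

In each case the values decrease toward the infimum $0$ without reaching it: the candidate limit is either inadmissible, at infinity, or a point where the functional jumps up.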
[explanation: Three Separate Questions] Existence asks whether the infimum is attained in the chosen weakly closed class. Approximation asks whether smooth or otherwise regular competitors are dense in the energy sense. Regularity asks whether the minimizer obtained by compactness has additional smoothness or satisfies a stronger equation. The direct method answers the first question under coercivity and lower semicontinuity. The second question is answered by density and relaxation theory. The third question requires regularity estimates, Euler-Lagrange analysis, or structural assumptions beyond the direct method. [/explanation] The practical consequence is that the admissible class is part of the problem data. Changing from smooth functions to Sobolev functions is not just a technical convenience unless an approximation theorem justifies it. Without such a theorem, the direct method may solve the relaxed problem rather than the classical one. This distinction also appears outside the model examples of this chapter. In weak formulations of elliptic PDE, the chosen energy space determines which boundary values and test functions are legitimate. In numerical approximation, conforming finite element spaces inherit the same question: convergence of discrete minimizers to the Sobolev minimizer does not by itself identify the smooth or classical infimum. In geometric measure theory, relaxation replaces smooth surfaces by currents or sets of finite perimeter for the same reason: compactness is gained only after enlarging the class, and the relaxed value must then be compared with the original problem. [example: Smooth Versus Sobolev Competitors] For the Dirichlet energy \begin{align*} I[u]=\int_U |\nabla u|^2\,d\mathcal L^n \end{align*} on the affine Sobolev class $u_0+H^1_0(U)$, write \begin{align*} m=\inf_{u\in u_0+H^1_0(U)} I[u],\qquad m_{\mathrm{sm}}=\inf_{\varphi\in C_c^\infty(U)} I[u_0+\varphi]. 
\end{align*} Since $u_0+C_c^\infty(U)\subset u_0+H^1_0(U)$, we immediately have $m\le m_{\mathrm{sm}}$. For the reverse inequality, fix $u=u_0+v\in u_0+H^1_0(U)$ and choose $\varphi_j\in C_c^\infty(U)$ with $\varphi_j\to v$ strongly in $H^1_0(U)$. Set $u_j=u_0+\varphi_j$. Then $\nabla u_j\to\nabla u$ in $L^2(U;\mathbb R^n)$, and the quadratic identity \begin{align*} |\nabla u_j|^2-|\nabla u|^2=(\nabla u_j-\nabla u)\cdot(\nabla u_j+\nabla u) \end{align*} gives \begin{align*} |I[u_j]-I[u]|\le \|\nabla u_j-\nabla u\|_{L^2(U)}\bigl(\|\nabla u_j\|_{L^2(U)}+\|\nabla u\|_{L^2(U)}\bigr) \end{align*} by Cauchy-Schwarz. The first factor tends to $0$, and the second factor stays bounded because $\nabla u_j\to\nabla u$ in $L^2$. Hence $I[u_j]\to I[u]$. If $I[u]\le m+\varepsilon$, then some smooth competitor $u_j$ satisfies $I[u_j]\le m+2\varepsilon$, so $m_{\mathrm{sm}}\le m+2\varepsilon$. Letting $\varepsilon\downarrow0$ gives $m_{\mathrm{sm}}\le m$, and therefore $m=m_{\mathrm{sm}}$. For a weighted energy such as $\int_U w(x)|\nabla u|^2\,d\mathcal L^n$ or a variable-growth energy such as $\int_U |\nabla u|^{p(x)}\,d\mathcal L^n$, the same argument breaks at the convergence step: $\nabla u_j\to\nabla u$ in ordinary $L^2$ need not imply $w^{1/2}\nabla u_j\to w^{1/2}\nabla u$ in $L^2$, and convergence in one fixed Sobolev norm need not control the modular with a changing exponent. Thus two energies with the same formal Euler-Lagrange shape can have different infima after passing from smooth competitors to Sobolev competitors. [/example] The chapter closes the direct-method part of the course with a caution. Compactness and lower semicontinuity are powerful enough to prove existence, but they do not by themselves protect the original smooth problem from relaxation effects. Lavrentiev phenomena measure exactly this loss: the analytic completion of the admissible class changes the value of the variational problem. 
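The failure of the convergence step for weighted energies can be seen numerically on the scaling family $u_k(x)=k^{-1+1/p-\alpha/p}\eta(kx)$ from the earlier weighted approximation example, here with $p=2$ and weight $|x|^{-\alpha}$. The sketch below is illustrative only: the concrete profile chosen for $|\eta'|$ (a standard smooth bump supported in $\frac14<|y|<\frac34$), the function names, and the quadrature resolution are assumptions not fixed by the text.

```python
import math


def bump(s):
    """Smooth bump exp(-1 / (1 - s^2)), supported in (-1, 1)."""
    d = 1.0 - s * s
    return math.exp(-1.0 / d) if d > 0.0 else 0.0


def abs_eta_prime(y):
    """Model for |eta'(y)|: supported in 1/4 < |y| < 3/4, away from 0."""
    return bump((abs(y) - 0.5) / 0.25)


def gradient_integrals(k, alpha, p=2.0, n=40000):
    """Midpoint-rule values of the unweighted and weighted integrals of
    |u_k'|^p on (-1, 1), with |u_k'(x)| = k^(1/p - alpha/p) |eta'(kx)|
    and weight |x|^(-alpha)."""
    h = 2.0 / n
    scale = k ** (1.0 / p - alpha / p)  # amplitude of u_k'
    plain = 0.0
    weighted = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        g = (scale * abs_eta_prime(k * x)) ** p
        if g == 0.0:
            continue  # outside the support of eta'(k x)
        plain += g * h
        weighted += abs(x) ** (-alpha) * g * h
    return plain, weighted


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, gradient_integrals(k, alpha=0.5))
```

For $\alpha=\frac12$, the unweighted value should shrink like $k^{-\alpha}$ while the weighted value stays essentially constant in $k$, so convergence of gradients in ordinary $L^2$ does not control the weighted gradient term.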
Lavrentiev phenomena show that existence in a weak class does not automatically recover the classical smooth problem. The next chapter separates that existence question from post-existence regularity, asking what additional structure can make minimizers smoother after they have been found. # 11. Regularity as a Post-Existence Question Regularity enters the course only after the direct method has done its job. Chapters 2-10 built existence from coercivity, compactness, and weak lower semicontinuity, often in function spaces where minimizers are only known as weak limits. This chapter asks what extra information the variational structure gives after existence: when does a minimizer become more regular, and when must singularities be accepted as part of the problem? The guiding distinction is between existence hypotheses and regularity hypotheses. Convexity, ellipticity, and growth conditions may produce estimates that improve a minimizer locally, but they do not erase the vectorial obstructions seen in quasiconvexity and geometric constraints. ## Minimal Regularity from Convexity and Ellipticity Once a minimizer $u \in W^{1,p}(U)$ has been constructed, the first regularity question is not whether $u$ is smooth, but whether its gradient has any local quantitative control beyond integrability. The basic mechanism is comparison with local perturbations: replace $u$ by $u-\eta(u-c)$ inside a ball and use minimality to estimate energy on a smaller ball by oscillation on a larger ball. To state the estimate in a form that matches the direct method, we isolate the integrand assumptions that give both existence and local energy control. Coercive growth prevents minimizing sequences from escaping in $W^{1,p}$; without it, a minimizing sequence may drive $|\nabla u_k|$ to infinity while the energy stays bounded. 
Convexity and ellipticity are the local inputs: without them, comparison with a cut-off competitor does not force energy on a smaller ball to be controlled by oscillation on a larger ball. [definition: Convex P-Growth Integrand] Let $1<p<\infty$ and let $F:\mathbb R^{m\times n}\to \mathbb R$ be a $C^1$ convex function. The integrand $F$ has convex $p$-growth and $p$-ellipticity if there exist constants $0<\lambda\le \Lambda<\infty$ such that for all $A,B\in \mathbb R^{m\times n}$, \begin{align*} \lambda |A|^p-\Lambda \le F(A)\le \Lambda(1+|A|^p) \end{align*} and \begin{align*} (DF(A)-DF(B)):(A-B)\ge \lambda |V_p(A)-V_p(B)|^2, \end{align*} where \begin{align*} V_p:\mathbb R^{m\times n}\to \mathbb R^{m\times n},\qquad V_p(A)=(1+|A|^2)^{(p-2)/4}A. \end{align*} [/definition] The lower bound is the coercive part already used in the direct method, while $p$-ellipticity is the new local ingredient. The next theorem is needed because regularity arguments require estimates on smaller balls whose constants depend only on the structure of the functional, not on the particular minimizer. [quotetheorem:8762] [citeproof:8762] This inequality is the regularity analogue of coercivity: it is local, scale-sensitive, and stable under replacing the centre value $c$ by an average. Each hypothesis has a distinct role. Local minimality is what permits the cut-off comparison; a general weak solution or arbitrary Sobolev map need not satisfy the estimate. Coercivity turns energy into control of $|\nabla u|^p$; if the lower growth bound is removed, finite energy need not prevent concentration of gradients. Convexity and ellipticity make the comparison quantitative; without them, nonconvex wells can favour oscillating gradients rather than penalizing them, so the displayed estimate is not forced. The estimate also has a precise limitation: by itself it gives local integral control, not continuity of $u$, continuity of $\nabla u$, or higher differentiability. 
Combining it with Poincare-type inequalities, reverse Hölder inequalities, or scalar iteration is what produces stronger regularity conclusions in later steps. [example: Scalar Uniformly Convex P-Growth Energy] Let $F:\mathbb R^n\to\mathbb R$ be given by \begin{align*} F(A)=\frac{1}{p}(1+|A|^2)^{p/2}. \end{align*} For $B\in\mathbb R^n$, differentiating the function $t\mapsto F(A+tB)$ at $t=0$ gives \begin{align*} DF(A)\cdot B=(1+|A|^2)^{(p-2)/2}A\cdot B. \end{align*} Differentiating once more gives \begin{align*} D^2F(A)[B,B]=(1+|A|^2)^{(p-2)/2}|B|^2+(p-2)(1+|A|^2)^{(p-4)/2}(A\cdot B)^2. \end{align*} If $p\ge 2$, the second term is nonnegative, so \begin{align*} D^2F(A)[B,B]\ge (1+|A|^2)^{(p-2)/2}|B|^2. \end{align*} If $1<p<2$, then $(A\cdot B)^2\le |A|^2|B|^2$, hence \begin{align*} D^2F(A)[B,B]\ge (1+|A|^2)^{(p-4)/2}(1+(p-1)|A|^2)|B|^2. \end{align*} Since $1+(p-1)|A|^2\ge (p-1)(1+|A|^2)$, this becomes \begin{align*} D^2F(A)[B,B]\ge (p-1)(1+|A|^2)^{(p-2)/2}|B|^2. \end{align*} Thus the regularization by $1+|A|^2$ prevents degeneracy at $A=0$, and this model satisfies the same $V_p$-ellipticity scale as in the convex $p$-growth hypothesis. Now let $u\in W^{1,p}(U)$ minimize \begin{align*} \int_U F(\nabla v)\,d\mathcal L^n \end{align*} among functions with fixed boundary trace. For $B(x_0,R)\subset U$ and $0<r<R$, apply *Caccioppoli Inequality for Convex Integral Functionals* with \begin{align*} c=u_{B(x_0,R)}=\frac{1}{\mathcal L^n(B(x_0,R))}\int_{B(x_0,R)}u\,d\mathcal L^n. \end{align*} This gives \begin{align*} \int_{B(x_0,r)}|\nabla u|^p\,d\mathcal L^n\le \frac{C}{(R-r)^p}\int_{B(x_0,R)}|u-u_{B(x_0,R)}|^p\,d\mathcal L^n+CR^n. \end{align*} By *Poincare inequality* on the ball $B(x_0,R)$, \begin{align*} \int_{B(x_0,R)}|u-u_{B(x_0,R)}|^p\,d\mathcal L^n\le C_P R^p\int_{B(x_0,R)}|\nabla u|^p\,d\mathcal L^n. 
\end{align*} Substituting this into the Caccioppoli estimate yields \begin{align*} \int_{B(x_0,r)}|\nabla u|^p\,d\mathcal L^n\le \frac{CC_P R^p}{(R-r)^p}\int_{B(x_0,R)}|\nabla u|^p\,d\mathcal L^n+CR^n. \end{align*} For example, taking $r=R/2$ gives \begin{align*} \int_{B(x_0,R/2)}|\nabla u|^p\,d\mathcal L^n\le 2^pCC_P\int_{B(x_0,R)}|\nabla u|^p\,d\mathcal L^n+CR^n. \end{align*} The estimate converts oscillation of $u$ on a larger ball into quantitative control of $\nabla u$ on a smaller ball, which is the local input used before boundedness and Hölder iteration arguments. [/example] The scalar case has an additional PDE structure: the Euler-Lagrange equation is a scalar elliptic equation, and scalar elliptic theory has order and comparison principles that do not survive in the same form for systems. The next quoted theorem records the regularity conclusion that this extra scalar structure provides, so that we can compare it with the weaker vectorial conclusions below. [quotetheorem:8763] This theorem is quoted from scalar elliptic regularity rather than proved in this course. Its hypotheses mark the edge of the result, and concrete examples show what each boundary means. Uniform ellipticity prevents degeneration: if $a(x)=0$ on a ball and $a(x)=I$ outside it, then any $H^1_0$ function supported inside the zero-coefficient ball solves the displayed equation weakly, so the equation gives no interior Hölder control there. The scalar assumption is also essential; elliptic systems with bounded measurable coefficients have counterexamples, due to De Giorgi, where weak solutions are discontinuous. 
The bounded measurable coefficient hypothesis is deliberately weak and therefore gives only [Hölder continuity](/page/H%C3%B6lder%20Continuity): for example, in the upper and lower half-balls take $a=1$ above $\{x_n=0\}$ and $a=\kappa$ below it, with $\kappa\ne 1$; the piecewise affine function with matching flux across the interface is a weak solution but its gradient jumps across the interface. Thus measurability is compatible with Hölder regularity, while differentiability requires stronger coefficient assumptions. [remark: Regularity Uses More Than Existence] The direct method for the scalar Dirichlet integral needs weak compactness, coercivity, and weak lower semicontinuity. Hölder regularity needs local energy inequalities, ellipticity, and iteration methods. These are different layers of the theory, and failure at the second layer does not invalidate existence at the first. [/remark] ## Partial Regularity in the Vectorial Case The next question is what remains when the unknown is vector-valued and convexity is replaced by quasiconvexity. Existence can still be proved under quasiconvexity and growth assumptions, but regularity no longer follows from a scalar maximum principle or comparison argument. The best general statement is partial regularity: the minimizer is smooth away from a singular set whose size can be controlled. To formulate such a result, we need language that treats regularity point by point. A Sobolev minimizer may behave well on one region and badly on another, so the regular set and singular set become part of the conclusion rather than extra assumptions. [definition: Regular and Singular Points of a Sobolev Minimizer] Let $u\in W^{1,p}_{\mathrm{loc}}(U;\mathbb R^m)$. A point $x_0\in U$ is a $C^{1,\alpha}$ regular point of $u$ if there exist $r>0$ and $\alpha\in(0,1)$ such that $u$ has a representative in $C^{1,\alpha}(B(x_0,r);\mathbb R^m)$. 
The singular set of $u$ is \begin{align*} \operatorname{Sing}(u)=U\setminus \operatorname{Reg}(u), \end{align*} where $\operatorname{Reg}(u)$ is the set of regular points. [/definition] This definition separates the local nature of regularity from the global existence statement. In vectorial problems, even minimizers of well-structured quasiconvex energies can develop singularities, so the realistic question is not whether every point is smooth but whether the bad points are confined to a small exceptional set. [quotetheorem:8764] This theorem is quoted in statement form as a standard structural version of partial regularity. The proof belongs to vectorial regularity theory: it combines blow-up, excess decay, harmonic approximation, and fine properties of quasiconvex integrands. The assumptions explain why the conclusion is partial rather than global, and each one rules out a concrete pathology. Quasiconvexity is the lower-semicontinuity hypothesis that replaces convexity in vectorial problems. In one dimension, where quasiconvexity reduces to convexity, the nonconvex double-well density $f(s)=(s^2-1)^2$ allows fine sawtooth competitors with slopes close to $1$ and $-1$; this illustrates how oscillating gradients can lower the relaxed energy seen by weak limits. The $C^2$ assumption is what permits a second-order expansion around an approximate affine map. For instance, an integrand such as $F(A)=|A|^p+|A_{11}|$ has a corner along $\{A_{11}=0\}$, so the quadratic linearization used in harmonic approximation is not available there. Uniform Legendre-Hadamard ellipticity supplies the elliptic linear model; if $F(A)=|A_{11}|^2$ in a problem with several gradient components, then variations in the uncontrolled components carry no quadratic cost, so excess decay cannot control the full gradient. 
The $p$-growth bounds keep the blow-up sequence in the correct Sobolev scale: a bounded density such as $F(A)=\arctan |A|$ gives no coercive control of $|\nabla u|^p$, while super-$p$ growth changes the compactness scale after rescaling. Even with the displayed hypotheses, full regularity is false in general: constrained harmonic maps such as the degree-one radial map show that vectorial geometric minimizers can have isolated singularities. [example: Vectorial Minimizer with a Possible Singular Set] Let $U=B(0,1)\subset\mathbb R^3$ and consider the constrained class of maps $v\in W^{1,2}(U;S^2)$ whose trace on $\partial U$ is $v(x)=x/|x|$. The map \begin{align*} u(x)=\frac{x}{|x|} \end{align*} belongs to $W^{1,2}(U;S^2)$: for $x\ne 0$, writing $r=|x|$, its components satisfy \begin{align*} \partial_j u_i=\partial_j\left(\frac{x_i}{r}\right)=\frac{\delta_{ij}}{r}-\frac{x_i x_j}{r^3}. \end{align*} Since $u_i=x_i/r$, this is \begin{align*} \partial_j u_i=\frac{1}{r}(\delta_{ij}-u_i u_j). \end{align*} Thus \begin{align*} \nabla u=\frac{1}{r}(I-u\otimes u). \end{align*} The matrix $P=I-u\otimes u$ is the [orthogonal projection](/theorems/437) onto the plane perpendicular to $u$, so its eigenvalues are $1,1,0$. Hence \begin{align*} |\nabla u|^2=\frac{|P|^2}{r^2}=\frac{1^2+1^2+0^2}{r^2}=\frac{2}{r^2}. \end{align*} Using spherical coordinates in $\mathbb R^3$, \begin{align*} E[u]=\int_{B(0,1)}|\nabla u|^2\,d\mathcal L^3=\int_0^1\int_{S^2}\frac{2}{r^2}r^2\,d\mathcal H^2\,dr=2\mathcal H^2(S^2)\int_0^1 1\,dr=8\pi. \end{align*} By *Minimality of the Degree-One Radial Harmonic Map*, this $u$ minimizes the Dirichlet energy in the constrained class with boundary trace $x/|x|$. The map is smooth on $U\setminus\{0\}$ because each component $x_i/|x|$ is a smooth quotient there. It has no continuous representative at $0$: if $a\in S^2$ and $t\in(0,1)$, then \begin{align*} u(ta)=\frac{ta}{|ta|}=a. 
\end{align*} Choosing two different unit vectors $a,b\in S^2$ gives two sequences $ta\to 0$ and $tb\to 0$ along which the values of $u$ converge to $a$ and $b$, respectively. Therefore no single value assigned at $0$ can make $u$ continuous there. This realizes the partial-regularity picture with regular set $U_0=U\setminus\{0\}$ and singular set $\{0\}$: the exceptional set has measure zero, but it is forced by the vectorial constraint. [/example] The previous statement is deliberately weaker than a scalar Hölder theorem. It also explains why the vectorial theory is not a minor modification of the scalar one: quasiconvexity is the existence condition, but it does not supply full regularity. [example: Harmonic Maps as a Warning Example] Take the concrete model $U=B(0,1)\subset\mathbb R^3$ with target $S^2$ and boundary trace $g(x)=x/|x|$ on $\partial U$. The radial map \begin{align*} u(x)=\frac{x}{|x|} \end{align*} satisfies $|u(x)|=|x|/|x|=1$ for $x\ne 0$, so it is admissible as an $S^2$-valued Sobolev map once we check finite energy. Writing $r=|x|$, for $x\ne 0$ we have \begin{align*} \partial_j u_i=\partial_j(x_i r^{-1})=\delta_{ij}r^{-1}+x_i\partial_j(r^{-1}). \end{align*} Since $\partial_j r=x_j/r$, the chain rule gives \begin{align*} \partial_j(r^{-1})=-r^{-2}\partial_j r=-\frac{x_j}{r^3}. \end{align*} Therefore \begin{align*} \partial_j u_i=\frac{\delta_{ij}}{r}-\frac{x_i x_j}{r^3}=\frac{1}{r}\left(\delta_{ij}-\frac{x_i}{r}\frac{x_j}{r}\right)=\frac{1}{r}(\delta_{ij}-u_i u_j). \end{align*} Thus $\nabla u=r^{-1}(I-u\otimes u)$. The matrix $I-u\otimes u$ is the orthogonal projection onto $u^\perp$, so its eigenvalues are $1,1,0$, and hence \begin{align*} |\nabla u|^2=\frac{|I-u\otimes u|^2}{r^2}=\frac{1^2+1^2+0^2}{r^2}=\frac{2}{r^2}. \end{align*} Using spherical coordinates in $\mathbb R^3$, \begin{align*} \int_{B(0,1)}|\nabla u|^2\,d\mathcal L^3=\int_0^1\int_{S^2}\frac{2}{r^2}r^2\,d\mathcal H^2\,dr=2\mathcal H^2(S^2)\int_0^1 1\,dr=8\pi. 
\end{align*} So $u\in W^{1,2}(B(0,1);S^2)$, and by the standard minimality theorem for the degree-one radial harmonic map, this $u$ minimizes the Dirichlet energy in the constrained class with trace $x/|x|$. The same formula shows the singularity is not an artifact of notation. If $a\in S^2$ and $t\in(0,1)$, then \begin{align*} u(ta)=\frac{ta}{|ta|}=\frac{ta}{t|a|}=a. \end{align*} For two distinct unit vectors $a,b\in S^2$, the sequences $ta\to 0$ and $tb\to 0$ give limiting values $a$ and $b$. No single value assigned at $0$ can make $u$ continuous there, although each component $x_i/|x|$ is smooth on $B(0,1)\setminus\{0\}$. Thus even the quadratic energy $\int |\nabla u|^2$ can have a singular minimizer when the admissible maps are constrained to lie on a nonlinear target. [/example] The harmonic-map example is not a contradiction of elliptic regularity. The Euler-Lagrange equation is a constrained nonlinear system, and the admissible variations must respect the target manifold. This places the problem in the vectorial and geometric regime, where partial regularity is the natural endpoint. ## Existence Theory and Regularity Theory The last question of the chapter is methodological: what exactly did the direct method prove, and what remains after it? The direct method proves that a minimizing sequence has a weakly convergent subsequence whose limit is admissible and attains the infimum. Regularity theory studies additional structure of that minimizer, usually through local estimates and the Euler-Lagrange equation. [explanation: The Two-Layer View] Existence is a compactness statement. Its inputs are an admissible class closed under the relevant convergence, coercivity or boundedness of minimizing sequences, and weak lower semicontinuity of the functional. Regularity is a stability and improvement statement. 
Its inputs are local comparison inequalities, ellipticity, decay estimates, and sometimes the special structure of scalar equations, convex integrands, or geometric constraints. The layers interact but do not coincide. A functional can have minimizers with poor regularity, and a formal Euler-Lagrange equation can be smooth in regimes where the direct method does not give compactness in the desired class. [/explanation] This separation helps organize the whole course. The warning that should accompany every existence result in the direct method is that attaining the infimum in a weak space is not the same thing as obtaining a classical solution. [remark: Existence Does Not Imply Full Regularity] The direct method uses weak compactness to pass from a minimizing sequence to a weak limit and lower semicontinuity to identify that limit as a minimizer. These arguments occur in spaces such as $W^{1,p}$ and do not control pointwise behaviour beyond what is already encoded by the space. A concrete model is the Dirichlet energy for maps into a sphere: compactness and lower semicontinuity can give an energy-minimizing harmonic map, while the topology of the boundary data may force a singularity at the origin, as in the radial map. This is why a [regularity theorem](/theorems/2750) needs additional hypotheses, such as scalar structure, uniform ellipticity, convexity strong enough to give Caccioppoli estimates, or an excess-decay mechanism. The limitation is also precise: the conclusion is not that minimizers are usually irregular, but that existence alone is silent about regularity. [/remark] This remark closes the conceptual gap between the direct-method existence results and the regularity results quoted in this chapter. 
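The radial map mentioned in the remark can be checked numerically. The following is a minimal Python sketch, not part of the course material: the helper names `radial_map`, `grad_norm_sq`, and `radial_energy` are hypothetical, the gradient is approximated by central differences, and the radial integral by a midpoint rule.

```python
import math

def radial_map(x):
    """u(x) = x/|x| for x != 0 in R^3 (hypothetical helper)."""
    r = math.sqrt(sum(c * c for c in x))
    return [c / r for c in x]

def grad_norm_sq(x, h=1e-6):
    """Squared Frobenius norm of grad u at x, by central differences."""
    total = 0.0
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        up, um = radial_map(xp), radial_map(xm)
        for i in range(3):
            d = (up[i] - um[i]) / (2.0 * h)
            total += d * d
    return total

def radial_energy(n=1000):
    """Midpoint rule for int_0^1 (2/r^2) * r^2 * H^2(S^2) dr."""
    area = 4.0 * math.pi  # surface area of the unit sphere S^2
    dr = 1.0 / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        total += (2.0 / r**2) * r**2 * area * dr
    return total

# Sample point with |x|^2 = 0.5, so the formula 2/r^2 predicts the value 4.
sample = [0.3, -0.4, 0.5]
predicted = 2.0 / sum(c * c for c in sample)
approximate = grad_norm_sq(sample)

# Values along two rays approaching the origin stay at distinct unit vectors.
along_e1 = radial_map([1e-9, 0.0, 0.0])
along_e2 = radial_map([0.0, 1e-9, 0.0])
```

Under these assumptions, the finite-difference value of $|\nabla u|^2$ at the sample point can be compared with $2/r^2$, the quadrature with $8\pi$, and the two ray values with each other. The last comparison illustrates why no single value at the origin makes the map continuous.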
In later applications the next step after existence is therefore to identify the local estimate available: a Caccioppoli inequality for convex energies, De Giorgi-Nash-Moser iteration for scalar uniformly elliptic equations, or partial regularity machinery for vectorial quasiconvex problems. The same split appears beyond the course: elliptic PDE systems use it to distinguish weak solvability from smoothness, geometric analysis uses it to isolate singular sets, and materials models use quasiconvexity to describe microstructure without expecting classical gradients everywhere. [remark: What To Carry Forward] When reading a variational existence theorem, record both its compactness assumptions and its structural assumptions. Coercivity, boundary closure, and lower semicontinuity answer the existence question. Convexity, ellipticity, scalar structure, and comparison estimates answer the post-existence regularity question. [/remark] The course now has its basic division of labour. Direct methods locate minimizers in weak spaces; regularity theory asks how much those minimizers improve. The answer ranges from Hölder continuity in scalar elliptic problems, through Caccioppoli-type estimates under convexity, to partial regularity and singular sets in vectorial quasiconvex problems. The existence theory developed so far shows how compactness, coercivity, lower semicontinuity, admissibility, and regularity fit together while also revealing their separate roles. The final chapter synthesizes these criteria into a decision framework for choosing the right direct-method hypothesis. # 12. Synthesis: Choosing the Right Existence Framework ## The Direct-Method Checklist These notes synthesize the existence theory developed in the second part of the calculus of variations course. 
The prerequisites are the Sobolev-space compactness and weak convergence tools from Chapters 1-2, the lower-semicontinuity criteria for integral functionals from Chapters 3-4, and the constraint mechanisms introduced in Chapters 5 and 7-9 on obstacles, quasiconvexity, polyconvexity, and nonlinear elasticity. The aim is not to add a new existence theorem, but to organize the earlier results into a workflow for deciding which framework fits a given variational problem. A variational problem usually arrives with more data than a single functional: there is an admissible class, a topology in which compactness is expected, boundary or integral constraints, and a structural hypothesis on the energy density. The first question is therefore not "what is the Euler-Lagrange equation?" but "which topology makes minimizing sequences compact and which theorem makes the energy lower semicontinuous?" To make that question precise, the functional, class, and convergence must be recorded together. This prevents a common mistake: proving a compactness statement in one topology while using a closedness or lower-semicontinuity statement that belongs to another. [definition: Admissible Direct-Method Problem] Let $X$ be a Banach space, let $\tau$ be a topology on $X$, let $\mathcal A \subset X$ be nonempty, and let $I:\mathcal A \to (-\infty,\infty]$ be a functional. The tuple $(X,\tau,\mathcal A,I)$ is an admissible direct-method problem if minimization is studied through sequences or nets in $\mathcal A$ with respect to $\tau$. [/definition] The point of naming the whole tuple is that existence is not a property of $I$ alone. The same formula may be coercive in one space and useless in another, while a constraint may be closed for strong convergence but fail to be closed for weak convergence. This motivates the following template theorem, which isolates the exact hypotheses that turn a minimizing sequence into a minimizer. 
[quotetheorem:8765] [citeproof:8765] This theorem is deliberately abstract, and each hypothesis excludes a distinct mode of failure. If $I$ is not bounded below, for instance $I[u]=-\|u\|_{L^2(U)}^2$ on $L^2(U)$, the infimum is $-\infty$ and there is no value for a minimizer to attain. If compactness fails, as for the functional $I[u]=e^{-u}$ on $\mathbb R$, the infimum $0$ is finite but not attained, because every minimizing sequence escapes to $+\infty$. If closedness fails, minimizing $|x|^2$ over $(0,1)\subset\mathbb R$ drives the sequence to $0$, which is outside the admissible class. If lower semicontinuity fails, a weak limit can have larger energy than the limiting energy of the oscillating sequence that approaches it, which is the mechanism behind nonattainment for nonconvex gradient energies. The theorem also says less than some existence arguments need. It does not identify Euler-Lagrange equations, uniqueness, regularity, or stability under perturbation of the data. In Sobolev-space problems, the compactness step is usually obtained from coercivity and reflexivity; closedness is a trace, boundary, or constraint issue; lower semicontinuity is the structural part of the problem. A basic Dirichlet example shows how the four hypotheses are checked in practice. [example: Dirichlet Energy With Forcing] Let $U\subset\mathbb R^n$ be bounded with Lipschitz boundary, let $1<p<\infty$, let $g\in W^{1,p}(U)$, and let $f\in L^{p'}(U)$, where $p'$ is determined by \begin{align*} \frac{1}{p}+\frac{1}{p'}=1. \end{align*} On $\mathcal A=g+W^{1,p}_0(U)$, define \begin{align*} I[u]=\int_U |\nabla u|^p\,d\mathcal L^n-\int_U f u\,d\mathcal L^n. \end{align*} The class is nonempty because $g\in\mathcal A$, and $I[g]$ is finite since $|\nabla g|^p\in L^1(U)$ and $fg\in L^1(U)$ by *Hölder's inequality*. Write $v=u-g\in W^{1,p}_0(U)$. By *Poincaré's inequality*, there is a constant $C_P$ such that \begin{align*} \|u-g\|_{L^p(U)}=\|v\|_{L^p(U)}\le C_P\|\nabla v\|_{L^p(U)}=C_P\|\nabla u-\nabla g\|_{L^p(U)}.
\end{align*} Hence \begin{align*} \|u\|_{L^p(U)}\le C_P\|\nabla u\|_{L^p(U)}+C_P\|\nabla g\|_{L^p(U)}+\|g\|_{L^p(U)}. \end{align*} Using *Hölder's inequality* for the forcing term, \begin{align*} \left|\int_U fu\,d\mathcal L^n\right|\le \|f\|_{L^{p'}(U)}\|u\|_{L^p(U)}. \end{align*} Combining the last two estimates gives constants $B,D\ge0$, depending only on $f,g,p,U$, such that \begin{align*} \left|\int_U fu\,d\mathcal L^n\right|\le B\|\nabla u\|_{L^p(U)}+D. \end{align*} Therefore \begin{align*} I[u]\ge \|\nabla u\|_{L^p(U)}^p-B\|\nabla u\|_{L^p(U)}-D. \end{align*} By *Young's inequality*, which uses $p>1$, $B\|\nabla u\|_{L^p(U)}\le \frac12\|\nabla u\|_{L^p(U)}^p+C$ for some constant $C$ depending only on $B$ and $p$, so \begin{align*} I[u]\ge \frac12\|\nabla u\|_{L^p(U)}^p-(C+D). \end{align*} Thus $I$ is bounded below on $\mathcal A$, and every minimizing sequence is bounded in the gradient norm. The preceding Poincaré estimate then also bounds $\|u_k\|_{L^p(U)}$, so every minimizing sequence $(u_k)$ is bounded in $W^{1,p}(U)$. Since $1<p<\infty$, the space $W^{1,p}(U)$ is reflexive, so a bounded minimizing sequence has a subsequence $u_{k_j}\rightharpoonup u$ weakly in $W^{1,p}(U)$. Each $u_{k_j}-g$ lies in $W^{1,p}_0(U)$, which is a closed linear subspace of $W^{1,p}(U)$, hence convex and therefore weakly closed; consequently $u-g\in W^{1,p}_0(U)$ and $u\in\mathcal A$. The map $z\mapsto |z|^p$ is convex, so the gradient term is weakly lower semicontinuous: \begin{align*} \int_U |\nabla u|^p\,d\mathcal L^n\le \liminf_{j\to\infty}\int_U |\nabla u_{k_j}|^p\,d\mathcal L^n. \end{align*} Also, $u_{k_j}\rightharpoonup u$ in $L^p(U)$, so the bounded linear functional $w\mapsto\int_U fw\,d\mathcal L^n$ satisfies \begin{align*} \int_U fu_{k_j}\,d\mathcal L^n\to \int_U fu\,d\mathcal L^n. \end{align*} Consequently, \begin{align*} I[u]\le \liminf_{j\to\infty} I[u_{k_j}].
\end{align*} All four direct-method hypotheses are therefore satisfied: finite infimum, weak compactness of minimizing sequences, weak closedness of $\mathcal A$, and weak lower semicontinuity of $I$. Applying the direct-method template gives a minimizer $u\in g+W^{1,p}_0(U)$ for the forced Dirichlet energy. [/example] This model example shows how each line of the template should be accounted for separately. Failure at any one line points to a different remedy: strengthen coercivity, change the space, close the constraint, or replace the integrand by a relaxed one. ## Choosing a Convexity Hypothesis Integral functionals depend on gradients, and weak convergence of gradients is too weak to preserve pointwise nonlinearities. The second organizing question is therefore: which convexity condition matches the dimension and structure of the unknown? [definition: Convex Integrand] Let $m,n\in\mathbb N$ and let $f:\mathbb R^{m\times n}\to\mathbb R$. The function $f$ is convex if for every $F,G\in\mathbb R^{m\times n}$ and every $t\in[0,1]$, \begin{align*} f(tF+(1-t)G)\le t f(F)+(1-t)f(G). \end{align*} [/definition] Convexity is the natural condition when the gradient behaves like a free variable, and it is also the right hypothesis for scalar Sobolev problems in many first courses. Vectorial problems require more care because gradients cannot oscillate arbitrarily; they are constrained by compatibility, so the next condition tests only oscillations that arise from gradients. [definition: Quasiconvex Integrand] Let $f:\mathbb R^{m\times n}\to\mathbb R$ be Borel measurable. The function $f$ is quasiconvex if for every bounded open set $U\subset\mathbb R^n$, every $F\in\mathbb R^{m\times n}$, and every $\varphi\in W^{1,\infty}_0(U;\mathbb R^m)$, \begin{align*} f(F)\le \frac{1}{\mathcal L^n(U)}\int_U f(F+\nabla\varphi(x))\,d\mathcal L^n(x). \end{align*} [/definition] The test fields in quasiconvexity are zero-boundary perturbations of an affine map. 
This is precisely the oscillatory pattern that can appear in weakly convergent gradients, so quasiconvexity is the sharp lower-semicontinuity condition for many vectorial integral functionals. It is also difficult to verify directly, which motivates a stronger condition built from minors. [definition: Polyconvex Integrand] Let $m,n\in\mathbb N$ and let $N$ be the number of minors of a matrix in $\mathbb R^{m\times n}$ of all orders from $1$ to $\min\{m,n\}$. Let \begin{align*} M:\mathbb R^{m\times n}\to\mathbb R^N \end{align*} be the map that sends a matrix to the vector of all its minors. A function $f:\mathbb R^{m\times n}\to\mathbb R$ is polyconvex if there exists a convex function $g:\mathbb R^N\to\mathbb R$ such that \begin{align*} f(F)=g(M(F)) \end{align*} for every $F\in\mathbb R^{m\times n}$. [/definition] Polyconvexity is stronger than quasiconvexity but easier to verify in nonlinear elasticity. It allows energies depending convexly on $F$, $\operatorname{cof}F$, and $\det F$, which are the deformation gradient, oriented area distortion, and volume distortion. The following hierarchy explains why polyconvexity is a useful sufficient condition rather than a separate replacement for quasiconvexity. [quotetheorem:8766] [citeproof:8766] The theorem gives a practical order of attack. Try convexity first in scalar or linear problems; try polyconvexity when minors have physical or geometric meaning; use quasiconvexity when the problem is genuinely vectorial and no stronger checkable structure is available. The assumptions that $f$ is finite and continuous keep the hierarchy in the classical integral-functional setting: Jensen's inequality is applied to ordinary real-valued functions, and the approximation from smooth perturbations to $W^{1,\infty}_0$ perturbations uses continuity to pass inequalities through uniformly bounded gradients. 
Extended-valued or merely measurable integrands require separate lower-semicontinuity hypotheses, so the displayed implication chain should not be read as a theorem for all possible singular energies. The reverse implications fail for concrete reasons, not only because the definitions look different. The determinant $F\mapsto \det F$ is a null Lagrangian: it is polyconvex by definition and quasiconvex in the equality sense, but it is not convex. In the $2\times2$ case, for $F=\operatorname{diag}(1,-1)$ and $G=\operatorname{diag}(-1,1)$ one has $\det F=\det G=-1$ while $\det\bigl(\tfrac12F+\tfrac12G\bigr)=0>-1$. For the other reverse implication, examples such as those of Alibert and Dacorogna in the $2\times2$ case give finite quasiconvex integrands that are not polyconvex. Šverák's example of a rank-one convex integrand that fails to be quasiconvex, available in matrix dimensions at least $3\times 2$, gives a further warning that checking only line convexity along rank-one directions is not enough for weak lower semicontinuity. Thus the hierarchy supplies sufficient hypotheses, but it does not give a complete practical test for quasiconvexity. [example: Three Model Energies] Let $U\subset\mathbb R^n$ be bounded and open, and take $1<p<\infty$. For a scalar map $u\in W^{1,p}(U)$, define $f:\mathbb R^n\to[0,\infty)$ by $f(z)=|z|^p$. To see the convexity used in the scalar Dirichlet model, fix $z,y\in\mathbb R^n$ and $t\in[0,1]$. The triangle inequality and homogeneity of the norm give \begin{align*} |tz+(1-t)y|\le t|z|+(1-t)|y|. \end{align*} Since $s\mapsto s^p$ is increasing on $[0,\infty)$, \begin{align*} |tz+(1-t)y|^p\le \bigl(t|z|+(1-t)|y|\bigr)^p. \end{align*} Since $s\mapsto s^p$ is convex on $[0,\infty)$, \begin{align*} \bigl(t|z|+(1-t)|y|\bigr)^p\le t|z|^p+(1-t)|y|^p. \end{align*} Thus \begin{align*} f(tz+(1-t)y)\le tf(z)+(1-t)f(y), \end{align*} so $f$ is convex. The associated functional is \begin{align*} I_p[u]=\int_U |\nabla u|^p\,d\mathcal L^n. \end{align*} Because the integrand is convex in the gradient variable and has $p$-growth, the convex integral lower-semicontinuity theorem gives weak lower semicontinuity of $I_p$ on $W^{1,p}(U)$.
For vectorial maps $u\in W^{1,p}(U;\mathbb R^m)$, let $h:\mathbb R^{m\times n}\to[0,\infty)$ be continuous, quasiconvex, and satisfy $p$-growth bounds of the form \begin{align*} c|F|^p-C\le h(F)\le C(1+|F|^p) \end{align*} for constants $c>0$ and $C\ge0$. The associated functional is \begin{align*} J[u]=\int_U h(\nabla u)\,d\mathcal L^n. \end{align*} Here convexity is not required; quasiconvexity is the condition matched to weak convergence of gradients. By *[Acerbi-Fusco Sequential Weak Lower Semicontinuity Theorem](/theorems/8749)*, if $u_k\rightharpoonup u$ in $W^{1,p}(U;\mathbb R^m)$, then \begin{align*} J[u]\le \liminf_{k\to\infty}J[u_k]. \end{align*} In nonlinear elasticity, take $u\in W^{1,p}(U;\mathbb R^n)$ and let $M(F)$ denote the vector of all minors of $F\in\mathbb R^{n\times n}$, including $F$, $\operatorname{cof}F$, and $\det F$. If $g:\mathbb R^N\to[0,\infty]$ is convex on the minors space and \begin{align*} W(F)=g(F,\operatorname{cof}F,\det F), \end{align*} then this is exactly the representation \begin{align*} W(F)=g(M(F)). \end{align*} Therefore $W$ is polyconvex by the definition of polyconvexity. The elasticity functional \begin{align*} E[u]=\int_U W(\nabla u)\,d\mathcal L^n \end{align*} is then handled through the polyconvex lower-semicontinuity framework: coercivity gives compactness of minimizing sequences, weak continuity of minors controls the cofactor and determinant terms, and weak closedness of the admissible determinant constraints keeps the weak limit admissible. These three model energies illustrate the hierarchy of lower-semicontinuity hypotheses: convexity for scalar gradient energies, quasiconvexity for genuinely vectorial gradient energies, and polyconvexity for elasticity energies written convexly in minors. [/example] The classification is not cosmetic. It determines which lower-semicontinuity theorem is available and which compactness theorem must also control minors, determinants, or lower-order terms. 
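The role of the determinant in the hierarchy can be illustrated on $2\times2$ matrices. The sketch below is a small Python check under stated assumptions: the function names are hypothetical, the matrices are chosen only for illustration, and the fact being tested is the standard one that $\det$ fails midpoint convexity along a rank-two line while being affine along rank-one lines $t\mapsto A+t\,a\otimes b$.

```python
def det2(F):
    """Determinant of a 2x2 matrix given as nested lists."""
    return F[0][0] * F[1][1] - F[0][1] * F[1][0]

def convex_combination(t, F, G):
    """Entrywise t*F + (1-t)*G."""
    return [[t * F[i][j] + (1 - t) * G[i][j] for j in range(2)] for i in range(2)]

def rank_one(a, b):
    """The rank-one matrix a (tensor) b, with entries a_i * b_j."""
    return [[a[i] * b[j] for j in range(2)] for i in range(2)]

def convexity_gap():
    """det(midpoint) minus the average of the endpoint determinants.

    Convexity would force this number to be <= 0.
    """
    F = [[1.0, 0.0], [0.0, -1.0]]
    G = [[-1.0, 0.0], [0.0, 1.0]]
    mid = convex_combination(0.5, F, G)
    return det2(mid) - (0.5 * det2(F) + 0.5 * det2(G))

def rank_one_second_difference():
    """Second difference of t -> det(A + t a(tensor)b) at t = -1, 0, 1.

    An affine function of t has zero second difference.
    """
    A = [[2.0, 1.0], [0.5, 3.0]]
    R = rank_one([1.0, 2.0], [3.0, -1.0])

    def along(t):
        return det2([[A[i][j] + t * R[i][j] for j in range(2)] for i in range(2)])

    return along(1.0) - 2.0 * along(0.0) + along(-1.0)
```

A positive value of `convexity_gap()` shows that convexity fails, and a vanishing `rank_one_second_difference()` is consistent with the determinant being affine in rank-one directions. The line from $F$ to $G$ used in the first check has direction $\operatorname{diag}(2,-2)$, which has rank two, so the two checks do not contradict each other.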
## Constraints and Closedness

Even when compactness and lower semicontinuity are available, existence can fail if the admissible class is not closed under weak limits. The third question in a direct-method proof is: do the boundary conditions, pointwise restrictions, and integral constraints survive the chosen convergence? [definition: Sequentially Closed Constraint] Let $X$ be a Banach space with topology $\tau$. A set $\mathcal A\subset X$ is sequentially $\tau$-closed if whenever $u_k\in\mathcal A$ and $u_k\to u$ with respect to $\tau$, then $u\in\mathcal A$. [/definition] For weak topologies, closedness is often a theorem rather than something that can be read off from the form of the constraint. Linear constraints defined by bounded functionals behave well, but nonlinear constraints may require compact embeddings or special weak-continuity results. [example: Mean Constraint] Let $U\subset\mathbb R^n$ be bounded and open, let $1\le p<\infty$, and define \begin{align*} \mathcal A=\left\{u\in W^{1,p}(U):\int_U u\,d\mathcal L^n=0\right\}. \end{align*} We show that $\mathcal A$ is weakly closed in $W^{1,p}(U)$. Suppose $u_k\in\mathcal A$ and $u_k\rightharpoonup u$ in $W^{1,p}(U)$. The inclusion $W^{1,p}(U)\hookrightarrow L^p(U)$ is bounded and linear, so $u_k\rightharpoonup u$ in $L^p(U)$. Define $\Lambda:L^p(U)\to\mathbb R$ by \begin{align*} \Lambda(v)=\int_U v\,d\mathcal L^n. \end{align*} This functional is linear, and by Hölder's inequality, \begin{align*} |\Lambda(v)|=\left|\int_U v\cdot 1\,d\mathcal L^n\right|\le \|v\|_{L^p(U)}\|1\|_{L^{p'}(U)}=\mathcal L^n(U)^{1/p'}\|v\|_{L^p(U)}. \end{align*} Thus $\Lambda$ is a bounded linear functional on $L^p(U)$. Since $u_k\rightharpoonup u$ in $L^p(U)$, weak convergence gives \begin{align*} \int_U u\,d\mathcal L^n=\Lambda(u)=\lim_{k\to\infty}\Lambda(u_k)=\lim_{k\to\infty}\int_U u_k\,d\mathcal L^n. \end{align*} Each $u_k$ belongs to $\mathcal A$, so $\int_U u_k\,d\mathcal L^n=0$ for every $k$. Therefore \begin{align*} \int_U u\,d\mathcal L^n=0.
\end{align*} Hence $u\in\mathcal A$, so the zero-mean constraint is weakly closed. [/example] The mean constraint illustrates the simplest closedness mechanism: a bounded linear functional is continuous for weak convergence. Nonlinear elasticity supplies the contrasting problem, where positivity of the determinant, injectivity-type conditions, and avoidance of interpenetration may not be weakly closed without additional hypotheses. The needed substitute is weak continuity of certain null Lagrangians, with the determinant as the central example. [quotetheorem:8767] [citeproof:8767] This result explains why determinant constraints are handled with more structure than ordinary convex constraints. They are nonlinear, but their null-Lagrangian form gives a substitute for linear weak closedness. The hypotheses are close to the natural threshold: if gradients are only weakly controlled in $L^p$ with $p<n$, determinant terms can concentrate and need not converge distributionally to the determinant of the weak limit. The theorem also has important limitations. It gives weak continuity of the determinant as a distributional quantity; it does not by itself preserve pointwise positivity of $\det\nabla u_k$, global injectivity, or noninterpenetration of matter. Those constraints require additional closedness theorems, coercive determinant terms, or admissibility conditions beyond weak convergence of $\nabla u_k$. ## Relaxation and Nonattainment When the direct-method checklist fails because lower semicontinuity is absent, the failure often has mathematical content rather than being a technical accident. The final question is: should the problem be solved as stated, or should its relaxed version be treated as the correct macroscopic model? [definition: Relaxed Functional] Let $X$ be a topological space, let $\mathcal A\subset X$, and let $I:\mathcal A\to(-\infty,\infty]$. 
The relaxation of $I$ with respect to convergence in $X$ is the functional $\overline I:X\to(-\infty,\infty]$ defined by \begin{align*} \overline I[u]=\inf\left\{\liminf_{k\to\infty} I[u_k]:u_k\in\mathcal A,\ u_k\to u\right\}. \end{align*} [/definition] The relaxed functional records the best energy achievable by approximating $u$ through admissible competitors for the original problem. It is built to be lower semicontinuous, and for integral functionals it typically replaces a nonconvex density by its convex envelope in scalar problems and by its quasiconvex envelope in vectorial ones, with the polyconvex envelope available as a lower bound. This construction should still lead to an attained minimization problem; otherwise relaxation would only rename the nonattainment. [quotetheorem:8768] [citeproof:8768] A relaxed minimizer may not be a minimizer of the original functional. In physical terms this often means that the minimizing sequence develops fine-scale oscillation, concentration, or microstructure that is invisible in the weak limit. The assumptions in the theorem are not cosmetic: without bounded minimizing sequences, as for $I[u]=e^{-u}$ on $\mathbb R$, relaxed minimizing sequences can still escape to infinity; without weak closedness, the relaxed limit may fall outside the admissible class, as happens for $(0,1)\subset\mathbb R$ under the usual topology. Lower semicontinuity of the relaxed functional is the point of relaxation, but it must still be available in the topology used for compactness. The theorem therefore proves existence for the relaxed problem and equality of infimal values, not recovery of an actual minimizer for the original non-lower-semicontinuous problem. [example: Scalar Double-Well Relaxation] Let $U=(\alpha,\beta)\subset\mathbb R$, and in the clean double-well model assume $W\ge0$, $W(a)=W(b)=0$, and $a<b$. Fix $\theta\in[0,1]$ and set \begin{align*} s=\theta a+(1-\theta)b. \end{align*} A sequence can realize the average slope $s$ by using slope $a$ on a fraction $\theta$ of each small interval and slope $b$ on the remaining fraction.
On one period this gives the averaged derivative \begin{align*} \theta a+(1-\theta)b=s. \end{align*} The corresponding averaged energy density is \begin{align*} \theta W(a)+(1-\theta)W(b)=\theta\cdot0+(1-\theta)\cdot0=0. \end{align*} If $u_k'$ alternates between $a$ and $b$ with this fixed volume fraction on periods of length tending to $0$, then $u_k'\rightharpoonup s$ in $L^p(U)$ by periodic averaging, and the weak limit has derivative $u'=s$. The original energy of the affine limit is \begin{align*} I[u]=\int_U W(s)\,d\mathcal L^1=(\beta-\alpha)W(s), \end{align*} which is positive whenever $W(s)>0$. By contrast, the convex envelope satisfies \begin{align*} W^{**}(s)\le \theta W(a)+(1-\theta)W(b)=0. \end{align*} Since $W\ge0$, its convex envelope also satisfies $W^{**}\ge0$, so \begin{align*} W^{**}(s)=0. \end{align*} Thus the relaxed energy of the same weak limit is \begin{align*} \overline I[u]=\int_U W^{**}(u'(x))\,d\mathcal L^1=\int_U W^{**}(s)\,d\mathcal L^1=0. \end{align*} Under the standard one-dimensional growth and boundary assumptions, the scalar relaxation replaces $W$ by $W^{**}$, so intermediate slopes are interpreted as weak limits of fine mixtures of the two preferred gradients rather than as gradients that pay the original pointwise cost $W(s)$. [/example] Relaxation is therefore not merely a fallback after a failed proof. It is the framework that preserves the limiting variational information when the original model contains unresolved small-scale structure. ## A Decision Tree for Existence Proofs A failed existence proof usually fails because its ingredients are checked in the wrong order: a minimizing sequence is compact in one topology, a constraint is closed in another, and the lower-semicontinuity theorem assumes a structural condition that the density does not satisfy. The course's main existence results can therefore be used as a sequence of checks. 
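As a purely illustrative sketch, the sequence of checks can be written as a small Python function that reports the first ingredient that fails. The class name `DirectMethodData`, its fields, and the returned strings are hypothetical labels chosen for this sketch; deciding whether each flag is true is the actual mathematical work and is not automated here.

```python
from dataclasses import dataclass

@dataclass
class DirectMethodData:
    """Outcome of the checks for a fixed space X and admissible class A."""
    coercive: bool              # minimizing sequences are bounded in X
    constraints_closed: bool    # A is closed for the convergence compactness gives
    lower_semicontinuous: bool  # I is lower semicontinuous for that convergence

def next_step(data):
    """Return the first failing ingredient, in the order of the checks."""
    if not data.coercive:
        return "strengthen coercivity or change the space"
    if not data.constraints_closed:
        return "close the constraint"
    if not data.lower_semicontinuous:
        return "replace the integrand by a relaxed one"
    return "apply the direct method"

# Hypothetical usage: a coercive problem with closed constraints but a
# nonconvex scalar density fails only the last check.
double_well = DirectMethodData(coercive=True,
                               constraints_closed=True,
                               lower_semicontinuous=False)
suggestion = next_step(double_well)
```

The early returns mirror the idea that a later check is only meaningful once the earlier ones have fixed the topology in which it is asked.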
Start with the space and admissible class, then identify compactness, then choose the lower-semicontinuity theorem, and only then decide whether relaxation is needed.

[explanation: Direct-Method Decision Tree]
First choose the ambient space $X$, usually $W^{1,p}(U;\mathbb R^m)$ or an affine trace class inside it. Check coercivity in the norm of $X$; for fixed boundary values this often uses Poincaré's inequality, while for elasticity it may require growth in $|F|$, $|\operatorname{cof}F|$, and $\det F$.

Next check that every constraint is closed for the convergence produced by compactness. Linear constraints and affine trace conditions are usually weakly closed; nonlinear determinant or injectivity constraints require the special structure developed in the elasticity chapter.

Then choose the lower-semicontinuity mechanism. Convexity is strongest and easiest to use, quasiconvexity is the natural vectorial condition for gradient integrals, and polyconvexity is the workable sufficient condition in nonlinear elasticity. If none of these applies, compute or characterize the appropriate relaxed density and solve the relaxed problem.

Finally interpret the minimizer. A minimizer of the original problem is an actual optimizer in the stated admissible class. A minimizer of the relaxed problem describes the weak limit of nearly optimal configurations and may encode oscillation or microstructure rather than a classical deformation.
[/explanation]

This synthesis also clarifies the role of Euler-Lagrange equations. They can describe regular minimizers after existence is known, but they do not replace compactness, lower semicontinuity, and closedness in proving that a minimizer exists.

## Beyond and Connections

The direct method is the organizing principle behind much of modern variational analysis.
Its compactness step connects this note to weak compactness in reflexive Banach spaces, Sobolev embedding theorems, and compactness results for [functions of bounded variation](/page/Functions%20of%20Bounded%20Variation). Its lower semicontinuity step connects convex analysis to quasiconvexity, polyconvexity, relaxation, Young measures, and compensated compactness. Its closedness step connects variational problems to boundary traces, weakly closed constraint sets, and geometric constraints such as orientation preservation in nonlinear elasticity.

Several neighboring topics extend the picture developed here. Relaxation replaces a non-lower-semicontinuous functional by the largest lower semicontinuous functional below it, giving the effective energy seen by oscillating minimizing sequences. Gamma-convergence studies stability of minimizers and minimum values under perturbation of the functional, which is essential in homogenization and dimension reduction. Regularity theory asks when a minimizer obtained by the direct method is smoother than the space in which it was found. Lavrentiev phenomena show that the chosen admissible class is not a harmless technicality: changing the class can change the infimum.

On Androma, this page is naturally read together with [Calculus of Variations I: Classical Theory](/page/Calculus%20of%20Variations%20I%3A%20Classical%20Theory), [Weak Convergence](/page/Weak%20Convergence), [Reflexive Space](/page/Reflexive%20Space), [Sobolev Space](/page/Sobolev%20Space), and [Convex Function](/page/Convex%20Function). The later topics on relaxation, quasiconvexity, polyconvexity, Young measures, Lavrentiev phenomena, and nonlinear elasticity are developed internally in the corresponding chapters of this note.

## References

- Bernard Dacorogna, *Direct Methods in the Calculus of Variations*, 2nd ed., Springer, 2008.
- Enrico Giusti, *Direct Methods in the Calculus of Variations*, World Scientific, 2003.
- Charles B. Morrey Jr., *Multiple Integrals in the Calculus of Variations*, Springer, 1966.
- John M. Ball, Convexity conditions and existence theorems in nonlinear elasticity, *Archive for Rational Mechanics and Analysis* 63 (1976/77), 337-403.
- Lawrence C. Evans, *Partial Differential Equations*, 2nd ed., American Mathematical Society, 2010.
- Mariano Giaquinta and Stefan Hildebrandt, *Calculus of Variations I*, Springer, 1996.
- Andrea Braides, *Gamma-Convergence for Beginners*, Oxford University Press, 2002.

Created by admin on 6/21/2026 | Last updated on 6/21/2026
