Necessary Condition for an Extremum — Statement & Proof

Theorem

Proof

[proofplan] We reduce the infinite-dimensional extremum problem on $\mathcal{A}$ to a one-dimensional calculus problem. For each admissible direction $h$, we restrict $J$ to the affine line $\varepsilon \mapsto y^* + \varepsilon h$, obtaining a real-valued function $\phi(\varepsilon) = J[y^* + \varepsilon h]$ that inherits a local extremum at $\varepsilon = 0$. The $C^1$ regularity of $L$ and the Leibniz integral rule ensure that $\phi$ is differentiable with $\phi'(0) = \delta J[y^*; h]$. Fermat's interior extremum theorem then forces $\phi'(0) = 0$. [/proofplan] [step:Restrict $J$ to the affine line $\varepsilon \mapsto y^* + \varepsilon h$ and transfer the local extremum to $\varepsilon = 0$] Fix $h \in C^1([a,b])$ with $h(a) = h(b) = 0$. If $h = 0$, then $J[y^* + \varepsilon h] = J[y^*]$ for all $\varepsilon \in \mathbb{R}$, so $\delta J[y^*; h] = 0$ and the conclusion holds. Assume henceforth that $h \neq 0$, so that $\|h\|_{C^1([a,b])} > 0$. Define the one-parameter restriction \begin{align*} \phi: \mathbb{R} &\to \mathbb{R} \\ \varepsilon &\mapsto J[y^* + \varepsilon h] = \int_a^b L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr) \, d\mathcal{L}^1(x). \end{align*} For every $\varepsilon \in \mathbb{R}$, the function $y^* + \varepsilon h$ lies in $\mathcal{A}$: it belongs to $C^1([a,b])$ as a linear combination of $C^1$ functions, and satisfies $(y^* + \varepsilon h)(a) = \alpha + \varepsilon \cdot 0 = \alpha$ and $(y^* + \varepsilon h)(b) = \beta + \varepsilon \cdot 0 = \beta$, so the boundary conditions are preserved. The $C^1$ distance from the perturbed function to $y^*$ is \begin{align*} \|y^* + \varepsilon h - y^*\|_{C^1([a,b])} = |\varepsilon|\,\|h\|_{C^1([a,b])}. \end{align*} Define $\varepsilon_0 := \delta \,/\, \|h\|_{C^1([a,b])} > 0$. 
For all $\varepsilon$ with $|\varepsilon| < \varepsilon_0$, the bound $\|(y^* + \varepsilon h) - y^*\|_{C^1([a,b])} < \delta$ holds, so the local extremum hypothesis gives \begin{align*} \phi(\varepsilon) = J[y^* + \varepsilon h] \geq J[y^*] = \phi(0) \quad (\text{local minimum case; reverse for local maximum}). \end{align*} Thus $\phi$ has a local extremum at the interior point $\varepsilon = 0$ of the interval $(-\varepsilon_0,\, \varepsilon_0)$. [guided] The fundamental idea of the calculus of variations is to reduce an infinite-dimensional optimization problem to a one-dimensional one. The admissible set $\mathcal{A} \subset C^1([a,b])$ is an infinite-dimensional affine subspace, so ordinary single-variable calculus does not apply to $J$ on $\mathcal{A}$ directly. Instead, we fix an arbitrary admissible direction $h$ and ask: what does the local extremum of $J$ at $y^*$ imply along the one-parameter family $\varepsilon \mapsto y^* + \varepsilon h$? If $h = 0$, the family is constant: $y^* + \varepsilon \cdot 0 = y^*$ for all $\varepsilon$, so $\phi(\varepsilon) = J[y^*]$ is constant and $\phi'(0) = 0 = \delta J[y^*; 0]$. The result holds trivially. The interesting case is $h \neq 0$, where $\|h\|_{C^1([a,b])} > 0$. Define \begin{align*} \phi: \mathbb{R} &\to \mathbb{R} \\ \varepsilon &\mapsto J[y^* + \varepsilon h] = \int_a^b L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr) \, d\mathcal{L}^1(x). \end{align*} We must first verify that $\phi$ is well-defined, i.e., that $y^* + \varepsilon h \in \mathcal{A}$ for every $\varepsilon \in \mathbb{R}$. **(Regularity.)** The function $y^* + \varepsilon h$ belongs to $C^1([a,b])$ since $y^* \in C^1([a,b])$, $h \in C^1([a,b])$, and $C^1([a,b])$ is a vector space closed under addition and scalar multiplication. 
**(Boundary conditions.)** We compute $(y^* + \varepsilon h)(a) = y^*(a) + \varepsilon\, h(a) = \alpha + \varepsilon \cdot 0 = \alpha$, and likewise $(y^* + \varepsilon h)(b) = \beta + \varepsilon \cdot 0 = \beta$. The vanishing boundary condition $h(a) = h(b) = 0$ is precisely what ensures the perturbed function respects the same boundary data as $y^*$. Now we transfer the local extremum from $J$ on $\mathcal{A}$ to $\phi$ on $\mathbb{R}$. The key observation is that the map $\varepsilon \mapsto y^* + \varepsilon h$ is a continuous (in fact, affine) embedding of $\mathbb{R}$ into $(C^1([a,b]),\, \|\cdot\|_{C^1})$, since \begin{align*} \|y^* + \varepsilon h - y^*\|_{C^1([a,b])} = \|\varepsilon h\|_{C^1([a,b])} = |\varepsilon|\,\|h\|_{C^1([a,b])}. \end{align*} Define $\varepsilon_0 := \delta \,/\, \|h\|_{C^1([a,b])} > 0$. For any $\varepsilon$ with $|\varepsilon| < \varepsilon_0$, the $C^1$ distance from $y^* + \varepsilon h$ to $y^*$ satisfies $|\varepsilon|\,\|h\|_{C^1([a,b])} < \delta$, so $y^* + \varepsilon h$ lies in the $\delta$-neighbourhood of $y^*$ in $\mathcal{A}$. The local minimum hypothesis then gives \begin{align*} \phi(\varepsilon) = J[y^* + \varepsilon h] \geq J[y^*] = \phi(0) \quad \text{for all } |\varepsilon| < \varepsilon_0. \end{align*} This means $\phi$ has a local minimum at the interior point $\varepsilon = 0$ of the interval $(-\varepsilon_0,\, \varepsilon_0)$. (For a local maximum of $J$, the inequality reverses to $\phi(\varepsilon) \leq \phi(0)$, and $\phi$ has a local maximum at $\varepsilon = 0$. The remainder of the argument is identical in both cases.) 
[/guided] [/step] [step:Compute $\phi'(0) = \delta J[y^*; h]$ by differentiating under the integral sign] Define the integrand as a function of position $x$ and parameter $\varepsilon$: \begin{align*} g: [a,b] \times \mathbb{R} &\to \mathbb{R} \\ (x, \varepsilon) &\mapsto L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr), \end{align*} so that $\phi(\varepsilon) = \int_a^b g(x, \varepsilon) \, d\mathcal{L}^1(x)$. Since $L \in C^1$ and the map $\varepsilon \mapsto \bigl(y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)$ is affine in $\varepsilon$, the chain rule gives \begin{align*} \partial_\varepsilon g(x, \varepsilon) &= \partial_y L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)\, h(x) \\ &\quad + \partial_p L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)\, h'(x), \end{align*} where $\partial_y L$ and $\partial_p L$ denote the partial derivatives of $L$ with respect to its second and third arguments. Both $g$ and $\partial_\varepsilon g$ are continuous on $[a,b] \times \mathbb{R}$ — as compositions and products of continuous functions ($L \in C^1$ ensures $\partial_y L, \partial_p L \in C^0$, and $y^*, (y^*)', h, h' \in C^0([a,b])$). Since $[a,b]$ is a compact interval and $\partial_\varepsilon g$ is continuous, the Leibniz integral rule applies: \begin{align*} \phi'(\varepsilon) = \int_a^b \partial_\varepsilon g(x, \varepsilon) \, d\mathcal{L}^1(x). \end{align*} Evaluating at $\varepsilon = 0$: \begin{align*} \phi'(0) = \int_a^b \Bigl[\partial_y L\bigl(x,\, y^*(x),\, (y^*)'(x)\bigr)\, h(x) \;+\; \partial_p L\bigl(x,\, y^*(x),\, (y^*)'(x)\bigr)\, h'(x)\Bigr] d\mathcal{L}^1(x) = \delta J[y^*;\, h]. \end{align*} [guided] We need to differentiate $\phi(\varepsilon) = \int_a^b g(x, \varepsilon) \, d\mathcal{L}^1(x)$ with respect to $\varepsilon$. 
This requires justifying the interchange of derivative and integral — we cannot move $\frac{d}{d\varepsilon}$ under the integral sign without checking hypotheses. Define the integrand as a function of two variables: \begin{align*} g: [a,b] \times \mathbb{R} &\to \mathbb{R} \\ (x, \varepsilon) &\mapsto L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr). \end{align*} The Leibniz integral rule for parameter-dependent integrals requires two conditions: (i) $g(x, \varepsilon)$ is continuous on $[a,b] \times I$ for some interval $I$, and (ii) the partial derivative $\partial_\varepsilon g(x, \varepsilon)$ exists and is continuous on $[a,b] \times I$. We verify both with $I = \mathbb{R}$. **(i) Continuity of $g$.** The map $(x, \varepsilon) \mapsto \bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)$ is continuous from $[a,b] \times \mathbb{R}$ into $[a,b] \times \mathbb{R}^2$, since each component is a polynomial in $\varepsilon$ with continuous coefficients in $x$. Since $L$ is $C^1$ (hence continuous), the composition $g$ is continuous on $[a,b] \times \mathbb{R}$. **(ii) Existence and continuity of $\partial_\varepsilon g$.** The map $\varepsilon \mapsto \bigl(y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)$ is affine in $\varepsilon$ with derivative $\bigl(h(x),\, h'(x)\bigr)$. Since $L \in C^1$, the chain rule applies and yields \begin{align*} \partial_\varepsilon g(x, \varepsilon) &= \partial_y L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)\, h(x) \\ &\quad + \partial_p L\bigl(x,\; y^*(x) + \varepsilon\, h(x),\; (y^*)'(x) + \varepsilon\, h'(x)\bigr)\, h'(x), \end{align*} where $\partial_y L$ and $\partial_p L$ denote the partial derivatives of $L: [a,b] \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ with respect to its second and third arguments, respectively. Since $L \in C^1$, both $\partial_y L$ and $\partial_p L$ are continuous. 
The functions $h, h' \in C^0([a,b])$, so each summand is a product of continuous functions of $(x, \varepsilon)$, hence continuous. Thus $\partial_\varepsilon g$ is continuous on $[a,b] \times \mathbb{R}$. Both hypotheses of the Leibniz integral rule are satisfied. We conclude that $\phi$ is differentiable on $\mathbb{R}$ with \begin{align*} \phi'(\varepsilon) = \int_a^b \partial_\varepsilon g(x, \varepsilon) \, d\mathcal{L}^1(x). \end{align*} Evaluating at $\varepsilon = 0$: \begin{align*} \phi'(0) &= \int_a^b \Bigl[\partial_y L\bigl(x,\, y^*(x),\, (y^*)'(x)\bigr)\, h(x) \;+\; \partial_p L\bigl(x,\, y^*(x),\, (y^*)'(x)\bigr)\, h'(x)\Bigr] d\mathcal{L}^1(x) \\ &= \delta J[y^*;\, h], \end{align*} where the last equality is the definition of the first variation. This is where the $C^1$ regularity of $L$ is consumed: the chain rule requires the first-order partial derivatives $\partial_y L$ and $\partial_p L$ to exist, and the Leibniz rule requires them to be continuous. If $L$ were merely continuous, the derivative $\phi'(0)$ might not exist at all, and the reduction to Fermat's theorem would fail. [/guided] [/step] [step:Apply Fermat's theorem to conclude $\delta J[y^*; h] = 0$] By the preceding steps, $\phi: \mathbb{R} \to \mathbb{R}$ is differentiable at $\varepsilon = 0$ and has a local extremum at the interior point $\varepsilon = 0$ of $(-\varepsilon_0, \varepsilon_0)$. Fermat's interior extremum theorem states that if a real-valued function is differentiable at an interior point of its domain and attains a local extremum there, then its derivative vanishes at that point. Applying this to $\phi$ at $\varepsilon = 0$: \begin{align*} 0 = \phi'(0) = \delta J[y^*;\, h]. \end{align*} Since $h \in C^1([a,b])$ with $h(a) = h(b) = 0$ was arbitrary, the first variation $\delta J[y^*; h]$ vanishes for every admissible direction. [/step]
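
As an illustration of the three steps, here is a small worked example. The specific choices are assumptions made only for this sketch and do not come from the theorem: the interval $[a,b] = [0,1]$, the Lagrangian $L(x, y, p) = \tfrac{1}{2} p^2$, the boundary data $\alpha = 0$, $\beta = 1$, and the candidate $y^*(x) = x$. Here $\partial_y L = 0$ and $\partial_p L = p$, and $(y^*)'(x) = 1$. For any $h \in C^1([0,1])$ with $h(0) = h(1) = 0$, the restriction to the affine line is \begin{align*} \phi(\varepsilon) &= \int_0^1 \tfrac{1}{2}\bigl(1 + \varepsilon\, h'(x)\bigr)^2 \, d\mathcal{L}^1(x) \\ &= \tfrac{1}{2} + \varepsilon \int_0^1 h'(x) \, d\mathcal{L}^1(x) + \tfrac{\varepsilon^2}{2} \int_0^1 h'(x)^2 \, d\mathcal{L}^1(x) \\ &= \tfrac{1}{2} + \tfrac{\varepsilon^2}{2} \int_0^1 h'(x)^2 \, d\mathcal{L}^1(x), \end{align*} where the linear term drops out because $\int_0^1 h'(x) \, d\mathcal{L}^1(x) = h(1) - h(0) = 0$. Hence $\phi(\varepsilon) \geq \phi(0)$ for every $\varepsilon \in \mathbb{R}$, so in this example $y^*$ is a minimizer of $J$ along every admissible direction. Differentiating the quadratic directly and comparing with the formula for the first variation gives \begin{align*} \phi'(0) = 0 \qquad \text{and} \qquad \delta J[y^*;\, h] = \int_0^1 \bigl[\, 0 \cdot h(x) + 1 \cdot h'(x) \,\bigr] d\mathcal{L}^1(x) = h(1) - h(0) = 0, \end{align*} so the two quantities agree and both vanish, as the theorem predicts.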
