[proofplan]
We first prove the scalar regularity statement: testing the [distributional derivative](/page/Distributional%20Derivative) against a fixed vector $v \in V$ shows that $t \mapsto (u(t),v)_H$ has an absolutely continuous representative. Then we prove the energy identity by smoothing in time, applying the ordinary Hilbert-space chain rule to the smoothed functions, and passing to the limit in the $L^2((0,T);V)$ and $L^2((0,T);V^*)$ norms. The scalar representatives reconstruct a weakly continuous $H$-valued representative, and the energy identity gives continuity of its norm. Weak continuity together with norm continuity in a [Hilbert space](/page/Hilbert%20Space) gives strong continuity.
[/proofplan]
[step:Identify the Hilbert triple and the scalar weak derivatives]
Let
\begin{align*}
J_H: H &\to V^*
\end{align*}
denote the continuous injective map defined by
\begin{align*}
(J_H h)(v)=(h,v)_H
\end{align*}
for $h \in H$ and $v \in V$. We use this identification throughout, so that $H$ is regarded as a dense subspace of $V^*$.
Fix $v \in V$. Since the embedding $V \hookrightarrow H$ is continuous, the map
\begin{align*}
\ell_v: V \to \mathbb{R}
\end{align*}
defined by
\begin{align*}
\ell_v(w)=(w,v)_H
\end{align*}
for $w \in V$ is continuous. Define the scalar map
\begin{align*}
a_v: (0,T) \to \mathbb{R}
\end{align*}
by
\begin{align*}
a_v(t)=(u(t),v)_H
\end{align*}
for $\mathcal{L}^1$-a.e. $t \in (0,T)$. This defines an element of $L^2(0,T)$ because $u\in L^2((0,T);V)$ and $\ell_v\in V^*$. We regard $a_v$ as the corresponding scalar distribution on $(0,T)$.
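For a quantitative version of this integrability, let $C_{V,H}$ denote the operator norm of the embedding $V\hookrightarrow H$ (a constant named here only for this estimate). As a sketch, Cauchy-Schwarz in $H$ gives
\begin{align*}
\int_0^T |a_v(t)|^2\,d\mathcal{L}^1(t)\le \|v\|_H^2\int_0^T\|u(t)\|_H^2\,d\mathcal{L}^1(t)\le C_{V,H}^2\|v\|_H^2\|u\|_{L^2((0,T);V)}^2.
\end{align*}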
By the definition of the distributional derivative $u'$, for every [test function](/page/Test%20Function) $\varphi \in C_c^\infty((0,T))$,
\begin{align*}
-\int_0^T (u(t),v)_H \varphi'(t)\,d\mathcal{L}^1(t)=\int_0^T u'(t)(v)\varphi(t)\,d\mathcal{L}^1(t).
\end{align*}
Thus the scalar distributional derivative of $a_v$ is the function
\begin{align*}
b_v: (0,T) \to \mathbb{R}
\end{align*}
defined by
\begin{align*}
b_v(t)=u'(t)(v).
\end{align*}
Since
\begin{align*}
|b_v(t)| \le \|u'(t)\|_{V^*}\|v\|_V
\end{align*}
for $\mathcal{L}^1$-a.e. $t \in (0,T)$, we have $b_v \in L^2(0,T) \subset L^1(0,T)$. Therefore $a_v$ has an absolutely continuous representative on $[0,T]$, still denoted $a_v$, and for every $0 \le s \le t \le T$,
\begin{align*}
a_v(t)-a_v(s)=\int_s^t u'(\tau)(v)\,d\mathcal{L}^1(\tau).
\end{align*}
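The passage from an integrable distributional derivative to an absolutely continuous representative is the standard one-dimensional argument. As a sketch, write $B_v$ (a name introduced only here) for the primitive of $b_v$; for every $\varphi\in C_c^\infty((0,T))$,
\begin{align*}
B_v(t)&=\int_0^t b_v(\tau)\,d\mathcal{L}^1(\tau),\\
-\int_0^T B_v(t)\varphi'(t)\,d\mathcal{L}^1(t)&=-\int_0^T b_v(\tau)\int_\tau^T\varphi'(t)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(\tau)=\int_0^T b_v(\tau)\varphi(\tau)\,d\mathcal{L}^1(\tau),
\end{align*}
by Fubini's theorem and $\varphi(T)=0$. Hence $a_v-B_v$ has zero distributional derivative on $(0,T)$, so it agrees $\mathcal{L}^1$-a.e. with a constant $c_v$ (du Bois-Reymond lemma), and $c_v+B_v$ is an absolutely continuous representative of $a_v$ with the stated increment formula.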
[/step]
[step:Derive the energy identity first on compact subintervals by time mollification]
Let $0<a<b<T$. Choose $\varepsilon>0$ satisfying
\begin{align*}
\varepsilon<\frac{1}{2}\min\{a,T-b\}.
\end{align*}
Let $\rho \in C_c^\infty((-1,1))$ be a non-negative function satisfying
\begin{align*}
\int_{\mathbb{R}}\rho(r)\,d\mathcal{L}^1(r)=1,
\end{align*}
and define
\begin{align*}
\rho_\varepsilon: \mathbb{R} \to \mathbb{R}
\end{align*}
by
\begin{align*}
\rho_\varepsilon(r)=\varepsilon^{-1}\rho(r/\varepsilon).
\end{align*}
Extend $u$ and $u'$ by zero outside $(0,T)$, and define
\begin{align*}
u_\varepsilon: (a,b) \to V
\end{align*}
by
\begin{align*}
u_\varepsilon(t)=\int_{\mathbb{R}}\rho_\varepsilon(t-r)u(r)\,d\mathcal{L}^1(r),
\end{align*}
and define
\begin{align*}
f_\varepsilon: (a,b) \to V^*
\end{align*}
by
\begin{align*}
f_\varepsilon(t)=\int_{\mathbb{R}}\rho_\varepsilon(t-r)u'(r)\,d\mathcal{L}^1(r).
\end{align*}
The support condition on $\rho_\varepsilon$ ensures that, for $t \in (a,b)$, the convolution only samples values from $(0,T)$. Bochner convolution is well-defined because the zero extensions of $u$ and $u'$ belong to $L^2(\mathbb{R};V)$ and $L^2(\mathbb{R};V^*)$, respectively, and $\rho_\varepsilon\in C_c^\infty(\mathbb{R})\subset L^1(\mathbb{R})$. Differentiation under the Bochner integral gives
\begin{align*}
u_\varepsilon \in C^\infty((a,b);V)
\end{align*}
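and the derivative as an $H$-valued map is obtained by composing the $V$-valued derivative with the continuous embedding $V\hookrightarrow H$. To identify this derivative in $V^*$, fix $v\in V$ and $\psi\in C_c^\infty((a,b))$. The function $r\mapsto\int_{\mathbb{R}}\rho_\varepsilon(t-r)\psi(t)\,d\mathcal{L}^1(t)$ is smooth and, by the choice of $\varepsilon$, compactly supported in $(0,T)$. Applying [Fubini's theorem](/theorems/2961) to the integrable scalar functions $(t,r)\mapsto\rho_\varepsilon(t-r)(u(r),v)_H\psi'(t)$ and $(t,r)\mapsto\rho_\varepsilon(t-r)u'(r)(v)\psi(t)$, and then the scalar distributional identity from the first step with this test function, gives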
\begin{align*}
-\int_a^b (u_\varepsilon(t),v)_H\psi'(t)\,d\mathcal{L}^1(t)=\int_a^b f_\varepsilon(t)(v)\psi(t)\,d\mathcal{L}^1(t).
\end{align*}
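Thus the $V^*$-valued distributional derivative of $J_Hu_\varepsilon$ is $f_\varepsilon$. On the other hand, $J_Hu_\varepsilon$ is continuously differentiable into $V^*$ with classical derivative $J_H\frac{d}{dt}u_\varepsilon$, because $J_H$ and the embedding $V\hookrightarrow H$ are bounded linear maps, and this classical derivative is also its distributional derivative. Moreover, $f_\varepsilon$ is continuous on $(a,b)$ as the convolution of an integrable function with a smooth compactly supported kernel. Two continuous $V^*$-valued functions that coincide as distributions coincide everywhere, so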
\begin{align*}
J_H\frac{d}{dt}u_\varepsilon(t)=f_\varepsilon(t)
\end{align*}
for every $t\in(a,b)$ as an identity in $V^*$.
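The Fubini computation behind the tested identity can be written out as follows; $\varphi_\psi$ is a name introduced only for this sketch. For $\psi\in C_c^\infty((a,b))$ set
\begin{align*}
\varphi_\psi(r)=\int_{\mathbb{R}}\rho_\varepsilon(t-r)\psi(t)\,d\mathcal{L}^1(t),\qquad \varphi_\psi'(r)=\int_{\mathbb{R}}\rho_\varepsilon(t-r)\psi'(t)\,d\mathcal{L}^1(t),
\end{align*}
where the second formula follows from the substitution $t=r+\sigma$ and differentiation under the integral. Since $\rho_\varepsilon(t-r)\neq 0$ forces $|t-r|<\varepsilon$, the choice of $\varepsilon$ places the support of $\varphi_\psi$ in a compact subset of $(0,T)$, so $\varphi_\psi\in C_c^\infty((0,T))$. Then
\begin{align*}
-\int_a^b (u_\varepsilon(t),v)_H\psi'(t)\,d\mathcal{L}^1(t)
&=-\int_0^T (u(r),v)_H\varphi_\psi'(r)\,d\mathcal{L}^1(r)\\
&=\int_0^T u'(r)(v)\varphi_\psi(r)\,d\mathcal{L}^1(r)\\
&=\int_a^b f_\varepsilon(t)(v)\psi(t)\,d\mathcal{L}^1(t),
\end{align*}
where the first and last equalities use Fubini's theorem and the fact that bounded linear functionals commute with Bochner integrals, and the middle equality is the scalar identity from the first step.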
Because $u_\varepsilon(t) \in V \subset H$ and $u_\varepsilon$ is differentiable as an $H$-valued map, the ordinary chain rule in the Hilbert space $H$, followed by the identity $J_H u_\varepsilon'(t)=f_\varepsilon(t)$, gives, for $a<s\le t<b$,
\begin{align*}
\|u_\varepsilon(t)\|_H^2-\|u_\varepsilon(s)\|_H^2=2\int_s^t f_\varepsilon(\tau)(u_\varepsilon(\tau))\,d\mathcal{L}^1(\tau).
\end{align*}
[guided]
The purpose of the mollification is to create a function for which the usual derivative calculation is legitimate. The original function $u$ is only in $L^2((0,T);V)$, and its distributional derivative $u'$ is only in $L^2((0,T);V^*)$. Thus the expression $\frac{d}{dt}\|u(t)\|_H^2$ cannot initially be interpreted pointwise. After convolution in time, however, $u_\varepsilon$ is smooth as a $V$-valued function on the smaller interval $(a,b)$.
We choose $\varepsilon$ so that
\begin{align*}
\varepsilon<\frac{1}{2}\min\{a,T-b\}.
\end{align*}
With this choice, if $t \in (a,b)$ and $\rho_\varepsilon(t-r)\neq 0$, then $r \in (0,T)$. This avoids boundary terms from the zero extension. The mollified maps are
\begin{align*}
u_\varepsilon: (a,b) &\to V
\end{align*}
with
\begin{align*}
u_\varepsilon(t)=\int_{\mathbb{R}}\rho_\varepsilon(t-r)u(r)\,d\mathcal{L}^1(r),
\end{align*}
and
\begin{align*}
f_\varepsilon: (a,b) &\to V^*
\end{align*}
with
\begin{align*}
f_\varepsilon(t)=\int_{\mathbb{R}}\rho_\varepsilon(t-r)u'(r)\,d\mathcal{L}^1(r).
\end{align*}
The distributional identity defining $u'$ commutes with convolution on the interior interval: if $\psi\in C_c^\infty((a,b))$, then the transferred test function $r\mapsto\int_{\mathbb{R}}\rho_\varepsilon(t-r)\psi(t)\,d\mathcal{L}^1(t)$ is smooth and, after the choice of $\varepsilon$, compactly supported in $(0,T)$, so it is an admissible test function for $u'$. More explicitly, testing against a fixed $v\in V$ and a time test function $\psi\in C_c^\infty((a,b))$, Fubini's theorem is applicable to the scalar integrals since $u\in L^2((0,T);V)$, $u'\in L^2((0,T);V^*)$, and $\rho_\varepsilon$ has compact support. The scalar identity from the first step, applied with the transferred test function, then gives
\begin{align*}
-\int_a^b (u_\varepsilon(t),v)_H\psi'(t)\,d\mathcal{L}^1(t)=\int_a^b f_\varepsilon(t)(v)\psi(t)\,d\mathcal{L}^1(t).
\end{align*}
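Hence the $V^*$-valued distributional derivative of $J_Hu_\varepsilon$ is $f_\varepsilon$. Since $u_\varepsilon$ is smooth as a $V$-valued, hence $H$-valued, function, $J_Hu_\varepsilon$ has classical derivative $J_H\frac{d}{dt}u_\varepsilon$, which is also its distributional derivative; as both this derivative and $f_\varepsilon$ are continuous, the distributional identity is the pointwise identity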
\begin{align*}
J_H\frac{d}{dt}u_\varepsilon(t)=f_\varepsilon(t)
\end{align*}
for every $t \in (a,b)$.
Since $u_\varepsilon$ is differentiable as an $H$-valued map, the scalar function $t\mapsto\|u_\varepsilon(t)\|_H^2$ is differentiable in the ordinary sense. Applying the chain rule in $H$ first gives
\begin{align*}
\frac{d}{dt}\|u_\varepsilon(t)\|_H^2=2\left(\frac{d}{dt}u_\varepsilon(t),u_\varepsilon(t)\right)_H.
\end{align*}
By the definition of $J_H$ and the identity $J_Hu_\varepsilon'(t)=f_\varepsilon(t)$, the right-hand side is
\begin{align*}
2\left(J_Hu_\varepsilon'(t)\right)(u_\varepsilon(t))=2f_\varepsilon(t)(u_\varepsilon(t)).
\end{align*}
Integrating this identity over $(s,t)$ with respect to $\mathcal{L}^1$ yields
\begin{align*}
\|u_\varepsilon(t)\|_H^2-\|u_\varepsilon(s)\|_H^2=2\int_s^t f_\varepsilon(\tau)(u_\varepsilon(\tau))\,d\mathcal{L}^1(\tau).
\end{align*}
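The Hilbert-space chain rule used above reduces to an elementary difference-quotient identity. For a map $x:(a,b)\to H$ differentiable at $t$ (a generic name used only here; the case of interest is $x=u_\varepsilon$) and real $\eta\neq 0$ with $t+\eta\in(a,b)$, bilinearity and symmetry of the real inner product give
\begin{align*}
\frac{\|x(t+\eta)\|_H^2-\|x(t)\|_H^2}{\eta}=\left(\frac{x(t+\eta)-x(t)}{\eta},\,x(t+\eta)+x(t)\right)_H\longrightarrow 2\left(\frac{d}{dt}x(t),x(t)\right)_H
\end{align*}
as $\eta\to 0$, by continuity of $x$ at $t$ and continuity of the inner product.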
[/guided]
[/step]
[step:Pass the mollified identity to the limit in the duality pairing]
As $\varepsilon \to 0$, the standard approximation property of mollification in Bochner spaces, applied to the zero extensions, gives
\begin{align*}
u_\varepsilon \to u \quad \text{in } L^2((a,b);V)
\end{align*}
and
\begin{align*}
f_\varepsilon \to u' \quad \text{in } L^2((a,b);V^*).
\end{align*}
The continuity of the embedding $V \hookrightarrow H$ also gives
\begin{align*}
u_\varepsilon \to u \quad \text{in } L^2((a,b);H).
\end{align*}
We compare the duality terms by writing
\begin{align*}
f_\varepsilon(u_\varepsilon)-u'(u)=f_\varepsilon(u_\varepsilon-u)+(f_\varepsilon-u')(u).
\end{align*}
Using the defining operator norm on $V^*$ and Cauchy-Schwarz on the [measure space](/page/Measure%20Space) $((a,b),\mathcal{B}((a,b)),\mathcal{L}^1)$,
\begin{align*}
\int_a^b |f_\varepsilon(\tau)(u_\varepsilon(\tau)-u(\tau))|\,d\mathcal{L}^1(\tau) \le \|f_\varepsilon\|_{L^2((a,b);V^*)}\|u_\varepsilon-u\|_{L^2((a,b);V)}.
\end{align*}
The right-hand side tends to $0$ because $(f_\varepsilon)$ is bounded in $L^2((a,b);V^*)$ and $u_\varepsilon \to u$ in $L^2((a,b);V)$. Similarly,
\begin{align*}
\int_a^b |(f_\varepsilon(\tau)-u'(\tau))(u(\tau))|\,d\mathcal{L}^1(\tau) \le \|f_\varepsilon-u'\|_{L^2((a,b);V^*)}\|u\|_{L^2((a,b);V)},
\end{align*}
which tends to $0$. Therefore
\begin{align*}
f_\varepsilon(u_\varepsilon)\to u'(u)
\end{align*}
in $L^1((a,b))$.
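The boundedness of $(f_\varepsilon)$ used above can also be made explicit. Since $\rho_\varepsilon\ge 0$ has integral $1$, Cauchy-Schwarz with respect to the probability measure $\rho_\varepsilon(t-r)\,d\mathcal{L}^1(r)$, followed by Tonelli's theorem, gives for the zero extension of $u'$
\begin{align*}
\|f_\varepsilon(t)\|_{V^*}^2&\le\left(\int_{\mathbb{R}}\rho_\varepsilon(t-r)\|u'(r)\|_{V^*}\,d\mathcal{L}^1(r)\right)^2\le\int_{\mathbb{R}}\rho_\varepsilon(t-r)\|u'(r)\|_{V^*}^2\,d\mathcal{L}^1(r),\\
\|f_\varepsilon\|_{L^2((a,b);V^*)}^2&\le\int_{\mathbb{R}}\|u'(r)\|_{V^*}^2\int_{\mathbb{R}}\rho_\varepsilon(t-r)\,d\mathcal{L}^1(t)\,d\mathcal{L}^1(r)=\|u'\|_{L^2((0,T);V^*)}^2.
\end{align*}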
Moreover,
\begin{align*}
\int_a^b \left|\|u_\varepsilon(\tau)\|_H^2-\|u(\tau)\|_H^2\right|\,d\mathcal{L}^1(\tau) \le \left(\|u_\varepsilon\|_{L^2((a,b);H)}+\|u\|_{L^2((a,b);H)}\right)\|u_\varepsilon-u\|_{L^2((a,b);H)},
\end{align*}
so $\|u_\varepsilon\|_H^2\to \|u\|_H^2$ in $L^1((a,b))$. Let $\psi \in C_c^\infty((a,b))$. For each $\varepsilon$, the differential identity
\begin{align*}
\frac{d}{dt}\|u_\varepsilon(t)\|_H^2=2f_\varepsilon(t)(u_\varepsilon(t))
\end{align*}
holds on $(a,b)$. Testing this identity against $\psi$, integrating the left-hand side by parts on $(a,b)$ with respect to $\mathcal{L}^1$, and using the compact support of $\psi$ gives
\begin{align*}
-\int_a^b \|u_\varepsilon(\tau)\|_H^2\psi'(\tau)\,d\mathcal{L}^1(\tau)=2\int_a^b f_\varepsilon(\tau)(u_\varepsilon(\tau))\psi(\tau)\,d\mathcal{L}^1(\tau).
\end{align*}
Since $\psi$ and $\psi'$ are bounded and compactly supported in $(a,b)$, convergence in $L^1((a,b))$ implies convergence after multiplication by $\psi$ and $\psi'$, respectively, by the elementary estimate
\begin{align*}
\|\psi F\|_{L^1((a,b))}\le \|\psi\|_\infty\|F\|_{L^1((a,b))}.
\end{align*}
Passing to the limit in the two tested integrals therefore gives
\begin{align*}
-\int_a^b \|u(\tau)\|_H^2\psi'(\tau)\,d\mathcal{L}^1(\tau)=2\int_a^b u'(\tau)(u(\tau))\psi(\tau)\,d\mathcal{L}^1(\tau).
\end{align*}
Thus, in the sense of distributions on $(a,b)$,
\begin{align*}
\frac{d}{dt}\|u(t)\|_H^2=2u'(t)(u(t)).
\end{align*}
Every $\varphi\in C_c^\infty((0,T))$ has support in some interval $(a,b)$ with $0<a<b<T$, so the identity holds distributionally on $(0,T)$. The function
\begin{align*}
q: (0,T) \to \mathbb{R}
\end{align*}
defined by
\begin{align*}
q(t)=2u'(t)(u(t))
\end{align*}
belongs to $L^1(0,T)$, because
\begin{align*}
\int_0^T |q(t)|\,d\mathcal{L}^1(t) \le 2\|u'\|_{L^2((0,T);V^*)}\|u\|_{L^2((0,T);V)}.
\end{align*}
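Arguing as in the first step (the function $t\mapsto\|u(t)\|_H^2$ lies in $L^1(0,T)$ because $u\in L^2((0,T);H)$, and its distributional derivative is $q\in L^1(0,T)$), $\|u(t)\|_H^2$ has an absolutely continuous representative $g:[0,T]\to \mathbb{R}$ satisfying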
\begin{align*}
g(t)-g(s)=2\int_s^t u'(\tau)(u(\tau))\,d\mathcal{L}^1(\tau)
\end{align*}
for all $0\le s\le t\le T$.
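The $L^1$ estimate for the squared norms used earlier in this step rests on a pointwise inequality: for $x,y\in H$,
\begin{align*}
\left|\|x\|_H^2-\|y\|_H^2\right|=\bigl|\|x\|_H-\|y\|_H\bigr|\,\bigl(\|x\|_H+\|y\|_H\bigr)\le \|x-y\|_H\bigl(\|x\|_H+\|y\|_H\bigr).
\end{align*}
Taking $x=u_\varepsilon(\tau)$ and $y=u(\tau)$, integrating over $(a,b)$, and applying Cauchy-Schwarz in $L^2((a,b))$ to each of the two products gives that estimate.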
[/step]
[step:Reconstruct an $H$-valued weakly continuous representative]
For each $t \in [0,T]$, define a map
\begin{align*}
L_t: V \to \mathbb{R}
\end{align*}
by
\begin{align*}
L_t(v)=a_v(t),
\end{align*}
where $a_v$ is the absolutely continuous representative constructed in the first step.
We first verify that $L_t$ is linear for every fixed $t$. Let $\alpha \in \mathbb{R}$ and let $v,w \in V$. The two scalar functions $a_{\alpha v+w}$ and $\alpha a_v+a_w$ are continuous on $[0,T]$. For $\mathcal{L}^1$-a.e. $t \in (0,T)$, the almost-everywhere definition of $a_v$ gives
\begin{align*}
a_{\alpha v+w}(t)=(u(t),\alpha v+w)_H=\alpha (u(t),v)_H+(u(t),w)_H=\alpha a_v(t)+a_w(t).
\end{align*}
Since two continuous functions on $[0,T]$ that agree almost everywhere agree everywhere, $a_{\alpha v+w}=\alpha a_v+a_w$ on $[0,T]$. Hence $L_t(\alpha v+w)=\alpha L_t(v)+L_t(w)$ for every $t \in [0,T]$.
For every $v \in V$, the scalar function
\begin{align*}
r_v: [0,T] \to \mathbb{R}
\end{align*}
defined by
\begin{align*}
r_v(t)=g(t)\|v\|_H^2-|a_v(t)|^2
\end{align*}
is continuous. For $\mathcal{L}^1$-a.e. $t \in (0,T)$, Cauchy-Schwarz in $H$ gives
\begin{align*}
|a_v(t)|^2=|(u(t),v)_H|^2 \le \|u(t)\|_H^2\|v\|_H^2=g(t)\|v\|_H^2.
\end{align*}
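Since $r_v$ is continuous and non-negative almost everywhere, it is non-negative everywhere. Similarly, $g$ is continuous and equals $\|u(t)\|_H^2\ge 0$ for $\mathcal{L}^1$-a.e. $t$, so $g\ge 0$ on $[0,T]$ and $g(t)^{1/2}$ is defined. Thus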
\begin{align*}
|L_t(v)|\le g(t)^{1/2}\|v\|_H
\end{align*}
for every $v \in V$ and every $t \in [0,T]$.
Because $V$ is dense in $H$, $L_t$ extends uniquely to a bounded linear functional on $H$. By the [Riesz representation theorem](/theorems/218) for Hilbert spaces, applied to this bounded linear functional on $H$, there is a unique vector $\widetilde{u}(t)\in H$ such that
\begin{align*}
(\widetilde{u}(t),h)_H=L_t(h)
\end{align*}
for every $h \in H$, where $L_t(h)$ denotes the continuous extension from $V$ to $H$. For $v \in V$, this gives
\begin{align*}
(\widetilde{u}(t),v)_H=a_v(t).
\end{align*}
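By the Riesz representation, $\|\widetilde{u}(t)\|_H$ equals the norm of the extended functional, so $\|\widetilde{u}(t)\|_H\le g(t)^{1/2}\le M$ for every $t\in[0,T]$, where $M=\max_{[0,T]}g^{1/2}$ is finite because $g$ is continuous on the compact interval $[0,T]$. Fix $h\in H$. For every $v\in V$ and every $t\in[0,T]$,
\begin{align*}
\left|(\widetilde{u}(t),h)_H-a_v(t)\right|=\left|(\widetilde{u}(t),h-v)_H\right|\le M\|h-v\|_H.
\end{align*}
Since $V$ is dense in $H$, the function
\begin{align*}
t \mapsto (\widetilde{u}(t),h)_H
\end{align*}
is a uniform limit on $[0,T]$ of the continuous functions $a_v$, hence continuous on $[0,T]$. Since $h\in H$ was arbitrary, $\widetilde{u}: [0,T]\to H$ is weakly continuous.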
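It remains to show that $\widetilde{u}$ is a representative of $u$. For each fixed $v$, the identity $a_v(t)=(u(t),v)_H$ holds only outside an $\mathcal{L}^1$-null set depending on $v$, so we work with a countable family. Since $u$ is strongly measurable with values in $V$, it is essentially separably valued: there are an $\mathcal{L}^1$-null set $N_0$ and a countable set $D\subset V$ such that $u(t)$ lies in the $V$-closure of $D$ for every $t\in(0,T)\setminus N_0$. Let $W$ be the closed linear span of $D$ in $H$; by continuity of $V\hookrightarrow H$, $u(t)\in W$ for $t\notin N_0$. Let $N$ be the union of $N_0$, of the null set on which $g(t)\neq\|u(t)\|_H^2$, and of the null sets on which $a_d(t)\neq(u(t),d)_H$ for $d\in D$; as a countable union, $N$ is $\mathcal{L}^1$-null. For $t\in(0,T)\setminus N$, the vector $\widetilde{u}(t)-u(t)$ is orthogonal to $D$, hence to $W$, while $u(t)\in W$, so the bound $\|\widetilde{u}(t)\|_H^2\le g(t)$ gives
\begin{align*}
\|u(t)\|_H^2+\|\widetilde{u}(t)-u(t)\|_H^2=\|\widetilde{u}(t)\|_H^2\le g(t)=\|u(t)\|_H^2.
\end{align*}
Hence $\widetilde{u}(t)=u(t)$ in $H$ for $\mathcal{L}^1$-a.e. $t\in(0,T)$, and $\widetilde{u}$ is a representative of $u$.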
[/step]
[step:Upgrade weak continuity to strong continuity using the norm identity]
The energy identity obtained above gives a continuous representative $g$ of the squared $H$-norm of $u$. Since $\widetilde{u}=u$ for $\mathcal{L}^1$-a.e. $t$, we have
\begin{align*}
g(t)=\|\widetilde{u}(t)\|_H^2
\end{align*}
for $\mathcal{L}^1$-a.e. $t \in (0,T)$.
We prove that this equality holds for every $t \in [0,T]$. For each $v \in V$, the function $a_v$ is continuous and satisfies $a_v(t)=(\widetilde{u}(t),v)_H$ for every $t \in [0,T]$. Hence the scalar function
\begin{align*}
h_v: [0,T] \to \mathbb{R}
\end{align*}
defined by
\begin{align*}
h_v(t)=g(t)-2a_v(t)+\|v\|_H^2
\end{align*}
is continuous. For $\mathcal{L}^1$-a.e. $t \in (0,T)$, using $\widetilde{u}(t)=u(t)$ in $H$ gives
\begin{align*}
h_v(t)=\|\widetilde{u}(t)-v\|_H^2.
\end{align*}
Since a [continuous function](/page/Continuous%20Function) that agrees almost everywhere with a non-negative function is non-negative everywhere, $h_v(t)\ge 0$ for every $t \in [0,T]$ and every $v \in V$.
Fix $t \in [0,T]$. Choose a sequence $(v_n)_{n=1}^{\infty}$ in $V$ such that $v_n\to \widetilde{u}(t)$ in $H$, which is possible because $V$ is dense in $H$. From $h_{v_n}(t)\ge 0$ we obtain
\begin{align*}
g(t)\ge 2(\widetilde{u}(t),v_n)_H-\|v_n\|_H^2.
\end{align*}
Passing to the limit as $n\to\infty$ gives
\begin{align*}
g(t)\ge \|\widetilde{u}(t)\|_H^2.
\end{align*}
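The opposite inequality requires the energy identity. For every $v\in V$ and all $s,r\in[0,T]$ (integrals taken with orientation), subtracting twice the identity for $a_v$ from the identity for $g$ gives
\begin{align*}
h_v(r)-h_v(s)=2\int_s^r u'(\tau)\bigl(u(\tau)-v\bigr)\,d\mathcal{L}^1(\tau).
\end{align*}
Moreover, expanding $\|\widetilde{u}(t)-v\|_H^2$ and using $a_v(t)=(\widetilde{u}(t),v)_H$ yields
\begin{align*}
g(t)-\|\widetilde{u}(t)\|_H^2=h_v(t)-\|\widetilde{u}(t)-v\|_H^2\le h_v(t).
\end{align*}
Let $S$ be the set of $s\in(0,T)$ with $\widetilde{u}(s)=u(s)$ and $g(s)=\|u(s)\|_H^2$; its complement in $(0,T)$ is $\mathcal{L}^1$-null. For $s\in S$ we have $h_v(s)=\|u(s)-v\|_H^2$ for every $v\in V$, and in particular $h_{u(s)}(s)=0$. Let $0<\delta\le T$ and let $I_\delta\subset[0,T]$ be a closed interval of length $\delta$ having $t$ as an endpoint. The set of $s\in I_\delta$ satisfying
\begin{align*}
\delta\|u(s)\|_V^2\le \|u\|_{L^2(I_\delta;V)}^2
\end{align*}
has positive measure, since otherwise integrating the reverse strict inequality over $I_\delta$ would give $\|u\|_{L^2(I_\delta;V)}^2>\|u\|_{L^2(I_\delta;V)}^2$. Hence it contains some $s\in S$. Taking $v=u(s)$ and applying Cauchy-Schwarz on the interval between $s$ and $t$ gives
\begin{align*}
0\le g(t)-\|\widetilde{u}(t)\|_H^2\le h_{u(s)}(t)&\le 2\|u'\|_{L^2(I_\delta;V^*)}\left(\|u\|_{L^2(I_\delta;V)}+\delta^{1/2}\|u(s)\|_V\right)\\
&\le 4\|u'\|_{L^2(I_\delta;V^*)}\|u\|_{L^2(I_\delta;V)}.
\end{align*}
By absolute continuity of the Lebesgue integral, the right-hand side tends to $0$ as $\delta\to 0$. Hence $g(t)\le\|\widetilde{u}(t)\|_H^2$.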
Therefore
\begin{align*}
g(t)=\|\widetilde{u}(t)\|_H^2
\end{align*}
for every $t \in [0,T]$. Since $g$ is continuous, $t \mapsto \|\widetilde{u}(t)\|_H^2$ is continuous on $[0,T]$.
Now let $(t_n)_{n=1}^{\infty}$ be any sequence in $[0,T]$ with $t_n \to t$. Weak continuity gives
\begin{align*}
\widetilde{u}(t_n)\rightharpoonup \widetilde{u}(t)
\end{align*}
in $H$, and norm continuity gives
\begin{align*}
\|\widetilde{u}(t_n)\|_H \to \|\widetilde{u}(t)\|_H.
\end{align*}
The implication from [weak convergence](/page/Weak%20Convergence) plus convergence of norms to strong convergence is proved here directly by the Hilbert-space polarization computation
\begin{align*}
\|\widetilde{u}(t_n)-\widetilde{u}(t)\|_H^2=\|\widetilde{u}(t_n)\|_H^2+\|\widetilde{u}(t)\|_H^2-2(\widetilde{u}(t_n),\widetilde{u}(t))_H.
\end{align*}
The first two terms converge by norm continuity, and the inner-product term converges by weak convergence with the fixed test vector $\widetilde{u}(t)\in H$. Hence the right-hand side tends to $0$. Therefore
\begin{align*}
\widetilde{u}(t_n)\to \widetilde{u}(t)
\end{align*}
in $H$. Since the sequence was arbitrary, $\widetilde{u}\in C([0,T];H)$.
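The convergence of norms cannot be dropped from this argument. For illustration, take $H=L^2((0,2\pi))$ (a choice made only for this example) and $e_n(x)=\pi^{-1/2}\sin(nx)$ for $n\ge 1$. By orthonormality and Bessel's inequality,
\begin{align*}
\|e_n\|_{L^2((0,2\pi))}=1,\qquad \sum_{n=1}^\infty |(e_n,h)_{L^2((0,2\pi))}|^2\le\|h\|_{L^2((0,2\pi))}^2 \quad\text{for every } h\in L^2((0,2\pi)),
\end{align*}
so $e_n\rightharpoonup 0$ while $\|e_n-0\|_{L^2((0,2\pi))}=1$ for every $n$. In the polarization identity, the norm term $\|e_n\|^2=1$ fails to converge to $\|0\|^2=0$; this is exactly the term controlled above by norm continuity.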
[/step]
[step:Conclude the energy formula for the continuous representative]
The absolutely continuous norm identity already gives, for every $0\le s\le t\le T$,
\begin{align*}
g(t)-g(s)=2\int_s^t u'(\tau)(u(\tau))\,d\mathcal{L}^1(\tau),
\end{align*}
where the integrand is the well-defined $V^*$-$V$ duality pairing using the original representative $u \in L^2((0,T);V)$. The previous step proved
\begin{align*}
g(r)=\|\widetilde{u}(r)\|_H^2
\end{align*}
for every $r \in [0,T]$. Substituting this equality at $r=s$ and $r=t$ yields
\begin{align*}
\|\widetilde{u}(t)\|_H^2-\|\widetilde{u}(s)\|_H^2=2\int_s^t u'(\tau)(u(\tau))\,d\mathcal{L}^1(\tau).
\end{align*}
Thus $\widetilde{u}\in C([0,T];H)$ is the asserted continuous representative, and the displayed formula is the asserted energy identity, with the right-hand side interpreted through the original $V$-valued representative of $u$.
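As a direct illustration of how the identity is typically used, taking $s=0$ and estimating the right-hand side exactly as in the integrability bound for $q$ gives
\begin{align*}
\max_{t\in[0,T]}\|\widetilde{u}(t)\|_H^2\le \|\widetilde{u}(0)\|_H^2+2\|u'\|_{L^2((0,T);V^*)}\|u\|_{L^2((0,T);V)}.
\end{align*}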
[/step]