[proofplan]
We use the mild formula together with the semigroup property to rewrite the increment $u(t+h)-u(t)$ in terms of $S(h)u(t)-u(t)$ and a short-time average of the forcing term. Since $u(t)\in D(A)$ and $u$ is continuous into $D(A)$ with the graph norm, the semigroup difference quotient converges to $Au(t)$ even when the base point varies with $h$, as it does for the left difference quotient. The short-time forcing average converges to $f(t)$ by continuity of $f$ and strong continuity of the semigroup. Comparing these one-sided limits with the assumed $C^1$ derivative gives $u'(t)=Au(t)+f(t)$, and the initial condition follows by evaluating the mild formula at $t=0$.
[/proofplan]
[step:Record the local boundedness and difference quotient facts for the semigroup]
Since $S$ is strongly continuous, the uniform boundedness principle together with the semigroup property shows that for every $R>0$
\begin{align*}
M_R := \sup_{0\leq r\leq R}\|S(r)\|_{\mathcal{L}(X)} < \infty.
\end{align*}
By the definition of the generator, every $x\in D(A)$ satisfies
\begin{align*}
\lim_{h\downarrow 0}\left\|\frac{S(h)x-x}{h}-Ax\right\|_X=0.
\end{align*}
Moreover, if $h_0>0$, $x\in D(A)$, and $h\mapsto x_h$ is a map from $(0,h_0]$ into $D(A)$ with $x_h\to x$ in $D(A)$ as $h\downarrow 0$, then
\begin{align*}
\frac{S(h)x_h-x_h}{h}\to Ax
\end{align*}
in $X$ as $h\downarrow 0$.
Indeed, for $y\in D(A)$ and $h>0$, the standard semigroup identity gives
\begin{align*}
S(h)y-y=\int_0^h S(r)Ay\,d\mathcal{L}^1(r).
\end{align*}
Therefore
\begin{align*}
\left\|\frac{S(h)x_h-x_h}{h}-Ax\right\|_X
\leq \frac{1}{h}\int_0^h \|S(r)(Ax_h-Ax)\|_X\,d\mathcal{L}^1(r)
+\left\|\frac{1}{h}\int_0^h S(r)Ax\,d\mathcal{L}^1(r)-Ax\right\|_X.
\end{align*}
The first term is bounded by $M_{h_0}\|Ax_h-Ax\|_X$, which tends to $0$. The second term tends to $0$ by strong continuity of $S$ at $0$, applied to the fixed vector $Ax\in X$.
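As a concrete illustration, offered purely as an assumed example that the proof does not use, take $X=BUC(\mathbb{R})$, the bounded uniformly continuous functions with the supremum norm, and the left translation semigroup $(S(r)g)(\xi)=g(\xi+r)$. Its generator is $Ag=g'$ with $D(A)=\{g\in BUC(\mathbb{R}):g\in C^1(\mathbb{R}),\ g'\in BUC(\mathbb{R})\}$, and $M_R=1$ for every $R$ because each $S(r)$ is an isometry. Since point evaluation is a bounded linear functional on $X$ and therefore commutes with the Bochner integral, the identity for $S(h)y-y$ reduces, pointwise in $\xi$, to the fundamental theorem of calculus:
\begin{align*}
(S(h)y-y)(\xi)=y(\xi+h)-y(\xi)=\int_0^h y'(\xi+r)\,d\mathcal{L}^1(r)=\left(\int_0^h S(r)Ay\,d\mathcal{L}^1(r)\right)(\xi).
\end{align*}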
[guided]
The key technical point is that the generator $A$ is in general unbounded, so one cannot move it through limits as one would a bounded operator without further justification. The graph-norm hypothesis is exactly what is needed: convergence in $D(A)$ means convergence of both the vectors and their $A$-images.
Because $S$ is a strongly continuous semigroup, it is uniformly bounded on compact time intervals. Thus for each $R>0$ we may define
\begin{align*}
M_R := \sup_{0\leq r\leq R}\|S(r)\|_{\mathcal{L}(X)} < \infty.
\end{align*}
For a fixed $x\in D(A)$, the definition of the generator says
\begin{align*}
\lim_{h\downarrow 0}\left\|\frac{S(h)x-x}{h}-Ax\right\|_X=0.
\end{align*}
We need a slightly stronger form in which the vector may vary with $h$. Let $h_0>0$, and let $h\mapsto x_h$ be a map from $(0,h_0]$ into $D(A)$ such that $x_h\to x$ in $D(A)$ as $h\downarrow 0$. Since convergence in $D(A)$ is convergence in the graph norm, we have both $x_h\to x$ in $X$ and $Ax_h\to Ax$ in $X$.
For $y\in D(A)$, the orbit map $r\mapsto S(r)y$ is continuously differentiable in $X$ with derivative $S(r)Ay$. Hence, by the fundamental theorem of calculus for $X$-valued functions,
\begin{align*}
S(h)y-y=\int_0^h S(r)Ay\,d\mathcal{L}^1(r).
\end{align*}
Applying this with $y=x_h$ gives
\begin{align*}
\frac{S(h)x_h-x_h}{h}=\frac{1}{h}\int_0^h S(r)Ax_h\,d\mathcal{L}^1(r).
\end{align*}
Subtract $Ax$ and use the triangle inequality:
\begin{align*}
\left\|\frac{S(h)x_h-x_h}{h}-Ax\right\|_X
\leq \frac{1}{h}\int_0^h \|S(r)(Ax_h-Ax)\|_X\,d\mathcal{L}^1(r)
+\left\|\frac{1}{h}\int_0^h S(r)Ax\,d\mathcal{L}^1(r)-Ax\right\|_X.
\end{align*}
The first term is at most $M_{h_0}\|Ax_h-Ax\|_X$, and this tends to $0$ because $x_h\to x$ in the graph norm. The second term tends to $0$ because $S(r)Ax\to Ax$ in $X$ as $r\downarrow 0$, and the average over $[0,h]$ of a continuous function converges to its value at $0$. Therefore
\begin{align*}
\frac{S(h)x_h-x_h}{h}\to Ax
\end{align*}
in $X$.
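To see that convergence in $X$ alone would not suffice, consider, as an assumed illustration, the left translation semigroup $(S(r)g)(\xi)=g(\xi+r)$ on $X=BUC(\mathbb{R})$ with generator $Ag=g'$. Fix $x\in D(A)$ and set $x_h(\xi)=x(\xi)+h\sin(\xi/h^2)$. Then $x_h\in D(A)$ and $\|x_h-x\|_X=h\to 0$, but $\|Ax_h-Ax\|_X=\|h^{-1}\cos(\cdot/h^2)\|_X=1/h\to\infty$, so $x_h\not\to x$ in $D(A)$. The difference quotient splits as
\begin{align*}
\frac{S(h)x_h-x_h}{h}(\xi)-x'(\xi)
=\left(\frac{x(\xi+h)-x(\xi)}{h}-x'(\xi)\right)
+\left(\sin\Bigl(\frac{\xi}{h^2}+\frac{1}{h}\Bigr)-\sin\Bigl(\frac{\xi}{h^2}\Bigr)\right).
\end{align*}
The first bracket tends to $0$ uniformly in $\xi$ because $x\in D(A)$, while the supremum over $\xi$ of the second bracket is $2|\sin(1/(2h))|$, which equals $2$ along $h_n=1/((2n+1)\pi)$. Hence the quotient does not converge to $Ax$ as $h\downarrow 0$.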
[/guided]
[/step]
[step:Derive the right derivative from the mild formula]
Fix $t\in[0,T)$ and let $h>0$ with $t+h\leq T$. The semigroup property gives
\begin{align*}
S(t+h)u_0=S(h)S(t)u_0.
\end{align*}
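For the integral term, split the interval $[0,t+h]$ into $[0,t]$ and $[t,t+h]$. On $[0,t]$, write $S(t+h-s)=S(h)S(t-s)$ and pull the bounded operator $S(h)$ out of the Bochner integral. Combining this with the previous display and the mild formula for $u(t)$ gives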
\begin{align*}
u(t+h)=S(h)u(t)+\int_t^{t+h}S(t+h-s)f(s)\,d\mathcal{L}^1(s).
\end{align*}
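Written out in full, with $u$ given by the mild formula $u(t)=S(t)u_0+\int_0^t S(t-s)f(s)\,d\mathcal{L}^1(s)$, the computation reads
\begin{align*}
u(t+h)&=S(h)S(t)u_0+\int_0^t S(h)S(t-s)f(s)\,d\mathcal{L}^1(s)+\int_t^{t+h}S(t+h-s)f(s)\,d\mathcal{L}^1(s)\\
&=S(h)\left(S(t)u_0+\int_0^t S(t-s)f(s)\,d\mathcal{L}^1(s)\right)+\int_t^{t+h}S(t+h-s)f(s)\,d\mathcal{L}^1(s)\\
&=S(h)u(t)+\int_t^{t+h}S(t+h-s)f(s)\,d\mathcal{L}^1(s),
\end{align*}
where the second equality is the only place the boundedness of $S(h)$ enters.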
Therefore
\begin{align*}
\frac{u(t+h)-u(t)}{h}
=
\frac{S(h)u(t)-u(t)}{h}
+
\frac{1}{h}\int_t^{t+h}S(t+h-s)f(s)\,d\mathcal{L}^1(s).
\end{align*}
Since $u(t)\in D(A)$, the definition of the generator shows that the first term converges to $Au(t)$ in $X$ as $h\downarrow 0$.
For the second term, define the map $F_h:[0,h]\to X$ by $F_h(r)=S(r)f(t+h-r)$. The substitution $r=t+h-s$ sends $s=t$ to $r=h$ and $s=t+h$ to $r=0$, so it carries $[t,t+h]$ onto $[0,h]$; since this reflection preserves Lebesgue measure,
\begin{align*}
\frac{1}{h}\int_t^{t+h}S(t+h-s)f(s)\,d\mathcal{L}^1(s)
=
\frac{1}{h}\int_0^h S(r)f(t+h-r)\,d\mathcal{L}^1(r).
\end{align*}
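For orientation, consider the assumed scalar case $X=\mathbb{R}$, $S(r)=e^{ar}$ with $a\neq 0$, and constant forcing $f\equiv c$. The average can then be computed exactly:
\begin{align*}
\frac{1}{h}\int_0^h e^{ar}c\,d\mathcal{L}^1(r)=c\,\frac{e^{ah}-1}{ah}\to c=f(t)\qquad(h\downarrow 0),
\end{align*}
so in this simple case the forcing average recovers the current value of $f$, in line with the general argument.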
The integrand $F_h$ converges uniformly to $f(t)$ on $[0,h]$ as $h\downarrow 0$. Indeed, for $0\leq r\leq h$ we have $\|S(r)\|_{\mathcal{L}(X)}\leq M_T$, so
\begin{align*}
\|F_h(r)-f(t)\|_X\leq M_T\|f(t+h-r)-f(t)\|_X+\|S(r)f(t)-f(t)\|_X.
\end{align*}
Since $t+h-r\in[t,t+h]$, the first term tends to $0$ uniformly for $0\leq r\leq h$ by continuity of $f$ at $t$, and the second term tends to $0$ uniformly for $0\leq r\leq h$ by strong continuity of $S$ at $0$ applied to the fixed vector $f(t)\in X$. Because the norm of a Bochner integral is at most the integral of the norm, the distance between the average of $F_h$ over $[0,h]$ and $f(t)$ is at most $\sup_{0\leq r\leq h}\|F_h(r)-f(t)\|_X$. Hence
\begin{align*}
\frac{1}{h}\int_0^h S(r)f(t+h-r)\,d\mathcal{L}^1(r)\to f(t)
\end{align*}
in $X$. Since $u\in C^1([0,T];X)$, the right derivative equals $u'(t)$, so
\begin{align*}
u'(t)=Au(t)+f(t)
\end{align*}
for every $t\in[0,T)$.
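As an assumed illustration of this identity, take $X=BUC(\mathbb{R})$ with the left translation semigroup $(S(r)g)(\xi)=g(\xi+r)$, whose generator is $Ag=g'$, and write $u(t,\xi)$ for $u(t)(\xi)$ and $f(s,\eta)$ for $f(s)(\eta)$. The mild formula becomes $u(t,\xi)=u_0(\xi+t)+\int_0^t f(s,\xi+t-s)\,d\mathcal{L}^1(s)$. Assuming $u_0$ and $f$ are smooth enough to differentiate under the integral sign, a pointwise computation gives
\begin{align*}
\partial_t u(t,\xi)&=u_0'(\xi+t)+f(t,\xi)+\int_0^t\partial_\eta f(s,\xi+t-s)\,d\mathcal{L}^1(s),\\
\partial_\xi u(t,\xi)&=u_0'(\xi+t)+\int_0^t\partial_\eta f(s,\xi+t-s)\,d\mathcal{L}^1(s),
\end{align*}
so $\partial_t u=\partial_\xi u+f$, which is the transport-equation form of $u'(t)=Au(t)+f(t)$.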
[/step]
[step:Derive the left derivative at positive times]
Fix $t\in(0,T]$ and let $h>0$ with $t-h\geq 0$. Applying the identity from the previous step at time $t-h$ gives
\begin{align*}
u(t)=S(h)u(t-h)+\int_{t-h}^{t}S(t-s)f(s)\,d\mathcal{L}^1(s).
\end{align*}
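In the assumed scalar case $X=\mathbb{R}$, $S(r)=e^{ar}$ with $a\neq 0$, and $f\equiv c$, the mild formula gives $u(t)=e^{at}u_0+c\,(e^{at}-1)/a$, and the identity above can be checked directly:
\begin{align*}
e^{ah}u(t-h)+\int_{t-h}^{t}e^{a(t-s)}c\,d\mathcal{L}^1(s)
&=e^{at}u_0+\frac{c}{a}\bigl(e^{at}-e^{ah}\bigr)+\frac{c}{a}\bigl(e^{ah}-1\bigr)\\
&=e^{at}u_0+\frac{c}{a}\bigl(e^{at}-1\bigr)=u(t).
\end{align*}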
Thus
\begin{align*}
\frac{u(t)-u(t-h)}{h}
=
\frac{S(h)u(t-h)-u(t-h)}{h}
+
\frac{1}{h}\int_{t-h}^{t}S(t-s)f(s)\,d\mathcal{L}^1(s).
\end{align*}
Since $u\in C([0,T];D(A))$, we have $u(t-h)\to u(t)$ in $D(A)$ as $h\downarrow 0$. The variable-base difference quotient result from the first step, applied with $h_0=t$ and $x_h=u(t-h)$, therefore gives
\begin{align*}
\frac{S(h)u(t-h)-u(t-h)}{h}\to Au(t)
\end{align*}
in $X$.
For the forcing average, substitute $r=t-s$, which sends $s=t-h$ to $r=h$ and $s=t$ to $r=0$; it carries $[t-h,t]$ onto $[0,h]$ and preserves Lebesgue measure, so
\begin{align*}
\frac{1}{h}\int_{t-h}^{t}S(t-s)f(s)\,d\mathcal{L}^1(s)
=
\frac{1}{h}\int_0^h S(r)f(t-r)\,d\mathcal{L}^1(r).
\end{align*}
Again the integrand converges uniformly to $f(t)$ on $0\leq r\leq h$: for such $r$,
\begin{align*}
\|S(r)f(t-r)-f(t)\|_X\leq M_T\|f(t-r)-f(t)\|_X+\|S(r)f(t)-f(t)\|_X,
\end{align*}
and, since $t-r\in[t-h,t]$, both terms tend to $0$ uniformly for $0\leq r\leq h$ by continuity of $f$ at $t$ and strong continuity of $S$ at $0$. Bounding the norm of the Bochner integral by the integral of the norm, the average converges to $f(t)$ in $X$. Since $u\in C^1([0,T];X)$, the left derivative equals $u'(t)$, and hence
\begin{align*}
u'(t)=Au(t)+f(t)
\end{align*}
for every $t\in(0,T]$.
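In the assumed scalar setting $X=\mathbb{R}$, $S(r)=e^{ar}$ with $a\neq 0$ (so $A$ is multiplication by $a$) and $f\equiv c$, where $u(t)=e^{at}u_0+c\,(e^{at}-1)/a$, the conclusion of the last two steps reduces to the elementary identity
\begin{align*}
u'(t)=ae^{at}u_0+ce^{at}=a\Bigl(e^{at}u_0+\frac{c}{a}\bigl(e^{at}-1\bigr)\Bigr)+c=Au(t)+f(t).
\end{align*}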
[/step]
[step:Verify the initial value and conclude strong solvability]
At $t=0$, the defining formula gives
\begin{align*}
u(0)=S(0)u_0+\int_0^0 S(0-s)f(s)\,d\mathcal{L}^1(s)=u_0,
\end{align*}
because $S(0)=I_X$, where $I_X:X\to X$ is the identity map, and the Bochner integral over the degenerate interval $[0,0]$, a Lebesgue-null set, is $0$.
The hypotheses already give $u\in C^1([0,T];X)$ and $u\in C([0,T];D(A))$. The preceding two steps show
\begin{align*}
u'(t)=Au(t)+f(t)
\end{align*}
for all $t\in[0,T]$, with the endpoint derivatives understood one-sidedly. Therefore $u$ is a strong solution of the abstract Cauchy problem with initial value $u_0$.
[/step]