[proofplan]
We prove the contraction case ($M = 1$, $\omega = 0$) first; the general case then follows by rescaling and renorming. The forward implication (semigroup $\Rightarrow$ resolvent estimate) is the easier direction: we exhibit the resolvent as the Laplace transform $R(\lambda, A)x = \int_0^\infty e^{-\lambda t}T(t)x\, d\mathcal{L}^1(t)$ and read off the bound from contractivity. The reverse direction is the celebrated construction: given the resolvent estimate, define the **Yosida approximations** $A_\lambda := \lambda A R(\lambda, A) = \lambda^2 R(\lambda, A) - \lambda I$, which are bounded operators that approximate $A$ on $D(A)$. The exponentials $e^{tA_\lambda}$ form contraction semigroups (verified via the bound on $R(\lambda, A)$), and we pass to the limit $\lambda \to \infty$ using a Cauchy estimate to obtain the desired semigroup $T(t)$. Finally we verify that the limit semigroup has $A$ as its generator. The general case $\|T(t)\| \le Me^{\omega t}$ reduces to the contraction case by rescaling (to remove $\omega$) and passing to an equivalent norm (to remove $M$); the hypothesis is stated for the powers $R(\lambda, A)^n$ rather than for $R(\lambda, A)$ alone because the bound $M/(\lambda-\omega)^n$ on all powers is exactly what makes this renorming possible.
[/proofplan]
[step:Express the resolvent as the Laplace transform of the semigroup to obtain the forward implication]
Assume (1): $A$ generates a $C_0$-semigroup $\{T(t)\}_{t\ge 0}$ with $\|T(t)\|_{\mathcal{L}(X)} \le 1$ for all $t \ge 0$. Fix $\lambda > 0$ and define
\begin{align*}
S_\lambda: X &\to X, \\
x &\mapsto \int_0^\infty e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t).
\end{align*}
The integrand $t \mapsto e^{-\lambda t} T(t) x$ is continuous (since the semigroup is strongly continuous) and satisfies $\|e^{-\lambda t} T(t) x\|_X \le e^{-\lambda t} \|x\|_X$, where the right-hand side is integrable on $[0,\infty)$ since $\lambda > 0$. Hence $S_\lambda x$ is well-defined as a Bochner integral, and
\begin{align*}
\|S_\lambda x\|_X \le \int_0^\infty e^{-\lambda t} \|x\|_X \, d\mathcal{L}^1(t) = \frac{1}{\lambda} \|x\|_X.
\end{align*}
[claim:$S_\lambda x \in D(A)$ and $(\lambda I - A)S_\lambda x = x$ for every $x \in X$]
[/claim]
[proof]
Fix $h > 0$. Using the semigroup property $T(t+h) = T(h)T(t)$ together with linearity and boundedness of $T(h)$,
\begin{align*}
\frac{T(h) - I}{h} S_\lambda x &= \frac{1}{h}\int_0^\infty e^{-\lambda t}[T(t+h)x - T(t)x]\, d\mathcal{L}^1(t) \\
&= \frac{1}{h}\left[\int_h^\infty e^{-\lambda(s-h)} T(s)x \, d\mathcal{L}^1(s) - \int_0^\infty e^{-\lambda t} T(t)x \, d\mathcal{L}^1(t)\right] \\
&= \frac{e^{\lambda h} - 1}{h}\int_0^\infty e^{-\lambda t} T(t)x \, d\mathcal{L}^1(t) - \frac{e^{\lambda h}}{h}\int_0^h e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t),
\end{align*}
where in the second line we substituted $s = t + h$ and in the third we split the integral $\int_h^\infty = \int_0^\infty - \int_0^h$ and combined exponentials. Letting $h \to 0^+$: the first term tends to $\lambda S_\lambda x$ since $\frac{e^{\lambda h}-1}{h} \to \lambda$, while $\frac{e^{\lambda h}}{h}\int_0^h e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t)$ tends to $T(0)x = x$ by continuity of the integrand at $0$, so the second term tends to $-x$. Therefore
\begin{align*}
\lim_{h \to 0^+} \frac{T(h) - I}{h} S_\lambda x = \lambda S_\lambda x - x.
\end{align*}
By the [Closure and Density of the $C_0$-Semigroup Generator](/theorems/3144), the existence of this limit means $S_\lambda x \in D(A)$ and $A S_\lambda x = \lambda S_\lambda x - x$, i.e., $(\lambda I - A) S_\lambda x = x$.
[/proof]
A symmetric computation (or direct application: for $y \in D(A)$, $T(t)Ay = AT(t)y$, and the closed operator $A$ can be pulled out of the Bochner integral, so $S_\lambda A y = A S_\lambda y$) gives $S_\lambda(\lambda I - A)y = y$ for $y \in D(A)$. Thus $\lambda I - A$ is bijective from $D(A)$ to $X$ with inverse $S_\lambda$, so $\lambda \in \rho(A)$ and $R(\lambda, A) = S_\lambda$. The bound $\|R(\lambda, A)\|_{\mathcal{L}(X)} \le 1/\lambda$ follows from the estimate above.
For the iterated bound, differentiating the Laplace transform identity in $\lambda$ (or iterating) yields $R(\lambda, A)^n x = \frac{1}{(n-1)!}\int_0^\infty t^{n-1} e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t)$, hence $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le \frac{1}{(n-1)!}\int_0^\infty t^{n-1} e^{-\lambda t}\, d\mathcal{L}^1(t) = \frac{1}{\lambda^n}$.
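One way to carry out the differentiation route is sketched below. It assumes two standard facts that are not proved in this section: the resolvent is differentiable in $\lambda$ with $\frac{d}{d\lambda}R(\lambda, A) = -R(\lambda, A)^2$ (a consequence of the resolvent equation), and differentiation under the Bochner integral is permitted here by dominated convergence. Under these assumptions, for $n \ge 1$,
\begin{align*}
\frac{d^{n-1}}{d\lambda^{n-1}} R(\lambda, A) x &= (-1)^{n-1} (n-1)!\, R(\lambda, A)^n x, \\
\frac{d^{n-1}}{d\lambda^{n-1}} \int_0^\infty e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t) &= (-1)^{n-1} \int_0^\infty t^{n-1} e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t).
\end{align*}
Since the two left-hand sides agree, equating the right-hand sides and dividing by $(-1)^{n-1}(n-1)!$ recovers the stated formula for $R(\lambda, A)^n x$.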
[/step]
[step:Define the Yosida approximations $A_\lambda := \lambda A R(\lambda, A)$ and show they converge strongly to $A$ on $D(A)$]
Assume (2): every $\lambda > 0$ belongs to $\rho(A)$ with $\|R(\lambda, A)\|_{\mathcal{L}(X)} \le 1/\lambda$. Define the **Yosida approximations**
\begin{align*}
A_\lambda: X &\to X, \\
x &\mapsto \lambda A R(\lambda, A) x \quad (\lambda > 0).
\end{align*}
We rewrite this in a more useful form. Since $(\lambda I - A) R(\lambda, A) = I$, we have $A R(\lambda, A) = \lambda R(\lambda, A) - I$, which gives
\begin{align*}
A_\lambda = \lambda^2 R(\lambda, A) - \lambda I.
\end{align*}
This makes manifest that $A_\lambda \in \mathcal{L}(X)$: it is a bounded operator with
\begin{align*}
\|A_\lambda\|_{\mathcal{L}(X)} \le \lambda^2 \cdot \frac{1}{\lambda} + \lambda = 2\lambda.
\end{align*}
[claim:$\lambda R(\lambda, A) x \to x$ as $\lambda \to \infty$ for every $x \in X$]
[/claim]
[proof]
First take $x \in D(A)$. Then
\begin{align*}
\lambda R(\lambda, A) x - x = \lambda R(\lambda, A) x - (\lambda I - A) R(\lambda, A) x = A R(\lambda, A) x = R(\lambda, A) A x,
\end{align*}
where we used that $R(\lambda, A)$ commutes with $A$ on $D(A)$ (since $R(\lambda, A) = (\lambda I - A)^{-1}$). Hence $\|\lambda R(\lambda, A) x - x\|_X = \|R(\lambda, A) Ax\|_X \le \frac{1}{\lambda} \|Ax\|_X \to 0$.
For general $x \in X$: since $D(A)$ is dense in $X$ (by hypothesis), given $\varepsilon > 0$ there is $y \in D(A)$ with $\|x - y\|_X < \varepsilon$. Using $\|\lambda R(\lambda, A)\|_{\mathcal{L}(X)} \le 1$:
\begin{align*}
\|\lambda R(\lambda, A) x - x\|_X &\le \|\lambda R(\lambda, A)(x - y)\|_X + \|\lambda R(\lambda, A) y - y\|_X + \|y - x\|_X \\
&\le 2\varepsilon + \|R(\lambda, A) Ay\|_X.
\end{align*}
Letting $\lambda \to \infty$ and then $\varepsilon \to 0$ yields the conclusion.
[/proof]
Consequently, for $x \in D(A)$,
\begin{align*}
A_\lambda x = \lambda A R(\lambda, A) x = \lambda R(\lambda, A) A x \to A x \quad \text{as } \lambda \to \infty.
\end{align*}
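As a sanity check, consider the scalar case, which is an illustrative assumption and not part of the proof: $X = \mathbb{C}$ and $A = -a$ with $a \ge 0$, so that $R(\lambda, A) = 1/(\lambda + a)$ and $|R(\lambda, A)| \le 1/\lambda$. The two limits of this step can then be computed by hand:
\begin{align*}
\lambda R(\lambda, A) &= \frac{\lambda}{\lambda + a} \to 1, \\
A_\lambda &= \frac{\lambda^2}{\lambda + a} - \lambda = -\frac{\lambda a}{\lambda + a} \to -a = A \quad \text{as } \lambda \to \infty.
\end{align*}
In this example $|A_\lambda| \le a$ for every $\lambda > 0$, which is much better than the general bound $2\lambda$; the general bound is all that is available when $A$ is unbounded.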
[/step]
[step:Define $T_\lambda(t) := e^{tA_\lambda}$ and show each is a contraction semigroup using the resolvent bound]
For each $\lambda > 0$, since $A_\lambda$ is bounded, the exponential $T_\lambda(t) := \exp(tA_\lambda) = \sum_{n=0}^\infty \frac{t^n A_\lambda^n}{n!}$ is well-defined for all $t \in \mathbb{R}$ and forms a uniformly continuous semigroup. We bound $\|T_\lambda(t)\|_{\mathcal{L}(X)}$ using the decomposition $A_\lambda = \lambda^2 R(\lambda, A) - \lambda I$:
\begin{align*}
T_\lambda(t) = e^{t(\lambda^2 R(\lambda, A) - \lambda I)} = e^{-\lambda t} e^{t \lambda^2 R(\lambda, A)},
\end{align*}
where the splitting is valid because $\lambda I$ commutes with $\lambda^2 R(\lambda, A)$. We bound the second exponential using the power series and $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le \|R(\lambda, A)\|_{\mathcal{L}(X)}^n \le 1/\lambda^n$ (submultiplicativity of the operator norm applied to the hypothesis $\|R(\lambda, A)\|_{\mathcal{L}(X)} \le 1/\lambda$):
\begin{align*}
\|e^{t\lambda^2 R(\lambda, A)}\|_{\mathcal{L}(X)} \le \sum_{n=0}^\infty \frac{(t\lambda^2)^n}{n!}\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le \sum_{n=0}^\infty \frac{(t\lambda^2)^n}{n! \lambda^n} = e^{t\lambda}.
\end{align*}
Combining:
\begin{align*}
\|T_\lambda(t)\|_{\mathcal{L}(X)} \le e^{-\lambda t} \cdot e^{t \lambda} = 1, \quad t \ge 0.
\end{align*}
So each $\{T_\lambda(t)\}_{t\ge 0}$ is a contraction semigroup.
[guided]
The bound $\|R(\lambda, A)\|_{\mathcal{L}(X)} \le 1/\lambda$ does not directly say anything about exponentials; we need a workaround. The trick is the algebraic identity
\begin{align*}
A_\lambda = \lambda^2 R(\lambda, A) - \lambda I,
\end{align*}
which expresses $A_\lambda$ as the bounded operator $\lambda^2 R(\lambda, A)$, of norm at most $\lambda$, shifted by $-\lambda I$. Why is this useful? Because $-\lambda I$ contributes the decaying factor $e^{-\lambda t}$, while $e^{t\lambda^2 R(\lambda, A)}$ has norm at most $e^{\lambda t}$; the two bounds cancel *exactly*.
To make this rigorous: $\lambda I$ and $\lambda^2 R(\lambda, A)$ commute (the former is a scalar multiple of the identity), so we may split the operator exponential
\begin{align*}
T_\lambda(t) = e^{tA_\lambda} = e^{-\lambda t I} e^{t\lambda^2 R(\lambda, A)} = e^{-\lambda t}\, e^{t\lambda^2 R(\lambda, A)}.
\end{align*}
For the operator-valued exponential $e^{t\lambda^2 R(\lambda, A)}$, we estimate via the power series. We need bounds on $\|R(\lambda, A)^n\|$. Since $\lambda \in \rho(A)$ and $R(\lambda, A) \in \mathcal{L}(X)$ with $\|R(\lambda, A)\|_{\mathcal{L}(X)} \le 1/\lambda$, the operator norm submultiplicativity gives $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le \|R(\lambda, A)\|_{\mathcal{L}(X)}^n \le 1/\lambda^n$. Then
\begin{align*}
\|e^{t\lambda^2 R(\lambda, A)}\|_{\mathcal{L}(X)} \le \sum_{n=0}^\infty \frac{(t\lambda^2)^n}{n!} \cdot \frac{1}{\lambda^n} = \sum_{n=0}^\infty \frac{(t\lambda)^n}{n!} = e^{t\lambda}.
\end{align*}
Multiplying by $e^{-\lambda t}$ from the first exponential gives $\|T_\lambda(t)\|_{\mathcal{L}(X)} \le 1$, exactly as needed.
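What would fail in the general case $\|R(\lambda, A)\|_{\mathcal{L}(X)} \le M/(\lambda - \omega)$ with $M > 1$? Submultiplicativity would only give $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le M^n/(\lambda - \omega)^n$, and the factor $M^n$ does *not* cancel. For instance, with $\omega = 0$ the series sums to $e^{M\lambda t}$, so the argument yields only $\|T_\lambda(t)\|_{\mathcal{L}(X)} \le e^{(M-1)\lambda t}$, which is not bounded uniformly in $\lambda$ as $\lambda \to \infty$. This is precisely why the general statement requires the *iterated* bound $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le M/(\lambda - \omega)^n$ (with $M$, not $M^n$): with it, the same series is bounded by $M e^{\lambda t}$ when $\omega = 0$, and the constant $M$ stays outside the exponential.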
[/guided]
[/step]
[step:Show $(T_\lambda(t)x)$ is Cauchy in $\lambda$ uniformly on bounded $t$, defining $T(t)x := \lim T_\lambda(t)x$]
For $\mu, \lambda > 0$ and $x \in D(A)$, the bounded operators $A_\mu$ and $A_\lambda$ commute (both are polynomials in $R(\mu, A)$ and $R(\lambda, A)$, which commute by the resolvent equation $R(\lambda, A) - R(\mu, A) = (\mu - \lambda) R(\lambda, A) R(\mu, A)$). Therefore $T_\mu(t)$ and $T_\lambda(t)$ commute, and we may write
\begin{align*}
T_\lambda(t) x - T_\mu(t) x = \int_0^t \frac{d}{ds}\left[T_\mu(t-s) T_\lambda(s) x\right] d\mathcal{L}^1(s) = \int_0^t T_\mu(t-s) T_\lambda(s) (A_\lambda - A_\mu) x \, d\mathcal{L}^1(s),
\end{align*}
where the derivative inside the integrand uses commutativity: $\frac{d}{ds}[T_\mu(t-s)T_\lambda(s)x] = T_\mu(t-s)T_\lambda(s)(A_\lambda - A_\mu)x$.
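The derivative identity can be unpacked with the product rule. The sketch below uses only $\frac{d}{ds} T_\lambda(s) = A_\lambda T_\lambda(s)$ for the norm-continuous semigroups $T_\lambda$, $T_\mu$ and the fact that $A_\lambda$, $A_\mu$, $T_\lambda(s)$, $T_\mu(t-s)$ all commute with one another:
\begin{align*}
\frac{d}{ds}\left[T_\mu(t-s) T_\lambda(s) x\right] &= -A_\mu T_\mu(t-s) T_\lambda(s) x + T_\mu(t-s) A_\lambda T_\lambda(s) x \\
&= T_\mu(t-s) T_\lambda(s) (A_\lambda - A_\mu) x.
\end{align*}
The minus sign in the first term comes from the chain rule applied to $s \mapsto t - s$; commutativity is what allows both generators to be moved to the right in the second line.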
Since $\|T_\mu(t-s)\|_{\mathcal{L}(X)} \le 1$ and $\|T_\lambda(s)\|_{\mathcal{L}(X)} \le 1$, taking norms:
\begin{align*}
\|T_\mu(t) x - T_\lambda(t) x\|_X \le t \cdot \|A_\lambda x - A_\mu x\|_X.
\end{align*}
Both $A_\lambda x \to Ax$ and $A_\mu x \to Ax$ as $\lambda, \mu \to \infty$ (Step 2), so $\|A_\lambda x - A_\mu x\|_X \to 0$. Hence $(T_\lambda(t) x)_{\lambda > 0}$ is Cauchy in $X$, uniformly for $t$ in any bounded interval $[0, t_0]$. Define
\begin{align*}
T(t) x := \lim_{\lambda \to \infty} T_\lambda(t) x \quad \text{for } x \in D(A).
\end{align*}
Since $\|T_\lambda(t)\|_{\mathcal{L}(X)} \le 1$ uniformly in $\lambda$ and $t$, and $D(A)$ is dense in $X$, an $\varepsilon/3$ argument shows that $T_\lambda(t)x$ converges for every $x \in X$, again uniformly for $t \in [0, t_0]$; the limit defines $T(t) \in \mathcal{L}(X)$ with $\|T(t)\|_{\mathcal{L}(X)} \le 1$. Each $t \mapsto T_\lambda(t)x$ is continuous and the convergence is uniform on bounded intervals, so $t \mapsto T(t)x$ is continuous on $[0, \infty)$ for every $x \in X$; in particular $T(t)x \to x$ as $t \to 0^+$, which is strong continuity. The semigroup property $T(s+t) = T(s)T(t)$ passes to the limit from $T_\lambda(s+t) = T_\lambda(s) T_\lambda(t)$, and $T(0) = I$ because $T_\lambda(0) = I$ for every $\lambda$.
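As an illustration only (the scalar setting is an assumption made for this example, not part of the proof), take $X = \mathbb{C}$ and $A = -a$ with $a \ge 0$, for which a direct computation gives $A_\lambda = -\lambda a/(\lambda + a)$. The objects of this step then become explicit:
\begin{align*}
T_\lambda(t) &= \exp\left(-\frac{\lambda a}{\lambda + a}\, t\right) \to e^{-at} \quad \text{as } \lambda \to \infty, \\
|T_\mu(t) - T_\lambda(t)| &\le t\, |A_\lambda - A_\mu| = \frac{t\, a^2\, |\mu - \lambda|}{(\lambda + a)(\mu + a)}.
\end{align*}
The inequality is the mean value theorem applied to $u \mapsto e^{tu}$ on $(-\infty, 0]$. The right-hand side is at most $t a^2/\min(\lambda, \mu)$, so the family is Cauchy uniformly on bounded $t$-intervals, and the limit $e^{-at}$ is the expected contraction semigroup.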
[/step]
[step:Verify that $A$ is the generator of $\{T(t)\}_{t \ge 0}$]
Let $B$ denote the generator of the semigroup $\{T(t)\}_{t\ge 0}$ constructed in Step 4. We must show $B = A$.
For $x \in D(A)$, integrating $\frac{d}{ds} T_\lambda(s)x = T_\lambda(s) A_\lambda x$ over $[0, t]$ gives
\begin{align*}
T_\lambda(t) x - x = \int_0^t T_\lambda(s) A_\lambda x \, d\mathcal{L}^1(s).
\end{align*}
Letting $\lambda \to \infty$ (using uniform convergence in $s \in [0, t]$ and uniform boundedness of $T_\lambda$):
\begin{align*}
T(t) x - x = \int_0^t T(s) A x \, d\mathcal{L}^1(s).
\end{align*}
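The passage to the limit under the integral can be justified by the following estimate, a sketch that relies on $\|T_\lambda(s)\|_{\mathcal{L}(X)} \le 1$ and on the uniform convergence $T_\lambda(r)y \to T(r)y$ for $r \in [0, t]$, applied to the fixed vector $y = Ax$. For every $s \in [0, t]$,
\begin{align*}
\|T_\lambda(s) A_\lambda x - T(s) A x\|_X &\le \|T_\lambda(s)(A_\lambda x - A x)\|_X + \|T_\lambda(s) A x - T(s) A x\|_X \\
&\le \|A_\lambda x - A x\|_X + \sup_{r \in [0, t]} \|T_\lambda(r) A x - T(r) A x\|_X.
\end{align*}
Both terms on the right are independent of $s$ and tend to $0$ as $\lambda \to \infty$, so the integrands converge uniformly on $[0, t]$ and the integrals converge as well.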
Dividing by $t$ and letting $t \to 0^+$, the right-hand side tends to $T(0)Ax = Ax$ by continuity of $s \mapsto T(s)Ax$. Thus the limit $\lim_{t\to 0^+}\frac{T(t)x - x}{t} = Ax$ exists in $X$, so $x \in D(B)$ and $Bx = Ax$. This proves $A \subseteq B$ (i.e., $D(A) \subseteq D(B)$ and $B$ extends $A$).
To show $A = B$: pick any $\lambda > 0$. By the forward direction (Step 1 applied to the contraction semigroup $\{T(t)\}_{t \ge 0}$ and its generator $B$), $\lambda \in \rho(B)$ with $\|R(\lambda, B)\|_{\mathcal{L}(X)} \le 1/\lambda$. By hypothesis, $\lambda \in \rho(A)$. Now $A \subseteq B$ implies $\lambda I - A \subseteq \lambda I - B$. Let $x \in D(B)$. Since $\lambda I - A: D(A) \to X$ is surjective, there is $y \in D(A)$ with $(\lambda I - A)y = (\lambda I - B)x$, and because $B$ extends $A$ this reads $(\lambda I - B)y = (\lambda I - B)x$. Injectivity of $\lambda I - B$ gives $x = y \in D(A)$, so $D(B) \subseteq D(A)$. Therefore $A = B$.
[/step]
[step:Reduce the general case $\|T(t)\|_{\mathcal{L}(X)} \le M e^{\omega t}$ to the contraction case via rescaling and renorming]
For the general case, we use two reductions:
**Reduction 1 (rescaling):** If $A$ generates $\{T(t)\}_{t\ge 0}$, then $A - \omega I$ generates $\{e^{-\omega t} T(t)\}_{t\ge 0}$. The bound $\|T(t)\|_{\mathcal{L}(X)} \le Me^{\omega t}$ becomes $\|e^{-\omega t} T(t)\|_{\mathcal{L}(X)} \le M$, and the resolvent shifts: $R(\lambda, A - \omega I) = R(\lambda + \omega, A)$. The resolvent estimate $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le M/(\lambda - \omega)^n$ becomes $\|R(\lambda, A - \omega I)^n\|_{\mathcal{L}(X)} \le M/\lambda^n$ for $\lambda > 0$.
So we reduce to: $A$ generates $\{T(t)\}_{t\ge 0}$ with $\|T(t)\|_{\mathcal{L}(X)} \le M$ if and only if $\rho(A) \supset (0, \infty)$ and $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le M/\lambda^n$ for $\lambda > 0$, $n \ge 1$.
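The two claims in Reduction 1 can be checked directly. In the sketch below, $\tilde T(t) := e^{-\omega t} T(t)$ is notation introduced only for this computation. For $x \in D(A)$,
\begin{align*}
\frac{\tilde T(h) x - x}{h} &= e^{-\omega h}\, \frac{T(h) x - x}{h} + \frac{e^{-\omega h} - 1}{h}\, x \to A x - \omega x \quad \text{as } h \to 0^+, \\
\lambda I - (A - \omega I) &= (\lambda + \omega) I - A.
\end{align*}
The first line shows that the generator of $\tilde T$ extends $A - \omega I$; applying the same computation to $T(t) = e^{\omega t}\tilde T(t)$ gives the reverse inclusion. Inverting the second line gives the shift $R(\lambda, A - \omega I) = R(\lambda + \omega, A)$.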
**Reduction 2 (renorming):** Define a new norm
\begin{align*}
|\!|\!|x|\!|\!| := \sup_{\lambda > 0, n \ge 0} \|\lambda^n R(\lambda, A)^n x\|_X.
\end{align*}
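The bound $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le M/\lambda^n$ guarantees $\|x\|_X \le |\!|\!|x|\!|\!| \le M\|x\|_X$ (the lower bound is the term $n = 0$), so $|\!|\!|\cdot|\!|\!|$ is equivalent to $\|\cdot\|_X$ and $X$ remains a Banach space. The key property is $|\!|\!|\lambda R(\lambda, A) x|\!|\!| \le |\!|\!|x|\!|\!|$ for all $\lambda > 0$. This is not immediate from the definition, since the supremum mixes different resolvents; it follows from the resolvent equation. Writing $\|x\|_\mu := \sup_{n \ge 0}\|\mu^n R(\mu, A)^n x\|_X$, one has $\|\mu R(\mu, A) x\|_\mu \le \|x\|_\mu$ directly, and the identity $R(\lambda, A) = R(\mu, A)\left[I + (\mu - \lambda) R(\lambda, A)\right]$ gives $\|R(\lambda, A)x\|_\mu \le \frac{1}{\mu}\|x\|_\mu + \left(1 - \frac{\lambda}{\mu}\right)\|R(\lambda, A)x\|_\mu$, that is, $\|\lambda R(\lambda, A) x\|_\mu \le \|x\|_\mu$ for $0 < \lambda \le \mu$. Consequently $\|x\|_\lambda \le \|x\|_\mu$ for $\lambda \le \mu$, so $|\!|\!|x|\!|\!| = \lim_{\mu \to \infty}\|x\|_\mu$, and letting $\mu \to \infty$ yields the key property. Hence $|\!|\!|R(\lambda, A)|\!|\!| \le 1/\lambda$, which is the contraction-case hypothesis. Applying Steps 1-5 in the renormed space yields a semigroup $\{T(t)\}_{t \ge 0}$ generated by $A$ with $|\!|\!|T(t)|\!|\!| \le 1$. Translating back: $\|T(t) x\|_X \le |\!|\!|T(t) x|\!|\!| \le |\!|\!|x|\!|\!| \le M \|x\|_X$.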
Combining both reductions: under hypothesis (2) of the general statement, $A$ generates a $C_0$-semigroup with $\|T(t)\|_{\mathcal{L}(X)} \le Me^{\omega t}$. The forward direction is again immediate from the Laplace transform: $\|T(t)\|_{\mathcal{L}(X)} \le Me^{\omega t}$ implies for $\lambda > \omega$,
\begin{align*}
R(\lambda, A)^n x = \frac{1}{(n-1)!}\int_0^\infty t^{n-1} e^{-\lambda t} T(t) x \, d\mathcal{L}^1(t),
\end{align*}
so $\|R(\lambda, A)^n\|_{\mathcal{L}(X)} \le \frac{M}{(n-1)!}\int_0^\infty t^{n-1} e^{-(\lambda-\omega)t}\, d\mathcal{L}^1(t) = \frac{M}{(\lambda - \omega)^n}$.
This completes the proof of the equivalence in both the contraction case and the general case.
[/step]