[proofplan]
Let $\alpha_1, \ldots, \alpha_n$ be a finite $R$-generating set for $S$, adjusted so that $\alpha_1 = 1$. Fix $s \in S$. Multiplication by $s$ is an $R$-linear endomorphism of the $R$-module $S$, and because the $\alpha_i$ span $S$, expanding each $s\alpha_i$ in the generators yields a matrix $B \in R^{n \times n}$ with $s\alpha_i = \sum_j b_{ij} \alpha_j$. The key is the Cayley-Hamilton-type identity $(sI - B)\alpha = 0$, where $\alpha = (\alpha_1, \ldots, \alpha_n)^\top$: left-multiplying by the adjugate of $sI - B$ turns it into $\det(sI - B)\,\alpha_i = 0$ for each $i$, and using $\alpha_1 = 1$ extracts the scalar identity $\det(sI - B) = 0$. Hence the characteristic polynomial $\det(xI - B) \in R[x]$, which is monic of degree $n$, annihilates $s$.
[/proofplan]
[step:Normalise the generating set so that $\alpha_1 = 1$]
By hypothesis, $S$ is finitely generated as an $R$-module. Let $\beta_1, \ldots, \beta_m \in S$ be a generating set. Since $S$ is a ring containing $R$ (so $1_S = 1_R \in S$), we may adjoin $1$ to obtain the generating set
\begin{align*}
\alpha_1 = 1, \quad \alpha_2 = \beta_1, \quad \ldots, \quad \alpha_{m+1} = \beta_m.
\end{align*}
Adding an element to a generating set preserves the generating property. Relabelling, we assume $\alpha_1, \ldots, \alpha_n \in S$ generate $S$ as an $R$-module with $\alpha_1 = 1$.
[guided]
Let $S$ be finitely generated as an $R$-module with generating set $\beta_1, \ldots, \beta_m$. Why can we assume $\alpha_1 = 1$? Because our goal is to recover a scalar identity from a matrix identity by extracting a coordinate: we will derive $\det(sI - B) \cdot \alpha_i = 0$ for all $i$, then read off the scalar identity $\det(sI - B) = 0$ by using an $\alpha_i$ that equals $1$. So it is convenient to have $1$ itself among the generators.
Since $S$ is a ring containing $R$, $1 \in S$. We can always enlarge a generating set by an element without losing the generating property (a superset of a generating set is still a generating set). So we prepend $1$ to $\{\beta_1, \ldots, \beta_m\}$ and relabel to obtain $\alpha_1 = 1, \alpha_2 = \beta_1, \ldots$ — a finite $R$-generating set with $\alpha_1 = 1$. This is a purely notational reduction.
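As a small illustration (an example chosen here for exposition, assuming $R = \mathbb{Z}$ and $S = \mathbb{Z}[\sqrt{2}]$), take the generating set $\beta_1 = 1 + \sqrt{2}$, $\beta_2 = \sqrt{2}$, which spans $S$ because $1 = \beta_1 - \beta_2$. Adjoining $1$ gives
\begin{align*}
\alpha_1 = 1, \quad \alpha_2 = 1 + \sqrt{2}, \quad \alpha_3 = \sqrt{2}, \qquad \text{with the relation } \alpha_2 = \alpha_1 + \alpha_3.
\end{align*}
The enlarged set still generates $S$ but is not a free basis, which is harmless: the argument only ever uses that the $\alpha_i$ span $S$.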
[/guided]
[/step]
[step:Expand each $s\alpha_i$ in the generators to obtain a matrix $B \in R^{n \times n}$]
Fix an arbitrary $s \in S$. For each $i = 1, \ldots, n$, the product $s\alpha_i$ lies in $S$, so it can be written as an $R$-linear combination of the generators:
\begin{align*}
s\alpha_i = \sum_{j=1}^{n} b_{ij} \alpha_j, \qquad b_{ij} \in R.
\end{align*}
Let $B = (b_{ij}) \in R^{n \times n}$. Rewriting:
\begin{align*}
\sum_{j=1}^{n} b_{ij} \alpha_j - s \alpha_i = 0, \qquad i = 1, \ldots, n.
\end{align*}
Equivalently, if $\alpha = (\alpha_1, \ldots, \alpha_n)^\top \in S^n$ and $I = I_n \in R^{n \times n}$ is the identity matrix:
\begin{align*}
(B - sI)\alpha = 0 \quad \text{in } S^n, \tag{$\dagger$}
\end{align*}
where the matrix $B - sI$ has entries in $S$ (since $R \subseteq S$ and $s \in S$), and acts on $\alpha \in S^n$ by the usual matrix-vector product.
[guided]
Fix an arbitrary $s \in S$ — we will show this $s$ is integral over $R$ with an explicit monic polynomial of degree $n$.
For each $i$, the element $s\alpha_i \in S$ can be written as an $R$-linear combination of the generators because $\{\alpha_j\}_{j=1}^n$ generates $S$ over $R$:
\begin{align*}
s\alpha_i = \sum_{j=1}^{n} b_{ij} \alpha_j, \qquad b_{ij} \in R.
\end{align*}
This expansion need not be unique (the $\alpha_j$ may not form a free basis), but we just need *some* such expansion — pick one and fix it.
The matrix $B = (b_{ij}) \in R^{n \times n}$ captures the action of "multiplication by $s$" with respect to the generating set. Rewriting the defining equation gives
\begin{align*}
(B - sI) \alpha = 0,
\end{align*}
where $\alpha$ is the column vector of generators and $B - sI$ is interpreted as a matrix over $S$: the entries of $B$ lie in $R \subseteq S$, the entries of $sI$ lie in $S$, and the subtraction is carried out in the ring $S$.
This matrix identity is the starting point. Our next move is the classical adjugate trick.
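As a concrete illustration (an example chosen here for exposition, not part of the general argument), assume $R = \mathbb{Z}$ and $S = \mathbb{Z}[\sqrt{2}]$ with generators $\alpha_1 = 1$, $\alpha_2 = \sqrt{2}$, and take $s = a + b\sqrt{2}$ with $a, b \in \mathbb{Z}$. Then
\begin{align*}
s\alpha_1 &= a \cdot 1 + b \cdot \sqrt{2}, \\
s\alpha_2 &= 2b \cdot 1 + a \cdot \sqrt{2},
\end{align*}
so one admissible choice of matrix is
\begin{align*}
B = \begin{pmatrix} a & b \\ 2b & a \end{pmatrix}, \qquad (B - sI)\alpha = \begin{pmatrix} -b\sqrt{2} & b \\ 2b & -b\sqrt{2} \end{pmatrix} \begin{pmatrix} 1 \\ \sqrt{2} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\end{align*}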
[/guided]
[/step]
[step:Multiply by the adjugate to extract $\det(sI - B) \alpha_i = 0$ for all $i$]
For any $n \times n$ matrix $X$ with entries in a commutative ring $A$, the [adjugate](/page/Adjugate%20Matrix) (classical adjoint) $\operatorname{adj}(X) \in A^{n \times n}$ satisfies
\begin{align*}
\operatorname{adj}(X) \cdot X = X \cdot \operatorname{adj}(X) = \det(X) \cdot I.
\end{align*}
This identity is purely formal — a polynomial identity in the entries of $X$ — and holds in any commutative ring, including $S$. Apply it with $X = sI - B$, whose entries lie in $S$:
\begin{align*}
\operatorname{adj}(sI - B) \cdot (sI - B) = \det(sI - B) \cdot I_n \in S^{n \times n}.
\end{align*}
Negating $(\dagger)$ gives $(sI - B)\alpha = 0$. Left-multiplying both sides by $\operatorname{adj}(sI - B)$:
\begin{align*}
0 = \operatorname{adj}(sI - B) \cdot (sI - B) \cdot \alpha = \det(sI - B) \cdot I_n \cdot \alpha = \det(sI - B) \cdot \alpha.
\end{align*}
Reading this componentwise:
\begin{align*}
\det(sI - B) \cdot \alpha_i = 0 \quad \text{in } S, \qquad i = 1, \ldots, n. \tag{$\ddagger$}
\end{align*}
[guided]
The adjugate matrix $\operatorname{adj}(X)$ of an $n \times n$ matrix $X$ over a commutative ring has entries in that same ring: its $(i, j)$-entry is $(-1)^{i+j}$ times the determinant of the matrix obtained from $X$ by deleting row $j$ and column $i$. The [adjugate](/page/Adjugate%20Matrix) satisfies the fundamental identity
\begin{align*}
\operatorname{adj}(X) \cdot X = X \cdot \operatorname{adj}(X) = \det(X) \cdot I.
\end{align*}
This identity is **formal** — it is a polynomial identity in the entries of $X$ that holds in any commutative ring, because both sides expand into the same polynomial expressions in the entries (via the Leibniz expansion of the determinant). So we can apply it with entries in $S$, which is a commutative ring by hypothesis.
Apply it with $X = sI - B$ and entries in $S$:
\begin{align*}
\operatorname{adj}(sI - B) \cdot (sI - B) = \det(sI - B) \cdot I_n.
\end{align*}
From $(\dagger)$ we have $(B - sI)\alpha = 0$, i.e., $(sI - B)\alpha = 0$. Left-multiplying by $\operatorname{adj}(sI - B)$:
\begin{align*}
0 = \operatorname{adj}(sI - B) \cdot (sI - B) \cdot \alpha = \det(sI - B) \cdot I_n \cdot \alpha = \det(sI - B)\alpha.
\end{align*}
Reading componentwise: $\det(sI - B) \cdot \alpha_i = 0$ for each $i \in \{1, \ldots, n\}$.
Why does this trick work? Over a field, a nonzero vector in the kernel of $sI - B$ would force $\det(sI - B) = 0$ by a rank argument. The adjugate identity reaches the comparable conclusion $\det(sI - B)\,\alpha = 0$ directly in the ring $S$, without needing $S$ to be a field or the $\alpha_i$ to form a basis. We do not need injectivity or any rank statement; we just need the formal identity $\operatorname{adj}(X) X = \det(X) I$, which holds over any commutative ring.
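To see the adjugate identity at work in a small case, suppose (purely as an illustration, not part of the general argument) that $R = \mathbb{Z}$, $S = \mathbb{Z}[\sqrt{2}]$, $\alpha = (1, \sqrt{2})^\top$, and $s = a + b\sqrt{2}$ with $a, b \in \mathbb{Z}$, so that $B = \begin{pmatrix} a & b \\ 2b & a \end{pmatrix}$ is one valid choice. Then
\begin{align*}
sI - B = \begin{pmatrix} s - a & -b \\ -2b & s - a \end{pmatrix}, \qquad \operatorname{adj}(sI - B) = \begin{pmatrix} s - a & b \\ 2b & s - a \end{pmatrix},
\end{align*}
and multiplying out gives
\begin{align*}
\operatorname{adj}(sI - B) \cdot (sI - B) = \big((s - a)^2 - 2b^2\big) I_2 = \det(sI - B) \cdot I_2.
\end{align*}
Substituting $s - a = b\sqrt{2}$ yields $\det(sI - B) = 2b^2 - 2b^2 = 0$, consistent with the componentwise identity $\det(sI - B) \cdot \alpha_i = 0$.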
[/guided]
[/step]
[step:Use $\alpha_1 = 1$ to deduce the scalar equation $\det(sI - B) = 0$]
Specialising $(\ddagger)$ to $i = 1$:
\begin{align*}
\det(sI - B) \cdot \alpha_1 = \det(sI - B) \cdot 1 = \det(sI - B) = 0 \quad \text{in } S.
\end{align*}
[guided]
The identity $(\ddagger)$ states $\det(sI - B) \cdot \alpha_i = 0$ for every $i$. So far we have a scalar $\det(sI - B) \in S$ that annihilates every generator. To conclude that the scalar itself is zero, we multiply it against a generator that equals $1$.
This is exactly why we normalised $\alpha_1 = 1$ in the first step. Specialising to $i = 1$:
\begin{align*}
\det(sI - B) \cdot 1 = \det(sI - B) = 0.
\end{align*}
So $\det(sI - B) = 0$ as an element of $S$.
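If $1$ had not been among the generators, we would only know that $\det(sI - B)$ annihilates every $\alpha_i$, and one further step would be needed: since the $\alpha_i$ generate $S$, write $1 = \sum_i r_i \alpha_i$ with $r_i \in R$, whence $\det(sI - B) = \sum_i r_i \det(sI - B)\alpha_i = 0$. The obstruction becomes genuine only when the same argument is run on a general finitely generated $S$-module $M$ in place of $S$: there a nonzero scalar can kill every generator unless $M$ is faithful. Making $1$ a generator lets us read off the conclusion directly.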
[/guided]
[/step]
[step:Exhibit a monic polynomial in $R[x]$ annihilating $s$]
Define the characteristic polynomial
\begin{align*}
f(x) = \det(xI_n - B) \in R[x].
\end{align*}
(Here $x$ is an indeterminate; the entries of $xI_n - B$ lie in $R[x]$, so $f(x) \in R[x]$.)
We claim $f$ is monic of degree $n$. Expanding the determinant via the Leibniz formula,
\begin{align*}
f(x) = \det(xI_n - B) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n (xI_n - B)_{i, \sigma(i)}.
\end{align*}
The identity permutation $\sigma = \operatorname{id}$ contributes $\prod_{i=1}^n (x - b_{ii})$, whose leading term is $x^n$. Every other permutation $\sigma \neq \operatorname{id}$ has at least two indices $i$ with $\sigma(i) \neq i$, so at least two off-diagonal factors $(xI_n - B)_{i, \sigma(i)} = -b_{i, \sigma(i)}$ are constant in $x$; therefore the total $x$-degree of $\prod_i (xI_n - B)_{i, \sigma(i)}$ is at most $n - 2 < n$. Hence $f(x) = x^n + (\text{lower-order terms in } x)$, with all coefficients in $R$.
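Evaluate $f$ at $s \in S$ via the ring homomorphism $R[x] \to S$, $x \mapsto s$, which extends the inclusion $R \hookrightarrow S$. Because the determinant is a polynomial in the matrix entries, applying this homomorphism entrywise to $xI_n - B$ commutes with taking the determinant: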
\begin{align*}
f(s) = \det(sI_n - B) = 0 \quad \text{in } S,
\end{align*}
where the last equality is the scalar identity from the previous step. Hence $s$ is integral over $R$.
Since $s \in S$ was arbitrary, every element of $S$ is integral over $R$, i.e., $S$ is integral over $R$.
[guided]
Define the characteristic polynomial of the matrix $B$:
\begin{align*}
f(x) = \det(xI_n - B) \in R[x],
\end{align*}
where $x$ is an indeterminate and the entries of $xI_n - B$ lie in $R[x]$ (diagonal entries are $x - b_{ii}$, off-diagonal entries are $-b_{ij}$, all of which are in $R[x]$). Hence $\det(xI_n - B) \in R[x]$.
We claim $f$ is monic of degree $n$. By the Leibniz formula,
\begin{align*}
f(x) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n (xI_n - B)_{i, \sigma(i)}.
\end{align*}
For $\sigma = \operatorname{id}$, the product is $\prod_i (x - b_{ii})$, a product of $n$ monic linear factors, with leading term $x^n$. For any other $\sigma$, there are at least two indices $i$ where $\sigma(i) \neq i$ (a permutation cannot move exactly one index), so at least two factors in the product are off-diagonal entries $-b_{i,\sigma(i)}$, which do not depend on $x$. Therefore the product has $x$-degree at most $n - 2$. Consequently only the identity permutation contributes to the coefficient of $x^n$, and that coefficient is $1$. So
\begin{align*}
f(x) = x^n + c_{n-1} x^{n-1} + \cdots + c_0, \qquad c_i \in R,
\end{align*}
is monic of degree $n$ with coefficients in $R$.
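Now apply the ring homomorphism $R[x] \to S$, $x \mapsto s$ (which exists because $S$ is an $R$-algebra via the inclusion $R \subseteq S$). The image $f(s) \in S$ equals $\det(sI_n - B)$, because the determinant is a polynomial in the matrix entries and therefore commutes with a ring homomorphism applied entrywise; by the previous step this determinant is $0$. Therefore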
\begin{align*}
f(s) = 0 \quad \text{in } S,
\end{align*}
so $s$ satisfies the monic polynomial $f \in R[x]$: $s$ is integral over $R$.
Since $s \in S$ was arbitrary, every element of $S$ is integral over $R$. By definition, $S$ is integral over $R$.
As a side remark: this is the classical **determinant trick**, the module-theoretic generalisation of the Cayley-Hamilton theorem. The conclusion is quantitative: the monic polynomial produced has degree exactly $n$, the size of the generating set used in the construction. In particular, if $S$ is generated by $n$ elements as an $R$-module, every element of $S$ satisfies a monic polynomial in $R[x]$ of degree at most $n$.
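For a worked instance (an illustrative choice, assuming $R = \mathbb{Z}$, $S = \mathbb{Z}[\sqrt{2}]$ with generators $1, \sqrt{2}$, and $s = a + b\sqrt{2}$ with $a, b \in \mathbb{Z}$), the expansions $s \cdot 1 = a + b\sqrt{2}$ and $s \cdot \sqrt{2} = 2b + a\sqrt{2}$ give $B = \begin{pmatrix} a & b \\ 2b & a \end{pmatrix}$, hence
\begin{align*}
f(x) = \det \begin{pmatrix} x - a & -b \\ -2b & x - a \end{pmatrix} = (x - a)^2 - 2b^2 = x^2 - 2a x + (a^2 - 2b^2).
\end{align*}
This is monic of degree $2$ with integer coefficients, and
\begin{align*}
f(s) = (b\sqrt{2})^2 - 2b^2 = 0.
\end{align*}
For $a = b = 1$ this recovers the relation $x^2 - 2x - 1 = 0$ satisfied by $1 + \sqrt{2}$.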
[/guided]
[/step]