[proofplan]
We prove that each of the three displayed forms is the unique polynomial of degree at most $n$ interpolating the data $(x_i,f(x_i))$. The Lagrange form is checked directly from the cardinal property of the basis polynomials. The Vandermonde form is the same interpolation problem written in the monomial basis, and uniqueness follows from the elementary fact that a nonzero polynomial of degree at most $n$ cannot have $n+1$ distinct roots. Finally, the Newton divided-difference form is shown by induction on the number of nodes to interpolate the same data, so uniqueness forces all three forms to coincide.
[/proofplan]
[step:Verify that the Lagrange form interpolates the prescribed node values]
For each $i \in \{0,1,\dots,n\}$, the factors defining $\ell_i$ are legitimate because $x_i - x_j \ne 0$ whenever $j \ne i$. Since $\ell_i$ is a product of $n$ linear polynomials, $\ell_i \in \mathcal{P}_n(\mathbb{R})$. Hence $L_f \in \mathcal{P}_n(\mathbb{R})$ because $\mathcal{P}_n(\mathbb{R})$ is closed under finite linear combinations.
Fix $m \in \{0,1,\dots,n\}$. If $i=m$, then every factor in $\ell_m(x_m)$ equals $1$, so $\ell_m(x_m)=1$. If $i \ne m$, then the factor with $j=m$ occurs in the product defining $\ell_i$, and therefore
\begin{align*}
\ell_i(x_m)=0.
\end{align*}
Thus
\begin{align*}
L_f(x_m)=\sum_{i=0}^{n} f(x_i)\ell_i(x_m)=f(x_m).
\end{align*}
Since $m$ was arbitrary, $L_f$ interpolates $f$ at all nodes $x_0,\dots,x_n$.
[guided]
The Lagrange basis is built so that each basis polynomial selects exactly one node. First, for $0 \le i \le n$, the formula
\begin{align*}
\ell_i(t) := \prod_{\substack{0 \le j \le n, j \ne i}} \frac{t - x_j}{x_i - x_j}
\end{align*}
is well-defined because the nodes are pairwise distinct, so $x_i-x_j \ne 0$ when $i \ne j$. The polynomial $\ell_i$ is a product of $n$ linear factors, so $\ell_i \in \mathcal{P}_n(\mathbb{R})$. The Lagrange polynomial
\begin{align*}
L_f(t) := \sum_{i=0}^{n} f(x_i)\ell_i(t)
\end{align*}
is therefore also in $\mathcal{P}_n(\mathbb{R})$, since a finite linear combination of polynomials of degree at most $n$ again has degree at most $n$, unless it is the zero polynomial, which is included in $\mathcal{P}_n(\mathbb{R})$ by definition.
Now fix a node index $m \in \{0,1,\dots,n\}$. The decisive property is the cardinal identity $\ell_i(x_m)=1$ when $i=m$ and $\ell_i(x_m)=0$ when $i \ne m$. If $i=m$, substituting $t=x_m$ gives
\begin{align*}
\ell_m(x_m)=\prod_{\substack{0 \le j \le n, j \ne m}} \frac{x_m - x_j}{x_m - x_j}=1.
\end{align*}
If $i \ne m$, then the product defining $\ell_i$ contains the factor indexed by $j=m$, namely
\begin{align*}
\frac{x_m-x_m}{x_i-x_m}=0.
\end{align*}
Therefore $\ell_i(x_m)=0$.
Substituting these values into $L_f$ gives
\begin{align*}
L_f(x_m)=\sum_{i=0}^{n} f(x_i)\ell_i(x_m)=f(x_m)\ell_m(x_m)=f(x_m).
\end{align*}
Because $m$ was arbitrary, $L_f$ has the desired interpolation values at every node.
[/guided]
[/step]
[step:Prove uniqueness of interpolation at the distinct nodes]
We use the following elementary root-counting fact: if $q \in \mathbb{R}[t]$ is nonzero and has degree $d$, then $q$ has at most $d$ distinct real roots. Indeed, if $r \in \mathbb{R}$ is a root of $q$, the [factor theorem](/theorems/3235) gives $q(t)=(t-r)s(t)$ for some $s \in \mathbb{R}[t]$ with $\deg s=d-1$. Repeating this factorization over distinct roots shows that $m$ distinct roots force $m \le d$.
Now let $p,q \in \mathcal{P}_n(\mathbb{R})$ satisfy $p(x_i)=q(x_i)$ for every $i \in \{0,1,\dots,n\}$. Define $r \in \mathbb{R}[t]$ by
\begin{align*}
r(t):=p(t)-q(t).
\end{align*}
Then $r \in \mathcal{P}_n(\mathbb{R})$ and $r(x_i)=0$ for every $i$. Since the $x_i$ are $n+1$ distinct [real numbers](/page/Real%20Numbers), the root-counting fact implies that $r$ cannot be nonzero. Hence $r=0$, so $p=q$.
[/step]
[step:Identify the Vandermonde solution with the unique interpolating polynomial]
Let
\begin{align*}
\Phi: \mathbb{R}^{n+1} \to \mathcal{P}_n(\mathbb{R})
\end{align*}
be the map sending a coefficient vector $a=(a_0,\dots,a_n)$ to the polynomial
\begin{align*}
\Phi(a)(t):=\sum_{k=0}^{n} a_k t^k.
\end{align*}
Since $L_f \in \mathcal{P}_n(\mathbb{R})$, there is a coefficient vector $b=(b_0,\dots,b_n) \in \mathbb{R}^{n+1}$ such that
\begin{align*}
L_f(t)=\sum_{k=0}^{n} b_k t^k.
\end{align*}
Evaluating this identity at each $x_i$ and using the preceding interpolation property gives
\begin{align*}
\sum_{k=0}^{n} b_k x_i^k=f(x_i)
\end{align*}
for every $i \in \{0,1,\dots,n\}$. Thus the Vandermonde system has at least one solution.
If $a=(a_0,\dots,a_n)$ and $c=(c_0,\dots,c_n)$ are two solutions, define $p_a,p_c \in \mathcal{P}_n(\mathbb{R})$ by
\begin{align*}
p_a(t):=\sum_{k=0}^{n} a_k t^k
\end{align*}
and
\begin{align*}
p_c(t):=\sum_{k=0}^{n} c_k t^k.
\end{align*}
The equations in the Vandermonde system give $p_a(x_i)=f(x_i)=p_c(x_i)$ for every $i$. By uniqueness of interpolation at the distinct nodes, $p_a=p_c$. Equality of polynomials in the monomial basis gives $a_k=c_k$ for every $k \in \{0,1,\dots,n\}$. Hence the Vandermonde system has a unique solution, and its associated polynomial $V_f$ is the unique element of $\mathcal{P}_n(\mathbb{R})$ interpolating the data.
[/step]
[step:Show that the Newton divided-difference form interpolates the same data]
The divided differences $f[x_i,\dots,x_j]$ and the Newton polynomial $N_f$ are those defined in the theorem statement. The denominators in the recursive definition are nonzero because the nodes are pairwise distinct. Each summand in
\begin{align*}
N_f(t):=f[x_0]+\sum_{m=1}^{n} f[x_0,\dots,x_m]\prod_{j=0}^{m-1}(t-x_j)
\end{align*}
has degree at most $m$, so $N_f \in \mathcal{P}_n(\mathbb{R})$.
[claim:The divided difference is the leading coefficient of the interpolating polynomial on its nodes]
Let $m \in \{0,1,\dots,n\}$ and let $y_0,\dots,y_m$ be pairwise distinct elements of $\mathbb{R}$ chosen from the ordered list $x_0,\dots,x_n$. Let $P_m \in \mathcal{P}_m(\mathbb{R})$ be the unique polynomial satisfying $P_m(y_i)=f(y_i)$ for every $i \in \{0,1,\dots,m\}$. Then the coefficient of $t^m$ in $P_m$ is the recursively defined divided difference $f[y_0,\dots,y_m]$.
[/claim]
[proof]
We prove the claim by induction on $m$. For $m=0$, the unique interpolating polynomial is the constant polynomial $P_0(t)=f(y_0)$, and its coefficient of $t^0$ is $f[y_0]$.
Assume $m \ge 1$ and that the claim is known for $m-1$ nodes. Let $P_{0,m-1} \in \mathcal{P}_{m-1}(\mathbb{R})$ be the unique interpolating polynomial for $y_0,\dots,y_{m-1}$, and let $P_{1,m} \in \mathcal{P}_{m-1}(\mathbb{R})$ be the unique interpolating polynomial for $y_1,\dots,y_m$. Define $R_m \in \mathbb{R}[t]$ by
\begin{align*}
R_m(t):=\frac{(t-y_0)P_{1,m}(t)-(t-y_m)P_{0,m-1}(t)}{y_m-y_0}.
\end{align*}
The denominator is nonzero because $y_0 \ne y_m$, and $R_m \in \mathcal{P}_m(\mathbb{R})$. At $t=y_0$, the first term vanishes and $R_m(y_0)=P_{0,m-1}(y_0)=f(y_0)$. At $t=y_m$, the second term vanishes and $R_m(y_m)=P_{1,m}(y_m)=f(y_m)$. If $1 \le i \le m-1$, then $P_{0,m-1}(y_i)=P_{1,m}(y_i)=f(y_i)$, so
\begin{align*}
R_m(y_i)=\frac{(y_i-y_0)f(y_i)-(y_i-y_m)f(y_i)}{y_m-y_0}=f(y_i).
\end{align*}
Thus $R_m$ interpolates $f$ at $y_0,\dots,y_m$, and uniqueness of interpolation at distinct nodes gives $R_m=P_m$.
By the induction hypothesis, the coefficient of $t^{m-1}$ in $P_{1,m}$ is $f[y_1,\dots,y_m]$, and the coefficient of $t^{m-1}$ in $P_{0,m-1}$ is $f[y_0,\dots,y_{m-1}]$. Therefore the coefficient of $t^m$ in $R_m=P_m$ is
\begin{align*}
\frac{f[y_1,\dots,y_m]-f[y_0,\dots,y_{m-1}]}{y_m-y_0}=f[y_0,\dots,y_m],
\end{align*}
which is exactly the recursive definition of the divided difference. This proves the claim.
[/proof]
We now prove by induction on $m \in \{0,1,\dots,n\}$ that the truncated Newton polynomial
\begin{align*}
N_m(t):=f[x_0]+\sum_{r=1}^{m} f[x_0,\dots,x_r]\prod_{j=0}^{r-1}(t-x_j)
\end{align*}
satisfies $N_m(x_i)=f(x_i)$ for every $i \in \{0,1,\dots,m\}$. For $m=0$, this is $N_0(x_0)=f[x_0]=f(x_0)$.
Assume the assertion holds for $m-1$, where $1 \le m \le n$. For $i<m$, the final product in $N_m-N_{m-1}$ contains the factor $x_i-x_i$, so $N_m(x_i)=N_{m-1}(x_i)=f(x_i)$. Let $P_m \in \mathcal{P}_m(\mathbb{R})$ be the unique polynomial interpolating $f$ at $x_0,\dots,x_m$. By the claim, the coefficient of $t^m$ in $P_m$ is $f[x_0,\dots,x_m]$. The coefficient of $t^m$ in $N_m$ is also $f[x_0,\dots,x_m]$, because the last Newton basis product has leading coefficient $1$ and every earlier summand has degree at most $m-1$. Hence $N_m-P_m \in \mathcal{P}_{m-1}(\mathbb{R})$. Since $N_m(x_i)=P_m(x_i)=f(x_i)$ for $i \in \{0,1,\dots,m-1\}$, the polynomial $N_m-P_m$ has the $m$ distinct roots $x_0,\dots,x_{m-1}$. The root-counting fact used above forces $N_m-P_m=0$, so $N_m=P_m$. In particular, $N_m(x_m)=P_m(x_m)=f(x_m)$.
Thus $N_n=N_f$ interpolates $f$ at all nodes $x_0,\dots,x_n$.
[guided]
The delicate point is that the Newton coefficient $f[x_0,\dots,x_m]$ must really be the coefficient that adds the new node $x_m$. We prove this from the recursive definition of divided differences, rather than assuming it.
First, we show a general fact. Given pairwise distinct nodes $y_0,\dots,y_m$, let $P_m \in \mathcal{P}_m(\mathbb{R})$ be the unique polynomial with $P_m(y_i)=f(y_i)$ for all $i$. We claim that the coefficient of $t^m$ in $P_m$ is $f[y_0,\dots,y_m]$. The case $m=0$ is immediate, because $P_0(t)=f(y_0)=f[y_0]$.
For $m \ge 1$, let $P_{0,m-1}$ interpolate the first $m$ nodes $y_0,\dots,y_{m-1}$, and let $P_{1,m}$ interpolate the last $m$ nodes $y_1,\dots,y_m$. Define
\begin{align*}
R_m(t):=\frac{(t-y_0)P_{1,m}(t)-(t-y_m)P_{0,m-1}(t)}{y_m-y_0}.
\end{align*}
This is legitimate because $y_m-y_0 \ne 0$. The point of this construction is that it blends the two lower-degree interpolants while forcing the endpoints to come from the correct one. At $t=y_0$, the factor $t-y_0$ kills the first term and gives $R_m(y_0)=P_{0,m-1}(y_0)=f(y_0)$. At $t=y_m$, the factor $t-y_m$ kills the second term and gives $R_m(y_m)=P_{1,m}(y_m)=f(y_m)$. For an interior node $y_i$ with $1 \le i \le m-1$, both lower-degree interpolants take the same value $f(y_i)$, so
\begin{align*}
R_m(y_i)=\frac{(y_i-y_0)f(y_i)-(y_i-y_m)f(y_i)}{y_m-y_0}=f(y_i).
\end{align*}
Thus $R_m$ interpolates all $m+1$ data points, and uniqueness of interpolation at distinct nodes implies $R_m=P_m$.
Now compare leading coefficients. By the induction hypothesis, the coefficient of $t^{m-1}$ in $P_{1,m}$ is $f[y_1,\dots,y_m]$, and the coefficient of $t^{m-1}$ in $P_{0,m-1}$ is $f[y_0,\dots,y_{m-1}]$. Multiplying each by $t-y_0$ or $t-y_m$ raises those leading terms to degree $m$, so the coefficient of $t^m$ in $P_m=R_m$ is
\begin{align*}
\frac{f[y_1,\dots,y_m]-f[y_0,\dots,y_{m-1}]}{y_m-y_0}=f[y_0,\dots,y_m].
\end{align*}
This proves that recursive divided differences are exactly the leading coefficients of the corresponding interpolating polynomials.
We apply this fact to the truncated Newton polynomial
\begin{align*}
N_m(t):=f[x_0]+\sum_{r=1}^{m} f[x_0,\dots,x_r]\prod_{j=0}^{r-1}(t-x_j).
\end{align*}
The induction starts with $N_0(x_0)=f[x_0]=f(x_0)$. Suppose $N_{m-1}$ interpolates $f$ at $x_0,\dots,x_{m-1}$. For $i<m$, the added term in $N_m-N_{m-1}$ contains the factor $x_i-x_i$, so $N_m(x_i)=N_{m-1}(x_i)=f(x_i)$.
Let $P_m$ be the unique polynomial in $\mathcal{P}_m(\mathbb{R})$ interpolating $f$ at $x_0,\dots,x_m$. The leading-coefficient fact gives that the coefficient of $t^m$ in $P_m$ is $f[x_0,\dots,x_m]$. The coefficient of $t^m$ in $N_m$ is the same number, because the product $\prod_{j=0}^{m-1}(t-x_j)$ has leading coefficient $1$ and all earlier Newton terms have degree at most $m-1$. Therefore $N_m-P_m$ has degree at most $m-1$. It also vanishes at $x_0,\dots,x_{m-1}$, because both polynomials interpolate the first $m$ nodes. A polynomial of degree at most $m-1$ with $m$ distinct roots must be zero. Hence $N_m=P_m$, and in particular $N_m(x_m)=P_m(x_m)=f(x_m)$.
By induction, $N_n=N_f$ interpolates $f$ at all nodes $x_0,\dots,x_n$.
[/guided]
[/step]
[step:Conclude that all three interpolation forms are the same polynomial]
The Lagrange polynomial $L_f$, the Vandermonde polynomial $V_f$, and the Newton polynomial $N_f$ all belong to $\mathcal{P}_n(\mathbb{R})$ and all satisfy
\begin{align*}
p(x_i)=f(x_i)
\end{align*}
for every $i \in \{0,1,\dots,n\}$. By uniqueness of interpolation at the pairwise distinct nodes $x_0,\dots,x_n$, any two of these polynomials are equal. Therefore
\begin{align*}
L_f=V_f=N_f
\end{align*}
as elements of $\mathbb{R}[t]$. This proves the claimed equivalence of the Lagrange, Vandermonde, and Newton interpolation forms.
[/step]