Polynomial interpolation answers a basic reconstruction question: if a function is known only at finitely many points, when can those values be matched by a single [polynomial](/page/Polynomial), and what does that polynomial tell us about the original function? The question is algebraic because finitely many coefficients must satisfy finitely many equations, but it is also analytic because the resulting polynomial may be used to approximate a function between the data points. This dual nature is why interpolation appears early in [numerical analysis](/page/Numerical%20Analysis), [approximation theory](/page/Approximation%20Theory), and the study of Taylor polynomial approximations.
The central tension is that exact agreement at selected points does not automatically mean good approximation elsewhere. A polynomial may pass through every measured value and still oscillate badly between nodes. Polynomial interpolation is therefore not just the act of fitting a curve; it is the study of which data determine a polynomial, how that polynomial can be represented, and how its error depends on the underlying function and the placement of the nodes.
## Definition
The page topic is not merely a formula for drawing a curve through points. Polynomial interpolation is the finite-dimensional reconstruction problem in which point samples are treated as exact constraints and the unknown object is restricted to a polynomial space of matching dimension. Here $\mathbb{R}[x]$ denotes the ring, or equivalently the [vector space](/page/Vector%20Space), of polynomials in one variable $x$ with real coefficients.
[definition: Polynomial Interpolation]
Let $n \in \mathbb{N}$, let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be distinct, and let $y_0, y_1, \ldots, y_n \in \mathbb{R}$. Polynomial interpolation at the nodes $x_0, x_1, \ldots, x_n$ is the construction of a polynomial $p \in \mathbb{R}[x]$ such that $\deg p \le n$ and
\begin{align*}
p(x_i) = y_i, \qquad i=0,1,\ldots,n.
\end{align*}
[/definition]
The input data for interpolation must specify where the function values are sampled. These sampling locations need to be separated, because asking for two different function values at the same point is not an ordinary pointwise interpolation problem.
[definition: Interpolation Nodes]
Let $n \in \mathbb{N}$. Interpolation nodes are distinct points $x_0, x_1, \ldots, x_n \in \mathbb{R}$.
[/definition]
The distinctness condition is not cosmetic. If the same node is listed twice while only function values are prescribed, the data may demand two incompatible values at one input.
[example: Repeated Node Failure]
Suppose a polynomial $p \in \mathbb{R}[x]$ is required to satisfy both $p(0)=1$ and $p(0)=2$. Since a polynomial defines a function $\mathbb{R}\to\mathbb{R}$, evaluating $p$ at the single input $0$ produces one real number. Thus the two requirements force the equality
\begin{align*}
1=p(0)=2.
\end{align*}
Subtracting $1$ from both sides gives
\begin{align*}
0=1,
\end{align*}
which is impossible in $\mathbb{R}$. Therefore no such polynomial exists. The obstruction is not special to polynomials: the data assign two different values to the same node, so they do not define ordinary pointwise interpolation data. If repeated points are intended to encode derivative conditions, the appropriate framework is Hermite interpolation rather than ordinary polynomial interpolation.
[/example]
Given nodes, the basic problem is to replace a list of data values by one polynomial of controlled degree. The degree bound is part of the definition: without it, many polynomials can be forced through the same finite data by adding a polynomial that vanishes at every node.
[definition: Polynomial Interpolation Problem]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes, and let $y_0, y_1, \ldots, y_n \in \mathbb{R}$. The polynomial interpolation problem is the problem of finding a polynomial $p \in \mathbb{R}[x]$ such that $\deg p \le n$ and
\begin{align*}
p(x_i) = y_i, \qquad i = 0,1,\ldots,n.
\end{align*}
[/definition]
When the data values come from an actual function, interpolation should preserve the distinction between the sampled function and the polynomial constructed from the samples. This matters because the polynomial may be used as a proxy away from the nodes, where it need not equal the original function.
[definition: Interpolating Polynomial]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes, and let $f: A \to \mathbb{R}$ be a function with $\{x_0, x_1, \ldots, x_n\} \subset A \subset \mathbb{R}$. An interpolating polynomial for $f$ at the nodes $x_0, x_1, \ldots, x_n$ is a polynomial $p \in \mathbb{R}[x]$ such that $\deg p \le n$ and
\begin{align*}
p(x_i) = f(x_i), \qquad i = 0,1,\ldots,n.
\end{align*}
[/definition]
Many applications fix the same nodes and vary the sampled function. Quadrature rules, finite element constructions, and spectral collocation methods all need a reusable operation that turns function values into a polynomial. Packaging interpolation as an operator records this reusable construction and makes algebraic questions, such as linearity, meaningful.
[definition: Interpolation Operator]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes, and let $A \subset \mathbb{R}$ contain all the nodes. The interpolation operator associated to these nodes is the map $I_n: \{f: A \to \mathbb{R}\} \to \mathbb{R}_n[x]$ where $\mathbb{R}_n[x] = \{p \in \mathbb{R}[x] : \deg p \le n\}$ and $I_n f$ is the interpolating polynomial for $f$ at $x_0, x_1, \ldots, x_n$.
[/definition]
The definition of $I_n$ depends on the fact that the interpolating polynomial exists and is unique. That result is stated below in the properties section; the definition records the intended object once existence and uniqueness have been established.
## Equivalent Characterisations
An explicit interpolation formula should avoid solving a fresh system of equations every time the data values change. The natural strategy is to build polynomials that act like coordinate vectors with respect to evaluation at the nodes: one basis polynomial detects one node and vanishes at all the others.
[definition: Lagrange Basis Polynomial]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes. For each $i \in \{0,1,\ldots,n\}$, the $i$-th Lagrange basis polynomial is the polynomial
\begin{align*}
\ell_i(x) = \prod_{0 \le j \le n,\ j \ne i} \frac{x - x_j}{x_i - x_j}.
\end{align*}
[/definition]
The Lagrange basis is designed so that data values become coefficients. This gives a direct formula for the interpolant and makes the dependence on the prescribed values transparent. The resulting theorem is the standard closed-form solution of the interpolation problem.
[quotetheorem:8328]
The Lagrange formula is explicit and symmetric in the nodes, which makes it conceptually clean. A different viewpoint is needed when interpolation is studied through coefficients, conditioning, and invertibility. Writing the unknown polynomial in the monomial basis turns the problem into a linear system.
The matrix in that system has a special structure: each row evaluates monomials at one node. Naming this matrix lets the interpolation problem connect directly to [matrix](/page/Matrix) algebra and to determinant criteria for solvability.
[definition: Vandermonde Matrix]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$. The Vandermonde matrix associated to these points is the matrix $V \in \mathbb{R}^{(n+1) \times (n+1)}$ with entries
\begin{align*}
V_{ij} = x_i^j, \qquad 0 \le i,j \le n.
\end{align*}
[/definition]
If $p(x)=a_0+a_1x+\cdots+a_nx^n$, then the interpolation conditions are $Va=y$, where $a=(a_0,\ldots,a_n)$ and $y=(y_0,\ldots,y_n)$. Solvability for arbitrary data is therefore controlled by whether the columns of $V$ are linearly independent. The possible obstruction is that two nodes may force two rows of $V$ to coincide, while distinct nodes should leave no hidden linear dependence among the monomial evaluations. The determinant formula isolates exactly this obstruction.
The determinant identity below uses the symbol $F$ for a field. Here a field means a number system where addition, subtraction, multiplication, and division by nonzero elements obey the familiar algebraic rules; $\mathbb{R}$ is the example needed for this page. The theorem is quoted in that broader language because its determinant computation does not use order, limits, continuity, or any specifically real-analytic property. Reading it with $F=\mathbb{R}$ gives exactly the interpolation criterion we need: the determinant should vanish precisely when two interpolation nodes coincide.
[quotetheorem:1673]
The formula shows that distinctness of the nodes is not a cosmetic hypothesis: it is exactly the condition that makes the Vandermonde matrix invertible. If two nodes coincide, two rows agree and the same polynomial value is being asked to serve two possibly different data values; if all nodes are distinct, the determinant is nonzero and every data vector has a unique coefficient vector in the monomial basis. This also explains a practical limitation of the monomial form. Even when the determinant is nonzero, nearby nodes can make the Vandermonde system numerically ill-conditioned, so the determinant proves uniqueness but does not by itself give a stable computational method.
When sample data are revealed incrementally, an interpolation method should update the old polynomial instead of discarding all previous work. The needed coefficients must record how much new information is contributed by a newly appended node. This need motivates divided differences, which provide exactly these incremental coefficients.
[definition: Divided Difference]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes, and let $f: A \to \mathbb{R}$ be a function with $\{x_0, x_1, \ldots, x_n\} \subset A \subset \mathbb{R}$. The divided differences of $f$ on these nodes are defined recursively by $f[x_i] = f(x_i)$ and, for $k \ge 1$,
\begin{align*}
f[x_i,x_{i+1},\ldots,x_{i+k}] = \frac{f[x_{i+1},\ldots,x_{i+k}] - f[x_i,\ldots,x_{i+k-1}]}{x_{i+k}-x_i}
\end{align*}
for all indices for which the displayed expression is defined.
[/definition]
Divided differences measure the coefficient that must be added when passing from interpolation on $k$ nodes to interpolation on $k+1$ nodes. The next definition records the polynomial form in which those coefficients appear, giving a representation that can be extended by appending one new term when a node is added.
[definition: Newton Interpolation Form]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes, and let $f: A \to \mathbb{R}$ be a function with $\{x_0, x_1, \ldots, x_n\} \subset A \subset \mathbb{R}$. The Newton interpolation form of the interpolating polynomial is
\begin{align*}
I_n f(x) = f[x_0] + \sum_{k=1}^n f[x_0,x_1,\ldots,x_k]\prod_{j=0}^{k-1}(x-x_j).
\end{align*}
[/definition]
The Lagrange, Vandermonde, and Newton forms describe the same polynomial, but that agreement is not part of their definitions. It is a consequence of uniqueness: once each construction satisfies the same interpolation conditions and has the same degree bound, there is no room for them to differ. The theorem below records this compatibility so the representations can be used interchangeably.
[quotetheorem:8329]
This equivalence is what allows the three forms to serve different purposes without changing the interpolant. The Lagrange form is usually best for proving existence and for writing a direct formula from the data values. The Vandermonde form exposes the coefficient problem and connects interpolation to invertibility and conditioning of matrices. The Newton form is better suited to adding nodes one at a time, because each new divided difference contributes one new term. In applications, one chooses the representation for its algebraic or numerical convenience while relying on the theorem to know that the underlying polynomial is the same.
## Examples
The simplest non-linear case already shows the whole mechanism: three values determine a quadratic polynomial. This example also illustrates why the Lagrange basis is useful for hand computation.
[example: Quadratic Interpolation Through Three Points]
Let the interpolation nodes be $x_0=0$, $x_1=1$, and $x_2=2$, with data values $y_0=1$, $y_1=3$, and $y_2=2$. From the definition of the Lagrange basis polynomial,
\begin{align*}
\ell_0(x)=\frac{x-1}{0-1}\frac{x-2}{0-2}=\frac{(x-1)(x-2)}{2}.
\end{align*}
Similarly,
\begin{align*}
\ell_1(x)=\frac{x-0}{1-0}\frac{x-2}{1-2}=-x(x-2).
\end{align*}
For the third node,
\begin{align*}
\ell_2(x)=\frac{x-0}{2-0}\frac{x-1}{2-1}=\frac{x(x-1)}{2}.
\end{align*}
Thus the Lagrange interpolating polynomial is
\begin{align*}
p(x)=1\cdot \frac{(x-1)(x-2)}{2}+3\cdot \bigl(-x(x-2)\bigr)+2\cdot \frac{x(x-1)}{2}.
\end{align*}
Combining the scalar factors gives
\begin{align*}
p(x)=\frac{(x-1)(x-2)}{2}-3x(x-2)+x(x-1).
\end{align*}
Expanding each product,
\begin{align*}
(x-1)(x-2)=x^2-3x+2.
\end{align*}
Also,
\begin{align*}
-3x(x-2)=-3x^2+6x.
\end{align*}
Finally,
\begin{align*}
x(x-1)=x^2-x.
\end{align*}
Substituting these expansions into $p(x)$ gives
\begin{align*}
p(x)=\frac{x^2-3x+2}{2}+(-3x^2+6x)+(x^2-x).
\end{align*}
Collecting the quadratic, linear, and constant terms,
\begin{align*}
p(x)=\left(\frac{1}{2}-3+1\right)x^2+\left(-\frac{3}{2}+6-1\right)x+1.
\end{align*}
Therefore
\begin{align*}
p(x)=-\frac{3}{2}x^2+\frac{7}{2}x+1.
\end{align*}
Now evaluate the polynomial at the three nodes:
\begin{align*}
p(0)=-\frac{3}{2}\cdot 0^2+\frac{7}{2}\cdot 0+1=1.
\end{align*}
At $x=1$,
\begin{align*}
p(1)=-\frac{3}{2}+\frac{7}{2}+1=\frac{4}{2}+1=3.
\end{align*}
At $x=2$,
\begin{align*}
p(2)=-\frac{3}{2}\cdot 4+\frac{7}{2}\cdot 2+1=-6+7+1=2.
\end{align*}
Hence this quadratic polynomial matches all three prescribed data values, so it is the interpolating polynomial for the given nodes and values.
[/example]
## Properties
The foundational theorem says that the interpolation problem has exactly one answer. This is the fact that makes notation such as $I_n f$ meaningful. In the quoted theorem, $P_n[x]$ denotes the same space of real polynomials of degree at most $n$ that this page elsewhere writes as $\mathbb{R}_n[x]$ or $\mathcal{P}_n(\mathbb{R})$.
[quotetheorem:473]
The theorem is the algebraic heart of interpolation. It says that the evaluation map from polynomials of degree at most $n$ to data vectors in $\mathbb{R}^{n+1}$ is an isomorphism whenever the nodes are distinct. Since this map is linear, interpolation should preserve linear combinations of data, and the next theorem states that operator property explicitly.
[quotetheorem:8330]
Linearity matters because interpolation is often applied to data assembled from simpler pieces. If two sampled functions are added, or if all sampled values are rescaled, the interpolating polynomial changes in exactly the same linear way. This is why interpolation operators can be combined with integration to build quadrature rules and with local polynomial spaces in finite element methods. The limitation is equally important: linearity is an algebraic property of the reconstruction map, not a guarantee of stability or accuracy. Error estimates still depend on smoothness and node placement, which is why the next question is analytic rather than purely algebraic.
The main analytic question is not whether the interpolating polynomial exists, but how far it is from the function it interpolates. To state the error formula cleanly, it is useful to isolate the polynomial that vanishes at all interpolation nodes. Its size controls where the interpolant is forced to agree with $f$ and where the error is free to grow.
[definition: Nodal Polynomial]
Let $x_0, x_1, \ldots, x_n \in \mathbb{R}$ be interpolation nodes. The nodal polynomial associated to these nodes is
\begin{align*}
\omega_{n+1}(x) = \prod_{i=0}^n (x-x_i).
\end{align*}
[/definition]
The nodal polynomial is the algebraic record of the interpolation nodes. It vanishes exactly at the places where the interpolation error must vanish, but these forced zeros alone do not say how large the error can become between nodes. The missing information is how much curvature of order $n+1$ remains after a degree-$n$ polynomial has matched the data. The error formula combines these two sources: the geometry of the nodes through $\omega_{n+1}$ and the smoothness of $f$ through a higher derivative.
[quotetheorem:475]
This formula explains both success and failure. If $f^{(n+1)}$ is controlled and the nodal product is small on the interval, interpolation gives a good approximation. If the nodal product grows large, especially near the endpoints for equally spaced nodes, interpolation can behave poorly even for smooth functions.
[example: Runge Phenomenon for Equally Spaced Nodes]
Let $f: [-1,1] \to \mathbb{R}$ be given by
\begin{align*}
f(x)=\frac{1}{1+25x^2}.
\end{align*}
For each $n \in \mathbb{N}$, take the equally spaced nodes $x_i=-1+2i/n$ for $i=0,1,\ldots,n$, and let $I_n f$ be the interpolating polynomial.
Already at degree $4$, the endpoint behavior can be seen explicitly. The nodes are $-1,-1/2,0,1/2,1$, and the sampled values are
\begin{align*}
f(0)=1.
\end{align*}
Also,
\begin{align*}
f(1/2)=f(-1/2)=\frac{1}{1+25/4}=\frac{1}{29/4}=\frac{4}{29}.
\end{align*}
At the endpoints,
\begin{align*}
f(1)=f(-1)=\frac{1}{1+25}=\frac{1}{26}.
\end{align*}
Because the data are symmetric, the interpolating polynomial has the form
\begin{align*}
p(x)=ax^4+bx^2+1.
\end{align*}
Using $p(1/2)=4/29$ gives
\begin{align*}
\frac{a}{16}+\frac{b}{4}+1=\frac{4}{29}.
\end{align*}
Subtracting $1$ from both sides gives
\begin{align*}
\frac{a}{16}+\frac{b}{4}=-\frac{25}{29}.
\end{align*}
Multiplying by $16$ gives
\begin{align*}
a+4b=-\frac{400}{29}.
\end{align*}
Using $p(1)=1/26$ gives
\begin{align*}
a+b+1=\frac{1}{26}.
\end{align*}
Subtracting $1$ gives
\begin{align*}
a+b=-\frac{25}{26}.
\end{align*}
Subtracting this equation from $a+4b=-400/29$ gives
\begin{align*}
3b=-\frac{400}{29}+\frac{25}{26}.
\end{align*}
With common denominator $754$, this is
\begin{align*}
3b=\frac{-10400+725}{754}=-\frac{9675}{754}.
\end{align*}
Therefore
\begin{align*}
b=-\frac{3225}{754}.
\end{align*}
Since $a+b=-25/26$, we get
\begin{align*}
a=-\frac{25}{26}+\frac{3225}{754}.
\end{align*}
Because $754=29\cdot 26$, this becomes
\begin{align*}
a=-\frac{725}{754}+\frac{3225}{754}=\frac{2500}{754}=\frac{1250}{377}.
\end{align*}
Thus
\begin{align*}
I_4f(x)=\frac{1250}{377}x^4-\frac{3225}{754}x^2+1.
\end{align*}
At $x=3/4$,
\begin{align*}
I_4f(3/4)=\frac{1250}{377}\cdot\frac{81}{256}-\frac{3225}{754}\cdot\frac{9}{16}+1.
\end{align*}
The first term is
\begin{align*}
\frac{1250\cdot 81}{377\cdot 256}=\frac{101250}{96512}.
\end{align*}
The second term is
\begin{align*}
\frac{3225\cdot 9}{754\cdot 16}=\frac{29025}{12064}=\frac{232200}{96512}.
\end{align*}
Therefore
\begin{align*}
I_4f(3/4)=\frac{101250}{96512}-\frac{232200}{96512}+1=-\frac{130950}{96512}+\frac{96512}{96512}=-\frac{34438}{96512}.
\end{align*}
So
\begin{align*}
I_4f(3/4)=-\frac{17219}{48256}.
\end{align*}
But the original function is positive there:
\begin{align*}
f(3/4)=\frac{1}{1+25\cdot 9/16}=\frac{1}{241/16}=\frac{16}{241}.
\end{align*}
Thus the interpolating polynomial matches $f$ at all five equally spaced nodes, but between the nodes it has already crossed below $0$ while $f$ remains positive. Runge's phenomenon is the amplified version of this effect: for the Runge function, increasing the number of equally spaced nodes produces endpoint oscillations rather than [uniform convergence](/page/Uniform%20Convergence) on $[-1,1]$.
[/example]
The earlier error formula explains why node placement matters. Once the derivative term is controlled on an interval, the remaining visible factor is the nodal product. Chebyshev-type nodes are designed to keep that product smaller than it is for equally spaced nodes, which is why they are central in stable polynomial approximation.
After the error estimate, the remaining analytic question is what information is carried by the Newton coefficients themselves. These coefficients are divided differences, so one wants to know whether they are only entries in a recursive interpolation table or whether they measure a genuine rate of change of the function.
This question matters because the error formula contains a derivative of $f$, while the Newton form is built from divided differences computed from sampled values. To connect the computational coefficients with the analytic smoothness assumptions, one needs a bridge showing that a divided difference over several nodes behaves like an ordinary derivative of the corresponding order at some intermediate point.
For smooth functions, divided differences are not merely algebraic bookkeeping. They are discrete analogues of derivatives, taken across several nodes rather than at one point. The point needing justification is that this analogy should not depend on the particular recursive table used to compute them.
The next result supplies this bridge. It says that a divided difference over $n+1$ distinct nodes is not just comparable to an $n$th derivative in spirit: under smoothness hypotheses, it is exactly an $n$th derivative value, up to the factorial normalization, at some intermediate point among the nodes.
[quotetheorem:908]
This result justifies thinking of divided differences as discrete derivatives. As the nodes coalesce in a controlled way, divided differences approach ordinary derivatives, which leads naturally from interpolation to Taylor polynomials and Hermite interpolation.
## Relationship to Other Concepts
Polynomial interpolation is a finite-dimensional version of approximation by a chosen function class. In [approximation theory](/page/Approximation%20Theory), the central question is usually not exact agreement at finitely many points but small error in a norm, such as the $C^0$ norm or an $L^2$ norm. Interpolation gives one concrete projection-like procedure, though it is not generally an [orthogonal projection](/theorems/437).
Interpolation is also closely related to Taylor polynomial approximation. Taylor approximation matches derivative data at one point; ordinary polynomial interpolation matches value data at several distinct points. Hermite interpolation lies between them by allowing repeated nodes with derivative conditions.
In numerical integration, interpolation leads to quadrature rules: first replace $f$ by $I_n f$, then integrate the polynomial exactly. This is the mechanism behind Newton-Cotes formulas and Gaussian quadrature, though their stability and accuracy depend heavily on node choice.
In numerical differential equations and finite element methods, local interpolation constructs polynomial approximations on small intervals or elements. The same existence and uniqueness theorem is used locally, while analytic estimates control how the local errors combine.
A final warning is conceptual: interpolation is not regression. Interpolation demands exact agreement with all data values. Regression and least-squares approximation allow residual error and are usually better suited to noisy data. Thus interpolation belongs to exact reconstruction from trusted samples, while regression belongs to statistical or overdetermined approximation.
## Beyond and Connections
Polynomial interpolation is a meeting point for algebra, analysis, and computation. Algebraically, it says that a finite table of distinct input values determines a unique polynomial below a prescribed degree. Analytically, it raises the separate question of whether these interpolants approximate an underlying function well as more sample points are added. Numerically, it forces attention to basis choice, node placement, and stability, because mathematically equivalent formulas can behave quite differently in finite precision.
This distinction explains why interpolation appears both as a construction and as a warning. Lagrange form is often the clearest certificate of existence and uniqueness, Newton form is well suited to incremental data and divided differences, and Vandermonde systems connect the problem to linear algebra while also exposing conditioning issues. Later topics such as spline interpolation, Chebyshev approximation, quadrature rules, and spectral methods all keep the same core question but change the polynomial space, the sampling geometry, or the notion of error being controlled.
## References
[Polynomial](/page/Polynomial).
[Derivative](/page/Derivative).
[Taylor's Theorem](/theorems/827).
[Approximation Theory](/page/Approximation%20Theory).
Isaacson and Keller, *Analysis of Numerical Methods* (1966).
Trefethen, *Approximation Theory and Approximation Practice* (2013).
Atkinson, *An Introduction to Numerical Analysis* (1989).
Polynomial Interpolation
Also known as: Interpolation polynomials, Polynomial approximation at nodes, Lagrange interpolation, Vandermonde interpolation, Newton interpolation