[proofplan]
We prove the finite [Jensen inequality](/theorems/515) by induction on the number of points. In the base case there is only one coefficient, which must equal $1$ because the coefficients sum to $1$, so the claim is immediate. In the induction step, we separate the last coefficient from the others. If the total mass of the first coefficients is zero, the weighted point is just the last point. If that mass is positive, we normalize the first coefficients and apply the induction hypothesis to the normalized convex combination. The two-point convexity of both the set $C$ and the function $f$ then combines that normalized point with the last point.
[/proofplan]
[step:Prove the one-point case]
Let $x_1 \in C$ and let $\lambda_1 \in [0,\infty)$ satisfy
\begin{align*}
\lambda_1 = 1.
\end{align*}
Then
\begin{align*}
\sum_{i=1}^1 \lambda_i x_i = \lambda_1 x_1 = x_1 \in C.
\end{align*}
Therefore
\begin{align*}
f\left(\sum_{i=1}^1 \lambda_i x_i\right) = f(x_1) = \lambda_1 f(x_1) = \sum_{i=1}^1 \lambda_i f(x_i).
\end{align*}
In particular, the weighted point lies in $C$ and the Jensen inequality holds with equality. This proves the assertion for $m=1$.
[/step]
[step:Normalize the first $m$ weights in the induction step]
Fix an integer $m \ge 1$, and assume the theorem holds for this value of $m$. We prove it for $m+1$.
Let $x_1,\ldots,x_m,x_{m+1} \in C$, and let $\lambda_1,\ldots,\lambda_m,\lambda_{m+1} \in [0,\infty)$ satisfy
\begin{align*}
\sum_{i=1}^{m+1} \lambda_i = 1.
\end{align*}
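Define the total mass $\mu$ carried by the first $m$ coefficients by
\begin{align*}
\mu = \sum_{i=1}^m \lambda_i.
\end{align*}
Since the coefficients are nonnegative and sum to $1$, we have $\mu \in [0,1]$ and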
\begin{align*}
\mu + \lambda_{m+1} = 1.
\end{align*}
If $\mu = 0$, then $\lambda_i = 0$ for every $i \in \{1,\ldots,m\}$, because each $\lambda_i$ is nonnegative. Hence $\lambda_{m+1}=1$, and
\begin{align*}
\sum_{i=1}^{m+1} \lambda_i x_i = x_{m+1} \in C.
\end{align*}
Moreover, since $\lambda_i = 0$ for every $i \in \{1,\ldots,m\}$ and $\lambda_{m+1}=1$,
\begin{align*}
f\left(\sum_{i=1}^{m+1} \lambda_i x_i\right) = f(x_{m+1}) = \sum_{i=1}^{m+1} \lambda_i f(x_i).
\end{align*}
Thus the desired conclusion holds in the case $\mu=0$.
Assume now that $\mu>0$. For each $i \in \{1,\ldots,m\}$, define $\alpha_i \in [0,\infty)$ by
\begin{align*}
\alpha_i = \frac{\lambda_i}{\mu}.
\end{align*}
Then
\begin{align*}
\sum_{i=1}^m \alpha_i = \frac{1}{\mu}\sum_{i=1}^m \lambda_i = 1.
\end{align*}
Define $y \in V$ by
\begin{align*}
y = \sum_{i=1}^m \alpha_i x_i.
\end{align*}
By the induction hypothesis applied to the points $x_1,\ldots,x_m$ and the coefficients $\alpha_1,\ldots,\alpha_m$, we have $y \in C$ and
\begin{align*}
f(y) \le \sum_{i=1}^m \alpha_i f(x_i).
\end{align*}
[guided]
We separate the last point $x_{m+1}$ and measure how much coefficient mass remains on the first $m$ points. Define
\begin{align*}
\mu = \sum_{i=1}^m \lambda_i.
\end{align*}
Since all coefficients are nonnegative and their total sum is $1$, we have $\mu \in [0,1]$ and
\begin{align*}
\mu + \lambda_{m+1} = 1.
\end{align*}
There are two cases. If $\mu=0$, then the sum of the nonnegative numbers $\lambda_1,\ldots,\lambda_m$ is zero, so each one is zero. The total sum condition then gives $\lambda_{m+1}=1$. Therefore the weighted point is just
\begin{align*}
\sum_{i=1}^{m+1} \lambda_i x_i = x_{m+1}.
\end{align*}
Since $x_{m+1}\in C$, the weighted point lies in $C$, and the Jensen inequality becomes equality:
\begin{align*}
f\left(\sum_{i=1}^{m+1} \lambda_i x_i\right) = f(x_{m+1}) = \sum_{i=1}^{m+1} \lambda_i f(x_i).
\end{align*}
Now suppose $\mu>0$. The reason for introducing $\mu$ is that the first $m$ coefficients need not sum to $1$, so we cannot apply the induction hypothesis to them directly. We normalize them. For each $i \in \{1,\ldots,m\}$, define
\begin{align*}
\alpha_i = \frac{\lambda_i}{\mu}.
\end{align*}
Each $\alpha_i$ is nonnegative, and the normalized coefficients sum to $1$:
\begin{align*}
\sum_{i=1}^m \alpha_i = \frac{1}{\mu}\sum_{i=1}^m \lambda_i = 1.
\end{align*}
Define the normalized weighted point $y \in V$ by
\begin{align*}
y = \sum_{i=1}^m \alpha_i x_i.
\end{align*}
The induction hypothesis applies because the points $x_1,\ldots,x_m$ lie in $C$ and the coefficients $\alpha_1,\ldots,\alpha_m$ are nonnegative with sum $1$. Hence $y \in C$, and
\begin{align*}
f(y) \le \sum_{i=1}^m \alpha_i f(x_i).
\end{align*}
This is exactly where the induction hypothesis is used: it replaces the contribution of the first $m$ points by a single point $y \in C$, which can then be combined with $x_{m+1}$ using the ordinary two-point convexity inequality.
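To see the normalization on concrete numbers, here is a small illustration. The specific values are assumptions chosen only for this example and are not part of the proof. Take $m=2$ and
\begin{align*}
\lambda_1 = \tfrac{1}{2}, \qquad \lambda_2 = \tfrac{1}{4}, \qquad \lambda_3 = \tfrac{1}{4}.
\end{align*}
Then
\begin{align*}
\mu = \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}, \qquad \alpha_1 = \frac{1/2}{3/4} = \tfrac{2}{3}, \qquad \alpha_2 = \frac{1/4}{3/4} = \tfrac{1}{3},
\end{align*}
so $\alpha_1+\alpha_2=1$ and $y = \tfrac{2}{3}x_1 + \tfrac{1}{3}x_2$. Multiplying back by $\mu$ recovers the original weights, since $\mu\alpha_1 = \tfrac{1}{2} = \lambda_1$ and $\mu\alpha_2 = \tfrac{1}{4} = \lambda_2$.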
[/guided]
[/step]
[step:Apply two-point convexity to combine the normalized point with the last point]
Continue in the case $\mu>0$. Define $z \in V$ by
\begin{align*}
z = \sum_{i=1}^{m+1} \lambda_i x_i.
\end{align*}
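Using the identity $\lambda_i=\mu\alpha_i$ for $i \in \{1,\ldots,m\}$ and the definition of $y$, we compute
\begin{align*}
z = \sum_{i=1}^{m} \lambda_i x_i + \lambda_{m+1}x_{m+1} = \mu \sum_{i=1}^{m} \alpha_i x_i + \lambda_{m+1}x_{m+1} = \mu y + \lambda_{m+1}x_{m+1}.
\end{align*}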
Since $y \in C$, $x_{m+1}\in C$, and $\mu,\lambda_{m+1}\in[0,1]$ with $\mu+\lambda_{m+1}=1$, the stated two-point convexity property of $C$ gives $z\in C$.
Applying the stated two-point convexity property of $f$ to the points $y,x_{m+1}\in C$ with coefficients $\mu$ and $\lambda_{m+1}$ gives
\begin{align*}
f(z) \le \mu f(y) + \lambda_{m+1}f(x_{m+1}).
\end{align*}
Using the induction estimate for $f(y)$ and the nonnegativity of $\mu$, we obtain
\begin{align*}
f(z) \le \mu \sum_{i=1}^m \alpha_i f(x_i) + \lambda_{m+1}f(x_{m+1}).
\end{align*}
Substituting $\mu\alpha_i=\lambda_i$ for each $i \in \{1,\ldots,m\}$ yields
\begin{align*}
f(z) \le \sum_{i=1}^m \lambda_i f(x_i) + \lambda_{m+1}f(x_{m+1}) = \sum_{i=1}^{m+1} \lambda_i f(x_i).
\end{align*}
Because $z=\sum_{i=1}^{m+1}\lambda_i x_i$, this is precisely the desired inequality for $m+1$. Together with $z \in C$ and the case $\mu=0$ treated earlier, this completes the induction step.
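As a sanity check of this chain of inequalities, consider the following illustration. The choices $V=C=\mathbb{R}$ and $f(t)=t^2$, as well as all numerical values, are assumptions made only for this example. Take $m=2$, $(x_1,x_2,x_3)=(0,3,6)$, and $(\lambda_1,\lambda_2,\lambda_3)=(\tfrac{1}{2},\tfrac{1}{4},\tfrac{1}{4})$. Then $\mu=\tfrac{3}{4}$, $\alpha_1=\tfrac{2}{3}$, $\alpha_2=\tfrac{1}{3}$, and
\begin{align*}
y = \tfrac{2}{3}\cdot 0 + \tfrac{1}{3}\cdot 3 = 1, \qquad z = \tfrac{3}{4}\cdot 1 + \tfrac{1}{4}\cdot 6 = \tfrac{9}{4}.
\end{align*}
The three estimates of this step read
\begin{align*}
f(z) = \tfrac{81}{16} &\le \mu f(y) + \lambda_3 f(x_3) = \tfrac{3}{4}\cdot 1 + \tfrac{1}{4}\cdot 36 = \tfrac{39}{4} \\
&\le \mu\bigl(\alpha_1 f(x_1) + \alpha_2 f(x_2)\bigr) + \lambda_3 f(x_3) = \tfrac{3}{4}\cdot 3 + 9 = \tfrac{45}{4} \\
&= \lambda_1 f(x_1) + \lambda_2 f(x_2) + \lambda_3 f(x_3).
\end{align*}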
[/step]
[step:Conclude by induction]
The assertion holds for $m=1$, and the induction step proves that validity for $m$ implies validity for $m+1$. Therefore, by induction, the assertion holds for every integer $m\ge 1$.
[/step]