[proofplan]
We choose a finite-dimensional polynomial projection $\Pi: W^{m,p}(U) \to \mathcal{P}_{m-1}$ by fixing averaged moments against a basis of $\mathcal{P}_{m-1}$. The error $u-\Pi u$ lies in a closed complementary subspace on which the order-$m$ Sobolev seminorm is a genuine norm. A compactness contradiction proves that lower-order Sobolev seminorms on this complement are controlled by the order-$m$ seminorm; the endpoint $p=\infty$ is handled by the same contradiction argument using [uniform convergence](/page/Uniform%20Convergence) of lower derivatives. Taking $q=\Pi u$ then gives all estimates, while the case $k=m$ is immediate because all order-$m$ derivatives of $q$ vanish.
[/proofplan]
[step:Choose averaged moments that determine the polynomial part]
Because $U$ is star-shaped with respect to a ball, fix an open ball $B \subset U$ with respect to which $U$ is star-shaped. This hypothesis also implies that $U$ is connected: if $x,y\in U$, then each of them is joined to every point of $B$ by a line segment lying in $U$, so $x$ and $y$ are joined by a polygonal path in $U$. Let $\mathcal{L}^n$ denote $n$-dimensional Lebesgue measure on $\mathbb{R}^n$; as a nonempty open set, $B$ has positive $\mathcal{L}^n$-measure. Let $N := \dim \mathcal{P}_{m-1}$ and choose a basis $r_1,\dots,r_N$ of $\mathcal{P}_{m-1}$. Define the bilinear map $A: \mathcal{P}_{m-1} \times \mathcal{P}_{m-1} \to \mathbb{R}$ by
\begin{align*}
A(v,w) := \int_B v(x)w(x)\,d\mathcal{L}^n(x).
\end{align*}
This bilinear map is an [inner product](/page/Inner%20Product) on $\mathcal{P}_{m-1}$. Indeed, if $A(v,v)=0$, then $v=0$ $\mathcal{L}^n$-a.e. on $B$, and since $v$ is a polynomial, it follows that $v=0$ on $\mathbb{R}^n$.
Let $G \in \mathbb{R}^{N \times N}$ be the Gram matrix defined by
\begin{align*}
G_{ij} := \int_B r_i(x)r_j(x)\,d\mathcal{L}^n(x).
\end{align*}
Since $A$ is an inner product and $r_1,\dots,r_N$ are linearly independent, $G$ is symmetric positive definite and hence invertible. For $u \in W^{m,p}(U)$, define the vector $b(u) \in \mathbb{R}^N$ by
\begin{align*}
b_i(u) := \int_B u(x)r_i(x)\,d\mathcal{L}^n(x), \qquad 1 \le i \le N.
\end{align*}
This is well-defined for every $1 \le p \le \infty$ because $B$ has finite measure, $u|_B \in L^p(B)$, and each $r_i$ is bounded on $\overline{B}$.
Define the polynomial projection $\Pi: W^{m,p}(U) \to \mathcal{P}_{m-1}$ by
\begin{align*}
\Pi u := \sum_{j=1}^N a_j(u)r_j,
\end{align*}
where $a(u)=(a_1(u),\dots,a_N(u)) \in \mathbb{R}^N$ is the unique solution of $G a(u)=b(u)$. Then $\Pi$ is linear, bounded as a map from $W^{m,p}(U)$ to $\mathcal{P}_{m-1}$, and satisfies
\begin{align*}
\int_B (u-\Pi u)(x)r_i(x)\,d\mathcal{L}^n(x)=0
\end{align*}
for every $1 \le i \le N$.
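As a consistency check, the following short computation confirms the moment identity and shows that $\Pi$ fixes polynomials. It is a sketch that uses only the definitions of $G$, $b(u)$, and $a(u)$ above; the coefficient vector $c$ is notation introduced here. For $u \in W^{m,p}(U)$,
\begin{align*}
\int_B (\Pi u)(x)r_i(x)\,d\mathcal{L}^n(x) = \sum_{j=1}^N G_{ij}a_j(u) = (G a(u))_i = b_i(u) = \int_B u(x)r_i(x)\,d\mathcal{L}^n(x).
\end{align*}
If $q=\sum_{j=1}^N c_j r_j \in \mathcal{P}_{m-1}$, then
\begin{align*}
b_i(q) = \sum_{j=1}^N c_j \int_B r_j(x)r_i(x)\,d\mathcal{L}^n(x) = (Gc)_i, \qquad a(q) = G^{-1}Gc = c, \qquad \Pi q = q.
\end{align*}
In particular $\Pi\circ\Pi=\Pi$, which is the sense in which $\Pi$ is a projection onto $\mathcal{P}_{m-1}$.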
[guided]
The polynomial $q$ must be chosen in a way that depends linearly and continuously on $u$. Since $U$ is star-shaped with respect to a ball, fix an open ball $B \subset U$ with respect to which $U$ is star-shaped, and let $\mathcal{L}^n$ denote $n$-dimensional Lebesgue measure on $\mathbb{R}^n$; being a nonempty open set, $B$ has positive $\mathcal{L}^n$-measure. To choose $q$, we impose $N=\dim \mathcal{P}_{m-1}$ moment conditions. Choose a basis $r_1,\dots,r_N$ of $\mathcal{P}_{m-1}$. The ball $B$ is used here because integration over $B$ gives a nondegenerate inner product on polynomials:
\begin{align*}
A(v,w) := \int_B v(x)w(x)\,d\mathcal{L}^n(x).
\end{align*}
If $A(v,v)=0$, then $v(x)=0$ for $\mathcal{L}^n$-almost every $x \in B$. A polynomial that vanishes almost everywhere on a nonempty open ball vanishes everywhere on that ball by continuity, and hence vanishes identically, so $A$ is positive definite on $\mathcal{P}_{m-1}$.
Now define the Gram matrix $G \in \mathbb{R}^{N \times N}$ by
\begin{align*}
G_{ij} := \int_B r_i(x)r_j(x)\,d\mathcal{L}^n(x).
\end{align*}
Since $A$ is an inner product, $G$ is invertible. For a Sobolev function $u \in W^{m,p}(U)$, the moments
\begin{align*}
b_i(u) := \int_B u(x)r_i(x)\,d\mathcal{L}^n(x)
\end{align*}
are finite: the ball $B$ has finite measure, $u|_B \in L^p(B)$, and each polynomial $r_i$ is bounded on $\overline{B}$. Solving $G a(u)=b(u)$ gives coefficients $a_1(u),\dots,a_N(u)$, and we set
\begin{align*}
\Pi u := \sum_{j=1}^N a_j(u)r_j.
\end{align*}
The equation $G a(u)=b(u)$ is exactly the statement that $u-\Pi u$ has zero moments against every basis polynomial:
\begin{align*}
\int_B (u-\Pi u)(x)r_i(x)\,d\mathcal{L}^n(x)=0.
\end{align*}
Because $G^{-1}$ is a fixed matrix and each moment $b_i$ is a bounded linear functional on $W^{m,p}(U)$, the map $\Pi: W^{m,p}(U)\to \mathcal{P}_{m-1}$ is bounded and linear.
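To make the construction concrete, here is a sketch of the two lowest cases. The centre $x_0$ and radius $\rho$ of $B$, and the particular bases chosen, are notation introduced for this illustration only. For $m=1$ take $r_1=1$, so that $N=1$, $G=\mathcal{L}^n(B)$, and
\begin{align*}
\Pi u = \frac{1}{\mathcal{L}^n(B)}\int_B u(y)\,d\mathcal{L}^n(y).
\end{align*}
For $m=2$ take $r_1=1$ and $r_{i+1}(x)=x_i-x_{0,i}$ for $1\le i\le n$. By the symmetry of the ball the off-diagonal entries of $G$ vanish, and
\begin{align*}
G = \mathcal{L}^n(B)\,\mathrm{diag}\!\left(1,\frac{\rho^2}{n+2},\dots,\frac{\rho^2}{n+2}\right),
\end{align*}
so that
\begin{align*}
\Pi u(x) = \frac{1}{\mathcal{L}^n(B)}\int_B u(y)\,d\mathcal{L}^n(y) + \sum_{i=1}^n \frac{n+2}{\rho^2\,\mathcal{L}^n(B)}\left(\int_B u(y)(y_i-x_{0,i})\,d\mathcal{L}^n(y)\right)(x_i-x_{0,i}).
\end{align*}
A different basis changes $G$ and $a(u)$ but not $\Pi u$, since the moment conditions characterise $\Pi u$ as the unique element of $\mathcal{P}_{m-1}$ whose difference from $u$ is orthogonal on $B$ to all of $\mathcal{P}_{m-1}$.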
[/guided]
[/step]
[step:Identify the complementary subspace where the top seminorm is a norm]
Define the closed subspace
\begin{align*}
X_p := \left\{v \in W^{m,p}(U): \int_B v(x)r_i(x)\,d\mathcal{L}^n(x)=0 \text{ for every } 1 \le i \le N\right\}.
\end{align*}
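It is closed because each moment $v \mapsto \int_B v(x)r_i(x)\,d\mathcal{L}^n(x)$ is a bounded linear functional on $W^{m,p}(U)$. For every $u \in W^{m,p}(U)$, the function $u-\Pi u$ belongs to $X_p$ by the moment identity satisfied by $\Pi$.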
[claim:Connected-domain polynomial characterization]
Let $1\le p\le\infty$ and let $v\in W^{m,p}(U)$ satisfy $D^\alpha v=0$ in $\mathcal{D}'(U)$ for every multi-index $\alpha$ with $|\alpha|=m$. Then there exists $P\in\mathcal{P}_{m-1}$ such that $v=P$ $\mathcal{L}^n$-a.e. on $U$.
[/claim]
[proof]
The assertion is local on balls: on each open ball $B_0\subset U$, the distributional form of [Taylor's theorem](/theorems/827) gives a polynomial $P_{B_0}\in\mathcal{P}_{m-1}$ such that $v=P_{B_0}$ in $\mathcal{D}'(B_0)$, hence $\mathcal{L}^n$-a.e. on $B_0$. If two such balls overlap, then $P_{B_0}-P_{B_1}=0$ on an [open set](/page/Open%20Set), so the polynomial identity theorem gives $P_{B_0}=P_{B_1}$ on $\mathbb{R}^n$. Since $U$ is connected, any two balls in $U$ are joined by a finite chain of overlapping balls inside $U$, and the local polynomials therefore agree with one global polynomial $P\in\mathcal{P}_{m-1}$. Thus $v=P$ $\mathcal{L}^n$-a.e. on $U$.
[/proof]
We claim that the seminorm $|\cdot|_{W^{m,p}(U)}$ vanishes on $X_p$ only at $0$. Let $v \in X_p$ and suppose $|v|_{W^{m,p}(U)}=0$. Then $D^\alpha v=0$ in $L^p(U)$ for every multi-index $\alpha$ with $|\alpha|=m$, hence $D^\alpha v=0$ in $\mathcal{D}'(U)$. By the connected-domain polynomial characterization just proved, $v$ agrees $\mathcal{L}^n$-a.e. on $U$ with some polynomial $P \in \mathcal{P}_{m-1}$. Since $v \in X_p$ and $v=P$ $\mathcal{L}^n$-a.e. on $U$,
\begin{align*}
\int_B P(x)r_i(x)\,d\mathcal{L}^n(x)=0
\end{align*}
for every $1 \le i \le N$. Writing $P=\sum_{j=1}^N c_j r_j$, these identities read $Gc=0$, and the invertibility of $G$ gives $c=0$. Hence $P=0$ and $v=0$ in $W^{m,p}(U)$.
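The first two steps can be summarised as a direct-sum decomposition. The following display is a sketch that uses only the moment identity for $\Pi$ and the argument just given:
\begin{align*}
W^{m,p}(U) = \mathcal{P}_{m-1} \oplus X_p, \qquad u = \Pi u + (u-\Pi u).
\end{align*}
The sum is direct because any $P\in\mathcal{P}_{m-1}\cap X_p$ has vanishing moments against every $r_i$ and is therefore zero. On the first summand the order-$m$ seminorm vanishes identically; on the second it is a norm.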
[/step]
[step:Prove the lower-order estimate by compactness when $1 \le p < \infty$]
Assume first that $1 \le p < \infty$. We prove that there exists $C_0=C_0(U,m,n,p)>0$ such that
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,p}(U)} \le C_0 |v|_{W^{m,p}(U)}
\end{align*}
for every $v \in X_p$.
Suppose not. Then for each $j \in \mathbb{N}$ there exists $v_j \in X_p$ such that
\begin{align*}
\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,p}(U)}=1
\end{align*}
and
\begin{align*}
|v_j|_{W^{m,p}(U)} \le \frac{1}{j}.
\end{align*}
Thus $(v_j)$ is bounded in $W^{m,p}(U)$. Since $U$ is bounded and Lipschitz, the [Rellich-Kondrachov Compactness Theorem](/theorems/64) implies that, after passing to a subsequence, $v_j \to v$ in $W^{m-1,p}(U)$ for some $v \in W^{m-1,p}(U)$.
For every multi-index $\alpha$ with $|\alpha|=m$, the sequence $D^\alpha v_j$ converges to $0$ in $L^p(U)$ because $|v_j|_{W^{m,p}(U)} \to 0$. Testing against functions in $C_c^\infty(U)$, the space of smooth compactly supported test functions on $U$ whose dual is $\mathcal{D}'(U)$, shows that the distributional derivatives of order $m$ of the limit vanish: $D^\alpha v=0$ in $\mathcal{D}'(U)$ for every $|\alpha|=m$. In particular $v \in W^{m,p}(U)$, so the connected-domain polynomial characterization applies and $v$ agrees almost everywhere with a polynomial $P \in \mathcal{P}_{m-1}$.
The moment conditions pass to the limit by Hölder's inequality. Let $p'\in[1,\infty]$ be the conjugate exponent to $p$. Since $B$ has finite measure and $r_i$ is bounded on $B$, we have $r_i\in L^{p'}(B)$. Convergence in $W^{m-1,p}(U)$ implies convergence in $L^p(U)$, so
\begin{align*}
\left|\int_B (v_j(x)-v(x))r_i(x)\,d\mathcal{L}^n(x)\right| \le \|v_j-v\|_{L^p(B)}\|r_i\|_{L^{p'}(B)} \to 0.
\end{align*}
Thus
\begin{align*}
\int_B P(x)r_i(x)\,d\mathcal{L}^n(x)=0
\end{align*}
for every $1 \le i \le N$. By invertibility of the Gram matrix, $P=0$, so $v=0$. But convergence in $W^{m-1,p}(U)$ gives
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,p}(U)}=\lim_{j\to\infty}\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,p}(U)}=1,
\end{align*}
a contradiction. The desired estimate follows.
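The restriction to $X_p$ cannot be dropped. As a quick illustration, assuming the usual convention $|v|_{W^{0,p}(U)}=\|v\|_{L^p(U)}$ (which this section does not restate), take the constant function $v\equiv 1$ on $U$:
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,p}(U)} = \|1\|_{L^p(U)} = \mathcal{L}^n(U)^{1/p} > 0, \qquad |v|_{W^{m,p}(U)} = 0.
\end{align*}
No constant $C_0$ can satisfy the estimate for this $v$. This is consistent with the statement, because $v\notin X_p$: the constant $1$ is a linear combination of the $r_i$, so membership in $X_p$ would force $\int_B 1\,d\mathcal{L}^n=0$.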
[guided]
The point of the moment conditions is that they remove exactly the polynomial obstruction. We prove the estimate by contradiction. Suppose there is no constant $C_0$ such that
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,p}(U)} \le C_0 |v|_{W^{m,p}(U)}
\end{align*}
for all $v \in X_p$. Then we can find $v_j \in X_p$ with the lower-order part normalised to one:
\begin{align*}
\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,p}(U)}=1,
\end{align*}
while the top-order part tends to zero:
\begin{align*}
|v_j|_{W^{m,p}(U)} \le \frac{1}{j}.
\end{align*}
Together these two formulas say that $(v_j)$ is bounded in $W^{m,p}(U)$.
Now we use compactness. The [Rellich-Kondrachov Compactness Theorem](/theorems/64) applies because $U$ is a bounded Lipschitz domain and $1 \le p < \infty$. It gives a subsequence, still denoted $(v_j)$, and a function $v \in W^{m-1,p}(U)$ such that
\begin{align*}
v_j \to v \quad \text{in } W^{m-1,p}(U).
\end{align*}
This strong convergence is exactly why the contradiction argument is set up with lower-order seminorms on the left: compactness is available below the top derivative order.
In what follows, $C_c^\infty(U)$ denotes the space of smooth compactly supported test functions on $U$, and $\mathcal{D}'(U)$ denotes its distributional dual.
For every multi-index $\alpha$ with $|\alpha|=m$, the inequality $|v_j|_{W^{m,p}(U)} \le 1/j$ implies
\begin{align*}
\|D^\alpha v_j\|_{L^p(U)} \to 0.
\end{align*}
Hence the order-$m$ distributional derivatives of the limit vanish. More explicitly, for each [test function](/page/Test%20Function) $\varphi \in C_c^\infty(U)$,
\begin{align*}
D^\alpha v(\varphi)=\lim_{j\to\infty} D^\alpha v_j(\varphi)=0.
\end{align*}
Therefore $D^\alpha v=0$ in $\mathcal{D}'(U)$ for all $|\alpha|=m$. The connected-domain polynomial characterization proved above then gives a polynomial $P \in \mathcal{P}_{m-1}$ such that $v=P$ $\mathcal{L}^n$-almost everywhere on $U$.
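For completeness, the limit in the last display can be unpacked as follows. This is a sketch assuming the standard definition of the distributional derivative, $D^\alpha v(\varphi)=(-1)^{|\alpha|}\int_U v\,D^\alpha\varphi\,d\mathcal{L}^n$, and writing $p'$ for the conjugate exponent of $p$:
\begin{align*}
D^\alpha v(\varphi) &= (-1)^{|\alpha|}\int_U v\,D^\alpha\varphi\,d\mathcal{L}^n = \lim_{j\to\infty}(-1)^{|\alpha|}\int_U v_j\,D^\alpha\varphi\,d\mathcal{L}^n = \lim_{j\to\infty}\int_U (D^\alpha v_j)\,\varphi\,d\mathcal{L}^n, \\
\left|\int_U (D^\alpha v_j)\,\varphi\,d\mathcal{L}^n\right| &\le \|D^\alpha v_j\|_{L^p(U)}\|\varphi\|_{L^{p'}(U)} \to 0.
\end{align*}
The first limit uses only $v_j\to v$ in $L^p(U)$, and the second line is Hölder's inequality. In particular $v$ belongs to $W^{m,p}(U)$ with vanishing order-$m$ derivatives, which is the hypothesis of the polynomial characterization.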
It remains to use the moment conditions. Let $p'\in[1,\infty]$ be the conjugate exponent to $p$. Since $B$ has finite measure and each $r_i$ is bounded on $B$, we have $r_i\in L^{p'}(B)$. Hölder's inequality gives
\begin{align*}
\left|\int_B (v_j(x)-v(x))r_i(x)\,d\mathcal{L}^n(x)\right| \le \|v_j-v\|_{L^p(B)}\|r_i\|_{L^{p'}(B)}.
\end{align*}
The right-hand side tends to $0$ because $v_j\to v$ in $L^p(U)$. Passing to the limit in
\begin{align*}
\int_B v_j(x)r_i(x)\,d\mathcal{L}^n(x)=0
\end{align*}
therefore gives
\begin{align*}
\int_B P(x)r_i(x)\,d\mathcal{L}^n(x)=0
\end{align*}
for every $1 \le i \le N$. Writing $P=\sum_{i=1}^N c_i r_i$, this says $Gc=0$. Since the Gram matrix $G$ is invertible, $c=0$, so $P=0$ and $v=0$.
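The step from the vanishing moments to $Gc=0$ is the following one-line computation, included as a sketch; it uses only the definition of $G$, with the summation index renamed to $j$ to keep it distinct from the free index $i$:
\begin{align*}
0 = \int_B P(x)r_i(x)\,d\mathcal{L}^n(x) = \sum_{j=1}^N c_j\int_B r_j(x)r_i(x)\,d\mathcal{L}^n(x) = \sum_{j=1}^N G_{ij}c_j = (Gc)_i, \qquad 1\le i\le N.
\end{align*}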
But the strong convergence in $W^{m-1,p}(U)$ also gives convergence of every lower-order seminorm. Therefore
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,p}(U)} = \lim_{j\to\infty}\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,p}(U)} = 1.
\end{align*}
This contradicts $v=0$. Hence the lower-order estimate must hold on $X_p$.
[/guided]
[/step]
[step:Prove the lower-order estimate when $p=\infty$]
For $p=\infty$, the same estimate holds with a constant $C_0=C_0(U,m,n)>0$:
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,\infty}(U)} \le C_0 |v|_{W^{m,\infty}(U)}
\end{align*}
for all $v \in X_\infty$.
Indeed, if this failed, there would exist $v_j \in X_\infty$ such that
\begin{align*}
\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,\infty}(U)}=1
\end{align*}
and
\begin{align*}
|v_j|_{W^{m,\infty}(U)} \le \frac{1}{j}.
\end{align*}
Thus $(v_j)$ is bounded in $W^{m,\infty}(U)$. Since $U$ is a bounded Lipschitz domain, the Sobolev [extension theorem](/theorems/59) for Lipschitz domains gives a bounded linear extension operator $E: W^{m,\infty}(U) \to W^{m,\infty}(\mathbb{R}^n)$. Choose $R>0$ so large that $\overline{U}\subset B(0,R)$, and set $V:=B(0,R)$. For every multi-index $\beta$ with $|\beta|\le m-1$, the functions $D^\beta E v_j$ are uniformly bounded on $V$, and their weak first derivatives are uniformly bounded in $L^\infty(V)$ because those first derivatives are derivatives of $E v_j$ of order at most $m$. Since $V$ is convex, each $D^\beta E v_j$ has a representative that is globally Lipschitz on $\overline{V}$, with Lipschitz constant controlled by its $W^{1,\infty}(V)$ norm. For each fixed $\beta$, the family of these representatives is therefore uniformly bounded and equicontinuous on the compact set $\overline{V}$. Applying the Arzelà-Ascoli compactness criterion for uniformly bounded equicontinuous families on compact metric spaces successively to the finitely many multi-indices $|\beta|\le m-1$, we obtain a subsequence and functions $w_\beta \in C(\overline{V})$ such that $D^\beta E v_j \to w_\beta$ uniformly on $\overline{V}$ for every $|\beta|\le m-1$.
Let $v := w_0|_U$. Since $E v_j = v_j$ on $U$, uniform convergence of $D^\beta E v_j$ and [integration by parts](/theorems/210) against test functions in $C_c^\infty(U)$ show that $w_\beta|_U = D^\beta v$ in $\mathcal{D}'(U)$ for every $|\beta|\le m-1$. Uniform convergence on $\overline V$ also gives convergence of the restrictions in $L^\infty(U)$ for every $|\beta|\le m-1$. Therefore $v \in W^{m-1,\infty}(U)$ and $v_j \to v$ in $W^{m-1,\infty}(U)$.
Since the order-$m$ derivatives satisfy $\|D^\alpha v_j\|_{L^\infty(U)} \to 0$ for every $|\alpha|=m$, the limit satisfies $D^\alpha v=0$ in $\mathcal{D}'(U)$ for every $|\alpha|=m$. Hence $v=P$ almost everywhere on $U$ for some $P \in \mathcal{P}_{m-1}$. The uniform convergence of $v_j$ to $v$ on $B$ passes the moment conditions to the limit, so $P$ is orthogonal to every $r_i$ in $L^2(B,\mathcal{L}^n)$. The Gram matrix is invertible, hence $P=0$ and $v=0$. This contradicts
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,\infty}(U)} = \lim_{j\to\infty}\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,\infty}(U)} = 1.
\end{align*}
Thus the estimate holds for $p=\infty$.
[guided]
The endpoint $p=\infty$ needs a separate compactness argument because Rellich-Kondrachov in the form used above was invoked only for $1\le p<\infty$. Suppose the estimate fails. Then there are functions $v_j\in X_\infty$ such that
\begin{align*}
\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,\infty}(U)}=1
\end{align*}
and
\begin{align*}
|v_j|_{W^{m,\infty}(U)}\le \frac{1}{j}.
\end{align*}
These two bounds imply that $(v_j)$ is bounded in $W^{m,\infty}(U)$.
Because $U$ is a bounded Lipschitz domain, use the bounded extension property of Lipschitz domains to choose a [bounded linear operator](/page/Bounded%20Linear%20Operator)
\begin{align*}
E: W^{m,\infty}(U)\to W^{m,\infty}(\mathbb{R}^n).
\end{align*}
Choose $R>0$ with $\overline U\subset B(0,R)$ and set $V:=B(0,R)$. This choice matters: $V$ is convex, so a $W^{1,\infty}(V)$ function has a representative that is globally Lipschitz on $\overline V$. For each multi-index $\beta$ with $|\beta|\le m-1$, the functions $D^\beta E v_j$ are uniformly bounded in $W^{1,\infty}(V)$, because their first weak derivatives are derivatives of $E v_j$ of order at most $m$. Hence the representatives of $D^\beta E v_j$ are uniformly bounded and equicontinuous on the compact set $\overline V$.
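The role of convexity can be seen in the following estimate, sketched here for a smooth function $w$ on $\overline V$; the general $W^{1,\infty}(V)$ case follows by mollification, which is not carried out here. For $x,y\in\overline V$ the segment from $y$ to $x$ stays in $\overline V$, so
\begin{align*}
|w(x)-w(y)| = \left|\int_0^1 \nabla w\big(y+t(x-y)\big)\cdot(x-y)\,dt\right| \le \|\nabla w\|_{L^\infty(V)}\,|x-y|.
\end{align*}
Applied to $w=D^\beta E v_j$, and writing $M:=\sup_j\|v_j\|_{W^{m,\infty}(U)}$ and $\|E\|$ for the operator norm of $E$ (both symbols are introduced here), this gives
\begin{align*}
|D^\beta E v_j(x)-D^\beta E v_j(y)| \le c_n\,\|E\|\,M\,|x-y|, \qquad x,y\in\overline V,
\end{align*}
where $c_n$ is a dimensional constant that depends on the norm convention chosen for $W^{m,\infty}$. The right-hand side is independent of $j$, which is the equicontinuity claimed above.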
The Arzelà-Ascoli compactness criterion for uniformly bounded equicontinuous families on compact metric spaces gives a uniformly convergent subsequence for each fixed $\beta$. Since there are only finitely many multi-indices with $|\beta|\le m-1$, extracting successively gives one subsequence, still denoted $(v_j)$, and functions $w_\beta\in C(\overline V)$ such that
\begin{align*}
D^\beta E v_j\to w_\beta \quad \text{uniformly on } \overline V
\end{align*}
for every $|\beta|\le m-1$.
Let $v:=w_0|_U$. We now identify the other limits as weak derivatives of $v$. For every test function $\varphi\in C_c^\infty(U)$ and every $|\beta|\le m-1$, the [integration by parts](/theorems/2098) identity that defines the weak derivative $D^\beta v_j$ passes to the uniform limit, using $E v_j=v_j$ on $U$. Uniform convergence on $\overline V$ also gives convergence in $L^\infty(U)$ after restricting the chosen representatives to $U$. Therefore $w_\beta|_U=D^\beta v$ in $\mathcal D'(U)$ for every $|\beta|\le m-1$, so $v\in W^{m-1,\infty}(U)$ and $v_j\to v$ in $W^{m-1,\infty}(U)$.
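The identification of $w_\beta$ can be written out as follows. This is a sketch assuming the standard integration-by-parts definition of weak derivatives and the extension property $E v_j=v_j$ almost everywhere on $U$. For $\varphi\in C_c^\infty(U)$ and $|\beta|\le m-1$,
\begin{align*}
\int_U w_\beta\,\varphi\,d\mathcal{L}^n = \lim_{j\to\infty}\int_U (D^\beta E v_j)\,\varphi\,d\mathcal{L}^n = \lim_{j\to\infty}(-1)^{|\beta|}\int_U v_j\,D^\beta\varphi\,d\mathcal{L}^n = (-1)^{|\beta|}\int_U v\,D^\beta\varphi\,d\mathcal{L}^n.
\end{align*}
The first and last equalities use uniform convergence on $\overline V$ (for the multi-index $\beta$ and for the multi-index $0$, respectively) together with $\varphi, D^\beta\varphi\in L^1(U)$; the middle equality is the definition of $D^\beta v_j$.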
For every multi-index $\alpha$ with $|\alpha|=m$, the bound $|v_j|_{W^{m,\infty}(U)}\le 1/j$ implies
\begin{align*}
\|D^\alpha v_j\|_{L^\infty(U)}\to 0.
\end{align*}
Passing to distributions gives $D^\alpha v=0$ in $\mathcal D'(U)$ for every $|\alpha|=m$. Since $U$ is connected, the connected-domain polynomial characterization proved above gives a polynomial $P\in\mathcal P_{m-1}$ such that $v=P$ $\mathcal L^n$-a.e. on $U$.
The uniform convergence $v_j\to v$ on $B$ passes the moments to the limit:
\begin{align*}
\int_B P(x)r_i(x)\,d\mathcal L^n(x)=0
\end{align*}
for every $1\le i\le N$. Writing $P=\sum_{i=1}^N c_i r_i$, this is the linear system $Gc=0$. Since the Gram matrix $G$ is invertible, $c=0$, hence $P=0$ and $v=0$.
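The passage of the moments to the limit is simpler here than for finite $p$. As a sketch, using only that $B$ has finite measure and that each $r_i$ is bounded on $B$:
\begin{align*}
\left|\int_B (v_j(x)-v(x))r_i(x)\,d\mathcal L^n(x)\right| \le \|v_j-v\|_{L^\infty(B)}\,\|r_i\|_{L^1(B)} \to 0,
\end{align*}
so $\int_B v\,r_i\,d\mathcal L^n=\lim_{j\to\infty}\int_B v_j\,r_i\,d\mathcal L^n=0$, and replacing $v$ by $P$ is justified because $v=P$ $\mathcal L^n$-a.e. on $U$.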
Finally, convergence in $W^{m-1,\infty}(U)$ gives convergence of the lower-order seminorms, so
\begin{align*}
\sum_{\ell=0}^{m-1} |v|_{W^{\ell,\infty}(U)}=\lim_{j\to\infty}\sum_{\ell=0}^{m-1} |v_j|_{W^{\ell,\infty}(U)}=1.
\end{align*}
This contradicts $v=0$. Therefore the endpoint estimate holds.
[/guided]
[/step]
[step:Apply the complementary estimate to the projection error]
Let $u \in W^{m,p}(U)$ and set
\begin{align*}
q := \Pi u \in \mathcal{P}_{m-1}.
\end{align*}
Then $v:=u-q$ belongs to $X_p$. For every integer $k$ with $0 \le k \le m-1$, the complementary estimate gives
\begin{align*}
|u-q|_{W^{k,p}(U)} \le \sum_{\ell=0}^{m-1}|u-q|_{W^{\ell,p}(U)} \le C_0 |u-q|_{W^{m,p}(U)}.
\end{align*}
Since $q$ has degree at most $m-1$, every [weak derivative](/page/Weak%20Derivative) $D^\alpha q$ with $|\alpha|=m$ is zero. Therefore
\begin{align*}
|u-q|_{W^{m,p}(U)} = |u|_{W^{m,p}(U)}.
\end{align*}
Combining the two displays gives
\begin{align*}
|u-q|_{W^{k,p}(U)} \le C_0 |u|_{W^{m,p}(U)}
\end{align*}
for every $0 \le k \le m-1$, while the case $k=m$ holds with constant $1$. Taking
\begin{align*}
C := \max\{C_0,1\}
\end{align*}
proves the asserted estimate for every $0 \le k \le m$.
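As an illustration of the final estimate, consider the lowest-order case $m=1$, $k=0$. This is a sketch assuming the usual convention $|v|_{W^{0,p}(U)}=\|v\|_{L^p(U)}$ and taking the basis $r_1=1$ of $\mathcal{P}_0$, so that $q=\Pi u$ is the average of $u$ over $B$:
\begin{align*}
\left\|u-\frac{1}{\mathcal{L}^n(B)}\int_B u(y)\,d\mathcal{L}^n(y)\right\|_{L^p(U)} \le C\,|u|_{W^{1,p}(U)}.
\end{align*}
This is a Poincaré-type inequality in which the mean is taken over the ball $B$ rather than over all of $U$.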
[/step]