[proofplan]
The pullback is defined pointwise through the total derivative $Df_x$, so we verify the identity at an arbitrary point on an arbitrary $k$-tuple of tangent vectors. The argument has three ingredients. First, $f^* dy_i = df_i$: this is because $dy_i$ extracts the $i$-th component of a vector in $\mathbb{R}^n$, and the $i$-th component of $Df_x v$ is precisely $(df_i)_x(v)$. Second, the determinant convention for the wedge product converts the pullback of a decomposable $k$-form $dy_{i_1} \wedge \cdots \wedge dy_{i_k}$ into the wedge of the pulled-back factors $df_{i_1} \wedge \cdots \wedge df_{i_k}$. Third, coefficient functions transform by composition $a_I \mapsto a_I \circ f$, and the linearity of multilinear evaluation in $\omega$ extends the identity from a single decomposable summand to the full sum.
[/proofplan]
[step:Unpack the pointwise definition of the pullback]
A smooth $k$-form on $V$ is a smooth map
\begin{align*}
\omega: V &\to \Lambda^k(\mathbb{R}^n)^* \\
y &\mapsto \omega_y,
\end{align*}
where $\omega_y$ is an alternating $k$-multilinear form on $\mathbb{R}^n$. For the smooth map $f: U \to V$, the pullback $f^*\omega \in \Omega^k(U)$ is defined pointwise by
\begin{align*}
(f^*\omega)_x(v_1, \dots, v_k) := \omega_{f(x)}(Df_x v_1, \dots, Df_x v_k),
\end{align*}
for $x \in U$ and $v_1, \dots, v_k \in \mathbb{R}^m$. Here $Df_x: \mathbb{R}^m \to \mathbb{R}^n$ is the total derivative of $f$ at $x$.
Both sides of the claimed identity are smooth $k$-forms on $U$, so it suffices to prove equality at an arbitrary $x \in U$ evaluated on an arbitrary tuple $(v_1, \dots, v_k) \in (\mathbb{R}^m)^k$. We fix such $x$ and $v_1, \dots, v_k$ for the remainder of the proof.
[/step]
[step:Identify the pullback of each coordinate 1-form as $f^* dy_i = df_i$]
The coordinate $1$-form $dy_i$ on $V$ is the constant covector field
\begin{align*}
dy_i: V &\to (\mathbb{R}^n)^* \\
y &\mapsto \bigl(w \mapsto w_i\bigr),
\end{align*}
i.e., $(dy_i)_y(w) = w_i$ for every $w \in \mathbb{R}^n$, independently of $y$. For any $v \in \mathbb{R}^m$, the definition of the pullback gives
\begin{align*}
(f^* dy_i)_x(v) = (dy_i)_{f(x)}(Df_x v) = (Df_x v)_i.
\end{align*}
Since $f = (f_1, \dots, f_n)$ with each $f_i: U \to \mathbb{R}$ smooth, the total derivative decomposes componentwise as
\begin{align*}
Df_x v = (Df_{1,x}(v), \dots, Df_{n,x}(v)) \in \mathbb{R}^n,
\end{align*}
so that $(Df_x v)_i = Df_{i,x}(v)$. By definition of the differential of a real-valued function, $(df_i)_x(v) = Df_{i,x}(v)$. Combining,
\begin{align*}
(f^* dy_i)_x(v) = (df_i)_x(v),
\end{align*}
and since $x \in U$ and $v \in \mathbb{R}^m$ were arbitrary, $f^* dy_i = df_i$.
[guided]
The pullback of a $1$-form is, by definition, the linear functional obtained by precomposing with the differential. So $(f^* dy_i)_x$ is the linear functional on $\mathbb{R}^m$ given by
\begin{align*}
v \longmapsto (dy_i)_{f(x)}(Df_x v).
\end{align*}
What does this functional do concretely? The covector $dy_i$ extracts the $i$-th coordinate of a vector in $\mathbb{R}^n$. So the value is the $i$-th coordinate of $Df_x v \in \mathbb{R}^n$.
Now, what is $Df_x v$? Since $f$ is the tuple $(f_1, \dots, f_n)$, applying the total derivative componentwise gives
\begin{align*}
Df_x v = (Df_{1,x}(v), \dots, Df_{n,x}(v)),
\end{align*}
i.e., the $j$-th component of $Df_x v$ is the directional derivative of $f_j$ at $x$ in direction $v$. In particular, its $i$-th component is $Df_{i,x}(v)$. By the very definition of the differential of a real-valued function, this is $(df_i)_x(v)$:
\begin{align*}
(f^* dy_i)_x(v) = (Df_x v)_i = Df_{i,x}(v) = (df_i)_x(v).
\end{align*}
The conceptual content of this step is that the pullback operation replaces the $j$-th target coordinate function $y_j$ with the $j$-th component function $f_j$, and correspondingly replaces $dy_j$ with $df_j$. Everything else in the proof is algebraic compatibility: pullback commutes with wedge products and with multiplication by functions, so once $dy_i \mapsto df_i$ is established the formula propagates to arbitrary $k$-forms.
[/guided]
[/step]
[step:Evaluate the pullback of an elementary wedge via the determinant identity]
For a strictly increasing multi-index $I = (i_1, \dots, i_k)$ with $1 \le i_1 < \cdots < i_k \le n$, define the elementary wedge products
\begin{align*}
\eta_I &:= dy_{i_1} \wedge \cdots \wedge dy_{i_k} \in \Omega^k(V), &
\zeta_I &:= df_{i_1} \wedge \cdots \wedge df_{i_k} \in \Omega^k(U).
\end{align*}
We adopt the **determinant convention** for the wedge product of covectors: for any covectors $\alpha_1, \dots, \alpha_k \in W^*$ and vectors $w_1, \dots, w_k \in W$ (on any finite-dimensional [vector space](/page/Vector%20Space) $W$),
\begin{align*}
(\alpha_1 \wedge \cdots \wedge \alpha_k)(w_1, \dots, w_k) = \det\bigl[\alpha_p(w_q)\bigr]_{p,q=1}^k.
\end{align*}
Applying the pullback definition to $\eta_I$ and then the determinant identity:
\begin{align*}
(f^* \eta_I)_x(v_1, \dots, v_k)
&= (\eta_I)_{f(x)}(Df_x v_1, \dots, Df_x v_k) \\
&= \det\bigl[(dy_{i_p})_{f(x)}(Df_x v_q)\bigr]_{p,q=1}^k.
\end{align*}
Each matrix entry is computed using the identity from Step 2:
\begin{align*}
(dy_{i_p})_{f(x)}(Df_x v_q) = (Df_x v_q)_{i_p} = (df_{i_p})_x(v_q).
\end{align*}
Substituting back and applying the determinant identity in reverse, this time to the covectors $(df_{i_1})_x, \dots, (df_{i_k})_x$ on $\mathbb{R}^m$:
\begin{align*}
(f^* \eta_I)_x(v_1, \dots, v_k) = \det\bigl[(df_{i_p})_x(v_q)\bigr]_{p,q=1}^k = (\zeta_I)_x(v_1, \dots, v_k).
\end{align*}
Since $x \in U$ and $(v_1, \dots, v_k) \in (\mathbb{R}^m)^k$ were arbitrary, $f^* \eta_I = \zeta_I$.
[guided]
The wedge product of covectors is alternating and multilinear, and the **determinant convention** packages these two properties into a single closed-form evaluation rule: for covectors $\alpha_1, \dots, \alpha_k \in W^*$ and vectors $w_1, \dots, w_k \in W$,
\begin{align*}
(\alpha_1 \wedge \cdots \wedge \alpha_k)(w_1, \dots, w_k) = \det\bigl[\alpha_p(w_q)\bigr]_{p,q=1}^k.
\end{align*}
(The alternative convention divides by $k!$; the formula in this theorem is the same under either convention, since both sides of the equality use the same convention.)
The strategy: we pull back the elementary wedge $\eta_I = dy_{i_1} \wedge \cdots \wedge dy_{i_k}$, evaluate it via the determinant convention, and then observe that each matrix entry is one of the scalars $(df_{i_p})_x(v_q)$ produced in Step 2. Applying the determinant convention in the reverse direction repackages the matrix into a wedge of the differentials $df_{i_p}$.
Concretely, from the pullback definition,
\begin{align*}
(f^* \eta_I)_x(v_1, \dots, v_k) = (\eta_I)_{f(x)}\bigl(Df_x v_1, \dots, Df_x v_k\bigr).
\end{align*}
Since $\eta_I$ is the wedge of the covectors $(dy_{i_1})_{f(x)}, \dots, (dy_{i_k})_{f(x)}$ on $\mathbb{R}^n$, the determinant convention gives
\begin{align*}
(f^* \eta_I)_x(v_1, \dots, v_k) = \det\bigl[(dy_{i_p})_{f(x)}(Df_x v_q)\bigr]_{p,q}.
\end{align*}
By Step 2, the entry $(dy_{i_p})_{f(x)}(Df_x v_q)$ equals $(df_{i_p})_x(v_q)$. Therefore the matrix has entries $\bigl[(df_{i_p})_x(v_q)\bigr]_{p,q}$, and applying the determinant convention again — this time on the [vector space](/page/Vector%20Space) $\mathbb{R}^m$ with the covectors $(df_{i_1})_x, \dots, (df_{i_k})_x$ — recognises the determinant as the value of $\zeta_I = df_{i_1} \wedge \cdots \wedge df_{i_k}$ at $x$ on $(v_1, \dots, v_k)$:
\begin{align*}
\det\bigl[(df_{i_p})_x(v_q)\bigr]_{p,q} = (\zeta_I)_x(v_1, \dots, v_k).
\end{align*}
This is the moment where the determinant convention earns its keep: pullback acts on the wedge by acting on each factor, because evaluating both sides reduces to the same determinant.
[/guided]
[/step]
[step:Handle coefficient functions and conclude by linearity]
For a smooth coefficient function $a_I \in C^\infty(V)$, the product $a_I \eta_I \in \Omega^k(V)$ is the $k$-form $y \mapsto a_I(y)\, (\eta_I)_y$. Applying the pullback definition:
\begin{align*}
(f^*(a_I \eta_I))_x(v_1, \dots, v_k)
&= (a_I \eta_I)_{f(x)}(Df_x v_1, \dots, Df_x v_k) \\
&= a_I(f(x)) \cdot (\eta_I)_{f(x)}(Df_x v_1, \dots, Df_x v_k) \\
&= (a_I \circ f)(x) \cdot (f^* \eta_I)_x(v_1, \dots, v_k) \\
&= (a_I \circ f)(x) \cdot (\zeta_I)_x(v_1, \dots, v_k),
\end{align*}
where the second equality uses that $\omega_y \mapsto a_I(y)\, \omega_y$ scales the multilinear form pointwise, the third re-applies the pullback definition to $\eta_I$, and the fourth uses Step 3. Hence
\begin{align*}
f^*(a_I \eta_I) = (a_I \circ f)\, df_{i_1} \wedge \cdots \wedge df_{i_k}.
\end{align*}
The pullback is $\mathbb{R}$-linear in $\omega$: for $\omega^{(1)}, \omega^{(2)} \in \Omega^k(V)$ and scalars $c_1, c_2 \in \mathbb{R}$, the pointwise definition together with linearity of evaluation of multilinear forms in the form itself gives
\begin{align*}
(f^*(c_1 \omega^{(1)} + c_2 \omega^{(2)}))_x(v_1, \dots, v_k)
&= (c_1 \omega^{(1)} + c_2 \omega^{(2)})_{f(x)}(Df_x v_1, \dots, Df_x v_k) \\
&= c_1 \omega^{(1)}_{f(x)}(Df_x v_1, \dots, Df_x v_k) + c_2 \omega^{(2)}_{f(x)}(Df_x v_1, \dots, Df_x v_k) \\
&= c_1 (f^* \omega^{(1)})_x(v_1, \dots, v_k) + c_2 (f^* \omega^{(2)})_x(v_1, \dots, v_k).
\end{align*}
This identity extends to any finite linear combination. Applying it to the finite sum $\omega = \sum_I a_I \eta_I$:
\begin{align*}
f^*\omega
&= \sum_{1 \le i_1 < \cdots < i_k \le n} f^*(a_{i_1\cdots i_k}\, \eta_{i_1\cdots i_k}) \\
&= \sum_{1 \le i_1 < \cdots < i_k \le n} (a_{i_1\cdots i_k} \circ f)\, df_{i_1} \wedge \cdots \wedge df_{i_k},
\end{align*}
which is the claimed identity. This completes the proof.
[/step]