[guided]To prove linear independence, we need a way to "read off" the coefficient of each $e_i \otimes f_j$ in a linear combination. The strategy is to construct, for each pair $(i_0, j_0)$, a linear functional on $V \otimes_k W$ that extracts the $(i_0, j_0)$-coefficient.
For each $1 \le i_0 \le m$, let $\varepsilon_{i_0}: V \to k$ denote the $i_0$-th coordinate functional defined by $\varepsilon_{i_0}(e_i) = \delta_{i, i_0}$ (the Kronecker delta). This is the unique element of $V^*$ satisfying $\varepsilon_{i_0}\left(\sum_i a_i e_i\right) = a_{i_0}$. Similarly, for each $1 \le j_0 \le n$, let $\varepsilon_{j_0}': W \to k$ be the $j_0$-th coordinate functional with $\varepsilon_{j_0}'(f_j) = \delta_{j, j_0}$.
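To make the coordinate functionals concrete, here is a minimal Python sketch. It assumes $V = \mathbb{Q}^m$ with its standard basis, with a vector $\sum_i a_i e_i$ stored as its coefficient list $[a_1, \dots, a_m]$. The helper names `basis_vector` and `coordinate_functional` are hypothetical, and indices are 1-based to match the text.

```python
from fractions import Fraction

# Illustrative assumption: V = Q^m with standard basis e_1, ..., e_m, and a
# vector sum_i a_i e_i is stored as its coefficient list [a_1, ..., a_m].
# Indices are 1-based to match the text.

def basis_vector(m, i):
    """Return e_i in Q^m."""
    return [Fraction(1) if k == i else Fraction(0) for k in range(1, m + 1)]

def coordinate_functional(i0):
    """Return eps_{i0}: Q^m -> Q, which reads off the coefficient a_{i0}."""
    return lambda v: v[i0 - 1]

m = 3
# eps_{i0}(e_i) = delta_{i, i0}: each printed row is a row of the identity matrix.
for i0 in range(1, m + 1):
    eps = coordinate_functional(i0)
    print([int(eps(basis_vector(m, i))) for i in range(1, m + 1)])
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]

# eps_2(2 e_1 - 5 e_2 + 7 e_3) = -5.
v = [Fraction(2), Fraction(-5), Fraction(7)]
print(coordinate_functional(2)(v))  # -5
```

The functionals $\varepsilon_{j_0}'$ on $W = \mathbb{Q}^n$ are the same construction with $n$ in place of $m$.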
Now define
\begin{align*}
\beta_{i_0, j_0}: V \times W &\to k \\
(v, w) &\mapsto \varepsilon_{i_0}(v) \cdot \varepsilon_{j_0}'(w).
\end{align*}
This map is bilinear: for fixed $w$, the map $v \mapsto \varepsilon_{i_0}(v) \cdot \varepsilon_{j_0}'(w)$ is linear because $\varepsilon_{i_0}$ is linear and $\varepsilon_{j_0}'(w)$ is a fixed scalar; by the same argument with the roles of $v$ and $w$ exchanged, it is linear in $w$ for fixed $v$.
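As a numerical illustration (not a proof, since finitely many samples cannot establish linearity), the sketch below assumes $V = \mathbb{Q}^m$ and $W = \mathbb{Q}^n$ with standard bases, builds $\beta_{i_0, j_0}$, and spot-checks linearity in each argument on random rational inputs. All function names are hypothetical. For contrast, it also runs the check on $(v, w) \mapsto \varepsilon_1(v) + \varepsilon_1'(w)$, which is not bilinear.

```python
import random
from fractions import Fraction

# Illustrative assumption: V = Q^m, W = Q^n with standard bases; vectors are
# stored as coefficient lists, and indices are 1-based as in the text.

def coordinate_functional(i0):
    """eps_{i0}: reads off the i0-th coordinate."""
    return lambda v: v[i0 - 1]

def beta(i0, j0):
    """Return beta_{i0,j0}(v, w) = eps_{i0}(v) * eps'_{j0}(w)."""
    eps, eps_prime = coordinate_functional(i0), coordinate_functional(j0)
    return lambda v, w: eps(v) * eps_prime(w)

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    return [c * a for a in v]

def random_vector(dim, rng):
    return [Fraction(rng.randint(-9, 9)) for _ in range(dim)]

def is_bilinear_on_samples(b, m, n, trials=100, seed=0):
    """Spot-check linearity of b in each argument on random rational inputs.

    A finite sample cannot prove bilinearity; it only illustrates it.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        v1, v2 = random_vector(m, rng), random_vector(m, rng)
        w1, w2 = random_vector(n, rng), random_vector(n, rng)
        c = Fraction(rng.randint(-9, 9), rng.randint(1, 9))
        # Linear in the first argument with w fixed.
        if b(add(v1, scale(c, v2)), w1) != b(v1, w1) + c * b(v2, w1):
            return False
        # Linear in the second argument with v fixed.
        if b(v1, add(w1, scale(c, w2))) != b(v1, w1) + c * b(v1, w2):
            return False
    return True

print(is_bilinear_on_samples(beta(2, 1), m=3, n=2))  # True
# A sum of coordinates is linear in (v, w) jointly, but not bilinear.
print(is_bilinear_on_samples(lambda v, w: v[0] + w[0], m=3, n=2))  # False
```

Exact `Fraction` arithmetic is used so that the equality checks are not disturbed by floating-point rounding.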
By the [universal property of the tensor product](/theorems/???), there is a unique linear map $\tilde{\beta}_{i_0, j_0}: V \otimes_k W \to k$ satisfying
\begin{align*}
\tilde{\beta}_{i_0, j_0}(v \otimes w) = \beta_{i_0, j_0}(v, w) = \varepsilon_{i_0}(v) \cdot \varepsilon_{j_0}'(w)
\end{align*}
for all $v \in V$, $w \in W$. Why does this help? Because evaluating on basis tensors gives
\begin{align*}
\tilde{\beta}_{i_0, j_0}(e_i \otimes f_j) = \varepsilon_{i_0}(e_i) \cdot \varepsilon_{j_0}'(f_j) = \delta_{i, i_0}\, \delta_{j, j_0},
\end{align*}
so, by linearity, $\tilde{\beta}_{i_0, j_0}$ picks out the coefficient of $e_{i_0} \otimes f_{j_0}$ in any linear combination of basis tensors and kills all other basis tensors.[/guided]
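The computation above can be mirrored in a small Python sketch. It assumes a concrete model that is not part of the proof: an element of $\mathbb{Q}^m \otimes \mathbb{Q}^n$ is stored as its $m \times n$ array of coefficients with respect to the tensors $e_i \otimes f_j$, and $v \otimes w$ is the outer product of $v$ and $w$. Using this model presupposes the linear independence that the argument is proving, so it only illustrates the formulas. In the model, $\tilde{\beta}_{i_0, j_0}$ is entry extraction $T \mapsto T_{i_0 j_0}$, which is linear. The function names are hypothetical.

```python
from fractions import Fraction

# Illustrative model (an assumption, not the abstract construction): an element
# of Q^m (x) Q^n is an m x n array T, where T[i-1][j-1] is the coefficient of
# e_i (x) f_j, and v (x) w is the outer product. Indices are 1-based.

def outer(v, w):
    """v (x) w as an m x n array."""
    return [[a * b for b in w] for a in v]

def basis_tensor(m, n, i, j):
    """e_i (x) f_j as an m x n array with a single 1."""
    return [[Fraction(int(row == i and col == j)) for col in range(1, n + 1)]
            for row in range(1, m + 1)]

def beta_tilde(i0, j0):
    """Entry extraction T -> T_{i0 j0}; this is linear in T."""
    return lambda T: T[i0 - 1][j0 - 1]

def combine(coeffs):
    """Build sum_{i,j} coeffs[i-1][j-1] * (e_i (x) f_j) from basis tensors."""
    m, n = len(coeffs), len(coeffs[0])
    T = [[Fraction(0)] * n for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E = basis_tensor(m, n, i, j)
            T = [[t + coeffs[i - 1][j - 1] * e for t, e in zip(t_row, e_row)]
                 for t_row, e_row in zip(T, E)]
    return T

m, n = 3, 2
i0, j0 = 2, 1
bt = beta_tilde(i0, j0)

# tilde-beta(v (x) w) = eps_{i0}(v) * eps'_{j0}(w) on a sample pair.
v, w = [Fraction(1), Fraction(4), Fraction(-2)], [Fraction(3), Fraction(5)]
print(bt(outer(v, w)) == v[i0 - 1] * w[j0 - 1])  # True

# tilde-beta(e_i (x) f_j) = delta_{i, i0} * delta_{j, j0}.
for i in range(1, m + 1):
    print([int(bt(basis_tensor(m, n, i, j))) for j in range(1, n + 1)])
# [0, 0]
# [1, 0]
# [0, 0]

# Applied to a linear combination, tilde-beta returns the (i0, j0) coefficient.
c = [[Fraction(7), Fraction(-1)],
     [Fraction(2, 3), Fraction(0)],
     [Fraction(5), Fraction(9)]]
print(bt(combine(c)))  # 2/3
```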