[proofplan]
We prove two claims: that $(k_i^A)$ satisfies the given system, and that it is the minimal non-negative solution. For the first claim, the boundary condition $k_i^A = 0$ for $i \in A$ is immediate, and for $i \notin A$ we condition on $X_1$ and use the Markov property to express $k_i^A$ as $1 + \sum_j p_{i,j}\, k_j^A$. For minimality, we show by induction that any non-negative solution $(y_i)$ satisfies $y_i \geq \sum_{m=1}^{n} \mathbb{P}_i(H^A \geq m)$ for every $n$, and pass to $n \to \infty$ using the tail-sum formula for expectation.
[/proofplan]
[step:Verify $k_i^A = 0$ for $i \in A$]
If $i \in A$, then $X_0 = i \in A$, so $H^A = 0$ and
\begin{align*}
k_i^A = \mathbb{E}_i[H^A] = 0.
\end{align*}
[/step]
[step:Condition on $X_1$ to derive the recurrence for $i \notin A$]
Suppose $i \notin A$. Since $X_0 = i \notin A$, we have $H^A \geq 1$: the chain must take at least one step before entering $A$. Conditioning on the first step $X_1 = j$ and applying the Markov property:
\begin{align*}
k_i^A = \mathbb{E}_i[H^A] = \sum_{j \in S} \mathbb{E}_i[H^A \mid X_1 = j]\, p_{i,j}.
\end{align*}
Given $X_1 = j$, the Markov property implies that the process $(X_1, X_2, \ldots)$ is a Markov chain started at $j$ with the same transition matrix $P$. Since $i \notin A$, the hitting time $H^A$ can be decomposed as $H^A = 1 + H^A \circ \theta_1$, where $\theta_1$ denotes the time-shift operator and $H^A \circ \theta_1$ is the hitting time of $A$ for the shifted chain starting at $X_1$. By the Markov property, $\mathbb{E}_i[H^A \circ \theta_1 \mid X_1 = j] = \mathbb{E}_j[H^A] = k_j^A$. Therefore:
\begin{align*}
\mathbb{E}_i[H^A \mid X_1 = j] = 1 + k_j^A.
\end{align*}
Substituting:
\begin{align*}
k_i^A = \sum_{j \in S} p_{i,j}\,(1 + k_j^A) = \sum_{j \in S} p_{i,j} + \sum_{j \in S} p_{i,j}\, k_j^A = 1 + \sum_{j \in S} p_{i,j}\, k_j^A,
\end{align*}
where we used $\sum_{j \in S} p_{i,j} = 1$ (the rows of a stochastic matrix sum to one). This confirms that $(k_i^A)$ satisfies the stated system.
[guided]
The "$1 +$" term in the recurrence has a direct interpretation: one unit of time elapses when the chain moves from state $i$ to state $X_1 = j$, and then the remaining expected time to reach $A$ from $j$ is $k_j^A$. This is the essence of first-step analysis for expectations.
More precisely, since $i \notin A$, we have $H^A \geq 1$. Decompose $H^A = 1 + (\text{remaining time after step 1})$. The remaining time, viewed from $X_1 = j$, is exactly the hitting time of $A$ for a chain started at $j$. By the Markov property (the future $(X_1, X_2, \ldots)$ given $X_1 = j$ has the same law as a fresh chain from $j$), this remaining time has expectation $k_j^A$. Therefore:
\begin{align*}
k_i^A = \mathbb{E}_i[H^A] &= \sum_{j \in S} p_{i,j}\, \mathbb{E}_i[H^A \mid X_1 = j] \\
&= \sum_{j \in S} p_{i,j}\,(1 + k_j^A) = 1 + \sum_{j \in S} p_{i,j}\, k_j^A,
\end{align*}
where the last equality uses $\sum_j p_{i,j} = 1$.
Note the structural parallel with the [Hitting Probability System](/theorems/2217): for hitting probabilities, the recurrence is $h_i^A = \sum_j p_{i,j}\, h_j^A$; for mean hitting times, it is $k_i^A = 1 + \sum_j p_{i,j}\, k_j^A$. The extra "$1$" accounts for the cost of the first step, which was free (probability $1$) in the hitting-probability problem but costs one unit of time in the mean-hitting-time problem.
[/guided]
[/step]
[step:Prove minimality by induction using the tail-sum formula for expectation]
Let $(y_i : i \in S)$ be any non-negative solution to the system. We must show $y_i \geq k_i^A$ for all $i$.
For $i \in A$, $y_i = 0 = k_i^A$.
For $i \notin A$, we prove by induction on $n \geq 1$ that
\begin{align*}
y_i \geq \sum_{m=1}^{n} \mathbb{P}_i(H^A \geq m).
\end{align*}
**Base case** ($n = 1$): Since $i \notin A$, the system gives $y_i = 1 + \sum_{j \in S} p_{i,j}\, y_j \geq 1$, using $y_j \geq 0$. Since $i \notin A$ implies $H^A \geq 1$ with probability one under $\mathbb{P}_i$, we have $\mathbb{P}_i(H^A \geq 1) = 1$, so $y_i \geq 1 = \mathbb{P}_i(H^A \geq 1)$.
**Inductive step**: Suppose $y_j \geq \sum_{m=1}^{n} \mathbb{P}_j(H^A \geq m)$ for all $j \notin A$. Using the recurrence and splitting the sum:
\begin{align*}
y_i &= 1 + \sum_{j \in A} p_{i,j}\, y_j + \sum_{j \notin A} p_{i,j}\, y_j \\
&= 1 + \sum_{j \notin A} p_{i,j}\, y_j,
\end{align*}
since $y_j = 0$ for $j \in A$. By the inductive hypothesis applied to $j \notin A$:
\begin{align*}
y_i &\geq 1 + \sum_{j \notin A} p_{i,j} \sum_{m=1}^{n} \mathbb{P}_j(H^A \geq m) = 1 + \sum_{m=1}^{n} \sum_{j \notin A} p_{i,j}\, \mathbb{P}_j(H^A \geq m).
\end{align*}
For each $m \geq 1$, the inner sum satisfies $\sum_{j \notin A} p_{i,j}\, \mathbb{P}_j(H^A \geq m) = \mathbb{P}_i(X_1 \notin A,\, H^A \circ \theta_1 \geq m) = \mathbb{P}_i(H^A \geq m + 1)$, where we used the Markov property and the fact that $\{H^A \geq m+1\}$ requires both $X_1 \notin A$ and at least $m$ more steps to reach $A$ from $X_1$. Therefore:
\begin{align*}
y_i \geq 1 + \sum_{m=1}^{n} \mathbb{P}_i(H^A \geq m + 1) = \mathbb{P}_i(H^A \geq 1) + \sum_{m=2}^{n+1} \mathbb{P}_i(H^A \geq m) = \sum_{m=1}^{n+1} \mathbb{P}_i(H^A \geq m),
\end{align*}
where we used $\mathbb{P}_i(H^A \geq 1) = 1$ since $i \notin A$.
**Conclusion**: Taking $n \to \infty$ and applying the monotone convergence theorem (all terms are non-negative):
\begin{align*}
y_i \geq \sum_{m=1}^{\infty} \mathbb{P}_i(H^A \geq m).
\end{align*}
By the tail-sum formula for the expectation of a non-negative integer-valued random variable (allowing $+\infty$):
\begin{align*}
\sum_{m=1}^{\infty} \mathbb{P}_i(H^A \geq m) = \mathbb{E}_i[H^A] = k_i^A.
\end{align*}
Hence $y_i \geq k_i^A$ for all $i \in S$.
[guided]
The minimality proof is structurally parallel to the one for the [Hitting Probability System](/theorems/2217), but the quantity being bounded from below is different: there we bounded $\mathbb{P}_i(H^A \leq n)$; here we bound $\sum_{m=1}^{n} \mathbb{P}_i(H^A \geq m)$.
The connection between the tail sum and the expectation is the identity
\begin{align*}
\mathbb{E}_i[H^A] = \sum_{m=1}^{\infty} \mathbb{P}_i(H^A \geq m),
\end{align*}
valid for any non-negative integer-valued random variable (possibly $+\infty$). This is the discrete analogue of the layer-cake formula: the expectation of a non-negative random variable $Z$ equals $\sum_{m=1}^{\infty} \mathbb{P}(Z \geq m)$.
The induction works as follows. At each step, we expand the recurrence $y_i = 1 + \sum_{j \notin A} p_{i,j}\, y_j$ and use $y_j \geq 0$ to discard nothing — we simply apply the inductive hypothesis to each $y_j$. The "$1$" at each level of expansion accumulates: after $n$ expansions, we have collected $n$ tail probabilities $\mathbb{P}_i(H^A \geq 1), \ldots, \mathbb{P}_i(H^A \geq n)$.
The key point at each step of the induction is the identity
\begin{align*}
\sum_{j \notin A} p_{i,j}\, \mathbb{P}_j(H^A \geq m) = \mathbb{P}_i(H^A \geq m + 1),
\end{align*}
which follows from the Markov property: starting from $i \notin A$, the event $\{H^A \geq m+1\}$ means the first step goes to some $j \notin A$ (otherwise $H^A = 1 \leq m$), and then from $j$ the chain needs at least $m$ more steps to reach $A$.
Sending $n \to \infty$, the monotone convergence theorem (each partial sum is non-negative and increases in $n$) converts the finite-$n$ bound into $y_i \geq \mathbb{E}_i[H^A] = k_i^A$. As with the hitting probability, the non-negativity of $(y_i)$ is essential for the lower bound at each iteration step.
[/guided]
[/step]