Prediction Error Decomposition for the Linear Gaussian State Space Likelihood

Theorem

Proof

[proofplan] The likelihood is obtained by decomposing the joint density of the observations into one-step predictive conditional densities. The Kalman filter identifies each conditional distribution $Y_t \mid (Y_1=y_1,\dots,Y_{t-1}=y_{t-1})$ as a Gaussian distribution with mean $Z_t a_{t \mid t-1}+d_t$ and covariance $F_t$. Substituting the multivariate Gaussian density into the product decomposition and then taking logarithms gives the stated prediction error likelihood. [/proofplan] [step:Write the joint likelihood as a product of one-step predictive densities] Let $p_{1:n}: (\mathbb{R}^m)^n \to [0,\infty)$ denote the joint density of $(Y_1,\dots,Y_n)$ with respect to $\mathcal{L}^{mn}$. For each $t \in \{1,\dots,n\}$, let \begin{align*} p_t(\,\cdot \mid y_1,\dots,y_{t-1}): \mathbb{R}^m \to [0,\infty) \end{align*} denote the conditional density of $Y_t$ given $Y_1=y_1,\dots,Y_{t-1}=y_{t-1}$ with respect to $\mathcal{L}^m$, where for $t=1$ this means the marginal density of $Y_1$ conditional only on the fixed initial quantities $a_1$ and $P_1$. The chain rule for conditional densities gives \begin{align*} p_{1:n}(y_1,\dots,y_n) = \prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}). \end{align*} Therefore the likelihood function, conditional on $a_1$ and $P_1$, is \begin{align*} L(y_1,\dots,y_n \mid a_1,P_1) = \prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}). \end{align*} [guided] The likelihood is a joint density evaluated at the observed data. We denote this joint density by \begin{align*} p_{1:n}: (\mathbb{R}^m)^n \to [0,\infty), \end{align*} where the reference measure is the $mn$-dimensional Lebesgue measure $\mathcal{L}^{mn}$ on $(\mathbb{R}^m)^n$. For each time $t$, we also introduce the one-step conditional density \begin{align*} p_t(\,\cdot \mid y_1,\dots,y_{t-1}): \mathbb{R}^m \to [0,\infty) \end{align*} with respect to $\mathcal{L}^m$. 
When $t=1$, there are no earlier observations, so $p_1(\,\cdot\,)$ is the marginal predictive density determined by the fixed initial quantities $a_1$ and $P_1$. The conditional density factorization states that a joint density may be built by multiplying successive conditional densities: \begin{align*} p_{1:n}(y_1,\dots,y_n) = p_1(y_1) p_2(y_2 \mid y_1) \cdots p_n(y_n \mid y_1,\dots,y_{n-1}). \end{align*} Equivalently, \begin{align*} p_{1:n}(y_1,\dots,y_n) = \prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}). \end{align*} Thus the conditional likelihood of the observed data, with $a_1$ and $P_1$ fixed, is exactly this product: \begin{align*} L(y_1,\dots,y_n \mid a_1,P_1) = \prod_{t=1}^{n} p_t(y_t \mid y_1,\dots,y_{t-1}). \end{align*} This factorization is what makes a prediction-error form possible: each factor is a one-step-ahead predictive density, and the next step shows that in the Gaussian case it depends on $y_t$ only through the one-step-ahead prediction error $v_t$ and its covariance $F_t$. [/guided] [/step] [step:Identify the one-step predictive Gaussian density from the Kalman filter] For each $t \in \{1,\dots,n\}$, the Kalman filter gives the conditional Gaussian law \begin{align*} Y_t \mid (Y_1=y_1,\dots,Y_{t-1}=y_{t-1}) \sim \mathcal{N}(Z_t a_{t \mid t-1}+d_t, F_t). \end{align*} Define the predictive mean $\mu_t \in \mathbb{R}^m$ by \begin{align*} \mu_t = Z_t a_{t \mid t-1}+d_t. \end{align*} Since $F_t$ is a covariance matrix, it is symmetric positive semidefinite; being invertible by hypothesis, it is therefore positive definite, so $\det F_t>0$. Hence the conditional density at $y_t$ is \begin{align*} p_t(y_t \mid y_1,\dots,y_{t-1}) = (2\pi)^{-m/2}(\det F_t)^{-1/2} \exp\left( -\frac{1}{2}(y_t-\mu_t)^\top F_t^{-1}(y_t-\mu_t) \right). \end{align*} By the definition of the innovation vector $v_t=y_t-\mu_t$, this becomes \begin{align*} p_t(y_t \mid y_1,\dots,y_{t-1}) = (2\pi)^{-m/2}(\det F_t)^{-1/2} \exp\left( -\frac{1}{2}v_t^\top F_t^{-1}v_t \right). \end{align*} [guided] The Kalman filter supplies the one-step predictive distribution. 
At time $t$, after processing the observations $y_1,\dots,y_{t-1}$, it produces a predicted state mean $a_{t \mid t-1} \in \mathbb{R}^r$ and predicted state covariance $P_{t \mid t-1} \in \mathbb{R}^{r \times r}$. Passing this predicted state through the linear observation equation gives the predictive observation mean \begin{align*} \mu_t = Z_t a_{t \mid t-1}+d_t \in \mathbb{R}^m. \end{align*} The corresponding predictive covariance is the innovation covariance matrix $F_t \in \mathbb{R}^{m \times m}$. Thus the conditional law is \begin{align*} Y_t \mid (Y_1=y_1,\dots,Y_{t-1}=y_{t-1}) \sim \mathcal{N}(\mu_t,F_t). \end{align*} The hypothesis that $F_t$ is invertible is exactly the non-degeneracy condition needed to write an ordinary density on $\mathbb{R}^m$. Since $F_t$ is a covariance matrix and is invertible, its determinant is positive, and the multivariate Gaussian density with mean $\mu_t$ and covariance $F_t$ is \begin{align*} p_t(y_t \mid y_1,\dots,y_{t-1}) = (2\pi)^{-m/2}(\det F_t)^{-1/2} \exp\left( -\frac{1}{2}(y_t-\mu_t)^\top F_t^{-1}(y_t-\mu_t) \right). \end{align*} The innovation vector is precisely the prediction error \begin{align*} v_t = y_t-\mu_t = y_t-Z_t a_{t \mid t-1}-d_t. \end{align*} Substituting this definition into the Gaussian density gives \begin{align*} p_t(y_t \mid y_1,\dots,y_{t-1}) = (2\pi)^{-m/2}(\det F_t)^{-1/2} \exp\left( -\frac{1}{2}v_t^\top F_t^{-1}v_t \right). \end{align*} [/guided] [/step] [step:Take logarithms of the product of predictive densities] Substituting the preceding density formula into the likelihood product gives \begin{align*} L(y_1,\dots,y_n \mid a_1,P_1) = \prod_{t=1}^{n} (2\pi)^{-m/2}(\det F_t)^{-1/2} \exp\left( -\frac{1}{2}v_t^\top F_t^{-1}v_t \right). 
\end{align*} Taking logarithms and using $\log(ab)=\log a+\log b$ for positive factors yields \begin{align*} \ell(y_1,\dots,y_n \mid a_1,P_1) &= \sum_{t=1}^{n} \left[ -\frac{m}{2}\log(2\pi) -\frac{1}{2}\log\det F_t -\frac{1}{2}v_t^\top F_t^{-1}v_t \right] \\ &= -\frac{1}{2}\sum_{t=1}^{n} \left( m\log(2\pi) + \log\det F_t + v_t^\top F_t^{-1}v_t \right). \end{align*} This is the asserted log-likelihood formula. Any dependence of $v_t$ and $F_t$ on earlier observations is already encoded in the recursively computed Kalman filter quantities $a_{t \mid t-1}$ and $P_{t \mid t-1}$, so for fixed model parameters the expression can be evaluated in one forward pass of the filter. [/step]
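The three steps of the proof can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not stated in the text above: a time-invariant model $y_t = Z\alpha_t + d + \varepsilon_t$, $\alpha_{t+1} = T\alpha_t + \eta_t$ with $\varepsilon_t \sim \mathcal{N}(0,H)$, $\eta_t \sim \mathcal{N}(0,Q)$, initial state $\alpha_1 \sim \mathcal{N}(a_1,P_1)$, and the usual innovation covariance $F_t = Z P_{t \mid t-1} Z^\top + H$. The names `T`, `H`, `Q` and both function names are hypothetical. The first function accumulates the prediction error log-likelihood term by term; the second evaluates the joint Gaussian log-density of $(Y_1,\dots,Y_n)$ directly, so the two values should agree up to rounding.

```python
import numpy as np


def prediction_error_loglik(y, Z, d, T, H, Q, a1, P1):
    """Sum of -0.5 * (m log(2 pi) + log det F_t + v_t' F_t^{-1} v_t) over t.

    Assumed (hypothetical) time-invariant model:
        y_t = Z alpha_t + d + eps_t,      eps_t ~ N(0, H)
        alpha_{t+1} = T alpha_t + eta_t,  eta_t ~ N(0, Q)
        alpha_1 ~ N(a1, P1)
    """
    n, m = y.shape
    a, P = a1.astype(float), P1.astype(float)  # a_{t|t-1}, P_{t|t-1}
    loglik = 0.0
    for t in range(n):
        v = y[t] - (Z @ a + d)                 # innovation v_t
        F = Z @ P @ Z.T + H                    # innovation covariance F_t (assumed form)
        sign, logdet = np.linalg.slogdet(F)
        if sign <= 0:
            raise ValueError("F_t must be positive definite")
        loglik -= 0.5 * (m * np.log(2.0 * np.pi) + logdet + v @ np.linalg.solve(F, v))
        K = P @ Z.T @ np.linalg.inv(F)         # gain used in the update step
        a = T @ (a + K @ v)                    # update, then predict the state mean
        P = T @ (P - K @ Z @ P) @ T.T + Q      # update, then predict the state covariance
    return loglik


def joint_gaussian_loglik(y, Z, d, T, H, Q, a1, P1):
    """Log-density of the stacked vector (Y_1, ..., Y_n), built by brute force."""
    n, m = y.shape
    means, variances = [a1.astype(float)], [P1.astype(float)]
    for _ in range(n - 1):
        means.append(T @ means[-1])
        variances.append(T @ variances[-1] @ T.T + Q)
    mu = np.concatenate([Z @ mean + d for mean in means])
    S = np.zeros((n * m, n * m))
    for s in range(n):
        C = variances[s]                       # Cov(alpha_t, alpha_s), starting at t = s
        for t in range(s, n):
            block = Z @ C @ Z.T + (H if t == s else 0.0)  # Cov(Y_t, Y_s)
            S[t * m:(t + 1) * m, s * m:(s + 1) * m] = block
            S[s * m:(s + 1) * m, t * m:(t + 1) * m] = block.T
            C = T @ C                          # Cov(alpha_{t+1}, alpha_s) = T Cov(alpha_t, alpha_s)
    e = y.reshape(-1) - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (n * m * np.log(2.0 * np.pi) + logdet + e @ np.linalg.solve(S, e))


# Small made-up example with m = 2 observations and r = 2 states per period.
rng = np.random.default_rng(0)
Z = np.array([[1.0, 0.0], [1.0, 1.0]])
d = np.array([0.5, -0.2])
T = np.array([[0.9, 0.1], [0.0, 0.7]])
H = np.diag([0.3, 0.5])
Q = np.diag([0.2, 0.1])
a1, P1 = np.zeros(2), np.eye(2)
y = rng.normal(size=(5, 2))

ll_ped = prediction_error_loglik(y, Z, d, T, H, Q, a1, P1)
ll_joint = joint_gaussian_loglik(y, Z, d, T, H, Q, a1, P1)
print(ll_ped, ll_joint)
```

The direct evaluation factorizes an $mn \times mn$ covariance matrix, whereas the prediction error form only ever handles $m \times m$ matrices $F_t$, which is the practical reason the decomposition is preferred for evaluating the likelihood.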