Read first: Kalman Filter I: The Gauss-Markov model
Read next: Kalman Filter III: Measurement and time updates

In this post we present a result of central importance in the development of the Kalman filter.

Main result

Theorem II.1 (Conditioning of multivariate normal). Let $X \sim \mathcal{N}(\mu, \Sigma)$ be an $n$-dimensional random vector. Let us partition $X$ into two random vectors $X_1$ and $X_2$ as follows

$$ X = \begin{bmatrix}X_1\\X_2\end{bmatrix},\tag{1}$$

with $X_1 \in \mathbb{R}^{n_1}$ and $X_2 \in \mathbb{R}^{n_2}$, where $n = n_1 + n_2$. Let

$$ \mu=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\\Sigma_{21} & \Sigma_{22}\end{bmatrix},\tag{2}$$

and assume that $\Sigma_{22}\in\mathbb{S}_{++}^{n_2}$.

Then, the conditional distribution of $X_1$ given that $X_2 = x_2$ is normal with mean

$${\rm I\!E}[X_1 {}\mid{} X_2 = x_2] {}={} \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\tag{3a}$$

and

$${\rm Var}[X_1{}\mid{} X_2 = x_2] {}={} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1} \Sigma_{21}.\tag{3b}$$
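Before proving the theorem, we can sanity-check Equation (3a) empirically (this is a sketch, not part of the proof; the mean, covariance, and sample size below are arbitrary choices): the least-squares regression of centered samples of $X_1$ on centered samples of $X_2$ should recover the gain matrix $\Sigma_{12}\Sigma_{22}^{-1}$.

```python
import numpy as np

# Empirical check of the gain Sigma12 @ inv(Sigma22) in Equation (3a).
# The distribution parameters and sample size are illustrative choices.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.4],
                  [0.6, 1.5, 0.3],
                  [0.4, 0.3, 1.2]])
n1 = 1  # X1 = first component, X2 = remaining two

X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1, X2 = X[:, :n1], X[:, n1:]

gain = Sigma[:n1, n1:] @ np.linalg.inv(Sigma[n1:, n1:])
# Linear regression of centered X1 on centered X2 estimates the same gain
G_hat = np.linalg.lstsq(X2 - mu[n1:], X1 - mu[:n1], rcond=None)[0].T
print(np.allclose(G_hat, gain, atol=0.02))  # True
```

With $2\times 10^5$ samples the sampling error of the regression coefficients is far below the tolerance used here.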


Proof. The proof hinges on the Schur complement. Assuming additionally that $\Sigma$ is nonsingular, we define the Schur complement of $\Sigma$ (with respect to $\Sigma_{22}$) to be the nonsingular matrix

$$\Sigma_* = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.\tag{4}$$

Then, the inverse of $\Sigma$ is the matrix

$$ \Sigma^{-1} = \begin{bmatrix} \Sigma_{11}^* & \Sigma_{12}^* \\ \Sigma_{21}^{*} & \Sigma_{22}^{*} \end{bmatrix},\tag{5}$$

where $\Sigma_{11}^* = \Sigma_*^{-1}$, $\Sigma_{12}^*=-\Sigma_*^{-1}\Sigma_{12}\Sigma_{22}^{-1}$, $\Sigma_{21}^*=(\Sigma_{12}^*)^\intercal$ (since $\Sigma$ is symmetric), and $\Sigma_{22}^* = \Sigma_{22}^{-1} + \Sigma_{22}^{-1}\Sigma_{21}\Sigma_*^{-1}\Sigma_{12}\Sigma_{22}^{-1}$. We need to determine the pdf of $X_1$ conditional on $X_2$. Since $p_{X_1\mid X_2}(x_1 {}\mid{} x_2) = p_{X_1, X_2}(x_1, x_2)/p_{X_2}(x_2)$, the conditional pdf is, as a function of $x_1$, proportional to the joint pdf. We denote this as
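The block-inversion formula in Equation (5) is easy to verify numerically; here is a minimal sketch (the matrix and the partition sizes are arbitrary choices):

```python
import numpy as np

# Numerical check of the block-inversion formula (5); the symmetric
# positive definite matrix and the block sizes are arbitrary choices.
rng = np.random.default_rng(42)
n1, n2 = 2, 3
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma = A @ A.T + (n1 + n2) * np.eye(n1 + n2)  # symmetric positive definite

S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]

S22_inv = np.linalg.inv(S22)
S_star = S11 - S12 @ S22_inv @ S21           # Schur complement, Equation (4)
S_star_inv = np.linalg.inv(S_star)

# Blocks of Sigma^{-1} according to Equation (5)
B11 = S_star_inv
B12 = -S_star_inv @ S12 @ S22_inv
B21 = B12.T                                  # Sigma is symmetric
B22 = S22_inv + S22_inv @ S21 @ S_star_inv @ S12 @ S22_inv

Sigma_inv_blocks = np.block([[B11, B12], [B21, B22]])
print(np.allclose(Sigma_inv_blocks, np.linalg.inv(Sigma)))  # True
```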

$$p_{X_1\mid X_2}(x_1 {}\mid{} x_2) \propto p_{X_1, X_2}(x_1, x_2).\tag{6}$$

Since $(X_1, X_2)$ are jointly normal, we have that its pdf is

$$p_{X_1, X_2}(x_1, x_2) {}\propto{} \exp\left(-\tfrac{1}{2}(x - \mu)^\intercal \Sigma^{-1}(x - \mu)\right),\tag{7}$$

where $x=(x_1, x_2)$ and $\mu = (\mu_1, \mu_2)$.

The reader can use the block-inversion formula in Equation (5) to verify that we can write

$$\begin{aligned} (x - \mu)^\intercal \Sigma^{-1}(x - \mu) {}={}& (x_1 - \mu_*)^\intercal \Sigma_*^{-1}(x_1 - \mu_*) \\ &\quad{}+{} (x_2 - \mu_2)^\intercal \Sigma_{22}^{-1}(x_2 - \mu_2),\tag{8}\end{aligned}$$

where $\mu_* = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$. From Equations (6)–(8), and since the second term on the right-hand side of (8) does not depend on $x_1$ (so it is absorbed into the proportionality constant), we conclude that

$$p_{X_1\mid X_2}(x_1 {}\mid{} x_2) {}\propto{} \exp\left(-\tfrac{1}{2}(x_1 - \mu_*)^\intercal \Sigma_*^{-1}(x_1 - \mu_*)\right),\tag{9}$$

which proves that $X_1\mid X_2 = x_2$ is normal with mean $\mu_*$ and covariance matrix $\Sigma_*$. $\Box$

Remark. By Equation (3b), we have

$${\rm Var}[X_1{}\mid{} X_2 = x_2] {}={} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1} \Sigma_{21}.\tag{10}$$

Since \(\Sigma_{22} \succ 0\), \(\Sigma_{12}\Sigma_{22}^{-1} \Sigma_{21} \succcurlyeq 0\), therefore \(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1} \Sigma_{21} \preccurlyeq \Sigma_{11}\), i.e.,

$${\rm Var}[X_1{}\mid{} X_2 = x_2] \preccurlyeq {\rm Var}[X_1].\tag{11}$$

In other words, additional information does not increase (in the sense of $\preccurlyeq$) the uncertainty!
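This remark can be illustrated numerically; in the sketch below (the matrix and partition sizes are arbitrary choices), the difference $\Sigma_{11} - \Sigma_*$ equals $\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, so all its eigenvalues should be nonnegative up to rounding:

```python
import numpy as np

# Illustration of the remark: the conditional covariance (3b) never
# exceeds Sigma11 in the Loewner order. Matrix and sizes are arbitrary.
rng = np.random.default_rng(1)
n1, n2 = 3, 2
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma = A @ A.T + np.eye(n1 + n2)  # symmetric positive definite

S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S22 = Sigma[n1:, n1:]

cond_cov = S11 - S12 @ np.linalg.solve(S22, S12.T)
# Sigma11 - cond_cov = Sigma12 @ inv(Sigma22) @ Sigma21 is positive
# semidefinite, so its smallest eigenvalue is (numerically) nonnegative.
print(np.linalg.eigvalsh(S11 - cond_cov).min() >= -1e-10)  # True
```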

Example

Suppose that $Z=(Z_1, Z_2, Z_3, Z_4)$ is a four-dimensional random vector that follows the normal distribution, $Z \sim \mathcal{N}(\mu, \Sigma)$, with $\mu=(1,2,3,4)$ and

$$ \Sigma = \begin{bmatrix} 1.0 & 0.35 & 0.32 & 0.39 \\ 0.35 & 0.84 & 0.3 & 0.26\\ 0.32 & 0.3 & 0.77 & 0.23\\ 0.39 & 0.26 & 0.23 & 0.83 \end{bmatrix}.\tag{12}$$

The reader can verify that $\Sigma\in\mathbb{S}_{++}^4$. Suppose we measure $Z_3$ and $Z_4$ and we want to determine ${\rm I\!E}[Z_1, Z_2 {}\mid{} Z_3, Z_4]$. We will apply Theorem II.1 with $X_{1}=(Z_1, Z_2)$ and $X_{2}=(Z_3, Z_4)$. We have

$$ \Sigma_{11} = \begin{bmatrix}1.0 & 0.35\\0.35 & 0.84\end{bmatrix}, \quad \Sigma_{12} = \begin{bmatrix}0.32 & 0.39\\0.3 & 0.26\end{bmatrix}, \quad \Sigma_{22} = \begin{bmatrix}0.77 & 0.23\\0.23 & 0.83\end{bmatrix},$$

and $\mu_1 = (1,2)$, $\mu_2 = (3, 4)$. By Theorem II.1

$${\rm I\!E}\left[Z_1, Z_2 {} \left|\hspace{-0.2em} \begin{array}{l} Z_3 = z_3 \\ Z_4 = z_4 \end{array} \hspace{-0.5em} \right. \right] {}={} \begin{bmatrix}1\\2\end{bmatrix} {+} \begin{bmatrix}0.32 & 0.39\\0.3 & 0.26\end{bmatrix} \begin{bmatrix}0.77 & 0.23\\0.23 & 0.83\end{bmatrix}^{-1} \left( \begin{bmatrix} z_3 \\z_4 \end{bmatrix} {}-{} \begin{bmatrix}3\\4\end{bmatrix} \right),$$

and

$$\begin{aligned} {\rm Var}\left[Z_1, Z_2 {} \left| \hspace{-0.2em} \begin{array}{l} Z_3 = z_3 \\ Z_4 = z_4 \end{array}\hspace{-0.5em} \right. \right] {}={}& \begin{bmatrix}1.0 & 0.35\\0.35 & 0.84\end{bmatrix} \\&\quad{-} \begin{bmatrix}0.32 & 0.39\\0.3 & 0.26\end{bmatrix} \begin{bmatrix}0.77 & 0.23\\0.23 & 0.83\end{bmatrix}^{-1} \begin{bmatrix}0.32 & 0.3\\0.39 & 0.26\end{bmatrix}.\end{aligned}$$
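The computation above is straightforward to carry out numerically; here is a sketch (the measured values $z_3$, $z_4$ are illustrative picks, not part of the example):

```python
import numpy as np

# Worked example: condition (Z1, Z2) on (Z3, Z4) via Theorem II.1.
mu = np.array([1.0, 2.0, 3.0, 4.0])
Sigma = np.array([[1.00, 0.35, 0.32, 0.39],
                  [0.35, 0.84, 0.30, 0.26],
                  [0.32, 0.30, 0.77, 0.23],
                  [0.39, 0.26, 0.23, 0.83]])

mu1, mu2 = mu[:2], mu[2:]
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

z = np.array([3.2, 3.7])  # hypothetical measurements of Z3 and Z4

# Equations (3a) and (3b); solve() avoids forming inv(Sigma22) explicitly
cond_mean = mu1 + S12 @ np.linalg.solve(S22, z - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)

print(cond_mean)
print(cond_cov)
```

Using `np.linalg.solve` rather than an explicit inverse is the standard numerically preferable way to apply $\Sigma_{22}^{-1}$.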

Read next: Kalman Filter III: Measurement and time updates