Previous post: Kalman Filter VI: Further examples
Read next: The Kalman Filter VIII: Square root form
Consider the dynamical system
$$\begin{aligned} x_{t+1} {}={} & Ax_t + w_t, \\ y_t {}={} & Cx_t + v_t, \end{aligned}$$
under the independence assumptions we stated earlier and assuming that $w_t$, $v_t$, and $x_0$ are normally distributed, that is,
$$\begin{aligned} p_{w_t}(w) {}\propto{} & \exp \left[-\tfrac{1}{2}\|w\|_{Q^{-1}}^2\right], \\ p_{v_t}(v) {}\propto{} & \exp \left[-\tfrac{1}{2}\|v\|_{R^{-1}}^2\right], \\ p_{x_0}(x_0) {}\propto{} & \exp \left[-\tfrac{1}{2}\|x_0-\bar{x}_0\|_{P_{0}^{-1}}^{2}\right]. \end{aligned}$$
Given a set of measurements $y_0, y_1, \ldots, y_{N-1}$, we need to estimate the states $x_0, x_1, \ldots, x_{N}$.
Useful results
In what follows we use the convenient notation $x_{0:N}=(x_0, \ldots, x_N)$, and likewise we define $y_{0:N}$, $w_{0:N}$ and so on.
Proposition VII.1 (Joint distribution of states). For any $N\in{\rm I\!N}$
$$p(x_{0:N}) {}={} p(x_0)\prod_{t=0}^{N-1}p_{w_t}(x_{t+1} - Ax_{t}).$$
Proposition VII.2 (Joint distribution of outputs). Let $N\in{\rm I\!N}$. Then
$$p(y_{0:N}) {}={} p(y_0)\prod_{t=1}^{N}p(y_t {}\mid{} y_{0:t-1}),$$
where $y_{0:t} = (y_0, \ldots, y_t)$.
Proof. Similar to the proof of Proposition VII.1. $\Box$
Proposition VII.3 (Joint distribution of states and $w$'s). For any $N\in{\rm I\!N}$
$$\begin{aligned}p(x_{0:N}, w_{0:N-1}) {}={} \begin{cases} 0, & \text{if } x_{t+1} \neq Ax_t + w_t \text{ for some } t\in{\rm I\!N}_{[0, N-1]}, \\ p_{x_0}(x_0)\prod_{t=0}^{N-1}p_{w_t}(w_t), & \text{otherwise.} \end{cases}\end{aligned}$$
Proposition VII.4 (Joint distribution of states given outputs). For any $N\in{\rm I\!N}$
$$\begin{aligned}p(x_{0:N} {}\mid{} y_{0:N-1}) {}\propto{} p_{x_0}(x_0)\prod_{t=0}^{N-1}p_{v_t}(y_t - Cx_t)p_{w_t}(x_{t+1}-Ax_t).\end{aligned}$$
The full information estimation problem
From Proposition VII.4, the logarithm of the posterior density (omitting the additive constant) is
$$\begin{aligned} & \log p(x_0,\ldots, x_{N} {}\mid{} y_0, \ldots, y_{N-1})\\ & \qquad{}={} \log p_{x_0}(x_0) {}+{} \sum_{t=0}^{N-1}\left[\log p_{v_t}(y_t - Cx_t) + \log p_{w_t}(x_{t+1}-Ax_t)\right] \\ & \qquad{}={} -\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} - \tfrac{1}{2}\sum_{t=0}^{N-1} \left[\|y_t - Cx_t\|_{R^{-1}}^{2} {}+{} \|x_{t+1}-Ax_t\|_{Q^{-1}}^{2}\right]. \end{aligned}$$
Using the result of this exercise, we have the maximum a posteriori (MAP) estimate
$$\begin{aligned} (\hat{x}_t)_{t=0}^{N} {}={} & \argmax_{x_0,\ldots,x_{N}} p(x_0,\ldots, x_{N} {}\mid{} y_0, \ldots, y_{N-1}) \\ {}={} & \argmax_{x_0,\ldots,x_{N}} \log p(x_0,\ldots, x_{N} {}\mid{} y_0, \ldots, y_{N-1}) \\ {}={} & \argmax_{x_0,\ldots,x_{N}} -\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} - \tfrac{1}{2}\sum_{t=0}^{N-1} \left[\|y_t - Cx_t\|_{R^{-1}}^{2} {}+{} \|x_{t+1}-Ax_t\|_{Q^{-1}}^{2}\right] \\ {}={} & \argmin_{x_0,\ldots,x_{N}} \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} + \tfrac{1}{2}\sum_{t=0}^{N-1} \left[\|y_t - Cx_t\|_{R^{-1}}^{2} {}+{} \|x_{t+1}-Ax_t\|_{Q^{-1}}^{2}\right]. \end{aligned}$$
In fact we can write it as
$$(\hat{x}_t)_{t=0}^{N} {}={} \hspace{-1.5em} \argmin_{ \substack{ x_0,\ldots,x_{N}, \\ w_0, \ldots, w_{N-1}, \\ v_0, \ldots, v_{N-1}, \\ x_{t+1} = Ax_t + w_t, t\in{\rm I\!N}_{[0, N-1]}, \\ y_{t} = Cx_t + v_t, t \in {\rm I\!N}_{[0, N-1]} } }\hspace{-1em} \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} + \sum_{t=0}^{N-1} \tfrac{1}{2}\|v_t\|_{R^{-1}}^{2} {}+{} \tfrac{1}{2}\|w_t\|_{Q^{-1}}^{2}.$$
We need to solve the problem
$$\begin{aligned} \operatorname*{minimise}_{ \substack{ x_0,\ldots,x_{N}, \\ w_0,\ldots, w_{N-1}, \\ v_0, \ldots, v_{N-1} } }\ & \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} + \sum_{t=0}^{N-1} \tfrac{1}{2}\|v_t\|_{R^{-1}}^{2} {}+{} \tfrac{1}{2}\|w_t\|_{Q^{-1}}^{2}, \\ \text{subject to: } & x_{t+1} = Ax_t + w_t, t\in{\rm I\!N}_{[0, N-1]}, \\ & y_{t} = Cx_t + v_t, t \in {\rm I\!N}_{[0, N-1]}, \end{aligned}$$
which is akin to a linear-quadratic optimal control problem. The key differences are that the cost involves an arrival cost instead of a terminal cost, and that the initial state is not fixed. By eliminating $v_t$ using the output equation, the problem can be written as follows:
$$\begin{aligned} \operatorname*{minimise}_{ \substack{ x_0,\ldots,x_{N}, \\ w_0,\ldots, w_{N-1} } }\ & \underbrace{\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2}}_{\ell_{x_0}(x_0)} + \sum_{t=0}^{N-1} \underbrace{ \tfrac{1}{2}\|y_t-Cx_t\|_{R^{-1}}^{2} {}+{} \tfrac{1}{2}\|w_t\|_{Q^{-1}}^{2} }_{\ell_t(x_t, w_t)}, \\ \text{subject to: } & x_{t+1} = Ax_t + w_t, t\in{\rm I\!N}_{[0, N-1]}. \end{aligned}$$
This is known as the full information estimation (FIE) problem. Later we will show that FIE is equivalent to the Kalman filter.
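Since the cost is convex quadratic and the constraints are linear, the FIE problem is a QP that can be handed to an off-the-shelf solver. Below is a minimal sketch using NumPy and CVXPY (assumed to be available); the function name `solve_fie` and its signature are illustrative, not part of any library.

```python
import numpy as np
import cvxpy as cp

def inv_sym(M):
    # Inverse of a symmetric positive definite matrix, symmetrised for numerical safety.
    Minv = np.linalg.inv(M)
    return 0.5 * (Minv + Minv.T)

def solve_fie(y, A, C, Q, R, P0, x0_bar):
    """Solve the full information estimation problem as a convex QP.

    y is a list/array of measurements y_0, ..., y_{N-1}.
    Returns the MAP estimates x_0, ..., x_N.
    """
    N, n = len(y), A.shape[0]
    x = cp.Variable((N + 1, n))          # states x_0, ..., x_N
    w = cp.Variable((N, n))              # process noise w_0, ..., w_{N-1}
    cost = 0.5 * cp.quad_form(x[0] - x0_bar, inv_sym(P0))         # arrival cost
    constraints = []
    for t in range(N):
        cost += 0.5 * cp.quad_form(y[t] - C @ x[t], inv_sym(R))   # output error penalty
        cost += 0.5 * cp.quad_form(w[t], inv_sym(Q))              # process noise penalty
        constraints.append(x[t + 1] == A @ x[t] + w[t])           # dynamics
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return x.value
```

Of course, solving a QP whose size grows with $N$ is not how one would implement an estimator in practice; the point of the remainder of this post is that the same problem can be solved recursively by forward dynamic programming.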
So what?
Firstly, the full information estimation formulation allows us to interpret the Kalman filter as a Bayesian estimator and work in terms of prior and posterior distributions instead of conditional expectations as we did earlier. This way we can get a better understanding of how the Kalman filter works.
By looking at the cost function of the FIE problem we see that we are trying to minimise:
- A penalty on the distance between the estimated initial state, $x_0$, and the expected initial state, $\bar{x}_0$, of our prior on $x_0$
- A penalty on the output error, $y_t - Cx_t$ (what we observe minus what we should be observing based on the estimated state)
- A penalty on the process noise (because we want to avoid an estimator that attributes everything to process noise)
The above penalties are weighted by the inverses of the corresponding covariance matrices. This means that the "larger" a covariance, the "smaller" its inverse will be, entailing a lower penalty. For example, if we do not trust our sensors very much, we will have a "large" $R$, hence a "small" $R^{-1}$, hence a small penalty on $y_t - Cx_t$, thus allowing the estimator to predict states which lead to large observation errors.
Lastly, the Bayesian approach allows us to generalise to nonlinear systems. We can easily repeat the above procedure assuming we have nonlinear dynamics and/or a nonlinear relationship between states and outputs. This will lead to the moving horizon estimation method.
Forward dynamic programming
The estimation problem becomes
$$\begin{aligned} \operatorname*{minimise}_{ \substack{x_0,\ldots,x_{N}, \\ w_0,\ldots,w_{N-1}} }\ & \ell_{x_0}(x_0) + \sum_{t=0}^{N-1} \ell_t(x_t, w_t), \\ \text{subject to: } & x_{t+1} = Ax_t + w_t, t\in\N_{[0, N-1]}. \end{aligned}$$
We apply the DP procedure in a forward fashion:
$$\begin{aligned} \widehat{V}_N^\star {}={} & \min_{\substack{x_0,\ldots,x_N \\w_0,\ldots,w_{N-1}}} \left\{ \ell_{x_0}(x_0) + \sum_{t=0}^{N-1} \ell_t(x_t, w_t) \left| \begin{array}{l} x_{t+1} = Ax_t + w_t, \\ t\in\N_{[0, N-1]} \end{array} \right. \right\} \\ {}={} & \min_{\substack{x_1,\ldots,x_N \\w_1,\ldots,w_{N-1}}} \left\{ \underbrace{ \min_{x_0, w_0} \left\{ \ell_{x_0}(x_0) + \ell_0(x_0, w_0) {}\mid{} x_1 = Ax_0 + w_0 \right\} }_{V_1^{\star}(x_1)} \rule{0cm}{1.1cm}\right. \\& \qquad\qquad\qquad \left. \rule{0cm}{1.1cm} {}+{} \sum_{t=1}^{N-1} \ell_t(x_t, w_t)\, \left| \begin{array}{l} x_{t+1} = Ax_t + w_t, \\ t\in\N_{[1, N-1]} \end{array} \right. \right\} \\ {}={} & \min_{\substack{x_1,\ldots,x_N \\w_1,\ldots,w_{N-1}}} \left\{ V_1^{\star}(x_1) {}+{} \sum_{t=1}^{N-1} \ell_t(x_t, w_t) \left| \begin{array}{l} x_{t+1} = Ax_t + w_t, \\ t\in\N_{[1, N-1]} \end{array} \right. \right\}. \end{aligned}$$
The forward dynamic programming procedure can be written as
$$\begin{aligned} V_0^\star(x_0) {}={} & \ell_{x_0}(x_0), \\ V_{t+1}^{\star}(x_{t+1}) {}={} & \min_{x_t, w_t} \left\{ V_t^\star(x_t) + \ell_t(x_t, w_t) {}\mid{} x_{t+1} = Ax_t + w_t \right\}, \\ \widehat{V}_N^\star {}={} & \min_{x_{N}}V_N^\star(x_N). \end{aligned}$$
We shall prove that the solution of this problem yields the Kalman filter!
The Kalman Filter is a MAP estimator
To show that the forward DP yields the Kalman Filter, we need two lemmata. Firstly, Woodbury's matrix inversion lemma, which we state without proof.
Lemma VII.5 (Woodbury's identity). For matrices $S$, $U$, $C$ and $V$ of compatible dimensions, provided the inverses below exist,
$$(S+UCV)^{-1} = S^{-1} - S^{-1}U(C^{-1} + VS^{-1}U)^{-1}VS^{-1}.$$
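As a quick sanity check (not a proof), the identity can be verified numerically with randomly generated matrices of compatible dimensions; here is a minimal NumPy sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 2
S = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # assumed invertible
U = rng.standard_normal((n, k))
Cm = np.eye(k) + 0.1 * rng.standard_normal((k, k))  # the "C" of the lemma, assumed invertible
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(S + U @ Cm @ V)
rhs = np.linalg.inv(S) - np.linalg.inv(S) @ U @ np.linalg.inv(
    np.linalg.inv(Cm) + V @ np.linalg.inv(S) @ U) @ V @ np.linalg.inv(S)
print(np.allclose(lhs, rhs))   # expect True
```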
Secondly, we state and prove the following lemma.
Lemma VII.6 (Factorisation of sum of quadratics). Let $Q_1\in\mathbb{S}_{++}^n$, $Q_2\in\mathbb{S}_{++}^m$, $F\in{\rm I\!R}^{m\times n}$, $b\in{\rm I\!R}^m$, and $\bar{x}\in{\rm I\!R}^n$. Define
$$q(x) = \tfrac{1}{2}\|x-\bar{x}\|_{Q_1^{-1}}^{2} + \tfrac{1}{2}\|Fx-b\|_{Q_2^{-1}}^2.$$
Then for all $x\in{\rm I\!R}^n$,
$$q(x){}={}\tfrac{1}{2}\|x-x^{\star}\|_{W^{-1}}^2 + c,$$
where
$$\begin{aligned} x^\star {}={} & \bar{x} + Q_1F^\intercal S^{-1}(b-F\bar{x}), \\ W {}={} & Q_1 - Q_1F^\intercal S^{-1}F Q_1, \\ c {}={} & \tfrac{1}{2}\|F\bar{x}-b\|_{S^{-1}}^2, \end{aligned}$$
and where $S = Q_2 + FQ_1F^\intercal.$ Moreover,
$$x^\star = \argmin_x q(x),$$
and
$$\min_x q(x) = c.$$
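Again, a quick numerical sanity check of the factorisation (a sketch with arbitrarily chosen matrices, not a proof):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
Q1, Q2 = np.eye(n), 2.0 * np.eye(m)       # positive definite weights
F = rng.standard_normal((m, n))
b, x_bar = rng.standard_normal(m), rng.standard_normal(n)

S = Q2 + F @ Q1 @ F.T
x_star = x_bar + Q1 @ F.T @ np.linalg.solve(S, b - F @ x_bar)
W = Q1 - Q1 @ F.T @ np.linalg.solve(S, F @ Q1)
c = 0.5 * (F @ x_bar - b) @ np.linalg.solve(S, F @ x_bar - b)

def q(x):
    # q(x) = 1/2 ||x - x_bar||^2_{Q1^{-1}} + 1/2 ||Fx - b||^2_{Q2^{-1}}
    return 0.5 * (x - x_bar) @ np.linalg.solve(Q1, x - x_bar) \
         + 0.5 * (F @ x - b) @ np.linalg.solve(Q2, F @ x - b)

x = rng.standard_normal(n)                # an arbitrary test point
lhs = q(x)
rhs = 0.5 * (x - x_star) @ np.linalg.solve(W, x - x_star) + c
print(np.isclose(lhs, rhs))               # expect True
```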
We can now state the main result of this section.
Theorem VII.7 (MAP Estimator = KF). Suppose that
$$\begin{aligned} \ell_{x_0}(x_0) {}={} & \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2, \\ \ell_t(x_t, w_t) {}={} & \tfrac{1}{2}\|y_t-Cx_t\|_{R^{-1}}^2 + \tfrac{1}{2}\|w_t\|_{Q^{-1}}^2. \end{aligned}$$
Then, the forward DP yields the KF equations.
Indeed, the first step in the forward DP is
$$\begin{aligned} V_{1}^{\star}(x_{1}) {}={} & \min_{x_0, w_0} \left\{ V_0^\star(x_0) + \ell_0(x_0, w_0) {}\mid{} x_{1} = Ax_0 + w_0 \right\} \\ {}={} & \min_{x_0, w_0} \left\{ \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2 + \tfrac{1}{2}\|y_0-Cx_0\|_{R^{-1}}^2 + \tfrac{1}{2}\|w_0\|_{Q^{-1}}^2 {}\mid{} x_{1} = Ax_0 + w_0 \right\} \\ {}={} & \min_{x_0} \left\{ \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2 + \tfrac{1}{2}\|y_0-Cx_0\|_{R^{-1}}^2 + \tfrac{1}{2}\|x_1 - Ax_0\|_{Q^{-1}}^2 \right\}. \end{aligned}$$
We can use Lemma VII.6 to write the sum of the first two terms as follows
$$\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2 + \tfrac{1}{2}\|y_0-Cx_0\|_{R^{-1}}^2 {}={} \tfrac{1}{2}\|x_0-x_0^\star\|_{W_{0}^{-1}}^2 + \tfrac{1}{2}\|y_0 - C\bar{x}_0\|_{S_0^{-1}}^2,$$
where
$$\begin{aligned} S_0 {}={} & R + CP_0C^\intercal, \\ W_0 {}={} & P_0 - P_0 C^\intercal S_0^{-1} CP_0, \\ x_0^\star {}={} & \bar{x}_0 + P_0C^\intercal S_0^{-1} (y_0 - C\bar{x}_0). \end{aligned}$$
Note that $W_0 = \Sigma_{0\mid 0}$ and $x_0^\star = \hat{x}_{0\mid 0}$. Next, we have
$$V_1^\star(x_1) {}={} \underbrace{\tfrac{1}{2}\|y_0 - C\bar{x}_0\|_{S_0^{-1}}^2}_{\text{constant}} {}+{} \min_{x_0} \left\{ \tfrac{1}{2}\|x_0-x_0^\star\|_{W_{0}^{-1}}^2 {}+{} \tfrac{1}{2}\|x_1 - Ax_0\|_{Q^{-1}}^2 \right\}.$$
From Lemma VII.6 we have
$$V_1^\star(x_1) {}={} \text{constant} {}+{} \tfrac{1}{2}\|x_1 - A\hat{x}_{0\mid 0}\|^2_{\overline{S}_{0}^{-1}},$$
where $\hat{x}_{1\mid 0} = A\hat{x}_{0\mid 0}$ and
$$\overline{S}_{0} = Q + AW_0A^\intercal = \Sigma_{1\mid 0},$$
(compare this with the time update step of the Kalman filter).
Note that $V_1^\star$ has the same form as $V_0^\star$ (a quadratic function plus a constant), so the same procedure can be repeated. This will yield the Kalman filter iterates at every subsequent step.
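The two updates derived above can be translated into a short routine. The following is a minimal NumPy sketch of one step of the forward DP recursion (equivalently, one measurement update followed by one time update of the Kalman filter); the function name and signature are illustrative.

```python
import numpy as np

def forward_dp_step(x_pred, W_pred, y, A, C, Q, R):
    """One step of the forward DP recursion with quadratic value functions.

    x_pred, W_pred parameterise V_t*(x) = 1/2 ||x - x_pred||^2_{W_pred^{-1}} + const
    (for t = 0, x_pred = x0_bar and W_pred = P0).
    """
    # Measurement update (Lemma VII.6 with Q1 = W_pred, F = C, Q2 = R, b = y):
    S = R + C @ W_pred @ C.T                    # S_t = R + C Sigma_{t|t-1} C'
    G = np.linalg.solve(S, C @ W_pred).T        # gain  Sigma_{t|t-1} C' S_t^{-1}
    x_meas = x_pred + G @ (y - C @ x_pred)      # x_t^* = x_hat_{t|t}
    W_meas = W_pred - G @ C @ W_pred            # W_t = Sigma_{t|t}
    # Time update (Lemma VII.6 with Q1 = W_meas, F = A, Q2 = Q, b = x_{t+1}):
    x_next = A @ x_meas                         # x_hat_{t+1|t}
    W_next = Q + A @ W_meas @ A.T               # Sigma_{t+1|t}
    return x_meas, W_meas, x_next, W_next
```

Iterating this function over $y_0, \ldots, y_{N-1}$, starting from $(\bar{x}_0, P_0)$, produces the same estimates $\hat{x}_{t\mid t}$, $\Sigma_{t\mid t}$, $\hat{x}_{t+1\mid t}$ and $\Sigma_{t+1\mid t}$ as the Kalman filter.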