
Consider the dynamical system

\begin{aligned} x_{t+1} {}={} & Ax_t + w_t, \\ y_t {}={} & Cx_t + v_t, \end{aligned}

under the independence assumptions we stated earlier and assuming that $w_t$, $v_t$, and $x_0$ are normally distributed, that is,

\begin{aligned} p_{w_t}(w) {}\propto{} & \exp \left[-\tfrac{1}{2}\|w\|_{Q^{-1}}^2\right], \\ p_{v_t}(v) {}\propto{} & \exp \left[-\tfrac{1}{2}\|v\|_{R^{-1}}^2\right], \\ p_{x_0}(x_0) {}\propto{} & \exp \left[-\tfrac{1}{2}\|x_0-\bar{x}_0\|_{P_{0}^{-1}}^{2}\right]. \end{aligned}

Given a set of measurements $y_0, y_1, \ldots, y_{N-1}$ we need to estimate $x_0, x_1, \ldots, x_{N}$.
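As a concrete reference point, here is a minimal simulation sketch in Python/NumPy that generates such a measurement sequence. All matrices, covariances and dimensions are made-up illustrative values (they are not taken from the earlier posts); the snippet only serves as data for the examples further down.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical system matrices and noise covariances (illustrative values only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])    # state transition
C = np.array([[1.0, 0.0]])                # output map
Q = 0.01 * np.eye(2)                      # process noise covariance
R = 0.04 * np.eye(1)                      # measurement noise covariance
x_bar0 = np.array([0.0, 1.0])             # prior mean of x_0
P0 = 0.1 * np.eye(2)                      # prior covariance of x_0

N = 50
x = rng.multivariate_normal(x_bar0, P0)   # x_0 ~ N(x_bar0, P0)
ys = []
for t in range(N):
    ys.append(C @ x + rng.multivariate_normal(np.zeros(1), R))  # y_t = C x_t + v_t
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)         # x_{t+1} = A x_t + w_t
ys = np.array(ys)                         # measurements y_0, ..., y_{N-1}
```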

Useful results

In what follows we use the convenient notation $x_{0:N}=(x_0, \ldots, x_N)$, and likewise we define $y_{0:N}$, $w_{0:N}$ and so on.

Proposition VII.1 (Joint distribution of states). For any $N\in{\rm I\!N}$,

p(x_{0:N}) {}={} p(x_0)\prod_{t=0}^{N-1}p_{w_t}(x_{t+1} - Ax_{t}).

Proposition VII.2 (Joint distribution of outputs). Let $N\in{\rm I\!N}$. Then

p(y_{0:N}) {}={} p(y_0)\prod_{t=1}^{N}p(y_t {}\mid{} y_{0:t-1}),

where $y_{0:t} = (y_0, \ldots, y_t)$.


Proof. Similar to the proof of Proposition VII.1. \Box

Proposition VII.3 (Joint distribution of states and $w$'s). For any $N\in{\rm I\!N}$,

\begin{aligned}p(x_{0:N}, w_{0:N-1}) {}={} \begin{cases} 0, & \text{if } x_{t+1} \neq Ax_t + w_t \text{ for some } t\in{\rm I\!N}_{[0,N-1]}, \\ p_{x_0}(x_0)\prod_{t=0}^{N-1}p_{w_t}(w_t), & \text{otherwise.} \end{cases}\end{aligned}


Proposition VII.4 (Joint distribution of states given outputs). For any $N\in{\rm I\!N}$,

\begin{aligned}p(x_{0:N} {}\mid{} y_{0:N-1}) {}\propto{} p_{x_0}(x_0)\prod_{t=0}^{N-1}p_{v_t}(y_t - Cx_t)\, p_{w_t}(x_{t+1}-Ax_t).\end{aligned}

The full information estimation problem

From Proposition VII.4, the logarithm of the posterior (omitting the constant term) is

\begin{aligned} & \log p(x_0,\ldots, x_{N} {}\mid{} y_0, \ldots, y_{N-1})\\ & \qquad{}={} \log p_{x_0}(x_0) {}+{} \sum_{t=0}^{N-1}\left[\log p_{v_t}(y_t - Cx_t) + \log p_{w_t}(x_{t+1}-Ax_t)\right] \\ & \qquad{}={} -\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} - \tfrac{1}{2}\sum_{t=0}^{N-1} \left[\|y_t - Cx_t\|_{R^{-1}}^{2} {}+{} \|x_{t+1}-Ax_t\|_{Q^{-1}}^{2}\right]. \end{aligned}

Using the result of this exercise, we have the maximum a posteriori (MAP) estimate

\begin{aligned} (\hat{x}_t)_{t=0}^{N} {}={} & \argmax_{x_0,\ldots,x_{N}} p(x_0,\ldots, x_{N} {}\mid{} y_0, \ldots, y_{N-1}) \\ {}={} & \argmax_{x_0,\ldots,x_{N}} \log p(x_0,\ldots, x_{N} {}\mid{} y_0, \ldots, y_{N-1}) \\ {}={} & \argmax_{x_0,\ldots,x_{N}} -\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} - \tfrac{1}{2}\sum_{t=0}^{N-1} \left[\|y_t - Cx_t\|_{R^{-1}}^{2} {}+{} \|x_{t+1}-Ax_t\|_{Q^{-1}}^{2}\right] \\ {}={} & \argmin_{x_0,\ldots,x_{N}} \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} + \tfrac{1}{2}\sum_{t=0}^{N-1} \left[\|y_t - Cx_t\|_{R^{-1}}^{2} {}+{} \|x_{t+1}-Ax_t\|_{Q^{-1}}^{2}\right]. \end{aligned}

In fact we can write it as

(\hat{x}_t)_{t=0}^{N} {}={} \hspace{-1.5em} \argmin_{ \substack{ x_0,\ldots,x_{N}, \\ w_0, \ldots, w_{N-1}, \\ v_0, \ldots, v_{N-1}, \\ x_{t+1} = Ax_t + w_t,\ t\in{\rm I\!N}_{[0, N-1]}, \\ y_{t} = Cx_t + v_t,\ t \in {\rm I\!N}_{[0, N-1]} } }\hspace{-1em} \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} + \sum_{t=0}^{N-1} \tfrac{1}{2}\|v_t\|_{R^{-1}}^{2} {}+{} \tfrac{1}{2}\|w_t\|_{Q^{-1}}^{2}.

We need to solve the problem

\begin{aligned} \operatorname*{minimise}_{ \substack{ x_0,\ldots,x_{N}, \\ w_0,\ldots, w_{N-1}, \\ v_0, \ldots, v_{N-1} } }\ & \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2} + \sum_{t=0}^{N-1} \tfrac{1}{2}\|v_t\|_{R^{-1}}^{2} {}+{} \tfrac{1}{2}\|w_t\|_{Q^{-1}}^{2}, \\ \text{subject to: } & x_{t+1} = Ax_t + w_t,\ t\in{\rm I\!N}_{[0, N-1]}, \\ & y_{t} = Cx_t + v_t,\ t \in {\rm I\!N}_{[0, N-1]}, \end{aligned}

which is akin to a linear-quadratic optimal control problem. The key differences are that there is an arrival cost instead of a terminal cost and that the initial condition is not fixed. The problem can be written as follows:

\begin{aligned} \operatorname*{minimise}_{ \substack{ x_0,\ldots,x_{N}, \\ w_0,\ldots, w_{N-1} } }\ & \underbrace{\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^{2}}_{\ell_{x_0}(x_0)} + \sum_{t=0}^{N-1} \underbrace{ \tfrac{1}{2}\|y_t-Cx_t\|_{R^{-1}}^{2} {}+{} \tfrac{1}{2}\|w_t\|_{Q^{-1}}^{2} }_{\ell_t(x_t, w_t)}, \\ \text{subject to: } & x_{t+1} = Ax_t + w_t,\ t\in{\rm I\!N}_{[0, N-1]}. \end{aligned}

This is known as the full information estimation (FIE) problem. Later we will show that FIE is equivalent to the Kalman filter.
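After eliminating $v_t$ and $w_t$, the FIE cost is a sum of squared, linearly parametrised residuals, so the problem is an ordinary linear least-squares problem. Below is a minimal sketch of a batch solver, reusing the hypothetical matrices and measurements from the simulation snippet above; weighting each residual by the inverse Cholesky factor of its covariance is just one convenient way to encode the weighted norms.

```python
import numpy as np

def solve_fie(A, C, Q, R, P0, x_bar0, ys):
    """Batch MAP / full information estimate of x_0, ..., x_N given y_0, ..., y_{N-1},
    posed as one big linear least-squares problem."""
    N, n, m = len(ys), A.shape[0], C.shape[0]
    # Whitening factors: ||r||^2_{S^{-1}} = ||inv(chol(S)) @ r||^2
    Wp = np.linalg.inv(np.linalg.cholesky(P0))
    Wr = np.linalg.inv(np.linalg.cholesky(R))
    Wq = np.linalg.inv(np.linalg.cholesky(Q))

    H = np.zeros((n + N * (m + n), (N + 1) * n))
    b = np.zeros(H.shape[0])
    H[:n, :n] = Wp                               # arrival cost ||x_0 - x_bar0||_{P0^{-1}}
    b[:n] = Wp @ x_bar0
    r = n
    for t in range(N):
        H[r:r + m, t * n:(t + 1) * n] = Wr @ C   # output error ||y_t - C x_t||_{R^{-1}}
        b[r:r + m] = Wr @ ys[t]
        r += m
        H[r:r + n, t * n:(t + 1) * n] = -Wq @ A  # process noise ||x_{t+1} - A x_t||_{Q^{-1}}
        H[r:r + n, (t + 1) * n:(t + 2) * n] = Wq
        r += n

    z, *_ = np.linalg.lstsq(H, b, rcond=None)
    return z.reshape(N + 1, n)                   # rows are the MAP estimates of x_0, ..., x_N
```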

So what?

Firstly, the full information estimation formulation allows us to interpret the Kalman filter as a Bayesian estimator and work in terms of prior and posterior distributions instead of conditional expectations as we did earlier. This way we can get a better understanding of how the Kalman filter works.

By looking at the cost function of the FIE problem we see that we are trying to minimise:

  1. A penalty on the distance between the estimated initial state, $x_0$, and the expected initial state, $\bar{x}_0$, of our prior on $x_0$
  2. A penalty on the output error, $y_t - Cx_t$ (what we observe minus what we should be observing based on the estimated state)
  3. A penalty on the process noise (because we want to avoid an estimator that attributes everything to process noise)

The above penalties are weighted by the inverses of the corresponding variance-covariance matrices. This means that the "higher" a covariance, the "lower" its inverse will be, entailing a lower penalty. For example, if we don't trust our sensors very much we will have a "large" $R$, so a "small" $R^{-1}$, so a small penalty on $y_t - Cx_t$, thus allowing the estimator to predict states which lead to large observation errors.

Lastly, the Bayesian approach allows us to generalise to nonlinear systems. We can easily repeat the above procedure assuming we have nonlinear dynamics and/or a nonlinear relationship between states and outputs. This will lead to the moving horizon estimation method.

Forward dynamic programming

The estimation problem becomes

\begin{aligned} \operatorname*{minimise}_{ \substack{x_0,\ldots,x_{N}, \\ w_0,\ldots,w_{N-1}} }\ & \ell_{x_0}(x_0) + \sum_{t=0}^{N-1} \ell_t(x_t, w_t), \\ \text{subject to: } & x_{t+1} = Ax_t + w_t,\ t\in{\rm I\!N}_{[0, N-1]}. \end{aligned}

We apply the DP procedure in a forward fashion:

\begin{aligned} \widehat{V}_N^\star {}={} & \min_{\substack{x_0,\ldots,x_N \\ w_0,\ldots,w_{N-1}}} \left\{ \ell_{x_0}(x_0) + \sum_{t=0}^{N-1} \ell_t(x_t, w_t) \left| \begin{array}{l} x_{t+1} = Ax_t + w_t, \\ t\in{\rm I\!N}_{[0, N-1]} \end{array} \right. \right\} \\ {}={} & \min_{\substack{x_1,\ldots,x_N \\ w_1,\ldots,w_{N-1}}} \left\{ \underbrace{ \min_{x_0, w_0} \left\{ \ell_{x_0}(x_0) + \ell_0(x_0, w_0) {}\mid{} x_1 = Ax_0 + w_0 \right\} }_{V_1^{\star}(x_1)} \rule{0cm}{1.1cm}\right. \\ & \qquad\qquad\qquad \left. \rule{0cm}{1.1cm} {}+{} \sum_{t=1}^{N-1} \ell_t(x_t, w_t)\, \left| \begin{array}{l} x_{t+1} = Ax_t + w_t, \\ t\in{\rm I\!N}_{[1, N-1]} \end{array} \right. \right\} \\ {}={} & \min_{\substack{x_1,\ldots,x_N \\ w_1,\ldots,w_{N-1}}} \left\{ V_1^{\star}(x_1) {}+{} \sum_{t=1}^{N-1} \ell_t(x_t, w_t) \left| \begin{array}{l} x_{t+1} = Ax_t + w_t, \\ t\in{\rm I\!N}_{[1, N-1]} \end{array} \right. \right\}. \end{aligned}

The forward dynamic programming procedure can be written as

\begin{aligned} V_0^\star(x_0) {}={} & \ell_{x_0}(x_0), \\ V_{t+1}^{\star}(x_{t+1}) {}={} & \min_{x_t, w_t} \left\{ V_t^\star(x_t) + \ell_t(x_t, w_t) {}\mid{} x_{t+1} = Ax_t + w_t \right\}, \\ \widehat{V}_N^\star {}={} & \min_{x_{N}}V_N^\star(x_N). \end{aligned}

We shall prove that the solution of this problem yields the Kalman filter!
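Before proving this, it may help to see the recursion at work numerically. The sketch below runs the forward DP by brute force on a discretised scalar example (all numbers are made up); it only illustrates the structure of the recursion and is not meant as a practical estimator.

```python
import numpy as np

# Hypothetical scalar system: x_{t+1} = a x_t + w_t, y_t = c x_t + v_t
a, c = 0.9, 1.0
q, r, p0, xbar0 = 0.1, 0.2, 1.0, 0.0       # noise variances and prior on x_0
y = np.array([0.3, 0.5, 0.4])              # made-up measurements y_0, y_1, y_2

grid = np.linspace(-3.0, 3.0, 601)         # discretised state space
V = 0.5 * (grid - xbar0) ** 2 / p0         # V_0^*(x_0) = ell_{x0}(x_0)

for yt in y:
    # ell[i, j] = ell_t(x_t = grid[j], w_t = grid[i] - a * grid[j])
    ell = 0.5 * (yt - c * grid[None, :]) ** 2 / r \
        + 0.5 * (grid[:, None] - a * grid[None, :]) ** 2 / q
    # V_{t+1}^*(x_{t+1}) = min over x_t of { V_t^*(x_t) + ell_t(x_t, w_t) }
    V = np.min(V[None, :] + ell, axis=1)

print("argmin of V_3^*:", grid[np.argmin(V)])  # MAP estimate of x_3 given y_0, y_1, y_2
```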

The Kalman Filter is a MAP estimator

To show that the forward DP yields the Kalman Filter, we need two lemmata. Firstly, Woodbury’s matrix inversion lemma, which we state without proof.

Lemma VII.5 (Woodbury's identity). The following holds

(S+UCV)^{-1} = S^{-1} - S^{-1}U(C^{-1} + VS^{-1}U)^{-1}VS^{-1}.
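Since the identity is stated without proof, a quick numerical sanity check is easy to run; the matrices below are random and assumed (almost surely) invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
S = rng.standard_normal((n, n)) + n * np.eye(n)    # diagonally shifted to keep S invertible
U = rng.standard_normal((n, k))
Cm = rng.standard_normal((k, k)) + k * np.eye(k)   # the matrix "C" in the identity
V = rng.standard_normal((k, n))

inv = np.linalg.inv
lhs = inv(S + U @ Cm @ V)
rhs = inv(S) - inv(S) @ U @ inv(inv(Cm) + V @ inv(S) @ U) @ V @ inv(S)
print(np.allclose(lhs, rhs))                       # expected: True
```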

Secondly, we state and prove the following lemma.

Lemma VII.6 (Factorisation of sum of quadratics). Let $Q_1\in\mathbb{S}_{++}^n$, $Q_2\in\mathbb{S}_{++}^m$, $F\in{\rm I\!R}^{m\times n}$, $b\in{\rm I\!R}^m$, and $\bar{x}\in{\rm I\!R}^n$. Define

q(x) = \tfrac{1}{2}\|x-\bar{x}\|_{Q_1^{-1}}^{2} + \tfrac{1}{2}\|Fx-b\|_{Q_2^{-1}}^2.

Then for all $x\in{\rm I\!R}^n$,

q(x){}={}\tfrac{1}{2}\|x-x^{\star}\|_{W^{-1}}^2 + c,

where

\begin{aligned} x^\star {}={} & \bar{x} + Q_1F^\intercal S^{-1}(b-F\bar{x}), \\ W {}={} & Q_1 - Q_1F^\intercal S^{-1}F Q_1, \\ c {}={} & \tfrac{1}{2}\|F\bar{x}-b\|_{S^{-1}}^2, \end{aligned}

and where $S = Q_2 + FQ_1F^\intercal$. Moreover,

x^\star = \argmin_x q(x),

and

\min_x q(x) = c.
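A quick numerical check of the factorisation, with randomly generated positive definite $Q_1$, $Q_2$ and random $F$, $b$, $\bar{x}$, is again reassuring:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2
M1 = rng.standard_normal((n, n)); Q1 = M1 @ M1.T + np.eye(n)   # Q1 positive definite
M2 = rng.standard_normal((m, m)); Q2 = M2 @ M2.T + np.eye(m)   # Q2 positive definite
F = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_bar = rng.standard_normal(n)
inv = np.linalg.inv

def q(x):
    # q(x) = 1/2 ||x - x_bar||^2_{Q1^{-1}} + 1/2 ||F x - b||^2_{Q2^{-1}}
    r1, r2 = x - x_bar, F @ x - b
    return 0.5 * r1 @ inv(Q1) @ r1 + 0.5 * r2 @ inv(Q2) @ r2

S = Q2 + F @ Q1 @ F.T
x_star = x_bar + Q1 @ F.T @ inv(S) @ (b - F @ x_bar)
W = Q1 - Q1 @ F.T @ inv(S) @ F @ Q1
c = 0.5 * (F @ x_bar - b) @ inv(S) @ (F @ x_bar - b)

x = rng.standard_normal(n)                 # an arbitrary test point
lhs = q(x)
rhs = 0.5 * (x - x_star) @ inv(W) @ (x - x_star) + c
print(np.allclose(lhs, rhs), np.isclose(q(x_star), c))   # expected: True True
```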

We can now state the main result of this section.

Theorem VII.7 (MAP Estimator = KF). Suppose that

\begin{aligned} \ell_{x_0}(x_0) {}={} & \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2, \\ \ell_t(x_t, w_t) {}={} & \tfrac{1}{2}\|y_t-Cx_t\|_{R^{-1}}^2 + \tfrac{1}{2}\|w_t\|_{Q^{-1}}^2. \end{aligned}

Then, the forward DP yields the KF equations.


Indeed, the first step in the forward DP is

\begin{aligned} V_{1}^{\star}(x_{1}) {}={} & \min_{x_0, w_0} \left\{ V_0^\star(x_0) + \ell_0(x_0, w_0) {}\mid{} x_{1} = Ax_0 + w_0 \right\} \\ {}={} & \min_{x_0, w_0} \left\{ \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2 + \tfrac{1}{2}\|y_0-Cx_0\|_{R^{-1}}^2 + \tfrac{1}{2}\|w_0\|_{Q^{-1}}^2 {}\mid{} x_{1} = Ax_0 + w_0 \right\} \\ {}={} & \min_{x_0} \left\{ \tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2 + \tfrac{1}{2}\|y_0-Cx_0\|_{R^{-1}}^2 + \tfrac{1}{2}\|x_1 - Ax_0\|_{Q^{-1}}^2 \right\}. \end{aligned}

We can use Lemma VII.6 to write the sum of the first two terms as follows

\tfrac{1}{2}\|x_0 - \bar{x}_0\|_{P_0^{-1}}^2 + \tfrac{1}{2}\|y_0-Cx_0\|_{R^{-1}}^2 {}={} \tfrac{1}{2}\|x_0-x_0^\star\|_{W_{0}^{-1}}^2 + \tfrac{1}{2}\|y_0 - C\bar{x}_0\|_{S_0^{-1}}^2,

where

\begin{aligned} S_0 {}={} & R + CP_0C^\intercal, \\ W_0 {}={} & P_0 - P_0 C^\intercal S_0^{-1} CP_0, \\ x_0^\star {}={} & \bar{x}_0 + P_0C^\intercal S_0^{-1} (y_0 - C\bar{x}_0). \end{aligned}

Note that $W_0 = \Sigma_{0\mid 0}$ and $x_0^\star = \hat{x}_{0\mid 0}$ (compare with the measurement update step of the Kalman filter). Next, we have

V_1^\star(x_1) {}={} \underbrace{\tfrac{1}{2}\|y_0 - C\bar{x}_0\|_{S_0^{-1}}^2}_{\text{constant}} {}+{} \min_{x_0} \left\{ \tfrac{1}{2}\|x_0-x_0^\star\|_{W_{0}^{-1}}^2 {}+{} \tfrac{1}{2}\|x_1 - Ax_0\|_{Q^{-1}}^2 \right\}.

From Lemma VII.6 we have

V_1^\star(x_1) {}={} \text{constant} {}+{} \tfrac{1}{2}\|x_1 - A\hat{x}_{0\mid 0}\|^2_{\overline{S}_{0}^{-1}},

where $\hat{x}_{1\mid 0} = A\hat{x}_{0\mid 0}$ and

\overline{S}_{0} = Q + AW_0A^\intercal = \Sigma_{1\mid 0},

(compare this with the time update step of the Kalman filter).

Note that $V_1^\star$ has the same form as $V_0^\star$, so the same procedure can be repeated. Every subsequent step yields the same measurement and time update iterates as the Kalman filter.
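To close the loop, here is a sketch of the Kalman filter recursion that the forward DP produces, together with a (commented-out) comparison against the batch solution of the FIE problem. The helper names `kalman_filter` and `solve_fie`, and the data `A, C, Q, R, P0, x_bar0, ys`, refer to the hypothetical snippets earlier in this post; for a jointly Gaussian model the batch MAP estimates of $x_{N-1}$ and $x_N$ should match the filter's $\hat{x}_{N-1\mid N-1}$ and $\hat{x}_{N\mid N-1}$ up to numerical tolerance.

```python
import numpy as np

def kalman_filter(A, C, Q, R, P0, x_bar0, ys):
    """Measurement update followed by time update, exactly as derived above.
    Returns the filtered estimates x_hat_{t|t} and the final prediction x_hat_{N|N-1}."""
    x_pred, P_pred = x_bar0, P0                       # x_hat_{0|-1}, Sigma_{0|-1}
    filtered = []
    for y in ys:
        S = R + C @ P_pred @ C.T                      # innovation covariance S_t
        K = P_pred @ C.T @ np.linalg.inv(S)           # gain
        x_filt = x_pred + K @ (y - C @ x_pred)        # x_hat_{t|t}
        P_filt = P_pred - K @ C @ P_pred              # Sigma_{t|t} (= W_t)
        filtered.append(x_filt)
        x_pred = A @ x_filt                           # x_hat_{t+1|t}
        P_pred = Q + A @ P_filt @ A.T                 # Sigma_{t+1|t}
    return np.array(filtered), x_pred

# Comparison with the batch MAP solution (uncomment after running the earlier sketches):
# x_map = solve_fie(A, C, Q, R, P0, x_bar0, ys)          # MAP estimates of x_0, ..., x_N
# x_filt, x_pred_N = kalman_filter(A, C, Q, R, P0, x_bar0, ys)
# print(np.allclose(x_map[-2], x_filt[-1]), np.allclose(x_map[-1], x_pred_N))  # True True
```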