Read first: Kalman Filter II: Conditioning

Read next: Kalman Filter IV: Application to position estimation

Consider again the dynamical system (without an input)

$$\begin{aligned}x_{t+1} {}={} & A_t x_t + G_t w_t,\\y_t {}={} & C_t x_t + v_t,\tag{1}\end{aligned}$$

where $x_t\in{\rm I\!R}^{n_x}$ is the system state, $y_t\in{\rm I\!R}^{n_y}$ is the *output*, $w_t\in{\rm I\!R}^{n_w}$ is a noise term acting on the system dynamics known as *process noise*, and $v_t\in{\rm I\!R}^{n_v}$ is a measurement noise term.

Hereafter we shall assume that

- ${\rm I\!E}[w_t]=0$ and ${\rm I\!E}[v_t]=0$ for all $t\in{\rm I\!N}$,
- $x_0$, $(w_t)_t$ and $(v_t)_t$ are mutually independent random variables (therefore, ${\rm I\!E}[w_tw_l^\intercal]=0$ for $t{}\neq{}l$, and ${\rm I\!E}[v_tv_l^\intercal]=0$ for $t{}\neq{}l$),
- $w_t$ and $v_t$ are normally distributed with ${\rm I\!E}[w_tw_t^\intercal]=Q_t$ and ${\rm I\!E}[v_tv_t^\intercal]=R_t$,
- $x_0$ is a random variable with $x_0 \sim \mathcal{N}(\tilde{x}_0, P_0)$.
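
For concreteness, here is a minimal sketch of how one might simulate the model (1) under the assumptions above; the matrices `A`, `G`, `C`, `Q`, `R` and the prior $(\tilde{x}_0, P_0)$ below are hypothetical values chosen only for this illustration.

```
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical, time-invariant system data (illustration only)
A = np.array([[1.0, 0.1], [0.0, 1.0]])    # A_t
G = np.eye(2)                             # G_t
C = np.array([[1.0, 0.0]])                # C_t
Q = 0.01 * np.eye(2)                      # Q_t = E[w_t w_t']
R = np.array([[0.1]])                     # R_t = E[v_t v_t']
x_tilde_0 = np.array([0.0, 1.0])          # prior mean of x_0
P0 = np.eye(2)                            # prior covariance of x_0

x = rng.multivariate_normal(x_tilde_0, P0)          # x_0 ~ N(x_tilde_0, P0)
xs, ys = [], []
for t in range(50):
    v = rng.multivariate_normal(np.zeros(1), R)     # v_t ~ N(0, R_t)
    ys.append(C @ x + v)                            # y_t = C x_t + v_t
    xs.append(x)
    w = rng.multivariate_normal(np.zeros(2), Q)     # w_t ~ N(0, Q_t)
    x = A @ x + G @ w                               # x_{t+1} = A x_t + G w_t
```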


## Update Equations

The random variables $x_0$ and $y_0$ are jointly normal with ${\rm I\!E}[x_0]=\tilde{x}_0$, ${\rm I\!E}[y_0]={\rm I\!E}[C_0x_0+v_0] = C_0\tilde{x}_0$.

The variance of $x_0$ is ${\rm Var}[x_0] = {P_0}$. The variance of $y_0$ is

$${\rm Var}[y_0] = {\rm Var}[C_0 x_0 + v_0] = {C_0P_0C_0^\intercal + R_0}.\tag{2}$$

The covariance of $x_0$ with $y_0$ is

$$\begin{aligned} {\rm Cov}(x_0, y_0) {}={} & {\rm I\!E}[(x_0-\tilde{x}_0)(y_0 - \tilde{y}_0)^\intercal],\quad \text{ where } \tilde{y}_0 = {\rm I\!E}[y_0] \\ {}={} & {\rm I\!E}[(x_0-\tilde{x}_0)(C_0x_0 + v_0 - C_0\tilde{x}_0)^\intercal] \\ {}={} & {\rm I\!E}[(x_0-\tilde{x}_0)(C_0(x_0 - \tilde{x}_0) + v_0)^\intercal] \\ {}={} & {\rm I\!E}[(x_0-\tilde{x}_0)(x_0 - \tilde{x}_0)^\intercal C_0^\intercal + (x_0-\tilde{x}_0)v_0^\intercal] = {P_0C_0^\intercal}. \end{aligned}$$

Therefore,

$$ \begin{bmatrix} x_0 \\y_0 \end{bmatrix} {}\sim{} \mathcal{N}\left( \begin{bmatrix}\tilde{x}_0\\C_0\tilde{x}_0\end{bmatrix}, \begin{bmatrix} {P_0} & {P_0C_0^\intercal} \\ {C_0P_0} & {C_0P_0C_0^\intercal + R_0} \end{bmatrix} \right).\tag{3}$$

Suppose we measure $y_0$. What is $x_0$ given $y_0$?

Since $(x_0, y_0)$ is jointly normally distributed as in Equation (3), $x_0{}\mid{}y_0$ is normally distributed.

We may define the estimator \(\hat{x}_{0{}\mid{}0} \coloneqq {\rm I\!E}[x_0 {}\mid{} y_0] \), which, by the conditioning formula for jointly normal random variables (see Kalman Filter II: Conditioning), is

$$ \hat{x}_{0{}\mid{}0} {}={} \tilde{x}_0 {}+{} {P_0C_0^\intercal} ({C_0P_0C_0^\intercal + R_0})^{-1} (y_0 - C_0\tilde{x}_0),\tag{4}$$

and the estimator variance, \(\Sigma_{0{}\mid{}0} \coloneqq {\rm Var}[x_0{}\mid{}y_0]\), which is

$$\Sigma_{0{}\mid{}0} {}={} {P_0} {}-{} {P_0C_0^\intercal} ({C_0P_0C_0^\intercal + R_0})^{-1} {C_0P_0}.\tag{5}$$
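
As a quick numerical illustration, Equations (4) and (5) can be evaluated directly; the values of $\tilde{x}_0$, $P_0$, $C_0$, $R_0$ and the measurement $y_0$ below are made up for the example.

```
import numpy as np

# Hypothetical prior and measurement data (illustration only)
x_tilde_0 = np.array([0.0, 1.0])
P0 = np.array([[2.0, 0.0], [0.0, 1.0]])
C0 = np.array([[1.0, 0.5]])
R0 = np.array([[0.2]])
y0 = np.array([1.3])

S0 = C0 @ P0 @ C0.T + R0                   # C_0 P_0 C_0' + R_0
K0 = P0 @ C0.T @ np.linalg.inv(S0)         # P_0 C_0' (C_0 P_0 C_0' + R_0)^{-1}

x_hat_00 = x_tilde_0 + K0 @ (y0 - C0 @ x_tilde_0)   # Equation (4)
Sigma_00 = P0 - K0 @ C0 @ P0                         # Equation (5)
```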

Having observed $y_0$ at $t=0$, we want to estimate $x_1$; we compute \(\hat{x}_{1{}\mid{}0} \coloneqq {\rm I\!E}[x_1 {}\mid{} y_0] = {\rm I\!E}[A_0 x_0 + G_0 w_0 {}\mid{} y_0]\), which, since $w_0$ is zero-mean and independent of $y_0$, is

$$\hat{x}_{1{}\mid{}0} {}={} A_0 \hat{x}_{0\mid 0}.\tag{6}$$

The estimator variance, \(\Sigma_{1{}\mid{}0} = {\rm Var}[x_1 {}\mid{} y_0]\), is

$$\Sigma_{1{}\mid{}0} {}={} A_0 \Sigma_{0{}\mid{}0}A_0^\intercal + G_0Q_0G_0^\intercal.\tag{7}$$

The output at $t=1$ given the observation of $y_0$ is expected to be

$$\hat{y}_{1{}\mid{}0} = {\rm I\!E}[y_1 {}\mid{} y_0] = C_1\hat{x}_{1{}\mid{}0},\tag{8}$$

and its (conditional) variance is

$${\rm Var}[y_1 {}\mid{} y_0] = {C_1\Sigma_{1{}\mid{}0}C_1^\intercal + R_1},\tag{9}$$

(can you see why?) and the covariance between $x_1$ and $y_1$, conditional on $y_0$, is

$${\rm Cov}(x_1, y_1 {}\mid{} y_0) {}\coloneqq {} {\rm I\!E}[(x_1 - \hat{x}_{1{}\mid{}0})(y_1 - \hat{y}_{1{}\mid{}0})^\intercal {}\mid{} y_0] {}={} {\Sigma_{1{}\mid{}0} C_1^\intercal}.\tag{10}$$

Overall,

$$\left. \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} \right| y_0 {}\sim{} \mathcal{N}\left( \begin{bmatrix}\hat{x}_{1{}\mid{}0}\\C_1\hat{x}_{1{}\mid{}0}\end{bmatrix}, \begin{bmatrix} {\Sigma_{1{}\mid{}0}} & {\Sigma_{1{}\mid{}0}C_1^\intercal} \\ {C_1\Sigma_{1{}\mid{}0}} & {C_1\Sigma_{1{}\mid{}0}C_1^\intercal + R_1 } \end{bmatrix} \right).\tag{11}$$

Once we obtain a measurement $y_1$,

$$\begin{aligned} \hat{x}_{1{}\mid{}1} {}={} & {\rm I\!E}[x_1 {}\mid{} y_0, y_1] = \hat{x}_{1{}\mid{}0} + {\Sigma_{1{}\mid{}0}C_1^\intercal} ({C_1\Sigma_{1{}\mid{}0}C_1^\intercal + R_1 })^{-1}(y_1 - C_1\hat{x}_{1{}\mid{}0}), \\ \Sigma_{1{}\mid{}1} {}={} & {\rm Var}[x_1 {}\mid{} y_0, y_1] = {\Sigma_{1{}\mid{}0}} {}-{} {\Sigma_{1{}\mid{}0}C_1^\intercal} ({C_1\Sigma_{1{}\mid{}0}C_1^\intercal + R_1 })^{-1} {C_1\Sigma_{1{}\mid{}0}}. \end{aligned}$$

We can now state the update equations of the Kalman filter

$$\begin{aligned} \text{Measurement} & \left[ \begin{array}{l} \hat{x}_{t{}\mid{}t} {}={} \hat{x}_{t{}\mid{}t-1} {}+{} \Sigma_{t{}\mid{}t-1}C_t^\intercal (C_t\Sigma_{t{}\mid{}t-1}C_t^\intercal + R_t)^{-1}(y_t - C_t\hat{x}_{t{}\mid{}t-1}) \\ \Sigma_{t{}\mid{}t} {}={} \Sigma_{t{}\mid{}t-1} {}-{} \Sigma_{t{}\mid{}t-1}C_t^\intercal (C_t\Sigma_{t{}\mid{}t-1}C_t^\intercal + R_t)^{-1} C_t\Sigma_{t{}\mid{}t-1} \end{array} \right. \\ \text{Time update} & \left[ \begin{array}{l} \hat{x}_{t+1{}\mid{}t} {}={} A_t \hat{x}_{t{}\mid{}t} \\ \Sigma_{t+1{}\mid{}t} {}={} A_t \Sigma_{t{}\mid{}t} A_t^\intercal + G_tQ_tG_t^\intercal \end{array} \right. \\ \text{Initial conditions} & \left[ \begin{array}{l} \hat{x}_{0{}\mid{}-1} {}={} \tilde{x}_0 \\ \Sigma_{0{}\mid{}-1} {}={} P_0 \end{array} \right. \end{aligned}$$

Note that we have defined: (i) $\hat{x}_{t{}\mid{}t} \coloneqq {\rm I\!E}[x_t {}\mid{} y_0, y_1, \ldots, y_t]$, (ii) $\hat{x}_{t+1{}\mid{}t} \coloneqq {\rm I\!E}[x_{t+1} {}\mid{} y_0, y_1, \ldots, y_t]$, (iii) $\Sigma_{t{}\mid{}t} \coloneqq {\rm Var}[x_t {}\mid{} y_0, y_1, \ldots, y_t]$, and (iv) $\Sigma_{t+1{}\mid{}t} \coloneqq {\rm Var}[x_{t+1} {}\mid{} y_0, y_1, \ldots, y_t]$.
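
The measurement and time updates above translate directly into code. Below is a minimal sketch (one of many possible implementations) for time-invariant matrices, in which the gain is obtained with `np.linalg.solve` rather than an explicit matrix inverse; all function and variable names are placeholders.

```
import numpy as np

def measurement_update(x_pred, Sigma_pred, y, C, R):
    """Compute x_{t|t} and Sigma_{t|t} from x_{t|t-1}, Sigma_{t|t-1} and y_t."""
    S = C @ Sigma_pred @ C.T + R                       # innovation covariance
    L = np.linalg.solve(S.T, (Sigma_pred @ C.T).T).T   # Sigma_{t|t-1} C' S^{-1}
    x_upd = x_pred + L @ (y - C @ x_pred)
    Sigma_upd = Sigma_pred - L @ C @ Sigma_pred
    return x_upd, Sigma_upd

def time_update(x_upd, Sigma_upd, A, G, Q):
    """Compute x_{t+1|t} and Sigma_{t+1|t} from x_{t|t} and Sigma_{t|t}."""
    x_pred = A @ x_upd
    Sigma_pred = A @ Sigma_upd @ A.T + G @ Q @ G.T
    return x_pred, Sigma_pred
```

Starting from $\hat{x}_{0\mid -1} = \tilde{x}_0$ and $\Sigma_{0\mid -1} = P_0$, we alternate `measurement_update` and `time_update` as measurements arrive.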

## How it works

The Kalman filter involves three steps.

**Initialisation.** At the beginning, an initialisation step is applied where we provide our *prior* information about the initial state in terms of a normal distribution. In particular, we need to provide an estimate of the initial state, $\tilde{x}_0$. This is our best guess about the state. Alongside, we need to provide the initial variance-covariance matrix, $P_0$. If we are not very confident about our guess, $P_0$ should be taken to be "large" (e.g., a diagonal matrix with large diagonal entries).

**Measurement update step.** Once a new measurement, $y_t$, is available, we apply the measurement update step where we update $\hat{x}_{t\mid t}$ and $\Sigma_{t\mid t}$. The Kalman filter can also be applied when measurements are available only intermittently: at time steps without a measurement we simply skip the measurement update and proceed directly to the time update (see the sketch after these steps).

**Time update step.** At every time instant, after the measurement update (if any), we need to apply the time update step where we predict the next state $\hat{x}_{t+1\mid t}$ and we determine the corresponding variance, $\Sigma_{t+1 \mid t}$ (see the Gauss-Markov model).
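
Putting the three steps together, a minimal driver loop might look as follows; it re-uses the `measurement_update` and `time_update` sketches from above, and marks a missing measurement with `None`, a convention chosen only for this illustration.

```
# Assumes the system data (A, G, C, Q, R), the prior (x_tilde_0, P0), a list of
# measurements ys (with None marking a missing measurement), and the functions
# measurement_update / time_update from the sketch above.
x_pred, Sigma_pred = x_tilde_0, P0                               # initialisation
for y in ys:
    if y is not None:                                            # measurement update
        x_upd, Sigma_upd = measurement_update(x_pred, Sigma_pred, y, C, R)
    else:                                                        # no measurement: skip
        x_upd, Sigma_upd = x_pred, Sigma_pred
    x_pred, Sigma_pred = time_update(x_upd, Sigma_upd, A, G, Q)  # time update
```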

## Tuning

The tuning of the Kalman filter consists mainly of choosing $Q$ and $R$ (which can be time-varying). A "small" $Q$ means that we trust the model, while a "small" $R$ means that we trust the sensors. $R$ is usually easier to determine, either by collecting data from the sensors in a controlled environment (which is often possible) or by consulting their datasheets. On the other hand, $Q$ can be more difficult to determine accurately, especially if we cannot measure the state directly. Often, both $Q$ and $R$ are determined by trial and error.


## At infinity

Suppose that $A_t = A$, $C_t = C$, $G_t = G$, $Q_t = Q$ and $R_t = R$. Then, the covariance matrices are updated according to

$$\Sigma_{t+1{}\mid{}t} {}={} A \Sigma_{t{}\mid{}t-1} A^\intercal {}-{} A \Sigma_{t{}\mid{}t-1}C^\intercal (C\Sigma_{t{}\mid{}t-1}C^\intercal + R)^{-1} C\Sigma_{t{}\mid{}t-1} A^\intercal {}+{} GQG^\intercal,$$

which is a **Riccati recursion**! We know that the Riccati recursion - under certain assumptions (left to the reader as an exercise; drop a comment below) - converges to a steady-state matrix $\Sigma_{\infty}$, i.e., $\Sigma_{t+1{}\mid{}t}\to\Sigma_\infty$ as $t\to\infty$. Note that the covariance matrices can be computed offline, without any measurements (they do not depend on $y_t$). The state estimates are conditional expectations, so the Kalman filter is the *best* estimator we can hope for (the minimum conditional variance estimator).


The limit covariance satisfies

$$\Sigma_{\infty} {}={} A \Sigma_{\infty} A^\intercal {}-{} A \Sigma_{\infty}C^\intercal (C\Sigma_{\infty}C^\intercal + R)^{-1} C\Sigma_{\infty} A^\intercal {}+{} GQG^\intercal,$$

that is, $\Sigma_{\infty}$ is the solution of a discrete-time algebraic Riccati equation (DARE). We can compute such a matrix in Python as follows

```
import numpy as np
import scipy.linalg as spla

A = np.array([[1.2, 0], [1, 0.5]])
C = np.array([[1, 3]])
Q = np.eye(2)          # process noise covariance
G = np.eye(2)
R = np.array([[4]])    # measurement noise covariance

# Filtering DARE, solved via the dual pair (A', C')
sigma_inf = spla.solve_discrete_are(A.T, C.T, G @ Q @ G.T, R)
```
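
Continuing the snippet above, we can also form the corresponding steady-state Kalman gain and verify, as a sanity check, that $\Sigma_\infty$ satisfies the DARE up to floating-point error.

```
# Continuing the snippet above (A, C, G, Q, R, sigma_inf already defined)
S_inf = C @ sigma_inf @ C.T + R
L_inf = sigma_inf @ C.T @ np.linalg.inv(S_inf)      # steady-state Kalman gain

# Sanity check: the DARE residual should vanish up to floating-point error
residual = (A @ sigma_inf @ A.T
            - A @ sigma_inf @ C.T @ np.linalg.inv(S_inf) @ C @ sigma_inf @ A.T
            + G @ Q @ G.T
            - sigma_inf)
print(np.linalg.norm(residual))   # close to zero
```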

## Joseph form of covariance update

Here we will show an equivalent form of the measurement update of the covariance matrix, known as the Joseph update.

**Proposition III.1 (Joseph form).** The measurement update of the covariance matrix can be written as

$$\Sigma_{t\mid{}t} {}={} (I-L_t C_t)\Sigma_{t\mid{}t-1}(I-L_tC_t)^\intercal + L_tR_tL_t^\intercal,\tag{13}$$

where $L_t$ is the *Kalman gain* matrix given by

$$L_t = \Sigma_{t{}\mid{}t-1}C_t^\intercal (C_t\Sigma_{t{}\mid{}t-1}C_t^\intercal + R_t)^{-1}.\tag{14}$$

As we will see later, the Joseph form is more convenient in some cases; for instance, it preserves the symmetry and positive semidefiniteness of $\Sigma_{t\mid t}$ in the presence of round-off errors.
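
As a quick numerical check of Proposition III.1 (not a proof), the sketch below verifies that the Joseph form (13) agrees with the standard covariance update for randomly generated, hypothetical $\Sigma_{t\mid t-1}$, $C_t$ and $R_t$.

```
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data for the check (illustration only)
n, m = 3, 2
M = rng.standard_normal((n, n)); Sigma_pred = M @ M.T + n * np.eye(n)   # Sigma_{t|t-1} > 0
C = rng.standard_normal((m, n))
M = rng.standard_normal((m, m)); R = M @ M.T + m * np.eye(m)            # R_t > 0

S = C @ Sigma_pred @ C.T + R
L = Sigma_pred @ C.T @ np.linalg.inv(S)                                 # Kalman gain, Eq. (14)

Sigma_standard = Sigma_pred - L @ C @ Sigma_pred                        # standard update
I = np.eye(n)
Sigma_joseph = (I - L @ C) @ Sigma_pred @ (I - L @ C).T + L @ R @ L.T   # Joseph form, Eq. (13)

print(np.allclose(Sigma_standard, Sigma_joseph))   # True
```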