The big issue with the Kalman filter is that it involves the inverse of a matrix. When we update the covariance matrix, numerical issues may lead to covariances that are not positive semidefinite. Moreover, the condition number of the covariance can be very high. As a result, the Kalman filter, in its standard form, is not suitable for embedded applications with fixed-point arithmetic [1].
In this post we present the square-root form of the Kalman filter by Tracy [2], which uses only QR factorisations, and we give code snippets in Python.
Matrix factorisations
Cholesky factorisation
The Cholesky factorisation of a symmetric positive definite matrix can be thought of as a square root of the matrix. Every symmetric positive definite matrix, $M$, can be written as
$$M=U^\intercal U,\tag{1}$$
where $U$ is an upper triangular matrix with positive diagonal entries. We introduce the notation $U = {\sf chol}(M)$.
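As a quick check in Python (note that np.linalg.cholesky returns a lower triangular factor, so we transpose it to match the convention of Equation (1)):
import numpy as np
M = np.array([[4.0, 1.0],
              [1.0, 3.0]])
# np.linalg.cholesky returns a lower triangular L with M = L L^T;
# transposing it yields the upper triangular U of Equation (1)
U = np.linalg.cholesky(M).T
assert np.allclose(U.T @ U, M)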
QR factorisation
Any matrix $M\in{\rm I\!R}^{m\times n}$ with $m\geq n$ admits a (thin) QR factorisation
$$M = QR\tag{2}$$
where $Q\in{\rm I\!R}^{m\times n}$ has orthonormal columns - therefore, $Q^\intercal{}Q=I$ - and $R\in{\rm I\!R}^{n\times n}$ is upper triangular. For convenience, let us define the operator ${\sf qr}$ that maps $M$ to such a pair $(Q, R)$. We write
$$(Q, R) = {\sf qr}(M).\tag{3}$$
The individual matrices $Q$ and $R$ will be denoted by ${\sf qr}_Q(M)$ and ${\sf qr}_R(M)$. We will not go into details regarding the uniqueness of the QR factorisation in this post.
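For example, using np.linalg.qr, which computes the thin factorisation by default:
import numpy as np
M = np.random.rand(5, 3)
Q, R = np.linalg.qr(M)                    # thin QR: Q is 5-by-3, R is 3-by-3
assert np.allclose(Q.T @ Q, np.eye(3))    # orthonormal columns
assert np.allclose(R, np.triu(R))         # R is upper triangular
assert np.allclose(Q @ R, M)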
Cholesky of sum of matrices
The following lemma allows us to determine the Cholesky factor of the sum of two symmetric positive definite matrices, $A+B$, from the Cholesky factors of $A$ and $B$. We will see that this can be done at the cost of a QR factorisation.
Lemma VIII.1 (Cholesky of sum). Let $A$ and $B$ be symmetric positive definite matrices. Then
$${\sf chol}(A+B) = {\sf qr}_R\left(\begin{bmatrix}{\sf chol}(A) \\ {\sf chol}(B)\end{bmatrix}\right).\tag{4}$$
Proof. Let $U_A = {\sf chol}(A)$ and $U_B = {\sf chol}(B)$. Then
$$A + B = U_A^\intercal U_A + U_B^\intercal U_B = [U_A^\intercal ~~~ U_B^\intercal] \begin{bmatrix}U_A \\ U_B\end{bmatrix}.\tag{5}$$
Then we perform the following QR factorisation
$$\begin{bmatrix}U_A \\ U_B\end{bmatrix} = QR,\tag{6}$$
so
$$A + B = [U_A^\intercal ~~~ U_B^\intercal]\begin{bmatrix}U_A \\ U_B\end{bmatrix} = R^\intercal Q^\intercal Q R = R^\intercal R,\tag{7}$$
therefore, $R$ is the (upper triangular) Cholesky factor of $A+B$. $\Box$
We can write the result of Lemma VIII.1 as follows.
Let $U$ and $V$ be matrices with the same number of columns. Then,
$$U^\intercal U + V^\intercal V = W^\intercal W,\tag{8a}$$
where $W$ is the upper triangular matrix
$$W = {\sf qr}_R\left(\begin{bmatrix}U \\ V\end{bmatrix}\right).\tag{8b}$$
Here is an example of applying Lemma VIII.1 in Python.
import numpy as np

n = 3

def random_symm_posdef(n):
    # Generate a random n-by-n symmetric positive definite matrix
    W = np.random.rand(n, n)
    return W.T @ W + np.eye(n)

A = random_symm_posdef(n)
B = random_symm_posdef(n)
# Upper triangular Cholesky factors
# (np.linalg.cholesky returns a lower triangular factor, so we transpose)
uA = np.linalg.cholesky(A).T
uB = np.linalg.cholesky(B).T
# R factor of the QR factorisation of [uA; uB]
uAB = np.vstack((uA, uB))
_, r = np.linalg.qr(uAB)
error = r.T @ r - (A + B)
assert np.linalg.norm(error, np.inf) < 1e-12
Solving linear systems
Both the Cholesky and the QR factorisations can be used to solve linear systems. For example, using the Cholesky factorisation of a symmetric positive definite matrix $A\in{\rm I\!R}^{n\times n}$, that is, $A = U^\intercal U$, we can solve the linear system
$$Ax = b \Leftrightarrow x = A^{-1}b = (U^{\intercal}U)^{-1}b = U^{-1}U^{-\intercal}b.\tag{9}$$
Equivalently, solving the linear system $Ax = b$ amounts to solving the two triangular systems
$$\begin{aligned}U^\intercal z {}={}& b,\\U x {}={}& z.\end{aligned}$$
Since $U$ is triangular, the above two systems can be solved easily by forward and backward substitution.
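Here is a minimal sketch of these two triangular solves using scipy.linalg.solve_triangular (the matrix below is an arbitrary positive definite example):
import numpy as np
import scipy.linalg as spla
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([[1.0, 2.0, 3.0]]).T
U = np.linalg.cholesky(A).T                             # upper triangular, A = U^T U
z = spla.solve_triangular(U, b, lower=False, trans=1)   # solve U^T z = b
x = spla.solve_triangular(U, z, lower=False)            # solve U x = z
assert np.linalg.norm(A @ x - b, np.inf) < 1e-12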
Likewise, for the QR factorisation, $A=QR$, we have
$$\begin{aligned}Ax = b \Leftrightarrow{}& QRx = b \\ \Leftrightarrow{}& Rx = Q^{\intercal}b \\ \Leftrightarrow{}& x = R^{-1}Q^{\intercal}b,\end{aligned}$$
where the last step is, again, a triangular solve.
Let us give an example in Python using scipy.linalg.solve_triangular.
import numpy as np
import scipy.linalg as spla

A = np.array([[3, -1, 1],
              [5, 4, 2],
              [1, 2, -10]])
b = np.array([[1, 2, 3]]).T
# Solve Ax = b via the QR factorisation of A
Q, R = np.linalg.qr(A)
x = spla.solve_triangular(R, Q.T @ b, lower=False)
assert np.linalg.norm(A @ x - b, np.inf) < 1e-12
Square root form of the Kalman filter
Tracy in [2] presents a square-root Kalman filter that uses only QR decompositions. The idea is to formulate the measurement and time update steps of the Kalman filter in such a way that we update the Cholesky factors of the covariances. Hereafter we will assume that the system matrices, $A$, $C$, $G$, $Q$, and $R$, are time-invariant.
Initialisation
We initialise the Kalman filter by setting $\hat{x}_{0\mid -1} = \tilde{x}_0$ and $\Sigma_{0\mid -1} = P_0$ as we did earlier. Next we perform a Cholesky factorisation of $\Sigma_{0\mid -1}$. We define
$$F_{0\mid -1} {}={} {\sf chol}(\Sigma_{0\mid -1}).\tag{10}$$
We have that $\Sigma_{0\mid -1} = F_{0\mid -1}^{\intercal} F_{0\mid -1}$.
Moreover, we determine the Cholesky factorisations of $Q$ and $R$; we define
$$\begin{aligned}\Theta_Q ={}& {\sf chol}(Q), \\ \Theta_R ={}& {\sf chol}(R).\end{aligned}$$
Measurement update
Measurement update of the state
Recall that the measurement update equation for the state is
$$\hat{x}_{t{}\mid{}t} {}={} \hat{x}_{t{}\mid{}t-1} {}+{} {\color{blue}\Sigma_{t{}\mid{}t-1}C^\intercal (C\Sigma_{t{}\mid{}t-1}C^\intercal + R)^{-1}}(y_t - C\hat{x}_{t{}\mid{}t-1}).\tag{11}$$
The quantity of interest is the matrix in blue. Let us define
$$L_t = \Sigma_{t{}\mid{}t-1}C^\intercal (C\Sigma_{t{}\mid{}t-1}C^\intercal + R)^{-1}.\tag{12}$$
We have
$$ L_t {}={} F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal (C F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal + \Theta_R^\intercal \Theta_R)^{-1}. \tag{13}$$
We can now invoke Lemma VIII.1 to factorise $C F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal + \Theta_R^\intercal \Theta_R$. It is
$$ C F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal + \Theta_R^\intercal \Theta_R = \Gamma_t^\intercal \Gamma_t,\tag{14}$$
where $\Gamma_t$ is the upper triangular matrix
$$\Gamma_t = {\sf qr}_R\left( \begin{bmatrix}F_{t{}\mid{}t-1} C^\intercal \\ \Theta_R\end{bmatrix} \right).\tag{15}$$
So from Equation (13),
$$L_t = F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal \Gamma_t^{-1} \Gamma_t^{-\intercal}.\tag{16}$$
Note that
$$L_t = (\Gamma_t^{-1}\Gamma_t^{-\intercal}CF_{t\mid t-1}^\intercal F_{t\mid t-1})^\intercal,\tag{17}$$
and the multiplication by $\Gamma_t^{-1}$ and $\Gamma_t^{-\intercal}$ amounts to solving a linear system with a triangular matrix, which can be done easily.
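As a sketch, the gain of Equation (17) could be computed with two calls to scipy.linalg.solve_triangular; the function and variable names below are ours, for illustration only:
import scipy.linalg as spla

def kalman_gain(F_pred, C, gamma):
    # L_t = (Gamma_t^{-1} Gamma_t^{-T} C F^T F)^T, cf. Equation (17)
    c_sigma = C @ (F_pred.T @ F_pred)     # C Sigma_{t|t-1}
    # Two triangular solves instead of forming the inverse of Gamma_t
    tmp = spla.solve_triangular(gamma, c_sigma, lower=False, trans=1)
    return spla.solve_triangular(gamma, tmp, lower=False).T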
Measurement update of the covariance
Here we will use the Joseph form of the measurement update, that is
$$\begin{aligned} \Sigma_{t\mid t} {}={}& (I-L_tC)\Sigma_{t\mid t-1}(I-L_tC)^\intercal + L_t R L_t^\intercal \\ {}={}& (I-L_tC)\Sigma_{t\mid t-1}(I-L_tC)^\intercal + L_t \Theta_R^\intercal \Theta_R L_t^\intercal, \end{aligned}\tag{18}$$
so from Lemma VIII.1,
$$F_{t\mid t} {}={} {\sf qr}_R\left( \begin{bmatrix}F_{t\mid t-1}(I-L_tC)^\intercal \\ \Theta_R L_t^\intercal\end{bmatrix} \right).\tag{19}$$
Time update
The time update step of the state estimate, $\hat{x}_{t+1\mid t}$, remains the same, that is,
$$\hat{x}_{t+1\mid t} = A\hat{x}_{t\mid t}.\tag{20}$$
For the time update step of the covariance we have
$$\begin{aligned} \Sigma_{t+1{}\mid{}t} {}={}& A \Sigma_{t{}\mid{}t} A^\intercal + GQG^\intercal \\ {}={}& A F_{t{}\mid{}t}^{\intercal} F_{t{}\mid{}t} A^\intercal + G\Theta_Q^{\intercal} \Theta_Q G^\intercal. \end{aligned}\tag{21}$$
From Lemma VIII.1,
$$F_{t+1\mid t} {}={} {\sf qr}_R\left( \begin{bmatrix} F_{t\mid t} A^{\intercal} \\ \Theta_{Q}G^{\intercal} \end{bmatrix} \right).\tag{22}$$
Putting it all together
We end up with the following update equations:
$$\begin{aligned} \text{Initial conditions} & \left[ \begin{array}{l} \hat{x}_{0{}\mid{}-1} {}={} \tilde{x}_0 \\ F_{0{}\mid{}-1} {}={} {\sf chol}(P_0) \\ \Theta_{Q} {}={} {\sf chol}(Q) \\ \Theta_{R} {}={} {\sf chol}(R) \end{array} \right. \\ \text{Measurement} & \left[ \begin{array}{l} \Gamma_t = {\sf qr}_R\left( \begin{bmatrix}F_{t{}\mid{}t-1} C^\intercal \\ \Theta_R\end{bmatrix} \right) \\ e_t = y_t - C\hat{x}_{t{}\mid{}t-1} \\ L_t = (\Gamma_t^{-1}\Gamma_t^{-\intercal}CF_{t\mid t-1}^\intercal F_{t\mid t-1})^\intercal \\ \hat{x}_{t{}\mid{}t} {}={} \hat{x}_{t{}\mid{}t-1} {}+{} L_t e_t \\ F_{t\mid t} {}={} {\sf qr}_R\left( \begin{bmatrix}F_{t\mid t-1}(I-L_tC)^\intercal \\ \Theta_R L_t^\intercal\end{bmatrix} \right) \end{array} \right. \\ \text{Time update} & \left[ \begin{array}{l} \hat{x}_{t+1{}\mid{}t} {}={} A \hat{x}_{t{}\mid{}t} \\ F_{t+1{}\mid{}t} {}={} {\sf qr}_R \left( \begin{bmatrix} F_{t\mid t}A^{\intercal} \\ \Theta_Q G^{\intercal} \end{bmatrix} \right) \end{array} \right. \end{aligned}$$
We see that each iteration of the filter involves three QR factorisations: two in the measurement update and one in the time update.
Example in Python
Here is a simple implementation of the above QR-based square-root Kalman filter in Python.
import numpy as np
import scipy.linalg as spla

class SquareRootKalmanFilter:

    def __init__(self, Q, R, A, C, G, x0, P0):
        # Upper triangular Cholesky factors of the covariances
        # (np.linalg.cholesky returns a lower triangular factor, so we transpose)
        self.F_measurement_chol = None
        self.F_predicted_chol = np.linalg.cholesky(P0).T
        # System matrices
        self.A = A
        self.C = C
        self.G = G
        # Initialisation
        self.state_measurement = None
        self.state_predicted = x0
        self.theta_Q = np.linalg.cholesky(Q).T
        self.theta_R = np.linalg.cholesky(R).T

    @staticmethod
    def qrr(a, b):
        # R factor of the QR factorisation of [a; b] (Lemma VIII.1)
        _, r = np.linalg.qr(np.vstack((a, b)))
        return r

    def time_update(self):
        # Equations (20) and (22)
        self.state_predicted = self.A @ self.state_measurement
        self.F_predicted_chol = SquareRootKalmanFilter.qrr(
            self.F_measurement_chol @ self.A.T,
            self.theta_Q @ self.G.T)

    def measurement_update(self, y):
        nx = self.A.shape[0]
        # Compute Gamma_t (Equation (15))
        gamma = SquareRootKalmanFilter.qrr(
            self.F_predicted_chol @ self.C.T,
            self.theta_R)
        # Compute the Kalman gain (Equation (17)): two triangular solves
        # instead of forming the inverse of Gamma_t
        sigma_pred = self.F_predicted_chol.T @ self.F_predicted_chol
        c_sigma_pred = self.C @ sigma_pred
        l_temp = spla.solve_triangular(gamma, c_sigma_pred, lower=False, trans=1)
        kalman_gain = spla.solve_triangular(gamma, l_temp, lower=False).T
        # Compute the output error
        err = y - self.C @ self.state_predicted
        # State update (Equation (11))
        self.state_measurement = self.state_predicted + kalman_gain @ err
        # Covariance update in Joseph form (Equation (19))
        self.F_measurement_chol = SquareRootKalmanFilter.qrr(
            self.F_predicted_chol @ (np.eye(nx) - kalman_gain @ self.C).T,
            self.theta_R @ kalman_gain.T)
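As a usage sketch of the class above, here is how the filter could be run on a hypothetical system; the matrices and the simulated measurements below are made up for illustration:
np.random.seed(0)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x0 = np.zeros((2, 1))
P0 = np.eye(2)

kf = SquareRootKalmanFilter(Q, R, A, C, G, x0, P0)
x = np.array([[1.0], [0.5]])    # true state (unknown to the filter)
for t in range(50):
    # Simulate a noisy measurement and run one filter iteration
    y = C @ x + np.linalg.cholesky(R) @ np.random.randn(1, 1)
    kf.measurement_update(y)
    kf.time_update()
    x = A @ x + G @ np.linalg.cholesky(Q) @ np.random.randn(2, 1)
print(kf.state_predicted)       # predicted state estimate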
References
[1] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Dover Publications, 2005.
[2] K. Tracy, A square-root Kalman filter using only QR decompositions, arXiv:2208.06452, 2022.
Previous post: Kalman Filter VII: Recursive maximum a posteriori
Read next: The Kalman Filter IX: Information filter (coming soon)