The big issue with the Kalman filter is that it involves the inverse of a matrix. When we update the covariance matrix, numerical issues may lead to covariances that are not positive semidefinite. Moreover, the condition number of the covariance can be very high. As a result, the Kalman filter, in its standard form, is not suitable for embedded applications with fixed-point arithmetic [1].
In this post we present the square-root form of the Kalman filter by Tracy [2], which uses only QR factorisations, and we give code snippets in Python.
Matrix factorisations
Cholesky factorisation
The Cholesky factorisation of a symmetric positive definite matrix can be thought of as a square root of the matrix. Every symmetric positive definite matrix, $M$, can be written as
$$M=U^\intercal U,\tag{1}$$
where $U$ is an upper triangular matrix with positive diagonal entries. We introduce the notation $U = {\sf chol}(M)$.
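As a quick check in Python (note that np.linalg.cholesky returns a lower triangular factor, so we transpose it to match the convention of Equation (1)):
import numpy as np
M = np.array([[4.0, 1.0],
              [1.0, 3.0]])
# np.linalg.cholesky returns a lower triangular L with M = L L^T;
# transposing it yields the upper triangular U of Equation (1)
U = np.linalg.cholesky(M).T
assert np.allclose(U.T @ U, M)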
QR factorisation
Any matrix $M\in{\rm I\!R}^{m\times n}$ with $m\geq n$ admits a (thin) QR factorisation
$$M = QR\tag{2}$$
where $Q\in{\rm I\!R}^{m\times n}$ has orthonormal columns - therefore, $Q^\intercal{}Q=I$ - and $R\in{\rm I\!R}^{n\times n}$ is upper triangular. For convenience, let us define the operator ${\sf qr}$ that maps $M$ to such a pair $(Q, R)$. We write
$$(Q, R) = {\sf qr}(M).\tag{3}$$
The individual matrices $Q$ and $R$ will be denoted by ${\sf qr}_Q(M)$ and ${\sf qr}_R(M)$. We will not go into details regarding the uniqueness of the QR factorisation in this post.
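For example, using np.linalg.qr, which computes the thin factorisation by default:
import numpy as np
M = np.random.rand(5, 3)
Q, R = np.linalg.qr(M)                    # thin QR: Q is 5-by-3, R is 3-by-3
assert np.allclose(Q.T @ Q, np.eye(3))    # orthonormal columns
assert np.allclose(R, np.triu(R))         # R is upper triangular
assert np.allclose(Q @ R, M)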
Cholesky of sum of matrices
The following lemma allows us to determine the Cholesky factor of the sum of two symmetric positive definite matrices, $A+B$, from the Cholesky factors of $A$ and $B$. We will see that this can be done at the cost of a QR factorisation.
Lemma VIII.1 (Cholesky of sum). Let $A$ and $B$ be symmetric positive definite matrices. Then
$${\sf chol}(A+B) = {\sf qr}_R\left(\begin{bmatrix}{\sf chol}(A) \\ {\sf chol}(B)\end{bmatrix}\right).\tag{4}$$
Proof. Let $U_A = {\sf chol}(A)$ and $U_B = {\sf chol}(B)$. Then
$$A + B = U_A^\intercal U_A + U_B^\intercal U_B = [U_A^\intercal ~~~ U_B^\intercal] \begin{bmatrix}U_A \\ U_B\end{bmatrix}.\tag{5}$$
Then we perform the following QR factorisation
$$\begin{bmatrix}U_A \\ U_B\end{bmatrix} = QR,\tag{6}$$
so
$$A + B = [U_A^\intercal ~~~ U_B^\intercal]\begin{bmatrix}U_A \\ U_B\end{bmatrix} = R^\intercal Q^\intercal Q R = R^\intercal R,\tag{7}$$
therefore, $R$ is the (upper triangular) Cholesky factor of $A+B$. $\Box$
We can write the result of Lemma VIII.1 as follows.
Let $U$ and $V$ be matrices with the same number of columns. Then,
$$U^\intercal U + V^\intercal V = W^\intercal W,\tag{8a}$$
where $W$ is the upper triangular matrix
$$W = {\sf qr}_R\left(\begin{bmatrix}U \\ V\end{bmatrix}\right).\tag{8b}$$
Here is an example of applying Lemma VIII.1 in Python.
import numpy as np

n = 3

def random_symm_posdef(n):
    # Generate a random n-by-n symmetric positive definite matrix
    W = np.random.rand(n, n)
    return W.T @ W + np.eye(n)

A = random_symm_posdef(n)
B = random_symm_posdef(n)
# Upper triangular Cholesky factors
# (np.linalg.cholesky returns a lower triangular factor, so we transpose)
uA = np.linalg.cholesky(A).T
uB = np.linalg.cholesky(B).T
# R factor of the QR factorisation of [uA; uB]
uAB = np.vstack((uA, uB))
_, r = np.linalg.qr(uAB)
error = r.T @ r - (A + B)
assert np.linalg.norm(error, np.inf) < 1e-12
Solving linear systems
Both the Cholesky and the QR factorisations can be used to solve linear systems. For example, using the Cholesky factorisation of a symmetric positive definite matrix $A\in{\rm I\!R}^{n\times n}$, that is, $A = U^\intercal U$, we can solve the linear system
$$Ax = b \Leftrightarrow x = A^{-1}b = (U^{\intercal}U)^{-1}b = U^{-1}U^{-\intercal}b.\tag{9}$$
Equivalently, solving the linear system $Ax = b$ amounts to solving the two triangular systems
$$\begin{aligned}U^\intercal z {}={}& b,\\U x {}={}& z.\end{aligned}$$
Since $U$ is triangular, the above two systems can be solved easily by forward and backward substitution.
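Here is a minimal sketch of these two triangular solves using scipy.linalg.solve_triangular (the matrix below is an arbitrary positive definite example):
import numpy as np
import scipy.linalg as spla
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([[1.0, 2.0, 3.0]]).T
U = np.linalg.cholesky(A).T                             # upper triangular, A = U^T U
z = spla.solve_triangular(U, b, lower=False, trans=1)   # solve U^T z = b
x = spla.solve_triangular(U, z, lower=False)            # solve U x = z
assert np.linalg.norm(A @ x - b, np.inf) < 1e-12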
Likewise, for the QR factorisation, $A=QR$, we have
$$\begin{aligned}Ax = b \Leftrightarrow{}& QRx = b \\ \Leftrightarrow{}& Rx = Q^{\intercal}b \\ \Leftrightarrow{}& x = R^{-1}Q^{\intercal}b,\end{aligned}$$
where the last step is, again, a triangular solve.
Let us give an example in Python using scipy.linalg.solve_triangular.
import numpy as np
import scipy.linalg as spla

A = np.array([[3, -1, 1],
              [5, 4, 2],
              [1, 2, -10]])
b = np.array([[1, 2, 3]]).T
# Solve Ax = b via the QR factorisation of A
Q, R = np.linalg.qr(A)
x = spla.solve_triangular(R, Q.T @ b, lower=False)
assert np.linalg.norm(A @ x - b, np.inf) < 1e-12
Square root form of the Kalman filter
Tracy in [2] presents a square-root Kalman filter that uses only QR decompositions. The idea is to formulate the measurement and time update steps of the Kalman filter in such a way that we update the Cholesky factors of the covariances. Hereafter we will assume that the system matrices, $A$, $C$, $G$, $Q$, and $R$, are time-invariant.
Initialisation
We initialise the Kalman filter by setting $\hat{x}_{0\mid -1} = \tilde{x}_0$ and $\Sigma_{0\mid -1} = P_0$ as we did earlier. Next we perform a Cholesky factorisation of $\Sigma_{0\mid -1}$. We define
$$F_{0\mid -1} {}={} {\sf chol}(\Sigma_{0\mid -1}).\tag{10}$$
We have that $\Sigma_{0\mid -1} = F_{0\mid -1}^{\intercal} F_{0\mid -1}$.
Moreover, we determine the Cholesky factorisations of $Q$ and $R$; we define
$$\begin{aligned}\Theta_Q ={}& {\sf chol}(Q), \\ \Theta_R ={}& {\sf chol}(R).\end{aligned}$$
Measurement update
Measurement update of the state
Recall that the measurement update equation for the state is
$$\hat{x}_{t{}\mid{}t} {}={} \hat{x}_{t{}\mid{}t-1} {}+{} {\color{blue}\Sigma_{t{}\mid{}t-1}C^\intercal (C\Sigma_{t{}\mid{}t-1}C^\intercal + R)^{-1}}(y_t - C\hat{x}_{t{}\mid{}t-1}).\tag{11}$$
The quantity of interest is the matrix in blue. Let us define
$$L_t = \Sigma_{t{}\mid{}t-1}C^\intercal (C\Sigma_{t{}\mid{}t-1}C^\intercal + R)^{-1}.\tag{12}$$
We have
$$ L_t {}={} F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal (C F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal + \Theta_R^\intercal \Theta_R)^{-1}. \tag{13}$$
We can now invoke Lemma VIII.1 to factorise $C F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal + \Theta_R^\intercal \Theta_R$. It is
$$ C F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal + \Theta_R^\intercal \Theta_R = \Gamma_t^\intercal \Gamma_t,\tag{14}$$
where $\Gamma_t$ is the upper triangular matrix
$$\Gamma_t = {\sf qr}_R\left( \begin{bmatrix}F_{t{}\mid{}t-1} C^\intercal \\ \Theta_R\end{bmatrix} \right).\tag{15}$$
So from Equation (13),
$$L_t = F_{t{}\mid{}t-1}^{\intercal} F_{t{}\mid{}t-1} C^\intercal \Gamma_t^{-1} \Gamma_t^{-\intercal}.\tag{16}$$
Note that
$$L_t = (\Gamma_t^{-1}\Gamma_t^{-\intercal}CF_{t\mid t-1}^\intercal F_{t\mid t-1})^\intercal,\tag{17}$$
and the multiplication by $\Gamma_t^{-1}$ and $\Gamma_t^{-\intercal}$ amounts to solving a linear system with a triangular matrix, which can be done easily.
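As a sketch, the gain of Equation (17) could be computed with two calls to scipy.linalg.solve_triangular; the function and variable names below are ours, for illustration only:
import scipy.linalg as spla

def kalman_gain(F_pred, C, gamma):
    # L_t = (Gamma_t^{-1} Gamma_t^{-T} C F^T F)^T, cf. Equation (17)
    c_sigma = C @ (F_pred.T @ F_pred)     # C Sigma_{t|t-1}
    # Two triangular solves instead of forming the inverse of Gamma_t
    tmp = spla.solve_triangular(gamma, c_sigma, lower=False, trans=1)
    return spla.solve_triangular(gamma, tmp, lower=False).T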
Measurement update of the covariance
Here we will use the Joseph form of the measurement update, that is
$$\begin{aligned} \Sigma_{t\mid t} {}={}& (I-L_tC)\Sigma_{t\mid t-1}(I-L_tC)^\intercal + L_t R L_t^\intercal \\ {}={}& (I-L_tC)\Sigma_{t\mid t-1}(I-L_tC)^\intercal + L_t \Theta_R^\intercal \Theta_R L_t^\intercal, \end{aligned}\tag{18}$$
so from Lemma VIII.1,
$$F_{t\mid t} {}={} {\sf qr}_R\left( \begin{bmatrix}F_{t\mid t-1}(I-L_tC)^\intercal \\ \Theta_R L_t^\intercal\end{bmatrix} \right).\tag{19}$$
Time update
The time update step of the state estimate, $\hat{x}_{t+1\mid t}$, remains the same, that is,
$$\hat{x}_{t+1\mid t} = A\hat{x}_{t\mid t}.\tag{20}$$
For the time update step of the covariance we have
$$\begin{aligned} \Sigma_{t+1{}\mid{}t} {}={}& A \Sigma_{t{}\mid{}t} A^\intercal + GQG^\intercal \\ {}={}& A F_{t{}\mid{}t}^{\intercal} F_{t{}\mid{}t} A^\intercal + G\Theta_Q^{\intercal} \Theta_Q G^\intercal. \end{aligned}\tag{21}$$
From Lemma VIII.1,
$$F_{t+1\mid t} {}={} {\sf qr}_R\left( \begin{bmatrix} F_{t\mid t} A^{\intercal} \\ \Theta_{Q}G^{\intercal} \end{bmatrix} \right).\tag{22}$$
Putting it all together
We end up with the following update equations:
$$\begin{aligned} \text{Initial conditions} & \left[ \begin{array}{l} \hat{x}_{0{}\mid{}-1} {}={} \tilde{x}_0 \\ F_{0{}\mid{}-1} {}={} {\sf chol}(P_0) \\ \Theta_{Q} {}={} {\sf chol}(Q) \\ \Theta_{R} {}={} {\sf chol}(R) \end{array} \right. \\ \text{Measurement} & \left[ \begin{array}{l} \Gamma_t = {\sf qr}_R\left( \begin{bmatrix}F_{t{}\mid{}t-1} C^\intercal \\ \Theta_R\end{bmatrix} \right) \\ e_t = y_t - C\hat{x}_{t{}\mid{}t-1} \\ L_t = (\Gamma_t^{-1}\Gamma_t^{-\intercal}CF_{t\mid t-1}^\intercal F_{t\mid t-1})^\intercal \\ \hat{x}_{t{}\mid{}t} {}={} \hat{x}_{t{}\mid{}t-1} {}+{} L_t e_t \\ F_{t\mid t} {}={} {\sf qr}_R\left( \begin{bmatrix}F_{t\mid t-1}(I-L_tC)^\intercal \\ \Theta_R L_t^\intercal\end{bmatrix} \right) \end{array} \right. \\ \text{Time update} & \left[ \begin{array}{l} \hat{x}_{t+1{}\mid{}t} {}={} A \hat{x}_{t{}\mid{}t} \\ F_{t+1{}\mid{}t} {}={} {\sf qr}_R \left( \begin{bmatrix} F_{t\mid t}A^{\intercal} \\ \Theta_Q G^{\intercal} \end{bmatrix} \right) \end{array} \right. \end{aligned}$$
We see that each iteration of the filter involves three QR factorisations: two in the measurement update and one in the time update.
Example in Python
Here is a simple implementation of the above QR-based square-root Kalman filter in Python.
import numpy as np
import scipy.linalg as spla

class SquareRootKalmanFilter:

    def __init__(self, Q, R, A, C, G, x0, P0):
        # Upper triangular Cholesky factors of the covariances
        # (np.linalg.cholesky returns a lower triangular factor, so we transpose)
        self.F_measurement_chol = None
        self.F_predicted_chol = np.linalg.cholesky(P0).T
        # System matrices
        self.A = A
        self.C = C
        self.G = G
        # Initialisation
        self.state_measurement = None
        self.state_predicted = x0
        self.theta_Q = np.linalg.cholesky(Q).T
        self.theta_R = np.linalg.cholesky(R).T

    @staticmethod
    def qrr(a, b):
        # R factor of the QR factorisation of [a; b] (Lemma VIII.1)
        _, r = np.linalg.qr(np.vstack((a, b)))
        return r

    def time_update(self):
        # Equations (20) and (22)
        self.state_predicted = self.A @ self.state_measurement
        self.F_predicted_chol = SquareRootKalmanFilter.qrr(
            self.F_measurement_chol @ self.A.T,
            self.theta_Q @ self.G.T)

    def measurement_update(self, y):
        nx = self.A.shape[0]
        # Compute Gamma_t (Equation (15))
        gamma = SquareRootKalmanFilter.qrr(
            self.F_predicted_chol @ self.C.T,
            self.theta_R)
        # Compute the Kalman gain (Equation (17)): two triangular solves
        # instead of forming the inverse of Gamma_t
        sigma_pred = self.F_predicted_chol.T @ self.F_predicted_chol
        c_sigma_pred = self.C @ sigma_pred
        l_temp = spla.solve_triangular(gamma, c_sigma_pred, lower=False, trans=1)
        kalman_gain = spla.solve_triangular(gamma, l_temp, lower=False).T
        # Compute the output error
        err = y - self.C @ self.state_predicted
        # State update (Equation (11))
        self.state_measurement = self.state_predicted + kalman_gain @ err
        # Covariance update in Joseph form (Equation (19))
        self.F_measurement_chol = SquareRootKalmanFilter.qrr(
            self.F_predicted_chol @ (np.eye(nx) - kalman_gain @ self.C).T,
            self.theta_R @ kalman_gain.T)
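As a usage sketch of the class above, here is how the filter could be run on a hypothetical system; the matrices and the simulated measurements below are made up for illustration:
np.random.seed(0)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
x0 = np.zeros((2, 1))
P0 = np.eye(2)

kf = SquareRootKalmanFilter(Q, R, A, C, G, x0, P0)
x = np.array([[1.0], [0.5]])    # true state (unknown to the filter)
for t in range(50):
    # Simulate a noisy measurement and run one filter iteration
    y = C @ x + np.linalg.cholesky(R) @ np.random.randn(1, 1)
    kf.measurement_update(y)
    kf.time_update()
    x = A @ x + G @ np.linalg.cholesky(Q) @ np.random.randn(2, 1)
print(kf.state_predicted)       # predicted state estimate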
References
[1] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Dover Publications, 2005.
[2] K. Tracy, A square-root Kalman filter using only QR decompositions, arXiv:2208.06452, 2022.
Previous post: Kalman Filter VII: Recursive maximum a posteriori
Read next: The Kalman Filter IX: Information filter (coming soon)