This will be a short blog post on the Rayleigh quotient of a symmetric matrix, \( A\), which is defined as
$$R_A(x) = \frac{x^\intercal A x}{\|x\|^2},$$
for \( x \in\mathbb{R}^n\), with \( x \neq 0\).
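If you would like to play along, here is a minimal NumPy sketch of this definition; the matrix \( A\) and the vector \( x\) below are arbitrary choices, purely for illustration.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """Rayleigh quotient R_A(x) = (x^T A x) / ||x||^2, for x != 0."""
    x = np.asarray(x, dtype=float)
    return float(x @ A @ x) / float(x @ x)

# A small symmetric matrix and an arbitrary nonzero vector
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, -1.0, 2.0])
print(rayleigh_quotient(A, x))
```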
Some properties
Firstly, let us prove the following useful preliminary result [1, Section 2.1, Exercise 6 (p19)]:
Lemma 1. Let the function \( f: \mathbb{R}^n\setminus\{0\} \to \mathbb{R}\) be continuous, satisfying \( f(\lambda x) = f(x)\) for all \( \lambda > 0\) and \( x \in\mathbb{R}^n\), with \( x \neq 0\). Then \( f\) has a minimiser.
Proof. Note that the range of the function is \( f(\mathbb{R}^n \setminus \{0\}) = f(S_1)\), where \( S_1 = \{x \in \mathbb{R}^n {}:{} \|x\| = 1\}\), which is a compact set; indeed, for any \( x \neq 0\) we have \( f(x) = f(x/\|x\|)\) since \( 1/\|x\| > 0\). Since \( f\) is continuous, \( f(S_1)\) is a compact subset of \( \mathbb{R}\), therefore it has a minimum. \( \Box\)
As a result, the Rayleigh quotient of a symmetric matrix \( A\) has a minimiser.
Let us now determine the directional derivative of the Rayleigh quotient. Here we use the definition of the directional derivative (along a direction \( d\)) as stated in [1, Section 2.1]. The directional derivative of a function \( f: \mathbb{R}^n\to\mathbb{R}\) along a direction \( d\in\mathbb{R}^n\) is given by
$$\begin{aligned}f'(x; d) = \lim_{t\downarrow 0}\frac{f(x + td)-f(x)}{t}.\end{aligned}$$
Note that this is not the same as the common definition of the directional derivative where one takes "\( t \to 0\)" instead of \( t \downarrow 0\). If \( f\) is directionally differentiable along all directions \( d\in\mathbb{R}^n\) and there is a vector \( \nabla f(x)\) such that \( f'(x; d) = \langle \nabla f(x), d\rangle\), then we say that \( f\) is Gâteaux-differentiable at \( x\). A lot of interesting facts about directional derivatives can be found in this blog post by Nguyen Mau Nam.
Proposition 2 (Directional Derivative of Rayleigh Quotient). The directional derivative of \( R_A\) along a direction \( d \in \mathbb{R}^n\), \( d \neq 0\), at a point \( x\in\mathbb{R}^n\), with \( x \neq 0\), is given by
$$\begin{aligned} R_A'(x; d) = \frac{2}{\|x\|^4}\left( \|x\|^2 x^\intercal A d - x^\intercal A x x^\intercal d \right).\end{aligned}$$
Proof. We have
$$\begin{aligned}R_A'(x; d) {}={}& \lim_{t{}\downarrow{}0} \frac{1}{t}\left(\frac{(x+td)^\intercal A (x+td)}{\|x+td\|^2} - \frac{x^\intercal A x}{\|x\|^2}\right) \\ {}={}& \lim_{t{}\downarrow{}0} \frac{1}{t}\left(\frac{\|x\|^2 (x^\intercal A x + t^2 d^\intercal A d + 2tx^\intercal A d) - x^\intercal A x \|x+td\|^2}{\|x+td\|^2 \, \|x\|^2}\right) \\ {}={}& \ldots \\ {}={}& \lim_{t{}\downarrow{}0} \frac{2(x^\intercal A d \|x\|^2 - x^\intercal d \, x^\intercal A x) + t \left(\|x\|^2 d^\intercal A d - x^\intercal A x \|d\|^2\right)}{\|x\|^4 + 2t \, x^\intercal d \, \|x\|^2 + t^2 \|d\|^2 \|x\|^2} \\ {}={}& \frac{2(x^\intercal A d \|x\|^2 - x^\intercal d \, x^\intercal A x)}{\|x\|^4},\end{aligned}$$
which completes the proof. \( \Box\)
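Before moving on, Proposition 2 is easy to verify numerically against a one-sided difference quotient. The following sketch does exactly that; the random symmetric matrix, the vectors and the step \( t\) are arbitrary choices for illustration.

```python
import numpy as np

def rayleigh_quotient(A, x):
    return float(x @ A @ x) / float(x @ x)

def directional_derivative(A, x, d):
    """Closed-form expression of Proposition 2."""
    nx2 = float(x @ x)
    return (2.0 / nx2**2) * (nx2 * float(x @ A @ d) - float(x @ A @ x) * float(x @ d))

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                      # symmetrise a random matrix
x = rng.standard_normal(4)
d = rng.standard_normal(4)

t = 1e-7                               # one-sided difference quotient, small t > 0
fd = (rayleigh_quotient(A, x + t * d) - rayleigh_quotient(A, x)) / t
print(directional_derivative(A, x, d), fd)   # the two numbers should agree to several digits
```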
From this we can conclude the following.
Proposition 3 (Gradient of Rayleigh Quotient). The Rayleigh Quotient is Gâteaux-differentiable over \( \mathbb{R}^n \setminus \{0\}\) with gradient
$$\begin{aligned} \nabla R_A(x) {}={}& \frac{2}{\|x\|^4}\left(\|x\|^2 Ax - \|x\|_A^2\, x\right) \\ {}={}& \frac{2}{\|x\|^2}\left(Ax - R_A(x)\, x\right),\end{aligned}$$
where we have used the notation \( \|x\|_A^2 = x^\intercal A x\) (note that \( \|\cdot\|_A\) is a norm only if \( A\) is positive definite).
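As a quick sanity check of Proposition 3 (not needed for what follows), we can compare the gradient formula with a central finite-difference approximation; again, the random symmetric matrix and the step size below are illustrative choices.

```python
import numpy as np

def rayleigh_quotient(A, x):
    return float(x @ A @ x) / float(x @ x)

def rayleigh_gradient(A, x):
    """Gradient formula of Proposition 3: (2/||x||^2) (A x - R_A(x) x)."""
    return (2.0 / float(x @ x)) * (A @ x - rayleigh_quotient(A, x) * x)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
x = rng.standard_normal(5)

# Central finite differences, one coordinate at a time
h = 1e-6
numerical = np.array([
    (rayleigh_quotient(A, x + h * e) - rayleigh_quotient(A, x - h * e)) / (2 * h)
    for e in np.eye(5)
])
print(np.max(np.abs(rayleigh_gradient(A, x) - numerical)))   # should be tiny (~1e-9)
```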
Again, since \( R_A(\mathbb{R}^n \setminus \{0\}) = R_A(S_1)\) is compact, \( R_A\) admits at least one minimum and one maximum over \( S_1\). In fact, if \( x^\star\) is a minimiser with \( \|x^\star\| = 1\), then any point \( x = \lambda x^\star\) with \( \lambda > 0\) is also a minimiser. By Fermat's theorem, at any minimiser \( x^\star\) and maximiser \( x^{\star\star}\) the gradient of the Rayleigh quotient vanishes, i.e.,
$$\begin{aligned} Ax = R_A(x) x,\end{aligned}$$
which means that \( x\) and \( Ax\) are collinear. In other words, \( x^\star\) and \( x^{\star\star}\) are eigenvectors of \( A\), and \( R_A(x^\star)\), \( R_A(x^{\star\star})\) are the corresponding eigenvalues.
It is evident that
$$\begin{aligned} \max_{x\neq 0}R_A(x) {}={}& \lambda_{\max}(A),\\ \min_{x\neq 0}R_A(x) {}={}& \lambda_{\min}(A). \end{aligned}$$
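Numerically, this is easy to corroborate with numpy.linalg.eigh: evaluating \( R_A\) at the extreme eigenvectors reproduces the extreme eigenvalues, and random directions never beat those bounds. The matrix below is again a random choice for illustration.

```python
import numpy as np

def rayleigh_quotient(A, x):
    return float(x @ A @ x) / float(x @ x)

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order, orthonormal eigenvectors

# R_A at the extreme eigenvectors gives the extreme eigenvalues
print(rayleigh_quotient(A, eigvecs[:, 0]), eigvals[0])     # lambda_min
print(rayleigh_quotient(A, eigvecs[:, -1]), eigvals[-1])   # lambda_max

# No other direction does better than these bounds
samples = rng.standard_normal((1000, 6))
values = np.array([rayleigh_quotient(A, s) for s in samples])
print(eigvals[0] <= values.min(), values.max() <= eigvals[-1])
```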
One last thing. Let us now focus on the following optimisation problem
$$\begin{aligned} \mathrm{Minimise}_{\|x\|^2 = 1} x^\intercal A x, \end{aligned}$$
where \( A\) is a symmetric matrix. The Lagrangian is \( L: \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}\) given by
$$\begin{aligned} L(x, \lambda) = x^\intercal A x - \lambda (\|x\|^2 - 1), \end{aligned}$$
where \( \nabla_x L(x, \lambda) = 2Ax - 2\lambda x\). By setting this to zero we see that \( Ax = \lambda x\), so we conclude that \( x\) must be an eigenvector of \( A\) (in particular, it should have unit norm) and \( \lambda\) is the corresponding (needless to say, real) eigenvalue.
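To close the loop, here is a naïve projected gradient sketch for this constrained problem: take a gradient step on \( x^\intercal A x\) and renormalise onto the unit sphere. The step size and the random matrix below are arbitrary choices, and this simple scheme is not guaranteed to converge in general, but with a sufficiently small step it recovers \( \lambda_{\min}(A)\).

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

# Projected gradient descent for:  minimise x^T A x  subject to ||x|| = 1
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
step = 0.45 / np.linalg.norm(A, 2)     # heuristic step size below 1/(2 ||A||)
for _ in range(5000):
    x = x - step * (2 * A @ x)         # gradient of x^T A x is 2 A x
    x /= np.linalg.norm(x)             # project back onto the unit sphere

print(x @ A @ x, np.linalg.eigvalsh(A)[0])   # objective value vs smallest eigenvalue of A
```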
References
[1] J. M. Borwein and A. S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, CMS Books in Mathematics, Springer.