Previous article: Estimation Crash Course II: Fisher information

The Cramér-Rao bound is a lower bound on the variance of an unbiased estimator. The result we are about to state relies on the following weak regularity assumptions on the log-likelihood function, $\ell(\theta)$, and the estimator, $\widehat{\theta}$.

Regularity assumptions. In addition to the basic regularity assumptions we stated in the previous post, suppose that for all $x$ for which $p_X(x; \theta)>0$, the derivative $\frac{\partial}{\partial \theta}{}\;{}p_X(x; \theta)$ exists, and the following holds

$$ \frac{\partial}{\partial \theta}\int_E \widehat{\theta}(x)p_X(x; \theta){\rm d} x {}={} \int_E \widehat{\theta}(x) \frac{\partial}{\partial \theta}p_X(x; \theta){\rm d} x\tag{1}$$

whenever the right-hand side exists and is finite.
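Condition (1) asks that differentiation (with respect to $\theta$) and integration (over $x$) can be interchanged. For well-behaved models this is easy to check numerically; here is a minimal sketch, assuming a single observation from $\mathcal{N}(\theta, 1)$ and the estimator $\widehat{\theta}(x) = x$, in which case both sides of (1) equal $1$:

```python
# Numerical check of Eq. (1) for X ~ N(theta, 1) and theta_hat(x) = x.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

theta, h = 0.7, 1e-5  # parameter value; step for the finite difference

def expectation(t):
    # E[theta_hat(X)] = integral of x * p(x; t) over the real line
    return quad(lambda x: x * norm.pdf(x, loc=t), -np.inf, np.inf)[0]

# Left-hand side: differentiate the integral (central difference)
lhs = (expectation(theta + h) - expectation(theta - h)) / (2 * h)

# Right-hand side: integrate the derivative; for this model,
# d/dtheta p(x; theta) = (x - theta) * p(x; theta)
rhs = quad(lambda x: x * (x - theta) * norm.pdf(x, loc=theta),
           -np.inf, np.inf)[0]

print(lhs, rhs)  # both approximately 1.0
```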

Theorem 1 (Cramér-Rao lower bound). Let $X$ be a sample with pdf $p_X(x;\theta)$ and let $\widehat{\theta}$ be an unbiased estimator of $\theta$. Suppose that the regularity assumptions hold. Then,

$${\rm var}[\widehat{\theta}] {}\geq{} \frac{1}{I(\theta)}.\tag{2}$$

It follows from Theorem 1 that if an unbiased estimator attains the lower bound, i.e., if ${\rm var}[\widehat{\theta}]{}={}\frac{1}{I(\theta)},$ then it is a UMVUE. However, this lower bound is not tight in the sense that not all UMVUEs attain it. Let us first give an example where the bound is attained.

Example 1 (UMVUE via Cramér-Rao bound). In Example 1 in Part I we showed that the sample mean, $\bar{X}_N$, is an unbiased estimator of the mean $\mu$, and in Example 2 in Part I we showed that its variance is $\mathrm{var}[\bar{X}_N] = \sigma^2/N$ (with $\sigma^2$ assumed to be known). Now assume that $X_1,\ldots,X_N{}\overset{\text{iid}}{\sim}{}\mathcal{N}(\mu,\sigma^2)$; for this model the regularity assumptions hold and the Fisher information for $\mu$ is

$$I(\mu) = \frac{N}{\sigma^2}.\tag{3}$$

We see that $\mathrm{var}[\bar{X}_N] = 1/I(\mu)$; therefore, $\bar{X}_N$ is a UMVUE for $\mu$. $\heartsuit$
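This is easy to corroborate numerically. Below is a minimal Monte Carlo sketch (the values of $\mu$, $\sigma$, $N$ and the number of trials are arbitrary choices) comparing the empirical variance of $\bar{X}_N$ with $1/I(\mu) = \sigma^2/N$:

```python
# Monte Carlo check that var[X_bar] matches the Cramér-Rao bound sigma^2/N.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, trials = 2.0, 1.5, 50, 100_000

samples = rng.normal(mu, sigma, size=(trials, N))
means = samples.mean(axis=1)           # one X_bar per trial

print(means.var())     # empirical variance of X_bar
print(sigma**2 / N)    # 1/I(mu) = 0.045
```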

Example 2 (Estimator of Bernoulli parameter is UMVUE). Let $X_1, \ldots, X_N {}\overset{\text{iid}}{\sim}{} {\rm Ber}(p)$. The estimator

$$\widehat{p}(X_1,\ldots, X_N) {}={} \tfrac{1}{N}\sum_{i=1}^{N}X_i,\tag{4}$$

is unbiased (why?) and its variance is (see Part I, Eq. (4))

$${\rm var}[{}\widehat{p}{}] {}={} {\rm var}\left[\left.\tfrac{1}{N}\sum_{i=1}^{N}X_i\right|p\right] {}\overset{\text{indep.}}{=}{} \tfrac{1}{N^2}\sum_{i=1}^{N}{\rm var}[X_i] = \frac{p(1-p)}{N}.\tag{5}$$

Now if $X\sim{\rm Ber}(p)$, according to Equation (14) in Part II the Fisher information of $p$ is $I(p) = \tfrac{1}{p(1-p)}$ and following Exercise 2 in Part II, the Fisher information for the case of $N$ independent observations becomes

$$I(p) = \frac{N}{p(1-p)}.\tag{6}$$

The reader can verify that the regularity assumptions hold for the Bernoulli distribution. We see that ${\rm var}[{}\widehat{p}{}] = I(p)^{-1}$; therefore, $\widehat{p}$ is a UMVUE of $p$. $\heartsuit$
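The same check works here; again a sketch with arbitrary constants:

```python
# Monte Carlo check that var[p_hat] matches the Cramér-Rao bound p(1-p)/N.
import numpy as np

rng = np.random.default_rng(0)
p, N, trials = 0.3, 40, 100_000

samples = rng.binomial(1, p, size=(trials, N))   # N Bernoulli draws per trial
p_hats = samples.mean(axis=1)

print(p_hats.var())       # empirical variance of p_hat
print(p * (1 - p) / N)    # 1/I(p) = 0.00525
```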

An unbiased estimator that attains the Cramér-Rao lower bound is called an efficient estimator. All efficient estimators are UMVUEs, but not all UMVUEs are efficient.

We shall now state a very useful result that can be used to determine an efficient estimator (if it exists). The proof is a bit technical, so we will skip it.

Theorem 2 (Factorisation for efficient estimators). An efficient estimator, $\widehat{\theta}$, exists if and only if $\frac{\partial \ell(\theta; x)}{\partial \theta}$ can be factorised as

$$\frac{\partial \ell(\theta; x)}{\partial \theta} {}={} I(\theta)[\widehat{\theta}(x) - \theta].\tag{7}$$

It follows from the theorem that $\mathrm{var}[\widehat{\theta}] = I(\theta)^{-1}$. Let us first verify this factorisation on the Bernoulli model of Example 2, and then use Equation (7) to determine an efficient estimator in a new setting.
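For $X_1,\ldots,X_N{}\overset{\text{iid}}{\sim}{}{\rm Ber}(p)$, the log-likelihood is $\ell(p; x) {}={} \sum_{i=1}^{N}\left[x_i\log p + (1-x_i)\log(1-p)\right]$, so

$$\frac{\partial \ell(p; x)}{\partial p} {}={} \sum_{i=1}^{N}\left[\frac{x_i}{p} - \frac{1-x_i}{1-p}\right] {}={} \frac{\sum_{i=1}^{N}x_i - Np}{p(1-p)} {}={} \underbrace{\frac{N}{p(1-p)}}_{I(p)}\left[\widehat{p}(x) - p\right],$$

which is exactly of the form of Equation (7) with $\widehat{p}(x) = \tfrac{1}{N}\sum_{i=1}^{N}x_i$. This confirms that the estimator of Example 2 is efficient.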

Example 3 (Estimator of $\sigma^2$ with known mean). Suppose $X_1,\ldots, X_N {}\overset{\text{iid}}{\sim}{}\mathcal{N}(\mu, \sigma^2)$, where $\mu$ is known and $\sigma^2$ is an unknown parameter. The log-likelihood of $\sigma^2$ given a single measurement $X=x$ is

$$\ell(\sigma^2) {}={} \log p_X(x; \mu, \sigma^2) {}={} -\tfrac{1}{2}\log(2\pi\sigma^2) -\frac{(x-\mu)^2}{2\sigma^2}.\tag{8}$$

We leave it to the reader to verify that the Fisher information for one observation is $\frac{1}{2\sigma^4},$ so for $N$ observations the Fisher information is (see Exercise 2 in Part II)

$$I(\sigma^2) = \frac{N}{2\sigma^4}.\tag{9}$$
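If you would like to check the single-observation value $\frac{1}{2\sigma^4}$ without grinding through the algebra by hand, here is a small symbolic sketch with sympy; the expectation step, where $(x-\mu)^2$ is replaced by its mean $\sigma^2$, is done via a substitution:

```python
# Symbolic check of the one-observation Fisher information for sigma^2.
import sympy as sp

x, mu = sp.symbols('x mu', real=True)
s2 = sp.symbols('s2', positive=True)   # s2 stands for sigma^2

# Log-likelihood of sigma^2 for one observation, as in Eq. (8)
ell = -sp.log(2 * sp.pi * s2) / 2 - (x - mu)**2 / (2 * s2)

d2 = sp.diff(ell, s2, 2)               # second derivative w.r.t. sigma^2

# I(sigma^2) = -E[d2]; the only random term is (x - mu)^2, whose mean is s2
fisher = -d2.subs((x - mu)**2, s2)
print(sp.simplify(fisher))             # 1/(2*s2**2), i.e., 1/(2 sigma^4)
```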

In Example 4 in Part I we showed that $s^2_{\rm corr} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X}_N)^2$ is an unbiased estimator of $\sigma^2$. It can be shown that the variance of this estimator is

$$\mathrm{var}[s^2_{\rm corr}] = \frac{2\sigma^4}{N-1} {}>{} \frac{2\sigma^4}{N} {}={} \frac{1}{I(\sigma^2)},\tag{10}$$

so $s^2_{\rm corr}$ does not attain the lower bound of the Cramér-Rao inequality. Let us now try to factorise $\frac{\partial \ell(\sigma^2; x)}{\partial \sigma^2}$ as in Equation (7):

$$\begin{aligned} \frac{\partial \ell(\sigma^2; x)}{\partial \sigma^2} {}={} & \frac{\partial}{\partial \sigma^2}\log p(x_1, x_2, \ldots, x_N; \mu, \sigma^2) \\ {}={} & \frac{\partial}{\partial \sigma^2}\log \prod_{i=1}^{N}p(x_i; \mu, \sigma^2) {}={} \frac{\partial}{\partial \sigma^2} \sum_{i=1}^{N}\log p(x_i; \mu, \sigma^2) \\ {}={} & \frac{\partial}{\partial \sigma^2} \sum_{i=1}^{N} \log\left[\tfrac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)\right] \\ {}={} & \frac{\partial}{\partial \sigma^2}\sum_{i=1}^{N}\left[-\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(x_i-\mu)^2}{2\sigma^2}\right] \\ {}={} & \frac{\partial}{\partial \sigma^2} \left[-\frac{N}{2}\log(2\pi\sigma^2) - \sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2}\right] \\ {}={} & -\frac{N}{2\sigma^2} + \sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^4}. \end{aligned}$$

If we now factor out $\frac{N}{2\sigma^4}$, i.e., the Fisher information according to Equation (9), we have

$$\begin{aligned} \frac{\partial \ell(\sigma^2; x)}{\partial \sigma^2} {}={} & \underbrace{\frac{N}{2\sigma^4}}_{I(\sigma^2)} \left( \underbrace{\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}}_{\text{efficient estimator}}-\sigma^2 \right).\tag{11} \end{aligned}$$

The regularity assumptions are satisfied in this case so from Theorem 2 we conclude that the estimator

$$\widehat{\sigma}^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N},\tag{12}$$

is an efficient estimator of $\sigma^2$ (thus a UMVUE) and, by definition, its variance is $1/I(\sigma^2)=\frac{2\sigma^4}{N}$. There is no other unbiased estimator of $\sigma^2$ with a lower variance. Note that this holds only under the assumption that $\mu$ is known. $\heartsuit$
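As a final numerical sanity check, here is a short Monte Carlo sketch (the constants are arbitrary choices) comparing the variance of $s^2_{\rm corr}$ with that of the efficient estimator of Equation (12):

```python
# Monte Carlo comparison of s^2_corr (which uses the sample mean) with the
# efficient estimator of Eq. (12) (which uses the known mean mu).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, N, trials = 1.0, 2.0, 20, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(trials, N))

s2_corr = x.var(axis=1, ddof=1)        # divides by N - 1, centres at X_bar
s2_eff = ((x - mu)**2).mean(axis=1)    # Eq. (12), centres at the known mu

print(s2_corr.var())   # approx. 2*sigma2**2/(N - 1) ~ 0.421
print(s2_eff.var())    # approx. 2*sigma2**2/N = 0.4, the Cramér-Rao bound
```

The two empirical variances should match Equation (10): knowing $\mu$ buys a strictly smaller variance for every finite $N$, although the gap vanishes as $N$ grows.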