Cramér-Rao inequality

In statistics, the Cramér-Rao inequality, named in honor of Harald Cramér and Calyampudi Radhakrishna Rao, states that the reciprocal of the Fisher information, \mathcal{I}(\theta), of a parameter θ is a lower bound on the variance of any unbiased estimator \hat{\theta} of that parameter:

\mathrm{var} \left(\hat{\theta}\right) \geq \frac{1}{\mathcal{I}(\theta)} = \frac{1} {  \mathrm{E}  \left[   \left[    \frac{d}{d\theta} \log f(X;\theta)   \right]^2  \right] }

In some cases, no unbiased estimator exists that realizes the lower bound.

The Cramér-Rao inequality is also known as the Cramér-Rao bound (CRB) or Cramér-Rao lower bound (CRLB) because it puts a lower bound on the variance of an estimator \hat{\theta}.
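As a rough numerical illustration, the sketch below checks the bound by simulation for one familiar case: estimating the mean of a normal distribution with known standard deviation using the sample mean. The function name `sample_mean_variance` and the parameter values are assumptions chosen for the example. It relies on the standard fact that n independent N(μ, σ²) observations carry Fisher information n/σ² about μ.

```python
import random
import statistics

def sample_mean_variance(mu, sigma, n, trials, seed=0):
    """Monte Carlo estimate of var(sample mean) for n i.i.d. N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)

mu, sigma, n = 1.0, 2.0, 25
# Fisher information about mu carried by n independent N(mu, sigma^2) draws.
fisher_information = n / sigma**2
cramer_rao_bound = 1.0 / fisher_information   # equals sigma^2 / n
empirical_variance = sample_mean_variance(mu, sigma, n, trials=20000)
print(cramer_rao_bound, empirical_variance)
```

The two printed numbers should be close to each other, since the sample mean is an unbiased estimator that attains the bound in this model.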

Regularity conditions

This inequality relies on two weak regularity conditions on the probability density function, f(x;θ), and the estimator T(X):

  • The Fisher information is always defined; equivalently, for all x such that f(x;θ) > 0,
\frac{\partial}{\partial\theta} \ln f(x;\theta)
is finite.
  • The operations of integration with respect to x and differentiation with respect to θ can be interchanged in the expectation of T; that is,
\frac{\partial}{\partial\theta}  \left[   \int T(x) f(x;\theta) \,dx  \right]  =  \int T(x)   \left[    \frac{\partial}{\partial\theta} f(x;\theta)   \right]  \,dx
whenever the right-hand side is finite.

In some cases, a biased estimator can have both a variance and a mean squared error that are below the Cramér-Rao lower bound (the lower bound applies only to estimators that are unbiased). See bias (statistics).
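A small exact calculation can make this concrete. The sketch below assumes n independent normal observations with unknown mean and variance σ², for which the Cramér-Rao bound on unbiased estimators of σ² is 2σ⁴/n. It compares the usual unbiased sample variance (divisor n − 1) with a biased version (divisor n + 1). The helper name `mse_scaled_sum_of_squares` is hypothetical.

```python
def mse_scaled_sum_of_squares(c, n, sigma2):
    """Exact MSE of c * sum((x_i - xbar)^2) as an estimator of sigma^2.

    Uses the fact that sum((x_i - xbar)^2) / sigma^2 is chi-squared with
    n - 1 degrees of freedom (mean n - 1, variance 2 * (n - 1)).
    """
    mean = c * (n - 1) * sigma2
    variance = c**2 * 2 * (n - 1) * sigma2**2
    bias = mean - sigma2
    return variance + bias**2

n, sigma2 = 10, 1.0
# Cramér-Rao bound for unbiased estimators of sigma^2 from n normal draws.
bound = 2 * sigma2**2 / n
mse_unbiased = mse_scaled_sum_of_squares(1 / (n - 1), n, sigma2)  # divisor n - 1
mse_biased = mse_scaled_sum_of_squares(1 / (n + 1), n, sigma2)    # divisor n + 1
print(bound, mse_unbiased, mse_biased)
```

Here the biased estimator has mean squared error 2σ⁴/(n + 1), which is below the bound, while the unbiased one has 2σ⁴/(n − 1), which is above it. Because the variance never exceeds the mean squared error, the biased estimator's variance is below the bound as well.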

If the second regularity condition extends to the second derivative, then an alternative form of Fisher information can be used and yields a new Cramér-Rao inequality

\mathrm{var} \left(\hat{\theta}\right) \geq \frac{1}{\mathcal{I}(\theta)} = \frac{1} {  -\mathrm{E}  \left[   \frac{d^2}{d\theta^2} \log f(X;\theta)  \right] }

In some cases, it may be easier to take the expectation of the second derivative than to take the expectation of the square of the first derivative.
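The agreement of the two forms can be checked by hand in simple models. As a sketch, the code below uses a single Bernoulli(p) observation, a discrete case in which the expectation is a sum over x ∈ {0, 1} rather than an integral. The model and the function name `fisher_information_bernoulli` are assumptions made for the example.

```python
def fisher_information_bernoulli(p):
    """Return (E[score^2], -E[d^2/dp^2 log f]) for one Bernoulli(p) draw,
    where log f(x; p) = x log p + (1 - x) log(1 - p)."""
    pmf = {0: 1 - p, 1: p}
    score = {x: x / p - (1 - x) / (1 - p) for x in (0, 1)}
    second_derivative = {x: -x / p**2 - (1 - x) / (1 - p)**2 for x in (0, 1)}
    squared_score_form = sum(pmf[x] * score[x]**2 for x in (0, 1))
    curvature_form = -sum(pmf[x] * second_derivative[x] for x in (0, 1))
    return squared_score_form, curvature_form

first_form, second_form = fisher_information_bernoulli(0.3)
print(first_form, second_form, 1 / (0.3 * 0.7))
```

Both forms evaluate to 1/(p(1 − p)), so either can be used in the denominator of the bound.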

Multiparameter

Extending the Cramér-Rao inequality to multiple parameters, define a parameter column vector

\boldsymbol{\theta} = \left[ \theta_1, \theta_2, \dots, \theta_d \right]^T \in \mathbb{R}^d

with probability density function (pdf), f(x; \boldsymbol{\theta}), that satisfies the above two regularity conditions.

The Fisher information matrix is a d \times d matrix with element \mathcal{I}_{m, k} defined as

\mathcal{I}_{m, k} = \mathrm{E} \left[  \frac{\partial}{\partial\theta_m} \log f\left(X; \boldsymbol{\theta}\right)  \frac{\partial}{\partial\theta_k} \log f\left(X; \boldsymbol{\theta}\right) \right]

The Cramér-Rao inequality is then

\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \frac  {\partial \boldsymbol{\psi} \left(\boldsymbol{\theta}\right)}  {\partial \boldsymbol{\theta}^T} \mathcal{I}\left(\boldsymbol{\theta}\right)^{-1} \frac  {\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)^T}  {\partial \boldsymbol{\theta}}

where

  • \boldsymbol{T}(X) = \begin{bmatrix} T_1(X) & T_2(X) & \cdots & T_d(X) \end{bmatrix}^T
  • \boldsymbol{\psi} = \mathrm{E}\left[\boldsymbol{T}(X)\right] = \begin{bmatrix} \psi_1\left(\boldsymbol{\theta}\right) &  \psi_2\left(\boldsymbol{\theta}\right) &  \cdots &  \psi_d\left(\boldsymbol{\theta}\right) \end{bmatrix}^T
  • \frac{\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)}{\partial \boldsymbol{\theta}^T} = \begin{bmatrix}  \psi_1 \left(\boldsymbol{\theta}\right) \\  \psi_2 \left(\boldsymbol{\theta}\right) \\  \vdots \\  \psi_d \left(\boldsymbol{\theta}\right) \end{bmatrix} \begin{bmatrix}  \frac{\partial}{\partial \theta_1} &  \frac{\partial}{\partial \theta_2} &  \cdots &  \frac{\partial}{\partial \theta_d} \end{bmatrix} = \begin{bmatrix}  \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} &  \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} &  \cdots &  \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \\  \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} &  \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} &  \cdots &  \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \\  \vdots &  \vdots &  \ddots &  \vdots \\  \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_1} &  \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_2} &  \cdots &  \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \end{bmatrix}
  • \frac{\partial \boldsymbol{\psi}\left(\boldsymbol{\theta}\right)^T}{\partial \boldsymbol{\theta}} = \begin{bmatrix}  \frac{\partial}{\partial \theta_1} \\  \frac{\partial}{\partial \theta_2} \\  \vdots \\  \frac{\partial}{\partial \theta_d} \end{bmatrix} \begin{bmatrix}  \psi_1 \left(\boldsymbol{\theta}\right) &  \psi_2 \left(\boldsymbol{\theta}\right) &  \cdots &  \psi_d \left(\boldsymbol{\theta}\right) \end{bmatrix} = \begin{bmatrix}  \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} &  \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_1} &  \cdots &  \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_1} \\  \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} &  \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_2} &  \cdots &  \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_2} \\  \vdots &  \vdots &  \ddots &  \vdots \\  \frac{\partial \psi_1 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} &  \frac{\partial \psi_2 \left(\boldsymbol{\theta}\right)}{\partial \theta_d} &  \cdots &  \frac{\partial \psi_d \left(\boldsymbol{\theta}\right)}{\partial \theta_d} \end{bmatrix}

Here the matrix inequality A \geq B means that A - B is positive semidefinite. The covariance matrix \mathrm{cov}_{\boldsymbol{\theta}} \left( \boldsymbol{T}(X) \right) is itself positive semidefinite, that is

x^{T} \mathrm{cov}_{\boldsymbol{\theta}} \left( \boldsymbol{T}(X) \right) x \geq 0 \quad \forall x \in \mathbb{R}^d

If \boldsymbol{T}(X) = \begin{bmatrix} T_1(X) & T_2(X) & \cdots & T_d(X) \end{bmatrix}^T is an unbiased estimator (i.e., \boldsymbol{\psi}\left(\boldsymbol{\theta}\right) = \boldsymbol{\theta}) then the Cramér-Rao inequality is

\mathrm{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \mathcal{I}\left(\boldsymbol{\theta}\right)^{-1}
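A two-parameter case can be worked through exactly. The sketch below assumes n independent N(μ, σ²) observations with θ = (μ, σ²), for which the Fisher information matrix is diagonal with entries n/σ² and n/(2σ⁴). It compares the inverse with the exact covariance of the unbiased pair (sample mean, sample variance with divisor n − 1). All function names are hypothetical helpers written for this example.

```python
def fisher_matrix_normal(n, sigma2):
    """Fisher information matrix of n i.i.d. N(mu, sigma^2) draws
    for the parameter vector theta = (mu, sigma^2)."""
    return [[n / sigma2, 0.0],
            [0.0, n / (2 * sigma2**2)]]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]

def is_positive_semidefinite_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal entries and
    determinant are non-negative (checked up to a small tolerance)."""
    (a, b), (_, d) = m
    return a >= -tol and d >= -tol and a * d - b * b >= -tol

n, sigma2 = 10, 2.0
bound = inverse_2x2(fisher_matrix_normal(n, sigma2))
# Exact covariance of (sample mean, sample variance); the two statistics are
# independent for normal data, so the off-diagonal entries are zero.
cov = [[sigma2 / n, 0.0],
       [0.0, 2 * sigma2**2 / (n - 1)]]
gap = [[cov[i][j] - bound[i][j] for j in range(2)] for i in range(2)]
print(bound, is_positive_semidefinite_2x2(gap))
```

The difference between the covariance and the inverse Fisher information is positive semidefinite, as the inequality requires. The sample mean attains its bound, while the sample variance exceeds its bound by the factor n/(n − 1).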

Single-parameter proof

First, a more general version of the inequality will be proven; namely, that if the expectation of T is denoted by ψ(θ), then for all θ

{\rm var}(t(X)) \geq \frac{[\psi^\prime(\theta)]^2}{I(\theta)}

The Cramér-Rao inequality will then follow as a consequence.

Let X be a random variable with probability density function f(x;θ). Here T = t(X) is a statistic, which is used as an estimator for θ. If V is the score, i.e.

V = \frac{\partial}{\partial\theta} \log f(X;\theta)

then the expectation of V, written E(V), is zero. If we consider the covariance cov(V,T) of V and T, we have cov(V,T) = E(VT), because E(V) = 0. Expanding this expression we have

{\rm cov}(V,T) = {\rm E} \left(  T \cdot \frac{\partial}{\partial\theta} \ln f(X;\theta) \right)

This may be expanded using the chain rule

\frac{\partial}{\partial\theta} \ln Q = \frac{1}{Q}\frac{\partial Q}{\partial\theta}

and the definition of expectation gives, after cancelling f(x;θ),

{\rm E} \left(  T \cdot \frac{\partial}{\partial\theta} \ln f(X;\theta) \right) = \int  t(x)  \left[   \frac{\partial}{\partial\theta} f(x;\theta)  \right] \, dx = \frac{\partial}{\partial\theta} \left[  \int t(x)f(x;\theta)\,dx \right] = \psi^\prime(\theta)

because the integration and differentiation operations commute (second condition).

The Cauchy-Schwarz inequality shows that

\sqrt{ {\rm var} (T) {\rm var} (V)} \geq \left| {\rm cov}(V,T) \right| = \left| \psi^\prime (\theta) \right|

therefore

{\rm var\ } T \geq \frac{[\psi^\prime(\theta)]^2}{{\rm var} (V)} = \frac{[\psi^\prime(\theta)]^2}{I(\theta)} = \left[  \frac{\partial}{\partial\theta}  {\rm E} (T) \right]^2 \frac{1}{I(\theta)}

Q.E.D.
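The key identity cov(V,T) = ψ'(θ) and the resulting bound can be checked by simulation. The sketch below assumes X is exponential with rate θ, so f(x;θ) = θ exp(−θx), and takes T = X. Then ψ(θ) = 1/θ, the score is V = 1/θ − X, and I(θ) = 1/θ². The function name `check_general_bound` and the parameter values are assumptions for the example.

```python
import random
import statistics

def check_general_bound(theta, trials, seed=0):
    """For X ~ Exponential(rate=theta) and T = X, estimate cov(V, T) and
    var(T) by simulation, where V = 1/theta - X is the score."""
    rng = random.Random(seed)
    xs = [rng.expovariate(theta) for _ in range(trials)]
    scores = [1.0 / theta - x for x in xs]
    mean_x = statistics.fmean(xs)
    mean_v = statistics.fmean(scores)
    cov_vt = statistics.fmean(
        (v - mean_v) * (x - mean_x) for v, x in zip(scores, xs)
    )
    return cov_vt, statistics.pvariance(xs)

theta = 2.0
cov_vt, var_t = check_general_bound(theta, trials=200000)
psi_prime = -1.0 / theta**2   # derivative of psi(theta) = E(T) = 1/theta
fisher = 1.0 / theta**2       # I(theta) for one exponential observation
print(cov_vt, psi_prime, var_t, psi_prime**2 / fisher)
```

In this model the covariance is negative, and var(T) matches [ψ'(θ)]²/I(θ), so T attains the general bound as an estimator of 1/θ.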

If T is an unbiased estimator of θ, that is, E(T) = θ, then ψ'(θ) = 1; the inequality then becomes

{\rm var}(T) \geq \frac{1}{I(\theta)}

This is the Cramér-Rao inequality.

The efficiency of T is defined as

e(T) = \frac{\frac{1}{I(\theta)}}{{\rm var}(T)}

or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér-Rao lower bound thus gives e(T) \le 1.
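As an illustration of efficiency, the sketch below estimates e(T) by simulation for two unbiased estimators of the mean of normal data: the sample mean and the sample median. The function name `efficiency`, the sample size and the trial count are assumptions chosen for the example, and the results carry Monte Carlo noise.

```python
import random
import statistics

def efficiency(estimator, n, sigma, trials, seed=0):
    """Monte Carlo estimate of e(T) = (1 / I(theta)) / var(T) for a location
    estimator T applied to n i.i.d. N(0, sigma^2) observations."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    bound = sigma**2 / n   # 1 / I(theta) for the mean of n normal draws
    return bound / statistics.pvariance(estimates)

eff_mean = efficiency(statistics.fmean, n=51, sigma=1.0, trials=4000)
eff_median = efficiency(statistics.median, n=51, sigma=1.0, trials=4000)
print(eff_mean, eff_median)
```

The sample mean should come out near 1 (simulation noise can push the estimate slightly above 1), while the median is noticeably less efficient; its efficiency for normal data tends to 2/π ≈ 0.64 as n grows.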

Multivariate normal distribution

For the case of a d-variate normal distribution

\boldsymbol{x} \sim N_d \left(  \boldsymbol{\mu} \left( \boldsymbol{\theta} \right)  ,  C \left( \boldsymbol{\theta} \right) \right)

with a probability density function

f\left( \boldsymbol{x}; \boldsymbol{\theta} \right) = \frac{1}{\sqrt{ (2\pi)^d \left| C \right| }} \exp \left(  -\frac{1}{2}  \left(   \boldsymbol{x} - \boldsymbol{\mu}  \right)^{T}  C^{-1}  \left(   \boldsymbol{x} - \boldsymbol{\mu}  \right) \right),

the Fisher information matrix has elements

\mathcal{I}_{m, k} = \frac{\partial \boldsymbol{\mu}^T}{\partial \theta_m} C^{-1} \frac{\partial \boldsymbol{\mu}}{\partial \theta_k} + \frac{1}{2} \mathrm{tr} \left(  C^{-1}  \frac{\partial C}{\partial \theta_m}  C^{-1}  \frac{\partial C}{\partial \theta_k} \right)

where "tr" is the trace.

Let w[n] be white Gaussian noise (a sample of N independent observations) with variance σ²

w[n] \sim N_N \left( 0, \sigma^2 I \right).

Then the Fisher information matrix is 1 × 1. With C = \sigma^2 I and \theta = \sigma^2, we have \partial C / \partial \sigma^2 = I and C^{-1} = I / \sigma^2, so

\mathcal{I}(\sigma^2) = \frac{1}{2} \mathrm{tr} \left(  C^{-1}  \frac{\partial C}{\partial \sigma^2}  C^{-1}  \frac{\partial C}{\partial \sigma^2} \right) = \frac{1}{2 \sigma^4} \mathrm{tr} \left(I\right) = \frac{N}{2 \sigma^4},

and so the Cramér-Rao inequality for an unbiased estimator \hat{\sigma}^2 of \sigma^2 is

\mathrm{var}\left(\hat{\sigma}^2\right) \geq \frac{2 \sigma^4}{N}.
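The white-noise example can be checked numerically. Evaluating the trace formula directly for C = σ²I, with ∂C/∂σ² = I, gives a Fisher information of N/(2σ⁴) and hence a bound of 2σ⁴/N. The sketch below computes this and compares it with the simulated variance of the unbiased estimator (1/N) Σ w[n]², which assumes the zero mean is known. Function names and parameter values are hypothetical.

```python
import random
import statistics

def fisher_information_white_noise(n_samples, sigma2):
    """Evaluate (1/2) tr(C^-1 dC C^-1 dC) for C = sigma2 * I, where
    dC = dC/d(sigma2) = I. All matrices are diagonal, so the trace
    reduces to a sum over diagonal entries."""
    c_inv_diag = [1.0 / sigma2] * n_samples
    dc_diag = [1.0] * n_samples
    return 0.5 * sum((ci * dc) ** 2 for ci, dc in zip(c_inv_diag, dc_diag))

def variance_of_estimator(n_samples, sigma2, trials, seed=0):
    """Monte Carlo variance of the unbiased estimator (1/N) sum w[n]^2."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    estimates = [
        statistics.fmean(rng.gauss(0.0, sigma) ** 2 for _ in range(n_samples))
        for _ in range(trials)
    ]
    return statistics.pvariance(estimates)

N, sigma2 = 20, 3.0
bound = 1.0 / fisher_information_white_noise(N, sigma2)  # 2 * sigma2**2 / N
empirical = variance_of_estimator(N, sigma2, trials=20000)
print(bound, empirical)
```

The simulated variance should be close to the bound, since this estimator has variance exactly 2σ⁴/N and therefore attains it.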
10-26-2009 08:16:03
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.