Sufficiency (statistics)

In statistics, one often considers a family of probability distributions for a random variable X (often a vector whose components are scalar-valued random variables, frequently independent) parameterized by a scalar- or vector-valued parameter, which we shall call θ. A quantity T(X) that depends on the (observable) random variable X but not on the (unobservable) parameter θ is called a statistic. Sir Ronald Fisher tried to make precise the intuitive idea that a statistic may capture all of the information in X that is relevant to the estimation of θ. A statistic that does so is called a sufficient statistic.

Mathematical definition

The precise definition is this:

A statistic T(X) is sufficient for θ precisely if the conditional probability distribution of the data X given the statistic T(X) does not depend on θ.
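
The definition can be checked directly on a small case. The following is a minimal sketch, assuming X consists of n = 3 independent Bernoulli(p) variables and T(X) is their sum; the function name `conditional_given_sum` and the two values of p are illustrative choices, not part of the definition. It computes the conditional distribution of X given T(X) exactly for each value of p and compares the results.

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, p):
    """Return {t: {x: P(X = x | T = t)}} for n independent Bernoulli(p)
    variables, where T is the sum of the components of x."""
    joint = {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        joint[x] = p**t * (1 - p)**(n - t)  # joint probability of x

    # Marginal probability of each value of the statistic T.
    totals = {}
    for x, prob in joint.items():
        totals[sum(x)] = totals.get(sum(x), 0) + prob

    # Conditional probability of each x given its value of T.
    cond = {}
    for x, prob in joint.items():
        cond.setdefault(sum(x), {})[x] = prob / totals[sum(x)]
    return cond

# Two arbitrary (assumed) parameter values, kept exact with Fraction.
cond_a = conditional_given_sum(3, Fraction(1, 4))
cond_b = conditional_given_sum(3, Fraction(2, 3))

print(cond_a == cond_b)  # the conditional distributions coincide
print(cond_a[1])         # each arrangement with one success has probability 1/3
```

In this sketch every arrangement with the same number of successes is equally likely once the sum is known, whatever p is, which is what the definition requires of a sufficient statistic.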

An equivalent test, known as Fisher's factorization criterion, is often used instead. If the probability density function (in the discrete case, the probability mass function) of X is f(x;θ), then T is sufficient for θ if and only if functions g and h can be found such that

f(x;\theta)=g\left(T(x),\theta\right)h(x).

This is a product in which one factor, h, does not depend on θ and the other factor, g, depends on x only through T(x). The way to think about this is to vary x while holding T(x) constant and ask whether such a variation has any effect on inferences one might make about θ. If the factorization criterion above holds, the answer is "none", because the likelihood function f changes only through the factor h(x), which does not involve θ.

Examples

  • If X1, ..., Xn are independent Bernoulli-distributed random variables with expected value p, then the sum T(X) = X1 + ... + Xn is a sufficient statistic for p.

This is seen by considering the joint probability distribution:

P(X=x)=P(X_1=x_1,X_2=x_2,\ldots,X_n=x_n).

Because the observations are independent, this can be written as

p^{x_1}(1-p)^{1-x_1} p^{x_2}(1-p)^{1-x_2}\cdots p^{x_n}(1-p)^{1-x_n}

and collecting powers of p and 1 − p gives

p^{\sum x_i}(1-p)^{n-\sum x_i}=p^{T(x)}(1-p)^{n-T(x)}

which satisfies the factorization criterion, with h(x) = 1, a constant function. Note the crucial feature: the unknown parameter (here p) interacts with the observation x only via the statistic T(x) (here the sum Σ xi).
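
The factorization for the Bernoulli case can be verified by brute force. This is a minimal sketch, assuming n = 4 and a small grid of values of p chosen only for illustration; the helper names `bernoulli_likelihood` and `g` are hypothetical. It checks that the product of the individual probabilities equals g(T(x), p) with h(x) = 1, and that two samples with the same sum have the same likelihood.

```python
from fractions import Fraction
from itertools import product
from math import prod

def bernoulli_likelihood(x, p):
    """Joint probability of the 0/1 sequence x under independent Bernoulli(p)."""
    return prod(p**xi * (1 - p)**(1 - xi) for xi in x)

def g(t, n, p):
    """Factor that depends on the data only through t = sum(x)."""
    return p**t * (1 - p)**(n - t)

n = 4
# Assumed grid of parameter values, kept exact with Fraction.
grid = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 2), Fraction(9, 10)]

# f(x; p) == g(T(x), p) * h(x) with h(x) = 1, for every x and every p in the grid.
factorizes = all(
    bernoulli_likelihood(x, p) == g(sum(x), n, p)
    for x in product((0, 1), repeat=n)
    for p in grid
)

# Two different samples sharing the same sum give the same likelihood function.
x, y = (1, 1, 0, 0), (0, 1, 0, 1)
same_likelihood = all(
    bernoulli_likelihood(x, p) == bernoulli_likelihood(y, p) for p in grid
)

print(factorizes, same_likelihood)
```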

  • If X1, ..., Xn are independent and uniformly distributed on the interval [0,θ], then max(X1, ..., Xn) is sufficient for θ.

To see this, consider the joint probability density function:

f(x;\theta)=f(x_1,x_2,\ldots,x_n;\theta).

Because the observations are independent, this can be written as

\frac{H(\theta-x_1)}{\theta}\times \frac{H(\theta-x_2)}{\theta}\times\ldots\times \frac{H(\theta-x_n)}{\theta}

where H(x) is the Heaviside step function (taken to equal 1 at x = 0) and each x_i is assumed to be nonnegative. This may be written as

\frac{H\left(\theta-\max(x_i)\right)}{\theta^n}

which shows that the factorization criterion is satisfied, with T(x) = max(x_i) and, for nonnegative observations, h(x) = 1.
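
The same kind of check works for the uniform case. This is a minimal sketch, assuming three observations and a handful of candidate values of θ picked only for illustration; the helper names `uniform_likelihood` and `g` are hypothetical, and the step function is taken to equal 1 at zero so that the endpoint θ belongs to the interval.

```python
def uniform_likelihood(x, theta):
    """Joint density of independent Uniform[0, theta] observations x."""
    if any(xi < 0 or xi > theta for xi in x):
        return 0.0  # some observation lies outside [0, theta]
    return theta ** (-len(x))

def g(t, n, theta):
    """Factor that depends on the data only through t = max(x)."""
    return theta ** (-n) if t <= theta else 0.0

# Two assumed samples with the same maximum but different other values.
x = [0.2, 1.7, 0.9]
y = [1.7, 1.1, 0.05]
thetas = [0.5, 1.0, 1.7, 2.0, 3.5]

lik_x = [uniform_likelihood(x, th) for th in thetas]
lik_y = [uniform_likelihood(y, th) for th in thetas]
via_g = [g(max(x), len(x), th) for th in thetas]

print(lik_x == via_g)  # the density is reproduced from the maximum alone
print(lik_x == lik_y)  # samples sharing a maximum share a likelihood function
```

The likelihood is zero for every θ below the sample maximum and equals θ^(-n) above it, so no feature of the sample other than its maximum affects it.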

The Rao-Blackwell theorem

Since the conditional distribution of X given T(X) does not depend on θ, neither does the conditional expected value of g(X) given T(X), where g is any (sufficiently well-behaved) function. Consequently that conditional expected value is itself a statistic, and so is available for use in estimation. If g(X) is any kind of estimator of θ, then typically the conditional expectation of g(X) given T(X) is a better estimator of θ; one way of making that statement precise is called the Rao-Blackwell theorem. Sometimes one can very easily construct a very crude estimator g(X), and then evaluate that conditional expected value to get an estimator that is in various senses optimal.
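
The improvement described above can be seen in the Bernoulli setting. This is a minimal sketch, assuming n = 4, p = 3/10, and the crude estimator g(X) = X1 (the first observation alone); the function names `rao_blackwellize` and `variance` are hypothetical. It computes the conditional expectation of X1 given the sum exactly by enumeration and compares the variances of the two estimators.

```python
from fractions import Fraction
from itertools import product

def rao_blackwellize(n, p):
    """Return {t: E[X_1 | T = t]}, where T is the sum of n Bernoulli(p) variables."""
    num, den = {}, {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        prob = p**t * (1 - p)**(n - t)
        num[t] = num.get(t, 0) + x[0] * prob  # accumulates E[X_1 ; T = t]
        den[t] = den.get(t, 0) + prob         # accumulates P(T = t)
    return {t: num[t] / den[t] for t in den}

def variance(estimator, n, p):
    """Exact variance of estimator(x) under n independent Bernoulli(p) variables."""
    mean, second = 0, 0
    for x in product((0, 1), repeat=n):
        t = sum(x)
        prob = p**t * (1 - p)**(n - t)
        value = estimator(x)
        mean += value * prob
        second += value * value * prob
    return second - mean**2

n, p = 4, Fraction(3, 10)  # assumed values for illustration

improved = rao_blackwellize(n, p)
print(improved)  # conditional expectation equals t / n for each t

var_crude = variance(lambda x: x[0], n, p)
var_improved = variance(lambda x: improved[sum(x)], n, p)
print(var_crude, var_improved)
```

In this sketch the conditional expectation turns out to be the sample mean T/n, with no dependence on p, and its variance is smaller than that of the crude estimator by a factor of n.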

03-10-2013 05:06:04
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.