Estimating the Proportion of Large Populations
Statistics Tutorial: Estimating a Proportion (Large Sample)
Suppose we select all possible samples of size n from a population of
size N. For each sample, we compute a sample proportion, p. The
relationship between the sample proportion, p, and the population
proportion, π, is described by the sampling distribution of the proportion.
The Sampling Distribution of a Proportion
When we examine the sampling distribution of the proportion, we find the following:
- The average of all possible sample proportions is equal to the population
proportion, π. Thus, if there are k possible
samples of size n, then:
π = [ p1 + p2 + . . . + pk] / k
- The standard deviation of the sampling distribution (also known as the
standard error) indicates the “average” deviation between the k sample
proportions and the true population proportion, π.
The standard error of the proportion σp is:
σp = sqrt[ π *( 1 – π ) / n ] * sqrt[ ( N – n ) / ( N – 1 ) ]
where π is the population proportion, n is the sample size, and N is the population size.
- The central limit theorem states that the sampling distribution of a sample mean will be normal or nearly normal, if the sample size is large enough. A sample proportion is the mean of a set of 0/1 outcomes, so the theorem applies to it. Generally, a
sample size that is greater than or equal to 30 is considered “large enough”, although the normal approximation is weaker when π is close to 0 or 1.
Therefore, if the sample size is large, the sampling distribution of a
proportion will be approximately normal in shape.
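The first two properties above can be checked by brute force on a small example. The sketch below is illustrative only: the six-unit population and the sample size are made-up values, and the samples are assumed to be drawn without replacement, which is the setting where the (N – n) / (N – 1) factor applies.

```python
from itertools import combinations
from math import sqrt

# Hypothetical population of N = 6 units; 1 marks a "success", 0 a "failure".
population = [1, 1, 0, 0, 0, 1]
N = len(population)
n = 3                      # sample size (assumed for illustration)
pi = sum(population) / N   # population proportion

# Enumerate every possible sample of size n, drawn without replacement,
# and compute the sample proportion p for each one.
sample_props = [sum(s) / n for s in combinations(population, n)]
k = len(sample_props)

# Average of the k sample proportions.
mean_p = sum(sample_props) / k

# Standard deviation of the k sample proportions around pi.
sd_p = sqrt(sum((p - pi) ** 2 for p in sample_props) / k)

# Standard error from the formula, including the finite population factor.
se_formula = sqrt(pi * (1 - pi) / n) * sqrt((N - n) / (N - 1))

print("number of samples k:", k)
print("mean of sample proportions:", mean_p, "population proportion:", pi)
print("sd of sample proportions:", sd_p, "formula:", se_formula)
```

Up to floating-point rounding, the mean of the sample proportions should agree with π, and their standard deviation should agree with the standard error formula.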
How to Find the Confidence Interval for a Proportion
A confidence interval estimates a population proportion as a range of values with a stated level of confidence. When the sample size is large (greater than or equal to 30), the following six steps can be
used to construct a confidence interval.
- Select a confidence level.
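- Compute alpha, where alpha = 1 – confidence level (with the confidence level written as a proportion, such as 0.95).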
- Identify a sample statistic to serve as a point estimate of the population parameter. Since we are estimating the population proportion, the logical sample statistic is the sample proportion.
- Specify the sampling distribution of the statistic. This distribution (its shape, its mean, and its standard deviation) is described above, in the section on the sampling distribution of a proportion.
- Based on the sampling distribution of the statistic, find the value for
which the cumulative probability is 1 – alpha/2. That value is the upper limit of the range of the confidence interval.
- In a similar way, find the value for which the cumulative probability is
alpha/2. That value is the lower limit of the range of the confidence interval.
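The six steps above can be sketched in code. This is a minimal illustration, not a definitive implementation: the function name `proportion_confidence_interval` and the sample figures (240 successes out of 400) are hypothetical. It also assumes that the unknown π in the standard error formula is replaced by the sample proportion p, a step the text above does not spell out.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95, N=None):
    """Large-sample confidence interval for a proportion (illustrative sketch).

    `confidence` is step 1. `N` is the population size; if it is None,
    the finite population factor is left out.
    """
    # Step 2: compute alpha.
    alpha = 1 - confidence

    # Step 3: the point estimate is the sample proportion.
    p = successes / n

    # Step 4: sampling distribution is approximately normal, centered on p.
    # Assumption: pi is unknown, so p is substituted into the standard error.
    se = sqrt(p * (1 - p) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    dist = NormalDist(mu=p, sigma=se)

    # Step 5: value with cumulative probability 1 - alpha/2 (upper limit).
    upper = dist.inv_cdf(1 - alpha / 2)

    # Step 6: value with cumulative probability alpha/2 (lower limit).
    lower = dist.inv_cdf(alpha / 2)

    return lower, upper

# Hypothetical survey: 240 of 400 respondents answer "yes".
lower, upper = proportion_confidence_interval(successes=240, n=400, confidence=0.95)
print(f"95% confidence interval: {lower:.4f} to {upper:.4f}")
```

Passing a population size, for example `N=2000`, applies the finite population factor and gives a slightly narrower interval for the same sample.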
Taken from: Probability, Statistics, and Survey Sampling Retrieved November 28, 2006 from http://www.stattrek.com/Lesson4/Proportion.aspx