Hypothesis Test for a Population Proportion
In this section, we will look at how to test whether the proportion of the population having some characteristic is different from some benchmark value.
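Suppose $X$ is a binary variable measured on a population in which an unknown proportion $p$ of the population meets some condition of interest, with $X = 1$ if the subject meets that condition and $X = 0$ otherwise. We can also think of $p$ as the probability that $X = 1$ for a randomly selected element from the population.
Let $X_1, \ldots, X_n$ denote measurements of $X$ on a random sample of size $n$ from the population. We noted previously that the sample proportion $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $p$.
Let $S = \sum_{i=1}^{n} X_i$ denote the total number of successes in the sample.
We saw previously that $S$ has a binomial distribution with parameters $n$ and $p$.
Moreover, we learned that when $n$ is large, the Central Limit Theorem implies that $\hat{p}$ has an approximately normal distribution with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.
We might hypothesize that $p$ is different from some specified benchmark value $p_0$ (or, similarly, that the probability that $X = 1$ is different from $p_0$).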
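The normal approximation for the sample proportion can be checked with a small simulation. The following is only a sketch: the values of `n`, `p`, the seed, and the number of replications are arbitrary choices made for illustration and do not come from the text.

```r
# Simulate many samples of a binary variable and compare the spread of the
# sample proportions with the standard deviation sqrt(p * (1 - p) / n).
set.seed(1)        # arbitrary seed, for reproducibility only
n <- 200           # assumed sample size
p <- 0.3           # assumed true proportion

# Each replication draws n binary observations and records their mean (p-hat)
p_hats <- replicate(5000, mean(rbinom(n, 1, p)))

sim_mean  <- mean(p_hats)            # should be close to p
sim_sd    <- sd(p_hats)              # should be close to theory_sd
theory_sd <- sqrt(p * (1 - p) / n)

c(sim_mean = sim_mean, sim_sd = sim_sd, theory_sd = theory_sd)
```

A histogram of `p_hats` (for example with `hist(p_hats)`) should look roughly bell-shaped around `p`.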
Research Question and Hypotheses
The research question of a hypothesis test for a population proportion is whether or not $p$ differs from some benchmark value $p_0$.
Depending on the direction of the test, a hypothesis test for a population proportion has one of the following pairs of hypotheses:
Two-tailed | Left-tailed | Right-tailed |
$H_0: p = p_0$ | $H_0: p = p_0$ | $H_0: p = p_0$ |
$H_1: p \neq p_0$ | $H_1: p < p_0$ | $H_1: p > p_0$ |
Test Statistic and Null Distribution
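If the sample size $n$ is large, then the test statistic $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ has an approximately standard normal distribution when $H_0$ is true.
Thus values of $Z$ are extreme if they are far from $0$, in either tail of the density curve.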
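The large-sample test statistic, $(\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, can be computed with a short helper. This is a minimal sketch: the function name `prop_z_stat`, the argument `s` (number of successes), and the data are hypothetical and chosen only for illustration.

```r
# Large-sample z statistic for testing H0: p = p0.
# prop_z_stat is an illustrative helper, not a built-in R function.
prop_z_stat <- function(s, n, p0) {
  p_hat <- s / n                     # sample proportion
  se0   <- sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
  (p_hat - p0) / se0
}

# Hypothetical data: 60 successes in 100 trials, benchmark p0 = 0.5
z <- prop_z_stat(60, 100, 0.5)
z
```

Note that the standard error uses `p0`, not `p_hat`, because the null distribution is derived under the assumption that $H_0$ is true.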
Calculating and Evaluating the P-value
Depending on the form of $H_1$, we compute the P-value based on the observed value $z$ of the test statistic $Z$, just as we did previously in the tests for a population mean.
As a reminder, if we are testing for $H_1: p \neq p_0$ then the P-value is computed in R using the absolute value of the test statistic, with the command
> 2*pnorm(abs(z),lower.tail=FALSE)
If we are testing for $H_1: p < p_0$ then the P-value is computed in R with the command
> pnorm(z)
If we are testing for $H_1: p > p_0$ then the P-value is computed in R with the command
> pnorm(z,lower.tail=FALSE)
Given a specific significance level $\alpha$, we would reject $H_0$ and conclude $H_1$ if the P-value is less than or equal to $\alpha$. Otherwise, we would not reject $H_0$, meaning that the evidence in the data is not inconsistent with the null hypothesis.
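The three P-value commands can be gathered into one helper that selects the tail from the form of the alternative hypothesis. This is a sketch only: the function name `prop_z_pvalue`, the labels `"two.sided"`, `"less"`, `"greater"`, and the values of `z` and `alpha` are assumptions made for illustration.

```r
# P-value for the large-sample test, given the observed z and the alternative.
# prop_z_pvalue is an illustrative helper, not a built-in R function.
prop_z_pvalue <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),  # both tails
         less      = pnorm(z),                               # left tail
         greater   = pnorm(z, lower.tail = FALSE))           # right tail
}

z     <- 2      # hypothetical observed test statistic
alpha <- 0.05   # hypothetical significance level

p_two   <- prop_z_pvalue(z, "two.sided")
p_less  <- prop_z_pvalue(z, "less")
p_great <- prop_z_pvalue(z, "greater")

reject_two <- p_two <= alpha   # TRUE means H0 is rejected in the two-tailed test
```

For a continuous null distribution the left-tailed and right-tailed P-values add up to 1, and the two-tailed P-value is twice the smaller of the two.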
We mentioned earlier that there is a correspondence between a hypothesis test about a population parameter $\theta$ at significance level $\alpha$ and a $(1-\alpha)100\%$ confidence interval for $\theta$.
This is not quite true for this setting, because when forming the CI we use $\hat{p}$ to estimate the value of $p$ and we use $\sqrt{\hat{p}(1-\hat{p})/n}$ as the standard error.
But in the hypothesis test we use $p_0$ as the value of $p$ and we use $\sqrt{p_0(1-p_0)/n}$ as the standard error.
So this inconsistency could mean that the conclusion of the hypothesis test might not align with the corresponding CI. To prevent this inconsistency from occurring, calculate the confidence interval using the standard error $\sqrt{p_0(1-p_0)/n}$ instead.
Re-establishing the Connection Between Hypothesis Testing and Confidence Intervals
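If the $(1-\alpha)100\%$ confidence interval for $p$ is formed by using $\hat{p} \pm z_{\alpha/2}\sqrt{p_0(1-p_0)/n}$, where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution, then the correspondence with the two-tailed test at significance level $\alpha$ holds.
That is, if $p_0$ falls inside this CI then we would not reject $H_0$ at significance level $\alpha$. If $p_0$ falls outside this CI then we would reject $H_0$ at significance level $\alpha$.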
Similar adjustments would be required for correspondence between one-sided tests and one-sided CIs.
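The agreement between the two-tailed test and a confidence interval built with the standard error $\sqrt{p_0(1-p_0)/n}$ can be checked numerically. The following is a sketch under assumed values: the helper name `ci_null_se` and the data (`s`, `n`, `p0`, `alpha`) are hypothetical.

```r
# Confidence interval for p that uses the standard error computed from p0.
# ci_null_se is an illustrative helper, not a built-in R function.
ci_null_se <- function(s, n, p0, alpha = 0.05) {
  p_hat  <- s / n
  se0    <- sqrt(p0 * (1 - p0) / n)   # standard error based on p0
  z_crit <- qnorm(1 - alpha / 2)      # upper alpha/2 quantile of N(0, 1)
  c(lower = p_hat - z_crit * se0, upper = p_hat + z_crit * se0)
}

# Hypothetical data: 60 successes in 100 trials, benchmark 0.5
s <- 60; n <- 100; p0 <- 0.5; alpha <- 0.05

ci <- ci_null_se(s, n, p0, alpha)

# Two-tailed test on the same data
z       <- (s / n - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

inside <- ci[["lower"]] <= p0 && p0 <= ci[["upper"]]  # is p0 inside the CI?
reject <- p_value <= alpha                            # does the test reject H0?

inside == !reject   # the two conclusions agree
```

With these values `p0` lies just outside the interval and the test rejects $H_0$, so both procedures lead to the same conclusion.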
Even when $n$ is not large, we can still proceed with the same hypothesis test.
Test Statistic and Null Distribution
If the sample size $n$ is small, then the total number of successes $S$ has a binomial distribution with parameters $n$ and $p_0$ when $H_0$ is true.
Calculating and Evaluating the P-value
Depending on the form of $H_1$, we can compute the P-value based on the observed value $s$ of the test statistic $S$ in R as follows:
Given $H_1: p < p_0$, use the command:
> pbinom(s,n,p_0)
Given $H_1: p > p_0$, use the command:
> pbinom(s-1,n,p_0,lower.tail=FALSE)
Given a specific significance level $\alpha$, we would reject $H_0$ and conclude $H_1$ if the P-value is less than or equal to $\alpha$. Otherwise, we would not reject $H_0$, meaning that the evidence in the data is not inconsistent with the null hypothesis.
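The two exact P-value commands can be wrapped in one helper and compared with R's built-in `binom.test`. This is a sketch: the helper name `exact_prop_pvalue` and the data are hypothetical, while `binom.test` with its `p` and `alternative` arguments is part of base R's `stats` package.

```r
# Exact one-sided P-values based on the binomial null distribution of S.
# exact_prop_pvalue is an illustrative helper, not a built-in R function.
exact_prop_pvalue <- function(s, n, p0, alternative = c("less", "greater")) {
  alternative <- match.arg(alternative)
  if (alternative == "less") {
    pbinom(s, n, p0)                           # P(S <= s) under H0
  } else {
    pbinom(s - 1, n, p0, lower.tail = FALSE)   # P(S >= s) under H0
  }
}

# Hypothetical data: 3 successes in 15 trials, benchmark p0 = 0.4
p_less  <- exact_prop_pvalue(3, 15, 0.4, "less")
p_great <- exact_prop_pvalue(3, 15, 0.4, "greater")

# The built-in exact test should give the same left-tailed P-value
p_builtin <- binom.test(3, 15, p = 0.4, alternative = "less")$p.value
all.equal(p_less, p_builtin)
```

The `s - 1` in the right-tailed case is needed because `pbinom(..., lower.tail = FALSE)` returns a strict upper tail, $P(S > s - 1) = P(S \geq s)$.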
Example:
Is the proportion of voters in Amsterdam who support the complete legalization of marijuana larger than ?
A researcher selects a random sample of Amsterdam voters and asks them for their opinion about this issue. Of these, said they were in favor of the complete legalization of marijuana, said they were against it, and did not have an opinion.
Conduct a hypothesis test at significance level to investigate this research question.
Solution:
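We test $H_0: p = p_0$ against $H_1: p > p_0$, where $p_0$ is the benchmark value in the research question, at the stated significance level. We will omit the participants who had no opinion.
Hence, the sample size $n$ is the number of participants who expressed an opinion, and $\hat{p}$ is the proportion of these who are in favor of the complete legalization of marijuana.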
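The test statistic is $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = 0.3464$, and the P-value is computed in R with the command
> pnorm(0.3464,lower.tail=FALSE)
to be approximately $0.3645$. This is larger than the significance level, so we do not reject $H_0$.
The evidence does not support the hypothesis that the proportion of the voters in Amsterdam who support the complete legalization of marijuana is larger than the benchmark value.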
Example:
It is alleged that a certain coin has a greater chance of coming up Heads than Tails, i.e., $p > 0.5$, where $p$ denotes the probability of Heads. The coin is flipped $20$ times and $14$ Heads are observed.
Conduct a hypothesis test to determine whether the coin is fair or not at a significance level .
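Solution:
The null hypothesis is $H_0: p = 0.5$, and the alternative hypothesis is $H_1: p > 0.5$.
Since $n = 20$ is small, the P-value must be computed using the binomial distribution with parameters $20$ and $0.5$: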
> 1-pbinom(13,20,0.5)
which equals approximately $0.0577$.
Since the P-value is larger than the significance level, we would not conclude in favor of $H_1$, i.e., we do not conclude that the coin is biased towards Heads. Fourteen or more Heads out of $20$ is not too unusual for a fair coin.
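The P-value in the coin example can be cross-checked in two other ways: with the built-in `binom.test` and by summing binomial coefficients directly. The variable names below are illustrative; the numbers (14 Heads in 20 flips, fair-coin probability 0.5) are those of the example.

```r
# P-value as computed in the example: P(S >= 14) when S ~ Binomial(20, 0.5)
p_manual <- 1 - pbinom(13, 20, 0.5)

# Same quantity from R's built-in exact binomial test (right-tailed)
p_test <- binom.test(14, 20, p = 0.5, alternative = "greater")$p.value

# Same quantity from first principles: count the outcomes with 14 to 20 Heads
# among the 2^20 equally likely sequences of a fair coin
p_count <- sum(choose(20, 14:20)) / 2^20

c(p_manual = p_manual, p_test = p_test, p_count = p_count)
```

All three values agree, which confirms that the command in the example computes the right-tailed probability of observing fourteen or more Heads.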