Hypothesis Test for a Population Proportion
In this section, we will look at how to test whether the proportion of the population having some characteristic is different from some benchmark value.
Suppose #X# is a binary variable measured on a population in which an unknown proportion #p# of the population meets some condition of interest, with #X = 1# if the subject meets that condition and #X = 0# otherwise. We can also think of #p# as the probability that #X = 1# for a randomly-selected element from the population.
Let #X_1,\ldots,X_n# denote measurements of #X# on a random sample from the population. We noted previously that \[S = X_1 + \cdots + X_n \sim B(n,p).\]
Let #\hat{p} = \cfrac{S}{n}#.
We saw previously that \[E(\hat{p}) = p \\\phantom{0}\\ \text{and}\\\phantom{0}\\ V(\hat{p}) = \cfrac{p(1 - p)}{n}.\]
Moreover, we learned that when #n# is large, the Central Limit Theorem implies that #\hat{p}# has an approximate \[N\bigg(p,\cfrac{p(1 - p)}{n}\bigg)\] distribution.
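As a quick illustration of this approximation, the following #\mathrm{R}# commands simulate many samples using hypothetical values #p = 0.3# and #n = 200# (these numbers are chosen only for the sketch) and compare the simulated mean and variance of #\hat{p}# with #p# and #p(1 - p)/n#.
> set.seed(1)
> p <- 0.3; n <- 200
> phat <- rbinom(10000, n, p)/n   # 10000 simulated values of p-hat
> mean(phat)                      # close to p = 0.3
> var(phat)                       # close to p*(1 - p)/n = 0.00105
> hist(phat)                      # roughly bell-shaped and centred at p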
We might hypothesize that #p# is different from some specified benchmark value #p_0# (or, similarly, that the probability #p# that #X=1# is different from #p_0#).
Research Question and Hypotheses
The research question of a hypothesis test for a population proportion is whether or not #p# differs from some benchmark value #p_0#.
Depending on the direction of the test, a hypothesis test for a population proportion has one of the following pairs of hypotheses:
Two-tailed | Left-tailed | Right-tailed |
#H_0: p = p_0# | #H_0: p \geq p_0# | #H_0: p \leq p_0# |
#H_1: p \neq p_0# | #H_1: p < p_0# | #H_1: p > p_0# |
Test Statistic and Null Distribution
If the sample size #n# is large, then the test statistic \[Z = \cfrac{\hat{p} - p_0}{\sqrt{\cfrac{p_0(1 - p_0)}{n}}}\] has an approximate #N(0,1)# distribution when #H_0# is true.
Thus values of #Z# are extreme if they are far from #0#, in either tail of the #N(0,1)# density curve.
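As a hypothetical check of this null distribution (using assumed values #n = 100# and #p_0 = 0.4#), the following #\mathrm{R}# commands simulate samples under #H_0# and show that the resulting values of #Z# behave approximately like draws from #N(0,1)#.
> set.seed(2)
> n <- 100; p0 <- 0.4
> phat <- rbinom(10000, n, p0)/n
> z <- (phat - p0)/sqrt(p0*(1 - p0)/n)
> mean(z); sd(z)                  # approximately 0 and 1, respectively
> mean(abs(z) > qnorm(0.975))     # approximately 0.05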
Calculating and Evaluating the P-value
Depending on the form of #H_1# we compute the P-value based on the observed value #z# of the test statistic #Z#, just as we did previously in the tests for a population mean.
As a reminder, if we are testing for #H_1 : p \neq p_0# then the P-value is computed in #\mathrm{R}# using the absolute value of the test statistic, with the command
> 2*pnorm(abs(z),low=F)
If we are testing for #H_1: p < p_0# then the P-value is computed in #\mathrm{R}# with the command
> pnorm(z)
If we are testing for #H_1: p > p_0# then the P-value is computed in #\mathrm{R}# with the command
> pnorm(z,low=F)
Given a specific significance level #\alpha# we would reject #H_0# and conclude #H_1# if the P-value is #\leq \alpha#. Otherwise, we would not reject #H_0#, meaning that the evidence in the data is not inconsistent with the null hypothesis.
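To make the whole large-sample procedure concrete, here is a minimal sketch in #\mathrm{R}# using made-up data (#s = 27# successes out of #n = 60#, benchmark #p_0 = 0.5#); only the P-value matching the chosen alternative would be compared with #\alpha#.
> s <- 27; n <- 60; p0 <- 0.5     # hypothetical data
> phat <- s/n
> z <- (phat - p0)/sqrt(p0*(1 - p0)/n)
> 2*pnorm(abs(z), low=F)          # P-value for H1: p != p0
> pnorm(z)                        # P-value for H1: p < p0
> pnorm(z, low=F)                 # P-value for H1: p > p0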
We mentioned earlier that there is a correspondence between a hypothesis test about a population parameter #\theta# at significance level #\alpha# and a #(1 - \alpha)100\%# confidence interval for #\theta#.
This is not quite true for this setting, because when forming the CI we use #\tilde{p}# to estimate the value of #p# and we use \[\sqrt{\cfrac{\tilde{p}(1 - \tilde{p})}{n + 4}}\] to estimate the standard error of #\tilde{p}#.
But in the hypothesis test we use #\hat{p}# to estimate the value of #p# and we use \[\sqrt{\cfrac{p_0(1 - p_0)}{n}}\] to estimate the standard error of #\hat{p}#.
Because of this mismatch, the conclusion of the hypothesis test might not align with the conclusion drawn from the corresponding CI. To prevent this inconsistency, the confidence interval can instead be calculated using the standard error based on #p_0#, as follows.
Re-establishing the Connection Between Hypothesis Testing and Confidence Intervals
If the #(1 - \alpha)100\%# confidence interval for #p# is formed by using \[\bigg(\hat{p} - z_{\alpha /2}\sqrt{\cfrac{p_0(1 - p_0)}{n}},\,\,\,\,\,\hat{p} + z_{\alpha /2}\sqrt{\cfrac{p_0(1 - p_0)}{n}}\bigg)\] then the correspondence between hypothesis testing and confidence intervals holds.
That is, if #p_0# falls inside this CI then we would not reject #H_0: p = p_0# at significance level #\alpha#. If #p_0# falls outside this CI then we would reject #H_0: p = p_0# at significance level #\alpha#.
Similar adjustments would be required for correspondence between one-sided tests and one-sided CIs.
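As a hypothetical numerical check of this correspondence (reusing the made-up data from the sketch above, with #\alpha = 0.05#):
> s <- 27; n <- 60; p0 <- 0.5; alpha <- 0.05
> phat <- s/n
> se0 <- sqrt(p0*(1 - p0)/n)                  # standard error based on p_0
> phat + c(-1, 1)*qnorm(1 - alpha/2)*se0      # CI using the null standard error
> 2*pnorm(abs((phat - p0)/se0), low=F)        # two-sided P-value
Here #p_0 = 0.5# lies inside the interval and the P-value exceeds #\alpha#, so both procedures lead to the same conclusion: #H_0# is not rejected.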
Even when #n# is not large, we can still test the same hypotheses; we simply base the test on the exact distribution of #S# rather than on the normal approximation.
Test Statistic and Null Distribution
If the sample size #n# is small, then we use the total number of successes #S# as the test statistic; when #H_0# is true, #S# has a #B(n,p_0)# distribution.
Calculating and Evaluating the P-value
Depending on the form of #H_1#, we can compute the P-value based on the observed value #s# of the test statistic #S# in #\mathrm{R}# as follows:
Given #H_1: p < p_0#, use the command:
> pbinom(s,n,p_0)
Given #H_1: p > p_0#, use the command:
> pbinom(s-1,n,p_0,low=F)
As before, given a specific significance level #\alpha#, we would reject #H_0# and conclude #H_1# if the P-value is #\leq \alpha#; otherwise, we would not reject #H_0#.
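As a concrete sketch with made-up numbers (say #s = 8# successes out of #n = 12# with #p_0 = 0.5#), the exact P-values can be computed as shown below; the built-in #\mathrm{R}# function binom.test gives the same one-sided P-value and can serve as a check.
> s <- 8; n <- 12; p0 <- 0.5      # hypothetical data
> pbinom(s, n, p0)                # P-value for H1: p < p0
> pbinom(s - 1, n, p0, low=F)     # P-value for H1: p > p0
> binom.test(s, n, p = p0, alternative = "greater")$p.value   # same value as the previous line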
Example:
Is the proportion #p# of voters in Amsterdam who support the complete legalization of marijuana larger than #0.5#?
A researcher selected a random sample of #82# Amsterdam voters and asked them for their opinion about this issue. Of these, #39# said they were in favor of the complete legalization of marijuana, #36# said they were against it, and #7# did not have an opinion.
Conduct a hypothesis test at significance level #\alpha = 0.05# to investigate this research question.
Solution:
We test #H_0 : p \leq 0.5# against #H_1: p > 0.5# at significance level #\alpha = 0.05#. We will omit the participants who had no opinion.
Hence, the sample size is #n = 75#, of which #S = 39# are in favor of the complete legalization of marijuana, so that \[\hat{p} = \cfrac{39}{75} = 0.52\]
The test statistic is \[z = \cfrac{0.52 - 0.5}{\sqrt{\cfrac{0.5(1 - 0.5)}{75}}} \approx 0.3464\] and the P-value is computed in #\mathrm{R}# using
> pnorm(0.3464,low=F)
to be #0.365#. This is larger than #0.05#, so we do not reject #H_0#.
The evidence does not support the hypothesis that the proportion of voters in Amsterdam who support the complete legalization of marijuana is larger than #0.5#.
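The computation in this example can be reproduced in #\mathrm{R}# without rounding intermediate values:
> phat <- 39/75
> z <- (phat - 0.5)/sqrt(0.5*(1 - 0.5)/75)
> z                               # approximately 0.3464
> pnorm(z, low=F)                 # approximately 0.365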
Example:
It is alleged that a certain coin has a greater chance of coming up Heads than Tails, i.e., #H_1: p > 0.5#, where #p# denotes the probability of Heads. The coin is flipped #n = 20# times and #S = 14# Heads are observed.
Conduct a hypothesis test at significance level #\alpha = 0.05# to investigate whether the coin is biased towards Heads.
Solution:
The null hypothesis is #H_0:p \leq 0.5#.
Since #n# is small, the P-value must be computed using the #B(20,0.5)# distribution:\[P(S \geq 14 \ | \ p = 0.5) = 1 - P(S \leq 13 \ | \ p = 0.5)\]
> 1-pbinom(13,20,0.5)
which equals #0.0577#.
Since the P-value is larger than #0.05# we would not conclude in favor of #H_1#, i.e., we do not conclude that the coin is biased towards Heads. Fourteen or more Heads out of #20# is not too unusual for a fair coin.
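The same P-value can also be obtained directly from the built-in exact binomial test in #\mathrm{R}#:
> binom.test(14, 20, p = 0.5, alternative = "greater")
which reports a P-value of about #0.0577#, matching the calculation above.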