Independent Proportions Z-test: Test Statistic and p-value

Independent proportions Z-test: Test Statistic

Let $X_1$ denote the number of successes in the first sample and $X_2$ the number of successes in the second sample. Then $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions:

$\hat{p}_1 = \cfrac{X_1}{n_1} \phantom{000000} \hat{p}_2 = \cfrac{X_2}{n_2}$

Besides the individual sample proportions, we will also need the pooled sample proportion $\hat{p}$ in order to calculate the test statistic:

$\hat{p} = \cfrac{X_1+X_2}{n_1+n_2}$

The test statistic of an independent proportions $Z$ -test is denoted $Z$ and is computed with the following formula:

$Z=\cfrac{(\hat{p}_1-\hat{p}_2) - (\pi_1 - \pi_2)}{s_{(\hat{p}_1 - \hat{p}_2)}} = \cfrac{\hat{p}_1-\hat{p}_2 }{\sqrt{\hat{p}\cdot(1-\hat{p})\cdot\bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2}\bigg)}}$

where $s_{(\hat{p}_1 - \hat{p}_2)}$ is the standard error of the proportion difference.
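The second equality in the formula can be motivated with a short sketch, assuming the two samples are independent. Here $\pi$ denotes the common population proportion under $H_0$ (a symbol introduced only for this sketch):

$\text{Var}(\hat{p}_1 - \hat{p}_2) = \cfrac{\pi_1 \cdot (1-\pi_1)}{n_1} + \cfrac{\pi_2 \cdot (1-\pi_2)}{n_2} \overset{H_0}{=} \pi \cdot (1-\pi) \cdot \bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2}\bigg)$

Under $H_0$ the term $\pi_1 - \pi_2$ in the numerator equals $0$, and the unknown $\pi$ is estimated by the pooled sample proportion $\hat{p}$. Taking the square root then gives the standard error in the denominator.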

When both samples are large $(n_1 \geq 30 \text{ and } n_2 \geq 30)$ , the $Z$ -statistic approximately follows the Standard Normal Distribution under the null hypothesis of the test:

$Z \sim N(0,1)$
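The computation of the test statistic can also be sketched in a few lines of code. The following is a minimal Python illustration, not part of the original material; the function name `two_proportion_z` and the example counts are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z-statistic for H0: pi_1 - pi_2 = 0 (illustrative helper)."""
    p1_hat = x1 / n1                      # sample proportion, first sample
    p2_hat = x2 / n2                      # sample proportion, second sample
    p_pool = (x1 + x2) / (n1 + n2)        # pooled sample proportion
    # Standard error of the proportion difference under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Hypothetical counts: 30 of 50 successes versus 20 of 50 successes
z = two_proportion_z(x1=30, n1=50, x2=20, n2=50)
print(round(z, 4))
```

With these counts the sample proportions are $0.6$ and $0.4$, the pooled proportion is $0.5$, and the standard error is $0.1$, so the statistic comes out at $2$.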

Calculating the p-value of an independent proportions Z-test with Statistical Software

The calculation of the $p$ -value of an independent proportions $Z$ -test is dependent on the direction of the test and can be performed using either Excel or R.

To calculate the $p$ -value of an independent proportions $Z$ -test for $\pi_1 - \pi_2$ in Excel, make use of one of the following commands:

$\begin{array}{llll} \phantom{0}\text{Direction}&\phantom{000000}H_0&\phantom{000000}H_a&\phantom{0000000000}\text{Excel Command}\\ \hline \text{Two-tailed}&H_0:\pi_1 - \pi_2 = 0&H_a:\pi_1 - \pi_2 \neq 0&=2 \text{ * }(1 \text{ - }\text{NORM.DIST}(\text{ABS}(z),0,1,1))\\ \text{Left-tailed}&H_0:\pi_1 - \pi_2 \geq 0&H_a:\pi_1 - \pi_2 \lt 0&=\text{NORM.DIST}(z,0,1,1)\\ \text{Right-tailed}&H_0:\pi_1 - \pi_2 \leq 0&H_a:\pi_1 - \pi_2 \gt 0&=1 \text{ - }\text{NORM.DIST}(z,0,1,1)\\ \end{array}$

To calculate the $p$ -value of an independent proportions $Z$ -test for $\pi_1 - \pi_2$ in R, make use of one of the following commands:

$\begin{array}{llll} \phantom{0}\text{Direction}&\phantom{000000}H_0&\phantom{000000}H_a&\phantom{0000000000}\text{R Command}\\ \hline \text{Two-tailed}&H_0:\pi_1 - \pi_2 = 0&H_a:\pi_1 - \pi_2 \neq 0&2 \text{ * }\text{pnorm}(\text{abs}(z),0,1, \text{FALSE})\\ \text{Left-tailed}&H_0:\pi_1 - \pi_2 \geq 0&H_a:\pi_1 - \pi_2 \lt 0&\text{pnorm}(z,0,1, \text{TRUE})\\ \text{Right-tailed}&H_0:\pi_1 - \pi_2 \leq 0&H_a:\pi_1 - \pi_2 \gt 0&\text{pnorm}(z,0,1, \text{FALSE})\\ \end{array}$

If $p \leq \alpha$ , reject $H_0$ and conclude $H_a$ . Otherwise, do not reject $H_0$ .
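The three directions and the decision rule can be mirrored outside Excel and R as well. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library; the function names `z_test_p_value` and `decide`, and the value $z = 1.96$, are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist


def z_test_p_value(z, direction="two-tailed"):
    """p-value of a Z-test for the given direction (illustrative helper)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    if direction == "two-tailed":
        return 2 * (1 - std_normal.cdf(abs(z)))   # 2 * P(Z >= |z|)
    if direction == "left-tailed":
        return std_normal.cdf(z)                  # P(Z <= z)
    if direction == "right-tailed":
        return 1 - std_normal.cdf(z)              # P(Z >= z)
    raise ValueError(f"unknown direction: {direction!r}")


def decide(p_value, alpha):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "do not reject H0"


# Hypothetical test statistic, evaluated at alpha = 0.05
p = z_test_p_value(1.96, "two-tailed")
print(round(p, 3), decide(p, alpha=0.05))
```

Each branch corresponds to one row of the tables above: the two-tailed case doubles the upper-tail area beyond $|z|$, while the one-tailed cases use the lower or upper tail directly.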

Is the on-time rate for trains arriving at Amsterdam Central Station the same during the morning and the evening commutes? To investigate this matter, a researcher sampled the arrival times of $n_1=52$ trains on weekday mornings and $n_2=56$ trains on weekday evenings.

The researcher plans on using an independent proportions $Z$ -test to determine whether or not there is a significant difference between the morning and evening on-time arrival rate, at the $\alpha = 0.03$ level of significance.

Out of the $52$ morning trains, $X_1=44$ arrived on time. Out of the $56$ evening trains, $X_2=45$ arrived on time.

Calculate the $p$ -value of the test and make a decision regarding $H_0: \pi_1 - \pi_2 = 0$ . Round your answer to $3$ decimal places.

$p=0.561$

On the basis of this $p$ -value, $H_0$ should not be rejected, because $\,p$ $\gt$ $\alpha$ .

There are a number of different ways we can calculate the $p$ -value of the test.

Excel Calculation

Compute the sample proportions $\hat{p}_1$ and $\hat{p}_2$ :

$\hat{p}_1=\cfrac{X_1}{n_1}=\cfrac{44}{52}=0.84615\\ \hat{p}_2=\cfrac{X_2}{n_2}=\cfrac{45}{56}=0.80357$

Compute the pooled sample proportion $\hat{p}$ :

$\hat{p}=\cfrac{X_1 + X_2 }{n_1 + n_2}=\cfrac{44 + 45}{52 + 56}=0.82407$

Compute the $Z$ -statistic:

$z=\cfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} \cdot (1-\hat{p}) \cdot \bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2} \bigg)}} =\cfrac{0.84615 - 0.80357}{\sqrt{0.82407 \cdot (1-0.82407) \cdot \bigg(\cfrac{1}{52}+\cfrac{1}{56} \bigg)}}=0.58072$

Since both $n_1$ and $n_2$ are considered large ( $\geq 30$ ), the Central Limit Theorem applies and we know that the test statistic

$Z=\cfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} \cdot (1-\hat{p}) \cdot \bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2} \bigg)}}$

approximately has the Standard Normal Distribution, under the assumption that $H_0$ is true.

For a two-tailed $Z$ -test, the $p$ -value is defined as $2\cdot \mathbb{P}(Z \geq |z|)$ . To calculate this value in Excel, make use of the following function:

NORM.DIST(x, mean, standard_dev, cumulative)

x: The value at which you wish to evaluate the distribution function.

mean: The mean of the distribution.

standard_dev: The standard deviation of the distribution.

cumulative: A logical value that determines the form of the function.

TRUE - uses the cumulative distribution function, $\mathbb{P}(X \leq x)$

FALSE - uses the probability density function

Thus, to calculate $p = 2\cdot \mathbb{P}(Z \geq |z|)$ , run the following command:

$=2 \text{ * }(1 \text{ - } \text{NORM.DIST}(\text{ABS}(z),0,1,1))\\ \downarrow\\ =2 \text{ * }(1 \text{ - } \text{NORM.DIST}(\text{ABS}(0.58072),0,1,1))$

This gives:

$p = 0.561$

Since $\,p$ $\gt$ $\alpha$ , $H_0: \pi_1 - \pi_2 = 0$ should not be rejected.

R Calculation

Compute the sample proportions $\hat{p}_1$ and $\hat{p}_2$ :

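$\hat{p}_1=\cfrac{X_1}{n_1}=\cfrac{44}{52}=0.84615\\ \hat{p}_2=\cfrac{X_2}{n_2}=\cfrac{45}{56}=0.80357$

Compute the pooled sample proportion $\hat{p}$ :

$\hat{p}=\cfrac{X_1 + X_2 }{n_1 + n_2}=\cfrac{44 + 45}{52 + 56}=0.82407$

Compute the $Z$ -statistic:

$z=\cfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p} \cdot (1-\hat{p}) \cdot \bigg(\cfrac{1}{n_1}+\cfrac{1}{n_2} \bigg)}} =\cfrac{0.84615 - 0.80357}{\sqrt{0.82407 \cdot (1-0.82407) \cdot \bigg(\cfrac{1}{52}+\cfrac{1}{56} \bigg)}}=0.58072$

Since both $n_1$ and $n_2$ are considered large ( $\geq 30$ ), the Central Limit Theorem applies and the test statistic $Z$ approximately has the Standard Normal Distribution, under the assumption that $H_0$ is true.

For a two-tailed $Z$ -test, the $p$ -value is defined as $2\cdot \mathbb{P}(Z \geq |z|)$ . To calculate this value in R, make use of the following function:

pnorm(q, mean, sd, lower.tail)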

q: The value at which you wish to evaluate the distribution function.

mean: The mean of the distribution.

sd: The standard deviation of the distribution.

lower.tail: If TRUE (default), probabilities are $\mathbb{P}(X \leq x)$ , otherwise, $\mathbb{P}(X \gt x)$ .

Thus, to calculate $p = 2\cdot \mathbb{P}(Z \geq |z|)$ , run the following command:

$2 \text{ * } \text{pnorm}(\text{q = abs}(z), \text{mean = } 0, \text{sd = } 1, \text{lower.tail = FALSE})\\ \downarrow\\ 2\text{ * } \text{pnorm}(\text{q = abs}(0.58072), \text{mean = } 0, \text{sd = } 1, \text{lower.tail = FALSE})$

This gives:

$p = 0.561$

Since $\,p$ $\gt$ $\alpha$ , $H_0: \pi_1 - \pi_2 = 0$ should not be rejected.
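As an independent cross-check of the train example, the whole calculation can be reproduced in one short script. This is a sketch only, assuming Python 3.8 or later with the standard library; the variable names are illustrative, and the counts and significance level are those given in the example.

```python
from math import sqrt
from statistics import NormalDist

# Data from the train example
x1, n1 = 44, 52   # on-time morning trains, morning sample size
x2, n2 = 45, 56   # on-time evening trains, evening sample size
alpha = 0.03      # significance level

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

# Pooled Z-statistic under H0: pi_1 - pi_2 = 0
z = (p1_hat - p2_hat) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Two-tailed p-value: 2 * P(Z >= |z|)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.5f}, p = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

The script should agree with the values $z = 0.58072$ and $p = 0.561$ obtained above, so that $H_0$ is not rejected at the $\alpha = 0.03$ level.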
