[A, SfS] Chapter 3: Probability Distributions: 3.1: The Binomial Distribution
The Binomial Distribution
In this lesson, you will learn about modeling the probability of the number of successes out of $n$ independent repetitions of an experiment with two possible outcomes.
Many random variables can be modeled with a named distribution model. There are numerous such distribution models, for both discrete and continuous random variables. We will focus here on the models that are relevant for statistics, and refer to other models only for examples and exercises. A full probability course would cover many other models.
We begin with the most important model among discrete random variables: the Binomial Distribution.
Bernoulli Trial
Consider an experiment with exactly two possible outcomes (e.g., flipping a coin). We can arbitrarily label one of the outcomes “success” and the other “failure”. The probability of a success for this experiment is denoted $p$. Such an experiment is called a Bernoulli Trial.
Mean and Variance of Bernoulli Distribution
Let $X$ denote the outcome of a Bernoulli Trial, where $X = 1$ if the trial results in a success, and $X = 0$ if the trial results in a failure.
So, the probability mass function of $X$ is:

$$P(X = 1) = p, \qquad P(X = 0) = 1 - p.$$
The mean of the Bernoulli distribution is easy to compute:

$$\mu = E[X] = (1)(p) + (0)(1 - p) = p.$$
The variance is also easy to compute:

$$\sigma^2 = E[X^2] - (E[X])^2 = (1)^2 p + (0)^2 (1 - p) - p^2 = p(1 - p).$$
For example, suppose you are escaping from a castle, and you come to four doors. A wizard tells you that one of the doors leads to freedom, and the other three doors lead to doom.
You will select a door randomly, so the probability of success (i.e., freedom) is $p = 1/4$.
If we define $X$ to equal $1$ if you choose the door to freedom and to equal $0$ otherwise, then $X$ has a Bernoulli distribution with:

$$\mu = p = \frac{1}{4}, \qquad \sigma^2 = p(1 - p) = \frac{1}{4} \cdot \frac{3}{4} = \frac{3}{16}.$$
Binomial Distribution
Now suppose we have a situation in which we have $n$ independent repetitions of a Bernoulli trial (e.g., we are going to flip a coin 10 times).
Let $X$ denote the total number of successes observed among the $n$ trials. Note that $X$ can equal one of the values in the set $\{0, 1, 2, \ldots, n\}$.
We then say that $X$ has the Binomial distribution with parameters $n$ and $p$.
We can write this in symbols as $X \sim Bin(n, p)$ (if we have only one repetition, then $n = 1$ and we say that $X$ has the Bernoulli distribution).
Binomial Probability Mass Function
We want the probability mass function for $X$ in this situation, i.e., a general formula for $P(X = x)$.
Out of $n$ repetitions, there are $\binom{n}{x}$ ways to have $x$ successes (and thus $n - x$ failures).
For any particular combination of $x$ successes and $n - x$ failures, the probability of that combination is, by independence:

$$p^x (1 - p)^{n - x}.$$
Putting it all together, we have a formula for the pmf of $X$:

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad \text{for } x = 0, 1, 2, \ldots, n.$$
For example, if $X \sim Bin(3, 0.5)$, then:

$$P(X = 2) = \binom{3}{2} (0.5)^2 (0.5)^1 = \frac{3}{8} = 0.375.$$
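The pmf formula can be checked numerically. A minimal sketch in Python (the values are illustrative, not part of the course material):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for X ~ Bin(n, p), straight from the pmf formula
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Illustrative values: n = 3 trials with success probability p = 0.5
print(binom_pmf(2, 3, 0.5))  # 3 * 0.25 * 0.5 = 0.375
```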
Binomial Cumulative Distribution Function
The cumulative distribution function for the binomial distribution is a summation of the probability mass function from $0$ up to the largest integer in the set $\{0, 1, 2, \ldots, n\}$ that is smaller than or equal to $x$.
If $x < 0$ then every integer in that set is larger than $x$, so $F(x) = P(X \le x) = 0$.
If $x \ge n$ then $F(x) = 1$, since we would sum the pmf over all the possible values of $X$.
That is, if $0 \le x < n$ then:

$$F(x) = P(X \le x) = \sum_{k=0}^{x^*} \binom{n}{k} p^k (1 - p)^{n - k},$$

where $x^*$ is the largest integer in the set $\{0, 1, 2, \ldots, n\}$ such that $x^* \le x$.
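The three cases above can be sketched in one small function. A Python illustration (the parameter values are arbitrary):

```python
from math import comb, floor

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(x, n, p):
    # F(x) = P(X <= x): sum the pmf from 0 up to the largest
    # integer in {0, 1, ..., n} that is <= x
    if x < 0:
        return 0.0
    top = min(floor(x), n)
    return sum(binom_pmf(k, n, p) for k in range(top + 1))

# Illustrative check with n = 4, p = 0.5:
print(binom_cdf(2.7, 4, 0.5))  # same as P(X <= 2) = (1 + 4 + 6)/16 = 0.6875
```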
Now suppose $X \sim Bin(n, p)$. What are the mean and variance of $X$?
Since $X$ is the number of successes in $n$ independent Bernoulli Trials, we can write

$$X = X_1 + X_2 + \cdots + X_n,$$
where $X_1$ is the number of successes ($0$ or $1$) in the first trial, $X_2$ is the number of successes ($0$ or $1$) in the second trial, …, and $X_n$ is the number of successes ($0$ or $1$) in the $n$th trial.
Thus $X_i \sim Bernoulli(p)$ for each $i$, so $E[X_i] = p$. Then:

$$E[X] = E[X_1] + E[X_2] + \cdots + E[X_n] = np.$$
And because $X_1$ through $X_n$ are independent:

$$Var(X) = Var(X_1) + Var(X_2) + \cdots + Var(X_n) = np(1 - p).$$
Mean and Variance of the Binomial Probability Distribution
In summary, if $X \sim Bin(n, p)$ then:

$$\mu = np, \qquad \sigma^2 = np(1 - p).$$
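These two formulas can be verified by computing the mean and variance directly from the pmf. A Python sketch (the parameters $n = 8$, $p = 0.45$ are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.45  # illustrative parameters
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var = sum((k - mean)**2 * binom_pmf(k, n, p) for k in range(n + 1))

print(mean)  # ≈ n*p = 3.6
print(var)   # ≈ n*p*(1 - p) = 1.98
```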
A system consists of $n$ identical components, each of which works independently from the others. Each component has a probability of failure of $p$. If one or more components fail, then the whole system will shut down. What is the probability that the whole system will shut down?
Let $X$ denote the number of components out of $n$ that fail. Then $X \sim Bin(n, p)$. The whole system will shut down if $X \ge 1$. Thus we need:

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{n}{0} p^0 (1 - p)^n = 1 - (1 - p)^n.$$
For the previous question, what is the mean number of components out of $n$ in the system that fail?
Since $X \sim Bin(n, p)$:

$$\mu = np.$$
For the previous question, what is the variance in the number of components out of $n$ in the system that fail?
Since $X \sim Bin(n, p)$:

$$\sigma^2 = np(1 - p).$$
Consider this situation: you have a population consisting of $N$ elements. A proportion $p$ of those elements have some characteristic of interest. You will randomly select one element from the population and note whether the element has the characteristic of interest (a success) or not (a failure). Let $X = 0$ if the outcome is a failure and $X = 1$ if it is a success. The probability the outcome will be a success is the same as the proportion $p$ of successes in the population.
If a proportion $p$ of the elements in the population are green, then the probability that a randomly-selected element will be green is $p$. Thus we can conclude that $X \sim Bernoulli(p)$.
Now suppose you will randomly select $n$ elements from the population and count the number of successes $X$ out of the $n$ selected elements. Can we say that $X \sim Bin(n, p)$?
No. Why not? Because once the first element is selected, the proportion of successes remaining in the population will change (either increase or decrease). Once the second element is selected, the proportion will change again, and so on. So we cannot think of this as identical repetitions of the same Bernoulli experiment. It would be incorrect to model this using the binomial distribution (the hypergeometric distribution, which we will not discuss in this course, is the correct model to use).
However, if the population is very large relative to $n$, then the amount by which $p$ changes after each element is selected may be so small that we may ignore this issue.
Binomial Distribution and Sampling
If the population you are sampling from is relatively large compared to the sample size $n$, then the probability of a success $p$ may be assumed to be constant, and the number of successes $X$ out of the $n$ selected elements may be assumed to be binomially distributed with parameters $n$ and $p$.
If you randomly sample $n$ people from a much larger population of $N$ people, of which a proportion $p$ are carriers of a specific disease, then it is considered acceptable to model the number of people $X$ in your sample who are carriers of the disease as $X \sim Bin(n, p)$.
It is not a perfectly correct model, but it is accurate enough in practice, and it is indeed what we do in statistics quite regularly, as you will see.
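How good is this approximation? One way to see it is to compare the binomial pmf against the exact sampling-without-replacement probabilities (the hypergeometric distribution mentioned above). A Python sketch with hypothetical numbers:

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    # P(x successes) when drawing n elements *without* replacement
    # from a population of N elements that contains K successes
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    # P(x successes) under the binomial (with-replacement) model
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical numbers: population of 10,000 with 20% successes, sample of 10
N, K, n = 10_000, 2_000, 10
p = K / N
max_diff = max(abs(hypergeom_pmf(x, N, K, n) - binom_pmf(x, n, p))
               for x in range(n + 1))
print(max_diff)  # tiny: the binomial model is a very good approximation here
```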
Using R
Binomial Probability Mass Function
Suppose $X$ is modeled with a binomial distribution with parameters $n$ and $p$. In R, if you want to compute $P(X = x)$ for some value $x$, you can do so without using the formula given for the pmf of the binomial distribution. Instead, use:
> dbinom(x,n,p)
For example, if $X \sim Bin(8, 0.45)$, then $P(X = 5)$ can be found quickly using:
> dbinom(5,8,0.45)
Binomial Cumulative Distribution Function
If you want $P(X \le x)$ for $X \sim Bin(n, p)$, you can also do this with a single command in R:
> pbinom(x,n,p)
For example, if $X \sim Bin(8, 0.45)$, then $P(X \le 5)$ is found using:
> pbinom(5,8,0.45)
And $P(X \ge 2) = 1 - P(X \le 1)$ is found using:
> 1 - pbinom(1,8,0.45)
While $P(3 \le X \le 6) = P(X \le 6) - P(X \le 2)$ is found using:
> pbinom(6,8,0.45) - pbinom(2,8,0.45)
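These R probabilities can be cross-checked by summing the pmf directly. A sketch in Python (not part of the course's R workflow, purely a verification aid):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.45
p_eq_5 = binom_pmf(5, n, p)                              # like dbinom(5, 8, 0.45)
p_le_5 = sum(binom_pmf(k, n, p) for k in range(6))       # like pbinom(5, 8, 0.45)
p_ge_2 = 1 - sum(binom_pmf(k, n, p) for k in range(2))   # like 1 - pbinom(1, 8, 0.45)
p_3_to_6 = sum(binom_pmf(k, n, p) for k in range(3, 7))  # like pbinom(6,...) - pbinom(2,...)
print(p_eq_5, p_le_5, p_ge_2, p_3_to_6)
```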
Generating a Random Sample from a Binomial Distribution
Now suppose you want to generate a random sample of size $N$ from a binomial distribution with parameters $n$ and $p$. This can be performed in R using:
> rbinom(N,n,p)
For example, to generate a sample of $100$ values from a $Bin(12, 0.7)$ distribution, and save it as a vector named MySpace, use:
> MySpace = rbinom(100,12,0.7)
Or visit omptest.org if you are taking an OMPT exam.