Frequency Tables
In this lesson you will learn:
- How to summarize qualitative data.
- What absolute, relative, and cumulative frequencies are and how to calculate them.
Thus far, we have discussed a number of descriptive techniques that can be used to summarize the measurements on a quantitative variable. These measures cannot, however, be used to summarize the measurements on a qualitative variable. Because nominal and ordinal data are non-numeric, no arithmetic operations can be applied to such measurements. After all, we cannot subtract an apple from an orange, or divide a golden retriever by a German shepherd. As a result, measures such as the mean or median cannot be calculated when the variable being measured is qualitative in nature.
This does not mean, however, that there are no meaningful ways to summarize qualitative data. The primary method of summarizing measurements on a nominal or ordinal variable is to determine the frequency with which each value is observed in the data set (this also works for discrete variables that can only take on a small number of possible values).
There are a number of different types of frequencies that are important to know, but they all have their basis in the same fundamental concept: the absolute frequency.
Absolute Frequency
The number of times that a particular value occurs in the data set is called the absolute frequency.
Frequency Table
Frequency counts are generally organized in a frequency table, which lists all observed values and their associated frequencies.
Suppose #20# people are asked what their favorite season is and give the following answers:
\[\text{Summer, Spring, Summer, Winter, Summer, Summer, Winter, Spring, Fall, Summer}\\
\text{Spring, Summer, Winter, Summer, Spring, Summer, Fall, Summer, Spring, Summer}\]
Then the frequency table summarizing these results would look like this:
Season | Absolute frequency |
Winter | #3# |
Spring | #5# |
Summer | #10# |
Fall | #2# |
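As a minimal sketch, the counts in this table can be reproduced in R from the raw answers. The variable names `season` and `abs_freq` are chosen here for illustration only, and the use of `factor()` to fix the row order is one possible approach rather than a required step.

```r
# The 20 answers from the example, in the order given
season <- c("Summer", "Spring", "Summer", "Winter", "Summer",
            "Summer", "Winter", "Spring", "Fall", "Summer",
            "Spring", "Summer", "Winter", "Summer", "Spring",
            "Summer", "Fall", "Summer", "Spring", "Summer")

# Storing the answers as a factor fixes the order of the categories;
# without this step, table() lists them alphabetically
season <- factor(season, levels = c("Winter", "Spring", "Summer", "Fall"))

# Count how often each season occurs (the absolute frequencies)
abs_freq <- table(season)
abs_freq
```

The absolute frequencies should add up to the number of respondents, which can be checked with `sum(abs_freq)`.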
A problem with absolute frequencies, however, is that they can be somewhat misleading when comparing frequencies. To illustrate this point, consider the following example.
Suppose you are about to go into surgery and are given the option between two surgeons: surgeon #A# and surgeon #B#. You are told that, in the previous year, both surgeons successfully completed this particular surgery #20# times. If you were to base your decision on this information alone, you would likely be indifferent between the two surgeons. However, if you are also told that surgeon #A# performed the surgery a total of #20# times and surgeon #B# performed the surgery a total of #40# times, then surgeon #A# suddenly seems like the much better choice. This is because the relative success rate of surgeon #A# #(100\%)# is much higher than the relative success rate of surgeon #B# #(50\%)#.
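The arithmetic behind this comparison can be sketched in a few lines of R. The vector names `successes`, `surgeries`, and `success_rate` are illustrative assumptions, not part of the original example.

```r
# Number of successful surgeries and total surgeries per surgeon
successes <- c(A = 20, B = 20)
surgeries <- c(A = 20, B = 40)

# The absolute frequencies of success are identical ...
successes

# ... but the relative success rates are not
success_rate <- successes / surgeries
success_rate
```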
When the goal is to compare observed frequencies, it is common practice to first transform the absolute frequencies into relative frequencies.
Relative Frequency
The relative frequency of a value is the proportion of times it occurs in the data set.
It is calculated by dividing the absolute frequency of a value by the total number of elements in the data set.
Continuing the example above, the relative frequencies are:
Season | Absolute frequency | Relative frequency |
Winter | #3# | #3/20 = 0.15# |
Spring | #5# | #5/20 = 0.25# |
Summer | #10# | #10/20 = 0.50# |
Fall | #2# | #2/20 = 0.10# |
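The same calculation can be sketched in R by entering the absolute frequencies directly as a named vector. The names `abs_freq` and `rel_freq` are assumptions made for this illustration.

```r
# Absolute frequencies from the table, entered directly as counts
abs_freq <- c(Winter = 3, Spring = 5, Summer = 10, Fall = 2)

# Divide each count by the total number of observations
rel_freq <- abs_freq / sum(abs_freq)
rel_freq

# Relative frequencies always add up to 1
sum(rel_freq)
```

Multiplying `rel_freq` by 100 expresses the same information as percentages.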
If a variable is measured on an ordinal level, and we are able to order the observed values, it is also possible to calculate cumulative frequencies.
Cumulative Frequencies
The cumulative frequency of a value is the number of elements in the data set that have a value equal to or less than the value in question.
Just like absolute frequencies, cumulative frequencies can be transformed into cumulative relative frequencies, by dividing by the total number of elements in the data set.
The following frequency table summarizes the performance of #25# Olympic athletes:
Medal | Absolute frequency | Cumulative frequency | Cumulative relative frequency |
No medal | #16# | #16# | #16/25 = 0.64# |
Bronze medal | #5# | #16 + 5 = 21# | #21/25 = 0.84# |
Silver medal | #3# | #16 + 5 + 3 = 24# | #24/25 = 0.96# |
Gold medal | #1# | #16 + 5 + 3 + 1 = 25# | #25/25 = 1.00# |
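A minimal sketch of how the cumulative columns could be computed in R, assuming the absolute frequencies are entered in their natural order from lowest to highest category. The object names are illustrative; `cumsum()` is the base R function for running totals.

```r
# Absolute frequencies, ordered from the lowest to the highest category
abs_freq <- c("No medal" = 16, "Bronze medal" = 5,
              "Silver medal" = 3, "Gold medal" = 1)

# Running total of the counts gives the cumulative frequencies
cum_freq <- cumsum(abs_freq)

# Dividing by the total number of athletes gives the
# cumulative relative frequencies
cum_rel_freq <- cum_freq / sum(abs_freq)

# Combine the columns into a single table
data.frame(abs_freq, cum_freq, cum_rel_freq)
```

The last cumulative frequency always equals the total number of elements, so the last cumulative relative frequency is always 1.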
Using R
Frequency table
Suppose you have measurements of a categorical variable on a sample, stored as a vector in your #\mathrm{R}# workspace. Each measurement is the name of a category, or perhaps a number that is the chosen code for that category.
You can make a frequency table for the variable using the #\mathtt{table()}# function. Suppose the variable is named #\mathtt{Treatment}#. Then use:
> table(Treatment)
This gives you the frequencies of the categories in the sample.
Relative frequency table
If you want the relative frequencies instead, divide the table by the sample size (which is the same as the length of the data vector):
> table(Treatment)/length(Treatment)
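To try this out, here is a small sketch with made-up data. The contents of `Treatment` below are hypothetical and only serve to illustrate the commands. The base R function `prop.table()` is shown as an alternative that divides a table by its own total.

```r
# Hypothetical data: the treatment given to each of 8 participants
Treatment <- c("Placebo", "Drug", "Drug", "Placebo",
               "Drug", "Placebo", "Drug", "Drug")

# Absolute frequencies
freq <- table(Treatment)
freq

# Relative frequencies, computed as described above
freq / length(Treatment)

# Alternative: prop.table() divides the table by its own total
prop.table(freq)
```

The two approaches agree when the vector has no missing values. By default `table()` leaves out `NA` entries while `length()` counts them, so the results can differ if the data contain missing values.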
(Relative) frequency for a single category
If you are only interested in the (relative) frequency for one category of a categorical variable, you can obtain the frequency by counting the number of occurrences of that category using the #\mathtt{sum()}# function with a condition (using #==#, #!=#, #<#, #<=#, #># or #>=#):
> sum(Treatment == "Placebo")
Then the relative frequency (proportion) of sample elements in that category is this frequency divided by the sample size:
> sum(Treatment == "Placebo")/length(Treatment)
One can also use this approach to find the proportion of a sample having a value or range of values for a quantitative variable:
> sum(Age >= 21)/length(Age)
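A short sketch with hypothetical ages shows why this works: the condition produces a vector of `TRUE` and `FALSE` values, and `sum()` counts each `TRUE` as 1. The values in `Age` are made up for illustration. Because the mean of such a vector is the proportion of `TRUE` values, `mean()` gives the same result in one step.

```r
# Hypothetical data: the ages of 8 sample elements
Age <- c(19, 23, 21, 18, 30, 20, 25, 22)

# The condition yields one TRUE/FALSE value per element
Age >= 21

# Number of elements aged 21 or older
sum(Age >= 21)

# Proportion of elements aged 21 or older
prop_sum <- sum(Age >= 21) / length(Age)
prop_sum

# Equivalent shortcut: the mean of a logical vector
prop_mean <- mean(Age >= 21)
prop_mean
```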