Frequency Tables


In this lesson you will learn:

How to summarize qualitative data.
What absolute, relative, and cumulative frequencies are and how to calculate them.


Thus far, we have discussed a number of descriptive techniques that can be used to summarize the measurements on a quantitative variable. These measures, however, cannot be used to summarize the measurements on a qualitative variable. Because nominal and ordinal data are non-numeric, arithmetic operations such as addition or division cannot be meaningfully applied to them. After all, we cannot subtract an apple from an orange, or divide a Golden Retriever by a German Shepherd. As a result, measures such as the mean or median cannot be calculated when the variable being measured is qualitative in nature.

This does not mean, however, that there are no meaningful ways to summarize qualitative data. The primary method of summarizing measurements on a nominal or ordinal variable is to determine the frequency with which each value is observed in the data set (this also works for discrete variables that can only take on a small number of possible values).

There are a number of different types of frequencies that are important to know, but they all have their basis in the same fundamental concept: the absolute frequency.

Absolute Frequency

The number of times that a particular value occurs in the data set is called the absolute frequency.

Frequency Table

Frequency counts are generally organized in a frequency table, which lists all observed values together with their associated frequencies.

Suppose #20# people are asked what their favorite season is and give the following answers:

\[\text{Summer, Spring, Summer, Winter, Summer, Summer, Winter, Spring, Fall, Summer}\\
\text{Spring, Summer, Winter, Summer, Spring, Summer, Fall, Summer, Spring, Summer}\]

Then the frequency table summarizing these results would look like this:

Season	Absolute frequency
Winter	#3#
Spring	#5#
Summer	#10#
Fall	#2#
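The tally above can be reproduced in #\mathrm{R}#. The following is a minimal sketch that assumes the #20# answers have been typed into a character vector; the names #\mathtt{season}# and #\mathtt{abs\_freq}# are chosen only for this illustration.

```r
# The 20 answers from the example, in the order given
# (the vector name `season` is an illustrative choice)
season <- c("Summer", "Spring", "Summer", "Winter", "Summer",
            "Summer", "Winter", "Spring", "Fall", "Summer",
            "Spring", "Summer", "Winter", "Summer", "Spring",
            "Summer", "Fall", "Summer", "Spring", "Summer")

# Fix the order of the categories so the output follows the table above;
# without this step, table() lists the categories alphabetically
season <- factor(season, levels = c("Winter", "Spring", "Summer", "Fall"))

# Absolute frequency of each season
abs_freq <- table(season)
print(abs_freq)
```

The counts should add up to the number of respondents, which is a quick way to check that no answer was lost or mistyped.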

A problem with absolute frequencies, however, is that they can be somewhat misleading when comparing frequencies. To illustrate this point, consider the following example.

Suppose you are about to go into surgery and are given the option between two surgeons: surgeon #A# and surgeon #B#. You are told that, in the previous year, both surgeons successfully completed this particular surgery #20# times. If you were to base your decision on this information alone, you would likely be indifferent between the two surgeons. However, if you are also told that surgeon #A# performed the surgery a total of #20# times and surgeon #B# performed the surgery a total of #40# times, then surgeon #A# suddenly seems like the much better choice. This is because the relative success rate of surgeon #A# #(100\%)# is much higher than the relative success rate of surgeon #B# #(50\%)#.
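The arithmetic behind this comparison is a single division per surgeon. As a small sketch in #\mathrm{R}#, with vector names that are assumptions made for this illustration:

```r
# Successful surgeries and total surgeries per surgeon, as stated in the example
successes <- c(A = 20, B = 20)
totals    <- c(A = 20, B = 40)

# Relative success rate = successes divided by total attempts
success_rate <- successes / totals
print(success_rate)
```

The absolute counts in #\mathtt{successes}# are identical, so the difference between the two surgeons only becomes visible after dividing by the totals.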


When the goal is to compare observed frequencies, it is common practice to first transform the absolute frequencies into relative frequencies.

Relative Frequency

The relative frequency of a value is the proportion of times it occurs in the data set.

It is calculated by dividing the absolute frequency of a value by the total number of elements in the data set.

Continuing the example above, the relative frequencies are:

Season	Absolute frequency	Relative frequency
Winter	#3#	#3/20 = 0.15#
Spring	#5#	#5/20 = 0.25#
Summer	#10#	#10/20 = 0.50#
Fall	#2#	#2/20 = 0.10#
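The same division can be carried out for all categories at once. The sketch below starts from the absolute frequencies in the table; the vector names are illustrative assumptions.

```r
# Absolute frequencies taken from the table above
abs_freq <- c(Winter = 3, Spring = 5, Summer = 10, Fall = 2)

# Relative frequency = absolute frequency / total number of observations
rel_freq <- abs_freq / sum(abs_freq)
print(rel_freq)

# Relative frequencies always sum to 1
print(sum(rel_freq))
```

Because every observation falls into exactly one category, the relative frequencies must sum to #1#, which serves as a check on the calculation.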


If a variable is measured at the ordinal level, so that the observed values can be placed in a meaningful order, it is also possible to calculate cumulative frequencies.

Cumulative Frequency

The cumulative frequency of a value is the number of elements in the data set that have a value equal to or less than the value in question.

Just like absolute frequencies, cumulative frequencies can be transformed into cumulative relative frequencies, by dividing by the total number of elements in the data set.

The following frequency table summarizes the performance of #25# Olympic athletes:

Medal	Absolute frequency	Cumulative frequency	Cumulative relative frequency
No medal	#16#	#16#	#16/25 = 0.64#
Bronze medal	#5#	#16 + 5 = 21#	#21/25 = 0.84#
Silver medal	#3#	#16 + 5 + 3 = 24#	#24/25 = 0.96#
Gold medal	#1#	#16 + 5 + 3 + 1 = 25#	#25/25 = 1.00#
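The running totals in this table can be computed in #\mathrm{R}# with the base function #\mathtt{cumsum()}#. This is a minimal sketch that assumes the absolute frequencies are entered by hand in the order from lowest to highest category; the vector names are illustrative.

```r
# Absolute frequencies from the table, ordered from lowest to highest category
abs_freq <- c("No medal" = 16, "Bronze medal" = 5,
              "Silver medal" = 3, "Gold medal" = 1)

# Cumulative frequency: running total of the absolute frequencies
cum_freq <- cumsum(abs_freq)

# Cumulative relative frequency: running total divided by the sample size
cum_rel_freq <- cum_freq / sum(abs_freq)

print(data.frame(abs_freq, cum_freq, cum_rel_freq))
```

The order of the entries matters here: #\mathtt{cumsum()}# simply adds the values in the order they appear, so the categories must already be sorted according to the ordinal scale.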


Using R

Frequency table

Suppose you have measurements of a categorical variable on a sample, stored as a vector in your #\mathrm{R}# workspace. Each measurement is the name of a category, or perhaps a number that is the chosen code for that category.

You can make a frequency table for the variable using the #\mathtt{table()}# function. Suppose the variable is named #\mathtt{Treatment}#. Then use:

> table(Treatment)

This gives you the frequencies of the categories in the sample.

Relative frequency table

If you want the relative frequencies instead, divide the table by the sample size (which is the same as the length of the data vector):

> table(Treatment)/length(Treatment)
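To see both commands at work, here is a sketch using a small made-up sample. The data values are hypothetical and serve only as an illustration. The base function #\mathtt{prop.table()}# is shown as an alternative way to obtain the same proportions.

```r
# Hypothetical sample of 8 treatment assignments (made-up data)
Treatment <- c("Placebo", "Drug", "Drug", "Placebo",
               "Drug", "Placebo", "Drug", "Drug")

# Absolute frequencies
freq <- table(Treatment)
print(freq)

# Relative frequencies: divide the table by the sample size
rel_freq <- freq / length(Treatment)
print(rel_freq)

# Alternative: prop.table() divides each count by the sum of the table
rel_freq_alt <- prop.table(freq)
print(rel_freq_alt)
```

The two approaches agree whenever the vector contains no missing values. If it does contain #\mathtt{NA}# entries, #\mathtt{table()}# leaves them out by default while #\mathtt{length()}# still counts them, so the two results can differ.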

(Relative) frequency for a single category

If you are only interested in the (relative) frequency of one category of a categorical variable, you can count the number of occurrences of that category by applying the #\mathtt{sum()}# function to a condition (built with #==#, #!=#, #<#, #<=#, #># or #>=#):

> sum(Treatment == "Placebo")

Then the relative frequency (proportion) of sample elements in that category is this frequency divided by the sample size:

> sum(Treatment == "Placebo")/length(Treatment)

One can also use this approach to find the proportion of a sample having a value or range of values for a quantitative variable:

> sum(Age >= 21)/length(Age)
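This works because a condition such as #\mathtt{Age >= 21}# produces a vector of #\mathtt{TRUE}# and #\mathtt{FALSE}# values, which #\mathrm{R}# treats as #1# and #0# in arithmetic. The sketch below uses made-up ages, purely for illustration, and also shows that #\mathtt{mean()}# applied to the condition gives the same proportion.

```r
# Hypothetical ages of 8 sample elements (made-up data)
Age <- c(19, 23, 35, 20, 21, 42, 18, 27)

# The condition yields one TRUE/FALSE value per element
is_adult <- Age >= 21
print(is_adult)

# Counting the TRUE values and dividing by the sample size
prop_sum <- sum(is_adult) / length(Age)

# Equivalent shortcut: the mean of a logical vector is the proportion of TRUE
prop_mean <- mean(is_adult)

print(prop_sum)
print(prop_mean)
```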