Measurement Levels

In this lesson you will learn:

The difference between categorical variables (nominal, ordinal) and quantitative variables (discrete, continuous).

Earlier, we defined a variable as a feature that can be measured on the elements of a sample. Variables are classified by measurement level, and the measurement level determines which statistical tools you can use.

Variables can be divided into two groups:

Categorical variables (possible values are categories or groups)
Quantitative variables (possible values are numerical)

Among the categorical variables, there are two different measurement levels:

Nominal
Ordinal

Nominal Variables

Nominal variables are based on a classification of elements into specific categories without any logical ranking among the categories.

The variable “shirt color” might have the categories red, blue and green. There is no inherent ranking in a list of different colors.

There is no logical reason why “red, blue, green” would be preferred over “blue, green, red” or “green, red, blue”.

Ordinal Variables

Ordinal variables are based on a classification of elements into categories which do have a logical ranking.

The variable “weather alarm” might have the categories none, yellow, orange and red.

There is a logical ranking of these different categories, from least severe to most severe.
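As a minimal sketch of how such a ranking could be recorded in $\mathrm{R}$, the example below stores some made-up alarm observations as an ordered factor. The data and the object names (alarm, alarm_levels, alarm_factor) are illustrative assumptions, not part of the lesson.

```r
# Hypothetical weather alarms observed on seven days (illustrative data)
alarm <- c("yellow", "none", "red", "orange", "none", "yellow", "none")

# An ordered factor records the ranking from least severe to most severe
alarm_levels <- c("none", "yellow", "orange", "red")
alarm_factor <- factor(alarm, levels = alarm_levels, ordered = TRUE)

is.ordered(alarm_factor)   # TRUE: the levels carry a ranking
alarm_factor >= "orange"   # comparisons follow the ranking, not the alphabet
max(alarm_factor)          # the most severe alarm in the sample
```

For a nominal variable such as shirt color, the same call without ordered = TRUE would give an unordered factor, for which comparisons like >= are not meaningful.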

For quantitative variables we also have two different levels of measurement:

Discrete
Continuous

Discrete Variables

Discrete variables are measured on a numerical scale, and the set of all possible values of such a variable is countable: it can be placed in one-to-one correspondence with a subset of the natural numbers $\{1,2,3,\ldots\}$.

This includes:

Quantitative variables for which the set of possible values is finite.
Quantitative variables for which the set of possible values is infinite, but countable.

In this course, discrete variables will usually either be binary variables (possible values $0$ or $1$) or variables whose possible values are in a finite set of non-negative integers (such as $\{0,1,2,3,4,5\}$).

An example of a finite set would be the set of scores you could earn on your exam.

An example of an infinite, but countable, set would be the possible values of the number of times you must take your exam until you pass it.

Continuous Variables

Continuous variables are quantitative variables whose range of possible values is not only infinite, but also uncountable, such as the set of all real numbers in some interval.

The time for a rat to complete the task in a maze is continuous.

If we set an upper bound of $5$ minutes for the task, then the range of possible values (in minutes) is $(0,5]$ , or in seconds is $(0,300]$ .

In practice, we measure continuous variables such as time, length, weight, velocity, temperature, etc., using instruments which can only provide measurements up to a fixed number of decimal places. This makes the range of possible values finite. But we still consider the variable to be continuous, because in principle one could eventually develop an instrument that can output more decimal places and thereby expand the range of possible values.
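The effect of instrument precision can be sketched in $\mathrm{R}$ for the maze example with its upper bound of $5$ minutes. The helper possible_times() is a hypothetical function written for this illustration; it lists every value in $(0,5]$ that an instrument reporting a given number of decimal places could output.

```r
# All values in (0, 5] minutes that an instrument can report
# when it outputs a fixed number of decimal places
possible_times <- function(decimals) {
  scale <- 10^decimals
  seq_len(5 * scale) / scale
}

head(possible_times(1))      # 0.1 0.2 0.3 0.4 0.5 0.6
length(possible_times(1))    # 50 possible values
length(possible_times(2))    # 500 possible values
length(possible_times(3))    # 5000 possible values
```

Each extra decimal place multiplies the number of possible values by ten, yet the set stays finite for any fixed precision.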

Furthermore, if a discrete variable has a very large range of possible values, such as the annual salary of an employee, statisticians often find it convenient to treat it as if it were continuous.

A marathon runner finishes $x$ minutes behind the first-place finisher. The variable $x$ is measured on a(n) ... level.

Nominal

Ordinal

Discrete

Continuous

When a quantitative variable is measured on a sample, the values will generally cluster around some central value, with individual differences among the elements of the sample causing the values to be dispersed around that center. There may also be some unusual values, far below or above the center, which do not fit the pattern observed in the rest of the sample. These are called outliers.

Outliers

An outlier is an exceptionally high or low value that does not conform to the pattern observed for the majority of the data.

A researcher needs to assess each outlier to determine whether it is possibly due to an error (and thus can be excluded from the data set) or could be a legitimate value (and thus should remain in the data set).

An outlier could be the result of an error in data entry. Someone could have typed an extra $0$ when recording the data, or a data-recording machine or software might malfunction.

But an outlier could also be a legitimate value. Suppose $100$ prison inmates are given a math exam. Ninety-nine of the inmates never finished high school, while one of them has a bachelor's degree in statistics. His math exam score is likely to be an outlier in that sample.
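A minimal sketch of how a data-entry outlier might show up in $\mathrm{R}$ is given below. The scores are made up, and the flagging rule used by boxplot.stats() (values more than $1.5$ times the interquartile range beyond the hinges) is a common convention assumed here for illustration; the lesson itself does not prescribe a rule.

```r
# Hypothetical exam scores; the last one was meant to be 70,
# but an extra 0 was typed during data entry
scores <- c(62, 75, 68, 71, 66, 73, 69, 700)

summary(scores)   # the maximum lies far from all other values
mean(scores)      # 148: pulled upward by the single extreme value
median(scores)    # 70: barely affected by it

# Values flagged by the 1.5 x IQR convention (an assumed rule, see above)
boxplot.stats(scores)$out   # 700
```

A flagged value is only a candidate for review. Whether it is an error or a legitimate observation still has to be judged by the researcher.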

Summary

The features measured on the sample elements in a study are called variables.

Variables have four possible measurement levels:

Categorical: Nominal, Ordinal
Quantitative: Discrete, Continuous

Using R

Classifying objects

Each object in your $\mathrm{R}$ workspace is of a certain type, its $\mathtt{"class"}$ .

We can use the $\mathtt{class()}$ function to find out its class. Measurements on a variable are stored in a vector. But $\mathtt{"vector"}$ itself is not a class in $\mathrm{R}$ . Its class depends on the information contained in the vector.

If the measurements are the names of categories for a categorical variable, the class of the vector will be $\mathtt{"character"}$ .

> MyVector = c("red","blue","green","red","blue","green")

> class(MyVector)

[1] "character"

But suppose you coded the categories with numbers, i.e., $\mathtt{blue=1, green=2, red=3}$ . Then:

> MyVector = c(3,1,1,2,3,1,2)

> class(MyVector)

[1] "numeric"

Reclassifying objects

Thus $\mathrm{R}$ will treat the vector as if it contained measurements on a quantitative variable, as it is now $\mathtt{"numeric"}$ . That will usually not matter, but if you want to force the vector to have the character class you could use the $\mathtt{as.character()}$ function:

> MyVector = as.character(MyVector)

> MyVector

[1] "3" "1" "1" "2" "3" "1" "2"

Now the numbers are treated as names, as shown by the quotation marks around each value.
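To illustrate what the class changes in practice, the sketch below applies the same summaries to the coded colors in both forms. The object names codes and labels are chosen for this illustration only.

```r
# Category codes: blue = 1, green = 2, red = 3
codes <- c(3, 1, 1, 2, 3, 1, 2)

# As "numeric", R will average the codes, although the result
# has no meaning for a categorical variable
mean(codes)

# As "character", the same request returns NA (with a warning)
labels <- as.character(codes)
suppressWarnings(mean(labels))

# Counting the observations per category is meaningful in either form
table(labels)
```

So the conversion mainly protects against accidentally computing numerical summaries on category codes.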

The basic classes $\mathtt{"character"}$ and $\mathtt{"numeric"}$ do not distinguish between nominal and ordinal variables, or between discrete and continuous variables. But categorical variables can also have the $\mathtt{"factor"}$ class (an ordered factor can additionally record the ranking of an ordinal variable), and quantitative variables can also have the $\mathtt{"integer"}$ class.
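The following sketch shows one way these two classes can arise. It reuses the color coding blue=1, green=2, red=3 from above; the object names are illustrative.

```r
# Category codes: blue = 1, green = 2, red = 3
codes <- c(3, 1, 1, 2, 3, 1, 2)

# Convert the codes to a factor with readable category labels
color <- factor(codes, levels = c(1, 2, 3),
                labels = c("blue", "green", "red"))
class(color)    # "factor"
table(color)    # blue 3, green 2, red 2

# Whole numbers typed without the L suffix are stored as "numeric"
class(c(0, 1, 2))          # "numeric"
class(c(0L, 1L, 2L))       # "integer"
class(as.integer(codes))   # "integer"
```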

Some $\mathrm{R}$ objects are contained inside other $\mathrm{R}$ objects. For example, a $\mathtt{"data.frame"}$ may contain five columns, of which two might be $\mathtt{"numeric"}$ vectors, one might be a $\mathtt{"character"}$ vector, another might be a $\mathtt{"factor"}$ and the last might be an $\mathtt{"integer"}$ vector.
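A small sketch of such a data frame is shown below. The column names and values are made up for illustration, and sapply() is used to apply $\mathtt{class()}$ to each column in turn.

```r
# A hypothetical data frame with five columns of different classes
study <- data.frame(
  height_cm = c(172.5, 165.0, 180.2),             # numeric
  weight_kg = c(68.4, 59.9, 82.1),                # numeric
  name      = c("Ann", "Bob", "Cas"),             # character
  alarm     = factor(c("none", "red", "yellow")), # factor
  attempts  = c(1L, 3L, 2L),                      # integer
  stringsAsFactors = FALSE  # keep text columns as character (the default since R 4.0.0)
)

class(study)           # "data.frame"
sapply(study, class)   # the class of each column
```

In versions of $\mathrm{R}$ before 4.0.0, omitting stringsAsFactors = FALSE would have converted the name column to a factor.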