A contingency table and a frequency table are two statistical tools that might seem similar, but they serve different purposes. A contingency table is used to display the relationship between two categorical variables.
Contingency tables are particularly useful for examining how two variables relate, for example in a study of exercise frequency and smoking status. The table allows researchers to see the data at a glance and identify patterns or associations between the categories.
In contrast, a frequency table is used to summarize the distribution of a single categorical variable. It shows the count or percentage of each category. For instance, a frequency table can be used to display the distribution of different age groups in a population.
Frequency tables are often used in data analysis to get a quick overview of the data and to spot patterns such as dominant or unusually rare categories.
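As a minimal sketch of a frequency table, the snippet below counts a single categorical variable and reports each category's count and percentage. The age groups and the sample values are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical sample: one categorical value (age group) per person.
age_groups = ["18-29", "30-49", "18-29", "50+", "30-49", "18-29", "30-49", "18-29"]

counts = Counter(age_groups)      # absolute frequency of each category
total = sum(counts.values())      # number of observations

# One row per category: its count and its share of the whole sample.
frequency_table = {
    group: {"count": n, "percent": 100 * n / total}
    for group, n in sorted(counts.items())
}

for group, row in frequency_table.items():
    print(f"{group:<6} {row['count']:>3} {row['percent']:>6.1f}%")
```

Only one variable is involved, which is what separates this from a contingency table.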
Understanding Crosstabs
A crosstab is a table used to display the frequencies of two variables. It's created by entering the values of the variables in a table, with the first variable plotted from left to right and the second variable from top to bottom.
Each cell in a crosstab contains the frequencies of the characteristic combinations. For example, in a crosstab showing the relationship between gender and education level, the cell for "female" and "without a degree" might contain the number 6.
The values of one variable are plotted in the rows and the values of the other variable are plotted in the columns. Typically, the independent variable is plotted in the columns and the dependent variable in the rows.
You can choose to display either the absolute or relative frequencies in a crosstab. Absolute frequencies show the exact number of times each combination occurs, while relative frequencies show the proportion of each combination.
Contingency table is another name for a crosstab. By displaying sample counts in relation to two variables, it makes probabilities, including conditional probabilities, easier to calculate.
Crosstab Structure
A crosstab is created by entering the values of two categorical variables into a table, with the first variable plotted from left to right and the second variable from top to bottom.
Each cell in a crosstab is filled with either the absolute or the relative frequency. This means you'll see actual counts or percentages in each box.
In a crosstab, the values of one variable are plotted in the rows and the values of the other variable are plotted in the columns. This setup helps you visualize the relationships between the variables.
The independent variable is usually plotted in the columns, and the dependent variable is plotted in the rows. This is a common convention to follow when creating a crosstab.
In each cell of a crosstab, the frequencies of the characteristic combinations are plotted. This helps you see how often certain characteristics or combinations occur.
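The structure described above can be sketched in a few lines of Python. The observations are hypothetical, chosen so that the "female" and "without degree" cell holds 6. Treating gender as the independent (column) variable is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical observations: one (gender, education) pair per respondent.
observations = (
    [("female", "with degree")] * 4
    + [("female", "without degree")] * 6
    + [("male", "with degree")] * 5
    + [("male", "without degree")] * 3
)

# Count how often each combination of the two characteristics occurs.
cell_counts = Counter(observations)

columns = ["female", "male"]                 # independent variable (assumed)
rows = ["with degree", "without degree"]     # dependent variable (assumed)

# crosstab[row][column] holds the absolute frequency of that combination.
crosstab = {r: {c: cell_counts[(c, r)] for c in columns} for r in rows}

# Print the table with column labels on top and row labels on the left.
print(f"{'':<16}" + "".join(f"{c:>8}" for c in columns))
for r in rows:
    print(f"{r:<16}" + "".join(f"{crosstab[r][c]:>8}" for c in columns))
```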
Absolute and Relative Frequencies for Crosstabs
In a crosstab, you can choose to display either absolute or relative frequencies in each cell. Absolute frequencies indicate how often a specific combination of values occurs.
Which of the two you output depends on the context and the story you want to tell with your data.
Absolute frequencies indicate how often the respective combination of two characteristic values occurs. This is useful when you want to know the exact count of each combination.
Relative frequencies, on the other hand, indicate how often the respective combination occurs in relation to all cases. They are often expressed as percentages, making it easier to compare different combinations.
By displaying relative frequencies, you can see the proportion of each combination in relation to the total number of cases. This can be particularly helpful when you have a large dataset.
Relative Frequencies
Relative frequencies are a way to output crosstab data. They show how often each combination of characteristic values occurs in relation to the total number of observations.
Relative frequencies are useful for comparing the frequency of different combinations across the entire dataset. This can help identify patterns and trends that might not be apparent from looking at absolute frequencies alone.
In a crosstab, relative frequencies are calculated by dividing the absolute frequency of each combination by the total number of observations.
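The calculation can be sketched as follows, assuming a small crosstab of absolute frequencies. The counts are hypothetical.

```python
# Hypothetical crosstab of absolute frequencies (rows x columns).
absolute = {
    "with degree":    {"female": 4, "male": 5},
    "without degree": {"female": 6, "male": 5},
}

# Total number of observations across all cells.
total = sum(sum(row.values()) for row in absolute.values())

# Relative frequency of a cell = its absolute frequency / total observations.
relative = {
    r: {c: n / total for c, n in row.items()}
    for r, row in absolute.items()
}

for r, row in relative.items():
    print(r, {c: f"{share:.0%}" for c, share in row.items()})
```

Because every cell is divided by the same total, the relative frequencies of all cells add up to 1 (100%).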
Example and Explanation
A contingency table is a powerful tool for analyzing data, and it's often used in conjunction with a frequency table. But what's the difference between the two? Let's take a look at some examples to understand the concept better.
A contingency table is a table that displays the joint frequencies of two variables, such as gender and whether or not someone uses an umbrella, as seen in Example 1. This table shows the absolute frequencies of the respective characteristic combinations, which are obtained by counting the number of people in each category.
In Example 2, we have a contingency table that shows the frequency of speeding violations and cell phone use. This table is used to calculate various probabilities, such as the probability of a person being a car phone user.
To create a contingency table, you need to have two variables, and the table will display the frequency of each combination of these variables. For instance, in Example 1, the contingency table shows the frequency of people who are female and use an umbrella, as well as the frequency of people who are male and use an umbrella.
The contingency table can be used to calculate various probabilities, such as the probability of a person being a car phone user, as seen in Example 2. This is done by dividing the frequency of the desired group by the total number of people in the sample.
For example, the probability of a person being a car phone user is calculated by dividing the frequency of car phone users (305) by the total number of people in the sample (755).
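As a quick check of this arithmetic, the sketch below uses the two figures quoted in the text. The variable names are illustrative only.

```python
from fractions import Fraction

car_phone_users = 305   # number of car phone users, as quoted in the text
sample_size = 755       # total number of people, as quoted in the text

# P(car phone user) = number of car phone users / total number of people.
p_user = Fraction(car_phone_users, sample_size)

print(p_user)                    # exact value as a reduced fraction
print(f"{float(p_user):.3f}")    # decimal approximation
```

Using `Fraction` keeps the result exact, which makes it easier to compare probabilities later without rounding issues.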
The contingency table can also be used to determine the probability of an AND event, such as the probability of a person stretching before exercising and having no injury in the last year, as seen in Example 3.
In this case, the contingency table shows the frequency of people who stretch before exercising and have no injury in the last year, as well as the frequency of people who do not stretch before exercising and have no injury in the last year.
To calculate the probability of an AND event, you need to divide the frequency of the desired combination by the total number of people in the sample. For instance, the probability of a person stretching before exercising and having no injury in the last year is calculated by dividing the frequency of people who stretch before exercising and have no injury in the last year (295) by the total number of people in the sample (800).
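The same division can be sketched in Python with the two figures quoted above. Only the cell count and the sample size are needed for an AND event.

```python
from fractions import Fraction

stretch_and_no_injury = 295   # cell count quoted in the text
sample_size = 800             # total number of people quoted in the text

# P(A AND B) = count in the cell where both events hold / total observations.
p_and = Fraction(stretch_and_no_injury, sample_size)

print(p_and)
print(f"{float(p_and):.3f}")
```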
The contingency table can also be used to determine the probability of an OR event, such as the probability of a person being a car phone user or having no speeding violation in the last year, as seen in Example 2.
In this case, the contingency table shows the frequency of people who are car phone users and have no speeding violation in the last year, as well as the frequency of people who are not car phone users and have no speeding violation in the last year.
To calculate the probability of an OR event, you add the frequencies of the two events, subtract the frequency of the cases where both occur (otherwise they are counted twice), and divide by the total number of people in the sample. For instance, the probability of a person being a car phone user or having no speeding violation in the last year is calculated by adding the frequencies of car phone users (305) and people who have no speeding violation in the last year (685), subtracting the number of car phone users who have no speeding violation, and dividing by the total number of people in the sample (755).
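A probability can never exceed 1, yet 305 + 685 is already larger than 755. The reason is that people who are both car phone users and have no violation sit in both groups, so the overlap has to be subtracted once. The sketch below assumes an overlap of 280; that figure is not stated in the text above and is used here only as an assumption.

```python
from fractions import Fraction

users = 305          # car phone users (from the text)
no_violation = 685   # people with no speeding violation (from the text)
both = 280           # ASSUMPTION: car phone users with no violation (not in the text)
sample_size = 755    # total number of people (from the text)

# P(A OR B) = (count(A) + count(B) - count(A AND B)) / total
p_or = Fraction(users + no_violation - both, sample_size)

print(p_or)
print(f"{float(p_or):.3f}")
```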
The contingency table can also be used to determine a conditional probability, such as the probability that a person prefers the hilly path given that they are male, as seen in Example 4.
In this case, the contingency table shows the frequency of people who prefer a hilly path and are male, as well as the frequency of people who prefer a hilly path and are female.
To calculate a conditional probability, you divide the frequency of the desired combination by the total for the group you are conditioning on, not by the whole sample. For instance, the probability of preferring a hilly path given that the person is male is calculated by dividing the frequency of people who prefer a hilly path and are male (52) by the total number of males in the sample (55).
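Dividing by the number of males rather than by the whole sample makes this a conditional probability: the denominator is the size of the group that is already known. A minimal sketch, using the two figures quoted above and a hypothetical helper name:

```python
from fractions import Fraction

def conditional_probability(joint_count, condition_count):
    """Return P(A | B) = count(A AND B) / count(B) as an exact fraction."""
    if condition_count <= 0:
        raise ValueError("the conditioning group must contain at least one case")
    return Fraction(joint_count, condition_count)

# Figures as quoted in the text: males who prefer the hilly path, and all males.
p_hilly_given_male = conditional_probability(52, 55)

print(p_hilly_given_male)
print(f"{float(p_hilly_given_male):.3f}")
```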
The contingency table can also be used to determine whether two events are independent, such as the events "being male" and "preferring the hilly path", as seen in Example 4.
To determine whether two events are independent, you need to calculate the probability of one event given the other event, and then compare it to the probability of the first event. If the two probabilities are equal, then the events are independent. Otherwise, the events are not independent.
For instance, in Example 4, the probability of a person preferring a hilly path given that they are male is calculated by dividing the frequency of people who prefer a hilly path and are male (52) by the total number of males in the sample (55). This conditional probability is then compared with the overall probability of preferring the hilly path.
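The comparison of a conditional probability with the overall probability can be sketched as a small function. Both tables below are hypothetical and are not the data from Example 4. The function name and table layout are assumptions for illustration.

```python
from fractions import Fraction

def are_independent(table, row, col):
    """Return True when P(col | row) equals P(col) for a table of counts."""
    total = sum(sum(cells.values()) for cells in table.values())
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())

    p_col_given_row = Fraction(table[row][col], row_total)   # P(col | row)
    p_col = Fraction(col_total, total)                       # P(col)
    return p_col_given_row == p_col

# Hypothetical counts where path preference is the same for both groups.
proportional = {
    "male":   {"hilly": 20, "other": 30},
    "female": {"hilly": 40, "other": 60},
}
# Hypothetical counts where preference clearly differs by group.
skewed = {
    "male":   {"hilly": 40, "other": 10},
    "female": {"hilly": 10, "other": 40},
}

print(are_independent(proportional, "male", "hilly"))
print(are_independent(skewed, "male", "hilly"))
```

In real samples the two probabilities are rarely exactly equal even when the variables are unrelated, so this exact check is best read as the textbook definition rather than a statistical test.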
Tables
A contingency table is a powerful tool for analyzing data and visualizing relationships between variables. It's essentially a table that displays the frequency of observations for different combinations of variables.
The table helps in determining conditional probabilities quite easily, as seen in Example 2. A contingency table provides a way of portraying data that can facilitate calculating probabilities.
To create a contingency table, you need to have a dataset with two or more variables. For instance, in Example 1, the student counted how many people "with" and how many "without" umbrellas came to the statistics lecture, and also made a note of the sex of the students.
The cross-classified table now contains the absolute frequencies of the respective characteristic combinations. This is calculated by counting the number of observations in each category. For example, in Example 1, the table shows that there were 5 female students with umbrellas and 7 female students without umbrellas.
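Once the cells are counted, row and column totals follow by simple addition. In the sketch below, the counts for female students come from the text, while the counts for male students are hypothetical placeholders.

```python
# Female counts are taken from the text; male counts are HYPOTHETICAL.
table = {
    "female": {"with umbrella": 5, "without umbrella": 7},
    "male":   {"with umbrella": 4, "without umbrella": 9},
}

# Row totals: number of students of each sex.
row_totals = {sex: sum(cells.values()) for sex, cells in table.items()}

# Column totals: number of students with and without an umbrella.
column_totals = {}
for cells in table.values():
    for category, n in cells.items():
        column_totals[category] = column_totals.get(category, 0) + n

# Grand total: every student counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

These totals are the denominators used when probabilities are read off the table.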
A contingency table can also be used to calculate probabilities. For example, in Example 3, the table shows the number of athletes who stretch before exercising and how many had injuries within the past year. To find the probability of an athlete stretching before exercising, you can simply divide the number of athletes who stretch by the total number of athletes.
Here are some key characteristics of contingency tables:
- They display the frequency of observations for different combinations of variables.
- They can be used to calculate conditional probabilities.
- They can be used to identify relationships between variables.
- They are often used in conjunction with other statistical techniques, such as chi-squared tests.
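As a rough sketch of how a chi-squared test starts from a contingency table, the function below computes the Pearson chi-squared statistic from observed counts. It assumes every row and column total is positive and leaves out the p-value step. The function name and the sample tables are hypothetical.

```python
def chi_squared_statistic(observed):
    """Pearson chi-squared statistic for a table given as a list of rows.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic

# Hypothetical tables: identical proportions in both rows vs. a strong association.
print(chi_squared_statistic([[10, 20], [30, 60]]))
print(chi_squared_statistic([[10, 0], [0, 10]]))
```

A statistic of zero means the observed counts match what independence would predict, and larger values indicate a stronger departure from independence.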
Tables can be used to display data in a clear and concise manner. For example, in Example 1, the table shows the number of students with and without umbrellas, as well as their sex. This makes it easy to see the relationships between the variables.
Frequently Asked Questions
What is the difference between summary table and contingency table?
The terms summary table and contingency table are often used interchangeably, but a contingency table specifically refers to a table that summarizes two or more classification variables. In essence, all contingency tables are summary tables, but not all summary tables are contingency tables.
Sources
- https://medium.com/@Mamdouh.Refaat/analysis-of-contingency-tables-43c100b4e9b8
- https://datatab.net/tutorial/cross-table
- https://courses.lumenlearning.com/introstats1/chapter/contingency-tables/
- https://openstax.org/books/introductory-business-statistics-2e/pages/3-4-contingency-tables-and-probability-trees
- https://www.statsref.com/HTML/contingency_tables.html