A contingency table is a simple yet powerful tool for analyzing relationships between two categorical variables. It's essentially a table that displays the frequency of each combination of the two variables.
The table is typically organized into rows and columns, with the rows representing one variable and the columns representing the other. The cell at the intersection of a row and a column shows the number of observations that fall into that particular combination of categories.
By examining the contingency table, you can quickly identify patterns and trends in the data, such as which categories are most likely to occur together or separately. This can be particularly useful in fields like medicine, social sciences, and marketing.
Sampling Schemes
A contingency table presents cross-classified data on two categorical variables. The data are arranged in rows and columns, and each cell gives the count of observations that fall into that combination of categories.
In a contingency table, the sampling scheme refers to the method used to collect the data. There are several types of sampling schemes, each with its own probability distribution.
The multinomial sampling scheme is one type of sampling scheme where the total sample size is fixed. In this case, the probability of observing a particular combination of categories is given by the multinomial distribution. For example, in a hypothetical crossover trial, a fixed sample of 100 patients received a low dose and then a high dose of a treatment, and the responses were recorded. The data were cross-classified to report the number of patients for whom both treatments succeeded, both failed, or exactly one succeeded.
The product multinomial sampling scheme is another type of sampling scheme where the marginal row sizes are fixed. In this case, the probability of observing a particular combination of categories is given by the product of the multinomial distributions for each row.
Here are the key differences between the multinomial and product multinomial sampling schemes:
- Multinomial: the total sample size \(n_{++}\) is fixed, and all cell counts follow a single multinomial distribution over the joint categories.
- Product multinomial: the marginal row totals are fixed, and the counts within each row follow their own multinomial distribution, independently across rows.
The product binomial sampling scheme is a special case of the product multinomial sampling scheme in which the marginal row sizes are fixed and each row has only two categories (binary data). In this case, the probability of observing a particular combination of counts is the product of binomial distributions, one per row.
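To make the factorization concrete, here is a minimal Python sketch of the product binomial scheme, using hypothetical row totals, counts, and success probabilities (none of these numbers come from the text): with row totals fixed, the probability of the whole table is simply the product of one binomial probability per row.

```python
from scipy.stats import binom

# Product binomial scheme: each row i has a fixed size n_i and its own
# success probability p_i; rows are sampled independently.
row_totals = [60, 40]        # fixed by design (hypothetical)
successes  = [45, 18]        # observed counts in the "success" column (hypothetical)
p          = [0.75, 0.45]    # hypothetical per-row success probabilities

# Probability of the whole table = product of one binomial term per row.
likelihood = 1.0
for n_i, y_i, p_i in zip(row_totals, successes, p):
    likelihood *= binom.pmf(y_i, n_i, p_i)

print(likelihood)
```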
Multinomial Sampling Scheme
A multinomial sampling scheme is a scenario where the total sample size is fixed and each observation is classified independently into one of the cells. This is in contrast to a product multinomial sampling scheme, where the marginal row totals are fixed instead.
The multinomial distribution is characterized by a probability vector π, which represents the joint distribution of the two categorical variables. The probability of observing a particular combination of counts is given by the multinomial probability mass function.
Here are the key features of a multinomial sampling scheme:
- The total sample size \(n_{++}\) is fixed.
- Each observation is classified independently into one of the cells.
- The probability vector π represents the joint distribution of the two categorical variables.
- The multinomial probability mass function is used to calculate the probability of observing a particular combination of counts.
In practice, a multinomial sampling scheme arises in studies where the total sample size is fixed in advance. For example, in a crossover trial, patients receive a low dose and then a high dose of a treatment, and each patient's pair of responses is recorded. The cell counts then give the number of patients who respond in each way.
As an example of a multinomial sampling scheme in action, consider the crossover trial above: the total sample size is fixed at 100 patients, and the cell counts record how many patients fall into each response combination (both treatments succeed, both fail, or exactly one succeeds).
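The sketch below illustrates the scheme in Python: with the total sample size fixed at 100, the probability of a particular table of counts comes from the multinomial probability mass function. The cell probabilities and counts used here are assumptions for illustration, not values from the trial described above.

```python
from scipy.stats import multinomial

# Hypothetical crossover trial: 100 patients, four response categories
# (both doses succeed, only the low dose succeeds, only the high dose
# succeeds, both fail).
n_total = 100                         # total sample size, fixed by design
pi = [0.50, 0.15, 0.25, 0.10]         # hypothetical joint probabilities (sum to 1)
counts = [53, 14, 22, 11]             # hypothetical observed cell counts (sum to 100)

# Probability of observing exactly this table under the multinomial scheme.
print(multinomial.pmf(counts, n=n_total, p=pi))
```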
Which to Use?
When working with contingency tables, it's essential to choose the right statistical test for evaluating the association between the variables. The three primary methods for computing a P value are chi-square, Fisher's exact test, and chi-square with Yates' continuity correction.
Chi-square is the standard method, providing an approximate P value, and is best suited for large sample sizes. It's also known as the chi-square test of independence.
Fisher's exact test is used for small sample sizes, specifically when marginal totals are fixed. It's called an exact test, but it's only exact under specific conditions.
Yates' continuity correction can be used alongside Chi-square to make the approximation more conservative. However, it's not commonly used and its effect is negligible for large samples.
The choice between a one-tailed and two-tailed test depends on the research question. Two-tailed tests are more common for contingency tables.
Here's a summary of the three methods:
- Chi-square: approximate P value; best suited to large sample sizes.
- Fisher's exact test: exact P value (given fixed marginal totals); preferred for small sample sizes.
- Yates' continuity correction: an adjustment applied to the chi-square test to make the approximation more conservative; its effect is negligible for large samples.
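As a rough illustration of how these choices look in practice, here is a minimal Python sketch using SciPy on a 2x2 table; the counts are hypothetical and purely for demonstration.

```python
from scipy.stats import chi2_contingency, fisher_exact

table = [[12, 5],                 # hypothetical 2x2 contingency table
         [3, 14]]

# Chi-square test of independence. For 2x2 tables SciPy applies Yates'
# continuity correction by default; set correction=False to turn it off.
chi2, p_chi2, dof, expected = chi2_contingency(table, correction=True)

# Fisher's exact test (two-sided), typically preferred for small counts.
odds_ratio, p_fisher = fisher_exact(table, alternative="two-sided")

print(f"chi-square: X2={chi2:.3f}, df={dof}, p={p_chi2:.4f}")
print(f"Fisher's exact: p={p_fisher:.4f}")
```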
Probability Calculations
Probability calculations can be performed using contingency tables, which display the observed frequency of each combination of two variables. These frequencies can be turned into probabilities of outcomes.
The probability of the outcome \(X=i\) given that \(Y=j\) is \(\pi_{i|j} = \pi_{ij} / \pi_{+j}\). This conditional probability is useful for understanding the relationship between the variables.
Each joint probability is estimated from the table by the corresponding sample proportion: \(\pi_{ij} \approx n_{ij} / n_{++}\), the number of observations with \(X=i\) and \(Y=j\) divided by the total number of observations. This can be applied to each cell of the contingency table.
Here's a summary of probability-related quantities that can be read off example contingency tables (a small computational sketch follows this list):
- Probability that a person prefers their ice cream in a cup: 410/1002
- Probability that a random participant is female: 1000/2200
- Conditional probability that a person prefers ice cream sandwiches given that the person is male: 24/1200
- Conditional probability that a person is male given that ice cream sandwiches are preferred: 24/44
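As a sketch of how such quantities are computed, here is a small Python example. The sex totals (1200 males, 1000 females) and the ice-cream-sandwich counts (24 males, 20 females) are chosen to reproduce the fractions above; the cup and cone splits are hypothetical fill, not data from the text.

```python
import numpy as np

# Rows: male, female.  Columns: cup, cone, sandwich.
# Row totals (1200, 1000) and the sandwich column (24, 20) match the
# fractions quoted above; the cup/cone counts are hypothetical.
counts = np.array([[700, 476, 24],
                   [600, 380, 20]])
n = counts.sum()                            # 2200 participants

p_joint = counts / n                        # joint proportions p_ij
p_row   = counts.sum(axis=1) / n            # marginals: P(male), P(female)
p_col   = counts.sum(axis=0) / n            # marginals: P(cup), P(cone), P(sandwich)

print(p_row[1])                             # P(female)            = 1000/2200
print(counts[0, 2] / counts[0].sum())       # P(sandwich | male)   = 24/1200
print(counts[0, 2] / counts[:, 2].sum())    # P(male | sandwich)   = 24/44
```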
Theoretical Probabilities
Theoretical probabilities are a fundamental concept in probability calculations. They represent the likelihood of specific outcomes occurring.
To define theoretical probabilities, we use the notation \(\pi_{ij} = P(X=i, Y=j)\), which gives us the probability of getting outcome \((i,j)\).
Theoretical probabilities can be tabulated in a contingency table that displays the distribution \(\pi_{ij} = P(X=i, Y=j)\) of the responses \((X,Y)\).
We can also calculate marginal probabilities. The probability of the outcome \(X=i\), regardless of the value of \(Y\), is \(\pi_{i+} = \sum_j \pi_{ij}\).
The other marginal probability, \(\pi_{+j} = \sum_i \pi_{ij}\), gives the probability of the outcome \(Y=j\), regardless of the value of \(X\).
Note that the sum of all theoretical probabilities is equal to 1, denoted as \(\pi_{++} = \sum_{i,j} \pi_{ij} = 1\).
Here's a summary of the key notation:
- \(\pi_{ij} = P(X=i, Y=j)\): Theoretical probability of getting outcome \((i,j)\)
- \(\pi_{i+} = \sum_j \pi_{ij}\): Marginal probability of getting an outcome \(X=i\)
- \(\pi_{+j} = \sum_i \pi_{ij}\): Marginal probability of getting an outcome \(Y=j\)
- \(\pi_{++} = \sum_{i,j} \pi_{ij} = 1\): Sum of all theoretical probabilities
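For concreteness, here is a small worked example with hypothetical values of \(\pi_{ij}\) for a 2x2 table, showing how the marginals and the overall total are obtained:

\[
\begin{array}{c|cc|c}
         & Y=1  & Y=2  & \pi_{i+} \\ \hline
X=1      & 0.20 & 0.30 & 0.50 \\
X=2      & 0.10 & 0.40 & 0.50 \\ \hline
\pi_{+j} & 0.30 & 0.70 & \pi_{++}=1
\end{array}
\]

Each row margin \(\pi_{i+}\) is the sum across its row, each column margin \(\pi_{+j}\) is the sum down its column, and the cell probabilities sum to 1 overall.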
Conditional Probabilities
Conditional probabilities are a crucial aspect of probability calculations. They help us understand the likelihood of an event occurring given that another event has occurred.
Conditional probabilities can be calculated directly from a contingency table. For example, given a table cross-classifying people by sex and by ice cream preference (cup versus ice cream sandwich), we can calculate the probability that a person prefers ice cream sandwiches given that the person is male.
The formula is \(\pi_{i|j} = \pi_{ij} / \pi_{+j}\), where \(\pi_{i|j}\) is the probability of the outcome \(X=i\) given that \(Y=j\).
The table makes such calculations easy because it displays sample counts for two variables that may be dependent, or contingent, on one another.
Conditional probabilities can also be used to determine the probability that a person is male given that they prefer ice cream sandwiches.
The probability that a person is male given that ice cream sandwiches are preferred uses the other conditioning direction: \(\pi_{j|i} = \pi_{ij} / \pi_{i+}\), where \(\pi_{j|i}\) is the probability of the outcome \(Y=j\) given that \(X=i\).
A table of the conditional probabilities of \(Y\) given \(X\) is obtained by dividing each cell by its row total, so that every row sums to 1. Note that such a table shows the conditional distribution of \(Y\) given \(X\), not the other way around.
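Here is a minimal Python sketch of this row-normalization, reusing the hypothetical counts from the earlier ice cream example: dividing each row by its total yields the conditional distribution of \(Y\) (preference) given \(X\) (sex).

```python
import numpy as np

# Hypothetical counts: rows are male/female, columns are cup/cone/sandwich.
counts = np.array([[700, 476, 24],
                   [600, 380, 20]])

# Conditional distribution of Y (preference) given X (sex):
# divide each row by its row total, so every row sums to 1.
cond_y_given_x = counts / counts.sum(axis=1, keepdims=True)

print(cond_y_given_x)
print(cond_y_given_x.sum(axis=1))   # each row sums to 1
```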
Fisher's Calculation Details
Fisher's exact test is rarely calculated by hand, and even for a computer the calculation can be intensive.
Several methods exist for computing a two-sided Fisher P value; the one described here is the method of summing small P values: the probabilities of all tables with the same margins that are no more likely than the observed table are added up. This enumeration can be time-consuming for large tables, so in practice the calculation is left to software.
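As a hedged sketch of the summing-small-P-values idea, here is a short Python implementation for a 2x2 table with hypothetical counts, compared against SciPy's fisher_exact. It enumerates all tables with the same margins via the hypergeometric distribution and sums the probabilities that do not exceed the probability of the observed table.

```python
import numpy as np
from scipy.stats import hypergeom, fisher_exact

def fisher_two_sided(table, tol=1e-7):
    """Two-sided Fisher P value by summing small P values: add up the
    probabilities of every table with the same margins that is no more
    likely than the observed one."""
    (a, b), (c, d) = table
    n = a + b + c + d                 # grand total
    r1 = a + b                        # first row total (fixed margin)
    c1 = a + c                        # first column total (fixed margin)
    # All feasible values of the top-left cell given the fixed margins.
    support = np.arange(max(0, r1 - (n - c1)), min(r1, c1) + 1)
    probs = hypergeom.pmf(support, n, c1, r1)
    p_obs = hypergeom.pmf(a, n, c1, r1)
    return probs[probs <= p_obs * (1 + tol)].sum()

table = [[8, 2],                      # hypothetical 2x2 counts
         [1, 5]]
print(fisher_two_sided(table))        # hand-rolled sum of small P values
print(fisher_exact(table)[1])         # SciPy's two-sided P value for comparison
```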
Chi-Square Test
The Chi-Square Test is a powerful statistical tool used to determine whether there's a significant association between two categorical variables. It's based on the idea that if there's no association, the observed frequencies should be close to the expected frequencies.
The Chi-Square Test statistic is calculated as \(X^2 = \sum_{j=1}^{m} \frac{(O_j - E_j)^2}{E_j}\), where the \(O_j\) are the observed frequencies, the \(E_j\) are the expected frequencies under the null hypothesis, and \(m\) is the number of cells in the table.
In a Chi-Square Test of Independence, the test statistic \(X^2\) approximately follows a \(\chi^2_{(I-1)(J-1)}\) distribution under the null hypothesis, where \(I\) and \(J\) are the number of rows and columns in the contingency table.
The degrees of freedom are \((I-1)(J-1)\): start with the number of cells minus one, \(IJ - 1\), and subtract the \((I-1) + (J-1)\) marginal probabilities estimated under the null hypothesis, which leaves \((I-1)(J-1)\).
To determine whether to reject the null hypothesis, we compare the calculated \(X^2\) value to the critical value from the \(\chi^2\) distribution with \((I-1)(J-1)\) degrees of freedom. If the calculated \(X^2\) value is greater than or equal to the critical value, we reject the null hypothesis.
Here's a summary of the Chi-Square Test:
- Test statistic: \(X^2 = \sum_{j=1}^{m} (O_j - E_j)^2 / E_j\)
- Degrees of freedom: \((I-1)(J-1)\)
- Decision rule: reject the null hypothesis of independence when \(X^2\) is at least the critical value of the \(\chi^2_{(I-1)(J-1)}\) distribution.
This means that if the observed frequencies vary significantly from the expected frequencies, we can conclude that there's a significant association between the two variables.
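A minimal sketch, with hypothetical counts, of computing the statistic directly from the formula and then cross-checking against SciPy's built-in test:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical 2x3 table of observed counts.
O = np.array([[30, 20, 10],
              [20, 30, 40]])

# Expected counts under independence: (row total * column total) / grand total.
E = O.sum(axis=1, keepdims=True) @ O.sum(axis=0, keepdims=True) / O.sum()

X2 = ((O - E) ** 2 / E).sum()                 # chi-square statistic
df = (O.shape[0] - 1) * (O.shape[1] - 1)      # (I-1)(J-1) degrees of freedom
p_value = chi2.sf(X2, df)                     # upper-tail probability

print(X2, df, p_value)

stat, p, dof, expected = chi2_contingency(O)  # SciPy's test of independence
print(stat, p)
```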
Frequently Asked Questions
What is a contingency table example?
A contingency table is a grid, 2x2 or larger, that displays the relationship between two (or more) categorical variables, such as gender and computer type. For instance, a 2x2 table of gender by computer type becomes a 2x3 table when a column is added for a third type of computer.
What is the joint probability in a contingency table?
The joint probability in a contingency table represents the proportion of subjects that fall into a specific category of X and a specific category of Y. It's calculated by dividing the cell count by the total count in the table.