Working with 2x2 contingency tables in statistics is straightforward once you understand the basics. A 2x2 contingency table cross-classifies observations by two categorical variables, each with two levels.
The table is used to display the frequency of each possible combination of the two variables. For example, a study on the relationship between exercise and weight loss might use a 2x2 contingency table to show the number of participants who lost weight versus those who didn't, based on their exercise habits.
To create a 2x2 contingency table, you need to identify the two variables and their levels, and then count the frequency of each combination. This can be done using a simple table with rows and columns, where each cell represents a unique combination of the two variables.
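The counting step can be sketched in plain Python with `collections.Counter`. This is a minimal sketch: the records below are made-up stand-ins for an exercise/weight-loss dataset, and the function name `tabulate_2x2` is hypothetical.

```python
from collections import Counter

# Hypothetical raw records: (exercises, lost_weight) for each participant.
records = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "no"),
    ("no", "yes"), ("no", "no"), ("no", "no"), ("no", "no"),
]

def tabulate_2x2(pairs, row_levels, col_levels):
    """Count each (row level, column level) combination and return rows of counts."""
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

table = tabulate_2x2(records, row_levels=("yes", "no"), col_levels=("yes", "no"))
print(table)  # [[2, 1], [1, 3]]
```

Rows here are exercise status and columns are weight-loss outcome; swapping the two level tuples transposes the table without changing any of the tests that follow.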
Chi-Square Test
The Chi-Square Test is a statistical method used to determine whether there is a significant association between two categorical variables. It is commonly applied to 2x2 contingency tables, which display the joint frequencies of two such variables.
The test computes a chi-square statistic and a p-value for the null hypothesis that the two variables are independent. The expected frequency of each cell is computed from the marginal (row and column) totals under the assumption of independence, and the statistic measures how far the observed frequencies depart from these expected ones.
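In symbols, with notation introduced here for illustration (O_ij the observed count in row i and column j, R_i and C_j the row and column totals, N the grand total, and a, b / c, d the cells of the first / second row of a 2x2 table), the expected counts and the uncorrected Pearson statistic are:

```latex
\begin{aligned}
E_{ij} &= \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \\
\chi^2_{2 \times 2} &= \frac{N \, (ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
\end{aligned}
```

The second line is an algebraic shortcut for the 2x2 case and gives the same value as the double sum.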
In Python, SciPy's `chi2_contingency` function (`scipy.stats.chi2_contingency`) performs this test and returns the statistic, the p-value, the degrees of freedom, and the table of expected frequencies. For a 2x2 table it applies Yates' continuity correction by default; pass `correction=False` to obtain the uncorrected Pearson statistic.
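To make the computation concrete without depending on SciPy, here is a minimal standard-library sketch of the uncorrected test for a 2x2 table. The function name `chi_square_2x2` and the example counts are hypothetical, and it assumes every expected count is nonzero. With SciPy installed, `chi2_contingency(table, correction=False)` should report the same statistic; its default (Yates-corrected) result will be smaller.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no Yates correction).

    Returns (statistic, p_value, expected), assuming every expected count is nonzero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count for each cell under independence: row total * column total / N.
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    # Sum of (observed - expected)^2 / expected over the four cells.
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so its upper-tail probability is erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected

stat, p, expected = chi_square_2x2([[10, 20], [30, 40]])
print(expected)        # [[12.0, 18.0], [28.0, 42.0]]
print(round(stat, 4))  # 0.7937
```

The erfc shortcut only holds for one degree of freedom; larger tables need a general chi-square survival function such as the one SciPy provides.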
The degrees of freedom for the Chi-Square Test are (number of rows - 1) x (number of columns - 1), which equals 1 for a 2x2 contingency table.
The critical value for a chosen significance level can be read from a chi-square distribution table; with 1 degree of freedom and a significance level of 0.05 it is 3.841. If the calculated test statistic exceeds the critical value, you reject the null hypothesis of independence.
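For the 2x2 case the critical value can also be computed rather than looked up: a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the upper-alpha critical value is the square of the normal (1 - alpha/2) quantile. A minimal sketch using only `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for an r x c test of independence."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_critical_value_df1(alpha=0.05):
    """Upper-alpha critical value of chi-square with 1 df, using chi2(1) = Z**2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

def reject_independence(statistic, alpha=0.05):
    """For a 2x2 table: reject H0 when the statistic exceeds the critical value."""
    return statistic > chi2_critical_value_df1(alpha)

print(degrees_of_freedom(2, 2))                 # 1
print(round(chi2_critical_value_df1(0.05), 3))  # 3.841
print(reject_independence(0.79))                # False
```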
The Chi-Square Test is a powerful tool for determining if there's an association between two categorical variables, and it's widely used in various fields, including medicine, social sciences, and marketing.
Compute Expected Values in Cells A, B, C, D
To compute the expected values in cells A, B, C, and D, first find the overall proportions of individuals that lived and died. In the entire sample, the proportion of individuals that lived is 0.5470 and the proportion that died is 0.4530; under the null hypothesis of independence, these proportions should hold within every row.
To obtain an expected count, multiply the overall proportion of individuals that lived (or died) by the corresponding row total. For example, the expected number of individuals that lived and were HG+ is 0.5470 x 218 = 119.246.
Likewise, the expected number of surviving individuals that were HG- is 0.5470 x 144 = 78.768, and the expected number of individuals that died and were HG- is 0.4530 x 144 = 65.232.
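Applying the same rule to the remaining cell, the expected number of individuals that died and were HG+ is 0.4530 x 218 = 98.754. Rounded to two decimals, the expected counts are 119.25 (lived, HG+), 98.75 (died, HG+), 78.77 (lived, HG-), and 65.23 (died, HG-).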
These expected values are calculated under the null hypothesis, and they provide a baseline for comparing the observed values in the contingency table.
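The arithmetic can be reproduced in a few lines. This sketch simply hard-codes the overall proportions (0.5470, 0.4530) and row totals (218, 144) quoted in this section, so it inherits their rounding; with the full observed table you would instead use row total x column total / N for each cell.

```python
# Overall proportions and row totals quoted in the text.
p_lived, p_died = 0.5470, 0.4530
row_totals = {"HG+": 218, "HG-": 144}

# Expected count = overall column proportion x row total (null hypothesis of independence).
expected = {
    group: {"lived": p_lived * n, "died": p_died * n}
    for group, n in row_totals.items()
}

for group, cells in expected.items():
    print(group, {outcome: round(v, 2) for outcome, v in cells.items()})
# HG+ {'lived': 119.25, 'died': 98.75}
# HG- {'lived': 78.77, 'died': 65.23}
```

Because the two proportions sum to 1, each row's expected counts add back up to its row total, which is a quick sanity check on the computation.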
Measures of Association
Measures of association are crucial when analyzing 2x2 contingency tables. They help determine the strength and direction of the relationship between two variables.
The phi coefficient (φ) is a simple measure of association that applies only to 2x2 contingency tables. Its values range from -1 to 1: 0 indicates no association, 1 complete association, and -1 complete inverse association. For a table with cells a, b in the first row and c, d in the second, the sign of φ equals the sign of ad - bc, the product of the main-diagonal elements minus the product of the off-diagonal elements.
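The standard formula is φ = (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)), and φ² equals the uncorrected chi-square statistic divided by N. A minimal sketch follows; the function name and example counts are hypothetical.

```python
import math

def phi_coefficient(table):
    """Phi coefficient for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    # The numerator ad - bc fixes the sign: main diagonal minus off-diagonal.
    return (a * d - b * c) / denom

print(phi_coefficient([[5, 0], [0, 5]]))                # 1.0
print(phi_coefficient([[0, 5], [5, 0]]))                # -1.0
print(round(phi_coefficient([[10, 20], [30, 40]]), 4))  # -0.0891
```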
The lambda coefficient measures the strength of association of the cross tabulations when the variables are measured at the nominal level. Its values range from 0.0 (no association) to 1.0 (the maximum possible association).
Beyond φ, the degree of association between two variables in a contingency table can be assessed with other coefficients, such as Cramér's V and the contingency coefficient C.
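Cramér's V is a popular measure of association that can be used with contingency tables of any size. It is computed as V = sqrt(χ² / (N(k - 1))), where N is the total count and k is the smaller of the number of rows and the number of columns. Its values range from 0 to 1, with 1 indicating a perfect association; for a 2x2 table, V equals the absolute value of φ.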
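Cramér's V follows directly from the Pearson statistic. The sketch below computes it for a table of any shape given as rows of counts; the function name and example tables are hypothetical, and it assumes no row or column total is zero.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c table given as a list of rows of counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square with expected counts row_total * col_total / n.
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            chi2 += (table[i][j] - e) ** 2 / e
    # k is the smaller table dimension.
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))

print(round(cramers_v([[10, 20], [30, 40]]), 4))               # 0.0891
print(round(cramers_v([[5, 0, 0], [0, 5, 0], [0, 0, 5]]), 4))  # 1.0
```

On the 2x2 example it matches the magnitude of φ computed from the same counts.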
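The contingency coefficient C, computed as sqrt(χ² / (χ² + N)), is also 0 when there is no association, but it never reaches 1: in a 2x2 table its maximum is about 0.707. For a square table, you can rescale it by dividing C by the square root of (k-1)/k, where k is the number of rows (equal to the number of columns), so that complete association gives 1.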
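A short sketch shows where the 0.707 ceiling comes from: under complete association in a 2x2 table, χ² equals N, so C = sqrt(1/2). The function takes χ² and N as inputs; its name and the optional `k` argument are assumptions for illustration.

```python
import math

def contingency_coefficient(chi2, n, k=None):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)).

    If k (the number of rows = columns of a square table) is given, return C
    divided by sqrt((k - 1) / k) so that complete association maps to 1.
    """
    c = math.sqrt(chi2 / (chi2 + n))
    if k is None:
        return c
    return c / math.sqrt((k - 1) / k)

# Complete association in a 2x2 table with N = 10 gives chi2 = 10.
print(round(contingency_coefficient(10, 10), 3))       # 0.707
print(round(contingency_coefficient(10, 10, k=2), 3))  # 1.0
```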
The lambda coefficient mentioned above is suitable for nominal-level variables, and its values range from 0 to 1, with 1 indicating a perfect association. It comes in two forms: the asymmetric lambda measures the proportional reduction in error when predicting a designated dependent variable from the other variable, while the symmetric lambda pools the prediction errors from both directions.
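Goodman and Kruskal's lambda compares prediction errors with and without knowledge of the other variable: always guessing the overall modal category versus guessing the modal category within each row (or column). The sketch below uses the standard textbook formulas; the function names and the example table are hypothetical, and it assumes the predicted variable is not concentrated in a single category, where lambda is undefined.

```python
def _lambda_parts(table):
    """Totals and modal counts shared by the lambda variants."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    sum_row_max = sum(max(row) for row in table)        # best column guess within each row
    sum_col_max = sum(max(col) for col in zip(*table))  # best row guess within each column
    return n, max(row_totals), max(col_totals), sum_row_max, sum_col_max

def lambda_col_given_row(table):
    """Asymmetric lambda: reduction in error predicting the column variable from the row variable."""
    n, _, max_col, sum_row_max, _ = _lambda_parts(table)
    return (sum_row_max - max_col) / (n - max_col)

def lambda_row_given_col(table):
    """Asymmetric lambda: reduction in error predicting the row variable from the column variable."""
    n, max_row, _, _, sum_col_max = _lambda_parts(table)
    return (sum_col_max - max_row) / (n - max_row)

def lambda_symmetric(table):
    """Symmetric lambda: pools the errors from both prediction directions."""
    n, max_row, max_col, sum_row_max, sum_col_max = _lambda_parts(table)
    return (sum_row_max + sum_col_max - max_row - max_col) / (2 * n - max_row - max_col)

table = [[30, 10], [5, 15]]
print(lambda_col_given_row(table))        # 0.4
print(lambda_row_given_col(table))        # 0.25
print(round(lambda_symmetric(table), 4))  # 0.3333
```

Note that lambda can be 0 even when the variables are associated, whenever the modal column is the same in every row.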
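The uncertainty coefficient, or Theil's U, is another measure of association for nominal-level variables. It is the proportion of the uncertainty (entropy) in one variable that is explained by the other, so its values range from 0 (no association) to 1 (one variable completely determines the other). The coefficient is asymmetric, meaning U(X|Y) generally differs from U(Y|X), and this can reveal patterns that symmetric measures of association do not show.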
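Theil's U can be computed from entropies: U(column | row) = (H(column) - H(column | row)) / H(column). Because the numerator is the mutual information, which is never negative, the result lies between 0 and 1. A minimal sketch with natural logarithms; the function names and tables are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (natural log) of a sequence of counts; zero counts are skipped."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def theils_u(table):
    """Uncertainty coefficient U(column | row) for a table given as rows of counts.

    Share of the column variable's entropy removed by knowing the row variable.
    Undefined (division by zero) if the column variable has zero entropy.
    """
    n = sum(map(sum, table))
    h_col = entropy([sum(col) for col in zip(*table)])
    # H(column | row) is the average within-row entropy, weighted by row share.
    h_col_given_row = sum(sum(row) / n * entropy(row) for row in table)
    return (h_col - h_col_given_row) / h_col

t = [[30, 10], [5, 15]]
t_transposed = [list(col) for col in zip(*t)]
print(theils_u([[5, 0], [0, 5]]))  # 1.0
print(round(theils_u(t), 3), round(theils_u(t_transposed), 3))  # the two directions differ
```

Transposing the table swaps which variable is being predicted, which is how the asymmetry shows up in practice.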
For 2x2 contingency tables, the odds ratio is a simple measure of association. It's defined as the ratio of the odds of an event in the presence of another event to the odds of the same event in the absence of the other event; for a table with cells a, b in the first row and c, d in the second, it equals (a/b) / (c/d) = ad/bc. If the odds ratio is 1, the events are independent; if it's greater than 1, they're positively associated; and if it's less than 1, they're negatively associated.
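The odds ratio is a one-line computation. This sketch uses hypothetical counts and assumes the off-diagonal cells b and c are nonzero; tables with zero cells need special handling that is not shown here.

```python
def odds_ratio(table):
    """Odds ratio for a 2x2 table [[a, b], [c, d]]: (a / b) / (c / d) = ad / bc."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

print(odds_ratio([[30, 10], [5, 15]]))            # 9.0   positive association
print(odds_ratio([[10, 20], [20, 40]]))           # 1.0   independence
print(round(odds_ratio([[5, 15], [30, 10]]), 3))  # 0.111 negative association
```

In the first table the odds of the first-column outcome are 30/10 = 3 in row one and 5/15 = 1/3 in row two, and their ratio is 9.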
These measures of association can help you understand the relationship between variables in a contingency table and make informed decisions based on the data.
Symmetry and Homogeneity
Symmetry is the property that, for every pair of categories i and j, the probability of an observation falling in cell (i, j) equals the probability of it falling in cell (j, i); in other words, swapping the row and column categories does not change the probability. Homogeneity is the related property that the row and column variables have identical marginal distributions.
To test for symmetry, the table must be square, and the row and column categories must be identical and occur in the same order. For example, in a table cross-classifying the visual acuity of people's left and right eyes, the rows and the columns list the same acuity grades in the same order.
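A common test of this hypothesis is Bowker's test, which compares each off-diagonal pair of cells; for a 2x2 table it reduces to McNemar's statistic (b - c)² / (b + c). The following is a minimal hand-rolled sketch, not the statsmodels implementation; the function name and example tables are hypothetical. The statistic is compared against a chi-square distribution with the returned degrees of freedom (for example, 3.841 at the 0.05 level when df = 1).

```python
def bowker_symmetry_test(table):
    """Bowker's symmetry statistic and degrees of freedom for a square k x k table.

    Statistic: sum over i < j of (n_ij - n_ji)**2 / (n_ij + n_ji);
    degrees of freedom: k * (k - 1) / 2. Pairs with no off-diagonal counts
    contribute nothing to the statistic.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("a symmetry test requires a square table")
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            total = table[i][j] + table[j][i]
            if total > 0:
                stat += (table[i][j] - table[j][i]) ** 2 / total
    return stat, k * (k - 1) // 2

print(bowker_symmetry_test([[20, 5], [15, 10]]))  # (5.0, 1)
```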
In statsmodels, the `Table` class provides methods for analyzing r x c contingency tables, and a `SquareTable` object can be created from a square contingency table. Its `summary` method prints the results of the symmetry test, along with the test of marginal homogeneity.
The same analysis can also be run on individual case records by passing the raw data to the `SquareTable.from_data` class method. This is convenient when the data are stored as one record per subject, for example when you need to repeat the analysis on different subsets of a large dataset.