A 3x3 confusion matrix is a simple yet powerful tool used in machine learning to evaluate the performance of a classification model. It's a table with three rows and three columns, one for each class.
The rows represent the actual class labels, while the columns represent the predicted class labels. This allows us to compare the predicted outcomes with the actual outcomes.
The matrix is called a "confusion" matrix because it shows exactly where the model confuses one class for another. In a perfect world, all predictions would fall on the diagonal of the matrix, and every off-diagonal cell would be zero, indicating that all predictions were correct.
The 3x3 confusion matrix is particularly useful when we have three classes to predict.
What Is a Confusion Matrix?
A confusion matrix is an N x N matrix used to evaluate the performance of a classification model, where N is the number of target classes. It compares the actual target values against the ones predicted by the ML model.
The matrix is built from the output of a classification model: for every observation, the model produces a predicted label that can be compared directly with the actual label.
A confusion matrix provides a holistic view of how a classification model will work and the errors it will face.
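As a minimal sketch of how such a matrix is produced in practice (assuming scikit-learn is available; the labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# Actual and predicted labels for a made-up three-class problem
y_true = ["A", "B", "C", "A", "B", "C", "A", "A", "C"]
y_pred = ["A", "B", "C", "B", "B", "A", "A", "C", "C"]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=["A", "B", "C"])
print(cm)
```

Each row of the printed matrix corresponds to one actual class, so the row sums equal the number of true observations of that class.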
Components of a Confusion Matrix
A 3x3 confusion matrix has four key components: True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN). These components are generated per label, meaning each label has its own TP, FP, FN, and TN values.
TP represents the number of correct predictions for a specific class. For example, if a model predicts class A and the actual value is also A, it's a True Positive.
FP, on the other hand, represents the number of incorrect predictions for a specific class. This occurs when the model predicts that class, but the actual value is a different class.
FN represents the number of missed predictions for a specific class. This occurs when the actual value is that class, but the model predicts a different one.
TN represents the number of true predictions when the observation is negative. This occurs when a model does not predict a class, and the actual value is indeed not that class.
The following table summarizes the components of a 3x3 confusion matrix:

| Component | Meaning for a given class |
| --- | --- |
| TP | Predicted the class, and the actual label is that class |
| FP | Predicted the class, but the actual label is a different class |
| FN | Actual label is the class, but the model predicted a different one |
| TN | Neither the actual nor the predicted label is the class |
Example
A confusion matrix is a powerful tool for evaluating the performance of a classification model. It's a table that displays the number of true positives, false positives, true negatives, and false negatives.
Let's take a look at a 3x3 confusion matrix example. Here are the key points:
- 50 data points were correctly predicted as label 'A'.
- 5 data points that were actually label 'A' were misclassified as label 'B'.
- 10 data points that were actually label 'A' were misclassified as label 'C'.
The same matrix also shows us that 40 data points were correctly predicted as label 'B', with 4 data points that were actually label 'B' misclassified as label 'A' and 8 data points misclassified as label 'C'.
We can also see that 35 data points were correctly predicted as label 'C', but 7 data points that were actually label 'C' were misclassified as label 'A', and 6 data points were misclassified as label 'B'.
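To make the example concrete, here's the same matrix written as a NumPy array (rows are actual labels, columns are predicted labels), a quick sketch assuming NumPy is available:

```python
import numpy as np

# The example matrix: rows are actual labels, columns are predicted labels
cm = np.array([
    [50,  5, 10],   # actual A: predicted as A, B, C
    [ 4, 40,  8],   # actual B
    [ 7,  6, 35],   # actual C
])

for i, label in enumerate(["A", "B", "C"]):
    print(f"correctly predicted as {label}: {cm[i, i]}")
```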
Key Components of a Confusion Matrix
A confusion matrix is a table used to evaluate the performance of a classification model. It's a great tool for understanding how well your model is doing.
The confusion matrix has 4 important components: TP, FP, FN, and TN. These abbreviations might seem daunting, but they're actually quite straightforward.
True Positives (TP) are the number of correct predictions when the actual observation is positive. For example, if a model predicts a class A and the actual value is indeed A, that's a True Positive.
False Positives (FP) are the number of incorrect predictions when the actual observation is negative. For instance, if a model predicts class A but the actual value is class B or C, that's a False Positive for class A.
False Negatives (FN) are the number of incorrect predictions when the actual observation is positive. This means the model missed a correct prediction: the actual value is class A, but the model predicted B or C.
True Negatives (TN) are the number of correct predictions when the observation is negative. Relative to class A, any case where the actual value is not A and the model also does not predict A counts as a True Negative, even if the model confuses B and C with each other.
Here's a breakdown of how to calculate these values:
For example, let's say we're calculating the TP, TN, FP, and FN values for the class Setosa using a 3x3 confusion matrix whose cells are numbered 1 through 9, row by row. We can use the following formulas, mirrored in the code sketch after this list:
- TP: The actual value and predicted value should be the same. So, for the Setosa class, the value of cell 1 is the TP value.
- FN: The sum of the values in the corresponding row, excluding the TP value (cell 2 + cell 3).
- FP: The sum of the values in the corresponding column, excluding the TP value (cell 4 + cell 7).
- TN: The sum of the values of all rows and columns excluding the row and column of the class we are calculating for (cell 5 + cell 6 + cell 8 + cell 9).
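A minimal sketch of the same cell arithmetic in Python (the matrix values below are hypothetical, not from a real IRIS run):

```python
import numpy as np

# Hypothetical IRIS confusion matrix (values assumed for illustration);
# rows/columns are Setosa, Versicolor, Virginica; cells are numbered
# 1 through 9, row by row
cm = np.array([
    [16,  0,  0],   # cells 1, 2, 3
    [ 0, 17,  1],   # cells 4, 5, 6
    [ 0,  0, 11],   # cells 7, 8, 9
])

tp = cm[0, 0]                                    # cell 1
fn = cm[0, 1] + cm[0, 2]                         # cell 2 + cell 3
fp = cm[1, 0] + cm[2, 0]                         # cell 4 + cell 7
tn = cm[1, 1] + cm[1, 2] + cm[2, 1] + cm[2, 2]   # cells 5 + 6 + 8 + 9
print(f"Setosa: TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```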
Understanding Confusion Matrix Metrics
A 3x3 confusion matrix is used when there are three classes, and it's a great way to visualize the performance of a classifier. This type of matrix is particularly useful when working with datasets that have three distinct categories.
The confusion matrix is divided into two main dimensions: actual values and predicted values. This allows us to compare the true values of the observations with the predicted values made by the model.
True Positive (TP) is an important metric in a confusion matrix, and it refers to the predicted value that matches the actual value. In other words, when the model predicts a positive class and it's actually positive.
True Negative (TN) is another key metric, which represents the predicted value that matches the actual value, but in the case of a negative class.
A False Positive (FP) occurs when the model predicts the positive class but the actual value is negative. This type of error is also known as a Type I error.
A False Negative (FN) occurs when the model predicts the negative class but the actual value is positive. This is also known as a Type II error.
Here's a summary of the confusion matrix metrics:

| Metric | Predicted value | Actual value | Error type |
| --- | --- | --- | --- |
| TP | Positive | Positive | – |
| TN | Negative | Negative | – |
| FP | Positive | Negative | Type I |
| FN | Negative | Positive | Type II |
In a real-world scenario, a classifier with a large number of true positives and true negatives is considered decent, as it indicates that the model is accurately classifying most of the data points.
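One consequence: for a multi-class matrix, overall accuracy is simply the diagonal (all correct predictions) divided by the grand total. A minimal sketch, using the example matrix from earlier:

```python
import numpy as np

cm = np.array([
    [50,  5, 10],
    [ 4, 40,  8],
    [ 7,  6, 35],
])

# Overall accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.3f}")  # 125 / 165 ≈ 0.758
```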
Calculating Confusion Matrix Values
Calculating Confusion Matrix Values is a crucial step in understanding the performance of your classification model. The values you need to calculate are TP, TN, FP, and FN.
TP is the number of correct predictions when the actual observation is positive. This is where the actual value and the predicted value are the same.
To calculate FP, you sum the values of the corresponding column, excluding the TP value. In a 3x3 confusion matrix, that means adding the two off-diagonal cells in the class's column.
TN is the number of true predictions when the observation is negative. To calculate TN, sum every cell that is not in the row or column of the class you are calculating for. For instance, in the IRIS dataset, the TN value for the Setosa class is calculated as (cell 5 + cell 6 + cell 8 + cell 9).
FN is the number of incorrect predictions when the observation is positive. To calculate FN, sum the values of the corresponding row, excluding the TP value. For example, in the IRIS dataset, the FN value for the Setosa class is calculated as (cell 2 + cell 3).
Here's a quick summary of how to calculate these values for any class:
- TP: the diagonal cell for that class
- FN: the class's row sum minus TP
- FP: the class's column sum minus TP
- TN: the grand total minus TP, FN, and FP
These calculations will help you understand the performance of your classification model and identify areas for improvement.
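A minimal NumPy sketch of these calculations, vectorized over all three classes of the earlier example:

```python
import numpy as np

cm = np.array([
    [50,  5, 10],
    [ 4, 40,  8],
    [ 7,  6, 35],
])

tp = np.diag(cm)                  # diagonal cell for each class
fn = cm.sum(axis=1) - tp          # row sum minus TP
fp = cm.sum(axis=0) - tp          # column sum minus TP
tn = cm.sum() - (tp + fn + fp)    # everything outside the class's row/column

for i, label in enumerate(["A", "B", "C"]):
    print(f"{label}: TP={tp[i]}, FP={fp[i]}, FN={fn[i]}, TN={tn[i]}")
```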
How to Read a 3x3 Confusion Matrix
To read a 3x3 confusion matrix, you need to understand what each value represents. True Positives (TP) for class A is 50, which means the model correctly predicted class A for 50 cases.
False Positives (FP) for class A are 11, which is the sum of 4 cases where class A was predicted but the actual class was B, and 7 cases where class A was predicted but the actual class was C.
False Negatives (FN) for class A are 15: the sum of 5 cases where the actual class was A but the model predicted B, and 10 cases where the actual class was A but the model predicted C.
True Negatives (TN) for class A are 89: the sum of every cell in which neither the actual nor the predicted class is A (40 + 8 + 6 + 35).
You can break down the confusion matrix into individual classes, just like for class A, to analyze the performance of the model for each class.
Here's a breakdown of the values for each class, computed the same way:

| Class | TP | FP | FN | TN |
| --- | --- | --- | --- | --- |
| A | 50 | 11 | 15 | 89 |
| B | 40 | 11 | 12 | 102 |
| C | 35 | 18 | 13 | 99 |
Choosing Performance Indicators
Choosing the right performance indicators is crucial for evaluating your model's performance.
Accuracy alone is not a great indicator of overall ML model performance, particularly on imbalanced datasets, but it can be used to compare model outcomes and find optimal values.
Precision is a good indicator to use when you want to focus on reducing false positives, such as in disaster relief efforts where you need to ensure rescues are true positives.
The F1 score limits both false positives and false negatives as much as possible, making it a versatile indicator for general-purpose evaluation. In situations where precision alone is not the top priority, the F1 score can be a better choice.
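As a sketch of how these indicators are typically computed (assuming scikit-learn; the labels below are made up for illustration):

```python
from sklearn.metrics import classification_report

# Made-up labels for a three-class problem
y_true = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "C", "A", "C"]

# Precision, recall, and F1 per class, plus macro and weighted averages
print(classification_report(y_true, y_pred, labels=["A", "B", "C"]))
```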
Machine Learning
In machine learning, a confusion matrix is a powerful tool that helps evaluate the performance of a classification model. For a three-class problem it is a 3x3 table, from which the true positives, true negatives, false positives, and false negatives can be derived for each class.
A 3x3 confusion matrix is used for multi-class classification problems, like the IRIS dataset, which has 3 classes: Versicolor, Virginica, and Setosa. The model has to classify the given instance as one of these three flowers.
To calculate the TP, TN, FP, and FN values, we need to calculate for each class separately. For example, for the Versicolor class, we calculate the values as follows:
Here's a breakdown of what each value represents:
- TP (True Positive): Actual Versicolor instances correctly predicted as Versicolor
- FP (False Positive): Instances of other classes incorrectly predicted as Versicolor
- FN (False Negative): Actual Versicolor instances incorrectly predicted as another class
- TN (True Negative): Instances that are neither actually Versicolor nor predicted as Versicolor
The confusion matrix helps us identify errors and their exact types, such as Type I or Type II. It also helps us calculate the different parameters of the model, like accuracy, precision, and recall.
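As an end-to-end sketch on the IRIS dataset (the choice of logistic regression, the 70/30 split, and the random seed are illustrative assumptions, not prescribed by the article):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Load the three-class IRIS dataset and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a simple classifier and print its 3x3 confusion matrix
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))
```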
Frequently Asked Questions
What is an N x N confusion matrix?
An N x N confusion matrix has one row and one column for each of the N target classes in a classification model. It compares predicted and actual target values to evaluate model performance.