Classification
Classification predicts which class an object belongs to: the model scores each class and assigns the object to the one with the highest probability.
- There are two types of classification (see the sketch after this list):
- Binary classification: there are exactly two classes, e.g. male/female, cat/dog, yes/no
- Multiclass classification: there are more than two classes, e.g. traffic signs, face recognition, flower species, digit recognition
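Here is a minimal sketch of both settings using scikit-learn; the datasets (breast cancer for binary, digits for multiclass) and the logistic-regression model are arbitrary choices for illustration:

```python
# Minimal sketch: binary vs. multiclass classification with scikit-learn.
# The datasets and the logistic-regression model are arbitrary choices.
from sklearn.datasets import load_breast_cancer, load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fit_and_score(X, y, label):
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    print(label, "accuracy:", clf.score(X_test, y_test))

# Binary: two classes (malignant vs. benign tumors).
fit_and_score(*load_breast_cancer(return_X_y=True), "binary")

# Multiclass: ten classes (handwritten digits 0-9).
fit_and_score(*load_digits(return_X_y=True), "multiclass")
```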
Confusion matrix:
The confusion matrix is a technique for evaluating the accuracy of a model on a classification problem. With this technique we look at how many of the positive and negative data points the model predicts correctly.
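For instance, here is a minimal sketch of building a confusion matrix with scikit-learn; the labels below are made up for illustration:

```python
# Minimal sketch: a confusion matrix from true vs. predicted labels.
# The label lists are made-up example values.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))   # -> [[3 1]
                                          #     [1 3]]
```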
The main terms to consider are accuracy, precision, and recall.
Accuracy is an appealing metric because it is a single number, while precision and recall (sensitivity) are two numbers. To summarize them in a single score for our model, we use the F1 score.
Here is the mathematical formula for the F1 score:
F1 = 2 × (precision × recall) / (precision + recall)
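A minimal sketch of the computation, with made-up counts:

```python
# Compute F1 as the harmonic mean of precision and recall.
# The counts below are made-up example values.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)   # 40 / 50 = 0.80
recall = tp / (tp + fn)      # 40 / 60 ≈ 0.67
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))          # ≈ 0.727
```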
Term | Meaning |
---|---|
Sensitivity (Recall, True Positive Rate) | true positives per total actual positives |
Specificity (True Negative Rate) | true negatives per total actual negatives |
Precision (Positive Predictive Value) | true positives per total predicted positives |
Negative Predictive Value | true negatives per total predicted negatives |
Accuracy | correct predictions per total number of test samples |
Fall-out (False Positive Rate) | false positives per total actual negatives |
False Discovery Rate | false positives per total predicted positives |
Miss Rate (False Negative Rate) | false negatives per total actual positives |
False Omission Rate | false negatives per total predicted negatives |
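All of these metrics follow directly from the four confusion-matrix counts. A minimal sketch (the counts themselves are arbitrary example values):

```python
# Derive every metric in the table from the four confusion-matrix counts.
# The counts are arbitrary example values.
tp, fp, tn, fn = 50, 10, 30, 10

sensitivity = tp / (tp + fn)   # recall / true positive rate
specificity = tn / (tn + fp)   # true negative rate
precision   = tp / (tp + fp)   # positive predictive value
npv         = tn / (tn + fn)   # negative predictive value
accuracy    = (tp + tn) / (tp + fp + tn + fn)
fall_out    = fp / (fp + tn)   # false positive rate = 1 - specificity
fdr         = fp / (fp + tp)   # false discovery rate = 1 - precision
miss_rate   = fn / (fn + tp)   # false negative rate = 1 - sensitivity
f_omission  = fn / (fn + tn)   # false omission rate = 1 - NPV

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("precision", precision), ("NPV", npv), ("accuracy", accuracy),
                    ("fall-out", fall_out), ("FDR", fdr),
                    ("miss rate", miss_rate), ("false omission", f_omission)]:
    print(f"{name}: {value:.3f}")
```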
Accuracy Paradox
The accuracy paradox is the apparent contradiction that a highly accurate model can still be a poor model in practice. If one model has higher accuracy than another, that does not necessarily mean it is the better model. As developers, we have to look deeper into the results and the data to judge whether a model is actually good.
Let's say we have a test set of 100 places, and we have to predict which of them will have an earthquake and which will not. In reality, earthquakes occur at 5 of the places.
Model 1: predicts earthquakes at 2 places, and they do occur there, but it misses the other 3. Its accuracy is 97%.
Model 2: predicts earthquakes at 13 places: the 5 where they actually occur, plus 8 places where nothing happens. Its accuracy is 92%.
When we compare the two models, Model 1 looks more accurate than Model 2. But Model 1 fails to warn the remaining three places where earthquakes do happen, which would be dangerous. Model 2 raises 8 false alarms, but everyone in real danger is warned and can get to safety. Judged this way, Model 2 is far more useful in practice.
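To see the numbers concretely, here is a minimal sketch; the confusion-matrix counts are inferred from the example above:

```python
# The earthquake example in numbers: 100 places, 5 real earthquakes.
# Counts for each model are inferred from the example above.
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Model 1: catches 2 of the 5 earthquakes, misses 3, no false alarms.
acc1, rec1 = metrics(tp=2, fp=0, tn=95, fn=3)

# Model 2: catches all 5 earthquakes but raises 8 false alarms.
acc2, rec2 = metrics(tp=5, fp=8, tn=87, fn=0)

print(f"Model 1: accuracy={acc1:.0%}, recall={rec1:.0%}")  # 97%, 40%
print(f"Model 2: accuracy={acc2:.0%}, recall={rec2:.0%}")  # 92%, 100%
```

Model 1 wins on accuracy but catches only 40% of the real earthquakes; Model 2 trades a little accuracy for 100% recall.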
So we can conclude that accuracy alone is not a great judge of a classification model. We need to delve deeper into how the model behaves on the dataset to find out whether it is actually doing its job.