
Classification & Confusion Matrix & Accuracy Paradox

Classification 

Classification works by assigning an object to the class for which it has the highest predicted probability.


  •  There are two types of classification:

  1. Binary classification: there are exactly two classes, e.g. male-female, cat-dog, yes-no
  2. Multiclass classification: there are more than two classes, e.g. traffic signs, face recognition, flower species, digit recognition
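
As a minimal sketch of this idea, assume a model that outputs one probability per class. The helper name `predict_class`, the class names and the probabilities below are made up for illustration. The same rule applies whether there are two classes or more than two.

```python
def predict_class(probabilities):
    """Return the label with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical model outputs (illustrative values only).
binary = {"cat": 0.8, "dog": 0.2}
multiclass = {"stop": 0.1, "yield": 0.7, "speed limit": 0.2}

print(predict_class(binary))
print(predict_class(multiclass))
```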


Confusion matrix : 

A confusion matrix is a technique for evaluating a classification model. It counts how many positive and negative data points the model predicts correctly and how many it gets wrong, which gives four numbers: true positives, false positives, false negatives and true negatives.
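
The counting can be sketched in a few lines of plain Python. This is only an illustration: the function name `confusion_counts` and the two label lists are assumptions, and 1 is assumed to mark the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, really positive
        elif p == positive:
            fp += 1          # predicted positive, really negative
        elif a == positive:
            fn += 1          # predicted negative, really positive
        else:
            tn += 1          # predicted negative, really negative
    return tp, fp, fn, tn


# Made-up labels for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)  # 3 1 1 3
```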


The main terms to consider are accuracy, precision and recall.
Accuracy is an appealing metric because it is a single number, whereas precision and recall (sensitivity) are two numbers. To get a single overall score for the model, we combine them into the F1 score, which is the harmonic mean of precision and recall.

Here is the F1 score's mathematical formula:

F1 = 2 × precision × recall / (precision + recall)
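
A small sketch of the formula in Python follows. The helper name `f1` is hypothetical, and returning 0.0 when both inputs are zero is an assumption made here to avoid dividing by zero.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both values are zero
    return 2 * precision * recall / (precision + recall)


print(f1(1.0, 0.6))  # perfect precision, but 40% of positives are missed
print(f1(0.5, 0.5))
```

Because it is a harmonic mean, the score is pulled toward the smaller of the two values, so a model cannot get a high F1 by being strong on only one of them.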



Term                                              Meaning
 Sensitivity (Recall, True Positive Rate)         true positives per total actual positives
 Specificity (True Negative Rate)                 true negatives per total actual negatives
 Precision                                        true positives per total predicted positives
 Negative Predictive Value                        true negatives per total predicted negatives
 Accuracy                                         correct predictions per total number of test samples
 Fall-out (False Positive Rate)                   false positives per total actual negatives
 False Discovery Rate                             false positives per total predicted positives
 Miss Rate (False Negative Rate)                  false negatives per total actual positives
 False Omission Rate                              false negatives per total predicted negatives
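
The terms above can be computed from the four confusion-matrix counts. The sketch below uses the standard definitions of these rates. The function names and the example counts are illustrative assumptions, and a rate with an empty denominator is reported as 0.0 by choice.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, fn, tn):
    """Derive the common rates from confusion-matrix counts."""
    return {
        "sensitivity": safe_div(tp, tp + fn),                # recall, TPR
        "specificity": safe_div(tn, tn + fp),                # TNR
        "precision": safe_div(tp, tp + fp),
        "negative_predictive_value": safe_div(tn, tn + fn),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "fall_out": safe_div(fp, fp + tn),                   # FPR
        "false_discovery_rate": safe_div(fp, fp + tp),
        "miss_rate": safe_div(fn, fn + tp),                  # FNR
        "false_omission_rate": safe_div(fn, fn + tn),
    }


# Made-up counts for illustration.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Several of these rates come in complementary pairs: sensitivity and miss rate add up to 1, as do specificity and fall-out.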

Accuracy Paradox

    The accuracy paradox is the observation that a model with high accuracy is not necessarily a good model. If one model has a higher accuracy than another, that does not necessarily mean it is the better of the two. As developers, we have to look further into the results and the data to see whether a model is good or not.
    
     Let's say we have a data set of 100 places and we have to predict which of them are likely to have an earthquake and which are not. In reality, an earthquake occurs in 5 of the places.
     
     Model 1: Predicts that two of the places have no chance of an earthquake, but an earthquake does occur there. It is right about the other 98 places, so the accuracy of the model is 98%.
     
     Model 2: Predicts that 8 places have a chance of an earthquake, but none occurs at those places. It is right about the other 92 places, so the accuracy of the model is 92%.
     
    When we compare the two models, we find that Model 1 is more accurate than Model 2. But Model 1 fails to warn about two places where an earthquake actually happens, which is dangerous. Model 2 warns about 8 places where nothing happens, but in that case people are simply moved out of danger that never arrives, which costs far less than a missed earthquake. Seen this way, Model 2 is far more useful in practice, even though its accuracy is lower.
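
The comparison can be checked with a short calculation. The confusion-matrix counts below are one reading of the example, not figures stated in the text: Model 1 is assumed to miss 2 of the 5 real earthquakes with no false alarms, and Model 2 is assumed to catch all 5 while raising 8 false alarms.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual positives that the model catches."""
    return tp / (tp + fn)


# Assumed counts for 100 places, 5 of which really have an earthquake.
model_1 = {"tp": 3, "fp": 0, "fn": 2, "tn": 95}
model_2 = {"tp": 5, "fp": 8, "fn": 0, "tn": 87}

for name, counts in (("Model 1", model_1), ("Model 2", model_2)):
    acc = accuracy(**counts)
    rec = recall(counts["tp"], counts["fn"])
    print(f"{name}: accuracy={acc:.2f}, recall={rec:.2f}")
```

Under these assumptions Model 1 scores 0.98 accuracy but only 0.60 recall, while Model 2 scores 0.92 accuracy and 1.00 recall. Recall is the number that shows which model misses earthquakes.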
    
    So we can conclude that accuracy alone is not a good judge of a classification model. We need to look deeper, at the kinds of errors the model makes and at how the classes are distributed in the dataset, to find out how well it is really working.
