Hey, folks! In this article, we will learn how to calculate Precision, a classification error metric, in Python.
So, let us begin!
What is Precision?
Let us understand the need for Error Metrics in Classification or Regression Algorithms.
Error metrics help us analyze how well a particular machine learning model performs over a dataset or set of data values. There are different error metrics for different types of machine learning algorithms.
Error metrics for Regression:
- Mean Square Error
- Root Mean Square Error
- R square
- Adjusted R square, etc
Error metrics for Classification:
- Confusion Matrix
- Accuracy
- Precision
- Recall
- f1 Score, etc
Precision is the fraction of the data values predicted as positive that are actually positive.
In other words, with Precision we measure how many of the model's positive predictions are correct.
Have a look at the formula below:
Precision = True Positives / (True Positives + False Positives)
Here, the True Positive and False Positive values can be read from the Confusion Matrix. The value of Precision ranges from 0.0 to 1.0.
By True Positive, we mean the values that are predicted as positive and are actually positive, while False Positive values are those that are predicted as positive but are actually negative.
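To make the formula concrete, here is a minimal sketch that counts True Positives and False Positives by hand. The label lists and the function name precision_from_labels are made up for illustration and are not part of the bank loan example. Returning 0.0 when nothing is predicted as positive is an assumption, since the formula is undefined in that case.

```python
def precision_from_labels(actual, predicted, positive=1):
    """Return TP / (TP + FP) for the given positive label."""
    # True positives: predicted positive and actually positive
    tp = sum(1 for a, p in zip(actual, predicted) if p == positive and a == positive)
    # False positives: predicted positive but actually negative
    fp = sum(1 for a, p in zip(actual, predicted) if p == positive and a != positive)
    if tp + fp == 0:
        return 0.0  # assumption: report 0.0 when no value was predicted positive
    return tp / (tp + fp)


# Illustrative labels (1 = positive, 0 = negative)
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 1, 1, 0]

print(precision_from_labels(actual, predicted))  # 3 TP and 2 FP -> 0.6
```

Here five values are predicted as positive and three of them are actually positive, so the Precision is 3 / 5.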
Let us now implement this in the upcoming section through an example.
Implementing Precision with a Classification Algorithm
We will use Precision to evaluate a Decision Tree classifier.
Let us start implementing the same!
In this example, we have used the Bank Loan Defaulter dataset. The problem is to predict the loan defaulters from the bank's data.
1. Load the dataset
Here, we load the Bank Loan dataset into the environment using the pandas.read_csv() function.
    import pandas as pd
    import numpy as np
    loan = pd.read_csv("bank-loan.csv")  # dataset
2. Splitting the dataset
The dataset is split into training and testing sets using the train_test_split() function, as shown below:
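    from sklearn.model_selection import train_test_split
    X = loan.drop(['default'], axis=1)
    Y = loan['default'].astype(str)
    # The original listing omits the split call; test_size and random_state here are assumed values
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)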
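If you do not have the bank loan file at hand, the split can be tried on a tiny synthetic frame. This is only a sketch: the toy DataFrame, its column values, and the test_size and random_state settings are all made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny synthetic stand-in for the loan data (all values are made up)
toy = pd.DataFrame({
    "income": [20, 35, 50, 65, 80, 25, 40, 55, 70, 85],
    "default": [1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
})

X = toy.drop(["default"], axis=1)   # features
Y = toy["default"].astype(str)      # target, stored as strings like in the article

# test_size=0.2 holds out 2 of the 10 rows; random_state makes the split repeatable
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

The function returns the four pieces in the order train features, test features, train target, test target, which is the order the later steps rely on.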
3. Defining Error Metrics
We define a function that takes the Confusion Matrix and calculates Precision, along with a few other metrics, for evaluating the model.
    # Error metrics -- Confusion matrix, FPR, FNR, f1 score
    # CM is expected as pd.crosstab(actual, predicted): rows = actual, columns = predicted
    def err_metric(CM):
        TN = CM.iloc[0, 0]
        FN = CM.iloc[1, 0]
        TP = CM.iloc[1, 1]
        FP = CM.iloc[0, 1]
        precision = TP / (TP + FP)
        accuracy_model = (TP + TN) / (TP + TN + FP + FN)
        recall_score = TP / (TP + FN)
        specificity_value = TN / (TN + FP)
        False_positive_rate = FP / (FP + TN)
        False_negative_rate = FN / (FN + TP)
        f1_score = 2 * ((precision * recall_score) / (precision + recall_score))
        print("Precision value of the model: ", precision)
        print("Accuracy of the model: ", accuracy_model)
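The positional lookups are easier to follow on a hand-built table. The sketch below uses hypothetical counts laid out the way pd.crosstab(actual, predicted) would arrange them; the numbers are invented and do not come from the bank loan data.

```python
import pandas as pd

# Hypothetical counts: rows = actual class, columns = predicted class
CM = pd.DataFrame(
    [[50, 10],   # actual "0": 50 true negatives, 10 false positives
     [5, 35]],   # actual "1": 5 false negatives, 35 true positives
    index=["0", "1"],
    columns=["0", "1"],
)

# Same cell positions as in err_metric()
TN, FP = CM.iloc[0, 0], CM.iloc[0, 1]
FN, TP = CM.iloc[1, 0], CM.iloc[1, 1]

precision = TP / (TP + FP)   # 35 / 45
recall = TP / (TP + FN)      # 35 / 40

print(round(precision, 3), round(recall, 3))
```

Precision reads down the predicted-positive column (TP and FP), while recall reads across the actual-positive row (TP and FN).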
4. Modelling
We have applied the Decision Tree Algorithm to identify the loan defaulters from the data.
    # Decision Trees
    from sklearn.tree import DecisionTreeClassifier
    decision = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=0).fit(X_train, Y_train)
    target = decision.predict(X_test)
    targetclass_prob = decision.predict_proba(X_test)[:, 1]
5. Evaluation of model
Finally, we evaluate the model by building the confusion matrix with pd.crosstab() and passing it to the err_metric() function.
    confusion_matrix = pd.crosstab(Y_test, target)
    err_metric(confusion_matrix)
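One thing to keep in mind is that pd.crosstab() only creates rows and columns for labels that actually occur. If a model never predicts one of the classes, the table is no longer 2x2 and positional lookups such as CM.iloc[0, 1] fail. The sketch below shows this on made-up labels and one possible way to force the full layout with reindex(); the label list ["0", "1"] is an assumption about how the classes are named.

```python
import pandas as pd

# Made-up labels, stored as strings like Y in the article
actual = pd.Series(["0", "1", "0", "1", "0"], name="actual")
predicted = pd.Series(["0", "0", "0", "0", "0"], name="predicted")  # "1" is never predicted

cm = pd.crosstab(actual, predicted)
print(cm.shape)  # one column only, because "1" never appears in the predictions

# Force a full 2x2 table so that positional lookups stay valid
labels = ["0", "1"]
cm_full = cm.reindex(index=labels, columns=labels, fill_value=0)
print(cm_full)
```

In this toy case both TP and FP are zero, so Precision is undefined (0 / 0) and needs to be handled separately.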
Output:
    Precision value of the model: 0.25
    Accuracy of the model: 0.6028368794326241
So, the output states that 25% of the values predicted as positive by the model are actually positive.
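As a cross-check, scikit-learn offers precision_score() in sklearn.metrics, which computes the same ratio directly from the labels. The sketch below uses made-up string labels and assumes that the positive class is named "1". Because the labels are strings, pos_label has to be passed explicitly.

```python
from sklearn.metrics import precision_score

# Made-up string labels, mirroring Y = loan['default'].astype(str)
y_true = ["0", "1", "1", "0", "1", "0"]
y_pred = ["1", "1", "0", "0", "1", "1"]

# pos_label names the positive class; "1" is an assumption about the label values
score = precision_score(y_true, y_pred, pos_label="1")

print(score)  # 2 TP and 2 FP -> 0.5
```

Four values are predicted as "1" and two of them are actually "1", which gives the same result as TP / (TP + FP) computed by hand.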
Conclusion
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions. Till then, Happy Learning! 🙂