Hey, readers! Today, we will be focusing on an important evaluation metric for classification algorithms: F1 Score in Python. So, let us begin!
What is F1 score?
F1 score is a classification metric that, like any other evaluation metric, helps us evaluate the performance of a machine learning model. It is most often reported for binary classification, where one class is treated as the positive class.
It combines the precision and recall metrics into a single number: the harmonic mean of precision and recall. It is especially useful when the data is imbalanced, because accuracy alone can look high even when the model misses most of the minority class.
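To see why accuracy can mislead on imbalanced data, here is a minimal sketch with made-up confusion counts (the numbers are illustrative assumptions, not taken from the loan dataset). It computes accuracy, precision, recall and F1 by hand.

```python
# Hypothetical confusion counts for an imbalanced problem:
# 1000 samples, only 50 of them positive (illustrative numbers only).
tp, fn = 10, 40      # positives: 10 found, 40 missed
fp, tn = 5, 945      # negatives: 5 false alarms, 945 correctly rejected

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Accuracy comes out above 0.95 even though four out of five positives are missed, while the F1 score stays near 0.31.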
Have a look at the formula below:
    F1 = 2 * (precision * recall) / (precision + recall)
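The F1 score ranges from 0 to 1 and increases as the precision and recall of a model rise. Because it is a harmonic mean, it is high only when both precision and recall are high.
A high score indicates that the model identifies the positive class well, which is what matters most when the classes are imbalanced.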
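The effect of the harmonic mean is easiest to see next to a plain average. The helper below, `f1_from`, is a hypothetical name introduced only for this sketch, and returning 0.0 when both inputs are zero is a convention assumed here.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Compare the arithmetic mean with F1 for balanced and lopsided pairs.
for p, r in [(0.9, 0.9), (0.9, 0.1), (1.0, 0.0)]:
    arithmetic = (p + r) / 2
    print(f"precision={p} recall={r} "
          f"arithmetic={arithmetic:.2f} f1={f1_from(p, r):.2f}")
```

For precision 0.9 and recall 0.1 the arithmetic mean is 0.50, but the F1 score is only 0.18, so one weak component pulls the whole score down.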
Let us now focus on the practical implementation of the same in the upcoming section.
Applying F1 Score on Loan Dataset
Here, we will compute the evaluation metrics on a Loan Defaulter Prediction problem. You can find the dataset here.
1. Load the dataset
We use the pandas.read_csv() function to load the dataset into the environment.
    import pandas as pd
    import numpy as np

    loan = pd.read_csv("Bank-loan.csv")
2. Split the dataset
Next, we split the dataset into training and test sets using the train_test_split() function, as shown:
    from sklearn.model_selection import train_test_split

    X = loan.drop(['default'], axis=1)   # features: every column except the target
    Y = loan['default'].astype(str)      # target, stored as string labels
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)
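With imbalanced targets, a plain random split can leave the test set with very few positives. The walkthrough above does not do this, but as an optional variation train_test_split() accepts a stratify argument that keeps the class proportions the same in both parts. The sketch below uses a small synthetic frame (an assumption standing in for the loan data) so it runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: 100 rows, 20% positives.
demo = pd.DataFrame({"feature": range(100),
                     "default": ["1"] * 20 + ["0"] * 80})
X_demo = demo.drop(columns=["default"])
y_demo = demo["default"]

# stratify=y_demo keeps the 20/80 class ratio in both train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0, stratify=y_demo)

print(y_te.value_counts())
```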
3. Defining the error metrics
Here, we define a custom function that derives precision, recall, accuracy and the F1 score from the confusion matrix.
    # Error metrics computed from the confusion matrix: precision, accuracy, recall, F1 score
    def err_metric(CM):
        # CM is built with pd.crosstab(actual, predicted): rows = actual, columns = predicted
        TN = CM.iloc[0, 0]
        FN = CM.iloc[1, 0]
        TP = CM.iloc[1, 1]
        FP = CM.iloc[0, 1]
        precision = TP / (TP + FP)
        accuracy_model = (TP + TN) / (TP + TN + FP + FN)
        recall_score = TP / (TP + FN)
        f1_score = 2 * ((precision * recall_score) / (precision + recall_score))
        print("f1 score of the model:", f1_score)
4. Modelling
We apply the Decision Tree algorithm to the dataset, as shown below:
    # Decision tree classifier
    from sklearn.tree import DecisionTreeClassifier

    decision = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=0).fit(X_train, Y_train)
    target = decision.predict(X_test)                        # predicted class labels
    targetclass_prob = decision.predict_proba(X_test)[:, 1]  # probability of the second class
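The class_weight="balanced" setting weights each class inversely to its frequency. As a rough sketch of what that means, sklearn's compute_class_weight() utility returns those weights for a given target. The 80/20 label array below is a made-up example, not the loan data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up imbalanced target: 80 negatives, 20 positives.
y_demo = np.array([0] * 80 + [1] * 20)

# "balanced" weights follow n_samples / (n_classes * count_of_class).
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y_demo)
print(dict(zip([0, 1], weights)))
```

The minority class gets a weight of 2.5 against 0.625 for the majority class, so mistakes on the rare class cost the tree more during training.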
5. Evaluation of the model
Having trained the model, we now evaluate it with the metrics defined in the section above.
    confusion_matrix = pd.crosstab(Y_test, target)   # rows = actual, columns = predicted
    err_metric(confusion_matrix)
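The positional lookups in the custom metric depend on how pd.crosstab() lays out its result. The sketch below uses toy string labels (an assumption, chosen to mirror the astype(str) conversion) to show the layout and to cross-check the manual F1 against sklearn's f1_score().

```python
import pandas as pd
from sklearn.metrics import f1_score

# Toy string labels; "1" plays the role of the positive (defaulter) class.
actual = pd.Series(["0", "0", "0", "0", "1", "1", "1", "0"], name="actual")
predicted = pd.Series(["0", "1", "0", "0", "1", "0", "1", "0"], name="predicted")

cm = pd.crosstab(actual, predicted)   # rows = actual, columns = predicted
tn, fp = cm.iloc[0, 0], cm.iloc[0, 1]
fn, tp = cm.iloc[1, 0], cm.iloc[1, 1]

manual_f1 = 2 * tp / (2 * tp + fp + fn)   # equivalent form of the F1 formula
# With string labels, the positive class has to be named explicitly.
library_f1 = f1_score(actual, predicted, pos_label="1")

print(cm)
print(manual_f1, library_f1)
```

One caveat: if the model never predicts one of the classes, the crosstab has a single column and the positional lookups fail, so the library function is the safer choice in that case.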
Output:
    f1 score of the model: 0.3488372093023256
F1 Score with sklearn library
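In this example, we use the built-in f1_score() function from the sklearn library. It takes the true labels and the predicted labels and returns the F1 score directly, so we do not have to compute precision and recall ourselves. Because the labels in this example take more than two values, we pass average="macro", which computes the F1 score for each label and then takes their unweighted mean.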
    from sklearn.metrics import f1_score

    x = [0, 1, 20, 30, 40]    # true labels
    y = [1, 19, 20, 30, 39]   # predicted labels
    res = f1_score(x, y, average="macro")
    print("F1 score:", res)
Output:
    F1 score: 0.2857142857142857
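To see where 0.2857 comes from, the same lists can be scored label by label. This is a sketch using average=None for the per-label breakdown. The zero_division=0 argument is an addition made here for the sketch, and the micro average is shown only for comparison.

```python
from sklearn.metrics import f1_score

x = [0, 1, 20, 30, 40]    # true labels
y = [1, 19, 20, 30, 39]   # predicted labels

# Every label that appears in either list counts as its own class.
labels = sorted(set(x) | set(y))
per_class = f1_score(x, y, labels=labels, average=None, zero_division=0)
for label, score in zip(labels, per_class):
    print(label, score)

macro = f1_score(x, y, average="macro", zero_division=0)  # unweighted mean over labels
micro = f1_score(x, y, average="micro", zero_division=0)  # pooled over all samples
print("macro:", macro, "micro:", micro)
```

Only labels 20 and 30 are predicted correctly, so two of the seven per-label scores are 1 and the rest are 0, which gives a macro average of 2/7.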
Conclusion
With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
Till then, stay tuned and keep learning!