Hey, folks! In our series on Machine Learning algorithms, today we will focus on the Naive Bayes algorithm in Python in detail.
So, let us begin!
What is Naive Bayes Algorithm?
Naive Bayes is a supervised classification Machine Learning algorithm. It is based on the following:
- Bayes Theorem
- Maximum A Posteriori Hypothesis
Let us have a look at the below formula-
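P(A|B) = [ P(B|A) * P(A) ] / P(B)

Here, P(A|B) is the posterior probability of hypothesis A given evidence B, P(B|A) is the likelihood of the evidence given the hypothesis, P(A) is the prior probability of A, and P(B) is the probability of the evidence.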
The above formula represents Bayes' theorem, which determines the probability of hypothesis A given the evidence B (an observed data sample).
Thus, in Naive Bayes, we determine the probability that a particular hypothesis holds true given a particular piece of evidence from the dataset.
Let us now understand the assumptions in the upcoming section.
Assumptions of Naive Bayes
Naive Bayes assumes that the effect of a feature/attribute on a given class is independent of the values of the other features/attributes of the dataset. That is, the data variables are independent with regard to their effect on the class probability. This concept is termed Class Conditional Independence.
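In formula form, class conditional independence lets the joint likelihood of the features factor into a product of per-feature likelihoods for a class C:

P(x1, x2, ..., xn | C) = P(x1 | C) * P(x2 | C) * ... * P(xn | C)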
Implementing Naive Bayes in Python
First, we use the pandas.read_csv() function to load the dataset into the environment.
You can find the dataset used in the examples here.
Further, we split the dataset into training and testing sets using the train_test_split() function.
Example:
import pandas as pd
import numpy as np

# Load the dataset
data = pd.read_csv("bank-loan.csv")
loan = data.copy()

from sklearn.model_selection import train_test_split

# Separate the predictors from the target column 'default'
X = loan.drop(['default'], axis=1)
Y = loan['default'].astype(str)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)

# Naive Bayes algorithm
from sklearn.naive_bayes import GaussianNB
Naive = GaussianNB().fit(X_train, Y_train)
target = Naive.predict(X_test)
print(target)
Here, we have applied the Gaussian Naive Bayes classifier using GaussianNB() to predict whether a customer is a loan defaulter (1) or not (0).
Output:
array(['0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '1', '1', '0', '0', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1', '1', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1', '0', '0', '1', '0', '0', '1', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0', '1', '0', '0', '0', '1', '0', '1', '0', '1', '0', '0', '0', '0', '0', '0', '0', '1', '0', '1', '0', '0', '1', '1', '0', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0', '0', '0', '0', '0', '1', '0', '0', '0', '0'], dtype='<U1')
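To get a rough sense of how well these predictions line up with the held-out labels, one option (not part of the original snippet, and assuming the Y_test and target variables from the example above) is scikit-learn's accuracy_score:

from sklearn.metrics import accuracy_score

# Compare the predicted labels against the true test labels
print("Test accuracy:", accuracy_score(Y_test, target))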
Types of Naive Bayes Algorithms
Naive Bayes can be further classified into the following types:
- Bernoulli Naive Bayes
- Multinomial Naive Bayes
- Gaussian Naive Bayes
Let us have a look at each one of them in detail in the below section.
1. Bernoulli Naive Bayes
It is based on the Bernoulli distribution of the data. It is useful for binary classification, i.e., when the outcome can take only two possible values.
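As a rough, self-contained sketch (with made-up binary features, not the loan data above), scikit-learn's BernoulliNB can be used like this:

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary features: each column is a yes/no attribute (invented data)
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0])  # two possible outcomes

model = BernoulliNB().fit(X, y)
print(model.predict([[1, 0, 0]]))  # predicts one of the two classes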
2. Multinomial Naive Bayes
It is a discrete classification algorithm and is used when the features represent the frequency of occurrence of a term, such as word counts in a document.
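A common pattern, sketched here with an invented toy corpus, is to pair MultinomialNB with term-count features from CountVectorizer:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented example sentences and labels
texts = ["free offer win money", "meeting at noon", "win a free prize", "project meeting schedule"]
labels = ["spam", "ham", "spam", "ham"]

# Term frequencies become the discrete features
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)

model = MultinomialNB().fit(counts, labels)
print(model.predict(vectorizer.transform(["free money offer"])))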
3. Gaussian Naive Bayes
In Gaussian Naive Bayes, we assume that the continuous variables follow a normal (Gaussian) distribution. Here, the mean and variance of each feature are estimated per class using the maximum likelihood approach.
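As a small self-contained sketch with invented continuous features, the per-class means and variances estimated by GaussianNB can be inspected after fitting (the attribute name var_ assumes a recent scikit-learn version; older versions expose it as sigma_):

import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented continuous features for two classes
X = np.array([[1.0, 2.1], [0.9, 1.9], [3.2, 4.0], [3.0, 4.2]])
y = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X, y)
print(model.theta_)  # per-class mean of each feature
print(model.var_)    # per-class variance of each feature (sigma_ in older versions)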
Advantages of Naive Bayes
- Robust to Missing or NULL values.
- As this algorithm uses a simple probabilistic approach, it is less prone to overfitting.
- Performs well for multiclass classification.
- Fast to train and easy to apply.
Limitations of Naive Bayes
- Zero Frequency problem: it arises when the algorithm assigns zero probability to a categorical value that does not appear in the training data. It can be overcome using smoothing techniques such as Laplace smoothing (see the sketch after this list).
- The assumption of independent predictor variables rarely holds in real-world datasets.
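In scikit-learn, additive (Laplace) smoothing is controlled through the alpha parameter of the discrete Naive Bayes classifiers; a minimal sketch with invented count data:

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Invented count features; alpha=1.0 applies Laplace smoothing so that a
# feature value never seen for a class is not assigned zero probability
X = np.array([[2, 1, 0], [3, 0, 0], [0, 0, 4], [0, 1, 3]])
y = np.array(["spam", "spam", "ham", "ham"])

model = MultinomialNB(alpha=1.0).fit(X, y)
print(model.predict([[0, 2, 0]]))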
Application of Naive Bayes
- Multi-class Prediction of data groups
- Recommender Systems
- Text Classification
- Sentiment Analysis
- Spam Filtering
Conclusion
With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
For more such posts related to Python, stay tuned @ Python with JournalDev and till then, Happy Learning!!