Naive Bayes estimators are probabilistic estimators based on `Bayes theorem` with the assumption that features are strongly independent of each other. Bayes theorem gives the probability of an event based on prior knowledge of conditions related to that event. Naive Bayes classifiers have worked quite well for document classification and spam filtering applications. They need only a small amount of training data to estimate the required probabilities and therefore train quite fast.
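As a minimal sketch of how Bayes theorem combines a prior with evidence, the snippet below computes the probability that a message is spam given that it contains a particular word. The helper name `posterior` and all the numbers are made up for illustration and are not taken from any dataset used in this tutorial.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(class | evidence) for a two-class problem via Bayes theorem."""
    # Total probability of seeing the evidence under both classes.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam, the word "offer" appears
# in 60% of spam messages and in 5% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print('P(spam | "offer") = %.3f' % p_spam_given_word)
```

A naive Bayes classifier applies the same idea with many features at once, multiplying the per-feature likelihoods because the features are assumed to be independent given the class.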

This tutorial covers four of the Naive Bayes estimators that Scikit-Learn provides. They differ in the distribution they assume for the probability of a feature value given a class:

- **BernoulliNB** - It represents a classifier for data that follows multivariate Bernoulli distributions. The data can have multiple features, but each one is assumed to be a binary variable.
- **GaussianNB** - It represents a classifier based on the assumption that the likelihood of the features is Gaussian.
- **ComplementNB** - It represents a classifier that uses the complement of each class to compute model weights. It's an adaptation of standard multinomial naive Bayes which is well suited for imbalanced classification problems.
- **MultinomialNB** - It represents a classifier that is suited for multinomially distributed data.

We'll be explaining the usage of each one of the naive Bayes variants with examples.

We'll start by importing the necessary libraries for our tutorial.

In [1]:

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
np.set_printoptions(precision=2)
%matplotlib inline
```

We'll be using the digits dataset for our explanation. It contains images of the digits 0-9, each as an `8x8` pixel image. Each sample image is stored as a vector of size `64`.
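As a small illustration of the relation between the flat vector and the image, the sketch below reshapes one sample back into its `8x8` grid. The variable names `first_vector` and `first_image` are only used here.

```python
from sklearn.datasets import load_digits

digits = load_digits()

first_vector = digits.data[0]             # flat vector of 64 pixel values
first_image = first_vector.reshape(8, 8)  # back to the 8x8 image grid

print('Vector shape :', first_vector.shape)
print('Image shape  :', first_image.shape)
print('Label        :', digits.target[0])
```

The dataset also exposes the already reshaped images through `digits.images`, which can be passed to `plt.imshow()` to view a digit.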

In [2]:

```
# Only load_digits is needed here; load_boston was removed in scikit-learn 1.2.
from sklearn.datasets import load_digits
digits = load_digits()
X_digits, Y_digits = digits.data, digits.target
print('Dataset Size : ', X_digits.shape, Y_digits.shape)
```

We'll split the dataset into two parts:

- `Training data` which will be used for training the model.
- `Test data` against which the accuracy of the trained model will be checked.

The `train_test_split` function of the `model_selection` module of sklearn will help us split the data into two sets, with `80%` for training and `20%` for testing. We are also passing `random_state=123` to train_test_split so that we always get the same split and can reproduce results in the future as well.

NOTE

Please note that we are also using the **stratify** parameter, which keeps the class proportions the same in the train and test sets. For each class, we'll have 80% of the samples in the train set and 20% in the test set. This makes sure that no class dominates either set.
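One way to see what stratification does is to compare the share of each digit in the full data with its share in the two splits. The snippet below is a sketch using the same split settings as this tutorial. The names `full_share`, `train_share` and `test_share` are introduced only for this check.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
_, _, y_train, y_test = train_test_split(
    X, y, train_size=0.80, test_size=0.20, stratify=y, random_state=123)

# Fraction of samples belonging to each digit class in every set.
full_share = np.bincount(y) / len(y)
train_share = np.bincount(y_train) / len(y_train)
test_share = np.bincount(y_test) / len(y_test)

print('Full  :', full_share.round(3))
print('Train :', train_share.round(3))
print('Test  :', test_share.round(3))
```

With `stratify` set, the three rows should be nearly identical. Without it, the shares in the smaller test set can drift further from the full data.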

In [3]:

```
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, stratify=Y_digits, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
```

The first estimator that we'll be introducing is `BernoulliNB`, available from the `naive_bayes` module of sklearn. We'll first fit it to the data with default parameters and then try to improve its performance through hyperparameter tuning. We'll also evaluate its performance using a confusion matrix and go over the important attributes of `BernoulliNB` which can give helpful insight once the model is trained. Note that with the default `binarize=0.0` setting, `BernoulliNB` treats every pixel value above 0 as 1.

We'll fit the model to the train data by calling the `fit()` method of the estimator with the train features and train labels. We are fitting a default model without setting any parameter explicitly.

In [4]:

```
from sklearn.naive_bayes import BernoulliNB
bernoulli_nb = BernoulliNB()
bernoulli_nb.fit(X_train, Y_train)
```

Almost all models in the Scikit-Learn API provide a `predict()` method which can be used to predict the target variable for the test set passed to it.

In [5]:

```
Y_preds = bernoulli_nb.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%bernoulli_nb.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%bernoulli_nb.score(X_train, Y_train))
```
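Besides hard labels, the naive Bayes estimators in sklearn also provide `predict_proba()`, which returns one probability per class for each sample. The sketch below refits a default `BernoulliNB` so that it is self-contained, and checks that the most probable class matches the output of `predict()`. The names `probs` and `labels_from_probs` are only used here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, test_size=0.20, stratify=y, random_state=123)

model = BernoulliNB().fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1.
probs = model.predict_proba(X_test[:5])
labels_from_probs = model.classes_[probs.argmax(axis=1)]

print('Probability shape  :', probs.shape)
print('Most likely labels :', labels_from_probs)
print('predict() output   :', model.predict(X_test[:5]))
```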

We'll plot the confusion matrix to better understand the performance of our model. We have written a helper function `plot_confusion_matrix()` which accepts the original labels and the predicted labels of the data and plots a confusion matrix. We'll reuse this function when training the other estimators.

In [6]:

```
from sklearn.metrics import confusion_matrix
def plot_confusion_matrix(Y_test, Y_preds):
    # Rows are actual digits, columns are predicted digits.
    conf_mat = confusion_matrix(Y_test, Y_preds)
    #print(conf_mat)
    fig = plt.figure(figsize=(6,6))
    plt.matshow(conf_mat, cmap=plt.cm.Blues, fignum=1)
    plt.yticks(range(10), range(10))
    plt.xticks(range(10), range(10))
    plt.colorbar();
    # Write each count into its cell: x is the column (i), y is the row (j).
    for i in range(10):
        for j in range(10):
            plt.text(i-0.2,j+0.1, str(conf_mat[j, i]), color='tab:red')
```

In [ ]:

```
plot_confusion_matrix(Y_test, bernoulli_nb.predict(X_test))
```
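The same matrix can also be summarized numerically. The sketch below uses a tiny made-up set of labels (not the digits data) to show how the fraction of correctly predicted samples per class can be read from the diagonal. The arrays `y_true` and `y_pred` are hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem.
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 2, 1])

conf_mat = confusion_matrix(y_true, y_pred)

# Diagonal = correct predictions, row sums = actual samples per class.
per_class_recall = conf_mat.diagonal() / conf_mat.sum(axis=1)
overall_accuracy = conf_mat.trace() / conf_mat.sum()

print(conf_mat)
print('Per-class recall :', per_class_recall.round(3))
print('Accuracy         : %.3f' % overall_accuracy)
```

Applied to the digits predictions, a low value for one class points to a digit that the model often confuses with another one.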

Below is a list of important attributes available through an estimator instance of BernoulliNB.

- `class_log_prior_` - It represents the log probability of each class.
- `feature_log_prob_` - It represents the log probability of a particular feature given a class, as an array of shape `(n_classes x n_features)`.

In [8]:

```
bernoulli_nb.class_log_prior_
```

In [9]:

```
print("Log Probability of Each Feature per class : ", bernoulli_nb.feature_log_prob_.shape)
```
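To make the log values easier to read, they can be turned back into probabilities with `np.exp()`. The sketch below assumes the default `fit_prior=True` setting, under which the class priors are the class frequencies in the data passed to `fit()`. It fits on the full digits data only to stay short.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.naive_bayes import BernoulliNB

X, y = load_digits(return_X_y=True)
model = BernoulliNB().fit(X, y)

# Exponentiating the log priors gives ordinary probabilities.
class_prior = np.exp(model.class_log_prior_)
# Share of each digit in the data used for fitting.
class_share = np.bincount(y) / len(y)

print('Priors from model :', class_prior.round(3))
print('Class frequencies :', class_share.round(3))
print('Feature log prob shape :', model.feature_log_prob_.shape)
```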

Below is the hyperparameter that commonly needs tuning to get the best fit for our data. We'll try various settings on different cross-validation splits to find a fit that has nearly the same accuracy on the train and test sets.

**alpha** - It accepts a float value representing the additive smoothing parameter. A value of `0.0` represents no smoothing. The default value of this parameter is `1.0`.
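The effect of `alpha` is easiest to see on a feature that is never active for a class. The sketch below uses a tiny made-up dataset in which feature 1 is always 0 for class 1. For `BernoulliNB`, the smoothed estimate is `(count + alpha) / (n_samples_in_class + 2 * alpha)`, so the probability moves away from zero as `alpha` grows. The data and the `smoothed` dictionary are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary data: class 1 has 3 samples, feature 1 is never active.
X = np.array([[1, 0], [1, 0], [1, 0], [0, 0]])
y = np.array([1, 1, 1, 0])

smoothed = {}
for alpha in [0.01, 1.0, 10.0]:
    model = BernoulliNB(alpha=alpha).fit(X, y)
    # Row 1 is class 1, column 1 is the feature that was never active.
    smoothed[alpha] = np.exp(model.feature_log_prob_[1, 1])
    expected = (0 + alpha) / (3 + 2 * alpha)
    print('alpha=%5.2f  P(feature=1 | class=1) = %.4f  (formula: %.4f)'
          % (alpha, smoothed[alpha], expected))
```

Small values of `alpha` keep the estimate close to the raw counts, while large values pull every feature probability towards 0.5.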

`GridSearchCV` is a wrapper class provided by sklearn which loops through all parameter combinations provided as the `param_grid` parameter, with the number of cross-validation folds provided as the `cv` parameter. It evaluates model performance on all combinations and stores all results in the `cv_results_` attribute. It also stores the model that performs best across the cross-validation folds in the `best_estimator_` attribute and the best score in the `best_score_` attribute.
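The `cv_results_` attribute is a dictionary of arrays, so it is convenient to view it as a pandas dataframe. The sketch below runs a small search and lists the mean score per setting. The three `alpha` values are chosen only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

X, y = load_digits(return_X_y=True)

grid = GridSearchCV(BernoulliNB(), param_grid={'alpha': [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)

# One row per parameter combination, with per-fold and aggregated scores.
results = pd.DataFrame(grid.cv_results_)
print(results[['param_alpha', 'mean_test_score', 'std_test_score', 'rank_test_score']])
print('Best parameters :', grid.best_params_)
print('Best score      : %.3f' % grid.best_score_)
```

The `best_score_` value is the highest `mean_test_score` in this table, that is, the mean accuracy over the folds for the best setting.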

NOTE

The **n_jobs** parameter is provided by many estimators. It accepts the number of cores to use for parallelization. If a value of **-1** is given, all cores are used. The joblib parallel processing library runs the work in parallel in the background.

Below we try various values of the above-mentioned hyperparameter to find the best estimator for our dataset using `5-fold cross-validation`.

In [10]:

```
%%time
from sklearn.model_selection import GridSearchCV
params = {'alpha': [0.01, 0.1, 0.5, 1.0, 10.0],
}
bernoulli_nb_grid = GridSearchCV(BernoulliNB(), param_grid=params, n_jobs=-1, cv=5, verbose=5)
# Search on the training split only so the test set stays unseen during tuning.
bernoulli_nb_grid.fit(X_train, Y_train)
print('Train Accuracy : %.3f'%bernoulli_nb_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%bernoulli_nb_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%bernoulli_nb_grid.best_score_)
print('Best Parameters : ',bernoulli_nb_grid.best_params_)
```

Below we are plotting the confusion matrix again with the best estimator that we found out using grid search.

In [ ]:

```
plot_confusion_matrix(Y_test, bernoulli_nb_grid.best_estimator_.predict(X_test))
```

The second estimator that we'll be introducing is `GaussianNB`, available from the `naive_bayes` module of sklearn. We'll fit it to the data with default parameters and evaluate its performance using a confusion matrix. We'll also go over the important attributes of `GaussianNB` which can give helpful insight once the model is trained.

In [12]:

```
from sklearn.naive_bayes import GaussianNB
gaussian_nb = GaussianNB()
gaussian_nb.fit(X_train, Y_train)
```

Almost all models in the Scikit-Learn API provide a `predict()` method which can be used to predict the target variable for the test set passed to it.

In [13]:

```
Y_preds = gaussian_nb.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%gaussian_nb.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%gaussian_nb.score(X_train, Y_train))
```

In [ ]:

```
plot_confusion_matrix(Y_test, gaussian_nb.predict(X_test))
```

Below is a list of important attributes available through an estimator instance of GaussianNB.

- `class_prior_` - It represents the probability of each class.
- `epsilon_` - It represents the absolute additive value to variances.
- `var_` - It represents the variance of each feature per class, as an array of shape `(n_classes x n_features)`. It was named `sigma_` before scikit-learn 1.0.
- `theta_` - It represents the mean of each feature per class, as an array of shape `(n_classes x n_features)`.
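These attributes are plain summary statistics of the training data. The sketch below recomputes the per-class means and the class priors by hand and compares them with the fitted attributes. It fits on the full digits data to stay short, and the names `manual_means` and `manual_prior` are only used here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
model = GaussianNB().fit(X, y)

# Mean of every pixel, computed separately for each digit class.
manual_means = np.array([X[y == label].mean(axis=0) for label in model.classes_])
# Share of each digit class in the data.
manual_prior = np.bincount(y) / len(y)

print('theta_ shape         :', model.theta_.shape)
print('Means match theta_   :', np.allclose(model.theta_, manual_means))
print('Priors match         :', np.allclose(model.class_prior_, manual_prior))
```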

In [15]:

```
gaussian_nb.class_prior_
```

In [16]:

```
gaussian_nb.epsilon_
```

In [17]:

```
# `sigma_` was renamed to `var_` in scikit-learn 1.0 and later removed.
print("Gaussian Naive Bayes Variance Shape : ", gaussian_nb.var_.shape)
```

In [18]:

```
print("Gaussian Naive Bayes Theta Shape : ", gaussian_nb.theta_.shape)
```
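`GaussianNB` has no `alpha` parameter. Its smoothing is controlled by `var_smoothing`, whose default value is `1e-9`. The sketch below shows one possible way to tune it with `GridSearchCV`. The grid of values is an assumption chosen for illustration, not a recommended setting.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, test_size=0.20, stratify=y, random_state=123)

# Illustrative grid: ten values from 1e-9 (the default) up to 1.0.
params = {'var_smoothing': np.logspace(-9, 0, 10)}

grid = GridSearchCV(GaussianNB(), param_grid=params, cv=5)
grid.fit(X_train, y_train)  # the test split is kept out of the search

print('Best Parameters : ', grid.best_params_)
print('Best Accuracy Through Grid Search : %.3f' % grid.best_score_)
print('Test Accuracy : %.3f' % grid.best_estimator_.score(X_test, y_test))
```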

The third estimator that we'll be introducing is `ComplementNB`, available from the `naive_bayes` module of sklearn. We'll first fit it to the data with default parameters and then try to improve its performance through hyperparameter tuning. We'll also evaluate its performance using a confusion matrix and go over the important attributes of `ComplementNB` which can give helpful insight once the model is trained.

In [19]:

```
from sklearn.naive_bayes import ComplementNB
complement_nb = ComplementNB()
complement_nb.fit(X_train, Y_train)
```

Almost all models in the Scikit-Learn API provide a `predict()` method which can be used to predict the target variable for the test set passed to it.

In [20]:

```
Y_preds = complement_nb.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%complement_nb.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%complement_nb.score(X_train, Y_train))
```

In [ ]:

```
plot_confusion_matrix(Y_test, complement_nb.predict(X_test))
```

Below is a list of important attributes available through an estimator instance of ComplementNB.

- `class_log_prior_` - It represents the log probability of each class.
- `feature_log_prob_` - It holds the per-feature weights computed from the complement of each class, as an array of shape `(n_classes x n_features)`.

In [22]:

```
complement_nb.class_log_prior_
```

In [23]:

```
print("Log Probability of Each Feature per class : ", complement_nb.feature_log_prob_.shape)
```

Below is the hyperparameter that commonly needs tuning to get the best fit for our data. We'll try various settings on different cross-validation splits to find a fit that has nearly the same accuracy on the train and test sets.

**alpha** - It accepts a float value representing the additive smoothing parameter. A value of `0.0` represents no smoothing. The default value of this parameter is `1.0`.

Below we try various values of the above-mentioned hyperparameter to find the best estimator for our dataset using `5-fold cross-validation`.

In [24]:

```
%%time
params = {'alpha': [0.01, 0.1, 0.5, 1.0, 10.0, ],
}
complement_nb_grid = GridSearchCV(ComplementNB(), param_grid=params, n_jobs=-1, cv=5, verbose=5)
# Search on the training split only so the test set stays unseen during tuning.
complement_nb_grid.fit(X_train, Y_train)
print('Train Accuracy : %.3f'%complement_nb_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%complement_nb_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%complement_nb_grid.best_score_)
print('Best Parameters : ',complement_nb_grid.best_params_)
```

Below we are plotting the confusion matrix again with the best estimator that we found out using grid search.

In [ ]:

```
plot_confusion_matrix(Y_test, complement_nb_grid.best_estimator_.predict(X_test))
```

The fourth estimator that we'll be introducing is `MultinomialNB`, available from the `naive_bayes` module of sklearn. We'll first fit it to the data with default parameters and then try to improve its performance through hyperparameter tuning. We'll also evaluate its performance using a confusion matrix and go over the important attributes of `MultinomialNB` which can give helpful insight once the model is trained.

In [26]:

```
from sklearn.naive_bayes import MultinomialNB
multinomial_nb = MultinomialNB()
multinomial_nb.fit(X_train, Y_train)
```

Almost all models in the Scikit-Learn API provide a `predict()` method which can be used to predict the target variable for the test set passed to it.

In [27]:

```
Y_preds = multinomial_nb.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%multinomial_nb.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%multinomial_nb.score(X_train, Y_train))
```

In [ ]:

```
plot_confusion_matrix(Y_test, multinomial_nb.predict(X_test))
```

Below is a list of important attributes available through an estimator instance of MultinomialNB.

- `class_log_prior_` - It represents the log probability of each class.
- `feature_log_prob_` - It represents the log probability of a particular feature given a class, as an array of shape `(n_classes x n_features)`.

In [29]:

```
multinomial_nb.class_log_prior_
```

In [30]:

```
print("Log Probability of Each Feature per class : ", multinomial_nb.feature_log_prob_.shape)
```
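For `MultinomialNB`, each row of `feature_log_prob_` describes a distribution over the features for one class, so the exponentiated values in a row add up to 1. The sketch below checks this and recomputes the values from the stored `feature_count_` attribute with the default `alpha=1.0`. It fits on the full digits data to stay short, and the name `manual_prob` is only used here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.naive_bayes import MultinomialNB

X, y = load_digits(return_X_y=True)
model = MultinomialNB().fit(X, y)

# Back from log space to probabilities.
feature_prob = np.exp(model.feature_log_prob_)
row_sums = feature_prob.sum(axis=1)

# Same values from the raw per-class feature totals plus alpha=1.0 smoothing.
smoothed_counts = model.feature_count_ + 1.0
manual_prob = smoothed_counts / smoothed_counts.sum(axis=1, keepdims=True)

print('Row sums :', row_sums.round(3))
print('Matches manual computation :', np.allclose(feature_prob, manual_prob))
```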

Below is the hyperparameter that commonly needs tuning to get the best fit for our data. We'll try various settings on different cross-validation splits to find a fit that has nearly the same accuracy on the train and test sets.

**alpha** - It accepts a float value representing the additive smoothing parameter. A value of `0.0` represents no smoothing. The default value of this parameter is `1.0`.

Below we try various values of the above-mentioned hyperparameter to find the best estimator for our dataset using `5-fold cross-validation`.

In [31]:

```
%%time
params = {'alpha': [0.01, 0.1, 0.5, 1.0, 10.0, ],
}
multinomial_nb_grid = GridSearchCV(MultinomialNB(), param_grid=params, n_jobs=-1, cv=5, verbose=5)
# Search on the training split only so the test set stays unseen during tuning.
multinomial_nb_grid.fit(X_train, Y_train)
print('Train Accuracy : %.3f'%multinomial_nb_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%multinomial_nb_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%multinomial_nb_grid.best_score_)
print('Best Parameters : ',multinomial_nb_grid.best_params_)
```

Below we are plotting the confusion matrix again with the best estimator that we found out using grid search.

In [ ]:

```
plot_confusion_matrix(Y_test, multinomial_nb_grid.best_estimator_.predict(X_test))
```

This ends our small tutorial introducing the various naive Bayes implementations available in scikit-learn. Please feel free to let us know your views in the comments section.

Sunny Solanki
