Scikit-Learn - Ensemble Learning : Boosting

Introduction

Boosting is a type of ensemble learning in which estimators are trained sequentially rather than independently in parallel. The idea is to build a series of fast, simple models (weak learners that are only somewhat better than a random guess), each one trying to correct the mistakes of those before it, and then combine the results of all weak estimators to make the final prediction. We have already discussed another ensemble learning method in our tutorial on bagging & random forests. Please feel free to go through it if you want to learn about it.

Scikit-learn provides two different boosting algorithms for classification and regression problems:

  • Gradient Tree Boosting (Gradient Boosted Decision Trees) - It builds the ensemble iteratively. It starts with a single learner and then adds one tree at a time, where each new tree is trained on the errors (residuals) of the ensemble built so far, so that every added tree reduces the loss. It uses decision trees as weak estimators. Scikit-learn provides two classes which implement Gradient Tree Boosting for classification and regression problems.

    • GradientBoostingClassifier
    • GradientBoostingRegressor
  • Adaptive Boost - It fits a sequence of weak estimators iteratively on repeatedly reweighted data and then combines the results of all estimators through a weighted vote to generate the final result. Initially, all samples are assigned the same weight (1 / n_samples). At each iteration, the weights of samples which were predicted wrong in the previous iteration are increased, and the weights of samples which were predicted right are decreased. This makes later estimators concentrate on the samples that earlier ones got wrong. It lets us specify which estimator to use as the weak learner. Scikit-learn provides two classes which implement Adaptive Boosting for classification and regression problems.

    • AdaBoostClassifier
    • AdaBoostRegressor
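
To make the idea behind Gradient Tree Boosting concrete, below is a minimal hand-written sketch for the squared-error case, where the "errors" each new tree learns are simply the residuals of the current ensemble. The toy sine data, the tree depth, and the learning rate are our own assumptions for illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: a noisy sine curve.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1   # shrinks the contribution of every tree
n_stages = 50         # number of trees added one after another

# Stage 0: start from a constant prediction (the mean of the targets).
prediction = np.full_like(y, y.mean())
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(n_stages):
    residuals = y - prediction              # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                  # the new weak learner models those errors
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print('Training MSE before boosting : %.4f' % initial_mse)
print('Training MSE after boosting  : %.4f' % final_mse)
```

Each stage only nudges the prediction by a fraction of what the new tree suggests, which is why many small trees are usually combined.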

This ends our small introduction to the Boosting process. We'll now start with the coding part.

We'll start by importing necessary libraries.

In [1]:
import numpy as np
import pandas as pd

import sklearn
import warnings

warnings.filterwarnings("ignore")

np.set_printoptions(precision=3)
%matplotlib inline

Load Dataset

We'll be loading the two datasets mentioned below for our purpose.

  • Digits Dataset: We'll be using the digits dataset which has images of size 8x8 for digits 0-9. We'll use digits data for classification tasks below.
  • Boston Housing Dataset: We'll be using the Boston housing dataset which has information about various house properties like average number of rooms, per capita crime rate in town, etc. We'll be using it for regression tasks.

Sklearn provides both of these datasets as a part of the datasets module. We can load them by calling the load_digits() and load_boston() functions. Each returns a dictionary-like Bunch object which can be used to retrieve features and target.

In [2]:
from sklearn.datasets import load_boston, load_digits

digits = load_digits()
X_digits, Y_digits = digits.data, digits.target

print('Dataset Size : ',X_digits.shape, Y_digits.shape)
Dataset Size :  (1797, 64) (1797,)
In [3]:
boston = load_boston()
X_boston, Y_boston = boston.data, boston.target
print('Dataset Size : ',X_boston.shape, Y_boston.shape)
Dataset Size :  (506, 13) (506,)
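
Please note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the cell above only runs on older releases. The sketch below shows how the Bunch object can be accessed and, as one possible workaround on newer releases, loads the bundled diabetes dataset as a stand-in regression dataset. The choice of load_diabetes() is our assumption; the scores printed in the rest of this tutorial will not carry over to it.

```python
from sklearn.datasets import load_diabetes, load_digits

# A Bunch exposes its fields both as dictionary keys and as attributes.
digits = load_digits()
print('Bunch keys : ', list(digits.keys()))
print('Same array via key and attribute : ', digits['data'] is digits.data)

# Assumption: load_diabetes() stands in for load_boston() on newer releases.
diabetes = load_diabetes()
X_reg, Y_reg = diabetes.data, diabetes.target
print('Dataset Size : ', X_reg.shape, Y_reg.shape)
```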

GradientBoostingRegressor

The GradientBoostingRegressor is available as a part of the ensemble module of sklearn. We'll be training the default model with Boston housing data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other regression estimators to check its performance relative to other machine learning models.

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)
In [5]:
from sklearn.ensemble import GradientBoostingRegressor

grad_boosting_regressor = GradientBoostingRegressor()
grad_boosting_regressor.fit(X_train, Y_train)
Out[5]:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort='auto',
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)
In [6]:
Y_preds = grad_boosting_regressor.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test R^2 Score : %.3f'%grad_boosting_regressor.score(X_test, Y_test)) ## For regression models, the score method returns the R^2 score.
print('Training R^2 Score : %.3f'%grad_boosting_regressor.score(X_train, Y_train))
[33.731 26.108 48.711 18.784 31.065 43.077 25.474  9.03  18.201 29.294
 22.577 19.06  15.871 24.611 19.605]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.  32.2 24.1 18.5 13.5 27.
 23.1]
Test R^2 Score : 0.812
Training R^2 Score : 0.979
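
The R^2 value returned by score() can be reproduced by hand, which helps when interpreting the numbers above. The sketch below uses synthetic data from make_regression (an assumption, chosen only to keep the example self-contained) and compares a manual computation with r2_score() and score().

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data, used only to keep the example self-contained.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=123)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=123)

model = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_te - y_hat) ** 2)
ss_tot = np.sum((y_te - y_te.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print('Manual R^2  : %.3f' % r2_manual)
print('r2_score    : %.3f' % r2_score(y_te, y_hat))
print('model.score : %.3f' % model.score(X_te, y_te))
```

A value of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean of the targets.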

Important Attributes of GradientBoostingRegressor

Below are some of the important attributes of GradientBoostingRegressor which can provide important information once the model is trained.

  • feature_importances_ - It returns an array of floats representing the importance of each feature in the dataset.
  • estimators_ - It returns trained estimators.
  • oob_improvement_ - It returns an array of size (n_estimators,). Each value in the array represents the improvement in loss on the out-of-bag samples relative to the previous iteration. It is only available when subsample < 1.0.
  • loss_ - It returns loss function as object.
In [7]:
print("Feature Importances : ", grad_boosting_regressor.feature_importances_)
Feature Importances :  [1.551e-02 3.064e-04 1.025e-03 4.960e-05 3.208e-02 4.883e-01 1.295e-02
 5.908e-02 1.602e-03 1.187e-02 2.461e-02 5.472e-03 3.472e-01]
In [8]:
print("Estimators Shape: ", grad_boosting_regressor.estimators_.shape)

grad_boosting_regressor.estimators_[:2]
Estimators Shape:  (100, 1)
Out[8]:
array([[DecisionTreeRegressor(criterion='friedman_mse', max_depth=3, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort='auto',
                      random_state=RandomState(MT19937) at 0x7F0EF00E7780,
                      splitter='best')],
       [DecisionTreeRegressor(criterion='friedman_mse', max_depth=3, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort='auto',
                      random_state=RandomState(MT19937) at 0x7F0EF00E7780,
                      splitter='best')]], dtype=object)
In [9]:
print("Loss : ", grad_boosting_regressor.loss_)
Loss :  <sklearn.ensemble._gb_losses.LeastSquaresError object at 0x7f0eac9d4b00>
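
The model above was trained with the default subsample=1.0, so there are no out-of-bag samples and oob_improvement_ is not printed. Below is a small sketch, on assumed synthetic data, of how the attribute could be inspected by training with subsample=0.8. It also prints train_score_, which holds the training loss at each stage.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical synthetic data, used only to keep the example self-contained.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# subsample < 1.0 fits every tree on a random fraction of the rows,
# which leaves out-of-bag rows at each stage.
sgb = GradientBoostingRegressor(n_estimators=50, subsample=0.8, random_state=1)
sgb.fit(X, y)

print("oob_improvement_ shape : ", sgb.oob_improvement_.shape)
print("train_score_ shape     : ", sgb.train_score_.shape)
print("estimators_ shape      : ", sgb.estimators_.shape)
print("First 5 OOB improvements : ", sgb.oob_improvement_[:5])
```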

Finetuning Model By Doing Grid Search On Various Hyperparameters

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on different train/validation splits of the data to find the best fit, ideally one with almost the same score on both the train and test datasets, or with only a small difference between the two.

  • learning_rate - It shrinks the contribution of each tree. There is a trade-off between learning_rate and n_estimators.
  • n_estimators - Number of base estimators whose results will be combined to produce the final prediction. default=100
  • max_depth - Maximum depth of the individual trees. We need to find the best value. default=3
  • min_samples_split - Minimum number of samples required to split an internal node. It accepts int (2 to n_samples) or float (0.0, 1.0] values. A float is treated as a fraction, so ceil(min_samples_split * n_samples) samples are required. default=2
  • min_samples_leaf - Minimum number of samples required to be at a leaf node. It accepts int (1 to n_samples) or float (0.0, 0.5] values. A float is treated as a fraction, so ceil(min_samples_leaf * n_samples) samples are required. default=1
  • criterion - Function used to measure the quality of a split. In the scikit-learn version used here it supports friedman_mse, mse (mean squared error) & mae (mean absolute error). default=friedman_mse
  • max_features - Number of features to consider when looking for the best split. It accepts int (1 to n_features), float (0.0, 1.0], string (sqrt, log2, auto) or None as value. default=None
    • None - all n_features are used if None is provided.
    • sqrt - sqrt(n_features) features are used for split.
    • auto - n_features are used for split by the regressor (the classifier uses sqrt(n_features)).
    • log2 - log2(n_features) features are used for split.
  • validation_fraction - Proportion of training data to set aside as a validation set for early stopping. It is only used when n_iter_no_change is set. It accepts float in (0.0, 1.0). default=0.1
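
The trade-off between learning_rate and n_estimators can be explored without refitting for every tree count. The sketch below, on assumed synthetic data, uses staged_predict() to score the test set after each added tree for a few learning rates; the specific learning rates and the 300-tree budget are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data, used only to keep the example self-contained.
X, y = make_regression(n_samples=400, n_features=10, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

staged_scores = {}
for lr in [0.01, 0.1, 0.5]:
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=300, random_state=1)
    model.fit(X_tr, y_tr)
    # staged_predict yields test predictions after 1, 2, ..., n_estimators trees.
    scores = [r2_score(y_te, y_hat) for y_hat in model.staged_predict(X_te)]
    staged_scores[lr] = scores
    best_n = int(np.argmax(scores)) + 1
    print('learning_rate=%.2f : best test R^2 %.3f reached at %d trees' % (lr, max(scores), best_n))
```

A smaller learning_rate generally needs more trees to reach its best score, which is why the two parameters are usually tuned together.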

Below, we'll try various values for the above-mentioned hyperparameters to find the best estimator for our dataset by doing 3-fold cross-validation on the data.

In [10]:
%%time

from sklearn.model_selection import GridSearchCV

n_samples = X_boston.shape[0]
n_features = X_boston.shape[1]

params = {'n_estimators': np.arange(100, 301, 50),
          'max_depth': [None, 3, 5,],
          'min_samples_split': [2, 0.3, 0.5, n_samples//2, ],
          'min_samples_leaf': [1, 0.3, 0.5, n_samples//2, ],
          'criterion': ['friedman_mse', 'mae'],
          'max_features': [None, 'sqrt', 'auto', 'log2', 0.3, 0.7, n_features//2, ],
         }

grad_boost_regressor_grid = GridSearchCV(GradientBoostingRegressor(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
grad_boost_regressor_grid.fit(X_train,Y_train)

print('Train R^2 Score : %.3f'%grad_boost_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%grad_boost_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%grad_boost_regressor_grid.best_score_)
print('Best Parameters : ',grad_boost_regressor_grid.best_params_)
Fitting 3 folds for each of 3360 candidates, totalling 10080 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:   27.6s
[Parallel(n_jobs=-1)]: Done 134 tasks      | elapsed:   29.9s
[Parallel(n_jobs=-1)]: Done 854 tasks      | elapsed:   38.2s
[Parallel(n_jobs=-1)]: Done 1662 tasks      | elapsed:   44.9s
[Parallel(n_jobs=-1)]: Done 2958 tasks      | elapsed:   55.1s
[Parallel(n_jobs=-1)]: Done 4542 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 5503 tasks      | elapsed:  1.8min
[Parallel(n_jobs=-1)]: Done 6281 tasks      | elapsed:  2.8min
[Parallel(n_jobs=-1)]: Done 7186 tasks      | elapsed:  3.5min
[Parallel(n_jobs=-1)]: Done 8124 tasks      | elapsed:  4.2min
[Parallel(n_jobs=-1)]: Done 9301 tasks      | elapsed:  5.3min
[Parallel(n_jobs=-1)]: Done 10080 out of 10080 | elapsed:  5.9min finished
Train R^2 Score : 0.997
Test R^2 Score : 0.776
Best R^2 Score Through Grid Search : 0.891
Best Parameters :  {'criterion': 'friedman_mse', 'max_depth': None, 'max_features': None, 'min_samples_leaf': 1, 'min_samples_split': 0.3, 'n_estimators': 150}
CPU times: user 5.14 s, sys: 254 ms, total: 5.39 s
Wall time: 5min 53s

Printing First Few Cross Validation Results

In [11]:
cross_val_results = pd.DataFrame(grad_boost_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 3360
Out[11]:
mean_fit_time std_fit_time mean_score_time std_score_time param_criterion param_max_depth param_max_features param_min_samples_leaf param_min_samples_split param_n_estimators params split0_test_score split1_test_score split2_test_score mean_test_score std_test_score rank_test_score
0 0.446993 0.001296 0.001833 0.000558 friedman_mse None None 1 2 100 {'criterion': 'friedman_mse', 'max_depth': Non... 0.670962 0.863044 0.680220 0.738218 0.088509 829
1 0.230005 0.156945 0.001789 0.000547 friedman_mse None None 1 2 150 {'criterion': 'friedman_mse', 'max_depth': Non... 0.670966 0.863046 0.680222 0.738221 0.088509 821
2 0.123556 0.003804 0.001345 0.000026 friedman_mse None None 1 2 200 {'criterion': 'friedman_mse', 'max_depth': Non... 0.670966 0.863046 0.680222 0.738221 0.088509 821
3 0.127053 0.001129 0.001377 0.000047 friedman_mse None None 1 2 250 {'criterion': 'friedman_mse', 'max_depth': Non... 0.670966 0.863046 0.680222 0.738221 0.088509 821
4 0.130700 0.001741 0.001365 0.000030 friedman_mse None None 1 2 300 {'criterion': 'friedman_mse', 'max_depth': Non... 0.670966 0.863046 0.680222 0.738221 0.088509 821
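
The first rows of cv_results_ are simply the first parameter combinations tried, not the best ones. A sketch of how the results could be ranked instead is shown below; it runs a deliberately tiny grid on assumed synthetic data so that it finishes quickly.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic data and a tiny grid, used only for illustration.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
small_grid = {'n_estimators': [50, 100], 'max_depth': [2, 3]}

search = GridSearchCV(GradientBoostingRegressor(random_state=1), param_grid=small_grid, cv=3)
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)
columns = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']

# rank_test_score == 1 marks the best mean cross-validation score.
top = results.sort_values('rank_test_score')[columns]
print(top.head())
```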

Comparing Performance Of Gradient Boosting With Bagging, Random Forest, Extra Trees, Decision Tree and Extra Tree

In [12]:
from sklearn import ensemble, tree
## Gradient Boosting Regressor with Default Params
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
                                                     gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))

## Gradient Boosting Regressor with the tuned hyperparameters found above
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1, **grad_boost_regressor_grid.best_params_)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
                                                     gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))

## Random Forest Regressor with Default Params
rforest_regressor = ensemble.RandomForestRegressor(random_state=1)
rforest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_regressor.__class__.__name__,
                                                     rforest_regressor.score(X_train, Y_train),rforest_regressor.score(X_test, Y_test)))


## Extra Trees Regressor with Default Params
extra_forest_regressor = ensemble.ExtraTreesRegressor(random_state=1)
extra_forest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_regressor.__class__.__name__,
                                                     extra_forest_regressor.score(X_train, Y_train),extra_forest_regressor.score(X_test, Y_test)))

## Bagging Regressor with Default Params
bag_regressor = ensemble.BaggingRegressor(random_state=1)
bag_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_regressor.__class__.__name__,
                                                     bag_regressor.score(X_train, Y_train),bag_regressor.score(X_test, Y_test)))


## Decision Tree with Default Parameters
dtree_regressor = tree.DecisionTreeRegressor(random_state=1)
dtree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_regressor.__class__.__name__,
                                                     dtree_regressor.score(X_train, Y_train),dtree_regressor.score(X_test, Y_test)))

## Extra Tree with Default Parameters
extra_tree_regressor = tree.ExtraTreeRegressor(random_state=1)
extra_tree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_regressor.__class__.__name__,
                                                     extra_tree_regressor.score(X_train, Y_train),extra_tree_regressor.score(X_test, Y_test)))
GradientBoostingRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
GradientBoostingRegressor : Train Accuracy : 1.00, Test Accuracy : 0.78
RandomForestRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
ExtraTreesRegressor : Train Accuracy : 1.00, Test Accuracy : 0.83
BaggingRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
DecisionTreeRegressor : Train Accuracy : 1.00, Test Accuracy : 0.44
ExtraTreeRegressor : Train Accuracy : 1.00, Test Accuracy : 0.51

GradientBoostingClassifier

The GradientBoostingClassifier is available as a part of the ensemble module of sklearn. We'll be training the default model with digits data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other classification estimators to check its performance relative to other machine learning models.

In [13]:
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (1437, 64) (360, 64) (1437,) (360,)
In [14]:
from sklearn.ensemble import GradientBoostingClassifier

grad_boosting_classif = GradientBoostingClassifier()
grad_boosting_classif.fit(X_train, Y_train)
Out[14]:
GradientBoostingClassifier(criterion='friedman_mse', init=None,
                           learning_rate=0.1, loss='deviance', max_depth=3,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_iter_no_change=None, presort='auto',
                           random_state=None, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)
In [15]:
Y_preds = grad_boosting_classif.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%grad_boosting_classif.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%grad_boosting_classif.score(X_train, Y_train))
[3 3 4 4 1 3 1 0 7 4 0 6 5 1 6]
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
Test Accuracy : 0.956
Test Accuracy : 0.956
Training Accuracy : 1.000

Important Attributes of GradientBoostingClassifier

The GradientBoostingClassifier has the same set of attributes as that of GradientBoostingRegressor.

In [16]:
print("Feature Importances Shape: ", grad_boosting_classif.feature_importances_.shape)

grad_boosting_classif.feature_importances_[:10]
Feature Importances Shape:  (64,)
Out[16]:
array([0.   , 0.001, 0.011, 0.003, 0.003, 0.059, 0.004, 0.001, 0.001,
       0.002])
In [17]:
print("Estimators Shape : ", grad_boosting_classif.estimators_.shape)
Estimators Shape :  (100, 10)
In [18]:
print("Loss : ", grad_boosting_classif.loss_)
Loss :  <sklearn.ensemble._gb_losses.MultinomialDeviance object at 0x7f0eac97d4a8>
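
The estimators shape of (100, 10) printed above reflects that, for a multi-class problem, one regression tree is fitted per class at every boosting stage. The sketch below checks this on the digits data with only 5 stages (an arbitrary small number chosen to keep it fast).

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_digits(return_X_y=True)

# A small number of stages keeps the example quick.
clf = GradientBoostingClassifier(n_estimators=5, random_state=1).fit(X, y)

n_classes = len(clf.classes_)
tree_type = type(clf.estimators_[0, 0]).__name__

print("Number of classes : ", n_classes)
print("Estimators Shape  : ", clf.estimators_.shape)  # (n_estimators, n_classes)
print("Type of one entry : ", tree_type)
```

Note that the individual trees are regressors even though the ensemble is a classifier, because each tree predicts a correction to a per-class score rather than a class label.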

Finetuning Model By Doing Grid Search On Various Hyperparameters

GradientBoostingClassifier has almost all the same parameters as GradientBoostingRegressor.

In [19]:
%%time

n_samples = X_digits.shape[0]
n_features = X_digits.shape[1]

params = {'n_estimators': [100, 200],
          'max_depth': [None, 2,5,],
          'min_samples_split': [2,0.5, n_samples//2, ],
          'min_samples_leaf': [1, 0.5, n_samples//2, ],
          'criterion': ['friedman_mse', 'mae'],
          'max_features': [None, 'sqrt', 'log2', 0.5, n_features//2,],
         }

grad_boost_classif_grid = GridSearchCV(GradientBoostingClassifier(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
grad_boost_classif_grid.fit(X_train,Y_train)

print('Train Accuracy : %.3f'%grad_boost_classif_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grad_boost_classif_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%grad_boost_classif_grid.best_score_)
print('Best Parameters : ',grad_boost_classif_grid.best_params_)
Fitting 3 folds for each of 540 candidates, totalling 1620 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    7.6s
[Parallel(n_jobs=-1)]: Done  64 tasks      | elapsed:   21.5s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:   40.8s
[Parallel(n_jobs=-1)]: Done 280 tasks      | elapsed:  1.3min
[Parallel(n_jobs=-1)]: Done 442 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  2.7min
[Parallel(n_jobs=-1)]: Done 874 tasks      | elapsed: 13.0min
[Parallel(n_jobs=-1)]: Done 1144 tasks      | elapsed: 30.8min
[Parallel(n_jobs=-1)]: Done 1450 tasks      | elapsed: 48.8min
[Parallel(n_jobs=-1)]: Done 1620 out of 1620 | elapsed: 59.1min finished
Train Accuracy : 1.000
Test Accuracy : 0.972
Best Accuracy Through Grid Search : 0.978
Best Parameters :  {'criterion': 'mae', 'max_depth': 5, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 2, 'n_estimators': 200}
CPU times: user 50.9 s, sys: 236 ms, total: 51.2 s
Wall time: 59min 56s
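
The exhaustive search above took close to an hour. When that is too expensive, one commonly used alternative is RandomizedSearchCV, which evaluates only a fixed number of randomly sampled combinations. The sketch below is our own illustration with a small, assumed parameter space and n_iter=4; it is not a replacement for the results shown above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.80, random_state=123)

# Hypothetical, deliberately small search space.
param_distributions = {
    'n_estimators': [25, 50],
    'max_depth': [2, 3],
    'max_features': ['sqrt', 'log2'],
    'min_samples_split': [2, 0.1],
}

# Only n_iter combinations are sampled and cross-validated.
search = RandomizedSearchCV(GradientBoostingClassifier(random_state=1),
                            param_distributions=param_distributions,
                            n_iter=4, cv=3, random_state=1)
search.fit(X_tr, y_tr)

print('Combinations tried : %d' % len(search.cv_results_['params']))
print('Best CV Accuracy   : %.3f' % search.best_score_)
print('Test Accuracy      : %.3f' % search.best_estimator_.score(X_te, y_te))
print('Best Parameters    : ', search.best_params_)
```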

Printing First Few Cross Validation Results

In [20]:
cross_val_results = pd.DataFrame(grad_boost_classif_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 540
Out[20]:
mean_fit_time std_fit_time mean_score_time std_score_time param_criterion param_max_depth param_max_features param_min_samples_leaf param_min_samples_split param_n_estimators params split0_test_score split1_test_score split2_test_score mean_test_score std_test_score rank_test_score
0 1.755920 0.019092 0.005332 0.001424 friedman_mse None None 1 2 100 {'criterion': 'friedman_mse', 'max_depth': Non... 0.898551 0.897490 0.907563 0.901183 0.004511 116
1 2.450977 0.141080 0.004983 0.000073 friedman_mse None None 1 2 200 {'criterion': 'friedman_mse', 'max_depth': Non... 0.898551 0.897490 0.907563 0.901183 0.004511 116
2 2.336374 0.033164 0.007230 0.000163 friedman_mse None None 1 0.5 100 {'criterion': 'friedman_mse', 'max_depth': Non... 0.954451 0.947699 0.953782 0.951983 0.003037 63
3 4.306882 0.083603 0.013569 0.000535 friedman_mse None None 1 0.5 200 {'criterion': 'friedman_mse', 'max_depth': Non... 0.958592 0.956067 0.953782 0.956159 0.001966 48
4 1.050215 0.109802 0.003766 0.000060 friedman_mse None None 1 898 100 {'criterion': 'friedman_mse', 'max_depth': Non... 0.931677 0.918410 0.915966 0.922060 0.006915 108

Comparing Performance Of Gradient Boosting With Bagging, Random Forest, Extra Trees, Decision Tree and Extra Tree

In [21]:
from sklearn import ensemble, tree

## Gradient Boosting Classifier with Default Params
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
                                                     gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))

## Gradient Boosting Classifier with the tuned hyperparameters found above
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1, **grad_boost_classif_grid.best_params_)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
                                                     gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))

## Random Forest Classifier with Default Params
rforest_classif = ensemble.RandomForestClassifier(random_state=1)
rforest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_classif.__class__.__name__,
                                                     rforest_classif.score(X_train, Y_train),rforest_classif.score(X_test, Y_test)))


## Extra Trees Classifier with Default Params
extra_forest_classif = ensemble.ExtraTreesClassifier(random_state=1)
extra_forest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_classif.__class__.__name__,
                                                     extra_forest_classif.score(X_train, Y_train),extra_forest_classif.score(X_test, Y_test)))

## Bagging Classifier with Default Params
bag_classif = ensemble.BaggingClassifier(random_state=1)
bag_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_classif.__class__.__name__,
                                                     bag_classif.score(X_train, Y_train),bag_classif.score(X_test, Y_test)))


## Decision Tree with Default Parameters
dtree_classif = tree.DecisionTreeClassifier(random_state=1)
dtree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_classif.__class__.__name__,
                                                     dtree_classif.score(X_train, Y_train),dtree_classif.score(X_test, Y_test)))

## Extra Tree with Default Parameters
extra_tree_classif = tree.ExtraTreeClassifier(random_state=1)
extra_tree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_classif.__class__.__name__,
                                                     extra_tree_classif.score(X_train, Y_train),extra_tree_classif.score(X_test, Y_test)))
GradientBoostingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.96
GradientBoostingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.97
RandomForestClassifier : Train Accuracy : 1.00, Test Accuracy : 0.94
ExtraTreesClassifier : Train Accuracy : 1.00, Test Accuracy : 0.95
BaggingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.94
DecisionTreeClassifier : Train Accuracy : 1.00, Test Accuracy : 0.83
ExtraTreeClassifier : Train Accuracy : 1.00, Test Accuracy : 0.83

AdaBoostRegressor

The AdaBoostRegressor is available as a part of the ensemble module of sklearn. We'll be training the default model with Boston housing data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other regression estimators to check its performance relative to other machine learning models.
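
Before using the class, it may help to see the sample reweighting idea from the introduction written out by hand. The sketch below is a simplified illustration of discrete AdaBoost for binary classification with decision stumps on assumed synthetic data. It is not scikit-learn's implementation, and AdaBoostRegressor itself uses the AdaBoost.R2 variant for regression.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data with labels mapped to {-1, +1}.
X, y01 = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
y = np.where(y01 == 1, 1, -1)

n_samples = X.shape[0]
weights = np.full(n_samples, 1.0 / n_samples)   # all samples start with equal weight
stumps, alphas = [], []

for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    err = np.sum(weights[pred != y])            # weighted error of this stump
    err = np.clip(err, 1e-10, 1 - 1e-10)        # guard against log(0) and division by zero
    alpha = 0.5 * np.log((1 - err) / err)       # vote weight of this stump

    # Raise weights of misclassified samples, lower the rest, then renormalise.
    weights *= np.exp(-alpha * y * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted vote of all stumps.
votes = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
ensemble_acc = np.mean(np.sign(votes) == y)
first_stump_acc = np.mean(stumps[0].predict(X) == y)

print('Training accuracy of first stump : %.3f' % first_stump_acc)
print('Training accuracy of ensemble    : %.3f' % ensemble_acc)
```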

In [22]:
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (404, 13) (102, 13) (404,) (102,)
In [23]:
from sklearn.ensemble import AdaBoostRegressor

ada_boost_regressor = AdaBoostRegressor()
ada_boost_regressor.fit(X_train, Y_train)
Out[23]:
AdaBoostRegressor(base_estimator=None, learning_rate=1.0, loss='linear',
                  n_estimators=50, random_state=None)
In [24]:
Y_preds = ada_boost_regressor.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test R^2 Score : %.3f'%ada_boost_regressor.score(X_test, Y_test)) ## For regression models, the score method returns the R^2 score.
print('Training R^2 Score : %.3f'%ada_boost_regressor.score(X_train, Y_train))
[18.221 27.061 47.212 18.138 31.677 39.125 27.313 11.922 17.933 26.785
 26.249 20.919 17.408 27.167 18.951]
[15.  26.6 45.4 20.8 34.9 21.9 28.7  7.2 20.  32.2 24.1 18.5 13.5 27.
 23.1]
Test R^2 Score : 0.834
Training R^2 Score : 0.912

Important Attributes of AdaBoostRegressor

Below are some of the important attributes of AdaBoostRegressor which can provide important information once the model is trained.

  • base_estimator_ - It returns the base estimator from which the ensemble of weak estimators is built.
  • feature_importances_ - It returns an array of floats representing the importance of each feature in the dataset.
  • estimators_ - It returns trained estimators.
In [25]:
print("Base Estimator : ", ada_boost_regressor.base_estimator_)
Base Estimator :  DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=None, splitter='best')
In [26]:
print("Feature Importances : ", ada_boost_regressor.feature_importances_)
Feature Importances :  [3.205e-02 0.000e+00 4.921e-03 3.564e-04 4.383e-02 2.726e-01 8.314e-03
 1.300e-01 1.609e-02 5.684e-02 2.638e-02 4.679e-03 4.040e-01]
In [27]:
print("Estimators Shape : ", len(ada_boost_regressor.estimators_))

ada_boost_regressor.estimators_[:2]
Estimators Shape :  50
Out[27]:
[DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=None,
                       max_leaf_nodes=None, min_impurity_decrease=0.0,
                       min_impurity_split=None, min_samples_leaf=1,
                       min_samples_split=2, min_weight_fraction_leaf=0.0,
                       presort=False, random_state=280424452, splitter='best'),
 DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=None,
                       max_leaf_nodes=None, min_impurity_decrease=0.0,
                       min_impurity_split=None, min_samples_leaf=1,
                       min_samples_split=2, min_weight_fraction_leaf=0.0,
                       presort=False, random_state=1540040155, splitter='best')]
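
Besides the attributes printed above, a fitted AdaBoostRegressor also exposes estimator_weights_ and estimator_errors_, which show how much say each weak estimator gets in the combined prediction and how well it did during training. Below is a short sketch on assumed synthetic data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

# Hypothetical synthetic data, used only to keep the example self-contained.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

ada = AdaBoostRegressor(n_estimators=50, random_state=1).fit(X, y)

# Boosting can stop early, so fewer than n_estimators models may be fitted.
n_fitted = len(ada.estimators_)
print("Fitted estimators         : ", n_fitted)
print("First 5 estimator weights : ", ada.estimator_weights_[:5])
print("First 5 estimator errors  : ", ada.estimator_errors_[:5])
```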

Finetuning Model By Doing Grid Search On Various Hyperparameters

Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on different train/validation splits of the data to find the best fit, ideally one with almost the same score on both the train and test datasets, or with only a small difference between the two.

  • base_estimator - It lets us specify the base estimator from which the ensemble will be built. It can be any other regression estimator such as k-nearest neighbors, a decision tree, etc. The default is a decision tree with a max depth of 3.
  • learning_rate - It shrinks the contribution of each estimator. There is a trade-off between learning_rate and n_estimators.
  • n_estimators - Maximum number of base estimators whose results will be combined to produce the final prediction. default=50
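
As a small illustration of the base estimator setting, the sketch below boosts decision trees of different depths on assumed synthetic data. The base estimator is passed as the first positional argument on purpose: the keyword was renamed from base_estimator to estimator in scikit-learn 1.2, and passing it positionally sidesteps that difference.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data, used only to keep the example self-contained.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=123)

fitted = {}
for depth in [1, 3, 6]:
    # The base estimator is the first positional argument of AdaBoostRegressor.
    model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=depth),
                              n_estimators=50, random_state=1)
    model.fit(X_tr, y_tr)
    fitted[depth] = model
    print('max_depth=%d : Train R^2 %.3f, Test R^2 %.3f' % (
        depth, model.score(X_tr, y_tr), model.score(X_te, y_te)))
```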

Below, we'll try various values for the above-mentioned hyperparameters to find the best estimator for our dataset by doing 3-fold cross-validation on the data.

In [28]:
%%time

from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression

n_samples = X_boston.shape[0]
n_features = X_boston.shape[1]

params = {
            'base_estimator':[None, DecisionTreeRegressor(), KNeighborsRegressor(), LinearRegression()],
            'n_estimators': np.arange(100, 350, 50),
            'learning_rate': [0.5, 0.8, 1.0, 2.0, ]
         }

ada_boost_regressor_grid = GridSearchCV(AdaBoostRegressor(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
ada_boost_regressor_grid.fit(X_train,Y_train)

print('Train R^2 Score : %.3f'%ada_boost_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%ada_boost_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%ada_boost_regressor_grid.best_score_)
print('Best Parameters : ',ada_boost_regressor_grid.best_params_)
Fitting 3 folds for each of 80 candidates, totalling 240 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  12 tasks      | elapsed:    0.9s
[Parallel(n_jobs=-1)]: Done 120 tasks      | elapsed:    8.2s
[Parallel(n_jobs=-1)]: Done 240 out of 240 | elapsed:   15.0s finished
Train R^2 Score : 1.000
Test R^2 Score : 0.848
Best R^2 Score Through Grid Search : 0.878
Best Parameters :  {'base_estimator': DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
                      max_leaf_nodes=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      presort=False, random_state=None, splitter='best'), 'learning_rate': 1.0, 'n_estimators': 150}
CPU times: user 512 ms, sys: 11.9 ms, total: 524 ms
Wall time: 15.3 s
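The grid above uses the base_estimator keyword, which matches the scikit-learn version shown in these outputs. Newer releases (1.2 onwards) rename it to estimator, and the old name was later removed. A minimal sketch of building the parameter grid in a version-tolerant way is shown below; the estimator_key variable and the reduced grid values are illustrative assumptions, not part of the original tutorial.

```python
import inspect
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Pick whichever keyword the installed scikit-learn accepts for the weak learner.
init_params = inspect.signature(AdaBoostRegressor.__init__).parameters
estimator_key = "estimator" if "estimator" in init_params else "base_estimator"

# Reduced grid for illustration only.
params = {
    estimator_key: [None, DecisionTreeRegressor(max_depth=3)],
    "n_estimators": [50, 100],
    "learning_rate": [0.5, 1.0],
}
print("Keyword used for the weak learner :", estimator_key)
```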

Printing First Few Cross Validation Results

In [29]:
cross_val_results = pd.DataFrame(ada_boost_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 80
Out[29]:
mean_fit_time std_fit_time mean_score_time std_score_time param_base_estimator param_learning_rate param_n_estimators params split0_test_score split1_test_score split2_test_score mean_test_score std_test_score rank_test_score
0 0.154460 0.004609 0.005228 0.000052 None 0.5 100 {'base_estimator': None, 'learning_rate': 0.5,... 0.851541 0.820585 0.793769 0.822035 0.023593 31
1 0.202743 0.011192 0.008456 0.000807 None 0.5 150 {'base_estimator': None, 'learning_rate': 0.5,... 0.853543 0.829347 0.781863 0.821683 0.029745 33
2 0.232278 0.013604 0.010420 0.000402 None 0.5 200 {'base_estimator': None, 'learning_rate': 0.5,... 0.850087 0.832786 0.779422 0.820867 0.030041 36
3 0.265798 0.001404 0.013722 0.001798 None 0.5 250 {'base_estimator': None, 'learning_rate': 0.5,... 0.850051 0.835823 0.778000 0.821399 0.031122 34
4 0.310391 0.001247 0.015124 0.000192 None 0.5 300 {'base_estimator': None, 'learning_rate': 0.5,... 0.851892 0.835819 0.779556 0.822528 0.030978 30
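head() shows the rows in the order the combinations were tried, not in order of quality. A small sketch of sorting the results by rank and keeping only the most useful columns follows. It runs a tiny grid on synthetic data (an assumption) so that it is self-contained; the same two lines at the end could be applied to the cross_val_results frame above.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a deliberately tiny grid.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

grid = GridSearchCV(AdaBoostRegressor(random_state=1),
                    param_grid={"n_estimators": [25, 50], "learning_rate": [0.5, 1.0]},
                    cv=3)
grid.fit(X, y)

results = pd.DataFrame(grid.cv_results_)
columns = ["param_n_estimators", "param_learning_rate",
           "mean_test_score", "std_test_score", "rank_test_score"]
# Best combination first.
ranked = results[columns].sort_values("rank_test_score").reset_index(drop=True)
print(ranked)
```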

Comparing Performance Of Ada Boost With Gradient Boosting, Bagging, Random Forest, Extra Trees, Decision Tree and Extra Tree

In [30]:
from sklearn import ensemble, tree

## Note: score() returns R^2 for regressors; the printouts below label it as "Accuracy".
## Ada Boosting Regressor with Default Params
ada_regressor = ensemble.AdaBoostRegressor(random_state=1)
ada_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_regressor.__class__.__name__,
                                                     ada_regressor.score(X_train, Y_train),ada_regressor.score(X_test, Y_test)))

## Above Hyper-parameter tuned Ada Boosting Regressor
ada_regressor = ensemble.AdaBoostRegressor(random_state=1, **ada_boost_regressor_grid.best_params_)
ada_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_regressor.__class__.__name__,
                                                     ada_regressor.score(X_train, Y_train),ada_regressor.score(X_test, Y_test)))

## Gradient Boosting Regressor with Default Params
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
                                                     gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))

## Random Forest Regressor with Default Params
rforest_regressor = ensemble.RandomForestRegressor(random_state=1)
rforest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_regressor.__class__.__name__,
                                                     rforest_regressor.score(X_train, Y_train),rforest_regressor.score(X_test, Y_test)))


## Extra Trees Regressor with Default Params
extra_forest_regressor = ensemble.ExtraTreesRegressor(random_state=1)
extra_forest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_regressor.__class__.__name__,
                                                     extra_forest_regressor.score(X_train, Y_train),extra_forest_regressor.score(X_test, Y_test)))

## Bagging Regressor with Default Params
bag_regressor = ensemble.BaggingRegressor(random_state=1)
bag_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_regressor.__class__.__name__,
                                                     bag_regressor.score(X_train, Y_train),bag_regressor.score(X_test, Y_test)))


## Decision Tree with Default Parameters
dtree_regressor = tree.DecisionTreeRegressor(random_state=1)
dtree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_regressor.__class__.__name__,
                                                     dtree_regressor.score(X_train, Y_train),dtree_regressor.score(X_test, Y_test)))

## Extra Tree with Default Parameters
extra_tree_regressor = tree.ExtraTreeRegressor(random_state=1)
extra_tree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_regressor.__class__.__name__,
                                                     extra_tree_regressor.score(X_train, Y_train),extra_tree_regressor.score(X_test, Y_test)))
AdaBoostRegressor : Train Accuracy : 0.92, Test Accuracy : 0.79
AdaBoostRegressor : Train Accuracy : 1.00, Test Accuracy : 0.85
GradientBoostingRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
RandomForestRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
ExtraTreesRegressor : Train Accuracy : 1.00, Test Accuracy : 0.83
BaggingRegressor : Train Accuracy : 0.98, Test Accuracy : 0.81
DecisionTreeRegressor : Train Accuracy : 1.00, Test Accuracy : 0.44
ExtraTreeRegressor : Train Accuracy : 1.00, Test Accuracy : 0.51
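The comparison above repeats the same fit/score/print pattern for every estimator. One possible way to write it more compactly is a loop over the estimator classes, sketched below on synthetic data (an assumption, used only so the snippet runs on its own). The scores it prints will not match the numbers above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor

# Synthetic stand-in data.
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=123)

regressors = [AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor,
              ExtraTreesRegressor, BaggingRegressor, DecisionTreeRegressor, ExtraTreeRegressor]

scores = {}
for cls in regressors:
    # Every estimator is built with default parameters and the same seed.
    model = cls(random_state=1).fit(X_tr, y_tr)
    scores[cls.__name__] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print("%s : Train R^2 : %.2f, Test R^2 : %.2f" % ((cls.__name__,) + scores[cls.__name__]))
```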

AdaBoostClassifier

The AdaBoostClassifier is available as a part of the ensemble module of sklearn. We'll train the default model on the digits data and then tune it by trying various hyperparameter settings to improve its performance. We'll also compare it with other classification estimators to check its performance relative to other machine learning models.

In [31]:
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
Train/Test Sizes :  (1437, 64) (360, 64) (1437,) (360,)
In [32]:
from sklearn.ensemble import AdaBoostClassifier

ada_boosting_classif = AdaBoostClassifier()
ada_boosting_classif.fit(X_train, Y_train)
Out[32]:
AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1.0,
                   n_estimators=50, random_state=None)
In [33]:
Y_preds = ada_boosting_classif.predict(X_test)

print(Y_preds[:15])
print(Y_test[:15])

print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%ada_boosting_classif.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%ada_boosting_classif.score(X_train, Y_train))
[8 8 8 8 8 8 8 9 8 8 0 9 6 6 6]
[3 3 4 4 1 3 1 0 7 4 0 0 5 1 6]
Test Accuracy : 0.369
Test Accuracy : 0.369
Training Accuracy : 0.337
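The low accuracy is plausibly due to the default weak learner: AdaBoostClassifier uses a decision tree with a max depth of 1 by default, and a single split is a very weak model for a 10-class problem. The sketch below compares the default depth with a somewhat deeper tree. The depth of 5 is an arbitrary illustrative choice, and the exact accuracies depend on the installed scikit-learn version.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=123)

depth_scores = {}
for depth in [1, 5]:
    # The weak learner is passed positionally because its keyword name differs
    # between scikit-learn versions (base_estimator vs estimator).
    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=depth),
                             n_estimators=50, random_state=1)
    clf.fit(X_tr, y_tr)
    depth_scores[depth] = clf.score(X_te, y_te)
    print("max_depth=%d : Test Accuracy : %.3f" % (depth, depth_scores[depth]))
```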

Important Attributes of AdaBoostClassifier

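AdaBoostClassifier exposes the same attributes as AdaBoostRegressor (base_estimator_, feature_importances_, estimators_), along with classifier-specific ones such as classes_ and n_classes_.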

In [34]:
print("Base Estimator : ", ada_boosting_classif.base_estimator_)
Base Estimator :  DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best')
In [35]:
print("Feature Importances : ", ada_boosting_classif.feature_importances_)
Feature Importances :  [0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
 0.   0.   0.   0.   0.   0.   0.   0.02 0.   0.   0.   0.   0.   0.
 0.48 0.   0.   0.   0.   0.   0.   0.   0.5  0.   0.   0.   0.   0.
 0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.
 0.   0.   0.   0.   0.   0.   0.   0.  ]
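Since each digit is an 8x8 image flattened into 64 features, the importances are easier to read when reshaped back onto the pixel grid. The following is a small sketch of that idea; it fits a fresh default model on the full digits data, so the values may differ from the output above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier

X, y = load_digits(return_X_y=True)
clf = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X, y)

# View the 64 importances on the same 8x8 grid as the images.
importance_grid = clf.feature_importances_.reshape(8, 8)
np.set_printoptions(precision=2, suppress=True)
print(importance_grid)

# Indices of the three most important pixels, converted to (row, col).
top_pixels = np.argsort(clf.feature_importances_)[::-1][:3]
print("Most important pixels (row, col) :", [divmod(int(i), 8) for i in top_pixels])
```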
In [36]:
print("Estimators Shape : ", len(ada_boosting_classif.estimators_))

ada_boosting_classif.estimators_[:2]
Estimators Shape :  50
Out[36]:
[DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
                        max_features=None, max_leaf_nodes=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, presort=False,
                        random_state=777627829, splitter='best'),
 DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
                        max_features=None, max_leaf_nodes=None,
                        min_impurity_decrease=0.0, min_impurity_split=None,
                        min_samples_leaf=1, min_samples_split=2,
                        min_weight_fraction_leaf=0.0, presort=False,
                        random_state=429440265, splitter='best')]
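Alongside estimators_, the fitted model also stores estimator_weights_ and estimator_errors_, which show how much each weak learner contributes to the weighted vote and how well it did on the reweighted data. A minimal sketch of inspecting them is given below. The values depend on the boosting algorithm variant and the scikit-learn version, so no particular numbers should be expected.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier

X, y = load_digits(return_X_y=True)
clf = AdaBoostClassifier(n_estimators=20, random_state=1).fit(X, y)

# Boosting can stop early, so count the estimators that were actually fitted.
n_fitted = len(clf.estimators_)
print("Fitted estimators :", n_fitted)
for i in range(min(3, n_fitted)):
    print("Estimator %d : weight %.3f, weighted error %.3f"
          % (i, clf.estimator_weights_[i], clf.estimator_errors_[i]))
```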

Finetuning Model By Doing Grid Search On Various Hyperparameters

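AdaBoostClassifier takes almost the same parameters as AdaBoostRegressor. The main differences are that it has an algorithm parameter (SAMME or SAMME.R) in place of the regressor's loss parameter, and that its default base estimator is a decision tree with a max depth of 1.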
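One quick way to check which parameters the two classes share is to compare the keys returned by get_params(). This is only a sketch, and the exact sets depend on the installed scikit-learn version.

```python
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

clf_params = set(AdaBoostClassifier().get_params())
reg_params = set(AdaBoostRegressor().get_params())

print("Shared parameters :", sorted(clf_params & reg_params))
print("Classifier only   :", sorted(clf_params - reg_params))
print("Regressor only    :", sorted(reg_params - clf_params))
```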

In [37]:
%%time

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

n_samples = X_digits.shape[0]
n_features = X_digits.shape[1]

params = {
            'base_estimator':[None, DecisionTreeClassifier(), SVC(), LogisticRegression()],
            'n_estimators': np.arange(100, 350, 100),
            'learning_rate': [0.5, 1.0, 2.0, ]
         }


ada_boost_classif_grid = GridSearchCV(AdaBoostClassifier(random_state=1, algorithm='SAMME'), param_grid=params, n_jobs=-1, cv=3, verbose=5)
ada_boost_classif_grid.fit(X_train,Y_train)

print('Train Accuracy : %.3f'%ada_boost_classif_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%ada_boost_classif_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%ada_boost_classif_grid.best_score_)
print('Best Parameters : ',ada_boost_classif_grid.best_params_)
Fitting 3 folds for each of 36 candidates, totalling 108 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  10 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed: 11.2min finished
Train Accuracy : 0.985
Test Accuracy : 0.956
Best Accuracy Through Grid Search : 0.953
Best Parameters :  {'base_estimator': LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), 'learning_rate': 2.0, 'n_estimators': 300}
CPU times: user 32.7 s, sys: 17.6 s, total: 50.3 s
Wall time: 11min 33s

Printing First Few Cross Validation Results

In [38]:
cross_val_results = pd.DataFrame(ada_boost_classif_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))

cross_val_results.head() ## Printing first few results.
Number of Various Combinations of Parameters Tried : 36
Out[38]:
mean_fit_time std_fit_time mean_score_time std_score_time param_base_estimator param_learning_rate param_n_estimators params split0_test_score split1_test_score split2_test_score mean_test_score std_test_score rank_test_score
0 0.288682 0.006750 0.012123 0.000486 None 0.5 100 {'base_estimator': None, 'learning_rate': 0.5,... 0.778468 0.815900 0.792017 0.795407 0.015490 23
1 0.487868 0.042866 0.022856 0.000148 None 0.5 200 {'base_estimator': None, 'learning_rate': 0.5,... 0.815735 0.834728 0.810924 0.820459 0.010264 21
2 0.689390 0.005973 0.049050 0.009829 None 0.5 300 {'base_estimator': None, 'learning_rate': 0.5,... 0.811594 0.836820 0.836134 0.828114 0.011758 20
3 0.230102 0.005932 0.012194 0.000801 None 1 100 {'base_estimator': None, 'learning_rate': 1.0,... 0.853002 0.832636 0.815126 0.833681 0.015488 18
4 0.476351 0.002038 0.023662 0.000702 None 1 200 {'base_estimator': None, 'learning_rate': 1.0,... 0.824017 0.811715 0.861345 0.832289 0.021058 19

Comparing Performance Of Ada Boost With Gradient Boosting, Bagging, Random Forest, Extra Trees, Decision Tree and Extra Tree

In [39]:
from sklearn import ensemble, tree

## Ada Boosting Classifier with Default Params
ada_classifier = ensemble.AdaBoostClassifier(random_state=1)
ada_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_classifier.__class__.__name__,
                                                     ada_classifier.score(X_train, Y_train),ada_classifier.score(X_test, Y_test)))

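## Above Hyper-parameter tuned Ada Boosting Classifier
## Note: algorithm='SAMME' from the grid search is not part of best_params_, so this model
## uses the default algorithm and its scores differ from those of best_estimator_.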
ada_classifier = ensemble.AdaBoostClassifier(random_state=1, **ada_boost_classif_grid.best_params_)
ada_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_classifier.__class__.__name__,
                                                     ada_classifier.score(X_train, Y_train),ada_classifier.score(X_test, Y_test)))

## Gradient Boosting Classifier with Default Params
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
                                                     gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))


## Random Forest Classifier with Default Params
rforest_classif = ensemble.RandomForestClassifier(random_state=1)
rforest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_classif.__class__.__name__,
                                                     rforest_classif.score(X_train, Y_train),rforest_classif.score(X_test, Y_test)))


## Extra Trees Classifier with Default Params
extra_forest_classif = ensemble.ExtraTreesClassifier(random_state=1)
extra_forest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_classif.__class__.__name__,
                                                     extra_forest_classif.score(X_train, Y_train),extra_forest_classif.score(X_test, Y_test)))

## Bagging Classifier with Default Params
bag_classif = ensemble.BaggingClassifier(random_state=1)
bag_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_classif.__class__.__name__,
                                                     bag_classif.score(X_train, Y_train),bag_classif.score(X_test, Y_test)))


## Decision Tree with Default Parameters
dtree_classif = tree.DecisionTreeClassifier(random_state=1)
dtree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_classif.__class__.__name__,
                                                     dtree_classif.score(X_train, Y_train),dtree_classif.score(X_test, Y_test)))

## Extra Tree with Default Parameters
extra_tree_classif = tree.ExtraTreeClassifier(random_state=1)
extra_tree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_classif.__class__.__name__,
                                                     extra_tree_classif.score(X_train, Y_train),extra_tree_classif.score(X_test, Y_test)))
AdaBoostClassifier : Train Accuracy : 0.34, Test Accuracy : 0.37
AdaBoostClassifier : Train Accuracy : 0.75, Test Accuracy : 0.74
GradientBoostingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.96
RandomForestClassifier : Train Accuracy : 1.00, Test Accuracy : 0.94
ExtraTreesClassifier : Train Accuracy : 1.00, Test Accuracy : 0.95
BaggingClassifier : Train Accuracy : 1.00, Test Accuracy : 0.94
DecisionTreeClassifier : Train Accuracy : 1.00, Test Accuracy : 0.83
ExtraTreeClassifier : Train Accuracy : 1.00, Test Accuracy : 0.83
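The tuned AdaBoostClassifier scores 0.74 here, while the grid search reported 0.956 test accuracy for its best_estimator_. A likely reason is that the grid search estimator was constructed with algorithm='SAMME', which is not part of best_params_ and is therefore lost when the model is rebuilt from best_params_ alone. The sketch below illustrates the same effect with learning_rate as the fixed argument (an assumption made for the example, since handling of the algorithm option has changed across scikit-learn versions) and shows one way to keep the fixed settings by cloning the searched estimator.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)

# learning_rate is fixed in the constructor and is NOT part of the grid,
# so it will not appear in best_params_.
grid = GridSearchCV(AdaBoostClassifier(learning_rate=0.5, random_state=1),
                    param_grid={"n_estimators": [10, 20]}, cv=3)
grid.fit(X, y)

# Rebuilding from best_params_ alone falls back to the default learning_rate.
rebuilt = AdaBoostClassifier(random_state=1, **grid.best_params_)
# Cloning the searched estimator keeps every fixed setting and adds the tuned ones.
faithful = clone(grid.estimator).set_params(**grid.best_params_)

print("Rebuilt from best_params_ : learning_rate =", rebuilt.learning_rate)
print("Cloned search estimator   : learning_rate =", faithful.learning_rate)
```

Using grid.best_estimator_ directly avoids the issue as well, since it is already refit with the full configuration.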


Sunny Solanki