Boosting is a type of ensemble learning where we train estimators sequentially rather than training all estimators in parallel. We create a few fast, simple models (weak learners that are only slightly better than a random guess) and then combine the results of all these weak estimators to make the final prediction. We have already discussed another ensemble learning method as a part of our tutorial on bagging & random forests. Please feel free to go through it if you want to learn about it.
Scikit-learn provides two different boosting algorithms for classification and regression problems:
Gradient Tree Boosting (Gradient Boosted Decision Trees) - It builds learners iteratively, where each new weak learner is trained on the errors made by the ensemble so far. It starts with a single learner and then adds learners one at a time, trying to reduce the loss with every new tree. It uses decision trees as weak estimators. A minimal sketch of this idea is given after this list. Scikit-learn provides two classes which implement Gradient Tree Boosting, one each for classification and regression problems.
Adaptive Boosting (AdaBoost) - It fits a list of weak estimators iteratively on modified data. It then combines the results of all estimators based on a weighted vote to generate the final result. At each iteration, higher weights are assigned to samples which were predicted wrong in the previous iteration, and weights are decreased for samples which were predicted right, so the model concentrates on the samples it is getting wrong. Initially, all samples are assigned the same weight (1/n_samples). It lets us specify which estimator to use as the weak learner. A sketch of this weight-update step is given after this list. Scikit-learn provides two classes which implement Adaptive Boosting, one each for classification and regression problems.
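To make these ideas concrete, here are two minimal sketches using hypothetical helper names; they are simplified illustrations of the underlying algorithms, not scikit-learn's actual implementations. The first fits each new shallow tree to the residuals of the current ensemble (gradient boosting with squared-error loss):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    ## Start from a constant prediction (the mean of the targets).
    init = y.mean()
    prediction = np.full(y.shape, init, dtype=float)
    trees = []
    for _ in range(n_estimators):
        residuals = y - prediction                     ## negative gradient of squared-error loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                         ## weak learner fits the current residuals
        prediction += learning_rate * tree.predict(X)  ## shrink each tree's contribution
        trees.append(tree)
    return init, trees

def toy_gradient_boosting_predict(init, trees, X, learning_rate=0.1):
    ## Final prediction = initial constant + shrunken sum of all trees' corrections.
    return init + learning_rate * sum(tree.predict(X) for tree in trees)

The second shows a single round of the discrete (SAMME-style) AdaBoost weight update for classification: misclassified samples get their weights increased before the next weak learner is trained, and each learner receives a say in the final vote based on its weighted error.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def toy_adaboost_round(X, y, sample_weight):
    ## Fit a weak learner (a decision stump) using the current sample weights.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=sample_weight)
    incorrect = (stump.predict(X) != y)
    ## Weighted error rate of this round's learner (clipped to avoid division by zero).
    err = max(np.average(incorrect, weights=sample_weight), 1e-10)
    ## Vote weight of this learner: more accurate learners count more in the final vote.
    alpha = np.log((1.0 - err) / err)
    ## Increase weights of misclassified samples so the next learner focuses on them.
    sample_weight = sample_weight * np.exp(alpha * incorrect)
    sample_weight = sample_weight / sample_weight.sum()
    return stump, alpha, sample_weight

## Initially every sample gets the same weight: np.full(len(y), 1.0 / len(y))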
This ends our small introduction to the Boosting process. We'll now start with the coding part.
We'll start by importing necessary libraries.
import numpy as np
import pandas as pd
import sklearn
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(precision=3)
%matplotlib inline
We'll be loading the two datasets mentioned below for our purpose: the digits dataset, which consists of 8x8 images of hand-written digits 0-9, and the Boston housing dataset. We'll use the digits data for classification tasks and the Boston housing data for regression tasks below. Sklearn provides both of these datasets as a part of the datasets module. We can load them by calling the load_digits() and load_boston() methods. Each returns a dictionary-like Bunch object which can be used to retrieve features and target.
from sklearn.datasets import load_boston, load_digits
digits = load_digits()
X_digits, Y_digits = digits.data, digits.target
print('Dataset Size : ',X_digits.shape, Y_digits.shape)
boston = load_boston()
X_boston, Y_boston = boston.data, boston.target
print('Dataset Size : ',X_boston.shape, Y_boston.shape)
The GradientBoostingRegressor
is available as a part of the ensemble
module of sklearn. We'll be training the default model with Boston housing data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other regression estimators to check its performance relative to other machine learning models.
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
from sklearn.ensemble import GradientBoostingRegressor
grad_boosting_regressor = GradientBoostingRegressor()
grad_boosting_regressor.fit(X_train, Y_train)
Y_preds = grad_boosting_regressor.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test R^2 Score : %.3f'%grad_boosting_regressor.score(X_test, Y_test)) ## For regressors, score() returns R^2 (for classifiers it returns accuracy).
print('Training R^2 Score : %.3f'%grad_boosting_regressor.score(X_train, Y_train))
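As a quick sanity check, the value returned by score() for a regressor should match sklearn.metrics.r2_score computed on the same predictions:

from sklearn.metrics import r2_score

print('Test R^2 Score (via r2_score) : %.3f'%r2_score(Y_test, Y_preds))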
Important Attributes of GradientBoostingRegressor
Below are some of the important attributes of GradientBoostingRegressor which can provide important information once the model is trained.
feature_importances_ - It returns an array of floats representing the importance of each feature in the dataset.
estimators_ - It returns trained estimators.
oob_improvement_ - It returns an array of size (n_estimators,). Each value in the array represents the improvement in loss on out-of-bag samples relative to the previous iteration.
loss_ - It returns the loss function as an object.

print("Feature Importances : ", grad_boosting_regressor.feature_importances_)
print("Estimators Shape: ", grad_boosting_regressor.estimators_.shape)
grad_boosting_regressor.estimators_[:2]
print("Loss : ", grad_boosting_regressor.loss_)
Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various splits of train/test data to find the best fit, which should have almost the same accuracy on both train & test datasets, or at least a fairly small difference between them.
n_estimators - The number of boosting stages (trees) to perform. default=100
max_depth - The maximum depth of the individual trees. default=3
min_samples_split - The minimum number of samples required to split an internal node. It accepts int(0-n_samples) or float(0.0-0.5] values. Float takes ceil(min_samples_split * n_samples) samples. default=2
min_samples_leaf - The minimum number of samples required to be at a leaf node. It accepts int(0-n_samples) or float(0.0-0.5] values. Float takes ceil(min_samples_leaf * n_samples) samples. default=1
criterion - The function used to measure the quality of a split. default=friedman_mse
max_features - The number of features to consider when looking for the best split. It accepts int(0-n_features), float(0.0-0.5], string(sqrt, log2, auto) or None as value. default=None
learning_rate - It shrinks the contribution of each tree by this factor. It accepts float(0.0,1.0) values. default=0.1
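As a quick illustration of how n_estimators and learning_rate interact before running a full grid search, the snippet below uses the staged_predict() method to measure test R^2 after each boosting stage for two learning rates; a smaller learning rate generally needs more trees to reach its best score.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

for lr in [0.02, 0.1]:
    gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=lr, random_state=1)
    gbr.fit(X_train, Y_train)
    ## staged_predict() yields predictions after each successive boosting stage.
    stage_r2 = [r2_score(Y_test, preds) for preds in gbr.staged_predict(X_test)]
    best_stage = int(np.argmax(stage_r2)) + 1
    print("learning_rate=%.2f : best test R^2 = %.3f (reached at %d trees)"%(lr, max(stage_r2), best_stage))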
Below we'll try various values for the above-mentioned hyperparameters to find the best estimator for our dataset by doing 3-fold cross-validation on the data.
%%time
from sklearn.model_selection import GridSearchCV
n_samples = X_boston.shape[0]
n_features = X_boston.shape[1]
params = {'n_estimators': np.arange(100, 301, 50),
'max_depth': [None, 3, 5,],
'min_samples_split': [2, 0.3, 0.5, n_samples//2, ],
'min_samples_leaf': [1, 0.3, 0.5, n_samples//2, ],
'criterion': ['friedman_mse', 'mae'],
'max_features': [None, 'sqrt', 'auto', 'log2', 0.3, 0.7, n_features//2, ],
}
grad_boost_regressor_grid = GridSearchCV(GradientBoostingRegressor(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
grad_boost_regressor_grid.fit(X_train,Y_train)
print('Train R^2 Score : %.3f'%grad_boost_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%grad_boost_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%grad_boost_regressor_grid.best_score_)
print('Best Parameters : ',grad_boost_regressor_grid.best_params_)
cross_val_results = pd.DataFrame(grad_boost_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
from sklearn import ensemble, tree
## Gradient Boosting Regressor with Default Params
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))
## Above Hyperparameter tuned Gradient Boosting Regressor
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1, **grad_boost_regressor_grid.best_params_)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))
## Random Forest Regressor with Default Params
rforest_regressor = ensemble.RandomForestRegressor(random_state=1)
rforest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_regressor.__class__.__name__,
rforest_regressor.score(X_train, Y_train),rforest_regressor.score(X_test, Y_test)))
## Extra Trees Regressor with Default Params
extra_forest_regressor = ensemble.ExtraTreesRegressor(random_state=1)
extra_forest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_regressor.__class__.__name__,
extra_forest_regressor.score(X_train, Y_train),extra_forest_regressor.score(X_test, Y_test)))
## Bagging Regressor with Default Params
bag_regressor = ensemble.BaggingRegressor(random_state=1)
bag_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_regressor.__class__.__name__,
bag_regressor.score(X_train, Y_train),bag_regressor.score(X_test, Y_test)))
## Decision Tree with Default Parameters
dtree_regressor = tree.DecisionTreeRegressor(random_state=1)
dtree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_regressor.__class__.__name__,
dtree_regressor.score(X_train, Y_train),dtree_regressor.score(X_test, Y_test)))
## Extra Tree Regressor with Default Parameters
extra_tree_regressor = tree.ExtraTreeRegressor(random_state=1)
extra_tree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_regressor.__class__.__name__,
extra_tree_regressor.score(X_train, Y_train),extra_tree_regressor.score(X_test, Y_test)))
The GradientBoostingClassifier is available as a part of the ensemble module of sklearn. We'll be training the default model with the digits data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other classification estimators to check its performance relative to other machine learning models.
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
from sklearn.ensemble import GradientBoostingClassifier
grad_boosting_classif = GradientBoostingClassifier()
grad_boosting_classif.fit(X_train, Y_train)
Y_preds = grad_boosting_classif.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%grad_boosting_classif.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%grad_boosting_classif.score(X_train, Y_train))
Important Attributes of GradientBoostingClassifier
The GradientBoostingClassifier has the same set of attributes as that of GradientBoostingRegressor.
print("Feature Importances Shape: ", grad_boosting_classif.feature_importances_.shape)
grad_boosting_classif.feature_importances_[:10]
print("Estimators Shape : ", grad_boosting_classif.estimators_.shape)
print("Loss : ", grad_boosting_classif.loss_)
GradientBoostingClassifier has almost all parameters the same as that of GradientBoostingRegressor; the main difference is the loss parameter, which for classification defaults to 'deviance' (logistic/log loss).
%%time
n_samples = X_digits.shape[0]
n_features = X_digits.shape[1]
params = {'n_estimators': [100, 200],
'max_depth': [None, 2,5,],
'min_samples_split': [2,0.5, n_samples//2, ],
'min_samples_leaf': [1, 0.5, n_samples//2, ],
'criterion': ['friedman_mse', 'mae'],
'max_features': [None, 'sqrt', 'log2', 0.5, n_features//2,],
}
grad_boost_classif_grid = GridSearchCV(GradientBoostingClassifier(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
grad_boost_classif_grid.fit(X_train,Y_train)
print('Train Accuracy : %.3f'%grad_boost_classif_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%grad_boost_classif_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%grad_boost_classif_grid.best_score_)
print('Best Parameters : ',grad_boost_classif_grid.best_params_)
cross_val_results = pd.DataFrame(grad_boost_classif_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
from sklearn import ensemble
## Gradient Boosting Classifier with Default Params
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))
## Above Hyperparameter tuned Gradient Boosting Classifier
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1, **grad_boost_classif_grid.best_params_)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))
## Random Forest Classifier with Default Params
rforest_classif = ensemble.RandomForestClassifier(random_state=1)
rforest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_classif.__class__.__name__,
rforest_classif.score(X_train, Y_train),rforest_classif.score(X_test, Y_test)))
## Extra Trees Classifier with Default Params
extra_forest_classif = ensemble.ExtraTreesClassifier(random_state=1)
extra_forest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_classif.__class__.__name__,
extra_forest_classif.score(X_train, Y_train),extra_forest_classif.score(X_test, Y_test)))
## Bagging Classifier with Default Params
bag_classif = ensemble.BaggingClassifier(random_state=1)
bag_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_classif.__class__.__name__,
bag_classif.score(X_train, Y_train),bag_classif.score(X_test, Y_test)))
## Decision Tree with Default Parameters
dtree_classif = tree.DecisionTreeClassifier(random_state=1)
dtree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_classif.__class__.__name__,
dtree_classif.score(X_train, Y_train),dtree_classif.score(X_test, Y_test)))
## Extra Tree Classifier with Default Parameters
extra_tree_classif = tree.ExtraTreeClassifier(random_state=1)
extra_tree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_classif.__class__.__name__,
extra_tree_classif.score(X_train, Y_train),extra_tree_classif.score(X_test, Y_test)))
The AdaBoostRegressor
is available as a part of the ensemble
module of sklearn. We'll be training the default model with Boston housing data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other regression estimators to check its performance relative to other machine learning models.
X_train, X_test, Y_train, Y_test = train_test_split(X_boston, Y_boston, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
from sklearn.ensemble import AdaBoostRegressor
ada_boost_regressor = AdaBoostRegressor()
ada_boost_regressor.fit(X_train, Y_train)
Y_preds = ada_boost_regressor.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test R^2 Score : %.3f'%ada_boost_regressor.score(X_test, Y_test)) ## For regressors, score() returns R^2 (for classifiers it returns accuracy).
print('Training R^2 Score : %.3f'%ada_boost_regressor.score(X_train, Y_train))
Important Attributes of AdaBoostRegressor
Below are some of the important attributes of AdaBoostRegressor which can provide important information once the model is trained.
base_estimator_ - It returns the base estimator from which the whole strong estimator consisting of weak estimators is created.
feature_importances_ - It returns an array of floats representing the importance of each feature in the dataset.
estimators_ - It returns trained estimators.

print("Base Estimator : ", ada_boost_regressor.base_estimator_)
print("Feature Importances : ", ada_boost_regressor.feature_importances_)
print("Estimators Shape : ", len(ada_boost_regressor.estimators_))
ada_boost_regressor.estimators_[:2]
Below is a list of common hyperparameters that need tuning to get the best fit for our data. We'll try various hyperparameter settings on various splits of train/test data to find the best fit, which should have almost the same accuracy on both train & test datasets, or at least a fairly small difference between them.
base_estimator - The base estimator from which the boosted ensemble is built. If None, a decision tree of max_depth=3 is used. default=None
n_estimators - The maximum number of estimators at which boosting is terminated. default=50
learning_rate - It shrinks the contribution of each estimator by this factor. default=1.0
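As a quick illustration of how the base estimator affects the ensemble, the snippet below compares AdaBoostRegressor built on decision trees of two different depths and uses staged_score() to report the test R^2 after the last boosting iteration (boosting can stop early, so fewer than n_estimators estimators may be fitted).

from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

for depth in [3, 5]:
    ada = AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=depth),
                            n_estimators=100, random_state=1)
    ada.fit(X_train, Y_train)
    ## staged_score() yields the test R^2 after each boosting iteration.
    stage_scores = list(ada.staged_score(X_test, Y_test))
    print("max_depth=%d : %d estimators fitted, final test R^2 = %.3f"%(depth, len(stage_scores), stage_scores[-1]))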
Below we'll try various values for the above-mentioned hyperparameters to find the best estimator for our dataset by doing 3-fold cross-validation on the data.
%%time
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
n_samples = X_boston.shape[0]
n_features = X_boston.shape[1]
params = {
'base_estimator':[None, DecisionTreeRegressor(), KNeighborsRegressor(), LinearRegression()],
'n_estimators': np.arange(100, 350, 50),
'learning_rate': [0.5, 0.8, 1.0, 2.0, ]
}
ada_boost_regressor_grid = GridSearchCV(AdaBoostRegressor(random_state=1), param_grid=params, n_jobs=-1, cv=3, verbose=5)
ada_boost_regressor_grid.fit(X_train,Y_train)
print('Train R^2 Score : %.3f'%ada_boost_regressor_grid.best_estimator_.score(X_train, Y_train))
print('Test R^2 Score : %.3f'%ada_boost_regressor_grid.best_estimator_.score(X_test, Y_test))
print('Best R^2 Score Through Grid Search : %.3f'%ada_boost_regressor_grid.best_score_)
print('Best Parameters : ',ada_boost_regressor_grid.best_params_)
cross_val_results = pd.DataFrame(ada_boost_regressor_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
from sklearn import ensemble
## Ada Boosting Regressor with Default Params
ada_regressor = ensemble.AdaBoostRegressor(random_state=1)
ada_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_regressor.__class__.__name__,
ada_regressor.score(X_train, Y_train),ada_regressor.score(X_test, Y_test)))
## Above Hyperparameter tuned Ada Boosting Regressor
ada_regressor = ensemble.AdaBoostRegressor(random_state=1, **ada_boost_regressor_grid.best_params_)
ada_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_regressor.__class__.__name__,
ada_regressor.score(X_train, Y_train),ada_regressor.score(X_test, Y_test)))
## Gradient Boosting Regressor with Default Params
gb_regressor = ensemble.GradientBoostingRegressor(random_state=1)
gb_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_regressor.__class__.__name__,
gb_regressor.score(X_train, Y_train),gb_regressor.score(X_test, Y_test)))
## Random Forest Regressor with Default Params
rforest_regressor = ensemble.RandomForestRegressor(random_state=1)
rforest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_regressor.__class__.__name__,
rforest_regressor.score(X_train, Y_train),rforest_regressor.score(X_test, Y_test)))
## Extra Trees Regressor with Default Params
extra_forest_regressor = ensemble.ExtraTreesRegressor(random_state=1)
extra_forest_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_regressor.__class__.__name__,
extra_forest_regressor.score(X_train, Y_train),extra_forest_regressor.score(X_test, Y_test)))
## Bagging Regressor with Default Params
bag_regressor = ensemble.BaggingRegressor(random_state=1)
bag_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_regressor.__class__.__name__,
bag_regressor.score(X_train, Y_train),bag_regressor.score(X_test, Y_test)))
## Decision Tree with Default Parameters
dtree_regressor = tree.DecisionTreeRegressor(random_state=1)
dtree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_regressor.__class__.__name__,
dtree_regressor.score(X_train, Y_train),dtree_regressor.score(X_test, Y_test)))
## Extra Tree Regressor with Default Parameters
extra_tree_regressor = tree.ExtraTreeRegressor(random_state=1)
extra_tree_regressor.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_regressor.__class__.__name__,
extra_tree_regressor.score(X_train, Y_train),extra_tree_regressor.score(X_test, Y_test)))
The AdaBoostClassifier is available as a part of the ensemble module of sklearn. We'll be training the default model with the digits data and then tune the model by trying various hyperparameter settings to improve its performance. We'll also compare it with other classification estimators to check its performance relative to other machine learning models.
X_train, X_test, Y_train, Y_test = train_test_split(X_digits, Y_digits, train_size=0.80, test_size=0.20, random_state=123)
print('Train/Test Sizes : ', X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
from sklearn.ensemble import AdaBoostClassifier
ada_boosting_classif = AdaBoostClassifier()
ada_boosting_classif.fit(X_train, Y_train)
Y_preds = ada_boosting_classif.predict(X_test)
print(Y_preds[:15])
print(Y_test[:15])
print('Test Accuracy : %.3f'%(Y_preds == Y_test).mean())
print('Test Accuracy : %.3f'%ada_boosting_classif.score(X_test, Y_test)) ## Score method also evaluates accuracy for classification models.
print('Training Accuracy : %.3f'%ada_boosting_classif.score(X_train, Y_train))
Important Attributes of AdaBoostClassifier
The AdaBoostClassifier has all attributes the same as that of AdaBoostRegressor.
print("Base Estimator : ", ada_boosting_classif.base_estimator_)
print("Feature Importances : ", ada_boosting_classif.feature_importances_)
print("Estimators Shape : ", len(ada_boosting_classif.estimators_))
ada_boosting_classif.estimators_[:2]
AdaBoostClassifier has almost all parameters the same as that of AdaBoostRegressor.
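One classifier-specific parameter worth noting is algorithm, which can be 'SAMME.R' (the default; it requires the base estimator to support predict_proba) or 'SAMME' (which only needs hard class predictions). That's why the grid search below passes algorithm='SAMME': a base estimator like SVC with default settings doesn't provide predict_proba. A small illustrative example:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

## 'SAMME' works with SVC because it only needs class predictions,
## not the probability estimates required by 'SAMME.R'.
## n_estimators is kept small here just to keep this illustration quick.
ada_svc = AdaBoostClassifier(base_estimator=SVC(), algorithm='SAMME', n_estimators=10, random_state=1)
ada_svc.fit(X_train, Y_train)
print('Test Accuracy : %.3f'%ada_svc.score(X_test, Y_test))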
%%time
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
n_samples = X_digits.shape[0]
n_features = X_digits.shape[1]
params = {
'base_estimator':[None, DecisionTreeClassifier(), SVC(), LogisticRegression()],
'n_estimators': np.arange(100, 350, 100),
'learning_rate': [0.5, 1.0, 2.0, ]
}
ada_boost_classif_grid = GridSearchCV(AdaBoostClassifier(random_state=1, algorithm='SAMME'), param_grid=params, n_jobs=-1, cv=3, verbose=5)
ada_boost_classif_grid.fit(X_train,Y_train)
print('Train Accuracy : %.3f'%ada_boost_classif_grid.best_estimator_.score(X_train, Y_train))
print('Test Accuracy : %.3f'%ada_boost_classif_grid.best_estimator_.score(X_test, Y_test))
print('Best Accuracy Through Grid Search : %.3f'%ada_boost_classif_grid.best_score_)
print('Best Parameters : ',ada_boost_classif_grid.best_params_)
cross_val_results = pd.DataFrame(ada_boost_classif_grid.cv_results_)
print('Number of Various Combinations of Parameters Tried : %d'%len(cross_val_results))
cross_val_results.head() ## Printing first few results.
from sklearn import ensemble
## Ada Boosting Classifier with Default Params
ada_classifier = ensemble.AdaBoostClassifier(random_state=1)
ada_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_classifier.__class__.__name__,
ada_classifier.score(X_train, Y_train),ada_classifier.score(X_test, Y_test)))
## Above Hyperparameter tuned Ada Boosting Classifier
ada_classifier = ensemble.AdaBoostClassifier(random_state=1, **ada_boost_classif_grid.best_params_)
ada_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(ada_classifier.__class__.__name__,
ada_classifier.score(X_train, Y_train),ada_classifier.score(X_test, Y_test)))
## Gradient Boosting Classifier with Default Params
gb_classifier = ensemble.GradientBoostingClassifier(random_state=1)
gb_classifier.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(gb_classifier.__class__.__name__,
gb_classifier.score(X_train, Y_train),gb_classifier.score(X_test, Y_test)))
## Random Forest Classifier with Default Params
rforest_classif = ensemble.RandomForestClassifier(random_state=1)
rforest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(rforest_classif.__class__.__name__,
rforest_classif.score(X_train, Y_train),rforest_classif.score(X_test, Y_test)))
## Extra Trees Classifier with Default Params
extra_forest_classif = ensemble.ExtraTreesClassifier(random_state=1)
extra_forest_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_forest_classif.__class__.__name__,
extra_forest_classif.score(X_train, Y_train),extra_forest_classif.score(X_test, Y_test)))
## Bagging Classifier with Default Params
bag_classif = ensemble.BaggingClassifier(random_state=1)
bag_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(bag_classif.__class__.__name__,
bag_classif.score(X_train, Y_train),bag_classif.score(X_test, Y_test)))
## Decision Tree with Default Parameters
dtree_classif = tree.DecisionTreeClassifier(random_state=1)
dtree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(dtree_classif.__class__.__name__,
dtree_classif.score(X_train, Y_train),dtree_classif.score(X_test, Y_test)))
## Extra Tree Classifier with Default Parameters
extra_tree_classif = tree.ExtraTreeClassifier(random_state=1)
extra_tree_classif.fit(X_train, Y_train)
print("%s : Train Accuracy : %.2f, Test Accuracy : %.2f"%(extra_tree_classif.__class__.__name__,
extra_tree_classif.score(X_train, Y_train),extra_tree_classif.score(X_test, Y_test)))
This ends our small tutorial on ensemble learning method boosting using scikit-learn. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.