Updated On : Aug-21,2022 Time Investment : ~60 mins

XGBoost - An In-Depth Guide [Python API]¶

> What is XGBoost (Extreme Gradient Boosting)?¶

XGBoost is a machine learning library that implements gradient boosting, specifically gradient boosted decision trees. A gradient boosted decision trees model is a gradient boosting machine whose ensemble consists of many decision trees. Each tree is generally a weak predictor, and the predictions of all trees are combined to make the final prediction.
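To make the idea of combining weak predictors concrete, here is a minimal sketch of gradient boosting for squared error using one-split trees (stumps) on a single feature. It uses only numpy and is not how XGBoost is implemented internally; the helper names fit_stump, predict_stump and boost, as well as the synthetic data, are assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single threshold on 1-D feature x that minimises squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:          # skip the max so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]                              # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)           # residual = negative gradient of squared error
        pred += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, pred

# Synthetic 1-D regression problem (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

base, stumps, pred = boost(x, y)
mse_single = np.mean((y - (base + predict_stump(stumps[0], x))) ** 2)
mse_ensemble = np.mean((y - pred) ** 2)
print("MSE with one stump :", mse_single)
print("MSE with ensemble  :", mse_ensemble)
```

A single stump can only produce two distinct output values, while the sum of many stumps can follow the curve closely, which is why the ensemble error is much lower.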

> Why Choose "XGBoost" Over Other Gradient Boosting Trees Implementations?¶

XGBoost is designed to be quite fast compared to the implementation available in sklearn.

XGBoost lets us handle a large amount of data that can have samples in billions with ease.

It can run in parallel and distributed environments to speed up the training process. The distributed algorithm can be useful if the data does not fit into the main memory of a single machine. Currently, it supports dask for running the algorithm in a distributed environment.

XGBoost even supports training on a GPU with a simple configuration change, which typically completes much faster than training on a CPU.

XGBoost provides APIs in C, C++, Python, R, Java, Julia, Ruby, and Swift.

XGBoost code can run in distributed environments such as Hadoop YARN (including YARN clusters on AWS).

It even provides an interface (CLI) to run the algorithm from the command line/shell.

Apart from this, xgboost provides support for controlling feature interactions, custom evaluation functions, callbacks during training, monotonic constraints, etc.

> What Can You Learn From This Article?¶

As a part of this tutorial, we have explained how to use the Python library XGBoost to solve machine learning tasks (classification and regression). We have covered the majority of the Python API with simple and easy-to-understand examples.

Apart from training models and making predictions, we have covered concepts like cross-validation, saving and loading models, visualizing feature importances, early stopping to avoid overfitting, creating custom objective/loss functions, creating custom evaluation metrics, callbacks during training, distributed training using dask, GPU training, etc.

All our examples are trained on toy datasets (structured, tabular) available from scikit-learn to keep things simple and easy to grasp.

We have tried to cover the majority of features available from xgboost to make this tutorial a short reference to master xgboost Python API.

> Which Other Python Libraries Provide Implementations Of Gradient Boosted Trees?¶

> How to Install XGBoost?¶

PIP
- pip install -U xgboost
Conda
- conda install py-xgboost

Below, we have listed the important sections of the tutorial to give an overview of the material covered. We know that the list is long, but you can skip the sections that cover theory or repeat an example of a concept already explained. We have included a NOTE in those sections so you can skip them and complete the tutorial faster. You can then refer to those sections in your free time or as needed.

Important Sections Of Tutorial¶

Load Datasets for Tutorial
- Boston Housing Dataset
- Breast Cancer Dataset
- Wine Dataset
XGBoost Estimators at High-Level (High-Level API)
Core API: Booster Estimator
- Booster: Regression Example
  - Divide Data into Train and Test Sets
  - DMatrix: XGBoost Data Structure to Represent Data
  - "train()": Train Model
  - "predict()": Make Predictions
  - Evaluate Model Performance
  - Visualize Features Importances using "plot_importance()"
- Important Parameters of Boosting (train())
- Booster: Tweedie Regression Example
- Booster: Binary Classification Example
- Booster: Multi-Class Classification Example
- Saving and Loading Trained Model
- Cross Validation
Sklearn Like API
- XGBRegressor
  - Train Model, Make Predictions & Evaluate Model Performance
  - Hyperparameters Tuning using Grid Search
- XGBClassifier
- XGBRFRegressor
- XGBRFClassifier
Early Stop Training to Avoid Overfitting
Feature Interaction Constraints
Monotonic Constraints
Custom Objective/Loss Function
Custom Evaluation Functions
Callbacks
Dask Backend for Distributed Training
GPU Support
GPU & Dask Together For Parallel GPUs

We'll start by importing the necessary libraries which we'll use as a part of this tutorial.

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import warnings

warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 50)

import xgboost as xgb
import sklearn

print("XGB Version          : ", xgb.__version__)
print("Scikit-Learn Version : ", sklearn.__version__)

XGB Version          :  1.6.1
Scikit-Learn Version :  1.0.2

1. Load Datasets ¶

As a part of this tutorial, we'll be using the three datasets listed below, all of which are available from sklearn.

Boston Housing Dataset: It's a regression dataset which has information about various attributes of houses in Boston and their median prices (in thousands of dollars). This will be used for regression tasks. Please note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older release such as the 1.0.2 used here.
Breast Cancer Dataset: It's a classification dataset which has information about two different types of tumors (malignant and benign). It'll be used for explaining binary classification tasks.
Wine Dataset: It's a classification dataset which has chemical measurements of three different types of wines. It'll be used for explaining multi-class classification tasks.

We have loaded all three datasets mentioned one by one below. We are printing descriptions of datasets which gives us an overview of dataset features and size. We have even loaded each dataset as a pandas data frame and displayed the first few samples of data.

Boston Housing Dataset¶

from sklearn.datasets import load_boston

boston = load_boston()

for line in boston.DESCR.split("\n")[5:29]:
    print(line)

boston_df = pd.DataFrame(data=boston.data, columns = boston.feature_names)
boston_df["Price"] = boston.target

boston_df.head()

**Data Set Characteristics:**

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

	CRIM	ZN	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	Price
0	0.00632	18.0	2.31	0.538	6.575	65.2	4.0900	1.0	296.0	15.3	396.90	4.98	24.0
1	0.02731	0.0	7.07	0.469	6.421	78.9	4.9671	2.0	242.0	17.8	396.90	9.14	21.6
2	0.02729	0.0	7.07	0.469	7.185	61.1	4.9671	2.0	242.0	17.8	392.83	4.03	34.7
3	0.03237	0.0	2.18	0.458	6.998	45.8	6.0622	3.0	222.0	18.7	394.63	2.94	33.4
4	0.06905	0.0	2.18	0.458	7.147	54.2	6.0622	3.0	222.0	18.7	396.90	5.33	36.2

Breast Cancer Dataset¶

from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

for line in breast_cancer.DESCR.split("\n")[5:31]:
    print(line)

breast_cancer_df = pd.DataFrame(data=breast_cancer.data, columns = breast_cancer.feature_names)
breast_cancer_df["TumorType"] = breast_cancer.target

breast_cancer_df.head()

**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, field
        13 is Radius SE, field 23 is Worst Radius.

        - class:
                - WDBC-Malignant
                - WDBC-Benign

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	radius error	texture error	perimeter error	area error	smoothness error	compactness error	concavity error	concave points error	symmetry error	fractal dimension error	worst radius	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension
0	17.99	10.38	122.80	1001.0	0.11840	0.27760	0.3001	0.14710	0.2419	0.07871	1.0950	0.9053	8.589	153.40	0.006399	0.04904	0.05373	0.01587	0.03003	0.006193	25.38	17.33	184.60	2019.0	0.1622	0.6656	0.7119	0.2654	0.4601	0.11890
1	20.57	17.77	132.90	1326.0	0.08474	0.07864	0.0869	0.07017	0.1812	0.05667	0.5435	0.7339	3.398	74.08	0.005225	0.01308	0.01860	0.01340	0.01389	0.003532	24.99	23.41	158.80	1956.0	0.1238	0.1866	0.2416	0.1860	0.2750	0.08902
2	19.69	21.25	130.00	1203.0	0.10960	0.15990	0.1974	0.12790	0.2069	0.05999	0.7456	0.7869	4.585	94.03	0.006150	0.04006	0.03832	0.02058	0.02250	0.004571	23.57	25.53	152.50	1709.0	0.1444	0.4245	0.4504	0.2430	0.3613	0.08758
3	11.42	20.38	77.58	386.1	0.14250	0.28390	0.2414	0.10520	0.2597	0.09744	0.4956	1.1560	3.445	27.23	0.009110	0.07458	0.05661	0.01867	0.05963	0.009208	14.91	26.50	98.87	567.7	0.2098	0.8663	0.6869	0.2575	0.6638	0.17300
4	20.29	14.34	135.10	1297.0	0.10030	0.13280	0.1980	0.10430	0.1809	0.05883	0.7572	0.7813	5.438	94.44	0.011490	0.02461	0.05688	0.01885	0.01756	0.005115	22.54	16.67	152.20	1575.0	0.1374	0.2050	0.4000	0.1625	0.2364	0.07678

Wine Dataset¶

from sklearn.datasets import load_wine

wine = load_wine()

for line in wine.DESCR.split("\n")[5:29]:
    print(line)

wine_df = pd.DataFrame(data=wine.data, columns = wine.feature_names)
wine_df["WineType"] = wine.target

wine_df.head()

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2

	alcohol	malic_acid	ash	alcalinity_of_ash	magnesium	total_phenols	flavanoids	nonflavanoid_phenols	proanthocyanins	color_intensity	hue	od280/od315_of_diluted_wines	proline
0	14.23	1.71	2.43	15.6	127.0	2.80	3.06	0.28	2.29	5.64	1.04	3.92	1065.0
1	13.20	1.78	2.14	11.2	100.0	2.65	2.76	0.26	1.28	4.38	1.05	3.40	1050.0
2	13.16	2.36	2.67	18.6	101.0	2.80	3.24	0.30	2.81	5.68	1.03	3.17	1185.0
3	14.37	1.95	2.50	16.8	113.0	3.85	3.49	0.24	2.18	7.80	0.86	3.45	1480.0
4	13.24	2.59	2.87	21.0	118.0	2.80	2.69	0.39	1.82	4.32	1.04	2.93	735.0

2. XGBoost Estimators at High-Level (High-Level API)¶

Below, we have listed important estimators provided by XGBoost to perform classification and regression tasks.

Booster - It's a universal estimator which can handle both classification and regression datasets depending on the parameters we give it. We can create it by calling the train() function of the XGBoost library.
XGBRegressor - It is an estimator with scikit-learn like API designed to work with regression datasets.
XGBClassifier - It is an estimator with scikit-learn like API designed to work with classification datasets.
XGBRFRegressor - It is an estimator with scikit-learn like API and random forest implementation designed to work with regression datasets.
XGBRFClassifier - It is an estimator with scikit-learn like API and random forest implementation designed to work with classification datasets.

We'll now explain estimators one by one with examples.

3. Core API: Booster Estimator ¶

As a part of this section, we'll explain the core API of xgboost, covering the estimator it provides. We'll also explain the parameters of this estimator as well as the important attributes and methods available through it.

3.1 Booster: Regression Example ¶

We'll start with the creation of a simple estimator for the regression task of predicting prices of houses in Boston. We'll explain how we can use API to create an estimator with default parameters which will just work fine. We'll then explain various parameters available for different purposes.

3.1.1 Divide Data into Train and Test Sets¶

We'll first divide Boston dataset into train (90%) and test (10%) datasets using sklearn's function train_test_split().

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

((455, 13), (51, 13), (455,), (51,))

3.1.2 DMatrix: XGBoost Data Structure to Represent Data¶

The default xgboost API only accepts a dataset that is wrapped in a DMatrix. DMatrix is an internal data structure of xgboost that wraps both data features and labels. It's designed to be memory-efficient and to speed up the training process.

We can create a DMatrix instance by setting a list of the below parameters. Only the data parameter is required and all others are optional.

data - This parameter accepts one of the below as input which has values for data features.
- pandas dataframe
- numpy array
- scipy sparse matrix
- path to libsvm format text file
- libsvm format text
label - It accepts a numpy array or pandas data frame containing labels of the dataset.
missing - It accepts a float value in the dataset which should be treated as a missing value. The default is "None" meaning that "np.nan" is considered missing.
feature_names - It accepts a list of strings specifying feature names of data.
feature_types - It accepts a list of strings specifying feature data types.
nthread - It accepts an integer specifying the number of threads to use when loading data. The value of -1 uses all available threads on the system.

Below we have created train DMatrix and test DMatrix using numpy arrays of features data and labels. We have also passed feature names to the constructor.

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

dmat_train, dmat_test

(<xgboost.core.DMatrix at 0x7f7416cf0240>,
 <xgboost.core.DMatrix at 0x7f7416cf0208>)

3.1.3 "train()": Train Model¶

The simplest way of creating a booster using xgboost is by calling the train() function of xgboost. The train() function returns an instance of class xgboost.core.Booster after training is completed. We need to pass parameters for the boosting algorithm to train() as a dictionary.

Below we have given a list of important parameters of the train() method. Only params and dtrain are required and all other parameters are optional and have default values set to them.

params - It accepts a dictionary of gradient boosting algorithm parameters. We can even give it an empty dictionary and it'll use the default value for every parameter. By default, it treats the task as a regression task and evaluates RMSE. We need to specify at least an objective function if we want it to treat the task as classification.
dtrain - It accepts a DMatrix instance holding the train data.
num_boost_round - It accepts an integer specifying the number of boosting rounds. The default is 10. By default, each round adds one tree to the ensemble (one tree per class for multi-class objectives).
evals - We can provide a list of (DMatrix, name) tuples specifying datasets to be evaluated during training. In the example below, we have passed our train and test datasets as evaluation sets, hence RMSE for each is printed after every iteration.
obj - We can give a customized objective function which will be used as the loss when training the algorithm.
feval - We can give a customized evaluation function that will be used to evaluate datasets given to evals.
maximize - It accepts a boolean specifying whether the evaluation metric computed by feval should be maximized (True) or minimized (False).
early_stopping_rounds - It accepts an integer that instructs the algorithm to stop training if the evaluation metric on the last dataset in evals has not improved for that many consecutive rounds. This parameter requires us to provide the evals parameter for it to work.
evals_result - We can provide an empty dictionary to this parameter and it'll store evaluation results in it.
verbose_eval - It accepts a bool or an integer specifying whether to print evaluation results. An integer value n greater than 0 prints evaluation results every n iterations.
callbacks - It accepts a list of callbacks that are applied at the end of each iteration of the training process.

Below we have called the train() method of xgboost by passing it a few parameters for boosting algorithm, train data for training, and evaluation set of training and test dataset on which evaluation after each iteration will happen.

booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:squarederror'},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

booster

[0]	train-rmse:3.94894	test-rmse:3.59159
[1]	train-rmse:3.37195	test-rmse:3.26373
[2]	train-rmse:3.09769	test-rmse:3.12218
[3]	train-rmse:2.78200	test-rmse:2.94107
[4]	train-rmse:2.53499	test-rmse:2.75222
[5]	train-rmse:2.37140	test-rmse:2.78515
[6]	train-rmse:2.23286	test-rmse:2.64519
[7]	train-rmse:2.16047	test-rmse:2.64290
[8]	train-rmse:2.03129	test-rmse:2.58895
[9]	train-rmse:1.96511	test-rmse:2.61442

<xgboost.core.Booster at 0x7f7416cedcf8>

3.1.4 "predict()": Make Predictions¶

We can use the predict() method of booster instance to predict labels for data passed to it. The predict() method requires us to pass the DMatrix instance only.

The predict method provides a list of the below important parameters that can be useful in different situations.

data - It accepts DMatrix of feature values.
ntree_limit - It accepts an integer specifying the number of trees from the ensemble to use when making a prediction. The default is 0 which means to use all trees. Please note that newer xgboost releases deprecate this parameter in favor of iteration_range.
pred_leaf - It accepts a boolean which if set to True returns an array of size n_samples x n_trees where each entry is the index of the leaf that the sample falls into in that tree. The entry (0,1) refers to the index of the leaf in the 2nd tree which was used to make a prediction for the first sample. The default is False.
pred_contribs - It accepts a boolean which if set to True returns an array of size n_samples x (n_features + 1) where each entry specifies the contribution of a feature to the final prediction for that sample. These contributions are referred to as SHAP values, and the last column holds the bias term. If we add all values for a particular sample then we get the actual prediction. The default is False.
pred_interactions - It accepts a boolean which if set to True returns an array of size n_samples x (n_features + 1) x (n_features + 1) holding the SHAP interaction values of features for each sample.

Below we have created a data frame showing the first 10 actual test labels and 10 predicted labels for test data.

pd.DataFrame({ "Actuals":Y_test[:10], "Prediction":booster.predict(dmat_test)[:10]})

	Actuals	Prediction
0	23.6	25.580267
1	32.4	31.743393
2	13.6	13.508162
3	22.8	23.470869
4	16.1	13.658171
5	20.0	22.350372
6	17.8	17.217281
7	14.0	14.332675
8	19.6	20.501831
9	16.8	20.756474

Below we have retrieved shap values for our test samples. We have even summed up shap values for each sample to calculate the final prediction which is the same as the actual prediction printed above.

If you are interested in learning about the SHAP python library which provides various methods for calculating SHAP values and different types of plots to interpret them then please feel free to check our tutorial on the same.

SHAP - Explain Machine Learning Model Predictions using Game-Theoretic Approach

shap_values = booster.predict(dmat_test, pred_contribs=True)

print("SHAP Values Size : ", shap_values.shape)

print("\nSample SHAP Values : ",shap_values[0])
print("\nSumming SHAP Values for Prediction : ",shap_values.sum(axis=1)[:5]) # First 5 preds are only printed

SHAP Values Size :  (51, 14)

Sample SHAP Values :  [ 5.31424880e-01  0.00000000e+00  3.62157822e-04  1.90089308e-02
  1.01445103e+00 -2.51514196e+00 -5.74439168e-01  2.83589065e-01
  4.30885423e-03 -2.59072632e-01  3.69396627e-01  1.22908555e-01
  3.89043856e+00  2.26930332e+01]

Summing SHAP Values for Prediction :  [25.580269 31.743395 13.508162 23.470871 13.658173]

booster.predict(dmat_test, pred_leaf=True)[:5]

array([[ 8, 11, 11, 13, 11,  8, 11,  5,  9, 13],
       [ 8,  7, 11,  8, 12,  8,  8,  5,  7, 13],
       [13, 11, 11, 12,  7, 13, 11,  6, 13, 11],
       [ 8, 11, 11, 13,  7, 12,  8,  5,  7, 13],
       [14, 11, 11, 14, 11, 13, 11,  5, 13, 14]], dtype=int32)

shap_interactions = booster.predict(dmat_test, pred_interactions=True)

print("SHAP Interactions Size : ", shap_interactions.shape)

SHAP Interactions Size :  (51, 14, 14)

3.1.5 Evaluate Model Performance¶

We can explicitly evaluate a dataset using a trained booster instance with the help of the eval() method. It'll evaluate the dataset and return the evaluation metric for it as a formatted string (RMSE here, which is the default metric for reg:squarederror). Below we are using the eval() method on the train and test DMatrix to get RMSE for both.

print("Train RMSE : ",booster.eval(dmat_train))
print("Test  RMSE : ",booster.eval(dmat_test))

Train RMSE :  [0]	eval-rmse:1.965108
Test  RMSE :  [0]	eval-rmse:2.614419

Below we have evaluated the R2 score for train and test datasets using the r2_score() function of sklearn. We have then evaluated the R2 score based on using only 5 trees from the ensemble rather than using all trees.

Scikit-learn provides many commonly used machine learning metrics for evaluating model performance on regression, classification, and clustering tasks. Please feel free to check below link if you want to learn about them.

Scikit-Learn: Model Evaluation Metrics/Scoring Functions

from sklearn.metrics import r2_score

print("Test  R2 Score : %.2f"%r2_score(Y_test, booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, booster.predict(dmat_train)))

Test  R2 Score : 0.89
Train R2 Score : 0.96

print("Number of Trees in Ensemble : ",booster.best_ntree_limit)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, booster.predict(dmat_test, ntree_limit=5)))
print("Train R2 Score : %.2f"%r2_score(Y_train, booster.predict(dmat_train, ntree_limit=5)))

Number of Trees in Ensemble :  10

Test  R2 Score : 0.88
Train R2 Score : 0.93

3.1.6 Visualize Features Importances using "plot_importance()"¶

XGBoost provides functionality that lets us plot feature importance. We need to pass our booster instance to the plot_importance() function and it'll plot a feature importance bar chart using matplotlib. The function has an important parameter named importance_type which accepts one of the below-mentioned 3 string values to plot feature importance in three different ways.

weight - It plots the number of times a feature is used to split the data across all trees. This is the default value.
gain - It plots the average gain of splits that use the feature.
cover - It plots the average coverage of splits that use the feature.

with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(9,6))
    ax = fig.add_subplot(111)
    xgb.plotting.plot_importance(booster, ax=ax, height=0.6, importance_type="weight")

Visualize Individual Boosted Tree using "plot_tree()"¶

Xgboost also lets us plot the individual trees in the ensemble of trees using the plot_tree() method. It accepts booster instance and index of a tree which we want to plot. Below we have plotted the 10th tree of an ensemble. Please make a note that indexing starts at 0.

with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(25,10))
    ax = fig.add_subplot(111)
    xgb.plotting.plot_tree(booster, ax=ax, num_trees=9)

Visualize Feature Values Split Histogram using "get_split_value_histogram()"¶

The get_split_value_histogram() method returns a histogram of the split values used for a feature. Below we have created a split value histogram for the feature LSTAT. It gives us each split value and how many times a split has happened at that value.

booster.get_split_value_histogram("LSTAT")

	SplitValue	Count
0	7.182500	2.0
1	9.530000	1.0
2	11.877500	4.0
3	16.572500	2.0
4	18.920001	1.0
5	21.267501	1.0
6	30.657501	1.0
7	33.005001	1.0

Convert Trees to Dataframe using "trees_to_dataframe()"¶

The trees_to_dataframe() method dumps information on all trees in the ensemble as a pandas dataframe. It has information about each tree, like individual node IDs, the feature name and value used for the split at each node, the gain at each node, the cover at each node, etc.

booster.trees_to_dataframe()

	Tree	Node	ID	Feature	Split	Yes	No	Missing	Gain	Cover
0	0	0	0-0	LSTAT	9.72500	0-1	0-2	0-1	16866.609400	455.0
1	0	1	0-1	RM	6.94100	0-3	0-4	0-3	6006.859380	196.0
2	0	2	0-2	LSTAT	16.21500	0-5	0-6	0-5	2317.007810	259.0
3	0	3	0-3	DIS	1.48495	0-7	0-8	0-7	562.812500	129.0
4	0	4	0-4	RM	7.43700	0-9	0-10	0-9	496.929688	67.0
...	...	...	...	...	...	...	...	...	...	...
133	9	10	9-10	Leaf	NaN	NaN	NaN	NaN	0.277203	2.0
134	9	11	9-11	Leaf	NaN	NaN	NaN	NaN	0.477180	104.0
135	9	12	9-12	Leaf	NaN	NaN	NaN	NaN	-1.080109	11.0
136	9	13	9-13	Leaf	NaN	NaN	NaN	NaN	0.046793	249.0
137	9	14	9-14	Leaf	NaN	NaN	NaN	NaN	-0.831588	78.0

138 rows × 10 columns

3.2 Important Parameters of Boosting (train()) ¶

NOTE: Please feel free to skip this section if you are in a hurry. It is a theoretical section listing parameters of the "train()" function. You can refer to them later when you need to tweak a model.

Below we have given a list of important parameters of the boosting algorithm which we can pass as a dictionary to the params parameter of the "train()" function as well as to other XGBoost models (XGBRegressor, XGBClassifier, etc) explained below.

booster - It specifies which gradient boosting algorithm to use for training. Below is a list of possible options.
- gbtree - It’s a tree-based algorithm. Default.
- gblinear - It’s a linear function based algorithm.
- dart - It’s a tree-based algorithm that additionally applies dropout to the trees.
eta - It accepts a float in the range [0,1] specifying the learning rate for the training process. The default is 0.3.
tree_method - It accepts a string specifying the tree construction algorithm. Below is a list of possible options.
- auto - It automatically chooses the algorithm based on dataset size. For small datasets it uses exact, and for larger datasets it uses approx.
- exact - It specifies the exact greedy algorithm, which tries all possible splits when creating trees.
- approx - It’s an approximate greedy algorithm that uses quantile sketch and gradient histogram.
- hist - It’s a faster histogram-optimized approximate greedy algorithm.
- gpu_hist - It’s a GPU implementation of hist.
max_depth - It accepts an integer specifying the maximum depth of the tree. The default is 6.
gamma - It accepts a float specifying the minimum loss reduction required to make a further partition on a leaf node of the tree during training. The default is 0.
subsample - It accepts a float in the range (0,1] specifying the sub-sample ratio of training samples. A value of 0.5 makes xgboost randomly sample half of the training data before growing trees, which can help prevent overfitting.
sampling_method - This parameter accepts one of the below string as a sampling method to draw sub-samples.
- uniform - Default
- gradient_based
lambda - It accepts float specifying L2 regularization term on weights. The default is 1.
alpha - It accepts float specifying L1 regularization term on weights. The default is 0.
max_bin - It accepts an integer specifying the maximum number of bins used to bucket continuous features. The default is 256. A larger value improves split quality at the expense of more computation time.
monotone_constraints - It accepts a tuple of integers of length n_features. Each entry in the tuple is 1, 0, or -1, specifying an increasing, no, or decreasing monotonic relation of a feature with the target. It only works with tree_method set to one of exact, hist, or gpu_hist.
interaction_constraints - It accepts a list of lists, where each inner list holds the indexes of features that are allowed to interact with each other when creating trees. If we don't provide this constraint then all features are allowed to interact with one another.
tweedie_variance_power - It accepts float in the range (1,2) that controls variance of Tweedie distribution. The default value is 1.5.
objective - It accepts string specifying objective/loss function to use for training. The default value is reg:squarederror. Below are some of the commonly used values. Please visit this link to check a list of all objective functions available.
- reg:squarederror
- reg:squaredlogerror
- reg:logistic - Logistic Regression
- binary:logistic - Logistic Regression for Binary Classification. Outputs probability.
- multi:softmax - Multi-class classification using the softmax function. Outputs the predicted class.
- multi:softprob - It’s the same as softmax but outputs the probability of each class.
- reg:tweedie - It’s tweedie regression with log-link.
eval_metric - It accepts string value specifying metric which will be used to evaluate evaluation sets passed to evals parameter. Below is a list of commonly used values. Please visit this link to check a list of all evaluation metrics available.
- rmse - Root Mean Squared Error
- rmsle - Root Mean Squared Log Error
- mae - Mean Absolute Error
- logloss - Negative Log-likelihood
- auc - Area Under the ROC Curve
- error - Binary Classification error rate (no_wrong_preds/total_samples).
num_class - It's an integer specifying the number of classes for a multi-class classification problem. We need to provide this when the objective is set to multi:softmax or multi:softprob.
nthread - It specifies the number of threads to use to run xgboost.
verbosity - It accepts one of the below integers for printing messages during training.
- 0 - Silent
- 1 - Warning
- 2 - Info
- 3 - Debug

Please make a NOTE that this is not a list of all parameters of the estimator but a list of important parameters that are commonly tuned by practitioners. Please visit the below link to learn about all parameters available with xgboost.

XGBoost Important Parameters

3.3 Booster: Tweedie Regression Example ¶

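NOTE: Please feel free to skip this section if you are in a hurry and have understood how to perform regression from the previous section. It explains the usage of a different objective/loss function.

Below we have shown an example of how we can use Tweedie regression on the Boston housing data. We have trained the model using Tweedie regression and then evaluated it on both train and test datasets. Please note that eval() reports the default metric of the objective, which for reg:tweedie is the Tweedie negative log-likelihood (tweedie-nloglik) rather than RMSE, so the lines labeled "RMSE" in the output below actually show that metric. The R2 scores are calculated from predictions as before.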

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie', 'tree_method':'hist', 'nthread':4},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

print("\nTrain RMSE : ",tweedie_booster.eval(dmat_train))
print("Test  RMSE : ",tweedie_booster.eval(dmat_test))

from sklearn.metrics import r2_score

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	train-tweedie-nloglik@1.5:28.32970	test-tweedie-nloglik@1.5:26.66488
[1]	train-tweedie-nloglik@1.5:19.30740	test-tweedie-nloglik@1.5:18.58394
[2]	train-tweedie-nloglik@1.5:18.72894	test-tweedie-nloglik@1.5:18.14010
[3]	train-tweedie-nloglik@1.5:18.71592	test-tweedie-nloglik@1.5:18.13065
[4]	train-tweedie-nloglik@1.5:18.70913	test-tweedie-nloglik@1.5:18.12305
[5]	train-tweedie-nloglik@1.5:18.70438	test-tweedie-nloglik@1.5:18.12354
[6]	train-tweedie-nloglik@1.5:18.70052	test-tweedie-nloglik@1.5:18.11985
[7]	train-tweedie-nloglik@1.5:18.69816	test-tweedie-nloglik@1.5:18.12131
[8]	train-tweedie-nloglik@1.5:18.69564	test-tweedie-nloglik@1.5:18.12422
[9]	train-tweedie-nloglik@1.5:18.69303	test-tweedie-nloglik@1.5:18.12833

Train RMSE :  [0]	eval-tweedie-nloglik@1.5:18.693033
Test  RMSE :  [0]	eval-tweedie-nloglik@1.5:18.128325

Test  R2 Score : 0.90
Train R2 Score : 0.95

3.4 Booster: Binary Classification Example ¶

As a part of this section, we have explained how we can use the train() method to train a booster for the binary classification task of classifying breast cancer tumor types. Please make a note that we have used binary:logistic as our objective function, hence the output of the predict() method of the booster will be a probability. We have included logic to convert probabilities into classes. We have then calculated accuracy, confusion matrix, and classification report for test data. The values printed under the "RMSE" labels come from the eval() method, which reports the evaluation metric of the booster (the classification error rate in the output below), not RMSE.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target, train_size=0.90, stratify=breast_cancer.target, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=breast_cancer.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=breast_cancer.feature_names)

booster = xgb.train({'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

print("\nTrain RMSE : ",booster.eval(dmat_train))
print("Test  RMSE : ",booster.eval(dmat_test))

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

train_preds = [1 if pred>0.5 else 0 for pred in booster.predict(data=dmat_train)]
test_preds = [1 if pred>0.5 else 0 for pred in booster.predict(data=dmat_test)]

print("\nTest  Accuracy : %.2f"%accuracy_score(Y_test, test_preds))
print("Train Accuracy : %.2f"%accuracy_score(Y_train, train_preds))

print("\nConfusion Matrix : ")
print(confusion_matrix(Y_test, test_preds))

print("\nClassification Report : ")
print(classification_report(Y_test, test_preds))

Train/Test Sizes :  (512, 30) (57, 30) (512,) (57,)

[0]	train-error:0.05273	test-error:0.10526
[1]	train-error:0.02344	test-error:0.07018
[2]	train-error:0.01953	test-error:0.07018
[3]	train-error:0.02148	test-error:0.05263
[4]	train-error:0.00977	test-error:0.05263
[5]	train-error:0.00781	test-error:0.05263
[6]	train-error:0.00586	test-error:0.07018
[7]	train-error:0.00195	test-error:0.03509
[8]	train-error:0.00195	test-error:0.08772
[9]	train-error:0.00195	test-error:0.03509

Train RMSE :  [0]	eval-error:0.001953
Test  RMSE :  [0]	eval-error:0.035088

Test  Accuracy : 0.96
Train Accuracy : 1.00

Confusion Matrix :
[[20  1]
 [ 1 35]]

Classification Report :
              precision    recall  f1-score   support

           0       0.95      0.95      0.95        21
           1       0.97      0.97      0.97        36

    accuracy                           0.96        57
   macro avg       0.96      0.96      0.96        57
weighted avg       0.96      0.96      0.96        57

Below we have plotted feature importance for the booster trained on the breast cancer dataset. We have plotted the average gain of the splits that use each feature. Please feel free to look at the data frame retrieved using the trees_to_dataframe() method.

with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(9,6))
    ax = fig.add_subplot(111)
    xgb.plotting.plot_importance(booster, ax=ax, height=0.6, importance_type="gain")

We have now plotted the 3rd tree from the ensemble below (num_trees=2, as tree indices start at 0).

with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(15,10))
    ax = fig.add_subplot(111)
    xgb.plotting.plot_tree(booster, ax=ax, num_trees=2)

3.5 Booster: Multi-Class Classification Example ¶

NOTE: Please feel free to skip this section if you are in a hurry and have understood how to perform classification from the previous binary classification section.

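As a part of this section, we have explained how we can use the train() method for multi-class classification problems. We have used it to generate a booster trained on the wine classification train dataset. We have then evaluated the accuracy, confusion matrix, and classification report on the test dataset. As in the previous sections, the values printed under the "RMSE" labels come from eval() and are the multi-class error rate (merror), not RMSE.

We have then plotted the feature importance bar chart and the decision tree at index 1 (num_trees=1, i.e. the 2nd tree of the ensemble).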

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target, train_size=0.80, stratify=wine.target, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=wine.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=wine.feature_names)

booster = xgb.train({'max_depth': 5, 'eta': 1, 'objective': 'multi:softmax', 'num_class':3},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

print("\nTrain RMSE : ",booster.eval(dmat_train))
print("Test  RMSE : ",booster.eval(dmat_test))

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

print("\nTest  Accuracy : %.2f"%accuracy_score(Y_test, booster.predict(data=dmat_test)))
print("Train Accuracy : %.2f"%accuracy_score(Y_train, booster.predict(data=dmat_train)))

print("\nConfusion Matrix : ")
print(confusion_matrix(Y_test, booster.predict(data=dmat_test)))

print("\nClassification Report : ")
print(classification_report(Y_test, booster.predict(data=dmat_test)))

Train/Test Sizes :  (142, 13) (36, 13) (142,) (36,)

[0]	train-merror:0.00000	test-merror:0.05556
[1]	train-merror:0.00000	test-merror:0.05556
[2]	train-merror:0.00000	test-merror:0.02778
[3]	train-merror:0.00000	test-merror:0.02778
[4]	train-merror:0.00000	test-merror:0.00000
[5]	train-merror:0.00000	test-merror:0.02778
[6]	train-merror:0.00000	test-merror:0.00000
[7]	train-merror:0.00000	test-merror:0.02778
[8]	train-merror:0.00000	test-merror:0.02778
[9]	train-merror:0.00000	test-merror:0.02778

Train RMSE :  [0]	eval-merror:0.000000
Test  RMSE :  [0]	eval-merror:0.027778

Test  Accuracy : 0.97
Train Accuracy : 1.00

Confusion Matrix :
[[12  0  0]
 [ 0 14  0]
 [ 0  1  9]]

Classification Report :
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       0.93      1.00      0.97        14
           2       1.00      0.90      0.95        10

    accuracy                           0.97        36
   macro avg       0.98      0.97      0.97        36
weighted avg       0.97      0.97      0.97        36

with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(9,6))
    ax = fig.add_subplot(111)
    xgb.plotting.plot_importance(booster, ax=ax, height=0.6, importance_type="weight")

with plt.style.context("ggplot"):
    fig = plt.figure(figsize=(20,10))
    ax = fig.add_subplot(111)
    xgb.plotting.plot_tree(booster, ax=ax, num_trees=1)

3.6 Saving and Loading Trained Model ¶

As a part of this section, we have explained how we can save the trained xgboost model to disk and then load it to make predictions again in the future.

Below is a list of available methods that can be used to save the model in different formats.

save_model(file_name) - It saves the model to a file. Older releases write xgboost's internal binary format, whereas newer releases can choose the format from the file extension (for example .json).
save_config() - It outputs the booster configuration as a JSON string which can be saved to a JSON file. We can later apply the same parameter configuration to a booster using this file.
save_raw() - It returns a byte array object which is the in-memory representation of the booster instance.

Below is a list of available methods that can be used to load the saved model.

load_model(file_name) - It accepts file name or byte array from which trained model can be loaded.
load_config() - It accepts JSON string generated by save_config() to load model with same configuration.

Below we have saved our multi-class classification model which we created in the previous example. We have then reloaded the model and made predictions using it for verification.

booster.save_model("multiclass_classification.model")

loaded_booster =  xgb.Booster()
loaded_booster

<xgboost.core.Booster at 0x7f7460ae5780>

loaded_booster.load_model("multiclass_classification.model")

pd.DataFrame({"Preds":booster.predict(dmat_test)[:5], "Loaded Model Preds":loaded_booster.predict(dmat_test)[:5]})

	Preds	Loaded Model Preds
0	0.0	0.0
1	2.0	2.0
2	0.0	0.0
3	1.0	1.0
4	1.0	1.0

We can even load the model by using the Booster() class giving it the file name as a part of the model_file parameter.

loaded_booster1 =  xgb.Booster(model_file="multiclass_classification.model")

pd.DataFrame({"Preds":booster.predict(dmat_test)[:5], "Loaded Model Preds":loaded_booster1.predict(dmat_test)[:5]})

	Preds	Loaded Model Preds
0	0.0	0.0
1	2.0	2.0
2	0.0	0.0
3	1.0	1.0
4	1.0	1.0

3.7 Cross Validation ¶

Xgboost lets us perform cross-validation on our dataset as well using the cv() method. The cv() method has almost the same parameters as the train() method, with a few extra parameters as mentioned below.

nfold - It accepts an integer specifying the number of folds to create from the dataset. The default is 3.
folds - It accepts a sklearn KFold, StratifiedKFold, ShuffleSplit, or StratifiedShuffleSplit instance.
metrics - It accepts a string or a list of strings specifying the metrics to evaluate.

Below we have performed cross-validation on the full Boston dataset for 10 rounds and 5 folds.

dmat_train = xgb.DMatrix(boston.data, boston.target, feature_names=boston.feature_names)

xgb.cv({'max_depth': 5, 'eta': 1, 'objective': 'reg:squarederror'}, dmat_train, num_boost_round=10, nfold=5)

	train-rmse-mean	train-rmse-std	test-rmse-mean	test-rmse-std
0	3.822509	0.207240	5.041322	0.709767
1	2.650213	0.148219	4.740111	0.665003
2	2.179826	0.093208	4.509482	0.673848
3	1.828081	0.080050	4.392651	0.675589
4	1.512701	0.056885	4.294071	0.450255
5	1.316335	0.024188	4.285634	0.437991
6	1.114541	0.050333	4.350706	0.418353
7	0.961021	0.054002	4.387452	0.434030
8	0.869441	0.049707	4.403180	0.421032
9	0.790959	0.063960	4.389377	0.445109

Below we have again performed cross-validation on the Boston dataset but this time we have passed a sklearn ShuffleSplit instance for creating folds. With its default settings, it creates 10 randomly shuffled train/test splits of the data.

from sklearn.model_selection import KFold, ShuffleSplit

shuffle_split = ShuffleSplit(random_state=123)

dmat_train = xgb.DMatrix(boston.data, boston.target, feature_names=boston.feature_names)

xgb.cv({'max_depth': 5, 'eta': 1, 'objective': 'reg:squaredlogerror'}, dmat_train, folds=shuffle_split)

	train-rmsle-mean	train-rmsle-std	test-rmsle-mean	test-rmsle-std
0	2.166785	0.006862	2.179977	0.064457
1	1.660311	0.006184	1.673285	0.062628
2	1.203496	0.005193	1.216123	0.059105
3	0.825498	0.003802	0.837564	0.051902
4	0.558500	0.003576	0.570968	0.040899
5	0.405827	0.004899	0.418653	0.036919
6	0.334471	0.008007	0.348108	0.039197
7	0.308396	0.010793	0.320564	0.043359
8	0.301879	0.011445	0.312817	0.046100
9	0.300311	0.011576	0.310560	0.047579

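Below we have performed cross-validation on the breast cancer dataset. We have instructed the cv() method to evaluate log loss, AUC, and error metrics for each iteration. Please make a note that the binary:logitraw objective outputs raw scores before the logistic transformation, so the log loss values below may not be comparable with those obtained using binary:logistic.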

dmat_train = xgb.DMatrix(breast_cancer.data,
                         breast_cancer.target,
                         feature_names=breast_cancer.feature_names)

## stratified is a boolean flag; the labels are taken from the DMatrix
xgb.cv({'max_depth': 3, 'eta': 1, 'objective': 'binary:logitraw'},
       dmat_train, stratified=True, nfold=5, metrics=["auc", "logloss", "error"])

	train-auc-mean	train-auc-std	train-logloss-mean	train-logloss-std	train-error-mean	train-error-std	test-auc-mean	test-auc-std	test-logloss-mean	test-logloss-std	test-error-mean	test-error-std
0	0.982511	0.003957	0.814563	0.177615	0.036921	0.009734	0.950616	0.018239	2.359281	0.590475	0.077384	0.007155
1	0.995670	0.002256	0.407830	0.056408	0.020655	0.002281	0.971417	0.015519	1.840956	0.755282	0.065117	0.016566
2	0.998588	0.000816	0.251895	0.053079	0.010540	0.003212	0.981969	0.012894	1.461477	0.831429	0.054652	0.023516
3	0.999566	0.000313	0.119303	0.108406	0.006146	0.002904	0.986684	0.013078	1.383230	0.909166	0.044125	0.024508
4	0.999913	0.000132	0.055061	0.069058	0.004390	0.003395	0.985918	0.013166	1.323538	0.760395	0.047665	0.027874
5	0.999975	0.000049	0.017522	0.033633	0.000877	0.001754	0.987003	0.012294	1.381233	0.761902	0.044125	0.022510
6	0.999992	0.000016	0.001264	0.002502	0.000877	0.001754	0.987900	0.009896	1.309201	0.746346	0.038815	0.019172
7	1.000000	0.000000	0.000359	0.000674	0.000439	0.000877	0.989445	0.007545	1.248183	0.666860	0.040554	0.016580
8	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.990555	0.007242	1.244237	0.668620	0.038815	0.021483
9	1.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.990723	0.007753	1.371507	0.702382	0.040554	0.019206

4. Sklearn Like API ¶

Xgboost provides estimators that have almost the same API as that of sklearn estimators. This helps developers with a sklearn background to grasp the usage of xgboost faster. It even lets us use xgboost models with sklearn's grid search functionality. As a part of this section, we'll explain 4 estimators available from xgboost which have the same API as sklearn's estimators.

XGBRegressor
XGBClassifier
XGBRFRegressor
XGBRFClassifier

4.1 XGBRegressor ¶

The XGBRegressor is an estimator that is used for regression problems. Its default objective function is reg:squarederror. It accepts the same parameters that we gave as a dictionary to the train() method, but we pass them directly to the constructor of XGBRegressor as keyword arguments.

4.1.1 Train Model, Make Predictions & Evaluate Model Performance¶

Below we have trained XGBRegressor on the Boston train data and then calculated the R2 score on both the test and train datasets. The score() method is available on estimators that have a sklearn-like API. It returns the R2 score for regression tasks and accuracy for classification tasks.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

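## Newer xgboost releases expect eval_metric in the constructor (passing it to fit() was deprecated in 1.6)
xgb_regressor = xgb.XGBRegressor(eval_metric="mae")

xgb_regressor.fit(X_train, Y_train, eval_set=[(X_test, Y_test)], verbose=10)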

print("Test  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

[0]	validation_0-mae:14.61328
[10]	validation_0-mae:1.86316
[20]	validation_0-mae:1.70020
[30]	validation_0-mae:1.62740
[40]	validation_0-mae:1.63325
[50]	validation_0-mae:1.62120
[60]	validation_0-mae:1.61760
[70]	validation_0-mae:1.62004
[80]	validation_0-mae:1.61866
[90]	validation_0-mae:1.62278
[99]	validation_0-mae:1.62320
Test  R2 Score : 0.93
Train R2 Score : 1.00

xgb_regressor.predict(X_test)[:5]

array([24.521688, 29.77457 , 14.518701, 22.433651, 17.031559],
      dtype=float32)

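Below we have printed the number of estimators that the model uses by default, the max depth of each tree, and the importance of individual features. Please make a note that max_depth is printed as None because we did not set it on the estimator; xgboost then falls back to its own default, which is documented as 6.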

print("Default Number of Estimators : ",xgb_regressor.n_estimators)
print("Default Max Depth of Trees   : ", xgb_regressor.max_depth)
print("Feature Importances : ")

pd.DataFrame([xgb_regressor.feature_importances_], columns=boston.feature_names)

Default Number of Estimators :  100
Default Max Depth of Trees   :  None
Feature Importances :

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.011552	0.001155	0.014551	0.00315	0.043485	0.242339	0.011518	0.056501	0.010146	0.032733	0.062321	0.012791	0.497758

4.1.2 Hyperparameters Tuning using Grid Search¶

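We have now explained how we can perform a grid search with XGBRegressor. We have tried different values of parameters n_estimators, max_depth, and eta to find the best performing values. Please make a note that the documented range of eta is [0,1], hence the values 2 and 3 in the grid fall outside it; the search below selects 0.5. We have then displayed the grid search results as a data frame as well.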

%%time

from sklearn.model_selection import GridSearchCV

params = {
        'n_estimators': [50,100],
        'max_depth': [None, 3, 5, 7, 9],
        'eta': [0.5, 1, 2, 3]
        }
grid_search = GridSearchCV(xgb.XGBRegressor(), params, n_jobs=-1)

grid_search.fit(X_train, Y_train)

print("Test  R2 Score : %.2f"%grid_search.score(X_test, Y_test))
print("Train R2 Score : %.2f"%grid_search.score(X_train, Y_train))

print("Best Params : ", grid_search.best_params_)
print("Feature Importances : ")
pd.DataFrame([grid_search.best_estimator_.feature_importances_], columns=boston.feature_names)

Test  R2 Score : 0.91
Train R2 Score : 1.00
Best Params :  {'eta': 0.5, 'max_depth': 5, 'n_estimators': 50}
Feature Importances :
CPU times: user 652 ms, sys: 85.1 ms, total: 738 ms
Wall time: 3.39 s

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.011688	0.003513	0.011462	0.002225	0.035786	0.147807	0.009096	0.040782	0.007239	0.026909	0.057599	0.014461	0.631432

grid_search_results = pd.DataFrame(grid_search.cv_results_)
print("Grid Search Size : ", grid_search_results.shape)
grid_search_results.head()

Grid Search Size :  (40, 14)

	mean_fit_time	std_fit_time	mean_score_time	std_score_time	param_eta	param_max_depth	param_n_estimators	params	split0_test_score	split1_test_score	split2_test_score	mean_test_score	std_test_score	rank_test_score
0	0.051536	0.018275	0.006071	0.006235	0.5	None	50	{'eta': 0.5, 'max_depth': None, 'n_estimators'...	0.887726	0.849283	0.879015	0.871992	0.016472	6
1	0.056037	0.022982	0.001785	0.000207	0.5	None	100	{'eta': 0.5, 'max_depth': None, 'n_estimators'...	0.887756	0.849392	0.879027	0.872043	0.016434	5
2	0.012587	0.000226	0.001103	0.000048	0.5	3	50	{'eta': 0.5, 'max_depth': 3, 'n_estimators': 50}	0.865275	0.870843	0.884393	0.873480	0.008021	3
3	0.023356	0.000537	0.001254	0.000021	0.5	3	100	{'eta': 0.5, 'max_depth': 3, 'n_estimators': 100}	0.867816	0.869590	0.880927	0.872760	0.005802	4
4	0.019933	0.001143	0.001214	0.000030	0.5	5	50	{'eta': 0.5, 'max_depth': 5, 'n_estimators': 50}	0.880565	0.874491	0.872728	0.875935	0.003357	1

xgb_regressor.get_booster() ## We can get Booster object using this method from sklearn estimators

<xgboost.core.Booster at 0x7f7460a810b8>

4.2 XGBClassifier ¶

The XGBClassifier is an estimator that is used for classification tasks. Its default objective function is binary:logistic. The parameters that we passed as a dictionary to the params parameter of the train() method can be passed directly to the constructor of XGBClassifier. We can get predicted classes using the predict() method and probabilities using the predict_proba() method. It even provides a score() method which lets us calculate the accuracy of the model on given data.

4.2.1 Train Model, Make Predictions & Evaluate Model Performance¶

Below we have trained XGBClassifier on the breast cancer train dataset. We have then evaluated accuracy on train and test datasets. We have also printed the first few predictions and probabilities.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                    stratify=breast_cancer.target,
                                                    train_size=0.90, random_state=42)

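## Newer xgboost releases expect eval_metric in the constructor (passing it to fit() was deprecated in 1.6)
xgb_classif = xgb.XGBClassifier(eval_metric="auc")

xgb_classif.fit(X_train, Y_train, eval_set=[(X_test, Y_test)], verbose=10)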

print("Test  Accuracy Score : %.2f"%xgb_classif.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%xgb_classif.score(X_train, Y_train))

[0]	validation_0-auc:0.97685
[10]	validation_0-auc:0.99339
[20]	validation_0-auc:0.99206
[30]	validation_0-auc:0.99206
[40]	validation_0-auc:0.98809
[50]	validation_0-auc:0.98809
[60]	validation_0-auc:0.98809
[70]	validation_0-auc:0.98809
[80]	validation_0-auc:0.98809
[90]	validation_0-auc:0.98809
[99]	validation_0-auc:0.98942
Test  Accuracy Score : 0.96
Train Accuracy Score : 1.00

xgb_classif.predict(X_test)[:5]

array([0, 1, 1, 0, 0])

print("Probabilities : ")
print(xgb_classif.predict_proba(X_test)[:5])
print("\nPrediction From Probabilities : ")
print(np.argmax(xgb_classif.predict_proba(X_test)[:5], axis=1))

Probabilities :
[[9.9962151e-01 3.7849427e-04]
 [7.4094534e-04 9.9925905e-01]
 [7.4838996e-03 9.9251610e-01]
 [9.9939799e-01 6.0198107e-04]
 [9.9195606e-01 8.0439411e-03]]

Prediction From Probabilities :
[0 1 1 0 0]

print("Default Number of Estimators : ",xgb_classif.n_estimators)
print("Default Max Depth of Trees   : ", xgb_classif.max_depth)
print("Feature Importances : ")
pd.DataFrame([xgb_classif.feature_importances_], columns=breast_cancer.feature_names)

Default Number of Estimators :  100
Default Max Depth of Trees   :  None
Feature Importances :

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	radius error	texture error	perimeter error	area error	smoothness error	compactness error	concavity error	concave points error	symmetry error	fractal dimension error	worst radius	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension
0	0.007508	0.018245	0.0	0.01407	0.005634	0.00113	0.00268	0.034978	0.001187	0.005131	0.01308	0.002079	0.0	0.005953	0.001658	0.006462	0.0	0.001915	0.000841	0.004829	0.449676	0.020751	0.218758	0.033382	0.007877	0.0	0.009276	0.124308	0.002986	0.005606

4.2.2 Hyperparameters Tuning using Grid Search¶

Below we have explained how we can use XGBClassifier with sklearn's grid search functionality to try a list of parameters to find the best parameter settings.

%%time

from sklearn.model_selection import GridSearchCV

params = {
        'n_estimators': [50,100,150,200,300,500],
        'max_depth': [None, 3, 5, 7, 9],
        'eta': [0.5, 1, 2, 3]
        }
grid_search = GridSearchCV(xgb.XGBClassifier(), params, n_jobs=-1, cv=5)

grid_search.fit(X_train, Y_train)

print("Test  Accuracy Score : %.2f"%grid_search.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%grid_search.score(X_train, Y_train))

print("Best Params : ", grid_search.best_params_)
print("Feature Importances : ")
pd.DataFrame([grid_search.best_estimator_.feature_importances_], columns=breast_cancer.feature_names)

Test  Accuracy Score : 0.98
Train Accuracy Score : 1.00
Best Params :  {'eta': 1, 'max_depth': None, 'n_estimators': 50}
Feature Importances :
CPU times: user 353 ms, sys: 3.51 ms, total: 357 ms
Wall time: 6.9 s

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	radius error	texture error	perimeter error	area error	smoothness error	compactness error	concavity error	concave points error	symmetry error	fractal dimension error	worst radius	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension
0	0.00544	0.014877	0.0	0.0	0.002808	0.00063	0.000494	0.052355	0.002261	0.000968	0.007919	0.002445	0.0	0.004377	0.000712	0.002277	0.001108	0.000309	0.000304	0.001782	0.462069	0.015647	0.268949	0.004184	0.002715	0.007969	0.021782	0.111749	0.003871	0.0

4.3 XGBRFRegressor ¶

4.3.1 Train Model, Make Predictions & Evaluate Model Performance¶

The XGBRFRegressor is a random forest implementation for regression tasks that is built from xgboost's decision trees. It has almost exactly the same API as XGBRegressor. Below we have explained its usage on the Boston housing dataset.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

xgb_rf_regressor = xgb.XGBRFRegressor()

xgb_rf_regressor.fit(X_train, Y_train)

print("Test  R2 Score : %.2f"%xgb_rf_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_rf_regressor.score(X_train, Y_train))

Test  R2 Score : 0.87
Train R2 Score : 0.96

print("Default Number of Estimators : ",xgb_rf_regressor.n_estimators)
print("Default Max Depth of Trees   : ", xgb_rf_regressor.max_depth)
print("Feature Importances : ")
pd.DataFrame([xgb_rf_regressor.feature_importances_], columns=boston.feature_names)

Default Number of Estimators :  100
Default Max Depth of Trees   :  None
Feature Importances :

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.027662	0.004731	0.04277	0.00403	0.08146	0.335912	0.015528	0.091729	0.010167	0.028006	0.039975	0.013578	0.304454

4.3.2 Hyperparameters Tuning using Grid Search¶

%%time

from sklearn.model_selection import GridSearchCV

params = {
        'n_estimators': [50,100,150,200,300,500],
        'max_depth': [None, 3, 5, 7, 9],
        'eta': [0.5, 1, 2, 3]
        }
grid_search = GridSearchCV(xgb.XGBRFRegressor(), params, n_jobs=-1, cv=5)

grid_search.fit(X_train, Y_train)

print("Test  R2 Score : %.2f"%grid_search.score(X_test, Y_test))
print("Train R2 Score : %.2f"%grid_search.score(X_train, Y_train))

print("Best Params : ", grid_search.best_params_)
print("Feature Importances : ")
pd.DataFrame([grid_search.best_estimator_.feature_importances_], columns=boston.feature_names)

Test  R2 Score : 0.88
Train R2 Score : 0.99
Best Params :  {'eta': 0.5, 'max_depth': 9, 'n_estimators': 100}
Feature Importances :
CPU times: user 1.42 s, sys: 33.9 ms, total: 1.45 s
Wall time: 14.4 s

	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
0	0.01547	0.0042	0.028811	0.004552	0.063577	0.299841	0.014431	0.089211	0.013981	0.035951	0.046551	0.014287	0.369136

4.4 XGBRFClassifier ¶

4.4.1 Train Model, Make Predictions & Evaluate Model Performance¶

The XGBRFClassifier is a random forest implementation for classification tasks that is built from xgboost's decision trees. It has almost exactly the same API as XGBClassifier. Below we have explained its usage on the breast cancer dataset.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                    stratify=breast_cancer.target,
                                                    train_size=0.90, random_state=42)

xgb_rf_classif = xgb.XGBRFClassifier()

xgb_rf_classif.fit(X_train, Y_train)

print("Test  Accuracy Score : %.2f"%xgb_rf_classif.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%xgb_rf_classif.score(X_train, Y_train))

Test  Accuracy Score : 0.95
Train Accuracy Score : 0.99

print("Default Number of Estimators : ",xgb_rf_classif.n_estimators)
print("Default Max Depth of Trees   : ", xgb_rf_classif.max_depth)
print("Feature Importances : ")
pd.DataFrame([xgb_rf_classif.feature_importances_], columns=breast_cancer.feature_names)

Default Number of Estimators :  100
Default Max Depth of Trees   :  None
Feature Importances :

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	radius error	texture error	perimeter error	area error	smoothness error	compactness error	concavity error	concave points error	symmetry error	fractal dimension error	worst radius	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension
0	0.003356	0.009529	0.009581	0.010629	0.006968	0.007762	0.009291	0.090449	0.00362	0.010503	0.004347	0.015605	0.00586	0.006079	0.001386	0.015551	0.007671	0.007575	0.005983	0.006225	0.157929	0.013715	0.232325	0.161624	0.01187	0.027882	0.01665	0.096775	0.017094	0.026166

4.4.2 Hyperparameters Tuning using Grid Search¶

%%time

from sklearn.model_selection import GridSearchCV

params = {
        'n_estimators': [50,100,150,200,300,500],
        'max_depth': [None, 3, 5, 7, 9],
        'eta': [0.5, 1, 2, 3]
        }
grid_search = GridSearchCV(xgb.XGBRFClassifier(), params, n_jobs=-1, cv=5)

grid_search.fit(X_train, Y_train)

print("Test  Accuracy Score : %.2f"%grid_search.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%grid_search.score(X_train, Y_train))

print("Best Params : ", grid_search.best_params_)
print("Feature Importances : ")
pd.DataFrame([grid_search.best_estimator_.feature_importances_], columns=breast_cancer.feature_names)

Test  Accuracy Score : 0.95
Train Accuracy Score : 0.99
Best Params :  {'eta': 0.5, 'max_depth': None, 'n_estimators': 150}
Feature Importances :
CPU times: user 836 ms, sys: 3.11 ms, total: 840 ms
Wall time: 25.4 s

	mean radius	mean texture	mean perimeter	mean area	mean smoothness	mean compactness	mean concavity	mean concave points	mean symmetry	mean fractal dimension	radius error	texture error	perimeter error	area error	smoothness error	compactness error	concavity error	concave points error	symmetry error	fractal dimension error	worst radius	worst texture	worst perimeter	worst area	worst smoothness	worst compactness	worst concavity	worst concave points	worst symmetry	worst fractal dimension
0	0.003453	0.009951	0.008917	0.010582	0.00692	0.007947	0.010473	0.067403	0.003711	0.010675	0.00435	0.014345	0.005435	0.006293	0.001694	0.013137	0.006933	0.00861	0.006133	0.005835	0.139899	0.01372	0.261298	0.164143	0.012468	0.033611	0.015302	0.103061	0.016882	0.026823

5. Early Stop Training to Avoid Overfitting ¶

Xgboost provides us with an option that lets us stop the training process if the evaluation metric is not improving for some specified number of iterations. We can set the early_stopping_rounds parameter of the train() method to an integer and it'll stop training if the metric on the evaluation set has not improved for that many rounds. When more than one evaluation set is passed to evals, the last one is used for early stopping, as the log below confirms.

Below we have instructed the train() method to train for at most 20 rounds using the num_boost_round parameter and set early_stopping_rounds to 5. The train() method will stop training if the test metric does not improve for 5 consecutive rounds.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie'},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")],
                    num_boost_round=20,
                    early_stopping_rounds=5)

print("\nTrain RMSE : ",tweedie_booster.eval(dmat_train))
print("Test  RMSE : ",tweedie_booster.eval(dmat_test))

from sklearn.metrics import r2_score

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	train-tweedie-nloglik@1.5:28.32970	test-tweedie-nloglik@1.5:26.66488
Multiple eval metrics have been passed: 'test-tweedie-nloglik@1.5' will be used for early stopping.

Will train until test-tweedie-nloglik@1.5 hasn't improved in 5 rounds.
[1]	train-tweedie-nloglik@1.5:19.30211	test-tweedie-nloglik@1.5:18.59277
[2]	train-tweedie-nloglik@1.5:18.73162	test-tweedie-nloglik@1.5:18.15825
[3]	train-tweedie-nloglik@1.5:18.71630	test-tweedie-nloglik@1.5:18.14867
[4]	train-tweedie-nloglik@1.5:18.70620	test-tweedie-nloglik@1.5:18.14152
[5]	train-tweedie-nloglik@1.5:18.70218	test-tweedie-nloglik@1.5:18.13844
[6]	train-tweedie-nloglik@1.5:18.69798	test-tweedie-nloglik@1.5:18.13924
[7]	train-tweedie-nloglik@1.5:18.69527	test-tweedie-nloglik@1.5:18.13543
[8]	train-tweedie-nloglik@1.5:18.69322	test-tweedie-nloglik@1.5:18.12734
[9]	train-tweedie-nloglik@1.5:18.69212	test-tweedie-nloglik@1.5:18.12620
[10]	train-tweedie-nloglik@1.5:18.69033	test-tweedie-nloglik@1.5:18.12475
[11]	train-tweedie-nloglik@1.5:18.68892	test-tweedie-nloglik@1.5:18.12602
[12]	train-tweedie-nloglik@1.5:18.68768	test-tweedie-nloglik@1.5:18.12688
[13]	train-tweedie-nloglik@1.5:18.68724	test-tweedie-nloglik@1.5:18.12563
[14]	train-tweedie-nloglik@1.5:18.68605	test-tweedie-nloglik@1.5:18.12537
[15]	train-tweedie-nloglik@1.5:18.68555	test-tweedie-nloglik@1.5:18.12502
Stopping. Best iteration:
[10]	train-tweedie-nloglik@1.5:18.69033	test-tweedie-nloglik@1.5:18.12475


Train RMSE :  [0]	eval-tweedie-nloglik@1.5:18.685551
Test  RMSE :  [0]	eval-tweedie-nloglik@1.5:18.125015

Test  R2 Score : 0.91
Train R2 Score : 0.97
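
The stopping rule visible in the log above can be sketched in a few lines of plain Python. This is only an illustration of the idea and not xgboost's internal code: the helper name early_stop_round is hypothetical, lower metric values are assumed to be better, and the scores are copied from the test column of the log.

```python
def early_stop_round(scores, patience):
    """Return (stop_round, best_round) for a metric where lower is better.

    Hypothetical helper: training stops once the metric has gone
    `patience` consecutive rounds without beating the best value so far.
    """
    best_round, best_score = 0, float("inf")
    for rnd, score in enumerate(scores):
        if score < best_score:
            best_round, best_score = rnd, score
        elif rnd - best_round >= patience:
            return rnd, best_round
    # Ran out of rounds before the patience was exhausted.
    return len(scores) - 1, best_round


# test-tweedie-nloglik@1.5 values for rounds 0..15, taken from the log above
test_scores = [26.66488, 18.59277, 18.15825, 18.14867, 18.14152, 18.13844,
               18.13924, 18.13543, 18.12734, 18.12620, 18.12475, 18.12602,
               18.12688, 18.12563, 18.12537, 18.12502]

stop_round, best_round = early_stop_round(test_scores, patience=5)
print(stop_round, best_round)
```

With these values the sketch stops at round 15 and reports round 10 as the best one, which agrees with the "Stopping. Best iteration" lines in the log.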

Below we have explained how we can stop training early with XGBRegressor. The evaluation dataset is given through the eval_set parameter of the fit() method.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

xgb_regressor = xgb.XGBRegressor(max_depth=3, eta=1, objective='reg:tweedie')

xgb_regressor.fit(X_train, Y_train,
                  eval_set=[(X_test, Y_test)], eval_metric="rmse",
                  early_stopping_rounds=5, verbose=5)

print("Test  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	validation_0-rmse:19.40547
Will train until validation_0-rmse hasn't improved in 5 rounds.
[5]	validation_0-rmse:3.15521
[10]	validation_0-rmse:2.48459
[15]	validation_0-rmse:2.40736
[20]	validation_0-rmse:2.40103
[25]	validation_0-rmse:2.46336
Stopping. Best iteration:
[21]	validation_0-rmse:2.38972

Test  R2 Score : 0.91
Train R2 Score : 0.98

6. Feature Interaction Constraints ¶

When xgboost builds a tree during the training process, it considers all feature interactions by default. Each node of a decision tree represents a split on a particular feature, and the nodes below it split further on other features, so features that appear along the same branch interact with one another. By default, any feature can be used in any node of the tree. We can restrict which features are allowed to appear together in a branch by giving a list of lists of feature indices as interaction_constraints in the parameters dictionary passed to the train() method. Each inner list is a group of feature indices that should interact only with one another and not with features outside the group.

Please feel free to go through this link to get in-depth details about feature interaction constraints in xgboost.

Below we have put features 0, 1, 2, 11, and 12 into one list, so these features can interact with one another within a branch of a tree but not with any other feature. The same goes for the other lists.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie',
                             'tree_method':'hist', 'nthread':4,
                             'interaction_constraints':[[0,1,2,11,12], [3, 4],[6,10], [5,9], [7,8]]},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

print("\nTrain RMSE : ",tweedie_booster.eval(dmat_train))
print("Test  RMSE : ",tweedie_booster.eval(dmat_test))

from sklearn.metrics import r2_score

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	train-tweedie-nloglik@1.5:28.32970	test-tweedie-nloglik@1.5:26.66487
[1]	train-tweedie-nloglik@1.5:19.31425	test-tweedie-nloglik@1.5:18.56556
[2]	train-tweedie-nloglik@1.5:18.75812	test-tweedie-nloglik@1.5:18.15946
[3]	train-tweedie-nloglik@1.5:18.73723	test-tweedie-nloglik@1.5:18.18582
[4]	train-tweedie-nloglik@1.5:18.72045	test-tweedie-nloglik@1.5:18.18661
[5]	train-tweedie-nloglik@1.5:18.71539	test-tweedie-nloglik@1.5:18.18151
[6]	train-tweedie-nloglik@1.5:18.71035	test-tweedie-nloglik@1.5:18.16513
[7]	train-tweedie-nloglik@1.5:18.70579	test-tweedie-nloglik@1.5:18.15365
[8]	train-tweedie-nloglik@1.5:18.70398	test-tweedie-nloglik@1.5:18.15155
[9]	train-tweedie-nloglik@1.5:18.69945	test-tweedie-nloglik@1.5:18.15776

Train RMSE :  [0]	eval-tweedie-nloglik@1.5:18.699446
Test  RMSE :  [0]	eval-tweedie-nloglik@1.5:18.157764

Test  R2 Score : 0.78
Train R2 Score : 0.94
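
One way to picture the constraint is as a rule on every root-to-leaf branch: all features that are split on along a branch must come from a single group. The check below is a plain-Python sketch of that rule, assuming non-overlapping groups as in the example above. The helper branch_allowed is hypothetical and not part of xgboost.

```python
def branch_allowed(branch_features, groups):
    """True if every feature on the branch belongs to one common group.

    Sketch for non-overlapping groups only (as in the example above).
    """
    used = set(branch_features)
    return any(used <= set(group) for group in groups)


groups = [[0, 1, 2, 11, 12], [3, 4], [6, 10], [5, 9], [7, 8]]

print(branch_allowed([0, 11, 2], groups))  # all three features sit in the first group
print(branch_allowed([0, 3], groups))      # features 0 and 3 come from different groups
```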

Below we have explained how we can use feature interaction constraints with XGBRegressor.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

xgb_regressor = xgb.XGBRegressor(max_depth=3, eta=1, objective='reg:tweedie',
                                 interaction_constraints=[[0,1,2,11,12], [3, 4],[6,10], [5,9], [7,8]])

xgb_regressor.fit(X_train, Y_train,
                  eval_set=[(X_test, Y_test)], eval_metric="rmse",
                  early_stopping_rounds=5, verbose=1)

print("Test  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	validation_0-rmse:19.40547
Will train until validation_0-rmse hasn't improved in 5 rounds.
[1]	validation_0-rmse:9.42298
[2]	validation_0-rmse:3.54240
[3]	validation_0-rmse:4.78789
[4]	validation_0-rmse:5.06168
[5]	validation_0-rmse:5.65695
[6]	validation_0-rmse:5.04555
[7]	validation_0-rmse:4.97560
Stopping. Best iteration:
[2]	validation_0-rmse:3.54240

Test  R2 Score : 0.80
Train R2 Score : 0.79

7. Monotonic Constraints ¶

Monotonic constraints let us specify whether the relation of a feature with the predicted target should be increasing, decreasing, or unconstrained. We set the monotone_constraints parameter to one value per feature: 1 for an increasing relation, -1 for a decreasing relation, and 0 for no constraint. Below we have explained the usage of monotonic constraints for regression problems using the Boston dataset.

Please feel free to check this link to better understand monotonic constraints.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie',
                             'tree_method':'hist', 'nthread':4,
                             'monotone_constraints':(1,0,1,-1,1,0,1,0,-1,1,1, -1, 1)},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

print("\nTrain RMSE : ",tweedie_booster.eval(dmat_train))
print("Test  RMSE : ",tweedie_booster.eval(dmat_test))

from sklearn.metrics import r2_score

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	train-tweedie-nloglik@1.5:28.33915	test-tweedie-nloglik@1.5:26.59607
[1]	train-tweedie-nloglik@1.5:19.35218	test-tweedie-nloglik@1.5:18.59604
[2]	train-tweedie-nloglik@1.5:18.79351	test-tweedie-nloglik@1.5:18.18378
[3]	train-tweedie-nloglik@1.5:18.75948	test-tweedie-nloglik@1.5:18.15357
[4]	train-tweedie-nloglik@1.5:18.75133	test-tweedie-nloglik@1.5:18.15053
[5]	train-tweedie-nloglik@1.5:18.74689	test-tweedie-nloglik@1.5:18.14669
[6]	train-tweedie-nloglik@1.5:18.74240	test-tweedie-nloglik@1.5:18.14553
[7]	train-tweedie-nloglik@1.5:18.73603	test-tweedie-nloglik@1.5:18.16522
[8]	train-tweedie-nloglik@1.5:18.73118	test-tweedie-nloglik@1.5:18.16656
[9]	train-tweedie-nloglik@1.5:18.72748	test-tweedie-nloglik@1.5:18.16702

Train RMSE :  [0]	eval-tweedie-nloglik@1.5:18.727484
Test  RMSE :  [0]	eval-tweedie-nloglik@1.5:18.167023

Test  R2 Score : 0.82
Train R2 Score : 0.88

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

xgb_regressor = xgb.XGBRegressor(max_depth=3, eta=1, objective='reg:tweedie',
                                 monotone_constraints=(1,0,1,-1,1,0,1,0,-1,1,1, -1, 1))

xgb_regressor.fit(X_train, Y_train,
                  eval_set=[(X_test, Y_test)], eval_metric="rmse",
                  early_stopping_rounds=5, verbose=1)

print("Test  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	validation_0-rmse:19.38521
Will train until validation_0-rmse hasn't improved in 5 rounds.
[1]	validation_0-rmse:9.61720
[2]	validation_0-rmse:4.16700
[3]	validation_0-rmse:3.83074
[4]	validation_0-rmse:4.35182
[5]	validation_0-rmse:4.52082
[6]	validation_0-rmse:4.44078
[7]	validation_0-rmse:4.20142
[8]	validation_0-rmse:4.24515
Stopping. Best iteration:
[3]	validation_0-rmse:3.83074

Test  R2 Score : 0.76
Train R2 Score : 0.78
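
Writing a 13-element tuple by hand is easy to get wrong, so it can help to build it from feature names. The sketch below is plain Python: build_monotone_constraints is a hypothetical helper, and the chosen directions for RM and LSTAT are illustrative assumptions rather than the values used in the examples above.

```python
# Feature order of the Boston housing dataset as exposed by scikit-learn.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def build_monotone_constraints(feature_names, directions):
    """Hypothetical helper: map {feature name: 1 or -1} to a full tuple.

    Features that are not mentioned get 0 (no constraint).
    """
    unknown = set(directions) - set(feature_names)
    if unknown:
        raise ValueError(f"Unknown features: {sorted(unknown)}")
    return tuple(directions.get(name, 0) for name in feature_names)


# Illustrative choice only: increasing in RM, decreasing in LSTAT.
constraints = build_monotone_constraints(feature_names, {"RM": 1, "LSTAT": -1})
print(constraints)
```

The resulting tuple has one entry per feature and can be passed as the monotone_constraints value in the same way as the hand-written tuples above.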

8. Custom Objective/Loss Function ¶

As a part of this section, we have explained how we can use a custom objective/loss function with xgboost. A custom objective receives the actual target values and the current predictions, and returns two arrays of the same length: the first derivative (gradient) and the second derivative (hessian) of the loss with respect to each prediction. Below we have created the mean squared error loss function and explained its usage with a simple example. We need to pass a reference to the function to the objective parameter of the estimator, which calls it as objective(y_true, y_pred).

def first_grad(y_true, y_pred):
    '''First derivative of the squared error (y_pred - y_true)**2 with respect to the prediction.'''
    return 2 * (y_pred - y_true)

def second_grad(y_true, y_pred):
    '''Second derivative of the squared error with respect to the prediction (a constant 2).'''
    return np.full_like(y_pred, 2.0)

def squared_error_objective(y_true, y_pred):
    '''Squared error objective. XGBRegressor calls a custom objective as objective(y_true, y_pred).'''
    grad = first_grad(y_true, y_pred)
    hess = second_grad(y_true, y_pred)
    return grad, hess

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

xgb_regressor = xgb.XGBRegressor(max_depth=3, eta=1, objective=squared_error_objective)  ## Custom Objective Function

xgb_regressor.fit(X_train, Y_train,
                  eval_set=[(X_test, Y_test)], eval_metric="mae",
                  early_stopping_rounds=5,
                  verbose=10)

print("\nTest  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	validation_0-mae:19.54971
Will train until validation_0-mae hasn't improved in 5 rounds.
[10]	validation_0-mae:15.58860
[20]	validation_0-mae:13.57056
[30]	validation_0-mae:11.92066
[40]	validation_0-mae:10.05207
[50]	validation_0-mae:9.34793
[60]	validation_0-mae:8.23015
[70]	validation_0-mae:7.45686
[80]	validation_0-mae:7.02828
[90]	validation_0-mae:6.30696
[99]	validation_0-mae:5.31595

Test  R2 Score : 0.27
Train R2 Score : 0.63
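
A quick way to sanity-check the derivatives of a custom objective is to compare them against finite differences of the loss. The sketch below does this for the squared error (pred - y)**2 in plain Python. All function names are hypothetical, and the check is independent of xgboost.

```python
def squared_error(y, pred):
    return (pred - y) ** 2

def grad(y, pred):
    """First derivative of (pred - y)**2 with respect to pred."""
    return 2.0 * (pred - y)

def hess(y, pred):
    """Second derivative of (pred - y)**2 with respect to pred."""
    return 2.0

def numeric_grad(y, pred, eps=1e-4):
    # Central difference approximation of the first derivative.
    return (squared_error(y, pred + eps) - squared_error(y, pred - eps)) / (2 * eps)

def numeric_hess(y, pred, eps=1e-4):
    # Central difference approximation of the second derivative.
    return (squared_error(y, pred + eps) - 2 * squared_error(y, pred)
            + squared_error(y, pred - eps)) / eps ** 2

y, pred = 24.0, 20.5
print(grad(y, pred), numeric_grad(y, pred))
print(hess(y, pred), numeric_hess(y, pred))
```

If the analytic and numeric values disagree by more than rounding error, the sign or the scale of the derivative is likely wrong.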

9. Custom Evaluation Functions ¶

Xgboost lets us create our own custom evaluation function as well. The function should accept the predictions and a DMatrix instance as parameters, calculate the metric from the predictions and the actual target values, and return a tuple of the metric name and its value. We have created a simple mean_absolute_error() function for explanation purposes.

We can pass a reference to the function to the feval parameter of train() to use it on the evaluation dataset.

def mean_absolute_error(preds, dmat):
    '''Mean absolute error between the predictions and the labels stored in the DMatrix.'''
    actuals = dmat.get_label()
    err = np.abs(actuals - preds).mean()
    return "MAE", err

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:squarederror'},
                    dmat_train,
                    evals=[(dmat_test, "test")],
                    feval=mean_absolute_error, ## Custom Evaluation Function
                    num_boost_round=10,
                    early_stopping_rounds=5)

print("\nTrain RMSE : ",booster.eval(dmat_train))
print("Test  RMSE : ",booster.eval(dmat_test))

from sklearn.metrics import r2_score

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	test-rmse:3.59159	test-MAE:26.53242
Multiple eval metrics have been passed: 'test-MAE' will be used for early stopping.

Will train until test-MAE hasn't improved in 5 rounds.
[1]	test-rmse:3.26373	test-MAE:15.24190
[2]	test-rmse:3.12218	test-MAE:18.01450
[3]	test-rmse:2.94107	test-MAE:4.76002
[4]	test-rmse:2.75222	test-MAE:2.55075
[5]	test-rmse:2.78515	test-MAE:-3.35190
[6]	test-rmse:2.64519	test-MAE:-2.30084
[7]	test-rmse:2.64290	test-MAE:-2.01779
[8]	test-rmse:2.58895	test-MAE:-9.34707
[9]	test-rmse:2.61442	test-MAE:-5.63952

Train RMSE :  [0]	eval-rmse:1.965108
Test  RMSE :  [0]	eval-rmse:2.614419

Test  R2 Score : 0.89
Train R2 Score : 0.96
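
For reference, the mean absolute error is the average of the absolute differences between actual and predicted values, so it can never be negative. The plain-Python sketch below follows the same shape as a custom evaluation function (predictions and a data object in, metric name and value out). LabelHolder is a hypothetical stand-in for DMatrix so that the snippet runs without xgboost.

```python
class LabelHolder:
    """Hypothetical stand-in for xgb.DMatrix: only get_label() is provided."""

    def __init__(self, labels):
        self._labels = list(labels)

    def get_label(self):
        return self._labels


def mae_metric(preds, dmat):
    """Return (metric name, value) in the form of a custom evaluation function."""
    actuals = dmat.get_label()
    total = sum(abs(a - p) for a, p in zip(actuals, preds))
    return "MAE", total / len(actuals)


name, value = mae_metric([22.0, 30.0, 15.0], LabelHolder([24.0, 28.0, 15.0]))
print(name, value)
```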

Below we have explained how we can use custom evaluation metrics with XGBRegressor. We need to set the eval_metric parameter of the fit() method with reference to the custom evaluation function.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

xgb_regressor = xgb.XGBRegressor(max_depth=3, eta=1, objective='reg:squarederror')

xgb_regressor.fit(X_train, Y_train,
                  eval_set=[(X_test, Y_test)], eval_metric=mean_absolute_error,
                  early_stopping_rounds=5,
                  verbose=5)

print("Test  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	validation_0-rmse:3.59159	validation_0-MAE:26.53242
Multiple eval metrics have been passed: 'validation_0-MAE' will be used for early stopping.

Will train until validation_0-MAE hasn't improved in 5 rounds.
[5]	validation_0-rmse:2.78515	validation_0-MAE:-3.35190
[10]	validation_0-rmse:2.57606	validation_0-MAE:-16.31485
[15]	validation_0-rmse:2.47347	validation_0-MAE:-23.60214
[20]	validation_0-rmse:2.43278	validation_0-MAE:-19.31291
Stopping. Best iteration:
[16]	validation_0-rmse:2.49018	validation_0-MAE:-24.94875

Test  R2 Score : 0.90
Train R2 Score : 0.97

10. Callbacks ¶

Xgboost provides us with a list of callback functions for different purposes which get executed after each iteration of training. Below is a list of the callbacks available as a part of the callback module of xgboost.

early_stop - It accepts an integer and stops training if the evaluation metric on the last evaluation set has not improved for that many iterations.
print_evaluation - It accepts an integer specifying how often to print evaluation results. The results are printed once every that many iterations.
record_evaluation - It accepts a dictionary in which evaluation results will be recorded.
reset_learning_rate - It lets us reset the learning rate after each iteration of training. It accepts either an array with one learning rate per iteration or a function that returns the new learning rate for each iteration.

We need to provide a list of callbacks to the callbacks parameter for their execution after each iteration.

Below we have explained usage of early_stop(), print_evaluation() and record_evaluation() callbacks for regression task.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)


early_stop_execution = xgb.callback.early_stop(5)
print_eval = xgb.callback.print_evaluation(1)
eval_results = {}
eval_results_callback = xgb.callback.record_evaluation(eval_results)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie'},
                    dmat_train,
                    evals=[(dmat_test, "test")],
                    num_boost_round=25,
                    verbose_eval=False,
                    callbacks=[early_stop_execution, print_eval, eval_results_callback])

print("Evaluation Results : ", eval_results)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

Will train until test-tweedie-nloglik@1.5 hasn't improved in 5 rounds.
[0]	test-tweedie-nloglik@1.5:26.66488
[1]	test-tweedie-nloglik@1.5:18.59277
[2]	test-tweedie-nloglik@1.5:18.15825
[3]	test-tweedie-nloglik@1.5:18.14867
[4]	test-tweedie-nloglik@1.5:18.14152
[5]	test-tweedie-nloglik@1.5:18.13844
[6]	test-tweedie-nloglik@1.5:18.13924
[7]	test-tweedie-nloglik@1.5:18.13543
[8]	test-tweedie-nloglik@1.5:18.12734
[9]	test-tweedie-nloglik@1.5:18.12620
[10]	test-tweedie-nloglik@1.5:18.12475
[11]	test-tweedie-nloglik@1.5:18.12602
[12]	test-tweedie-nloglik@1.5:18.12688
[13]	test-tweedie-nloglik@1.5:18.12563
[14]	test-tweedie-nloglik@1.5:18.12537
Stopping. Best iteration:
[10]	test-tweedie-nloglik@1.5:18.12475

Evaluation Results :  {'test': {'tweedie-nloglik@1.5': [26.664877, 18.592772, 18.158253, 18.148666, 18.141518, 18.138443, 18.139244, 18.135429, 18.127338, 18.126202, 18.124754, 18.12602, 18.126877, 18.125628, 18.12537]}}

Test  R2 Score : 0.91
Train R2 Score : 0.97
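
The dictionary filled by record_evaluation() is keyed first by the evaluation set name and then by the metric name, with one value per boosting round. A small sketch of pulling the best round out of it is shown below. The values are the ones printed above, and best_round is a hypothetical helper that assumes a lower metric value is better.

```python
eval_results = {'test': {'tweedie-nloglik@1.5': [
    26.664877, 18.592772, 18.158253, 18.148666, 18.141518, 18.138443,
    18.139244, 18.135429, 18.127338, 18.126202, 18.124754, 18.12602,
    18.126877, 18.125628, 18.12537]}}


def best_round(results, eval_name, metric_name):
    """Return (round index, value) of the lowest recorded metric value."""
    values = results[eval_name][metric_name]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx, values[idx]


print(best_round(eval_results, 'test', 'tweedie-nloglik@1.5'))
```

The round returned here is 10, the same round that the early stopping message reports as the best iteration.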

Below we have again explained the same three callbacks with XGBRegressor. This time we need to pass a list of callback functions to the callbacks parameter of the fit() method.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

xgb_regressor = xgb.XGBRegressor(max_depth=3, eta=1, objective='reg:squarederror')

early_stop_execution = xgb.callback.early_stop(5)
print_eval = xgb.callback.print_evaluation(5)
eval_results = {}
eval_results_callback = xgb.callback.record_evaluation(eval_results)

xgb_regressor.fit(X_train, Y_train,
                  eval_set=[(X_test, Y_test)],
                  verbose=False,
                  callbacks = [early_stop_execution, print_eval, eval_results_callback]
                  )

print("Evaluation Results : ", eval_results)

print("\nTest  R2 Score : %.2f"%xgb_regressor.score(X_test, Y_test))
print("Train R2 Score : %.2f"%xgb_regressor.score(X_train, Y_train))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

Will train until validation_0-rmse hasn't improved in 5 rounds.
[0]	validation_0-rmse:3.59159
[5]	validation_0-rmse:2.78515
[10]	validation_0-rmse:2.57606
[15]	validation_0-rmse:2.47347
[20]	validation_0-rmse:2.43278
[25]	validation_0-rmse:2.39191
[30]	validation_0-rmse:2.44904
Stopping. Best iteration:
[26]	validation_0-rmse:2.37548

Evaluation Results :  {'validation_0': {'rmse': [3.591595, 3.263732, 3.122178, 2.941069, 2.752223, 2.785148, 2.64519, 2.642898, 2.588952, 2.614419, 2.576064, 2.52405, 2.521241, 2.548234, 2.546718, 2.473467, 2.490181, 2.460504, 2.394174, 2.426881, 2.432777, 2.421911, 2.420417, 2.379785, 2.382656, 2.39191, 2.375476, 2.386967, 2.440467, 2.443965, 2.449045]}}

Test  R2 Score : 0.90
Train R2 Score : 0.99

As a part of this example, we have explained how we can use the reset_learning_rate() callback. We have called the reset_learning_rate() function with an array of size 15, which is the same as the number of iterations of our training process. The array runs from 0.1 to 1.5, increasing the learning rate by 0.1 at each iteration.

reset_learning_rate = xgb.callback.reset_learning_rate(list(np.linspace(0.1,1.5, num=15)))

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie'},
                    dmat_train,
                    evals=[(dmat_test, "test")],
                    num_boost_round=15,
                    callbacks=[reset_learning_rate])

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

[0]	test-tweedie-nloglik@1.5:26.66487
[1]	test-tweedie-nloglik@1.5:25.31718
[2]	test-tweedie-nloglik@1.5:23.11685
[3]	test-tweedie-nloglik@1.5:20.87608
[4]	test-tweedie-nloglik@1.5:19.25085
[5]	test-tweedie-nloglik@1.5:18.44966
[6]	test-tweedie-nloglik@1.5:18.19030
[7]	test-tweedie-nloglik@1.5:18.14023
[8]	test-tweedie-nloglik@1.5:18.13443
[9]	test-tweedie-nloglik@1.5:18.13133
[10]	test-tweedie-nloglik@1.5:18.13147
[11]	test-tweedie-nloglik@1.5:18.13265
[12]	test-tweedie-nloglik@1.5:18.13475
[13]	test-tweedie-nloglik@1.5:18.13259
[14]	test-tweedie-nloglik@1.5:18.13398

Test  R2 Score : 0.87
Train R2 Score : 0.97

We have now explained another example demonstrating usage of the reset_learning_rate() callback. This time we have created a function named calculate_learning_rate() which will be passed to reset_learning_rate() callback. The function takes as input two integers (current boosting round index and a total number of boosting rounds) and returns the learning rate for that boosting round. We have then passed the callback created to the callbacks parameter.

def calculate_learning_rate(boosting_round, num_boost_round):
    lrs = list(np.linspace(0.1,1.5, num=num_boost_round))
    return lrs[boosting_round]

reset_learning_rate = xgb.callback.reset_learning_rate(calculate_learning_rate)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie'},
                    dmat_train,
                    evals=[(dmat_test, "test")],
                    num_boost_round=15,
                    callbacks=[reset_learning_rate])

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

[0]	test-tweedie-nloglik@1.5:26.66487
[1]	test-tweedie-nloglik@1.5:25.31718
[2]	test-tweedie-nloglik@1.5:23.11685
[3]	test-tweedie-nloglik@1.5:20.87608
[4]	test-tweedie-nloglik@1.5:19.25085
[5]	test-tweedie-nloglik@1.5:18.44966
[6]	test-tweedie-nloglik@1.5:18.19030
[7]	test-tweedie-nloglik@1.5:18.14023
[8]	test-tweedie-nloglik@1.5:18.13443
[9]	test-tweedie-nloglik@1.5:18.13133
[10]	test-tweedie-nloglik@1.5:18.13147
[11]	test-tweedie-nloglik@1.5:18.13265
[12]	test-tweedie-nloglik@1.5:18.13475
[13]	test-tweedie-nloglik@1.5:18.13258
[14]	test-tweedie-nloglik@1.5:18.13398

Test  R2 Score : 0.87
Train R2 Score : 0.97
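
The schedule produced by np.linspace(0.1, 1.5, num=15) can be written out by hand to see which learning rate each boosting round receives. The sketch below is plain Python and only mirrors the arithmetic. The name linear_schedule and its start/stop defaults are assumptions for illustration, and it follows the two-argument form (round index, total rounds) described above.

```python
def linear_schedule(boosting_round, num_boost_round, start=0.1, stop=1.5):
    """Learning rate for one round, evenly spaced between start and stop."""
    if num_boost_round == 1:
        return start
    step = (stop - start) / (num_boost_round - 1)
    return start + step * boosting_round


# Rounded to two decimals only to keep the printed list readable.
rates = [round(linear_schedule(i, 15), 2) for i in range(15)]
print(rates)
```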

11. Dask Backend for Distributed Training Of XGBoost Models ¶

Xgboost provides support for using dask as a backend for training gradient boosting algorithms in a distributed environment. Xgboost has a module named dask which has a list of data structures and estimators for use with dask.

Dask has a simple structure with the three main components mentioned below.

Scheduler - A dask distributed environment has one scheduler which handles communication between clients and workers. It is also responsible for distributing work to worker nodes.
Clients - We can have more than one client instance, each of which can be used to submit tasks to the scheduler.
Workers - These are the actual nodes (processes/machines) which run tasks.

In order to use dask, we need to create a client that will be used to communicate with the scheduler. This tutorial is run on a single PC and not on a distributed environment with multiple nodes. When we create an instance of dask client without giving the IP address and port of scheduler, it'll create a cluster on the local machine itself. Below we have created a small cluster of 4 workers using the Client() constructor.

If you are interested in learning about dask then please feel free to check our tutorials on the same. It has information about creating a dask distributed environment as well on an actual cluster with multiple machines.

Please make a note that using xgboost on the dask distributed environment requires a little background of dask to make it work correctly.

print("Dask Installed ?", xgb.dask.DASK_INSTALLED)

client = xgb.dask.Client(n_workers=4, threads_per_worker=4)

Dask Installed ? True

client

xgb.dask.get_client()

Below we have divided the Boston housing dataset into train (90%) and test (10%) sets. We have then converted the normal numpy arrays to dask arrays using the da module of the xgboost.dask module. The da module provides access to the dask.array module. We can even use the dask.array module directly to create arrays and it'll work fine. The xgboost estimators available through the xgboost.dask module accept either dask arrays or dask dataframes. The dask.dataframe module is available in xgboost as xgboost.dask.dd.

Please feel free to check below links if you are looking for some guidance on dask.array and dask.dataframe.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

X_train_d, X_test_d, Y_train_d, Y_test_d = xgb.dask.da.array(X_train), xgb.dask.da.array(X_test), xgb.dask.da.array(Y_train), xgb.dask.da.array(Y_test)

X_train_d

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

	Array	Chunk
Bytes	47.32 kB	47.32 kB
Shape	(455, 13)	(455, 13)
Count	1 Tasks	1 Chunks
Type	float64	numpy.ndarray
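
The figures in the array summary above follow directly from the shape and the data type. As a quick check in plain Python (the 128-row chunk size at the end is only a hypothetical example of how a larger array could be split, not something used in this tutorial):

```python
import math

rows, cols = 455, 13      # shape of X_train
bytes_per_value = 8       # float64 uses 8 bytes per value

total_bytes = rows * cols * bytes_per_value
print(total_bytes / 1000, "kB")

# Hypothetical: splitting the rows into blocks of 128 rows each
chunk_rows = 128
n_chunks = math.ceil(rows / chunk_rows)
print(n_chunks)
```

Here the whole array fits into a single chunk, which is why the Array and Chunk columns of the summary show the same size and shape.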

Below we have explained how we can run the xgboost algorithm in a dask distributed environment. The dask module has its own DaskDMatrix data structure which is almost the same as DMatrix but requires a client instance as the first argument, followed by the data arrays containing feature values and target labels.

The train() method available through the dask module requires us to pass the client instance first, before the parameters dictionary. Everything else is the same as the train() method available directly. It runs training in a distributed environment and returns a dictionary with two components (the Booster instance under the key booster and the training history under the key history). We can then call the predict() method of the dask module by giving it the client, the booster instance, and a DaskDMatrix dataset. It'll return a lazy instance on which we need to call compute() to evaluate it and return the actual result. At the end, we have calculated the R2 score on the train and test datasets.

dmat_train_dask = xgb.dask.DaskDMatrix(client, X_train_d, Y_train_d, feature_names=boston.feature_names)
dmat_test_dask = xgb.dask.DaskDMatrix(client, X_test_d, Y_test_d, feature_names=boston.feature_names)

reg_booster = xgb.dask.train(client, {'max_depth': 3, 'eta': 1, 'objective': 'reg:squarederror'},
                             dmat_train_dask,
                             evals=[(dmat_train_dask, "train"), (dmat_test_dask, "test")])

print(reg_booster)

from sklearn.metrics import r2_score

test_preds = xgb.dask.predict(client, reg_booster["booster"], dmat_test_dask)
train_preds = xgb.dask.predict(client, reg_booster["booster"], dmat_train_dask)

print("\nType of Predictions : ",test_preds)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds.compute()))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds.compute()))

{'booster': <xgboost.core.Booster object at 0x7f73ec6a4748>, 'history': {'train': {'rmse': [4.037745, 3.383986, 3.151301, 2.850509, 2.742785, 2.55956, 2.460093, 2.290408, 2.148991, 2.067703]}, 'test': {'rmse': [3.581157, 3.032583, 2.976439, 2.762226, 2.773082, 2.730096, 2.779377, 2.83347, 2.83619, 2.845627]}}}

Type of Predictions :  dask.array<from-value, shape=(51,), dtype=float32, chunksize=(51,), chunktype=numpy.ndarray>

Test  R2 Score : 0.87
Train R2 Score : 0.95

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

print("\nTrain RMSE : ",reg_booster["booster"].eval(dmat_train))
print("Test  RMSE : ",reg_booster["booster"].eval(dmat_test))

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, reg_booster["booster"].predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, reg_booster["booster"].predict(dmat_train)))

Train RMSE :  [0]	eval-rmse:2.067703
Test  RMSE :  [0]	eval-rmse:2.845627

Test  R2 Score : 0.87
Train R2 Score : 0.95

Below we have explained how we can use the DaskXGBRegressor() estimator for a regression task with the Boston housing dataset. It has the same API as that of XGBRegressor(). We have first created a client instance and then used it as a context manager to call all other methods which require the dask distributed environment.

client = xgb.dask.Client(n_workers=4, threads_per_worker=4)

with client:
    X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

    print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

    X_train_d, X_test_d, Y_train_d, Y_test_d = xgb.dask.da.array(X_train), xgb.dask.da.array(X_test), xgb.dask.da.array(Y_train), xgb.dask.da.array(Y_test)

    xgb_dask_regressor = xgb.dask.DaskXGBRegressor()

    xgb_dask_regressor.fit(X_train_d, Y_train_d)

    print("Test  R2 Score : %.2f"%xgb_dask_regressor.score(X_test_d, Y_test_d))
    print("Train R2 Score : %.2f"%xgb_dask_regressor.score(X_train_d, Y_train_d))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

Test  R2 Score : 0.93
Train R2 Score : 1.00

As the last example of using xgboost with dask, we explain how to use the DaskXGBClassifier() estimator for classification tasks. Most of the code matches the normal API. The differences are using the client to communicate with the dask cluster, wrapping data in dask data structures, and calling compute() on lazy results to actually run the task on the cluster and fetch the values.

client = xgb.dask.Client(n_workers=4, threads_per_worker=4)

with client:
    X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target, train_size=0.90,
                                                        stratify=breast_cancer.target,
                                                        random_state=42)

    print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

    X_train_d, X_test_d, Y_train_d, Y_test_d = xgb.dask.da.array(X_train), xgb.dask.da.array(X_test), xgb.dask.da.array(Y_train, dtype="int64"), xgb.dask.da.array(Y_test, dtype="int64")

    xgb_dask_classif = xgb.dask.DaskXGBClassifier()

    xgb_dask_classif.fit(X_train_d, Y_train_d)

    train_preds = xgb_dask_classif.predict(X_train_d)
    test_preds = xgb_dask_classif.predict(X_test_d)

    print("Test  Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds.compute()))
    print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds.compute()))

    test_preds_proba = xgb_dask_classif.predict_proba(X_test_d)

    print("\nType of Preds Proba Result : ",type(test_preds_proba))

    test_preds_proba = test_preds_proba.compute()

test_preds_proba[:5]

Train/Test Sizes :  (512, 30) (57, 30) (512,) (57,)

Test  Accuracy Score : 0.96
Train Accuracy Score : 1.00

Type of Preds Proba Result :  <class 'dask.array.core.Array'>

array([3.9076293e-04, 9.9919409e-01, 9.9410325e-01, 4.0606453e-04,
       8.1864735e-03], dtype=float32)

12. GPU Support

Xgboost also supports running algorithms on GPU. Two parameters are enough to shift training from CPU to GPU: setting the tree_method parameter to gpu_hist runs the same code on GPU, and the gpu_id parameter selects which GPU to use when more than one is available.

Below we have run the same code from our previous example, but now on GPU, by setting tree_method to gpu_hist and gpu_id to 0.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

dmat_train = xgb.DMatrix(X_train, Y_train, feature_names=boston.feature_names)
dmat_test = xgb.DMatrix(X_test, Y_test, feature_names=boston.feature_names)

tweedie_booster = xgb.train({'max_depth': 3, 'eta': 1, 'objective': 'reg:tweedie',
                             'tree_method':'gpu_hist', 'gpu_id':0},
                    dmat_train,
                    evals=[(dmat_train, "train"), (dmat_test, "test")])

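# Note: eval() reports the objective's default metric (tweedie-nloglik here), not RMSE, despite the labels below.
print("\nTrain RMSE : ",tweedie_booster.eval(dmat_train))
print("Test  RMSE : ",tweedie_booster.eval(dmat_test))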

from sklearn.metrics import r2_score

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, tweedie_booster.predict(dmat_test)))
print("Train R2 Score : %.2f"%r2_score(Y_train, tweedie_booster.predict(dmat_train)))

Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[0]	train-tweedie-nloglik@1.5:28.32970	test-tweedie-nloglik@1.5:26.66487
[1]	train-tweedie-nloglik@1.5:19.30740	test-tweedie-nloglik@1.5:18.58394
[2]	train-tweedie-nloglik@1.5:18.72894	test-tweedie-nloglik@1.5:18.14010
[3]	train-tweedie-nloglik@1.5:18.71593	test-tweedie-nloglik@1.5:18.13066
[4]	train-tweedie-nloglik@1.5:18.70913	test-tweedie-nloglik@1.5:18.12305
[5]	train-tweedie-nloglik@1.5:18.70438	test-tweedie-nloglik@1.5:18.12354
[6]	train-tweedie-nloglik@1.5:18.70052	test-tweedie-nloglik@1.5:18.11985
[7]	train-tweedie-nloglik@1.5:18.69751	test-tweedie-nloglik@1.5:18.12120
[8]	train-tweedie-nloglik@1.5:18.69563	test-tweedie-nloglik@1.5:18.12000
[9]	train-tweedie-nloglik@1.5:18.69326	test-tweedie-nloglik@1.5:18.12512

Train RMSE :  [0]	eval-tweedie-nloglik@1.5:18.693260
Test  RMSE :  [0]	eval-tweedie-nloglik@1.5:18.125120

Test  R2 Score : 0.90
Train R2 Score : 0.95
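
Note that the lines labelled "Train RMSE" and "Test RMSE" in the output above actually report tweedie-nloglik@1.5, the default metric of the reg:tweedie objective, and that this metric does not approach zero for a good fit. Below is a minimal pure-Python sketch of the per-sample formula as we understand it from the xgboost source (for a variance power rho between 1 and 2). The function name and the sample value of 22 are our own assumptions for illustration.

```python
import math


def tweedie_nloglik(y_true, y_pred, rho=1.5):
    """Mean Tweedie negative log-likelihood, unweighted.

    Assumed to mirror xgboost's tweedie-nloglik@rho metric; predictions
    must be strictly positive.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        a = y * math.exp((1 - rho) * math.log(p)) / (1 - rho)
        b = math.exp((2 - rho) * math.log(p)) / (2 - rho)
        total += -a + b
    return total / len(y_true)


# A perfect prediction for a target of 22 still scores about 18.76.
print(round(tweedie_nloglik([22.0], [22.0]), 2))
# prints: 18.76
```

This is why the values above level off near 18.7 instead of shrinking towards zero: with targets of roughly this size, even exact predictions give a score in that range.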

Below we show how to configure XGBClassifier to run training on GPU. The same parameters work for XGBRegressor, XGBRFClassifier, and XGBRFRegressor as well.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                    stratify=breast_cancer.target,
                                                    train_size=0.90, random_state=42)

xgb_classif = xgb.XGBClassifier(tree_method="gpu_hist", gpu_id=0)

xgb_classif.fit(X_train, Y_train)

print("Test  Accuracy Score : %.2f"%xgb_classif.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%xgb_classif.score(X_train, Y_train))

Test  Accuracy Score : 0.95
Train Accuracy Score : 1.00
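
The gpu_hist and gpu_id parameters used above were deprecated in XGBoost 2.0 in favour of tree_method="hist" combined with a device parameter such as "cuda:0". Below is a small sketch of a helper that picks the right parameter set from a version string. The helper name gpu_params is hypothetical, and it only looks at the major version number.

```python
def gpu_params(xgb_version, gpu_index=0):
    """Return GPU training parameters suited to the given xgboost version string.

    Hypothetical helper for illustration; not part of xgboost.
    """
    major = int(xgb_version.split(".")[0])
    if major >= 2:
        # XGBoost 2.0+ selects the GPU through `device`.
        return {"tree_method": "hist", "device": f"cuda:{gpu_index}"}
    # Older releases use the parameters shown in this tutorial.
    return {"tree_method": "gpu_hist", "gpu_id": gpu_index}


print(gpu_params("1.6.2"))
print(gpu_params("2.0.3", gpu_index=1))
```

The result can then be unpacked into an estimator, for example xgb.XGBClassifier(**gpu_params(xgb.__version__)), or merged into the parameter dictionary passed to xgb.train().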

13. GPU & Dask Together For Parallel GPUs

Xgboost also lets us train in parallel on multiple GPUs by using dask. We use dask to distribute training over our dataset and set tree_method to gpu_hist so that each dask worker runs its part of the training process on a GPU. This way training runs on all dask workers, with each worker using a GPU of its own.

This ends our tutorial covering the majority of the xgboost API. Please feel free to let us know your views in the comments section.
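
Below is a rough sketch of how dask and GPU training can be combined. It assumes a machine with at least one CUDA GPU and the optional dask-cuda package, which provides LocalCUDACluster and is not used elsewhere in this tutorial. The function name, the chunk size, and the objective are our own choices for illustration, and we have not run it against the datasets above.

```python
# Sketch only: needs CUDA GPUs and the `dask-cuda` package (assumption).
GPU_PARAMS = {
    "tree_method": "gpu_hist",        # newer XGBoost (2.0+) prefers "hist" with device="cuda"
    "objective": "reg:squarederror",
}


def train_on_local_gpus(X, y, num_boost_round=10, rows_per_chunk=1000):
    """Train one booster on NumPy arrays X, y using one dask worker per local GPU."""
    # Imported lazily so this module still loads on machines without GPUs.
    import dask.array as da
    import xgboost as xgb
    from dask.distributed import Client
    from dask_cuda import LocalCUDACluster

    # LocalCUDACluster starts one worker for each visible GPU by default.
    with LocalCUDACluster() as cluster, Client(cluster) as client:
        # Row-wise chunks are the units dask spreads across workers.
        X_d = da.from_array(X, chunks=(rows_per_chunk, X.shape[1]))
        y_d = da.from_array(y, chunks=(rows_per_chunk,))

        dtrain = xgb.dask.DaskDMatrix(client, X_d, y_d)
        result = xgb.dask.train(client, GPU_PARAMS, dtrain,
                                num_boost_round=num_boost_round,
                                evals=[(dtrain, "train")])

    # Same structure as the CPU dask example: a booster and its eval history.
    return result["booster"], result["history"]
```

Calling train_on_local_gpus(X_train, Y_train) would return the trained booster together with the evaluation history, in the same format as the dask training output shown in section 11.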


Sunny Solanki


