Updated On : Aug-19,2022 Time Investment : ~30 mins

LightGBM - An In-Depth Guide [Python API]

> What is LightGBM?

LightGBM is a framework that provides an implementation of gradient boosted decision trees. The gradient boosted decision trees is a type of gradient boosted machines algorithm that uses decision trees as estimators of an ensemble. An ensemble consists of many weak models/estimators (decision trees) whose predictions are combined to make final prediction.

It was created by the researchers and developers team at Microsoft.

> Why "LightGBM" over Other Python Gradient Boosted Trees Implementations?

Light GBM is known for its

  • Faster-training speed
  • Good accuracy with default parameters
  • Parallel and GPU learning
  • Low memory footprint
  • Capability of handling large datasets which might not fit in memory.

LightGBM provides API in C, Python, and R Programming.

LightGBM even provides CLI (Command Line Interface) which lets us use the library from the command line.

LightGBM estimators provide a large set of hyperparameters to tune the model. It even has a large set of optimization/loss functions and evaluation metrics already implemented.

> What Can You Learn From This Article?

As a part of this tutorial, we have explained how to use Python library LighGBM to solve machine learning tasks (Regression and Classification). Tutorial explains majority of Python API of library with simple and easy-to-understand examples.

Apart from training model and making predictions, it explains many different concepts like cross-validation, saving & loading model, visualizing features importances, early stopping training to avoid overfitting, how to create custom loss functions, how to create a custom evaluation metrics, how to use callbacks, etc.

All our examples have lightgbm models trained on toy datasets (structured - tabular) available from scikit-learn.

The main aim of this tutorial is to make readers aware of the majority of functionalities available through lightgbm and get them started with the framework.

> Which Other Python Libraries Provides Implementation Of Gradient Boosted Trees?

> How to Install LightGBM?

  • PIP
    • pip install -U lightgbm
  • Conda
    • conda install lightgbm

Below, we have listed important sections of tutorial to give an overview of the material covered. We know that the list below is big but you can skip some sections of tutorial which has a theory or repeat example of some concepts. We have included NOTE in those sections so you can skip them to complete tutorial faster. You can then refer to those sections in your free time or as per need.

Important Sections Of Tutorial

  1. Load Datasets for Tutorial
    • Boston Housing Dataset for Regression Tasks
    • Breast Cancer Dataset for Binary Classification Tasks
    • Wine Dataset for Multi-Class Classification Tasks
  2. LightGBM Models at High-Level (High-Level API)
  3. Booster: train() - Core API to Train Model
    • Important Parameters of train() Function
    • Dataset: LightGBM Data Structure to Represent Data
    • Regression Example
    • Binary Classification Example
    • Multi-Class Classification Example
  4. List of Important Parameters of LightGBM Estimators
  5. LGBMModel (Scikit-Learn like API)
    • Regression Example
    • Binary Classification Example
  6. LGBMRegressor (Scikit-Learn like API)
  7. LGBMClassifier (Scikit-Learn like API)
  8. Saving and Loading Model
  9. Cross Validation
  10. Plotting Functionality
    • Visualize Features Importance using "plot_importance()"
    • Visualize ML Metric using "plot_metric()"
    • Visualize Feature Values Split using "plot_split_value_histogram()"
    • Visualize Individual Boosted Tree using "plot_tree()"
  11. Early Stopping Training
  12. Feature Interaction Constraints
  13. Monotonic Constraints
  14. Custom Objective/Loss Function
  15. Custom Evaluation Function
  16. Callbacks

We'll start by importing the necessary Python libraries and printing the versions that we have used in our tutorial.

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import warnings

warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 50)

import lightgbm as lgb
import sklearn

print("LightGBM Version     : ", lgb.__version__)
print("Scikit-Learn Version : ", sklearn.__version__)
LightGBM Version     :  3.3.2
Scikit-Learn Version :  1.0.2

1. Load Datasets

We'll be using the below-mentioned three different datasets which are available from sklearn as a part of this tutorial for explanation purposes.

  1. Boston Housing Dataset: It's a regression problem dataset which has information about the various attribute of houses in Boston and their price in dollar. This will be used for regression tasks.
  2. Breast Cancer Dataset: It's a classification dataset that has information about two different types of tumor. It'll be used for explaining binary classification tasks.
  3. Wine Dataset - It's a classification dataset that has information about ingredients used in three different types of wines. It'll be used for explaining multi-class classification tasks.

We have loaded all three datasets mentioned one by one below. We have printed descriptions of datasets which gives us an overview of dataset features and size. We have even loaded each dataset as a pandas data frame and displayed the first few samples of data.

Boston Housing Dataset

from sklearn.datasets import load_boston

boston = load_boston()

for line in boston.DESCR.split("\n")[5:29]:
    print(line)

boston_df = pd.DataFrame(data=boston.data, columns = boston.feature_names)
boston_df["Price"] = boston.target

boston_df.head()
**Data Set Characteristics:**

    :Number of Instances: 506

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2

Breast Cancer Dataset

from sklearn.datasets import load_breast_cancer

breast_cancer = load_breast_cancer()

for line in breast_cancer.DESCR.split("\n")[5:31]:
    print(line)

breast_cancer_df = pd.DataFrame(data=breast_cancer.data, columns = breast_cancer.feature_names)
breast_cancer_df["TumorType"] = breast_cancer.target

breast_cancer_df.head()
**Data Set Characteristics:**

    :Number of Instances: 569

    :Number of Attributes: 30 numeric, predictive attributes and the class

    :Attribute Information:
        - radius (mean of distances from center to points on the perimeter)
        - texture (standard deviation of gray-scale values)
        - perimeter
        - area
        - smoothness (local variation in radius lengths)
        - compactness (perimeter^2 / area - 1.0)
        - concavity (severity of concave portions of the contour)
        - concave points (number of concave portions of the contour)
        - symmetry
        - fractal dimension ("coastline approximation" - 1)

        The mean, standard error, and "worst" or largest (mean of the three
        largest values) of these features were computed for each image,
        resulting in 30 features.  For instance, field 3 is Mean Radius, field
        13 is Radius SE, field 23 is Worst Radius.

        - class:
                - WDBC-Malignant
                - WDBC-Benign
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension radius error texture error perimeter error area error smoothness error compactness error concavity error concave points error symmetry error fractal dimension error worst radius worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension TumorType
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 1.0950 0.9053 8.589 153.40 0.006399 0.04904 0.05373 0.01587 0.03003 0.006193 25.38 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890 0
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 0.5435 0.7339 3.398 74.08 0.005225 0.01308 0.01860 0.01340 0.01389 0.003532 24.99 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902 0
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 0.7456 0.7869 4.585 94.03 0.006150 0.04006 0.03832 0.02058 0.02250 0.004571 23.57 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758 0
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 0.4956 1.1560 3.445 27.23 0.009110 0.07458 0.05661 0.01867 0.05963 0.009208 14.91 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300 0
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 0.7572 0.7813 5.438 94.44 0.011490 0.02461 0.05688 0.01885 0.01756 0.005115 22.54 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678 0

Wine Dataset

from sklearn.datasets import load_wine

wine = load_wine()

for line in wine.DESCR.split("\n")[5:29]:
    print(line)

wine_df = pd.DataFrame(data=wine.data, columns = wine.feature_names)
wine_df["WineType"] = wine.target

wine_df.head()
**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
 		- Alcohol
 		- Malic acid
 		- Ash
		- Alcalinity of ash
 		- Magnesium
		- Total phenols
 		- Flavanoids
 		- Nonflavanoid phenols
 		- Proanthocyanins
		- Color intensity
 		- Hue
 		- OD280/OD315 of diluted wines
 		- Proline

    - class:
            - class_0
            - class_1
            - class_2

alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline WineType
0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.64 1.04 3.92 1065.0 0
1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.38 1.05 3.40 1050.0 0
2 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81 5.68 1.03 3.17 1185.0 0
3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.80 0.86 3.45 1480.0 0
4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.32 1.04 2.93 735.0 0

2. LightGBM Models at High-Level

LightGBM provides four different estimators to perform classification and regression tasks.

  1. Booster - It is a universal estimator created by calling train() method. It can be used for regression as well as classification tasks. All other estimators are wrapper around it.
  2. LGBMModel - Its a universal estimator with scikit-learn like API that can handle both classification and regression datasets with settings.
  3. LGBMRegressor - It is an estimator with scikit-learn like API designed to work with regression datasets.
  4. LGBMClassifier - It is an estimator with scikit-learn like API designed to work with classification datasets.

3. Booster (train()): Core API to Train Model

The simplest way to create an estimator in lightgbm is by using the train() method. It takes as input estimator parameter as dictionary and training dataset. It then trains the estimator and returns an object of type Booster which is a trained estimator that can be used to make future predictions.

3.1 Important Parameters of "train()" Function

Below are some of the important parameters of the train() method.

  • params - This parameter accepts dictionary specifying parameters of gradient boosted decision trees algorithm. We just need to provide an objective function to get started with based on the type of problem (classification/regression). We'll later explain a commonly used list of parameters that can be passed to this dictionary.
  • train_set - This parameter accepts lightgbm Dataset object which holds information about feature values and target values. It's an internal data structure designed by lightgbm to wrap data.
  • num_boost_round - It specifies the number of booting trees that will be used in the ensemble. The group of gradient boosted trees is called ensemble to whom we generally refer as an estimator. The default value is 100.
  • valid_sets - It accepts list of Dataset objects which as validation sets. These validation sets will be evaluated after each training round.
  • valid_names - It accepts a list of strings of the same length as that of valid_sets specifying names for each validation set. These names will be used when printing evaluation metrics for these datasets as well as when plotting them.
  • categorical_feature - It accepts list of strings/ints or string auto. If we give a list of strings/ints then those columns from the dataset will be treated as categorical columns.
  • verbose_eval - It accepts bool or int as value. If we set the value to False or 0 then it won't print metrics evaluation results calculated on validation sets that we passed. If we pass True then it'll print results for each round. If we pass an integer greater than 1 then it'll print results repeatedly after that many rounds.

3.2 Dataset: LightGBM Data Structure to Represent Data

The dataset is a lightgbm internal data structure for holding data and labels. Below are important parameters of the class.

  • data - It accepts numpy array, pandas dataframe, scipy sparse matrix, list of numpy arrays, h2o data table’s frame as input holding feature values.
  • label - It accepts numpy array, pandas series, pandas one column dataframe specifying target values. We can even set this parameter to None if we don't have target values. The default is None.
  • feature_name - It accepts a list of strings specifying feature names.
  • categorical_feature - It has the same meaning as that mentioned in the train() method parameter above. We can handle categorical feature here or in that method.

3.3 Regression Example

The first problem that we'll solve using lightgbm is a simple regression problem using the Boston housing dataset which we loaded earlier. We have divided the dataset into train/test sets and created a Dataset instance out of them. We have then called the lightgbm.train() method giving it train and validation set. We have set the number of boosting rounds to 10 hence it'll create 10 boosted trees to solve the problem. After training completes, it'll return an instance of type Booster which we can later use to make future predictions on the dataset. As we have given the validation set as input, it'll print the validation l2 score after each iteration of training. Please make a note that by default lightgbm minimizes l2 loss for regression problems.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression"},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000076 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 975
[LightGBM] [Info] Number of data points in the train set: 379, number of used features: 13
[LightGBM] [Info] Start training from score 22.590501
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[1]	valid_0's l2: 63.038
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[2]	valid_0's l2: 54.5739
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[3]	valid_0's l2: 47.6902
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[4]	valid_0's l2: 41.6301
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[5]	valid_0's l2: 36.776
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[6]	valid_0's l2: 32.8883
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[7]	valid_0's l2: 29.8897
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[8]	valid_0's l2: 27.244
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[9]	valid_0's l2: 24.9776
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[10]	valid_0's l2: 22.8617

Below we have made predictions on train and test data using a trained booster. We have then calculated R2 metrics for both using the sklearn metric method. Please make a note that the predict() method accepts numpy array, pandas dataframe, scipy sparse matrix, or h2o data table’s frame as input for making predictions.

If you are interested in learning the list of available metrics in scikit-learn then please feel free to check our tutorial on the same.

from sklearn.metrics import r2_score


test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Test  R2 Score : 0.67
Train R2 Score : 0.74

The predict() method of a few important parameters which can be used to make a different kind of predictions.

  • raw_score - It a boolean parameter which if set to True will return raw predictions. For regression problems, this won't make any difference but for classification problems, it'll return function values as output than probabilities.
  • pred_leaf - This parameter accepts boolean values which if set to True will return an index of leaf in each tree that was predicted for a particular sample. The size of the output will be n_samples x n_trees.
  • pred_contrib - It returns an array of features contribution for each sample. It'll return an array of size (n_features + 1) for each sample of data where the last value is the expected value and the first n_features values are the contribution of features in making that prediction. We can add the contribution of each feature to the last expected value and we'll get an actual prediction. It's commonly referred to as SHAP values.

If you are interested in learning about SHAP values and our tutorial on the awesome SHAP package which lets us visualize these SHAP values in different ways to understand the performance of the model then check our tutorial on the same.

idxs = booster.predict(X_test, pred_leaf=True)

print("Shape : ", idxs.shape)

idxs
Shape :  (127, 10)
array([[ 2,  2,  2, ...,  4,  4,  4],
       [ 9, 12,  6, ...,  5,  8,  7],
       [ 9,  6,  6, ...,  5, 10,  7],
       ...,
       [ 2,  2,  2, ...,  4,  4,  4],
       [13, 10, 12, ..., 13, 14, 10],
       [11,  0,  8, ...,  8,  9, 13]], dtype=int32)
shap_vals = booster.predict(X_test, pred_contrib=True)

print("Shape : ", shap_vals.shape)

print("\nShap Values of 0th Sample : ", shap_vals[0])
print("\nPrediction of 0th using SHAP Values : ", shap_vals[0].sum())
print("Actual Prediction of 0th Sample     : ", test_preds[0])
Shape :  (127, 14)

Shap Values of 0th Sample :  [ 2.83275837e-01  0.00000000e+00  1.18896249e-01  0.00000000e+00
  7.28958665e-02  5.53802603e+00 -2.43603336e-02  7.18686350e-02
 -2.33464487e-03  4.29395596e-02  7.31633672e-02 -1.87498941e-02
  3.34226697e+00  2.24936676e+01]

Prediction of 0th using SHAP Values :  31.99155522517617
Actual Prediction of 0th Sample     :  31.991555225176175

We can call the num_trees() method on the booster instance to get a number of trees in the ensemble. Please make a note that if we don't stop training early then a number of trees will be the same as num_boost_round. But if we are stopping training early then a number of trees will be different from num_boost_round. We have explained later in this tutorial how we can stop training if the ensemble's performance is not improving when evaluated on the validation set.

booster.num_trees()
10

The booster instance has another important method named feature_importance() which can return us the importance of features based on gain and split values of the trees.

booster.feature_importance(importance_type="gain")
array([ 3814.74202061,     0.        ,   207.29499817,     0.        ,
        2729.91098022, 38139.31585693,   891.23509979,   529.9323864 ,
         256.73030472,   198.98090363,   657.979702  ,   150.48840141,
       76114.31529617])
booster.feature_importance(importance_type="split")
array([20,  0,  1,  0,  9, 33,  8,  6,  2,  4,  7,  5, 40], dtype=int32)

3.4 Binary Classification Example

In this section, we have explained how we can use the train() method to create a booster for a binary classification problem. We are training the model on the breast cancer dataset and later evaluating the accuracy of it using a metric from sklearn. We have set an objective to binary for informing the train() method that we'll be giving data for binary classification problem. We have also set the verbosity parameter value to -1 in order to prevent training messages. It'll still print validation set evaluation results which can be turned off by setting the verbose_eval parameter to False.

Please make a note that for classification problems predict() method of booster return probabilities. We have included logic to convert probabilities to the target class.

LightGBM evaluates binary log loss function by default on the validation set for binary classification problems. We can give the metric parameter in the dictionary which we are giving to the train() method with any metric names available with lightgbm and it'll evaluate that metric. We'll later explain the list of available metrics with lightgbm.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=breast_cancer.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=breast_cancer.feature_names.tolist())


booster = lgb.train({"objective": "binary", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import accuracy_score


test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest  Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))
print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))
Train/Test Sizes :  (426, 30) (143, 30) (426,) (143,)
[1]	valid_0's binary_logloss: 0.593312
[2]	valid_0's binary_logloss: 0.532185
[3]	valid_0's binary_logloss: 0.484191
[4]	valid_0's binary_logloss: 0.442367
[5]	valid_0's binary_logloss: 0.406814
[6]	valid_0's binary_logloss: 0.373153
[7]	valid_0's binary_logloss: 0.344765
[8]	valid_0's binary_logloss: 0.320929
[9]	valid_0's binary_logloss: 0.296162
[10]	valid_0's binary_logloss: 0.278894

Test  Accuracy Score : 0.96
Train Accuracy Score : 0.98

3.5 MultiClass Classification Example

NOTE: Please feel free to skip this section if you are in hurry and have understood how to use LightGBM for classification tasks using our previous binary classification example.

As a part of this section, we have explained how we can use the train() method for multi-class classification problems. We are using it on the wine dataset which has three different types of wine as the target variable. We have set an objective function to multiclass. We need to provide the num_class parameter with an integer specifying a number of classes whenever we are using the method for multi-class classification problems.

The predict() method returns the probabilities of each class in case of multi-class problems. We have included logic to select the class with maximum probability as a prediction.

LightGBM evaluates multi-class log loss function by default on the validation set for binary classification problems.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=wine.feature_names)
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=wine.feature_names)


booster = lgb.train({"objective": "multiclass", "num_class":3, "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)

from sklearn.metrics import accuracy_score


test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = np.argmax(test_preds, axis=1)
train_preds = np.argmax(train_preds, axis=1)

print("\nTest  Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))
print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))
Train/Test Sizes :  (133, 13) (45, 13) (133,) (45,)
[1]	valid_0's multi_logloss: 1.00307
[2]	valid_0's multi_logloss: 0.88303
[3]	valid_0's multi_logloss: 0.787628
[4]	valid_0's multi_logloss: 0.712626
[5]	valid_0's multi_logloss: 0.645891
[6]	valid_0's multi_logloss: 0.588107
[7]	valid_0's multi_logloss: 0.540376
[8]	valid_0's multi_logloss: 0.501455
[9]	valid_0's multi_logloss: 0.458277
[10]	valid_0's multi_logloss: 0.420889

Test  Accuracy Score : 0.91
Train Accuracy Score : 0.99

4. List of Important Parameters of LightGBM Estimators (train() Function)

NOTE: Please feel free to skip this section if you are in hurry. It is theoretical section listing parameters of "train()" constructor. You can refer them later as you need to tweak model.

We'll now list down important parameters of lightgbm which can be provided in a dictionary when calling the 'train()' method. We can provide the same parameters to estimators (LGBMModel, LGBMRegressor, and LGBMClassifier) that are readily available in lightgbm with the only difference that we don't need to provide them as a dictionary but we can provide them directly when creating an instance. We'll be introducing those estimators from the next section onwards.

  • objective - This parameter lets us define an objective/loss function to use for the task. The default value of this parameter is regression. Below is a list of commonly used values for this parameter.
  • metric - This parameter accepts metrics to be evaluated on evaluation datasets if evaluation datasets are provided as eval_set/validation_sets parameter value. We can provide more than one metric and all will be evaluated on validation sets. Below is a list of the commonly used values of metrics.
  • boosting - This parameter accepts one of the below-mentioned string specifying which algorithm to use.
    • gbdt - Default. Gradient Boosting Decision Tree
    • rf - Random Forest
    • dart - Dropouts meet multiple additive regression trees
    • goss - Gradient-based on side sampling
  • num_iterations - This parameter is an alias to num_boost_round which lets us specify the number of trees to the ensemble to create an estimator. The default is 100.
  • learning_rate - This parameter accepts a learning rate to use for the training process. The default is 0.1.
  • num_class - If we are working with multi-class classification problems then we need to provide a number of classes to this parameter.
  • num_leaves - This parameter accepts integer specifying the number of max leaves allowed per tree. The default is 31.
  • num_threads - It accepts integer specifying the number of threads to use for training. We can set it to the same number of cores of the system.
  • seed - This lets us specify the default seed for training which lets us regenerate the same results.
  • max_depth - This parameter lets us specify the maximum depth allowed for trees in the ensemble. The default is -1 which let trees grow as deep as possible. We can restrict this behavior by setting this parameter.
  • min_data_in_leaf - This parameter accepts integer value specifying a minimum number of data points that can be kept in one leaf. This parameter can be used to control overfitting. The default value is 20.
  • bagging_fraction - This parameter accepts float value between 0-1 letting us specify randomly select that much part of the data when training. This parameter can help prevent overfitting. The default is 1.0.
  • feature_fraction - This parameter accepts a float value between 0-1 that informs the algorithm to select that fraction of features from the total for training at each iteration. The default is 1.0 hence selecting all features.
  • extra_trees - This parameter accepts boolean values specifying whether to use an extremely randomized tree or not.
  • early_stopping_round - This parameter accepts integer specifying we should stop training if evaluation metric is not improving on last evaluation set for iterations specified by this parameter.
  • monotone_constraints - This parameter lets us specify whether our model should enforce increasing, decreasing, or no relation of an individual feature with the target value. We have explained the usage of this parameter in a section named monotonic constraints.
  • monotone_constraints_method - This parameter accepts one of the below-mentioned string specifying the type of monotonic constraints to impose.
    • basic - Basic monotone constraints method which can over constrain the model.
    • intermediate - It’s a little advanced constraints method which is a little less constraining than the basic method but can take a little more time.
    • advanced - - It’s an advanced constraints method that is less constraining than basic and intermediate methods but can take more time.
  • interaction_constraints - This parameter accepts a list of lists where individual list specify feature indices which are allowed to interact with one another. We have explained feature interaction in detail in section feature interaction constraints.
  • verbosity - This parameter accepts integer value for controlling logging message when training.
    • <0 - Only Fatal Errors are displayed.
    • 0 - Only Error/Warning messages are displayed.
    • 1 - Only info messages are displayed.
    • >1 - Only debug information is displayed.
  • is_unbalance - This is a boolean parameter that should be set to True if data is imbalanced. It should be used with binary and multi-class classification problems.
  • device_type - It accepts one of the below string specifying device type of training.
    • cpu
    • gpu
  • force_col_wise - This parameter accepts boolean value specifying whether to force column-wise histogram building when training. If data has too many columns then setting this parameter to True will improve training process speed by reducing memory usage.
  • force_row_wise - This parameter accepts boolean value specifying whether to force row-wise histogram building when training. If data has too many rows then setting this parameter to True will improve training process speed by reducing memory usage.

Please make a NOTE that this is not the full list of parameters available with lightgbm but only a few important parameters list. If you are interested in learning about all parameters then please feel free to check the below link.

5. LGBMModel (Scikit-Learn like API)

LGBMModel class is a wrapper around Booster class that provides scikit-learn like API for training and prediction in lightgbm. It let us create an estimator object with a list of parameters as input. We can then call the fit() method giving train data for training and the predict() method for making a prediction. The parameters which we had given as a dictionary to params parameter of train() can now directly be given to the constructor of LGBMModel to create a model. LGBMModel let us perform both classification and regression tasks by specifying the objective of the task.

5.1 Regression Example

Below we have explained with a simple example of how we can use LGBMModel to perform regression tasks with Boston housing data. We have first created an instance of LGBMModel with the objective as regression and number of trees set to 10. The n_estimators parameter is an alias of num_boost_round parameter of train() method.

We have then called the fit() method for the training model giving train data to it. Please make a note that it accepts numpy arrays as input and not lightgbm Dataset object. We have also given a dataset to be used as an evaluation set and metrics to be evaluated on the evaluation dataset. The parameter of the fit() method is almost the same as that of the train() method.

At last, we have called the predict() method to make predictions.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric="rmse")

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 8.50151	valid_0's l2: 72.2756
[2]	valid_0's rmse: 7.82463	valid_0's l2: 61.2248
[3]	valid_0's rmse: 7.22264	valid_0's l2: 52.1665
[4]	valid_0's rmse: 6.72909	valid_0's l2: 45.2806
[5]	valid_0's rmse: 6.29399	valid_0's l2: 39.6144
[6]	valid_0's rmse: 5.90399	valid_0's l2: 34.8571
[7]	valid_0's rmse: 5.58942	valid_0's l2: 31.2417
[8]	valid_0's rmse: 5.3252	valid_0's l2: 28.3577
[9]	valid_0's rmse: 5.07205	valid_0's l2: 25.7257
[10]	valid_0's rmse: 4.82126	valid_0's l2: 23.2445

Test  R2 Score : 0.72
Train R2 Score : 0.74

5.2 Binary Classification Example

Below we have explained with a simple example of how we can use LGBMModel for classification tasks. We have a trained model with a breast cancer dataset. Please make a note that the predict() method returns probabilities. We have included logic to calculate class from probabilities.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest  Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))
print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))
Train/Test Sizes :  (426, 30) (143, 30) (426,) (143,)
[1]	valid_0's binary_logloss: 0.569994
[2]	valid_0's binary_logloss: 0.511938
[3]	valid_0's binary_logloss: 0.463662
[4]	valid_0's binary_logloss: 0.423662
[5]	valid_0's binary_logloss: 0.391412
[6]	valid_0's binary_logloss: 0.361046
[7]	valid_0's binary_logloss: 0.332719
[8]	valid_0's binary_logloss: 0.311722
[9]	valid_0's binary_logloss: 0.292474
[10]	valid_0's binary_logloss: 0.270656

Test  Accuracy Score : 0.92
Train Accuracy Score : 0.97

6. LGBMRegressor (Scikit-Learn like API)

LGBMRegressor is another wrapper estimator around the Booster class provided by lightgbm which has the same API as that of sklearn estimators. As its name suggests, it’s designed for regression tasks. LGBMRegressor is almost the same as that of LGBMModel with the only difference that it’s designed for only regression tasks. Below we have explained the usage of LGBMRegressor with a simple example using the Boston housing dataset. Please make a note that LGBMRegressor provides the score() method which evaluates the R2 score for us which we used to evaluate using the sklearn metric method till now.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMRegressor(objective="regression_l2", n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric=["rmse", "l2", "l1"])

print("\nTest  R2 Score : %.2f"%booster.score(X_train, Y_train))
print("Train R2 Score : %.2f"%booster.score(X_test, Y_test))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 8.32795	valid_0's l2: 69.3548	valid_0's l1: 6.29438
[2]	valid_0's rmse: 7.7053	valid_0's l2: 59.3716	valid_0's l1: 5.82674
[3]	valid_0's rmse: 7.13747	valid_0's l2: 50.9434	valid_0's l1: 5.41024
[4]	valid_0's rmse: 6.64829	valid_0's l2: 44.1998	valid_0's l1: 5.021
[5]	valid_0's rmse: 6.17531	valid_0's l2: 38.1344	valid_0's l1: 4.68112
[6]	valid_0's rmse: 5.77563	valid_0's l2: 33.3579	valid_0's l1: 4.40061
[7]	valid_0's rmse: 5.44279	valid_0's l2: 29.624	valid_0's l1: 4.14437
[8]	valid_0's rmse: 5.13386	valid_0's l2: 26.3566	valid_0's l1: 3.89693
[9]	valid_0's rmse: 4.87077	valid_0's l2: 23.7244	valid_0's l1: 3.68527
[10]	valid_0's rmse: 4.61592	valid_0's l2: 21.3067	valid_0's l1: 3.50584

Test  R2 Score : 0.73
Train R2 Score : 0.74

7. LGBMClassifier (Scikit-Learn like API)

LGBMClassifier is one more wrapper estimator around the Booster class that provides a sklearn-like API for classification tasks. It works exactly like LGBMModel but for only classification tasks. It also provides a score() method which evaluates the accuracy of data passed to it.

Please make a note that LGBMClassifier predicts actual class labels for the classification tasks with the predict() method. It provides the predict_proba() method if we want probabilities of target classes.

7.1 Binary Classification Example

Below we have explained with a simple example of how we can use LGBMClassifier for binary classification tasks. We have explained its usage with the Breast cancer dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMClassifier(objective="binary", n_estimators=10)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

print("\nTest  Accuracy Score : %.2f"%booster.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%booster.score(X_train, Y_train))
Train/Test Sizes :  (426, 30) (143, 30) (426,) (143,)
[1]	valid_0's binary_logloss: 0.599368
[2]	valid_0's binary_logloss: 0.536792
[3]	valid_0's binary_logloss: 0.487134
[4]	valid_0's binary_logloss: 0.444999
[5]	valid_0's binary_logloss: 0.409009
[6]	valid_0's binary_logloss: 0.377066
[7]	valid_0's binary_logloss: 0.349213
[8]	valid_0's binary_logloss: 0.324688
[9]	valid_0's binary_logloss: 0.303217
[10]	valid_0's binary_logloss: 0.284869

Test  Accuracy Score : 0.92
Train Accuracy Score : 0.97

Test  Accuracy Score : 0.92
Train Accuracy Score : 0.97

7.2 Multi-Class Classification Example

NOTE: Please feel free to skip this section if you are in hurry and have understood how to use LightGBM for classification tasks using our previous binary classification example.

Below we have explained the usage of LGBMClassifier for multi-class classification tasks using the Wine classification dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMClassifier(objective="multiclassova", n_estimators=10, num_class=3)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),])

print("\nTest  Accuracy Score : %.2f"%booster.score(X_test, Y_test))
print("Train Accuracy Score : %.2f"%booster.score(X_train, Y_train))
Train/Test Sizes :  (133, 13) (45, 13) (133,) (45,)
[1]	valid_0's multi_logloss: 0.923875
[2]	valid_0's multi_logloss: 0.81018
[3]	valid_0's multi_logloss: 0.726106
[4]	valid_0's multi_logloss: 0.660671
[5]	valid_0's multi_logloss: 0.594604
[6]	valid_0's multi_logloss: 0.546413
[7]	valid_0's multi_logloss: 0.498342
[8]	valid_0's multi_logloss: 0.460875
[9]	valid_0's multi_logloss: 0.421938
[10]	valid_0's multi_logloss: 0.37877

Test  Accuracy Score : 0.98
Train Accuracy Score : 0.99

NOTE

Please make a note that LGBMModel, LGBMRegressor and LGBMClassifier provides an attribute named 'booster_' which returns an instance of the Booster class which we can save to disk after training and later load for prediction.

booster.booster_
<lightgbm.basic.Booster at 0x7f21e9f69eb8>

8. Saving and Loading Model

We'll now explain how we can save the trained model to a disk to use later for predictions. Lightgbm provides the below-mentioned methods for our purpose of saving and loading models.

  • save_model() - This method takes as an input file name to which save the model.
  • model_to_string() - This method returns a string representation of the model which we can then save to a text file.
  • lightgbm.Booster() - This constructor lets us create an instance of the Booster class. It has two important parameters that can help us load a model from a file or from a string.
    • model_file - This parameter accepts the file name from which to load the trained model.
    • model_str - This parameter accepts a string that has information about the trained model. We need to give a string that was generated using model_to_string() to this parameter after loading from the file.

Below we have explained with simple examples how we can use above mentioned methods to save models to a disk and then load it.

Please make a note that in order to save model trained using LGBMModel, LGBMRegressor, and LGBMClassifier, we first need to get their Booster instance by using the booster_ attribute of an estimator and then save it. LGBMModel, LGBMRegressor, and LGBMClassifier do not provide saving and loading functionalities. It’s only available with the Booster instance.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())


booster = lgb.train({"objective": "regression", "verbosity": -1},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    verbose_eval=False,
                    feature_name=boston.feature_names.tolist(),
                    num_boost_round=10)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)

Test  R2 Score : 0.70
Train R2 Score : 0.74

Save Model using LightGBM "save_model()" Method

booster.save_model("lgb.model")
<lightgbm.basic.Booster at 0x7f08e8967c50>

Load Model through "Booster()" Constructor

loaded_booster  = lgb.Booster(model_file="lgb.model")

loaded_booster
<lightgbm.basic.Booster at 0x7f08e8e744a8>
from sklearn.metrics import r2_score

test_preds = loaded_booster.predict(X_test)
train_preds = loaded_booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Test  R2 Score : 0.70
Train R2 Score : 0.74

Save Model as a String to File using "model_to_string()" Method

model_as_str = booster.model_to_string()

with open("booster2.model", "w") as f:
    f.write(model_as_str)

Load Model from String through Booster() Constructor

model_str = open("booster2.model").read()

booster_frm_str = lgb.Booster(model_str = model_str)
booster_frm_str
Finished loading model, total used 10 iterations
<lightgbm.basic.Booster at 0x7f08e8938940>
from sklearn.metrics import r2_score

test_preds = booster_frm_str.predict(X_test)
train_preds = booster_frm_str.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Test  R2 Score : 0.70
Train R2 Score : 0.74

9. Cross Validation Example

Lightgbm let us perform cross-validation using cv() method. It accepts model parameters as a dictionary like the train() method. We can then give a dataset on which to perform cross-validation. It performs 5-fold cross-validation by default. We can change the number of folds by setting the nfold parameter. It also accepts sklearn's data splitter like KFold, StratifiedKFold, ShuffleSplit, and StratifiedShuffleSplit. We can provide these data splitters to the folds parameter of the method.

The cv() method returns a dictionary that has information about the mean and standard deviation of loss for each round of training. We can even ask the method to return an instance of CVBooster by setting the return_cvbooster parameter to True. CVBooster object has information about cross-validation.

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=breast_cancer.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=breast_cancer.feature_names.tolist())

lgb.cv({"objective": "binary", "verbosity": -1},
       train_set=test_dataset, num_boost_round=10,
       nfold=5, stratified=True, shuffle=True,
       verbose_eval=True)
[1]	cv_agg's binary_logloss: 0.586297 + 0.00814598
[2]	cv_agg's binary_logloss: 0.536385 + 0.0139104
[3]	cv_agg's binary_logloss: 0.494618 + 0.021394
[4]	cv_agg's binary_logloss: 0.457766 + 0.0266986
[5]	cv_agg's binary_logloss: 0.427578 + 0.0317981
[6]	cv_agg's binary_logloss: 0.400594 + 0.0347366
[7]	cv_agg's binary_logloss: 0.378743 + 0.0393459
[8]	cv_agg's binary_logloss: 0.355944 + 0.0406613
[9]	cv_agg's binary_logloss: 0.341757 + 0.0431176
[10]	cv_agg's binary_logloss: 0.324393 + 0.0439941
{'binary_logloss-mean': [0.5862971048268162,
  0.536385329057131,
  0.4946178001035051,
  0.4577660981720048,
  0.42757828019512817,
  0.40059432541714546,
  0.3787432348470402,
  0.355943799374708,
  0.3417565456639551,
  0.3243928378974005],
 'binary_logloss-stdv': [0.008145979941642538,
  0.013910430256742287,
  0.02139399288171927,
  0.026698647074055896,
  0.0317980957740354,
  0.03473655291456087,
  0.039345850387526374,
  0.04066125361064387,
  0.04311758960643671,
  0.04399410008603076]}
from sklearn.model_selection import StratifiedShuffleSplit

cv_output = lgb.cv({"objective": "binary", "verbosity": -1},
                   train_set=test_dataset, num_boost_round=10,
                   metrics=["auc", "average_precision"],
                   folds=StratifiedShuffleSplit(n_splits=3),
                   verbose_eval=True,
                   return_cvbooster=True)

for key, val in cv_output.items():
    print("\n" + key, " : ", val)
[1]	cv_agg's auc: 0.891975 + 0.0243025	cv_agg's average_precision: 0.903601 + 0.0403935
[2]	cv_agg's auc: 0.947531 + 0.0218243	cv_agg's average_precision: 0.966003 + 0.0157877
[3]	cv_agg's auc: 0.959877 + 0.0340906	cv_agg's average_precision: 0.97341 + 0.0230962
[4]	cv_agg's auc: 0.962963 + 0.0302406	cv_agg's average_precision: 0.976702 + 0.018958
[5]	cv_agg's auc: 0.969136 + 0.0314754	cv_agg's average_precision: 0.980817 + 0.0197982
[6]	cv_agg's auc: 0.975309 + 0.0230967	cv_agg's average_precision: 0.985447 + 0.0135086
[7]	cv_agg's auc: 0.975309 + 0.0230967	cv_agg's average_precision: 0.985447 + 0.0135086
[8]	cv_agg's auc: 0.975309 + 0.0230967	cv_agg's average_precision: 0.985447 + 0.0135086
[9]	cv_agg's auc: 0.975309 + 0.0230967	cv_agg's average_precision: 0.985447 + 0.0135086
[10]	cv_agg's auc: 0.969136 + 0.0314754	cv_agg's average_precision: 0.980817 + 0.0197982

auc-mean  :  [0.8919753086419754, 0.9475308641975309, 0.9598765432098766, 0.9629629629629629, 0.9691358024691358, 0.9753086419753086, 0.9753086419753086, 0.9753086419753086, 0.9753086419753086, 0.9691358024691358]

auc-stdv  :  [0.02430249343830806, 0.02182428336995516, 0.034090620423417484, 0.0302406141084343, 0.031475429096251756, 0.023096650535641628, 0.023096650535641628, 0.023096650535641628, 0.023096650535641628, 0.031475429096251756]

average_precision-mean  :  [0.9036008230452675, 0.9660026187803966, 0.9734100261878039, 0.9767022072577629, 0.9808174335952113, 0.9854470632248411, 0.9854470632248411, 0.9854470632248411, 0.9854470632248411, 0.9808174335952113]

average_precision-stdv  :  [0.04039346734018979, 0.0157876653573454, 0.02309624528782449, 0.01895799108192039, 0.01979815600144299, 0.013508585022458419, 0.013508585022458419, 0.013508585022458419, 0.013508585022458419, 0.01979815600144299]

cvbooster  :  <lightgbm.engine.CVBooster object at 0x7f21e9693518>
cvbooster = cv_output['cvbooster']

cvbooster.boosters
[<lightgbm.basic.Booster at 0x7f21e96937b8>,
 <lightgbm.basic.Booster at 0x7f21e90dfc88>,
 <lightgbm.basic.Booster at 0x7f21e9693240>]

10. Plotting Functionality

Lightgbm provides a list of the below-mentioned plotting functions.

10.1 Visualize Features Importance using "plot_importance()"

This method accepts a booster instance and plots feature importance using it. Below we have created a feature importance plot using the booster trained earlier for the regression task. The method has a parameter named importance_type which can be set to string split will plot the number of times feature was used for split and plots gains of splits if set to string gain. The value of parameter importance_type is split. The plot_importance() method has another important parameter max_num_features which accepts an integer specifying how many features to include in the plot. We can limit the number of features using this parameter as it'll include only that many top features in the plot.

lgb.plot_importance(booster, figsize=(8,6));

LightGBM - An In-Depth Guide [Python]

10.2 Visualize ML Metric using "plot_metric()"

This method plots the results of an evaluation metric. We need to give a booster instance to the method in order to plot an evaluation metric evaluated on the evaluation dataset.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse", eval_names = ["Validation Set"],
            feature_name=boston.feature_names.tolist()
           )

lgb.plot_metric(booster, figsize=(8,6));

LightGBM - An In-Depth Guide [Python]

lgb.plot_metric(booster, metric="rmse", figsize=(8,6));

LightGBM - An In-Depth Guide [Python]

10.3 Visualize Feature Values Split using "plot_split_value_histogram()"

This method takes as input booster instance and feature name/index. It then plots a split value histogram for the feature.

lgb.plot_split_value_histogram(booster, feature="LSTAT", figsize=(8,6));

LightGBM - An In-Depth Guide [Python]

10.4 Visualize Individual Boosted Tree using "plot_tree()"

This method lets us plot the individual tree of the ensemble. We need to give a booster instance and index of the tree which we want to plot to it.

lgb.plot_tree(booster, tree_index = 1, figsize=(20,12));

LightGBM - An In-Depth Guide [Python]

11. Early Stopping Training to Avoid Overfitting

Early stopping training is a process where we stop training if the evaluation metric evaluated on the evaluation dataset is not improving for a specified number of rounds. Lightgbm provides parameter named early_stopping_rounds as a part of train() method as well as fit() method of lightgbm sklearn-like estimators. This parameter accepts integer value specifying that stop the training process if the evaluation metric result has not improved for that many rounds.

Please make a note that we need an evaluation dataset in order for this to work as it’s based on evaluation metric results evaluated on the evaluation dataset.

Below we have explained the usage of the parameter early_stopping_rounds for regression and classification tasks with simple examples.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())


booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    early_stopping_rounds=5,
                    num_boost_round=100)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 8.82485
Training until validation scores don't improve for 5 rounds
[2]	valid_0's rmse: 8.09497
[3]	valid_0's rmse: 7.46686
[4]	valid_0's rmse: 6.90991
[5]	valid_0's rmse: 6.4172
[6]	valid_0's rmse: 5.99212
[7]	valid_0's rmse: 5.62928
[8]	valid_0's rmse: 5.30155
[9]	valid_0's rmse: 5.05191
[10]	valid_0's rmse: 4.84863
[11]	valid_0's rmse: 4.63474
[12]	valid_0's rmse: 4.44933
[13]	valid_0's rmse: 4.28644
[14]	valid_0's rmse: 4.15939
[15]	valid_0's rmse: 4.01791
[16]	valid_0's rmse: 3.92719
[17]	valid_0's rmse: 3.82892
[18]	valid_0's rmse: 3.77695
[19]	valid_0's rmse: 3.69585
[20]	valid_0's rmse: 3.64548
[21]	valid_0's rmse: 3.58403
[22]	valid_0's rmse: 3.54853
[23]	valid_0's rmse: 3.51134
[24]	valid_0's rmse: 3.4976
[25]	valid_0's rmse: 3.45016
[26]	valid_0's rmse: 3.42836
[27]	valid_0's rmse: 3.41483
[28]	valid_0's rmse: 3.40661
[29]	valid_0's rmse: 3.39959
[30]	valid_0's rmse: 3.38903
[31]	valid_0's rmse: 3.37894
[32]	valid_0's rmse: 3.35784
[33]	valid_0's rmse: 3.37572
[34]	valid_0's rmse: 3.3732
[35]	valid_0's rmse: 3.35426
[36]	valid_0's rmse: 3.35484
[37]	valid_0's rmse: 3.34265
[38]	valid_0's rmse: 3.33666
[39]	valid_0's rmse: 3.33256
[40]	valid_0's rmse: 3.33374
[41]	valid_0's rmse: 3.32778
[42]	valid_0's rmse: 3.33335
[43]	valid_0's rmse: 3.33888
[44]	valid_0's rmse: 3.34715
[45]	valid_0's rmse: 3.32557
[46]	valid_0's rmse: 3.34178
[47]	valid_0's rmse: 3.3474
[48]	valid_0's rmse: 3.33983
[49]	valid_0's rmse: 3.33105
[50]	valid_0's rmse: 3.3198
[51]	valid_0's rmse: 3.31533
[52]	valid_0's rmse: 3.31672
[53]	valid_0's rmse: 3.32232
[54]	valid_0's rmse: 3.3158
[55]	valid_0's rmse: 3.31626
[56]	valid_0's rmse: 3.32085
Early stopping, best iteration is:
[51]	valid_0's rmse: 3.31533

Test  R2 Score : 0.88
Train R2 Score : 0.95
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),],
            early_stopping_rounds=3)

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest  Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))
print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))
Train/Test Sizes :  (426, 30) (143, 30) (426,) (143,)
[1]	valid_0's auc: 0.986129
Training until validation scores don't improve for 3 rounds
[2]	valid_0's auc: 0.989355
[3]	valid_0's auc: 0.988925
[4]	valid_0's auc: 0.987097
[5]	valid_0's auc: 0.990108
[6]	valid_0's auc: 0.993011
[7]	valid_0's auc: 0.993011
[8]	valid_0's auc: 0.993441
[9]	valid_0's auc: 0.993441
[10]	valid_0's auc: 0.994194
[11]	valid_0's auc: 0.994194
[12]	valid_0's auc: 0.994194
[13]	valid_0's auc: 0.994409
[14]	valid_0's auc: 0.995914
[15]	valid_0's auc: 0.996129
[16]	valid_0's auc: 0.996989
[17]	valid_0's auc: 0.996989
[18]	valid_0's auc: 0.996344
[19]	valid_0's auc: 0.997204
[20]	valid_0's auc: 0.997419
[21]	valid_0's auc: 0.997849
[22]	valid_0's auc: 0.998065
[23]	valid_0's auc: 0.997849
[24]	valid_0's auc: 0.998065
[25]	valid_0's auc: 0.997634
Early stopping, best iteration is:
[22]	valid_0's auc: 0.998065

Test  Accuracy Score : 0.97
Train Accuracy Score : 0.98

How to Stop Training Early using "early_stopping()" Callback?

Lightgbm provides early stopping training functionality using the early_stopping() callback function as well. We can give number of rounds to early_stopping() function and give that function to callbacks parameter of train()/fit() method. We have explained callbacks in an upcoming section.

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),],
            callbacks=[lgb.early_stopping(3)]
            )

from sklearn.metrics import accuracy_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

print("\nTest  Accuracy Score : %.2f"%accuracy_score(Y_test, test_preds))
print("Train Accuracy Score : %.2f"%accuracy_score(Y_train, train_preds))
Train/Test Sizes :  (426, 30) (143, 30) (426,) (143,)
[1]	valid_0's auc: 0.954328
Training until validation scores don't improve for 3 rounds
[2]	valid_0's auc: 0.959322
[3]	valid_0's auc: 0.982938
[4]	valid_0's auc: 0.988244
[5]	valid_0's auc: 0.987203
[6]	valid_0's auc: 0.98762
[7]	valid_0's auc: 0.98814
Early stopping, best iteration is:
[4]	valid_0's auc: 0.988244

Test  Accuracy Score : 0.94
Train Accuracy Score : 0.95

12. Feature Interaction Constraints

When lightgbm has completed training trees of the ensemble on a dataset, the individual node of trees represents some condition based on some value of the feature. When we are making predictions using an individual tree, we start from the root node of the tree, checking the feature condition specified in the node with our sample feature values. We make decisions based on the feature values in our sample and the condition present in the tree. This way we follow a particular path reaching the leaf of the tree to make the final prediction. By default, there is no restriction on which node can have which feature as a condition. This process of making a final decision by going through nodes of tree checking feature condition is called feature interaction because predictor has come to the particular node after evaluating the condition of the previous node. Lightgbm can let us define restrictions on which feature to interact with which another feature. We can give a list of indices and only that many features will interact with one another. Those features won't be allowed to interact with other features and this restriction will be forced when creating trees during the training process.

Below we have explained with a simple example of how we can force feature interaction constraint on estimator in lightgbm. Lighgbm estimators provide a parameter named interaction_constraints which accepts a list of lists where individual lists are indices of parameters that are allowed to interact with one another.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse",
                    'interaction_constraints':[[0,1,2,11,12], [3, 4],[6,10], [5,9], [7,8]]},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)


from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[1]	valid_0's rmse: 7.50225
[2]	valid_0's rmse: 7.01989
[3]	valid_0's rmse: 6.58246
[4]	valid_0's rmse: 6.18581
[5]	valid_0's rmse: 5.83873
[6]	valid_0's rmse: 5.47166
[7]	valid_0's rmse: 5.19667
[8]	valid_0's rmse: 4.96259
[9]	valid_0's rmse: 4.69168
[10]	valid_0's rmse: 4.51653

Test  R2 Score : 0.67
Train R2 Score : 0.69
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,
                        interaction_constraints = [[0,1,2,11,12], [3, 4],[6,10], [5,9], [7,8]])

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse",
            )

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 8.97871	valid_0's l2: 80.6173
[2]	valid_0's rmse: 8.35545	valid_0's l2: 69.8135
[3]	valid_0's rmse: 7.93432	valid_0's l2: 62.9535
[4]	valid_0's rmse: 7.61104	valid_0's l2: 57.9279
[5]	valid_0's rmse: 7.16832	valid_0's l2: 51.3849
[6]	valid_0's rmse: 6.93182	valid_0's l2: 48.0501
[7]	valid_0's rmse: 6.57728	valid_0's l2: 43.2606
[8]	valid_0's rmse: 6.41497	valid_0's l2: 41.1518
[9]	valid_0's rmse: 6.13983	valid_0's l2: 37.6976
[10]	valid_0's rmse: 5.9864	valid_0's l2: 35.837

Test  R2 Score : 0.60
Train R2 Score : 0.69

13. Monotonic Constraints

Lightgbm let us specify monotonic constraints on a model that specifies whether the individual feature has increasing, decreasing, or no relation with the target value. It let us specify monotone values of -1, 0, and 1 forcing model to impose decreasing, none, and increasing relationship of the feature with the target. We can provide a list with the same length as a number of features specifying 1,0 or -1 for the monotonic relationship by using the monotone_constraints parameter. We have explained below with a simple example of how we can enforce monotonic constraints in lightgbm.

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse",
                    'monotone_constraints':(1,0,1,-1,1,0,1,0,-1,1,1, -1, 1)},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)


from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[1]	valid_0's rmse: 7.50077
[2]	valid_0's rmse: 7.01013
[3]	valid_0's rmse: 6.57254
[4]	valid_0's rmse: 6.19802
[5]	valid_0's rmse: 5.8771
[6]	valid_0's rmse: 5.59538
[7]	valid_0's rmse: 5.35168
[8]	valid_0's rmse: 5.15228
[9]	valid_0's rmse: 4.95664
[10]	valid_0's rmse: 4.81777

Test  R2 Score : 0.63
Train R2 Score : 0.63
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="regression", n_estimators=10,
                        monotone_constraints = (1,0,1,-1,1,0,1,0,-1,1,1, -1, 1))

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse",
            )

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 8.87332	valid_0's l2: 78.7359
[2]	valid_0's rmse: 8.37389	valid_0's l2: 70.122
[3]	valid_0's rmse: 7.89759	valid_0's l2: 62.3719
[4]	valid_0's rmse: 7.51069	valid_0's l2: 56.4105
[5]	valid_0's rmse: 7.18851	valid_0's l2: 51.6747
[6]	valid_0's rmse: 6.90391	valid_0's l2: 47.664
[7]	valid_0's rmse: 6.66775	valid_0's l2: 44.4589
[8]	valid_0's rmse: 6.46139	valid_0's l2: 41.7495
[9]	valid_0's rmse: 6.27545	valid_0's l2: 39.3813
[10]	valid_0's rmse: 6.12082	valid_0's l2: 37.4644

Test  R2 Score : 0.58
Train R2 Score : 0.62

14. Custom Objective/Loss Function

Lightgbm let us define custom objective function as well. We need to define a function that takes a list of prediction and actual labels as input and returns the first derivative and second derivative of the loss function. We need to return the first derivative and the second derivative of loss function evaluated using predictions and actual values. We can give a custom-defined objective/loss function to the objective parameter of the estimator. If we are using the train() method then we need to give this function to the fobj parameter.

Below we have designed the mean squared error objective function. We have then given this function to an objective parameter of LGBMModel for an explanation.

def first_grad(predt, dmat):
    '''Compute the first derivative for mean squared error.'''
    y = dmat.get_label() if isinstance(dmat, lgb.Dataset) else dmat
    return 2*(y-predt)

def second_grad(predt, dmat):
    '''Compute the second derivative for mean squared error.'''
    y = dmat.get_label() if isinstance(dmat, lgb.Dataset) else dmat
    return [1] * len(predt)

def mean_sqaured_error(predt, dmat):
    ''''Mean squared error function.'''
    predt[predt < -1] = -1 + 1e-6
    grad = first_grad(predt, dmat)
    hess = second_grad(predt, dmat)
    return grad, hess
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric="rmse")

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 19.3349
[2]	valid_0's rmse: 15.5417
[3]	valid_0's rmse: 12.5873
[4]	valid_0's rmse: 10.2379
[5]	valid_0's rmse: 8.43293
[6]	valid_0's rmse: 7.08919
[7]	valid_0's rmse: 6.09021
[8]	valid_0's rmse: 5.39551
[9]	valid_0's rmse: 4.88447
[10]	valid_0's rmse: 4.59251

Test  R2 Score : 0.75
Train R2 Score : 0.83

15. Custom Evaluation Function

Lightgbm lets us define our own evaluation metric if we don't want to use evaluation metrics available with lightgbm. We need to define a function that takes an input list of predictions and actual target values and returns a string specifying metric name, metric evaluation value, and boolean value specifying whether higher is better or not. The value higher is better should be returned True if we want the metric value to be maximized else it should be False if we want the metric value to be minimized.

We need to give reference to this function as the value of parameter feval if we are using train() method to design our estimator. If we are using a sklearn-like estimator then we need to give this function to the eval_metric parameter of the fit() method.

Below we have explained with simple examples of how we can use custom evaluation metrics with lightgbm.

def mean_absolute_error(preds, dmat):
    actuals = dmat.get_label() if isinstance(dmat, lgb.Dataset) else dmat
    err = (actuals - preds).sum()
    is_higher_better = False
    return "MAE", err, is_higher_better
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, train_size=0.90, random_state=42)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape, "\n")

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    feval=mean_absolute_error,
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    num_boost_round=10)


from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (455, 13) (51, 13) (455,) (51,)

[1]	valid_0's rmse: 7.40798	valid_0's MAE: -74.3941
[2]	valid_0's rmse: 6.83504	valid_0's MAE: -68.5244
[3]	valid_0's rmse: 6.32897	valid_0's MAE: -63.4968
[4]	valid_0's rmse: 5.90259	valid_0's MAE: -59.304
[5]	valid_0's rmse: 5.53393	valid_0's MAE: -55.712
[6]	valid_0's rmse: 5.17631	valid_0's MAE: -52.3329
[7]	valid_0's rmse: 4.87576	valid_0's MAE: -48.2586
[8]	valid_0's rmse: 4.62314	valid_0's MAE: -46.1631
[9]	valid_0's rmse: 4.38363	valid_0's MAE: -41.8425
[10]	valid_0's rmse: 4.2398	valid_0's MAE: -39.1324

Test  R2 Score : 0.71
Train R2 Score : 0.76
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train, eval_set=[(X_test, Y_test),], eval_metric=mean_absolute_error)

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's MAE: -2230.03
[2]	valid_0's MAE: -1775.07
[3]	valid_0's MAE: -1413.3
[4]	valid_0's MAE: -1127.13
[5]	valid_0's MAE: -900.256
[6]	valid_0's MAE: -719.848
[7]	valid_0's MAE: -572.454
[8]	valid_0's MAE: -449.162
[9]	valid_0's MAE: -357.412
[10]	valid_0's MAE: -281.703

Test  R2 Score : 0.66
Train R2 Score : 0.82

16. Callbacks

Lightgbm provides users with a list of callback functions for a different purpose that gets executed after each iteration of training. Below is a list of available callback functions with lightgbm:

  • early_stopping(stopping_rounds) - This callback function accepts an integer specifying whether to stop training if evaluation metric results on the last evaluation set are not improved for that many iterations.
  • print_evaluation(period, show_stdv) - This callback function accepts integer values specifying how often to print evaluation results. Evaluation metric results are printed at every that many iterations as specified.
  • record_evaluation(eval_result) - This callback function accepts a dictionary in which evaluation results will be recorded.
  • reset_parameter() - This callback function lets us reset the learning rate after each iteration of training. It accepts an array of size the same as the number of iterations or callback returning the new learning rate for each iteration.

The callbacks parameter which is available with the train() method and the fit() method of estimators accepts a list of callback functions.

Below we have explained with simple examples of how we can use different callback functions. The explanation of the early_stopping() callback function has been covered in the early stopping training section of this tutorial.

How to Use "print_evaluation()" Callback?

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
            callbacks=[lgb.callback.print_evaluation(period=3)])

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[3]	valid_0's rmse: 12.1433
[6]	valid_0's rmse: 6.86157
[9]	valid_0's rmse: 4.37858

Test  R2 Score : 0.79
Train R2 Score : 0.80

How to Use "record_evaluation()" Callback?

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

evals_results = {}

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse", verbose=False,
            callbacks=[lgb.print_evaluation(period=3), lgb.record_evaluation(evals_results)])

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
print("Evaluation Results : ", evals_results)
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[3]	valid_0's rmse: 12.8003
[6]	valid_0's rmse: 7.40552
[9]	valid_0's rmse: 5.11615

Test  R2 Score : 0.67
Train R2 Score : 0.82
Evaluation Results :  {'valid_0': OrderedDict([('rmse', [19.235743778402917, 15.611391428644854, 12.800304472773783, 10.469162299753663, 8.715414846943654, 7.405524963318977, 6.417956121607763, 5.66002020770034, 5.116147011782366, 4.786323495504935])])}

How to Use "reset_parameter()" Callback?

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)

print("Train/Test Sizes : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective=mean_sqaured_error, n_estimators=10,)

booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),], eval_metric="rmse",
            callbacks=[lgb.reset_parameter(learning_rate=np.linspace(0.1,1,10).tolist())])

from sklearn.metrics import r2_score

test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

print("\nTest  R2 Score : %.2f"%r2_score(Y_test, test_preds))
print("Train R2 Score : %.2f"%r2_score(Y_train, train_preds))
Train/Test Sizes :  (379, 13) (127, 13) (379,) (127,)
[1]	valid_0's rmse: 19.224
[2]	valid_0's rmse: 12.167
[3]	valid_0's rmse: 6.42527
[4]	valid_0's rmse: 4.44198
[5]	valid_0's rmse: 4.22668
[6]	valid_0's rmse: 4.43308
[7]	valid_0's rmse: 4.29187
[8]	valid_0's rmse: 4.47696
[9]	valid_0's rmse: 4.5301
[10]	valid_0's rmse: 4.64636

Test  R2 Score : 0.73
Train R2 Score : 0.95
Sunny Solanki  Sunny Solanki

YouTube Subscribe Comfortable Learning through Video Tutorials?

If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.

Need Help Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Share Views Want to Share Your Views? Have Any Suggestions?

If you want to

  • provide some suggestions on topic
  • share your views
  • include some details in tutorial
  • suggest some new topics on which we should create tutorials/blogs
Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.


Subscribe to Our YouTube Channel

YouTube SubScribe

Newsletter Subscription