Updated On : Oct-03,2021 Tags scikit-optimize, hyperparameters-optimization
Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Hyperparameters optimization is a process where we find values of hyperparameters of the ML model that gives the best results in less time without trying all possible combinations. ML models for complicated problems generally have lots of hyperparameters and trying all possible combinations (grid search) of those hyperparameter's values on a large amount of data can take a lot of time even on modern computers. We need an efficient way of finding values of hyperparameters in less amount of time that gives the best results so that we can try more experiments. As a part of this tutorial, we'll explain the usage of python library scikit-optimize that uses a bayesian process to find out the best hyperparameters setting for a model in less amount of time.

Scikit-Optimize is designed on top of numpy, scipy, and scikit-learn. It let us minimize the output value of black-box functions. We'll be following below mentioned steps to minimize any given function.

Steps to Use Scikit-Optimize

  1. Define an Objective Function.
    • This is the step where we declare a function that takes a single combination of hyperparameters as input, creates a model, trains it, and evaluates it. We return some metric value (MSE, MAE, etc.) at the end of the function which we want to minimize.
  2. Define Hyperparameters Search Space.
    • This is the step where we declare a list of hyperparameters that we want to optimize and the range of values of those hyperparameters to try.
  3. Minimize Objective function value by trying different hyperparameters combinations using the bayesian process.
    • This is the step where scikit-optimize will try different combinations of hyperparameters settings on the objective function and try to minimize the output metric value in less amount of time.

We'll first try to explain the usage of scikit-optimize by minimizing the output value of the simple line formula following the steps mentioned above. We'll then explain how we can use a library with scikit-learn models. We'll also explore plotting functionality available through scikit-optimize.

Tutorial Sections Overview

  • Minimize Simple Line Formula
    • Define Objective Function
    • Define HyperParameters Search Space
    • Minimize Objective Function
    • Print Results
    • Minimize over Float Search Space
    • Define Search Space using "skopt.space" Module
  • Regression using Scikit-Learn
    • Load Dataset
    • Define Hyperparameters Search Space
    • Define Objective Function
    • Optimize Objective Function (Minimize for Least MSE)
    • Print Best Results
    • Train and Evaluate Model with Best Hyperparameters
  • Classification using Scikit-Learn
    • Load Dataset
    • Define Hyperparameters Search Space
    • Define Objective Function
    • Optimize Objective Function (Maximize for Highest Accuracy)
    • Print Best Results
    • Train and Evaluate Model with Best Hyperparameters
  • Plotting
    • Gaussian Process and Optimization Results for 1D Objective Functions
    • Partial Dependence Plots of Objective Function
    • Partial Dependence Plot of 2 Hyperparameters Combination
    • Hyperparameters Search Space Sampling Plot
    • Hyperparameter Search Space Sampling Histogram
    • Convergence Plot
    • Cumulative Regret Plot
  • References

We'll now start by importing the necessary libraries.

In [1]:
import skopt
import sklearn

print("Scikit-Optimize Version : {}".format(skopt.__version__))
print("Scikit-Learn    Version : {}".format(sklearn.__version__))

import warnings

warnings.filterwarnings("ignore")
Scikit-Optimize Version : 0.8.1
Scikit-Learn    Version : 0.24.2

1. Minimize Simple Line Formula

As a part of this section, we'll try to minimize the output of line formula 5x-21 using scikit-optimize. We want to find the value of parameter x at which the value of line formula 5x-21 becomes zero. This is a simple function and we can easily find the value of x by setting the line equation to zero. But we want scikit-optimize to find the best value at which line formula becomes zero. We'll ask it to try values in a particular range and find the best value of x at which line formula becomes zero or at least evaluates to value near zero.

Define Objective Function

Our objective function for this example is quite simple. It takes as input single parameter x and returns the value of line formula 5x-21 calculated using it. We have wrapped the line formula with python function abs() which will always return a value greater than or equal to zero. If we don't use this function then the negative value of x can keep on decreasing line formula until negative infinity. We want the line formula to be evaluated to a value near zero. We'll be trying to minimize the output of this objective function which has a minimum of zero.

In [2]:
def objective(x):
    #print(x)
    return abs(5*x[0] - 21)

Define HyperParameters Search Space

Our second step when using scikit-optimize for hyperparameters optimization will be to define search space for hyperparameters. This is the step where we declare a range for continuous features and a list of values for categorical variables from which we want to try different combinations.

When using scikit-optimize, we need to provide search space as a list of ranges/values for hyperparameters. In this example, we have only one hyperparameter named x to optimize. We have below declared search space with a range of [-5,5].

In [3]:
search_space= [(-5,5)]

Minimize Objective Function

In this section, we'll be using gp_minimize() function from scikit-optimize to minimize our objective function by giving different values of parameter x from range [-5,5] to objective function. The function internally uses Bayesian optimization using gaussian processes to find out the best value of x which minimizes objective function value in less amount of time. Below we have explained the definition of gp_minimize() function.


  • gp_minimize(func,dimensions,n_calls=100,random_state=None,verbose=False,n_jobs=1) - This function takes as input objective function and hyperparameters search space as input. It then tries different values of hyperparameters on objective function using gaussian process. It returns an instance of scipy.optimize.OptimizeResult instance which has information about the optimization process like best parameters settings that gave best results, stats of different trials, etc.
    • The n_calls parameter accepts integer value specifying the number of trials to perform on objective function with different hyperparameter settings.
    • The random_state is for reproducibility.

Below we have executed gp_minimize() function by giving objective function and search space as input to it. We have asked it to try 20 different values of parameter x on the objective function by setting n_calls to 20.

In [4]:
%%time

from skopt import gp_minimize

res1 = gp_minimize(objective, dimensions=search_space, n_calls=20)
CPU times: user 5.57 s, sys: 2.72 s, total: 8.29 s
Wall time: 3.95 s

In this section, we have printed the result of the optimization process. We can notice from the output object type that it’s of type scipy.optimize.OptimizeResult. It has a list of attributes that holds information about the optimization process.

The x attribute of the scipy.optimize.OptimizeResult object has a list of hyperparameters settings that gave the best result. We can notice in our case that integer 4 gave the best result (least value) for the line formula.

The fun attribute has information about the output of the objective function for the best hyperparameters setting. In our case, the output of line formula (5x-21) with x value of 4 is 1.

In [5]:
print("Result Type : {}".format(type(res1)))
print("5*x-21 at x={} is {}".format(res1.x[0], res1.fun))
Result Type : <class 'scipy.optimize.optimize.OptimizeResult'>
5*x-21 at x=4 is 1

Below we have printed search space through space attribute of the result object. Please make a note that as we had provided range as (-5,5), scikit-optimize internally created a range of integer values. We have explained in the next example how we can simply ask it to create float search space by giving float instead of integers.

The space module of scikit-optimize provides methods to create search space which we'll explain in our upcoming examples.

In [27]:
print("Result 1 Space : {}".format(res1.space))
Result 1 Space : Space([Integer(low=-5, high=5, prior='uniform', transform='normalize')])

The x_iters parameter returns a list of different hyperparameters settings that were tried on the objective function. Below we have printed 20 different values of x which were tried during optimization.

In [28]:
print("List of X values tried : {}".format(res1.x_iters))
List of X values tried : [[-3], [5], [-1], [4], [3], [-2], [-4], [4], [0], [-1], [4], [4], [4], [4], [4], [4], [4], [4], [4], [4]]

Minimize over Float Search Space

In this section, we are explaining how we can inform scikit-optimize to try a range of float values for x on our objective function. We have simply made one change for this purpose which is changing the search space range value from integer to float. We have passed search space [(-5.,5.),] instead of [(-5,5),]. This will create float search space and will try float values on the objective function.

In [6]:
%%time

res2 = gp_minimize(objective, dimensions=[(-5.,5.),], n_calls=20, random_state=123)
CPU times: user 4.28 s, sys: 2.52 s, total: 6.8 s
Wall time: 1.96 s

Below we have printed the best result details.

In [30]:
print("5*x-21 at x={} is {}".format(res2.x[0], res2.fun))
5*x-21 at x=4.180426243834873 is 0.09786878082563533

We can notice that now search space is float.

In [31]:
print("Result 2 Space : {}".format(res2.space))
Result 2 Space : Space([Real(low=-5.0, high=5.0, prior='uniform', transform='normalize')])

Below is a list of float values of x that were tried.

In [32]:
print("List of X values tried : {}".format(res2.x_iters))
List of X values tried : [[2.1295532052322734], [-0.7152907381317419], [1.908848550268618], [2.191503101547731], [-0.08881066567402662], [2.800277619120793], [-0.89075627290981], [0.7969429702261017], [-3.6004923736961545], [-0.9898244333879633], [5.0], [4.649477605062296], [4.127749577433184], [3.9484404250713414], [4.1714948271755965], [4.17937226528638], [4.270885131599767], [-5.0], [4.223883556976615], [4.180426243834873]]

Define Search Space using "skopt.space" Module

We can also declare search space using methods of space module of scikit-optimize which we'll explain in this section. We'll be declaring search space using methods from space module in all our upcoming examples as well.

In order to declare search space using skopt.space module, we need to create an instance of Space. The Space instance accepts a list of search space dimensions. The individual entries of the list correspond to range/options representing one hyperparameter or the ML Model.

We have listed down useful methods from space module for creating search space.


  • space.Space(dimensions) - This method accepts list of ranges/options for different hyperparameters. Each individual entry in the list represents a range or list of options for individual hyperparameters of the ML model. There are various ways to provide dimensions. We have listed them down below.
    • List of tuples for continuous features. (low, high)
    • List of options categorical options. ([option1, option2, ...])
    • List of instances of Dimension object (Real, Integer or Categorical).
  • space.Integer(low,high,prior="uniform",transform=None, name=None) - This method accepts low and high bounds for range from which values will be tried for hyperparameter.
    • The prior accepts string 'uniform' or 'log-uniform' as input specifying distribution from which to fetch values. The 'uniform' is default.
    • The transform parameter accepts string 'identity' or 'normalize' as input specifying transformation to apply. The 'identity' won't apply any transformation. The 'normalize' will scale the space.
    • The name parameter accepts string specifying the name of hyperparameter.
  • space.Real(low,high,prior="uniform",transform=None, name=None) - This method works exactly like Integer described above with only change that it returns float values from specified range.
  • space.Categorical(categories,prior=None,transform=None,name=None) - This method accepts list of options to try for particular hyperparameter. This is useful for categorical features of data.
    • The prior parameter accepts a list of probabilities for a list of options.
    • The transform parameter accepts one of the below-mentioned strings specifying transformation to apply to options.
      • The 'identity' string does not perform any transformation.
      • The 'string' will string encode options.
      • The 'label' will label encode options.
      • The 'onehot' will one-hot encode the options.

Below we have declared search space for our objective function by creating an instance of Space. We have given it a list with a single entry which is an instance of Real which will suggest float values in the range [-1.0,5.0].

We have then called gp_minimize() function with objective function and this search space. We have instructed the function to try 20 different values of hyperparameter x on the objective function.

In [7]:
%%time

from skopt import space

search_space = space.Space([space.Real(low=-1.0, high=5.0, prior="uniform", transform="identity"), ])

res3 = gp_minimize(objective, dimensions=search_space, n_calls=20, random_state=123)
CPU times: user 4.4 s, sys: 2.68 s, total: 7.08 s
Wall time: 2.19 s

Below we have printed the result after running hyperparameter optimization above. We have also printed search space and a list of different float values of hyperparameter x which were tried.

In [34]:
print("5*x-21 at x={} is {}".format(res3.x[0], res3.fun))
5*x-21 at x=4.196785716760545 is 0.016071416197274146
In [35]:
print("Result 3 Space : {}".format(res3.space))
Result 3 Space : Space([Real(low=-1.0, high=5.0, prior='uniform', transform='normalize')])
In [36]:
print("List of X values tried : {}".format(res3.x_iters))
List of X values tried : [[3.2777319231393633], [1.5708255571209548], [3.145309130161171], [3.3149018609286394], [1.9467136005955843], [3.6801665714724763], [1.4655462362541143], [2.4781657821356613], [-0.1602954242176926], [1.4061053399672216], [5.0], [3.7883067744365997], [3.9058976259052356], [4.005283436263058], [4.29980569058579], [4.244263652761318], [4.306175310863598], [4.389063379935344], [4.196785716760545], [4.179767910821305]]

2. Regression using Scikit-Learn

In this section, we'll explain how we can use scikit-optimize with machine learning framework scikit-learn. We'll be trying to solve a regression problem. We'll be using the Boston housing dataset for our problem. Our objective function will return a mean squared error which we want to minimize. We'll use scikit-optimize to find the best hyperparameters settings for Ridge regression model that gives the least mean squared error on the Boston housing dataset.

Load Dataset

Below we have loaded the Boston housing dataset from scikit-learn. We have loaded it in two variables named X and Y. The variable X has data about features of the houses in the Boston area and Y has data about median house price in 1000 dollars. We have divided data into the train (80%) and test (20%) sets as well.

In [8]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, train_size=0.8, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[8]:
((404, 13), (102, 13), (404,), (102,))

Define Objective Function

As this is a regression problem, we'll be using Ridge regression solver from scikit-learn. We'll be optimizing three hyperparameters of the model.

  • alpha
  • fit_intercept
  • solver

Our objective function takes a single argument args which will have a list of hyperparameters values in order alpha, fit_intercept, and solver. We have first separated arguments into separate variables. We have then created an instance of Ridge model using those hyperparameters values. After creating the model, we have fit it on train data and made predictions on test data. At last, we have calculated mean squared error (MSE) on test data and returned it from function. The MSE is the metric that we want to minimize.

In [9]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def objective(args):
    alpha=args[0]
    fit_intercept=args[1]
    solver=args[2]

    reg = Ridge(alpha=alpha, fit_intercept=fit_intercept, solver=solver, random_state=123)

    reg.fit(X_train, Y_train)
    Y_pred = reg.predict(X_test)

    return mean_squared_error(Y_test, Y_pred)

Define Search Space

Our search space consists of three hyperparameters. We have declared search space for alpha using Real method asking it to try float values in the range [0.5,5]. The fit_intercept and solver are declared using Categorical method giving a list of options for each hyperparameter. We have given these three hyperparameters as a list to Space method to create a search space.

In [10]:
from skopt import space

search_space = space.Space([
                 space.Real(0.5, 5, prior="uniform", transform="identity", name="alpha"),
                 space.Categorical([True, False], name="fit_intercept"),
                 space.Categorical(["svd", "cholesky", "lsqr", "sag", "saga", "sparse_cg"], name="solver"),
               ])

Minimize Objective Function

In this section, we have called gp_minimize() function by giving it objective function and hyperparameters search space. We have asked it to try 50 different combinations of three hyperparameters on objective function in a way that MSE is much less as possible.

In [11]:
%%time

from skopt import gp_minimize

res_reg = gp_minimize(objective, search_space, n_calls=50, random_state=123, n_jobs=-1)
CPU times: user 30.5 s, sys: 16.5 s, total: 47.1 s
Wall time: 20.6 s

Below we have printed the result we received after completing the hyperparameters optimization process above. We have also printed the lease MSE that we received using those hyperparameters settings.

In [41]:
best_params = dict(list(zip(["alpha", "fit_intercept", "solver"], res_reg.x)))

print("Best Parameters : {}".format(best_params))
print("Best MSE : {}".format(res_reg.fun))
Best Parameters : {'alpha': 0.5, 'fit_intercept': True, 'solver': 'cholesky'}
Best MSE : 28.636816344247777

Train Model with Best Params

Below we have created Ridge regression model again with the best parameter settings we got after optimization. We have then trained it on train data and evaluated MSE on both train and test sets.

In [42]:
reg = Ridge(**best_params, random_state=123)

reg.fit(X_train, Y_train)

print("Train MSE : {:.2f}".format(mean_squared_error(Y_train, reg.predict(X_train))))
print("Test  MSE : {:.2f}".format(mean_squared_error(Y_test, reg.predict(X_test))))
Train MSE : 20.74
Test  MSE : 28.64

3. Classification using Scikit-Learn

In this section, we'll explain how we can use scikit-optimize for classification problems. We'll be using the wine dataset available from scikit-learn for our problem which has information about ingredients measurements used in the creation of three different types of wines. We'll explain how to use scikit-optimize to optimize hyperparameters of LogisticRegression so that it gives the best results on a given dataset.

Load Dataset

We'll start by loading the wine dataset from scikit-learn. We have loaded features data into variable X and target variable into variable Y. The variable Y has data about the type of the wine for measurements recorded in variable X.

We have then divided the dataset into the train (80%) and test (20%) sets.

In [12]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, train_size=0.8, stratify=Y, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[12]:
((142, 13), (36, 13), (142,), (36,))

Define Objective Function

As this is a classification problem, we'll be using LogisticRegression solver from scikit-learn to predict the type of wine. We'll be optimizing four hyperparameters of the model.

  • penalty
  • C
  • fit_intercept
  • solver

The objective function takes args parameter as input which will have values for the above-mentioned four hyperparameters. We have first saved the value of individual hyperparameters into different variables. We have then created an instance of LogisticRegression model using those hyperparameters values. After creating the model, we have trained it on train data.

At last, we have called score() method on the model by giving it test data which will return the accuracy of the model on test data as LogisticRegression is classification model. We have then multiplied the accuracy with -1 because we want to maximize accuracy and gp_minimize() minimizes value returned from the objective function. If we multiply the value by -1 then it'll become negative and gp_minimize() will try to minimize this negative value and we'll get more accuracy.

In [13]:
from sklearn.linear_model import LogisticRegression

def objective(args):
    penalty=args[0]
    C=args[1]
    fit_intercept = args[2]
    solver=args[3]

    log_reg = LogisticRegression(penalty=penalty,
                                 C=C,
                                 fit_intercept=fit_intercept,
                                 solver=solver,
                                 random_state=123)

    log_reg.fit(X_train, Y_train)

    return -1 * log_reg.score(X_test, Y_test)

Define Search Space

Our search space for this section has hyperparameter C declared using Real method which will try values in the range [0.5,5]. The hyperparameters penalty, fit_intercept, and solver are declared using method Categorical by giving options for each. We have created search space by creating an instance of Space** with a list of hyperparameters declarations.

In [14]:
from skopt import space

search_space = space.Space([
                 space.Categorical(["l2", "none"], name="penalty"),
                 space.Real(0.5, 5, prior="uniform", transform="identity", name="C"),
                 space.Categorical([True, False], name="fit_intercept"),
                 space.Categorical(["newton-cg", "lbfgs", "saga"], name="solver"),
               ])

Minimize Objective Function

At last, we have called gp_minimize() by giving objective function and search space. We have asked it to try 50 different combinations of hyperparameters on the objective function and minimize returned value from the objective function.

In [15]:
%%time

from skopt import gp_minimize

res_classif = gp_minimize(objective, search_space, n_calls=50, random_state=123, n_jobs=-1)
CPU times: user 33.7 s, sys: 16 s, total: 49.6 s
Wall time: 17.9 s

Below we have printed the best hyperparameters setting that gave the least value from the objective function. We have also printed the least value returned from the objective function but have multiplied it by -1 to get an original accuracy.

In [8]:
best_params = dict(list(zip(["penalty", "C", "fit_intercept", "solver"], res_classif.x)))

print("Best Parameters : {}".format(best_params))
print("Best Accuracy : {}".format(-1* res_classif.fun))
Best Parameters : {'penalty': 'l2', 'C': 4.010124928604357, 'fit_intercept': False, 'solver': 'newton-cg'}
Best Accuracy : 1.0

Train Model with Best Params

Below we have created a LogisticRegression model with the best hyperparameters setting that we received after the optimization process. We have then trained it with train data and evaluated on both train and test dataset. We have printed the accuracy of the model on both train and test data.

In [48]:
classifier = LogisticRegression(**best_params, random_state=123)

classifier.fit(X_train, Y_train)

print("Train Accuracy : {:.2f}".format(classifier.score(X_train, Y_train)))
print("Test  Accuracy : {:.2f}".format(classifier.score(X_test, Y_test)))#)
Train Accuracy : 0.99
Test  Accuracy : 1.00

4. Plotting

In this section, we'll explain various plotting functionalities available from scikit-optimize which can give us useful insights about the hyperparameters optimization process. All the plotting functionalities are available through plots module of skopt.

In [16]:
from skopt import plots

Gaussian Process and Optimization Results for 1D Objective Functions

The first chart that we'll explain plots the whole gaussian optimization process. It'll show a list of values tried as the x-axis and the result of the objective function as the y-axis. This plot will only work for objective functions which accept only one hyperparameter to optimize.

We can create this plot using plot_gaussian_process() method of plots module. It accepts result object (scipy.optimize.OptimizeResult) and created plot from it.

In our case, only our first line formula had one hyperparameter to optimize hence we'll be able to plot only results from it. Below we have plotted the chart using res2 object from our first example. The red dot represents the number of different values of x tried and objective function value for those x values. The dotted line represents the direction followed by the Gaussian process in trying the different values of x. We can notice that it tried many different values of x at the bottom where it thought trying different values will minimize objective function furthermore.

In [ ]:
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(12,7))
ax = fig.add_subplot(111)

plots.plot_gaussian_process(res2, ax=ax);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Partial Dependence Plots of Objective Function

The second chart that we'll introduce is a 2D matrix of partial dependence plots of the objective function. It can be used to analyze how each individual hyperparameter is affecting objective function.

We can create this plot using plot_objective() method from plots module. The function takes as input result object (scipy.optimize.OptimizeResult) and creates a plot based on it. The chart on diagonal shows impact of single hyperparameter on objective function whereas all other chart shows the effect of two hyperparameters on the objective function.

Below we have created a plot using the result object from the regression problem section. The chart on diagonal shows the impact of hyperparameters alpha, fit_intercept and solver on the objective function. The charts, other than diagonal show the impact of two hyperparameters combinations.

In [ ]:
plots.plot_objective(res_reg);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Below we have created another plot partial dependence charts of the objective function with only two hyperparameters (C and solver) using the result object from the classification problem section.

In [ ]:
plots.plot_objective(res_classif, plot_dims=["C", "solver"]);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Partial Dependence Plot of 2 Hyperparameters Combination

The plots module provide a separate method name plot_objective_2D() if we want to create a partial dependence plot of objective function based on two hyperparameters. We need to provide a method with the result object (scipy.optimize.OptimizeResult) and the name of two hyperparameters based on which we want to create a partial dependence plot.

Below we have created a partial dependence plot showing the effect of hyperparameters fit_intercept and solver on an objective function from the regression problem section.

In [ ]:
plots.plot_objective_2D(res_reg, dimension_identifier1="fit_intercept", dimension_identifier2="solver");

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Below we have created another partial dependence plot showing the effect of hyperparameters C and solver on objective function from the classification problem section.

In [ ]:
plots.plot_objective_2D(res_classif, dimension_identifier1="C", dimension_identifier2="solver");

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Hyperparameters Search Space Sampling Plot

In this section, we'll introduce a plot that shows how values of hyperparameters were sampled from a list of values or ranges. The plot is a 2D matrix where charts on diagonal are histograms showing the distribution of values sampled for a particular hyperparameter whereas other charts are scatter plots showing values sampled for combinations of two hyperparameters.

We can create this plot using plot_evaluations() method of plots module by giving result object (scipy.optimize.OptimizeResult) to it. If we want to create plot of only few selected hyperparameters then we can give list of hyperparameter names to plot_dims parameter of the plot_evaluations() method.

Below we have created hyperparameters search space sampling plot for the regression problem section.

In [ ]:
plots.plot_evaluations(res_reg);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Below we have created another hyperparameter search space sampling plot using the result object from the classification problem section.

In [ ]:
plots.plot_evaluations(res_classif);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Hyperparameter Search Space Sampling Histogram

We can create a histogram of values sampled for a single hyperparameter only as well. This is the histogram that gets included on diagonal in the plot created using plot_evaluations() (previous chart). We can create a histogram of a single hyperparameter's sampled values using plot_histogram() method. We need to provide the result object (scipy.optimize.OptimizeResult) and the hyperparameter name to it.

Below we have created a histogram showing how values of hyperparameters solver were sampled during the optimization process performed in the regression problem section. We can notice that solver named sag seems to have been sampled more than others as it might be giving good results.

In [ ]:
plots.plot_histogram(res_reg, dimension_identifier="solver");

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Below we have created another histogram showing distribution of values sampled for hyperparameter solver during the optimization process of the classification problem section. We can notice that it has sampled solver named newton-cg more compared to others as it might be giving good results.

In [ ]:
plots.plot_histogram(res_classif, dimension_identifier="solver");

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Convergence Plot

The convergence plot shows how we converged to the minimum value of an objective function over the number of different trials of hyperparameter combinations. The x-axis shows the number of calls to the objective function and the y-axis shows the minimum value of the objective function after that many calls.

We can create convergence plot using plot_convergence() method of plots module by giving result object (scipy.optimize.OptimizeResult) to it.

Below we have created a convergence plot using the result object from the regression problem section. We can notice that it seems to have converged after 15-17 trials and after that metric value returned from the objective function is not decreasing any more.

We can come to the conclusion that trials performed after the first 15-17 trials were not able to reduce the value of the optimization function. We can avoid extra calls to the objective function if the metric value is not decreasing after a particular number of calls which can save time and resources.

In [ ]:
plots.plot_convergence(res_reg);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Below we have created a convergence plot using the result object from the classification problem section. We can notice that it seems to have achieved 100% accuracy after first around 5 trials only.

In [ ]:
plots.plot_convergence(res_classif);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Cumulative Regret Plot

Regret plot shows regret of not sampling best hyperparameters settings. Regret refers to mistakes in sampling particular hyperparameters combinations which has given bad results. It can help make better decisions on sampling for upcoming trials. The regret in regret plot is different between total loss accumulated till now minus the minimum loss of all trials till now. If the line represented by the cumulative plot flattens over time then we can be sure that we are making less or no mistake in sampling hyperparameters.

We can create regret plot using plot_regret() method by giving result object (scipy.optimize.OptimizeResult) to it.

Below we have created a cumulative regret plot using the result object from the regression problem section.

In [ ]:
plots.plot_regret(res_reg);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

Below we have created another cumulative regret plot using the result object from the classification problem section.

In [ ]:
plots.plot_regret(res_classif);

Scikit-Optimize: Simple Guide to Hyperparameters Optimization/Tunning

This ends our small tutorial explaining the usage of scikit-optimize library. Please feel free to let us know your views in the comments section.

References



Sunny Solanki  Sunny Solanki