Updated On: Sep 09, 2022 | Time Investment: ~45 mins

Hyperopt - Simple Guide to Hyperparameters Tuning / Optimization

The complexity of machine learning models is increasing day by day due to the rise of deep learning and deep neural networks. These models expose a growing number of configuration settings that must be chosen before training, which are generally referred to as hyperparameters.

> What is Hyperparameters Tuning or Fine Tuning of ML Model?

Hyperparameters tuning, also sometimes referred to as fine-tuning, is the process of finding the combination of hyperparameters for an ML / DL model that gives the best results (global optimum) in the minimum amount of time.

An ML model trained with the hyperparameters combination found through this process generally gives better results than any other combination: it yields the best values for the evaluation metrics and the lowest value of the loss function.

> Why Fine Tune ML Model?

An ML model can accept a wide range of hyperparameters combinations, and we don't know upfront which combination will give us the best results. Hence, we need to try a few to find the best performing one.

> Is Grid Search not Enough?

The common approach used until now has been to grid search through all possible combinations of hyperparameter values.

Though this approach works well with small models and datasets, it becomes increasingly time-consuming for real-world problems with billions of examples and ML models with lots of hyperparameters.

> What Solution Python Offers for Hyperparameters Tuning?

Python has a number of libraries (Optuna, Hyperopt, Scikit-Optimize, bayes_opt, etc.) for hyperparameters tuning. Hyperopt is one such library that lets us try different hyperparameters combinations to find good results in less time.

Hyperopt chooses hyperparameters combinations using internal search algorithms (Random Search | Tree of Parzen Estimators (TPE) | Adaptive TPE) that concentrate the search on regions of the hyperparameters space where good results have been found so far.

Hyperopt also lets us run trials in parallel using MongoDB and Spark. This lets us scale the search for the best hyperparameters across multiple cores and machines.

> What Can You Learn From This Article?

As a part of this tutorial, we have explained how to use the Python library hyperopt for hyperparameters tuning, which can improve the performance of ML models. The tutorial provides a simple guide to using "hyperopt" with scikit-learn ML models to keep things simple and easy to understand.

The tutorial starts by optimizing the parameter of a simple line formula to get readers familiar with the "hyperopt" library. Then, it explains how to use "hyperopt" with scikit-learn regression and classification models. We have also listed the steps for using "hyperopt" at the beginning.

> How to Install 'Hyperopt'?

  • PIP
    • pip install -U hyperopt
  • Conda
    • conda install -c conda-forge hyperopt

Below we have listed important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Steps to Use "Hyperopt"
  2. Minimize Simple Line Formula
    • Simple Example with Default Arguments
      • Define Objective Function
      • Define Search Space
      • Minimize Objective Function
      • Print Best Results
    • Trials Object for Tracking Stats
      • Pass Trials Object for Recording Tuning Statistics
      • How to Retrieve Statistics Of Individual Trial?
      • How to Retrieve Statistics Of Best Trial?
      • Other Useful Methods and Attributes of Trials Object
  3. Hyperparameters Tuning for Regression Tasks | Scikit-Learn
    • Load Boston Housing Dataset
    • Define Hyperparameters Search Space
    • Define Objective Function
    • Optimize Objective Function (Minimize for Least MSE)
    • Print Best Results
    • Train and Evaluate Model with Best Hyperparameters
  4. Hyperparameters Tuning for Classification Tasks | Scikit-Learn
    • Load Wine Dataset
    • Define Hyperparameters Search Space
    • Define Objective Function
    • Optimize Objective Function (Maximize for Highest Accuracy)
    • Print Best Results
    • Train and Evaluate Model with Best Hyperparameters

NOTE: You can skip the first section, where we have explained the usage of "hyperopt" with a simple line formula, if you are in a hurry. That section has many definitions; you can refer to it later as well. All sections are almost independent, and you can go through any of them directly.

We'll start our tutorial by importing the necessary Python libraries.

import hyperopt

import warnings

warnings.filterwarnings("ignore")

print("Hyperopt Version : {}".format(hyperopt.__version__))
Hyperopt Version : 0.2.5

1. Steps to Use "Hyperopt"

  1. Create an Objective Function.
    • This step requires us to create a function that creates an ML model, fits it on train data, and evaluates it on validation or test set returning some loss value or metric (MSE, MAE, Accuracy, etc.) that captures the performance of the model. We want to minimize / maximize the loss / metric value returned through this function.
  2. Create search space of hyperparameters.
    • This is the step where we declare a list of hyperparameters and a range of values for each that we want to try.
  3. Minimize / Maximize objective function by trying different hyperparameters combinations from search space.
    • This is the step where we give different settings of hyperparameters to the objective function and record the metric value returned for each setting.
    • Hyperopt internally uses one of its search algorithms (Random Search, TPE, or Adaptive TPE) to explore the hyperparameters space and find the best settings.

The first two steps can be performed in any order.
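As a quick preview, below is a minimal sketch of these three steps put together. The formula and ranges here are arbitrary placeholders; each step is explained in detail in the sections that follow.

from hyperopt import fmin, hp, tpe

## Step 1: Objective function returning a value that we want to minimize.
def objective(x):
    return (x - 3) ** 2

## Step 2: Search space for the single hyperparameter 'x'.
space = hp.uniform("x", -10, 10)

## Step 3: Minimize the objective over the search space using the TPE algorithm.
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)

print(best)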

Now, we'll explain how to perform these steps using the API of Hyperopt.

2. Minimize Simple Line Formula

As a part of this section, we'll explain how to use hyperopt to minimize a simple line formula. We'll try to find the value of x at which the line equation 5x - 21 evaluates to zero.

We can easily calculate that by setting the equation to zero. But here we want hyperopt to try different values of x and find the one at which the line equation evaluates (as closely as possible) to zero.

This simple example will help us understand how we can use hyperopt. We'll then explain its usage with scikit-learn models in the examples that follow.

NOTE: Please feel free to skip this section if you are in a hurry and want to learn how to use "hyperopt" with ML models. This section explains the usage of "hyperopt" with a simple line formula and is fairly theoretical. You can refer back to it for the theory whenever you have doubts while going through the other sections. If you have enough time, going through this section will prepare you well with the concepts.

2.1 Simple Example with Default Arguments

2.1.1 Define Objective Function

The first step will be to define an objective function which returns a loss or metric that we want to minimize. Hyperopt will call this function with different hyperparameter values and record the value returned after each evaluation. This value helps it decide which hyperparameter values to try next.

Below we have defined an objective function with a single parameter x. It returns the value we get after evaluating the line formula 5x - 21.

We have wrapped the line formula in Python's abs() function so that it returns a value >= 0. This way we can be sure that the minimum metric value returned will be 0. If we didn't wrap the line formula in abs(), negative values of x would keep decreasing the metric value toward negative infinity.

def objective(x):
    return abs(5*x - 21)

2.1.2 Define Search Space

The second step will be to define the search space for hyperparameters. The search space specifies the names of the hyperparameters and the ranges of values that we want to give to the objective function for evaluation.

In this simple example, we have only one hyperparameter named x whose different values will be given to the objective function in order to minimize the line formula.

Hyperopt requires us to declare the search space using functions it provides. It has a module named 'hp' that provides a bunch of methods for declaring search spaces over continuous (integer & float) and categorical variables.

Below we have listed a few methods and their definitions that we'll be using as a part of this tutorial.


> Important Methods of "hyperopt.hp" Module to Declare Hyperparameters in Search Space

  • hp.uniform(label, low, high) - This method accepts string label as the first parameter specifying the name of the hyperparameter. The next two parameters are low and high values specifying the range from which we want to select different values. It retrieves values from a uniform distribution. It's used for continuous variables.
  • hp.normal(label, mu, sigma) - This method also accepts a string label specifying the hyperparameter name as the first argument. The mu and sigma values specify the mean and standard deviation of the normal distribution from which values are sampled (so values are not restricted to a fixed range). It's used for continuous variables.
  • hp.choice(label, options) - This method like the other two accepts string label naming hyperparameters. The second parameter is a list of values. The algorithm will select different values from this list. It's commonly used for categorical variables.

There are other methods available from the hp module like lognormal(), loguniform(), pchoice(), etc., which can be used for trying log- and probability-based values. They're not covered in this tutorial to keep it simple, but the short sketch below illustrates how a few of these methods fit together.
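For illustration, below is a short sketch of a search space that combines a few of these declaration methods. The hyperparameter names and ranges here are arbitrary examples and are not tied to any particular model.

from hyperopt import hp

example_space = {
    "learning_rate": hp.loguniform("learning_rate", -5, 0),    ## exp(-5) to exp(0), sampled log-uniformly
    "n_estimators": hp.choice("n_estimators", [50, 100, 200]), ## categorical: one of the listed values
    "subsample": hp.uniform("subsample", 0.5, 1.0),            ## uniform over [0.5, 1.0]
    "noise_scale": hp.normal("noise_scale", 0.0, 1.0),         ## normal with mean 0 and std 1 (unbounded)
}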


As we have only one hyperparameter for our line formula function, we have declared a search space that tries different values of it. We have declared the search space using the hp.uniform() function with range [-10,10]. We'll explain in our upcoming examples how we can create a search space with multiple hyperparameters.

from hyperopt import hp

search_space = hp.uniform("x", -10, 10)

search_space
<hyperopt.pyll.base.Apply at 0x7fe0adc750b8>

2.1.3 Minimize Objective Function (Loss or Metric)

Our last step will be to use an algorithm that tries different values of the hyperparameter from the search space and evaluates the objective function using those values. It tries to minimize the return value of the objective function.

Hyperopt provides a function named 'fmin()' for this purpose. We need to provide it the objective function, the search space, and the algorithm which tries different combinations of hyperparameters.

It'll then use this algorithm to minimize the value returned by the objective function over the search space in as little time as possible.

It'll look at places where the objective function gives low values most of the time and explore hyperparameter values around those places.


> Hyperopt 'fmin()' Function Signature

  • fmin(fn, space, algo, max_evals=9223372036854775807, timeout=None, loss_threshold=None, trials=None, rstate=None, verbose=True, return_argmin=True, show_progressbar=True, early_stop_fn=None) - This function takes the objective function, the hyperparameters search space, and the search algorithm as input. It then tries different combinations of hyperparameters until it finds the minimum value of the objective function. It'll keep running and trying different values unless we tell it to stop using one of the max_evals, timeout, or loss_threshold parameters.
    • The fn parameter accepts a callable which is our objective function.
    • The space parameter accepts search space declared using methods from hp module.
    • The algo parameter accepts one of the below-mentioned three options specifying search algorithm.
      • hyperopt.rand.suggest - It'll try random values of hyperparameters.
      • hyperopt.tpe.suggest - It'll try values of hyperparameters using Tree Parzen Estimators - TPE algorithm.
      • hyperopt.atpe.suggest - It'll try values of hyperparameters using Adaptive TPE algorithm.
    • The max_evals parameter accepts an integer value specifying how many trials of the objective function should be executed, i.e., how many hyperparameters combinations to try.
    • The timeout parameter accepts an integer value specifying that the search should time out after that many seconds have passed.
    • The loss_threshold parameter accepts a float value. Our objective function returns a metric value which is generally a loss for ML algorithms (like MSE for regression problems). If some combination of hyperparameters causes the objective function to return a value less than this parameter value, the search stops. (These stopping options are illustrated in the short sketch after this list.)
    • The trials parameter accepts an instance of Trials class. This class is generally used to store statistics of different trials (A single trial refers to a single combination of hyperparameters tried on an objective function).
    • The rstate parameter accepts a numpy.random.RandomState instance. This is used for reproducibility.
    • The return_argmin parameter is True by default and causes fmin() to return a dictionary with the hyperparameters combination that gave the best results, i.e. the least value for the objective function.
    • The early_stop_fn parameter accepts a callable which is executed after each trial and lets us stop the search early: based on the results so far, it indicates whether fmin() should stop or continue with more trials.
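To illustrate the stopping and reproducibility parameters described above, below is a small sketch that reuses the objective function and search space defined earlier. The exact numbers are arbitrary; adjust them to your needs.

import numpy as np

## Stop after 500 trials, after 10 seconds, or once the loss drops below 0.01,
## whichever happens first. rstate makes the run reproducible.
## (Newer hyperopt versions may expect numpy.random.default_rng(123) instead.)
best = hyperopt.fmin(objective,
                     space=search_space,
                     algo=hyperopt.tpe.suggest,
                     max_evals=500,
                     timeout=10,
                     loss_threshold=0.01,
                     rstate=np.random.RandomState(123))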

Below we have executed fmin() with our objective function, the search space declared earlier, and the TPE algorithm for searching the hyperparameters space. We have instructed it to try 100 different values of hyperparameter x using the max_evals parameter.

The function returns a dictionary with the best results, i.e. the hyperparameters that gave the least value for the objective function.

The TPE algorithm tries different values of hyperparameter x in the range [-10,10], evaluating the line formula each time. It looks for places in the range where the objective value is decreasing and tries more values near those places to find the best results.

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100)
100%|██████████| 100/100 [00:00<00:00, 525.37trial/s, best loss: 0.1308582656748598]

2.1.4 Print Best Results

Below we have printed the best hyperparameter value that returned the minimum value from the objective function.

We have then evaluated the value of the line formula as well using that hyperparameter value.

We can notice from the result that it seems to have done a good job in finding the value of x which minimizes the line formula 5x - 21, though it's not exact (the true minimum is at x = 21/5 = 4.2). If we try more than 100 trials, it might further improve the result.

best_results
{'x': 4.173828346865028}
obj_func_res = abs(5*best_results["x"] - 21)

print("Value of Function 5x-21 at best value is : {}".format(obj_func_res))
Value of Function 5x-21 at best value is : 0.1308582656748598

2.2 Trials Object for Tracking Stats

When we executed the 'fmin()' function earlier, it tried different values of parameter x on the objective function. After trying 100 different values of x, it returned the value of x for which the objective function returned the least value.

Though the function tried 100 different values, we don't have information about which values were tried, the objective values during the trials, etc. In short, we don't have any stats about the different trials.

NOTE: Each individual hyperparameters combination given to objective function is counted as one trial.

2.2.1 Pass Trials Object for Recording Tuning Statistics

Hyperopt lets us record stats of our optimization process using a Trials instance. It'll record the different values of hyperparameters tried, the objective function value for each trial, the time of each trial, the state of the trial (success/failure), etc. We just need to create an instance of Trials and pass it to the trials parameter of the fmin() function, and it'll record stats of our optimization process.

Below we have declared Trials instance and called fmin() function again with this object. We have again tried 100 trials on the objective function.

trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100,
                             trials=trials_obj
                            )
100%|██████████| 100/100 [00:00<00:00, 509.74trial/s, best loss: 0.03237813101906539]

The Trials instance has a list of attributes and methods which can be explored to get an idea about individual trials.

The Trials instance has an attribute named trials which has a list of dictionaries where each dictionary has stats about one trial of the objective function. Below we have printed the content of the first trial. We can notice from the contents that it has information like id, loss, status, x value, datetime, etc.

trials_obj.trials[0]
{'state': 2,
 'tid': 0,
 'spec': None,
 'result': {'loss': 56.22126394071591, 'status': 'ok'},
 'misc': {'tid': 0,
  'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'workdir': None,
  'idxs': {'x': [0]},
  'vals': {'x': [-7.044252788143181]}},
 'exp_key': None,
 'owner': None,
 'version': 0,
 'book_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 57000),
 'refresh_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 57000)}

2.2.2 How to Retrieve Statistics Of Individual Trial?

Below we have retrieved the objective function value from the first trial available through the trials attribute of the Trials instance and printed the loss present in it. We have then retrieved the x value of this trial and evaluated our line formula to verify the loss value with it. We can notice that both are the same.

first_trial = trials_obj.trials[0]

print("Loss Value of First Trial : {}".format(first_trial['result']['loss']))

loss = abs(5*first_trial['misc']['vals']['x'][0] - 21)

print("Loss Value of First Trial : {}".format(loss))
Loss Value of First Trial : 56.22126394071591
Loss Value of First Trial : 56.22126394071591

2.2.3 How to Retrieve Statistics Of Best Trial?

The Trials object has an attribute named best_trial which returns a dictionary for the trial that gave the best results, i.e. the least value from the objective function (least loss). We have printed details of the best trial. We have then printed the loss from the best trial and verified it by plugging the x value of the best trial into our line formula.

best_trial = trials_obj.best_trial

best_trial
{'state': 2,
 'tid': 77,
 'spec': None,
 'result': {'loss': 0.03237813101906539, 'status': 'ok'},
 'misc': {'tid': 77,
  'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'workdir': None,
  'idxs': {'x': [77]},
  'vals': {'x': [4.193524373796187]}},
 'exp_key': None,
 'owner': None,
 'version': 0,
 'book_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 209000),
 'refresh_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 209000)}
print("Loss Value of Best Trial : {}".format(best_trial['result']['loss']))

loss = abs(5*best_trial['misc']['vals']['x'][0] - 21)

print("Loss Value of Best Trial : {}".format(loss))
Loss Value of Best Trial : 0.03237813101906539
Loss Value of Best Trial : 0.03237813101906539

2.2.4 Useful Methods and Attributes of Trials Object

In this section, we'll explain the usage of some useful attributes and methods of the Trials object.


> Useful Attributes of Trials Object

  • results - This attribute returns a list of dictionaries. Each dictionary has details about the results of an individual trial, including the loss value and the status of the trial.
  • vals - This attribute returns a dictionary mapping each hyperparameter name to the list of values that were tried for it.
  • miscs - This attribute returns per-trial details like trial ids, working directory, hyperparameter values, trial indexes, etc.

> Useful Methods of Trials Object

  • average_best_error() - This method returns an estimate of the best error of the experiment.
  • statuses() - This method returns a list of the status values of the trials.
  • losses() - This method returns a list of the losses, one per trial.

Below we have printed the values of some useful attributes and methods of the Trials instance for explanation purposes.

results = trials_obj.results

print("Total Results : {}".format(len(results)))
print("Best Result : {}".format(trials_obj.average_best_error()))
print("First Few Results : ")
results[:5]
Total Results : 100
Best Result : 0.4362271636430819
First Few Results :
[{'loss': 36.03465338318733, 'status': 'ok'},
 {'loss': 16.725453187066798, 'status': 'ok'},
 {'loss': 22.59591554564137, 'status': 'ok'},
 {'loss': 31.781846627893387, 'status': 'ok'},
 {'loss': 47.18387600505959, 'status': 'ok'}]
print("First Few Status : {}".format(trials_obj.statuses()[:5]))
print("\nFirst Few X Values : {}".format(trials_obj.vals['x'][:5]))
print("\nFirst Few Losses : {}".format(trials_obj.losses()[:5]))
print("\nFirst Few Miscs : {}".format(trials_obj.miscs[:5]))
First Few Status : ['ok', 'ok', 'ok', 'ok', 'ok']

First Few X Values : [-3.0069306766374666, 7.5450906374133595, 8.719183109128274, -2.1563693255786776, -5.236775201011918]

First Few Losses : [36.03465338318733, 16.725453187066798, 22.59591554564137, 31.781846627893387, 47.18387600505959]

First Few Miscs : [{'tid': 0, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [0]}, 'vals': {'x': [-3.0069306766374666]}}, {'tid': 1, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [1]}, 'vals': {'x': [7.5450906374133595]}}, {'tid': 2, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [2]}, 'vals': {'x': [8.719183109128274]}}, {'tid': 3, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [3]}, 'vals': {'x': [-2.1563693255786776]}}, {'tid': 4, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [4]}, 'vals': {'x': [-5.236775201011918]}}]

3. Hyperparameters Tuning for Regression Tasks | Scikit-Learn

In this section, we'll explain how we can use hyperopt with the machine learning library scikit-learn. The same workflow should also help the reader see how hyperopt can be used with any other ML framework.

The transition from scikit-learn to any other ML framework is pretty straightforward: the same steps apply.

We'll be using hyperopt to find optimal hyperparameters for a regression problem.


3.1 Load Boston Housing Dataset

We'll be using the Boston housing dataset available from scikit-learn. It has information about houses in Boston like the number of bedrooms, the crime rate in the area, the tax rate, etc. The target variable of the dataset is the median value of homes in $1000s. As the target variable is continuous, this is a regression problem.

Below we have loaded our Boston housing dataset as variables X and Y. The variable X has the data for each feature and the variable Y has the target values. We have then divided the dataset into train (80%) and test (20%) sets.

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
((404, 13), (102, 13), (404,), (102,))

3.2 Define Hyperparameters Search Space

Below we have declared the hyperparameters search space for our example. We'll be using the Ridge regression estimator available from scikit-learn to solve the problem. We'll be trying to find the best values for three of its hyperparameters.


> Parameters to Tune

  • alpha
  • fit_intercept
  • solver

We have declared the search space as a dictionary. The alpha hyperparameter accepts continuous values, whereas the fit_intercept and solver hyperparameters take values from fixed lists.

The keys of the dictionary are hyperparameter names and the values are calls to functions from the hp module which we discussed earlier. These functions declare which values of each hyperparameter will be sent to the objective function for evaluation.

intercepts = [True, False]
solvers = ["svd", "cholesky", "lsqr", "sag", "saga"]

search_space = {
    "alpha": hp.normal("alpha", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "solver": hp.choice("solver", solvers)
}

3.3 Define Objective Function

Our objective function starts by creating a Ridge model with the arguments given to it. We then fit the Ridge model on the train data and predict labels for the test data. We then print the hyperparameters combination that was passed to the objective function along with the mean squared error on the test dataset.

Our objective function returns the MSE on test data, which we want hyperopt to minimize for the best results.

We have used the mean_squared_error() function available from the 'metrics' sub-module of scikit-learn to evaluate MSE. Scikit-learn provides many such evaluation metrics for common ML tasks.

from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def objective(args):
    ridge_reg = Ridge(**args, random_state=123)

    ridge_reg.fit(X_train, Y_train)

    Y_pred = ridge_reg.predict(X_test)

    print("Hyperparameters : {}".format(args)) ## This can be commented if not needed.
    print("MSE : {}\n".format(mean_squared_error(Y_test, Y_pred)))

    return mean_squared_error(Y_test, Y_pred)

3.4 Optimize Objective Function (Minimize for Least MSE)

In this section, we have called fmin() function with the objective function, hyperparameters search space, and TPE algorithm for search. We have instructed the method to try 10 different trials of the objective function. We have also created Trials instance for tracking stats of trials.

We can notice from the output that it prints all hyperparameters combinations tried and their MSE as well.

trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=10)
Hyperparameters : {'alpha': 11.456909783038185, 'fit_intercept': False, 'solver': 'cholesky'}
MSE : 32.6780206950975

Hyperparameters : {'alpha': -6.022064763493154, 'fit_intercept': False, 'solver': 'saga'}
MSE : 40.0745463100764

Hyperparameters : {'alpha': 7.436267706835002, 'fit_intercept': False, 'solver': 'svd'}
MSE : 32.689769046138714

Hyperparameters : {'alpha': 1.5656653073238473, 'fit_intercept': True, 'solver': 'lsqr'}
MSE : 29.46296743296732

Hyperparameters : {'alpha': -3.592042958686301, 'fit_intercept': True, 'solver': 'svd'}
MSE : 32.02190509726122

Hyperparameters : {'alpha': -4.16209741657863, 'fit_intercept': True, 'solver': 'lsqr'}
MSE : 28.192485758346496

Hyperparameters : {'alpha': 7.529616236524469, 'fit_intercept': True, 'solver': 'saga'}
MSE : 29.357834183797227

Hyperparameters : {'alpha': 2.730093210771717, 'fit_intercept': True, 'solver': 'saga'}
MSE : 29.3466844072238

Hyperparameters : {'alpha': 8.423615538119535, 'fit_intercept': False, 'solver': 'svd'}
MSE : 32.68866941310808

Hyperparameters : {'alpha': 2.1400505222685284, 'fit_intercept': False, 'solver': 'cholesky'}
MSE : 32.65079545632969

100%|██████████| 10/10 [00:00<00:00, 18.36trial/s, best loss: 28.192485758346496]

3.5 Print Best Results

Below we have printed the best results of the above experiment. Please make a note that for hyperparameters with a fixed set of values (declared with hp.choice()), fmin() returns the index of the chosen value in the list, not the value itself.

It returned index 0 for the fit_intercept hyperparameter, which points to the value True if you check the search space section above. In the same way, the index returned for the solver hyperparameter is 2, which points to lsqr. (Hyperopt's space_eval() helper, shown at the end of this section, can map these indexes back to the actual values for us.)

best_results
{'alpha': -4.16209741657863, 'fit_intercept': 0, 'solver': 2}
best_trial = trials_obj.best_trial

best_trial
{'state': 2,
 'tid': 5,
 'spec': None,
 'result': {'loss': 28.192485758346496, 'status': 'ok'},
 'misc': {'tid': 5,
  'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'workdir': None,
  'idxs': {'alpha': [5], 'fit_intercept': [5], 'solver': [5]},
  'vals': {'alpha': [-4.16209741657863], 'fit_intercept': [0], 'solver': [2]}},
 'exp_key': None,
 'owner': None,
 'version': 0,
 'book_time': datetime.datetime(2021, 10, 7, 2, 15, 3, 417000),
 'refresh_time': datetime.datetime(2021, 10, 7, 2, 15, 3, 426000)}
print("Best MSE : {}".format(trials_obj.average_best_error()))
Best MSE : 28.192485758346496
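As a side note, hyperopt provides a space_eval() helper that maps such index-based results back to the actual hyperparameter values, which avoids the manual lookup performed in the next section. A small sketch using the search space and best_results from above:

from hyperopt import space_eval

best_settings = space_eval(search_space, best_results)

print("Best Hyperparameters Settings : {}".format(best_settings))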

3.6 Train and Evaluate Model with Best Hyperparameters

In this section, we have created the Ridge model again with the best hyperparameters combination that we got using hyperopt. We have then trained the model on the train data and evaluated its MSE on both the train and test data.

Please make a NOTE that we can save the trained models during the hyperparameters optimization process if training takes a lot of time and we don't want to repeat it. We can include logic inside the objective function that saves every model that was tried so that we can later reuse the one that gave the best results by simply loading it, as sketched below.
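Below is a small sketch of one way to do that, using joblib (installed alongside scikit-learn) to persist each fitted model to disk inside the objective function. The file-naming scheme here is purely illustrative.

import joblib
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def objective_with_saving(args):
    ridge_reg = Ridge(**args, random_state=123)
    ridge_reg.fit(X_train, Y_train)

    mse = mean_squared_error(Y_test, ridge_reg.predict(X_test))

    ## Persist each fitted model so the best one can be reloaded later
    ## (e.g., with joblib.load()) instead of being retrained.
    joblib.dump(ridge_reg, "ridge_mse_{:.4f}.joblib".format(mse))

    return mse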

from sklearn.metrics import mean_squared_error

alpha = best_results["alpha"]
fit_intercept = intercepts[best_results["fit_intercept"]]
solver = solvers[best_results["solver"]]

ridge = Ridge(alpha=alpha,
              fit_intercept=fit_intercept,
              solver=solver,
              random_state=123)

ridge.fit(X_train, Y_train)

Y_pred = ridge.predict(X_test)

print("Test  MSE : {}".format(mean_squared_error(Y_test, Y_pred)))
print("Train MSE : {}".format(mean_squared_error(Y_train, ridge.predict(X_train))))
Test  MSE : 28.192485758346496
Train MSE : 20.677107947815138

4. Hyperparameters Tuning for Classification Tasks | Scikit-Learn

In this section, we'll again explain how to use hyperopt with scikit-learn, but this time for a classification problem.

We'll also explain how to create a more involved, conditional search space through this example.

We'll be using the wine dataset available from scikit-learn for this example. The wine dataset has measurements of ingredients used in the creation of three different types of wine. The ingredient measurements are the features of our dataset and the wine type is the target variable.


4.1 Load Wine Dataset

Below we have loaded the wine dataset from scikit-learn and divided it into the train (80%) and test (20%) sets.

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_wine(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
((142, 13), (36, 13), (142,), (36,))

4.2 Define Hyperparameters Search Space

We'll be using the LogisticRegression estimator for our problem, hence we'll declare a search space that tries different values of its hyperparameters. We'll try to find the values of the below-mentioned four hyperparameters of LogisticRegression that give the best accuracy on our dataset.


> Parameters to Tune

  • fit_intercept
  • C
  • penalty
  • solver

The search space for this example is a little bit involved because some solvers of LogisticRegression do not support all of the available penalties. The saga solver supports the l1, l2, and elasticnet penalties. The liblinear solver supports the l1 and l2 penalties. The newton-cg and lbfgs solvers support only the l2 penalty (or no penalty at all).

As we want to try all available solvers and avoid failures due to penalty mismatches, we have created three different cases based on compatible combinations. The hyperparameters fit_intercept and C are the same for all three cases, hence our final search space consists of three key-value pairs (C, fit_intercept, and cases). Each case is itself a small nested search space over compatible solver and penalty values. We have declared C using the hp.uniform() method because it's a continuous hyperparameter, and we want to try values in the range [1,5] for it. All other hyperparameters are declared using the hp.choice() method as they are all categorical.

intercepts = [True, False]
solvers1 = ["newton-cg", "lbfgs", ]
solvers2 = ["liblinear", ]
solvers3 = ["saga", ]
penalties1 = ["l2", "none"]
penalties2 = ["l1", "l2"]
penalties3 = ["l1", "elasticnet", "l2"]

case1 = {
        "penalty1": hp.choice("penalty1", penalties1),
        "solver1": hp.choice("solver1", solvers1)
}
case2 = {
        "penalty2": hp.choice("penalty2", penalties2),
        "solver2": hp.choice("solver2", solvers2)
}
case3 = {
        "penalty3": hp.choice("penalty3", penalties3),
        "solver3": hp.choice("solver3", solvers3)
}

search_space = {
    "C": hp.uniform("C", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "cases" : hp.choice("cases", [("case1", case1), ("case2", case2), ("case3", case3)])
}

4.3 Define Objective Function

The objective function starts by retrieving the values of the different hyperparameters. It uses conditional logic to retrieve the values of the penalty and solver hyperparameters, based on which case was selected.

We then create a LogisticRegression model using the received hyperparameter values and train it on the training dataset. We then print the hyperparameters combination that was tried and the accuracy of the model on the test dataset.

At last, our objective function returns the accuracy multiplied by -1. The reason for multiplying by -1 is that the value returned by the objective function is minimized during the optimization process; by negating the accuracy, maximizing accuracy becomes equivalent to minimizing the returned value.

We could also use cross-entropy loss (commonly used for classification tasks) as the value returned by the objective function. In that case, we don't need to multiply by -1, as cross-entropy loss needs to be minimized and a lower value is better. A sketch of such an alternative objective appears after the main one below.

from sklearn.linear_model import LogisticRegression

def objective(args):
    C = args["C"]
    fit_intercept = args["fit_intercept"]
    kwds = args["cases"]
    penalty = kwds[1]["penalty1"] if kwds[0] == "case1" else kwds[1]["penalty2"] if kwds[0] == "case2" else kwds[1]["penalty3"]
    solver = kwds[1]["solver1"] if kwds[0] == "case1" else kwds[1]["solver2"] if kwds[0] == "case2" else kwds[1]["solver3"]

    log_reg = LogisticRegression(C=C,
                                 fit_intercept=fit_intercept,
                                 penalty=penalty,
                                 solver=solver,
                                 l1_ratio=0.5,
                                 random_state=123)

    log_reg.fit(X_train, Y_train)

    print("Hyperparameters : {}".format(args)) ## This can be commented if not needed.
    print("Accuracy : {}\n".format(log_reg.score(X_test, Y_test))) ## This can be commented if not needed.

    return -1 * log_reg.score(X_test, Y_test) ## Multiplied by -1 to maximize accuracy
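As mentioned above, we could return cross-entropy loss instead of negated accuracy. Below is a sketch of such an alternative objective, reusing the same conditional hyperparameter handling and scikit-learn's log_loss() metric. It is only meant to illustrate the idea.

from sklearn.metrics import log_loss

def objective_logloss(args):
    C = args["C"]
    fit_intercept = args["fit_intercept"]
    kwds = args["cases"]
    penalty = kwds[1]["penalty1"] if kwds[0] == "case1" else kwds[1]["penalty2"] if kwds[0] == "case2" else kwds[1]["penalty3"]
    solver = kwds[1]["solver1"] if kwds[0] == "case1" else kwds[1]["solver2"] if kwds[0] == "case2" else kwds[1]["solver3"]

    log_reg = LogisticRegression(C=C, fit_intercept=fit_intercept, penalty=penalty,
                                 solver=solver, l1_ratio=0.5, random_state=123)

    log_reg.fit(X_train, Y_train)

    ## Log loss is already a quantity to minimize, so no need to multiply by -1.
    return log_loss(Y_test, log_reg.predict_proba(X_test))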

4.4 Optimize Objective Function (Maximize for Highest Accuracy)

Below we have called fmin() function with objective function and search space declared earlier. We have instructed it to try 20 different combinations of hyperparameters on the objective function. We have also created Trials instance for tracking stats of the optimization process. We have used TPE algorithm for the hyperparameters optimization process.

trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=20)
Hyperparameters : {'C': 2.132667331628797, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 4.0164334560048385, 'cases': ('case3', {'penalty3': 'l1', 'solver3': 'saga'}), 'fit_intercept': True}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 2.1451046893043335, 'cases': ('case2', {'penalty2': 'l1', 'solver2': 'liblinear'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 3.5789654619971643, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 1.846165119583588, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 2.356022184620913, 'cases': ('case1', {'penalty1': 'none', 'solver1': 'lbfgs'}), 'fit_intercept': False}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 1.075729082181017, 'cases': ('case3', {'penalty3': 'elasticnet', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 4.1133711118733665, 'cases': ('case1', {'penalty1': 'l2', 'solver1': 'lbfgs'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 2.2293354795788574, 'cases': ('case2', {'penalty2': 'l1', 'solver2': 'liblinear'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 4.045078577616245, 'cases': ('case2', {'penalty2': 'l1', 'solver2': 'liblinear'}), 'fit_intercept': False}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 3.0959507645146225, 'cases': ('case1', {'penalty1': 'l2', 'solver1': 'lbfgs'}), 'fit_intercept': False}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 1.0345124154198277, 'cases': ('case2', {'penalty2': 'l2', 'solver2': 'liblinear'}), 'fit_intercept': True}
Accuracy : 1.0

Hyperparameters : {'C': 4.341182808917706, 'cases': ('case2', {'penalty2': 'l2', 'solver2': 'liblinear'}), 'fit_intercept': False}
Accuracy : 1.0

Hyperparameters : {'C': 4.226174618434312, 'cases': ('case1', {'penalty1': 'none', 'solver1': 'newton-cg'}), 'fit_intercept': False}
Accuracy : 0.9166666666666666

Hyperparameters : {'C': 4.992038571459227, 'cases': ('case1', {'penalty1': 'none', 'solver1': 'newton-cg'}), 'fit_intercept': True}
Accuracy : 0.9166666666666666

Hyperparameters : {'C': 1.4325225400837742, 'cases': ('case3', {'penalty3': 'elasticnet', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 1.7733695282704844, 'cases': ('case1', {'penalty1': 'l2', 'solver1': 'lbfgs'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 3.285207336481444, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 1.0086786647760757, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': True}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 3.3852032972891606, 'cases': ('case3', {'penalty3': 'elasticnet', 'solver3': 'saga'}), 'fit_intercept': True}
Accuracy : 0.6944444444444444

100%|██████████| 20/20 [00:00<00:00, 25.30trial/s, best loss: -1.0]

4.5 Print Best Results

In this section, we have printed the results of the optimization process: the best hyperparameters setting and the accuracy of the model. We have multiplied the value returned by the average_best_error() method by -1 to recover the accuracy.

We have then constructed the exact dictionary of hyperparameters that gave the best accuracy.

print("Best Hyperparameters Settings : {}".format(best_results))
print("\nBest Accuracy : {}".format(-1 * trials_obj.average_best_error()))
Best Hyperparameters Settings : {'C': 1.0345124154198277, 'cases': 1, 'fit_intercept': 0, 'penalty2': 1, 'solver2': 0}

Best Accuracy : 1.0
C = best_results["C"]
fit_intercept = intercepts[best_results["fit_intercept"]]
if best_results["cases"] == 0:
    penalty = penalties1[best_results["penalty1"]]
    solver = solvers1[best_results["solver1"]]
elif best_results["cases"] == 1:
    penalty = penalties2[best_results["penalty2"]]
    solver = solvers2[best_results["solver2"]]
elif best_results["cases"] == 2:
    penalty = penalties3[best_results["penalty3"]]
    solver = solvers3[best_results["solver3"]]

print("Best Hyperparameters Settings : {}".format({"C":C,
                                                   "penalty": penalty,
                                                   "fit_intercept": fit_intercept,
                                                   "solver":solver,
                                                  }))
Best Hyperparameters Settings : {'C': 1.0345124154198277, 'penalty': 'l2', 'fit_intercept': True, 'solver': 'liblinear'}

4.6 Train and Evaluate Model with Best Hyperparameters

In this section, we have again created LogisticRegression model with the best hyperparameters setting that we got through an optimization process. We have then trained it on a training dataset and evaluated accuracy on both train and test datasets for verification purposes.

log_reg = LogisticRegression(C=C,
                             penalty=penalty,
                             fit_intercept=fit_intercept,
                             solver=solver,
                             random_state=123)

log_reg.fit(X_train, Y_train)

print("Test  Accuracy : {}".format(log_reg.score(X_test, Y_test)))
print("Train Accuracy : {}".format(log_reg.score(X_train, Y_train)))
Test  Accuracy : 1.0
Train Accuracy : 0.971830985915493

This ends our small tutorial explaining how to use Python library 'hyperopt' to find the best hyperparameters settings for our ML model.

