Updated On : Oct-01,2021 Tags hyperopt, hyperparameters-optimization
Hyperopt - Simple Guide to Hyperparameters Optimization/Tuning


The complexity of machine learning models is increasing day by day due to the rise of deep learning and deep neural networks. With that complexity comes a growing number of settings that are chosen before training rather than learned from data; these are generally referred to as hyperparameters. Each hyperparameter can accept a range of values and we don't know upfront which combination of these values will give us the best results. The common approach so far has been grid search, which tries every possible combination of hyperparameter values. Though this approach works well with small models and datasets, it becomes increasingly time-consuming on real-world problems with billions of samples and ML models with lots of hyperparameters.

To solve this problem, Python has a library named hyperopt which optimizes hyperparameter combinations by concentrating trials on values that look promising instead of trying every combination. Hyperopt chooses the next combinations using internal search algorithms that explore the regions of hyperparameter space where good results were found in earlier trials. Hyperopt also lets us run trials in parallel using MongoDB and Spark, which lets us scale the search across more than one computer and core. As a part of this tutorial, we'll explain how we can use hyperopt to find the hyperparameters that give the best results for a given ML model. We'll be explaining its usage with scikit-learn models to keep things simple and easy to understand.

Commonly Followed Steps to Use Hyperopt

  1. Create an Objective Function.
    • This step requires us to create a function that creates an ML model, fits it on train data, and evaluates it on validation or test set returning some metric (MSE, MAE, Accuracy, etc.) that captures the performance of the model. We want to minimize/maximize the metric value returned through this function.
  2. Create search space of hyperparameters.
    • This is the step where we declare a list of hyperparameters and a range of values for each that we want to try.
  3. Minimize objective function by trying different hyperparameters values from search space.
    • This is the step where we give different settings of hyperparameters to the objective function and get back the metric value for each setting. Hyperopt internally uses a search algorithm (random search, TPE, or adaptive TPE, all covered later in this tutorial) to explore the hyperparameter space and find the best settings.

We'll be explaining how to perform these steps using the API of Hyperopt now.


We'll start our tutorial by importing the necessary libraries.

In [1]:
import hyperopt

import warnings

warnings.filterwarnings("ignore")

print("Hyperopt Version : {}".format(hyperopt.__version__))
Hyperopt Version : 0.2.5

1. Minimize Simple Line Formula

As a part of this section, we'll explain how to use hyperopt to minimize a simple line formula. We'll be trying to find the value of x at which the line equation 5x-21 is zero. We can easily calculate that by setting the equation to zero and solving for x. But we want hyperopt to try a list of different values of x and find out at which value the line equation evaluates to zero. This simple example will help us understand how we can use hyperopt. We'll then explain usage with scikit-learn models from the next example.

Simple Example with Default Arguments

Define Objective Function

The first step will be to define an objective function which returns a metric value that we want to minimize. Hyperopt will call this function with different values and record the metric value returned after each evaluation. This metric value helps it decide which values of the hyperparameter to try next.

Below we have defined an objective function with a single parameter x. It returns the value that we get after evaluating line formula 5x - 21. We have put the line formula inside of Python function abs() so that it returns a value >= 0. This way we can be sure that the minimum metric value returned will be 0. If we don't surround the line formula with abs(), the metric keeps decreasing as x becomes more negative, so the optimizer would simply be driven to the lower bound of the search range instead of the point where the line is zero.

In [2]:
def objective(x):
    return abs(5*x - 21)
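
To see why abs() matters, the small sketch below compares the objective with a hypothetical variant that omits abs() (objective_no_abs is an illustration only and not part of the tutorial). It scans a coarse grid over [-10, 10], the range used for the search space later, with plain Python and no hyperopt.

```python
def objective(x):
    # Distance of the line 5x - 21 from zero.
    return abs(5 * x - 21)


def objective_no_abs(x):
    # Hypothetical variant for comparison only: the raw line value.
    return 5 * x - 21


# Coarse grid over [-10, 10] in steps of 0.1.
grid = [i / 10 for i in range(-100, 101)]

best_with_abs = min(grid, key=objective)
best_without_abs = min(grid, key=objective_no_abs)

print("Minimizer with abs()    : {}".format(best_with_abs))
print("Minimizer without abs() : {}".format(best_without_abs))
```

With abs() the minimizer sits at the zero of the line, whereas without it the minimizer is just the lower end of the grid.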

Define Search Space

The second step will be to define search space for hyperparameters. The search space refers to the name of hyperparameters and their range of values that we want to give to the objective function for evaluation.

In this simple example, we have only one hyperparameter named x whose different values will be given to the objective function in order to minimize the line formula.

Hyperopt requires us to declare search space using functions it provides. It has a module named hp that provides a bunch of methods that can be used to declare search space for numeric (integers & floats) and categorical variables.

Below we have listed a few methods and their definitions that we'll be using as a part of this tutorial.


  • hp.uniform(label, low, high) - This method accepts string label as the first parameter specifying the name of the hyperparameter. The next two parameters are low and high values specifying the range from which we want to select different values. It retrieves values from a uniform distribution. It's used for continuous variables.
  • hp.normal(label, mu, sigma) - This method also accepts a string label specifying hyperparameter name as the first argument. The mu and sigma values specify the mean and standard deviation of the normal distribution from which values are drawn. Unlike hp.uniform(), the values are not bounded, so anything on the real line can be sampled. It's used for continuous variables.
  • hp.choice(label, options) - This method like the other two accepts string label naming hyperparameters. The second parameter is a list of values. The algorithm will select different values from this list. It's commonly used for categorical variables.

There are other methods available from hp module like lognormal(), loguniform(), pchoice(), etc. which can be used for trying log-scaled and probability-weighted values. They're not covered in this tutorial to keep it simple.


As we have only one hyperparameter for our line formula function, we have declared a search space that tries different values of it. We have declared search space using uniform() function with range [-10,10]. We'll explain in our upcoming examples, how we can create search space with multiple hyperparameters.

In [9]:
from hyperopt import hp

search_space = hp.uniform("x", -10, 10)

search_space
Out[9]:
<hyperopt.pyll.base.Apply at 0x7fe0adc750b8>

Minimize Objective Function

Our last step will be to use an algorithm that tries different values of hyperparameter from search space and evaluates objective function using those values. It tries to minimize the return value of an objective function.

Hyperopt provides a function named fmin() for this purpose. We need to provide it the objective function, the search space, and the algorithm which tries different combinations of hyperparameters. It'll then use this algorithm to minimize the value returned by the objective function over the search space. Rather than covering the space evenly, it looks at the places where the objective function has been returning low values and explores hyperparameter values around those places.


  • fmin(fn,space,algo,max_evals=9223372036854775807,timeout=None,loss_threshold=None,trials=None,rstate=None,verbose=True,return_argmin=True,show_progressbar=True,early_stop_fn=None) - This function takes as input the objective function, the hyperparameters search space, and the search algorithm. It then tries different combinations of hyperparameters looking for the minimum value of the objective function. It'll keep running and trying different values unless we tell it when to stop using one of the max_evals, timeout, or loss_threshold parameters.
    • The fn parameter accepts a callable which is our objective function.
    • The space parameter accepts search space declared using methods from hp module.
    • The algo parameter accepts one of the below-mentioned three options specifying search algorithm.
      • hyperopt.rand.suggest - It'll try random values of hyperparameters.
      • hyperopt.tpe.suggest - It'll try values of hyperparameters using Tree Parzen Estimators - TPE algorithm.
      • hyperopt.atpe.suggest - It'll try values of hyperparameters using Adaptive TPE algorithm.
    • The max_evals parameter accepts an integer value specifying how many trials of the objective function should be executed. It'll try that many combinations of hyperparameters.
    • The timeout parameter accepts an integer number of seconds. The function stops starting new trials once that much time has passed.
    • The loss_threshold parameter accepts a float value. Our objective function returns a metric value which is generally loss for ML algorithms like MSE for regression problems. If some combination of hyperparameters causes an objective function to return a value that is less than this parameter value then it'll stop the algorithm.
    • The trials parameter accepts an instance of Trials class. This class is generally used to store statistics of different trials (A single trial refers to a single combination of hyperparameters tried on an objective function).
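    • The rstate parameter accepts a NumPy random state object and is used for reproducibility. With the hyperopt version used here (0.2.5) this is a numpy.random.RandomState instance; newer hyperopt releases expect a numpy.random.Generator (for example numpy.random.default_rng(seed)) instead.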
    • The return_argmin parameter is True by default and causes fmin() to return a dictionary which has hyperparameters combination that gave best results i.e least value for objective function.
    • The early_stop_fn parameter accepts a callable which is executed after each trial. The callable receives the Trials object (plus any state values it returned on the previous call) and returns a pair: a boolean and a list of state values. The fmin() function stops if the boolean is True and continues with more trials otherwise.

Below we have executed fmin() with our objective function, earlier declared search space, and TPE algorithm to search hyperparameters search space. We have instructed it to try 100 different values of hyperparameter x using max_evals parameter. The function returns a dictionary of best results i.e hyperparameters which gave the least value for the objective function. The TPE algorithm will try different values of hyperparameter x in the range [-10,10] evaluating line formula each time. It'll look where objective values are decreasing in the range and will try different values near those values to find the best results.

In [10]:
best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100)
100%|██████████| 100/100 [00:00<00:00, 525.37trial/s, best loss: 0.1308582656748598]

Below we have printed the best hyperparameter value that returned the minimum value from the objective function. We have then evaluated the value of the line formula as well using that hyperparameter value. We can notice from the result that it has done a good job of finding a value of x close to 4.2, the exact point where line formula 5x - 21 is zero, though it has not hit it exactly. If we try more than 100 trials then it might further improve results.

In [11]:
best_results
Out[11]:
{'x': 4.173828346865028}
In [12]:
obj_func_res = abs(5*best_results["x"] - 21)

print("Value of Function 5x-21 at best value is : {}".format(obj_func_res))
Value of Function 5x-21 at best value is : 0.1308582656748598

Trials Object for Tracking Stats

When we executed fmin() function earlier, it tried different values of parameter x on the objective function. After trying 100 different values of x, it returned the value of x for which the objective function returned the least value.

Though the function tried 100 different values, we don't have information about which values were tried, objective values during trials, etc. In short, we don't have any stats about different trials.

Hyperopt lets us record stats of our optimization process using Trials instance. It'll record different values of hyperparameters tried, objective function values during each trial, time of trials, state of the trial (success/failure), etc. We just need to create an instance of Trials and give it to trials parameter of fmin() function and it'll record stats of our optimization process.

Below we have declared Trials instance and called fmin() function again with this object. We have again tried 100 trials on the objective function.

In [13]:
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100,
                             trials=trials_obj
                            )
100%|██████████| 100/100 [00:00<00:00, 509.74trial/s, best loss: 0.03237813101906539]

The Trials instance has a list of attributes and methods which can be explored to get an idea about individual trials.

The Trials instance has an attribute named trials which has a list of dictionaries where each dictionary has stats about one trial of the objective function. Below we have printed the content of the first trial. We can notice from the contents that it has information like id, loss, status, x value, datetime, etc.

In [14]:
trials_obj.trials[0]
Out[14]:
{'state': 2,
 'tid': 0,
 'spec': None,
 'result': {'loss': 56.22126394071591, 'status': 'ok'},
 'misc': {'tid': 0,
  'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'workdir': None,
  'idxs': {'x': [0]},
  'vals': {'x': [-7.044252788143181]}},
 'exp_key': None,
 'owner': None,
 'version': 0,
 'book_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 57000),
 'refresh_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 57000)}

Individual Trial

Below we have retrieved the first trial through trials attribute of Trials instance and printed the loss stored in it. We have then retrieved x value of this trial and evaluated our line formula to verify loss value with it. We can notice that both are the same.

In [15]:
first_trial = trials_obj.trials[0]

print("Loss Value of First Trial : {}".format(first_trial['result']['loss']))

loss = abs(5*first_trial['misc']['vals']['x'][0] - 21)

print("Loss Value of First Trial : {}".format(loss))
Loss Value of First Trial : 56.22126394071591
Loss Value of First Trial : 56.22126394071591

Best Trial

The Trials object has an attribute named best_trial which returns a dictionary of the trial which gave the best results i.e. least value from the objective function (least loss). We have printed details of the best trial. We have then printed the loss of the best trial and verified it as well by putting x value of the best trial in our line formula.

In [16]:
best_trial = trials_obj.best_trial

best_trial
Out[16]:
{'state': 2,
 'tid': 77,
 'spec': None,
 'result': {'loss': 0.03237813101906539, 'status': 'ok'},
 'misc': {'tid': 77,
  'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'workdir': None,
  'idxs': {'x': [77]},
  'vals': {'x': [4.193524373796187]}},
 'exp_key': None,
 'owner': None,
 'version': 0,
 'book_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 209000),
 'refresh_time': datetime.datetime(2021, 10, 8, 1, 47, 31, 209000)}
In [17]:
print("Loss Value of Best Trial : {}".format(best_trial['result']['loss']))

loss = abs(5*best_trial['misc']['vals']['x'][0] - 21)

print("Loss Value of Best Trial : {}".format(loss))
Loss Value of Best Trial : 0.03237813101906539
Loss Value of Best Trial : 0.03237813101906539

Useful Methods and Attributes of Trials Object

In this section, we'll explain the usage of some useful attributes and methods of Trials object.


Useful Attributes of Trials Object

  • results - This attribute returns a list of dictionaries. Each dictionary has details about the results of an individual trial. It has loss value and status of trial in it.
  • vals - This attribute returns a dictionary mapping each hyperparameter name to the list of values that were tried for it.

Useful Methods of Trials Object

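  • average_best_error() - This method returns the best error of the experiment. When the objective function reports no loss variance, as in this tutorial, this is simply the loss of the best trial and not an average over all trials.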
  • statuses() - This method returns list of status values of trials.
  • losses() - This method returns list of losses.
  • miscs - Strictly an attribute rather than a method (it's accessed without parentheses). It returns details like ids, working directory, values of hyperparameters, index of trial, etc.

Below we have printed values of useful attributes and methods of Trials instance for explanation purposes.

In [16]:
results = trials_obj.results

print("Total Results : {}".format(len(results)))
print("Best Result : {}".format(trials_obj.average_best_error()))
print("First Few Results : ")
results[:5]
Total Results : 100
Best Result : 0.4362271636430819
First Few Results :
Out[16]:
[{'loss': 36.03465338318733, 'status': 'ok'},
 {'loss': 16.725453187066798, 'status': 'ok'},
 {'loss': 22.59591554564137, 'status': 'ok'},
 {'loss': 31.781846627893387, 'status': 'ok'},
 {'loss': 47.18387600505959, 'status': 'ok'}]
In [17]:
print("First Few Status : {}".format(trials_obj.statuses()[:5]))
print("\nFirst Few X Values : {}".format(trials_obj.vals['x'][:5]))
print("\nFirst Few Losses : {}".format(trials_obj.losses()[:5]))
print("\nFirst Few Miscs : {}".format(trials_obj.miscs[:5]))
First Few Status : ['ok', 'ok', 'ok', 'ok', 'ok']

First Few X Values : [-3.0069306766374666, 7.5450906374133595, 8.719183109128274, -2.1563693255786776, -5.236775201011918]

First Few Losses : [36.03465338318733, 16.725453187066798, 22.59591554564137, 31.781846627893387, 47.18387600505959]

First Few Miscs : [{'tid': 0, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [0]}, 'vals': {'x': [-3.0069306766374666]}}, {'tid': 1, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [1]}, 'vals': {'x': [7.5450906374133595]}}, {'tid': 2, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [2]}, 'vals': {'x': [8.719183109128274]}}, {'tid': 3, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [3]}, 'vals': {'x': [-2.1563693255786776]}}, {'tid': 4, 'cmd': ('domain_attachment', 'FMinIter_Domain'), 'workdir': None, 'idxs': {'x': [4]}, 'vals': {'x': [-5.236775201011918]}}]

2. Regression using Scikit-Learn

In this section, we'll explain how we can use hyperopt with machine learning library scikit-learn. The steps followed here are not specific to scikit-learn, so the transition to any other ML framework is pretty straightforward: only the body of the objective function changes. We'll be using hyperopt to find optimal hyperparameters for a regression problem.

Load Dataset

We'll be using the Boston housing dataset available from scikit-learn. It has information about houses in Boston like the average number of rooms, the crime rate in the area, tax rate, etc. The target variable of the dataset is the median value of homes in thousands of dollars. As the target variable is continuous, this will be a regression problem.

Below we have loaded our Boston housing dataset as variables X and Y. The variable X has data for each feature and variable Y has target variable values. We have then divided the dataset into train (80%) and test (20%) sets. Please note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older scikit-learn release to run as written.

In [18]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[18]:
((404, 13), (102, 13), (404,), (102,))

Define Hyperparameters Search Space

Below we have declared hyperparameters search space for our example. We'll be using Ridge regression solver available from scikit-learn to solve the problem. We'll be trying to find the best values for three of its hyperparameters.

  • alpha
  • fit_intercept
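  • solver

We have declared search space as a dictionary where keys are hyperparameter names and values are calls to functions from hp module which we discussed earlier. These functions declare what values of each hyperparameter will be sent to the objective function for evaluation. The alpha hyperparameter accepts continuous values whereas fit_intercept and solver each have a fixed list of values. Please note that hp.normal("alpha", 1, 5) draws alpha from a normal distribution with mean 1 and standard deviation 5, so negative values can be sampled. Ridge expects a non-negative alpha and recent scikit-learn releases reject negative values, so a distribution bounded at zero (for example hp.uniform() or hp.loguniform()) is a safer choice.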

In [19]:
intercepts = [True, False]
solvers = ["svd", "cholesky", "lsqr", "sag", "saga"]

search_space = {
    "alpha": hp.normal("alpha", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "solver": hp.choice("solver", solvers)
}

Define Objective Function

Our objective function starts by creating Ridge solver with the hyperparameters given to the objective function. We then fit the ridge solver on train data and predict target values for test data. We then print the hyperparameters combination that was passed to the objective function along with the mean squared error (MSE) on the test dataset. Our objective function returns MSE on test data, which is the value we want hyperopt to minimize.

In [20]:
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def objective(args):
    ridge_reg = Ridge(**args, random_state=123)

    ridge_reg.fit(X_train, Y_train)

    Y_pred = ridge_reg.predict(X_test)
    mse = mean_squared_error(Y_test, Y_pred)  ## Computed once and reused below.

    print("Hyperparameters : {}".format(args)) ## This can be commented if not needed.
    print("MSE : {}\n".format(mse))

    return mse

Optimize Objective Function (Minimize for Least MSE)

In this section, we have called fmin() function with the objective function, hyperparameters search space, and TPE algorithm for search. We have instructed the method to try 10 different trials of the objective function. We have also created Trials instance for tracking stats of trials.

We can notice from the output that it prints all hyperparameters combinations tried and their MSE as well.

In [21]:
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=10)
Hyperparameters : {'alpha': 11.456909783038185, 'fit_intercept': False, 'solver': 'cholesky'}
MSE : 32.6780206950975

Hyperparameters : {'alpha': -6.022064763493154, 'fit_intercept': False, 'solver': 'saga'}
MSE : 40.0745463100764

Hyperparameters : {'alpha': 7.436267706835002, 'fit_intercept': False, 'solver': 'svd'}
MSE : 32.689769046138714

Hyperparameters : {'alpha': 1.5656653073238473, 'fit_intercept': True, 'solver': 'lsqr'}
MSE : 29.46296743296732

Hyperparameters : {'alpha': -3.592042958686301, 'fit_intercept': True, 'solver': 'svd'}
MSE : 32.02190509726122

Hyperparameters : {'alpha': -4.16209741657863, 'fit_intercept': True, 'solver': 'lsqr'}
MSE : 28.192485758346496

Hyperparameters : {'alpha': 7.529616236524469, 'fit_intercept': True, 'solver': 'saga'}
MSE : 29.357834183797227

Hyperparameters : {'alpha': 2.730093210771717, 'fit_intercept': True, 'solver': 'saga'}
MSE : 29.3466844072238

Hyperparameters : {'alpha': 8.423615538119535, 'fit_intercept': False, 'solver': 'svd'}
MSE : 32.68866941310808

Hyperparameters : {'alpha': 2.1400505222685284, 'fit_intercept': False, 'solver': 'cholesky'}
MSE : 32.65079545632969

100%|██████████| 10/10 [00:00<00:00, 18.36trial/s, best loss: 28.192485758346496]

Below we have printed the best results of the above experiment. Please make a note that for hyperparameters declared with hp.choice(), fmin() returns the index of the chosen value in the list of options rather than the value itself. It returned index 0 for fit_intercept, which points to True in the intercepts list from the search space section. In the same way, the index returned for solver is 2, which points to lsqr.

In [22]:
best_results
Out[22]:
{'alpha': -4.16209741657863, 'fit_intercept': 0, 'solver': 2}
In [23]:
best_trial = trials_obj.best_trial

best_trial
Out[23]:
{'state': 2,
 'tid': 5,
 'spec': None,
 'result': {'loss': 28.192485758346496, 'status': 'ok'},
 'misc': {'tid': 5,
  'cmd': ('domain_attachment', 'FMinIter_Domain'),
  'workdir': None,
  'idxs': {'alpha': [5], 'fit_intercept': [5], 'solver': [5]},
  'vals': {'alpha': [-4.16209741657863], 'fit_intercept': [0], 'solver': [2]}},
 'exp_key': None,
 'owner': None,
 'version': 0,
 'book_time': datetime.datetime(2021, 10, 7, 2, 15, 3, 417000),
 'refresh_time': datetime.datetime(2021, 10, 7, 2, 15, 3, 426000)}
In [24]:
print("Best MSE : {}".format(trials_obj.average_best_error()))
Best MSE : 28.192485758346496

Train and Evaluate Model with Best Hyperparameters

In this section, we have created Ridge model again with the best hyperparameters combination that we got using hyperopt. We have then trained the model on train data and evaluated it for MSE on both train and test data.

Please make a NOTE that we can save the trained model during the hyperparameters optimization process if the training process is taking a lot of time and we don't want to perform it again. We can include logic inside of the objective function which saves each model that was tried so that we can later reuse the one which gave the best results by just loading it.

In [25]:
alpha = best_results["alpha"]
fit_intercept = intercepts[best_results["fit_intercept"]]
solver = solvers[best_results["solver"]]

ridge = Ridge(alpha=alpha,
              fit_intercept=fit_intercept,
              solver=solver,
              random_state=123)

ridge.fit(X_train, Y_train)

Y_pred = ridge.predict(X_test)

print("Test  MSE : {}".format(mean_squared_error(Y_test, Y_pred)))
print("Train MSE : {}".format(mean_squared_error(Y_train, ridge.predict(X_train))))
Test  MSE : 28.192485758346496
Train MSE : 20.677107947815138

3. Classification using Scikit-Learn

In this section, we'll again explain how to use hyperopt with scikit-learn but this time we'll try it for a classification problem. Also, we'll explain how we can create a more complicated search space through this example. We'll be using the wine dataset available from scikit-learn for this example. The wine dataset has the results of a chemical analysis of wines from three different cultivars. The chemical measurements are the features of our dataset and the wine type is the target variable.

Below we have loaded the wine dataset from scikit-learn and divided it into the train (80%) and test (20%) sets.

In [26]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_wine(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[26]:
((142, 13), (36, 13), (142,), (36,))

Define hyperparameters Search Space

We'll be using LogisticRegression solver for our problem hence we'll be declaring search space that tries different values of hyperparameters of it. We'll try to find the best values of the below-mentioned four hyperparameters for LogisticRegression which gives the best accuracy on our dataset.

  • fit_intercept
  • C
  • penalty
  • solver

The search space for this example is a little bit involved because some solvers of LogisticRegression do not support all of the available penalties. The saga solver supports penalties l1, l2, and elasticnet. The liblinear solver supports l1 and l2 penalties. The newton-cg and lbfgs solvers support only the l2 penalty or no penalty (none).

As we want to try all solvers available and want to avoid failures due to penalty mismatch, we have created three different cases, one per group of solvers that share the same penalties. The hyperparameters fit_intercept and C are the same for all three cases hence our final search space consists of three key-value pairs (C, fit_intercept, and cases). Each case is itself a small dictionary that pairs the penalties allowed for that group with its solvers. We have declared C using hp.uniform() method because it's a continuous hyperparameter. We want to try values in the range [1,5] for C. All other hyperparameters are declared using hp.choice() method as they are all categorical.

In [27]:
intercepts = [True, False]
solvers1 = ["newton-cg", "lbfgs", ]
solvers2 = ["liblinear", ]
solvers3 = ["saga", ]
penalties1 = ["l2", "none"]
penalties2 = ["l1", "l2"]
penalties3 = ["l1", "elasticnet", "l2"]

case1 = {
        "penalty1": hp.choice("penalty1", penalties1),
        "solver1": hp.choice("solver1", solvers1)
}
case2 = {
        "penalty2": hp.choice("penalty2", penalties2),
        "solver2": hp.choice("solver2", solvers2)
}
case3 = {
        "penalty3": hp.choice("penalty3", penalties3),
        "solver3": hp.choice("solver3", solvers3)
}

search_space = {
    "C": hp.uniform("C", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "cases" : hp.choice("cases", [("case1", case1), ("case2", case2), ("case3", case3)])
}

Define Objective Function

The objective function starts by retrieving values of different hyperparameters. It uses conditional logic to retrieve values of hyperparameters penalty and solver, picking the keys that belong to the case that was sampled. We then create LogisticRegression model using received values of hyperparameters and train it on the training dataset. We then print the hyperparameters combination that was tried and the accuracy of the model on the test dataset. At last, our objective function returns the value of accuracy multiplied by -1. The reason for multiplying by -1 is that the optimization process minimizes the value returned by the objective function. Since higher accuracy is better, we negate it so that the highest accuracy becomes the lowest (most negative) value and the optimization process is pushed toward it.

In [28]:
from sklearn.linear_model import LogisticRegression

def objective(args):
    C = args["C"]
    fit_intercept = args["fit_intercept"]
    kwds = args["cases"]
    penalty = kwds[1]["penalty1"] if kwds[0] == "case1" else kwds[1]["penalty2"] if kwds[0] == "case2" else kwds[1]["penalty3"]
    solver = kwds[1]["solver1"] if kwds[0] == "case1" else kwds[1]["solver2"] if kwds[0] == "case2" else kwds[1]["solver3"]

    log_reg = LogisticRegression(C=C,
                                 fit_intercept=fit_intercept,
                                 penalty=penalty,
                                 solver=solver,
                                 l1_ratio=0.5,
                                 random_state=123)

    log_reg.fit(X_train, Y_train)

    accuracy = log_reg.score(X_test, Y_test)  ## Scored once and reused below.

    print("Hyperparameters : {}".format(args)) ## This can be commented if not needed.
    print("Accuracy : {}\n".format(accuracy)) ## This can be commented if not needed.

    ## fmin() minimizes, so we return the negated accuracy.
    return -1 * accuracy

Optimize Objective Function (Maximize for Highest Accuracy)

Below we have called fmin() function with objective function and search space declared earlier. We have instructed it to try 20 different combinations of hyperparameters on the objective function. We have also created Trials instance for tracking stats of the optimization process. We have used TPE algorithm for the hyperparameters optimization process.

In [29]:
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=20)
Hyperparameters : {'C': 2.132667331628797, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 4.0164334560048385, 'cases': ('case3', {'penalty3': 'l1', 'solver3': 'saga'}), 'fit_intercept': True}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 2.1451046893043335, 'cases': ('case2', {'penalty2': 'l1', 'solver2': 'liblinear'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 3.5789654619971643, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 1.846165119583588, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 2.356022184620913, 'cases': ('case1', {'penalty1': 'none', 'solver1': 'lbfgs'}), 'fit_intercept': False}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 1.075729082181017, 'cases': ('case3', {'penalty3': 'elasticnet', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 4.1133711118733665, 'cases': ('case1', {'penalty1': 'l2', 'solver1': 'lbfgs'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 2.2293354795788574, 'cases': ('case2', {'penalty2': 'l1', 'solver2': 'liblinear'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 4.045078577616245, 'cases': ('case2', {'penalty2': 'l1', 'solver2': 'liblinear'}), 'fit_intercept': False}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 3.0959507645146225, 'cases': ('case1', {'penalty1': 'l2', 'solver1': 'lbfgs'}), 'fit_intercept': False}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 1.0345124154198277, 'cases': ('case2', {'penalty2': 'l2', 'solver2': 'liblinear'}), 'fit_intercept': True}
Accuracy : 1.0

Hyperparameters : {'C': 4.341182808917706, 'cases': ('case2', {'penalty2': 'l2', 'solver2': 'liblinear'}), 'fit_intercept': False}
Accuracy : 1.0

Hyperparameters : {'C': 4.226174618434312, 'cases': ('case1', {'penalty1': 'none', 'solver1': 'newton-cg'}), 'fit_intercept': False}
Accuracy : 0.9166666666666666

Hyperparameters : {'C': 4.992038571459227, 'cases': ('case1', {'penalty1': 'none', 'solver1': 'newton-cg'}), 'fit_intercept': True}
Accuracy : 0.9166666666666666

Hyperparameters : {'C': 1.4325225400837742, 'cases': ('case3', {'penalty3': 'elasticnet', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 1.7733695282704844, 'cases': ('case1', {'penalty1': 'l2', 'solver1': 'lbfgs'}), 'fit_intercept': True}
Accuracy : 0.9722222222222222

Hyperparameters : {'C': 3.285207336481444, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': False}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 1.0086786647760757, 'cases': ('case3', {'penalty3': 'l2', 'solver3': 'saga'}), 'fit_intercept': True}
Accuracy : 0.6944444444444444

Hyperparameters : {'C': 3.3852032972891606, 'cases': ('case3', {'penalty3': 'elasticnet', 'solver3': 'saga'}), 'fit_intercept': True}
Accuracy : 0.6944444444444444

100%|██████████| 20/20 [00:00<00:00, 25.30trial/s, best loss: -1.0]
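
The log above is easier to read when the trials are grouped by case. The following is a small sketch in plain Python, not a hyperopt feature: the `logged_trials` list is simply the (case, accuracy) pairs copied by hand from the 20 printed trials, with accuracies rounded to four decimals.

```python
from collections import defaultdict

# (case, accuracy) pairs copied from the 20 trials printed above, rounded to 4 decimals.
logged_trials = [
    ("case3", 0.6944), ("case3", 0.6944), ("case2", 0.9722), ("case3", 0.6944),
    ("case3", 0.6944), ("case1", 0.9722), ("case3", 0.6944), ("case1", 0.9722),
    ("case2", 0.9722), ("case2", 0.9722), ("case1", 0.9722), ("case2", 1.0),
    ("case2", 1.0), ("case1", 0.9167), ("case1", 0.9167), ("case3", 0.6944),
    ("case1", 0.9722), ("case3", 0.6944), ("case3", 0.6944), ("case3", 0.6944),
]

# Collect the accuracies observed for each case.
accuracies_by_case = defaultdict(list)
for case, accuracy in logged_trials:
    accuracies_by_case[case].append(accuracy)

# Best accuracy seen per case.
best_by_case = {case: max(scores) for case, scores in accuracies_by_case.items()}

for case in sorted(accuracies_by_case):
    print("{} : trials={}, best accuracy={}".format(
        case, len(accuracies_by_case[case]), best_by_case[case]))

# The loss reported by fmin() is negative accuracy, so the best loss is the minimum.
best_loss = min(-accuracy for _, accuracy in logged_trials)
print("Best loss : {}".format(best_loss))
```

In this run, every case3 (saga) trial scored about 0.694 regardless of C or penalty, while the best case2 (liblinear) trials reached 1.0, which agrees with the best loss of -1.0 shown in the progress bar.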

In this section, we print the results of the optimization process: the best hyperparameters setting and the accuracy of the model. Since the objective function returns negative accuracy, we multiply the value returned by the average_best_error() method by -1 to get the accuracy.

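We then construct the exact dictionary of hyperparameters that gave the best accuracy. For categorical hyperparameters, the dictionary returned by fmin() holds the index of the selected option rather than the option itself (for example, 'penalty2': 1), so we look up each index in the corresponding list of options.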

In [30]:
print("Best Hyperparameters Settings : {}".format(best_results))
print("\nBest Accuracy : {}".format(-1 * trials_obj.average_best_error()))
Best Hyperparameters Settings : {'C': 1.0345124154198277, 'cases': 1, 'fit_intercept': 0, 'penalty2': 1, 'solver2': 0}

Best Accuracy : 1.0
In [31]:
C = best_results["C"]
fit_intercept = intercepts[best_results["fit_intercept"]]
if best_results["cases"] == 0:
    penalty = penalties1[best_results["penalty1"]]
    solver = solvers1[best_results["solver1"]]
elif best_results["cases"] == 1:
    penalty = penalties2[best_results["penalty2"]]
    solver = solvers2[best_results["solver2"]]
elif best_results["cases"] == 2:
    penalty = penalties3[best_results["penalty3"]]
    solver = solvers3[best_results["solver3"]]

print("Best Hyperparameters Settings : {}".format({"C":C,
                                                   "penalty": penalty,
                                                   "fit_intercept": fit_intercept,
                                                   "solver":solver,
                                                  }))
Best Hyperparameters Settings : {'C': 1.0345124154198277, 'penalty': 'l2', 'fit_intercept': True, 'solver': 'liblinear'}
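
The index lookup above can also be written as a small generic helper. This is a minimal sketch in plain Python: `decode_choices` and the `options` dictionary are hypothetical names, and the option lists are assumptions chosen to be consistent with the printed output (the real lists are the ones declared with the search space earlier in the tutorial). hyperopt also provides a `space_eval(space, best)` function intended for this kind of decoding, which may be more convenient when the search space object is at hand.

```python
def decode_choices(best, options):
    """Replace option indices in `best` with the option each index points to.

    Keys without an entry in `options` (continuous values such as C) are
    copied unchanged.
    """
    decoded = {}
    for name, value in best.items():
        if name in options:
            decoded[name] = options[name][value]
        else:
            decoded[name] = value
    return decoded


# Assumed option lists (hypothetical; use the lists from your own search space).
options = {
    "cases": ["case1", "case2", "case3"],
    "fit_intercept": [True, False],
    "penalty2": ["l1", "l2"],
    "solver2": ["liblinear"],
}

# Index-based result as printed by fmin() in this tutorial.
best = {"C": 1.0345124154198277, "cases": 1, "fit_intercept": 0,
        "penalty2": 1, "solver2": 0}

decoded = decode_choices(best, options)
print("Decoded Hyperparameters : {}".format(decoded))
```

Only the keys of the selected case appear in the result, so the helper never needs the option lists of the other cases.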

Train and Evaluate Model with Best Hyperparameters

In this section, we again create a LogisticRegression model, this time with the best hyperparameters setting found by the optimization process. We then train it on the training dataset and evaluate its accuracy on both the train and test datasets for verification purposes.

In [32]:
log_reg = LogisticRegression(C=C,
                             penalty=penalty,
                             fit_intercept=fit_intercept,
                             solver=solver,
                             random_state=123)

log_reg.fit(X_train, Y_train)

print("Test  Accuracy : {}".format(log_reg.score(X_test, Y_test)))
print("Train Accuracy : {}".format(log_reg.score(X_train, Y_train)))
Test  Accuracy : 1.0
Train Accuracy : 0.971830985915493

This ends our small tutorial explaining how to use hyperopt to find the best hyperparameters settings for our ML model. Please feel free to let us know your views in the comments section.

Sunny Solanki