The complexity of machine learning models is increasing day by day due to the rise of deep learning and deep neural networks. These models expose a number of configuration settings, generally referred to as hyperparameters, that are not learned from data. Each hyperparameter can accept a range of values, and we don't know upfront which combination of these values will give us the best results. The common approach until now has been to grid search through all possible combinations of hyperparameter values. Though this approach works well with small models and datasets, it becomes prohibitively time-consuming for real-world problems with billions of samples and ML models with many hyperparameters.

To solve this problem, Python has a library named **hyperopt** which optimizes hyperparameter combinations, trying only those values which look promising and ignoring others. Internally, **hyperopt** searches the hyperparameter space using algorithms that concentrate on regions where good results were found initially. **Hyperopt** also lets us run trials of finding the best hyperparameter settings in parallel using **MongoDB and Spark**, which scales the search across multiple cores and computers. As a part of this tutorial, we'll explain how we can use **hyperopt** to find hyperparameters that give the best results for a given ML model. We'll demonstrate its usage with scikit-learn models to keep things simple and easy to understand.

The typical workflow with **hyperopt** consists of three steps:

- **Create an objective function** - This step requires us to create a function that builds an ML model, fits it on train data, and evaluates it on a validation or test set, returning some metric (MSE, MAE, accuracy, etc.) that captures the performance of the model. We want to minimize/maximize the metric value returned by this function.

- **Create a search space of hyperparameters** - This is the step where we declare a list of hyperparameters and a range of values for each that we want to try.

- **Minimize the objective function** by trying different hyperparameter values from the search space - This is the step where we feed different hyperparameter settings to the objective function and record the metric value for each setting. **Hyperopt** internally uses one of the below-mentioned algorithms to search the hyperparameter space and find the best settings.

We'll now explain how to perform these steps using the API of **Hyperopt**.

Below we have the sections of the tutorial to give an overview of the material covered.

- Minimize Simple Line Formula
  - Simple Example with Default Arguments
    - Define Objective Function
    - Define Search Space
    - Minimize Objective Function
    - Print Best Results
  - Trials Object for Tracking Stats
    - Useful Methods and Attributes of Trials Object
- Regression using Scikit-Learn
  - Load Dataset
  - Define Hyperparameters Search Space
  - Define Objective Function
  - Optimize Objective Function (Minimize for Least MSE)
  - Print Best Results
  - Train and Evaluate Model with Best Hyperparameters
- Classification using Scikit-Learn
  - Load Dataset
  - Define Hyperparameters Search Space
  - Define Objective Function
  - Optimize Objective Function (Maximize for Highest Accuracy)
  - Print Best Results
  - Train and Evaluate Model with Best Hyperparameters
- References

We'll start our tutorial by importing the necessary libraries.

In [1]:

```
import hyperopt
import warnings
warnings.filterwarnings("ignore")
print("Hyperopt Version : {}".format(hyperopt.__version__))
```

As a part of this section, we'll explain how to use **hyperopt** to minimize a simple line formula. We'll be trying to find the value of **x** at which the line equation **5x-21** evaluates to zero. We could easily calculate that by setting the equation to zero and solving it, but here we want **hyperopt** to try a list of different values of **x** and find the one at which the equation evaluates to zero. This simple example will help us understand how to use **hyperopt**. We'll then explain usage with scikit-learn models in the next examples.

The first step is to define an objective function which returns a metric value that we want to minimize. **Hyperopt** will call this function with different hyperparameter values and record the metric value returned after each evaluation. This metric value helps it decide which hyperparameter values to try next.

Below we have defined an objective function with a single parameter **x**. It returns the value we get after evaluating the line formula **5x - 21**. We have wrapped the line formula in Python's **abs()** function so that it returns a value **>= 0**. This way we can be sure that the minimum metric value returned will be 0. If we didn't surround the line formula with **abs()**, then increasingly negative values of **x** would keep decreasing the metric value toward negative infinity.

In [2]:

```
def objective(x):
    return abs(5*x - 21)
```

The second step is to define a search space for the hyperparameters. The search space consists of the names of the hyperparameters and the range of values for each that we want to give to the objective function for evaluation.

In this simple example, we have only one hyperparameter named **x** whose different values will be given to the objective function in order to minimize the line formula.

**Hyperopt** requires us to declare the search space using functions it provides. It has a module named **hp** that provides a bunch of methods for declaring search spaces over continuous (integer & float) and categorical variables.

Below we have listed a few methods and their definitions that we'll be using as a part of this tutorial.

- **hp.uniform(label, low, high)** - This method accepts a string label as the first parameter specifying the name of the hyperparameter. The next two parameters, **low** and **high**, specify the range from which we want to select different values. It retrieves values from a uniform distribution. It's used for continuous variables.
- **hp.normal(label, mu, sigma)** - This method also accepts a string label specifying the hyperparameter name as the first argument. The **mu** and **sigma** values specify the mean and standard deviation of the normal distribution from which values are drawn; note that the values are not bounded. It's used for continuous variables.
- **hp.choice(label, options)** - This method, like the other two, accepts a string label naming the hyperparameter. The second parameter is a list of values from which the algorithm will select. It's commonly used for categorical variables.

There are other methods available in the **hp** module, like **lognormal()**, **loguniform()**, **pchoice()**, etc., which can be used for log- and probability-based values. They are not covered in this tutorial to keep it simple.

As we have only one hyperparameter for our line formula function, we have declared a search space that tries different values of it using the **uniform()** function with range **[-10,10]**. We'll explain in upcoming examples how to create a search space with multiple hyperparameters.

In [9]:

```
from hyperopt import hp
search_space = hp.uniform("x", -10, 10)
search_space
```

Out[9]:

Our last step is to use an algorithm that tries different hyperparameter values from the search space and evaluates the objective function using them, trying to minimize the objective function's return value.

**Hyperopt** provides a function named **fmin()** for this purpose. We need to provide it the **objective function**, the **search space**, and the **algorithm** to use for trying different combinations of hyperparameters. It'll then use that algorithm to minimize the value returned by the objective function over the search space. It concentrates on places where the objective function returns minimum values the majority of the time and explores hyperparameter values around them.

- **fmin(fn, space, algo, max_evals=9223372036854775807, timeout=None, loss_threshold=None, trials=None, rstate=None, verbose=True, return_argmin=True, show_progressbar=True, early_stop_fn=None)** - This function takes the objective function, the hyperparameters search space, and the search algorithm as input. It then tries different combinations of hyperparameters to minimize the value of the objective function. It'll keep running and trying different values unless we tell it to stop using one of the **max_evals**, **timeout**, or **loss_threshold** parameters.
    - The **fn** parameter accepts a callable which is our objective function.
    - The **space** parameter accepts a search space declared using methods from the **hp** module.
    - The **algo** parameter accepts one of the below-mentioned three options specifying the search algorithm:
        - **hyperopt.rand.suggest** - It'll try random values of hyperparameters.
        - **hyperopt.tpe.suggest** - It'll try values of hyperparameters using the **Tree-structured Parzen Estimator (TPE)** algorithm.
        - **hyperopt.atpe.suggest** - It'll try values of hyperparameters using the **Adaptive TPE** algorithm.
    - The **max_evals** parameter accepts an integer value specifying how many trials of the objective function should be executed, i.e., how many hyperparameter combinations to try.
    - The **timeout** parameter accepts an integer value specifying that the function should time out after that many seconds have passed.
    - The **loss_threshold** parameter accepts a float value. Our objective function returns a metric value which is generally a loss for ML algorithms, like MSE for regression problems. If some combination of hyperparameters causes the objective function to return a value less than this parameter value, then the search stops.
    - The **trials** parameter accepts an instance of the **Trials** class. This class is generally used to store statistics of different trials (a single trial refers to a single combination of hyperparameters tried on the objective function).
    - The **rstate** parameter accepts a **numpy** random state. This is used for reproducibility.
    - The **return_argmin** parameter is **True** by default and causes **fmin()** to return a dictionary with the hyperparameter combination that gave the best result, i.e., the least value of the objective function.
    - The **early_stop_fn** parameter accepts a callable which is executed after each trial. The callable takes the result of the objective function as input and returns **True** if **fmin()** should stop, else **False** to continue with more trials.

Below we have executed **fmin()** with our objective function, the earlier declared search space, and the **TPE** algorithm. We have instructed it to try **100** different values of hyperparameter **x** using the **max_evals** parameter. The function returns a dictionary with the best result, i.e., the hyperparameter value which gave the least value of the objective function. The **TPE** algorithm will try different values of **x** in the range **[-10,10]**, evaluating the line formula each time. It'll look at where the objective values are decreasing within the range and try more values near there to find the best result.

In [10]:

```
best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100)
```

Below we have printed the best hyperparameter value, i.e., the one that returned the minimum value from the objective function. We have then evaluated the line formula using that value. We can notice from the result that it seems to have done a good job of finding the value of **x** which minimizes the line formula **5x - 21**, though it's not exact. If we run more than **100** trials, it might improve further.

In [11]:

```
best_results
```

Out[11]:

In [12]:

```
obj_func_res = abs(5*best_results["x"] - 21)
print("Value of Function 5x-21 at best value is : {}".format(obj_func_res))
```

When we executed the **fmin()** function earlier, it tried different values of parameter **x** on the objective function. After trying 100 different values of **x**, it returned the value for which the objective function returned the least value.

Though the function tried 100 different values, we don't have information about which values were tried, the objective values during the trials, etc. In short, we don't have any stats about the individual trials.

**Hyperopt** lets us record stats of the optimization process using a **Trials** instance. It'll record the different hyperparameter values tried, the objective function value of each trial, the time of each trial, the state of each trial (success/failure), etc. We just need to create an instance of **Trials** and pass it to the **trials** parameter of the **fmin()** function, and it'll record stats of our optimization process.

Below we have declared **Trials** instance and called **fmin()** function again with this object. We have again tried 100 trials on the objective function.

In [13]:

```
trials_obj = hyperopt.Trials()
best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100,
                             trials=trials_obj
                             )
```

The **Trials** instance has a list of attributes and methods which can be explored to get an idea about individual trials.

The **Trials** instance has an attribute named **trials** which has a list of dictionaries where each dictionary has stats about one trial of the objective function. Below we have printed the content of the first trial. We can notice from the contents that it has information like id, loss, status, **x** value, datetime, etc.

In [14]:

```
trials_obj.trials[0]
```

Out[14]:

Below we have retrieved the objective function value of the first trial available through the **trials** attribute of the **Trials** instance and printed the loss present in it. We have then retrieved the **x** value of this trial and evaluated our line formula to verify the loss value. We can notice that both are the same.

In [15]:

```
first_trial = trials_obj.trials[0]
print("Loss Value of First Trial : {}".format(first_trial['result']['loss']))
loss = abs(5*first_trial['misc']['vals']['x'][0] - 21)
print("Verified Loss Value of First Trial : {}".format(loss))
```

The **Trials** object has an attribute named **best_trial** which returns a dictionary for the trial that gave the best result, i.e., the least value of the objective function (least loss). We have printed the details of the best trial, then printed its loss and verified it by plugging the **x** value of the best trial into our line formula.

In [16]:

```
best_trial = trials_obj.best_trial
best_trial
```

Out[16]:

In [17]:

```
print("Loss Value of Best Trial : {}".format(best_trial['result']['loss']))
loss = abs(5*best_trial['misc']['vals']['x'][0] - 21)
print("Verified Loss Value of Best Trial : {}".format(loss))
```

In this section, we'll explain the usage of some useful attributes and methods of the **Trials** object.

- **results** - This attribute returns a list of dictionaries. Each dictionary has details about the result of an individual trial, i.e., its loss value and status.
- **vals** - This attribute returns a dictionary mapping each hyperparameter name to the list of values tried for it.

- **average_best_error()** - This method returns the average error of the best trials of the experiment.
- **statuses()** - This method returns a list of status values of the trials.
- **losses()** - This method returns a list of loss values.
- **miscs** - This attribute returns details like ids, working directory, hyperparameter values, trial index, etc.

Below we have printed values of useful attributes and methods of the **Trials** instance for explanation purposes.

In [16]:

```
results = trials_obj.results
print("Total Results : {}".format(len(results)))
print("Best Result : {}".format(trials_obj.average_best_error()))
print("First Few Results : ")
results[:5]
```

Out[16]:

In [17]:

```
print("First Few Status : {}".format(trials_obj.statuses()[:5]))
print("\nFirst Few X Values : {}".format(trials_obj.vals['x'][:5]))
print("\nFirst Few Losses : {}".format(trials_obj.losses()[:5]))
print("\nFirst Few Miscs : {}".format(trials_obj.miscs[:5]))
```

In this section, we'll explain how we can use **hyperopt** with the machine learning library scikit-learn. This example will also help the reader see how it can be used with any other ML framework, as the transition from scikit-learn is pretty straightforward if you follow the same steps. We'll be using **hyperopt** to find optimal hyperparameters for a regression problem.

We'll be using the Boston housing dataset available from scikit-learn. It has information about houses in Boston, like the number of rooms, the crime rate in the area, the tax rate, etc. The target variable of the dataset is the median value of homes in thousands of dollars. As the target variable is continuous, this is a regression problem.

Below we have loaded our Boston housing dataset into variables **X** and **Y**. The variable **X** has the data for each feature and **Y** has the target values. We have then divided the dataset into train (80%) and test (20%) sets.

In [18]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

Out[18]:

Below we have declared the hyperparameters search space for our example. We'll be using the **Ridge** regression estimator available from scikit-learn to solve the problem. We'll be trying to find the best values for three of its hyperparameters:

- **alpha**
- **fit_intercept**
- **solver**

We have declared the search space as a dictionary. The **alpha** hyperparameter accepts continuous values, whereas **fit_intercept** and **solver** have fixed lists of values. The dictionary keys are hyperparameter names and the values are calls to functions from the **hp** module which we discussed earlier. These functions declare what values of the hyperparameters will be sent to the objective function for evaluation.

In [19]:

```
intercepts = [True, False]
solvers = ["svd", "cholesky", "lsqr", "sag", "saga"]

search_space = {
    "alpha": hp.normal("alpha", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "solver": hp.choice("solver", solvers)
}
```

Our objective function starts by creating a **Ridge** estimator with the arguments given to it. We then fit the estimator on the train data and predict labels for the test data. We print the hyperparameter combination passed to the objective function along with the mean squared error on the test dataset. The objective function returns the test **MSE**, which we want minimized for best results.

In [20]:

```
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def objective(args):
    ridge_reg = Ridge(**args, random_state=123)
    ridge_reg.fit(X_train, Y_train)
    Y_pred = ridge_reg.predict(X_test)

    print("Hyperparameters : {}".format(args)) ## This can be commented if not needed.
    print("MSE : {}\n".format(mean_squared_error(Y_test, Y_pred)))

    return mean_squared_error(Y_test, Y_pred)
```

In this section, we have called **fmin()** function with the objective function, hyperparameters search space, and **TPE** algorithm for search. We have instructed the method to try 10 different trials of the objective function. We have also created **Trials** instance for tracking stats of trials.

We can notice from the output that it prints all hyperparameters combinations tried and their **MSE** as well.

In [21]:

```
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=10)
```

Below we have printed the best results of the above experiment. Please note that for hyperparameters with a fixed set of values, **fmin()** returns the index of the chosen value in the list, not the value itself. It returned index **0** for the **fit_intercept** hyperparameter, which points to **True** if you check the search space section above. In the same way, the index returned for the **solver** hyperparameter is **2**, which points to **lsqr**.

In [22]:

```
best_results
```

Out[22]:

In [23]:

```
best_trial = trials_obj.best_trial
best_trial
```

Out[23]:

In [24]:

```
print("Best MSE : {}".format(trials_obj.average_best_error()))
```

In this section, we have created **Ridge** model again with the best hyperparameters combination that we got using **hyperopt**. We have then trained the model on train data and evaluated it for **MSE** on both train and test data.

Please make a **NOTE** that we can save the trained models during the hyperparameters optimization process if training takes a lot of time and we don't want to repeat it. We can include logic inside the objective function to save each model that is tried, so that we can later reuse the one which gave the best results by simply loading its weights.

In [25]:

```
alpha = best_results["alpha"]
fit_intercept = intercepts[best_results["fit_intercept"]]
solver = solvers[best_results["solver"]]

ridge = Ridge(alpha=alpha,
              fit_intercept=fit_intercept,
              solver=solver,
              random_state=123)

ridge.fit(X_train, Y_train)
Y_pred = ridge.predict(X_test)

print("Test  MSE : {}".format(mean_squared_error(Y_test, Y_pred)))
print("Train MSE : {}".format(mean_squared_error(Y_train, ridge.predict(X_train))))
```

In this section, we'll again explain how to use **hyperopt** with scikit-learn, but this time for a classification problem. We'll also explain how to create a more complicated search space through this example. We'll be using the wine dataset available from scikit-learn. It has measurements of ingredients used in the creation of three different types of wine. The ingredient measurements are the features of our dataset and the wine type is the target variable.

Below we have loaded the wine dataset from scikit-learn and divided it into the train (80%) and test (20%) sets.

In [26]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

Out[26]:

We'll be using the **LogisticRegression** estimator for our problem, hence we'll declare a search space that tries different values of its hyperparameters. We'll try to find the best values of the four hyperparameters below, which give the best accuracy on our dataset.

- **fit_intercept**
- **C**
- **penalty**
- **solver**

The search space for this example is a little bit involved because some **solver** options of **LogisticRegression** do not support all available **penalty** values. The **saga** solver supports the **l1**, **l2**, and **elasticnet** penalties. The **liblinear** solver supports the **l1** and **l2** penalties. The **newton-cg** and **lbfgs** solvers support only the **l2** penalty (or **none** for no regularization).

As we want to try all available solvers and avoid failures due to penalty mismatches, we have created three different cases based on valid combinations. The hyperparameters **fit_intercept** and **C** are the same for all three cases, hence our final search space consists of three key-value pairs (**C**, **fit_intercept**, and **cases**). The **cases** entry nests the valid **solver** and **penalty** combinations. We have declared **C** using the **hp.uniform()** method because it's a continuous hyperparameter, and we want to try values in the range **[1,5]** for it. All other hyperparameters are declared using the **hp.choice()** method as they are categorical.

In [27]:

```
intercepts = [True, False]

solvers1 = ["newton-cg", "lbfgs", ]
solvers2 = ["liblinear", ]
solvers3 = ["saga", ]

penalties1 = ["l2", "none"]
penalties2 = ["l1", "l2"]
penalties3 = ["l1", "elasticnet", "l2"]

case1 = {
    "penalty1": hp.choice("penalty1", penalties1),
    "solver1": hp.choice("solver1", solvers1)
}

case2 = {
    "penalty2": hp.choice("penalty2", penalties2),
    "solver2": hp.choice("solver2", solvers2)
}

case3 = {
    "penalty3": hp.choice("penalty3", penalties3),
    "solver3": hp.choice("solver3", solvers3)
}

search_space = {
    "C": hp.uniform("C", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "cases": hp.choice("cases", [("case1", case1), ("case2", case2), ("case3", case3)])
}
```

The objective function starts by retrieving the values of the different hyperparameters. It uses conditional logic to retrieve the values of **penalty** and **solver**, decided based on the case. We then create a **LogisticRegression** model using the received hyperparameter values and train it on the training dataset. We print the hyperparameter combination that was tried and the accuracy of the model on the test dataset. Finally, the objective function returns the accuracy multiplied by **-1**. The reason for multiplying by **-1** is that the optimization process minimizes the value returned by the objective function; by negating the accuracy, maximizing accuracy becomes equivalent to minimizing the returned value.

In [28]:

```
from sklearn.linear_model import LogisticRegression

def objective(args):
    C = args["C"]
    fit_intercept = args["fit_intercept"]
    kwds = args["cases"]
    penalty = kwds[1]["penalty1"] if kwds[0] == "case1" else kwds[1]["penalty2"] if kwds[0] == "case2" else kwds[1]["penalty3"]
    solver = kwds[1]["solver1"] if kwds[0] == "case1" else kwds[1]["solver2"] if kwds[0] == "case2" else kwds[1]["solver3"]

    log_reg = LogisticRegression(C=C,
                                 fit_intercept=fit_intercept,
                                 penalty=penalty,
                                 solver=solver,
                                 l1_ratio=0.5,
                                 random_state=123)
    log_reg.fit(X_train, Y_train)

    print("Hyperparameters : {}".format(args)) ## This can be commented if not needed.
    print("Accuracy : {}\n".format(log_reg.score(X_test, Y_test))) ## This can be commented if not needed.

    return -1 * log_reg.score(X_test, Y_test)
```

Below we have called **fmin()** function with objective function and search space declared earlier. We have instructed it to try 20 different combinations of hyperparameters on the objective function. We have also created **Trials** instance for tracking stats of the optimization process. We have used **TPE** algorithm for the hyperparameters optimization process.

In [29]:

```
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=20)
```

In this section, we have printed the results of the optimization process: the best hyperparameter settings and the accuracy of the model. We have multiplied the value returned by the **average_best_error()** method by **-1** to recover the accuracy.

We have then constructed an exact dictionary of hyperparameters that gave the best accuracy.

In [30]:

```
print("Best Hyperparameters Settings : {}".format(best_results))
print("\nBest Accuracy : {}".format(-1 * trials_obj.average_best_error()))
```

In [31]:

```
C = best_results["C"]
fit_intercept = intercepts[best_results["fit_intercept"]]

if best_results["cases"] == 0:
    penalty = penalties1[best_results["penalty1"]]
    solver = solvers1[best_results["solver1"]]
elif best_results["cases"] == 1:
    penalty = penalties2[best_results["penalty2"]]
    solver = solvers2[best_results["solver2"]]
elif best_results["cases"] == 2:
    penalty = penalties3[best_results["penalty3"]]
    solver = solvers3[best_results["solver3"]]

print("Best Hyperparameters Settings : {}".format({"C": C,
                                                   "penalty": penalty,
                                                   "fit_intercept": fit_intercept,
                                                   "solver": solver,
                                                   }))
```

In this section, we have again created **LogisticRegression** model with the best hyperparameters setting that we got through an optimization process. We have then trained it on a training dataset and evaluated accuracy on both train and test datasets for verification purposes.

In [32]:

```
log_reg = LogisticRegression(C=C,
                             penalty=penalty,
                             fit_intercept=fit_intercept,
                             solver=solver,
                             l1_ratio=0.5, ## Needed in case the best penalty is elasticnet.
                             random_state=123)

log_reg.fit(X_train, Y_train)

print("Test  Accuracy : {}".format(log_reg.score(X_test, Y_test)))
print("Train Accuracy : {}".format(log_reg.score(X_train, Y_train)))
```

This ends our small tutorial explaining how to use **hyperopt** to find the best hyperparameters settings for our ML model. Please feel free to let us know your views in the comments section.
