The complexity of machine learning models is increasing steadily with the rise of deep learning and deep neural networks. Along with that complexity comes a growing number of configuration settings that control how a model is built and trained, generally referred to as hyperparameters.
Hyperparameter tuning, sometimes also called fine-tuning, is the process of finding the combination of hyperparameters for an ML/DL model that gives the best results (ideally the global optimum) in the least amount of time.
A model trained with the hyperparameter combination found through this process generally outperforms all other combinations tried: it scores best on the chosen evaluation metrics and yields the lowest value of the loss function.
An ML model can accept a wide range of hyperparameter combinations, and we don't know upfront which one will give the best results. Hence, we need to try a number of them to find the best-performing one.
The common approach until now has been to grid search through all possible combinations of hyperparameter values.
Though this approach works well for small models and datasets, it becomes increasingly time-consuming for real-world problems with billions of examples and ML models with many hyperparameters.
Python has a number of libraries for hyperparameter tuning (Optuna, Hyperopt, Scikit-Optimize, bayes_opt, etc.). Hyperopt is one such library; it lets us try different hyperparameter combinations to find the best results in less time.
Hyperopt searches for hyperparameter combinations using internal algorithms (Random Search, Tree of Parzen Estimators (TPE), Adaptive TPE) that concentrate on regions of the search space where good results were found in earlier trials.
Hyperopt also lets us run trials in parallel using MongoDB or Spark. This lets us scale the search for the best hyperparameters across multiple cores and machines, as sketched below.
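Below is a minimal sketch of the Spark-based setup. It assumes pyspark is installed and a Spark session is available, and it reuses a hypothetical objective function and search space like the ones defined later in this tutorial.

from hyperopt import fmin, tpe, SparkTrials  # SparkTrials requires pyspark to be installed

# Evaluate up to 4 trials concurrently on the Spark cluster.
spark_trials = SparkTrials(parallelism=4)

best = fmin(objective,               # hypothetical objective function
            space=search_space,      # hypothetical search space
            algo=tpe.suggest,
            max_evals=100,
            trials=spark_trials)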
As part of this tutorial, we explain how to use the Python library hyperopt for hyperparameter tuning, which can improve the performance of ML models. The tutorial provides a simple guide to using "hyperopt" with scikit-learn models to keep things easy to understand.
The tutorial starts by optimizing the parameter of a simple line formula to familiarize readers with the "hyperopt" library. It then explains how to use "hyperopt" with scikit-learn regression and classification models. The general steps for using "hyperopt" are listed at the beginning.
Below we have listed important sections of the tutorial to give an overview of the material covered.
NOTE: If you are in a hurry, you can skip the first section, where we explain the usage of "hyperopt" with a simple line formula. That section has many definitions that you can refer back to later. The sections are largely independent, so you can go through any of them directly.
We'll start our tutorial by importing the necessary Python libraries.
import hyperopt
import warnings
warnings.filterwarnings("ignore")
print("Hyperopt Version : {}".format(hyperopt.__version__))
The first two steps can be performed in any order.
Now, we'll explain how to perform these steps using the Hyperopt API.
In this section, we'll explain how to use hyperopt to minimize a simple line formula. We'll try to find the value of x at which the line equation 5x - 21 evaluates to zero.
We could easily calculate that value analytically by setting the equation to zero, but here we want hyperopt to try different values of x and find the one at which the equation evaluates to zero.
This simple example will help us understand how to use hyperopt. We'll then explain its usage with scikit-learn models in the examples that follow.
NOTE: Please feel free to skip this section if you are in a hurry and want to jump straight to using "hyperopt" with ML models. It explains the usage of "hyperopt" with a simple line formula and contains most of the theory. You can come back to it whenever a concept in the later sections is unclear; if you have enough time, going through it first will prepare you well.
The first step is to define an objective function that returns the loss or metric we want to minimize. Hyperopt will feed different hyperparameter values to this function and record the value returned after each evaluation. That value helps it decide which hyperparameter values to try next.
Below we have defined an objective function with a single parameter x. It returns the value obtained by evaluating the line formula 5x - 21.
We have wrapped the line formula in Python's abs() function so that it always returns a value >= 0. This way we can be sure that the minimum possible metric value is 0. Without abs(), ever more negative values of x would keep decreasing the metric toward negative infinity.
def objective(x):
    return abs(5*x - 21)
The second step is to define a search space for the hyperparameters. The search space specifies the names of the hyperparameters and the ranges of values that we want the objective function to be evaluated with.
In this simple example, we have only one hyperparameter named x whose different values will be given to the objective function in order to minimize the line formula.
Hyperopt requires us to declare the search space using functions it provides. It has a module named 'hp' that offers a number of methods for declaring search spaces over continuous (integer & float) and categorical variables.
Below we have listed a few methods and their definitions that we'll be using as part of this tutorial.
There are other methods in the hp module, like lognormal(), loguniform(), pchoice(), etc., which can be used for sampling log-scaled and probability-weighted values. They are not covered in this tutorial to keep it simple. A short sketch of the commonly used declarations follows.
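Below is a small, purely illustrative sketch of how these declarations typically look. The hyperparameter names (learning_rate, n_estimators, max_depth, booster, gamma) are hypothetical and not tied to any particular model.

from hyperopt import hp

example_space = {
    "learning_rate": hp.uniform("learning_rate", 0.001, 0.1),  # float sampled uniformly from [0.001, 0.1]
    "n_estimators": hp.quniform("n_estimators", 10, 500, 10),  # uniform value rounded to a multiple of 10
    "max_depth": hp.randint("max_depth", 20),                  # integer in [0, 20)
    "booster": hp.choice("booster", ["gbtree", "dart"]),       # one value from a fixed list
    "gamma": hp.normal("gamma", 0, 1),                         # float drawn from a normal distribution
}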
As we have only one hyperparameter for our line formula, we have declared a search space that tries different values of it using the uniform() function with range [-10, 10]. In our upcoming examples, we'll explain how to create a search space with multiple hyperparameters.
from hyperopt import hp
search_space = hp.uniform("x", -10, 10)
search_space
Our last step is to use an algorithm that picks hyperparameter values from the search space and evaluates the objective function with them, trying to minimize the value it returns.
Hyperopt provides a function named 'fmin()' for this purpose. We need to give it the objective function, the search space, and the algorithm that tries different hyperparameter combinations.
It then uses this algorithm to minimize the value returned by the objective function over the search space in as few evaluations as possible. It concentrates on regions where the objective function has returned low values so far and explores hyperparameter values in those regions.
Below we have executed fmin() with our objective function, the search space declared earlier, and the TPE algorithm. We have instructed it to try 100 different values of hyperparameter x using the max_evals parameter.
The function returns a dictionary of the best result, i.e. the hyperparameter values that gave the lowest value of the objective function.
The TPE algorithm tries different values of hyperparameter x in the range [-10, 10], evaluating the line formula each time. It looks for regions of the range where the objective values are low and tries more values near them to find the best result.
best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100)
Below we have printed the best hyperparameter value, i.e. the one that returned the minimum value from the objective function.
We have then evaluated the line formula using that hyperparameter value.
We can notice from the result that it has done a good job of finding a value of x that minimizes the line formula 5x - 21, though it's not exact. Trying more than 100 trials might improve the result further.
best_results
obj_func_res = abs(5*best_results["x"] - 21)
print("Value of Function 5x-21 at best value is : {}".format(obj_func_res))
When we executed the 'fmin()' function earlier, it tried different values of parameter x on the objective function. After trying 100 different values, it returned the value of x for which the objective function returned the lowest value.
Though the function tried 100 different values, we have no information about which values were tried, what the objective values were during each trial, and so on. In short, we have no statistics about the individual trials.
NOTE: Each individual hyperparameter combination given to the objective function counts as one trial.
Hyperopt lets us record statistics of our optimization process using a Trials instance. It records the hyperparameter values tried, the objective function value for each trial, trial timings, trial state (success/failure), etc. We just need to create a Trials instance and pass it to the trials parameter of fmin(), and it will record the stats of the optimization process.
Below we have created a Trials instance and called fmin() again with it, running another 100 trials on the objective function.
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             max_evals=100,
                             trials=trials_obj)
The Trials instance has a list of attributes and methods which can be explored to get an idea about individual trials.
The Trials instance has an attribute named trials that holds a list of dictionaries, each containing the stats of one trial of the objective function. Below we have printed the contents of the first trial. We can notice that it includes information like the trial id, loss, status, x value, datetimes, etc.
trials_obj.trials[0]
Below we have retrieved the objective function value of the first trial, available through the trials attribute of the Trials instance, and printed the loss stored in it. We have then retrieved the x value of this trial and evaluated our line formula with it to verify the loss value. We can notice that both are the same.
first_trial = trials_obj.trials[0]
print("Loss Value of First Trial : {}".format(first_trial['result']['loss']))
loss = abs(5*first_trial['misc']['vals']['x'][0] - 21)
print("Loss Value of First Trial : {}".format(loss))
The Trials object has an attribute named best_trial that returns a dictionary for the trial that gave the best result, i.e. the lowest value of the objective function (lowest loss). We have printed the details of the best trial, printed its loss, and verified the loss by plugging the best trial's x value into our line formula.
best_trial = trials_obj.best_trial
best_trial
print("Loss Value of Best Trial : {}".format(best_trial['result']['loss']))
loss = abs(5*best_trial['misc']['vals']['x'][0] - 21)
print("Loss Value of Best Trial : {}".format(loss))
In this section, we'll explain the usage of some useful attributes and methods of the Trials object.
Below we have printed the values of these attributes and methods of the Trials instance for explanation purposes.
results = trials_obj.results  ## List of result dictionaries, one per trial
print("Total Results : {}".format(len(results)))
print("Best Result : {}".format(trials_obj.average_best_error()))
print("First Few Results : ")
results[:5]

print("First Few Status : {}".format(trials_obj.statuses()[:5]))     ## State of each trial (e.g. 'ok')
print("\nFirst Few X Values : {}".format(trials_obj.vals['x'][:5]))  ## Hyperparameter values tried
print("\nFirst Few Losses : {}".format(trials_obj.losses()[:5]))     ## Objective value of each trial
print("\nFirst Few Miscs : {}".format(trials_obj.miscs[:5]))         ## Extra per-trial bookkeeping info
In this section, we'll explain how to use hyperopt with the machine learning library scikit-learn. This example should also help the reader see how hyperopt can be used with any other ML framework.
The transition from scikit-learn to any other ML framework is pretty straightforward if you follow the same steps.
We'll be using hyperopt to find optimal hyperparameters for a regression problem.
We'll be using the Boston housing dataset available from scikit-learn. It has information about houses in Boston, such as the average number of rooms, the crime rate in the area, the tax rate, etc. The target variable is the median home value in thousands of dollars. As the target variable is continuous, this is a regression problem.
Below we have loaded the Boston housing dataset into variables X and Y. Variable X holds the feature data and variable Y holds the target values. We have then divided the dataset into train (80%) and test (20%) sets. (Note that load_boston() has been removed from recent scikit-learn releases, so an older scikit-learn version is required to run this example as-is.)
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Below we have declared the hyperparameter search space for our example. We'll be using the Ridge regression estimator available from scikit-learn and will try to find the best values for three of its hyperparameters.
We have declared the search space as a dictionary. The alpha hyperparameter accepts continuous values, whereas the fit_intercept and solver hyperparameters each take one value from a fixed list.
The dictionary keys are the hyperparameter names and the values are calls to functions from the hp module discussed earlier. These functions declare which hyperparameter values will be sent to the objective function for evaluation.
intercepts = [True, False]
solvers = ["svd", "cholesky", "lsqr", "sag", "saga"]

search_space = {
    "alpha": hp.normal("alpha", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "solver": hp.choice("solver", solvers)
}
Our objective function starts by creating a Ridge estimator with the arguments it receives. We then fit the estimator on the train data and predict target values for the test data. We print the hyperparameter combination that was passed in along with the mean squared error (MSE) on the test dataset.
The objective function returns the MSE on the test data, which we want hyperopt to minimize.
We have used the mean_squared_error() function from scikit-learn's 'metrics' sub-module to evaluate the MSE. Scikit-learn provides many such evaluation metrics for common ML tasks; please refer to the scikit-learn documentation if you want to know more about them.
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def objective(args):
    ridge_reg = Ridge(**args, random_state=123)
    ridge_reg.fit(X_train, Y_train)
    Y_pred = ridge_reg.predict(X_test)
    print("Hyperparameters : {}".format(args))  ## This can be commented if not needed.
    print("MSE : {}\n".format(mean_squared_error(Y_test, Y_pred)))
    return mean_squared_error(Y_test, Y_pred)
In this section, we call fmin() with the objective function, the hyperparameter search space, and the TPE search algorithm. We have instructed it to run 10 trials of the objective function and created a Trials instance to track their stats.
We can notice from the output that it prints every hyperparameter combination tried along with its MSE.
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=10)
Below we have printed the best results of the above experiment. Please note that for hyperparameters declared with a fixed set of values (hp.choice), fmin() returns the index of the chosen value within the list rather than the value itself. A small example of converting those indices back into actual values follows the output below.
It returned index 0 for the fit_intercept hyperparameter, which points to the value True if you check the search space section above. In the same way, the index returned for the solver hyperparameter is 2, which points to lsqr.
best_results
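If mapping indices back to values by hand feels error-prone, hyperopt provides a space_eval() helper that resolves the dictionary returned by fmin() against the search space. A minimal sketch of its use:

resolved = hyperopt.space_eval(search_space, best_results)

print("Resolved Best Hyperparameters : {}".format(resolved))  # indices replaced by actual values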
best_trial = trials_obj.best_trial
best_trial
print("Best MSE : {}".format(trials_obj.average_best_error()))
In this section, we create the Ridge model again with the best hyperparameter combination found by hyperopt. We then train it on the train data and evaluate the MSE on both the train and test data.
Please make a NOTE that we can save trained models during the hyperparameter optimization process if training takes a lot of time and we don't want to repeat it. We can include logic inside the objective function that saves every model tried so that we can later reload the one that gave the best results instead of retraining it. A small sketch of this idea is shown below.
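As a rough illustration of the idea above (not part of the tutorial's main flow), the objective could persist each fitted model with joblib, which ships with scikit-learn. The function name and file-naming scheme here are arbitrary.

import joblib

def objective_with_saving(args):
    ridge_reg = Ridge(**args, random_state=123)
    ridge_reg.fit(X_train, Y_train)
    mse = mean_squared_error(Y_test, ridge_reg.predict(X_test))
    # Save the fitted model, tagging the file with its MSE so the best one
    # can later be reloaded with joblib.load() instead of being retrained.
    joblib.dump(ridge_reg, "ridge_mse_{:.4f}.joblib".format(mse))
    return mse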
from sklearn.metrics import mean_squared_error

alpha = best_results["alpha"]
fit_intercept = intercepts[best_results["fit_intercept"]]
solver = solvers[best_results["solver"]]

ridge = Ridge(alpha=alpha,
              fit_intercept=fit_intercept,
              solver=solver,
              random_state=123)

ridge.fit(X_train, Y_train)
Y_pred = ridge.predict(X_test)

print("Test MSE : {}".format(mean_squared_error(Y_test, Y_pred)))
print("Train MSE : {}".format(mean_squared_error(Y_train, ridge.predict(X_train))))
In this section, we'll again explain how to use hyperopt with scikit-learn, but this time for a classification problem.
This example also shows how to create a more complicated search space.
We'll be using the wine dataset available from scikit-learn. It contains measurements of the ingredients used in three different types of wine. The ingredient measurements are the features of our dataset and the wine type is the target variable.
Below we have loaded the wine dataset from scikit-learn and divided it into the train (80%) and test (20%) sets.
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_wine(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
We'll be using the LogisticRegression estimator for this problem, so we'll declare a search space that tries different values of its hyperparameters. We'll try to find the values of four of its hyperparameters (C, fit_intercept, penalty, and solver) that give the best accuracy on our dataset.
The search space for this example is a little more involved because some LogisticRegression solvers do not support all available penalties. The saga solver supports the l1, l2, and elasticnet penalties; the liblinear solver supports l1 and l2; and the newton-cg and lbfgs solvers support only l2 (or no penalty at all).
As we want to try all available solvers while avoiding failures due to incompatible solver/penalty combinations, we have created three separate cases. The hyperparameters fit_intercept and C are the same in all three cases, so our final search space has three key-value pairs (C, fit_intercept, and cases), where cases itself selects one of the three solver/penalty combinations. We have declared C using hp.uniform() because it's a continuous hyperparameter, trying values in the range [1, 5]. All other hyperparameters are declared with hp.choice() because they are categorical.
intercepts = [True, False]

solvers1 = ["newton-cg", "lbfgs"]
solvers2 = ["liblinear"]
solvers3 = ["saga"]

penalties1 = ["l2", "none"]
penalties2 = ["l1", "l2"]
penalties3 = ["l1", "elasticnet", "l2"]

case1 = {
    "penalty1": hp.choice("penalty1", penalties1),
    "solver1": hp.choice("solver1", solvers1)
}

case2 = {
    "penalty2": hp.choice("penalty2", penalties2),
    "solver2": hp.choice("solver2", solvers2)
}

case3 = {
    "penalty3": hp.choice("penalty3", penalties3),
    "solver3": hp.choice("solver3", solvers3)
}

search_space = {
    "C": hp.uniform("C", 1, 5),
    "fit_intercept": hp.choice("fit_intercept", intercepts),
    "cases": hp.choice("cases", [("case1", case1), ("case2", case2), ("case3", case3)])
}
The objective function starts by retrieving the values of the different hyperparameters. It uses conditional logic to pick the penalty and solver values depending on which case was selected.
We then create a LogisticRegression model with the received hyperparameter values and train it on the training dataset. We print the hyperparameter combination that was tried and the model's accuracy on the test dataset.
Finally, the objective function returns the accuracy multiplied by -1. The reason is that fmin() minimizes the value returned by the objective function; by negating the accuracy, higher accuracy corresponds to a smaller (more negative) return value, so minimization ends up maximizing accuracy.
We could also return the cross-entropy (log) loss, commonly used for classification tasks, from the objective function. In that case, no sign flip is needed because cross-entropy loss should be minimized directly. A sketch of such an objective is shown after the code below.
from sklearn.linear_model import LogisticRegression

def objective(args):
    C = args["C"]
    fit_intercept = args["fit_intercept"]
    kwds = args["cases"]

    penalty = kwds[1]["penalty1"] if kwds[0] == "case1" else kwds[1]["penalty2"] if kwds[0] == "case2" else kwds[1]["penalty3"]
    solver = kwds[1]["solver1"] if kwds[0] == "case1" else kwds[1]["solver2"] if kwds[0] == "case2" else kwds[1]["solver3"]

    log_reg = LogisticRegression(C=C,
                                 fit_intercept=fit_intercept,
                                 penalty=penalty,
                                 solver=solver,
                                 l1_ratio=0.5,
                                 random_state=123)
    log_reg.fit(X_train, Y_train)

    print("Hyperparameters : {}".format(args))  ## This can be commented if not needed.
    print("Accuracy : {}\n".format(log_reg.score(X_test, Y_test)))  ## This can be commented if not needed.

    return -1 * log_reg.score(X_test, Y_test)  ## Multiplied by -1 to maximize accuracy
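As mentioned above, the objective could return the cross-entropy (log) loss instead of negated accuracy. Below is a rough sketch of such a variant; the function name objective_logloss is arbitrary, and the hyperparameter handling simply mirrors the objective above.

from sklearn.metrics import log_loss

def objective_logloss(args):
    C, fit_intercept = args["C"], args["fit_intercept"]
    case, params = args["cases"]
    idx = case[-1]  # "1", "2" or "3", matching the suffix of the penalty/solver keys

    log_reg = LogisticRegression(C=C,
                                 fit_intercept=fit_intercept,
                                 penalty=params["penalty" + idx],
                                 solver=params["solver" + idx],
                                 l1_ratio=0.5,
                                 random_state=123)
    log_reg.fit(X_train, Y_train)
    # Cross-entropy loss is minimized directly, so no multiplication by -1 is needed.
    return log_loss(Y_test, log_reg.predict_proba(X_test))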
Below we have called fmin() with the objective function and the search space declared earlier, using the TPE algorithm. We have instructed it to try 20 different hyperparameter combinations and created a Trials instance to track the stats of the optimization process.
trials_obj = hyperopt.Trials()

best_results = hyperopt.fmin(objective,
                             space=search_space,
                             algo=hyperopt.tpe.suggest,
                             trials=trials_obj,
                             max_evals=20)
In this section, we print the results of the optimization process: the best hyperparameter settings and the model's accuracy. We have multiplied the value returned by average_best_error() by -1 to recover the accuracy.
We have then reconstructed the exact dictionary of hyperparameter values that gave the best accuracy.
print("Best Hyperparameters Settings : {}".format(best_results))
print("\nBest Accuracy : {}".format(-1 * trials_obj.average_best_error()))
C = best_results["C"]
fit_intercept = intercepts[best_results["fit_intercept"]]
if best_results["cases"] == 0:
penalty = penalties1[best_results["penalty1"]]
solver = solvers1[best_results["solver1"]]
elif best_results["cases"] == 1:
penalty = penalties2[best_results["penalty2"]]
solver = solvers2[best_results["solver2"]]
elif best_results["cases"] == 2:
penalty = penalties3[best_results["penalty3"]]
solver = solvers3[best_results["solver3"]]
print("Best Hyperparameters Settings : {}".format({"C":C,
"penalty": penalty,
"fit_intercept": fit_intercept,
"solver":solver,
}))
In this section, we again create the LogisticRegression model with the best hyperparameter settings obtained from the optimization process. We then train it on the training dataset and evaluate the accuracy on both the train and test datasets for verification.
log_reg = LogisticRegression(C=C,
                             penalty=penalty,
                             fit_intercept=fit_intercept,
                             solver=solver,
                             random_state=123)

log_reg.fit(X_train, Y_train)

print("Test Accuracy : {}".format(log_reg.score(X_test, Y_test)))
print("Train Accuracy : {}".format(log_reg.score(X_train, Y_train)))
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.
If you want to