Updated On: Mar-27-2022 | Time Investment: ~45 mins

Keras Tuner: Hyperparameters Tuning/Optimization of Keras Models

When designing a deep learning model, there are many decisions to make, and we don't know the answers to many of them upfront. Common questions include:

  • Which activation function to use (relu, tanh, elu, or selu?)
  • How many layers to keep in the network?
  • How many units per layer?
  • Which weight and bias initialization to use (normal, lecun-normal, he-normal, etc.)?
  • Whether to use regularization?
  • Which regularization to use (l1, l2, elastic net, etc.)?
  • Whether to use bias or not?
  • What learning rate to use?
  • Which optimizer to use (SGD, Adam, Adagrad, etc.)?
  • and many more.

There are common heuristics, like relu activation with the Adam optimizer usually giving good results, but they don't hold universally. There are no 100% right answers to the above questions for any given problem. These design decisions are commonly referred to as hyperparameters, and we need to make choices for them. One solution is to try all possible combinations of these hyperparameters and see which one works best. Though this seems viable, deep learning models train on a lot of data and can take a long time to train, so grid searching through all possible combinations is rarely feasible: even a modest search over 4 activation functions, 3 unit counts for each of 3 layers, and 3 optimizers already yields 4 × 27 × 3 = 324 models to train. There are smarter algorithms, like random search, hyperband, and Bayesian optimization, which we have covered in this tutorial.

As a part of this tutorial, we'll explain how we can use the Keras Tuner library to optimize the hyperparameters of networks designed with the Python deep learning library keras. Keras Tuner provides implementations of algorithms like random search, hyperband, and Bayesian optimization for hyperparameter tuning. These algorithms find good hyperparameter settings in a small number of trials without trying all possible combinations, steering the search toward settings that are giving good results. We have explained a step-by-step guide to hyperparameter optimization with simple examples using Keras Tuner.

Below, we have listed the important sections of the tutorial to give an overview of the material covered.

Important Sections Of Tutorial

  1. Regression Example (Random Hyperparameters Search)
  2. Classification Example (Random Hyperparameters Search)
  3. Override Compile Arguments
  4. Override Existing Hyperparameters Search Space
  5. Fixing a Few Hyperparameters
  6. Hyperband Algorithm
  7. Bayesian Optimization Algorithm

Installation

  • pip install -U keras_tuner

Below, we have imported the necessary libraries and printed the versions that we have used in our tutorial.

import keras_tuner

print("Keras Tuner Version : {}".format(keras_tuner.__version__))
Keras Tuner Version : 1.1.0
from tensorflow import keras

print("Keras Version : {}".format(keras.__version__))
Keras Version : 2.6.0

1. Regression Example (Random Hyperparameters Search)

In our first example, we'll explain how we can use Keras Tuner for a regression task. Below, we have loaded the Boston housing dataset available from the datasets module of keras.

from tensorflow.keras import datasets

(X_train_reg, Y_train_reg), (X_test_reg, Y_test_reg) = datasets.boston_housing.load_data()

X_train_reg.shape, X_test_reg.shape, Y_train_reg.shape, Y_test_reg.shape
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/boston_housing.npz
57344/57026 [==============================] - 0s 0us/step
((404, 13), (102, 13), (404,), (102,))

In order to use Keras Tuner, we need to design a function that takes a single parameter as input and returns a compiled keras model. The single input parameter is an instance of HyperParameters, which holds information about the values of the various hyperparameters we want to tune. The HyperParameters instance has various methods that can be used to try different values for a particular type of hyperparameter. These methods let us provide values of different types like boolean, integer, a list of strings, etc.

In our case below, we have created a neural network of 3 dense layers. For the first two dense layers, we want to try different values of the units, use_bias, and activation hyperparameters. The last dense layer has one output unit, which will be the prediction of our network. We also want to try different optimizers to select the one that gives the best results. In order to try different values of hyperparameters, we have used various methods available from the HyperParameters class, the commonly used of which are explained below.

Important Methods of the 'HyperParameters' Object

  • Boolean(name, default=False) - This method lets us tune a boolean hyperparameter. It accepts the hyperparameter name as its first parameter, and the default value is False. It'll try both True and False values.
  • Choice(name, values, ordered=None, default=None) - This method lets us provide a list of values to choose from for a hyperparameter. The first argument is the name of the hyperparameter and the second is the list of candidate values. The ordered parameter accepts a boolean specifying whether the values have an ordering; it defaults to True for numeric values and must be False for other types.
  • Float(name, min_value, max_value, step=None, sampling=None, default=None) - This method lets us try different float values for a hyperparameter, in the range [min_value, max_value]. It has a few optional parameters, mentioned below, which can be useful.
    • step - We can specify a small float value which is the minimum distance between two sampled values. If we provide a value of 0.1, then the float values tried for the parameter will be at least 0.1 apart from one another.
    • sampling - This parameter accepts one of the below strings specifying the value sampling strategy.
      • 'linear'
      • 'log'
      • 'reverse_log'
  • Int(name, min_value, max_value, step=1, sampling=None, default=None) - This method lets us try different integer values for a given hyperparameter. It has the same parameters as the Float() method, with the only difference being that it tries integer values instead. A minimal usage sketch of all four methods follows this list.
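To make these methods concrete, here is a minimal standalone sketch of all four (our own illustration, not part of the tutorial's tuners). Note that when called outside a tuner like this, each method simply registers the hyperparameter and returns its default value (False for Boolean, the first choice for Choice, the minimum for Float and Int).

from keras_tuner import HyperParameters

hp = HyperParameters()

hp.Boolean("use_bias")                                 # tries True and False
hp.Choice("activation", ["relu", "tanh", "selu"])      # tries one of the listed strings
hp.Float("learning_rate", 1e-4, 1e-1, sampling="log")  # float in [1e-4, 1e-1], log-sampled
hp.Int("units", 16, 50, step=16)                       # tries 16, 32, and 48

print(hp.values)  # dict of currently active hyperparameter values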

In our case below, we have asked to try various unit counts for the dense layers using the Int() method, with values in the range [16,50] and a step of 16, hence it'll try the values 16, 32, and 48. We have given different names to the hyperparameters of the two layers. For the use_bias hyperparameter, we have used the Boolean() method to try boolean values. For the activation hyperparameter, we have used the Choice() method to select between the 'relu' and 'tanh' activation functions. We have also used the Choice() method to try different optimizers ('sgd', 'rmsprop', and 'adam').

After creating a model with these hyperparameters, we have compiled it and returned it from the function. This function will be used by the hyperparameter optimization algorithm: the algorithm provides one set of hyperparameter settings to the function to create a model, runs that model, and records its performance. It keeps track of the performance of the various hyperparameter settings in order to select the best one.

from tensorflow.keras.models import Sequential
from tensorflow.keras import layers
from tensorflow.keras.optimizers import SGD

def build_model(hyperparams):
    model = Sequential()
    model.add(layers.Input(shape=(X_train_reg.shape[1],)))
    model.add(layers.Dense(units=hyperparams.Int("units_l1", 16, 50, step=16),
                           use_bias=hyperparams.Boolean("bias_l1"),
                           activation=hyperparams.Choice("act_l1", ["relu", "tanh"])
                          ))
    model.add(layers.Dense(units=hyperparams.Int("units_l2", 16, 50, step=16),
                           use_bias=hyperparams.Boolean("bias_l2"),
                           activation=hyperparams.Choice("act_l2", ["relu", "tanh"])
                          ))
    model.add(layers.Dense(1))

    optim=hyperparams.Choice("optimizer",["sgd","rmsprop","adam"])
    model.compile(optim, loss="mean_squared_error", metrics=["mean_squared_error"])

    return model

Below, we have performed hyperparameter tuning using the random search algorithm. We can create an instance of the random search algorithm using the RandomSearch() constructor available from Keras Tuner. The constructor takes the below-mentioned important parameters that control how the best hyperparameters for the model are found.

  • hypermodel - This parameter accepts an instance of the HyperModel class or a callable that takes a HyperParameters instance and returns a compiled model. We created such a callable above and will provide it to this parameter. Alternatively, we can create a class that extends HyperModel and has a build() method that works exactly like our callable (takes hyperparameters and returns a compiled model), and give an instance of that class to this parameter.
  • objective - This parameter takes either a string or an Objective instance specifying the objective function that we want to optimize. Keras Tuner will search for hyperparameter settings that optimize this objective. To give an example, if we give 'accuracy' as the value of this parameter then it'll try to maximize training accuracy; if we give 'val_loss' then it'll try to minimize validation loss.
    • We can provide any string metric or loss name to this parameter, like 'val_loss', 'loss', 'mean_squared_error', etc. If we provide a string value, then whether the objective should be minimized or maximized is inferred by Keras Tuner.
    • We can also provide an instance of Objective. This is useful when we are using a custom metric or loss and Keras Tuner can't figure out by itself whether to minimize or maximize it. The Objective() constructor takes two arguments.
      • name - This is the name of the metric to monitor.
      • direction - This is a string specifying the direction of optimization: 'min' for minimizing and 'max' for maximizing.
  • max_trials - This parameter accepts an integer specifying the number of different hyperparameter settings to try.
  • seed - It's a random seed for reproducing the same results.
  • hyperparameters - This parameter is optional and accepts a HyperParameters instance that can be used to override what is already set inside the model-building function. We have explained its usage later.
  • optimizer - This parameter is optional and accepts an optimizer that overrides the one we provided when compiling the model inside the model-building function.
  • loss - This parameter is optional and accepts a loss that overrides the one we specified inside the model-building function.
  • metrics - This parameter is optional and accepts a list of metrics that overrides the ones we specified inside the model-building function.
  • directory - This parameter accepts a relative path specifying where work should be saved while the tuner tries various hyperparameter settings. By default, it's the current directory.
  • project_name - This is the name of the project. A folder with this name will be created, and all progress during the optimization process will be stored there. We need to give a different value to this parameter for each tuner if we are running more than one optimization; we have given a different name to each of our tuners.
  • overwrite - This parameter accepts a boolean specifying whether to overwrite an existing project of the same name instead of reloading it. Default is False.

In our case below, we have created a RandomSearch instance, giving it the function we designed earlier. We have asked it to minimize validation mean squared error and try 5 different combinations of hyperparameters. It'll call the function 5 times with different hyperparameter settings, creating 5 different models to try.

After creating the RandomSearch tuner instance, we need to call its search() method to actually try the different hyperparameter settings. The search() method accepts the same parameters as the fit() method of a model instance. We have given train data and validation data to the method. The call to search() will try 5 different hyperparameter settings by creating 5 different models from them. It'll run each model for 10 epochs, record all metrics, and rank the models from lowest validation mean squared error to highest.

The tuner prints the best validation mean squared error of 63.61 at the end of the tuning process.

from keras_tuner import RandomSearch
from keras_tuner import Objective

tuner1 =  RandomSearch(hypermodel=build_model,
                      objective="val_mean_squared_error",
                      #objective=Objective(name="val_mean_squared_error",direction="min"),
                      max_trials=5,
                      #seed=123,
                      project_name="Regression",
                      overwrite=True
                    )

tuner1.search(X_train_reg, Y_train_reg, batch_size=32, epochs=10, validation_data=(X_test_reg, Y_test_reg))
Trial 5 Complete [00h 00m 01s]
val_mean_squared_error: 282.96630859375

Best val_mean_squared_error So Far: 63.61709976196289
Total elapsed time: 00h 00m 08s

After the hyperparameter tuning process has completed, we can call the get_best_hyperparameters() method on the RandomSearch tuner instance. Below, we have printed the best hyperparameter combination.

best_params = tuner1.get_best_hyperparameters()

best_params[0].values
{'units_l1': 16,
 'bias_l1': True,
 'act_l1': 'relu',
 'units_l2': 32,
 'bias_l2': True,
 'act_l2': 'relu',
 'optimizer': 'adam'}

We can also retrieve the model instance that gave the best results and use it for making predictions; it comes loaded with trained parameters. We can save the best model for later use as well, as sketched after the prediction example below.

best_model = tuner1.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 16)                224
_________________________________________________________________
dense_1 (Dense)              (None, 32)                544
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33
=================================================================
Total params: 801
Trainable params: 801
Non-trainable params: 0
_________________________________________________________________
Y_test_reg_preds = best_model.predict(X_test_reg)

Y_test_reg[:5], Y_test_reg_preds[:5]
(array([ 7.2, 18.8, 19. , 27. , 22.2]),
 array([[10.577744],
        [21.05252 ],
        [26.507633],
        [17.621325],
        [24.755987]], dtype=float32))
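Following up on saving the best model mentioned above, here is a minimal sketch of persisting and restoring it with standard Keras calls; the file name 'best_regression_model' is our own choice for illustration.

from tensorflow import keras

# Persist the tuned model (TensorFlow SavedModel format when no extension is given).
best_model.save("best_regression_model")

# Later (e.g., in another session), restore it and predict as usual.
restored_model = keras.models.load_model("best_regression_model")
print(restored_model.predict(X_test_reg[:5]))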

We can print the results of the trials by calling the results_summary() method. It prints results ordered from best to worst performing models. We can provide the num_trials parameter to print only that many best entries.

In our case, we have asked it to print the 3 best-performing models. We can notice that the best-performing network has a validation mean squared error of around 63.6.

tuner1.results_summary(num_trials=3)
Results summary
Results in ./Regression
Showing 3 best trials
Objective(name='val_mean_squared_error', direction='min')
Trial summary
Hyperparameters:
units_l1: 16
bias_l1: True
act_l1: relu
units_l2: 32
bias_l2: True
act_l2: relu
optimizer: adam
Score: 63.61709976196289
Trial summary
Hyperparameters:
units_l1: 48
bias_l1: False
act_l1: relu
units_l2: 48
bias_l2: True
act_l2: tanh
optimizer: sgd
Score: 83.2453842163086
Trial summary
Hyperparameters:
units_l1: 16
bias_l1: True
act_l1: tanh
units_l2: 32
bias_l2: False
act_l2: relu
optimizer: rmsprop
Score: 282.96630859375

2. Classification Example (Random Hyperparameters Search)

As a part of our second example, we have explained how we can use the random search tuner for classification tasks. We have loaded the Fashion MNIST dataset below for this task. The dataset has grayscale images of shape (28,28) pixels of 10 different fashion items and is already divided into train (60k images) and test (10k images) sets. We'll try various convolutional neural networks on this dataset to check which one gives the best results.

import numpy as np
from tensorflow.keras import datasets

(X_train_classif, Y_train_classif), (X_test_classif, Y_test_classif) = datasets.fashion_mnist.load_data()

X_train_classif, X_test_classif = X_train_classif.reshape(-1,28,28,1), X_test_classif.reshape(-1,28,28,1)

classes = np.unique(Y_train_classif)

X_train_classif.shape, X_test_classif.shape, Y_train_classif.shape, Y_test_classif.shape
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
16384/5148 [===============================================================================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
((60000, 28, 28, 1), (10000, 28, 28, 1), (60000,), (10000,))

In the below cell, we have created a new class that extends the HyperModel class. The class has a build() method that takes a HyperParameters instance as input and returns a compiled keras model, setting various hyperparameters using the methods of that instance. We'll give an instance of this class to the RandomSearch() constructor later.

The model's first hyperparameter ('ConvNetType') is a choice between two values ('Conv1' and 'Conv2'). Based on its value, we add two ('Conv1') or three ('Conv2') convolution layers to the network. For each convolution layer, we try different output channel counts using the Int() method. For the 2-layer option ('Conv1'), both layers try 16 and 32 as output channel values. For the 3-layer option ('Conv2'), the first two convolution layers try 16 and 32 output channel values and the third tries 8 and 16.

Apart from this, we have also asked to try different values of activation ('relu' and 'tanh') and kernel initialization ('random_normal', 'lecun_normal', and 'he_normal') using the Choice() method.

Once the convolution layers are added to the network based on the 'ConvNetType' hyperparameter value, we add a dense layer that has 10 output units (the same as the number of target classes) and a softmax activation function.

Then, we have compiled the network and returned it.

This time, we have used another method of the HyperParameters class named conditional_scope(). This method creates a scope whose hyperparameters are active only for the specified values of a particular parent hyperparameter. In our case, we have used it when 'ConvNetType' has the value 'Conv1' or 'Conv2'. This can be useful when we make decisions based on many values and want a scope covering only a subset of them. To explain it with a simple example, say 'ConvNetType' had 5 values ('Conv1', 'Conv2', 'Conv3', 'Conv4', and 'Conv5') and we wanted one scope for 3 values ('Conv1', 'Conv3', and 'Conv5') and another for the remaining 2 ('Conv2' and 'Conv4'); a hypothetical sketch of this grouping is shown below.
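This is a hypothetical sketch of the 5-value grouping just described; the names 'units_a' and 'units_b' are illustrative and do not appear in the tutorial's actual model.

from keras_tuner import HyperParameters

hp = HyperParameters()
model_type = hp.Choice("ConvNetType", ["Conv1", "Conv2", "Conv3", "Conv4", "Conv5"])

if model_type in ["Conv1", "Conv3", "Conv5"]:
    with hp.conditional_scope("ConvNetType", ["Conv1", "Conv3", "Conv5"]):
        units = hp.Int("units_a", 16, 64, step=16)   # active only for these three values
else:
    with hp.conditional_scope("ConvNetType", ["Conv2", "Conv4"]):
        units = hp.Int("units_b", 32, 128, step=32)  # active only for the other two values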

from keras_tuner import HyperModel
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

class ConvNetwork(HyperModel):
    def build(self, hp):
        model = Sequential()
        model.add(layers.Input(shape=X_train_classif.shape[1:]))
        model_type = hp.Choice("ConvNetType", ["Conv1","Conv2"])

        if model_type == "Conv1":
            with hp.conditional_scope("ConvNetType", ["Conv1"]):
                activation = hp.Choice("activation", ["relu", "tanh"])
                kern_init = hp.Choice("kernel_initializer", ["random_normal", "lecun_normal","he_normal"])

                model.add(layers.Conv2D(filters=hp.Int("Conv1_1", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
                model.add(layers.Conv2D(filters=hp.Int("Conv1_2", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
        elif model_type == "Conv2":
            with hp.conditional_scope("ConvNetType", ["Conv2"]):
                activation = hp.Choice("activation", ["relu", "tanh"])
                kern_init = hp.Choice("kernel_initializer", ["random_normal", "lecun_normal","he_normal"])

                model.add(layers.Conv2D(filters=hp.Int("Conv2_1", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
                model.add(layers.Conv2D(filters=hp.Int("Conv2_2", 16, 33, step=16), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))
                model.add(layers.Conv2D(filters=hp.Int("Conv2_3", 8, 17, step=8), kernel_size=(3,3), padding="same", kernel_initializer=kern_init, activation=activation))

        model.add(layers.Flatten())
        model.add(layers.Dense(units=len(classes), activation="softmax"))

        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

        return model

In the below cell, we have created a random search tuner and run it for 5 trials. We have given it our HyperModel instance and asked it to maximize validation accuracy using an Objective instance.

We have executed the tuning process by calling the search() method, giving it train data, validation data, batch size (512), and epochs (10).

The tuner prints the best validation accuracy of 0.903 at the end of the tuning process.

from keras_tuner import RandomSearch
from keras_tuner import Objective

conv2 = ConvNetwork()
tuner2 =  RandomSearch(hypermodel=conv2,
                      objective=Objective(name="val_accuracy",direction="max"),
                      max_trials=5,
                      #seed=123,
                      project_name="Classification",
                      overwrite=True
                    )

tuner2.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
Trial 5 Complete [00h 02m 17s]
val_accuracy: 0.8992000222206116

Best val_accuracy So Far: 0.9031000137329102
Total elapsed time: 00h 17m 29s

In the below cell, we have printed the best hyperparameter settings, which gave the 0.903 accuracy.

In the next cells, we have retrieved the best model and used it to evaluate performance on the test dataset, which we had used as the validation dataset. Then, we have printed the tuning summary as well.

best_params = tuner2.get_best_hyperparameters()

best_params[0].values
{'ConvNetType': 'Conv1',
 'activation': 'relu',
 'kernel_initializer': 'random_normal',
 'Conv1_1': 16,
 'Conv1_2': 32}
best_model = tuner2.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 16)        160
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 32)        4640
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
dense (Dense)                (None, 10)                250890
=================================================================
Total params: 255,690
Trainable params: 255,690
Non-trainable params: 0
_________________________________________________________________
best_model.evaluate(X_test_classif, Y_test_classif)
313/313 [==============================] - 2s 4ms/step - loss: 0.2906 - accuracy: 0.9031
[0.29061856865882874, 0.9031000137329102]
tuner2.results_summary(num_trials=3)
Results summary
Results in ./Classification
Showing 3 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: relu
kernel_initializer: random_normal
Conv1_1: 16
Conv1_2: 32
Score: 0.9031000137329102
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: random_normal
Conv1_1: 16
Conv1_2: 16
Score: 0.902400016784668
Trial summary
Hyperparameters:
ConvNetType: Conv2
activation: relu
kernel_initializer: random_normal
Conv2_1: 16
Conv2_2: 16
Conv2_3: 8
Score: 0.8992000222206116

3. Override Compile Arguments

In this section, we have explained how we can override arguments like the optimizer, loss function, and metrics that we gave when compiling the model inside the model-creation function.

The RandomSearch tuner lets us provide optimizer, loss, and metrics arguments that override whatever we provided when compiling the model. Below, we have overridden the optimizer from Adam to RMSProp; though we have overridden loss and metrics as well, we have provided the same values again.

from keras_tuner import RandomSearch
from keras_tuner import Objective
from tensorflow.keras import metrics

conv3 = ConvNetwork()
tuner3 =  RandomSearch(hypermodel=conv3,
                       objective=Objective(name="val_accuracy",direction="max"),
                       max_trials=5,
                       optimizer="rmsprop",
                       loss="sparse_categorical_crossentropy",
                       #metrics=["accuracy", metrics.AUC(name="area_under_curve")],
                       metrics=["accuracy"],
                       #seed=123,
                       project_name="OverrideCompileArgs",
                       overwrite=True
                    )

tuner3.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
Trial 5 Complete [00h 05m 11s]
val_accuracy: 0.8883000016212463

Best val_accuracy So Far: 0.8974999785423279
Total elapsed time: 00h 19m 27s
best_params = tuner3.get_best_hyperparameters()

best_params[0].values
{'ConvNetType': 'Conv2',
 'activation': 'relu',
 'kernel_initializer': 'random_normal',
 'Conv2_1': 16,
 'Conv2_2': 16,
 'Conv2_3': 8}
best_model = tuner3.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 16)        160
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 16)        2320
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 8)         1160
_________________________________________________________________
flatten (Flatten)            (None, 6272)              0
_________________________________________________________________
dense (Dense)                (None, 10)                62730
=================================================================
Total params: 66,370
Trainable params: 66,370
Non-trainable params: 0
_________________________________________________________________
best_model.evaluate(X_test_classif, Y_test_classif)
313/313 [==============================] - 1s 4ms/step - loss: 0.3375 - accuracy: 0.8975
[0.3374863862991333, 0.8974999785423279]
tuner3.results_summary(num_trials=3)
Results summary
Results in ./OverrideCompileArgs
Showing 3 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
ConvNetType: Conv2
activation: relu
kernel_initializer: random_normal
Conv2_1: 16
Conv2_2: 16
Conv2_3: 8
Score: 0.8974999785423279
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: relu
kernel_initializer: random_normal
Conv1_1: 32
Conv1_2: 16
Score: 0.88919997215271
Trial summary
Hyperparameters:
ConvNetType: Conv2
activation: tanh
kernel_initializer: random_normal
Conv2_1: 32
Conv2_2: 32
Conv2_3: 16
Score: 0.8891000151634216

4. Override Existing Hyperparameters Search Space

As a part of this example, we have explained how we can override the existing settings of any hyperparameter by providing our own HyperParameters instance to the RandomSearch() constructor.

There can be situations when we want to override just a few hyperparameters of the model without modifying the settings inside the model-building function. In those situations, we can define our own HyperParameters instance and set on it the hyperparameters that we want to modify. Then, we provide this instance to the hyperparameters argument of the RandomSearch() constructor. This overrides the hyperparameters of the same names defined inside the function.

In our example below, we have again used the function from the regression section. We have overridden the values tried for the activation functions of both dense layers: the function by default tries relu and tanh, and our HyperParameters instance replaces those with selu and elu. We have then provided this instance to the RandomSearch tuner and performed hyperparameter tuning by calling its search() method.

Later on, in the next few cells, we have printed the best hyperparameters, the best model, and the tuning results for verification purposes. We can notice from the results that the tuner now tries the activation functions selu and elu instead of relu and tanh. This confirms that our settings work as expected.

from keras_tuner import RandomSearch
from keras_tuner import Objective
from keras_tuner import HyperParameters

hp = HyperParameters()
hp.Choice("act_l1",["selu","elu"])
hp.Choice("act_l2",["selu","elu"])

#conv4 = ConvNetwork()
tuner4 =  RandomSearch(hypermodel=build_model,
                      objective=Objective(name="val_mean_squared_error",direction="min"),
                      max_trials=5,
                      hyperparameters=hp,
                      #seed=123
                      project_name="OverrideExistingHyperparameters",
                      overwrite=True
                    )

tuner4.search(X_train_reg, Y_train_reg, batch_size=512, epochs=10, validation_data=(X_test_reg, Y_test_reg))
Trial 5 Complete [00h 00m 01s]
val_mean_squared_error: 227.8800048828125

Best val_mean_squared_error So Far: 118.93730926513672
Total elapsed time: 00h 00m 06s
best_params = tuner4.get_best_hyperparameters()

best_params[0].values
{'act_l1': 'selu',
 'act_l2': 'selu',
 'units_l1': 48,
 'bias_l1': True,
 'units_l2': 32,
 'bias_l2': True,
 'optimizer': 'rmsprop'}
best_model = tuner4.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 48)                672
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1568
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33
=================================================================
Total params: 2,273
Trainable params: 2,273
Non-trainable params: 0
_________________________________________________________________
best_model.evaluate(X_train_reg, Y_train_reg)
13/13 [==============================] - 0s 2ms/step - loss: 112.1075 - mean_squared_error: 112.1075
[112.10753631591797, 112.10753631591797]
tuner4.results_summary(num_trials=3)
Results summary
Results in ./OverrideExistingHyperparameters
Showing 3 best trials
Objective(name='val_mean_squared_error', direction='min')
Trial summary
Hyperparameters:
act_l1: selu
act_l2: selu
units_l1: 48
bias_l1: True
units_l2: 32
bias_l2: True
optimizer: rmsprop
Score: 118.93730926513672
Trial summary
Hyperparameters:
act_l1: selu
act_l2: elu
units_l1: 32
bias_l1: False
units_l2: 32
bias_l2: True
optimizer: adam
Score: 134.54824829101562
Trial summary
Hyperparameters:
act_l1: elu
act_l2: elu
units_l1: 16
bias_l1: False
units_l2: 48
bias_l2: True
optimizer: adam
Score: 223.5952606201172

5. Fixing a Few Hyperparameters

In this example, we have explained how we can fix the values of some of the hyperparameters that we are tuning. We can do this by defining our own HyperParameters instance and calling its Fixed() method to pin hyperparameter values. Then, we give this HyperParameters instance to the RandomSearch() constructor.

Below, we are again using the model-building function from the regression section. We have fixed a few hyperparameters that we don't want to tune: the activation of the first dense layer is set to relu, the number of units of the first dense layer to 32, and the optimizer to adam. These 3 hyperparameters inside the build function won't be tuned; the fixed values will be used instead. All other hyperparameters defined inside the function will still be tuned.

After fixing the hyperparameters, we have created a RandomSearch tuner with the HyperParameters instance and called its search() method to perform hyperparameter tuning.

In the next few cells after tuning, we have also printed the best hyperparameters found by the tuner, best model, and tuning summary results.

We can notice from the results that the first dense layer always uses 32 units and the relu activation, and that the adam optimizer is used for optimization. This confirms that our settings work as expected.

from keras_tuner import RandomSearch
from keras_tuner import Objective
from keras_tuner import HyperParameters

hp = HyperParameters()
hp.Fixed("act_l1","relu")
hp.Fixed("units_l1", 32)
hp.Fixed("optimizer", "adam")
#hp.Fixed("kernel_initializer", "he_normal")

#conv5 = ConvNetwork()
tuner5 =  RandomSearch(hypermodel=build_model,
                      objective=Objective(name="val_mean_squared_error",direction="min"),
                      max_trials=5,
                      hyperparameters=hp,
                      #seed=123
                      project_name="FixHyperparameters",
                      overwrite=True
                    )

tuner5.search(X_train_reg, Y_train_reg, batch_size=512, epochs=10, validation_data=(X_test_reg, Y_test_reg))
Trial 5 Complete [00h 00m 01s]
val_mean_squared_error: 626.8165893554688

Best val_mean_squared_error So Far: 249.16595458984375
Total elapsed time: 00h 00m 06s
best_params = tuner5.get_best_hyperparameters()

best_params[0].values
{'act_l1': 'relu',
 'units_l1': 32,
 'optimizer': 'adam',
 'bias_l1': True,
 'units_l2': 32,
 'bias_l2': True,
 'act_l2': 'relu'}
best_model = tuner5.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 32)                448
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1056
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 33
=================================================================
Total params: 1,537
Trainable params: 1,537
Non-trainable params: 0
_________________________________________________________________
best_model.evaluate(X_train_reg, Y_train_reg)
13/13 [==============================] - 0s 1ms/step - loss: 247.3708 - mean_squared_error: 247.3708
[247.37081909179688, 247.37081909179688]
tuner5.results_summary(num_trials=3)
Results summary
Results in ./FixHyperparameters
Showing 3 best trials
Objective(name='val_mean_squared_error', direction='min')
Trial summary
Hyperparameters:
act_l1: relu
units_l1: 32
optimizer: adam
bias_l1: True
units_l2: 32
bias_l2: True
act_l2: relu
Score: 249.16595458984375
Trial summary
Hyperparameters:
act_l1: relu
units_l1: 32
optimizer: adam
bias_l1: False
units_l2: 32
bias_l2: True
act_l2: relu
Score: 411.9037170410156
Trial summary
Hyperparameters:
act_l1: relu
units_l1: 32
optimizer: adam
bias_l1: True
units_l2: 48
bias_l2: False
act_l2: tanh
Score: 523.9325561523438

6. Hyperband Algorithm

In this section, we have performed hyperparameter optimization using the Hyperband algorithm. It is a variation of random search built on explore-exploit ideas that speeds up the search through adaptive resource allocation and early stopping. It allocates resources like iterations, data samples, and features to randomly sampled hyperparameter settings, treats the search as a stochastic bandit problem, and keeps eliminating underperforming settings. Keras Tuner provides an implementation of the Hyperband algorithm through the Hyperband() constructor. It has mostly the same parameters as random search, with a few additional ones listed below.

  • max_epochs - This parameter accepts an integer specifying the maximum number of epochs to train one model.
  • factor - This parameter accepts an integer specifying the reduction factor for the number of epochs and number of models in each bracket. Default is 3.
  • hyperband_iterations - This parameter accepts an integer specifying the number of times to iterate over the full hyperband algorithm. The default is 1. One iteration runs approximately max_epochs * (math.log(max_epochs, factor) ** 2) cumulative epochs across all trials; the small sketch after this list estimates that cost for a couple of settings.
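To get a feel for this cost, here is a quick calculation of the approximate cumulative epochs per hyperband iteration; the max_epochs values 10 and 100 are our own illustrative choices.

import math

def approx_cumulative_epochs(max_epochs, factor=3):
    # Approximation quoted above: max_epochs * (log_factor(max_epochs)) ** 2
    return max_epochs * (math.log(max_epochs, factor) ** 2)

print(approx_cumulative_epochs(10))   # ~44 cumulative epochs
print(approx_cumulative_epochs(100))  # ~1757 cumulative epochs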

In our case below, we have used the Hyperband tuner for our classification task involving a CNN. We have initialized it with the convolutional neural network and hyperband_iterations set to 1, and asked it to maximize validation accuracy.

After initializing the tuner, we have called the search() method as usual to perform the tuning process. We have printed the best hyperparameter settings as well as the best model after completion of the tuning process, along with the tuning process summary. We got a best accuracy of 0.897. Setting hyperband_iterations to a value greater than 1 might improve results further.

from keras_tuner import Hyperband
from keras_tuner import Objective

conv6 = ConvNetwork()
tuner6 =  Hyperband(hypermodel=conv6,
                   objective=Objective(name="val_accuracy",direction="max"),
                   hyperband_iterations=1,
                   #seed=123
                   project_name="Hyperband",
                   overwrite=True
                  )

tuner6.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
Trial 63 Complete [00h 00m 20s]
val_accuracy: 0.8414999842643738

Best val_accuracy So Far: 0.8968999981880188
Total elapsed time: 00h 56m 09s
best_params = tuner6.get_best_hyperparameters()

best_params[0].values
{'ConvNetType': 'Conv1',
 'activation': 'tanh',
 'kernel_initializer': 'he_normal',
 'Conv1_1': 32,
 'Conv1_2': 32,
 'tuner/epochs': 2,
 'tuner/initial_epoch': 0,
 'tuner/bracket': 4,
 'tuner/round': 0}
best_model = tuner6.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 32)        320
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 32)        9248
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
dense (Dense)                (None, 10)                250890
=================================================================
Total params: 260,458
Trainable params: 260,458
Non-trainable params: 0
_________________________________________________________________
best_model.evaluate(X_test_classif, Y_test_classif)
313/313 [==============================] - 3s 7ms/step - loss: 0.3044 - accuracy: 0.8969
[0.3044421970844269, 0.8968999981880188]
tuner6.results_summary(num_trials=3)
Results summary
Results in ./Hyperband
Showing 3 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: he_normal
Conv1_1: 32
Conv1_2: 32
tuner/epochs: 2
tuner/initial_epoch: 0
tuner/bracket: 4
tuner/round: 0
Score: 0.8968999981880188
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: random_normal
Conv1_1: 32
Conv1_2: 32
tuner/epochs: 2
tuner/initial_epoch: 0
tuner/bracket: 4
tuner/round: 0
Score: 0.8964999914169312
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: lecun_normal
Conv1_1: 32
Conv1_2: 16
tuner/epochs: 2
tuner/initial_epoch: 0
tuner/bracket: 4
tuner/round: 0
Score: 0.8964999914169312

7. Bayesian Optimization Algorithm

In this example, we have explained the Bayesian optimization tuner available from Keras Tuner. Bayesian optimization builds a probabilistic model (a Gaussian process) of how hyperparameter settings map to objective values and uses it to choose promising settings to try next. We can use this tuner via the BayesianOptimization() constructor of Keras Tuner. It has almost the same parameters as the random search tuner, with a few additional parameters listed below.

  • num_initial_points - This parameter accepts an integer specifying the number of randomly generated hyperparameter samples to evaluate before the probabilistic model starts guiding the search. The default is 2.
  • alpha - This parameter accepts a float value added to the diagonal of the kernel matrix during fitting. It represents the expected amount of noise in the observed performances. The default value is 1e-4.
  • beta - This parameter accepts a float specifying the balancing factor between exploration and exploitation. A larger value means more exploration. The default value is 2.6.

Below, we have initialized the Bayesian optimization tuner and used it to find good hyperparameter settings for our classification network (CNN). As usual, we have performed the search by calling the search() method on the tuner object.

We have printed the best hyperparameter settings and the best model after completion of the process, as well as a summary of the various settings tried.

from keras_tuner import BayesianOptimization
from keras_tuner import Objective

conv7 = ConvNetwork()
tuner7 =  BayesianOptimization(hypermodel=conv7,
                               objective=Objective(name="val_accuracy",direction="max"),
                               max_trials=10,
                               num_initial_points=2,
                               #seed=123
                               project_name="BayesianOptimization",
                               overwrite=True
                              )

tuner7.search(X_train_classif, Y_train_classif, batch_size=512, epochs=10, validation_data=(X_test_classif, Y_test_classif))
Trial 10 Complete [00h 05m 22s]
val_accuracy: 0.9064000248908997

Best val_accuracy So Far: 0.9103000164031982
Total elapsed time: 00h 37m 54s
best_params = tuner7.get_best_hyperparameters()

best_params[0].values
{'ConvNetType': 'Conv1',
 'activation': 'tanh',
 'kernel_initializer': 'random_normal',
 'Conv1_1': 32,
 'Conv1_2': 32}
best_model = tuner7.get_best_models()[0]

best_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 32)        320
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 32)        9248
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
dense (Dense)                (None, 10)                250890
=================================================================
Total params: 260,458
Trainable params: 260,458
Non-trainable params: 0
_________________________________________________________________
best_model.evaluate(X_test_classif, Y_test_classif)
313/313 [==============================] - 3s 8ms/step - loss: 0.2762 - accuracy: 0.9103
[0.27616605162620544, 0.9103000164031982]
tuner7.results_summary(num_trials=3)
Results summary
Results in ./BayesianOptimization
Showing 3 best trials
Objective(name='val_accuracy', direction='max')
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: random_normal
Conv1_1: 32
Conv1_2: 32
Score: 0.9103000164031982
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: random_normal
Conv1_1: 32
Conv1_2: 32
Score: 0.9064000248908997
Trial summary
Hyperparameters:
ConvNetType: Conv1
activation: tanh
kernel_initializer: random_normal
Conv1_1: 32
Conv1_2: 32
Score: 0.9056000113487244

This concludes our small tutorial explaining how we can use the various tuners available from Keras Tuner to find the best hyperparameters for a given model. We have covered all the hyperparameter tuning algorithms available from Keras Tuner. Please feel free to let us know your views in the comments section.
