Updated On : Oct-15,2021 Tags scikeras, scikit-learn, keras
Scikeras : Give Scikit-Learn like API to your Keras Neural Networks

Scikit-learn is one of the most widely used machine learning libraries in the Python community. It implements the majority of classical machine learning algorithms behind one consistent API. That simplicity is a main reason many developers prefer it: we can train a model with one method call, make predictions with another, and evaluate the model on a dataset with a third. Because the learning curve is so gentle, scikit-learn has become a widely accepted ML library.

But many modern ML problems (object detection, image classification, speech recognition, etc.) are too complicated to be solved well with the simple ML algorithms available in scikit-learn. They call for neural networks such as convolutional neural networks, recurrent neural networks, etc. One famous library for creating such networks is keras. Just as scikit-learn provides an easy API for the machine learning domain, keras provides an easy-to-use API for the deep learning domain, which is why many developers worldwide favor it for creating deep neural networks, though it takes a little learning to get things right.

As a part of this tutorial, we are going to introduce a library named scikeras which lets us use keras deep neural networks through a simple API like that of scikit-learn. Scikeras lets us wrap our keras models in classes available from scikeras. We can then use these wrapped instances like scikit-learn ML model instances and call methods like fit(), predict() and score() on them. In short, scikeras lets us use keras models as if they were scikit-learn models. We'll explain the API of scikeras with simple examples using toy datasets available from scikit-learn.

Below we have highlighted important sections of the tutorial to give an overview of the material that we'll be covering.

Important Sections of Tutorial

  1. Regression
    • Load Dataset
    • Create Keras Model
    • Wrap Keras Neural Network into Scikeras KerasRegressor
    • Train Wrapped Model
    • Make Predictions
    • Evaluate Model Performance on Test Data
    • Explore Training History
  2. Classification
    • Load Dataset
    • Create Keras Classifier Neural Network
    • Wrap Keras Model into Scikeras KerasClassifier
    • Train Model
    • Make Predictions
    • Evaluate Model Performance
    • Analyze Training History
  3. Warm Start
  4. Machine Learning Pipeline
  5. Grid Search
  6. ML Pipeline + Grid Search
  7. Saving and Loading Model

Below we have imported the necessary libraries that we'll be using in this tutorial and printed the versions of each of them.

In [1]:
import sklearn

print("Scikit-Learn Version : {}".format(sklearn.__version__))
Scikit-Learn Version : 0.24.2
In [2]:
import tensorflow
from tensorflow import keras

print("Tensorflow Version : {}".format(tensorflow.__version__))
Tensorflow Version : 2.6.0
In [3]:
import scikeras

print("Scikeras Version : {}".format(scikeras.__version__))
Scikeras Version : 0.4.1

1. Regression

In this section, we'll explain how we can solve a simple regression problem with a keras neural net by wrapping it in scikeras, so that we can train and evaluate it like a scikit-learn estimator. We'll be creating a very simple neural network for explanation purposes. The dataset used for the example is the Boston housing toy dataset available from scikit-learn.

Load Dataset

We'll start by loading the Boston housing dataset available from scikit-learn. It has information about houses in Boston like the average number of rooms per dwelling, the crime rate in the area, tax rate, etc. The target variable of the dataset is the median value of homes in thousands of dollars. As the target variable is continuous, this will be a regression problem. Please note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so the code below needs an older release such as the 0.24.2 used in this tutorial.

We have also divided the dataset into train (80%) and test (20%) sets.

In [4]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[4]:
((404, 13), (102, 13), (404,), (102,))

Create Keras Model

In this section, we have created a simple keras neural network that will be used for our regression problem. The network has an input layer whose shape matches the number of features in the data, which is 13 in our case. The second layer has 26 units, the third layer has 52 units, and the final layer has 1 unit because it outputs a single predicted value. The activation for the second and third layers is relu.

Please note that we have not covered the technical details of model creation, as we expect the reader to have some background in creating simple neural networks with keras.

In [5]:
from tensorflow import keras
from tensorflow.keras import models

neural_regressor = models.Sequential(
    [
        keras.layers.Dense(26, activation="relu", input_shape=(X_train.shape[1],)),
        keras.layers.Dense(52, activation="relu"),
        keras.layers.Dense(1)
    ]
)

neural_regressor.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 26)                364
_________________________________________________________________
dense_1 (Dense)              (None, 52)                1404
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 53
=================================================================
Total params: 1,821
Trainable params: 1,821
Non-trainable params: 0
_________________________________________________________________
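
The parameter counts in the summary above can be checked by hand. A Dense layer holds one weight per input-output pair plus one bias per unit. The helper below (dense_params is a name chosen for this sketch, not a keras function) reproduces the numbers in plain Python:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Layer widths of the regression network: 13 input features -> 26 -> 52 -> 1.
layer_sizes = [13, 26, 52, 1]

params_per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [364, 1404, 53]
print(total_params)      # 1821
```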

Wrap Keras Neural Network into Scikeras KerasRegressor

In this section, we have explained how we can wrap the keras neural network into a scikeras model so that it can be used like scikit-learn models. We'll be wrapping our network in the KerasRegressor class from scikeras, which provides an API for regression tasks. Below we have highlighted the definition of the KerasRegressor class.


  • KerasRegressor(model, optimizer="rmsprop", loss=None, metrics=None, epochs=1, batch_size=None, callbacks=None, validation_split=0.0, verbose=1, warm_start=False, random_state=None)
    • This class constructor takes a keras neural network as input and returns an instance of KerasRegressor which behaves like a regression estimator from scikit-learn. We can call methods like fit(), predict(), and score() on an instance of KerasRegressor.
    • The model parameter takes an instance of keras.Model, or a function that returns one.
    • The optimizer parameter takes an optimizer name specified as a string. It can also accept instances of optimizers created from the keras.optimizers module. The default value is 'rmsprop'.
    • The loss parameter takes a loss function name specified as a string. It can also accept loss instances or loss functions from the keras.losses module.
    • The metrics parameter accepts a list of strings specifying metrics that need to be evaluated on each epoch.
    • The epochs parameter accepts an integer specifying the number of passes through the train data.
    • The batch_size parameter specifies the batch size to use during training.
    • The callbacks parameter accepts a list of callbacks that are executed at various stages of training like the start/end of an epoch, the start/end of a batch, etc.
    • The validation_split parameter accepts a float in the range 0-1. The training dataset is divided into train and validation sets according to this proportion.
    • The warm_start parameter accepts a boolean specifying whether to perform a warm start or not. It is set to False by default, which reinitializes the neural network weights each time we call the fit() method on KerasRegressor. If we set it to True, the weights are not reinitialized and training continues from the weights left by the last call to fit().

Below we have wrapped our keras neural network inside of the KerasRegressor class. We have asked it to use adam as the optimizer and mean squared error as the loss, with a batch size of 8 and 100 training epochs. We have set the verbose parameter to 0 to silence the output, as we don't want to flood it with messages for each epoch.

In [6]:
from scikeras.wrappers import KerasRegressor

scikeras_regressor = KerasRegressor(model=neural_regressor,
                                    optimizer="adam",
                                    loss=keras.losses.mean_squared_error,
                                    batch_size=8,
                                    epochs=100,
                                    verbose=0
                                  )

Train Wrapped Model

Now, we have simply trained our KerasRegressor using fit() method by giving it train features and target values.

In [9]:
scikeras_regressor.fit(X_train, Y_train);

Make Predictions

In this section, we have made predictions on test data using predict() method of KerasRegressor instance.

In [211]:
Y_preds = scikeras_regressor.predict(X_test)

Y_preds[:5]
Out[211]:
array([11.073608, 26.462534, 41.206894, 16.121946, 31.896925],
      dtype=float32)

Evaluate Model Performance on Test Data

At last, we have calculated the mean squared error and R^2 score on both train and test datasets to evaluate the performance of our neural network. The results (a test R^2 of about 0.76) suggest that it has done a reasonably good job at the task.

The score() method calculates the R^2 score for regression tasks by default.

If you are interested in learning about model evaluation metrics using scikit-learn then please feel free to check our tutorial on the same which explains the topic with simple and easy-to-understand examples.

In [212]:
from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, scikeras_regressor.predict(X_train))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, scikeras_regressor.predict(X_test))))

print("\nTrain R^2 : {}".format(scikeras_regressor.score(X_train, Y_train)))
print("Test  R^2 : {}".format(scikeras_regressor.score(X_test, Y_test)))
Train MSE : 14.433350680752365
Test  MSE : 20.05363882780375

Train R^2 : 0.829636176473454
Test  R^2 : 0.7576182303311937
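
For reference, the two metrics printed above are closely related: MSE is the mean of the squared residuals, and R^2 compares the residual sum of squares with that of always predicting the mean target. Below is a minimal sketch with made-up numbers (the values are illustrative only and are not taken from the Boston data; the helper names are chosen for this sketch).

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, the coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

mse = mean_squared_error_manual(y_true, y_pred)
r2 = r2_score_manual(y_true, y_pred)
print(mse, r2)
```

A perfect model gives an R^2 of 1, and a model that is no better than always predicting the mean gives 0.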

Explore Training History

The history object is available through the history_ attribute of the KerasRegressor instance. We can use it to access loss and metric values for both train and validation sets. Those values can later be used for plotting purposes as well.

Any metrics that we set through the metrics parameter of KerasRegressor will also have entries here for each epoch. As we have not set any metrics or a validation split, only the training loss is recorded below.

In [213]:
scikeras_regressor.history_.keys()
Out[213]:
dict_keys(['loss'])
In [214]:
scikeras_regressor.history_["loss"][-5:]
Out[214]:
[16.59892463684082,
 16.046966552734375,
 21.629776000976562,
 17.32277488708496,
 16.83257484436035]

2. Classification

In this section, we'll explain how we can solve a simple classification problem with a keras neural net by wrapping it in scikeras, so that we can train and evaluate it like a scikit-learn estimator. We'll be creating a very simple neural network for explanation purposes. The dataset used for the example is the wine classification toy dataset available from scikit-learn.

Load Dataset

In this section, we have loaded the wine dataset available from scikit-learn. The wine dataset holds the results of a chemical analysis of wines from three different cultivars. The measured quantities are the features of our dataset and the wine type is the target variable.

After loading, we have divided the dataset into train (80%) and test (20%) sets.

In [10]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_wine(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[10]:
((142, 13), (36, 13), (142,), (36,))

Create Keras Classifier Neural Network

In this section, we have created a simple classification neural network which we'll use to solve our classification task. The input layer of the network has shape 13, the same as the number of features. The second layer has 13 units, the third layer has 26 units, and the final layer has 3 units (the same as the number of wine classes). The second and third layers use relu as their activation function, and the final layer uses softmax.

In [11]:
from tensorflow import keras
from tensorflow.keras import models

neural_classifier = models.Sequential(
    [
        keras.layers.Dense(13, activation="relu", input_shape=(X_train.shape[1],)),
        keras.layers.Dense(26, activation="relu"),
        keras.layers.Dense(3, activation="softmax")
    ]
)

neural_classifier.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_3 (Dense)              (None, 13)                182
_________________________________________________________________
dense_4 (Dense)              (None, 26)                364
_________________________________________________________________
dense_5 (Dense)              (None, 3)                 81
=================================================================
Total params: 627
Trainable params: 627
Non-trainable params: 0
_________________________________________________________________

Wrap Keras Model into Scikeras KerasClassifier

In this section, we have wrapped the keras neural network we created in the previous step into scikeras KerasClassifier. The KerasClassifier has an API for classification tasks. Below we have highlighted the definition of it which is almost the same as that of KerasRegressor.


  • KerasClassifier(model, optimizer="rmsprop", loss=None, metrics=None, epochs=1, batch_size=None, callbacks=None, validation_split=0.0, class_weight=None, verbose=1, warm_start=False, random_state=None)
    • This class constructor takes a keras neural network as input and returns an instance of KerasClassifier which behaves like a classification estimator from scikit-learn. We can call methods like fit(), predict(), score() and predict_proba() on an instance of KerasClassifier.
    • The model parameter takes an instance of keras.Model, or a function that returns one.
    • The optimizer parameter takes an optimizer name specified as a string. It can also accept instances of optimizers created from the keras.optimizers module. The default value is 'rmsprop'.
    • The loss parameter takes a loss function name specified as a string. It can also accept loss instances or loss functions from the keras.losses module.
    • The metrics parameter accepts a list of strings specifying metrics that need to be evaluated on each epoch.
    • The epochs parameter accepts an integer specifying the number of passes through the train data.
    • The batch_size parameter specifies the batch size to use during training.
    • The callbacks parameter accepts a list of callbacks that are executed at various stages of training like the start/end of an epoch, the start/end of a batch, etc.
    • The validation_split parameter accepts a float in the range 0-1. The training dataset is divided into train and validation sets according to this proportion.
    • The warm_start parameter accepts a boolean specifying whether to perform a warm start or not. It is set to False by default, which reinitializes the neural network weights each time we call the fit() method on KerasClassifier. If we set it to True, the weights are not reinitialized and training continues from the weights left by the last call to fit().

Below we have wrapped our keras classifier in an instance of KerasClassifier. We have asked it to use the adam optimizer and categorical cross entropy as the loss. We have set epochs to 100 so that training makes 100 passes through the data. We have set validation_split to 0.1, which instructs the model to hold out 10% of the training data for validation.

In [12]:
from scikeras.wrappers import KerasClassifier

scikeras_classifier = KerasClassifier(model=neural_classifier,
                                      optimizer="adam",
                                      loss=keras.losses.categorical_crossentropy,
                                      batch_size=8,
                                      epochs=100,
                                      verbose=0,
                                      validation_split=0.1
                                      )
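
To get a feel for what validation_split=0.1 means here: keras holds out the last fraction of the samples passed to fit() (before any shuffling) and trains on the rest. The sketch below estimates the resulting sizes for our 142 training samples. The floor-based rounding is an assumption about keras internals, so please treat the exact counts as approximate.

```python
import math

n_samples = 142          # rows in X_train for the wine dataset
validation_split = 0.1   # value passed to KerasClassifier
batch_size = 8

# Assumed rounding: the training part is floor(n * (1 - split)).
n_train = math.floor(n_samples * (1 - validation_split))
n_val = n_samples - n_train

# Number of batches (weight updates) per epoch on the training part.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 127 15 16
```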

Train Model

In this section, we have performed actual training by calling fit() method on an instance of KerasClassifier.

In [13]:
scikeras_classifier.fit(X_train, Y_train);

Make Predictions

In this section, we have made predictions on test data using the predict() method. We can also get the predicted class probabilities by calling the predict_proba() method.

In [14]:
Y_preds = scikeras_classifier.predict(X_test)
Y_probs = scikeras_classifier.predict_proba(X_test)

Y_preds[:5], Y_probs[:5]
Out[14]:
(array([1, 0, 1, 2, 2]),
 array([[0.00256473, 0.97579134, 0.0216439 ],
        [0.954344  , 0.01311998, 0.03253599],
        [0.0036768 , 0.9303333 , 0.06598986],
        [0.0094066 , 0.1567529 , 0.8338405 ],
        [0.00464583, 0.28272545, 0.7126287 ]], dtype=float32))
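
The two outputs above are consistent with each other: for this dataset, the predicted class is the index of the largest probability in each row. Below is a small check in plain Python using the five probability rows printed above. It relies on the wine labels being 0, 1 and 2, so that the column index equals the class label.

```python
# Probabilities copied from the predict_proba() output above.
probs = [
    [0.00256473, 0.97579134, 0.0216439],
    [0.954344, 0.01311998, 0.03253599],
    [0.0036768, 0.9303333, 0.06598986],
    [0.0094066, 0.1567529, 0.8338405],
    [0.00464583, 0.28272545, 0.7126287],
]

# Index of the largest probability in each row.
predicted = [row.index(max(row)) for row in probs]

print(predicted)  # [1, 0, 1, 2, 2]
```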

Evaluate Model Performance

In this section, we have evaluated model performance by calculating accuracy on test and train datasets using the score() method, which returns accuracy for classification models.

In [15]:
print("Test  Accuracy : {:.2f}".format(scikeras_classifier.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(scikeras_classifier.score(X_train, Y_train)))
Test  Accuracy : 0.94
Train Accuracy : 0.90

Analyze Training History

Here, we have shown a few entries of training and validation losses using the history object of the model. The metrics that we have set in metrics parameter of KerasClassifier will also have entries here for each epoch.

In [16]:
scikeras_classifier.history_.keys()
Out[16]:
dict_keys(['loss', 'val_loss'])
In [17]:
scikeras_classifier.history_["loss"][-5:]
Out[17]:
[0.33565834164619446,
 0.4263419806957245,
 0.43367552757263184,
 0.34898641705513,
 0.3395718038082123]
In [18]:
scikeras_classifier.history_["val_loss"][-5:]
Out[18]:
[0.19689951837062836,
 0.3403555452823639,
 0.2308025360107422,
 0.13612313568592072,
 0.2179425209760666]

3. Warm Start

When tuning a keras model, we generally train it for a few epochs, check performance, and then train it for a few more epochs to see whether performance is improving. We repeat these trials until we reach good accuracy. Each round continues from the weights left by the previous round.

When we wrap our keras model inside of a scikeras model, by default it resets the weights of the model each time we call the fit() method, which is referred to as a cold start. If we want to call fit() more than once and keep updating the weights from the last call, we should set the parameter warm_start to True when creating the scikeras model. The model weights are then initialized only before the first call to fit(), and every subsequent call continues from the weights produced by the previous calls.

By default, warm_start parameter is set to False which will reset model weights before each call to fit(). We can change this default behavior by setting parameter warm_start to True.
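
The difference between cold and warm start can be illustrated without keras at all. Below is a minimal sketch in which a single weight is fitted by gradient descent. ToyEstimator is a hypothetical class written only for this illustration; it is not part of scikeras and only mimics the reset behavior described above.

```python
class ToyEstimator:
    """Hypothetical stand-in for a wrapped model; not part of scikeras."""

    def __init__(self, epochs=5, learning_rate=0.1, warm_start=False):
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.warm_start = warm_start
        self.weight_ = None  # set on the first call to fit()

    def fit(self, target):
        # Cold start: reinitialize on every call. Warm start: only once.
        if self.weight_ is None or not self.warm_start:
            self.weight_ = 0.0
        for _ in range(self.epochs):
            gradient = 2 * (self.weight_ - target)  # d/dw of (w - target)^2
            self.weight_ -= self.learning_rate * gradient
        return self


cold = ToyEstimator(warm_start=False)
warm = ToyEstimator(warm_start=True)

cold_errors, warm_errors = [], []
for _ in range(3):
    cold_errors.append(abs(cold.fit(10.0).weight_ - 10.0))
    warm_errors.append(abs(warm.fit(10.0).weight_ - 10.0))

print(cold_errors)  # the same error after every call
print(warm_errors)  # the error keeps shrinking
```

With a cold start, every call to fit() repeats the same 5 epochs from scratch. With a warm start, three calls behave like 15 consecutive epochs.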

Load Data, Create Neural Network and Wrap It Inside of KerasClassifier

Our code for this example starts by loading the wine dataset and divides it into train/test sets. It then creates a keras model which is the same as that of the classification section. We have then wrapped our keras model inside of KerasClassifier scikeras model. The code in this part is almost the same as the code from the classification section.

In [137]:
### Load Dataset

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_wine(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

### Define Neural Network

from scikeras.wrappers import KerasClassifier

from tensorflow import keras
from tensorflow.keras import models

neural_classifier = models.Sequential(
    [
        keras.layers.Dense(13, activation="relu", input_shape=(X_train.shape[1],)),
        keras.layers.Dense(26, activation="relu"),
        keras.layers.Dense(3, activation="softmax")
    ]
)

### Initialize Model

scikeras_classifier = KerasClassifier(model=neural_classifier,
                                    optimizer="adam",
                                    loss=keras.losses.categorical_crossentropy,
                                    batch_size=8,
                                    epochs=5,
                                    warm_start=True
                          )

Train Model

Below we are calling the fit() method for the first time with train data and the target variable. We have left verbose at its default for this example, so the training loss is displayed at the end of each epoch.

In [138]:
scikeras_classifier.fit(X_train, Y_train);
Epoch 1/5
18/18 [==============================] - 0s 2ms/step - loss: 48.3980
Epoch 2/5
18/18 [==============================] - 0s 2ms/step - loss: 9.2596
Epoch 3/5
18/18 [==============================] - 0s 3ms/step - loss: 4.1445
Epoch 4/5
18/18 [==============================] - 0s 2ms/step - loss: 1.5278
Epoch 5/5
18/18 [==============================] - 0s 1ms/step - loss: 1.1091

Below we have again called the fit() method with train data to run the training process for another 5 epochs. This call won't start with fresh model weights. Instead, it continues from the weights of the last call to fit() because we have set warm_start to True. We can see this in the displayed loss: the first epoch starts at about 0.98, below the 1.11 on which the previous call ended, rather than back at the high initial loss.

In [139]:
scikeras_classifier.fit(X_train, Y_train);
Epoch 1/5
18/18 [==============================] - 0s 2ms/step - loss: 0.9780
Epoch 2/5
18/18 [==============================] - 0s 578us/step - loss: 0.9799
Epoch 3/5
18/18 [==============================] - 0s 3ms/step - loss: 1.0604
Epoch 4/5
18/18 [==============================] - 0s 2ms/step - loss: 0.8841
Epoch 5/5
18/18 [==============================] - 0s 2ms/step - loss: 0.8895

Below we have called fit() method again to run the training process for another 5 epochs.

In [140]:
scikeras_classifier.fit(X_train, Y_train);
Epoch 1/5
18/18 [==============================] - 0s 1ms/step - loss: 0.8919
Epoch 2/5
18/18 [==============================] - 0s 2ms/step - loss: 0.7897
Epoch 3/5
18/18 [==============================] - 0s 2ms/step - loss: 0.7840
Epoch 4/5
18/18 [==============================] - 0s 1ms/step - loss: 0.9901
Epoch 5/5
18/18 [==============================] - 0s 2ms/step - loss: 0.8462

Below we have printed model accuracy on train and test datasets after completion of the training process. The accuracy is quite low because we have run the training process for only 15 epochs in total. Running it for around 100 epochs, as we did in the classification section, gives significantly better accuracy.

In [141]:
print("Test  Accuracy : {:.2f}".format(scikeras_classifier.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(scikeras_classifier.score(X_train, Y_train)))
5/5 [==============================] - 0s 3ms/step
Test  Accuracy : 0.61
18/18 [==============================] - 0s 992us/step
Train Accuracy : 0.55
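
The 18/18 and 5/5 counters in the logs above are the number of batches processed per pass over the data. Assuming the batch size of 8 is used for both training and prediction, they follow directly from the dataset sizes:

```python
import math

batch_size = 8
n_train, n_test = 142, 36  # rows in X_train and X_test

train_steps = math.ceil(n_train / batch_size)  # batches per training epoch
test_steps = math.ceil(n_test / batch_size)    # batches when predicting on test data

print(train_steps, test_steps)  # 18 5
```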

4. Machine Learning Pipeline

In this section, we'll explain how we can create a machine learning pipeline where we perform a list of steps on data before feeding it to the model. We'll explain how we can create a pipeline using scikit-learn and use our scikeras model in it. The pipeline will be simple and will have two steps only. The first step will scale the data and the second step will fit keras model to it. We'll be using the Boston housing dataset for our purpose.

If you are interested in learning about how to create a machine learning pipeline using scikit-learn then please feel free to check our tutorial on the same which tries to explain the topic with simple and easy-to-understand examples.

Load Dataset

Below we have loaded the Boston housing dataset available from scikit-learn and divided it into train/test sets. The code is exactly the same as the one from the regression section.

In [163]:
### Load Dataset

from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[163]:
((404, 13), (102, 13), (404,), (102,))

Create Keras Model and Wrap It Inside of Scikeras Model

Below we have created a simple keras model for performing regression task on our Boston housing dataset and wrapped it inside of scikeras KerasRegressor model. The code for this part is exactly the same as that of the regression section hence we have not included a detailed explanation.

In [144]:
### Define Model

from tensorflow import keras
from tensorflow.keras import models

neural_regressor = models.Sequential(
    [
        keras.layers.Dense(26, activation="relu", input_shape=(X_train.shape[1],)),
        keras.layers.Dense(52, activation="relu"),
        keras.layers.Dense(1)
    ]
)

neural_regressor.summary()

### Initiate Model

from scikeras.wrappers import KerasRegressor

scikeras_regressor = KerasRegressor(model=neural_regressor,
                                    optimizer="adam",
                                    loss=keras.losses.mean_squared_error,
                                    batch_size=8,
                                    epochs=100,
                                    verbose=0
                          )
Model: "sequential_16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_50 (Dense)             (None, 26)                364
_________________________________________________________________
dense_51 (Dense)             (None, 52)                1404
_________________________________________________________________
dense_52 (Dense)             (None, 1)                 53
=================================================================
Total params: 1,821
Trainable params: 1,821
Non-trainable params: 0
_________________________________________________________________

Create and Train ML Pipeline

In this section, we have created our machine learning pipeline using the Pipeline class of scikit-learn. It accepts a list of (name, estimator) tuples whose estimators are applied to the data in the sequence in which they are specified. We have a pipeline with two steps.

  1. Data Scaling
  2. Scikeras Regression Model

After creating the pipeline, we have trained the pipeline by calling fit() method on it giving train data and target variables to it.

If you want to learn about scaling the data for machine learning tasks then please feel free to check our tutorial on the same which covers the topic with simple and easy-to-understand examples.

In [145]:
## Create Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

ml_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", scikeras_regressor)])

ml_pipeline.fit(X_train, Y_train)
Out[145]:
Pipeline(steps=[('Normalize', RobustScaler()),
                ('Model',
                 KerasRegressor(batch_size=8, epochs=100, loss=<function mean_squared_error at 0x7fb99210f268>, model=<keras.engine.sequential.Sequential object at 0x7fb95e58fa20>, optimizer='Adam', verbose=0))])
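
For intuition about the first pipeline step: with default settings, RobustScaler subtracts each feature's median and divides by its interquartile range, both learned from the training data only. Below is a plain-Python sketch for a single feature. The numbers are made up for illustration, and fit_robust_scaler and robust_scale are helper names chosen here, not scikit-learn functions.

```python
import statistics

def fit_robust_scaler(values):
    """Learn the median and interquartile range of one feature."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1

def robust_scale(values, median, iqr):
    """Center on the median and scale by the interquartile range."""
    return [(v - median) / iqr for v in values]

train_feature = [1, 2, 3, 4, 100]   # note the outlier
test_feature = [3, 5]

median, iqr = fit_robust_scaler(train_feature)   # statistics come from train data
print(robust_scale(train_feature, median, iqr))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
print(robust_scale(test_feature, median, iqr))   # [0.0, 1.0]
```

Because the median and interquartile range barely react to the outlier, the bulk of the values ends up in a small range around zero, which generally suits neural network training.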

Evaluate Performance of ML Pipeline

In this section, we have evaluated the performance of the ML pipeline by calculating MSE and R^2 score metrics on both train and test datasets. Compared with the results from the regression section, scaling improves the train metrics noticeably (R^2 of about 0.93 versus 0.83), while the test metrics are almost unchanged (R^2 of about 0.760 versus 0.758).

In [146]:
### Evaluate Model

from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, ml_pipeline.predict(X_train).reshape(-1))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, ml_pipeline.predict(X_test).reshape(-1))))

print("\nTrain R^2 : {}".format(ml_pipeline.score(X_train, Y_train)))
print("Test  R^2 : {}".format(ml_pipeline.score(X_test, Y_test)))
Train MSE : 5.900818081631549
Test  MSE : 19.890827220335826

Train R^2 : 0.9303497883092433
Test  R^2 : 0.7595860809482091

5. Grid Search

In this section, we'll explain how we can perform a grid search on hyperparameters to tune the model for good performance. We'll be creating a simple keras model, wrapping it inside of the scikeras model, and grid searching different hyperparameters of the model to find parameters setting which gives the best results. We'll be using the Boston housing dataset for our purpose.

If you are interested in learning about hyperparameters grid search using scikit-learn then please feel free to check our tutorial on the same which covers the topic with simple and easy-to-understand examples.

Load Data, Create Keras Model and Wrap It Inside of KerasRegressor

In this section, we have loaded the Boston housing dataset and divided it into train/test sets. We have then created a simple keras model for the regression task and wrapped it inside of scikeras model. We have not provided optimizer parameter this time as we'll be trying different optimizers in a grid search.

In [181]:
### Load Dataset

from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

### Define Model

from tensorflow import keras
from tensorflow.keras import models

neural_regressor = models.Sequential(
    [
        keras.layers.Dense(26, activation="relu", input_shape=(X_train.shape[1],)),
        keras.layers.Dense(52, activation="relu"),
        keras.layers.Dense(1)
    ]
)

neural_regressor.summary()

### Initiate Model

from scikeras.wrappers import KerasRegressor

scikeras_regressor = KerasRegressor(model=neural_regressor,
                                    loss="mean_squared_error",
                                    verbose=0,
                                    epochs=100
                                    )
Model: "sequential_24"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_74 (Dense)             (None, 26)                364
_________________________________________________________________
dense_75 (Dense)             (None, 52)                1404
_________________________________________________________________
dense_76 (Dense)             (None, 1)                 53
=================================================================
Total params: 1,821
Trainable params: 1,821
Non-trainable params: 0
_________________________________________________________________

Grid Search Model Hyperparameters

In this section, we have first declared a hyperparameter search dictionary with three hyperparameters.

  1. batch_size - This will try two different batch sizes.
  2. optimizer - This will try two different optimizers.
  3. optimizer__learning_rate - This will try three different learning rates for the optimizer. The double underscore tells scikeras to pass learning_rate to the optimizer.

After creating the dictionary, we have created an instance of GridSearchCV by giving it the scikeras model and the hyperparameter dictionary. We have then called the fit() method on the GridSearchCV instance, which performs the grid search by trying every combination of those three hyperparameters to find the one that gives the best result.

In [182]:
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings("ignore")

params = {
    "batch_size": [8,16],
    "optimizer": ["adam", "sgd"],
    "optimizer__learning_rate": [0.001, 0.01, 0.1],

}

grid = GridSearchCV(scikeras_regressor, params, scoring='r2')

grid.fit(X_train, Y_train)
Out[182]:
GridSearchCV(estimator=KerasRegressor(epochs=100, loss='mean_squared_error', model=<keras.engine.sequential.Sequential object at 0x7fb92c6d6400>, verbose=0),
             param_grid={'batch_size': [8, 16], 'optimizer': ['adam', 'sgd'],
                         'optimizer__learning_rate': [0.001, 0.01, 0.1]},
             scoring='r2')
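
To get a feel for how much work the grid search above does, the combinations can be counted by hand. The snippet below is a minimal sketch using only the standard library. It assumes that cv was left at scikit-learn's default of 5 folds and that refit was left at its default of True. The names candidates, n_folds and n_fits are ours and are not part of any library.

```python
from itertools import product

# Same search space as the params dictionary above.
params = {
    "batch_size": [8, 16],
    "optimizer": ["adam", "sgd"],
    "optimizer__learning_rate": [0.001, 0.01, 0.1],
}

# Each candidate picks exactly one value per hyperparameter.
names = sorted(params)
candidates = [
    dict(zip(names, values))
    for values in product(*(params[name] for name in names))
]

n_folds = 5  # assumption: default 5-fold cross-validation
n_fits = len(candidates) * n_folds + 1  # +1: final refit of the best candidate

print("Candidates :", len(candidates))
print("Total fits :", n_fits)
for candidate in candidates[:3]:
    print(candidate)
```

Since every fit trains the network for 100 epochs, adding even one more value to any list increases the running time noticeably.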

Below we have printed the hyperparameter setting that gave the best result. We have also printed the best score, which is the mean cross-validated R^2 score of that setting.

In [183]:
print("Best Score  : {}".format(grid.best_score_))
print("Best Params : {}".format(grid.best_params_))
Best Score  : 0.7587708752320055
Best Params : {'batch_size': 16, 'optimizer': 'adam', 'optimizer__learning_rate': 0.01}
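
Apart from the best setting, a fitted GridSearchCV instance also keeps the scores of all candidates in its cv_results_ attribute. Below is a small sketch of how it can be inspected. To keep it fast, it uses a Ridge model and synthetic data as stand-ins (both are our assumptions and not the tutorial's network or dataset); the attributes work the same way on any fitted grid search.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and estimator (assumptions made for speed).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

demo_grid = GridSearchCV(Ridge(), {"alpha": [0.01, 1.0, 100.0]}, scoring="r2")
demo_grid.fit(X_demo, y_demo)

# One entry per candidate: its rank, mean cross-validated score and parameters.
results = demo_grid.cv_results_
rows = zip(results["rank_test_score"], results["mean_test_score"], results["params"])
for rank, score, candidate in sorted(rows, key=lambda row: row[0]):
    print("rank {} | mean R^2 {:.4f} | {}".format(rank, score, candidate))

# best_score_ is the mean cross-validated score of the top-ranked candidate.
best_mean = results["mean_test_score"][demo_grid.best_index_]
print(np.isclose(best_mean, demo_grid.best_score_))
```

Looking at the full ranking helps to see whether the best setting wins by a clear margin or whether several settings perform about the same.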

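Below we have evaluated MSE and R^2 score metrics on both train and test datasets to check the performance of the model with the best hyperparameter setting. By default, GridSearchCV refits the best setting on the whole training set, so calling predict() and score() on the grid object uses that refitted model.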

In [184]:
### Evaluate Model

from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, grid.predict(X_train))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, grid.predict(X_test))))

print("\nTrain R^2 : {}".format(grid.score(X_train, Y_train)))
print("Test  R^2 : {}".format(grid.score(X_test, Y_test)))
Train MSE : 14.932990609616233
Test  MSE : 26.485202415606704

Train R^2 : 0.8237386845777369
Test  R^2 : 0.6798820260674677
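
For readers who want to see what the two metrics above measure, below is a small worked example in plain Python. The numbers are made up for illustration and are not taken from the Boston dataset. MSE is the average squared error, and R^2 compares the squared error of the model with that of always predicting the mean of the targets.

```python
# Toy targets and predictions (made-up numbers for illustration only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

n = len(y_true)
mean_true = sum(y_true) / n

# Residual sum of squares: error of the model's predictions.
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares: error of always predicting the mean target.
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

mse = ss_res / n
r2 = 1.0 - ss_res / ss_tot

print("MSE : {:.3f}".format(mse))
print("R^2 : {:.3f}".format(r2))
```

An R^2 of 1 means perfect predictions, while 0 means the model does no better than predicting the mean, which is why a lower test R^2 than train R^2 hints at some overfitting.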

6. ML Pipeline + Grid Search

In this section, we have explained how we can perform a grid search on a machine learning pipeline. This way, we can tune the earlier components of the ML pipeline along with the ML model. We'll be using the same ML pipeline that we used in the ML pipeline section, and the Boston housing dataset again for this example.

Load Data, Create Keras Model and Wrap It Inside of KerasRegressor

In this section, we have loaded the Boston housing dataset and divided it into train/test sets. We have then created a simple keras model for the regression task and wrapped it inside of a scikeras model. The code for this part is exactly the same as the code from the previous grid search section.

In [185]:
### Load Dataset

from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

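# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2.
X, Y = datasets.load_boston(return_X_y=True)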

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

### Define Model

from tensorflow import keras
from tensorflow.keras import models

neural_regressor = models.Sequential(
    [
        keras.layers.Dense(26, activation="relu", input_shape=(X_train.shape[1],)),
        keras.layers.Dense(52, activation="relu"),
        keras.layers.Dense(1)
    ]
)

neural_regressor.summary()

### Initiate Model

from scikeras.wrappers import KerasRegressor

scikeras_regressor = KerasRegressor(model=neural_regressor,
                                    loss="mean_squared_error",
                                    verbose=0,
                                    epochs=100
                                    )
Model: "sequential_25"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_77 (Dense)             (None, 26)                364
_________________________________________________________________
dense_78 (Dense)             (None, 52)                1404
_________________________________________________________________
dense_79 (Dense)             (None, 1)                 53
=================================================================
Total params: 1,821
Trainable params: 1,821
Non-trainable params: 0
_________________________________________________________________

Grid Search Model Hyperparameters

In this section, we have first declared a hyperparameter search dictionary with three hyperparameters and the values to try for each. We have prefixed each hyperparameter name with the string 'Model__' to specify that it belongs to the scikeras model. The prefix is needed because we have named the scikeras model step 'Model' inside of the ML pipeline.

After creating the dictionary, we have created an ML pipeline as we did in the ML pipeline section. We have then created an instance of GridSearchCV by giving it the ML pipeline and the hyperparameter dictionary, and called the fit() method on it to perform the grid search.

In [186]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

## Declare Hyperparameters Range

params = {
    "Model__batch_size": [8,16],
    "Model__optimizer": ["adam", "sgd"],
    "Model__optimizer__learning_rate": [0.001, 0.01, 0.1],

}

### Create Pipeline

ml_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", scikeras_regressor)])

## Grid Search Hyperparameters

grid = GridSearchCV(ml_pipeline, params, scoring="r2")

grid.fit(X_train, Y_train)
Out[186]:
GridSearchCV(estimator=Pipeline(steps=[('Normalize', RobustScaler()),
                                       ('Model',
                                        KerasRegressor(epochs=100, loss='mean_squared_error', model=<keras.engine.sequential.Sequential object at 0x7fb92c8c2978>, verbose=0))]),
             param_grid={'Model__batch_size': [8, 16],
                         'Model__optimizer': ['adam', 'sgd'],
                         'Model__optimizer__learning_rate': [0.001, 0.01, 0.1]},
             scoring='r2')
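
The same step-name prefix also lets us tune the earlier steps of a pipeline, not just the model. Below is a small sketch of this idea. It keeps the step names 'Normalize' and 'Model' from above but uses Ridge as a lightweight stand-in for the scikeras model and synthetic data in place of the Boston dataset (both are our assumptions, made so that the sketch runs quickly). The get_params() method lists every name that can be used as a key in the search dictionary.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Ridge stands in for the scikeras model (assumption made for speed);
# the step names match the pipeline used in this section.
demo_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", Ridge())])

# Names of the form <step name>__<parameter> are valid search keys.
tunable = sorted(name for name in demo_pipeline.get_params() if "__" in name)
print(tunable)

# One key per step: a scaler setting and a model setting in the same grid.
demo_params = {
    "Normalize__with_centering": [True, False],
    "Model__alpha": [0.1, 1.0, 10.0],
}

# Synthetic stand-in data (assumption, not the Boston dataset).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

demo_grid = GridSearchCV(demo_pipeline, demo_params, scoring="r2")
demo_grid.fit(X_demo, y_demo)
print(demo_grid.best_params_)
```

Printing the available names first is a quick way to avoid typos in the prefixes, as GridSearchCV raises an error for a key that does not match any step parameter.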

Below we have printed the best score and the hyperparameter setting that produced it.

In [187]:
print("Best Score  : {}".format(grid.best_score_))
print("Best Params : {}".format(grid.best_params_))
Best Score  : 0.8667697123938662
Best Params : {'Model__batch_size': 8, 'Model__optimizer': 'adam', 'Model__optimizer__learning_rate': 0.001}

Below we have printed MSE and R^2 scores evaluated on train and test datasets using the above ML pipeline with the best hyperparameter setting.

In [188]:
### Evaluate Model

from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, grid.predict(X_train))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, grid.predict(X_test))))

print("\nTrain R^2 : {}".format(grid.score(X_train, Y_train)))
print("Test  R^2 : {}".format(grid.score(X_test, Y_test)))
Train MSE : 5.8123676239318565
Test  MSE : 20.746198277111827

Train R^2 : 0.9313938118696553
Test  R^2 : 0.7492474909174864

7. Saving and Loading Model

In this section, we'll explain how we can save the keras model wrapped inside of a scikeras model to disk and then load it again.

We can access the keras model underlying a scikeras model at any time through the model attribute of the scikeras model. Below we have accessed the model attribute of the scikeras model from the regression section.

In [215]:
scikeras_regressor.model
Out[215]:
<keras.engine.sequential.Sequential at 0x7fb92c08c4a8>

Save Model

A keras model has a method named save() which accepts a path as input and saves the keras model at that path.

Below we have called the save() method on the keras model present inside of the scikeras model from the regression section. We can notice a logging message informing us that the model is saved inside of the keras_regressor directory.

In [216]:
scikeras_regressor.model.save("keras_regressor")
INFO:tensorflow:Assets written to: keras_regressor/assets
In [19]:
%ls keras_regressor/
assets/  keras_metadata.pb  saved_model.pb  variables/

Load Model

We can load the keras model from the saved path by calling the load_model() function available from the keras.models module. Below we have reloaded our keras model from the keras_regressor directory.

In [217]:
neural_regressor2 = keras.models.load_model("keras_regressor")

After loading the keras model, we have wrapped it again inside of KerasRegressor to create a new scikeras model. We have set the other parameters exactly as we did in the regression section.

In [218]:
scikeras_regressor2 = KerasRegressor(model=neural_regressor2,
                                     optimizer="adam",
                                     loss=keras.losses.mean_squared_error,
                                     batch_size=8,
                                     epochs=100,
                                     verbose=0
                                    )

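After creating the scikeras model, we need to initialize it as well so that it can be used to make predictions. The initialize() method prepares the wrapper without running any training, so the loaded weights are kept.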

In [219]:
scikeras_regressor2.initialize(X_train, Y_train)
Out[219]:
KerasRegressor(
	model=<keras.engine.sequential.Sequential object at 0x7fb90ab44278>
	build_fn=None
	warm_start=False
	random_state=None
	optimizer=adam
	loss=<function mean_squared_error at 0x7fb99210f268>
	metrics=None
	batch_size=8
	validation_batch_size=None
	verbose=0
	callbacks=None
	validation_split=0.0
	shuffle=True
	run_eagerly=False
	epochs=100
)

Below we have calculated MSE and R^2 score metrics on both train and test datasets using both the original scikeras model from the regression section and the one we loaded from disk. We can notice that both give the same results, which confirms that we have correctly loaded the model.

In [222]:
from sklearn.metrics import mean_squared_error

print("===== Original Model Performance ========\n")

print("Train MSE : {}".format(mean_squared_error(Y_train, scikeras_regressor.predict(X_train))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, scikeras_regressor.predict(X_test))))

print("\nTrain R^2 : {}".format(scikeras_regressor.score(X_train, Y_train)))
print("Test  R^2 : {}".format(scikeras_regressor.score(X_test, Y_test)))

print("\n===== Loaded Model Performance ========\n")

print("Train MSE : {}".format(mean_squared_error(Y_train, scikeras_regressor2.predict(X_train))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, scikeras_regressor2.predict(X_test))))

print("\nTrain R^2 : {}".format(scikeras_regressor2.score(X_train, Y_train)))
print("Test  R^2 : {}".format(scikeras_regressor2.score(X_test, Y_test)))
===== Original Model Performance ========

Train MSE : 14.433350680752365
Test  MSE : 20.05363882780375

Train R^2 : 0.829636176473454
Test  R^2 : 0.7576182303311937

===== Loaded Model Performance ========

Train MSE : 14.433350680752365
Test  MSE : 20.05363882780375

Train R^2 : 0.829636176473454
Test  R^2 : 0.7576182303311937

This ends our small tutorial explaining how we can wrap a keras model inside of scikeras so that the resulting model can be used like a scikit-learn estimator with a simple API. Please feel free to let us know your views in the comments section.


Sunny Solanki