Updated On : Oct-12,2021 Tags skorch, scikit-learn, pytorch
Skorch: Give Scikit-Learn like API to your PyTorch Neural Networks

Scikit-Learn is one of the most widely used ML libraries and has been a preferred choice for a long time. It implements the majority of classical ML algorithms for common problems (regression, classification, clustering, anomaly detection, dimensionality reduction, etc.). One of the main reasons for its popularity is its easy-to-use API, which lets us train a model and evaluate it on a test set with just a few function calls. However, apart from simple multi-layer perceptrons (MLPClassifier and MLPRegressor), it does not support deep neural network architectures such as convolutional and recurrent neural networks, which are now commonly used to solve complex problems like image classification and speech recognition. Scikit-Learn also lacks general support for running models on a GPU.

PyTorch is a commonly preferred library for creating deep neural networks. It lets us build complex networks such as convolutional and recurrent neural networks and run them on a GPU. One drawback of PyTorch, however, is that it is a lower-level library: we have to write the training and evaluation code for our problem ourselves. Getting this code right can take time and is a common source of errors, which can be a hurdle for someone with a Scikit-Learn background who wants to use PyTorch to solve their problem.

Skorch was created to combine the strengths of both libraries. It lets us define complex neural network models with PyTorch and then train and evaluate them through a Scikit-Learn-like API. This frees developers from writing the training and evaluation code themselves, and it lets developers experienced with Scikit-Learn use PyTorch to solve complex problems with neural networks without worrying much about that code.
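To make concrete what writing the training code ourselves involves, below is a minimal sketch of a hand-written PyTorch training loop for a linear model. The synthetic data, the model and the hyperparameters are assumptions chosen only for illustration; this is the kind of loop that skorch's fit() runs for us (along with validation and logging), not skorch's actual implementation.

```python
import torch
from torch import nn

# Synthetic data (an assumption for illustration): y = 3x + 1 plus a little noise
torch.manual_seed(0)
X = torch.randn(200, 1)
y = 3 * X + 1 + 0.1 * torch.randn(200, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    perm = torch.randperm(len(X))          # shuffle the samples every epoch
    epoch_loss = 0.0
    for start in range(0, len(X), 32):     # mini-batches of 32 samples
        idx = perm[start:start + 32]
        optimizer.zero_grad()              # clear gradients from the previous step
        loss = criterion(model(X[idx]), y[idx])
        loss.backward()                    # compute gradients
        optimizer.step()                   # update the parameters
        epoch_loss += loss.item() * len(idx)
    losses.append(epoch_loss / len(X))     # mean training loss of the epoch

print("first epoch loss: {:.3f}, last epoch loss: {:.3f}".format(losses[0], losses[-1]))
```

Everything inside the two loops is boilerplate that we would otherwise have to write, debug and extend (validation split, metrics, logging) for every new model.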

As a part of this tutorial, we'll explain with simple examples how to use Skorch to train and evaluate PyTorch models. To keep things easy to understand, we'll create simple neural networks and try them on small toy datasets. Below we have listed the important sections of the tutorial to give an overview of what we'll cover.

Important Sections of Tutorial

  1. Regression
    • Load Dataset
    • Create Neural Network Model using PyTorch
    • Wrap Neural Network into NeuralNetRegressor Object of Skorch
    • Train Model
    • Predict Using Trained Model
    • Evaluate Model Performance
    • Analyze History Object of Training
  2. Classification
    • Load Dataset
    • Create Neural Network Model using PyTorch
    • Wrap Neural Network into NeuralNetClassifier Object of Skorch
    • Train Model
    • Predict Using Trained Model
    • Evaluate Model Performance
    • Analyze History Object of Training
  3. Machine Learning Pipeline
  4. Grid Search Hyperparameters
  5. ML Pipeline + Grid Search
  6. Saving and Loading ML Model

Below we have imported the necessary libraries that we'll use in our tutorial and printed the version of each.

In [1]:
import skorch

print("Skorch Version : {}".format(skorch.__version__))
Skorch Version : 0.10.0
In [2]:
import torch

print("Pytorch Version : {}".format(torch.__version__))
Pytorch Version : 1.9.1+cpu
In [3]:
import sklearn

print("Scikit-Learn Version : {}".format(sklearn.__version__))
Scikit-Learn Version : 0.24.2

1. Regression

In this section, we'll explain how we can use skorch for a regression problem. We'll be designing a simple PyTorch neural network and using it to solve a regression problem. We'll be using the Boston housing dataset available from scikit-learn for our purpose.

Load Dataset

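In this section, we have loaded the Boston housing dataset available from scikit-learn. It contains information about houses in Boston, such as the average number of rooms per dwelling, the per-capita crime rate of the town, and the property tax rate. The target variable is the median value of owner-occupied homes in thousands of dollars. As the target is a continuous variable, this is a regression problem. Please note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this code requires an older version such as the 0.24.2 used here.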

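We have then divided the dataset into train (80%) and test (20%) sets and converted both into float32 PyTorch tensors, since PyTorch modules operate on tensors (skorch also accepts NumPy arrays and converts them itself). The target is reshaped into a column of shape (n_samples, 1) to match the network's single output.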

In [4]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

## Convert dataset to torch tensors

from torch import tensor

X_train = tensor(X_train, dtype=torch.float32)
X_test = tensor(X_test, dtype=torch.float32)
Y_train = tensor(Y_train.reshape(-1,1), dtype=torch.float32)
Y_test = tensor(Y_test.reshape(-1,1), dtype=torch.float32)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[4]:
(torch.Size([404, 13]),
 torch.Size([102, 13]),
 torch.Size([404, 1]),
 torch.Size([102, 1]))

Create Neural Network Model using PyTorch

In this section, we have created a simple neural network that takes as input our dataset features and predicts the median house price.

The network design is simple. It takes 13 inputs (one for each feature of the data), followed by a hidden layer of 26 neurons, a second hidden layer of 52 neurons, and a final layer with a single neuron that predicts the house price. All layers are initialized in the __init__() method of the class. The forward() method defines how a batch flows through the network: it applies the ReLU activation after the two hidden layers and returns the output of the final layer without an activation, as is usual for regression.

Please note that we have not explained neural network creation with PyTorch in detail, as we expect the reader to have a background in creating simple models with it.

In [100]:
from torch import nn
import torch.nn.functional as F

class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()

        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26,52)
        self.final_layer = nn.Linear(52,1)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)

        X = self.second_layer(X)
        X = F.relu(X)

        return self.final_layer(X)
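A quick way to confirm the layer sizes described above is to push a random batch through the network and count its parameters. The random input is only a stand-in for real Boston features.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Regressor(nn.Module):
    # Same architecture as above: 13 -> 26 -> 52 -> 1
    def __init__(self):
        super().__init__()
        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26, 52)
        self.final_layer = nn.Linear(52, 1)

    def forward(self, x_batch):
        X = F.relu(self.first_layer(x_batch))
        X = F.relu(self.second_layer(X))
        return self.final_layer(X)  # linear output, no activation

model = Regressor()
dummy_batch = torch.randn(8, 13)    # 8 random samples with 13 features each
out = model(dummy_batch)
print(out.shape)                    # torch.Size([8, 1])

n_params = sum(p.numel() for p in model.parameters())
print(n_params)                     # 1821
```

The parameter count follows from weights plus biases per layer: (13·26 + 26) + (26·52 + 52) + (52·1 + 1) = 364 + 1404 + 53 = 1821.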

Wrap Neural Network into NeuralNetRegressor Object of Skorch

In this section, we wrap the PyTorch neural network into an ML model that behaves like a scikit-learn model and offers the same API (methods like fit(), predict(), etc.).

To make our PyTorch neural network behave like a scikit-learn ML model, we need to wrap it in a NeuralNetRegressor instance of skorch. Below we have given the definition of NeuralNetRegressor for explanation purposes.


  • NeuralNetRegressor(module,optimizer=torch.optim.SGD,criterion=torch.nn.MSELoss,lr=0.01,max_epochs=10,batch_size=128,train_split=skorch.dataset.CVSplit(5),warm_start=False,verbose=1,device="cpu",**kwargs)
    • This class constructor takes a PyTorch neural network as input and returns a NeuralNetRegressor instance that behaves like a scikit-learn ML model. We can call methods like fit(), score() and predict() on it.
    • The module parameter accepts either a subclass of torch.nn.Module (as in our example below, where skorch instantiates it for us) or an already created instance of one.
    • The optimizer parameter accepts an optimizer class (not an instance) from the torch.optim module, such as SGD, Adam or Adagrad. The default for NeuralNetRegressor is torch.optim.SGD. The optimizer updates the network's parameters after the gradients have been calculated.
    • The criterion parameter accepts a loss class from the torch.nn module. The default for NeuralNetRegressor is torch.nn.MSELoss. The gradients of the parameters are calculated with respect to this loss.
    • The lr parameter is the learning rate used by the optimizer and accepts a float value. The default is 0.01.
    • The max_epochs parameter accepts an integer specifying the number of epochs (full passes over the training data) to train for. The default is 10.
    • The batch_size parameter sets the number of samples in each mini-batch. The default is 128.
    • The train_split parameter specifies how the training data is divided into training and validation sets, e.g. with an instance of skorch.dataset.CVSplit. The validation set is used to compute the validation loss after each epoch. The default, skorch.dataset.CVSplit(5), holds out one of 5 folds (20% of the data) for validation and trains on the remaining 80%. Passing train_split=None disables the internal validation split. In later skorch releases, CVSplit was renamed to ValidSplit.
    • The warm_start parameter accepts a boolean value. If it is True, each call to fit() continues training from the last trained state instead of reinitializing the module's parameters. If it is False (the default), the parameters are reinitialized on every call to fit().
    • To set other parameters of the optimizer, pass them with the optimizer__ prefix, e.g. optimizer__momentum=0.9 for SGD. The same double-underscore notation works for the module (module__) and the loss (criterion__).

Below we have created a NeuralNetRegressor instance by passing it the Regressor class created in the earlier cell. We have asked it to use the Adam optimizer (by passing the optim.Adam class) and to train for 500 epochs on each call to fit(). By default, skorch prints a summary line for every epoch; since 500 lines would flood the output, we have set verbose=0 to silence it.

In [105]:
from skorch import NeuralNetRegressor
from torch import optim

skorch_regressor = NeuralNetRegressor(module=Regressor, optimizer=optim.Adam, max_epochs=500, verbose=0)

skorch_regressor
Out[105]:
<class 'skorch.regressor.NeuralNetRegressor'>[uninitialized](
  module=<class '__main__.Regressor'>,
)

Train Model

In this section, we train the model by calling the fit() method with the training features and target variable.

In [106]:
skorch_regressor.fit(X_train, Y_train);

Predict Using Trained Model

We can make predictions by passing dataset features to the predict() method.

In [107]:
Y_preds = skorch_regressor.predict(X_test)

Y_preds[:5]
Out[107]:
array([[14.093064],
       [25.108112],
       [43.38271 ],
       [18.237015],
       [28.524502]], dtype=float32)

Evaluate Model Performance

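In this section, we evaluate the model by calculating the mean squared error (MSE) and the R^2 score on both the train and test datasets. In this run, R^2 is about 0.88 on the train set and 0.76 on the test set, and the test MSE is roughly twice the train MSE, which suggests the model performs reasonably well but overfits somewhat.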

The score() method will calculate R^2 score for regression problems.

If you are interested in learning about model evaluation metrics using scikit-learn then please feel free to check our tutorial on the same which explains the topic with simple and easy-to-understand examples.

In [108]:
from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, skorch_regressor.predict(X_train).reshape(-1))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, skorch_regressor.predict(X_test).reshape(-1))))

print("\nTrain R^2 : {}".format(skorch_regressor.score(X_train, Y_train)))
print("Test  R^2 : {}".format(skorch_regressor.score(X_test, Y_test)))
Train MSE : 10.250248908996582
Test  MSE : 20.14325714111328

Train R^2 : 0.8790113398146482
Test  R^2 : 0.7565350467727023
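For a regressor, score() comes from scikit-learn's RegressorMixin and returns R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of the targets from their mean. Equivalently, R^2 = 1 - MSE / Var(y), which is why a higher test MSE shows up as a lower test R^2. A small sketch with a hypothetical helper r_squared() that checks the formula against sklearn.metrics.r2_score on toy values:

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

manual = r_squared(y_true, y_pred)
library = r2_score(y_true, y_pred)
print(manual, library)   # both about 0.9486
```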

Analyze History Object of Training

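We can access the training history through the history attribute of the NeuralNetRegressor instance. For every epoch, it records the train loss, validation loss, epoch number, duration, etc., and it can be indexed by epoch and by key (e.g. history[:, 'train_loss']).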

Below we have retrieved a few details from the history of training and printed them.

In [696]:
skorch_regressor.history[-2:]
Out[696]:
[{'batches': [{'train_loss': 8.739419937133789, 'train_batch_size': 128},
   {'train_loss': 9.040390014648438, 'train_batch_size': 128},
   {'train_loss': 9.30402660369873, 'train_batch_size': 67},
   {'valid_loss': 14.756830215454102, 'valid_batch_size': 81}],
  'epoch': 499,
  'train_batch_count': 3,
  'valid_batch_count': 1,
  'dur': 0.004074573516845703,
  'train_loss': 8.97580636617938,
  'train_loss_best': False,
  'valid_loss': 14.756830215454102,
  'valid_loss_best': False},
 {'batches': [{'train_loss': 7.998040199279785, 'train_batch_size': 128},
   {'train_loss': 11.204773902893066, 'train_batch_size': 128},
   {'train_loss': 5.27079439163208, 'train_batch_size': 67},
   {'valid_loss': 15.106494903564453, 'valid_batch_size': 81}],
  'epoch': 500,
  'train_batch_count': 3,
  'valid_batch_count': 1,
  'dur': 0.004060506820678711,
  'train_loss': 8.703106592314162,
  'train_loss_best': False,
  'valid_loss': 15.106494903564453,
  'valid_loss_best': False}]
In [729]:
skorch_regressor.history[:, ("train_loss", "valid_loss")][-5:]
Out[729]:
[(7.913102340402987, 14.174866676330566),
 (8.309625632991732, 20.683568954467773),
 (9.94256834349027, 18.97480583190918),
 (8.97580636617938, 14.756830215454102),
 (8.703106592314162, 15.106494903564453)]
In [730]:
skorch_regressor.history[-1:, ("train_loss", "valid_loss")]
Out[730]:
[(8.703106592314162, 15.106494903564453)]

2. Classification

In this section, we explain how we can use PyTorch classification models like scikit-learn models. We'll design a simple PyTorch classification neural network and use the wine dataset available from scikit-learn.

Load Dataset

In this section, we have loaded the wine dataset available from scikit-learn. It contains the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars, with 13 measurements per wine (such as alcohol, malic acid and magnesium content). These measurements are the features of our dataset and the wine class (0, 1 or 2) is the target variable.

After loading, we have divided the dataset into train (80%) and test (20%) sets, stratifying by the target so that both sets keep the same class proportions. We have then converted all arrays into tensors: the features as float32 and the targets as torch.long (int64) class indices.

In [520]:
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_wine(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

## Convert dataset to torch tensors

from torch import tensor

X_train = tensor(X_train, dtype=torch.float32)
X_test = tensor(X_test, dtype=torch.float32)
Y_train = tensor(Y_train, dtype=torch.long)
Y_test = tensor(Y_test, dtype=torch.long)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Out[520]:
(torch.Size([142, 13]),
 torch.Size([36, 13]),
 torch.Size([142]),
 torch.Size([36]))
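The targets are cast to torch.long because nn.CrossEntropyLoss, which we'll use as the loss function, expects int64 class indices as targets and raw, unnormalized scores (logits) as predictions; internally it applies log-softmax followed by the negative log-likelihood loss. A small check with made-up logits:

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])          # raw scores: 2 samples, 3 classes
targets = torch.tensor([0, 2], dtype=torch.long)  # class indices as int64

ce = nn.CrossEntropyLoss()(logits, targets)

# The same value in two steps: log-softmax, then negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(ce.item(), manual.item())
```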

Create Neural Network Model using PyTorch

In this section, we have designed a simple PyTorch model with three linear layers: the first maps the 13 input features to 5 neurons, the second maps those to 13 neurons, and the final layer outputs 3 values, one per wine class. Each layer is initialized in the __init__() method of the model class, and the forward() method contains the logic for passing a batch through the network. ReLU is applied after every linear layer (including the final one), dropout with probability 0.15 is applied after the second layer, and softmax is applied to the final output to produce class probabilities.

In [683]:
from torch import nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()

        self.first_layer = nn.Linear(13, 5)
        self.second_layer = nn.Linear(5,13)
        self.final_layer = nn.Linear(13,3)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)

        X = self.second_layer(X)
        X = F.relu(X)
        X = F.dropout(X, 0.15)

        X = self.final_layer(X)
        X = F.relu(X)

        return F.softmax(X, dim=1)
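Two details of this module are worth flagging. F.dropout() defaults to training=True, so as written the dropout stays active when predicting; and because the forward pass already ends in softmax while the wrapper below uses nn.CrossEntropyLoss (which applies log-softmax itself), softmax is effectively applied twice. Below is a sketch of a variant that avoids both, assuming we keep nn.CrossEntropyLoss: it returns raw logits and uses an nn.Dropout layer. The class name LogitClassifier is hypothetical, and this is not the model that produced the outputs shown in this tutorial.

```python
import torch
from torch import nn
import torch.nn.functional as F

class LogitClassifier(nn.Module):
    """Hypothetical variant of Classifier: same layer sizes,
    but returns logits and uses an nn.Dropout layer."""
    def __init__(self):
        super().__init__()
        self.first_layer = nn.Linear(13, 5)
        self.second_layer = nn.Linear(5, 13)
        self.dropout = nn.Dropout(0.15)       # active only in train() mode
        self.final_layer = nn.Linear(13, 3)

    def forward(self, x_batch):
        X = F.relu(self.first_layer(x_batch))
        X = self.dropout(F.relu(self.second_layer(X)))
        return self.final_layer(X)            # logits: no ReLU, no softmax

torch.manual_seed(0)
model = LogitClassifier()
x = torch.randn(4, 13)

model.eval()                                  # skorch switches the module to eval mode when predicting
deterministic = torch.equal(model(x), model(x))
print(deterministic)                          # True: dropout is disabled

# The functional call in the original model keeps dropping activations in eval mode
h = torch.ones(1, 1000)
dropped_default = (F.dropout(h, 0.15) == 0).any().item()                      # True (almost surely)
dropped_eval = (F.dropout(h, 0.15, training=model.training) == 0).any().item()  # False
print(dropped_default, dropped_eval)
```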

Wrap Neural Network into NeuralNetClassifier Object of Skorch

To make our PyTorch classification neural network behave like a scikit-learn model, we need to wrap it inside the NeuralNetClassifier class. It has almost the same signature as NeuralNetRegressor. Below we have highlighted the definition.


  • NeuralNetClassifier(module,optimizer=torch.optim.SGD,criterion=torch.nn.modules.loss.NLLLoss,classes=None,lr=0.01,max_epochs=10,batch_size=128,train_split=skorch.dataset.CVSplit(5),warm_start=False,verbose=1,device="cpu",**kwargs)
    • This class constructor takes a PyTorch neural network as input and returns a NeuralNetClassifier instance that behaves like a scikit-learn ML model. We can call methods like fit(), predict(), score() and predict_proba() on it.
    • The module parameter accepts either a subclass of torch.nn.Module (as in our example below, where skorch instantiates it for us) or an already created instance of one.
    • The optimizer parameter accepts an optimizer class (not an instance) from the torch.optim module, such as SGD, Adam or Adagrad. The default for NeuralNetClassifier is torch.optim.SGD. The optimizer updates the network's parameters after the gradients have been calculated.
    • The criterion parameter accepts a loss class from the torch.nn module. The default for NeuralNetClassifier is torch.nn.NLLLoss. The gradients of the parameters are calculated with respect to this loss.
    • The classes parameter accepts a list of the class labels of the task. If it is None (the default), the classes are inferred from the target passed to fit().
    • The lr parameter is the learning rate used by the optimizer and accepts a float value. The default is 0.01.
    • The max_epochs parameter accepts an integer specifying the number of epochs (full passes over the training data) to train for. The default is 10.
    • The train_split parameter specifies how the training data is divided into training and validation sets, e.g. with an instance of skorch.dataset.CVSplit. The validation set is used to compute the validation loss and accuracy after each epoch. CVSplit(5) holds out one of 5 folds (20% of the data) for validation, and train_split=None disables the internal validation split.
    • The warm_start parameter accepts a boolean value. If it is True, each call to fit() continues training from the last trained state instead of reinitializing the module's parameters. If it is False (the default), the parameters are reinitialized on every call to fit().
    • To set other parameters of the optimizer, pass them with the optimizer__ prefix, e.g. optimizer__momentum=0.9 for SGD. The same double-underscore notation works for the module (module__) and the loss (criterion__).

Below we have wrapped our PyTorch classifier inside NeuralNetClassifier, using nn.CrossEntropyLoss as the loss function and optim.Adam as the optimizer. We have also passed a train_split that performs a stratified split, because this is a classification dataset and we want the class proportions to be the same in the training and validation sets.

In [688]:
from skorch import NeuralNetClassifier
from torch import optim

skorch_classifier = NeuralNetClassifier(module=Classifier,
                                        criterion=nn.CrossEntropyLoss,
                                        optimizer=optim.Adam,
                                        max_epochs=750,
                                        train_split=skorch.dataset.CVSplit(cv=5, stratified=True),
                                        verbose=0,
                                       )

skorch_classifier
Out[688]:
<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.Classifier'>,
)
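Stratification keeps each class's share of the validation fold close to its share of the whole dataset. We do not reproduce skorch's CVSplit internals here; as an illustration of the idea, this sketch uses scikit-learn's StratifiedKFold on the full wine data and holds out the first of 5 folds.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import StratifiedKFold

X, Y = datasets.load_wine(return_X_y=True)

# Hold out the first of 5 stratified folds (about 20% of the data) for validation
skf = StratifiedKFold(n_splits=5)
train_idx, valid_idx = next(iter(skf.split(X, Y)))

overall = np.bincount(Y) / len(Y)
valid = np.bincount(Y[valid_idx]) / len(valid_idx)
print("overall class proportions   :", np.round(overall, 3))
print("validation class proportions:", np.round(valid, 3))
```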

Train Model

Now we train the model by calling the fit() method with the training features and target variable.

In [689]:
skorch_classifier.fit(X_train, Y_train);

Predict Using Trained Model

Below we make predictions on the test dataset using the predict() and predict_proba() methods. The predict() method returns the predicted class, while predict_proba() returns the probability of each class.

In [698]:
Y_preds = skorch_classifier.predict(X_test)
Y_probs = skorch_classifier.predict_proba(X_test)

Y_preds[:5], Y_probs[:5]
Out[698]:
(array([1, 0, 1, 2, 2]),
 array([[0.21194157, 0.5761169 , 0.21194157],
        [0.5761169 , 0.21194157, 0.21194157],
        [0.21194157, 0.5761169 , 0.21194157],
        [0.21194226, 0.21194226, 0.5761155 ],
        [0.21194509, 0.21194509, 0.5761099 ]], dtype=float32))
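The probabilities above never go higher than about 0.576. That value is exactly what softmax gives for a one-hot input, e^1 / (e^1 + 2), which is consistent with softmax being applied twice: our module already ends with softmax, and skorch's predict_proba() with its default predict_nonlinearity='auto' applies softmax again when the criterion is nn.CrossEntropyLoss. The sketch below reproduces these numbers and also shows that predict() corresponds to taking the argmax of each predict_proba() row; the probabilities are copied from the output above.

```python
import numpy as np
import torch

# A module output that is already a one-hot probability vector ...
probs_from_module = torch.tensor([[0.0, 1.0, 0.0]])
# ... passed through softmax a second time
twice = torch.softmax(probs_from_module, dim=1)
print(twice)   # tensor([[0.2119, 0.5761, 0.2119]])

# predict() picks the class with the highest probability in each row
Y_probs = np.array([[0.21194157, 0.5761169, 0.21194157],
                    [0.5761169, 0.21194157, 0.21194157],
                    [0.21194157, 0.5761169, 0.21194157],
                    [0.21194226, 0.21194226, 0.5761155],
                    [0.21194509, 0.21194509, 0.5761099]])
predicted = Y_probs.argmax(axis=1)
print(predicted)   # [1 0 1 2 2]
```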

Evaluate Model Performance

In this section, we have printed the accuracy of the model on train and test datasets. The score() method will calculate accuracy by default.

In [691]:
print("Test  Accuracy : {:.2f}".format(skorch_classifier.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(skorch_classifier.score(X_train, Y_train)))
Test  Accuracy : 0.97
Train Accuracy : 0.94

Analyze History Object of Training

Below we have printed a few entries of the history of the training process for analysis purposes.

In [694]:
skorch_classifier.history[-2:]
Out[694]:
[{'batches': [{'train_loss': 0.6311323642730713, 'train_batch_size': 113},
   {'valid_loss': 0.5863255858421326, 'valid_batch_size': 29}],
  'epoch': 749,
  'train_batch_count': 1,
  'valid_batch_count': 1,
  'dur': 0.0020182132720947266,
  'train_loss': 0.6311323642730713,
  'train_loss_best': False,
  'valid_loss': 0.5863255858421326,
  'valid_loss_best': False,
  'valid_acc': 0.9655172413793104,
  'valid_acc_best': False},
 {'batches': [{'train_loss': 0.614655077457428, 'train_batch_size': 113},
   {'valid_loss': 0.5999448299407959, 'valid_batch_size': 29}],
  'epoch': 750,
  'train_batch_count': 1,
  'valid_batch_count': 1,
  'dur': 0.0019571781158447266,
  'train_loss': 0.614655077457428,
  'train_loss_best': False,
  'valid_loss': 0.5999448299407959,
  'valid_loss_best': False,
  'valid_acc': 0.9310344827586207,
  'valid_acc_best': False}]
In [724]:
skorch_classifier.history[:, ("train_loss", "valid_loss")][-5:]
Out[724]:
[(0.6224929094314575, 0.5907756090164185),
 (0.6160370111465454, 0.5874690413475037),
 (0.6088661551475525, 0.586213231086731),
 (0.6311323642730713, 0.5863255858421326),
 (0.614655077457428, 0.5999448299407959)]

3. Machine Learning Pipeline

In this section, we'll explain how to create a machine learning pipeline with scikit-learn by treating our PyTorch model, wrapped with skorch, as a scikit-learn estimator. The pipeline has only two steps: the first scales the data and the second applies our skorch-wrapped PyTorch model. We'll use the Boston housing dataset again, so this example also solves a regression task.

If you are interested in learning about how to create a machine learning pipeline using scikit-learn then please feel free to check our tutorial on the same which tries to explain the topic with simple and easy-to-understand examples.

Load Dataset

In this section, we have loaded the Boston housing dataset from sklearn and divided it into train/test sets, as in the first example. This time, instead of converting the data to PyTorch tensors, we cast it to float32 NumPy arrays: skorch accepts NumPy arrays and converts them to tensors itself, and float32 matches the dtype of the network's parameters.

In [4]:
### Load Dataset

from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train = X_train.astype(np.float32)
X_test = X_test.astype(np.float32)
Y_train = Y_train.reshape(-1,1).astype(np.float32)
Y_test = Y_test.reshape(-1,1).astype(np.float32)

Create Neural Network and Wrap it in NeuralNetRegressor Object

In this section, we have created a simple PyTorch model and wrapped it into the NeuralNetRegressor class of skorch so that it can be used like a sklearn estimator. The code in this part is the same as the code from the regression section.

In [5]:
## Model Definition

from torch import nn
import torch.nn.functional as F

class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()

        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26,52)
        self.final_layer = nn.Linear(52,1)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)

        X = self.second_layer(X)
        X = F.relu(X)

        return self.final_layer(X)

## Declare Model

from skorch import NeuralNetRegressor
from torch import optim

skorch_regressor = NeuralNetRegressor(module=Regressor, optimizer=optim.Adam, max_epochs=500, verbose=0)

skorch_regressor
Out[5]:
<class 'skorch.regressor.NeuralNetRegressor'>[uninitialized](
  module=<class '__main__.Regressor'>,
)

Create ML Pipeline and Train Pipeline

In this section, we have created a machine learning pipeline using Pipeline class of sklearn. Our ML pipeline consists of two steps.

  1. The first step applies RobustScaler to data to scale it.
  2. The second step applies an actual model to scaled data.

After creating the pipeline, we have called its fit() method to train it on the training data.

If you want to learn about scaling the data for machine learning tasks then please feel free to check our tutorial on the same which covers the topic with simple and easy-to-understand examples.

In [6]:
## Create Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

ml_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", skorch_regressor)])

ml_pipeline.fit(X_train, Y_train)
Out[6]:
Pipeline(steps=[('Normalize', RobustScaler()),
                ('Model',
                 <class 'skorch.regressor.NeuralNetRegressor'>[initialized](
  module_=Regressor(
    (first_layer): Linear(in_features=13, out_features=26, bias=True)
    (second_layer): Linear(in_features=26, out_features=52, bias=True)
    (final_layer): Linear(in_features=52, out_features=1, bias=True)
  ),
))])
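RobustScaler centers each feature on its median and divides it by its interquartile range (the difference between the 75th and 25th percentiles), which makes it less sensitive to outliers than standard scaling. A sketch checking this formula against scikit-learn on skewed synthetic data (an assumption for illustration); it also checks that float32 input stays float32, which matters here because the skorch model expects float32 features.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.RandomState(0)
X = rng.exponential(scale=3.0, size=(200, 4)).astype(np.float32)  # skewed stand-in data

scaled = RobustScaler().fit_transform(X)

# Same result by hand: subtract the median, divide by the interquartile range
median = np.median(X, axis=0)
q25, q75 = np.percentile(X, [25, 75], axis=0)
manual = (X - median) / (q75 - q25)

print(np.allclose(scaled, manual, atol=1e-5))   # True
print(scaled.dtype)                              # float32
```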

Evaluate Pipeline Performance

In this section, we have evaluated the pipeline by calculating MSE and R^2 scores on the train and test datasets. Compared with the regression section, simply scaling the data improves the results in these runs: train MSE drops from about 10.25 to 3.18 and test MSE from about 20.14 to 18.39 (test R^2 rises from 0.757 to 0.778). The improvement is much larger on the training set, so the gap between train and test performance has widened.

In [7]:
### Evaluate Model

from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, ml_pipeline.predict(X_train).reshape(-1))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, ml_pipeline.predict(X_test).reshape(-1))))

print("\nTrain R^2 : {}".format(ml_pipeline.score(X_train, Y_train)))
print("Test  R^2 : {}".format(ml_pipeline.score(X_test, Y_test)))
Train MSE : 3.176720142364502
Test  MSE : 18.393556594848633

Train R^2 : 0.9625036387209648
Test  R^2 : 0.7776831226676528

4. Grid Search Hyperparameters

In this section, we'll explain how to tune hyperparameters by grid searching over different values. We'll design a simple PyTorch neural network, wrap it inside a skorch class, and grid search over its hyperparameters to find the settings that give the best results. We'll use the Boston housing dataset loaded in the previous section.

If you are interested in learning about hyperparameters grid search using scikit-learn then please feel free to check our tutorial on the same which covers the topic with simple and easy-to-understand examples.

Create Neural Network and Wrap it in NeuralNetRegressor Object

In this section, we have created a simple PyTorch neural network for the regression task and wrapped it inside the NeuralNetRegressor class of skorch to make it behave like a sklearn estimator. The code for this part is almost the same as the code from the regression section; this time we do not set max_epochs, because it is one of the hyperparameters we search over.

In [14]:
## Model Definition

from torch import nn
import torch.nn.functional as F

class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()

        self.first_layer = nn.Linear(13, 26)
        self.second_layer = nn.Linear(26,52)
        self.final_layer = nn.Linear(52,1)

    def forward(self, x_batch):
        X = self.first_layer(x_batch)
        X = F.relu(X)

        X = self.second_layer(X)
        X = F.relu(X)

        return self.final_layer(X)

## Declare Model

from skorch import NeuralNetRegressor
from torch import optim

skorch_regressor = NeuralNetRegressor(module=Regressor, optimizer=optim.Adam, verbose=0)

skorch_regressor
Out[14]:
<class 'skorch.regressor.NeuralNetRegressor'>[uninitialized](
  module=<class '__main__.Regressor'>,
)

Grid Search Model Hyperparameters

In this section, we have first declared a dictionary that maps hyperparameter names to the lists of values to try. GridSearchCV from sklearn tries every combination of these values on our data, evaluates each with cross-validation, and keeps track of the results. We'll try different values of the 3 hyperparameters below.

  1. lr - learning rate
  2. max_epochs - number of passes through the training data
  3. optimizer__weight_decay - the weight decay (L2 penalty) of the optimizer; since it belongs to the optimizer, its name is prefixed with optimizer__.

After creating a GridSearchCV instance with the skorch regressor and the hyperparameter dictionary, we run the search by calling its fit() method. With the default 5-fold cross-validation, the 2 × 3 × 2 = 12 combinations mean 60 training runs, each scored with the estimator's score() method (R^2), after which the best combination is refit on the whole training set.

In [15]:
from sklearn.model_selection import GridSearchCV

params = {
    "lr": [0.01, 0.02],
    "max_epochs": [100, 250, 500],
    "optimizer__weight_decay": [0, 0.1]

}

grid = GridSearchCV(skorch_regressor, params)

grid.fit(X_train, Y_train)
Out[15]:
GridSearchCV(estimator=<class 'skorch.regressor.NeuralNetRegressor'>[uninitialized](
  module=<class '__main__.Regressor'>,
),
             param_grid={'lr': [0.01, 0.02], 'max_epochs': [100, 250, 500],
                         'optimizer__weight_decay': [0, 0.1]})
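The size of this search can be worked out before running it. A sketch using scikit-learn's ParameterGrid, which enumerates the same combinations as GridSearchCV; the 5 folds assume GridSearchCV's default cv=5.

```python
from sklearn.model_selection import ParameterGrid

params = {
    "lr": [0.01, 0.02],
    "max_epochs": [100, 250, 500],
    "optimizer__weight_decay": [0, 0.1],
}

grid_points = list(ParameterGrid(params))
n_folds = 5  # GridSearchCV's default number of cross-validation folds

print(len(grid_points))              # 12 combinations
print(len(grid_points) * n_folds)    # 60 training runs, plus one final refit
print(grid_points[0])
```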

Below we have printed the best hyperparameter settings and their score. Note that best_score_ is the mean cross-validated R^2 of that setting, not a test-set score.

In [16]:
print("Best Score  : {}".format(grid.best_score_))
print("Best Params : {}".format(grid.best_params_))
Best Score  : 0.7622688901678492
Best Params : {'lr': 0.01, 'max_epochs': 500, 'optimizer__weight_decay': 0.1}

At last, we have printed MSE and R^2 scores on the train/test datasets using the grid instance, which predicts with the model refit on the whole training set using the best hyperparameter settings.

In [17]:
### Evaluate Model

from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, grid.predict(X_train).reshape(-1))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, grid.predict(X_test).reshape(-1))))

print("\nTrain R^2 : {}".format(grid.score(X_train, Y_train)))
print("Test  R^2 : {}".format(grid.score(X_test, Y_test)))
Train MSE : 15.645158767700195
Test  MSE : 26.84882926940918

Train R^2 : 0.8153326306231864
Test  R^2 : 0.6754870154482193

5. ML Pipeline + Grid Search

In this section, we explain how to perform a grid search for hyperparameter tuning on a machine learning pipeline, which lets us tune the parameters of individual steps of the pipeline. We'll create a pipeline with scikit-learn and grid search over it, using the Boston housing dataset loaded earlier and reusing the skorch-wrapped PyTorch model created in the previous section.

Grid Search ML Pipeline Hyperparameters

Below we have first declared the hyperparameter search dictionary. Since we are tuning hyperparameters of our skorch model, every hyperparameter name is prefixed with 'Model__'. This prefix is needed because we named the skorch model step 'Model' in the ML pipeline declared next; scikit-learn uses the part before the double underscore to select the pipeline step and passes the rest to that step.

Next, we create an instance of GridSearchCV by giving it the ML pipeline and the hyperparameter dictionary. Finally, we perform the grid search on the ML pipeline by calling the fit() method of the GridSearchCV instance with the training data.

In [11]:
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

params = {
    "Model__lr": [0.01, 0.02],
    "Model__max_epochs": [100, 250, 500],
    "Model__optimizer__weight_decay": [0, 0.1],
}

ml_pipeline = Pipeline([("Normalize", RobustScaler()), ("Model", skorch_regressor)])

grid = GridSearchCV(ml_pipeline, params)

grid.fit(X_train, Y_train)
Out[11]:
GridSearchCV(estimator=Pipeline(steps=[('Normalize', RobustScaler()),
                                       ('Model',
                                        <class 'skorch.regressor.NeuralNetRegressor'>[uninitialized](
  module=<class '__main__.Regressor'>,
))]),
             param_grid={'Model__lr': [0.01, 0.02],
                         'Model__max_epochs': [100, 250, 500],
                         'Model__optimizer__weight_decay': [0, 0.1]})
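Grid search tries every combination in the dictionary, so its cost grows multiplicatively with each hyperparameter added. A quick way to check the size of the grid above is scikit-learn's `ParameterGrid`. With GridSearchCV's default 5-fold cross-validation, each combination is trained five times, plus one final refit of the best combination.

```python
from sklearn.model_selection import ParameterGrid

# The same search space used for the pipeline grid search above
params = {
    "Model__lr": [0.01, 0.02],
    "Model__max_epochs": [100, 250, 500],
    "Model__optimizer__weight_decay": [0, 0.1],
}

combinations = list(ParameterGrid(params))
n_folds = 5  # GridSearchCV default when cv is not given

print(len(combinations))                # 2 * 3 * 2 combinations
print(len(combinations) * n_folds + 1)  # cross-validation fits plus the final refit
print(combinations[0])
```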

Below, we print the best hyperparameter settings found by the search along with the mean cross-validated score they achieved.

In [13]:
print("Best Score  : {}".format(grid.best_score_))
print("Best Params : {}".format(grid.best_params_))
Best Score  : 0.85718430858714
Best Params : {'Model__lr': 0.01, 'Model__max_epochs': 250, 'Model__optimizer__weight_decay': 0.1}

Finally, we print MSE and R^2 scores on the train and test datasets using the grid instance, which predicts with the pipeline refit on the whole training set using the best hyperparameter settings.

In [12]:
### Evaluate Model

from sklearn.metrics import mean_squared_error

print("Train MSE : {}".format(mean_squared_error(Y_train, grid.predict(X_train).reshape(-1))))
print("Test  MSE : {}".format(mean_squared_error(Y_test, grid.predict(X_test).reshape(-1))))

print("\nTrain R^2 : {}".format(grid.score(X_train, Y_train)))
print("Test  R^2 : {}".format(grid.score(X_test, Y_test)))
Train MSE : 5.5419440269470215
Test  MSE : 17.25084114074707

Train R^2 : 0.9345857601285606
Test  R^2 : 0.7914947434747013

6. Saving and Loading Model

In this section, we explain how to save a trained skorch model and load it again from the saved files.

A skorch model provides a method named save_params() that saves the model weights, the optimizer state, the loss function (criterion) state, and the training history to separate files. We can later load these files into a model to resume training or to make predictions directly.

Save Model

Below, we call the save_params() method on the skorch model from the classification section, providing four file names, one for each part of the model being saved.

In [699]:
skorch_classifier.save_params(f_params="params.pkl",
                              f_optimizer="opt.pkl",
                              f_criterion="criterion.pkl",
                              f_history="hist.json"
                             )

Create Empty Model

Below, we create a new instance of NeuralNetClassifier using our PyTorch model, with the same parameter values we used in the classification section.

After creating the model, we need to call its initialize() method before we can make predictions with it, because we have never called fit() on it. Calling fit() initializes the weights and other components automatically, in which case initialize() is not needed. Initialization is also required before loading saved parameters with load_params(), since the saved state is loaded into the module, optimizer, and criterion that initialize() creates.

We also evaluate the model right after initializing it. Since the model starts with random weights, its performance is poor, as expected.

In [713]:
import skorch
from skorch import NeuralNetClassifier
from torch import nn, optim

skorch_classifier2 = NeuralNetClassifier(module=Classifier,
                                         criterion=nn.CrossEntropyLoss,
                                         optimizer=optim.Adam,
                                         max_epochs=750,
                                         train_split=skorch.dataset.CVSplit(cv=5, stratified=True),
                                         verbose=0,
                                        )

skorch_classifier2
Out[713]:
<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.Classifier'>,
)
In [714]:
skorch_classifier2.initialize()

print("Test  Accuracy : {:.2f}".format(skorch_classifier2.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(skorch_classifier2.score(X_train, Y_train)))
Test  Accuracy : 0.31
Train Accuracy : 0.28

Load Model

We can load previously saved weights into a skorch model using its load_params() method.

Below, we call load_params() on the newly created skorch model, passing the names of the files to which we saved the model weights and other details.

In [715]:
skorch_classifier2.load_params(f_params="params.pkl",
                               f_optimizer="opt.pkl",
                               f_criterion="criterion.pkl",
                               f_history="hist.json"
                              )

After loading the model from the files, we evaluate its performance on the train and test datasets below. It gives the same results as the model from the classification section.

In [717]:
print("Test  Accuracy : {:.2f}".format(skorch_classifier2.score(X_test, Y_test)))
print("Train Accuracy : {:.2f}".format(skorch_classifier2.score(X_train, Y_train)))
Test  Accuracy : 0.97
Train Accuracy : 0.95

This ends our small tutorial explaining how to wrap a PyTorch model inside a skorch model so that the resulting model can be used like a scikit-learn estimator. Please feel free to let us know your views in the comments section.


Sunny Solanki