Updated On : Jan-14,2022 Time Investment : ~45 mins

PyTorch Lightning: Simplify Model Training by Eliminating Loops

PyTorch Lightning is a framework built on top of PyTorch to simplify the training and prediction tasks of neural networks. It lets developers eliminate the loops that go through train data in batches to train networks, validation data in batches to evaluate model performance during training, and test data in batches to make predictions. Apart from this, it frees developers from moving models and data from CPUs to GPUs/TPUs and vice versa: the '.to(device)' calls can be dropped and the code will still work with PyTorch Lightning. It also frees developers from writing code to run training on multiple GPUs/TPUs in parallel. With just a few settings, it runs the training process in parallel on its own.
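To make the loops mentioned above concrete, here is a minimal sketch of the kind of hand-written training loop that plain PyTorch requires. The synthetic regression data, the single linear layer, and the hyperparameters are assumptions chosen only for illustration; they are not part of this tutorial's digits example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data (illustrative only, not the digits dataset).
X = torch.rand(128, 4)
y = X.sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(X, y), batch_size=32)

# Manual device placement: one of the chores Lightning takes over.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

epoch_losses = []
for epoch in range(20):                      # loop over epochs
    total = 0.0
    for X_batch, y_batch in loader:          # loop over batches
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()
        total += loss.item()
    epoch_losses.append(total / len(loader)) # mean batch loss for the epoch

print(epoch_losses[0], epoch_losses[-1])
```

With PyTorch Lightning, the two nested loops, the device transfers, and the optimizer bookkeeping move out of user code and into the framework.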

As a part of this tutorial, we'll explain how we can create simple networks using PyTorch Lightning and automate the training and prediction processes. The main aim of the tutorial is to get individuals started with PyTorch Lightning. As PyTorch Lightning is based on PyTorch, it requires PyTorch knowledge. If you want to learn about PyTorch and how to design neural networks using it, please feel free to check the below links.

If you are a fan of scikit-learn, there is a package named Skorch that gives a scikit-learn-like API to your PyTorch models. Please feel free to check the below link if you want to learn about it.

Below, we have highlighted important sections of tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. Load Dataset
  2. Create Neural Network
  3. Train Neural Network
  4. Make Predictions
  5. Evaluate Model Performance
  6. Another Example to Neural Network Class in PyTorch Lightning

Installation

  • pip install pytorch-lightning
import pytorch_lightning as pl

print("PyTorch Lightning Version : {}".format(pl.__version__))
PyTorch Lightning Version : 1.4.9
import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.10.1+cpu

1. Load Dataset

In this section, we'll load the digits dataset available from scikit-learn as a PyTorch dataset. PyTorch lets us load data in batches using two classes.

  1. Dataset - This class is responsible for actually maintaining data. It has methods to retrieve individual samples of data and a few other functionalities.
  2. DataLoader - This class wraps a Dataset object and loads data in batches as per the batch size. The DataLoader object is an iterable that, in our case, yields a tuple of two values (features tensor, target tensor) in a for loop. It is not an iterator itself, so to retrieve a single batch we first wrap it in iter() and then give the result to next(), i.e. next(iter(loader)).

Both classes are available from the torch.utils.data sub-module of PyTorch.
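As a quick illustration of how the two classes work together, here is a minimal sketch that uses PyTorch's built-in TensorDataset with random tensors. The data and sizes are assumptions for illustration; the tutorial's own digits dataset class is defined further below.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data: 100 samples with 64 features and 10 possible classes.
features = torch.rand(100, 64)
targets = torch.randint(0, 10, (100,))

dataset = TensorDataset(features, targets)   # holds the samples
loader = DataLoader(dataset, batch_size=32)  # serves them in batches

# A DataLoader is an iterable, so we need iter() before calling next().
X_batch, Y_batch = next(iter(loader))
print(X_batch.shape, Y_batch.shape)  # torch.Size([32, 64]) torch.Size([32])
print(len(loader))                   # 4 batches: 32 + 32 + 32 + 4 samples
```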

In order to incorporate our digits data into PyTorch and load it in batches, we have first created a class that extends the Dataset class and holds our data. When we implement our custom Dataset class, we need to provide an implementation of three methods.

  1. __init__() - This method initializes the dataset with details about the data. We can load the data here as well.
  2. __len__() - This method returns the total number of samples present in the dataset.
  3. __getitem__() - This method takes an index as input and returns the sample at that index.
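The three methods can be sketched on a toy example as follows. SquaresDataset is a hypothetical class invented for this illustration only; it is not used elsewhere in the tutorial.

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n_samples):
        # Small data, so we simply keep everything in memory.
        self.x = torch.arange(n_samples, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Total number of samples in the dataset.
        return len(self.x)

    def __getitem__(self, idx):
        # Sample (features, target) at the requested index.
        return self.x[idx], self.y[idx]

dataset = SquaresDataset(5)
print(len(dataset))  # 5
print(dataset[3])    # (tensor(3.), tensor(9.))
```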

In our case, as our dataset is small, we have loaded it into main memory inside of the __init__() method. We have then divided the dataset into train (80%) and test (20%) sets. Our implementation of the __init__() method takes a few other arguments as well. The first argument is a string specifying whether the dataset holds train data or test data. The second and third arguments are transformations to be applied to the features tensor and the target tensor respectively.

The __len__() method returns the number of samples in the train data for the training dataset and the number of samples in the test data for the test dataset. The __getitem__() method returns the sample at the specified index from the train or test data accordingly. It returns a tuple of two values where the first value is the features tensor and the second value is the target tensor. It returns tensors after applying the transformations. The transformations in our case convert arrays to torch tensors. Commonly applied transformations are cropping images, normalizing images, etc.
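As an example of a transformation that does more than a plain tensor conversion, here is a minimal sketch of a scaling transform. The function scale_digits is a hypothetical helper, not part of the tutorial's code; it relies on the fact that pixel values in scikit-learn's digits dataset lie in the range 0 to 16.

```python
import numpy as np
import torch

def scale_digits(x):
    # Hypothetical transform: convert to a float tensor and map 0..16 to 0..1.
    return torch.tensor(x, dtype=torch.float32) / 16.0

sample = np.array([0.0, 8.0, 16.0])
scaled = scale_digits(sample)
print(scaled)        # tensor([0.0000, 0.5000, 1.0000])
print(scaled.dtype)  # torch.float32
```

A function like this could be passed as the features transformation argument of the dataset class in place of the default tensor conversion.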

After defining the dataset class, we have initialized train and test Dataset objects. We have then wrapped both datasets inside of DataLoader object. We have also retrieved the single batch data from DataLoader objects and printed their shape for verification purposes.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset

class DigitsDataset(Dataset):
    def __init__(self, train_or_test="train", feat_transform=torch.tensor, target_transform=torch.tensor):
        self.typ = train_or_test

        X, Y = datasets.load_digits(return_X_y=True)

        self.X_train, self.X_test, self.Y_train, self.Y_test = train_test_split(X, Y,
                                                                                train_size=0.8,
                                                                                stratify=Y,
                                                                                random_state=123)
        self.feat_transform = feat_transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.Y_train) if self.typ == "train" else len(self.Y_test)

    def __getitem__(self, idx):
        if self.typ == "train":
            x, y = self.X_train[idx], self.Y_train[idx]
        else:
            x, y = self.X_test[idx], self.Y_test[idx]

        return self.feat_transform(x), self.target_transform(y)
train_dataset = DigitsDataset("train")
test_dataset = DigitsDataset("test")
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=32)
test_loader  = DataLoader(test_dataset,  batch_size=32)
for X_batch, Y_batch in train_loader:
    print(X_batch.shape, Y_batch.shape)
    break

for X_batch, Y_batch in test_loader:
    print(X_batch.shape, Y_batch.shape)
    break
torch.Size([32, 64]) torch.Size([32])
torch.Size([32, 64]) torch.Size([32])

2. Create Neural Network

In this section, we have explained how we can create a model using PyTorch Lightning so that we can avoid training loops.

In order to create a network using PyTorch Lightning, we need to create a class that extends the LightningModule class of PyTorch Lightning. Then, we need to implement a few methods in this class that will be used for training and making predictions. Below, we have highlighted the important methods and what to implement in them. Some of the methods are optional and need to be implemented in special cases only. The superclass LightningModule has a default implementation for the majority of methods.

  1. __init__() - In this method, we initialize layers and activation functions. We can even initialize the whole neural network here if we are using the Sequential() API.
  2. forward(X_batch) - In this method, we take a batch of data as input, perform a forward pass through the neural network, and return predictions. We use the layers initialized in the __init__() method to perform the forward pass.
  3. backward(loss) - This method takes the loss value returned by training_step() as input and calls backward() on it. We can override it if we want our own implementation instead of the default one.
  4. configure_optimizers() - In this method, we define the optimizer for our task and return it.
  5. training_step(batch,batch_idx) - In this method, we take a batch of data as input. We then perform a forward pass through the network to make predictions. Then, we calculate the loss value by giving predictions and actual targets to the loss function. We can return a single loss value or a dictionary from this method. If we want to return other metrics/details, we can return a dictionary, but one of its keys must be 'loss', holding the loss value. This method will be called for each batch of train data.
  6. validation_step(batch,batch_idx) - This method works exactly like training_step() but works on a validation batch.
  7. test_step(batch,batch_idx) - This method works exactly like training_step() but works on a test data batch.
  8. predict_step(batch,batch_idx) - This method takes a single batch of data as input, makes predictions on it, and returns them.
  9. training_epoch_end(outputs) - This method will be called at the end of each epoch, after going through all the train data in batches. It receives a list of the values returned by training_step(), which can be a list of loss values or a list of dictionaries. We can then work on this list as per our needs.
  10. validation_epoch_end(outputs) - This method works like training_epoch_end() but with validation data.

There are other methods in LightningModule for advanced tasks which we have not covered here. Please feel free to check them in the PyTorch Lightning docs on LightningModule.

Below, we have created a simple neural network by extending the LightningModule class. We have implemented our neural network using the Sequential API of PyTorch. We have kept the model definition inside of the __init__() method. We have then implemented the forward() method that takes a single batch of data as input and performs a forward pass of the data through the network to make predictions.

Our network has 3 fully connected layers. The first layer takes data of shape (n_samples,64) and outputs (n_samples,16). The second linear layer takes input of shape (n_samples,16) and outputs (n_samples,32). The third and final linear layer takes data of shape (n_samples,32) and outputs (n_samples,10). The first two layers use ReLU (Rectified Linear Unit) as the activation function and the last layer uses the softmax activation function.

Then, in the next cell, we have initialized the classifier and printed it. In the cell after that, we have given random data of the expected input shape to the initialized network for making predictions. We can verify from the output shape that the network works as expected.
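The layer shapes described above determine the number of trainable parameters. The following sketch rebuilds the same stack of layers with plain PyTorch and counts them; each linear layer contributes in_features * out_features weights plus out_features biases.

```python
from torch import nn

model = nn.Sequential(
    nn.Linear(64, 16), nn.ReLU(),
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 10), nn.Softmax(dim=-1),
)

# Print the shape of every weight and bias tensor.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# (64*16 + 16) + (16*32 + 32) + (32*10 + 10) = 1040 + 544 + 330
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 1914
```

This agrees with the 1.9 K parameters reported in the training summary later in the tutorial.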

from torch import nn
from torch.optim import Adam

class DigitsClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(64,16),
            nn.ReLU(),

            nn.Linear(16,32),
            nn.ReLU(),

            nn.Linear(32,10),
            nn.Softmax(dim=-1),
        )

    def forward(self, X_batch):
        preds = self.model(X_batch)
        return preds
classifier = DigitsClassifier()

classifier
DigitsClassifier(
  (model): Sequential(
    (0): Linear(in_features=64, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=10, bias=True)
    (5): Softmax(dim=-1)
  )
)
preds = classifier(torch.rand(50,64))

preds.shape
torch.Size([50, 10])
preds[:5]
tensor([[0.0920, 0.1132, 0.1140, 0.1032, 0.0903, 0.0902, 0.0849, 0.1232, 0.1014,
         0.0876],
        [0.0926, 0.1140, 0.1180, 0.1013, 0.0887, 0.0923, 0.0842, 0.1218, 0.0998,
         0.0873],
        [0.0845, 0.1112, 0.1175, 0.1039, 0.0902, 0.0927, 0.0861, 0.1231, 0.1005,
         0.0902],
        [0.0907, 0.1194, 0.1125, 0.0981, 0.0929, 0.0930, 0.0849, 0.1183, 0.1065,
         0.0838],
        [0.0949, 0.1182, 0.1100, 0.1024, 0.0911, 0.0907, 0.0850, 0.1196, 0.1057,
         0.0822]], grad_fn=<SliceBackward0>)

In the below cell, we have again implemented our neural network by extending the LightningModule class, but this time we have implemented the majority of the necessary methods. We have implemented our network in the __init__() method like earlier and implemented the forward pass in the forward() method. The code is the same as the network that we defined above. We have also defined the cross-entropy loss this time in the __init__() method.

The training_step() method takes as input a single batch of data. It then makes predictions using our neural network, calculates cross-entropy loss, and returns it. The implementation of validation_step() and test_step() is almost exactly the same. The predict_step() method takes a batch of data as input, makes predictions on it, and returns them.
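One detail worth knowing when combining a softmax output layer with cross-entropy: PyTorch's cross-entropy loss expects raw, unnormalized scores (logits) and applies log-softmax internally. The network in this tutorial already ends in a softmax layer, so the loss effectively sees softmax applied twice. Training still works, but the loss can no longer approach zero. The sketch below illustrates this with hand-made tensors; the values are assumptions chosen only to show the effect.

```python
import torch
import torch.nn.functional as F

target = torch.tensor([3])  # the true class of a single sample

# A confident, correct prediction expressed as raw logits.
logits = torch.zeros(1, 10)
logits[0, 3] = 20.0
loss_from_logits = F.cross_entropy(logits, target)

# The same prediction expressed as probabilities (softmax already applied).
probs = torch.zeros(1, 10)
probs[0, 3] = 1.0
loss_from_probs = F.cross_entropy(probs, target)

print(loss_from_logits.item())  # close to 0
print(loss_from_probs.item())   # about 1.46, the floor for 10 classes
```

This floor of roughly 1.46 explains why the validation and test losses reported later in the tutorial stay near 1.49 even though accuracy is high. A common alternative is to drop the softmax layer from the model and apply softmax only when probabilities are needed.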

The optimizer for our network is defined in configure_optimizers() method. The method returns optimizer initialized with neural network parameters. We have defined Adam optimizer with a learning rate of 0.001.

from torch import nn
from torch.optim import Adam

class DigitsClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(64,16),
            nn.ReLU(),

            nn.Linear(16,32),
            nn.ReLU(),

            nn.Linear(32,10),
            nn.Softmax(dim=-1),
        )

        self.crossentropy_loss = nn.CrossEntropyLoss()

    def forward(self, X_batch):
        preds = self.model(X_batch)
        return preds

    def training_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self.model(X_batch.float())

        loss_val = self.crossentropy_loss(preds, Y_batch.long())
        self.log("Train Loss : ", loss_val)

        return loss_val

    def validation_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self.model(X_batch.float())

        loss_val = self.crossentropy_loss(preds, Y_batch.long())
        self.log("Validation Loss : ", loss_val)

        return loss_val

    def test_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self.model(X_batch.float())

        loss_val = self.crossentropy_loss(preds, Y_batch.long())
        self.log("Test Loss : ", loss_val)

        return loss_val

    def predict_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self.model(X_batch.float())

        return preds

    def configure_optimizers(self):
        optimizer = Adam(self.model.parameters(), lr=1e-3)
        return optimizer

3. Train Neural Network

In this section, we'll train the neural network that we created in the previous section. We have first initialized the train and test dataset objects and wrapped them inside of data loader objects. We have set the batch size to 64 in the data loader object which will give a batch of 64 samples to various methods of the network.

In order to train our neural network, we need to create an instance of the Trainer class. Its constructor accepts a list of parameters that can help us with training. Below, we have highlighted some of the useful parameters.

  • min_epochs - Minimum number of epochs for which to run training.
  • max_epochs - Maximum number of epochs for which to run training.
  • accelerator - It accepts string specifying which backend to use for training and inference. The valid values are 'cpu', 'gpu', 'tpu', 'ipu' and 'auto'.
  • default_root_dir - The path for training logs and weights. It can be a local path as well as remote (S3 buckets, etc).
  • devices - It accepts an integer specifying the number of devices on which to train the network. We can also give a list of device IDs to this parameter specifying the devices on which to perform training.
  • log_every_n_steps - It accepts an integer specifying after how many batches a log entry should be written.
  • logger - This parameter accepts either a boolean value or a logger object. If set to True, it'll log using TensorBoardLogger.
  • num_nodes - It accepts an integer specifying the number of GPU nodes to use for distributed training.
  • num_processes - It accepts an integer specifying the number of processes to create for training.
  • precision - It accepts integers 16, 32, or 64 specifying the floating-point precision for training.
  • weights_save_path - It accepts directory where to save model weights.
  • strategy - It accepts either string or training strategy object specifying strategy to use for distributed training.

In our case, we have initialized the Trainer object with max_epochs set to 30, accelerator as 'cpu' and log at every 20 steps.

In order to train our neural network, we need to call fit() method on Trainer object by giving neural network model, train data loader, and validation data loader. The validation data loader is optional. The call to fit() will start training and the progress bar will be displayed to show the progress of a single epoch. The loss will be printed at the end of an epoch.

We can separately call the validate() and test() methods if we want to get the loss and other metrics of the validation and test sets. The validate() and test() methods work exactly like the fit() method and take the model followed by a data loader object. We have called both validate() and test() with our test dataset for testing purposes.

train_dataset = DigitsDataset("train")
test_dataset = DigitsDataset("test")
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, num_workers=4)
test_loader  = DataLoader(test_dataset,  batch_size=64, num_workers=4)
classifier = DigitsClassifier()

#pl.seed_everything(42, workers=True)
trainer = pl.Trainer(max_epochs=30, accelerator="cpu", log_every_n_steps=20) #, deterministic=True)

trainer.fit(classifier, train_loader, test_loader)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs

  | Name              | Type             | Params
-------------------------------------------------------
0 | model             | Sequential       | 1.9 K
1 | crossentropy_loss | CrossEntropyLoss | 0
-------------------------------------------------------
1.9 K     Trainable params
0         Non-trainable params
1.9 K     Total params
0.008     Total estimated model params size (MB)


trainer.validate(classifier, test_loader)
--------------------------------------------------------------------------------
DATALOADER:0 VALIDATE RESULTS
{'Validation Loss : ': 1.4943116903305054}
--------------------------------------------------------------------------------
[{'Validation Loss : ': 1.4943116903305054}]
trainer.test(classifier, test_loader)
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'Test Loss : ': 1.4943116903305054}
--------------------------------------------------------------------------------

[{'Test Loss : ': 1.4943116903305054}]

4. Make Predictions

We can make predictions with PyTorch Lightning by calling predict() method on Trainer object by giving model and data loader objects. It'll return a list of predictions. We can combine them later. Below, we have made predictions on our test dataset by giving model and test loader to predict() method.
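The combining step can be sketched in isolation as follows. The list of random tensors is a stand-in for what predict() returns (one tensor per batch) and is an assumption for illustration only. The sketch uses torch.cat, which is also available in PyTorch versions older than the one that introduced the torch.concat alias.

```python
import torch

# Stand-in for the list returned by predict(): one tensor per batch.
batch_preds = [torch.rand(64, 10), torch.rand(64, 10), torch.rand(32, 10)]

all_preds = torch.cat(batch_preds)  # stack batches along the sample axis
labels = all_preds.argmax(dim=1)    # most probable class per sample

print(all_preds.shape)  # torch.Size([160, 10])
print(labels.shape)     # torch.Size([160])
```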

preds = trainer.predict(classifier, test_loader)

preds = torch.concat(preds)

preds = preds.argmax(axis=1)

preds[:5]
tensor([5, 9, 9, 6, 1])

5. Evaluate Model Performance

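In this section, we have evaluated the performance of our neural network by calculating the accuracy of test predictions. We have also printed a classification report of test predictions that has information like precision, recall, and f1-score per target class. Note that scikit-learn's metric functions expect arguments in the order (y_true, y_pred), whereas the calls below pass the predictions first. Accuracy is unaffected by the order, but in the classification report the precision and recall columns are swapped and the support column counts predicted labels rather than actual labels.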
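The effect of argument order on scikit-learn's metrics can be checked on a tiny hand-made example. The labels below are made up for illustration; scikit-learn's convention is (y_true, y_pred).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0, 0, 1, 1]  # actual labels
y_pred = [0, 1, 1, 1]  # predicted labels (one false positive for class 1)

# Accuracy is symmetric, so the order does not matter.
acc = accuracy_score(y_true, y_pred)
acc_swapped = accuracy_score(y_pred, y_true)
print(acc, acc_swapped)  # 0.75 0.75

# Precision and recall trade places when the arguments are swapped.
prec = precision_score(y_true, y_pred)          # 2 / 3
rec = recall_score(y_true, y_pred)              # 1.0
prec_swapped = precision_score(y_pred, y_true)  # 1.0
rec_swapped = recall_score(y_pred, y_true)      # 2 / 3
print(prec, rec, prec_swapped, rec_swapped)
```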

Y_test = []
for x,y in test_loader:
    Y_test.append(y)

Y_test = torch.concat(Y_test)

Y_test[:5]
tensor([5, 9, 9, 6, 1])
from sklearn.metrics import accuracy_score

print("Test Accuracy : {:.3f}".format(accuracy_score(preds, Y_test)))
Test Accuracy : 0.969
from sklearn.metrics import classification_report

print("Classification Report : ")
print(classification_report(preds, Y_test))
Classification Report :
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        36
           1       0.94      0.92      0.93        37
           2       0.97      0.97      0.97        35
           3       1.00      0.90      0.95        41
           4       1.00      1.00      1.00        36
           5       1.00      0.97      0.99        38
           6       1.00      1.00      1.00        36
           7       0.97      1.00      0.99        35
           8       0.80      1.00      0.89        28
           9       1.00      0.95      0.97        38

    accuracy                           0.97       360
   macro avg       0.97      0.97      0.97       360
weighted avg       0.97      0.97      0.97       360

6. Another Example to Neural Network Class in PyTorch Lightning

Below, we have shown one more example demonstrating how we can create neural networks using PyTorch Lightning. The majority of the methods are almost the same as in our previous example. We have added implementations of two extra methods, training_epoch_end() and validation_epoch_end(). Also, we haven't defined our neural network using the Sequential API this time. Instead, we have defined the layers of the network in the __init__() method. We have then called these layers inside of the forward() method to perform a forward pass through the data.

from torch import nn
from torch.optim import Adam
import torch.nn.functional as F

class DigitsClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(64,16)
        self.lin2 = nn.Linear(16,32)
        self.lin3 = nn.Linear(32,10)

    def forward(self, X_batch):
        x = F.relu(self.lin1(X_batch))
        x = F.relu(self.lin2(x))
        x = F.relu(self.lin3(x))
        return F.softmax(x, dim=-1)

    def training_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self(X_batch.float())

        loss_val = F.cross_entropy(preds, Y_batch.long())
        self.log("Train Loss : ", loss_val)

        return {"loss": loss_val}

    def training_epoch_end(self,losses):
        print(len(losses)) ## This will be same as number of training batches

    def validation_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self(X_batch.float())

        loss_val = F.cross_entropy(preds, Y_batch.long())
        self.log("Validation Loss : ", loss_val)

        return {"loss": loss_val}

    def validation_epoch_end(self,losses):
        print(len(losses)) ## This will be same as number of validation batches

    def test_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self(X_batch.float())

        loss_val = F.cross_entropy(preds, Y_batch.long())
        self.log("Test Loss : ", loss_val)

        return {"loss": loss_val}

    def predict_step(self, batch, batch_idx):
        X_batch, Y_batch = batch
        preds = self(X_batch.float())

        return preds

    def configure_optimizers(self):
        optimizer = Adam(self.parameters(), lr=1e-3, eps=1e-6)
        return optimizer
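The epoch-end methods above only print the number of collected outputs. A more typical use is to aggregate them, for example into a mean epoch loss. The sketch below shows that aggregation on a hand-made list that mimics what training_epoch_end() receives when training_step() returns a dictionary with a 'loss' key; the values are made up for illustration. It assumes a Lightning version that still provides training_epoch_end(); newer major releases replace it with a different hook.

```python
import torch

# Stand-in for the `outputs` argument: one dictionary per training batch.
outputs = [
    {"loss": torch.tensor(0.9)},
    {"loss": torch.tensor(0.7)},
    {"loss": torch.tensor(0.5)},
]

# Stack the scalar losses into one tensor and average them.
mean_loss = torch.stack([out["loss"] for out in outputs]).mean()
print(round(mean_loss.item(), 4))  # 0.7
```

Inside training_epoch_end(), the resulting value could then be printed or passed to self.log().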

This ends our small tutorial explaining how we can create a neural network using PyTorch Lightning. It should get individuals started with the Lightning framework. Please feel free to let us know your views in the comments section.


Sunny Solanki
