
PyTorch - Convolutional Neural Networks

Convolutional neural networks, or CNNs, are commonly used networks nowadays to solve many tasks related to images. They are the preferred architecture for tasks like image classification, object detection, image segmentation, etc. They are also commonly used in NLP and time-series tasks. The main layer used repeatedly in a CNN is the convolution layer, which applies the convolution operation to input data. One of the main advantages of a CNN is that it has far fewer parameters than a comparable fully connected neural network built from dense layers, hence it takes less time to train. We have covered the theory of CNNs in detail in our blog about it. Please feel free to check it if you want to understand the theory better.
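To get a rough feel for this parameter saving, below is a small illustrative sketch (the layer sizes here are our own example choices) comparing the parameter count of a single 3x3 convolution layer producing 32 channels with that of a dense layer mapping a flattened 28x28 image to 32 units.

import torch
from torch import nn

## A 3x3 convolution layer producing 32 channels from a single-channel image
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3,3))

## A dense layer mapping a flattened 28x28 image to 32 units
dense = nn.Linear(in_features=28*28, out_features=32)

count_params = lambda m: sum(p.numel() for p in m.parameters())

print("Conv2d Parameters : {}".format(count_params(conv)))  ## 32*(1*3*3) + 32 = 320
print("Linear Parameters : {}".format(count_params(dense))) ## 784*32 + 32 = 25120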

As a part of this tutorial, we'll explain how we can create simple CNNs using the high-level PyTorch API ('torch.nn'). We'll be using the Fashion MNIST dataset for our purpose. We expect that the reader of this tutorial has basic knowledge of neural networks and PyTorch. If you want some background on PyTorch and designing neural networks with it, then please check our other tutorials on those topics; they'll help you easily go through this tutorial.

Below, we have listed important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. Simple Convolutional Neural Network
    • Load Dataset
    • Create Neural Network
    • Train Model (SGD Optimizer)
    • Make Predictions
    • Evaluate Model Performance
    • Train Model (Adam Optimizer)
    • Make Predictions
    • Evaluate Model Performance
  2. Channels First vs Channels Last

Below, we have imported PyTorch and printed the version that we'll be using in this tutorial.

import torch

print("Torch Version : {}".format(torch.__version__))
Torch Version : 1.10.1+cpu

Simple Convolutional Neural Network

In this section, we'll explain how we can create a CNN using PyTorch. We'll be creating a simple CNN with 3 convolution layers for explanation purposes. We'll train our CNN first with the SGD optimizer and then with the Adam optimizer to check which one gives better results.

Load Dataset

In this section, we have loaded the Fashion MNIST dataset from keras. The dataset has grayscale images of shape (28,28) for 10 different fashion items. There are 60k images in the train set and 10k images in the test set. After loading the datasets, we have converted them to PyTorch tensors as required by models created using PyTorch. We have then reshaped images from shape (28,28) to (1,28,28) to introduce a channel dimension at the beginning, as the PyTorch convolution layer requires the channel dimension to come first. Color (RGB) images have 3 channels, but as our images are grayscale, we have introduced a channel dimension of size 1. The images are represented as integers in the range [0,255]. We have divided images from both train and test sets by float 255 to bring the values into the range [0,1]. This will help the optimization algorithm converge faster.

from tensorflow import keras

(X_train, Y_train), (X_test, Y_test) = keras.datasets.fashion_mnist.load_data()

X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test, dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.long),\
                                   torch.tensor(Y_test, dtype=torch.long)

X_train, X_test = X_train.reshape(-1,1,28,28), X_test.reshape(-1,1,28,28)

X_train, X_test = X_train/255.0, X_test/255.0

classes = Y_train.unique()

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape

Create Neural Network

In this section, we have created a CNN using PyTorch. We have created a class named ConvNet by extending the nn.Module class. The __init__() method of our class defines the layers of our model and the forward() method actually performs a forward pass through input data.

Our CNN consists of 3 convolution layers. The first convolution layer has 32 output channels and applies kernels of shape (3,3) to input images. The second convolution layer has 16 output channels and a kernel size of (3,3). The third convolution layer has 8 output channels and a kernel size of (3,3). All convolution layers have padding set to 'same', which adds zero padding around images so that they keep the same height and width even after the convolution operation is applied. We apply the ReLU (Rectified Linear Unit) activation function to the output of all 3 convolution layers. After applying the convolution layers, we flatten the output of the third convolution layer. Then, we use a linear layer with 10 output units, which is the same as the number of target classes.

Our input images of shape (n_samples,1,height,width) will be transformed to shape (n_samples,32,height,width) by the first convolution layer. The height and width in our case are both 28. The n_samples will be the same as the batch size during training. Then, the second convolution layer will transform the data shape from (n_samples,32,height,width) to (n_samples,16,height,width). The third convolution layer will transform the data shape from (n_samples,16,height,width) to (n_samples,8,height,width). Then, we flatten the output of shape (n_samples,8,height,width) to (n_samples,8 x height x width). At last, the linear layer will transform the output from shape (n_samples,8 x height x width) to (n_samples,10).
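If you want to verify these shape transformations yourself, below is a small sketch (an illustrative check, separate from the model definition that follows) that passes a random batch of 10 images through stand-alone layers configured the same way and prints the intermediate shapes.

import torch
from torch import nn

x = torch.rand(10, 1, 28, 28) ## Random batch of 10 grayscale images

conv1 = nn.Conv2d(1, 32, kernel_size=(3,3), padding="same")
conv2 = nn.Conv2d(32, 16, kernel_size=(3,3), padding="same")
conv3 = nn.Conv2d(16, 8, kernel_size=(3,3), padding="same")

out1 = conv1(x)
out2 = conv2(out1)
out3 = conv3(out2)
flat = nn.Flatten()(out3)

print(out1.shape) ## torch.Size([10, 32, 28, 28])
print(out2.shape) ## torch.Size([10, 16, 28, 28])
print(out3.shape) ## torch.Size([10, 8, 28, 28])
print(flat.shape) ## torch.Size([10, 6272]) -> 8 x 28 x 28 = 6272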

We have defined our whole convolutional neural network using layers available from the 'nn' module of PyTorch. We have put the layers inside of the nn.Sequential() constructor of PyTorch, which applies layers in the sequence in which they were added to the network. Inside of the forward() method, we simply perform a forward pass of data through the network by calling the nn.Sequential object defined in the __init__() method.

After creating the network, we have initialized it in the next cell and performed a forward pass through it with a few samples for verification.

from torch import nn

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(3,3), padding="same"),
            nn.ReLU(),

            nn.Conv2d(in_channels=32, out_channels=16, kernel_size=(3,3), padding="same"),
            nn.ReLU(),

            nn.Conv2d(in_channels=16, out_channels=8, kernel_size=(3,3), padding="same"),
            nn.ReLU(),

            nn.Flatten(),
            nn.Linear(8*28*28, len(classes)),
            #nn.Softmax(dim=1)            
        )

    def forward(self, x_batch):
        preds = self.seq(x_batch)
        return preds
conv_net = ConvNet()

conv_net
ConvNet(
  (seq): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=same)
    (1): ReLU()
    (2): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=same)
    (3): ReLU()
    (4): Conv2d(16, 8, kernel_size=(3, 3), stride=(1, 1), padding=same)
    (5): ReLU()
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=6272, out_features=10, bias=True)
  )
)
preds = conv_net(X_train[:5])

preds.shape
torch.Size([5, 10])

Train Model (SGD Optimizer)

In this section, we'll be training our CNN. To train it, we have created a small function that performs training by taking data and other details as input. The function takes the CNN, loss function, optimizer, data features, target values, number of epochs, and batch size as input. It then executes the training loop for the given number of epochs.

During each epoch, it loops through the training data in batches. For each batch, it first calculates the start and end indices of the batch within the input data. It then slices out the batch and performs a forward pass through the CNN to make predictions. It then calculates a loss value using the predictions and the actual target values. It then zeroes any previous gradients present in the optimizer object, calls the backward() method on the loss value to calculate the gradients of the loss with respect to the CNN weights, and calls the step() method on the optimizer to update the model weights using the calculated gradients. It also prints the average loss value at the end of every epoch.

def TrainModelInBatches(model, loss_func, optimizer, X, Y, batch_size=32, epochs=5):
    for i in range(epochs):
        batches = torch.arange((X.shape[0]//batch_size)+1) ### Batch Indices

        losses = [] ## Record loss of each batch
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            preds = model(X_batch) ## Make Predictions by forward pass through network

            loss = loss_func(preds, Y_batch) ## Calculate Loss
            losses.append(loss.item()) ## Record loss value as a plain float, detached from the graph

            optimizer.zero_grad() ## Zero weights before calculating gradients
            loss.backward() ## Calculate Gradients
            optimizer.step() ## Update Weights

        print("Categorical Cross Entropy : {:.3f}".format(torch.tensor(losses).mean()))

We'll be using the cross-entropy loss function for our multi-class classification task. It is available from the 'nn' module as the class CrossEntropyLoss(). It takes predictions and actual target values as input and returns a loss value.

Please make a NOTE that this loss function internally applies log-softmax to the input predictions and then calculates the negative log-likelihood loss. This is the reason we have not used a softmax activation function on the output of the last layer in the network definition.
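As a quick sanity check of this behavior, the small sketch below (with made-up logits and targets) verifies that CrossEntropyLoss() produces the same value as applying LogSoftmax() to raw predictions followed by NLLLoss().

import torch
from torch import nn

torch.manual_seed(0) ## Illustrative seed so the check is repeatable

logits  = torch.rand(4, 10)           ## Raw (unnormalized) predictions for 4 samples
targets = torch.tensor([1, 5, 0, 9])  ## Actual class indices

loss1 = nn.CrossEntropyLoss()(logits, targets)
loss2 = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(loss1, loss2) ## Both print the same value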

loss = nn.CrossEntropyLoss()

loss(preds, Y_train[:5])
tensor(2.3235, grad_fn=<NllLossBackward>)

Now, we are actually training our neural network by initializing the individual parts and calling the training function designed in the previous cell. We have set the number of epochs to 25, the learning rate to 0.001, and the batch size to 128. We have then created an instance of our CNN. Following that, we have created the loss function and the SGD optimizer for our optimization process. We have given the model weights and the learning rate to the optimizer.

Then, in the next cell, we have called our training function to actually perform training. We can notice from the decreasing cross-entropy loss that our model seems to be doing a good job.

from torch.optim import SGD, RMSprop, Adam

torch.manual_seed(42) ## For reproducibility. This ensures the same random weights are initialized each time.

epochs = 25
learning_rate = 1e-3 ## 0.001
batch_size=128

conv_net = ConvNet()
cross_entropy_loss = nn.CrossEntropyLoss()
optimizer = SGD(params=conv_net.parameters(), lr=learning_rate)
TrainModelInBatches(conv_net,
                    cross_entropy_loss,
                    optimizer,
                    X_train, Y_train,
                    batch_size=batch_size,
                    epochs=epochs)
Categorical Cross Entropy : 2.288
Categorical Cross Entropy : 2.031
Categorical Cross Entropy : 0.959
Categorical Cross Entropy : 0.694
Categorical Cross Entropy : 0.628
Categorical Cross Entropy : 0.590
Categorical Cross Entropy : 0.564
Categorical Cross Entropy : 0.545
Categorical Cross Entropy : 0.529
Categorical Cross Entropy : 0.516
Categorical Cross Entropy : 0.505
Categorical Cross Entropy : 0.496
Categorical Cross Entropy : 0.487
Categorical Cross Entropy : 0.480
Categorical Cross Entropy : 0.473
Categorical Cross Entropy : 0.467
Categorical Cross Entropy : 0.462
Categorical Cross Entropy : 0.457
Categorical Cross Entropy : 0.452
Categorical Cross Entropy : 0.448
Categorical Cross Entropy : 0.444
Categorical Cross Entropy : 0.441
Categorical Cross Entropy : 0.438
Categorical Cross Entropy : 0.434
Categorical Cross Entropy : 0.432

Make Predictions

In this section, we are making predictions on the train and test datasets. We have defined a small function that performs prediction on input data in batches. It then combines the predictions of all batches to create predictions for the full input data.

As our CNN outputs 10 values per sample, we need to convert them to a single value (the predicted class). We have done that by calling the argmax() method, which returns the index of the maximum value, and we use that index as the predicted class of the sample.
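Below is a tiny illustrative example (with made-up scores) of how argmax() turns per-class scores into a predicted class index.

import torch

scores = torch.tensor([[0.1, 0.7, 0.2],   ## Sample 1 -> highest score at index 1
                       [0.8, 0.1, 0.1]])  ## Sample 2 -> highest score at index 0

print(scores.argmax(dim=1)) ## tensor([1, 0]) -> predicted class per sample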

def MakePredictions(model, input_data, batch_size=32):
    batches = torch.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    with torch.no_grad(): ## Disables automatic gradients calculations
        preds = []
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch = input_data[start:end]

            preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(conv_net, X_test, batch_size=128) ## Make Predictions on test dataset

test_preds = torch.cat(test_preds) ## Combine predictions of all batches

test_preds = test_preds.argmax(dim=1)

train_preds = MakePredictions(conv_net, X_train, batch_size=128) ## Make Predictions on train dataset

train_preds = torch.cat(train_preds)

train_preds = train_preds.argmax(dim=1)

test_preds[:5], train_preds[:5]
(tensor([9, 2, 1, 1, 6]), tensor([9, 0, 3, 3, 3]))
Y_test[:5], Y_train[:5]
(tensor([9, 2, 1, 1, 6]), tensor([9, 0, 0, 3, 0]))

Evaluate Model Performance

In this section, we have evaluated the performance of our CNN. We have first calculated the accuracy of the train and test predictions. Then, we have calculated a classification report on the test predictions, which includes precision, recall, and f1-score for each target class. We have calculated these metrics using functions available from scikit-learn.

If you want to learn about various ML metrics calculation methods available through scikit-learn then please feel free to check our tutorial which covers the majority of them in detail.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test  Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
Train Accuracy : 0.848
Test  Accuracy : 0.836
from sklearn.metrics import classification_report

print("Test Data Classification Report : ")
print(classification_report(Y_test, test_preds))
Test Data Classification Report :
              precision    recall  f1-score   support

           0       0.80      0.83      0.81      1000
           1       0.98      0.92      0.95      1000
           2       0.67      0.79      0.73      1000
           3       0.75      0.90      0.82      1000
           4       0.74      0.69      0.71      1000
           5       0.96      0.92      0.94      1000
           6       0.67      0.48      0.56      1000
           7       0.89      0.95      0.92      1000
           8       0.94      0.94      0.94      1000
           9       0.95      0.94      0.94      1000

    accuracy                           0.84     10000
   macro avg       0.84      0.84      0.83     10000
weighted avg       0.84      0.84      0.83     10000

Train Model (Adam Optimizer)

In this section, we have initialized our CNN again and trained it again, this time using the Adam optimizer. We are training for only 15 epochs to check whether Adam does a better job than SGD.

from torch.optim import SGD, RMSprop, Adam

torch.manual_seed(42) ## For reproducibility. This ensures the same random weights are initialized each time.

epochs = 15
learning_rate = 1e-3 ## 0.001
batch_size=128

conv_net = ConvNet()
cross_entropy_loss = nn.CrossEntropyLoss()
optimizer = Adam(params=conv_net.parameters(), lr=learning_rate)
TrainModelInBatches(conv_net,
                    cross_entropy_loss,
                    optimizer,
                    X_train, Y_train,
                    batch_size=batch_size,
                    epochs=epochs)
Categorical Cross Entropy : 0.499
Categorical Cross Entropy : 0.326
Categorical Cross Entropy : 0.283
Categorical Cross Entropy : 0.254
Categorical Cross Entropy : 0.230
Categorical Cross Entropy : 0.211
Categorical Cross Entropy : 0.193
Categorical Cross Entropy : 0.178
Categorical Cross Entropy : 0.165
Categorical Cross Entropy : 0.153
Categorical Cross Entropy : 0.143
Categorical Cross Entropy : 0.133
Categorical Cross Entropy : 0.124
Categorical Cross Entropy : 0.115
Categorical Cross Entropy : 0.107

Make Predictions

In this section, we have made predictions on the train and test datasets using the CNN trained with the Adam optimizer.

test_preds = MakePredictions(conv_net, X_test, batch_size=128) ## Make Predictions on test dataset

test_preds = torch.cat(test_preds) ## Combine predictions of all batches

test_preds = test_preds.argmax(dim=1)

train_preds = MakePredictions(conv_net, X_train, batch_size=128) ## Make Predictions on train dataset

train_preds = torch.cat(train_preds)

train_preds = train_preds.argmax(dim=1)

test_preds[:5], train_preds[:5]
(tensor([9, 2, 1, 1, 6]), tensor([9, 0, 0, 3, 0]))

Evaluate Model Performance

In this section, we have evaluated the performance of our new CNN trained with the Adam optimizer by calculating accuracy and the classification report. We can notice from the resulting accuracy that this model seems to have done a better job compared to the model trained with the SGD optimizer. But, we think that our model has overfit a bit on the train data.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test  Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
Train Accuracy : 0.952
Test  Accuracy : 0.895
from sklearn.metrics import classification_report

print("Test Data Classification Report : ")
print(classification_report(Y_test, test_preds))
Test Data Classification Report :
              precision    recall  f1-score   support

           0       0.79      0.88      0.83      1000
           1       0.98      0.99      0.99      1000
           2       0.81      0.89      0.85      1000
           3       0.89      0.91      0.90      1000
           4       0.85      0.81      0.83      1000
           5       0.96      0.97      0.97      1000
           6       0.79      0.65      0.72      1000
           7       0.96      0.92      0.94      1000
           8       0.98      0.96      0.97      1000
           9       0.92      0.97      0.95      1000

    accuracy                           0.89     10000
   macro avg       0.89      0.89      0.89     10000
weighted avg       0.89      0.89      0.89     10000

Channels First vs Channels Last

Color (RGB) images are represented using multi-dimensional arrays. There are two different ways to lay out an RGB image as a multi-dimensional array.

  1. Channels First - Here, we have three 2D arrays, one per channel, each holding the information of an individual channel. An RGB image of shape (28,28) will be represented as (3,28,28).
  2. Channels Last - Here, we keep the channel values together for each pixel. An RGB image of shape (28,28) is represented as (28,28,3), i.e., there are 3 values per pixel of the (28,28) image.

At the beginning of this tutorial, when we loaded our dataset, we introduced channel details for our grayscale images by adding one extra dimension at the beginning. The convolution layer available through PyTorch requires that input images have the channel dimension first, i.e., it requires images in Channels First format.

conv1 = nn.Conv2d(1,16, (3,3), padding="same")
conv2 = nn.Conv2d(16,32, (3,3), padding="same")

preds1 = conv1(torch.rand(50,1,28,28))
preds2 = conv2(preds1)

print("Conv Layer 1 Weights Shape : {}".format(list(conv1.parameters())[0].shape))
print("Conv Layer 2 Weights Shape : {}".format(list(conv2.parameters())[0].shape))

print("\nInput Shape               : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
Conv Layer 1 Weights Shape : torch.Size([16, 1, 3, 3])
Conv Layer 2 Weights Shape : torch.Size([32, 16, 3, 3])

Input Shape               : (50, 1, 28, 28)
Conv Layer 1 Output Shape : torch.Size([50, 16, 28, 28])
Conv Layer 2 Output Shape : torch.Size([50, 32, 28, 28])
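If your image data ever comes in channels-last format (as many image-loading libraries return it), you can convert it to the channels-first format PyTorch expects using permute(). Below is a minimal sketch of converting a made-up batch in both directions.

import torch

imgs_last = torch.rand(50, 28, 28, 3)       ## 50 RGB images in channels-last format

imgs_first = imgs_last.permute(0, 3, 1, 2)  ## Move channel dim to position 1
imgs_back  = imgs_first.permute(0, 2, 3, 1) ## Convert back to channels-last

print(imgs_first.shape) ## torch.Size([50, 3, 28, 28])
print(imgs_back.shape)  ## torch.Size([50, 28, 28, 3])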

This ends our small tutorial explaining how we can create simple convolutional neural networks using PyTorch. Please feel free to let us know your views in the comments section.
