Convolutional neural networks (CNNs) are among the most commonly used networks nowadays for solving image-related tasks. They are the preferred architecture for tasks like image classification, object detection, image segmentation, etc. They are also commonly used for NLP and time-series tasks. The main layer used repeatedly in a CNN is the convolution layer, which applies the convolution operation to input data. One of the main advantages of a CNN is that it has fewer parameters compared to a big fully connected neural network with dense layers, hence it takes less time to train. We have covered the theory of CNNs in detail in our blog about it. Please feel free to check it if you want to understand the theory better.
As a part of this tutorial, we'll explain how to create simple CNNs using the high-level PyTorch API ('torch.nn'). We'll be using the Fashion MNIST dataset for our purpose. We expect that the reader of this tutorial has basic knowledge of neural networks and PyTorch. If you want some background on PyTorch and designing neural networks with it, please check our tutorials below. They'll help you go through this tutorial more easily.
Below, we have listed important sections of the tutorial to give an overview of the material covered.
Below, we have imported PyTorch and printed the version of it that we'll be using in this tutorial.
import torch
print("Torch Version : {}".format(torch.__version__))
In this section, we'll explain how we can create a CNN using PyTorch. We'll be creating a simple CNN with 3 convolution layers for explanation purposes. We'll train our CNN first with the SGD optimizer and then with the Adam optimizer to check which one gives better results.
In this section, we have loaded the Fashion MNIST dataset from keras. The dataset has grayscale images of shape (28,28) for 10 different fashion items. There are 60k images in the train set and 10k images in the test set. After loading the datasets, we have converted them to PyTorch tensors as required by models created with PyTorch. We have then reshaped the images from shape (28,28) to (1,28,28) to introduce a channel dimension at the beginning, as the PyTorch convolution layer expects the channel dimension to come first. Color (RGB) images have 3 channels, but since our images are grayscale, we have introduced a single channel dimension. The images are represented as integers in the range [0,255]. We have divided the images in both train and test sets by the float 255 to bring the values into the range [0,1]. This helps the optimization algorithm converge faster.
from tensorflow import keras
(X_train, Y_train), (X_test, Y_test) = keras.datasets.fashion_mnist.load_data()
X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test,  dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.long),\
                                   torch.tensor(Y_test,  dtype=torch.long)
X_train, X_test = X_train.reshape(-1,1,28,28), X_test.reshape(-1,1,28,28)
X_train, X_test = X_train/255.0, X_test/255.0
classes = Y_train.unique()
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
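As a quick sanity check on the preprocessing above, the small snippet below (an addition for verification, not part of the original flow) confirms the tensor dtypes, the value range, and the target classes:

print("Dtypes         : {}, {}".format(X_train.dtype, Y_train.dtype))                ## torch.float32, torch.int64
print("Value Range    : [{}, {}]".format(X_train.min().item(), X_train.max().item())) ## [0.0, 1.0]
print("Target Classes : {}".format(classes.tolist()))                                ## [0, 1, ..., 9]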
In this section, we have created a CNN using PyTorch. We have created a class named ConvNet by extending the nn.Module class. The __init__() method of our class defines the layers of the model, and the forward() method actually performs the forward pass through the input data.
Our CNN consists of 3 convolution layers. The first convolution layer has 32 output channels and applies kernels of shape (3,3) to the input images. The second convolution layer has 16 output channels and a kernel size of (3,3). The third convolution layer has 8 output channels and a kernel size of (3,3). All convolution layers have padding set to "same", which adds zero padding around the images so they keep the same height and width even after the convolution operation. We apply the ReLU (Rectified Linear Unit) activation function to the output of all 3 convolution layers. After the convolution layers, we flatten the output of the third convolution layer. Then, we have used a linear layer with 10 units, the same as the number of output classes.
Our input images of shape (n_samples,1,height,width) will be transformed to shape (n_samples,32,height,width) by the first convolution layer. The height and width are both 28 in our case, and n_samples will be the same as the batch size during training. The second convolution layer will then transform the data from shape (n_samples,32,height,width) to (n_samples,16,height,width), and the third convolution layer from (n_samples,16,height,width) to (n_samples,8,height,width). We then flatten the output of shape (n_samples,8,height,width) to (n_samples,8 x height x width). At last, the linear layer will transform the output from shape (n_samples,8 x height x width) to (n_samples,10).
We have defined our whole convolutional neural network using layers available from the 'nn' module of PyTorch. We have put the layers inside the nn.Sequential() constructor, which applies them in the sequence in which they were added to the network. Inside the forward() method, we simply perform the forward pass of data through the network by calling the nn.Sequential object defined in the __init__() method.
After creating the network, we have initialized it in the next cell and performed a forward pass through it with a few samples for verification.
from torch import nn
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.seq = nn.Sequential(
            nn.Conv2d(in_channels=1,  out_channels=32, kernel_size=(3,3), padding="same"),
            nn.ReLU(),
            nn.Conv2d(in_channels=32, out_channels=16, kernel_size=(3,3), padding="same"),
            nn.ReLU(),
            nn.Conv2d(in_channels=16, out_channels=8,  kernel_size=(3,3), padding="same"),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8*28*28, len(classes)),
            #nn.Softmax(dim=1) ## Not needed; CrossEntropyLoss applies softmax internally
        )

    def forward(self, x_batch):
        preds = self.seq(x_batch)
        return preds
conv_net = ConvNet()
conv_net
preds = conv_net(X_train[:5])
preds.shape
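As a side note, we can make the introduction's claim about CNNs having fewer parameters concrete by counting the parameters of our network. Below is a minimal sketch; the dense network it compares against is a hypothetical one-hidden-layer model, not something used elsewhere in this tutorial:

conv_params = sum(p.numel() for p in conv_net.parameters()) ## Total trainable parameters of our CNN
print("ConvNet Parameters   : {:,}".format(conv_params))

## Hypothetical fully connected network with a single hidden layer of 512 units, for comparison only
dense_net = nn.Sequential(nn.Flatten(), nn.Linear(28*28, 512), nn.ReLU(), nn.Linear(512, 10))
dense_params = sum(p.numel() for p in dense_net.parameters())
print("Dense Net Parameters : {:,}".format(dense_params))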
In this section, we'll be training our CNN. To train it, we have created a small function that performs the training, taking the data and other details as input. The function takes the CNN, loss function, optimizer, data features, target values, number of epochs, and batch size as input. It then executes the training loop for the given number of epochs.
During each epoch, the function loops through the training data in batches. For each batch, it first calculates the start and end indexes of the batch within the input data and slices out a batch of data. It then performs a forward pass through the CNN to make predictions and calculates a loss value from the predictions and actual target values. It then zeros any previous gradients stored in the optimizer object, calls the backward() method on the loss value to calculate the gradients of the loss with respect to the CNN weights, and calls the step() method on the optimizer to update the model weights using the calculated gradients. It also prints the average loss value at every epoch.
def TrainModelInBatches(model, loss_func, optimizer, X, Y, batch_size=32, epochs=5):
    for i in range(epochs):
        batches = torch.arange((X.shape[0]//batch_size)+1) ### Batch Indices
        losses = [] ## Record loss of each batch

        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            preds = model(X_batch) ## Make predictions by forward pass through network

            loss = loss_func(preds, Y_batch) ## Calculate loss
            losses.append(loss.item()) ## Record loss as a plain float, detached from the graph

            optimizer.zero_grad() ## Zero gradients from the previous batch before calculating new ones
            loss.backward() ## Calculate gradients of loss w.r.t. weights
            optimizer.step() ## Update weights using calculated gradients

        print("Categorical Cross Entropy : {:.3f}".format(torch.tensor(losses).mean()))
We'll be using the cross entropy loss function for our multi-class classification task. It is available from the 'nn' module as the CrossEntropyLoss() class. It takes predictions and actual target values as input and returns a loss value.
Please make a NOTE that this loss function first applies softmax to the input predictions internally and then calculates the cross-entropy loss. This is the reason we have not used a softmax activation function on the output of the last layer in the network definition.
loss = nn.CrossEntropyLoss()
loss(preds, Y_train[:5])
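To see this internal softmax in action, the small check below (using made-up logits and targets, not our model's outputs) verifies that CrossEntropyLoss() gives the same value as applying log_softmax() followed by NLLLoss():

logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]) ## Made-up raw network outputs (no softmax applied)
targets = torch.tensor([0, 1]) ## Made-up target classes

ce = nn.CrossEntropyLoss()(logits, targets) ## Softmax + cross entropy in one step
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets) ## The same computation done manually

print(torch.isclose(ce, nll)) ## tensor(True)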
Now, we are actually training our neural network by initializing the individual parts and calling the training function designed in the previous cell. We have set the number of epochs to 25, the learning rate to 0.001, and the batch size to 128. We have then created an instance of our CNN. Following that, we have created the loss function and the SGD optimizer for our optimization process. We have given the model parameters and the learning rate to the optimizer.
Then, in the next cell, we have called our training function to actually perform the training. We can notice from the decreasing cross-entropy loss that our model seems to be doing a good job.
from torch.optim import SGD, RMSprop, Adam
torch.manual_seed(42) ## For reproducibility. This ensures the same random weights are initialized each time.

epochs = 25
learning_rate = 1e-3
batch_size = 128
conv_net = ConvNet()
cross_entropy_loss = nn.CrossEntropyLoss()
optimizer = SGD(params=conv_net.parameters(), lr=learning_rate)
TrainModelInBatches(conv_net,
                    cross_entropy_loss,
                    optimizer,
                    X_train, Y_train,
                    batch_size=batch_size,
                    epochs=epochs)
In this section, we are making predictions on the train and test datasets. We have defined a small function that makes predictions on input data in batches and then combines the predictions of all batches into predictions for the whole input.
As the output of our CNN is 10 values per sample, we need to convert it to a single predicted class. We have done that by calling the argmax() method, which returns the index of the maximum value; we use that index as the predicted class of the sample.
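For instance, for a hypothetical batch of 2 samples with 3-class outputs, argmax() works as below:

out = torch.tensor([[0.2, 1.5, -0.3],
                    [2.1, 0.0,  0.4]]) ## Hypothetical outputs: 2 samples, 3 classes

print(out.argmax(dim=1)) ## tensor([1, 0]) -> predicted class per sample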
def MakePredictions(model, input_data, batch_size=32):
    batches = torch.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    with torch.no_grad(): ## Disable automatic gradient calculation
        preds = []
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch = input_data[start:end]
            preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(conv_net, X_test, batch_size=128) ## Make Predictions on test dataset
test_preds = torch.cat(test_preds) ## Combine predictions of all batches
test_preds = test_preds.argmax(dim=1)
train_preds = MakePredictions(conv_net, X_train, batch_size=128) ## Make Predictions on train dataset
train_preds = torch.cat(train_preds)
train_preds = train_preds.argmax(dim=1)
test_preds[:5], train_preds[:5]
Y_test[:5], Y_train[:5]
In this section, we have evaluated the performance of our CNN. We have first calculated the accuracy of the train and test predictions. Then, we have generated a classification report for the test predictions, which includes precision, recall, and f1-score for each target class. We have calculated these metrics using functions available from scikit-learn.
If you want to learn about various ML metrics calculation methods available through scikit-learn then please feel free to check our tutorial which covers the majority of them in detail.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
from sklearn.metrics import classification_report
print("Test Data Classification Report : ")
print(classification_report(Y_test, test_preds))
In this section, we have initialized our CNN again and trained it from scratch, this time using the Adam optimizer. We are training for only 15 epochs to check whether Adam does a better job than SGD.
from torch.optim import SGD, RMSprop, Adam
torch.manual_seed(42) ## For reproducibility. This ensures the same random weights are initialized each time.

epochs = 15
learning_rate = 1e-3
batch_size = 128
conv_net = ConvNet()
cross_entropy_loss = nn.CrossEntropyLoss()
optimizer = Adam(params=conv_net.parameters(), lr=learning_rate)
TrainModelInBatches(conv_net,
                    cross_entropy_loss,
                    optimizer,
                    X_train, Y_train,
                    batch_size=batch_size,
                    epochs=epochs)
In this section, we have made predictions on the train and test datasets using the CNN trained with the Adam optimizer.
test_preds = MakePredictions(conv_net, X_test, batch_size=128) ## Make Predictions on test dataset
test_preds = torch.cat(test_preds) ## Combine predictions of all batches
test_preds = test_preds.argmax(dim=1)
train_preds = MakePredictions(conv_net, X_train, batch_size=128) ## Make Predictions on train dataset
train_preds = torch.cat(train_preds)
train_preds = train_preds.argmax(dim=1)
test_preds[:5], train_preds[:5]
In this section, we have evaluated the performance of our new CNN trained with the Adam optimizer by calculating accuracy and a classification report. We can notice from the resulting accuracy that our model seems to have done a better job compared to the model trained with the SGD optimizer. But we think that our model has overfit a bit on the train data.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
from sklearn.metrics import classification_report
print("Test Data Classification Report : ")
print(classification_report(Y_test, test_preds))
Color (RGB) images are represented using multi-dimensional arrays. There are two common ways to lay out such arrays: channels-first, where an image has shape (channels, height, width), and channels-last, where it has shape (height, width, channels).
At the beginning of this tutorial, when we loaded our dataset, we introduced the channel detail for our grayscale images by adding one extra dimension at the beginning. The convolution layer available through PyTorch requires input images with the channel dimension first, i.e., in channels-first format. The cell after the sketch below passes random channels-first data through two Conv2d layers and prints their weight shapes, which follow the (out_channels, in_channels, kernel_height, kernel_width) layout.
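If our images happened to be in channels-last format instead, permute() can rearrange the dimensions to channels-first, as the minimal sketch below shows (using a hypothetical random batch):

imgs_channels_last = torch.rand(50, 28, 28, 3) ## Hypothetical batch in channels-last format (n, height, width, channels)
imgs_channels_first = imgs_channels_last.permute(0, 3, 1, 2) ## Rearrange to channels-first (n, channels, height, width)

print(imgs_channels_first.shape) ## torch.Size([50, 3, 28, 28])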
conv1 = nn.Conv2d(1,16, (3,3), padding="same")
conv2 = nn.Conv2d(16,32, (3,3), padding="same")
preds1 = conv1(torch.rand(50,1,28,28))
preds2 = conv2(preds1)
print("Conv Layer 1 Weights Shape : {}".format(list(conv1.parameters())[0].shape))
print("Conv Layer 2 Weights Shape : {}".format(list(conv2.parameters())[0].shape))
print("\nInput Shape : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
This ends our small tutorial explaining how we can create simple convolutional neural networks using PyTorch. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.