Convolutional neural networks (CNNs) are a special kind of neural network that applies convolution operations to input data in order to detect patterns in it. A CNN commonly has one or more convolution layers followed by linear (dense) layers, and sometimes includes batch normalization and max-pooling layers after the convolution layers. CNNs are commonly used for tasks involving visual imagery like object detection, image segmentation, and image classification, and are now also applied to NLP and time-series problems. One of the main advantages of a CNN is that it has far fewer parameters to train compared to a deep fully connected neural network of similar capacity. If you want to learn about CNNs in detail, please check our blog post that covers the theory behind them.
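To make the "fewer parameters" point concrete, the short sketch below (illustrative only; the layer sizes are picked for this example) compares the parameter count of a single 3x3 convolution layer with that of a dense layer producing the same number of output units from a flattened 28x28 grayscale image:

from mxnet import nd
from mxnet.gluon import nn

conv = nn.Conv2D(channels=32, kernel_size=(3,3))    ## 32 filters of size 3x3x1
conv.initialize()
conv(nd.zeros((1,1,28,28)))                         ## one forward pass so weight shapes get inferred
conv_params = sum(p.data().size for p in conv.collect_params().values())

dense = nn.Dense(32)                                ## dense layer with 32 output units
dense.initialize()
dense(nd.zeros((1,28*28)))                          ## same image, flattened
dense_params = sum(p.data().size for p in dense.collect_params().values())

print("Conv2D parameters : {}".format(conv_params))  ## 32*(3*3*1) + 32  = 320
print("Dense parameters  : {}".format(dense_params)) ## 784*32   + 32  = 25120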
As a part of this tutorial, we'll be creating a small CNN using the Gluon API of MXNet. The main aim of the tutorial is to get individuals started developing CNNs with MXNet. We have already created another tutorial on MXNet that explains how to create fully connected neural networks using the Gluon API. Please feel free to check it out; it also provides useful background for this tutorial.
Below, we have listed important sections of the tutorial to give an overview of the material covered.
import mxnet
print("Mxnet Version : {}".format(mxnet.__version__))
In this section, we'll explain how to create a simple convolutional neural network using MXNet. We'll be using the Fashion MNIST dataset available from keras for our purpose.
In this section, we have loaded the Fashion MNIST dataset available from keras. The dataset has grayscale images of 10 fashion items, each of shape (28,28) pixels. The dataset is already divided into train (60k images) and test (10k images) sets. After loading the datasets, we have converted them from numpy arrays to MXNet NDArrays using the array() method of the nd sub-module. We have also reshaped the images, adding one extra dimension at the beginning ((n,28,28) -> (n,1,28,28)) to serve as the channel dimension. As our images are grayscale, they have only one channel; RGB images have 3 channels. This extra channel dimension is required by convolution layers because they operate on and transform channels, as we'll see below.
After reshaping the image datasets, we have also divided the images by the float value 255. This brings all values into the range [0,1], which helps the optimization algorithm converge faster. By default, images are represented as integers in the range [0,255].
from tensorflow import keras
from sklearn.model_selection import train_test_split
from mxnet import nd
import numpy as np
(X_train, Y_train), (X_test, Y_test) = keras.datasets.fashion_mnist.load_data()
X_train, X_test, Y_train, Y_test = nd.array(X_train, dtype=np.float32),\
nd.array(X_test, dtype=np.float32),\
nd.array(Y_train, dtype=np.float32),\
nd.array(Y_test, dtype=np.float32)
X_train, X_test = X_train.reshape(-1,1,28,28), X_test.reshape(-1,1,28,28)
X_train, X_test = X_train/255.0, X_test/255.0
classes = np.unique(Y_train.asnumpy())
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
In this section, we have created a CNN using the Gluon API of MXNet by defining a class that extends the nn.Block class. We have defined two methods inside the class: __init__(), which creates the layers, and forward(), which applies them to the input.
Our neural network consists of two convolution layers. The first convolution layer has 32 output channels and the second has 16 output channels; both use a kernel size of (3,3) and padding of (1,1). We have created both convolution layers using the Conv2D() constructor available through the 'nn' submodule of the mxnet.gluon module, and specified the ReLU (rectified linear unit) activation function for their outputs. Our forward() method first applies these two convolution layers to the input data, then flattens the output of the second convolution layer and feeds it to the linear layer. The linear layer has 10 output units, the same as the number of unique target classes. At last, we apply the softmax() activation to the output of the linear layer. It maps the outputs to probabilities in the range [0,1] such that the 10 values per sample sum to 1.
Our input has a shape of (n_samples,1,28,28). The first convolution layer transforms it from (n_samples,1,28,28) to (n_samples,32,28,28). The second convolution layer transforms it from (n_samples,32,28,28) to (n_samples,16,28,28). The flatten layer then flattens the output of the second convolution layer, transforming the shape from (n_samples,16,28,28) to (n_samples, 16 x 28 x 28) = (n_samples, 12544). Finally, the linear layer transforms the shape from (n_samples, 12544) to (n_samples, 10).
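As a quick sanity check, these shapes follow the standard convolution output-size formula. The small helper below (hypothetical, included only for illustration) reproduces the numbers for our kernel size (3,3), padding (1,1) and stride 1:

def conv_output_size(in_size, kernel, padding, stride=1):
    ## Standard formula for the spatial output size of a convolution layer.
    return (in_size + 2*padding - kernel) // stride + 1

print(conv_output_size(28, kernel=3, padding=1))  ## 28 -> spatial size is preserved
print(16 * 28 * 28)                               ## 12544 -> size after flattening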
After defining the CNN class, we have created the network by instantiating the class. Then we have initialized the network weights by calling the initialize() method on it and performed a forward pass through the network with a few data samples for verification.
Then, in the below cell, we have retrieved the model weights using the collect_params() method and printed the shapes of the weights and biases.
from mxnet.gluon import nn
class CNN(nn.Block):
    def __init__(self, **kwargs):
        super(CNN, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(channels=32, kernel_size=(3,3), activation="relu", padding=(1,1))
        self.conv2 = nn.Conv2D(channels=16, kernel_size=(3,3), activation="relu", padding=(1,1))
        self.flatten = nn.Flatten()
        self.linear = nn.Dense(len(classes))

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.flatten(x)
        x = self.linear(x)
        return nd.softmax(x)
model = CNN()
model
model.initialize()
preds = model(X_train[:5])
preds
model
for layer_name, weights in model.collect_params().items():
    print("{} - {}".format(layer_name, weights.shape))
In this section, we are training the CNN we defined in the previous cells. We have created a small function that performs the training. The function takes the Trainer object, data features, target values, number of epochs, and batch size as input. It then executes the training loop for the given number of epochs. We define the Trainer object in a later cell, where we call this function.
For each epoch, it calculates the start and end indices of the batches of data and then loops through the data in batches. For each batch, it performs a forward pass through the model by feeding it the batch of data. The batch predictions are recorded and the loss value is calculated using the actual target values along with the predictions. These two operations (forward pass and loss calculation) are performed inside the autograd.record() context manager, which records the operations so that gradients can be computed. After the loss calculation, we call the backward() method on the loss to calculate gradients and record the loss of the batch. Then, we call the step() method on the Trainer object, which updates the weights of the CNN using the gradients and the learning rate. We also print the average loss value at every epoch to check the model's progress.
from mxnet import autograd
def TrainModelInBatches(trainer, X, Y, epochs, batch_size=32):
    for i in range(1, epochs+1):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices
        losses = [] ## Record loss of each batch
        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            with autograd.record():
                preds = model(X_batch) ## Forward pass to make predictions
                loss_val = loss_func(preds.squeeze(), Y_batch) ## Calculate Loss

            loss_val.backward() ## Calculate Gradients

            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)

            trainer.step(len(X_batch)) ## Update weights

        print("CrossEntropyLoss : {:.3f}".format(np.array(losses).mean()))
In the below cell, we are calling the training function designed in the previous cell to perform training. We have initialized the number of epochs to 25, the learning rate to 0.001, and the batch size to 256. Then, we have created the CNN and initialized its weights. Following that, we have defined the loss function: SoftmaxCrossEntropyLoss(), a cross-entropy loss for multi-class classification problems.
Then, we have initialized the Trainer object, which holds the details of the optimizer. We have given the model parameters to the trainer using the collect_params() method and instructed it to use the Adam optimizer. The third parameter to the Trainer object is a dictionary of settings that will be passed to the optimizer.
After initializing the Trainer object, we have called our training function with the necessary parameters to perform training. We can notice from the loss value printed at every epoch that our model seems to be doing a good job.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
batch_size=256
epochs=25
learning_rate = 0.001
model = CNN()
model.initialize()
loss_func = loss.SoftmaxCrossEntropyLoss()
trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})
TrainModelInBatches(trainer, X_train, Y_train, epochs, batch_size=batch_size)
In this section, we are making predictions on the train and test datasets using our trained model. We have created a small function that loops through the data in batches and makes predictions one batch at a time. It then combines the predictions of all batches and returns them.
As the output of our model is 10 probabilities per sample, we need to convert it to a single target class prediction. We have done that by retrieving the index of the highest probability among the 10 probabilities using the argmax() function; that index is the predicted target class.
def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices
    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]
        preds.append(model(X_batch))
    return preds
test_preds = MakePredictions(X_test,batch_size=256)
test_preds = nd.concatenate(test_preds).squeeze()
test_preds = test_preds.asnumpy().argmax(axis=1)
train_preds = MakePredictions(X_train,batch_size=256)
train_preds = nd.concatenate(train_preds).squeeze()
train_preds = train_preds.asnumpy().argmax(axis=1)
test_preds[:5], train_preds[:5]
In this section, we have evaluated the performance of our model by calculating the accuracy of train and test predictions. We have also calculated the classification report of test predictions which has information like precision, recall, and f1-score per target class. We have calculated accuracy and classification report using functions available from scikit-learn.
If you want to learn about various methods available through scikit-learn to calculate various ML metrics then please feel free to check the below tutorial that covers the majority of them in detail.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train.asnumpy(), train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test.asnumpy(), test_preds)))
from sklearn.metrics import classification_report
print("Test Classification Report ")
print(classification_report(Y_test.asnumpy(), test_preds))
The convolution layers of a CNN work on the channels of the input data: they transform a given number of input channels into a specified number of output channels. The number of input channels differs based on the image type (grayscale, RGB, RGBA, CMYK, etc.). Images are represented in computers as multi-dimensional arrays, commonly referred to as tensors by the majority of ML libraries. In those tensors, the channel dimension can be placed in two ways: channels-first (before height and width) or channels-last (after them).
When we loaded our grayscale images above, we introduced an extra channel dimension at the beginning, transforming the image shape from (n_samples,28,28) to (n_samples,1,28,28). By default, the convolution layer created by the Conv2D() constructor expects this channels-first layout. We have created an example in the below cell to show how the convolution layer transforms such images and their channels.
But what if the channel dimension comes last in our images? How can we handle that situation?
Fortunately, the Conv2D() constructor of MXNet lets us handle it easily by specifying the layout of the data through the layout parameter. The layout is given as a 4-character string, where 'N' stands for the batch (samples) dimension, 'C' for channels, 'H' for height, and 'W' for width. The default is "NCHW" (channels-first); "NHWC" describes channels-last data.
We show how to handle the channels-last situation in the second of the two cells below.
from mxnet.gluon import nn
conv1 = nn.Conv2D(channels=16, kernel_size=(3,3), activation="relu", padding=(1,1))
conv1.initialize()
conv2 = nn.Conv2D(channels=32, kernel_size=(3,3), activation="relu", padding=(1,1))
conv2.initialize()
preds1 = conv1(nd.random.uniform(shape=(50,1,28,28)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.weight.shape))
print("Weights of Second Conv Layer : {}".format(conv2.weight.shape))
print("\nInput Shape : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
from mxnet.gluon import nn
conv1 = nn.Conv2D(channels=16, kernel_size=(3,3), activation="relu", padding=(1,1), layout="NHWC")
conv1.initialize()
conv2 = nn.Conv2D(channels=32, kernel_size=(3,3), activation="relu", padding=(1,1), layout="NHWC")
conv2.initialize()
preds1 = conv1(nd.random.uniform(shape=(50,28,28,1)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.weight.shape))
print("Weights of Second Conv Layer : {}".format(conv2.weight.shape))
print("\nInput Shape : {}".format((50,28,28,1)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
This ends our small tutorial explaining how we can use Gluon API of MXNet to create convolutional neural networks. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.