Updated On : Jan-11,2022 Time Investment : ~30 mins

MXNet: Convolutional Neural Networks (CNN)

Convolutional neural networks (CNNs) are a special kind of neural network that apply convolution operations to the input data to find patterns in it. A CNN commonly has one or more convolution layers followed by linear layers, and sometimes includes batch normalization and max-pooling layers after the convolution layers. CNNs are commonly used for tasks related to visual imagery like object detection, image segmentation, and image classification, and they are now also applied to NLP and time-series problems. One of the main advantages of a CNN is that it has far fewer parameters to train compared to a deep fully connected neural network. If you want to learn about CNNs in detail, please check our blog post covering the theory behind them.

As a part of this tutorial, we'll be creating a small CNN using the Gluon API of MXNet. The main aim of the tutorial is to get individuals started with developing CNNs using MXNet. We have already created another tutorial on MXNet that explains how to create fully connected neural networks using the Gluon API; please feel free to check it out, as it also provides background for this tutorial.

Below, we have listed important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. Simple Convolutional Neural Network
    • Load Dataset
    • Create CNN Model
    • Train Model
    • Make Predictions
    • Evaluate Model Performance
  2. Guide to Handle Channels First vs Channels Last
import mxnet

print("Mxnet Version : {}".format(mxnet.__version__))
Mxnet Version : 1.8.0

1. Simple Convolutional Neural Network

In this section, we'll explain how to create a simple convolutional neural network using MXNet. We'll be using the Fashion MNIST dataset available from keras for our purpose.

Load Dataset

In this section, we have loaded the Fashion MNIST dataset available from keras. The dataset has grayscale images of 10 fashion items, each of (28,28) pixels. It is already divided into train (60k images) and test (10k images) sets. After loading the datasets, we have converted them from numpy arrays to MXNet NDArrays using the array() method of the nd sub-module. We have also reshaped the original images, adding one extra dimension at the beginning ((n,28,28) -> (n,1,28,28)) to represent the channel. As our images are grayscale, they have only one channel; RGB images have 3 channels. This extra channel dimension is required by convolution layers because they transform the channel dimension, as we'll see below.

After reshaping the image datasets, we have also divided the images by the float value 255. This brings all values into the range [0,1], which helps the optimization algorithm converge faster. By default, images are represented as integers in the range [0,255].

from tensorflow import keras

from mxnet import nd
import numpy as np

(X_train, Y_train), (X_test, Y_test) = keras.datasets.fashion_mnist.load_data()

X_train, X_test, Y_train, Y_test = nd.array(X_train, dtype=np.float32),\
                                   nd.array(X_test, dtype=np.float32),\
                                   nd.array(Y_train, dtype=np.float32),\
                                   nd.array(Y_test, dtype=np.float32)

X_train, X_test = X_train.reshape(-1,1,28,28), X_test.reshape(-1,1,28,28)

X_train, X_test = X_train/255.0, X_test/255.0

classes =  np.unique(Y_train.asnumpy())

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 0us/step
40960/29515 [=========================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 0s 0us/step
26435584/26421880 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
16384/5148 [===============================================================================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 0s 0us/step
4431872/4422102 [==============================] - 0s 0us/step
((60000, 1, 28, 28), (10000, 1, 28, 28), (60000,), (10000,))
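For reference, the 10 integer labels of Fashion MNIST correspond to the following item names (this mapping comes from the dataset's documentation and is only listed here to help interpret predictions; the code below keeps working with integer labels).

fashion_mnist_classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
                         "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]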

Create CNN Model

In this section, we have created a CNN using Gluon API of MXNet. We have created a CNN by creating a class that extends nn.Block class. We have defined two methods inside the class.

  1. __init__() - In this method, we have defined all the layers that we'll be using in our neural network.
  2. forward() - In this method, we have performed a forward pass through our neural network. We apply the layers defined in __init__() in the sequence in which we want them applied to the input data. At last, we return the predictions.

Our neural network consists of two convolution layers. The first convolution layer has 32 output channels and a kernel size of (3,3). The second convolution layer has 16 output channels and a kernel size of (3,3). We have created both convolution layers using the Conv2D() constructor available through the 'nn' submodule of the mxnet.gluon module, and we have specified the relu (rectified linear unit) activation function for the output of both of them. Our forward() method first applies these two convolution layers to the input data. It then flattens the output of the second convolution layer and feeds it to the linear layer. The linear layer has 10 output units, the same as the number of unique target classes. At last, we apply the softmax() activation to the output of the linear layer. It maps the outputs of the linear layer to probabilities in the range [0,1] such that the sum of all 10 values per sample is 1.

Our input has a shape of (n_samples,1,28,28). The first convolution layer will transform the input from shape (n_samples,1,28,28) to (n_samples,32,28,28). The second convolution layer will transform the data from shape (n_samples,32,28,28) to (n_samples,16,28,28). The flatten layer will flatten the output of the second convolution layer, transforming the data shape from (n_samples,16,28,28) to (n_samples, 16 x 28 x 28) = (n_samples, 12544). The linear layer will transform the data shape from (n_samples, 12544) to (n_samples, 10).
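The spatial size stays (28,28) through both convolution layers because we use a (3,3) kernel with (1,1) padding and stride 1. As a quick sanity check (not part of the original code), the standard output-size formula for a convolution can be evaluated directly:

## Output spatial size of a convolution (stride 1 assumed here)
h_in, kernel, padding, stride = 28, 3, 1, 1
h_out = (h_in + 2*padding - kernel)//stride + 1
print(h_out)   ## 28 -> height/width are preserved, only the channel count changes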

After defining the CNN class, we have created the network by creating an instance of the class. Then we have initialized the network weights by calling the initialize() method on it and performed a forward pass through the network with a few data samples for verification.

Then, in the below cell, we have retrieved the model weights using the collect_params() method and printed the shapes of the weights and biases.

from mxnet.gluon import nn

class CNN(nn.Block):
    def __init__(self, **kwargs):
        super(CNN, self).__init__(**kwargs)
        self.conv1 = nn.Conv2D(channels=32, kernel_size=(3,3), activation="relu", padding=(1,1))
        self.conv2 = nn.Conv2D(channels=16, kernel_size=(3,3), activation="relu", padding=(1,1))
        self.flatten = nn.Flatten()
        self.linear = nn.Dense(len(classes))

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)

        x = self.flatten(x)
        x = self.linear(x)
        return nd.softmax(x)

model = CNN()

model
CNN(
  (conv1): Conv2D(None -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (conv2): Conv2D(None -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (flatten): Flatten
  (linear): Dense(None -> 10, linear)
)
model.initialize()

preds = model(X_train[:5])

preds
[[0.10568322 0.10078681 0.11555998 0.08337371 0.10345909 0.10854454
  0.09054128 0.09666862 0.09255377 0.10282896]
 [0.11884566 0.10166357 0.10785108 0.08734033 0.10306055 0.09939245
  0.09628883 0.08728751 0.08772776 0.11054229]
 [0.10358198 0.09399475 0.10747004 0.09911988 0.10435332 0.09818961
  0.09988517 0.09489598 0.09259309 0.1059161 ]
 [0.10664859 0.09304286 0.10677639 0.09555829 0.10678992 0.09574397
  0.09915031 0.09263102 0.09256649 0.1110922 ]
 [0.10790703 0.08977602 0.11213429 0.09285845 0.11377399 0.09814226
  0.0981866  0.08502316 0.09123085 0.11096737]]
<NDArray 5x10 @cpu(0)>
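As a quick optional check (not part of the original output), we can verify that softmax() produced valid probability distributions; each row of predictions should sum to approximately 1.

preds.sum(axis=1)   ## Should be ~1.0 for each of the 5 samples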
model
CNN(
  (conv1): Conv2D(1 -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (conv2): Conv2D(32 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), Activation(relu))
  (flatten): Flatten
  (linear): Dense(12544 -> 10, linear)
)
for layer_name, weights in model.collect_params().items():
    print("{} - {}".format(layer_name,weights.shape))
conv0_weight - (32, 1, 3, 3)
conv0_bias - (32,)
conv1_weight - (16, 32, 3, 3)
conv1_bias - (16,)
dense0_weight - (10, 12544)
dense0_bias - (10,)
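Based on the shapes above, we can also count the total number of trainable parameters, which comes to 130,394 for this network (most of them sit in the final dense layer). The small snippet below is an optional addition that computes this count using collect_params().

total_params = sum(p.data().size for p in model.collect_params().values())
print("Total Trainable Parameters : {}".format(total_params))   ## 130394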

Train Model

In this section, we are training the CNN we defined in the previous cells. We have created a small function that takes a few things as input and performs training. The function takes a Trainer object, data features, target values, number of epochs, and batch size as input. It then executes the training loop for the given number of epochs. We define the Trainer object in the next cell, where we call this function.

For each epoch, it calculates the start and end indexes of the batches of data and then loops through the data in batches. For each batch, it performs a forward pass through the model by giving the batch data to it. The batch predictions are recorded and the loss value is calculated using the actual target values along with the predictions. These two operations (forward pass and loss calculation) are performed inside the autograd.record() context manager, which records the operations so that gradients can be calculated. After the loss calculation, we call the backward() method on the loss to calculate gradients and record the loss of the batch. Then, we call the step() function on the Trainer object, which updates the weights of the CNN using the gradients and the learning rate. We also print the loss value at every epoch to check the model's progress.

from mxnet import autograd

def TrainModelInBatches(trainer, X, Y, epochs, batch_size=32):
    for i in range(1, epochs+1):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices

        losses = [] ## Record loss of each batch
        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            with autograd.record():
                preds = model(X_batch) ## Forward pass to make predictions
                loss_val = loss_func(preds.squeeze(), Y_batch) ## Calculate Loss
            loss_val.backward() ## Calculate Gradients

            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)

            trainer.step(len(X_batch)) ## Update weights

        print("CrossEntropyLoss : {:.3f}".format(np.array(losses).mean()))

In the below cell, we are calling the training function designed in the previous cell to perform training. We have set the number of epochs to 25, the learning rate to 0.001, and the batch size to 256. Then, we have created the CNN and initialized its weights. After that, we have defined the loss function that we'll use: SoftmaxCrossEntropyLoss(), a cross-entropy loss for multi-class classification problems.

Then, we have initialized the Trainer object, which holds the details of the optimizer. We have given the model parameters to the trainer using the collect_params() method and instructed it to use the Adam optimizer. The third parameter of the Trainer object is a dictionary of parameters that will be passed to the optimizer.

After initializing the Trainer object, we have called our training function with the necessary parameters to perform training. We can notice from the loss value printed at every epoch that our model seems to be doing a good job.

from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd

batch_size=256
epochs=25
learning_rate = 0.001

model = CNN()
model.initialize()
loss_func = loss.SoftmaxCrossEntropyLoss()

trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})

TrainModelInBatches(trainer, X_train, Y_train, epochs, batch_size=batch_size)
CrossEntropyLoss : 1.725
CrossEntropyLoss : 1.649
CrossEntropyLoss : 1.635
CrossEntropyLoss : 1.630
CrossEntropyLoss : 1.625
CrossEntropyLoss : 1.621
CrossEntropyLoss : 1.618
CrossEntropyLoss : 1.616
CrossEntropyLoss : 1.614
CrossEntropyLoss : 1.612
CrossEntropyLoss : 1.610
CrossEntropyLoss : 1.609
CrossEntropyLoss : 1.608
CrossEntropyLoss : 1.599
CrossEntropyLoss : 1.568
CrossEntropyLoss : 1.560
CrossEntropyLoss : 1.555
CrossEntropyLoss : 1.552
CrossEntropyLoss : 1.549
CrossEntropyLoss : 1.548
CrossEntropyLoss : 1.545
CrossEntropyLoss : 1.543
CrossEntropyLoss : 1.540
CrossEntropyLoss : 1.539
CrossEntropyLoss : 1.537

Make Predictions

In this section, we are making predictions on the train and test datasets using our trained model. We have created a small function that loops through the data in batches and makes predictions one batch at a time. It then combines the predictions of all batches and returns them.

As the output of our model is 10 probabilities per sample, we need to convert it to a single target class prediction. We have done that by retrieving the index of the highest of the 10 probabilities using the argmax() function; that index is the predicted target class.

def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(X_test,batch_size=256)

test_preds = nd.concatenate(test_preds).squeeze()

test_preds = test_preds.asnumpy().argmax(axis=1)

train_preds = MakePredictions(X_train,batch_size=256)

train_preds = nd.concatenate(train_preds).squeeze()

train_preds = train_preds.asnumpy().argmax(axis=1)

test_preds[:5], train_preds[:5]
(array([9, 2, 1, 1, 6]), array([9, 0, 0, 3, 1]))

Evaluate Model Performance

In this section, we have evaluated the performance of our model by calculating the accuracy of the train and test predictions. We have also generated a classification report for the test predictions, which includes precision, recall, and f1-score per target class. Both the accuracy and the classification report are calculated using functions available from scikit-learn.

If you want to learn about the various methods available through scikit-learn for calculating ML metrics, please feel free to check our tutorial on that topic, which covers the majority of them in detail.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train.asnumpy(), train_preds)))
print("Test  Accuracy : {:.3f}".format(accuracy_score(Y_test.asnumpy(), test_preds)))
Train Accuracy : 0.925
Test  Accuracy : 0.895
from sklearn.metrics import classification_report

print("Test Classification Report ")
print(classification_report(Y_test.asnumpy(), test_preds))
Test Classification Report
              precision    recall  f1-score   support

         0.0       0.82      0.86      0.84      1000
         1.0       0.98      0.98      0.98      1000
         2.0       0.85      0.81      0.83      1000
         3.0       0.88      0.91      0.90      1000
         4.0       0.80      0.87      0.84      1000
         5.0       0.97      0.97      0.97      1000
         6.0       0.74      0.65      0.69      1000
         7.0       0.96      0.96      0.96      1000
         8.0       0.97      0.97      0.97      1000
         9.0       0.96      0.96      0.96      1000

    accuracy                           0.90     10000
   macro avg       0.89      0.90      0.89     10000
weighted avg       0.89      0.90      0.89     10000

2. Guide to Handle Channels First vs Channels Last

The convolution layers of a CNN work on the channels of the input data and transform them. The number of channels differs based on the image type (grayscale, RGB, RGBA, CMYK, etc.). Images are represented in computers using multi-dimensional arrays, commonly referred to as tensors by the majority of ML libraries. In those tensors, the channel information can be laid out in two ways.

  1. Channels First - Here, we represent an RGB image of (28,28) pixels as (3,28,28).
  2. Channels Last - Here, we represent an RGB image of (28,28) pixels as (28,28,3).

When we loaded our grayscale images above, we introduced an extra channel dimension at the beginning, transforming the image shape from (n_samples,28,28) to (n_samples,1,28,28). By default, the convolution layer created through the Conv2D() constructor expects the channel dimension to come first and works with such images directly. We have created an example in the below cell to show how the convolution layers transform the images and their channels in this default layout.

But what if the channel dimension comes last in our images? How can we handle that situation?

Fortunately, the Conv2D() constructor available through MXNet lets us easily handle this situation by specifying the layout of the data through the layout parameter. The layout is given as a 4-character string.

  • 'NCHW' - (n_samples, channels, height, width) - This is the default value.
  • 'NHWC' - (n_samples, height, width, channels) - Here we instruct the layer to expect channels at the end.

We have demonstrated how to handle the channels-last situation in the second example cell below.

from mxnet.gluon import nn

conv1 = nn.Conv2D(channels=16, kernel_size=(3,3), activation="relu", padding=(1,1))
conv1.initialize()

conv2 = nn.Conv2D(channels=32, kernel_size=(3,3), activation="relu", padding=(1,1))
conv2.initialize()

preds1 = conv1(nd.random.uniform(shape=(50,1,28,28)))
preds2 = conv2(preds1)

print("Weights of First Conv Layer : {}".format(conv1.weight.shape))
print("Weights of Second Conv Layer : {}".format(conv2.weight.shape))

print("\nInput Shape               : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
Weights of First Conv Layer : (16, 1, 3, 3)
Weights of Second Conv Layer : (32, 16, 3, 3)

Input Shape               : (50, 1, 28, 28)
Conv Layer 1 Output Shape : (50, 16, 28, 28)
Conv Layer 2 Output Shape : (50, 32, 28, 28)
from mxnet.gluon import nn

conv1 = nn.Conv2D(channels=16, kernel_size=(3,3), activation="relu", padding=(1,1), layout="NHWC")
conv1.initialize()

conv2 = nn.Conv2D(channels=32, kernel_size=(3,3), activation="relu", padding=(1,1), layout="NHWC")
conv2.initialize()

preds1 = conv1(nd.random.uniform(shape=(50,28,28,1)))
preds2 = conv2(preds1)

print("Weights of First Conv Layer : {}".format(conv1.weight.shape))
print("Weights of Second Conv Layer : {}".format(conv2.weight.shape))

print("\nInput Shape               : {}".format((50,28,28,1)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
Weights of First Conv Layer : (16, 3, 3, 1)
Weights of Second Conv Layer : (32, 3, 3, 16)

Input Shape               : (50, 28, 28, 1)
Conv Layer 1 Output Shape : (50, 28, 28, 16)
Conv Layer 2 Output Shape : (50, 28, 28, 32)
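If we prefer to keep the default 'NCHW' layout of Conv2D(), another common option (not covered in the original cells) is to transpose channels-last data to channels-first before feeding it to the network. Below is a minimal sketch of that approach using nd.transpose().

imgs_nhwc = nd.random.uniform(shape=(50,28,28,1))     ## Channels last
imgs_nchw = nd.transpose(imgs_nhwc, axes=(0,3,1,2))   ## Reorder to channels first
print(imgs_nchw.shape)                                ## (50, 1, 28, 28)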

This ends our small tutorial explaining how we can use the Gluon API of MXNet to create convolutional neural networks. Please feel free to let us know your views in the comments section.
