Updated On : Jan-05,2022 Time Investment : ~30 mins

MXNet: Guide to Create Neural Networks

MXNet is an open-source framework for developing deep neural networks. It was co-developed by Carlos Guestrin at the University of Washington. Its primary API is in Python, but it also provides APIs for other languages like Java, C++, Julia, R, Scala, etc. MXNet allows us to create neural networks that can run on CPUs as well as on GPUs. MXNet has two primary APIs for creating neural networks.

  1. Gluon API
  2. Module API

Internally, MXNet stores arrays as NDArray objects, a data structure developed by MXNet for maintaining multi-dimensional arrays. It provides some of the functions provided by numpy. As a part of this tutorial, we'll explain how to develop simple neural networks with the Gluon API of MXNet. We'll be using small toy datasets available from scikit-learn for solving simple regression and classification tasks. The main aim of the tutorial is to get newcomers started developing neural networks using MXNet.

Below we have listed important sections of the tutorial to give an overview of the material covered.

Important Sections of Tutorial

  1. Regression
    • Load Dataset
    • Normalize Data
    • Create Neural Network
    • Train Neural Network
    • Make Predictions
    • Evaluate Model Performance
    • Train Model on Batches of Data
    • Make Predictions in Batches
    • Evaluate Model Performance
  2. Classification

Installation

  • pip install mxnet
import mxnet

print("Mxnet Version : {}".format(mxnet.__version__))
Mxnet Version : 1.9.0

1. Regression

In this section, we'll explain how we can create simple neural networks using MXNet to solve simple regression tasks. We'll be using a small regression dataset available from scikit-learn for our purposes.

Load Dataset

In this section, we have loaded the Boston housing dataset available from scikit-learn. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this code needs an older scikit-learn release. We have loaded the features into variable X and the target values into variable Y. The target values are median house prices in thousands of dollars; as they are continuous, this is a regression task. We have then divided the dataset into train (80%) and test (20%) sets and converted the numpy arrays to the NDArray type from the np module of MXNet. We use the np module here because it supports statistical operations such as mean and standard deviation, which we need to normalize the data in the next section; the NDArray from the nd module lacks many of them, including standard deviation. The two types convert easily, and since our neural networks require NDArray created from the nd module, we'll convert the data during the training process.

from mxnet import nd, np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train, X_test, Y_train, Y_test = np.array(X_train), np.array(X_test), np.array(Y_train), np.array(Y_test)

samples, features = X_train.shape

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
((404, 13), (102, 13), (404,), (102,))

Normalize Data

In this section, we have normalized our train and test datasets. Normalization brings features that are on different scales and vary a lot in value to the same scale, which helps the optimization algorithm converge faster.

To normalize the data, we have first calculated the mean and standard deviation of each feature of the train data. We have then subtracted the mean from both train and test sets and, finally, divided the results by the standard deviation. Statistics from the train set are used for both sets so that no information from the test set leaks into training.

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
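
The normalization step can also be sketched without MXNet. The plain-Python example below is only an illustration under stated assumptions: the helper names fit_standardizer and standardize and the toy data are hypothetical, and the guard against a zero standard deviation is an addition that the tutorial code does not have.

```python
import statistics

def fit_standardizer(rows):
    """Return per-column means and standard deviations from the training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (ddof=0), the default of a numpy-style std().
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds

def standardize(rows, means, stds):
    """Apply (x - mean) / std column by column, reusing the training statistics."""
    return [
        # Assumption: a constant column (std == 0) is mapped to 0.0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]  # toy data, two features
test = [[2.0, 400.0]]

means, stds = fit_standardizer(train)
train_scaled = standardize(train, means, stds)
test_scaled = standardize(test, means, stds)  # test set uses the train statistics
```

After this step each training column has mean 0, while the test values are simply expressed in units of the training standard deviation.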

Create Neural Network

In this section, we'll explain how to create a neural network using MXNet. The gluon module of MXNet provides a module named nn which provides API for creating neural networks. It has available classes for creating layers of different types like linear, convolutional, lstm, etc. Below, we'll explain two different ways of creating a neural network using MXNet.

Stack Layers Sequentially

The first way of creating a neural network is using Sequential() constructor. It creates neural networks whose layers will be applied sequentially to data one after another in the order in which they were added. The Sequential object provides a method named add() that lets us add layers to our sequential neural network.

The linear layers can be created using the Dense() constructor available through the nn module. It accepts the number of units of the layer as the first parameter. We can also provide an activation to be applied to the output of the layer. Apart from units and activation, we can also provide details like the weight initializer, bias initializer, data type, a flag indicating whether to use a bias or not, etc. We can add all layers in a single add() call, or call add() more than once.

Below we have created a neural network with layer sizes [5,10,15,1] and used the rectified linear unit (relu) activation function for the hidden layers.

This way of creating a neural network is almost identical to Sequential API of keras.

from mxnet.gluon import nn

model = nn.Sequential()

model.add(
            nn.Dense(5,activation="relu"),
            nn.Dense(10,activation="relu"),
            nn.Dense(15,activation="relu"),
            nn.Dense(1)
         )

model
Sequential(
  (0): Dense(None -> 5, Activation(relu))
  (1): Dense(None -> 10, Activation(relu))
  (2): Dense(None -> 15, Activation(relu))
  (3): Dense(None -> 1, linear)
)
for w_name, w in model.collect_params().items():
    print(w_name, w.shape)
dense0_weight (5, 0)
dense0_bias (5,)
dense1_weight (10, 0)
dense1_bias (10,)
dense2_weight (15, 0)
dense2_bias (15,)
dense3_weight (1, 0)
dense3_bias (1,)
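
To make the role of a Dense layer concrete, here is a minimal plain-Python sketch of what a fully connected layer computes for one sample: activation(W x + b), with one weight row per output unit. The function names and the numbers are hypothetical and chosen only for illustration; this is not how MXNet implements the layer internally.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def dense_forward(x, weight, bias, activation=None):
    """Compute activation(weight @ x + bias) for a single sample x.

    weight holds one row per output unit, i.e. it has shape (units, in_units).
    """
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
    return activation(out) if activation else out

x = [1.0, -2.0, 0.5]                          # one sample with 3 features
weight = [[0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]   # 2 units, 3 inputs each
bias = [0.0, 0.5]

hidden = dense_forward(x, weight, bias, activation=relu)
```

The first unit produces 1.0, and the second unit's pre-activation of -0.5 is clipped to 0.0 by relu.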

Once we have created a neural network, we need to call initialize() method on it to initialize the model. Because we did not specify input sizes for the layers, MXNet defers creating the parameters until the first forward pass, when it infers the input dimensions from the data; this is why the weight shapes printed above show 0 for the input dimension. Then, we can call the model by giving input data and it'll make predictions.

Please note that MXNet neural networks require input data presented as NDArray created from the nd module. In our case, we created NDArray from the np module earlier, hence we have converted it to nd module NDArray using the method named as_nd_ndarray().

The astype() method works like the numpy astype method that can be used to convert an array from one data type to another.

model.initialize()
X_sample  = X_train[:5].as_nd_ndarray()

model(X_sample.astype("float32"))
[[3.8899941e-04]
 [1.6881044e-05]
 [5.0215105e-05]
 [9.8287455e-05]
 [4.2939623e-04]]
<NDArray 5x1 @cpu(0)>

We can access model parameters by calling method collect_params(). It returns a dictionary-like object which has information about the weights and biases of each layer of the neural network.

model.collect_params()
sequential0_ (
  Parameter dense0_weight (shape=(5, 13), dtype=float32)
  Parameter dense0_bias (shape=(5,), dtype=float32)
  Parameter dense1_weight (shape=(10, 5), dtype=float32)
  Parameter dense1_bias (shape=(10,), dtype=float32)
  Parameter dense2_weight (shape=(15, 10), dtype=float32)
  Parameter dense2_bias (shape=(15,), dtype=float32)
  Parameter dense3_weight (shape=(1, 15), dtype=float32)
  Parameter dense3_bias (shape=(1,), dtype=float32)
)
for w_name, w in model.collect_params().items():
    print(w_name, w.shape)
dense0_weight (5, 13)
dense0_bias (5,)
dense1_weight (10, 5)
dense1_bias (10,)
dense2_weight (15, 10)
dense2_bias (15,)
dense3_weight (1, 15)
dense3_bias (1,)
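
The shapes listed above follow directly from the layer sizes: each Dense layer holds in_units * units weights plus units biases. As a small plain-Python check (the helper name count_dense_parameters is hypothetical), the total number of trainable parameters for 13 input features and layers [5,10,15,1] can be computed as follows.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# 13 input features followed by the layer sizes used in this tutorial.
n_params = count_dense_parameters([13, 5, 10, 15, 1])
```

This gives 70 + 60 + 165 + 16 = 311 parameters.
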
Extend nn.Block Class to Create Neural Network

The second way of creating a neural network gives us more flexibility and control over how to execute layers. In this section, we have created a neural network by extending the nn.Block class. We define all the layers of the neural network in the __init__() method and the actual forward pass logic inside the forward() method. This gives us more control over the forward pass, as we can define it the way we want rather than simply passing data through the layers one by one.

Below we have created a neural network with layers [5,10,15,1]. We have used relu activation for all hidden layers.

This way of creating a neural network is almost identical to creating a neural network in PyTorch by extending nn.Module class.

from mxnet.gluon import nn

class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.linear1 = nn.Dense(5,activation="relu")
        self.linear2 = nn.Dense(10,activation="relu")
        self.linear3 = nn.Dense(15,activation="relu")
        self.linear4 = nn.Dense(1)

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)

        return self.linear4(x)

In the below cell, we have created our neural network by creating an instance of MLP class. We have then initialized the network by calling initialize() method on it. Then in the next cell, we have performed forward pass through sample data using our network.

model = MLP()

model
MLP(
  (linear1): Dense(None -> 5, Activation(relu))
  (linear2): Dense(None -> 10, Activation(relu))
  (linear3): Dense(None -> 15, Activation(relu))
  (linear4): Dense(None -> 1, linear)
)
model.initialize()
X_sample  = X_train[:5].as_nd_ndarray()

model(X_sample.astype("float32"))
[[-7.7166100e-05]
 [-3.7268012e-05]
 [ 2.6801605e-05]
 [ 6.1092760e-05]
 [-8.9538378e-05]]
<NDArray 5x1 @cpu(0)>
model
MLP(
  (linear1): Dense(13 -> 5, Activation(relu))
  (linear2): Dense(5 -> 10, Activation(relu))
  (linear3): Dense(10 -> 15, Activation(relu))
  (linear4): Dense(15 -> 1, linear)
)

We can retrieve the weights and biases of a layer through the weight and bias properties of the layer object; calling data() on them returns the values as NDArray.

weight1 = model.linear1.weight.data()
bias1 = model.linear1.bias.data()

print("Weight Data Type : {}".format(type(weight1)))
print("Bias   Data Type : {}".format(type(bias1)))

weight1.shape, bias1.shape
Weight Data Type : <class 'mxnet.ndarray.ndarray.NDArray'>
Bias   Data Type : <class 'mxnet.ndarray.ndarray.NDArray'>
((5, 13), (5,))

Train Neural Network

In this section, we'll explain how we can train our neural network in MXNet. We have imported all necessary modules at the beginning. First, we have set the number of epochs to 2000 and the learning rate to 0.001. We have then initialized our model by creating an instance of the MLP class we created earlier and calling initialize() method on it. We have then initialized the loss function for our regression task. We'll be using the squared error loss L2Loss and try to minimize it. Note that L2Loss computes half the squared error per sample, 0.5 * (prediction - target)^2, so the values labelled MSE in the output are half the conventional mean squared error.

In order to update model parameters during training, we need to create a Trainer object by calling the Trainer() constructor. Its first argument is the model parameters, which we obtain by calling collect_params() on the model. The second argument is either the name of an optimizer or an Optimizer instance specifying the optimizer that we'll use for updating weights. The third argument is a dictionary of parameters for the optimizer. When we specify the optimizer name as a string, MXNet internally creates the Optimizer instance for it.

After creating the Trainer instance, we execute the training loop once per epoch. Each time, we first make predictions by performing a forward pass through the model and then calculate the loss for those predictions. We perform these two steps inside the autograd.record() context manager, which records the computation so that gradients of the loss with respect to model weights/biases can be calculated. We then call backward() on the loss value to calculate the gradients. To actually update the weights, we call step() method on the Trainer instance, giving it the number of samples passed to the model (the batch size), which is used to normalize the gradients. In our case, the batch is the whole training data.

We have also printed the loss value every 100 epochs to check progress. The decreasing loss value shows that our model is fitting the training data better over time.

from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd

epochs=2000
learning_rate = 0.001

model = MLP()
model.initialize()
mse_loss = loss.L2Loss()

trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": learning_rate})

for epoch in range(1, epochs+1):
    with autograd.record():
        preds = model(X_train.as_nd_ndarray().astype("float32")) ## Make Predictions
        loss_val = mse_loss(preds.squeeze(), Y_train.as_nd_ndarray().astype("float32")) ## Calculate Loss
    loss_val.backward() ## Calculate Gradients

    loss_val = loss_val.mean().asscalar()

    trainer.step(len(X_train)) ## Update Weights

    if epoch%100==0:
        print("MSE : {:.3f}".format(loss_val))
MSE : 244.692
MSE : 124.798
MSE : 16.365
MSE : 8.616
MSE : 7.689
MSE : 7.108
MSE : 6.503
MSE : 5.989
MSE : 5.636
MSE : 5.346
MSE : 5.085
MSE : 4.861
MSE : 4.694
MSE : 4.579
MSE : 4.488
MSE : 4.426
MSE : 4.385
MSE : 4.356
MSE : 4.309
MSE : 4.267
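
The forward pass, loss, gradient and update steps of the loop above can be sketched in plain Python for a one-feature linear model. This is an illustration under assumptions, not MXNet's implementation: the function train_linear and the toy data are hypothetical, the gradients are written out by hand instead of being recorded by autograd, and the division by the batch size mimics what step(batch_size) does with the summed per-sample gradients.

```python
def train_linear(xs, ys, learning_rate=0.1, epochs=500):
    """Fit y = w * x + b with full-batch gradient descent on half squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        preds = [w * x + b for x in xs]                            # forward pass
        losses = [0.5 * (p - y) ** 2 for p, y in zip(preds, ys)]   # per-sample loss
        # Gradients of the summed loss with respect to w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        grad_b = sum(p - y for p, y in zip(preds, ys))
        # Normalize by the batch size before the SGD update.
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
        history.append(sum(losses) / n)
    return w, b, history

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

w, b, history = train_linear(xs, ys)
```

On this toy data the parameters approach w = 2 and b = 1, and the recorded loss shrinks from epoch to epoch, which is the same behaviour the printed loss values show for the neural network.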

Make Predictions

In this section, we are making predictions on train and test datasets. We can make predictions by simply calling the model object giving data to it.

train_preds = model(X_train.as_nd_ndarray().astype("float32"))

train_preds[:5]
[[47.71743 ]
 [12.079327]
 [19.68465 ]
 [27.227922]
 [17.7092  ]]
<NDArray 5x1 @cpu(0)>
test_preds = model(X_test.as_nd_ndarray().astype("float32"))

test_preds[:5]
[[22.747108]
 [27.243809]
 [42.956066]
 [21.473675]
 [29.156776]]
<NDArray 5x1 @cpu(0)>

Evaluate Model Performance

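In this section, we are evaluating the performance of our regression model by calculating the R^2 score on train and test predictions. The R^2 score (coefficient of determination) is at most 1: values near 1 indicate a good model, 0 corresponds to always predicting the mean of the targets, and the score can be negative for models that do worse than that. Below, we have calculated the R^2 score on the train and test predictions using the r2_score() function available from scikit-learn. Note that r2_score() expects the true values as its first argument and the predictions as its second; the calls below pass them in the opposite order, and because R^2 is not symmetric in its arguments, the scores shown can differ somewhat from the conventional ones. The scores suggest that our model fits the train data well and does reasonably on the test data.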

If you want to learn in-depth about r^2 score then please feel free to check our tutorial on metrics available from scikit-learn which covers it in detail.

from sklearn.metrics import r2_score

print("Test  R^2 Score : {:.2f}".format(r2_score(test_preds.squeeze().asnumpy(),Y_test.asnumpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(train_preds.squeeze().asnumpy(), Y_train.asnumpy())))
Test  R^2 Score : 0.71
Train R^2 Score : 0.89
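
For reference, the R^2 score is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between true and predicted values and SS_tot is the sum of squared differences between the true values and their mean. The plain-Python sketch below (the function name r2 and the toy values are hypothetical) shows the definition, that the score changes when the two arguments are swapped, and that it can be negative.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.5]

score = r2(y_true, y_pred)      # true values first, predictions second
swapped = r2(y_pred, y_true)    # arguments reversed: a different number
worse_than_mean = r2(y_true, [4.0, 3.0, 2.0, 1.0])  # negative score
```

Here score is 0.85, swapped is slightly different because SS_tot is computed from the other list, and the reversed predictions give a score of -3.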

Train Model on Batches of Data

Our previous training code worked on the whole training data at once. But in real life, datasets can be large and may not fit into computer memory. In those situations, we train the model on a small batch of data at a time, bringing only that many samples into main memory. We go through the whole data in batches, taking a specified number of samples at a time to train the model.

Our dataset for this example is small and fits into the main memory of the computer. In order to explain how to perform training on data in batches, we'll treat our dataset as if it does not fit into the main memory. We'll divide our dataset into batches and train the model by giving a single batch of data at a time.

Below we have designed a function that implements our logic of training a neural network in batches. The function has one main loop which runs for the given number of epochs. For each epoch, we calculate the indexes of the batches, then loop through them, taking a single batch of data and training the model on it until the whole data is covered. The training process is exactly the same as explained earlier, the only difference being that we train on one batch of data at a time rather than on the whole data. We can provide the batch size as the last parameter of the function. Note that the function uses the model and loss_func variables from the global scope rather than receiving them as parameters, and its learning_rate parameter is unused because the learning rate is already set in the Trainer object.

def TrainModelInBatches(trainer, X, Y, learning_rate, epochs, batch_size=32):
    for i in range(epochs):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices

        losses = [] ## Record loss of each batch
        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            with autograd.record():
                preds = model(X_batch.as_nd_ndarray().astype("float32")) ## Make Predictions
                loss_val = loss_func(preds.squeeze(), Y_batch.as_nd_ndarray().astype("float32")) ## Calculate Loss

            loss_val.backward() ## Calculate Gradients

            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)

            trainer.step(len(X_batch)) ## Update weights

        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(np.array(losses).mean()))
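
The batch index logic can be illustrated on its own in plain Python. The helper batch_bounds below is hypothetical and is a slightly different formulation from the function above: it steps through the data with range(), which also avoids an empty final batch when the number of samples is an exact multiple of the batch size (with (n // batch_size) + 1 batches, the last slice would contain no rows in that case).

```python
def batch_bounds(n_samples, batch_size):
    """Return (start, end) index pairs that cover range(n_samples) in order."""
    return [
        (start, min(start + batch_size, n_samples))
        for start in range(0, n_samples, batch_size)
    ]

bounds = batch_bounds(404, 32)  # 404 training samples, as in this tutorial
even = batch_bounds(64, 32)     # sample count divisible by the batch size
```

For 404 samples this yields 13 batches, the last one covering rows 384 to 403; for 64 samples it yields exactly two full batches.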

Below, we are actually training our neural network in batches. We have set the number of epochs to 500, the learning rate to 0.001, and the batch size to 32. We have then created our neural network and initialized its weights by calling initialize() method on it. We have then created an L2Loss() loss function. Then, we have created a Trainer object which holds the optimizer details and will be used to update the weights of the neural network.

Finally, we have called the training function defined in the previous cell with the specified parameters. The loss value printed every 100 epochs keeps decreasing, which shows that the model is learning.

from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd

epochs=500
learning_rate = 0.001
batch_size=32

model = MLP()
model.initialize()
loss_func = loss.L2Loss()

trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": learning_rate})

TrainModelInBatches(trainer, X_train, Y_train, learning_rate, epochs, batch_size=batch_size)
MSE : 289.83
MSE : 6.11
MSE : 5.51
MSE : 4.94
MSE : 4.67

Make Predictions in Batches

In this section, we are making predictions on train and test datasets. Since we are assuming that the whole data can't be brought into main memory, we have designed a function that makes predictions in batches. It uses the same logic to calculate the indexes of batches as the training function earlier and returns a list of per-batch predictions, which we then combine using nd.concatenate().

def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(X_test.as_nd_ndarray().astype("float32"))

test_preds = nd.concatenate(test_preds).squeeze()

train_preds = MakePredictions(X_train.as_nd_ndarray().astype("float32"))

train_preds = nd.concatenate(train_preds).squeeze()

test_preds[:5], train_preds[:5]
(
 [25.190327 25.724134 43.929695 22.621965 30.559372]
 <NDArray 5 @cpu(0)>,

 [46.939785 11.804449 21.221048 25.186422 17.19573 ]
 <NDArray 5 @cpu(0)>)
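
The same pattern can be written so that the model is passed in as an argument instead of being read from a global variable. The plain-Python sketch below is a hypothetical variant, not the tutorial's function: predict_in_batches accepts any callable that maps a batch to a list of outputs, and double is only a stand-in for a trained model.

```python
def predict_in_batches(predict_fn, data, batch_size=32):
    """Call predict_fn on consecutive slices of data and join the results."""
    outputs = []
    for start in range(0, len(data), batch_size):
        outputs.extend(predict_fn(data[start:start + batch_size]))
    return outputs

def double(batch):
    # Stand-in for a trained model: any callable from batch to list works here.
    return [2 * x for x in batch]

data = list(range(70))
preds = predict_in_batches(double, data, batch_size=32)
```

The results come back in the original order, so they line up with the targets when computing metrics.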

Evaluate Model Performance

In this section, we have evaluated the performance of our neural network, which was trained on the train data in batches, by calculating the R^2 score on train and test predictions. The train score is near 1 and the test score is somewhat lower, which indicates that our model is doing a decent job of prediction.

from sklearn.metrics import r2_score

print("Test  R^2 Score : {:.2f}".format(r2_score(test_preds.squeeze().asnumpy(),Y_test.asnumpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(train_preds.squeeze().asnumpy(), Y_train.asnumpy())))
Test  R^2 Score : 0.71
Train R^2 Score : 0.88

2. Classification

In this section, we'll explain how we can create a simple neural network using MXNet to solve classification tasks. We'll be using the breast cancer dataset available from scikit-learn for our explanation purposes. The majority of the code in this section is repeated from the regression section with a few minor changes. Hence, we won't include a detailed description of parts that are repeated here. We'll only include descriptions when there is something new to explain.

Load Dataset

In this section, we have loaded the breast cancer dataset available from scikit-learn. The target values of the dataset are either 0, indicating a malignant tumor, or 1, indicating a benign tumor. The features (independent variables) of the dataset are various measurements of tumors. As the dataset has two outcomes, this is a binary classification problem. After loading the dataset, we have divided it into train (80%) and test (20%) sets.

from mxnet import nd, np
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_breast_cancer(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train, X_test, Y_train, Y_test = np.array(X_train), np.array(X_test), np.array(Y_train), np.array(Y_test)

samples, features = X_train.shape
classes = np.unique(np.array(Y))

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
((455, 30), (114, 30), (455,), (114,))
samples, features, classes
(455, 30, array([0, 1], dtype=int64))

Normalize Data

In this section, we have normalized our train and test datasets by using mean and standard deviations calculated on the training dataset.

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

Create Neural Network

In this section, we have created our neural network by extending the nn.Block class like we did in the regression section. The network is exactly the same as the one from the regression section, with the only difference that the activation for the last layer is sigmoid. The sigmoid function maps any real value into the range (0, 1), so the output can be read as the probability of class 1. After creating the model, we have initialized it and performed one forward pass through the network to verify that it produces predictions.

from mxnet.gluon import nn

class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.linear1 = nn.Dense(5,activation="relu")
        self.linear2 = nn.Dense(10,activation="relu")
        self.linear3 = nn.Dense(15,activation="relu")
        self.linear4 = nn.Dense(1, activation="sigmoid")

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)

        return self.linear4(x)

model = MLP()

model
MLP(
  (linear1): Dense(None -> 5, Activation(relu))
  (linear2): Dense(None -> 10, Activation(relu))
  (linear3): Dense(None -> 15, Activation(relu))
  (linear4): Dense(None -> 1, Activation(sigmoid))
)
model.initialize()

model(X_train[:5].as_nd_ndarray().astype("float32"))
[[0.50000834]
 [0.50001526]
 [0.50000095]
 [0.5000808 ]
 [0.5000067 ]]
<NDArray 5x1 @cpu(0)>
model
MLP(
  (linear1): Dense(30 -> 5, Activation(relu))
  (linear2): Dense(5 -> 10, Activation(relu))
  (linear3): Dense(10 -> 15, Activation(relu))
  (linear4): Dense(15 -> 1, Activation(sigmoid))
)
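
As a quick reference for the last layer's activation, here is a small plain-Python sketch of the sigmoid function, 1 / (1 + exp(-z)). The two-branch form is a common way to avoid overflow for large negative inputs and is an implementation choice of this sketch, not something taken from MXNet.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) cannot overflow
    return ez / (1.0 + ez)

probabilities = [sigmoid(z) for z in (-5.0, -1e-4, 0.0, 1e-4, 5.0)]
```

An input of exactly 0 maps to 0.5 and inputs close to 0 map to values close to 0.5, which is the pattern seen in the outputs of the freshly initialized network above.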

Train Model

In this section, we have included logic that trains our model. We have set the number of epochs to 500 and the learning rate to 0.001. We have then created a model and initialized it. The loss function that we have used for the binary classification task is SigmoidBCELoss (sigmoid binary cross-entropy loss). We have set from_sigmoid=True because our network already applies the sigmoid activation in its last layer, so the loss receives probabilities rather than raw scores. In our Trainer object, we are using the Adam optimizer this time. The rest of the training code is the same as the code from the regression section. The loss values printed every 100 epochs decrease steadily, which shows that the model is learning.

from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd

epochs=500
learning_rate = 0.001

model = MLP()
model.initialize()
loss_func = loss.SigmoidBCELoss(from_sigmoid=True) # loss.LogisticLoss(label_format="binary")

trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})

for epoch in range(1, epochs+1):
    with autograd.record():
        preds = model(X_train.as_nd_ndarray().astype("float32")) ## Make Predictions
        loss_val = loss_func(preds, Y_train.as_nd_ndarray().astype("float32")) ## Calculate Loss

    loss_val.backward() ## Calculate Gradients

    loss_val = loss_val.mean().asscalar()

    trainer.step(len(X_train)) ## Update Model Weights

    if epoch%100==0:
        print("BCE : {:.3f}".format(loss_val))
BCE : 0.392
BCE : 0.255
BCE : 0.095
BCE : 0.054
BCE : 0.038
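
For a predicted probability p and a 0/1 label y, binary cross-entropy is -(y * log(p) + (1 - y) * log(1 - p)). The plain-Python sketch below illustrates the formula for single samples. The function name is hypothetical, and the small eps added inside the logarithms to keep them finite is an assumption of this sketch rather than a documented detail of SigmoidBCELoss.

```python
import math

def binary_cross_entropy(prob, label, eps=1e-12):
    """Loss for one sample given a predicted probability and a 0/1 label."""
    # eps keeps log() finite when prob is exactly 0 or 1 (assumed value).
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

untrained = binary_cross_entropy(0.5, 1)         # model that is undecided
confident_right = binary_cross_entropy(0.99, 1)  # confident and correct
confident_wrong = binary_cross_entropy(0.99, 0)  # confident and wrong
```

A model that outputs 0.5 for every sample has a loss of ln 2, about 0.693; confident correct predictions push the loss toward 0, while confident wrong ones are penalized heavily.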

Make Predictions

In this section, we are making predictions on our train and test datasets using our trained model. As the output of our model is in the range [0,1] due to the sigmoid activation function, we need to convert these float values to the actual prediction class (0 or 1). We have done that by setting a threshold at 0.5, predicting values greater than the threshold as class 1 and all other values as class 0.

train_preds = model(X_train.as_nd_ndarray().astype("float32"))

train_preds = (train_preds > 0.5).astype("float32")

train_preds[:5]
[[1.]
 [1.]
 [0.]
 [0.]
 [1.]]
<NDArray 5x1 @cpu(0)>
test_preds = model(X_test.as_nd_ndarray().astype("float32"))

test_preds = (test_preds > 0.5).astype("float32")

test_preds[:5]
[[0.]
 [0.]
 [1.]
 [1.]
 [1.]]
<NDArray 5x1 @cpu(0)>
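
The thresholding step is simple enough to show in plain Python. The helper to_class_labels is hypothetical; it uses the same strict greater-than comparison as the code above, so a probability of exactly 0.5 is assigned to class 0.

```python
def to_class_labels(probabilities, threshold=0.5):
    """Map probabilities to 0/1 labels using a strict greater-than comparison."""
    return [1 if p > threshold else 0 for p in probabilities]

labels = to_class_labels([0.02, 0.5, 0.51, 0.97])
stricter = to_class_labels([0.02, 0.5, 0.51, 0.97], threshold=0.9)
```

Raising the threshold makes the model predict class 1 only when it is more confident, which trades recall for precision on that class.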

Evaluate Model Performance

In this section, we have evaluated the performance of our binary classification model by calculating the accuracy of train and test predictions. We have also printed a classification report on the test dataset which has information like precision, recall, and f1-score.

Please feel free to check our tutorial explaining metrics from scikit-learn to know about classification reports and accuracy score functions.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
print("Test  Accuracy : {:.2f}".format(accuracy_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
Train Accuracy : 0.99
Test  Accuracy : 0.99
from sklearn.metrics import classification_report

print("Classification Report :")
print(classification_report(Y_test.asnumpy(), test_preds.squeeze().asnumpy()))
Classification Report :
              precision    recall  f1-score   support

           0       0.98      1.00      0.99        42
           1       1.00      0.99      0.99        72

    accuracy                           0.99       114
   macro avg       0.99      0.99      0.99       114
weighted avg       0.99      0.99      0.99       114
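
The numbers in a classification report can be reproduced from simple counts: precision for a class is TP / (TP + FP), recall is TP / (TP + FN), and accuracy is the share of correct predictions. The plain-Python sketch below uses hypothetical labels whose counts are inferred from the rounded report above (42 samples of class 0, 72 of class 1, one class-1 sample predicted as class 0); they are not taken from the actual predictions.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (precision, recall) for the class `positive`, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    # Assumes the class is predicted and present at least once (no zero division).
    return tp / (tp + fp), tp / (tp + fn), correct / len(pairs)

# Hypothetical labels consistent with the rounded report, for illustration only.
y_true = [0] * 42 + [1] * 72
y_pred = [0] * 42 + [0] + [1] * 71

precision_0, recall_0, accuracy = classification_metrics(y_true, y_pred, positive=0)
precision_1, recall_1, _ = classification_metrics(y_true, y_pred, positive=1)
```

With these counts, precision for class 0 is 42/43 and recall for class 1 is 71/72, which round to the 0.98 and 0.99 shown in the report.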

Train Model on Batches of Data

In this section, we have included the function to train the model on batches of data. The code for this function is almost exactly the same as the function we had used in the regression section.

def TrainModelInBatches(trainer, X, Y, learning_rate, epochs, batch_size=32):
    for i in range(epochs):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices

        losses = [] ## Record loss of each batch
        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            with autograd.record():
                preds = model(X_batch.as_nd_ndarray().astype("float32")) ## Make Predictions
                loss_val = loss_func(preds.squeeze(), Y_batch.as_nd_ndarray().astype("float32")) ## Calculate Loss

            loss_val.backward() ## Calculate Gradients

            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)

            trainer.step(len(X_batch)) ## Update Model Weights

        if i % 100 == 0: ## Print BCE every 100 epochs
            print("BCE : {:.2f}".format(np.array(losses).mean()))

Below, we are training our neural network by giving data in batches. We have set the number of epochs to 200, the learning rate to 0.001, and the batch size to 32. We have then created the model and initialized its weights. We have then initialized the loss function and the Trainer object. Finally, we have called our function from the previous cell to perform training in batches.

from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd

epochs=200
learning_rate = 0.001
batch_size=32

model = MLP()
model.initialize()
loss_func = loss.SigmoidBCELoss(from_sigmoid=True)

trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})

TrainModelInBatches(trainer, X_train, Y_train, learning_rate, epochs, batch_size=batch_size)
BCE : 0.69
BCE : 0.03

Make Predictions in Batches

In this section, we have made predictions on train and test sets in batches. We have copied the same function we had used in the regression section for prediction in batches.

def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(X_test.as_nd_ndarray().astype("float32"))

test_preds = nd.concatenate(test_preds).squeeze()

test_preds = (test_preds > 0.5).astype("float32")

train_preds = MakePredictions(X_train.as_nd_ndarray().astype("float32"))

train_preds = nd.concatenate(train_preds).squeeze()

train_preds = (train_preds > 0.5).astype("float32")

test_preds[:5], train_preds[:5]
(
 [0. 0. 1. 1. 1.]
 <NDArray 5 @cpu(0)>,

 [1. 1. 0. 0. 1.]
 <NDArray 5 @cpu(0)>)

Evaluate Model Performance

In this section, we have evaluated the performance of our model by calculating the accuracy of train and test predictions. We have also calculated a classification report on test predictions.

from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
print("Test  Accuracy : {:.2f}".format(accuracy_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
Train Accuracy : 1.00
Test  Accuracy : 0.96
from sklearn.metrics import classification_report

print("Classification Report :")
print(classification_report(Y_test.asnumpy(), test_preds.squeeze().asnumpy()))
Classification Report :
              precision    recall  f1-score   support

           0       0.95      0.95      0.95        42
           1       0.97      0.97      0.97        72

    accuracy                           0.96       114
   macro avg       0.96      0.96      0.96       114
weighted avg       0.96      0.96      0.96       114

Sunny Solanki
