MXNet is an open-source framework for developing deep neural networks. It was co-developed by Carlos Guestrin at the University of Washington. Its primary API is Python, but it also provides APIs for other languages like Java, C++, Julia, R, Scala, etc. MXNet lets us create neural networks that can run on CPUs as well as on GPUs. MXNet has two primary APIs for creating neural networks: the imperative Gluon API and the symbolic Module API; this tutorial uses Gluon.
Internally, MXNet maintains arrays as NDArray objects, a data structure developed by MXNet for multi-dimensional arrays. It supports many of the operations that numpy provides. As a part of this tutorial, we'll explain how to develop simple neural networks with the Gluon API of MXNet. We'll be using small toy datasets available from scikit-learn to solve simple regression and classification tasks. The main aim of the tutorial is to get newcomers started with developing neural networks using MXNet.
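Before diving in, here is a small illustrative sketch (the values are arbitrary and this cell is not part of the main example) of how an NDArray created through the nd module supports a few numpy-like operations:

from mxnet import nd

arr = nd.array([[1., 2., 3.], [4., 5., 6.]])  ## create a 2x3 NDArray from a python list

print(arr.shape)      ## (2, 3)
print(arr.mean())     ## numpy-like reduction, returns a 1-element NDArray
print(arr.asnumpy())  ## convert the NDArray to a regular numpy array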
Below we have listed important sections of the tutorial to give an overview of the material covered.
import mxnet
print("Mxnet Version : {}".format(mxnet.__version__))
In this section, we'll explain how we can create simple neural networks using MXNet to solve simple regression tasks. We'll be using a small regression dataset available from scikit-learn for our purposes.
In this section, we have loaded the Boston housing dataset available from scikit-learn. We have loaded the data features in variable X and the target values in variable Y. The target values are median house prices in 1000s of dollars. As the target values are continuous, this is a regression task. We have then divided the dataset into train (80%) and test (20%) sets and converted the numpy arrays to NDArray type from the np module of MXNet. We use the np module here because it lets us perform statistical operations like mean and standard deviation, which we need for normalizing the data in the next section; the NDArray from the nd module does not provide many of these statistical operations. We can easily convert NDArrays between the np and nd modules, and since our neural networks require NDArrays created from the nd module as input, we'll convert them during the training process.
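To make the np vs nd distinction concrete, below is a small illustrative sketch (the values are arbitrary and this cell is not part of the main example) showing that np module arrays provide statistical methods like std() and can be converted to and from the nd module:

from mxnet import nd, np

arr_np = np.array([[1., 2.], [3., 4.]])          ## NDArray from the np module
print(arr_np.mean(axis=0), arr_np.std(axis=0))   ## statistical operations are available here

arr_nd = arr_np.as_nd_ndarray()                  ## convert np module NDArray to nd module NDArray
arr_back = arr_nd.as_np_ndarray()                ## and convert back again
print(type(arr_nd), type(arr_back))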
from mxnet import nd, np
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
X_train, X_test, Y_train, Y_test = np.array(X_train), np.array(X_test), np.array(Y_train), np.array(Y_test)
samples, features = X_train.shape
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
In this section, we have normalized our train and test datasets. Data normalization is performed so that features which are on different scales and vary widely in value can be brought to the same scale. This helps the optimization function converge faster.
In order to normalize the data, we have first calculated the mean and standard deviation of the features of the train data. We have then subtracted the mean from both the train and test sets and, at last, divided the result by the standard deviation.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
In this section, we'll explain how to create a neural network using MXNet. The gluon module of MXNet has a sub-module named nn which provides the API for creating neural networks. It has classes for creating layers of different types like linear (dense), convolutional, LSTM, etc. Below, we'll explain two different ways of creating a neural network using MXNet.
The first way of creating a neural network is using Sequential() constructor. It creates neural networks whose layers will be applied sequentially to data one after another in the order in which they were added. The Sequential object provides a method named add() that lets us add layers to our sequential neural network.
The linear layers can be created using the Dense() constructor available through the nn module. It accepts the number of units of the layer as the first parameter. We can also provide an activation to be applied to the output of the layer. Apart from units and activation, we can also provide details like the weight initializer function, the bias initializer, the data type, a flag indicating whether to use bias or not, etc. We can add all layers together by calling the add() method once, or call add() more than once to add layers.
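As a quick illustration, the standalone layer below sketches a few of these optional parameters (the values chosen are arbitrary and this layer is not used in the networks that follow):

from mxnet.gluon import nn
from mxnet import init

layer = nn.Dense(10,                                ## number of units of the layer
                 activation="relu",                 ## activation applied to layer output
                 use_bias=True,                     ## flag indicating whether to use bias
                 weight_initializer=init.Xavier(),  ## weight initializer function
                 bias_initializer="zeros",          ## bias initializer
                 dtype="float32")                   ## parameter data type
layer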
Below we have created a neural network with layer sizes [5,10,15,1] and used the rectified linear unit (relu) activation function for the hidden layers.
This way of creating a neural network is almost identical to the Sequential API of keras.
from mxnet.gluon import nn
model = nn.Sequential()
model.add(
    nn.Dense(5, activation="relu"),
    nn.Dense(10, activation="relu"),
    nn.Dense(15, activation="relu"),
    nn.Dense(1)
)
model
Once we have created a neural network, we need to call the initialize() method on it to initialize the model. It'll create the neural network parameters (weights and biases). Then, we can call the model with input data and it'll make predictions.
Please make a NOTE that MXNet neural networks require input data presented as NDArray created from the nd module. In our case, we created NDArrays from the np module earlier, hence we have converted them to nd module NDArrays using the method named as_nd_ndarray().
The astype() method works like the numpy astype() method and can be used to convert an array from one data type to another.
model.initialize()
X_sample = X_train[:5].as_nd_ndarray()
model(X_sample.astype("float32"))
We can access model parameters by calling the collect_params() method. It returns a dictionary-like object which has information about the weights and biases of each layer of the neural network.
model.collect_params()
for w_name, w in model.collect_params().items():
    print(w_name, w.shape)
The second way of creating a neural network gives us more flexibility and control over how layers are executed. In this section, we have created a neural network by extending the nn.Block class. We define all the layers of the neural network in the __init__() method and the actual forward pass logic inside the forward() method. This gives us more control over the forward pass, as we can define things the way we want rather than a simple pass through the layers one by one.
Below we have created a neural network with layer sizes [5,10,15,1]. We have used relu activation for all hidden layers.
This way of creating a neural network is almost identical to creating a neural network in PyTorch by extending the nn.Module class.
from mxnet.gluon import nn
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.linear1 = nn.Dense(5, activation="relu")
        self.linear2 = nn.Dense(10, activation="relu")
        self.linear3 = nn.Dense(15, activation="relu")
        self.linear4 = nn.Dense(1)

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)
        return self.linear4(x)
In the below cell, we have created our neural network by creating an instance of MLP class. We have then initialized the network by calling initialize() method on it. Then in the next cell, we have performed forward pass through sample data using our network.
model = MLP()
model
model.initialize()
X_sample = X_train[:5].as_nd_ndarray()
model(X_sample.astype("float32"))
model
We can retrieve the weights and biases of the neural network through the weight and bias properties of the layer objects.
weight1 = model.linear1.weight.data()
bias1 = model.linear1.bias.data()
print("Weight Data Type : {}".format(type(weight1)))
print("Bias Data Type : {}".format(type(bias1)))
weight1.shape, bias1.shape
In this section, we'll explain how we can train our neural network in MXNet. We have imported all necessary modules at the beginning. First, we have initialized the number of epochs to 2000 and the learning rate to 0.001. We have then initialized our model by creating an instance of the MLP class we created earlier and calling the initialize() method on it. We have then initialized the loss function for our regression task. We'll be using the mean squared error loss function and trying to minimize it.
In order to update model parameters during training, we need to create a Trainer object by calling the Trainer() constructor. We provide the model parameters to the constructor by calling the collect_params() method on the model. The second argument to the Trainer() constructor is either the name of an optimizer or an Optimizer instance specifying the optimizer that we'll use for updating weights. The third argument is a dictionary specifying parameters for the Optimizer object. When we specify the optimizer name as a string, MXNet will internally create the Optimizer instance for that optimizer.
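As a small sketch of these two equivalent ways of specifying the optimizer (assuming a model object like the one created above; this cell is only for illustration), a Trainer can be created either from an optimizer name plus a parameters dictionary or from an Optimizer instance:

from mxnet import gluon, optimizer

## optimizer given as a string name plus a dictionary of optimizer parameters
trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": 0.001})

## the same thing with an Optimizer instance created explicitly
sgd = optimizer.SGD(learning_rate=0.001)
trainer = gluon.Trainer(model.collect_params(), sgd)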
After creating the Trainer instance, we execute the training loop for the specified number of epochs. Each time, we first make predictions using our model by performing a forward pass through it and then calculate the loss value for the predictions. We perform these two steps inside the autograd.record() context manager, which records the operations so that gradients of the loss with respect to model weights/biases can be calculated. We then call backward() on the loss value to calculate the gradients. In order to actually update the weights, we need to call the step() method on the Trainer instance, giving it the size of the data (batch size) fed to the model. In our case, the batch size is the whole dataset.
We have also printed the loss value every 100 epochs to check progress. We can notice from the decreasing loss value that our model is improving.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=2000
learning_rate = 0.001
model = MLP()
model.initialize()
mse_loss = loss.L2Loss()
trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": learning_rate})
for epoch in range(1, epochs+1):
    with autograd.record():
        preds = model(X_train.as_nd_ndarray().astype("float32")) ## Make Predictions
        loss_val = mse_loss(preds.squeeze(), Y_train.as_nd_ndarray().astype("float32")) ## Calculate Loss

    loss_val.backward() ## Calculate Gradients
    loss_val = loss_val.mean().asscalar()

    trainer.step(len(X_train)) ## Update Weights

    if epoch % 100 == 0:
        print("MSE : {:.3f}".format(loss_val))
In this section, we are making predictions on train and test datasets. We can make predictions by simply calling the model object giving data to it.
train_preds = model(X_train.as_nd_ndarray().astype("float32"))
train_preds[:5]
test_preds = model(X_test.as_nd_ndarray().astype("float32"))
test_preds[:5]
In this section, we are evaluating the performance of our regression model by calculating the r^2 score on train and test predictions. The r^2 score is calculated for regression tasks and generally lies in the range [0,1] for reasonable models; values near 1 indicate a good model. Below, we have calculated the r^2 score on the train and test predictions using the r2_score() function available from scikit-learn. We can notice from the value of the score that our model seems to be doing a good job.
If you want to learn in-depth about r^2 score then please feel free to check our tutorial on metrics available from scikit-learn which covers it in detail.
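For reference, here is a small sketch of the computation behind the r^2 score (1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot the total sum of squares); the helper name is ours and the cell is only for illustration:

def r2_score_manual(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()         ## residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  ## total sum of squares
    return 1 - (ss_res / ss_tot)

## e.g. r2_score_manual(Y_test.asnumpy(), test_preds.squeeze().asnumpy())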
from sklearn.metrics import r2_score
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
Our previous training code worked on the whole training data at once. But in real life, datasets can be large and may not fit into the computer's main memory. In those situations, we train the model on a small batch of data at a time, bringing only that many samples into main memory. We go through the whole dataset in batches, taking a specified number of samples at a time to train the model.
Our dataset for this example is small and fits into the main memory of the computer. In order to explain how to perform training on data in batches, we'll treat our dataset as if it does not fit into main memory. We'll divide our dataset into batches and train the model on a single batch of data at a time.
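As an aside, Gluon also provides data utilities that can take care of batching for us; the rough sketch below (assuming a batch size of 32) shows the idea, although we implement batching manually in this tutorial to make the logic explicit:

from mxnet import gluon

dataset = gluon.data.ArrayDataset(X_train.asnumpy(), Y_train.asnumpy()) ## wrap features and targets together
loader = gluon.data.DataLoader(dataset, batch_size=32, shuffle=True)    ## iterator over shuffled batches

for X_batch, Y_batch in loader:
    pass ## each iteration yields one batch of samples and targets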
Below we have designed a function that implements our logic of training the neural network in batches. The function has one main loop which executes for the given number of epochs. For each epoch, we calculate the indexes of the batches. We then loop through the batches, taking a single batch of data at a time and training the model on it until the whole dataset is covered. The training process is exactly the same as explained earlier, with the only difference being that we train on one batch of data at a time rather than the whole dataset. We can provide the batch size as the last parameter of the function.
def TrainModelInBatches(trainer, X, Y, learning_rate, epochs, batch_size=32):
    for i in range(epochs):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices
        losses = [] ## Record loss of each batch

        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            with autograd.record():
                preds = model(X_batch.as_nd_ndarray().astype("float32")) ## Make Predictions
                loss_val = loss_func(preds.squeeze(), Y_batch.as_nd_ndarray().astype("float32")) ## Calculate Loss

            loss_val.backward() ## Calculate Gradients
            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)

            trainer.step(len(X_batch)) ## Update weights

        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(np.array(losses).mean()))
Below, we are actually training our neural network in batches. We have initialized the number of epochs to 500, the learning rate to 0.001, and the batch size to 32. We have then initialized our neural network and its weights by calling the initialize() method on it. We have then created an L2Loss() loss function. Then, we have created a Trainer object which holds the optimizer details and will be used to update the weights of the neural network.
At last, we have called our training function which we had defined in the previous cell with specified parameters. We can notice from the loss value getting printed every 100 epochs that our model is doing a good job.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=500
learning_rate = 0.001
batch_size=32
model = MLP()
model.initialize()
loss_func = loss.L2Loss()
trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": learning_rate})
TrainModelInBatches(trainer, X_train, Y_train, learning_rate, epochs, batch_size=batch_size)
In this section, we are making predictions on the train and test datasets. We have designed a function that makes predictions in batches: since we assume the whole dataset cannot be brought into main memory, we need to make predictions in batches as well. It uses the same logic for calculating batch indexes as the training function earlier.
def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(X_test.as_nd_ndarray().astype("float32"))
test_preds = nd.concatenate(test_preds).squeeze()
train_preds = MakePredictions(X_train.as_nd_ndarray().astype("float32"))
train_preds = nd.concatenate(train_preds).squeeze()
test_preds[:5], train_preds[:5]
In this section, we have evaluated the performance of our neural network, which was trained on the train data in batches, by calculating the r^2 score on train and test predictions. We can notice from the scores that both are near 1, which indicates that our model is doing a decent job at prediction.
from sklearn.metrics import r2_score
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
In this section, we'll explain how we can create a simple neural network using MXNet to solve classification tasks. We'll be using the breast cancer dataset available from scikit-learn for our explanation purposes. The majority of the code in this section is repeated from the regression section with a few minor changes. Hence, we won't include a detailed description of parts that are repeated here. We'll only include descriptions when there is something new to explain.
In this section, we have loaded the breast cancer dataset available from scikit-learn. The target values of the dataset are either 0, indicating a malignant tumor, or 1, indicating a benign tumor. The features (independent variables) of the dataset are various measurements of the tumors. Since the dataset has two outcomes, this is a binary classification problem. After loading the dataset, we have divided it into train (80%) and test (20%) sets.
from mxnet import nd, np
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train, X_test, Y_train, Y_test = np.array(X_train), np.array(X_test), np.array(Y_train), np.array(Y_test)
samples, features = X_train.shape
classes = np.unique(np.array(Y))
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
samples, features, classes
In this section, we have normalized our train and test datasets by using mean and standard deviations calculated on the training dataset.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
In this section, we have created our neural network by extending the nn.Block class like we had done in the regression section. Our neural network for this section is exactly the same as the one from the regression section, with the only difference being that the activation of the last layer is sigmoid. The sigmoid function maps values into the range (0,1). After creating the model, we have initialized it and performed one forward pass through the network on sample data for verification purposes.
from mxnet.gluon import nn
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.linear1 = nn.Dense(5, activation="relu")
        self.linear2 = nn.Dense(10, activation="relu")
        self.linear3 = nn.Dense(15, activation="relu")
        self.linear4 = nn.Dense(1, activation="sigmoid")

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)
        return self.linear4(x)
model = MLP()
model
model.initialize()
model(X_train[:5].as_nd_ndarray().astype("float32"))
model
In this section, we have included the logic that trains our model. We have initialized the number of epochs to 500 and the learning rate to 0.001. We have then created a model and initialized it. The loss function that we have used for the binary classification task is SigmoidBCELoss (sigmoid binary cross-entropy loss). In our Trainer object, we are using the Adam optimizer this time. The rest of the training code is exactly the same as the code from the regression section. We can notice from the loss values getting printed every 100 epochs that our model is doing a good job.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=500
learning_rate = 0.001
model = MLP()
model.initialize()
loss_func = loss.SigmoidBCELoss(from_sigmoid=True) # loss.LogisticLoss(label_format="binary")
trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})
for epoch in range(1, epochs+1):
    with autograd.record():
        preds = model(X_train.as_nd_ndarray().astype("float32")) ## Make Predictions
        loss_val = loss_func(preds, Y_train.as_nd_ndarray().astype("float32")) ## Calculate Loss

    loss_val.backward() ## Calculate Gradients
    loss_val = loss_val.mean().asscalar()

    trainer.step(len(X_train)) ## Update Model Weights

    if epoch % 100 == 0:
        print("BCE : {:.3f}".format(loss_val))
In this section, we are making predictions on our train and test datasets using our trained model. As the output of our model is in the range (0,1) due to the sigmoid activation function, we need to convert these float values to an actual predicted class (0 or 1). We have done that by setting a threshold of 0.5, predicting all values less than the threshold as class 0 and all values greater than it as class 1.
train_preds = model(X_train.as_nd_ndarray().astype("float32"))
train_preds = (train_preds > 0.5).astype("float32")
train_preds[:5]
test_preds = model(X_test.as_nd_ndarray().astype("float32"))
test_preds = (test_preds > 0.5).astype("float32")
test_preds[:5]
In this section, we have evaluated the performance of our binary classification model by calculating the accuracy of train and test predictions. We have also printed a classification report on the test dataset which has information like precision, recall, and f1-score.
Please feel free to check our tutorial explaining metrics from scikit-learn to know about classification reports and accuracy score functions.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
from sklearn.metrics import classification_report
print("Classification Report :")
print(classification_report(Y_test.asnumpy(), test_preds.squeeze().asnumpy()))
In this section, we have included the function to train the model on batches of data. The code for this function is almost exactly the same as the function we had used in the regression section.
def TrainModelInBatches(trainer, X, Y, learning_rate, epochs, batch_size=32):
    for i in range(epochs):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices
        losses = [] ## Record loss of each batch

        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            with autograd.record():
                preds = model(X_batch.as_nd_ndarray().astype("float32")) ## Make Predictions
                loss_val = loss_func(preds.squeeze(), Y_batch.as_nd_ndarray().astype("float32")) ## Calculate Loss

            loss_val.backward() ## Calculate Gradients
            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)

            trainer.step(len(X_batch)) ## Update Model Weights

        if i % 100 == 0: ## Print BCE every 100 epochs
            print("BCE : {:.2f}".format(np.array(losses).mean()))
Below, we are training our neural network by giving data in batches. We have initialized the number of epochs to 200, learning rate to 0.001, and batch size to 32. We have then created the model and initialized its weights. We have then initialized the loss function and Trainer object. At last, we have called our function from the previous cell to perform training in batches.
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=200
learning_rate = 0.001
batch_size=32
model = MLP()
model.initialize()
loss_func = loss.SigmoidBCELoss(from_sigmoid=True)
trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})
TrainModelInBatches(trainer, X_train, Y_train, learning_rate, epochs, batch_size=batch_size)
In this section, we have made predictions on train and test sets in batches. We have copied the same function we had used in the regression section for prediction in batches.
def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        preds.append(model(X_batch))

    return preds
test_preds = MakePredictions(X_test.as_nd_ndarray().astype("float32"))
test_preds = nd.concatenate(test_preds).squeeze()
test_preds = (test_preds > 0.5).astype("float32")
train_preds = MakePredictions(X_train.as_nd_ndarray().astype("float32"))
train_preds = nd.concatenate(train_preds).squeeze()
train_preds = (train_preds > 0.5).astype("float32")
test_preds[:5], train_preds[:5]
In this section, we have evaluated the performance of our model by calculating the accuracy of train and test predictions. We have also calculated a classification report on test predictions.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
from sklearn.metrics import classification_report
print("Classification Report :")
print(classification_report(Y_test.asnumpy(), test_preds.squeeze().asnumpy()))
This ends our small tutorial explaining how we can use gluon API of MXNet to create simple neural networks. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.
If you want to