**MXNet** is an open-source framework for developing deep neural networks. It was co-developed by Carlos Guestrin at the University of Washington. Its primary API is in Python, but it also provides APIs for other languages such as Java, C++, Julia, R, and Scala. **MXNet** lets us create neural networks that can run on CPUs as well as on GPUs. **MXNet** has two primary APIs for creating neural networks:

- **Gluon API**
- **Module API**

Internally, **MXNet** represents arrays as **NDArray** objects, a data structure developed by **MXNet** for multi-dimensional arrays that supports many of the functions provided by numpy. As a part of this tutorial, we'll explain how to develop simple neural networks with the **Gluon API** of **MXNet**. We'll use small toy datasets available from scikit-learn to solve simple regression and classification tasks. The main aim of the tutorial is to get newcomers started with developing neural networks using **MXNet**.

Below we have listed important sections of the tutorial to give an overview of the material covered.

- Regression
    - Load Dataset
    - Normalize Data
    - Create Neural Network
    - Train Neural Network
    - Make Predictions
    - Evaluate Model Performance
    - Train Model on Batches of Data
    - Make Predictions in Batches
    - Evaluate Model Performance
- Classification

**pip install mxnet**

```
import mxnet
print("Mxnet Version : {}".format(mxnet.__version__))
```

In this section, we'll explain how we can create simple neural networks using **MXNet** to solve simple regression tasks. We'll be using a small regression dataset available from scikit-learn for our purposes.

In this section, we have loaded the Boston housing dataset available from scikit-learn. The data features are stored in variable **X** and the target values in variable **Y**. The targets are median house prices in thousands of dollars; as they are continuous, this is a regression task. We have then divided the dataset into train (80%) and test (20%) sets and converted the numpy arrays to **NDArray** objects from the **np** module of **MXNet**. We use the **np** module here because its arrays support statistical operations such as mean and standard deviation, which we need to normalize the data in the next section; the **NDArray** from the **nd** module does not provide a standard deviation method directly. Converting **NDArray** objects between the **np** and **nd** modules is easy, and since our neural networks require **nd** module arrays as input, we'll convert the data to that type during training.

```
from mxnet import nd, np
```

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
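# NOTE: load_boston() was removed in scikit-learn 1.2; running this line requires an older scikit-learn release.
X, Y = datasets.load_boston(return_X_y=True)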
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
X_train, X_test, Y_train, Y_test = np.array(X_train), np.array(X_test), np.array(Y_train), np.array(Y_test)
samples, features = X_train.shape
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

In this section, we have normalized our train and test datasets. The data normalization is performed so that features of data that are on different scales and vary a lot in values can be brought to the same scale. It'll help the optimization function converge faster.

To normalize the data, we first calculated the mean and standard deviation of each feature on the train data only. We then subtracted that mean from both the train and test sets and finally divided the result by the standard deviation. Using the train statistics for the test set keeps information about the test data out of the preprocessing.

```
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
```
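The same train-statistics standardization can be sketched in plain NumPy. This is an illustration under our own assumptions, not part of the tutorial's pipeline: the data is random, and the guard that replaces a zero standard deviation with 1 is a hypothetical addition (a constant feature would otherwise cause a division by zero; it should not be needed for the datasets used here).

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and std on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Hypothetical guard: a constant feature has std 0, which would cause a
    # division by zero; leave such features unscaled instead.
    std = np.where(std == 0, 1.0, std)
    return mean, std

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 3))
X_train[:, 2] = 7.0  # a constant column to exercise the guard
X_test[:, 2] = 7.0

mean, std = fit_standardizer(X_train)
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std  # the test set is scaled with *train* statistics

print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
```

The test rows are only approximately centred after this transformation, which is expected, since their mean and standard deviation were not used.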

In this section, we'll explain how to create a neural network using **MXNet**. The **gluon** module of **MXNet** provides a sub-module named **nn** with the API for creating neural networks. It has classes for creating different types of layers, such as dense (linear) and convolutional layers; recurrent layers such as LSTM are available from the related **gluon.rnn** module. Below, we'll explain two different ways of creating a neural network using **MXNet**.

The first way of creating a neural network is using **Sequential()** constructor. It creates neural networks whose layers will be applied sequentially to data one after another in the order in which they were added. The **Sequential** object provides a method named **add()** that lets us add layers to our sequential neural network.

Linear (fully connected) layers can be created using the **Dense()** constructor available through the **nn** module. It accepts the number of units of the layer as its first parameter, and we can also provide an activation to be applied to the layer's output. Apart from units and activation, we can provide details like the weight initializer, bias initializer, data type, a flag indicating whether to use a bias, etc. We can add all layers with a single call to **add()**, or call **add()** several times to add layers one by one.
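To make the description concrete, here is a NumPy sketch of the computation a dense layer performs: a matrix product with the weights, plus the bias, followed by the optional activation. The function name `dense` is hypothetical; the weight layout (units, in_units) mirrors the parameter shapes Gluon reports for **nn.Dense**, but this is not MXNet's implementation.

```python
import numpy as np

def dense(x, weight, bias, activation=None):
    """NumPy analogue of a fully connected layer.

    weight has shape (units, in_units) and bias has shape (units,),
    mirroring the parameter shapes Gluon reports for nn.Dense.
    """
    out = x @ weight.T + bias
    if activation == "relu":
        out = np.maximum(out, 0.0)  # ReLU zeroes out negative values
    return out

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 13))          # 4 samples, 13 features
W = rng.normal(size=(5, 13)) * 0.1    # a layer with 5 units
b = np.zeros(5)
h = dense(x, W, b, activation="relu")
print(h.shape)  # (4, 5)
```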

Below we have created a neural network with layer sizes **[5,10,15,1]** and used the **rectified linear unit (relu)** activation function for the hidden layers.

This way of creating a neural network is almost identical to the **Sequential** API of Keras.

```
from mxnet.gluon import nn
model = nn.Sequential()
model.add(
    nn.Dense(5, activation="relu"),
    nn.Dense(10, activation="relu"),
    nn.Dense(15, activation="relu"),
    nn.Dense(1)
)
model
```

```
for w_name, w in model.collect_params().items():
    print(w_name, w.shape)
```

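Once we have created a neural network, we need to call the **initialize()** method on it to initialize its parameters. Because we did not specify the input size of the first layer, Gluon infers it from the first batch of data passed to the model, which is why the shapes printed above can show 0 for the input dimension until that first forward pass. After initialization, we can call the model with input data and it'll make predictions.

Please make a **NOTE** that **MXNet** neural networks require their input as an **NDArray** created from the **nd** module. In our case, we created **NDArray** objects from the **np** module earlier, hence we convert them to **nd** module **NDArray** objects using the method **as_nd_ndarray()**.

The **astype()** method works like numpy's astype method and converts an array from one data type to another. We use it to convert our data, which scikit-learn loads as float64, to float32, the default data type of the network's parameters.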

```
model.initialize()
```

```
X_sample = X_train[:5].as_nd_ndarray()
model(X_sample.astype("float32"))
```

We can access model parameters by calling method **collect_params()**. It returns a dictionary-like object which has information about the weights and biases of each layer of the neural network.

```
model.collect_params()
```

```
for w_name, w in model.collect_params().items():
    print(w_name, w.shape)
```
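As a sanity check on the printed shapes, the sketch below computes them by hand, assuming the 13 input features of the Boston dataset and the layer sizes **[5,10,15,1]** used above. Each weight has shape (units, in_units) and each bias has shape (units,). The helper `dense_param_shapes` is a hypothetical name used only for illustration.

```python
def dense_param_shapes(in_features, layer_units):
    """Return (weight_shape, bias_shape) for each Dense layer, in order."""
    shapes = []
    in_units = in_features
    for units in layer_units:
        shapes.append(((units, in_units), (units,)))
        in_units = units  # the next layer's input is this layer's output
    return shapes

shapes = dense_param_shapes(13, [5, 10, 15, 1])
for w_shape, b_shape in shapes:
    print(w_shape, b_shape)

# Count every weight entry plus every bias entry.
total = sum(w[0] * w[1] + b[0] for w, b in shapes)
print("Total parameters:", total)  # 311
```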

The second way of creating a neural network gives us more flexibility and control over how layers are executed. Here, we create the neural network by extending the **nn.Block** class. We define all layers of the network in the **__init__()** method and the actual forward pass logic inside the **forward()** method. This gives us full control over the forward pass, as we can define the computation the way we want rather than simply passing data through the layers one by one.

Below we have created a neural network with layers **[5,10,15,1]**. We have used **relu** activation for all hidden layers.

This way of creating a neural network is almost identical to creating a neural network in **PyTorch** by extending the **nn.Module** class.

```
from mxnet.gluon import nn
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.linear1 = nn.Dense(5, activation="relu")
        self.linear2 = nn.Dense(10, activation="relu")
        self.linear3 = nn.Dense(15, activation="relu")
        self.linear4 = nn.Dense(1)

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)
        return self.linear4(x)
```
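The **MLP** above simply chains its layers, but because **forward()** is ordinary Python code it can do things a plain stack of layers cannot. The NumPy sketch below is only an analogy for that pattern, not MXNet code: the hypothetical `TinyBlock` class creates its weights in `__init__` and adds a skip connection in `forward`. Biases are omitted for brevity.

```python
import numpy as np

class TinyBlock:
    """Plain-NumPy stand-in for an nn.Block subclass (illustration only).

    Weights are created in __init__, and forward() decides how data flows.
    """

    def __init__(self, in_units, hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(size=(hidden, in_units)) * 0.1
        self.w2 = rng.normal(size=(in_units, hidden)) * 0.1
        self.w_out = rng.normal(size=(1, in_units)) * 0.1

    def forward(self, x):
        h = np.maximum(x @ self.w1.T, 0.0)  # hidden layer with ReLU
        h = h @ self.w2.T                   # project back to the input size
        x = x + h                           # skip connection: not a simple sequential stack
        return x @ self.w_out.T             # single output unit

    # Gluon's Block likewise routes a call on the object to forward().
    __call__ = forward

block = TinyBlock(in_units=13, hidden=8)
out = block(np.ones((3, 13)))
print(out.shape)  # (3, 1)
```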

In the below cell, we have created our neural network by creating an instance of **MLP** class. We have then initialized the network by calling **initialize()** method on it. Then in the next cell, we have performed forward pass through sample data using our network.

```
model = MLP()
model
```

```
model.initialize()
```

```
X_sample = X_train[:5].as_nd_ndarray()
model(X_sample.astype("float32"))
```

```
model
```

We can retrieve the weights and biases of a layer through its **weight** and **bias** properties; calling **data()** on them returns their values as an **NDArray**.

```
weight1 = model.linear1.weight.data()
bias1 = model.linear1.bias.data()
print("Weight Data Type : {}".format(type(weight1)))
print("Bias Data Type : {}".format(type(bias1)))
weight1.shape, bias1.shape
```

In this section, we'll explain how to train our neural network in **MXNet**. The cell below first imports the necessary modules. We then set the number of epochs to **2000** and the learning rate to **0.001**, create an instance of the **MLP** class defined earlier, and call **initialize()** on it. Next, we create the loss function for our regression task. We use Gluon's **L2Loss**, which computes half of the squared error for each sample, so minimizing it is equivalent to minimizing the mean squared error.
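The cell below creates its loss with Gluon's **L2Loss**. Its per-sample formula can be checked with a few numbers. The NumPy function below is our own sketch of that formula for one-dimensional predictions (Gluon's version also averages over any non-batch axes), and it shows that the mean of the per-sample losses is half the mean squared error.

```python
import numpy as np

def l2_loss(pred, label):
    """Per-sample 0.5 * squared error (sketch of the L2Loss formula for 1-D predictions)."""
    return 0.5 * (label - pred) ** 2

pred = np.array([2.0, 3.0, 5.0])
label = np.array([1.0, 3.0, 7.0])

per_sample = l2_loss(pred, label)   # one loss value per sample
mse = np.mean((label - pred) ** 2)  # ordinary mean squared error
print(per_sample)
print(per_sample.mean(), mse / 2)   # both equal half the MSE
```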

To update model parameters during training, we need to create a **Trainer** object by calling the **Trainer()** constructor. Its first argument is the set of model parameters, which we obtain by calling **collect_params()** on the model. The second argument is either the name of an optimizer or an **Optimizer** instance that will be used to update the weights. The third argument is a dictionary of parameters for the optimizer. When we specify the optimizer name as a string, **MXNet** internally creates the **Optimizer** instance for that optimizer.

After creating the **Trainer** instance, we run the training loop for the specified number of epochs. In each epoch, we first make predictions by performing a forward pass through the model and then calculate the loss for those predictions. We perform these two steps inside the **autograd.record()** context manager, which records the computation so that gradients of the loss with respect to the model weights and biases can be calculated. We then call **backward()** on the loss to calculate those gradients. To actually update the weights, we call the **step()** method on the **Trainer** instance with the number of samples (batch size) given to the model; the gradients are normalized by this value. In our case, the batch size is the whole training set.

We print the mean loss value every 100 epochs to check progress (it is labelled MSE in the output, although **L2Loss** is half the squared error). The decreasing loss values show that our model is improving.

```
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=2000
learning_rate = 0.001
model = MLP()
model.initialize()
mse_loss = loss.L2Loss()
trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": learning_rate})
for epoch in range(1, epochs+1):
    with autograd.record():
        preds = model(X_train.as_nd_ndarray().astype("float32")) ## Make Predictions
        loss_val = mse_loss(preds.squeeze(), Y_train.as_nd_ndarray().astype("float32")) ## Calculate Loss
    loss_val.backward() ## Calculate Gradients
    loss_val = loss_val.mean().asscalar()
    trainer.step(len(X_train)) ## Update Weights
    if epoch%100==0:
        print("MSE : {:.3f}".format(loss_val))
```
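The loop above calls `backward()` on a vector of per-sample losses, which produces the gradient summed over the samples, and then `trainer.step(len(X_train))` rescales it by 1/batch size before applying the update. The NumPy sketch below imitates that plain-SGD arithmetic for a linear model with a manually derived gradient; it illustrates the update rule under these assumptions and is not how MXNet computes gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)
learning_rate = 0.1
batch_size = len(X)  # full-batch training, as in the loop above

def mean_loss(w):
    """Mean of the per-sample 0.5 * squared errors."""
    return np.mean(0.5 * (X @ w - y) ** 2)

initial = mean_loss(w)
for _ in range(300):
    residual = X @ w - y                         # shape (64,)
    grad_sum = X.T @ residual                    # gradient of the *summed* per-sample loss
    w -= learning_rate * grad_sum / batch_size   # SGD update, normalized by the batch size
final = mean_loss(w)
print(initial, final)
```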

In this section, we are making predictions on train and test datasets. We can make predictions by simply calling the model object giving data to it.

```
train_preds = model(X_train.as_nd_ndarray().astype("float32"))
train_preds[:5]
```

```
test_preds = model(X_test.as_nd_ndarray().astype("float32"))
test_preds[:5]
```

In this section, we evaluate the performance of our regression model by calculating the **r^2 score** on train and test predictions. The **r^2 score** is at most 1, where 1 means perfect predictions and 0 means the model does no better than always predicting the mean of the targets; it can even become negative for models that do worse than that. Below, we calculate it using the **r2_score()** function available from scikit-learn, which expects the true values as its first argument and the predictions as its second. The scores indicate that our model seems to be doing a good job.

If you want to learn in-depth about **r^2 score** then please feel free to check our tutorial on metrics available from scikit-learn which covers it in detail.

```
from sklearn.metrics import r2_score
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
```
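The sketch below computes the r^2 score directly from its definition, 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. The `r2` helper is hypothetical; it shows that a poor model can score below 0 and that the score is not symmetric (SS_tot depends only on the true values), which is why the order of arguments to scikit-learn's `r2_score` matters.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # depends on y_true only
    return 1.0 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
print(r2(y_true, y_true))                          # 1.0 (perfect predictions)
print(r2(y_true, np.full(4, y_true.mean())))       # 0.0 (always predicting the mean)
print(r2(y_true, np.array([9.0, 7.0, 5.0, 3.0])))  # -3.0 (worse than the mean)

b = np.array([4.0, 5.0, 6.0, 7.0])
print(r2(y_true, b), r2(b, y_true))  # different values: argument order matters
```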

Our previous training code worked on the whole training data at once. In real life, however, datasets can be too large to fit into the computer's memory. In those situations, we train the model on a small batch of data at a time, bringing only that many samples into main memory, and go through the whole dataset batch by batch, taking a specified number of samples each time.

Our dataset for this example is small and fits into the main memory of the computer. In order to explain how to perform training on data in batches, we'll treat our dataset as if it does not fit into the main memory. We'll divide our dataset into batches and train the model by giving a single batch of data at a time.

Below we have designed a function that implements the logic of training a neural network in batches. Its main loop runs for the given number of epochs. In each epoch, we calculate the batch indices and then loop through them, taking a single batch of data and training the model on it until the whole dataset is covered. The training step is exactly the same as explained earlier, the only difference being that we train on one batch at a time rather than on the whole data. The batch size can be provided as the last parameter of the function. Note that the function uses the global **model** and **loss_func** variables, which we define in the next cell, and that its **learning_rate** parameter is not used, since the learning rate is already set in the **Trainer**.

```
def TrainModelInBatches(trainer, X, Y, learning_rate, epochs, batch_size=32):
    ## Uses the global `model` and `loss_func`; learning_rate is unused (the Trainer already holds it)
    for i in range(epochs):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices
        losses = [] ## Record loss of each batch
        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None
            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            with autograd.record():
                preds = model(X_batch.as_nd_ndarray().astype("float32")) ## Make Predictions
                loss_val = loss_func(preds.squeeze(), Y_batch.as_nd_ndarray().astype("float32")) ## Calculate Loss
            loss_val.backward() ## Calculate Gradients
            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)
            trainer.step(len(X_batch)) ## Update weights
        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(np.array(losses).mean()))
```
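One subtlety of computing `X.shape[0]//batch_size + 1` batches is that when the number of samples is an exact multiple of the batch size, the last batch is empty, and calling `trainer.step(0)` on it would most likely fail. This should not happen with the split sizes used in this tutorial. The pure-Python sketch below (both helper functions are hypothetical) reproduces the original index logic and compares it with a `range()`-based alternative that never produces an empty batch.

```python
def batch_bounds_original(n_samples, batch_size):
    """Mimics the index logic of TrainModelInBatches: n // batch_size + 1 batches."""
    n_batches = n_samples // batch_size + 1
    bounds = []
    for batch in range(n_batches):
        start = batch * batch_size
        end = start + batch_size if batch != n_batches - 1 else None
        bounds.append((start, end))
    return bounds

def batch_bounds_range(n_samples, batch_size):
    """Alternative: step through the data with range(); never yields an empty batch."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

def sizes(bounds, n_samples):
    """Number of samples each (start, end) slice selects."""
    return [len(range(n_samples)[start:end]) for start, end in bounds]

print(sizes(batch_bounds_original(404, 32), 404))  # last batch has 20 samples
print(sizes(batch_bounds_original(64, 32), 64))    # [32, 32, 0]: an empty batch
print(sizes(batch_bounds_range(64, 32), 64))       # [32, 32]
```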

Below, we train our neural network in batches. We have set the number of epochs to **500**, the learning rate to **0.001**, and the batch size to **32**. We have then created our neural network and initialized its weights by calling the **initialize()** method on it. Next, we have created an **L2Loss()** loss object and a **Trainer** object, which holds the optimizer details and is used to update the weights of the neural network.

Finally, we call the training function defined in the previous cell with these parameters. The loss values printed every 100 epochs show that our model is doing a good job.

```
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=500
learning_rate = 0.001
batch_size=32
model = MLP()
model.initialize()
loss_func = loss.L2Loss()
trainer = gluon.Trainer(model.collect_params(), "sgd", {"learning_rate": learning_rate})
TrainModelInBatches(trainer, X_train, Y_train, learning_rate, epochs, batch_size=batch_size)
```

In this section, we make predictions on the train and test datasets. Since we assume the whole dataset can't be brought into main memory, we have designed a function that makes predictions in batches. It uses the same logic to calculate batch indices as the training function earlier and, like that function, relies on the global **model** variable. It returns a list with one **NDArray** of predictions per batch, which we join with **nd.concatenate()**.

```
def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices
    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None
        X_batch = input_data[start:end]
        preds.append(model(X_batch))
    return preds
```
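Predicting in batches gives the same result as predicting on the whole data at once, as long as each sample's prediction does not depend on the other samples in the batch, which holds for the dense networks used here. The NumPy sketch below checks this with a hypothetical stand-in model (a single linear layer), using `range()` to step through the data and `np.concatenate` in place of `nd.concatenate`.

```python
import numpy as np

def predict(x, w):
    """Stand-in model: a single linear layer (hypothetical, for illustration)."""
    return x @ w

def predict_in_batches(x, w, batch_size=32):
    """Predict slice by slice, then join the per-batch results along axis 0."""
    preds = [predict(x[start:start + batch_size], w)
             for start in range(0, len(x), batch_size)]
    return np.concatenate(preds)

rng = np.random.default_rng(1)
X = rng.normal(size=(102, 13))
w = rng.normal(size=(13, 1))

full = predict(X, w)
batched = predict_in_batches(X, w)
print(batched.shape, np.allclose(full, batched))  # (102, 1) True
```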

```
test_preds = MakePredictions(X_test.as_nd_ndarray().astype("float32"))
test_preds = nd.concatenate(test_preds).squeeze()
train_preds = MakePredictions(X_train.as_nd_ndarray().astype("float32"))
train_preds = nd.concatenate(train_preds).squeeze()
test_preds[:5], train_preds[:5]
```

In this section, we have evaluated the performance of the neural network trained in batches by calculating the **r^2 score** on train and test predictions. Both scores are near 1, which indicates that our model is doing a decent job at prediction.

```
from sklearn.metrics import r2_score
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
```

In this section, we'll explain how we can create a simple neural network using **MXNet** to solve classification tasks. We'll be using the breast cancer dataset available from scikit-learn for our explanation purposes. The majority of the code in this section is repeated from the regression section with a few minor changes. Hence, we won't include a detailed description of parts that are repeated here. We'll only include descriptions when there is something new to explain.

In this section, we have loaded the breast cancer dataset available from scikit-learn. The target values of the dataset are either **0**, indicating a malignant tumor, or **1**, indicating a benign tumor. The features (independent variables) of the dataset are various measurements of the tumors. As the dataset has two outcomes, this is a binary classification problem. After loading the dataset, we have divided it into train (80%) and test (20%) sets, stratified by the target so that both sets have similar class proportions.

```
from mxnet import nd, np
```

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train, X_test, Y_train, Y_test = np.array(X_train), np.array(X_test), np.array(Y_train), np.array(Y_test)
samples, features = X_train.shape
classes = np.unique(np.array(Y))
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

```
samples, features, classes
```

In this section, we have normalized our train and test datasets by using mean and standard deviations calculated on the training dataset.

```
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
```

In this section, we have created our neural network by extending the **nn.Block** class as we did in the regression section. The network is exactly the same as the one from the regression section, with the only difference that the last layer uses the **sigmoid** activation, which maps any real value into the range (0, 1) so that the output can be read as the probability of class **1**. After creating the model, we have initialized it and performed one forward pass through the network to verify that it makes predictions.

```
from mxnet.gluon import nn
class MLP(nn.Block):
    def __init__(self, **kwargs):
        super(MLP, self).__init__(**kwargs)
        self.linear1 = nn.Dense(5, activation="relu")
        self.linear2 = nn.Dense(10, activation="relu")
        self.linear3 = nn.Dense(15, activation="relu")
        self.linear4 = nn.Dense(1, activation="sigmoid")

    def forward(self, x):
        x = self.linear1(x)
        x = self.linear2(x)
        x = self.linear3(x)
        return self.linear4(x)
model = MLP()
model
```

```
model.initialize()
model(X_train[:5].as_nd_ndarray().astype("float32"))
```

```
model
```
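The sigmoid used in the last layer is 1 / (1 + e^(-z)). A small NumPy sketch (our own helper, not MXNet's implementation) shows that it maps any real input into (0, 1), equals 0.5 at 0, is increasing, and satisfies sigmoid(-z) = 1 - sigmoid(z), which is what lets us read the output as a probability.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p = sigmoid(z)
print(p.round(4))
print(np.allclose(sigmoid(-z), 1 - p))  # symmetry around 0.5
```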

In this section, we have included the logic that trains our model. We have set the number of epochs to **500** and the learning rate to **0.001**, and then created and initialized the model. The loss function that we use for this binary classification task is **SigmoidBCELoss (sigmoid binary cross-entropy loss)**. We pass **from_sigmoid=True** because our model's last layer already applies the sigmoid, so the loss receives probabilities rather than raw scores. In our **Trainer** object, we use the **Adam** optimizer this time. The rest of the training code is the same as in the regression section. The loss values printed every 100 epochs show that our model is doing a good job.

```
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=500
learning_rate = 0.001
model = MLP()
model.initialize()
loss_func = loss.SigmoidBCELoss(from_sigmoid=True) # loss.LogisticLoss(label_format="binary")
trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})
for epoch in range(1, epochs+1):
    with autograd.record():
        preds = model(X_train.as_nd_ndarray().astype("float32")) ## Make Predictions
        loss_val = loss_func(preds, Y_train.as_nd_ndarray().astype("float32")) ## Calculate Loss
    loss_val.backward() ## Calculate Gradients
    loss_val = loss_val.mean().asscalar()
    trainer.step(len(X_train)) ## Update Model Weights
    if epoch%100==0:
        print("BCE : {:.3f}".format(loss_val))
```
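Because the model already applies a sigmoid, the loss above is created with **from_sigmoid=True**. The NumPy sketch below (helper names are hypothetical) shows that binary cross-entropy computed from probabilities matches the version computed from raw scores, the kind of numerically stable form used when **from_sigmoid=False**. Passing probabilities with **from_sigmoid=False** would effectively apply the sigmoid twice, so the flag has to match what the model outputs.

```python
import numpy as np

def bce_from_probs(p, y, eps=1e-12):
    """Binary cross-entropy when the model already outputs probabilities."""
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def bce_from_logits(z, y):
    """Same loss from raw scores, in the stable form max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

z = np.array([-2.0, 0.0, 3.0])   # raw scores
y = np.array([0.0, 1.0, 1.0])    # labels
p = 1.0 / (1.0 + np.exp(-z))     # probabilities after the sigmoid

print(np.allclose(bce_from_probs(p, y), bce_from_logits(z, y)))  # True
```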

In this section, we make predictions on our train and test datasets using the trained model. Because of the sigmoid activation, the model's outputs are floats in the range **[0,1]**, which we need to convert to actual class predictions (0 or 1). We do that by setting a threshold at **0.5**: values greater than the threshold are predicted as class **1** and all other values as class **0**.

```
train_preds = model(X_train.as_nd_ndarray().astype("float32"))
train_preds = (train_preds > 0.5).astype("float32")
train_preds[:5]
```

```
test_preds = model(X_test.as_nd_ndarray().astype("float32"))
test_preds = (test_preds > 0.5).astype("float32")
test_preds[:5]
```
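Here is a NumPy sketch of the thresholding step with made-up probabilities, followed by accuracy computed as the fraction of matching labels. Because the comparison is strict (`> 0.5`), an output of exactly 0.5 is assigned to class 0.

```python
import numpy as np

probs = np.array([[0.10], [0.49], [0.50], [0.51], [0.93]])  # shape (N, 1), like the model output
labels = np.array([0, 0, 1, 1, 1])

preds = (probs > 0.5).astype("float32").squeeze()  # exactly 0.5 maps to class 0
accuracy = np.mean(preds == labels)                # fraction of correct predictions
print(preds)     # [0. 0. 0. 1. 1.]
print(accuracy)  # 0.8
```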

In this section, we have evaluated the performance of our binary classification model by calculating the accuracy of train and test predictions. We have also printed a classification report on the test dataset which has information like precision, recall, and f1-score.

Please feel free to check our tutorial explaining metrics from scikit-learn to know about classification reports and accuracy score functions.

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
```

```
from sklearn.metrics import classification_report
print("Classification Report :")
print(classification_report(Y_test.asnumpy(), test_preds.squeeze().asnumpy()))
```
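The per-class precision, recall, and f1-score in the classification report follow from counts of true positives, false positives, and false negatives. The NumPy sketch below computes them for one class from made-up labels; the `binary_report` helper is hypothetical and does not reproduce the report's averaged rows.

```python
import numpy as np

def binary_report(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class (per-class rows of a classification report)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))  # correctly predicted positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # predicted positive, actually not
    fn = np.sum((y_pred != positive) & (y_true == positive))  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
p, r, f = binary_report(y_true, y_pred)
print(p, r, f)  # 0.75 0.75 0.75
```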

In this section, we have included the function to train the model on batches of data. The code for this function is almost exactly the same as the function we had used in the regression section.

```
def TrainModelInBatches(trainer, X, Y, learning_rate, epochs, batch_size=32):
    ## Uses the global `model` and `loss_func`; learning_rate is unused (the Trainer already holds it)
    for i in range(epochs):
        batches = nd.arange((X.shape[0]//batch_size)+1) ### Batch Indices
        losses = [] ## Record loss of each batch
        for batch in batches:
            batch = batch.asscalar()
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None
            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            with autograd.record():
                preds = model(X_batch.as_nd_ndarray().astype("float32")) ## Make Predictions
                loss_val = loss_func(preds.squeeze(), Y_batch.as_nd_ndarray().astype("float32")) ## Calculate Loss
            loss_val.backward() ## Calculate Gradients
            loss_val = loss_val.mean().asscalar()
            losses.append(loss_val)
            trainer.step(len(X_batch)) ## Update Model Weights
        if i % 100 == 0: ## Print loss every 100 epochs
            print("BCE : {:.2f}".format(np.array(losses).mean()))
```

Below, we are training our neural network by giving data in batches. We have initialized the number of epochs to **200**, learning rate to **0.001**, and batch size to **32**. We have then created the model and initialized its weights. We have then initialized the loss function and **Trainer** object. At last, we have called our function from the previous cell to perform training in batches.

```
from mxnet import gluon
from mxnet.gluon import loss
from mxnet import autograd
epochs=200
learning_rate = 0.001
batch_size=32
model = MLP()
model.initialize()
loss_func = loss.SigmoidBCELoss(from_sigmoid=True)
trainer = gluon.Trainer(model.collect_params(), "adam", {"learning_rate": learning_rate})
TrainModelInBatches(trainer, X_train, Y_train, learning_rate, epochs, batch_size=batch_size)
```

In this section, we have made predictions on train and test sets in batches. We have copied the same function we had used in the regression section for prediction in batches.

```
def MakePredictions(input_data, batch_size=32):
    batches = nd.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices
    preds = []
    for batch in batches:
        batch = batch.asscalar()
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None
        X_batch = input_data[start:end]
        preds.append(model(X_batch))
    return preds
```

```
test_preds = MakePredictions(X_test.as_nd_ndarray().astype("float32"))
test_preds = nd.concatenate(test_preds).squeeze()
test_preds = (test_preds > 0.5).astype("float32")
train_preds = MakePredictions(X_train.as_nd_ndarray().astype("float32"))
train_preds = nd.concatenate(train_preds).squeeze()
train_preds = (train_preds > 0.5).astype("float32")
test_preds[:5], train_preds[:5]
```

In this section, we have evaluated the performance of our model by calculating the accuracy of train and test predictions. We have also calculated a classification report on test predictions.

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train.asnumpy(), train_preds.squeeze().asnumpy())))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test.asnumpy(), test_preds.squeeze().asnumpy())))
```

```
from sklearn.metrics import classification_report
print("Classification Report :")
print(classification_report(Y_test.asnumpy(), test_preds.squeeze().asnumpy()))
```

This ends our small tutorial explaining how we can use **gluon** API of **MXNet** to create simple neural networks. Please feel free to let us know your views in the comments section.
