**PyTorch** is at the forefront of machine learning research with its pythonic framework for designing neural networks. **PyTorch** provides a low-level, NumPy-like API for designing a neural network completely from scratch, as well as a high-level API where layers, loss functions, activation functions, optimizers, etc. are already defined and can be used without modification. We have already covered a tutorial where we discussed how to create small neural networks to solve simple tasks using the low-level **PyTorch** API, where we designed everything ourselves. Please feel free to check that tutorial, as it'll help you with this one.

In this tutorial, we'll again explain how to create simple neural networks, but this time using the high-level API of **PyTorch** available through the **torch.nn** module. We'll be working with small toy datasets available from scikit-learn to solve one regression and one classification task, creating small neural networks to explain the **torch.nn** API. We assume that the reader has at least a little background on neural networks, loss functions, optimizers, activation functions, etc., as we won't be covering them in detail here. The main aim of the tutorial is to get individuals started using **PyTorch's** **torch.nn** module to design neural networks.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

- Regression
    - Load Data
    - Normalize Data
    - Define Neural Network Model
    - Train Model
    - Make Predictions
    - Evaluate Performance of Model
    - Train Model in Batches
    - Make Predictions in Batches
    - Evaluate Performance of Model
- Classification

Below, we have imported **PyTorch** and printed the version that we'll be using in this tutorial.

In [1]:

```
import torch
print("PyTorch Version : {}".format(torch.__version__))
```

In [2]:

```
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device : {}".format(device))
```
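None of the later cells use the `device` string computed above, so everything runs on the CPU. The following is a minimal sketch of how a model and its input tensors would normally be moved to that device with `.to()`. The layer size and the random batch are hypothetical. The model and the data have to be on the same device for the forward pass to work.

```python
import torch
from torch import nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical tiny model and batch, used only to illustrate .to(device)
model = nn.Linear(13, 1).to(device)       # moves the module's parameters to the device
X_batch = torch.randn(4, 13).to(device)   # for tensors, .to() returns a tensor on the device

preds = model(X_batch)  # model parameters and input data live on the same device
print(preds.device)
```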

In this section, we'll explain how we can create simple neural networks to solve regression tasks. We'll be using a small toy dataset available from scikit-learn for our purposes. We'll be using **torch.nn** module to design neural networks.

In this section, we have loaded the Boston housing dataset available from scikit-learn. The data features go into the variable **X** and the target values into the variable **Y**. After loading the dataset, we have divided it into train (80%) and test (20%) sets. We have then converted the datasets from NumPy arrays to **torch tensors**, since the neural network requires input of that type. We have also stored the number of samples in the training data and the number of features in separate variables.

In [3]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split

## NOTE: load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2,
## so this cell needs scikit-learn < 1.2 (or a different regression dataset).
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

## Convert NumPy arrays to float32 tensors, the default dtype of nn.Linear weights
X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test, dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.float32),\
                                   torch.tensor(Y_test, dtype=torch.float32)
samples, features = X_train.shape
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```


In [4]:

```
samples, features
```


In this section, we have normalized our data. First, we computed the mean and standard deviation of each feature on the train data. We then subtracted that mean from both the train and test datasets and divided the difference by the standard deviation. The test data is normalized with the train statistics, which keeps the test set out of preprocessing entirely.

Normalization puts the features into a similar range, and this helps the gradient descent optimization algorithm converge faster. When features have very different ranges, their gradients also differ greatly in size. A single learning rate then tends to be too large for some weights, which causes oscillation, and too small for others, which causes slow progress.

In [5]:

```
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean)/ std
X_test = (X_test - mean)/ std
```
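Here is a quick sanity check of the normalization, sketched on hypothetical random data rather than the Boston features. After normalizing with the training statistics, each training column should have a mean close to 0 and a standard deviation close to 1. The test columns will only be approximately so.

```python
import torch

torch.manual_seed(0)
# Hypothetical features on very different scales (one in [0, 1), one in [0, 1000))
X_train = torch.stack([torch.rand(100), 1000 * torch.rand(100)], dim=1)
X_test = torch.stack([torch.rand(20), 1000 * torch.rand(20)], dim=1)

# Statistics come from the training split only and are reused for the test split
mean = X_train.mean(dim=0)
std = X_train.std(dim=0)
X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std

print(X_train_norm.mean(dim=0))  # close to 0 for every column
print(X_train_norm.std(dim=0))   # close to 1 for every column
```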

In this section, we'll explain how we can define a neural network using **torch.nn** module.

To create a neural network with the **torch.nn** module, we need to write a Python class that inherits from the **nn.Module** class. A network defined this way picks up the methods and attributes of **nn.Module**. For example, it can access all of its parameters through the **parameters()** method, and we can perform a forward pass and make predictions just by calling the network object with input data.

The usual structure is to define all the layers, activation functions, etc. inside the **\_\_init\_\_()** method of the class, and the actual forward pass inside the **forward()** method.

Below, we have first defined our **Regressor** class by inheriting from the **nn.Module** class. Inside the **\_\_init\_\_()** method, we have defined three linear (dense) layers, which are the hidden layers of our neural network. These are followed by the final layer, which has a single output unit for the regression target. We have used the ReLU activation function (**nn.ReLU()**) after each hidden layer.

The actual forward pass through the data happens inside the **forward()** method. This method takes one parameter: the input data. It applies the first layer by calling the first layer's object with the input data. It then applies the activation function to that output and stores the result in a variable. This result is given to the second layer, then to the third layer, and then to the final layer. Together, these steps make up the forward pass of our neural network. We return the output of the last layer from the **forward()** method. The **forward()** method is called each time we call an instance of the **Regressor** class with input data (internally, the **\_\_call\_\_()** method of **nn.Module** calls **forward()**).

In [6]:

```
from torch import nn

class Regressor(nn.Module):
    def __init__(self):
        super(Regressor, self).__init__()
        ## Three hidden layers followed by a single-unit output layer
        self.first_layer = nn.Linear(features, 5)
        self.second_layer = nn.Linear(5, 10)
        self.third_layer = nn.Linear(10, 15)
        self.final_layer = nn.Linear(15, 1)
        self.relu = nn.ReLU()

    def forward(self, X_batch):
        layer_out = self.relu(self.first_layer(X_batch))
        layer_out = self.relu(self.second_layer(layer_out))
        layer_out = self.relu(self.third_layer(layer_out))
        return self.final_layer(layer_out)  ## No activation on the regression output
```
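As an aside, the same stack of layers can be written more compactly with `nn.Sequential`, a `torch.nn` container that calls its layers in order. The sketch assumes 13 input features, which is the number of columns in the Boston data. It also uses the inherited `parameters()` method to count the trainable values. It is meant as an alternative illustration and does not replace the `Regressor` class. ReLU has no parameters, so using separate ReLU instances does not change the count.

```python
import torch
from torch import nn

features = 13  # assumption: the Boston housing data has 13 feature columns

# Same layer sizes as Regressor, written with nn.Sequential instead of a subclass
seq_regressor = nn.Sequential(
    nn.Linear(features, 5), nn.ReLU(),
    nn.Linear(5, 10), nn.ReLU(),
    nn.Linear(10, 15), nn.ReLU(),
    nn.Linear(15, 1),
)

# parameters() is inherited from nn.Module and yields every weight and bias tensor
n_params = sum(p.numel() for p in seq_regressor.parameters())
print(n_params)  # 311 = (13*5+5) + (5*10+10) + (10*15+15) + (15*1+1)

# Calling the module runs forward() through nn.Module.__call__
out = seq_regressor(torch.randn(5, features))
print(out.shape)  # torch.Size([5, 1])
```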

Below we have created an instance of **Regressor** class and then called it to perform a forward pass through our data. We have given the first 5 entries of our train data as input to it to make predictions.

In [7]:

```
regressor = Regressor()
preds = regressor(X_train[:5])
preds
```


In this section, we have trained our neural network. We have written a function that performs the training. Its inputs are the neural network model, loss function, optimizer, data features, target values, and number of epochs.

The function runs a loop for the given number of epochs. In each iteration, it first performs a forward pass through the model with the data features to make predictions. It then calculates the loss by giving the predictions and the actual target values to the loss function.

We have then called **optimizer.zero_grad()**. Any gradient values left in our parameters from the previous iteration must be cleared before we call **backward()** on the loss, because backpropagation adds the new gradients to the existing ones instead of overwriting them. If we skipped this step, each update would use the sum of all gradients computed so far.

Then we have called the **loss.backward()** method. It performs backpropagation and calculates the gradients of the loss with respect to the network's weights and biases.

Finally, we have called **optimizer.step()**, which updates the weights using the learning rate and the gradients.

We are also printing loss value at every 100 epochs to check progress.

In [8]:

```
def TrainModel(model, loss_func, optimizer, X, Y, epochs=500):
    for i in range(epochs):
        preds = model(X) ## Make predictions by a forward pass through the network
        loss = loss_func(preds.ravel(), Y) ## Calculate loss; ravel() turns (N, 1) predictions into shape (N,)
        optimizer.zero_grad() ## Clear old gradients before calculating new ones
        loss.backward() ## Calculate gradients
        optimizer.step() ## Update weights
        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(loss.item()))
```
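The problem that `optimizer.zero_grad()` prevents can be seen on a single hypothetical scalar parameter. If `backward()` is called twice without clearing `.grad`, the two gradients are added together.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # hypothetical scalar parameter

# d(3w)/dw = 3; a second backward() without clearing adds another 3
(3 * w).backward()
first_grad = w.grad.item()        # 3.0
(3 * w).backward()
accumulated_grad = w.grad.item()  # 6.0, not 3.0

# Clearing the gradient by hand; optimizer.zero_grad() does the equivalent for every
# parameter (recent PyTorch versions set .grad to None instead of filling it with zeros)
w.grad.zero_()
(3 * w).backward()
fresh_grad = w.grad.item()        # 3.0 again
print(first_grad, accumulated_grad, fresh_grad)  # 3.0 6.0 3.0
```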

Below, we run the actual training of our neural network. We have initialized the number of epochs (**1000**), the learning rate (**0.001**), the neural network, the loss function, and the gradient descent optimizer. We have then called our training function with all of these values.

This is a regression problem, so we'll use mean squared error as our loss function. It is the average of the squared differences between the original target values and the predicted ones. We have defined the loss by creating an instance of the **MSELoss** class available from the **torch.nn** module.

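$$\text{MSE}(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the actual target value of sample $i$, $\hat{y}_i$ is the corresponding prediction, and $n$ is the number of data samples.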
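We can check the formula against `nn.MSELoss` using a few made-up values. With the default `reduction="mean"`, the two results should agree.

```python
import torch
from torch import nn

actual = torch.tensor([3.0, -0.5, 2.0, 7.0])       # made-up targets
predictions = torch.tensor([2.5, 0.0, 2.0, 8.0])   # made-up predictions

# Formula: average of the squared differences
manual_mse = ((actual - predictions) ** 2).mean()
library_mse = nn.MSELoss()(predictions, actual)    # default reduction="mean"
print(manual_mse.item(), library_mse.item())       # 0.375 0.375
```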

The optimizer that we'll be using is gradient descent. It updates each weight by subtracting the learning rate times its gradient, and it keeps the learning rate fixed for the whole training process. We have created the optimizer as an instance of the **SGD** class available from the **torch.optim** module. The **SGD()** constructor takes the model parameters and the learning rate. We get the model parameters by calling the **parameters()** method on the model. We pass the whole training set at once and keep the default settings (no momentum), so each step here is plain full-batch gradient descent.
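To make the update rule concrete, the following sketch uses a hypothetical one-layer model on random data. It records `w - lr * grad` before calling `optimizer.step()`, then checks that plain `SGD` (default: no momentum, no weight decay) produces that value.

```python
import torch
from torch import nn
from torch.optim import SGD

torch.manual_seed(0)
layer = nn.Linear(3, 1)                       # hypothetical one-layer model
optimizer = SGD(layer.parameters(), lr=0.1)

x = torch.randn(8, 3)                         # hypothetical inputs and targets
y = torch.randn(8)
loss = nn.MSELoss()(layer(x).ravel(), y)

optimizer.zero_grad()
loss.backward()

# Plain gradient descent: new_value = old_value - lr * gradient
expected_weight = layer.weight.detach() - 0.1 * layer.weight.grad
expected_bias = layer.bias.detach() - 0.1 * layer.bias.grad
optimizer.step()

print(torch.allclose(layer.weight.detach(), expected_weight))  # True
print(torch.allclose(layer.bias.detach(), expected_bias))      # True
```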

The **MSE** loss printed every 100 epochs lets us check that the loss keeps decreasing, which means the model fits the training data better as training goes on. We'll also evaluate the model on the test data later using another metric.

In [9]:

```
from torch.optim import SGD
torch.manual_seed(42) ## For reproducibility: the same random initial weights are generated each time
epochs = 1000
learning_rate = 1e-3 # 0.001, passed as a plain float
regressor = Regressor()
mse_loss = nn.MSELoss()
optimizer = SGD(params=regressor.parameters(), lr=learning_rate)
TrainModel(regressor, mse_loss, optimizer, X_train, Y_train, epochs=epochs)
```

In this section, we have made predictions on train and test datasets using our trained neural network regressor.

In [10]:

```
test_preds = regressor(X_test) ## Make Predictions on test dataset
test_preds[:5]
```


In [11]:

```
train_preds = regressor(X_train) ## Make Predictions on train dataset
train_preds[:5]
```


In this section, we have evaluated the performance of our neural network regressor by calculating the **R^2 score** of the train and test predictions. We have used the **r2_score()** function available from scikit-learn to calculate the score. The best possible **R^2 score** is 1, and a score near 1 indicates a good model. A score of 0 corresponds to always predicting the mean target value, and the score becomes negative for models that do worse than that. We can notice from our results below that our model seems to have done a decent job at prediction.

If you are interested in learning about model evaluation and scoring metrics then please feel free to check our tutorial which covers various metrics available from scikit-learn with simple examples.

In [12]:

```
from sklearn.metrics import r2_score
## r2_score() expects the true values first and the predictions second
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.numpy(), train_preds.detach().numpy().squeeze())))
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.numpy(), test_preds.detach().numpy().squeeze())))
```
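The denominator of the score uses the spread of the *true* values, so `r2_score()` is not symmetric in its arguments. The pure-Python sketch below follows the standard definition R^2 = 1 - SS_res / SS_tot on made-up numbers. It shows that swapping the arguments changes the result and that the score can fall below 0.

```python
def r2(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, where SS_tot uses the mean of the TRUE values
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, -0.5, 2.0, 7.0]   # made-up targets
y_pred = [2.5, 0.0, 2.0, 8.0]    # made-up predictions

print(round(r2(y_true, y_pred), 4))  # 0.9486
print(round(r2(y_pred, y_true), 4))  # 0.9574: swapped arguments give a different score
print(r2(y_true, [10.0] * 4) < 0)    # True: worse than predicting the mean
```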

The dataset that we have used here is a small toy dataset that fits easily into the computer's main memory. Many real-life datasets, however, are huge and do not fit into main memory. To handle such datasets, we train on only a small number of samples at a time, small enough to fit into memory. This small subset of the whole dataset is called a batch. We go through the whole dataset one batch at a time, performing a forward pass and updating the weights for each batch.

Earlier, we updated the weights of our neural network once per pass through the dataset. When we process the data in batches, we instead update the weights once per batch, which means several times per pass. Updating the weights batch by batch is called mini-batch gradient descent. It is often simply referred to as **stochastic gradient descent**, and it is the usual way of training on large datasets that do not fit into memory.

Even though our dataset is quite small, we'll pretend that it does not fit into main memory and process it in batches, so that we can explain how training in batches works.

Our function loops for the given number of epochs. In each epoch, it computes the number of batches and uses each batch number to compute the start and end indices of that batch. It then takes one batch at a time by slicing the original dataset with those indices. For each batch, it makes predictions, calculates the loss, calculates the gradients, and updates the weights. This is repeated for all batches, and one complete pass over all batches (covering the whole dataset) is called one epoch. We record the loss of each batch and print the average of the batch losses every 100 epochs.
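The batch boundaries can be computed with plain integer arithmetic. In this sketch, `batch_bounds` is a hypothetical helper and 404 is an example training-set size. Rounding the number of batches up keeps the final partial batch and never produces an empty one.

```python
import math

def batch_bounds(n_samples, batch_size):
    """Hypothetical helper: (start, end) index pairs covering n_samples rows."""
    n_batches = math.ceil(n_samples / batch_size)  # round up to keep the final partial batch
    return [(b * batch_size, min((b + 1) * batch_size, n_samples)) for b in range(n_batches)]

bounds = batch_bounds(404, 32)
print(len(bounds), bounds[-1])   # 13 (384, 404)

# With an exact multiple, rounding up adds no empty batch,
# whereas n // batch_size + 1 counts one batch too many
print(batch_bounds(64, 32))      # [(0, 32), (32, 64)]
print(64 // 32 + 1)              # 3
```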

In [13]:

```
def TrainModelInBatches(model, loss_func, optimizer, X, Y, batch_size=32, epochs=500):
    n_batches = (X.shape[0] + batch_size - 1) // batch_size ## Number of batches, rounded up
    for i in range(epochs):
        losses = [] ## Record loss of each batch
        for batch in range(n_batches):
            start, end = batch*batch_size, (batch+1)*batch_size ## Slicing past the end is safe
            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            preds = model(X_batch) ## Make predictions by a forward pass through the network
            loss = loss_func(preds.ravel(), Y_batch) ## Calculate loss
            losses.append(loss.item()) ## Record loss as a Python float
            optimizer.zero_grad() ## Clear old gradients before calculating new ones
            loss.backward() ## Calculate gradients
            optimizer.step() ## Update weights
        if i % 100 == 0: ## Print average batch MSE every 100 epochs
            print("MSE : {:.2f}".format(sum(losses) / len(losses)))
```

Below, we train our neural network in batches using the function created in the previous cell. First, we have initialized the number of epochs (**1000**), learning rate (**0.001**), batch size (**32**), neural network, loss function (**MSELoss()**), and gradient descent optimizer (**SGD()**). The optimizer receives the model parameters and the learning rate. Finally, we have called our function with the model, loss function, optimizer, training data features, target values, batch size, and epochs. We can notice from the loss printed every 100 epochs that our network seems to be doing a good job.

In [14]:

```
from torch.optim import SGD
torch.manual_seed(42) ## For reproducibility: the same random initial weights are generated each time
epochs = 1000
learning_rate = 1e-3 # 0.001, passed as a plain float
batch_size = 32
regressor = Regressor()
mse_loss = nn.MSELoss()
optimizer = SGD(params=regressor.parameters(), lr=learning_rate)
TrainModelInBatches(regressor, mse_loss, optimizer, X_train, Y_train, batch_size=batch_size, epochs=epochs)
```
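PyTorch also provides `TensorDataset` and `DataLoader` in `torch.utils.data`. These classes slice the data into batches for you and can optionally shuffle it. Here is a minimal sketch on hypothetical random data with the same batch size of 32:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
X = torch.randn(100, 13)  # hypothetical data: 100 samples, 13 features
Y = torch.randn(100)

# TensorDataset pairs rows of X and Y; DataLoader yields them in batches
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

batch_sizes = []
for X_batch, Y_batch in loader:
    batch_sizes.append(X_batch.shape[0])  # forward pass, loss, backward and step would go here
print(batch_sizes)  # [32, 32, 32, 4]
```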

In this section, we have defined a function that makes predictions on data in batches and returns a list with the predictions of each batch, which we combine afterwards. The function creates the batch indices in the same way as the training function.

In [15]:

```
def MakePredictions(model, input_data, batch_size=32):
    n_batches = (input_data.shape[0] + batch_size - 1) // batch_size ## Number of batches, rounded up
    preds = []
    with torch.no_grad(): ## Disable gradient tracking during prediction
        for batch in range(n_batches):
            start, end = batch*batch_size, (batch+1)*batch_size ## Slicing past the end is safe
            X_batch = input_data[start:end]
            preds.append(model(X_batch))
    return preds
```
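The effect of `torch.no_grad()` is easy to see on a hypothetical single layer. Outputs computed inside the context do not track gradients, so no computation graph is kept in memory during prediction. The output values themselves stay the same.

```python
import torch
from torch import nn

layer = nn.Linear(3, 1)   # hypothetical layer
x = torch.randn(2, 3)

out = layer(x)
print(out.requires_grad)          # True: autograd records the graph

with torch.no_grad():
    out_no_grad = layer(x)
print(out_no_grad.requires_grad)  # False: no graph is built
```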

Below we have called our function to make predictions in batches on train and test datasets.

In [16]:

```
test_preds = MakePredictions(regressor, X_test) ## Make Predictions on test dataset
test_preds = torch.cat(test_preds).ravel() ## Combine predictions of all batches
train_preds = MakePredictions(regressor, X_train) ## Make Predictions on train dataset
train_preds = torch.cat(train_preds).ravel() ## Combine predictions of all batches
```
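These layers process each sample independently, so predicting in batches and joining the results with `torch.cat()` should match a single full pass. Here is a sketch using a hypothetical small model and random data:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(13, 5), nn.ReLU(), nn.Linear(5, 1))  # hypothetical model
X = torch.randn(100, 13)

with torch.no_grad():
    batch_preds = [model(X[start:start + 32]) for start in range(0, X.shape[0], 32)]
    combined = torch.cat(batch_preds).ravel()  # (32,1),(32,1),(32,1),(4,1) -> (100,)
    full = model(X).ravel()

print(combined.shape)  # torch.Size([100])
print(torch.allclose(combined, full, atol=1e-6))
```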

In this section, we have evaluated the performance of our model by calculating **R^2 score** on train and test predictions. We can notice from the results that our model seems to be doing a decent job at prediction.

In [17]:

```
from sklearn.metrics import r2_score
## r2_score() expects the true values first and the predictions second
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.numpy(), train_preds.detach().numpy())))
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.numpy(), test_preds.detach().numpy())))
```

In this section, we'll explain how we can create simple neural networks using **torch.nn** module of **PyTorch** to solve classification tasks. We'll be using a small toy dataset available from scikit-learn for our purpose. We'll be creating a neural network to solve a binary classification task.

In this section, we'll reuse much of the code we designed in the regression section. We have included it again so that someone who starts from this section can follow the code to the end without looking up functions in the regression section. We have not repeated the detailed descriptions, though; please refer to the regression section for those.

In this section, we have loaded the breast cancer binary classification dataset available from scikit-learn. The data features go into a variable named **X** and the target values into a variable named **Y**. We have then divided the data into train (**80%**) and test (**20%**) sets, using **stratify=Y** so that both sets keep the same proportion of each class. We have also converted the NumPy arrays to torch tensors, as **PyTorch** models require. The targets are stored as **torch.long** because the loss function we'll use expects integer class indices.

In [18]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
torch.tensor(X_test, dtype=torch.float32),\
torch.tensor(Y_train, dtype=torch.long),\
torch.tensor(Y_test, dtype=torch.long)
samples, features = X_train.shape
classes = Y_test.unique()
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```


In [19]:

```
samples, features, classes
```


In this section, we have normalized our datasets. The code is an exact copy of the code from the regression section hence the detailed description is not included here.

In [20]:

```
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean)/ std
X_test = (X_test - mean)/ std
```

In this section, we have defined our classification neural network. Its structure is almost the same as that of the regression network, with a few changes. For the classification task, the last layer has 2 output units because this is a binary classification task. In the regression network we did not apply any activation function to the last layer. Here, we apply **LogSoftmax()** to the output of the last layer, declaring it inside the **\_\_init\_\_()** method. We use log-softmax rather than plain softmax because the **NLLLoss()** loss function we'll train with expects log-probabilities as input.

The output of the last layer therefore holds two log-probabilities per sample. Exponentiating them gives two probabilities that sum to 1. The first value is for class 0 (malignant tumor) and the second is for class 1 (benign tumor). We'll predict whichever class has the higher value.

In [21]:

```
from torch import nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.first_layer = nn.Linear(features, 5)
        self.second_layer = nn.Linear(5, 10)
        self.third_layer = nn.Linear(10, 15)
        self.final_layer = nn.Linear(15, 2) ## One output unit per class
        self.relu = nn.ReLU()
        self.log_softmax = nn.LogSoftmax(dim=1) ## Log-probabilities, as expected by NLLLoss()

    def forward(self, X_batch):
        layer_out = self.relu(self.first_layer(X_batch))
        layer_out = self.relu(self.second_layer(layer_out))
        layer_out = self.relu(self.third_layer(layer_out))
        return self.log_softmax(self.final_layer(layer_out))
```
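`nn.NLLLoss` expects log-probabilities as its input. The sketch below uses hypothetical logits for two samples. It checks that `nn.LogSoftmax` followed by `nn.NLLLoss` matches `nn.CrossEntropyLoss` applied to raw logits. It also shows what happens if plain softmax probabilities are passed to `NLLLoss`: the result is just minus the mean probability of the true class, which is not the log loss.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5], [0.1, 1.2]])  # hypothetical raw outputs for 2 samples
targets = torch.tensor([0, 1])                    # true class indices (int64)

# LogSoftmax + NLLLoss gives the same value as CrossEntropyLoss on raw logits
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)
ce = nn.CrossEntropyLoss()(logits, targets)
print(torch.isclose(nll, ce).item())  # True

# NLLLoss averages -input[i, target_i], so plain probabilities give
# minus the mean true-class probability instead of the log loss
probs = nn.Softmax(dim=1)(logits)
wrong = nn.NLLLoss()(probs, targets)
print(torch.isclose(wrong, -probs[torch.arange(2), targets].mean()).item())  # True
```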

Below we have initialized our classification neural network and made predictions on 5 samples of train data using it for testing purposes.

In [22]:

```
classifier = Classifier()
preds = classifier(X_train[:5])
preds
```


In this section, we have included a function that trains the neural network on the whole training data at once in every epoch, without batches. Its code is almost the same as in the regression section. There are two differences. First, the predictions are passed to the loss function as they are, with shape (samples, 2) and without **ravel()**. Second, we print the negative log-likelihood loss instead of MSE.

In [23]:

```
def TrainModel(model, loss_func, optimizer, X, Y, epochs=500):
    for i in range(epochs):
        preds = model(X) ## Make predictions by a forward pass through the network
        loss = loss_func(preds, Y) ## Calculate loss
        optimizer.zero_grad() ## Clear old gradients before calculating new ones
        loss.backward() ## Calculate gradients
        optimizer.step() ## Update weights
        if i % 100 == 0: ## Print loss every 100 epochs
            print("NegLogLoss : {:.2f}".format(loss.item()))
```

Now we train our classification network using the function defined in the previous cell. We have initialized the number of epochs (**1500**), learning rate (**0.01**), neural network, negative log-likelihood loss (**NLLLoss()**), and gradient descent optimizer (**SGD()**). We have then called our training function with the classification network, loss function, optimizer, train features, train target values, and number of epochs.

The log loss printed every 100 epochs lets us check that training is progressing. A decreasing loss means the network fits the training data better; whether it also generalizes is checked on the test set below.

In [24]:

```
from torch.optim import SGD
torch.manual_seed(42) ## For reproducibility: the same random initial weights are generated each time
epochs = 1500
learning_rate = 1e-2 # 0.01, passed as a plain float
classifier = Classifier()
nll_loss = nn.NLLLoss()
optimizer = SGD(params=classifier.parameters(), lr=learning_rate)
TrainModel(classifier, nll_loss, optimizer, X_train, Y_train, epochs=epochs)
```

In this section, we have made predictions on the train and test datasets using our trained neural network. For each sample, the network outputs one value per class, and the predicted class is the one with the highest value. We therefore convert the outputs to class labels by calling the **torch.argmax()** function along axis 1. It returns the index of the maximum value in each row, which will be either 0 or 1.
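The logarithm is monotonic, so `torch.argmax()` picks the same class whether it is applied to log-probabilities or to probabilities. Here is a small check on hypothetical network outputs:

```python
import torch

torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(6, 2), dim=1)  # hypothetical outputs for 6 samples
probs = log_probs.exp()                                   # each row now sums to 1

classes_from_log = torch.argmax(log_probs, dim=1)  # index of the larger value in each row
classes_from_probs = torch.argmax(probs, dim=1)
print(classes_from_log)
print(torch.equal(classes_from_log, classes_from_probs))  # True
```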

In [25]:

```
test_preds = classifier(X_test) ## Make Predictions on test dataset
test_preds = torch.argmax(test_preds, axis=1) ## Convert Probabilities to class type
train_preds = classifier(X_train) ## Make Predictions on train dataset
train_preds = torch.argmax(train_preds, axis=1) ## Convert Probabilities to class type
test_preds[:5], train_preds[:5]
```


In this section, we have evaluated the performance of our model by calculating the accuracy of the train and test predictions. We can notice from the results that our network seems to be doing a decent job at prediction. We have also generated a classification report for the test predictions using the **classification_report()** function available from scikit-learn.

If you want to learn about the model evaluation and scoring metrics then please feel free to check our tutorial which covers various metrics available from scikit-learn.

In [26]:

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds)))
```
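Accuracy is the fraction of samples whose predicted class equals the true label, so it can also be computed directly with tensor operations. Here is a sketch on made-up labels:

```python
import torch

Y_true = torch.tensor([0, 1, 1, 0, 1])  # made-up labels
Y_pred = torch.tensor([0, 1, 0, 0, 1])  # made-up predicted classes

# Count the matching positions and divide by the number of samples
accuracy = (Y_pred == Y_true).sum().item() / len(Y_true)
print(accuracy)  # 0.8
```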

In [27]:

```
from sklearn.metrics import classification_report
print("Test Data Classification Report : ")
print(classification_report(Y_test, test_preds))
```

In this section, we have included code for training our classification neural network on data in batches. The code for this function is an almost exact copy of the function we had in the regression section hence we have not included a detailed description here.

In [28]:

```
def TrainModelInBatches(model, loss_func, optimizer, X, Y, batch_size=32, epochs=500):
    n_batches = (X.shape[0] + batch_size - 1) // batch_size ## Number of batches, rounded up
    for i in range(epochs):
        losses = [] ## Record loss of each batch
        for batch in range(n_batches):
            start, end = batch*batch_size, (batch+1)*batch_size ## Slicing past the end is safe
            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            preds = model(X_batch) ## Make predictions by a forward pass through the network
            loss = loss_func(preds, Y_batch) ## Calculate loss
            losses.append(loss.item()) ## Record loss as a Python float
            optimizer.zero_grad() ## Clear old gradients before calculating new ones
            loss.backward() ## Calculate gradients
            optimizer.step() ## Update weights
        if i % 100 == 0: ## Print average batch loss every 100 epochs
            print("NegLogLoss : {:.2f}".format(sum(losses) / len(losses)))
```

Below, we train our neural network on the train data in batches. First, we have initialized the number of epochs (**1500**), learning rate (**0.001**), batch size (**32**), classification network, negative log-likelihood loss (**NLLLoss()**), and gradient descent optimizer (**SGD()**). We have then called our training function with the classifier, loss function, optimizer, train data features, train target values, batch size, and epochs to perform training.

We can notice from the loss printed every 100 epochs that it decreases over time, which indicates that the model is learning to fit the training data.

In [29]:

```
from torch.optim import SGD
torch.manual_seed(42) ## For reproducibility: the same random initial weights are generated each time
epochs = 1500
learning_rate = 1e-3 # 0.001, passed as a plain float
batch_size = 32
classifier = Classifier()
nll_loss = nn.NLLLoss()
optimizer = SGD(params=classifier.parameters(), lr=learning_rate)
TrainModelInBatches(classifier, nll_loss, optimizer, X_train, Y_train, batch_size=batch_size, epochs=epochs)
```

In this section, we have included the code for making predictions on data in batches using our neural network. The code for this function is exactly the same as the one from the regression section.

In [30]:

```
def MakePredictions(model, input_data, batch_size=32):
    n_batches = (input_data.shape[0] + batch_size - 1) // batch_size ## Number of batches, rounded up
    preds = []
    with torch.no_grad(): ## Disable gradient tracking during prediction
        for batch in range(n_batches):
            start, end = batch*batch_size, (batch+1)*batch_size ## Slicing past the end is safe
            X_batch = input_data[start:end]
            preds.append(model(X_batch))
    return preds
```

Below, we have made predictions on the train and test datasets in batches (batch size = **32**). We have then combined the predictions of all batches and converted them to class labels with **torch.argmax()**.

In [31]:

```
test_preds = MakePredictions(classifier, X_test) ## Make Predictions on test dataset
test_preds = torch.cat(test_preds) ## Combine all batch predictions
test_preds = torch.argmax(test_preds, axis=1) ## Convert Probabilities to class type
train_preds = MakePredictions(classifier, X_train) ## Make Predictions on train dataset
train_preds = torch.cat(train_preds) ## Combine all batch predictions
train_preds = torch.argmax(train_preds, axis=1) ## Convert Probabilities to class type
```

At last, we have evaluated the performance of our model by calculating the accuracy of train and test predictions. We have also calculated a classification report on test predictions which includes measures like precision, recall, and f1-score. We can notice from the results that our model seems to be doing a decent job at a classification task.

In [32]:

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds)))
```

In [33]:

```
from sklearn.metrics import classification_report
print("Test Data Classification Report : ")
print(classification_report(Y_test, test_preds))
```

This ends our small tutorial explaining how we can create simple neural networks using a high-level **torch.nn** module of **PyTorch**. Please feel free to let us know your views in the comments section.
