**PyTorch** is a Python library that provides a framework for developing deep neural networks. It offers a NumPy-like API for working with N-dimensional arrays (tensors), and operations on them can also run on a GPU, which is often much faster than running them on a CPU. Apart from linear algebra on GPU, it provides **autograd** functionality, which automatically calculates the gradients of a function with respect to specified variables. This has sped up deep learning research considerably, as researchers do not need to write code by hand to compute the gradients of the loss of complicated neural networks. It also provides modules for creating neural network layers, loss functions, optimizers, etc. Overall, **PyTorch** is designed to speed up deep learning research and development.

As a part of this tutorial, we'll explain with simple examples how to create a small neural network to solve regression and classification tasks using **PyTorch**. We'll be using toy datasets available from scikit-learn. We assume that the reader has a little background in neural network terms (hidden layers, loss function, optimizer, SGD, etc.), as we won't be explaining their inner workings in detail. The main aim of the tutorial is to get individuals started with developing neural networks using **PyTorch**.

If you want to learn about the basics of **PyTorch** then please feel free to check our small tutorial where we have covered the basic API of it.

Below we have highlighted important sections of the tutorial to give an overview of the material covered.

- Regression
  - Load Dataset
  - Normalize Data
  - Initialize Model Weights
  - Activation for Hidden Layers
  - Single Layer of Neural Network
  - Single Forward Pass through Data to Make Predictions
  - Define Loss Function
  - Train Neural Network
  - Make Predictions
  - Evaluate Performance of Neural Network
  - Train Data in Batches
  - Make Predictions in Batches
  - Evaluate Performance
- Classification

Below we have imported **PyTorch** and printed the version that we'll be using in this tutorial.

In [1]:

```
import torch
print("PyTorch Version : {}".format(torch.__version__))
```

In [2]:

```
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Device : {}".format(device))
```
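The `device` string selected above can be passed along when tensors are created or moved. The following is only a minimal sketch of that idea, not part of the tutorial's own code; the tensor names `x` and `w` are made up for illustration.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create a tensor directly on the selected device ...
x = torch.ones(3, 2, device=device)
# ... or move an existing CPU tensor there with .to()
w = torch.rand(2, 4).to(device)

# Both operands live on the same device, so the result does too
out = x @ w
print(out.shape, out.device.type)
```

Operations between tensors on different devices raise an error, so all tensors taking part in a computation need to be placed on the same device.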

In this section, we'll explain how we can create a simple neural network using **PyTorch's** NumPy-like API to solve simple regression tasks. We'll be using the Boston housing dataset from scikit-learn for our example. We'll create the individual parts of the neural network, test them, and then connect all of them together.

Below we have loaded the Boston housing dataset available from scikit-learn. The features of the dataset are stored in the variable **X**, and the target values, which are median house prices in thousands of dollars, are stored in the variable **Y**.

After loading the dataset, we have divided it into train (80%) and test (20%) sets. We have then converted all numpy arrays to **PyTorch** tensors using the **torch.tensor()** function. **PyTorch** operations work on tensors, hence this step is necessary.

We have also recorded the number of training samples and the number of features in separate variables, as we'll be using them in our code.

In [3]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split

## Note: load_boston() was removed in scikit-learn 1.2, so this cell needs an older version.
X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

## Convert numpy arrays to float32 tensors
X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test, dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.float32),\
                                   torch.tensor(Y_test, dtype=torch.float32)

samples, features = X_train.shape

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

In [4]:

```
samples, features
```

In this section, we have normalized our data. The main reason for normalizing is to bring all features onto the same scale so that they fall in roughly the same range. This helps the gradient descent algorithm converge faster.

To normalize the data, we first compute the mean and standard deviation of each feature on the train data. We then subtract the mean from both the train and test sets and divide the difference by the standard deviation. Using the train statistics for the test set as well keeps information about the test data out of the training process.

In [5]:

```
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean)/ std
X_test = (X_test - mean)/ std
```
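To make the arithmetic of this step visible, here is a minimal sketch in plain Python on a single made-up feature column (the values in `train_col` and `test_col` are hypothetical). It uses `statistics.stdev`, which divides by n-1, matching the default behavior of `std()` on a PyTorch tensor.

```python
import statistics

# Hypothetical single feature column, split into train and test values
train_col = [2.0, 4.0, 6.0, 8.0]
test_col = [5.0, 10.0]

mean = statistics.mean(train_col)   # 5.0
std = statistics.stdev(train_col)   # sample standard deviation (divides by n-1)

# Both splits are scaled with the statistics of the train split
train_scaled = [(v - mean) / std for v in train_col]
test_scaled = [(v - mean) / std for v in test_col]

print(statistics.mean(train_scaled), statistics.stdev(train_scaled))
print(test_scaled)
```

After scaling, the train column has mean 0 and standard deviation 1, while the test column generally does not, because it was scaled with the train statistics.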

In this section, we have designed a small function that takes the layer sizes of our neural network as input and returns a list with the weights and biases of all layers, initialized with random values.

The layer sizes given to the function include the last layer as well. The function loops through the layer sizes and creates weights and biases for each layer. The weights and biases of a layer are kept together in a list. So in the returned list, the first entry is (weights, biases) for the first layer, the second entry is (weights, biases) for the second layer, and so on.

The shape of the weights of each layer will be **#units x #units_prev_layer** except for the first layer. The first layer weights will have shape **#units x #features**. All the biases will have shape **(#units,)**.

When we initialize weights and biases with random values using the **torch.rand()** function, we provide one extra parameter named **requires_grad** with value **True**. This parameter tells **PyTorch** to track every computation that uses these tensors. Whenever we later ask for the gradients of a function that used them (e.g. our neural network loss function), the gradient of that function with respect to each of these tensors is computed and stored in the **grad** attribute of the tensor.

In [6]:

```
def InitializeWeights(layer_sizes):
    weights = []
    for i, units in enumerate(layer_sizes):
        if i==0:
            w = torch.rand(units,features, dtype=torch.float32, requires_grad=True) ## First Layer
        else:
            w = torch.rand(units,layer_sizes[i-1], dtype=torch.float32, requires_grad=True) ## All other layers
        b = torch.rand(units, dtype=torch.float32, requires_grad=True) ## Bias
        weights.append([w,b])

    return weights
```

Below we have tested our function by giving layer sizes **[15,10,1]**. After initializing weights and biases using our function, we have also printed the shape of them for verifying that the function works as expected.

We can notice from the results that the first layer has weights of shape **15x13** which is according to **#unit x #features**, the second layer has shape **10x15** according to **#units x #prev_layer_units**, and so on.

In [7]:

```
torch.manual_seed(123)
weights = InitializeWeights([15,10,1])
for i, (w,b) in enumerate(weights):
    print("Layer : {}, Weights : {}, Biases : {}".format(i+1, w.shape, b.shape))
```
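To see what `requires_grad=True` does in isolation, the following minimal sketch tracks one small tensor through a simple function and then reads its `grad` attribute. The tensors `w` and `x` and the function `f` are made up for illustration and are not part of the network.

```python
import torch

# Two hypothetical tensors; only `w` is tracked by autograd
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.tensor([4.0, 5.0, 6.0])  # plain data, requires_grad defaults to False

# A scalar function of w: f(w) = sum(w * x), so df/dw = x
f = (w * x).sum()
f.backward()

print(w.grad)  # gradient of f with respect to w
print(x.grad)  # None, because x was not tracked
```

Since the derivative of `sum(w * x)` with respect to `w` is `x`, the gradient stored in `w.grad` holds the same values as `x`.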

In this section, we have designed the activation function that we'll use for our hidden layers (all layers except the input and output layers). We'll use **ReLU (Rectified Linear Unit)**, implemented below as the function **Relu**. It takes a tensor as input and returns a tensor of the same shape in which all values less than 0 are replaced with 0, so the output only contains values greater than or equal to 0.

In [8]:

```
def Relu(tensor):
    return torch.maximum(tensor, torch.zeros_like(tensor)) # max(0,x)
```
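PyTorch also ships this activation ready-made. As a quick sanity check, the sketch below compares the hand-written function with `torch.relu` and with `torch.clamp(..., min=0)` on a made-up sample tensor; it assumes nothing beyond the `Relu` definition above, which is repeated so the snippet runs on its own.

```python
import torch

def Relu(tensor):
    return torch.maximum(tensor, torch.zeros_like(tensor))  # max(0, x)

sample = torch.tensor([-1.5, 0.0, 2.0, -3.0])  # hypothetical input

# Built-in equivalents that come with PyTorch
builtin = torch.relu(sample)
clamped = torch.clamp(sample, min=0)

print(Relu(sample), builtin, clamped)
```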

Below we have tested our activation function by giving a sample tensor as input.

In [9]:

```
tensor = torch.tensor([-1,0,1,-2,4,-6,5])
Relu(tensor)
```

In this section, we have defined a simple function that performs the work of one layer of our neural network. The function takes as input weights, input data, and an activation function. It multiplies the input data by the layer's weights (matrix multiplication), adds the biases to the result, applies the activation function, and returns the output.

When multiplying the input data by the weights, we pass the transpose of the weights. The reason is that the input data has shape **#batch_size x #features** and the first layer's weights have shape **#units x #features**, so the weights must be transposed for the inner dimensions to match. The same holds for the inner layers, where the input has shape **#batch_size x #prev_layer_units** and the weights have shape **#units x #prev_layer_units**.

The output of this function will be of shape **#batch_size x #units**.

In [10]:

```
def LinearLayer(weights, input_data, activation=lambda x: x):
    w, b = weights
    out = torch.matmul(input_data, w.T) + b ## Multiply input by weights and add bias to it.
    return activation(out) ## Apply activation at last
```
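The weight layout used here (**#units x #features**) is the same one that PyTorch's built-in `torch.nn.functional.linear` expects. The sketch below checks this on random data; the sizes 5, 13, and 15 are chosen only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, n_features, n_units = 5, 13, 15
x = torch.rand(batch_size, n_features)   # #batch_size x #features
w = torch.rand(n_units, n_features)      # #units x #features
b = torch.rand(n_units)                  # (#units,)

manual = torch.matmul(x, w.T) + b        # same expression as in LinearLayer
builtin = F.linear(x, w, b)              # PyTorch's built-in affine transform

print(manual.shape)
print(torch.allclose(manual, builtin, atol=1e-5))
```

The bias of shape **(#units,)** is broadcast across the batch dimension, which is why it can be added directly to the **#batch_size x #units** product.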

Below we have tested our function on random data and printed shape of input data and output data for verification purposes. We have used weights that we initialized when we defined our weight initialization function. We have used weights of the first layer as input to function. We can notice from the output that it has shape **5x15** which matches **#batch_size x #units** of the first layer.

In [11]:

```
rand_data = torch.rand(5, features)
out = LinearLayer(weights[0], rand_data, Relu)
print("Data Shape : {}".format(rand_data.shape))
print("Output Shape : {}".format(out.shape))
```

In this section, we have defined a function that performs one full forward pass of data through the neural network. The function takes weights and input data as input. It loops through the weights and biases of all layers except the last one and, for each layer, calls the single-layer function we designed in the previous cell, feeding it the output of the layer before. For all these inner layers, we pass **Relu** as the activation function. For the last layer, we do not pass any activation function because we want its output as it is.

In [12]:

```
def ForwardPass(weights, input_data):
    layer_out = input_data

    for i in range(len(weights[:-1])):
        layer_out = LinearLayer(weights[i], layer_out, Relu) ## Hidden Layer

    preds = LinearLayer(weights[-1], layer_out) ## Final Layer

    return preds.ravel()
```

Below we have tested the forward pass function of our neural network by giving our train data as input.

In [13]:

```
preds = ForwardPass(weights, X_train)
print("Input Shape : {}, Output Shape : {}".format(X_train.shape, preds.shape))
```

In this section, we have defined a loss function of our neural network. We'll be using the mean squared error loss function for our regression task.

`MSE(actual, predictions) = 1/n * sum((actual - predictions)^2)`

The function takes the actual target values and the predicted target values as input. It subtracts the predictions from the actual values, squares the differences, and averages them to return a single MSE value.

In [14]:

```
def MeanSquaredErrorLoss(actual, preds):
    return torch.pow(actual - preds, 2).mean()
```
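The formula can be followed step by step in plain Python. The sketch below works through it for the small inputs 1, 2, 3 (actual) and 4, 5, 6 (predicted), without any tensors.

```python
actual = [1.0, 2.0, 3.0]
preds = [4.0, 5.0, 6.0]

# Squared difference for each pair: (1-4)^2, (2-5)^2, (3-6)^2
squared_errors = [(a - p) ** 2 for a, p in zip(actual, preds)]

# Average over the n = 3 samples
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors)  # [9.0, 9.0, 9.0]
print(mse)             # 9.0
```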

Below we have tested our loss function with a simple example.

In [15]:

```
y1 = torch.tensor([1,2,3], dtype=torch.float32)
y2 = torch.tensor([4,5,6],dtype=torch.float32)
MeanSquaredErrorLoss(y1, y2)
```

In this section, we have defined a function that performs the actual training of our neural network. It takes as input the train data features (**X**), target values (**Y**), learning rate, and number of epochs. It then executes the training loop once per epoch. In each iteration, it performs a forward pass of the train data through the neural network, calculates the loss, calculates gradients, and finally updates the weights. The forward pass uses the function we designed earlier, taking weights and feature data as input and returning predictions. The predictions and actual target values are passed to the loss function to get the loss value. The gradients are calculated by simply calling the **backward()** method on the loss value, which applies the chain rule to compute the gradients of the loss with respect to all tensors created with **requires_grad** set to **True**. Afterwards, each of these tensors has its **grad** attribute set to the gradient values.

Finally, we update the weights using a loop. We subtract the learning rate times the gradient from every weight and bias tensor. This process of repeatedly moving weights by a small step against the gradient is commonly referred to as the gradient descent algorithm. After the weights are updated, we set the **grad** attribute of all weight tensors to **None**. This is needed because **backward()** adds new gradients to whatever is already stored in **grad**, so stale gradients from the previous iteration would otherwise be mixed into the next update.

We have kept the code that updates weights inside the **torch.no_grad()** context manager. By default, **PyTorch** records every operation involving a tensor with **requires_grad** set to **True** so that it can compute gradients later. The update step itself should not be recorded, and in-place changes to such tensors are only allowed while recording is disabled, so we use this context manager.

We are also printing the loss value every 100 epochs.

In [16]:

```
def TrainModel(X, Y, learning_rate, epochs):
    for i in range(1, epochs+1):
        preds = ForwardPass(weights, X) ## Make Predictions by forward pass through network

        loss = MeanSquaredErrorLoss(Y, preds) ## Calculate Loss

        loss.backward() ## Calculate Gradients

        with torch.no_grad():
            for j in range(len(weights)): ## Update Weights
                weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
                weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

                weights[j][0].grad = None
                weights[j][1].grad = None

        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(loss))
```
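The reason for resetting **grad** after each update can be seen on a single scalar. The following is a minimal sketch with a made-up parameter `w` and the toy loss `w ** 2`; it is not the tutorial's network, only an illustration of how `backward()` accumulates gradients and how an update inside `torch.no_grad()` behaves.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4
(w ** 2).backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added on top
(w ** 2).backward()
accumulated = w.grad.clone()

# Reset the gradient, then apply one update step without tracking it
w.grad = None
with torch.no_grad():
    w -= 0.1 * first

print(first, accumulated, w)
```

The second call doubles the stored gradient from 4 to 8 even though the loss did not change, which is exactly the effect that resetting **grad** avoids.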

Here, we have trained our neural network by calling the function we designed in the previous cell. We have first initialized number of epochs (**2500**), learning rate (**0.0001**) and layer sizes (**[5,10,15,1]**). We have initialized weights using the weight initialization function which we created earlier by giving layer sizes as input.

We have then called our training function with train features, train target values, learning rate, and epochs. We can notice from the loss getting printed at every 100 epochs that our neural network is going in the right direction.

In [17]:

```
torch.manual_seed(42) ##For reproducibility.This will make sure that same random weights are initialized each time.
epochs = 2500
learning_rate = torch.tensor(1/1e4) # 0.0001
layer_sizes = [5,10,15,1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights
TrainModel(X_train, Y_train, learning_rate, epochs)
```

In this section, we are actually making predictions using our trained neural network weights.

We have made predictions for train data and test data both using our updated weights. We have used a function which we had designed earlier to perform one forward pass through a neural network.

In [18]:

```
train_preds = ForwardPass(weights, X_train)
train_preds[:5]
```

In [19]:

```
test_preds = ForwardPass(weights, X_test)
test_preds[:5]
```

In this section, we have evaluated the performance of our neural network by calculating the **R^2 score** on both train and test predictions. We have used the **r2_score()** function available from scikit-learn to calculate the score. The function takes the actual target values as its first argument and the predicted values as its second. The best possible score is 1.0, and the score can be negative when the model fits worse than always predicting the mean of the targets. A value close to 1 is considered a good score.

If you are interested in learning about how **R^2 score** works then please feel free to check our tutorial on scikit-learn metrics which covers it in detail.

In [20]:

```
from sklearn.metrics import r2_score

## r2_score expects (y_true, y_pred) in that order
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.detach().numpy(), train_preds.detach().numpy())))
print("Test  R^2 Score : {:.2f}".format(r2_score(Y_test.detach().numpy(), test_preds.detach().numpy())))
```
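The score itself is simple to compute by hand. The sketch below implements the usual definition, R^2 = 1 - SS_res / SS_tot, in plain Python on made-up values. The helper name `r2` and the sample lists are illustrative only; scikit-learn's `r2_score(y_true, y_pred)` follows the same definition for a single target.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 2.0, 3.0, 3.0]    # hypothetical predictions
bad_pred = [4.0, 3.0, 2.0, 1.0]  # predictions that are worse than the mean

print(r2(y_true, y_pred))    # reasonable fit
print(r2(y_pred, y_true))    # swapping the arguments changes the score
print(r2(y_true, bad_pred))  # negative score
```

Because the denominator is computed from the first argument only, the score is not symmetric, so the order of actual and predicted values matters.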

In many real-life situations, the data is quite large and generally does not fit into the main memory of the computer. To solve this issue, we take a small batch of data from the whole dataset at a time, make predictions on it, calculate the loss, calculate gradients, and then update the weights using those gradients. We divide the dataset into batches and perform the same steps for every batch. This approach of updating weights on a small batch consisting of a few samples at a time is generally referred to as **mini-batch stochastic gradient descent**, a variant of gradient descent.

Our current dataset is small and fits into the main memory of the computer, but we'll treat it as a big dataset that does not fit into main memory in order to explain training in batches.

We have designed a different function to perform training in batches. The function takes training data, training labels, learning rate, number of epochs, and batch size as inputs. It then runs the training loop once per epoch. In each epoch, we first calculate the number of batches. We then loop through the batches, calculating the start and end indices of each one to slice a single batch out of the original data. For each batch, we perform a forward pass, calculate the loss, calculate gradients by calling **backward()** on the loss, and finally update the weights of the neural network. In this case, we have moved the logic for updating weights into a separate function to keep the training function from getting large.

We are also printing loss at every 100 epochs to track it.

In [21]:

```
def UpdateWeights(weights, learning_rate):
    with torch.no_grad():
        for j in range(len(weights)): ## Update Weights
            weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
            weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

            weights[j][0].grad = None
            weights[j][1].grad = None

def TrainModelInBatches(X, Y, learning_rate, epochs, batch_size=32):
    for i in range(1, epochs+1):
        batches = torch.arange((X.shape[0]//batch_size)+1) ### Batch Indices

        losses = [] ## Record loss of each batch
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data

            preds = ForwardPass(weights, X_batch) ## Make Predictions by forward pass through network

            loss = MeanSquaredErrorLoss(Y_batch, preds) ## Calculate Loss
            losses.append(loss.item()) ## Record Loss as a plain Python float

            loss.backward() ## Calculate Gradients

            UpdateWeights(weights, learning_rate) ## Update Weights

        if i % 100 == 0: ## Print MSE every 100 epochs
            print("MSE : {:.2f}".format(torch.tensor(losses).mean()))
```
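One detail of the index calculation is worth a closer look: with `n // batch_size + 1` batches, the final batch is empty whenever the number of samples is an exact multiple of the batch size, and the mean of an empty tensor is NaN. The sketch below shows an alternative way to generate the start and end indices that never yields an empty batch. The helper `batch_slices` is hypothetical and not part of the tutorial's code, and the sample counts are illustrative.

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, end) index pairs that cover range(n_samples) without empty batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

# 404 samples, batch size 32: 12 full batches plus one batch of 20
slices = list(batch_slices(404, 32))
print(len(slices), slices[-1])

# 64 samples: exactly two batches, no empty trailing batch
print(list(batch_slices(64, 32)))
```

Each pair can be used directly as `X[start:end]`, in the same way as the indices computed in the training function.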

Below we are actually performing training of our neural network by calling the training routine we designed in the previous cell. We have initialized number of epochs (**2500**), learning rate (**0.0001**) and layer sizes (**[5,10,15,1]**). We have then initialized the weights of the neural network by calling the weight initialization function giving it layer sizes. At last, we have called our training function to actually perform training by giving training data to it.

We can notice from the **MSE** getting printed every 100 epochs that our neural network is doing a good job.

In [22]:

```
torch.manual_seed(42) ##For reproducibility.This will make sure that same random weights are initialized each time.
epochs = 2500
learning_rate = torch.tensor(1/1e4) # 0.0001
layer_sizes = [5,10,15,1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights
TrainModelInBatches(X_train, Y_train, learning_rate, epochs)
```

As we have assumed that we can only fit certain samples into the main memory of the computer and not all of them, we need to design a function that will do predictions on a batch of data.

Below we have created a function that performs prediction on batches of data taking one batch at a time and at last, it combines all predictions.

The function generates a number of batches just like our training function at the beginning. It then loops through data in batches, makes predictions, and combines them before returning all predictions.

In [23]:

```
def MakePredictions(input_data, batch_size=32):
    batches = torch.arange((input_data.shape[0]//batch_size)+1) ### Batch Indices

    with torch.no_grad(): ## Disables automatic gradients calculations
        preds = []
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch = input_data[start:end]

            preds.append(ForwardPass(weights, X_batch))

    return preds
```
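The same batching pattern can also be written with `torch.split`, which cuts a tensor into consecutive chunks of a given size. The sketch below is an alternative under stated assumptions, not the tutorial's function: `predict_in_batches` and `toy_model` are hypothetical names, and the toy model simply sums each row so that the result is easy to check.

```python
import torch

def predict_in_batches(model_fn, input_data, batch_size=32):
    """Apply model_fn to consecutive chunks of input_data and join the results."""
    with torch.no_grad():  # inference only, no gradients needed
        chunks = torch.split(input_data, batch_size)  # last chunk may be smaller
        return torch.cat([model_fn(chunk) for chunk in chunks])

# Stand-in for a trained network: sums the features of each row
toy_model = lambda batch: batch.sum(dim=1)

data = torch.arange(102 * 3, dtype=torch.float32).reshape(102, 3)
preds = predict_in_batches(toy_model, data)

chunk_sizes = [len(c) for c in torch.split(data, 32)]
print(chunk_sizes)
print(preds.shape)
```

Predicting in chunks and concatenating gives the same result as predicting on the whole tensor at once, since each row is processed independently.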

Below we have made predictions on the test and train dataset in batches using the function we designed earlier. We have then combined the predictions of all batches as well.

In [24]:

```
test_preds = MakePredictions(X_test)
test_preds = torch.cat(test_preds)
train_preds = MakePredictions(X_train)
train_preds = torch.cat(train_preds)
```

In this section, we have evaluated the **R^2 score** on the train and test predictions. We can notice from the results that they are a little better compared to when we worked on the whole data at a time. This might be due to the weights getting updated once per batch rather than once per epoch.

In [25]:

```
from sklearn.metrics import r2_score

## r2_score expects (y_true, y_pred) in that order
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.detach().numpy(), train_preds.detach().numpy())))
print("Test  R^2 Score : {:.2f}".format(r2_score(Y_test.detach().numpy(), test_preds.detach().numpy())))
```

In this section, we'll explain how we can design a neural network to solve classification tasks. We'll be creating a small neural network to solve a simple binary classification task. We'll be using the breast cancer dataset available from scikit-learn for our purpose.

The majority of the code in this section repeats what we already wrote in the regression section, hence we won't include detailed descriptions of it again. We have included it here so that someone who starts directly from this section can follow along from top to bottom without copying code from the regression section.

In this section, we have loaded the breast cancer dataset available from scikit-learn. The dataset has features related to measurements of breast tumors, and the target variable is binary, indicating whether the tumor is malignant or benign. The features of the dataset are loaded in the variable **X** and the target values in the variable **Y**.

After loading the dataset, we have divided it into train (80%) and test (20%) sets. We have then converted all numpy arrays holding the data to **PyTorch** tensors. We have also recorded the number of training samples, the number of features, and the unique classes of the target in separate variables, as we'll be using them in our code later on.

In [26]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_breast_cancer(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train, X_test, Y_train, Y_test = torch.tensor(X_train, dtype=torch.float32),\
                                   torch.tensor(X_test, dtype=torch.float32),\
                                   torch.tensor(Y_train, dtype=torch.float32),\
                                   torch.tensor(Y_test, dtype=torch.float32)

samples, features = X_train.shape
classes = Y_test.unique()

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

In [27]:

```
samples, features, classes
```

In this section, we have normalized our data as usual by subtracting the mean and dividing the difference by standard deviation. The code is exactly the same as that from the regression section.

In [28]:

```
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean)/ std
X_test = (X_test - mean)/ std
```

In this section, we have included a function to initialize the weights of the neural network. It is almost the same as the one we used in the regression section, with a few minor modifications. We have introduced a new parameter named **scale** in the function signature. We no longer pass **requires_grad** when creating weights with **torch.rand()**. Instead, we set **requires_grad** after the scale has been applied, so that the scaled tensors themselves are the ones whose gradients get stored in **grad**.

The reason for scaling the weights down is something we ran into during training: with unscaled weights, the values reaching the sigmoid function were large, and sigmoid evaluates to (almost exactly) 1 for large inputs. A saturated sigmoid has a gradient of practically 0, and a prediction of exactly 1 makes the **log(1 - prediction)** term of our loss infinite, so the gradients became 0 or NaN and the model was not training. To let the model learn, we needed smaller weights so that the values entering the sigmoid are not that big and it returns values below 1. These kinds of adjustments are needed when gradients turn 0 or the loss turns into NaN.

In [29]:

```
def InitializeWeights(layer_sizes, scale=0.1):
    weights = []
    for i, units in enumerate(layer_sizes):
        if i==0:
            w = torch.rand(units,features, dtype=torch.float32)
        else:
            w = torch.rand(units,layer_sizes[i-1], dtype=torch.float32)
        b = torch.rand(units, dtype=torch.float32)

        if scale: ## Scale weights
            w = w*scale
            b = b*scale

        w.requires_grad=True ## Set requires grad after weights are updated with scale
        b.requires_grad=True

        weights.append([w,b])

    return weights
```
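The saturation problem that motivates the scaling can be reproduced on a single number. The sketch below feeds a large made-up value (20.0) into a hand-written sigmoid in float32 and looks at the output, its gradient, and the resulting log term. The value and variable names are chosen for illustration only.

```python
import torch

def Sigmoid(tensor):
    return 1 / (1 + torch.exp(-tensor))

# A large pre-activation, as unscaled weights might produce (hypothetical value)
z = torch.tensor(20.0, requires_grad=True)
p = Sigmoid(z)
p.backward()

print(p.item())       # rounds to exactly 1.0 in float32
print(z.grad.item())  # tiny gradient: the sigmoid is saturated

# With a prediction of exactly 1, the loss term for actual = 0 blows up
log_term = torch.log(1 - p.detach())
print(log_term)       # -inf
print(0 * log_term)   # nan, since 0 * inf is undefined
```

Smaller initial weights keep the pre-activations closer to 0, where the sigmoid is far from saturation and its gradient is largest.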

In this section, we have included activation function **Relu** for the inner layers of our neural network. It takes as input **PyTorch** tensor and returns tensor where all values less than 0 will be replaced by 0.

In [30]:

```
def Relu(tensor):
    return torch.maximum(tensor, torch.zeros_like(tensor)) # max(0,x)
```

In this section, we have included code for the activation function of our last layer. As this is a binary classification task, we'll be using the **sigmoid function** as the activation function for the last layer of our neural network. The sigmoid function takes a **PyTorch** tensor as input and maps its values into the range **(0,1)**.

`sigmoid(x) = 1 / (1 + e^-x)`

After defining the function, we have also tested it on a small sample tensor and compared the results with the ready-made implementation available from the **torch.nn** module.

In [31]:

```
def Sigmoid(tensor):
    return 1 / (1 + torch.exp(-tensor))
```

In [32]:

```
tensor = torch.tensor([1,2,3,4,5], dtype=torch.float32) ## Float tensor, since sigmoid is a floating-point operation

Sigmoid(tensor), torch.nn.Sigmoid()(tensor)
```

In this section, we have included a function that applies one layer of a neural network to input data. The code for this function is an exact copy of what we have in the regression section.

In [33]:

```
def LinearLayer(weights, input_data, activation=lambda x: x):
    w, b = weights
    out = torch.matmul(input_data, w.T) + b ## Multiply input by weights and add bias to it.
    return activation(out) ## Apply activation at last
```

In this section, we have included a function that performs one forward pass through the whole neural network. It uses the function we defined in the previous cell to apply one layer at a time. This function has the same code as the one from the regression section with only two minor changes. In the regression section, we did not apply any activation function to the last layer, whereas here we pass the sigmoid function as the activation function for the last layer. The other change is that we clip the values coming out of the last layer to the range **[0.01,0.99]**. This keeps the loss finite, because the loss takes the log of both the predictions and (1 - predictions), and the log of 0 is negative infinity. Infinite or NaN loss values would corrupt the gradients and mess up the whole training process.

In [34]:

```
def ForwardPass(weights, input_data):
    layer_out = input_data

    for i in range(len(weights[:-1])):
        layer_out = LinearLayer(weights[i], layer_out, Relu) ## Hidden Layer

    preds = LinearLayer(weights[-1], layer_out, Sigmoid) ## Final Layer

    return torch.clamp(preds.squeeze(), 0.01, 0.99)
```
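Clipping has a side effect that is easy to overlook: wherever a value gets clipped, no gradient flows back through it. The sketch below shows this on three made-up probabilities, two of which fall outside the clip range.

```python
import torch

# Hypothetical sigmoid outputs: two outside the clip range, one inside
probs = torch.tensor([0.001, 0.5, 0.999], requires_grad=True)

clipped = torch.clamp(probs, 0.01, 0.99)
clipped.sum().backward()

print(clipped)     # values pulled into [0.01, 0.99]
print(probs.grad)  # gradient is 1 inside the range and 0 where clipping happened
```

In practice this means samples whose predictions are already pushed to the clip boundaries stop contributing to the weight updates, which is one more reason to keep the sigmoid inputs from growing too large.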

In this section, we have defined a loss function for our binary classification task. We'll be using **log loss** function for our purpose. We have also included the formula of the log loss function below.

`log_loss(actual, predictions) = 1/n * sum(-actual * log(predictions) - (1-actual) * log(1-predictions))`

After defining the loss function, we have also tested the function with two arrays as input. We have also verified the function output with the ready log loss function available from scikit-learn to check whether our implementation is right.

In [35]:

```
def NegLogLoss(actual, preds):
    loss = - actual * torch.log(preds) - (1 - actual) * torch.log(1 - preds)
    return loss.mean()
```
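PyTorch provides the same loss as `torch.nn.functional.binary_cross_entropy`. As a cross-check, the sketch below compares it with the hand-written function on a few made-up labels and probabilities. Note that the built-in expects float tensors and takes the predictions as its first argument.

```python
import torch
import torch.nn.functional as F

def NegLogLoss(actual, preds):
    loss = - actual * torch.log(preds) - (1 - actual) * torch.log(1 - preds)
    return loss.mean()

actual = torch.tensor([1.0, 1.0, 0.0, 0.0, 1.0])    # hypothetical labels
preds = torch.tensor([0.7, 0.1, 0.69, 0.1, 0.23])   # hypothetical probabilities

manual = NegLogLoss(actual, preds)
builtin = F.binary_cross_entropy(preds, actual)  # note: predictions come first

print(manual, builtin)
```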

In [36]:

```
y1 = torch.tensor([1,1,0, 0,1])
y2 = torch.tensor([0.7,0.1,0.69, 0.1,0.23])
NegLogLoss(y1, y2)
```

In [37]:

```
from sklearn.metrics import log_loss
log_loss(y1.detach().numpy(), y2.detach().numpy())
```

In this section, we have defined a function that will actually train our neural network. We just need to call this function and it'll perform the training process. The function has exactly the same code as the one we used in the regression section with only one minor difference which is that we have used the log loss function here.

In [38]:

```
def TrainModel(X, Y, learning_rate, epochs):
    for i in range(1, epochs+1):
        preds = ForwardPass(weights, X) ## Make Predictions by forward pass through network

        loss = NegLogLoss(Y, preds) ## Calculate Loss

        loss.backward() ## Calculate Gradients

        with torch.no_grad():
            for j in range(len(weights)): ## Update Weights
                weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
                weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases

                weights[j][0].grad = None
                weights[j][1].grad = None

        if i % 100 == 0: ## Print NegLogLoss every 100 epochs
            print("NegLogLoss : {:.2f}".format(loss))
```

Below we have actually performed training of our neural network by calling the training function we designed in the previous cell. We have first initialized number of epochs (**2500**), learning rate (**0.01**) and layer sizes (**[5,10,15,1]**). We have then initialized our layer weights and biases using the weight initialization function we had designed earlier.

At last, we have called our training function with train features data, train target values, learning rate, and epochs as input. We can notice from the log loss getting printed at every 100 epochs that it seems to be moving in the right direction.

In [39]:

```
torch.manual_seed(42) ##For reproducibility.This will make sure that same random weights are initialized each time.
epochs = 2500
learning_rate = torch.tensor(1/1e2) # 0.01
layer_sizes = [5,10,15,1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights
TrainModel(X_train, Y_train, learning_rate, epochs)
```

In this section, we are making predictions on our train and test datasets. We have used the forward pass function we designed earlier for this. The output of our neural network is the sigmoid output, which can be read as the probability of class 1. We need to convert these probabilities to the actual classes of our classification problem. We have set the threshold at 0.5, classifying all values greater than it as class 1 (benign tumor) and all other values as class 0 (malignant tumor), following the label encoding of the scikit-learn dataset.

In [40]:

```
train_preds = ForwardPass(weights, X_train)
train_preds = torch.as_tensor(train_preds > 0.5, dtype=torch.float32)
train_preds[:5], Y_train[:5]
```

In [41]:

```
test_preds = ForwardPass(weights, X_test)
test_preds = torch.as_tensor(test_preds > 0.5, dtype=torch.float32)
test_preds[:5], Y_test[:5]
```

In this section, we have evaluated the performance of our classification neural network by calculating accuracy on train and test predictions. We can notice from the results that our model seems to have done a decent job.

In [42]:

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds)))
```
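The thresholding and accuracy steps can also be written in plain PyTorch. The sketch below uses made-up probabilities and targets, and the shapes are assumptions: a network output of shape `(N, 1)` and targets of shape `(N,)`. It is meant to illustrate two details, namely that a strict `> 0.5` comparison sends a probability of exactly 0.5 to class 0, and that mismatched shapes silently broadcast.

```python
import torch

probs = torch.tensor([[0.91], [0.12], [0.50], [0.73]])  # shape (4, 1), like a network output
targets = torch.tensor([1.0, 0.0, 0.0, 0.0])            # shape (4,)

# Strict comparison: a probability of exactly 0.5 maps to class 0.
pred_classes = (probs > 0.5).to(torch.float32)

# Flatten predictions before comparing. Comparing (4, 1) with (4,) directly
# would broadcast to a (4, 4) matrix and give a misleading accuracy.
accuracy = (pred_classes.reshape(-1) == targets).to(torch.float32).mean()
print("Accuracy : {:.2f}".format(accuracy.item()))  # Accuracy : 0.75
```

Three of the four predictions match their targets, hence 0.75. `accuracy_score` from scikit-learn computes the same quantity for labels of matching shape.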

In this section, we explain how to train in batches, which is needed for datasets that do not fit into the main memory of the computer. The cell below contains the function that performs training in batches. Its code is the same as the one we used in the regression section with one change: we use the log loss function this time because we are solving a classification problem.

In [43]:

```
def UpdateWeights(weights, learning_rate):
    with torch.no_grad(): ## Updates must not be tracked by autograd
        for j in range(len(weights)):
            weights[j][0] -= learning_rate * weights[j][0].grad ## Update Weights
            weights[j][1] -= learning_rate * weights[j][1].grad ## Update Biases
            weights[j][0].grad = None ## Reset gradients so they do not accumulate
            weights[j][1].grad = None

def TrainModelInBatches(X, Y, learning_rate, epochs, batch_size=32):
    ## Note: reads the global `weights` list initialized before the call
    for i in range(1, epochs+1):
        losses = [] ## Record loss of each batch
        for start in range(0, X.shape[0], batch_size): ## Batch start indices, no empty last batch
            end = start + batch_size ## Slicing past the end of the data is safe
            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            preds = ForwardPass(weights, X_batch) ## Make Predictions by forward pass through network
            loss = NegLogLoss(Y_batch, preds) ## Calculate Loss
            losses.append(loss.item()) ## Record loss as a plain float
            loss.backward() ## Calculate Gradients
            UpdateWeights(weights, learning_rate) ## Update Weights
        if i % 100 == 0: ## Print NegLogLoss every 100 epochs
            print("NegLogLoss : {:.2f}".format(torch.tensor(losses).mean()))
```
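Batch boundaries are consecutive `(start, end)` slices over the samples. One case worth checking in any batching scheme is a sample count that is an exact multiple of the batch size: a scheme that always adds one extra batch ends up with an empty final slice, and a mean loss over zero samples is undefined. The helper below is a hypothetical illustration (`batch_bounds` is not part of the tutorial) written in plain Python.

```python
def batch_bounds(n_samples, batch_size=32):
    """Yield (start, end) slice bounds covering n_samples without empty batches."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

print(list(batch_bounds(70, 32)))  # [(0, 32), (32, 64), (64, 70)]
print(list(batch_bounds(64, 32)))  # [(0, 32), (32, 64)]
```

With 70 samples the last batch holds the 6 leftover rows. With 64 samples there are exactly two full batches and no trailing empty one.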

Below, we train our neural network by calling the training function we designed in the previous cell. We set the number of epochs (**2500**), the learning rate (**0.001**), and the layer sizes (**[5,10,15,1]**). We first initialize the weights and biases of each layer using the weight initialization function we designed earlier. We then train the network by giving the training function the training features, training target values, learning rate, and number of epochs as input.

The log loss printed every 100 epochs lets us check that the model keeps improving.

In [44]:

```
torch.manual_seed(42) ## For reproducibility: the same random weights are initialized on every run.
epochs = 2500
learning_rate = torch.tensor(1/1e3) # 0.001
layer_sizes = [5,10,15, 1] ## Layer sizes including last layer
weights = InitializeWeights(layer_sizes) ## Initialize Weights
TrainModelInBatches(X_train, Y_train, learning_rate, epochs)
```
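The update applied after every batch is plain gradient descent: each parameter moves against its gradient, scaled by the learning rate. The sketch below shows the same pattern on a one-parameter toy problem rather than on the tutorial's network. The parameter `w`, the quadratic loss, and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)  # single toy parameter
learning_rate = 0.1

for step in range(100):
    loss = (w - 3.0) ** 2      # minimized at w = 3
    loss.backward()            # populates w.grad with d(loss)/dw
    with torch.no_grad():      # the update itself must not be tracked by autograd
        w -= learning_rate * w.grad
    w.grad = None              # reset so gradients do not accumulate across steps

print(round(w.item(), 4))
```

The parameter converges towards 3, the minimizer of the toy loss. Dropping the `w.grad = None` line would make each step use the sum of all earlier gradients, which is why the training loop resets gradients after every update.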

In this section, we include a function that makes predictions on a dataset in batches. Because we assume that the dataset does not fit into the main memory of the computer and only one batch can be loaded at a time, we need a function that predicts on each batch of data and collects the results so they can be combined afterwards. The function below has almost exactly the same code as the one we used in the regression section.

In [45]:

```
def MakePredictions(input_data, batch_size=32):
    ## Note: reads the global `weights` list of the trained network
    preds = []
    with torch.no_grad(): ## Disables automatic gradients calculations
        for start in range(0, input_data.shape[0], batch_size): ## Batch start indices, no empty last batch
            X_batch = input_data[start:start+batch_size] ## Slicing past the end of the data is safe
            preds.append(ForwardPass(weights, X_batch))
    return preds ## List with one tensor of predictions per batch
```
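Predicting in batches and concatenating the results should give the same output as a single forward pass over the whole dataset. The sketch below checks this on random data with a stand-in model. `toy_forward`, `W`, `b`, and `predict_in_batches` are hypothetical names, and the one-layer sigmoid model is an assumption that replaces the tutorial's `ForwardPass`.

```python
import torch

torch.manual_seed(0)
W = torch.rand(5, 1)   # hypothetical weights for a toy one-layer model
b = torch.rand(1)      # hypothetical bias

def toy_forward(X):
    """Stand-in for the tutorial's forward pass: linear layer plus sigmoid."""
    return torch.sigmoid(X @ W + b)

def predict_in_batches(X, batch_size=32):
    preds = []
    with torch.no_grad():  # inference only, no gradient tracking needed
        for start in range(0, X.shape[0], batch_size):
            preds.append(toy_forward(X[start:start + batch_size]))
    return torch.cat(preds)  # stitch per-batch outputs back together

X = torch.rand(70, 5)      # 70 samples: two full batches plus 6 leftover rows
batched = predict_in_batches(X)
print(batched.shape, torch.allclose(batched, toy_forward(X)))
```

`torch.cat` joins the per-batch tensors along the first dimension, so the combined result has one row per input sample, in the original order.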

Below, we make predictions on the train and test datasets using the function from the previous cell. We combine the per-batch predictions into a single tensor and convert the probabilities to class labels for evaluation.

In [46]:

```
test_preds = MakePredictions(X_test) ## Make Predictions on test dataset
test_preds = torch.cat(test_preds) ## Combine all batch predictions
test_preds = torch.as_tensor(test_preds > 0.5, dtype=torch.float32) ## Convert Probabilities to class type
train_preds = MakePredictions(X_train) ## Make Predictions on train dataset
train_preds = torch.cat(train_preds) ## Combine all batch predictions
train_preds = torch.as_tensor(train_preds > 0.5, dtype=torch.float32) ## Convert Probabilities to class type
```

Finally, we evaluate the performance of our model by calculating the accuracy of the train and test predictions below. The results suggest that the model has done a decent job.

In [47]:

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds)))
```

This ends our small tutorial explaining how we can use **PyTorch's** low-level numpy-like API to create neural networks. For those interested, a separate tutorial covers creating a neural network using **PyTorch's** high-level API available through the **torch.nn** module (link in the Reference section below).

Sunny Solanki