Updated On : Dec-28,2021 Time Investment : ~30 mins

Sonnet: Guide to Create Simple Neural Networks

Sonnet is a deep learning library built on top of TensorFlow by DeepMind. It lets us construct deep neural networks easily, and it provides implementations of many commonly used neural network architectures such as multi-layer perceptrons and ResNet. Sonnet offers high-level building blocks somewhat like Keras. Unlike Keras, it does not ship a built-in training loop (there is no fit() method), so users write their own training code.

As a part of this tutorial, we'll explain how to get started with Sonnet. We'll create simple neural networks using the MLP() network constructor provided by Sonnet. For explanation purposes, we'll use small toy datasets available from scikit-learn.

Below we have highlighted important sections of the tutorial to give an overview of the material that we have covered.

Important Sections of Tutorial

  1. Regression
    • Load Dataset
    • Normalize Data
    • Create Neural Network
    • Train Neural Network
    • Make Predictions
    • Evaluate Model Performance
    • Train Network on Batches of Data
    • Make Predictions in Batches
    • Evaluate Model Performance
  2. Classification

Installation

  • pip install dm-sonnet

Below we have imported Sonnet and printed the version that we'll be using in our tutorial. We have also imported and printed the version of tensorflow as Sonnet is built on top of it.

import sonnet as snt

print("Sonnet Version : {}".format(snt.__version__))
Sonnet Version : 2.0.0
import tensorflow as tf

print("Tensorflow Version : {}".format(tf.__version__))
Tensorflow Version : 2.7.0

1. Regression

In this section, we'll explain how we can create a simple neural network using Sonnet to solve regression tasks. We'll be using Boston housing data for our regression task.

Load Dataset

In this section, we have loaded the Boston housing dataset available from scikit-learn. The data features are stored in a variable named X and the target values in a variable named Y. The target is a continuous value giving the median house price in thousands of dollars. We have then divided the dataset into train (80%) and test (20%) sets. Finally, we have converted the NumPy arrays to TensorFlow tensors, because the neural networks we create with Sonnet work with TensorFlow tensors. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell needs an older scikit-learn release to run as written.

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_boston(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(X_test, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_test, dtype=tf.float32)

samples, features = X_train.shape

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
(TensorShape([404, 13]),
 TensorShape([102, 13]),
 TensorShape([404]),
 TensorShape([102]))
samples, features
(404, 13)
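The 404/102 split sizes come from a simple rule. scikit-learn's train_test_split() computes the train count as floor(train_size * n_samples) when only train_size is given, and the remaining samples form the test set. The NumPy sketch below reproduces the sizes with a hypothetical helper named shuffled_split. Its shuffling is not identical to scikit-learn's, so the rows it picks will differ.

```python
import math

import numpy as np


def shuffled_split(n_samples, train_size=0.8, seed=123):
    """Return (train_idx, test_idx) index arrays for a shuffled split.

    Hypothetical helper: the train count follows floor(train_size * n_samples),
    but the shuffle order is NumPy's, not scikit-learn's.
    """
    n_train = math.floor(train_size * n_samples)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)  # random ordering of the row indices
    return perm[:n_train], perm[n_train:]


train_idx, test_idx = shuffled_split(506)  # Boston housing has 506 rows
print(len(train_idx), len(test_idx))       # 404 102
```

The same rule gives 455/114 for the 569-row breast cancer dataset used later in the tutorial.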

Normalize Data

In this section, we have normalized our dataset. Normalization brings the features of the dataset onto the same scale. When features are on different scales and vary a lot in value, optimization algorithms like gradient descent can have a hard time converging. Normalizing the data can help them converge faster.

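To normalize our datasets, we have first calculated the mean and standard deviation of each feature on the train data. We have then subtracted the mean from each feature and divided the result by the standard deviation, for both the train and test sets. We use the train-set statistics for the test set so that no information from the test data leaks into preprocessing.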

mean = tf.math.reduce_mean(X_train, axis=0)
std = tf.math.reduce_std(X_train, axis=0)

X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
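The same standardization can be written in NumPy. In this sketch, standardize is a hypothetical helper and the two-feature toy data is made up. It checks the property the step is meant to produce: every training column ends up with mean 0 and standard deviation 1. NumPy's std() defaults to the population standard deviation (ddof=0), which is also what tf.math.reduce_std computes.

```python
import numpy as np


def standardize(train, test):
    """Scale both sets using statistics computed on the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std (ddof=0), like tf.math.reduce_std
    return (train - mean) / std, (test - mean) / std


rng = np.random.default_rng(0)
# Hypothetical toy data: two features on very different scales.
train = np.column_stack([rng.normal(500, 100, 200), rng.normal(0.01, 0.002, 200)])
test = np.column_stack([rng.normal(500, 100, 50), rng.normal(0.01, 0.002, 50)])

train_s, test_s = standardize(train, test)
print(train_s.mean(axis=0).round(6), train_s.std(axis=0).round(6))
```

The test columns will only be approximately standardized, because they reuse the training statistics.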

Create Neural Network

The simplest way to create a neural network using Sonnet is to use the ready-made networks in its nets module. It provides MLP(), which creates a multi-layer perceptron from a list of layer sizes. We'll use the MLP() constructor to create the neural networks in this tutorial.


  • MLP(output_sizes, w_init=None, b_init=None, with_bias=True, activation=tf.nn.relu, dropout_rate=None, activate_final=False, name=None) - This constructor takes a list of layer sizes and creates a network of linear layers from it.
    • The w_init parameter accepts an initializer used to initialize the weights of the layers.
    • The b_init parameter accepts an initializer used to initialize the biases of the layers.
    • The with_bias parameter takes a boolean. If it is True (the default), each layer includes a bias; otherwise it does not.
    • The activation parameter accepts the activation function applied after each layer. The default is ReLU (tf.nn.relu).
    • The dropout_rate parameter accepts a float in the range 0-1 giving the dropout rate applied to the hidden layer outputs. When it is set, the network must be called with an is_training argument.
    • The activate_final parameter accepts a boolean specifying whether to apply the activation (and dropout) to the output of the final layer. It defaults to False, so the last layer is linear.
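To make these parameters concrete, the NumPy sketch below shows the computation an MLP performs: a linear layer, then the activation, for every layer except the last (unless activate_final is set). It is not Sonnet's source code. Dropout is omitted, and the scaled-normal weights with zero biases are an illustrative choice, not a claim about Sonnet's default initializers. The helper names init_mlp and mlp_forward are hypothetical.

```python
import numpy as np


def init_mlp(input_size, output_sizes, seed=0):
    """Create one (W, b) pair per linear layer (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    params, fan_in = [], input_size
    for size in output_sizes:
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, size))
        b = np.zeros(size)
        params.append((W, b))
        fan_in = size  # this layer's output feeds the next layer
    return params


def mlp_forward(params, x, activation=lambda z: np.maximum(z, 0.0), activate_final=False):
    """Apply each linear layer, activating all but the last unless activate_final=True."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1 or activate_final:
            x = activation(x)
    return x


params = init_mlp(13, [5, 10, 15, 1])  # 13 Boston features, as in this tutorial
n_params = sum(W.size + b.size for W, b in params)
out = mlp_forward(params, np.random.default_rng(1).uniform(size=(404, 13)))
print(n_params, out.shape)  # 311 (404, 1)
```

Each of Sonnet's linear layers holds a weight matrix and a bias vector. Summing the element counts of regressor.trainable_variables should therefore also give 311 for this architecture on 13 input features.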

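Below we have created a regressor for our task by calling the MLP() constructor with layer sizes [5,10,15,1]. This gives three hidden layers of 5, 10, and 15 units and a single output unit for the predicted price. In the next cell, we have performed a forward pass on it with random data for verification purposes. Sonnet modules create their variables lazily on the first call, so this forward pass is also what initializes the network's weights.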

regressor = snt.nets.MLP(output_sizes=[5,10,15,1])

print(regressor)
MLP(output_sizes=[5, 10, 15, 1])
preds = regressor(tf.random.uniform(X_train.shape))

preds[:5]
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[-0.23220932],
       [-0.50559634],
       [-0.19719885],
       [-0.09522709],
       [-0.3821597 ]], dtype=float32)>

Train Neural Network

In this section, we train the neural network we created earlier. We have first declared the number of epochs (1000) and the learning rate (0.001). We have then initialized a stochastic gradient descent (SGD) optimizer with that learning rate, which we'll use to update the weights. We have also initialized the MeanSquaredError() loss function for our regression task. We evaluate this loss at every epoch and calculate the gradients of the loss with respect to the weights. These gradients are then used to update the weights.

We then run the training loop for the given number of epochs. We perform the forward pass and the loss calculation inside the tf.GradientTape() context manager. The tape records these operations so that gradients can be computed from them; without it, no gradients would be available. Each iteration works in four steps:

  1. Make predictions on the training data.
  2. Calculate the loss.
  3. Use the GradientTape object to calculate the gradients of the loss with respect to the weights.
  4. Update the weights with the optimizer.

We print the loss every 100 epochs to check progress. The loss falls from about 598 in the first epoch to about 8 by epoch 900, which indicates that the model is learning.

epochs = 1000
learning_rate = 0.001

optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
mse_loss = tf.losses.MeanSquaredError()

for i in range(epochs):
    with tf.GradientTape() as tape: ## Record operations for automatic differentiation
        preds = regressor(X_train) ## Make Predictions
        loss = mse_loss(Y_train, preds) ## Calculate Loss

    params = regressor.trainable_variables ## Retrieve Model Parameters
    grads = tape.gradient(loss, params) ## Calculate Gradients

    optimizer.apply(grads, params) ## Update Weights

    if i % 100 == 0: ## Print MSE every 100 epochs
        print("MSE : {:.2f}".format(loss))
MSE : 598.35
MSE : 16.49
MSE : 12.21
MSE : 10.44
MSE : 9.56
MSE : 9.02
MSE : 8.70
MSE : 8.48
MSE : 8.32
MSE : 8.14
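The four steps of the loop (forward pass, loss, gradients, update) can be traced by hand for a model small enough to differentiate manually. This NumPy sketch swaps in a linear model on made-up data, with hand-derived MSE gradients standing in for what tape.gradient() computes and a plain SGD step standing in for optimizer.apply(). The learning rate of 0.1 suits this toy data and is not the tutorial's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical regression data: y = X @ true_w + small noise
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w, b = np.zeros(3), 0.0
learning_rate = 0.1
history = []
for epoch in range(200):
    preds = X @ w + b                   # forward pass
    err = preds - y
    loss = np.mean(err ** 2)            # mean squared error
    grad_w = 2.0 * X.T @ err / len(y)   # dLoss/dw, derived by hand
    grad_b = 2.0 * err.mean()           # dLoss/db
    w -= learning_rate * grad_w         # plain SGD update
    b -= learning_rate * grad_b
    history.append(loss)

print(round(history[0], 2), round(history[-1], 4))
```

In the Sonnet version, the tape records the forward pass so that the gradient formulas never have to be written out.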

Make Predictions

In this section, we have made predictions on the train and test datasets. To make predictions, we simply call the MLP instance with the data.

train_preds = regressor(X_train)

train_preds[:5]
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[47.467865],
       [11.79853 ],
       [22.5399  ],
       [26.812387],
       [15.326295]], dtype=float32)>
test_preds = regressor(X_test)

test_preds[:5]
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[14.199584],
       [26.667833],
       [44.501595],
       [21.763641],
       [29.870522]], dtype=float32)>

Evaluate Model Performance

In this section, we are evaluating the performance of our model. We have first calculated the mean squared error on both train and test predictions.

Then in the next cell, we have calculated the R^2 score on both train and test predictions. The R^2 score is a common regression metric with a best possible value of 1. A model that always predicts the mean of the targets scores 0, and worse models can score below 0, so values near 1 indicate a good fit. Note that scikit-learn's r2_score() expects the true values first and the predictions second. The score is not symmetric in its arguments, so the order matters.

If you want to learn about the R^2 score and other metrics available from scikit-learn, please feel free to check our tutorial that covers the majority of them.

print("Test  MSE Score : {:.2f}".format(mse_loss(Y_test, test_preds)))
print("Train MSE Score : {:.2f}".format(mse_loss(Y_train, train_preds)))
Test  MSE Score : 21.67
Train MSE Score : 8.57
from sklearn.metrics import r2_score

## r2_score expects (y_true, y_pred)
print("Test  R^2 Score : {:.2f}".format(r2_score(Y_test, tf.squeeze(test_preds))))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train, tf.squeeze(train_preds))))
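The R^2 definition can be written out directly. This NumPy sketch uses a hypothetical helper named r2; it is not scikit-learn's code. It computes 1 - SS_res / SS_tot with SS_tot taken around the mean of the true values, which is the formula scikit-learn's r2_score() uses for 1-D targets. The toy values show that swapping the arguments changes the score, that predicting the mean scores 0, and that a poor model can score below 0.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot, with SS_tot around mean(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y_true = [1, 2, 3, 4]
y_pred = [1, 2, 3, 5]
print(r2(y_true, y_pred))      # 0.8
print(r2(y_pred, y_true))      # about 0.8857: swapping the arguments changes the score
print(r2(y_true, [2.5] * 4))   # 0.0: always predicting the mean
print(r2(y_true, [4, 3, 2, 1]))  # -3.0: worse than predicting the mean
```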

Train Network on Batches of Data

Our previous training used the whole dataset at once. Since our dataset is tiny and easily fits into main memory, we could train on all of the training data at a time. In real life, however, the training data sometimes does not fit into the main memory of the computer. In those situations, we bring only a few samples of the training data into main memory at a time and train on them. This way, we cover the whole training set batch by batch.

To explain how we can perform training on data in batches, we'll treat our data as if it does not fit into the main memory. We'll then perform training on batches of data.

First, we have created a neural network with the same layer sizes as earlier using the MLP() constructor. We have then initialized the number of epochs (500), the batch size (32), and the learning rate (0.001). After that, we have initialized the optimizer with the learning rate, along with the mean squared error loss function. We then run the training loop for the given number of epochs.

During each epoch, we divide the data into batches based on the batch size. We train the neural network on one batch at a time and update the weights based on the loss calculated on that batch. Apart from the batching logic, the code is almost the same as our previous training code.

regressor = snt.nets.MLP(output_sizes=[5,10,15,1])

preds = regressor(tf.random.uniform(X_train[:5].shape))
epochs = 500
batch_size = 32
learning_rate = 0.001

optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
mse_loss = tf.losses.MeanSquaredError()

for i in range(epochs):
    losses = [] ## Record loss of each batch
    for start in range(0, X_train.shape[0], batch_size): ## Start index of each batch
        end = start + batch_size ## Slicing past the end returns the remaining samples
        X_batch, Y_batch = X_train[start:end], Y_train[start:end] ## Single batch of data

        with tf.GradientTape() as tape:
            preds = regressor(X_batch) ## Make Predictions
            loss = mse_loss(Y_batch, preds) ## Calculate Loss

        params = regressor.trainable_variables ## Retrieve Model Parameters
        grads = tape.gradient(loss, params) ## Calculate Gradients
        optimizer.apply(grads, params) ## Update Weights

        losses.append(loss) ## Record Loss

    if i % 100 == 0: ## Print MSE every 100 epochs
        print("MSE : {:.2f}".format(tf.reduce_mean(tf.convert_to_tensor(losses))))
MSE : 575.26
MSE : 8.76
MSE : 7.83
MSE : 7.34
MSE : 6.80
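Batch boundaries can be generated with range(0, n, batch_size), which never yields an empty batch. By contrast, a `shape[0]//batch_size + 1` batch count produces an empty final batch whenever the number of samples is an exact multiple of the batch size. The generator below is a hypothetical helper written for illustration.

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, end) slice bounds that cover range(n_samples) in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


sizes = [end - start for start, end in batch_bounds(404, 32)]
print(len(sizes), sizes[-1])  # 13 20  (12 full batches of 32, then 20 samples)
print([end - start for start, end in batch_bounds(96, 32)])  # [32, 32, 32]: no empty batch
```

Many training setups also shuffle the sample order at the start of each epoch. The loop in this tutorial does not.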

Make Predictions in Batches

Since, in our scenario, the whole dataset does not fit into main memory, we'll also make predictions on batches of data. We'll then combine the predictions of all batches to form predictions for the whole input.

Below we have created a function that takes a neural network, input data, and a batch size as input. It loops through the input data one batch at a time, makes predictions on each batch, and records them. We then combine all the predictions.

def MakePredictions(model, input_data, batch_size=32):
    batches = tf.range((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        preds.append(model(X_batch))

    return preds
train_preds = MakePredictions(regressor, X_train, 32)
train_preds = tf.squeeze(tf.concat(train_preds, axis=0))

test_preds = MakePredictions(regressor, X_test, 32)
test_preds = tf.squeeze(tf.concat(test_preds, axis=0))
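Batched prediction gives the same result as one big call, as long as the model treats each row independently. That holds for an MLP, but not, for example, for batch normalization in training mode. This NumPy sketch checks the claim with a hypothetical predict_in_batches helper and a made-up stand-in model.

```python
import numpy as np


def predict_in_batches(predict_fn, inputs, batch_size=32):
    """Apply predict_fn chunk by chunk and stitch the outputs back together."""
    chunks = [predict_fn(inputs[start:start + batch_size])
              for start in range(0, len(inputs), batch_size)]
    return np.concatenate(chunks, axis=0)


rng = np.random.default_rng(0)
W = rng.normal(size=(13, 1))     # hypothetical stand-in for trained weights
model = lambda x: np.tanh(x @ W)  # row-wise model: each output depends on one row
X = rng.normal(size=(102, 13))   # same size as the Boston test set

batched = predict_in_batches(model, X, batch_size=32)
print(batched.shape, np.allclose(batched, model(X)))  # (102, 1) True
```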

Evaluate Model Performance

In this section, we have evaluated the performance of our model by calculating the MSE and R^2 scores on the train and test predictions.

print("Test  MSE Score : {:.2f}".format(mse_loss(Y_test, test_preds)))
print("Train MSE Score : {:.2f}".format(mse_loss(Y_train, train_preds)))
Test  MSE Score : 20.91
Train MSE Score : 6.17
from sklearn.metrics import r2_score

## r2_score expects (y_true, y_pred)
print("Test  R^2 Score : {:.2f}".format(r2_score(Y_test, test_preds)))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train, train_preds)))

2. Classification

In this section, we'll explain how we can create simple neural networks using Sonnet to solve classification tasks. We'll be using a small dataset available from scikit-learn for explanation purposes. We'll be reusing the majority of our code from the regression section. Due to this, we won't include a detailed description of repeated code parts.

Load Data

In this section, we have loaded the breast cancer dataset available from scikit-learn. Each target value is either 0, indicating a malignant tumor, or 1, indicating a benign tumor (the dataset's target_names are ['malignant', 'benign']). The features are various measurements of the tumors. We have then divided the dataset into train (80%) and test (20%) sets, stratified by the target so that both sets keep the same class proportions.

from sklearn import datasets
from sklearn.model_selection import train_test_split

X, Y = datasets.load_breast_cancer(return_X_y=True)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)

X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(X_test, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_test, dtype=tf.float32)

samples, features = X_train.shape
classes = tf.unique(Y)

X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
(TensorShape([455, 30]),
 TensorShape([114, 30]),
 TensorShape([455]),
 TensorShape([114]))
samples, features, classes.y
(455, 30, <tf.Tensor: shape=(2,), dtype=int64, numpy=array([0, 1])>)

Normalize Data

In this section, we have normalized our train and test datasets using the mean and standard deviation calculated on the features of the train dataset.

mean = tf.math.reduce_mean(X_train, axis=0)
std = tf.math.reduce_std(X_train, axis=0)

X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

Create Neural Network

In this section, we have created a neural network using the MLP() constructor available from the nets module of Sonnet, with layer sizes [5,10,15,1]. The single output unit produces one raw score per sample, which we'll convert to a probability.

classifier = snt.nets.MLP(output_sizes=[5,10,15,1])

print(classifier)
MLP(output_sizes=[5, 10, 15, 1])
preds = classifier(tf.random.uniform(X_train.shape))

preds[:5]
<tf.Tensor: shape=(5, 1), dtype=float32, numpy=
array([[-0.04777364],
       [-0.03383067],
       [-0.02051447],
       [-0.03855529],
       [-0.02435572]], dtype=float32)>

Train Neural Network

In this section, we have included logic to train our classification neural network. The code for this section is almost the same as that from the regression section with a few minor changes.

We have set the number of epochs to 1000 and the learning rate to 0.001. Since this is a binary classification task, we use binary cross-entropy as our loss function.

We apply a sigmoid function to the output of the neural network to convert it to a probability in the range [0,1]. MLP() applies the same activation function to every layer, and by default (activate_final=False) it leaves the last layer linear. We therefore apply the sigmoid to the network's output ourselves. Since the output after the sigmoid is a probability in the range 0-1, we'll later use a threshold to convert probabilities to predicted classes.

epochs = 1000
learning_rate = 0.001

optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
binary_crossentropy_loss = tf.losses.BinaryCrossentropy()

for i in range(epochs):
    with tf.GradientTape() as tape: ## Record operations for automatic differentiation
        preds = classifier(X_train) ## Make Predictions
        preds = tf.squeeze(preds) ## Shape (samples, 1) -> (samples,)
        preds = tf.sigmoid(preds) ## Convert outputs to probabilities
        loss = binary_crossentropy_loss(Y_train, preds) ## Calculate Loss

    params = classifier.trainable_variables ## Retrieve Model Parameters
    grads = tape.gradient(loss, params) ## Calculate Gradients

    optimizer.apply(grads, params) ## Update Weights

    if i % 100 == 0: ## Print CrossEntropy every 100 epochs
        print("Binary Cross Entropy : {:.2f}".format(loss))
Binary Cross Entropy : 0.69
Binary Cross Entropy : 0.68
Binary Cross Entropy : 0.68
Binary Cross Entropy : 0.67
Binary Cross Entropy : 0.66
Binary Cross Entropy : 0.66
Binary Cross Entropy : 0.65
Binary Cross Entropy : 0.65
Binary Cross Entropy : 0.64
Binary Cross Entropy : 0.64
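Applying a sigmoid and then computing cross-entropy on probabilities is mathematically the same as computing cross-entropy directly from the raw outputs (logits). The logit form avoids the clipping that probability-based losses rely on. This NumPy sketch (hypothetical helper names; eps=1e-7 chosen to mirror Keras's default fuzz factor) checks that the two agree and shows where clipping distorts the result. In TensorFlow, the corresponding option is tf.losses.BinaryCrossentropy(from_logits=True), which would accept the MLP output without the explicit tf.sigmoid() call. This is an alternative worth considering, not what the tutorial uses.

```python
import numpy as np


def bce_from_probs(y, p, eps=1e-7):
    """Mean binary cross-entropy on probabilities clipped to [eps, 1 - eps]."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def bce_from_logits(y, z):
    """Numerically stable form: max(z, 0) - z * y + log(1 + exp(-|z|))."""
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))


y = np.array([0.0, 1.0, 1.0, 0.0])
z = np.array([-2.0, 0.5, 3.0, 1.0])  # raw network outputs (logits)
p = 1 / (1 + np.exp(-z))             # sigmoid
print(np.isclose(bce_from_probs(y, p), bce_from_logits(y, z)))  # True

# A confidently wrong prediction: the logit form reports the true loss,
# while clipping caps the probability-based loss at about 16.12.
print(bce_from_logits(np.array([1.0]), np.array([-50.0])))              # 50.0
print(bce_from_probs(np.array([1.0]), 1 / (1 + np.exp(np.array([50.0])))))
```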

Make Predictions

In this section, we are making predictions on the train and test sets. After making predictions, we apply a sigmoid function to the output of the neural network. We then use a threshold of 0.5, classifying probabilities above it as class 1 (benign tumor) and the rest as class 0 (malignant tumor).

test_preds = classifier(X_test)
test_preds_probs = tf.sigmoid(tf.squeeze(test_preds))
test_preds_classes = tf.cast((test_preds_probs > 0.5), dtype=tf.float32)

train_preds = classifier(X_train)
train_preds_probs = tf.sigmoid(tf.squeeze(train_preds))
train_preds_classes = tf.cast((train_preds_probs > 0.5), dtype=tf.float32)
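Because the sigmoid is increasing and sigmoid(0) = 0.5, thresholding the probability at 0.5 picks the same classes as thresholding the raw output at 0. This small NumPy check uses made-up logits to confirm it.

```python
import numpy as np

logits = np.array([-3.0, -0.1, 0.0, 0.2, 4.0])  # hypothetical raw network outputs
probs = 1 / (1 + np.exp(-logits))              # sigmoid
classes_from_probs = (probs > 0.5).astype(np.float32)
classes_from_logits = (logits > 0).astype(np.float32)

print(classes_from_probs)                                       # [0. 0. 0. 1. 1.]
print(np.array_equal(classes_from_probs, classes_from_logits))  # True
```

With scikit-learn's breast cancer labels, a predicted 1 means benign and a predicted 0 means malignant.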

Evaluate Model Performance

In this section, we have evaluated the performance of our classification model by calculating the log loss (binary cross-entropy) and the accuracy on the train and test sets.

## Keras losses expect (y_true, y_pred)
print("Test  NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_test, test_preds_probs)))
print("Train NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_train, train_preds_probs)))
from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds_classes)))
print("Test  Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds_classes)))
Train Accuracy : 0.73
Test  Accuracy : 0.77
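Keras loss objects such as tf.losses.BinaryCrossentropy are called as loss(y_true, y_pred), and cross-entropy is not symmetric in its two arguments. If hard 0/1 labels are passed as the prediction, they are clipped to roughly 1e-7 or 1 - 1e-7. Each log term then becomes large, inflating the reported loss. This NumPy sketch uses a hypothetical bce helper, which clips only its second argument with eps=1e-7, and made-up probabilities to show the effect.

```python
import numpy as np


def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; only the second (prediction) argument is clipped."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))


labels = np.array([1.0, 0.0, 1.0, 1.0])
probs = np.array([0.6, 0.4, 0.55, 0.7])  # hypothetical, weakly confident model

print(round(bce(labels, probs), 3))  # correct order: targets first
print(round(bce(probs, labels), 3))  # swapped: labels clipped to 1e-7 / 1 - 1e-7
```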

Train Network on Batches of Data

In this section, we have included logic to train our classification model on batches of data. The logic is almost the same as in the regression section. The only changes are the loss function and the sigmoid applied to the output of the neural network.

classifier = snt.nets.MLP(output_sizes=[5,10,15,1])

preds = classifier(tf.random.uniform(X_train.shape))
epochs = 500
batch_size = 32
learning_rate = 0.001

optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
binary_crossentropy_loss = tf.losses.BinaryCrossentropy()

for i in range(epochs):
    losses = [] ## Record loss of each batch
    for start in range(0, X_train.shape[0], batch_size): ## Start index of each batch
        end = start + batch_size ## Slicing past the end returns the remaining samples
        X_batch, Y_batch = X_train[start:end], Y_train[start:end] ## Single batch of data

        with tf.GradientTape() as tape:
            preds = classifier(X_batch) ## Make Predictions on the batch
            preds = tf.squeeze(preds, axis=-1) ## Shape (batch, 1) -> (batch,)
            preds = tf.sigmoid(preds) ## Convert outputs to probabilities
            loss = binary_crossentropy_loss(Y_batch, preds) ## Calculate Loss on the batch

        params = classifier.trainable_variables ## Retrieve Model Parameters
        grads = tape.gradient(loss, params) ## Calculate Gradients
        optimizer.apply(grads, params) ## Update Weights

        losses.append(loss) ## Record Loss

    if i % 100 == 0: ## Print CrossEntropy every 100 epochs
        print("Binary Cross Entropy : {:.2f}".format(tf.math.reduce_mean(tf.convert_to_tensor(losses))))
Binary Cross Entropy : 0.11
Binary Cross Entropy : 0.10
Binary Cross Entropy : 0.09
Binary Cross Entropy : 0.08
Binary Cross Entropy : 0.07
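The key detail in mini-batch training is that both the forward pass and the loss use only the current batch (X_batch and its labels), never the full training set. This NumPy sketch swaps in logistic regression on made-up, linearly separable data to show the pattern. It uses the fact that the gradient of binary cross-entropy with respect to the logit is p - y. The learning rate, epoch count, and data are illustrative assumptions, not the tutorial's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-class data with 4 features, separable by a linear boundary.
X = rng.normal(size=(455, 4))
y = (X @ np.array([1.5, -2.0, 0.5, 1.0]) > 0).astype(float)

w, b = np.zeros(4), 0.0
learning_rate, batch_size = 0.1, 32


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


for epoch in range(50):
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start + batch_size]
        y_batch = y[start:start + batch_size]
        p = sigmoid(X_batch @ w + b)            # forward pass on this batch only
        grad_z = (p - y_batch) / len(y_batch)   # dBCE/dlogit, averaged over the batch
        w -= learning_rate * (X_batch.T @ grad_z)
        b -= learning_rate * grad_z.sum()

accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(round(accuracy, 3))
```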

Make Predictions in Batches

In this section, we have made predictions on train and test sets by giving data to the model in batches. We have then combined predictions of batches.

def MakePredictions(model, input_data, batch_size=32):
    batches = tf.range((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        batch_preds = model(X_batch)

        preds.append(tf.sigmoid(tf.squeeze(batch_preds, axis=-1))) ## axis=-1 keeps a one-sample batch 1-D

    return preds
train_preds = MakePredictions(classifier, X_train, 32)
train_preds_probs = tf.squeeze(tf.concat(train_preds, axis=0))
train_preds_classes = tf.cast((train_preds_probs > 0.5), dtype=tf.float32)

test_preds = MakePredictions(classifier, X_test, 32)
test_preds_probs = tf.squeeze(tf.concat(test_preds, axis=0))
test_preds_classes = tf.cast((test_preds_probs > 0.5), dtype=tf.float32)
test_preds_classes[:5], Y_test[:5]
(<tf.Tensor: shape=(5,), dtype=float32, numpy=array([0., 0., 1., 1., 1.], dtype=float32)>,
 <tf.Tensor: shape=(5,), dtype=float32, numpy=array([0., 0., 1., 1., 1.], dtype=float32)>)
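Calling tf.squeeze() without an axis removes every size-1 dimension. A final batch holding a single sample would therefore collapse from shape (1, 1) to a scalar, and scalars cannot be concatenated with the other batches. The NumPy sketch below mirrors that behavior with np.squeeze() and shows that squeezing only the last axis avoids it.

```python
import numpy as np

one_sample_batch = np.array([[0.7]])  # model output for a batch of size 1: shape (1, 1)
print(np.squeeze(one_sample_batch).shape)           # (): a scalar
print(np.squeeze(one_sample_batch, axis=-1).shape)  # (1,)

batches = [np.zeros((32, 1)), np.array([[0.7]])]
merged = np.concatenate([np.squeeze(b, axis=-1) for b in batches])
print(merged.shape)  # (33,)
```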

Evaluate Model Performance

In this section, we have evaluated the performance of our model by calculating the log loss and the accuracy of the train and test predictions.

## Keras losses expect (y_true, y_pred)
print("Test  NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_test, test_preds_probs)))
print("Train NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_train, train_preds_probs)))
from sklearn.metrics import accuracy_score

print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds_classes)))
print("Test  Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds_classes)))
Train Accuracy : 0.98
Test  Accuracy : 0.98
Sunny Solanki
