**Sonnet** is a deep learning library built on top of **TensorFlow** by DeepMind. It lets us construct deep neural networks with little code and provides implementations of many commonly used neural network architectures, such as multi-layer perceptrons and ResNets. **Sonnet** is a high-level library like Keras, but it does not provide a training framework, so users need to write the training loop themselves.

As a part of this tutorial, we'll explain how to get started with **Sonnet**. We'll be creating simple neural networks using the **MLP()** network constructor provided by **Sonnet**. We'll be using small toy datasets available from scikit-learn for explanation purposes.

Below we have highlighted important sections of the tutorial to give an overview of the material that we have covered.

- Regression
  - Load Dataset
  - Normalize Data
  - Create Neural Network
  - Train Neural Network
  - Make Predictions
  - Evaluate Model Performance
  - Train Network on Batches of Data
  - Make Predictions in Batches
  - Evaluate Model Performance
- Classification

**pip install dm-sonnet**

Below we have imported **Sonnet** and printed the version that we'll be using in our tutorial. We have also imported and printed the version of **tensorflow** as **Sonnet** is built on top of it.

In [1]:

```
import sonnet as snt
print("Sonnet Version : {}".format(snt.__version__))
```

In [2]:

```
import tensorflow as tf
print("Tensorflow Version : {}".format(tf.__version__))
```

In this section, we'll explain how we can create a simple neural network using **Sonnet** to solve regression tasks. We'll be using Boston housing data for our regression task.

In this section, we have loaded the Boston housing dataset available from scikit-learn. We have loaded the features into the variable named **X** and the target values into the variable named **Y**. The target is a continuous value giving the median house price in thousands of dollars. We have then divided the dataset into train (80%) and test (20%) sets and converted the numpy arrays to TensorFlow tensors, as neural networks created through **Sonnet** require tensors as input.

In [216]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split

## Note: load_boston() was removed in scikit-learn 1.2, so this cell needs an older release.
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)

## Convert numpy arrays to float32 tensors
X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(X_test, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_test, dtype=tf.float32)

samples, features = X_train.shape
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

In [217]:

```
samples, features
```

In this section, we have normalized our dataset. Normalization brings the features of the dataset onto the same scale. If the features are on different scales and vary a lot in value, optimization algorithms like gradient descent can have a hard time converging. Normalizing the data helps them converge faster.

To normalize our datasets, we have first calculated the mean and standard deviation of each feature of the train data. We have then subtracted the mean and divided the result by the standard deviation for both train and test sets. The test set is scaled with the statistics of the train set so that no information from the test data leaks into training.

In [219]:

```
mean = tf.math.reduce_mean(X_train, axis=0)
std = tf.math.reduce_std(X_train, axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
```
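The same standardization can be sketched in plain NumPy to check its effect. This is only an illustration on synthetic data: the helper name `standardize` and the `eps` guard for constant features (whose standard deviation is 0) are our assumptions and are not part of the cell above.

```python
import numpy as np

def standardize(train, test, eps=1e-8):
    """Scale both splits with statistics computed on the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std, like tf.math.reduce_std
    # Assumed safeguard: a constant feature has std == 0 and would divide by zero.
    std = np.where(std < eps, 1.0, std)
    return (train - mean) / std, (test - mean) / std

rng = np.random.default_rng(0)
train = rng.normal(loc=[10.0, 200.0, 5.0], scale=[1.0, 50.0, 1.0], size=(100, 3))
train[:, 2] = 5.0  # make the third feature constant on purpose
test = rng.normal(loc=[10.0, 200.0, 5.0], scale=[1.0, 50.0, 1.0], size=(20, 3))

train_scaled, test_scaled = standardize(train, test)
print(train_scaled.mean(axis=0).round(3))  # per-feature mean after scaling
print(train_scaled.std(axis=0).round(3))   # per-feature std after scaling
```

After scaling, the non-constant train features have mean 0 and standard deviation 1, while the constant feature simply becomes 0 instead of producing NaN values.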

The simplest way to create a neural network using **Sonnet** is to use the ready-made networks from the **nets** module. It provides us with **MLP()**, which lets us create multi-layer perceptrons by providing layer sizes as input. We'll be using the **MLP()** constructor to create neural networks for our tasks in this tutorial.

**MLP(output_sizes, w_init=None, b_init=None, with_bias=True, activation=tf.nn.relu, dropout_rate=None, activate_final=False, name=None)** - This constructor takes layer sizes as a list and creates a neural network of linear layers from it.

- The **w_init** parameter accepts a function that can initialize the weights of layers.
- The **b_init** parameter accepts a function that can initialize the biases of layers.
- The **with_bias** parameter takes a boolean value. If we set it to **True**, a bias is included in each layer, otherwise not. By default, it is **True**.
- The **activation** parameter accepts the activation function that is applied between layers. By default, it's ReLU.
- The **dropout_rate** parameter accepts a float in the range 0-1 specifying the dropout rate for each layer.
- The **activate_final** parameter accepts a boolean value specifying whether to apply the activation to the output of the final layer.
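To make the description above concrete, here is a minimal NumPy sketch of the computation such a network performs: a chain of linear layers with ReLU in between and, by default, no activation on the last layer. The function `mlp_forward`, the random initialization, and the choice of 13 input features are our assumptions for illustration; this is not Sonnet's implementation.

```python
import numpy as np

def mlp_forward(x, weights, biases, activate_final=False):
    """Apply linear layers with ReLU between them, optionally also after the last one."""
    last = len(weights) - 1
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b                   # linear layer
        if i < last or activate_final:
            x = np.maximum(x, 0.0)      # ReLU
    return x

rng = np.random.default_rng(0)
features = 13                            # assumed number of input features
output_sizes = [5, 10, 15, 1]            # same layer sizes as used in this tutorial
in_sizes = [features] + output_sizes[:-1]

# Arbitrary initialization, chosen only to have some numbers to multiply.
weights = [rng.normal(scale=0.1, size=(i, o)) for i, o in zip(in_sizes, output_sizes)]
biases = [np.zeros(o) for o in output_sizes]

out = mlp_forward(rng.uniform(size=(4, features)), weights, biases)
print(out.shape)  # (4, 1)
```

Each entry of `output_sizes` becomes the output width of one linear layer, so the last entry decides the shape of the prediction for each sample.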

Below we have created a regressor for our task by calling the **MLP()** constructor with layer sizes **[5,10,15,1]**. In the next cell, we have performed a forward pass on it with random data for verification purposes. **Sonnet** creates the variables of a module on its first call, so this forward pass also initializes the weights.

In [221]:

```
regressor = snt.nets.MLP(output_sizes=[5,10,15,1])
print(regressor)
```

In [222]:

```
preds = regressor(tf.random.uniform(X_train.shape))
preds[:5]
```

In this section, we train the neural network that we created earlier. We have first declared the number of epochs (**1000**) and the learning rate (**0.001**). We have then initialized our gradient descent optimizer with the learning rate that we'll be using for updating weights. We have then initialized the **MeanSquaredError()** loss function, which is the loss for our regression task. We evaluate it at every epoch and calculate the gradients of the loss with respect to the weights, which we then use to update the weights.

We have then executed the training loop once per epoch. Inside the loop, the forward pass and the loss calculation are wrapped in the **tf.GradientTape()** context manager, which records the operations needed to compute gradients. Without it, gradients can't be calculated. During each iteration, we first perform a forward pass through the training data to make predictions and calculate the loss, then we use the **GradientTape** object to calculate the gradients of the loss with respect to the weights, and at last, we update the weights using the optimizer. We are printing the loss value every 100 epochs to check progress. We can notice from the loss value getting printed every 100 epochs that our model seems to be doing a good job.

In [223]:

```
epochs = 1000
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
mse_loss = tf.losses.MeanSquaredError()

for i in range(epochs):
    with tf.GradientTape() as tape:
        preds = regressor(X_train) ## Make Predictions
        loss = mse_loss(Y_train, preds) ## Calculate Loss

    params = regressor.trainable_variables ## Retrieve Model Parameters
    grads = tape.gradient(loss, params) ## Calculate Gradients
    optimizer.apply(grads, params) ## Update Weights

    if i % 100 == 0: ## Print MSE every 100 epochs
        print("MSE : {:.2f}".format(loss))
```
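To see what the forward pass, loss, gradient, and update steps amount to, here is a small NumPy sketch of gradient descent on the mean squared error of a linear model. The gradients are written out by hand instead of being recorded by **tf.GradientTape()**; the synthetic data, the learning rate of 0.1, and all variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 3.0                      # noiseless synthetic targets

w = np.zeros(3)
b = 0.0
learning_rate = 0.1
losses = []

for epoch in range(200):
    preds = X @ w + b                     # forward pass
    error = preds - y
    loss = np.mean(error ** 2)            # mean squared error
    grad_w = 2.0 * X.T @ error / len(y)   # d(loss)/d(w)
    grad_b = 2.0 * error.mean()           # d(loss)/d(b)
    w -= learning_rate * grad_w           # gradient descent update
    b -= learning_rate * grad_b
    losses.append(loss)

print("first loss: {:.4f}, last loss: {:.8f}".format(losses[0], losses[-1]))
```

The structure is the same as in the training loop above: only the way gradients are obtained differs, since automatic differentiation replaces the two hand-derived gradient lines.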

In this section, we have made predictions on train and test datasets. We simply need to call the instance of **MLP** by giving data to it to make predictions.

In [224]:

```
train_preds = regressor(X_train)
train_preds[:5]
```

In [148]:

```
test_preds = regressor(X_test)
test_preds[:5]
```

In this section, we are evaluating the performance of our model. We have first calculated the mean squared error on both train and test predictions.

Then in the next cell, we have calculated the **R^2 score** on both train and test predictions. The **R^2 score** is calculated for regression tasks. It is at most 1, equals 0 for a model that always predicts the mean of the targets, and becomes negative for a model that does worse than that. Values near 1 indicate a good model. We can notice from our results that the **R^2 score** is near 1 for both train and test predictions.

If you want to learn about metrics like **r^2 score** and other metrics available from scikit-learn then please feel free to check our tutorial which covers the majority of metrics.

In [149]:

```
print("Test MSE Score : {:.2f}".format(mse_loss(Y_test, test_preds)))
print("Train MSE Score : {:.2f}".format(mse_loss(Y_train, train_preds)))
```

In [150]:

```
from sklearn.metrics import r2_score
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test, tf.squeeze(test_preds)))) ## r2_score expects (y_true, y_pred)
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train, tf.squeeze(train_preds))))
```
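The **R^2 score** can also be computed by hand, which makes two of its properties visible: it can be negative, and it is not symmetric in its arguments (scikit-learn's `r2_score` expects the true values first and the predictions second). The helper `r2` below is a minimal sketch of the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares, on made-up numbers.

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
good = np.array([1.1, 1.9, 3.2, 3.8])      # close to the targets
bad = np.array([4.0, 3.0, 2.0, 1.0])       # reversed targets
constant = np.full(4, y_true.mean())       # always predicts the mean

print(round(r2(y_true, good), 4))      # 0.98
print(round(r2(y_true, bad), 4))       # -3.0
print(round(r2(y_true, constant), 4))  # 0.0
print(round(r2(good, y_true), 4))      # 0.9778, swapping the arguments changes the score
```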

Our previous training of a neural network used the whole dataset at every step. As our dataset is tiny and easily fits into main memory, we can train on the whole training data at once. But in real life, there are situations where the whole training data does not fit into the main memory of the computer. In those situations, we bring only a few samples of the training data into main memory at a time and train on them. We cover the whole training data by loading it batch by batch.

To explain how we can perform training on data in batches, we'll treat our data as if it does not fit into the main memory. We'll then perform training on batches of data.

First, we have created a neural network with the same layer sizes as earlier using the **MLP()** constructor. We have then initialized the number of epochs (**500**), batch size (**32**), and learning rate (**0.001**). We have then initialized the optimizer by providing the learning rate. We have also initialized our mean squared error loss function. We then execute our training loop once per epoch.

During each epoch, we divide the data into batches based on the batch size. We train the neural network on one batch of data at a time and update the weights based on the loss calculated on that batch. Other than the batching logic, the code is almost the same as our previous training code.

In [151]:

```
regressor = snt.nets.MLP(output_sizes=[5,10,15,1])
preds = regressor(tf.random.uniform(X_train[:5].shape))
```

In [153]:

```
epochs = 500
batch_size = 32
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
mse_loss = tf.losses.MeanSquaredError()

for i in range(epochs):
    losses = [] ## Record loss of each batch
    for start in range(0, X_train.shape[0], batch_size): ### Batch start indices
        end = start + batch_size ## Slicing past the end returns the shorter last batch
        X_batch, Y_batch = X_train[start:end], Y_train[start:end] ## Single batch of data

        with tf.GradientTape() as tape:
            preds = regressor(X_batch) ## Make Predictions
            loss = mse_loss(Y_batch, preds) ## Calculate Loss

        params = regressor.trainable_variables ## Retrieve Model Parameters
        grads = tape.gradient(loss, params) ## Calculate Gradients
        optimizer.apply(grads, params) ## Update Weights
        losses.append(loss) ## Record Loss

    if i % 100 == 0: ## Print MSE every 100 epochs
        print("MSE : {:.2f}".format(tf.reduce_mean(tf.convert_to_tensor(losses))))
```
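One detail worth checking in any batching scheme is what happens when the number of samples is an exact multiple of the batch size, because an empty final batch would produce a NaN loss. The sketch below uses a hypothetical helper, `batch_slices`, written in plain Python to list the index ranges of the batches; it is an illustration and not part of the training cell.

```python
def batch_slices(num_samples, batch_size):
    """Yield (start, end) index pairs that cover all samples without empty batches."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)

print(list(batch_slices(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
print(list(batch_slices(8, 4)))   # [(0, 4), (4, 8)]
```

The last batch is shorter when the sample count is not a multiple of the batch size, and no empty batch appears when it is.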

As we are assuming that the whole data does not fit into main memory, we'll make predictions on batches of data as well. We'll later combine the predictions of all batches to form the predictions for the whole input data.

Below we have created a function that takes a neural network, input data, and batch size as input. It loops through the input data one batch at a time, makes predictions on each batch, and records them. We combine all predictions afterwards.

In [156]:

```
def MakePredictions(model, input_data, batch_size=32):
    preds = []
    for start in range(0, input_data.shape[0], batch_size): ### Batch start indices
        X_batch = input_data[start:start+batch_size] ## Single batch of data
        preds.append(model(X_batch)) ## Record predictions of the batch
    return preds
```

In [157]:

```
train_preds = MakePredictions(regressor, X_train, 32)
train_preds = tf.squeeze(tf.concat(train_preds, axis=0))
test_preds = MakePredictions(regressor, X_test, 32)
test_preds = tf.squeeze(tf.concat(test_preds, axis=0))
```
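A quick way to gain confidence in batched prediction is to check that joining the per-batch outputs gives the same result as one call on the full data. The sketch below does this in NumPy with a stand-in linear function instead of a trained network; `predict_in_batches` and `toy_model` are hypothetical names used only for this illustration.

```python
import numpy as np

def predict_in_batches(model, data, batch_size=32):
    """Run `model` on consecutive slices of `data` and join the results."""
    outputs = [model(data[start:start + batch_size])
               for start in range(0, len(data), batch_size)]
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
w = rng.normal(size=(13, 1))

def toy_model(x):
    """Stand-in for a trained network: one linear layer without bias."""
    return x @ w

data = rng.normal(size=(101, 13))          # 101 samples: three full batches plus a short one
batched = predict_in_batches(toy_model, data, batch_size=32)
full = toy_model(data)
print(batched.shape, np.allclose(batched, full))
```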

In this section, we have evaluated the performance of our model by calculating the mean squared error and the **R^2 score** on train and test predictions.

In [158]:

```
print("Test MSE Score : {:.2f}".format(mse_loss(Y_test, test_preds)))
print("Train MSE Score : {:.2f}".format(mse_loss(Y_train, train_preds)))
```

In [159]:

```
from sklearn.metrics import r2_score
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test, test_preds))) ## r2_score expects (y_true, y_pred)
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train, train_preds)))
```

In this section, we'll explain how we can create simple neural networks using **Sonnet** to solve classification tasks. We'll be using a small dataset available from scikit-learn for explanation purposes. We'll be reusing the majority of our code from the regression section. Due to this, we won't include a detailed description of repeated code parts.

In this section, we have loaded the breast cancer dataset available from scikit-learn. The target values of the dataset are either **0**, indicating a malignant tumor, or **1**, indicating a benign tumor. The features are various measurements of tumors. We have then divided the dataset into train (80%) and test (20%) sets.

In [4]:

```
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
tf.convert_to_tensor(X_test, dtype=tf.float32),\
tf.convert_to_tensor(Y_train, dtype=tf.float32),\
tf.convert_to_tensor(Y_test, dtype=tf.float32)
samples, features = X_train.shape
classes = tf.unique(Y)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```

In [239]:

```
samples, features, classes.y
```

In this section, we have normalized our train and test datasets using mean and standard deviation calculated on features of train datasets.

In [240]:

```
mean = tf.math.reduce_mean(X_train, axis=0)
std = tf.math.reduce_std(X_train, axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
```

In this section, we have created a neural network using **MLP()** constructor available from **nets** module of **Sonnet**. We have designed the neural network with the layer sizes **[5,10,15,1]**.

In [241]:

```
classifier = snt.nets.MLP(output_sizes=[5,10,15,1])
print(classifier)
```

In [242]:

```
preds = classifier(tf.random.uniform(X_train.shape))
preds[:5]
```

In this section, we have included logic to train our classification neural network. The code for this section is almost the same as that from the regression section with a few minor changes.

We have set the number of epochs to **1000** and the learning rate to **0.001**. We are using binary cross-entropy as our loss function for this binary classification task.

We are applying a sigmoid function to the output of the neural network to convert it to a probability in the range **[0,1]**. The **MLP()** constructor does not let us specify a separate activation function for the last layer (**activate_final** only applies the same activation that is used between the other layers), hence we apply the sigmoid function to the output of the neural network ourselves. As the output after applying the sigmoid function is a probability in the range 0-1, we'll later apply a threshold to convert the probability to a predicted class.

In [246]:

```
epochs = 1000
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
binary_crossentropy_loss = tf.losses.BinaryCrossentropy()

for i in range(epochs):
    with tf.GradientTape() as tape:
        preds = classifier(X_train) ## Make Predictions
        preds = tf.squeeze(preds)
        preds = tf.sigmoid(preds)
        loss = binary_crossentropy_loss(Y_train, preds) ## Calculate Loss

    params = classifier.trainable_variables ## Retrieve Model Parameters
    grads = tape.gradient(loss, params) ## Calculate Gradients
    optimizer.apply(grads, params) ## Update Weights

    if i % 100 == 0: ## Print CrossEntropy every 100 epochs
        print("Binary Cross Entropy : {:.2f}".format(loss))
```
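The relationship between the raw network output (the logit), the sigmoid, and binary cross-entropy can be sketched in NumPy. Below, `bce_from_probs` follows the textbook definition on probabilities, while `bce_from_logits` uses an algebraically equivalent form that works on logits directly and avoids taking the logarithm of values near 0. Both function names and the sample numbers are assumptions for illustration; Keras offers the logit-based variant through `tf.losses.BinaryCrossentropy(from_logits=True)`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_from_probs(y_true, probs, eps=1e-7):
    """Textbook binary cross-entropy on probabilities, clipped to avoid log(0)."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(probs) + (1.0 - y_true) * np.log(1.0 - probs))

def bce_from_logits(y_true, logits):
    """Same loss computed from logits: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return np.mean(np.maximum(logits, 0.0) - logits * y_true
                   + np.log1p(np.exp(-np.abs(logits))))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([2.0, -1.0, 0.5, 0.3])

loss_probs = bce_from_probs(y_true, sigmoid(logits))
loss_logits = bce_from_logits(y_true, logits)
print(np.isclose(loss_probs, loss_logits))  # True
```

For moderate logits the two versions agree; the logit-based form is mainly useful when the network becomes very confident and the probabilities saturate at 0 or 1.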

In this section, we are making predictions on train and test sets. After making predictions, we apply a sigmoid function to the output of the neural network. We have then set the threshold at 0.5, classifying probabilities at or below it as class 0 (malignant tumor) and probabilities above it as class 1 (benign tumor).

In [250]:

```
test_preds = classifier(X_test)
test_preds_probs = tf.sigmoid(tf.squeeze(test_preds))
test_preds_classes = tf.cast((test_preds_probs > 0.5), dtype=tf.float32)
train_preds = classifier(X_train)
train_preds_probs = tf.sigmoid(tf.squeeze(train_preds))
train_preds_classes = tf.cast((train_preds_probs > 0.5), dtype=tf.float32)
```
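The thresholding step can be illustrated on a few made-up values. Because the sigmoid maps 0 to exactly 0.5 and is increasing, comparing the probability with 0.5 gives the same classes as comparing the raw network output with 0. The numbers and variable names below are assumptions for this NumPy sketch.

```python
import numpy as np

logits = np.array([-2.0, -0.1, 0.0, 0.4, 3.0])   # made-up raw network outputs
probs = 1.0 / (1.0 + np.exp(-logits))            # sigmoid

classes_from_probs = (probs > 0.5).astype(np.float32)
classes_from_logits = (logits > 0.0).astype(np.float32)

labels = np.array([0.0, 0.0, 1.0, 1.0, 1.0])     # made-up true classes
accuracy = np.mean(classes_from_probs == labels)

print(classes_from_probs)   # [0. 0. 0. 1. 1.]
print(accuracy)             # 0.8
```

Accuracy is simply the fraction of samples whose thresholded prediction matches the label, which is what `accuracy_score` from scikit-learn reports.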

In this section, we have evaluated the performance of our classification model by calculating the log loss and the accuracy on train and test sets.

In [251]:

```
print("Test NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_test, test_preds_probs))) ## Loss expects (y_true, y_pred)
print("Train NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_train, train_preds_probs)))
```

In [252]:

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds_classes)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds_classes)))
```

In this section, we have included logic to train our classification model on batches of data. The logic for training data in batches is almost the same as that from the regression section with minor changes that are loss function and application of a sigmoid function to the output of the neural network.

In [7]:

```
classifier = snt.nets.MLP(output_sizes=[5,10,15,1])
preds = classifier(tf.random.uniform(X_train.shape))
```

In [198]:

```
epochs = 500
batch_size = 32
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
binary_crossentropy_loss = tf.losses.BinaryCrossentropy()

for i in range(epochs):
    losses = [] ## Record loss of each batch
    for start in range(0, X_train.shape[0], batch_size): ### Batch start indices
        end = start + batch_size ## Slicing past the end returns the shorter last batch
        X_batch, Y_batch = X_train[start:end], Y_train[start:end] ## Single batch of data

        with tf.GradientTape() as tape:
            preds = classifier(X_batch) ## Make Predictions on the batch, not on the whole train set
            preds = tf.squeeze(preds, axis=-1)
            preds = tf.sigmoid(preds)
            loss = binary_crossentropy_loss(Y_batch, preds) ## Calculate Loss

        params = classifier.trainable_variables ## Retrieve Model Parameters
        grads = tape.gradient(loss, params) ## Calculate Gradients
        optimizer.apply(grads, params) ## Update Weights
        losses.append(loss) ## Record Loss

    if i % 100 == 0: ## Print CrossEntropy every 100 epochs
        print("Binary Cross Entropy : {:.2f}".format(tf.math.reduce_mean(tf.convert_to_tensor(losses))))
```

In this section, we have made predictions on train and test sets by giving data to the model in batches. We have then combined predictions of batches.

In [199]:

```
def MakePredictions(model, input_data, batch_size=32):
    preds = []
    for start in range(0, input_data.shape[0], batch_size): ### Batch start indices
        X_batch = input_data[start:start+batch_size] ## Single batch of data
        batch_preds = model(X_batch)
        preds.append(tf.sigmoid(tf.squeeze(batch_preds, axis=-1))) ## Keep the batch axis even for a batch of one
    return preds
```

In [200]:

```
train_preds = MakePredictions(classifier, X_train, 32)
train_preds_probs = tf.squeeze(tf.concat(train_preds, axis=0))
train_preds_classes = tf.cast((train_preds_probs > 0.5), dtype=tf.float32)
test_preds = MakePredictions(classifier, X_test, 32)
test_preds_probs = tf.squeeze(tf.concat(test_preds, axis=0))
test_preds_classes = tf.cast((test_preds_probs > 0.5), dtype=tf.float32)
```

In [201]:

```
test_preds_classes[:5], Y_test[:5]
```

In this section, we have evaluated the performance of our model by calculating the log loss and the accuracy of train and test predictions.

In [202]:

```
print("Test NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_test, test_preds_probs))) ## Loss expects (y_true, y_pred)
print("Train NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_train, train_preds_probs)))
```

In [203]:

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds_classes)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds_classes)))
```

This ends our small tutorial explaining how we can create simple neural networks using **Sonnet**. Please feel free to let us know your views in the comments section.


Sunny Solanki