Sonnet is a deep learning library built on top of TensorFlow by DeepMind. It lets us construct deep neural networks easily and provides implementations of many commonly used architectures, such as multi-layer perceptrons and ResNets. Sonnet is a high-level library like Keras, but it does not provide a training framework, so users need to write their own training loops.
As a part of this tutorial, we'll explain how to get started with Sonnet. We'll create simple neural networks using the MLP() network constructor provided by Sonnet, and we'll use small toy datasets available from scikit-learn for explanation purposes.
Below we have imported Sonnet and printed the version that we'll be using in our tutorial. We have also imported and printed the version of tensorflow as Sonnet is built on top of it.
import sonnet as snt
print("Sonnet Version : {}".format(snt.__version__))
import tensorflow as tf
print("Tensorflow Version : {}".format(tf.__version__))
In this section, we'll explain how we can create a simple neural network using Sonnet to solve regression tasks. We'll be using Boston housing data for our regression task.
In this section, we have loaded the Boston housing dataset available from scikit-learn. We have loaded data features in the variable named X and target values in the variable named Y. The target is a continuous value specifying the median house price in thousands of dollars. We have then divided the dataset into train (80%) and test (20%) sets. Finally, we have converted the numpy arrays to TensorFlow tensors, as neural networks created through Sonnet require TensorFlow tensors as input. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this code needs an older scikit-learn release.
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_boston(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=123)
X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
tf.convert_to_tensor(X_test, dtype=tf.float32),\
tf.convert_to_tensor(Y_train, dtype=tf.float32),\
tf.convert_to_tensor(Y_test, dtype=tf.float32)
samples, features = X_train.shape
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
samples, features
In this section, we have normalized our dataset. Data normalization brings the features of the dataset onto the same scale. If the features are on different scales and vary a lot in value, optimization algorithms like gradient descent can have a hard time converging. Normalization can help them converge faster.
To normalize our datasets, we have first calculated the mean and standard deviation of each feature of the train data. We have then subtracted the mean from each feature and divided the result by the standard deviation, for both train and test sets. The test set is scaled with the train statistics so that no information from the test data is used during preparation.
mean = tf.math.reduce_mean(X_train, axis=0)
std = tf.math.reduce_std(X_train, axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
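The normalization step can be illustrated on a single feature in plain Python. This is a minimal sketch, not the tutorial's code: the helper names and the sample values are hypothetical, and it uses the population standard deviation, which is what tf.math.reduce_std computes.

```python
import statistics

def fit_standardizer(train_column):
    """Return (mean, std) of one feature column, using the population std."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)
    return mean, std

def standardize(column, mean, std):
    """Shift and scale a column with statistics computed elsewhere."""
    return [(value - mean) / std for value in column]

# Hypothetical values for a single feature.
train_feature = [2.0, 4.0, 6.0, 8.0]
test_feature = [5.0, 10.0]

mean, std = fit_standardizer(train_feature)
train_scaled = standardize(train_feature, mean, std)
test_scaled = standardize(test_feature, mean, std)  # reuses the train statistics

print(train_scaled)
print(test_scaled)
```

After scaling, the train feature has mean 0 and standard deviation 1. The test feature generally does not, because it is scaled with the train statistics rather than its own.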
The simplest way to create a neural network using Sonnet is to use one of the readily available networks from the nets module. It provides MLP(), which lets us create multi-layer perceptrons by providing layer sizes as input. We'll be using the MLP() constructor to create neural networks for our tasks in this tutorial.
Below we have created a regressor for our task by calling the MLP() constructor. We have asked it to create a neural network with layer sizes [5,10,15,1]. Then in the next cell, we have performed a forward pass on it with random data for verification purposes. Sonnet modules create their variables on the first call, so this forward pass also initializes the weights.
regressor = snt.nets.MLP(output_sizes=[5,10,15,1])
print(regressor)
preds = regressor(tf.random.uniform(X_train.shape))
preds[:5]
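To get a feel for the size of this network, the number of trainable values can be counted by hand. The sketch below is an illustration under stated assumptions: every layer is fully connected with a bias vector, the input has 13 features (as in the Boston housing data), and the helper name mlp_parameter_count is hypothetical.

```python
def mlp_parameter_count(input_features, output_sizes, with_bias=True):
    """Count weights (and biases) of a stack of fully connected layers."""
    total = 0
    fan_in = input_features
    for fan_out in output_sizes:
        total += fan_in * fan_out   # weight matrix of this layer
        if with_bias:
            total += fan_out        # one bias per output unit
        fan_in = fan_out
    return total

# 13 input features is an assumption matching the Boston housing data.
print(mlp_parameter_count(13, [5, 10, 15, 1]))  # 311
```

The four layers contribute 70, 60, 165 and 16 values respectively, which is tiny compared to most real-world networks.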
In this section, we train the neural network that we created earlier. We first declare the number of epochs (1000) and the learning rate (0.001). We then initialize the gradient descent optimizer with the learning rate that will be used for updating weights, and the MeanSquaredError() loss function, which is the loss for our regression task. We evaluate the loss at every epoch and calculate the gradients of the loss with respect to the weights, which we then use to update the weights.
We then execute the training loop for the given number of epochs. In each iteration, the forward pass and the loss calculation are wrapped inside the tf.GradientTape() context manager, which records the operations needed to compute gradients; without it, gradients can't be calculated. During each iteration, we first perform a forward pass through the training data to make predictions, then calculate the loss and gradients, and finally update the weights using the optimizer. We use the GradientTape object to calculate the gradients of the loss with respect to the weights. We print the loss value every 100 epochs to check progress; a steadily decreasing loss indicates that the model is learning.
epochs = 1000
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
mse_loss = tf.losses.MeanSquaredError()
for i in range(epochs):
    with tf.GradientTape() as tape:
        preds = regressor(X_train) ## Make Predictions
        loss = mse_loss(Y_train, preds) ## Calculate Loss
    params = regressor.trainable_variables ## Retrieve Model Parameters
    grads = tape.gradient(loss, params) ## Calculate Gradients
    optimizer.apply(grads, params) ## Update Weights
    if i % 100 == 0: ## Print MSE every 100 epochs
        print("MSE : {:.2f}".format(loss))
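The four steps of each iteration (forward pass, loss, gradients, update) can be seen in isolation on a one-weight model. The sketch below is a plain Python illustration, not Sonnet code: the gradient is derived by hand instead of being recorded by tf.GradientTape, and the data values are hypothetical.

```python
# Toy data following y = 3 * x, so the ideal weight is 3.0 (hypothetical values).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0               # single trainable weight
learning_rate = 0.01
losses = []

for epoch in range(200):
    preds = [w * x for x in xs]                                   # forward pass
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)                   # mean squared error
    grad = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)   # d(loss)/dw
    w -= learning_rate * grad                                     # gradient descent update
    losses.append(loss)

print("first loss: {:.2f}, last loss: {:.6f}, w: {:.3f}".format(losses[0], losses[-1], w))
```

The loss shrinks at every step and the weight approaches 3.0, which is the same behavior we look for when monitoring the printed MSE of the real network.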
In this section, we have made predictions on train and test datasets. We simply need to call the instance of MLP by giving data to it to make predictions.
train_preds = regressor(X_train)
train_preds[:5]
test_preds = regressor(X_test)
test_preds[:5]
In this section, we are evaluating the performance of our model. We have first calculated the mean squared error on both train and test predictions.
Then in the next cell, we have calculated the R^2 score on both train and test predictions. The R^2 score is used for regression tasks and is at most 1. It is 0 for a model that always predicts the mean of the targets and can be negative for models that do worse than that. Values near 1 indicate a good model. In our run, the R^2 score was near 1 for both train and test predictions.
If you want to learn about metrics like r^2 score and other metrics available from scikit-learn then please feel free to check our tutorial which covers the majority of metrics.
print("Test MSE Score : {:.2f}".format(mse_loss(Y_test, test_preds)))
print("Train MSE Score : {:.2f}".format(mse_loss(Y_train, train_preds)))
from sklearn.metrics import r2_score
## r2_score expects arguments in the order (y_true, y_pred)
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.numpy(), tf.squeeze(test_preds).numpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.numpy(), tf.squeeze(train_preds).numpy())))
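To make the R^2 score less of a black box, here is a minimal plain Python sketch of the usual definition, 1 - SS_res / SS_tot. The function name r2 and all values are hypothetical; scikit-learn's r2_score handles more cases (sample weights, multiple outputs) than this sketch does.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]          # hypothetical targets
perfect = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]       # always predicts the target mean
reversed_preds = [4.0, 3.0, 2.0, 1.0]  # worse than predicting the mean
rough = [2.0, 2.0, 3.0, 3.0]

print(r2(y_true, perfect))         # 1.0
print(r2(y_true, mean_only))       # 0.0
print(r2(y_true, reversed_preds))  # -3.0
print(r2(y_true, rough), r2(rough, y_true))  # differ: the score is not symmetric
```

Because SS_tot is computed from the first argument only, swapping true values and predictions changes the result, so the argument order matters.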
Our previous training of a neural network considered the whole dataset. As our dataset is tiny and easily fits into the main memory, we can perform training by taking whole training data at a time. But in real life, there are situations where whole train data does not fit into the main memory of the computer. In those situations, we only bring a few samples of train data into the main memory and perform training on them. We cover the whole training data by bringing data into batches in the main memory.
To explain how we can perform training on data in batches, we'll treat our data as if it does not fit into the main memory. We'll then perform training on batches of data.
First, we have created a neural network with the same layer sizes as earlier using the MLP() constructor. We have then initialized the number of epochs (500), batch size (32), and learning rate (0.001). We have then initialized the optimizer by providing the learning rate, as well as our mean squared error loss function. We then execute our training loop for the given number of epochs.
During each epoch, we are dividing data into batches based on batch size. We are training a neural network on the batch of data at a time and updating weights based on the loss calculated on the batch of data. Other than including the logic of the batch, the rest of the code is almost the same as our previous training code.
regressor = snt.nets.MLP(output_sizes=[5,10,15,1])
preds = regressor(tf.random.uniform(X_train[:5].shape))
import numpy as np
epochs = 500
batch_size = 32
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
mse_loss = tf.losses.MeanSquaredError()
for i in range(epochs):
    losses = [] ## Record loss of each batch
    for start in range(0, X_train.shape[0], batch_size): ## Batch start indices
        end = start + batch_size ## Slicing past the end yields a shorter last batch
        X_batch, Y_batch = X_train[start:end], Y_train[start:end] ## Single batch of data
        with tf.GradientTape() as tape:
            preds = regressor(X_batch) ## Make Predictions
            loss = mse_loss(Y_batch, preds) ## Calculate Loss
        params = regressor.trainable_variables ## Retrieve Model Parameters
        grads = tape.gradient(loss, params) ## Calculate Gradients
        optimizer.apply(grads, params) ## Update Weights
        losses.append(loss) ## Record Loss
    if i % 100 == 0: ## Print MSE every 100 epochs
        print("MSE : {:.2f}".format(tf.reduce_mean(tf.convert_to_tensor(losses))))
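One detail worth checking in any batching scheme is the final batch: it is usually shorter than the others, and it should never be empty, since a loss averaged over zero samples is undefined. The helper below is a hypothetical sketch that lists the index ranges a dataset would be split into.

```python
def batch_bounds(num_samples, batch_size):
    """Return (start, end) index pairs that cover num_samples without empty batches."""
    return [(start, min(start + batch_size, num_samples))
            for start in range(0, num_samples, batch_size)]

print(batch_bounds(10, 4))  # [(0, 4), (4, 8), (8, 10)]
print(batch_bounds(8, 4))   # [(0, 4), (4, 8)]
```

When the number of samples is an exact multiple of the batch size, there is no leftover batch at all, which is the case that is easy to get wrong when the number of batches is computed by hand.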
As we can not fit whole data into main memory for making predictions, we'll make predictions also on batches of data. We'll later combine predictions of all batches to form predictions of whole input data.
Below we have created a function that takes a neural network, input data, and batch size as input. It loops through the input data one batch at a time, makes predictions on each batch, and records them. We then combine the predictions of all batches.
def MakePredictions(model, input_data, batch_size=32):
    preds = []
    for start in range(0, input_data.shape[0], batch_size): ## Batch start indices
        X_batch = input_data[start:start+batch_size] ## Last batch may be shorter
        preds.append(model(X_batch))
    return preds
train_preds = MakePredictions(regressor, X_train, 32)
train_preds = tf.squeeze(tf.concat(train_preds, axis=0))
test_preds = MakePredictions(regressor, X_test, 32)
test_preds = tf.squeeze(tf.concat(test_preds, axis=0))
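Predicting in batches and then concatenating should give exactly the same result, in the same order, as predicting on the whole input at once. The sketch below checks this on plain Python lists; predict_in_batches and double_all are hypothetical stand-ins for the prediction function and the model.

```python
def predict_in_batches(predict_fn, inputs, batch_size):
    """Apply predict_fn to consecutive slices of inputs and join the results in order."""
    outputs = []
    for start in range(0, len(inputs), batch_size):
        outputs.extend(predict_fn(inputs[start:start + batch_size]))
    return outputs

def double_all(batch):
    """Stand-in for a model: doubles every input value."""
    return [2 * value for value in batch]

data = list(range(7))
batched = predict_in_batches(double_all, data, batch_size=3)
print(batched)  # [0, 2, 4, 6, 8, 10, 12]
```

This equivalence holds for any model whose prediction for one sample does not depend on the other samples in the batch, which is the case for a plain multi-layer perceptron.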
In this section, we have evaluated the performance of our model by calculating the mean squared error and R^2 score on train and test predictions.
print("Test MSE Score : {:.2f}".format(mse_loss(Y_test, test_preds)))
print("Train MSE Score : {:.2f}".format(mse_loss(Y_train, train_preds)))
from sklearn.metrics import r2_score
## r2_score expects arguments in the order (y_true, y_pred)
print("Test R^2 Score : {:.2f}".format(r2_score(Y_test.numpy(), test_preds.numpy())))
print("Train R^2 Score : {:.2f}".format(r2_score(Y_train.numpy(), train_preds.numpy())))
In this section, we'll explain how we can create simple neural networks using Sonnet to solve classification tasks. We'll be using a small dataset available from scikit-learn for explanation purposes. We'll be reusing the majority of our code from the regression section. Due to this, we won't include a detailed description of repeated code parts.
In this section, we have loaded the breast cancer dataset available from scikit-learn. The target values of the dataset are either 0, indicating a malignant tumor, or 1, indicating a benign tumor. The features are various measurements of tumors. We have then divided the dataset into train (80%) and test (20%) sets.
from sklearn import datasets
from sklearn.model_selection import train_test_split
X, Y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, stratify=Y, random_state=123)
X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
tf.convert_to_tensor(X_test, dtype=tf.float32),\
tf.convert_to_tensor(Y_train, dtype=tf.float32),\
tf.convert_to_tensor(Y_test, dtype=tf.float32)
samples, features = X_train.shape
classes = tf.unique(Y)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
samples, features, classes.y
In this section, we have normalized our train and test datasets using mean and standard deviation calculated on features of train datasets.
mean = tf.math.reduce_mean(X_train, axis=0)
std = tf.math.reduce_std(X_train, axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std
In this section, we have created a neural network using MLP() constructor available from nets module of Sonnet. We have designed the neural network with the layer sizes [5,10,15,1].
classifier = snt.nets.MLP(output_sizes=[5,10,15,1])
print(classifier)
preds = classifier(tf.random.uniform(X_train.shape))
preds[:5]
In this section, we have included logic to train our classification neural network. The code for this section is almost the same as that from the regression section with a few minor changes.
We have initialized the number of epochs to 1000 and the learning rate to 0.001. We are using binary cross-entropy as our loss function for the binary classification task.
We apply a sigmoid function to the output of the neural network to convert it to a probability in the range [0,1]. The MLP() constructor does not let us specify a separate activation function for the last layer, hence we need to apply the sigmoid function to the output ourselves. As the output after applying the sigmoid function is a probability in the range 0-1, we'll later apply a threshold to convert the probability to a predicted class.
epochs = 1000
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
binary_crossentropy_loss = tf.losses.BinaryCrossentropy()
for i in range(epochs):
    with tf.GradientTape() as tape:
        preds = classifier(X_train) ## Make Predictions
        preds = tf.squeeze(preds)
        preds = tf.sigmoid(preds)
        loss = binary_crossentropy_loss(Y_train, preds) ## Calculate Loss
    params = classifier.trainable_variables ## Retrieve Model Parameters
    grads = tape.gradient(loss, params) ## Calculate Gradients
    optimizer.apply(grads, params) ## Update Weights
    if i % 100 == 0: ## Print CrossEntropy every 100 epochs
        print("Binary Cross Entropy : {:.2f}".format(loss))
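The two ingredients of this loss, the sigmoid and the binary cross-entropy, are short enough to write out by hand. The following is a plain Python sketch for illustration only: the function names and sample values are hypothetical, and the clipping constant is an assumption rather than the value TensorFlow uses internally.

```python
import math

def sigmoid(logit):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def binary_cross_entropy(y_true, probs, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)

logits = [-2.0, 0.0, 3.0]   # hypothetical raw network outputs
labels = [0.0, 1.0, 1.0]    # hypothetical true classes
probs = [sigmoid(z) for z in logits]

print(probs)
print(binary_cross_entropy(labels, probs))
```

The loss is small when confident predictions agree with the labels and grows quickly when they disagree. Since the formula treats labels and probabilities differently, the true labels must be passed as the first argument.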
In this section, we are making predictions on train and test sets. After making predictions, we apply a sigmoid function to the output of the neural network. We have then set the threshold at 0.5, classifying values greater than it as class 1 (benign tumor) and all other values as class 0 (malignant tumor).
test_preds = classifier(X_test)
test_preds_probs = tf.sigmoid(tf.squeeze(test_preds))
test_preds_classes = tf.cast((test_preds_probs > 0.5), dtype=tf.float32)
train_preds = classifier(X_train)
train_preds_probs = tf.sigmoid(tf.squeeze(train_preds))
train_preds_classes = tf.cast((train_preds_probs > 0.5), dtype=tf.float32)
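The thresholding step is a simple comparison, shown here on a plain Python list. The helper name to_classes and the sample probabilities are hypothetical; the strict greater-than comparison mirrors the one used in the code above.

```python
def to_classes(probs, threshold=0.5):
    """Label a probability 1 when it is strictly above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]

print(to_classes([0.2, 0.5, 0.7]))       # [0, 0, 1]
print(to_classes([0.2, 0.5, 0.7], 0.1))  # [1, 1, 1]
```

A probability exactly equal to the threshold falls into class 0 with this comparison. Lowering the threshold moves more samples into class 1, which trades one kind of error for the other.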
In this section, we have evaluated the performance of our classification model by calculating the log loss and the accuracy on train and test sets.
## The loss function expects arguments in the order (y_true, y_pred)
print("Test NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_test, test_preds_probs)))
print("Train NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_train, train_preds_probs)))
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds_classes)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds_classes)))
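Accuracy is the fraction of samples whose predicted class matches the true class. A minimal plain Python sketch, with a hypothetical function name and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted class equals the true class."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

y_true = [1, 0, 1, 1, 0]   # hypothetical labels
y_pred = [1, 0, 0, 1, 1]   # hypothetical predictions
print(accuracy(y_true, y_pred))  # 0.6
```

Unlike the cross-entropy loss, accuracy only looks at the thresholded classes, so it does not change when a probability moves from 0.6 to 0.9.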
In this section, we have included logic to train our classification model on batches of data. The logic for training data in batches is almost the same as that from the regression section with minor changes that are loss function and application of a sigmoid function to the output of the neural network.
classifier = snt.nets.MLP(output_sizes=[5,10,15,1])
preds = classifier(tf.random.uniform(X_train.shape))
import numpy as np
epochs = 500
batch_size = 32
learning_rate = 0.001
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
binary_crossentropy_loss = tf.losses.BinaryCrossentropy()
for i in range(epochs):
    losses = [] ## Record loss of each batch
    for start in range(0, X_train.shape[0], batch_size): ## Batch start indices
        end = start + batch_size ## Slicing past the end yields a shorter last batch
        X_batch, Y_batch = X_train[start:end], Y_train[start:end] ## Single batch of data
        with tf.GradientTape() as tape:
            preds = classifier(X_batch) ## Make Predictions on the batch only
            preds = tf.squeeze(preds, axis=-1)
            preds = tf.sigmoid(preds)
            loss = binary_crossentropy_loss(Y_batch, preds) ## Calculate Loss
        params = classifier.trainable_variables ## Retrieve Model Parameters
        grads = tape.gradient(loss, params) ## Calculate Gradients
        optimizer.apply(grads, params) ## Update Weights
        losses.append(loss) ## Record Loss
    if i % 100 == 0: ## Print CrossEntropy every 100 epochs
        print("Binary Cross Entropy : {:.2f}".format(tf.math.reduce_mean(tf.convert_to_tensor(losses))))
In this section, we have made predictions on train and test sets by giving data to the model in batches. We have then combined predictions of batches.
def MakePredictions(model, input_data, batch_size=32):
    preds = []
    for start in range(0, input_data.shape[0], batch_size): ## Batch start indices
        X_batch = input_data[start:start+batch_size] ## Last batch may be shorter
        batch_preds = model(X_batch)
        preds.append(tf.sigmoid(tf.squeeze(batch_preds, axis=-1))) ## Squeeze only the output dimension
    return preds
train_preds = MakePredictions(classifier, X_train, 32)
train_preds_probs = tf.squeeze(tf.concat(train_preds, axis=0))
train_preds_classes = tf.cast((train_preds_probs > 0.5), dtype=tf.float32)
test_preds = MakePredictions(classifier, X_test, 32)
test_preds_probs = tf.squeeze(tf.concat(test_preds, axis=0))
test_preds_classes = tf.cast((test_preds_probs > 0.5), dtype=tf.float32)
test_preds_classes[:5], Y_test[:5]
In this section, we have evaluated the performance of our model by calculating the log loss and the accuracy of train and test predictions.
## The loss function expects arguments in the order (y_true, y_pred)
print("Test NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_test, test_preds_probs)))
print("Train NegLogLoss Score : {:.2f}".format(binary_crossentropy_loss(Y_train, train_preds_probs)))
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.2f}".format(accuracy_score(Y_train, train_preds_classes)))
print("Test Accuracy : {:.2f}".format(accuracy_score(Y_test, test_preds_classes)))
This ends our small tutorial explaining how we can create simple neural networks using Sonnet. Please feel free to let us know your views in the comments section.