Sonnet is a deep learning library built on top of TensorFlow by Google DeepMind to simplify the development of deep neural networks. Sonnet lets us design neural networks either like Keras (with its Sequential API) or like PyTorch (by extending the snt.Module class). It simplifies and speeds up neural network design and lets developers and researchers run more experiment cycles. We have already covered a simple tutorial explaining how to create fully connected networks using Sonnet. If you have not read it yet, it provides useful background for this tutorial.
As a part of this tutorial, we'll explain how to create convolutional neural networks (CNNs) using Sonnet. We'll cover different ways of building CNNs and train them with different optimizers. This tutorial won't go into the details of neural network components like layers, activation functions, and optimizers. We expect the reader to have some background in these to follow along. The tutorial is designed so that readers can start using CNNs with Sonnet for their own tasks. If you want to know the theory behind CNNs and their pros and cons, please feel free to check our blog on CNNs.
Below, we have imported the Sonnet and TensorFlow libraries and printed the versions of both that we'll be using in this tutorial.
import sonnet as snt
print("Sonnet Version : {}".format(snt.__version__))
import tensorflow as tf
print("Tensorflow Version : {}".format(tf.__version__))
In this section, we'll explain how to create a simple convolutional neural network with 2 convolution layers to solve a multi-class classification task. We'll use the Fashion MNIST dataset available from Keras, which has images of 10 different fashion items.
In this section, we have loaded the Fashion MNIST dataset available from Keras. It has grayscale images of shape (28,28) for 10 different fashion items, already divided into train (60k images) and test (10k images) sets. After loading the datasets, we have converted them to TensorFlow tensors, as Sonnet networks require tensors as input. We have then reshaped the images, adding one extra dimension at the end to transform them from shape (28,28) to (28,28,1). This is needed because convolution layers operate on the channels of input images: color (RGB) images have 3 channels (Red, Green, and Blue), whereas a grayscale image is a single intensity map and therefore has only one channel. The extra trailing dimension makes that single channel explicit. Finally, we have divided the images by the float value 255 to bring the pixel values from their original range [0,255] into the range [0,1]. This scaling helps the optimization algorithm converge faster during training.
from tensorflow import keras
## Load Fashion MNIST: uint8 images of shape (28,28) and integer labels 0-9
(X_train, Y_train), (X_test, Y_test) = keras.datasets.fashion_mnist.load_data()
## Convert NumPy arrays to float32 TensorFlow tensors
X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(X_test, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_test, dtype=tf.float32)
## Add a trailing channel dimension: (n,28,28) -> (n,28,28,1)
X_train, X_test = tf.reshape(X_train, (-1,28,28,1)), tf.reshape(X_test, (-1,28,28,1))
## Scale pixel values from [0,255] to [0,1]
X_train, X_test = X_train/255.0, X_test/255.0
classes, _ = tf.unique(Y_train) ## tf.unique returns (unique values, their indices)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
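To make the reshape and scaling steps concrete, here is a minimal pure-Python sketch of what they do to the data. It assumes tiny made-up 2x2 images (not Fashion MNIST) stored as nested lists instead of tensors, and `to_nhwc_unit_range` is a hypothetical helper name, not part of TensorFlow or Sonnet.

```python
def to_nhwc_unit_range(images):
    """Add a trailing channel axis and scale 0-255 pixel values into [0, 1].

    `images` is a list of H x W grayscale images (nested lists of ints).
    This mirrors tf.reshape(X, (-1, H, W, 1)) followed by X / 255.0.
    """
    return [
        [[[pixel / 255.0] for pixel in row] for row in image]
        for image in images
    ]


# Two tiny 2x2 "images" standing in for 28x28 Fashion MNIST images.
batch = [
    [[0, 255], [51, 102]],
    [[255, 255], [0, 0]],
]
out = to_nhwc_unit_range(batch)
print(len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))  # 2 2 2 1
print(out[0][1])  # [[0.2], [0.4]]
```

Each pixel becomes a one-element list, which is exactly the single channel that the convolution layers expect.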
In this section, we have created a CNN for our multi-class classification task. We have explained two different ways of creating a CNN using Sonnet. We'll explain how both work as well.
In this section, we have created a CNN using the Sequential API of Sonnet. Sonnet has a class named Sequential that accepts a list of layers as input and creates a neural network from them. When called, it applies the layers in the order in which they were given. The list can also contain plain functions such as tf.nn.relu. This way of creating a neural network is almost the same as the Keras Sequential API.
Our CNN consists of 2 convolution layers. The first convolution layer has 32 output channels and a kernel size of (3,3). The second has 16 output channels and a kernel size of (3,3). Both convolution layers have padding set to 'SAME', which pads the input with zeros so that, with the default stride of 1, the height and width of the output are the same as those of the input. We apply the ReLU (Rectified Linear Unit) activation function after each convolution layer. After both convolution layers, we flatten the output with a flatten layer so that it can be given to a linear (dense) layer. This linear layer has 10 output units, one for each of the 10 categories of fashion images in our dataset. At last, we apply the softmax activation function to the output of the linear layer. Softmax maps the 10 output values of each sample to probabilities in the range [0,1] that sum to 1.
Our input data has shape (n_samples,28,28,1). The first convolution layer will transform data from shape (n_samples,28,28,1) to (n_samples,28,28,32). The second convolution layer will transform data from shape (n_samples,28,28,32) to (n_samples,28,28,16). The flatten layer will flatten data from shape (n_samples,28,28,16) to (n_samples,28 x 28 x 16) = (n_samples,12544). Finally, the linear layer will transform it from (n_samples,12544) to (n_samples,10).
After creating a CNN with the Sequential API, we can simply call it on input data to perform a forward pass and make predictions. In the next cell, we have performed a forward pass with a few data samples and printed the output. The model parameters are available through the trainable_variables attribute, and we have printed their shapes as well. Note that Sonnet creates layer parameters lazily on the first call, so trainable_variables is only populated after this forward pass.
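The core idea of a sequential container fits in a few lines. The following is a hypothetical pure-Python stand-in (`TinySequential` is not Sonnet's implementation). It only illustrates that each layer's output becomes the next layer's input, in list order.

```python
class TinySequential:
    """Hypothetical stand-in for snt.Sequential: calls each layer in list order."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)  # output of one layer becomes the input of the next
        return x


model = TinySequential([lambda x: x + 1, lambda x: x * 2])
print(model(3))  # (3 + 1) * 2 = 8
```

Swapping the two functions would give `3 * 2 + 1 = 7`, which is why the order of the layers in the list matters.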
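The softmax step can be written out by hand to see why the outputs behave like probabilities. This is a minimal pure-Python sketch with ten made-up scores. It subtracts the maximum score before exponentiating, which is a standard trick to avoid overflow and does not change the result.

```python
import math


def softmax(logits):
    """Numerically stable softmax: exponentiate shifted logits and normalise."""
    shift = max(logits)  # subtracting the max avoids overflow in math.exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Ten made-up scores, one per Fashion MNIST class.
scores = [2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 1.5, 0.2]
probs = softmax(scores)
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))  # 1.0
```

Every value lies in [0,1], the values sum to 1, and the largest score (index 6) keeps the largest probability.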
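The shape walk-through above, and the parameter shapes printed from trainable_variables, can be checked with simple arithmetic. This sketch assumes stride 1 and 'SAME' padding, as in the layers above. It counts one bias per output unit, and the helper names are hypothetical.

```python
import math


def conv2d_same_output(shape_nhwc, out_channels, stride=1):
    """Output shape of a 2D convolution with 'SAME' padding (NHWC layout)."""
    n, h, w, _ = shape_nhwc
    return (n, math.ceil(h / stride), math.ceil(w / stride), out_channels)


def conv2d_params(kernel_hw, in_channels, out_channels):
    """Weights (kh * kw * in * out) plus one bias per output channel."""
    kh, kw = kernel_hw
    return kh * kw * in_channels * out_channels + out_channels


n = 5  # any batch size, e.g. X_train[:5]
s1 = conv2d_same_output((n, 28, 28, 1), 32)  # (5, 28, 28, 32)
s2 = conv2d_same_output(s1, 16)              # (5, 28, 28, 16)
flat = s2[1] * s2[2] * s2[3]                 # 28 * 28 * 16 = 12544

params = (
    conv2d_params((3, 3), 1, 32)     # w (3,3,1,32) + b (32,)   -> 320
    + conv2d_params((3, 3), 32, 16)  # w (3,3,32,16) + b (16,)  -> 4624
    + flat * 10 + 10                 # Linear(10): w (12544,10) + b (10,) -> 125450
)
print(s1, s2, flat, params)  # (5, 28, 28, 32) (5, 28, 28, 16) 12544 130394
```

Almost all of the roughly 130k parameters sit in the final linear layer, because flattening produces a 12544-dimensional vector.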
cnn = snt.Sequential([
    snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME"),
    tf.nn.relu,
    snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME"),
    tf.nn.relu,
    snt.Flatten(),
    snt.Linear(10),
    tf.nn.softmax,
])
cnn
cnn(X_train[:5])
for tensor in cnn.trainable_variables:
    print("{} : {}".format(tensor.name, tensor.shape))
In this section, we have explained the second way of creating a CNN using Sonnet, which is to extend the snt.Module class. This approach closely resembles the PyTorch approach of creating neural networks. The CNN has the same layers as the one we created with the Sequential API in the previous section.
In order to create a CNN this way, we need to implement two methods. The __init__() method creates the layers and stores them as attributes. The __call__() method defines the forward pass by applying those layers to the input.
After defining the CNN, we have performed a forward pass through it with a few data samples for verification and then printed the shapes of the network parameters.
class CNN(snt.Module):
    def __init__(self, name="CNN"):
        super(CNN, self).__init__(name=name)
        ## Define layers; Sonnet builds their parameters lazily on the first call
        self.conv1 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME")
        self.conv2 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME")
        self.flatten = snt.Flatten()
        self.linear = snt.Linear(10)
    def __call__(self, X_batch):
        ## Forward pass: conv -> relu -> conv -> relu -> flatten -> linear -> softmax
        x = tf.nn.relu(self.conv1(X_batch))
        x = tf.nn.relu(self.conv2(x))
        x = self.flatten(x)
        x = self.linear(x)
        return tf.nn.softmax(x)
cnn = CNN()
cnn(X_train[:5])
for tensor in cnn.trainable_variables:
    print("{} : {}".format(tensor.name, tensor.shape))
In this section, we are training our CNN. In order to train the network, we have defined a function that will be used to train the neural network.
The function takes data features (X), target values (Y), the number of epochs, and the batch size as input, and runs the training loop for the given number of epochs. In each epoch, it computes the start and end indexes of each batch, slices the data into batches using these indexes, and loops through the batches. For each batch, it performs a forward pass through the network to make predictions and then calculates the loss from the predictions and the actual target values. Both operations are done inside the tf.GradientTape() context manager, which records the operations so that gradients can be computed afterward. We then retrieve the model parameters through the trainable_variables attribute and pass the loss and the parameters to the gradient() method of the tape. This method returns the gradients of the loss with respect to the parameters. Finally, we call the apply() method of the optimizer to update the parameters with these gradients. We record the loss of each batch and print the average loss for each epoch. Note that the function uses the global variables cnn, optimizer, and loss_func, which must be defined before it is called.
def TrainModelInBatches(X, Y, epochs, batch_size=32):
    ## Uses the global variables cnn, optimizer, and loss_func
    n_samples = X.shape[0]
    for i in range(1, epochs+1):
        losses = [] ## Record loss of each batch
        for start in range(0, n_samples, batch_size): ## Batch start indices (no empty last batch)
            end = min(start+batch_size, n_samples)
            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            with tf.GradientTape() as tape:
                preds = cnn(X_batch) ## Make Predictions on Batch of Data
                loss = loss_func(Y_batch, preds) ## Calculate Loss
            params = cnn.trainable_variables ## Retrieve Model Parameters
            grads = tape.gradient(loss, params) ## Calculate Gradients
            optimizer.apply(grads, params) ## Update Weights
            losses.append(loss) ## Record Loss
        print("Epoch {} - CrossEntropyLoss : {:.3f}".format(i, tf.math.reduce_mean(tf.convert_to_tensor(losses))))
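The batching logic of the training loop can be checked on its own, without TensorFlow. In this sketch, `batch_slices` is a hypothetical helper that produces the same (start, end) pairs as stepping the start index by batch_size. A loop over `range(n // batch_size + 1)` would produce an empty final batch whenever n is an exact multiple of batch_size. Stepping the start index avoids that.

```python
def batch_slices(n_samples, batch_size):
    """Yield (start, end) index pairs covering n_samples in order.

    The last batch may be smaller than batch_size, and no empty batch is
    produced when n_samples is an exact multiple of batch_size.
    """
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


slices = list(batch_slices(60000, 256))
print(len(slices))  # 235
print(slices[-1])   # (59904, 60000): a final batch of 96 samples
```

With the 60k training images and a batch size of 256, each epoch performs 235 weight updates.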
Now, we actually train our CNN by initializing the necessary variables and calling the function defined in the previous cell. We have set the learning rate to 0.001, the number of epochs to 25, and the batch size to 256. We have then initialized the SGD optimizer from the sonnet.optimizers module with this learning rate. We have also initialized the CategoricalCrossentropy loss, which is commonly used for multi-class classification tasks. At last, we have called our training function with these parameters to train the CNN. Please note that we pass the target values one-hot encoded with the to_categorical() function of Keras, because CategoricalCrossentropy expects one-hot encoded targets. The loss printed after each epoch lets us monitor training: a loss that keeps decreasing across epochs indicates that the model is learning.
learning_rate = 1/1e3
epochs = 25
batch_size=256
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
loss_func = tf.losses.CategoricalCrossentropy()
TrainModelInBatches(X_train, tf.keras.utils.to_categorical(Y_train), epochs, batch_size)
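Here we train with one-hot targets and CategoricalCrossentropy. The Adam run later in this tutorial passes integer labels to SparseCategoricalCrossentropy instead. The pure-Python sketch below shows that both compute the same quantity, the mean of -log(probability of the true class). It is a simplified illustration: it ignores the probability clipping that Keras applies internally, and the function names are hypothetical.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Mean of -sum(y * log(p)) over samples, with one-hot targets."""
    losses = [
        -sum(y * math.log(p) for y, p in zip(target, row) if y)
        for target, row in zip(one_hot, probs)
    ]
    return sum(losses) / len(losses)


def sparse_categorical_crossentropy(labels, probs):
    """Same loss, but each target is an integer class index."""
    losses = [-math.log(row[label]) for label, row in zip(labels, probs)]
    return sum(losses) / len(losses)


def to_one_hot(labels, num_classes):
    """Plain-Python counterpart of keras.utils.to_categorical."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)] for label in labels]


probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # softmax outputs for 2 samples
labels = [0, 2]
print(categorical_crossentropy(to_one_hot(labels, 3), probs))
print(sparse_categorical_crossentropy(labels, probs))  # same value
```

The one-hot vector only selects the probability of the true class, so the sparse version skips building it.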
In this section, we make predictions on the train and test datasets using our trained CNN. The MakePredictions() function loops through the input data in batches, makes predictions for each batch, and returns the list of per-batch predictions, which we then combine with tf.concat().
As the output of our neural network is 10 probabilities per sample, we need to retrieve the target class from them. To do that, we take the index of the highest probability as the predicted class for each sample. We do this for all samples at once by calling tf.argmax() with axis=1 on the network output.
def MakePredictions(input_data, batch_size=32):
    ## Uses the global variable cnn; returns a list of per-batch predictions
    n_samples = input_data.shape[0]
    preds = []
    for start in range(0, n_samples, batch_size): ## Batch start indices (no empty batch)
        X_batch = input_data[start:start+batch_size] ## Single batch of data
        preds.append(cnn(X_batch))
    return preds
test_preds = MakePredictions(X_test, batch_size=batch_size)
test_preds = tf.concat(test_preds, axis=0) ## Combine predictions of all batches
test_preds = tf.argmax(test_preds, axis=1)
train_preds = MakePredictions(X_train, batch_size=batch_size)
train_preds = tf.concat(train_preds, axis=0) ## Combine predictions of all batches
train_preds = tf.argmax(train_preds, axis=1)
test_preds[:5], train_preds[:5]
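The argmax step reduces each row of probabilities to a single class index. This minimal pure-Python sketch applies the same idea row by row, as tf.argmax(..., axis=1) does for a batch. The probabilities are made up, and this sketch returns the first index on ties. TensorFlow does not guarantee which index it returns on ties.

```python
def argmax(values):
    """Index of the largest value; this sketch returns the first index on ties."""
    return max(range(len(values)), key=values.__getitem__)


batch_probs = [
    [0.05, 0.80, 0.15],
    [0.60, 0.30, 0.10],
    [0.20, 0.20, 0.60],
]
predicted = [argmax(row) for row in batch_probs]  # one class index per row
print(predicted)  # [1, 0, 2]
```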
In this section, we are evaluating the performance of our neural network by calculating the accuracy of train and test predictions. We have also calculated a classification report on test data which has information like precision, recall, and f1-score for each target class. To calculate performance metrics, we have used functions available from scikit-learn.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
from sklearn.metrics import classification_report
print("Test Classification Report ")
print(classification_report(Y_test, test_preds))
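The metrics reported above follow simple counting rules. This pure-Python sketch uses made-up labels to show how accuracy and the per-class precision, recall, and F1 rows of a classification report are computed. The helper names are hypothetical, and scikit-learn adds extras such as averaged rows and configurable zero-division handling.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class (one row of a classification report)."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred))              # 4 correct out of 6
print(per_class_metrics(y_true, y_pred, 1))  # precision 2/3, recall 1.0, F1 about 0.8
```

Class 1 has perfect recall (both true 1s were found) but lower precision, because one sample of class 0 was also predicted as 1.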
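In this section, we train our CNN again, this time with the Adam optimizer instead of SGD, to check whether Adam helps improve performance. The number of epochs and the batch size are the same as in the SGD run. However, we have lowered the learning rate to 0.0001 and switched to the SparseCategoricalCrossentropy loss. This loss accepts integer class labels directly, so the targets are passed without one-hot encoding.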
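cnn = CNN() ## Re-create the model so Adam training starts from fresh weights, not the SGD-trained ones
learning_rate = 1/1e4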
epochs = 25
batch_size=256
optimizer = snt.optimizers.Adam(learning_rate=learning_rate)
loss_func = tf.losses.SparseCategoricalCrossentropy()
TrainModelInBatches(X_train, Y_train, epochs, batch_size)
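To illustrate how the two optimizers differ, the sketch below compares one step of plain SGD with one step of the textbook Adam update rule (Kingma and Ba) in pure Python. It assumes the commonly cited default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8). Sonnet's snt.optimizers.Adam may differ in small implementation details, and the helper names are hypothetical.

```python
import math


def sgd_step(param, grad, lr):
    """Plain gradient descent: move against the gradient, scaled by lr."""
    return param - lr * grad


def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update; returns (new_param, m, v).

    m and v are running averages of the gradient and the squared gradient;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# On its first step Adam moves a parameter by about lr whatever the gradient's
# scale, while the SGD step is proportional to the gradient.
for g in (0.01, 100.0):
    sgd_delta = sgd_step(1.0, g, 1e-3) - 1.0               # SGD with lr=0.001
    adam_delta = adam_step(1.0, g, 0.0, 0.0, 1, 1e-4)[0] - 1.0  # Adam with lr=0.0001
    print(g, sgd_delta, adam_delta)
```

Because Adam normalizes each step by a running estimate of the gradient's magnitude, a smaller learning rate is usually enough with Adam.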
In this section, we have made predictions on our train and test datasets using the CNN trained with the Adam optimizer.
test_preds = MakePredictions(X_test, batch_size=batch_size)
test_preds = tf.concat(test_preds, axis=0) ## Combine predictions of all batches
test_preds = tf.argmax(test_preds, axis=1)
train_preds = MakePredictions(X_train, batch_size=batch_size)
train_preds = tf.concat(train_preds, axis=0) ## Combine predictions of all batches
train_preds = tf.argmax(train_preds, axis=1)
test_preds[:5], train_preds[:5]
In this section, we have evaluated the performance of our CNN by calculating the accuracy of the train and test predictions, along with the classification report for the test predictions. From the results, we can notice that performance has improved compared to the SGD-trained model.
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
from sklearn.metrics import classification_report
print("Test Classification Report ")
print(classification_report(Y_test, test_preds))
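In our example above, we used grayscale images and introduced the channel dimension at the end to train the CNN. As we said earlier, RGB (color) images have 3 channels. There are two different ways to place the channel dimension in the multi-dimensional array that holds a batch of images. The first is channels last, (batch, height, width, channels), and the second is channels first, (batch, channels, height, width).
In our example, we kept the channel dimension last, but developers can face situations where the data is in channels-first format. To handle those situations, the Conv2D layer of Sonnet has a parameter named data_format. Its default value is NHWC (batch, height, width, channels), i.e. channels last.
If the channel dimension of our data comes right after the batch dimension, we can set data_format to NCHW (batch, channels, height, width), and the Conv2D layer will interpret the input correctly. Be aware that some TensorFlow CPU builds implement only NHWC convolutions, so the NCHW format may require running on a GPU.
Below, we have explained with examples how the different data formats are handled. In the first example, the input is channels last, which matches the default. In the second, a channels-first input is given to layers left at the default NHWC format. No error is raised, but the layers treat the input as an image of height 1 and width 28 with 28 channels, which silently produces wrong results. In the third, we set data_format="NCHW" so that the layers read the input correctly.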
## Example 1: channels-last input (NHWC, the default data_format)
conv1 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME")
conv2 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME")
preds1 = conv1(tf.random.normal((50,28,28,1)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.trainable_variables[1].shape))
print("Weights of Second Conv Layer : {}".format(conv2.trainable_variables[1].shape))
print("\nInput Shape : {}".format((50,28,28,1)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
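## Example 2: channels-first input with the default NHWC format (misread as height=1, width=28, channels=28)
conv1 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME")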
conv2 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME")
preds1 = conv1(tf.random.normal((50,1,28,28)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.trainable_variables[1].shape))
print("Weights of Second Conv Layer : {}".format(conv2.trainable_variables[1].shape))
print("\nInput Shape : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
## Example 3: channels-first input with data_format="NCHW"
conv1 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME", data_format="NCHW")
conv2 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME", data_format="NCHW")
preds1 = conv1(tf.random.normal((50,1,28,28)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.trainable_variables[1].shape))
print("Weights of Second Conv Layer : {}".format(conv2.trainable_variables[1].shape))
print("\nInput Shape : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
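In the second example, a channels-first tensor reaches layers that expect NHWC, so its axes are read in the wrong roles. The pure-Python sketch below shows that misreading and one way around it: moving the channel axis last before the data reaches the layers. In TensorFlow, the same permutation is `tf.transpose(x, perm=[0, 2, 3, 1])`. The nested-list helpers here are hypothetical illustrations, not part of Sonnet.

```python
def describe_nhwc(shape):
    """How a channels-last (NHWC) layer reads a 4-D shape."""
    n, h, w, c = shape
    return {"batch": n, "height": h, "width": w, "channels": c}


def nchw_to_nhwc(x):
    """Move the channel axis last for a nested-list tensor of shape (N, C, H, W)."""
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    # out[i][r][s][k] = x[i][k][r][s]
    return [
        [[[x[i][k][r][s] for k in range(c)] for s in range(w)] for r in range(h)]
        for i in range(n)
    ]


# A channels-first batch (50, 1, 28, 28) fed to an NHWC layer is misread:
print(describe_nhwc((50, 1, 28, 28)))
# {'batch': 50, 'height': 1, 'width': 28, 'channels': 28}

# Tiny (1, 2, 2, 2) example: two 2x2 channels become one 2x2 image with 2 channels.
x = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]]]
print(nchw_to_nhwc(x))  # [[[[1, 5], [2, 6]], [[3, 7], [4, 8]]]]
```

Transposing is useful when a pipeline must stay NHWC, while setting data_format="NCHW" avoids the copy when the hardware supports that layout.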
This ends our small tutorial explaining how we can create convolutional neural networks (CNN) using Sonnet. Please feel free to let us know your views in the comments section.
If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.
When going through coding examples, it's quite common to have doubts and errors.
If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.
You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.
If you want to