**Sonnet** is a deep learning library built on top of **TensorFlow** by Google DeepMind to simplify the development of deep neural networks. **Sonnet** lets us design neural networks in the style of **Keras** (**Sequential** API) or **PyTorch** (extending the **sonnet.Module** class). It simplifies and speeds up neural network design, which lets developers and researchers run more experiment cycles. We have already covered how to create fully connected networks using **Sonnet** in a separate, simpler tutorial, which serves as useful background for this one.

As a part of this tutorial, we'll explain how to create convolutional neural networks (CNNs) using **Sonnet**. We'll cover different ways of building CNNs and train them with different optimizers. This tutorial won't go into the details of neural network components such as layers, activation functions, and optimizers; we expect the reader to have enough background on these to follow along. The tutorial is designed so that individuals can start using CNNs for their own tasks with **Sonnet**. If you want to know the theory behind CNNs and their pros and cons, please refer to our separate blog post on that topic.

Below, we have highlighted important sections of our tutorial to give an overview of the material covered.

- Simple Convolutional Neural Network
    - Load Dataset
    - Create CNN
        - CNN using Sequential API
        - CNN by Extending sonnet.Module Class
    - Train Model (SGD)
        - Make Predictions
        - Evaluate Model Performance
    - Train Model (Adam)
        - Make Predictions
        - Evaluate Model Performance
- Guide to Handle Channels First vs Channels Last

**pip install dm-sonnet**

Below, we have imported **Sonnet** and **tensorflow** libraries. We have also printed the version of both that we'll be using in our tutorial.

```
import sonnet as snt
print("Sonnet Version : {}".format(snt.__version__))
```

```
import tensorflow as tf
print("Tensorflow Version : {}".format(tf.__version__))
```

In this section, we'll explain how to create a simple convolutional neural network with 2 convolution layers to solve a multi-class classification task. We'll use the Fashion MNIST dataset available from keras, which has images of 10 different fashion items.

In this section, we have loaded the Fashion MNIST dataset available from keras. It has grayscale images of shape **(28,28)** for 10 different fashion items. The dataset is already divided into train (60k images) and test (10k images) sets. After loading the datasets, we have converted them to **TensorFlow tensors**, as **Sonnet** networks require **tensors** as input. We have then reshaped the images to add one extra dimension at the end, transforming them from shape **(28,28)** to **(28,28,1)**. The reason for this transformation is that convolution layers operate on the channels of input images. Color (RGB) images have 3 channels (red, green, and blue), whereas grayscale images have a single channel holding the intensity of each pixel, which the raw arrays leave implicit. The extra dimension makes that single channel explicit. After adding it, we have divided the images by the float value 255 to bring all values into the range **[0,1]**; the raw pixel values lie in the range **[0,255]**. Scaling values to **[0,1]** helps the optimization algorithm converge faster during training.

```
from tensorflow import keras

# Load Fashion MNIST: 60k train and 10k test grayscale images of shape (28,28)
(X_train, Y_train), (X_test, Y_test) = keras.datasets.fashion_mnist.load_data()

# Convert the NumPy arrays to TensorFlow tensors
X_train, X_test, Y_train, Y_test = tf.convert_to_tensor(X_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(X_test, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_train, dtype=tf.float32),\
                                   tf.convert_to_tensor(Y_test, dtype=tf.float32)

# Add a trailing channel dimension: (28,28) -> (28,28,1)
X_train, X_test = tf.reshape(X_train, (-1,28,28,1)), tf.reshape(X_test, (-1,28,28,1))

# Scale pixel values from [0,255] to [0,1]
X_train, X_test = X_train/255.0, X_test/255.0

classes = tf.unique(Y_train)  # Returns a pair: (unique labels, index of each label)
X_train.shape, X_test.shape, Y_train.shape, Y_test.shape
```
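To make the two preprocessing steps concrete, here is a minimal pure-Python sketch on a tiny 2x2 image, using nested lists instead of tensors. The helper name `add_channel_and_scale` is hypothetical and is not part of Sonnet or TensorFlow; it only mirrors what the reshape and the division by 255 do.

```python
def add_channel_and_scale(image, max_value=255.0):
    """Turn an (H, W) image of ints in [0, 255] into an (H, W, 1) image of floats in [0, 1].

    Hypothetical helper for illustration only.
    """
    # Wrapping each pixel in a one-element list adds the trailing channel dimension
    return [[[pixel / max_value] for pixel in row] for row in image]


tiny_image = [[0, 255],
              [51, 102]]  # shape (2, 2)

scaled = add_channel_and_scale(tiny_image)

# Shape is now (2, 2, 1)
print(len(scaled), len(scaled[0]), len(scaled[0][0]))
print(scaled)
```

The same idea applies to the full dataset: the pixel values are unchanged apart from scaling, and only the shape gains a channel axis of size 1.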

In this section, we have created a CNN for our multi-class classification task. We have explained two different ways of creating a CNN using **Sonnet**. We'll explain how both work as well.

In this section, we have created a CNN using the **Sequential** API of **Sonnet**. **Sonnet** has a class named **Sequential** that accepts a list of layers as input and creates a neural network from it. It then applies the layers in the order in which they were given. This way of creating a neural network is almost the same as the **Keras Sequential** API.
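The idea of applying layers in order can be sketched in a few lines of plain Python. The class `TinySequential` below is a hypothetical stand-in written for illustration, not Sonnet's implementation; it only shows that each element is called on the output of the previous one, which is also why plain functions such as activations can sit in the list next to layers.

```python
class TinySequential:
    """Apply a list of callables one after another (illustrative only)."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)  # The output of one layer is the input of the next
        return x


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


model = TinySequential([double, add_one])
result = model(3)  # Equivalent to add_one(double(3))
print(result)
```

Swapping the order of the two callables changes the result, which is the reason the order of layers in the list matters.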

Our CNN consists of 2 convolution layers. The first convolution layer has **32** output channels and a kernel size of **(3,3)**. The second has **16** output channels and a kernel size of **(3,3)**. Both convolution layers have padding set to **'SAME'**, which pads the input with zeros so that the height and width of the image are unchanged by the convolution. We have used the **ReLU (Rectified Linear Unit)** activation function after both convolution layers. We then flatten the output with a flatten layer and pass it to a linear/dense layer with 10 output units, one for each of the 10 categories of fashion images in our dataset. Finally, we apply the **softmax** activation function to the output of the linear layer. For each sample, **softmax** maps the 10 output values to probabilities in the range **[0,1]** that sum to 1.

Our input data has shape **(n_samples,28,28,1)**. The first convolution layer will transform data from shape **(n_samples,28,28,1)** to **(n_samples,28,28,32)**. The second convolution layer will transform data from shape **(n_samples,28,28,32)** to **(n_samples,28,28,16)**. The flatten layer will flatten data from shape **(n_samples,28,28,16)** to **(n_samples,28 x 28 x 16) = (n_samples,12544)**. Finally, the linear layer will transform the shape from **(n_samples,12544)** to **(n_samples,10)**.
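The shape arithmetic above can be checked with a small pure-Python sketch. The helper names are hypothetical, and the parameter counts assume that every layer has a bias term, which we believe is the default for **Conv2D** and **Linear** in **Sonnet** but treat here as an assumption.

```python
import math


def conv2d_same_output_shape(input_shape, output_channels, stride=1):
    """Output shape of a 'SAME'-padded 2D convolution on channels-last input."""
    n, h, w, _ = input_shape
    # With 'SAME' padding the spatial size is ceil(input_size / stride)
    return (n, math.ceil(h / stride), math.ceil(w / stride), output_channels)


def conv2d_param_count(kernel_shape, in_channels, out_channels):
    """Kernel weights plus one bias per output channel (bias assumed enabled)."""
    kh, kw = kernel_shape
    return kh * kw * in_channels * out_channels + out_channels


n_samples = 5
shape0 = (n_samples, 28, 28, 1)
shape1 = conv2d_same_output_shape(shape0, 32)            # After first conv layer
shape2 = conv2d_same_output_shape(shape1, 16)            # After second conv layer
flat = (n_samples, shape2[1] * shape2[2] * shape2[3])    # After flatten
out = (n_samples, 10)                                    # After linear layer

params = {
    "conv1": conv2d_param_count((3, 3), 1, 32),
    "conv2": conv2d_param_count((3, 3), 32, 16),
    "linear": flat[1] * 10 + 10,
}

print(shape1, shape2, flat, out)
print(params)
```

Most of the parameters sit in the final linear layer, because flattening a 28 x 28 x 16 volume produces 12544 inputs for each of the 10 output units.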

After creating a CNN with **Sequential** API, we can simply call it by providing input data to perform a forward pass through it to make predictions. We have performed a forward pass through the network by giving a few data samples and printed output in the next cell below. The model parameters are available through attribute **trainable_variables**. We have printed the shapes of model parameters as well in the cell below.

```
cnn = snt.Sequential([
    snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME"),
    tf.nn.relu,

    snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME"),
    tf.nn.relu,

    snt.Flatten(),
    snt.Linear(10),
    tf.nn.softmax,
])

cnn
```

```
cnn(X_train[:5])
```
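Each row returned by the forward pass is the result of a softmax, so its values lie in **[0,1]** and sum to 1. A minimal pure-Python sketch of softmax for a single sample is shown below; the input values are made up for illustration and only 4 classes are used to keep it short.

```python
import math


def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum does not change the result but avoids overflow in exp()
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1, -1.0]  # Made-up outputs of a final linear layer
probs = softmax(logits)

print(probs)
print(sum(probs))
```

The largest raw score always receives the largest probability, which is why taking the index of the highest probability later gives the predicted class.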

```
for tensor in cnn.trainable_variables:
    print("{} : {}".format(tensor.name, tensor.shape))
```

In this section, we explain the second way of creating a CNN using **Sonnet**: extending the **sonnet.Module** class. This approach closely resembles the **PyTorch** way of creating neural networks. We have created a CNN with the same layers as the one built with the **Sequential** API in the previous section.

In order to create a CNN this way, we need to implement two methods.

- **\_\_init\_\_()** - In this method, we initialize the layers of our neural network.
- **\_\_call\_\_()** - In this method, we perform a forward pass through input data using the layers defined in the **\_\_init\_\_()** method. The method takes data as input and returns predictions.

After defining CNN, we have also performed a forward pass through it with a few data samples for verification purposes. We have also printed the shapes of network parameters later.

```
class CNN(snt.Module):
    def __init__(self, name="CNN"):
        super(CNN, self).__init__(name=name)
        # Layers of the network
        self.conv1 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME")
        self.conv2 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME")
        self.flatten = snt.Flatten()
        self.linear = snt.Linear(10)

    def __call__(self, X_batch):
        # Forward pass: conv -> relu -> conv -> relu -> flatten -> linear -> softmax
        x = tf.nn.relu(self.conv1(X_batch))
        x = tf.nn.relu(self.conv2(x))
        x = self.flatten(x)
        x = self.linear(x)
        return tf.nn.softmax(x)
```

```
cnn = CNN()
cnn(X_train[:5])
```

```
for tensor in cnn.trainable_variables:
    print("{} : {}".format(tensor.name, tensor.shape))
```

In this section, we train our CNN. To do so, we have defined a function that runs the training loop.

The function takes data features (X), target values (Y), the number of epochs, and the batch size as input. It executes the training loop once per epoch. In each epoch, it calculates the start and end indexes of the batches, slices the data using these indexes, and loops through the data batch by batch. For each batch, it performs a forward pass through the network to make predictions and then calculates the loss from the predictions and the actual target values. Both operations are done inside the **tf.GradientTape()** context manager, which records the operations performed so that gradients of the loss with respect to the model parameters can be computed afterwards. We then retrieve the model parameters using the **trainable_variables** attribute and give the loss value and the parameters to the **gradient()** method of **GradientTape**, which calculates the gradients of the loss with respect to the parameters and returns them. Finally, we call the **apply()** method of the optimizer to update the model parameters using the gradients. We record the loss for each batch and print the average loss for each epoch.

```
def TrainModelInBatches(X, Y, epochs, batch_size=32):
    for i in range(1, epochs+1):
        batches = tf.range((X.shape[0]//batch_size)+1) ### Batch Indices

        losses = [] ## Record loss of each batch
        for batch in batches:
            if batch != batches[-1]:
                start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
            else:
                start, end = int(batch*batch_size), None

            X_batch, Y_batch = X[start:end], Y[start:end] ## Single batch of data
            if X_batch.shape[0] == 0: ## Last batch is empty when sizes divide evenly
                continue

            with tf.GradientTape() as tape:
                preds = cnn(X_batch) ## Make Predictions on Batch of Data
                loss = loss_func(Y_batch, preds) ## Calculate Loss

            params = cnn.trainable_variables ## Retrieve Model Parameters
            grads = tape.gradient(loss, params) ## Calculate Gradients
            optimizer.apply(grads, params) ## Update Weights
            losses.append(loss) ## Record Loss

        print("CrossEntropyLoss : {:.3f}".format(tf.math.reduce_mean(tf.convert_to_tensor(losses))))
```
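The batch index calculation used by the training function can be examined in isolation. The pure-Python sketch below reproduces the same "floor division plus one" scheme and compares it with a stepped `range`, which is one possible alternative and not what the training function uses. Both helper names are hypothetical.

```python
def batch_bounds_floor_plus_one(n_samples, batch_size):
    """(start, end) pairs using n_samples // batch_size + 1 batches, as in the training loop."""
    n_batches = n_samples // batch_size + 1
    bounds = []
    for batch in range(n_batches):
        start = batch * batch_size
        # The last batch runs to the end of the data
        end = start + batch_size if batch != n_batches - 1 else n_samples
        bounds.append((start, end))
    return bounds


def batch_bounds_stepped(n_samples, batch_size):
    """Alternative: step through the data, so no empty batch can occur."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]


print(batch_bounds_floor_plus_one(10, 4))  # [(0, 4), (4, 8), (8, 10)]
print(batch_bounds_floor_plus_one(8, 4))   # [(0, 4), (4, 8), (8, 8)], last batch is empty
print(batch_bounds_stepped(8, 4))          # [(0, 4), (4, 8)]
```

With 60000 training images and a batch size of 256, both schemes give 235 batches, the last one holding 96 images. The empty final slice only appears when the number of samples is an exact multiple of the batch size.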

Now, we train our CNN by initializing the necessary variables and calling the function defined in the previous cell. We have set the learning rate to **0.001**, epochs to **25**, and batch size to **256**. Then, we have initialized the **SGD** optimizer from the **sonnet.optimizers** module with this learning rate. We have also initialized the **CategoricalCrossentropy** loss, a commonly used loss function for multi-class classification tasks. Finally, we have called our training function with the necessary parameters to train the CNN. Please note that we provide the target values one-hot encoded using the **to_categorical()** function of keras, because this loss function requires one-hot encoded targets. The loss value printed after each epoch suggests that our model is doing a good job.

```
learning_rate = 1/1e3
epochs = 25
batch_size=256
optimizer = snt.optimizers.SGD(learning_rate=learning_rate)
loss_func = tf.losses.CategoricalCrossentropy()
TrainModelInBatches(X_train, tf.keras.utils.to_categorical(Y_train), epochs, batch_size)
```
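To see why one-hot encoding is needed for this loss, here is a per-sample sketch of cross entropy in pure Python. The function names are hypothetical and the code is a simplified illustration, not the Keras implementation (which, among other things, averages over the batch). It also shows that indexing the probability of the true class directly gives the same value, which is what a sparse variant of the loss does with integer labels.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a vector with a single 1."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_crossentropy(target_one_hot, probs, eps=1e-7):
    """Cross entropy for one sample with a one-hot encoded target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target_one_hot, probs))


def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """Cross entropy for one sample with an integer target."""
    return -math.log(max(probs[label], eps))


probs = [0.1, 0.7, 0.2]  # Made-up softmax output for 3 classes
label = 1                # True class

loss_dense = categorical_crossentropy(one_hot(label, 3), probs)
loss_sparse = sparse_categorical_crossentropy(label, probs)

print(one_hot(label, 3))
print(loss_dense, loss_sparse)
```

Only the probability assigned to the true class contributes to the loss, so the loss shrinks as the model becomes more confident in the correct class.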

In this section, we make predictions on the train and test datasets using our trained CNN. We have defined a function that loops through the input data in batches and makes predictions for each batch. It returns the predictions as a list with one entry per batch, which we then combine into a single tensor.

As the output of our neural network is 10 probabilities per sample, we need to include logic to retrieve the target class from these 10 probabilities. To do that, we'll retrieve the index of highest probability from 10 probabilities and predict that index as the target class. We'll need to do that for each data sample. To execute this logic, we have used **argmax()** method on the output of the neural network to predict the actual target class for each data sample.

```
def MakePredictions(input_data, batch_size=32):
    batches = tf.range((input_data.shape[0]//batch_size)+1) ### Batch Indices

    preds = []
    for batch in batches:
        if batch != batches[-1]:
            start, end = int(batch*batch_size), int(batch*batch_size+batch_size)
        else:
            start, end = int(batch*batch_size), None

        X_batch = input_data[start:end]

        if X_batch.shape[0] != 0:
            preds.append(cnn(X_batch))

    return preds
```

```
test_preds = MakePredictions(X_test, batch_size=batch_size)
test_preds = tf.concat(test_preds, axis=0) ## Combine predictions of all batches
test_preds = tf.argmax(test_preds, axis=1)
train_preds = MakePredictions(X_train, batch_size=batch_size)
train_preds = tf.concat(train_preds, axis=0) ## Combine predictions of all batches
train_preds = tf.argmax(train_preds, axis=1)
test_preds[:5], train_preds[:5]
```
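The two post-processing steps, combining per-batch outputs and taking the index of the highest probability, can be sketched in pure Python. The probabilities below are made up, and only 3 classes are used to keep the example short.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up predictions from two batches, each row holding class probabilities
batch_outputs = [
    [[0.05, 0.80, 0.15],
     [0.60, 0.30, 0.10]],
    [[0.20, 0.20, 0.60]],
]

# Combine batches along the sample axis, similar to concatenating along axis 0
all_probs = [row for batch in batch_outputs for row in batch]

# Pick the most probable class for every sample, similar to argmax along axis 1
predicted = [argmax(row) for row in all_probs]

print(predicted)  # [1, 0, 2]
```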

In this section, we are evaluating the performance of our neural network by calculating the accuracy of train and test predictions. We have also calculated a classification report on test data which has information like precision, recall, and f1-score for each target class. To calculate performance metrics, we have used functions available from scikit-learn.

If you want to learn about the various functions scikit-learn provides for ML metrics, we cover the majority of them in detail in a separate tutorial.

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
```

```
from sklearn.metrics import classification_report
print("Test Classification Report ")
print(classification_report(Y_test, test_preds))
```
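For readers who want to see what these metrics measure, the sketch below computes accuracy and per-class precision and recall by hand on a tiny made-up set of labels. The function names are hypothetical, and this is a simplified illustration of the definitions rather than the scikit-learn implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, cls):
    """Precision and recall for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)  # Correctly predicted as cls
    fp = sum(t != cls and p == cls for t, p in pairs)  # Wrongly predicted as cls
    fn = sum(t == cls and p != cls for t, p in pairs)  # Missed samples of cls
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [0, 0, 1, 1, 2, 2]  # Made-up true labels
y_pred = [0, 1, 1, 1, 2, 0]  # Made-up predictions

print("Accuracy : {:.3f}".format(accuracy(y_true, y_pred)))
for cls in sorted(set(y_true)):
    p, r = precision_recall(y_true, y_pred, cls)
    print("Class {} : precision={:.3f}, recall={:.3f}".format(cls, p, r))
```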

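In this section, we train our CNN again, this time using the **Adam** optimizer instead of **SGD**. The number of epochs (**25**) and the batch size (**256**) are the same as before, but two settings differ: the learning rate is **0.0001**, and the loss is **SparseCategoricalCrossentropy**, which accepts integer class labels directly, so the targets are not one-hot encoded this time. Please note that the cell below does not re-create the network, so unless we run **cnn = CNN()** again first, training continues from the weights already learned with **SGD**. We have done this comparison to check whether **Adam** helps improve performance.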

```
learning_rate = 1/1e4
epochs = 25
batch_size=256
optimizer = snt.optimizers.Adam(learning_rate=learning_rate)
loss_func = tf.losses.SparseCategoricalCrossentropy()
TrainModelInBatches(X_train, Y_train, epochs, batch_size)
```
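As a rough illustration of how the two optimizers differ, the sketch below applies the textbook SGD and Adam update rules to a single parameter while minimizing f(w) = w². This is a simplified sketch with commonly used default hyperparameters; the exact details of **Sonnet**'s optimizers (for example, where epsilon is applied) may differ, and the function names are hypothetical.

```python
import math


def sgd_step(w, grad, lr):
    """Plain gradient descent: move against the gradient."""
    return w - lr * grad


def adam_step(w, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter (textbook form)."""
    m = beta1 * m + (1 - beta1) * grad          # Running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # Running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # Bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.0
w_sgd = 1.0
w_adam, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd, lr=0.1)
    w_adam, m, v = adam_step(w_adam, 2 * w_adam, m, v, t, lr=0.1)
    print("step {} : sgd w={:.4f}, adam w={:.4f}".format(t, w_sgd, w_adam))
```

The SGD step size scales with the gradient, whereas Adam normalizes by the running gradient magnitude, so its first step is roughly the size of the learning rate regardless of how large the gradient is. This is one reason the two optimizers are often used with different learning rates.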

In this section, we have made predictions on our train and test datasets using CNN trained with **Adam** optimizer.

```
test_preds = MakePredictions(X_test, batch_size=batch_size)
test_preds = tf.concat(test_preds, axis=0) ## Combine predictions of all batches
test_preds = tf.argmax(test_preds, axis=1)
train_preds = MakePredictions(X_train, batch_size=batch_size)
train_preds = tf.concat(train_preds, axis=0) ## Combine predictions of all batches
train_preds = tf.argmax(train_preds, axis=1)
test_preds[:5], train_preds[:5]
```

In this section, we have evaluated the performance of our CNN by calculating the accuracy of train and test predictions. We have also calculated the classification report for test predictions. From the results, we can notice that performance is improved.

```
from sklearn.metrics import accuracy_score
print("Train Accuracy : {:.3f}".format(accuracy_score(Y_train, train_preds)))
print("Test Accuracy : {:.3f}".format(accuracy_score(Y_test, test_preds)))
```

```
from sklearn.metrics import classification_report
print("Test Classification Report ")
print(classification_report(Y_test, test_preds))
```

In our example above, we used grayscale images and introduced the channel dimension at the end to train the CNN. As we said earlier, RGB or color images have 3 channels. There are two different ways to represent channels in a multi-dimensional array holding images.

- **Channels First** - Here, we represent a color image of **(28,28)** pixels as a **(3,28,28)** dimension array.
- **Channels Last** - Here, we represent a color image of **(28,28)** pixels as a **(28,28,3)** dimension array.

In our example, we kept the channel dimension last. However, developers can face situations where the data is in channels-first format. To handle those situations, the **Conv2D** layer of **Sonnet** has a parameter named **data_format**. The default value of this parameter is **NHWC**.

- **N** - Number of data samples.
- **H** - Height of image.
- **W** - Width of image.
- **C** - Number of channels.

If we have data where channel details are present at the beginning then we can specify the value of **data_format** parameter as **NCHW** and **Conv2D** layer will work fine with that format.

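Below, we show with examples how the different data formats behave. The first cell uses channels-last data with the default format. The second passes channels-first data to layers that still use the default **NHWC** format, so the layers treat the last dimension (28) as the channel dimension; no error is raised, but the shapes and results are not what we intended. The third sets **data_format** to **NCHW** so that channels-first data is handled correctly.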

```
conv1 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME")
conv2 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME")
preds1 = conv1(tf.random.normal((50,28,28,1)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.trainable_variables[1].shape))
print("Weights of Second Conv Layer : {}".format(conv2.trainable_variables[1].shape))
print("\nInput Shape : {}".format((50,28,28,1)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
```

```
conv1 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME")
conv2 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME")
preds1 = conv1(tf.random.normal((50,1,28,28)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.trainable_variables[1].shape))
print("Weights of Second Conv Layer : {}".format(conv2.trainable_variables[1].shape))
print("\nInput Shape : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
```

```
conv1 = snt.Conv2D(output_channels=16, kernel_shape=(3,3), padding="SAME", data_format="NCHW")
conv2 = snt.Conv2D(output_channels=32, kernel_shape=(3,3), padding="SAME", data_format="NCHW")
preds1 = conv1(tf.random.normal((50,1,28,28)))
preds2 = conv2(preds1)
print("Weights of First Conv Layer : {}".format(conv1.trainable_variables[1].shape))
print("Weights of Second Conv Layer : {}".format(conv2.trainable_variables[1].shape))
print("\nInput Shape : {}".format((50,1,28,28)))
print("Conv Layer 1 Output Shape : {}".format(preds1.shape))
print("Conv Layer 2 Output Shape : {}".format(preds2.shape))
```
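The shapes printed by the three cells above can be reasoned about without running a network. The pure-Python sketch below computes the output and weight shapes for a stride-1 **'SAME'** convolution under each data format. The helper names are hypothetical, and the weight layout (kernel height, kernel width, input channels, output channels) is our understanding of how **Conv2D** stores its kernel, stated here as an assumption.

```python
def conv2d_same_shapes(input_shape, output_channels, kernel_shape=(3, 3), data_format="NHWC"):
    """Return (output_shape, weight_shape) for a stride-1 'SAME' convolution."""
    if data_format == "NHWC":
        n, h, w, c = input_shape
        output_shape = (n, h, w, output_channels)
    elif data_format == "NCHW":
        n, c, h, w = input_shape
        output_shape = (n, output_channels, h, w)
    else:
        raise ValueError("Unsupported data_format: {}".format(data_format))
    # Assumed kernel layout: (height, width, input channels, output channels)
    weight_shape = tuple(kernel_shape) + (c, output_channels)
    return output_shape, weight_shape


def nchw_to_nhwc(shape):
    """Shape after moving channels last, e.g. via tf.transpose(x, perm=[0, 2, 3, 1])."""
    n, c, h, w = shape
    return (n, h, w, c)


# Channels-last data with the default format: as intended
print(conv2d_same_shapes((50, 28, 28, 1), 16))
# Channels-first data with the default format: width 28 is mistaken for 28 channels
print(conv2d_same_shapes((50, 1, 28, 28), 16))
# Channels-first data with data_format="NCHW": as intended
print(conv2d_same_shapes((50, 1, 28, 28), 16, data_format="NCHW"))
# Alternative: convert the data to channels-last and keep the default format
print(conv2d_same_shapes(nchw_to_nhwc((50, 1, 28, 28)), 16))
```

The second case is the one to watch for: the kernel ends up with 28 input channels instead of 1, which is a quick way to spot a data format mismatch when inspecting the weight shapes.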

This ends our small tutorial explaining how we can create convolutional neural networks (CNN) using **Sonnet**. Please feel free to let us know your views in the comments section.
