The convolutional neural network (CNN or ConvNet) is a type of artificial neural network that has given very good results on visual imagery over the last few years. Many versions of convolutional neural networks have been designed to solve various tasks as well as to win ImageNet competitions. Any artificial neural network that uses a convolution layer in its architecture can be considered a ConvNet. ConvNets typically start by recognizing small patterns in the data and then combine these patterns using further convolution layers to recognize the whole object. Yann LeCun developed the first successful ConvNet, called LeNet, in the 1990s, training it with backpropagation. Later on, different versions of ConvNets won the ImageNet competition a few times. In this article, we'll discuss how convolutional neural networks work as well as their applications.
We'll start by explaining the mathematical operation of convolution.
From a computer science point of view, the convolution operation refers to sliding one small array (commonly referred to as a filter or kernel) over another, bigger array to produce a third, output array. We multiply the filter array element-wise with a part of the big array of the same size as the filter, starting from the beginning, and then sum up all values of the resulting array to produce the first value of the output array. We then move one step to the next position in the big array and repeat the same process until the whole big array has been processed from left to right. Strictly speaking, mathematical convolution flips the filter before sliding it; for symmetric filters like the ones used below, this makes no difference. Please note that, without padding, the convolution operation decreases the size of the original array: an array of size N convolved with a filter of size M produces N - M + 1 values. We'll explain the whole process with a few examples below.
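The sliding-window procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a 1D input and no padding; the helper name `sliding_window_convolve` is hypothetical and not part of any library. Following the mathematical definition, it flips the filter before sliding it, so its result should match `np.convolve(..., mode="valid")`.

```python
import numpy as np

def sliding_window_convolve(signal, kernel):
    """Hypothetical helper: 1D convolution without padding ("valid" behaviour)."""
    signal = np.asarray(signal, dtype=float)
    # Mathematical convolution flips the filter before sliding it.
    flipped = np.asarray(kernel, dtype=float)[::-1]
    m = len(flipped)
    # One output value per position where the filter fully overlaps the input.
    n_out = len(signal) - m + 1
    out = np.empty(n_out)
    for i in range(n_out):
        # Multiply the filter with the current window and sum the products.
        out[i] = np.sum(signal[i:i + m] * flipped)
    return out

signal = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
kernel = np.array([1/3, 1/3, 1/3])

manual = sliding_window_convolve(signal, kernel)
print(manual.round(4))
print(np.allclose(manual, np.convolve(signal, kernel, mode="valid")))  # True
```

The explicit loop is slow for large arrays; it is only meant to make each multiply-and-sum step visible.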
import numpy as np
import scipy.ndimage as ndi
import matplotlib.pyplot as plt
%matplotlib inline
Let's define a simple array of size 10 where the first five elements are 0s and the last five elements are 1s.
original_arr = np.zeros(10)
original_arr[5:] = 1
original_arr
array([0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])
Let's define a simple filter array of size 3 with all elements equal to 1/3.
filter_arr = np.array([1/3, 1/3, 1/3])
filter_arr
array([0.33333333, 0.33333333, 0.33333333])
out = np.convolve(original_arr, filter_arr, mode="valid")
out
array([0. , 0. , 0. , 0.33333333, 0.66666667, 1. , 1. , 1. ])
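filter_arr is multiplied element-wise with each 3-element window of original_arr, starting from the beginning and moving one step at a time. The products in each window are summed to generate one element of the result, and the process is repeated until the end of the array is reached. This gives 10 - 3 + 1 = 8 values; for example, the fourth value is (0 + 0 + 1)/3 = 0.33333333.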
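Because the filter above is symmetric, the flip that distinguishes convolution from cross-correlation is invisible. The sketch below uses an asymmetric filter chosen purely for illustration to compare `np.convolve` (which flips the filter) with `np.correlate` (which slides it as-is). Many deep learning libraries actually implement their convolution layers as cross-correlation; since the filter values are learned, the flip makes no practical difference there.

```python
import numpy as np

signal = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
# Illustrative asymmetric "difference" filter, which makes the flip visible.
kernel = np.array([1.0, 0.0, -1.0])

conv = np.convolve(signal, kernel, mode="valid")   # flips the filter, then slides
corr = np.correlate(signal, kernel, mode="valid")  # slides the filter unflipped

print(conv)
print(corr)
# For this antisymmetric filter, flipping it only changes the sign of the result.
print(np.allclose(conv, -corr))  # True
```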
Note: The default mode of np.convolve is full, which pads the original array with 0s at both the start and the end so that every position where the filter overlaps the array is processed. There are other modes as well, such as same, which returns a result with the same length as the original array by padding it with 0s at both ends during the convolution operation. Above, we used mode valid, which does not perform any kind of padding of the original array, so only positions where the filter fully overlaps the array are computed.
np.convolve(original_arr, filter_arr, mode="full")
array([0. , 0. , 0. , 0. , 0. , 0.33333333, 0.66666667, 1. , 1. , 1. , 0.66666667, 0.33333333])
Note that the above result is computed as if 2 0s (filter size minus 1) were added at the beginning and 2 0s at the end of the original array, which gives 10 + 3 - 1 = 12 output values.
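To check the padding interpretation of mode="full", the sketch below pads the array explicitly with `np.pad` (filter size minus 1 zeros on each side) and then applies mode="valid". This is an illustrative equivalence check, not how NumPy computes the result internally.

```python
import numpy as np

signal = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
kernel = np.array([1/3, 1/3, 1/3])
m = len(kernel)

# Pad (filter size - 1) zeros on each side, then convolve without padding.
padded = np.pad(signal, m - 1, mode="constant", constant_values=0)
manual_full = np.convolve(padded, kernel, mode="valid")

print(len(signal), len(padded), len(manual_full))  # 10 14 12
print(np.allclose(manual_full, np.convolve(signal, kernel, mode="full")))  # True
```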
np.convolve(original_arr, filter_arr, mode="same")
array([0. , 0. , 0. , 0. , 0.33333333, 0.66666667, 1. , 1. , 1. , 0.66666667])
Here, the result is computed as if one 0 were added at the beginning and one 0 at the end of the original array, so the output keeps the original length of 10.
We can also use the ndimage module of SciPy to compute convolution. Its convolve function offers more modes for handling the boundaries of the array.
out = ndi.convolve(original_arr, filter_arr)
out
array([0. , 0. , 0. , 0. , 0.33333333, 0.66666667, 1. , 1. , 1. , 1. ])
From the above result, we can see that it has the same length as the original array, but unlike mode="same" of np.convolve, its last element is 1 instead of 0.66666667. This is because the default mode of the convolve function of the ndimage module is reflect, which extends the original array by reflecting it about its edge (so the last element is repeated) instead of padding it with 0s. Since our original array ends with 1, the window at the last position is [1, 1, 1], whose average is 1.
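The sketch below compares several boundary modes of `scipy.ndimage.convolve` on the same 1D array. It also shows that zero padding (mode="constant" with cval=0.0) reproduces the mode="same" result of np.convolve for this odd-length, symmetric filter; that agreement is a property of this example rather than a general guarantee for every filter.

```python
import numpy as np
import scipy.ndimage as ndi

signal = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
kernel = np.array([1/3, 1/3, 1/3])

# Each mode extends the input differently beyond its edges.
for mode in ["reflect", "nearest", "mirror", "wrap", "constant"]:
    out = ndi.convolve(signal, kernel, mode=mode, cval=0.0)  # cval is used only by "constant"
    print(f"{mode:>8}: {out.round(3)}")

# Zero padding reproduces np.convolve's "same" mode for this filter.
zero_padded = ndi.convolve(signal, kernel, mode="constant", cval=0.0)
print(np.allclose(zero_padded, np.convolve(signal, kernel, mode="same")))  # True
```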
Let's now try the convolution operation on a 2D array.
original_arr = np.zeros((7, 7), dtype=float)
original_arr[2:5, 2:5] = 1
original_arr
array([[0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 1., 1., 1., 0., 0.], [0., 0., 1., 1., 1., 0., 0.], [0., 0., 1., 1., 1., 0., 0.], [0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0.]])
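When visualized as an image (0 as black, 1 as white), this array is a white 3x3 square in the middle of a black background. Next, let's define a 3x3 filter where every element is 1/9, so that each output value is the average of a 3x3 neighborhood.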
filter_arr = np.full((3, 3), 1/9)
filter_arr
array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])
out = ndi.convolve(original_arr, filter_arr)
out
array([[0. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [0. , 0.11111111, 0.22222222, 0.33333333, 0.22222222, 0.11111111, 0. ],
       [0. , 0.22222222, 0.44444444, 0.66666667, 0.44444444, 0.22222222, 0. ],
       [0. , 0.33333333, 0.66666667, 1. , 0.66666667, 0.33333333, 0. ],
       [0. , 0.22222222, 0.44444444, 0.66666667, 0.44444444, 0.22222222, 0. ],
       [0. , 0.11111111, 0.22222222, 0.33333333, 0.22222222, 0.11111111, 0. ],
       [0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
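To see the effect of the filter as an image, the arrays can be plotted side by side with Matplotlib. This is a minimal sketch; the figure size, titles, and the gray colormap with a fixed 0 to 1 range are illustrative choices.

```python
import numpy as np
import scipy.ndimage as ndi
import matplotlib.pyplot as plt

# Rebuild the 7x7 array with a 3x3 square of 1s and the 3x3 averaging filter.
original_arr = np.zeros((7, 7), dtype=float)
original_arr[2:5, 2:5] = 1
filter_arr = np.full((3, 3), 1/9)
out = ndi.convolve(original_arr, filter_arr)

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, arr, title in zip(axes, [original_arr, out], ["Original", "After 3x3 averaging filter"]):
    # Gray colormap with a fixed range: 0 is drawn black, 1 is drawn white.
    ax.imshow(arr, cmap="gray", vmin=0, vmax=1)
    ax.set_title(title)
    ax.axis("off")
plt.show()
```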
As we can see above, this averaging filter smooths (blurs) the edge that separates the black background from the white square. Different filters perform different operations, such as blurring, sharpening, or edge detection. In a convolutional neural network, the filter values are not designed by hand; the model learns them by training on data so that they capture intricate patterns.
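As an example of a filter that does something other than blurring, the sketch below applies a 3x3 Laplacian-style kernel (an illustrative choice, not one taken from a trained network) to the same array. It outputs 0 on flat regions and non-zero values only around the edges of the square, which is the kind of pattern a CNN filter can learn to detect.

```python
import numpy as np
import scipy.ndimage as ndi

original_arr = np.zeros((7, 7), dtype=float)
original_arr[2:5, 2:5] = 1

# Laplacian-style kernel: the weights sum to 0, so constant regions map to 0.
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

edges = ndi.convolve(original_arr, laplacian)
print(edges)  # non-zero only next to the border of the white square
```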
Please note that np.convolve works only on 1D arrays. For two or more dimensions, we need to use the convolve function of scipy.ndimage.
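SciPy also provides `scipy.signal.convolve2d`, which supports the same full/same/valid modes as np.convolve but for 2D arrays. The sketch below uses mode="valid" to show that, without padding, the 7x7 array shrinks to 5x5 (7 - 3 + 1), and that these values match the interior of the ndi.convolve result, where no boundary handling is involved.

```python
import numpy as np
import scipy.ndimage as ndi
from scipy.signal import convolve2d

original_arr = np.zeros((7, 7), dtype=float)
original_arr[2:5, 2:5] = 1
filter_arr = np.full((3, 3), 1/9)

# "valid" computes only the positions where the 3x3 filter fits completely.
valid = convolve2d(original_arr, filter_arr, mode="valid")
print(valid.shape)  # (5, 5)

# Away from the borders, it agrees with ndi.convolve, whose output keeps the 7x7 shape.
same_size = ndi.convolve(original_arr, filter_arr)
print(np.allclose(valid, same_size[1:-1, 1:-1]))  # True
```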
Now that we have a clear understanding of the convolution operation, we'll try to understand the architecture of a convolutional neural network.
Any artificial neural network that uses convolution layers in its architecture is called a convolutional neural network. Convolutional neural networks generally perform quite well on image classification and object detection, hence we'll explain their architecture from the point of view of an image classification task. Below we have shown a sample CNN architecture for processing images.
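Color images are stored as three-dimensional arrays (width x height x channels), for example with 3 channels for RGB images. Hence the neurons of convolution layers are also arranged in three dimensions: each filter spans all channels of its input, and a convolution layer applies a list of such filters to the image, producing one output feature map per filter.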
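The sketch below shows how a convolution layer handles a three-dimensional input. It assumes a hypothetical 6x6 RGB image and two randomly initialized 3x3 filters (each with 3 channels), no padding, and stride 1; following common CNN practice it computes cross-correlation (no filter flip). The helper `conv_layer` is hypothetical, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6x6 RGB image: height x width x channels.
image = rng.random((6, 6, 3))

# Two hypothetical 3x3 filters; each one spans all 3 input channels.
filters = rng.standard_normal((2, 3, 3, 3))  # (num_filters, height, width, channels)

def conv_layer(x, w):
    """Valid, stride-1 cross-correlation of an HxWxC input with N filters."""
    h, wd, c = x.shape
    n, fh, fw, fc = w.shape
    assert c == fc, "each filter must have as many channels as the input"
    out = np.zeros((h - fh + 1, wd - fw + 1, n))
    for k in range(n):
        for i in range(h - fh + 1):
            for j in range(wd - fw + 1):
                # One output value = sum over a 3x3 window across all channels.
                out[i, j, k] = np.sum(x[i:i + fh, j:j + fw, :] * w[k])
    return out

feature_maps = conv_layer(image, filters)
print(feature_maps.shape)  # (4, 4, 2): one 4x4 feature map per filter
```

Note that the channel dimension disappears inside each filter's sum; the depth of the output equals the number of filters, not the number of input channels.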
Generally, the architecture of a CNN consists of a list of (Convolution, Pooling) blocks followed by fully connected layers. A conv block can consist of a single convolution layer followed by a single pooling layer, or of more than one convolution layer followed by a single pooling layer.
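When stacking (Convolution, Pooling) blocks, it helps to track how the spatial size changes. Both layer types follow the standard formula floor((W - F + 2P) / S) + 1 for input size W, filter or window size F, padding P, and stride S. The layer list below is a hypothetical architecture chosen only for illustration.

```python
def output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# Hypothetical CNN: two (Convolution, Pooling) blocks on a 32x32 RGB input.
# Each entry: (layer type, kernel size, stride, padding, number of filters)
layers = [
    ("conv", 3, 1, 1, 32),
    ("pool", 2, 2, 0, None),
    ("conv", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
]

size, channels = 32, 3
for kind, k, s, p, n_filters in layers:
    size = output_size(size, k, s, p)
    if kind == "conv":
        channels = n_filters  # a conv layer outputs one feature map per filter
    print(f"{kind}: {size}x{size}x{channels}")

# The last pooling output is flattened before the fully connected layers.
print("flattened length:", size * size * channels)  # 4096
```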
Below we describe the common layers of a CNN:
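CONVOLUTION Layer: This layer applies a list of filters to its input, and its output is passed through an activation function, generally RELU (Rectified Linear Unit), which computes max(0, x). We specify the number of filters discussed above when creating the layer; their values are initialized randomly and get learned during training. These filters are the weights of the convolution layer.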
POOLING Layer: We then apply a pooling layer, which reduces the size of the activated output of the conv layer, resulting in downsampling. For images, pooling reduces the size in both spatial dimensions; for example, max pooling with a 2x2 window and stride 2 keeps the maximum value of each window and halves the width and height. Pooling layers do not have any weights to train, as they just decrease the size of the input. The sample image below depicts how pooling works.
Fully Connected Layer: The output of the last pooling layer is flattened into a one-dimensional array and then given as input to the fully connected layers. The final layer is followed by an activation function, generally sigmoid or softmax for classification tasks, to predict the object class.
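The layers described above can be chained into a tiny forward pass. This is a minimal NumPy sketch, not a trainable model: it assumes a hypothetical 8x8 grayscale input, 4 random 3x3 filters, 2x2 max pooling, and a single fully connected layer with softmax over 10 hypothetical classes. All helper names are illustrative, and the weights are random here, whereas a real CNN learns them by training (for example with a library such as Keras).

```python
import numpy as np

rng = np.random.default_rng(42)

def conv2d(x, filters):
    """Valid, stride-1 cross-correlation of a 2D input with a stack of 2D filters."""
    n, fh, fw = filters.shape
    h, w = x.shape
    out = np.zeros((n, h - fh + 1, w - fw + 1))
    for k in range(n):
        for i in range(h - fh + 1):
            for j in range(w - fw + 1):
                out[k, i, j] = np.sum(x[i:i + fh, j:j + fw] * filters[k])
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over the last two axes (no learnable weights)."""
    n, h, w = x.shape
    h, w = h // size * size, w // size * size  # drop rows/cols that do not fit
    x = x[:, :h, :w].reshape(n, h // size, size, w // size, size)
    return x.max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

image = rng.random((8, 8))
filters = rng.standard_normal((4, 3, 3))

features = max_pool(relu(conv2d(image, filters)))  # (4, 6, 6) -> (4, 3, 3)
flat = features.reshape(-1)                        # flattened to 36 values
weights = rng.standard_normal((10, flat.size))     # fully connected layer
bias = np.zeros(10)
probs = softmax(weights @ flat + bias)

print(features.shape, flat.shape, probs.shape)
print(probs.sum())  # class probabilities sum to 1 (up to rounding)
```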
The image below shows how an image is transformed as it passes through the various layers of a CNN.
We can see from the above image that convolution layers followed by pooling layers create a simpler representation of the original image, which is then used to classify it. Deep learning libraries like Keras, PyTorch, and TensorFlow provide ready-made layers for convolution and pooling operations, and Keras provides a very easy API to design CNNs. These libraries also provide a few famous CNN architectures along with their trained weights, which can be used for transfer learning on other, similar tasks.
We'll now list a few common applications of CNNs.
Below we have listed some famous CNN architectures. Libraries like Keras, PyTorch, and TensorFlow provide these architectures as well as their trained weights, which can be used for other image classification tasks.
The articles below were referred to while creating this article, and images were taken from these materials for explanation purposes. Readers can go through them to take their knowledge to the next level.