Artificial intelligence (AI) is the emulation of human intelligence in machines designed to behave and think like humans. The phrase can also refer to any computer system that demonstrates characteristics of the human intellect, such as learning and problem-solving. Through CoderzColumn AI tutorials, you will learn to code for these concepts: deep learning, image classification, object detection, image segmentation, text classification, text generation, interpreting predictions of deep neural networks, and learning rate scheduling.
For an in-depth understanding of the above concepts, check out the sections below.
Deep learning is a subfield of machine learning that uses deep neural networks to solve tasks. Neural networks with more than one hidden layer are generally referred to as deep neural networks.
Many real-world tasks, such as object detection, image classification, and image segmentation, are hard to solve well with simple machine learning models (decision trees, random forests, logistic regression, etc.). Research has shown that neural networks with many layers are good at these kinds of tasks involving unstructured data (images, text, audio, video, etc.). Apart from dense layers, modern deep neural networks can contain other kinds of layers, such as convolutional and recurrent layers.
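To make the idea of "more than one hidden layer" concrete, here is a minimal, dependency-free sketch of a forward pass through a network built only from dense layers. The layer sizes, the ReLU activation, the random initialisation, and all function names are assumptions chosen for illustration; in practice one of the deep learning libraries would be used instead.

```python
import random

def dense(inputs, weights, biases):
    """One dense (fully connected) layer: weights @ inputs + biases."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    """Run x through every layer; ReLU follows all layers except the last."""
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:
            x = relu(x)
    return x

rng = random.Random(0)
# 4 inputs -> hidden layer of 8 -> hidden layer of 8 -> 3 outputs
layer_sizes = [4, 8, 8, 3]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

output = forward([0.2, -1.0, 0.5, 0.7], layers)
print(len(layers) - 1, "hidden layers, output:", output)
```

The network is untrained, so the output values carry no meaning; the point is only that the input passes through two hidden layers before reaching the output layer.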
Python has many popular deep learning libraries (PyTorch, Keras, JAX, Flax, MXNet, TensorFlow, Sonnet, Haiku, PyTorch Lightning, SciKeras, skorch, etc.) that let us create deep neural networks to solve complicated tasks.
Image classification is a sub-field of computer vision and image processing that identifies the object present in an image and assigns the image a label based on it. Image classification generally works on images that contain a single object.
Over the years, many deep neural networks (VGG, ResNet, AlexNet, MobileNet, etc.) have been developed that solve image classification tasks with high accuracy. Because of this, many Python deep learning libraries now provide these networks. We can simply load them with pre-trained weights and use them to make predictions.
The Python libraries PyTorch and MXNet have helper modules named 'torchvision' and 'gluoncv', respectively, that provide implementations of image classification networks.
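Whichever library supplies the network, a classification model typically outputs one raw score per class, and turning those scores into a label is a small post-processing step. The sketch below shows that step only. The scores and class names are made up, and `softmax` and `predict_label` are hypothetical helper names, not part of any library.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical network output for one image over three hypothetical classes.
class_names = ["cat", "dog", "bird"]
logits = [1.2, 3.4, 0.3]

label, prob = predict_label(logits, class_names)
print(label, round(prob, 3))
```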
Object detection is a sub-field of computer vision and image processing that detects the presence of one or more objects in an image and draws a labeled bounding box around each of them. It detects the semantic objects present in an image.
Object detection has many applications like image captioning, image annotation, vehicle counting, activity recognition, face detection, etc.
Over the years, many deep neural networks (R-CNN, Faster R-CNN, SSD, RetinaNet, YOLO, etc.) have been developed to solve object detection tasks, and they perform well on them.
The Python deep learning libraries PyTorch and MXNet provide implementations of several of these algorithms through the helper modules 'torchvision' and 'gluoncv'. We can simply load these neural networks with pre-trained weights and make predictions by giving them input images.
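A detector's prediction for one image is usually a set of boxes, each with a label and a confidence score, and low-confidence boxes are commonly dropped before drawing. The sketch below illustrates that filtering step. The dictionary layout loosely follows the boxes/labels/scores format that torchvision's detection models return, but the values, the string labels, the 0.5 threshold, and the helper names are all assumptions made for illustration.

```python
def filter_detections(detection, score_threshold=0.5):
    """Keep only the detections whose confidence score reaches the threshold."""
    return [
        (box, label, score)
        for box, label, score in zip(
            detection["boxes"], detection["labels"], detection["scores"]
        )
        if score >= score_threshold
    ]

def box_area(box):
    """Area in pixels of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)

# Made-up prediction for one image: boxes are (x_min, y_min, x_max, y_max) in pixels.
detection = {
    "boxes": [(10, 20, 110, 220), (200, 40, 260, 90), (5, 5, 15, 15)],
    "labels": ["person", "dog", "car"],
    "scores": [0.92, 0.61, 0.18],
}

for box, label, score in filter_detections(detection, score_threshold=0.5):
    print(f"{label}: score={score:.2f}, box={box}, area={box_area(box)}")
```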
Image segmentation, as the name suggests, is the process of segmenting an image into sections, where the sections are generally objects like humans, animals, cars, traffic lights, etc. During the segmentation process, we typically assign a label to each pixel such that all pixels of one object share the same label.
Image segmentation has various applications like face detection, video surveillance, medical imaging, pedestrian detection, detecting objects in satellite images, etc.
Image segmentation can be solved using various non-ML methods (thresholding-based methods, clustering-based methods, histogram-based methods, etc.) as well as ML algorithms (U-Net, FastFCN, Mask R-CNN, DeepLab, LRASPP, etc.) developed over the years.
The Python library scikit-image provides implementations of various non-ML image segmentation methods. For ML-based approaches, the Python deep learning libraries PyTorch and MXNet have helper modules named torchvision and gluoncv that provide pre-trained deep learning models for image segmentation.
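The simplest of the non-ML approaches, thresholding, can be sketched without any library: every pixel brighter than a threshold gets the label 1 (object) and every other pixel gets 0 (background). The tiny image and the midpoint rule for picking the threshold are assumptions for illustration; scikit-image offers more robust choices such as `skimage.filters.threshold_otsu`.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (object) if brighter than threshold, else 0 (background)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Made-up 5x5 grayscale image (0-255) with a bright blob on a dark background.
image = [
    [12, 15, 10, 11, 14],
    [13, 200, 210, 12, 10],
    [11, 205, 220, 215, 13],
    [10, 14, 208, 12, 11],
    [12, 13, 11, 10, 15],
]

# Simple threshold choice: the midpoint between the darkest and brightest pixel.
pixels = [p for row in image for p in row]
threshold = (min(pixels) + max(pixels)) / 2

mask = threshold_segment(image, threshold)
for row in mask:
    print(row)
```

The resulting mask has the same shape as the image, which is the per-pixel labeling described above reduced to two classes.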
Text classification, also referred to as document classification, is a problem in computer science where each text document is assigned a category or label based on its content.
To classify text documents using deep neural networks, the text content of the documents needs to be converted to real values. There are many approaches to converting text data to real-valued data, such as bag of words (word frequency, one-hot encoding, etc.), TF-IDF, word embeddings, character embeddings, etc.
Once data is converted to real-valued data, deep neural networks of different types (Multi-Layer Perceptron, CNN, LSTM, etc) can be used to classify text documents. Neural networks can be designed with any Python deep learning library. Many libraries provide helper functionalities to handle text data.
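As an illustration of the conversion step, the following sketch builds word-frequency (bag-of-words) vectors with only the standard library. The tokenizer is deliberately naive and the two documents are made up; `tokenize`, `build_vocabulary`, and `vectorize` are hypothetical helper names rather than functions from any text-processing library.

```python
from collections import Counter

def tokenize(text):
    """Lowercase the text, replace punctuation with spaces, and split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()

def build_vocabulary(documents):
    """Map every distinct word to a column index (sorted for reproducibility)."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def vectorize(text, vocabulary):
    """Word-frequency vector: one count per vocabulary word."""
    counts = Counter(tokenize(text))
    vector = [0.0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:  # words missing from the vocabulary are ignored
            vector[vocabulary[word]] = float(count)
    return vector

documents = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
]

vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]
print(vocabulary)
print(vectors)
```

Each document becomes a fixed-length list of real values, which is the form a neural network's input layer expects.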
Text generation, also referred to as natural language generation, is a sub-field of natural language processing (NLP) concerned with generating new text. Text generation has various applications, such as generating reports, image captions, and chatbot responses.
Deep neural networks, especially recurrent neural networks (RNNs) and their variants (LSTM, GRU, etc.), have been shown to give good results on text generation tasks due to their ability to remember sequences.
Python deep learning libraries such as PyTorch, Keras, MXNet, and Flax can be used to design RNNs to solve text generation tasks.
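The "memory" of a recurrent network comes from a hidden state that is updated at every step and fed back in at the next one. The sketch below shows only that recurrent update for a simple (Elman-style) RNN cell, not a full text generator. The weights are random and untrained, and the three-character vocabulary, hidden size, and function names are assumptions for illustration.

```python
import math
import random

def rnn_step(x, h, w_xh, w_hh, b_h):
    """One step of a simple RNN: h_new = tanh(W_xh x + W_hh h + b_h)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hi for w, hi in zip(w_hh[j], h))
            + b_h[j]
        )
        for j in range(len(h))
    ]

def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given index."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector

def encode(text, vocab, w_xh, w_hh, b_h):
    """Feed characters one at a time; the hidden state carries context forward."""
    h = [0.0] * len(b_h)
    for ch in text:
        h = rnn_step(one_hot(vocab[ch], len(vocab)), h, w_xh, w_hh, b_h)
    return h

rng = random.Random(0)
vocab = {ch: i for i, ch in enumerate("abc")}
hidden_size = 4

# Untrained, randomly initialised weights (illustration only).
w_xh = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(hidden_size)]
w_hh = [[rng.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
b_h = [0.0] * hidden_size

# Both inputs end in "c", but the hidden states differ because the history differs.
print(encode("abc", vocab, w_xh, w_hh, b_h))
print(encode("bac", vocab, w_xh, w_hh, b_h))
```

In a real text generator, an output layer on top of the hidden state would predict the next character or word, and LSTM or GRU cells would typically replace this plain update.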
After training a deep neural network, we generally evaluate the model's performance by calculating and visualizing various ML metrics (confusion matrix, ROC curve with AUC, precision-recall curve, silhouette analysis, elbow method, etc.).
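As a small illustration of one of these metrics, the sketch below computes a confusion matrix and reads precision and recall for one class off it. The labels and predictions are made up, and the helper names are hypothetical; library implementations would normally be used in practice.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def precision_recall(matrix, class_index):
    """Precision and recall for one class, read off the confusion matrix."""
    true_positive = matrix[class_index][class_index]
    predicted_positive = sum(row[class_index] for row in matrix)
    actual_positive = sum(matrix[class_index])
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    recall = true_positive / actual_positive if actual_positive else 0.0
    return precision, recall

# Made-up labels and predictions for a two-class image classifier.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "dog", "dog", "cat", "dog", "cat"]

matrix = confusion_matrix(y_true, y_pred, labels)
precision, recall = precision_recall(matrix, labels.index("cat"))
print(matrix)
print(f"cat precision={precision:.2f}, recall={recall:.2f}")
```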
These metrics are normally a good starting point, but in many situations they don't give the full picture of model performance. For example, a simple cat vs. dog image classifier may be using background pixels to classify images instead of the pixels of the actual object (cat or dog).
In such situations, the ML metrics can still look good, so we should always remain a little skeptical of reported model performance.
We can dig deeper and try to understand how the model behaves on an individual example by interpreting its predictions. Various algorithms have been developed over time to interpret the predictions of deep neural networks, and many Python libraries (lime, eli5, treeinterpreter, shap, captum, etc.) provide implementations of them.
Learning rate is one of the most important hyperparameters when training deep neural networks. A good learning rate can help a neural network converge much faster.
Various experiments have shown that varying the learning rate during training often gives better results than keeping it constant throughout the training process. A common recommendation is to gradually reduce the learning rate over time. The process of changing the learning rate during training is referred to as learning rate scheduling or learning rate annealing. The learning rate can be changed after each epoch or after each batch.
Python deep learning libraries provide various ways to perform learning rate scheduling. Common LR scheduling techniques like step LR, exponential LR, lambda LR, cyclic LR, and cosine LR are available in most libraries.
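To show what a schedule actually computes, here is a library-free sketch of three of the techniques named above, written as plain functions of the epoch number. The formulas follow the usual definitions of step, exponential, and cosine decay; the default values (`step_size=10`, `gamma`, `min_lr`) and the function names are assumptions for illustration and do not reproduce any particular library's API.

```python
import math

def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def exponential_lr(base_lr, epoch, gamma=0.9):
    """Multiply the learning rate by gamma after every epoch."""
    return base_lr * gamma ** epoch

def cosine_lr(base_lr, epoch, total_epochs, min_lr=0.0):
    """Decay from base_lr to min_lr along half a cosine wave."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

base_lr, total_epochs = 0.1, 30
for epoch in range(0, total_epochs + 1, 5):
    print(
        f"epoch {epoch:2d}  "
        f"step={step_lr(base_lr, epoch):.5f}  "
        f"exp={exponential_lr(base_lr, epoch):.5f}  "
        f"cosine={cosine_lr(base_lr, epoch, total_epochs):.5f}"
    )
```

In a training loop, the value returned for the current epoch (or batch) would be assigned to the optimizer's learning rate before the next update.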