Updated On : Jul-13,2022 Tags pytorch, object-detectio…

PyTorch: Object Detection using Pre-Trained Models

Object detection is an active research area of computer vision and image processing that finds objects of certain classes in an image. A detector typically locates each object, draws a bounding box around it, and labels it. It has many applications like image annotation, face detection, object tracking, vehicle counting, etc. Over time, many approaches have been developed for object detection, both with and without neural networks. The approaches using deep neural networks are quite accurate, though these networks are complicated and require a lot of data to train. Due to this, many deep learning libraries (PyTorch, MXNet (GluonCV), TensorFlow, OpenCV, etc.) provide implementations of these networks with pre-trained weights. We can use them directly if our classes are among the classes they were trained on; otherwise, we can fine-tune them by training a little on our own dataset, even one with only a few categories.

As a part of this article, we explain how to use the pre-trained object detection networks available from PyTorch. The tutorial shows how we can take any image from the internet and look for objects in it using these pre-trained PyTorch models. We can load these networks with pre-trained weights, or without them if we have enough data to train the architectures ourselves. Currently, PyTorch provides the below-mentioned models for object detection.

  1. Faster R-CNN
  2. FCOS (Fully Convolutional One-Stage Object Detection)
  3. RetinaNet
  4. SSD (Single Shot MultiBox Detector)
  5. SSDlite

These pre-trained networks make the task of object detection quite easy if you don't want to train complicated neural networks. These networks use pre-trained image classification neural networks like ResNet, MobileNet, VGG, etc. as backbones for extracting object features. The majority of these object detection networks are trained on the COCO dataset, and the image classification networks are trained on the ImageNet dataset.

Below, we have listed essential tutorial sections to give an overview of the material covered.

Important Sections Of Tutorial

  1. Load Sample Images
    • 1.1 Download Sample Images
    • 1.2 Load Images as Pillow Images
    • 1.3 Convert Pillow Images to Torch Tensors
    • 1.4 Add Batch Dimension
    • 1.5 Convert Images Represented as Integer (0-255) to Floats (0-1)
  2. Load Pre-Trained PyTorch Model (Faster R-CNN with ResNet50 Backbone)
  3. Make Predictions
  4. Visualize Results
    • 4.1 Load Target Classes Mapping
    • 4.2 Map Target Category Ids to Labels
    • 4.3 Visualize Bounding Boxes On Original Images
  5. Try Other Pre-Trained Models

Below, we have imported the necessary Python libraries that we'll use for our tutorial. We have also printed the versions that we have used in our tutorial.

import torch

print("PyTorch Version : {}".format(torch.__version__))
PyTorch Version : 1.11.0+cpu
import torchvision

print("TorchVision Version : {}".format(torchvision.__version__))
TorchVision Version : 0.12.0+cpu

Pycocotools Installation

  • !pip install pycocotools
import pycocotools

1. Load Sample Images

In this section, we have simply downloaded a few images from the internet, loaded them as Pillow images, and then converted them to torch tensors for giving them to the model for prediction. We always need to convert images to tensors before giving them to the network, as it works on tensors only.

1.1 Download Sample Images

Below, we have simply downloaded two images from the Internet using wget shell command. We have selected images randomly. Please feel free to use your images if you have some ready that you want to try.

!wget https://images.click.in/classifieds/images/95/30_12_2017_15_58_25_562aaa7a9b6593ce55f7e59cae781674_vpwodzncbi.jpg
!wget https://gumlet.assettype.com/freepressjournal/import/2016/07/kids-playing.jpg
--2022-07-13 05:54:27--  https://images.click.in/classifieds/images/95/30_12_2017_15_58_25_562aaa7a9b6593ce55f7e59cae781674_vpwodzncbi.jpg
Resolving images.click.in (images.click.in)... 23.55.209.233
Connecting to images.click.in (images.click.in)|23.55.209.233|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 61994 (61K) [image/jpeg]
Saving to: ‘30_12_2017_15_58_25_562aaa7a9b6593ce55f7e59cae781674_vpwodzncbi.jpg’

30_12_2017_15_58_25 100%[===================>]  60.54K   400KB/s    in 0.2s

2022-07-13 05:54:28 (400 KB/s) - ‘30_12_2017_15_58_25_562aaa7a9b6593ce55f7e59cae781674_vpwodzncbi.jpg’ saved [61994/61994]

--2022-07-13 05:54:29--  https://gumlet.assettype.com/freepressjournal/import/2016/07/kids-playing.jpg
Resolving gumlet.assettype.com (gumlet.assettype.com)... 146.75.37.55, 2a04:4e42:77::311
Connecting to gumlet.assettype.com (gumlet.assettype.com)|146.75.37.55|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 92078 (90K) [image/jpeg]
Saving to: ‘kids-playing.jpg’

kids-playing.jpg    100%[===================>]  89.92K  --.-KB/s    in 0.1s

2022-07-13 05:54:29 (741 KB/s) - ‘kids-playing.jpg’ saved [92078/92078]

!mv 30_12_2017_15_58_25_562aaa7a9b6593ce55f7e59cae781674_vpwodzncbi.jpg holiday.jpg

%ls
__notebook__.ipynb  holiday.jpg  kids-playing.jpg

1.2 Load Images as Pillow Images

In this section, we have loaded images in memory using Python module pillow. After loading images, we have also displayed them. Both images have people in them. The first image is of the family on vacation and the second image has children playing ball. Next, we'll convert these images to torch tensors.

from PIL import Image

holiday = Image.open("holiday.jpg")

holiday


kids_playing = Image.open("kids-playing.jpg")

kids_playing


1.3 Convert Pillow Images to Torch Tensors

In this section, we have simply converted Pillow images to torch tensors. We have used the functional API of the torchvision module for this. The transforms.functional sub-module of torchvision has a function named pil_to_tensor() that converts a Pillow image to a tensor. The tensors have shape (channels, height, width). As our images are RGB, we have 3 channels.

from torchvision.transforms.functional import pil_to_tensor

holiday_tensor_int = pil_to_tensor(holiday)
kids_playing_tensor_int = pil_to_tensor(kids_playing)

holiday_tensor_int.shape, kids_playing_tensor_int.shape
(torch.Size([3, 340, 450]), torch.Size([3, 533, 800]))

1.4 Add Batch Dimension

In this section, we have simply added one dimension at the beginning of our images which is the batch dimension. We added this dimension because models work on batches of images.

holiday_tensor_int = holiday_tensor_int.unsqueeze(dim=0)
kids_playing_tensor_int = kids_playing_tensor_int.unsqueeze(dim=0)

holiday_tensor_int.shape, kids_playing_tensor_int.shape
(torch.Size([1, 3, 340, 450]), torch.Size([1, 3, 533, 800]))
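
The sketch below illustrates the batch dimension on dummy tensors with the same shapes as our two images. It also shows why the two images cannot be combined into a single batch tensor: they differ in size. As far as we know, torchvision's detection models are documented as taking a list of (channels, height, width) tensors, so a plain Python list is one way to pass differently sized images together; treat that as an aside to check against your torchvision version.

```python
import torch

# Dummy integer images with the same shapes as the two tutorial images.
image_a = torch.zeros(3, 340, 450, dtype=torch.uint8)
image_b = torch.zeros(3, 533, 800, dtype=torch.uint8)

# unsqueeze(dim=0) and indexing with None both add a leading batch dimension.
batch_a = image_a.unsqueeze(dim=0)
batch_a_alt = image_a[None]
print(tuple(batch_a.shape), tuple(batch_a_alt.shape))

# Images of different sizes cannot be stacked into one batch tensor.
try:
    torch.stack([image_a, image_b])
    stacked_ok = True
except RuntimeError:
    stacked_ok = False
print("Stacking different sizes worked:", stacked_ok)

# A plain Python list of (channels, height, width) tensors has no such restriction.
image_list = [image_a, image_b]
print([tuple(img.shape) for img in image_list])
```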

1.5 Convert Images Represented as Integer (0-255) to Floats (0-1)

By default, the image tensors are integer tensors that have values in the range 0-255. The PyTorch pre-trained models are generally trained on images represented as float tensors with values in the range 0-1. So, we have created new copies of our images represented as float tensors by dividing the integer tensors by 255. We have kept the integer tensors as well because we'll need them later when plotting the images with bounding boxes.

print(holiday_tensor_int.min(), holiday_tensor_int.max())

holiday_tensor_float = holiday_tensor_int / 255.0
kids_playing_tensor_float = kids_playing_tensor_int / 255.0

print(holiday_tensor_float.min(), holiday_tensor_float.max())
tensor(0, dtype=torch.uint8) tensor(255, dtype=torch.uint8)
tensor(0.) tensor(1.)
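
As a small sanity check of this conversion, the sketch below uses a random integer tensor (a stand-in for a real image) to show that dividing by 255.0 produces a float32 tensor in the range 0-1, and that the integer version can be recovered by scaling back up and rounding.

```python
import torch

# Random integer image as a stand-in for a real photo (assumption for illustration).
image_int = torch.randint(0, 256, (3, 4, 5), dtype=torch.uint8)

# True division by a Python float promotes uint8 to float32.
image_float = image_int / 255.0
print(image_float.dtype)
print(float(image_float.min()) >= 0.0, float(image_float.max()) <= 1.0)

# Going back: scale up, round away floating-point error, and cast to uint8.
image_restored = (image_float * 255.0).round().to(torch.uint8)
print(torch.equal(image_restored, image_int))
```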

2. Load Pre-Trained PyTorch Model (Faster R-CNN with ResNet50 Backbone)

In this section, we have loaded our first pre-trained PyTorch model. The pre-trained models are available from sub-modules of the models module of the torchvision library. PyTorch has a separate library, torchvision, for working with vision-related tasks. It provides helper functions to simplify tasks related to computer vision.

The sub-module named detection provides various functions that can be called to load pre-trained object detection models. We have loaded the Faster R-CNN model for our purpose. It uses a ResNet-50-FPN (Feature Pyramid Network) backbone for extracting features from images. We have loaded the network by calling the function fasterrcnn_resnet50_fpn(). We have set the parameter pretrained to True because we want a network with trained weights. Currently, by default, it loads weights of a model trained on the COCO dataset, which has around 91 categories of objects.

If you have enough data and can train the network yourself, you can load just the architecture by setting pretrained to False.

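After loading the model, we have set it in evaluation mode by calling eval() method. This switches layers such as batch normalization and dropout (where present) to their inference behavior and makes the detection model return predictions instead of training losses. Note that eval() does not disable gradient tracking (the grad_fn entries in the prediction outputs below show this); wrap the prediction call in torch.no_grad() if you want to skip it.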

from torchvision.models.detection import fasterrcnn_resnet50_fpn

object_detection_model = fasterrcnn_resnet50_fpn(pretrained=True, progress=False)

object_detection_model.eval(); ## Setting Model for Evaluation/Prediction
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
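
To make the difference between evaluation mode and gradient tracking concrete, here is a minimal sketch on a tiny stand-in network (an assumption for illustration, not the detection model). eval() flips the module's training flag, while gradient tracking is only switched off inside torch.no_grad().

```python
import torch
from torch import nn

# Tiny stand-in network (an assumption for illustration, not the detection model).
tiny_model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(p=0.5))
tiny_model.eval()  # dropout is now disabled; tiny_model.training is False

sample_input = torch.rand(1, 4)

# Evaluation mode alone still tracks gradients through the model parameters.
tracked_output = tiny_model(sample_input)

# Inside torch.no_grad(), no computation graph is recorded.
with torch.no_grad():
    untracked_output = tiny_model(sample_input)

print(tiny_model.training)             # False
print(tracked_output.requires_grad)    # True
print(untracked_output.requires_grad)  # False
```

The same pattern applies to the detection model: calling it inside a torch.no_grad() block should give predictions without grad_fn attached.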

3. Make Predictions

Here, we have made predictions on our images using the pre-trained model. We have made predictions on both images. For each image, the model returns a dictionary with three keys.

  • boxes - It has shape (no_of_objects, 4). The 4 numbers for each object represent the bounding box around it as (x_min, y_min, x_max, y_max), i.e. the top-left corner followed by the bottom-right corner, in pixels.
  • labels - It has the integer category ids of the detected objects. We'll convert them to class names later using the mapping from the COCO annotations.
  • scores - It has the confidence score (between 0 and 1) of each detection, indicating how confident the model was in detecting that object.
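
To illustrate the box format, the sketch below takes two boxes copied from the holiday image output printed later in this section and derives their width, height, and area from the corner coordinates. The variable names are our own.

```python
import torch

# Two boxes copied from the holiday image predictions printed in this section.
boxes = torch.tensor([[ 77.3332,  64.9794, 175.3670, 271.5141],
                      [259.4896, 203.4603, 270.6145, 215.5286]])

# Each row is (x_min, y_min, x_max, y_max): top-left corner, then bottom-right corner.
x_min, y_min, x_max, y_max = boxes.unbind(dim=1)

widths = x_max - x_min
heights = y_max - y_min
areas = widths * heights

for w, h, a in zip(widths.tolist(), heights.tolist(), areas.tolist()):
    print("width={:.1f} height={:.1f} area={:.1f}".format(w, h, a))
```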

After making predictions, we have removed objects where the model has a prediction probability less than 0.8.

holiday_preds = object_detection_model(holiday_tensor_float)

holiday_preds
[{'boxes': tensor([[ 77.3332,  64.9794, 175.3670, 271.5141],
          [298.0679,  70.5143, 376.4788, 284.1354],
          [245.2462, 139.0273, 317.0147, 283.9664],
          [159.3096, 112.5192, 227.8741, 275.4441],
          [259.4896, 203.4603, 270.6145, 215.5286],
          [244.8168, 185.8696, 264.0533, 245.4257],
          [ 82.5829, 134.4643, 102.1378, 160.6974],
          [ 82.8002, 137.4602, 101.8921, 160.4365],
          [ 83.7411, 142.8226,  99.1271, 159.4648],
          [291.8989, 213.4655, 299.3017, 220.6194],
          [349.4891, 178.3998, 376.2213, 225.5978],
          [245.0587, 182.9415, 263.7846, 247.4702],
          [261.6569, 202.7805, 269.9643, 211.0015]], grad_fn=<StackBackward0>),
  'labels': tensor([ 1,  1,  1,  1, 34, 34, 37, 34, 37, 37, 34, 42, 34]),
  'scores': tensor([0.9998, 0.9997, 0.9996, 0.9995, 0.9752, 0.8965, 0.8015, 0.5139, 0.4637,
          0.3935, 0.3100, 0.2015, 0.0652], grad_fn=<IndexBackward0>)}]
keep = holiday_preds[0]["scores"] > 0.8 ## Compute the mask once, before overwriting any tensor
holiday_preds[0]["boxes"] = holiday_preds[0]["boxes"][keep]
holiday_preds[0]["labels"] = holiday_preds[0]["labels"][keep]
holiday_preds[0]["scores"] = holiday_preds[0]["scores"][keep]

holiday_preds
[{'boxes': tensor([[ 77.3332,  64.9794, 175.3670, 271.5141],
          [298.0679,  70.5143, 376.4788, 284.1354],
          [245.2462, 139.0273, 317.0147, 283.9664],
          [159.3096, 112.5192, 227.8741, 275.4441],
          [259.4896, 203.4603, 270.6145, 215.5286],
          [244.8168, 185.8696, 264.0533, 245.4257],
          [ 82.5829, 134.4643, 102.1378, 160.6974]], grad_fn=<IndexBackward0>),
  'labels': tensor([ 1,  1,  1,  1, 34, 34, 37]),
  'scores': tensor([0.9998, 0.9997, 0.9996, 0.9995, 0.9752, 0.8965, 0.8015],
         grad_fn=<IndexBackward0>)}]
kids_preds = object_detection_model(kids_playing_tensor_float)

kids_preds
[{'boxes': tensor([[143.3646,  93.6994, 308.6345, 414.2636],
          [531.8278,  70.2597, 691.4973, 454.3749],
          [287.3516, 104.1377, 479.3222, 420.0710],
          [419.3455, 116.3491, 566.5842, 425.7735],
          [437.6349, 403.0547, 534.5991, 496.3159],
          [390.8573, 131.6313, 501.7326, 414.3199],
          [435.9698, 123.1395, 467.6657, 200.2942],
          [434.9914, 191.5204, 456.2515, 209.0986]], grad_fn=<StackBackward0>),
  'labels': tensor([ 1,  1,  1,  1, 37,  1,  1, 77]),
  'scores': tensor([0.9998, 0.9998, 0.9985, 0.9975, 0.9898, 0.1213, 0.1156, 0.1084],
         grad_fn=<IndexBackward0>)}]
keep = kids_preds[0]["scores"] > 0.8 ## Compute the mask once, before overwriting any tensor
kids_preds[0]["boxes"] = kids_preds[0]["boxes"][keep]
kids_preds[0]["labels"] = kids_preds[0]["labels"][keep]
kids_preds[0]["scores"] = kids_preds[0]["scores"][keep]

kids_preds
[{'boxes': tensor([[143.3646,  93.6994, 308.6345, 414.2636],
          [531.8278,  70.2597, 691.4973, 454.3749],
          [287.3516, 104.1377, 479.3222, 420.0710],
          [419.3455, 116.3491, 566.5842, 425.7735],
          [437.6349, 403.0547, 534.5991, 496.3159]], grad_fn=<IndexBackward0>),
  'labels': tensor([ 1,  1,  1,  1, 37]),
  'scores': tensor([0.9998, 0.9998, 0.9985, 0.9975, 0.9898], grad_fn=<IndexBackward0>)}]

4. Visualize Results

Now, at last, we'll visualize prediction results. In order to do that, we first need to retrieve the mapping of integer target labels to their actual string target labels. We'll download and load mapping from COCO website first.

4.1 Load Target Classes Mapping

Below, we have downloaded annotations from COCO website as a zip file and then unzipped it. You'll need unzip command installed on your computer for running the below cell. If you don't have unzip installed and you want to do it using Python then you can use zipfile module as well.

!wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
!unzip annotations_trainval2017.zip
--2022-07-13 05:54:50--  http://images.cocodataset.org/annotations/annotations_trainval2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.217.226.241
Connecting to images.cocodataset.org (images.cocodataset.org)|52.217.226.241|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252907541 (241M) [application/zip]
Saving to: ‘annotations_trainval2017.zip’

annotations_trainva 100%[===================>] 241.19M  30.7MB/s    in 8.5s

2022-07-13 05:54:59 (28.4 MB/s) - ‘annotations_trainval2017.zip’ saved [252907541/252907541]

Archive:  annotations_trainval2017.zip
  inflating: annotations/instances_train2017.json
  inflating: annotations/instances_val2017.json
  inflating: annotations/captions_train2017.json
  inflating: annotations/captions_val2017.json
  inflating: annotations/person_keypoints_train2017.json
  inflating: annotations/person_keypoints_val2017.json
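
If the unzip command is not available, the same extraction can be done with Python's built-in zipfile module. The helper below is a minimal sketch; the function name extract_archive is our own, and the call at the bottom only runs when the archive has already been downloaded to the current directory.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir="."):
    """Extract every member of zip_path into target_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


archive_path = Path("annotations_trainval2017.zip")
if archive_path.exists():  # only runs when the archive has already been downloaded
    for name in extract_archive(archive_path):
        print("extracted:", name)
```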

Below, we have created an instance of COCO from pycocotools library. We had provided information for installing this library earlier. It provides an API to access various datasets and annotation files available from COCO. This coco object has various methods that can be very helpful. We'll use one such method next to retrieve the actual labels of our target classes.

from pycocotools.coco import COCO

annFile='annotations/instances_val2017.json'

coco=COCO(annFile)
loading annotations into memory...
Done (t=0.84s)
creating index...
index created!

4.2 Map Target Category Ids to Labels

Below, we have retrieved actual string target labels by calling loadCats() method on COCO object. We have provided original integer labels to the method. Next, we'll use these string labels when displaying predictions.

holiday_labels = coco.loadCats(holiday_preds[0]["labels"].numpy())

holiday_labels
[{'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'sports', 'id': 34, 'name': 'frisbee'},
 {'supercategory': 'sports', 'id': 34, 'name': 'frisbee'},
 {'supercategory': 'sports', 'id': 37, 'name': 'sports ball'}]
kids_labels = coco.loadCats(kids_preds[0]["labels"].numpy())

kids_labels
[{'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'person', 'id': 1, 'name': 'person'},
 {'supercategory': 'sports', 'id': 37, 'name': 'sports ball'}]
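
If you only need the id-to-name mapping, it can also be read straight from the annotation JSON without pycocotools, since the file stores a "categories" list with the same dictionaries shown above. The sketch below is a minimal version under that assumption; build_id_to_name is a helper name of our own, and the small sample dictionary is hand-written for illustration.

```python
import json
from pathlib import Path


def build_id_to_name(annotations):
    """Map each category id to its name from a COCO-style annotation dict."""
    return {category["id"]: category["name"] for category in annotations["categories"]}


# Tiny hand-written sample in the same layout as the "categories" list (for illustration).
sample_annotations = {
    "categories": [
        {"supercategory": "person", "id": 1, "name": "person"},
        {"supercategory": "sports", "id": 34, "name": "frisbee"},
        {"supercategory": "sports", "id": 37, "name": "sports ball"},
    ]
}
id_to_name = build_id_to_name(sample_annotations)
print([id_to_name[label] for label in [1, 1, 34, 37]])
# ['person', 'person', 'frisbee', 'sports ball']

annotation_file = Path("annotations/instances_val2017.json")
if annotation_file.exists():  # only runs once the annotations have been unzipped
    with annotation_file.open() as f:
        full_id_to_name = build_id_to_name(json.load(f))
    print(len(full_id_to_name), "categories loaded")
```

With a prediction tensor, the labels can be converted first with .tolist() so that the dictionary is indexed with plain Python integers.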

4.3 Visualize Bounding Boxes On Original Images

Here, we are visualizing images with the detected objects surrounded by bounding boxes, each with a caption showing the class name and score. Below, we have first merged the labels and their scores to create these captions.

Then, we have called the utility visualization function named draw_bounding_boxes() provided by the utils sub-module of torchvision. We have provided the function with our image (as an integer tensor), bounding boxes, modified labels, and a list of colors. We have asked it to color "person" objects red and all other objects green. The output of the function is a torch tensor of the image with bounding boxes and labels drawn on it.

To visualize the image, we have converted the tensor to a Pillow image by calling the to_pil_image() function available from the functional API of torchvision.

We have visualized both images. We can notice that in the case of the first image, it correctly predicts the "person" objects. However, it also makes a few mistakes like predicting a mat as a sports ball and a float tube as a Frisbee, and not recognizing the hat. For the second image, it correctly detects all objects. It is even able to detect a child partly hidden behind other children.

from torchvision.utils import draw_bounding_boxes

holiday_annot_labels = ["{}-{:.2f}".format(label["name"], prob) for label, prob in zip(holiday_labels, holiday_preds[0]["scores"].detach().numpy())]

holiday_output = draw_bounding_boxes(image=holiday_tensor_int[0],
                             boxes=holiday_preds[0]["boxes"],
                             labels=holiday_annot_labels,
                             colors=["red" if label["name"]=="person" else "green" for label in holiday_labels],
                             width=2
                            )

holiday_output.shape
torch.Size([3, 340, 450])
from torchvision.transforms.functional import to_pil_image

to_pil_image(holiday_output)


from torchvision.utils import draw_bounding_boxes

kids_annot_labels = ["{}-{:.2f}".format(label["name"], prob) for label, prob in zip(kids_labels, kids_preds[0]["scores"].detach().numpy())]

kids_output = draw_bounding_boxes(image=kids_playing_tensor_int[0],
                             boxes=kids_preds[0]["boxes"],
                             labels=kids_annot_labels,
                             colors=["red" if label["name"]=="person" else "green" for label in kids_labels],
                             width=2,
                             font_size=16,
                             fill=True
                            )

to_pil_image(kids_output)
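
Since the caption and color logic is repeated for both images, it could be factored into a small helper. The sketch below is one possible way to do that; make_annotations and its highlight parameter are names of our own, and the sample inputs are copied from the kids image results above.

```python
def make_annotations(category_dicts, scores, highlight="person"):
    """Return (labels, colors) lists in the form passed to draw_bounding_boxes()."""
    labels = ["{}-{:.2f}".format(cat["name"], float(score))
              for cat, score in zip(category_dicts, scores)]
    colors = ["red" if cat["name"] == highlight else "green" for cat in category_dicts]
    return labels, colors


# Sample inputs copied from the kids image results (first person and the ball).
sample_categories = [{"supercategory": "person", "id": 1, "name": "person"},
                     {"supercategory": "sports", "id": 37, "name": "sports ball"}]
sample_scores = [0.9998, 0.9898]

labels, colors = make_annotations(sample_categories, sample_scores)
print(labels)  # ['person-1.00', 'sports ball-0.99']
print(colors)  # ['red', 'green']
```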


5. Try Other Pre-Trained Models

The torchvision module provides implementations of other detection models as well, which we have imported below. We suggest readers try them if the above model does not give good results on their images.

from torchvision.models.detection import fasterrcnn_mobilenet_v3_large_320_fpn,\
                                         fasterrcnn_mobilenet_v3_large_fpn,\
                                         fcos_resnet50_fpn,\
                                         ssdlite320_mobilenet_v3_large,\
                                         ssd300_vgg16,\
                                         retinanet_resnet50_fpn
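
To make swapping models easier, the steps from the earlier sections can be wrapped in one function. The sketch below is a minimal version that assumes the model follows the same interface as the Faster R-CNN model used above (a list of float image tensors in, a list of dictionaries with boxes, labels, and scores out). The detect function and the StubDetector class are our own; the stub only returns fixed values so that the example runs without downloading any weights.

```python
import torch


def detect(model, image_float, score_threshold=0.8):
    """Run one (channels, height, width) float image through a detection model
    and keep only detections scoring above score_threshold."""
    model.eval()
    with torch.no_grad():  # no gradients are needed for inference
        prediction = model([image_float])[0]
    keep = prediction["scores"] > score_threshold
    return {key: value[keep] for key, value in prediction.items()}


class StubDetector(torch.nn.Module):
    """Stand-in returning fixed output in the same format, so the sketch runs offline."""

    def forward(self, images):
        return [{"boxes": torch.tensor([[10.0, 20.0, 50.0, 80.0], [5.0, 5.0, 9.0, 9.0]]),
                 "labels": torch.tensor([1, 37]),
                 "scores": torch.tensor([0.95, 0.30])} for _ in images]


result = detect(StubDetector(), torch.rand(3, 64, 64))
print(result["labels"], result["scores"])
```

With a real model, a call could look like detect(retinanet_resnet50_fpn(pretrained=True), holiday_tensor_float[0]), where the [0] index drops the batch dimension added earlier.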

Sunny Solanki
