Detection Algorithms

Object detection is among the most frequently used tasks in computer vision generally, and in drone navigation specifically. The library therefore ships with state-of-the-art (SOTA) detection algorithms in several implementations:

Faster Region-based CNN (R-CNN) (PyTorch)
CenterNet (MXNet)
You Only Look Once (YOLO) (MXNet)
Single-Shot Detector (SSD) (PyTorch, MXNet)

Detection on Your Web Camera

You can get started by feeding a video stream from your web camera (or any camera) with a few lines of code.

from dronevis.detection_torch import FasterRCNN

model = FasterRCNN()    # initialize model instance
model.load_model()      # load the model weights
model.detect_webcam()   # start video detection

A window pops up showing your webcam video stream, with boxes drawn around detected objects.

Note

The model weights need to be downloaded, so make sure you have a working internet connection. Once downloaded, the weights are stored in ~/.cache/torch/hub/checkpoints (on Ubuntu), and you don’t need to download them again.
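
To check whether weights are already cached before going offline, a small sketch like the following may help. It only lists the files in the directory named above; the helper name list_cached_weights is hypothetical and is not part of the library, and the default path is assumed to apply to Ubuntu only.

```python
from pathlib import Path

# Default location named in the note above (Ubuntu); other platforms may differ.
DEFAULT_CACHE = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"


def list_cached_weights(cache_dir=DEFAULT_CACHE):
    """Return the sorted names of files found in cache_dir.

    Hypothetical helper for illustration; it is not part of the library.
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing has been downloaded yet
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())
```

An empty list suggests that the first call to load_model() will need an internet connection.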

The model above runs with PyTorch, which automatically checks whether a GPU is available and loads the model accordingly. If you have multiple GPUs and want to use a specific one for detection, set the device property of the model to your desired choice (either "cuda:<device-index>" or "cpu"):

model.device = "cuda:1" # set second GPU (index=1) for inference
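
If the device string comes from user input or a config file, it can be worth validating it before assigning it to the model. The sketch below accepts only the two forms described above; parse_device is a hypothetical helper, not a library function, and it does not check that the GPU index actually exists on the machine.

```python
import re


def parse_device(spec):
    """Split a device string into (kind, index); index is None for "cpu".

    Hypothetical helper for illustration; it is not part of the library.
    """
    if spec == "cpu":
        return ("cpu", None)
    match = re.fullmatch(r"cuda:(\d+)", spec)
    if match is None:
        raise ValueError(f"expected 'cpu' or 'cuda:<device-index>', got {spec!r}")
    return ("cuda", int(match.group(1)))
```

A validated string can then be assigned as before, for example model.device = "cuda:1".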

Different Model Implementations

The library takes into account the numerous implementations found on the internet, and that users usually prefer one framework over another. Hence, detection models are currently built with two frameworks:

PyTorch
MXNet

Although the implementations differ, the most commonly used methods are unified across all models for ease of use. Each model has 4 main methods:

load_model : load the model weights from cache, or download them.
predict : run the model on an input image.
transform_img : run the model’s transformation on an input image.
detect_webcam : start detection on your webcam.

Abstract Models

To provide a unified interface for all detection models, all implementations must inherit from an abstract base class.

Main Abstract Model

class dronevis.abstract.abstract_model.CVModel

Base class for creating custom computer vision models.

To use the abstract class, inherit from it and override the abstract methods.

Main methods:

1. load_model Load model weights from web or cache. You only need to download the model weights once, and they will be stored and loaded automatically each time you use them later.

2. predict Run model inference on an input image. You don’t have to transform the image before inference; input images are transformed automatically.

3. transform_img Transform the input image according to the model’s transformations.

4. detect_webcam Start webcam (or any camera) detection

abstract load_model(): Load model weights from disk

abstract predict(image): Get predictions for inference on input image

abstract transform_img(image): Transform input image using model transformations

abstract detect_webcam(video_index, window_name='Cam Detection')

Run model on a video stream from the webcam

Parameters:

video_index (int) – Index of the camera/video device to retrieve stream
window_name (str, optional) – Name of the OpenCV window for running the model. Defaults to "Cam Detection".

Now, each model inherits from this abstract class, and must implement its abstract methods. You can implement your own model as follows:

from dronevis.abstract.abstract_model import CVModel

class CustomModel(CVModel):

    def load_model(self):
        """Load your model weights"""
        pass

    def predict(self, image):
        """Run model on input image and return inference results"""
        pass

    def transform_img(self, image):
        """Transform input image"""
        pass

    def detect_webcam(self, video_index, window_name="Cam Detection"):
        """Retrieve video stream from device at video index, and start model inference"""
        pass
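
To illustrate why every abstract method has to be overridden, here is a self-contained sketch using Python's standard abc module. SketchCVModel is only a stand-in written for this example, on the assumption that CVModel is built on abc; it is not the library class, and IncompleteModel and EchoModel are hypothetical names.

```python
from abc import ABC, abstractmethod


class SketchCVModel(ABC):
    """Stand-in for CVModel, for illustration only (not the library class)."""

    @abstractmethod
    def load_model(self):
        ...

    @abstractmethod
    def predict(self, image):
        ...

    @abstractmethod
    def transform_img(self, image):
        ...

    @abstractmethod
    def detect_webcam(self, video_index, window_name="Cam Detection"):
        ...


class IncompleteModel(SketchCVModel):
    """Overrides only one of the four abstract methods."""

    def load_model(self):
        return None


class EchoModel(SketchCVModel):
    """Overrides all four methods with trivial bodies."""

    def load_model(self):
        self.loaded = True

    def transform_img(self, image):
        return image  # identity transform

    def predict(self, image):
        # Transform first, so callers never have to do it themselves.
        return self.transform_img(image)

    def detect_webcam(self, video_index, window_name="Cam Detection"):
        raise NotImplementedError("no camera handling in this sketch")


try:
    IncompleteModel()  # three abstract methods are still missing
except TypeError as err:
    print("cannot instantiate:", err)

model = EchoModel()  # works: every abstract method is overridden
```

With abc, the missing overrides are reported when the class is instantiated rather than when a method is first called.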

Torch Abstract Models

class dronevis.abstract.abstract_torch_model.TorchDetectionModel

Base class (inherits from CV abstract model) for creating custom PyTorch models. To use the abstract class, inherit from it and override the abstract method.

For each prediction, the model outputs 300 labels and their corresponding 300 scores. A label is kept only if its score surpasses the detection threshold.
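
The thresholding step described above can be sketched in plain Python as follows. filter_detections is a hypothetical helper rather than the library's code, and the strict comparison is an assumption based on the word "surpass"; how the library treats a score exactly equal to the threshold is not stated here.

```python
def filter_detections(labels, scores, detection_threshold=0.7):
    """Keep the (label, score) pairs whose score surpasses the threshold.

    Hypothetical helper for illustration; it is not part of the library.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    # Strict comparison is an assumption; a score equal to the threshold is dropped.
    return [
        (label, score)
        for label, score in zip(labels, scores)
        if score > detection_threshold
    ]


kept = filter_detections(["person", "dog", "car"], [0.95, 0.40, 0.71])
print(kept)  # [('person', 0.95), ('car', 0.71)]
```

Raising detection_threshold trades recall for fewer false boxes, and lowering it does the opposite.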

__init__()

Construct torch models, and detect device for inference (cuda or cpu).

Torch detection models are assumed to be trained on the COCO dataset. In addition, torch can detect whether a GPU is available. The device property holds the device that will be used for inference; you can switch devices by setting this property.

predict(image, detection_threshold=0.7)

Predict all classes in an image using torch model

Parameters:

image (numpy.ndarray) – video frame or image in which to predict classes
detection_threshold (float) – threshold that determines whether a class is kept or not

Returns:

output image with boxes drawn

Return type:

numpy.ndarray

transform_img(image)

Transform image to tensor

Parameters:

image (numpy.ndarray) – input array

Returns:

tensor image

Return type:

torch.Tensor

draw_boxes(boxes, classes, labels, image)

Draw boxes for the predicted classes in an image using torch model

Parameters:

boxes (numpy.ndarray) – predicted boxes returned by predict function
classes (List) – predicted classes in an image returned by predict function
labels (torch.Tensor) – class labels in an image returned by predict function
image (numpy.ndarray) – an image to draw boxes on.

Returns:

cv2 image after drawing boxes of the predicted classes on it with their labels

Return type:

numpy.ndarray

detect_webcam(video_index=0, window_name='Cam Detection')

Detect objects in a webcam stream using a torch model (press ‘q’ to quit).

The stream is retrieved and decoded using the OpenCV library.

Parameters:

video_index (int, optional) – device index used to retrieve the video stream; it can be an index or an IP. Defaults to 0.
window_name (str, optional) – name of video stream window. Defaults to “Cam Detection”.
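
For readers who want to see what such a loop generally looks like, here is a minimal sketch of a capture, predict, display loop that stops on ‘q’. It is not the library's implementation: stream_loop and detect_webcam_sketch are hypothetical names, and predict is assumed to be any callable that takes a frame and returns the frame to display. OpenCV is imported only inside the wrapper, so the loop itself can be exercised without a camera.

```python
def stream_loop(capture, predict, show, wait_key, quit_key="q"):
    """Read frames until the stream ends or quit_key is pressed.

    capture needs a read() method returning (ok, frame), like cv2.VideoCapture.
    Returns the number of frames processed.
    """
    frames = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # camera unplugged or stream finished
            break
        show(predict(frame))
        frames += 1
        # Mask to the low byte, as is customary with cv2.waitKey.
        if wait_key(1) & 0xFF == ord(quit_key):
            break
    return frames


def detect_webcam_sketch(predict, video_index=0, window_name="Cam Detection"):
    """Wire stream_loop to OpenCV (hypothetical wrapper, not the library method)."""
    import cv2  # imported lazily so stream_loop stays usable without OpenCV

    capture = cv2.VideoCapture(video_index)
    try:
        return stream_loop(
            capture,
            predict,
            lambda frame: cv2.imshow(window_name, frame),
            cv2.waitKey,
        )
    finally:
        # Always free the camera and close the window, even on errors.
        capture.release()
        cv2.destroyAllWindows()
```

Keeping the loop separate from the OpenCV calls is a design choice made for this sketch so that it can be tested with a fake capture object.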

As pretrained PyTorch models have many methods in common, TorchDetectionModel gathers those methods in a single class, and each torch model implementation inherits from it. However, each inherited model must implement the load_model method.

from dronevis.abstract.abstract_torch_model import TorchDetectionModel

class CustomTorchModel(TorchDetectionModel):

    def __init__(self):
        super(CustomTorchModel, self).__init__()

    def load_model(self):
        """Load model weights"""
        pass