YOLO

You Only Look Once (YOLO) is one of the most popular and efficient model families in computer vision. Introduced in 2015 and trained end-to-end, it targets real-time object detection and classification. The family belongs to the one-stage object detectors, which process an entire image in a single forward pass of a CNN. Two-stage detectors such as R-CNN and its variants first propose regions of interest and then classify those regions; YOLO skips the separate proposal stage, which makes it considerably faster.

YOLOv5 Torch

YOLOv5, developed by Ultralytics, keeps the YOLOv3 head network but uses a new CSPDarknet-based backbone. It also adds several improvements that raise detection speed and accuracy:

Dynamic Anchor Assignment: adjusts the anchor boxes used during training to better fit the distribution of object sizes in the dataset.
Improved Data Augmentation: improves robustness in difficult lighting conditions and when objects are occluded.
Modified Non-Maximum Suppression: a more efficient and accurate version was developed to improve overall detection performance.
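
To make the non-maximum suppression step concrete, here is a minimal pure-Python sketch of the classic greedy algorithm. It is not YOLOv5's modified version; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.45 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Illustrative data: the first two boxes overlap heavily, the third is separate.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the lower-scoring duplicate (index 1) is dropped
```

The quadratic loop is fine for a handful of boxes; production implementations vectorize the overlap computation.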

YOLOv5 became one of the most widely used object detection repositories in 2020, helped by its flexible Pythonic structure. Evaluated on the MS COCO test-dev 2017 dataset, YOLOv5x achieved an AP of 50.7% at an image size of 640 pixels. With a batch size of 32, it can reach a speed of 200 FPS on an NVIDIA V100.

class dronevis.models.YOLOv5

YOLOv5 implementation with torch hub model (inherits from CVModel).

For more details see YOLOv5.

__init__(): Initialize local path

load_model(): Load model from PyTorchHub

transform_img(image)

Identity (no-op) transformation.

Implemented only to keep the interface consistent.

Parameters: image (np.ndarray) – input image
Returns: output image
Return type: np.ndarray

predict(image)

Run model inference on the input image and return bounding boxes along with object names.

Parameters: image (np.ndarray) – input image
Returns: detections object
Return type: torch.hub.models.self.common.Detections
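
A minimal usage sketch of the methods documented above, assuming `dronevis` is installed and that `YOLOv5()` takes no constructor arguments (as the class signature suggests). The helper name `run_yolov5_once` is hypothetical and not part of the library.

```python
def run_yolov5_once(image):
    """Load the YOLOv5 wrapper and run one inference on a single image.

    `image` is expected to be an np.ndarray, as documented for predict().
    """
    # Imported lazily so this snippet can be loaded without dronevis installed.
    from dronevis.models import YOLOv5

    model = YOLOv5()
    model.load_model()                  # loads the model from PyTorch Hub
    frame = model.transform_img(image)  # no-op, returns the image unchanged
    return model.predict(frame)         # detections object
```

Loading the model once and reusing it across frames avoids repeating the PyTorch Hub lookup on every call; the helper above reloads it each time only to stay short.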

detect_webcam(video_index=0, window_name='YOLOv5 Detection')

Start webcam detection from video_index (press ‘q’ to quit).

The stream is retrieved and decoded using the OpenCV library.

Parameters

video_index (Union[str, int], optional) – index of the video stream device. Defaults to 0.
window_name (str, optional) – name of the cv2 window. Defaults to “YOLOv5 Detection”.
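
The following sketch shows what a press-'q'-to-quit OpenCV capture loop of this kind typically looks like. It is not the library's implementation; `webcam_loop`, `is_quit_key`, and the `process_frame` callback are hypothetical names, and the OpenCV calls assume the `opencv-python` package.

```python
QUIT_KEY = ord("q")


def is_quit_key(key_code):
    """Return True if the key code from cv2.waitKey corresponds to 'q'.

    cv2.waitKey returns -1 when no key was pressed; masking with 0xFF keeps
    only the low byte so the comparison is consistent across platforms.
    """
    return (key_code & 0xFF) == QUIT_KEY


def webcam_loop(process_frame, video_index=0, window_name="YOLOv5 Detection"):
    """Display processed frames until 'q' is pressed or the stream ends.

    `process_frame` is a hypothetical callable that takes a frame and
    returns the image to display (for example, a frame with boxes drawn).
    """
    import cv2  # imported lazily; requires opencv-python

    cap = cv2.VideoCapture(video_index)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:  # stream ended or the frame could not be decoded
                break
            cv2.imshow(window_name, process_frame(frame))
            if is_quit_key(cv2.waitKey(1)):
                break
    finally:
        # Always release the device and close the window, even on errors.
        cap.release()
        cv2.destroyAllWindows()
```

Calling `webcam_loop(lambda frame: frame)` would simply mirror the camera feed, which is a quick way to check the capture device before plugging in a model.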

YOLOv8 Interface

class dronevis.models.yolov8.YOLOv8(track=False, show_conf=True, show_labels=True)

YOLOv8 implementation with ultralytics model (inherits from CVModel)

__init__(track=False, show_conf=True, show_labels=True): Initialize self. See help(type(self)) for accurate signature.

abstract load_model(model_weights='yolov8.pt'): Load model weights from disk

transform_img(image)

Identity transformation for the input image, since the YOLOv8 model performs its transformations internally during inference.

Parameters: image (np.ndarray) – Input image
Returns: Same as the input image
Return type: np.ndarray

predict(image, confidence=0.5, track=False)

Run model inference on the provided image with the desired confidence threshold.

Parameters

image (np.ndarray) – Input image for inference
confidence (float, optional) – Confidence score threshold a detection must reach to be considered. Defaults to 0.5.
track (bool, optional) – Whether to track the objects or not. Defaults to False.

Returns

Predicted image with bounding boxes drawn.

Return type

np.ndarray
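
The role of the confidence parameter can be sketched as a simple filter over candidate detections. The dictionary layout, the helper name `filter_by_confidence`, and the inclusive comparison are assumptions for illustration; the library's internal representation is not described here.

```python
def filter_by_confidence(detections, confidence=0.5):
    """Keep only detections whose score reaches the confidence threshold."""
    return [d for d in detections if d["score"] >= confidence]


# Hypothetical candidate detections for one frame.
detections = [
    {"label": "person", "score": 0.91},
    {"label": "car", "score": 0.48},
    {"label": "dog", "score": 0.50},
]
kept = filter_by_confidence(detections)
kept_labels = [d["label"] for d in kept]
```

Raising the threshold trades recall for precision: fewer false positives are drawn, but faint or partially occluded objects are more likely to be missed.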

detect_webcam(video_index=0, window_name='YOLOv8', track=False)

Run webcam detection with the YOLOv8 model.

Parameters

video_index (Union[str, int], optional) – Index of the video stream. It can accept a camera index or a URL for remote sources. Defaults to 0.
window_name (str, optional) – Name of the cv2 window viewing the frames. Defaults to "YOLOv8".
track (bool, optional) – Whether to track the objects or not. Defaults to False.

YOLOv8 Detection Torch

YOLOv8 is the latest model in the YOLO series (at the time of developing our work), surpassing its predecessors in both accuracy and speed. YOLOv8 introduced minor changes (e.g., removal or addition of some CNN layers, or changed kernel sizes), yet the major change was anchor-free detection. YOLOv8 predicts the center of an object directly instead of the offset from a known anchor box. This is more flexible because it does not require manually specifying anchor boxes, which can be difficult to choose and can lead to sub-optimal results in previous YOLO models. In addition, YOLOv8 introduced multiple models for other common computer vision tasks: instance segmentation, image classification, and object tracking. Evaluated on the MS COCO test-dev 2017 dataset, YOLOv8x achieved an AP of 53.9% at an image size of 640 pixels (compared to 50.7% for YOLOv5 at the same input size), with a speed of over 500 FPS using TensorRT.
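
The difference between anchor-based and anchor-free decoding can be sketched as follows. Both formulas are assumptions based on commonly described decodings (a YOLOv5-style anchor-based decode, and an anchor-free decode from distances to the four box sides); they are not taken from this library, and all function and parameter names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_anchor_based(t, cell, anchor_wh, stride):
    """Anchor-based decode: the box size is a rescaling of a preset anchor.

    t      -- raw network outputs (tx, ty, tw, th)
    cell   -- integer grid cell (col, row) that made the prediction
    stride -- pixels per grid cell
    Returns (cx, cy, w, h) in pixels.
    """
    tx, ty, tw, th = t
    cx = (2.0 * sigmoid(tx) - 0.5 + cell[0]) * stride
    cy = (2.0 * sigmoid(ty) - 0.5 + cell[1]) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_wh[0]
    h = (2.0 * sigmoid(th)) ** 2 * anchor_wh[1]
    return cx, cy, w, h


def decode_anchor_free(dist, cell, stride):
    """Anchor-free decode: no preset box shape is needed.

    dist -- predicted distances (left, top, right, bottom), in grid units,
            from the centre of the cell to the four sides of the box
    Returns (cx, cy, w, h) in pixels.
    """
    left, top, right, bottom = dist
    px, py = cell[0] + 0.5, cell[1] + 0.5  # centre of the grid cell
    x1, y1 = (px - left) * stride, (py - top) * stride
    x2, y2 = (px + right) * stride, (py + bottom) * stride
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


# With all-zero outputs the anchor-based box is the anchor itself,
# centred on the cell, which shows how strongly the prior shapes the result.
anchored = decode_anchor_based((0, 0, 0, 0), cell=(3, 2), anchor_wh=(30, 60), stride=8)
free = decode_anchor_free((1, 1, 2, 1), cell=(3, 2), stride=8)
```

In the anchor-based case a poorly chosen `anchor_wh` biases every prediction from that anchor, which is the manual tuning burden the anchor-free design removes.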

class dronevis.models.YOLOv8Detection(track=False, show_conf=True, show_labels=True)

YOLOv8 model implementation for object detection

load_model(model_weights='yolov8n.pt')

Load model weights

Parameters

model_weights (str, optional) – Path to the model weights, or the name of official weights on the Ultralytics website, which will be downloaded automatically. Defaults to "yolov8n.pt".
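
A minimal usage sketch for the detection class, assuming `dronevis` is installed and that `YOLOv8Detection` exposes the `predict` method documented for the YOLOv8 interface. The helper name `annotate_frame` is hypothetical.

```python
def annotate_frame(image, weights="yolov8n.pt", confidence=0.5):
    """Load YOLOv8 detection weights and return the frame with boxes drawn.

    `image` is expected to be an np.ndarray; `weights` is either a local
    path or the name of an official weights file.
    """
    # Imported lazily so this snippet can be loaded without dronevis installed.
    from dronevis.models import YOLOv8Detection

    model = YOLOv8Detection(track=False, show_conf=True, show_labels=True)
    model.load_model(model_weights=weights)
    return model.predict(image, confidence=confidence)
```

For video, the model would normally be constructed and loaded once outside the per-frame call; the helper reloads it on each call only to keep the sketch self-contained.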