YOLO
You Only Look Once (YOLO) is one of the most popular and efficient model families in computer vision. Introduced in 2015 as an end-to-end trainable network, it targets real-time object detection and classification. The family belongs to the one-stage object detectors, which process an entire image in a single forward pass of a CNN. Two-stage detectors such as R-CNN and its variants first propose regions of interest and then classify those regions; YOLO skips the separate proposal stage, which makes it considerably faster.
YOLOv5 Torch
YOLOv5, developed by Ultralytics, kept the YOLOv3 head network but replaced its Darknet-53 backbone with a CSPDarknet-based one. Further improvements were made to boost detection speed and accuracy:
Dynamic Anchor Assignment: adjusting the anchor boxes used during training to better fit the distribution of object sizes in the dataset.
Improved Data Augmentation: improves model capabilities in difficult lighting conditions, as well as in situations where the objects are occluded.
Modified Non-Maximum Suppression: a more efficient and accurate version was developed to improve overall detection performance.
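For reference, the standard greedy form of non-maximum suppression that such variants build on can be sketched as follows. This is the textbook algorithm, not YOLOv5's modified version; the function names `iou` and `nms`, the (x1, y1, x2, y2) box format, and the sample boxes are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box
        # by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Illustrative data: the first two boxes overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving boxes
```

With the default threshold, the second box is suppressed by the first because their IoU is 81/119 (about 0.68); raising `iou_threshold` above that value keeps all three.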
YOLOv5 became a state-of-the-art repository for object detection in 2020, owing to its flexible, Pythonic structure. Evaluated on the MS COCO test-dev 2017 dataset, YOLOv5x achieved an AP of 50.7% at an image size of 640 pixels. With a batch size of 32, it reaches a speed of 200 FPS on an NVIDIA V100.
- class dronevis.models.YOLOv5
YOLOv5 implementation with torch hub model (inherits from CVModel).
For more details see YOLOv5.
- __init__()
Initialize local path
- load_model()
Load the model from PyTorch Hub.
- transform_img(image)
Identity transformation (the image is returned unchanged).
Implemented only to keep the model interface consistent.
- Parameters
image (np.ndarray) – input image
- Returns
output image
- Return type
np.ndarray
- predict(image)
Run model inference on the input image and return bounding boxes along with object names.
- Parameters
image (np.ndarray) – input image
- Returns
detections object
- Return type
torch.hub.models.self.common.Detections
- detect_webcam(video_index=0, window_name='YOLOv5 Detection')
Start webcam detection from video_index (press 'q' to quit).
The stream is retrieved and decoded using the OpenCV library.
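A minimal usage sketch of the class documented above, assuming the `dronevis` package is installed and that `predict` returns the YOLOv5 hub results object as stated. The helper name `detect_objects` and its `model` parameter are hypothetical and exist only to keep the example self-contained; the `.pandas()` accessor mentioned in the comments belongs to the upstream YOLOv5 results object, not to dronevis.

```python
def detect_objects(image, model=None):
    """Run YOLOv5 on one image (np.ndarray) and return the detections object.

    `model` is any object exposing load_model() and predict(image). When it is
    omitted, dronevis.models.YOLOv5 is imported lazily and used (assumption:
    the dronevis package is installed).
    """
    if model is None:
        from dronevis.models import YOLOv5  # lazy import, only needed here
        model = YOLOv5()
    model.load_model()           # loads the model from PyTorch Hub
    return model.predict(image)  # detections object


# Hypothetical usage, with `image` being an np.ndarray loaded elsewhere:
#   detections = detect_objects(image)
#   print(detections.pandas().xyxy[0])  # boxes, confidences and class names
```

Passing a preloaded model instance avoids constructing a new one on every call, which matters when the helper is used in a loop.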
YOLOv8 Interface
- class dronevis.models.yolov8.YOLOv8(track=False, show_conf=True, show_labels=True)
YOLOv8 implementation with ultralytics model (inherits from CVModel)
- __init__(track=False, show_conf=True, show_labels=True)
Initialize self. See help(type(self)) for accurate signature.
- abstract load_model(model_weights='yolov8.pt')
Load model weights from disk
- transform_img(image)
Identity transformation for the input image, since the YOLOv8 model performs its transformations internally during inference.
- Parameters
image (np.ndarray) – Input image
- Returns
Same as the input image
- Return type
np.ndarray
- predict(image, confidence=0.5, track=False)
Run model inference on the provided image with the desired confidence
- Parameters
- Returns
Predicted image with bounding boxes drawn.
- Return type
np.ndarray
- detect_webcam(video_index=0, window_name='YOLOv8', track=False)
Run webcam detection with the YOLOv8 model.
- Parameters
video_index (Union[str, int], optional) – Index of the video stream. It can accept a camera index or a URL for remote sources. Defaults to 0.
window_name (str, optional) – Name of the cv2 window viewing the frames. Defaults to "YOLOv8".
track (bool, optional) – Whether to track the objects or not. Defaults to False.
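The webcam routine described above can be approximated by a plain OpenCV capture loop. This is a sketch under stated assumptions, not the dronevis implementation: the helper name `run_stream` and its `annotate` callback are hypothetical, and quitting on the 'q' key is borrowed from the YOLOv5 `detect_webcam` description.

```python
def run_stream(annotate, video_index=0, window_name="YOLOv8"):
    """Show annotated frames from a camera index or stream URL until 'q' is pressed.

    `annotate` is any callable mapping a frame (np.ndarray) to the image to
    display, for example the predict method of a loaded YOLOv8 model.
    """
    import cv2  # imported lazily so this module loads without OpenCV

    cap = cv2.VideoCapture(video_index)
    if not cap.isOpened():
        raise RuntimeError(f"could not open video source {video_index!r}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # stream ended or the frame could not be decoded
                break
            cv2.imshow(window_name, annotate(frame))
            # waitKey also services the GUI event loop; stop when 'q' is pressed.
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        # Always free the capture device and close the window.
        cap.release()
        cv2.destroyAllWindows()
```

Taking the annotation step as a callback keeps the loop independent of any particular model, so the same helper could display raw frames with `run_stream(lambda frame: frame)`.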
YOLOv8 Detection Torch
YOLOv8 was the latest model in the YOLO series at the time this work was developed, surpassing its predecessors in both accuracy and speed. It introduced minor architectural changes (e.g., removing or adding some CNN layers and changing kernel sizes), but the major change was anchor-free detection: YOLOv8 predicts the center of an object directly instead of an offset from a known anchor box. This is more flexible because it does not require manually specifying anchor boxes, which can be difficult to choose and can lead to sub-optimal results in earlier YOLO models. In addition, YOLOv8 introduced models for other common computer vision tasks: instance segmentation, image classification, and object tracking. Evaluated on the MS COCO test-dev 2017 dataset, YOLOv8x achieved an AP of 53.9% at an image size of 640 pixels (compared to 50.7% for YOLOv5 at the same input size), with a speed of over 500 FPS using TensorRT.
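The difference between anchor-based and anchor-free box decoding can be illustrated with a toy example. The formulas below are simplified assumptions chosen for illustration (a YOLOv3-style anchor decode and a generic anchor-free decode that regresses distances to the four box sides); they are not taken from the YOLOv5 or YOLOv8 source code, and all function names, parameters, and sample values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_anchor_based(t, cell, anchor, stride):
    """YOLOv3-style decode: width and height rescale a preset anchor box."""
    tx, ty, tw, th = t
    cx = (cell[0] + sigmoid(tx)) * stride  # centre offset within the grid cell
    cy = (cell[1] + sigmoid(ty)) * stride
    w = anchor[0] * math.exp(tw)           # size is relative to the anchor
    h = anchor[1] * math.exp(th)
    return cx, cy, w, h


def decode_anchor_free(d, cell, stride):
    """Anchor-free decode: distances (left, top, right, bottom) from the cell centre."""
    left, top, right, bottom = (v * stride for v in d)
    px = (cell[0] + 0.5) * stride          # reference point: centre of the cell
    py = (cell[1] + 0.5) * stride
    x1, y1, x2, y2 = px - left, py - top, px + right, py + bottom
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


# Both calls return (centre_x, centre_y, width, height) in pixels.
box_a = decode_anchor_based((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(32, 64), stride=8)
box_b = decode_anchor_free((1.0, 2.0, 3.0, 2.0), cell=(3, 4), stride=8)
```

The anchor-based result depends on the `anchor` argument, which has to be chosen to match the dataset, whereas the anchor-free decode needs only the grid position and stride.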
- class dronevis.models.YOLOv8Detection(track=False, show_conf=True, show_labels=True)
YOLOv8 model implementation for object detection