GLP Depth Estimation

Monocular depth estimation infers the distance (depth) of objects in a scene from a single 2D image or video frame. It reveals the spatial layout of the environment and the relative distances between objects. On drones, monocular depth estimation plays a crucial role in autonomous navigation, obstacle avoidance, and scene understanding. By extracting depth cues from the drone’s camera feed, the drone can perceive its surroundings, gauge the elevation of obstacles or landmarks, and make informed decisions for safe and efficient flight.

This module uses the method proposed in GLP (Global-Local Path Networks). Its encoder is implemented with transformers and has a large receptive field for capturing global context. Its decoder captures local features to preserve structural details and produces a detailed feature map, aided by skip connections and a selective feature fusion module that uses attention maps to pick out distinct features.

This approach achieved state-of-the-art results on the NYU Depth V2 dataset, with a root mean squared error (RMSE) of 0.344 while using comparatively few parameters. An example of the model's output is shown below: darker regions represent nearer objects, and brighter regions represent more distant ones.

Figure: GLP Depth Estimation
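
The RMSE quoted above summarizes the typical per-pixel depth error. As a minimal sketch, assuming the predicted and ground-truth depth maps are NumPy arrays of the same shape and units, and assuming that non-positive ground-truth values mark pixels without a measurement, it could be computed as follows. The name `depth_rmse` is hypothetical and not part of dronevis.

```python
import numpy as np


def depth_rmse(predicted, target):
    """Root mean squared error over pixels with valid ground truth."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    # Assumption: non-positive target values mark missing ground truth.
    valid = target > 0
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    squared_error = (predicted[valid] - target[valid]) ** 2
    return float(np.sqrt(squared_error.mean()))
```

Lower values are better; a perfect prediction gives an RMSE of 0.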

Example

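from dronevis.models import DepthEstimator

# Load the default checkpoint ("vinvino02/glpn-nyu") from the Hugging Face model hub
model = DepthEstimator()
model.load_model()
# Run depth estimation on the default webcam (video_index=0)
model.detect_webcam()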

GLP Depth Estimation Class

class dronevis.models.DepthEstimator

Depth estimation class built on Hugging Face Transformers.

Source: https://huggingface.co/docs/transformers/tasks/monocular_depth_estimation

__init__()

Initialize self. See help(type(self)) for accurate signature.

load_model(model_name='vinvino02/glpn-nyu')

Load the model from the Hugging Face model hub.

Parameters

model_name (str, optional) – Model name to load. Defaults to “vinvino02/glpn-nyu”.
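
For orientation, loading this checkpoint directly with the Hugging Face `transformers` library might look like the sketch below. This is an assumption about what happens under the hood, not dronevis's actual implementation. `load_depth_pipeline` and `DEFAULT_MODEL_NAME` are hypothetical names, and `transformers` is imported lazily so that nothing is downloaded until the function is called.

```python
DEFAULT_MODEL_NAME = "vinvino02/glpn-nyu"


def load_depth_pipeline(model_name=DEFAULT_MODEL_NAME):
    """Return a Hugging Face depth-estimation pipeline for model_name."""
    # Imported here so the heavy dependency is only needed at call time.
    from transformers import pipeline

    # Downloads the checkpoint from the model hub on first use, then caches it.
    return pipeline("depth-estimation", model=model_name)
```

Calling the returned pipeline on a `PIL.Image` yields a dictionary whose `predicted_depth` entry holds the raw depth tensor and whose `depth` entry holds a rendered `PIL.Image`.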

transform_img(image)

Identity transformation for the input image, since the depth estimation model performs its transformations internally during inference.

Parameters

image (Union[np.ndarray, PIL.Image]) – Input image

Returns

Same as the input image

Return type

np.ndarray

predict(image)

Run model inference on the provided image

Parameters

image (np.ndarray) – Input image for inference

Returns

Predicted depth map rendered as an image.

Return type

np.ndarray
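
A raw depth map holds floating-point distances, so it has to be rescaled before it can be shown as an image. The sketch below illustrates one way to do this with the convention that nearer pixels are darker. It is an illustrative assumption rather than the code dronevis runs, and `depth_to_grayscale` is a hypothetical helper.

```python
import numpy as np


def depth_to_grayscale(depth):
    """Scale a depth map to uint8 so that nearer pixels are darker."""
    depth = np.asarray(depth, dtype=np.float32)
    near, far = float(depth.min()), float(depth.max())
    if far == near:
        # A constant depth map has no contrast to display.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - near) / (far - near)  # 0.0 = nearest, 1.0 = farthest
    return (scaled * 255.0).round().astype(np.uint8)
```

Because the scaling is relative to each frame's own minimum and maximum, the resulting brightness shows depth ordering within a frame, not absolute distance.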

detect_webcam(video_index=0, window_name='Depth Estimation')

Run model on a video stream from the webcam

Parameters
  • video_index (int, optional) – Index of the camera/video device to read the stream from. Defaults to 0.

  • window_name (str, optional) – Name of the OpenCV window in which the model output is displayed. Defaults to “Depth Estimation”.
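
A webcam loop of this kind can be sketched with OpenCV as shown below. This is a hedged illustration under stated assumptions, not the library's actual code: `run_on_webcam` is a hypothetical function, `predict` stands for any callable that maps a frame to a displayable image, and quitting on the `q` key is an assumed convention. OpenCV is imported lazily so the function can be defined on machines without a camera or display.

```python
def run_on_webcam(predict, video_index=0, window_name="Depth Estimation"):
    """Show predict(frame) for each webcam frame until 'q' is pressed."""
    import cv2  # imported lazily; requires the opencv-python package

    capture = cv2.VideoCapture(video_index)
    if not capture.isOpened():
        raise RuntimeError(f"cannot open video device {video_index}")
    try:
        while True:
            ok, frame = capture.read()  # frame is a BGR np.ndarray
            if not ok:
                break  # stream ended or the device was disconnected
            cv2.imshow(window_name, predict(frame))
            # Poll the keyboard for 1 ms; stop when 'q' is pressed.
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        # Always free the device and close the window, even on errors.
        capture.release()
        cv2.destroyAllWindows()
```

The `try`/`finally` block ensures the camera is released even if `predict` raises partway through the stream.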