Gesture Recognition Control

While user interfaces provide comprehensive access to drone functions, including movement control and various computer vision tasks through the drone’s camera, they necessitate proximity to a computer device. In cases where simple drone movement control is sufficient, hand gesture control can offer enhanced intuitiveness and interactivity compared to traditional user interfaces. The DroneVis library enables straightforward drone control using hand gestures. To achieve this capability, the Mediapipe hands model is employed to extract the keypoints of each hand, yielding 3D coordinates for 21 keypoints. Six distinct gestures are utilized for drone control, as showcased below.

A small dataset comprising 328 images encompassing these six gestures is collected for training a gesture classifier model, as depicted in the figure below.

The model’s input consists of the 63-dimensional keypoints extracted by Mediapipe 21x3, representing the x, y, and z coordinates of the 21 keypoints. The classifier network incorporates a single fully connected layer with 50 units, followed by a leaky ReLU activation, and an output layer with 6 units corresponding to the six gestures, followed by a softmax activation. Layer count and sizes are empirically determined for optimal performance, ensuring real-time feasibility for drone control. The dataset is partitioned into 80% training, 10% validation, and 10% testing sets, with label stratification to account for minor class imbalances.

The gesture recognition is incorporated by default into the GUI, yet you can run it manually like any other model and control its parameters:

from dronevis.models import GestureRecognition

model = GestureRecognition()
model.load_model()
model.detect_webcam()

Gesture Recognition Class

class dronevis.models.GestureRecognition(min_detection_confidence=0.5, min_tracking_confidence=0.5)

Gesture Recognition class with mediapipe

This class inherits from base class CVModel, and implements its abstract methods for code integrity.

__init__(min_detection_confidence=0.5, min_tracking_confidence=0.5)

Construct model instance

Parameters

min_detection_confidence (float, optional) – Threshold for detection
min_tracking_confidence (float, optional) – Threshold for tracking. Defaults to 0.5.

load_model(weights_path=None): Load model from memory

transform_img(image): Idle transformation of the image

predict(image)

Run model inference on input image and output gesture keypoints and name

Parameters: img (np.array) – Input image (assumed to be non-transformed)
Returns: Output image with keypoints drawn and gesture label recognized
Return type: np.array

detect_webcam(video_index=0, window_name='Gesture Recognition')

Run webcam (or any video streaming device) with gesture recognition module

Parameters

video_index (Union[int, str], optional) – Index of video device. can be an IP
video_path. Defaults to 0. (or) –
window_name (str, optional) – Name of opencv window. Defaults to “Gesture Recognition”.

on_frame_detect(label, video_index=0)

Run detection on tkinter label

Parameters

label (Label) – Tkinter label to view output
video_index (int, optional) – Index of the video device. Defaults to 0.

stop_frame_detection(): Stop frame detection