Gesture Recognition Control

While user interfaces provide comprehensive access to drone functions, including movement control and various computer vision tasks through the drone’s camera, they require the operator to stay close to a computer. When simple movement control is sufficient, hand gestures can be more intuitive and interactive than a traditional user interface. The DroneVis library supports drone control with hand gestures. It uses the MediaPipe Hands model to extract the keypoints of each hand, yielding 3D coordinates for 21 keypoints. Six distinct gestures are used for drone control, as shown below.

Hand Gesture for Drone Control
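As a rough sketch of the keypoint step described above, the snippet below flattens the 21 landmarks of one hand into a 63-element list. The helper name landmarks_to_vector is hypothetical and not part of DroneVis. It accepts any sequence of objects with x, y, and z attributes, which is how MediaPipe exposes hand landmarks. The interleaved ordering (x0, y0, z0, x1, ...) is an assumption, since the source does not state how DroneVis orders the values.

```python
from collections import namedtuple

NUM_KEYPOINTS = 21  # keypoints per hand, as stated above


def landmarks_to_vector(landmarks):
    """Flatten 21 (x, y, z) landmarks into a 63-element list of floats.

    Hypothetical helper for illustration; not part of the DroneVis API.
    """
    points = list(landmarks)
    if len(points) != NUM_KEYPOINTS:
        raise ValueError(f"expected {NUM_KEYPOINTS} landmarks, got {len(points)}")
    vector = []
    for p in points:
        # Assumed ordering: x0, y0, z0, x1, y1, z1, ...
        vector.extend((float(p.x), float(p.y), float(p.z)))
    return vector


# Stand-in for MediaPipe landmark objects so the sketch runs without a camera.
Point = namedtuple("Point", ["x", "y", "z"])
fake_hand = [Point(0.01 * i, 0.02 * i, -0.001 * i) for i in range(NUM_KEYPOINTS)]
vector = landmarks_to_vector(fake_hand)
print(len(vector))
```

With real MediaPipe output, the same helper could be applied to results.multi_hand_landmarks[i].landmark for each detected hand.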

A small dataset of 328 images covering these six gestures was collected to train a gesture classifier model, as depicted in the figure below.

Gesture Classifier Model

The model’s input is a 63-dimensional vector of keypoints extracted by MediaPipe (21 × 3), representing the x, y, and z coordinates of the 21 keypoints. The classifier network consists of a single fully connected layer with 50 units followed by a leaky ReLU activation, then an output layer with 6 units, one per gesture, followed by a softmax activation. The layer count and sizes were determined empirically for good performance while keeping inference fast enough for real-time drone control. The dataset is split into 80% training, 10% validation, and 10% test sets, stratified by label to account for minor class imbalances.
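The architecture described above can be sketched as a plain forward pass. This is an illustration only, not the DroneVis implementation: the weights are random placeholders rather than trained values, the leaky ReLU slope of 0.01 is an assumption (the source does not state it), and all function names are hypothetical. A real model would normally be built with a deep learning framework; the standard library is used here so the sketch runs anywhere.

```python
import math
import random

NUM_INPUTS = 63    # 21 keypoints x 3 coordinates
NUM_HIDDEN = 50    # single fully connected layer
NUM_GESTURES = 6   # one output unit per gesture


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def leaky_relu(x, slope=0.01):
    """Leaky ReLU; the slope is an assumed value, not taken from the source."""
    return [v if v > 0 else slope * v for v in x]


def softmax(x):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random placeholder weights; a trained model would load learned values."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def classify(keypoints, hidden, output):
    """Forward pass: 63 -> 50 (leaky ReLU) -> 6 (softmax)."""
    h = leaky_relu(dense(keypoints, *hidden))
    return softmax(dense(h, *output))


rng = random.Random(0)
hidden = init_layer(rng, NUM_HIDDEN, NUM_INPUTS)
output = init_layer(rng, NUM_GESTURES, NUM_HIDDEN)
probs = classify([rng.random() for _ in range(NUM_INPUTS)], hidden, output)
predicted = max(range(NUM_GESTURES), key=probs.__getitem__)
print(len(probs), predicted)
```

The softmax output is a probability distribution over the six gestures, so the predicted gesture is simply the index with the highest probability.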

Gesture recognition is built into the GUI by default, but you can also run it manually like any other model and control its parameters:

from dronevis.models import GestureRecognition

model = GestureRecognition()
model.load_model()
model.detect_webcam()

Gesture Recognition Class

class dronevis.models.GestureRecognition(min_detection_confidence=0.5, min_tracking_confidence=0.5)

Gesture recognition class built on MediaPipe

This class inherits from the base class CVModel and implements its abstract methods for code integrity.

__init__(min_detection_confidence=0.5, min_tracking_confidence=0.5)

Construct model instance

Parameters
  • min_detection_confidence (float, optional) – Threshold for detection. Defaults to 0.5.

  • min_tracking_confidence (float, optional) – Threshold for tracking. Defaults to 0.5.

load_model(weights_path=None)

Load model from memory

transform_img(image)

No-op (identity) transformation of the image

predict(image)

Run model inference on input image and output gesture keypoints and name

Parameters

image (np.array) – Input image (assumed to be non-transformed)

Returns

Output image with keypoints drawn and gesture label recognized

Return type

np.array
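For callers who want to drive the frame loop themselves instead of using the built-in webcam helper, predict can be applied frame by frame. The following is a minimal sketch under stated assumptions: annotate_frames, camera_frames, and main are hypothetical helper names, and it is assumed that predict accepts frames exactly as OpenCV returns them. Only GestureRecognition, load_model, and predict come from the documentation above.

```python
def annotate_frames(model, frames):
    """Yield model.predict(frame) for every frame that was read successfully."""
    for frame in frames:
        if frame is None:  # skip failed reads
            continue
        yield model.predict(frame)


def camera_frames(video_index=0, max_frames=100):
    """Yield up to max_frames raw frames from an OpenCV capture device."""
    import cv2  # imported lazily so annotate_frames has no dependencies

    capture = cv2.VideoCapture(video_index)
    try:
        for _ in range(max_frames):
            ok, frame = capture.read()
            if not ok:
                break
            yield frame
    finally:
        capture.release()


def main():
    """Requires dronevis, OpenCV, and a camera; not executed on import."""
    import cv2
    from dronevis.models import GestureRecognition

    model = GestureRecognition()
    model.load_model()
    for annotated in annotate_frames(model, camera_frames()):
        cv2.imshow("Gesture Recognition", annotated)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break
    cv2.destroyAllWindows()
```

Calling main() opens a window with the annotated stream. Keeping annotate_frames separate from the capture code makes it easy to feed frames from another source, such as a drone video feed.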

detect_webcam(video_index=0, window_name='Gesture Recognition')

Run webcam (or any video streaming device) with gesture recognition module

Parameters
  • video_index (Union[int, str], optional) – Index of video device. Can be an IP or video_path. Defaults to 0.

  • window_name (str, optional) – Name of opencv window. Defaults to “Gesture Recognition”.

on_frame_detect(label, video_index=0)

Run detection on tkinter label

Parameters
  • label (Label) – Tkinter label to view output

  • video_index (int, optional) – Index of the video device. Defaults to 0.

stop_frame_detection()

Stop frame detection