Action Recognition Models

Action recognition is a fundamental computer vision task that analyzes video sequences to identify and categorize human actions or activities. Performing it from drones matters in several fields because it can improve safety and decision-making. By identifying and tracking human actions from an aerial perspective, drones can support security and surveillance operations and enable rapid response to potential threats or emergencies. In sports and entertainment, drones equipped with action recognition can capture dynamic, engaging footage that enhances the viewer experience. For this task, we offer three video recognition models that achieve state-of-the-art results on well-known video recognition datasets.

We developed a common class for all action recognition models:

class dronevis.models.ActionRecognizer(num_preds=1)

Action recognition class with VideoMAE

This class inherits from the base class CVModel and implements its abstract methods to keep the interface consistent.

__init__(num_preds=1)

Construct model instance

Parameters

num_preds (int, optional) – number of predictions to return. Defaults to 1.

load_model(model_name='mcg')

Load model from memory

Parameters

model_name (str, optional) – Type of the model to be used. There are 3 available types: ["google", "mcg", "facebook"]. Defaults to "mcg".
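The three accepted values can be checked before a model is loaded. The following is a minimal sketch of such a check, not the library's own code: the helper name `check_model_name` and the constant `AVAILABLE_MODELS` are hypothetical, and only the three type names and the "mcg" default come from the documentation above.

```python
from typing import Tuple

# The three model types listed in the documentation above.
AVAILABLE_MODELS: Tuple[str, ...] = ("google", "mcg", "facebook")


def check_model_name(model_name: str = "mcg") -> str:
    """Return model_name unchanged if it is a documented type, else raise.

    Hypothetical helper for illustration; it is not part of dronevis.
    """
    if model_name not in AVAILABLE_MODELS:
        raise ValueError(
            f"Unknown model_name {model_name!r}; expected one of {AVAILABLE_MODELS}"
        )
    return model_name


if __name__ == "__main__":
    print(check_model_name())            # uses the documented default
    print(check_model_name("facebook"))  # another documented type
```

Failing early on an unknown name gives a clearer error than letting a later loading step fail.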

transform_img(image)

Transform input video

Parameters

image (np.ndarray) – input video; the name “image” is kept only for inheritance purposes.

predict(image)

Run model inference on the provided video

Parameters

image (np.ndarray) – input video; the name “image” is kept only for inheritance purposes.

Returns

List of predicted labels

Return type

List[str]
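To illustrate how num_preds relates to the returned List[str], here is a small sketch that picks the highest-scoring labels from a list of class scores. It is an assumption about the general technique (top-k selection), not the library's implementation; the function `top_labels` and the example labels are hypothetical.

```python
from typing import List, Sequence


def top_labels(
    scores: Sequence[float], labels: Sequence[str], num_preds: int = 1
) -> List[str]:
    """Return the num_preds labels with the highest scores, best first.

    Hypothetical helper for illustration; it is not part of dronevis.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank class indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in ranked[:num_preds]]


if __name__ == "__main__":
    example_labels = ["walking", "running", "jumping"]  # made-up labels
    example_scores = [0.1, 0.7, 0.2]                    # made-up scores
    print(top_labels(example_scores, example_labels, num_preds=2))
```

With num_preds=1, the default, the result is a single-element list holding the most likely label.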

detect_webcam(video_index=0, window_name='Action Recognition', num_frames=16)

Run model inference on webcam feed

Parameters

video_index (Union[int, str], optional) – webcam id or video path. Defaults to 0.
fps (int, optional) – frames per second. Defaults to 30.
window_name (str, optional) – window name. Defaults to “Action Recognition”.
num_frames (int, optional) – number of frames to sample. Defaults to 16.
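The num_frames parameter says how many frames are sampled for one prediction. The documentation does not state the sampling strategy, so the sketch below assumes evenly spaced sampling across a clip, which is one common choice; the function `sample_frame_indices` is hypothetical and is not taken from the library.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 16) -> List[int]:
    """Pick num_frames evenly spaced frame indices from a clip.

    Assumes uniform sampling over the whole clip; the library may sample
    differently. Hypothetical helper for illustration only.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    # Spread indices from the first frame (0) to the last (total_frames - 1).
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


if __name__ == "__main__":
    print(sample_frame_indices(31))  # 31-frame clip, default of 16 samples
```

For a 31-frame clip the default of 16 samples selects every second frame, starting at index 0 and ending at index 30. If the clip has fewer frames than num_frames, some indices repeat.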