MediaPipe Pose

MediaPipe Pose Estimation is based on the BlazePose architecture. Unlike YOLOv8-Pose, MediaPipe provides 33 3D keypoints in real time. These keypoints are a superset of the 17 COCO keypoints used by YOLOv8-Pose, and they add further keypoints for the face, hands, and feet (as found in BlazeFace and BlazePalm). The pipeline first locates the person in the image using a face detector and then predicts the keypoints, on the assumption that the face is always visible. MediaPipe Pose Estimation is designed mainly for fitness applications with a single person or a few people in the scene.
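The superset relation between the two keypoint sets can be made concrete with a small lookup table. The sketch below is not part of dronevis: the names `COCO_NAMES`, `MEDIAPIPE_TO_COCO`, and `to_coco_keypoints` are hypothetical, and the indices assume the standard BlazePose 33-landmark ordering and the usual COCO 17-keypoint ordering.

```python
# Hypothetical helper, not part of dronevis: select the 17 COCO keypoints
# from a 33-landmark BlazePose result.

# COCO keypoint names in their usual order.
COCO_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# BlazePose landmark index for each COCO keypoint above
# (assumes the standard 33-landmark ordering).
MEDIAPIPE_TO_COCO = [0, 2, 5, 7, 8, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28]


def to_coco_keypoints(landmarks):
    """Return the 17 COCO-ordered entries from a sequence of 33 landmarks."""
    if len(landmarks) != 33:
        raise ValueError(f"expected 33 landmarks, got {len(landmarks)}")
    return [landmarks[i] for i in MEDIAPIPE_TO_COCO]


# Usage with dummy (x, y, z) landmarks; real values would come from the model.
dummy = [(float(i), float(i), 0.0) for i in range(33)]
coco = dict(zip(COCO_NAMES, to_coco_keypoints(dummy)))
```

The remaining 16 landmarks (inner and outer eye corners, mouth corners, fingers, heels, and foot tips) have no COCO counterpart and are simply dropped by this mapping.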

Example

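from dronevis.models import PoseSegEstimation

# Create the estimator with its defaults (is_seg=False, is_seg_pose=False)
model = PoseSegEstimation()
# Load the MediaPipe model weights
model.load_model()
# Run pose estimation on the default webcam; press 'q' to quit
model.detect_webcam()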

MediaPipe Pose Class

class dronevis.models.PoseSegEstimation(is_seg=False, is_seg_pose=False)

Pose estimation class for loading and running inference with the MediaPipe BlazePose model.

The class inherits from the base CVModel class and implements its abstract methods: load_model, transform_img, predict, detect_webcam.

__init__(is_seg=False, is_seg_pose=False): Initialize self. See help(type(self)) for accurate signature.

load_model(): Load model from weights associated with mediapipe

transform_img(image)

Transform image from BGR to RGB

Parameters: image (np.array) – input image
Returns: transformed image
Return type: np.array
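OpenCV decodes frames in BGR channel order, whereas MediaPipe expects RGB, so the conversion amounts to reversing the channel axis. The sketch below illustrates that operation with NumPy only. It is not the library's implementation (which may well use `cv2.cvtColor`), and the function name `bgr_to_rgb` is hypothetical.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Reverse the channel axis of an H x W x 3 image (BGR -> RGB)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # image[:, :, ::-1] is a view with a negative stride; copy it into a
    # contiguous array so downstream libraries can consume it safely.
    return np.ascontiguousarray(image[:, :, ::-1])


# A single pure-blue pixel in BGR order becomes (0, 0, 255) in RGB order.
pixel_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
pixel_rgb = bgr_to_rgb(pixel_bgr)
```

Applying the function twice returns the original image, since reversing the channel axis is its own inverse.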

predict(image, is_seg=False, is_seg_pose=False, all_formats=False)

Predict keypoints for pose and draw them on input image. Input image is assumed to be BGR.

Parameters

image (np.array) – input image
is_seg (bool, optional) – flag whether a segmentation is desired. Defaults to False.
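is_seg_pose (bool, optional) – flag whether a segmentation with pose points is desired. Defaults to False.
all_formats (bool, optional) – flag whether to return all image formats (segmentation, estimation, and pose-segmentation). Defaults to False.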

Returns

output image with keypoints drawn, segmented image, and segmented image with pose points

Return type

Tuple[np.array, …]

detect_webcam(video_index=0, window_name='Pose')

Start webcam pose estimation from video_index (press ‘q’ to quit).

The stream is retrieved and decoded using the OpenCV library.

Parameters

video_index (int | str, optional) – index of video stream device. Defaults to 0.
window_name (str, optional) – name of cv2 window. Defaults to “Pose”.
is_seg (bool, optional) – flag whether a segmentation is desired. Defaults to False.