MediaPipe Pose

MediaPipe Pose estimation is based on the BlazePose architecture. Unlike YOLOv8-Pose, MediaPipe provides 33 3D keypoints in real time. These keypoints are a superset of the 17 keypoints provided by YOLOv8 (the COCO dataset keypoints), and they add keypoints for the face, hands, and feet (the ones also used by BlazeFace and BlazePalm). The pipeline first locates the person in the image with a face detector and then predicts the keypoints, which assumes that the face is always visible. MediaPipe Pose is designed mainly for fitness applications with a single person or a few people in the scene.
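Because the 33 landmarks contain the 17 COCO keypoints, a COCO-style skeleton can be recovered by index selection. The sketch below assumes the landmark ordering published in the MediaPipe Pose documentation (0 = nose, 11 and 12 = shoulders, 23 and 24 = hips, and so on). The names `BLAZEPOSE_TO_COCO17` and `to_coco17` are hypothetical and are not part of dronevis.

```python
# Indices of the 17 COCO keypoints within the 33-landmark BlazePose topology.
# Assumption: landmark order follows the MediaPipe Pose documentation.
BLAZEPOSE_TO_COCO17 = {
    "nose": 0,
    "left_eye": 2,
    "right_eye": 5,
    "left_ear": 7,
    "right_ear": 8,
    "left_shoulder": 11,
    "right_shoulder": 12,
    "left_elbow": 13,
    "right_elbow": 14,
    "left_wrist": 15,
    "right_wrist": 16,
    "left_hip": 23,
    "right_hip": 24,
    "left_knee": 25,
    "right_knee": 26,
    "left_ankle": 27,
    "right_ankle": 28,
}


def to_coco17(landmarks):
    """Select the COCO subset from a sequence of 33 (x, y, z) landmarks."""
    if len(landmarks) != 33:
        raise ValueError(f"expected 33 landmarks, got {len(landmarks)}")
    return {name: landmarks[idx] for name, idx in BLAZEPOSE_TO_COCO17.items()}
```

The remaining 16 landmarks (inner and outer eye corners, mouth corners, fingers, heels, and foot tips) are the ones that have no COCO counterpart.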


Example

from dronevis.models import PoseSegEstimation

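# Build the estimator, load the MediaPipe weights, then stream from the default webcam.
model = PoseSegEstimation()
model.load_model()
model.detect_webcam()  # press 'q' to quit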

MediaPipe Pose Class

class dronevis.models.PoseSegEstimation(is_seg=False, is_seg_pose=False)

Pose estimation class for loading and predicting with mediapipe BlazePose model.

The model inherits from base CVModel and implements its abstract methods: load_model, transform_img, predict, detect_webcam.

__init__(is_seg=False, is_seg_pose=False)

Initialize the estimator with the is_seg and is_seg_pose flags, both of which default to False.

load_model()

Load the model from the weights associated with mediapipe.

transform_img(image)

Transform image from BGR to RGB

Parameters

image (np.array) – input image

Returns

transformed image

Return type

np.array
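OpenCV decodes frames in BGR order, while MediaPipe expects RGB input, which is why this conversion step exists. How dronevis performs it internally is not shown here; the following is a minimal NumPy sketch of the same operation, with the hypothetical helper name `bgr_to_rgb`. It is equivalent in effect to `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np


def bgr_to_rgb(image: np.ndarray) -> np.ndarray:
    """Return a copy of an H x W x 3 image with the channel order reversed."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # Reversing the last axis swaps B and R; copy so the result is contiguous
    # and independent of the input array.
    return np.ascontiguousarray(image[:, :, ::-1])
```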

predict(image, is_seg=False, is_seg_pose=False, all_formats=False)

Predict keypoints for pose and draw them on input image. Input image is assumed to be BGR.

Parameters
  • image (np.array) – input image

  • is_seg (bool, optional) – flag whether a segmentation is desired. Defaults to False.

  • all_formats (bool, optional) – flag whether to return all image formats (segmentation, pose estimation, and pose-segmentation). Defaults to False.

Returns

output image with keypoints drawn, segmented image, and segmented image with pose points

Return type

Tuple[np.array, …]
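The segmented output can be pictured as the input image with background pixels masked out. MediaPipe's pose solution can return a per-pixel float mask when segmentation is enabled; how dronevis turns that mask into its segmented image is not documented here, so the sketch below is only one common way to do it. The helper name `apply_segmentation`, the 0.1 threshold, and the black background are all assumptions.

```python
import numpy as np


def apply_segmentation(image, mask, threshold=0.1, background=(0, 0, 0)):
    """Keep pixels where mask > threshold; paint the rest with `background`."""
    if mask.shape != image.shape[:2]:
        raise ValueError("mask must match the image height and width")
    # Add a trailing axis so the H x W mask broadcasts over the colour channels.
    keep = (mask > threshold)[:, :, None]
    # Solid-colour background with the same shape and dtype as the image.
    bg = np.empty_like(image)
    bg[:] = background
    return np.where(keep, image, bg)
```

A pose-segmentation image would then be obtained by drawing the keypoints on top of the array this function returns.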

detect_webcam(video_index=0, window_name='Pose')

Start webcam pose estimation from video_index (press ‘q’ to quit).

The stream is retrieved and decoded using the OpenCV library.

Parameters
  • video_index (int | str, optional) – index of video stream device. Defaults to 0.

  • window_name (str, optional) – name of cv2 window. Defaults to “Pose”.

  • is_seg (bool, optional) – flag whether a segmentation is desired. Defaults to False.
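The behaviour described above (read frames with OpenCV, display them, stop on 'q') follows a standard capture loop. The sketch below is not the dronevis implementation; `should_quit`, `run_webcam`, and the `process_frame` callback are hypothetical names used for illustration.

```python
def should_quit(key_code: int) -> bool:
    """True when the key code returned by cv2.waitKey corresponds to 'q'."""
    # waitKey returns -1 when no key is pressed; mask to the low byte first.
    return (key_code & 0xFF) == ord("q")


def run_webcam(process_frame, video_index=0, window_name="Pose"):
    """Show processed frames from a capture device until 'q' is pressed."""
    import cv2  # imported lazily so should_quit has no dependencies

    capture = cv2.VideoCapture(video_index)
    try:
        while capture.isOpened():
            ok, frame = capture.read()  # frame is a BGR image
            if not ok:
                break
            cv2.imshow(window_name, process_frame(frame))
            if should_quit(cv2.waitKey(1)):
                break
    finally:
        # Always free the device and close the window, even on errors.
        capture.release()
        cv2.destroyAllWindows()
```

Here `process_frame` is any callable that takes a BGR frame and returns the image to display, for example a function that draws keypoints on the frame.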