Support drop_tail_invalid_frames for animate model preprocess #1092
Code Review
This pull request introduces a mechanism to handle invalid body keypoints during video preprocessing by adding a --drop_tail_invalid_frames flag and implementing robust fallbacks, such as empty face crops and skipping character replacement for frames with missing keypoints. Feedback suggests improving the portability of the new shell script by removing hardcoded absolute paths and increasing the robustness of the preprocessing pipeline by utilizing per-frame metadata and dimensions instead of relying on the first frame's properties.
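The diff excerpts in this review do not show the trimming logic itself, so the following is only a minimal sketch of what dropping invalid tail frames could look like. It assumes that a frame with missing body keypoints is represented by `None`; the function name and its arguments are hypothetical and are not taken from the PR.

```python
def drop_tail_invalid_frames(frames, body_keypoints):
    """Trim trailing frames whose body keypoints are missing (None).

    Hypothetical helper: only the invalid run at the END of the clip is
    removed. Invalid frames in the middle are kept and left to the
    per-frame fallbacks (e.g. an empty face crop).
    """
    last_valid = len(frames)
    # Walk backwards until a frame with valid keypoints is found.
    while last_valid > 0 and body_keypoints[last_valid - 1] is None:
        last_valid -= 1
    return frames[:last_valid], body_keypoints[:last_valid]


# Toy data: strings stand in for image arrays.
frames = ["f0", "f1", "f2", "f3"]
body_keypoints = [[0.1, 0.2], None, [0.3, 0.4], None]

kept_frames, kept_keypoints = drop_tail_invalid_frames(frames, body_keypoints)
```

In this sketch the interior `None` at index 1 survives, which is why the per-frame fallbacks mentioned above would still be needed.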
lightx2v_path=/data/nvme1/yongyang/dok/bugs/v1/LightX2V
model_path=/data/nvme1/wushuo/hf_models/Wan2.2-Animate-14B
video_path=/data/nvme1/yongyang/dok/bugs/examples/qqqq/input1.mp4
refer_path=/data/nvme1/yongyang/dok/bugs/examples/qqqq/src_ref.png
continue

try:
    face_bbox_for_image = get_face_bboxes(meta["keypoints_face"][:, :2], scale=1.3, image_shape=(frames[0].shape[0], frames[0].shape[1]))
Using frames[0].shape is less robust than using the shape of the current frame being processed. Although all frames are expected to have the same resolution in this pipeline, it's better practice to use the local variable frame which is already available in the loop.
Suggested change:
- face_bbox_for_image = get_face_bboxes(meta["keypoints_face"][:, :2], scale=1.3, image_shape=(frames[0].shape[0], frames[0].shape[1]))
+ face_bbox_for_image = get_face_bboxes(meta["keypoints_face"][:, :2], scale=1.3, image_shape=frame.shape[:2])
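To illustrate why the current frame's shape is the safer choice, here is a small self-contained sketch. `face_bbox_from_keypoints` is a hypothetical stand-in for `get_face_bboxes`, whose real implementation is not shown in this review. It also assumes the face keypoints are normalized to `[0, 1]`, which may not match the pipeline.

```python
import numpy as np


def face_bbox_from_keypoints(keypoints_xy, image_shape, scale=1.3):
    """Hypothetical stand-in for get_face_bboxes.

    Assumes keypoints are normalized to [0, 1] and returns a box
    (x1, x2, y1, y2) clamped to the given (height, width).
    """
    h, w = image_shape
    pts = keypoints_xy * np.array([w, h])
    cx, cy = pts.mean(axis=0)
    half_w = (pts[:, 0].max() - pts[:, 0].min()) * scale / 2
    half_h = (pts[:, 1].max() - pts[:, 1].min()) * scale / 2
    x1, x2 = int(max(cx - half_w, 0)), int(min(cx + half_w, w))
    y1, y2 = int(max(cy - half_h, 0)), int(min(cy + half_h, h))
    return x1, x2, y1, y2


# Two frames with different resolutions, to make the difference visible.
frames = [np.zeros((480, 640, 3), np.uint8), np.zeros((240, 320, 3), np.uint8)]
face_kps = np.array([[0.4, 0.4], [0.6, 0.4], [0.5, 0.6]])

# Each box is computed against the frame it belongs to, not frames[0].
bboxes = [face_bbox_from_keypoints(face_kps, frame.shape[:2]) for frame in frames]
```

When every frame shares one resolution, both variants give the same result. The per-frame form simply removes the hidden dependency on the first frame.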
each_keypoint = body_key_points[each_index]
if None is each_keypoint:
    if use_valid_body_keypoints:
        points = self._get_body_prompt_points(kp2ds[key_frame_index], kp2ds[0]["width"], kp2ds[0]["height"])
It is better to use the width and height from the current frame's pose metadata (kp2ds[key_frame_index]) instead of the first frame of the chunk (kp2ds[0]) for better consistency and robustness.
Suggested change:
- points = self._get_body_prompt_points(kp2ds[key_frame_index], kp2ds[0]["width"], kp2ds[0]["height"])
+ points = self._get_body_prompt_points(kp2ds[key_frame_index], kp2ds[key_frame_index]["width"], kp2ds[key_frame_index]["height"])
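The same idea can be sketched for the keypoint scaling step. The helper `to_pixel_points` below is hypothetical. It assumes each pose metadata dict carries normalized keypoints under a `keypoints_body` key alongside its own `width` and `height`, a layout inferred from the diff rather than confirmed by it.

```python
import numpy as np


def to_pixel_points(pose_meta):
    """Scale one frame's normalized body keypoints by that frame's own size.

    Hypothetical helper; keeps only the (x, y) columns and drops any
    confidence score in the third column.
    """
    keypoints_body = np.array(pose_meta["keypoints_body"])[:, :2]
    wh = np.array([[pose_meta["width"], pose_meta["height"]]])
    return (keypoints_body * wh).astype(np.int32)


# Toy metadata: the second frame is smaller than the first.
kp2ds = [
    {"width": 640, "height": 480, "keypoints_body": [[0.5, 0.5, 0.9], [0.25, 0.75, 0.8]]},
    {"width": 320, "height": 240, "keypoints_body": [[0.5, 0.5, 0.9], [0.25, 0.75, 0.8]]},
]

key_frame_index = 1
points = to_pixel_points(kp2ds[key_frame_index])
```

Scaling the second frame's keypoints by `kp2ds[0]` would instead place the second point at y = 360, outside a 240-pixel-high frame. That is the kind of inconsistency the suggestion guards against.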
wh = np.array([[kp2ds[0]["width"], kp2ds[0]["height"]]])
points = (keypoints_body * wh).astype(np.int32)
keypoints_body = np.array(keypoints_body_list)[:, :2]
wh = np.array([[kp2ds[0]["width"], kp2ds[0]["height"]]])
No description provided.