Support drop_tail_invalid_frames for animate model preprocess #1092

Merged
llmc-reviewer merged 1 commit into main from animate
May 25, 2026

Conversation

@helloyongyang
Contributor

No description provided.

@llmc-reviewer llmc-reviewer merged commit 557eb4d into main May 25, 2026
2 checks passed
@llmc-reviewer llmc-reviewer deleted the animate branch May 25, 2026 08:40
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a mechanism to handle invalid body keypoints during video preprocessing by adding a --drop_tail_invalid_frames flag and implementing robust fallbacks, such as empty face crops and skipping character replacement for frames with missing keypoints. Feedback suggests improving the portability of the new shell script by removing hardcoded absolute paths and increasing the robustness of the preprocessing pipeline by utilizing per-frame metadata and dimensions instead of relying on the first frame's properties.
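The thread does not show how `--drop_tail_invalid_frames` is implemented. Going only by the flag's name, one plausible reading is that frames at the end of the clip whose body keypoints are missing get trimmed, while gaps in the middle are left for the fallbacks described above. A minimal sketch under that assumption (`drop_tail_invalid` is a hypothetical helper, not a function from the PR):

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def drop_tail_invalid(items: Sequence[Optional[T]]) -> List[Optional[T]]:
    """Return items with trailing None entries removed.

    Assumption: a frame is "invalid" when its keypoint entry is None.
    None entries in the middle of the sequence are kept untouched.
    """
    end = len(items)
    # Walk backwards until the last valid (non-None) entry is found.
    while end > 0 and items[end - 1] is None:
        end -= 1
    return list(items[:end])


# Hypothetical per-frame body keypoints; None marks a failed detection.
body_key_points = ["kp0", None, "kp2", None, None]
trimmed = drop_tail_invalid(body_key_points)
```

Here the interior `None` at index 1 survives and only the two trailing entries are dropped, so downstream code still needs a fallback for missing keypoints inside the clip.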

Comment on lines +4 to +7
lightx2v_path=/data/nvme1/yongyang/dok/bugs/v1/LightX2V
model_path=/data/nvme1/wushuo/hf_models/Wan2.2-Animate-14B
video_path=/data/nvme1/yongyang/dok/bugs/examples/qqqq/input1.mp4
refer_path=/data/nvme1/yongyang/dok/bugs/examples/qqqq/src_ref.png

medium

The script contains hardcoded absolute paths to specific user directories (e.g., /data/nvme1/yongyang/...). This makes the script non-portable and likely to fail for other users. Consider using relative paths or environment variables to specify these locations.
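One way to follow this suggestion is to read each location from an environment variable and fall back to a relative default. The sketch below uses Python rather than shell, for consistency with the pipeline code reviewed further down. The variable names (`LIGHTX2V_PATH`, `MODEL_PATH`, `VIDEO_PATH`, `REFER_PATH`) and the default values are illustrative assumptions, not names the project defines.

```python
import os
from pathlib import Path


def resolve_path(env_var: str, default: str) -> Path:
    """Read a location from the environment, else use a relative default."""
    return Path(os.environ.get(env_var, default)).expanduser()


# Hypothetical variable names and defaults, chosen for illustration only.
paths = {
    "lightx2v_path": resolve_path("LIGHTX2V_PATH", "."),
    "model_path": resolve_path("MODEL_PATH", "models/Wan2.2-Animate-14B"),
    "video_path": resolve_path("VIDEO_PATH", "examples/input.mp4"),
    "refer_path": resolve_path("REFER_PATH", "examples/ref.png"),
}
```

Each user can then export their own locations without editing the script, and nothing in the repository points at a particular home directory.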

continue

try:
    face_bbox_for_image = get_face_bboxes(meta["keypoints_face"][:, :2], scale=1.3, image_shape=(frames[0].shape[0], frames[0].shape[1]))

medium

Using frames[0].shape is less robust than using the shape of the current frame being processed. Although all frames are expected to have the same resolution in this pipeline, it's better practice to use the local variable frame which is already available in the loop.

Suggested change
- face_bbox_for_image = get_face_bboxes(meta["keypoints_face"][:, :2], scale=1.3, image_shape=(frames[0].shape[0], frames[0].shape[1]))
+ face_bbox_for_image = get_face_bboxes(meta["keypoints_face"][:, :2], scale=1.3, image_shape=frame.shape[:2])
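To illustrate why the current frame's shape is the safer argument, the sketch below clamps a scaled bounding box to whatever `image_shape` it is given. `bbox_from_points` is a hypothetical stand-in written for this example; it is not the repository's `get_face_bboxes`, whose return format is not shown in this thread. The two differently sized frames are likewise contrived, since the pipeline expects a single resolution.

```python
import numpy as np


def bbox_from_points(points, scale, image_shape):
    """Hypothetical helper: scaled box around pixel points, clamped to the image.

    image_shape is (height, width); returns (x_min, y_min, x_max, y_max).
    """
    height, width = image_shape
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w = (x_max - x_min) * scale / 2
    half_h = (y_max - y_min) * scale / 2
    return (
        int(max(cx - half_w, 0)),
        int(max(cy - half_h, 0)),
        int(min(cx + half_w, width)),
        int(min(cy + half_h, height)),
    )


# Two frames of different size, to make the effect of image_shape visible.
frames = [
    np.zeros((480, 640, 3), dtype=np.uint8),
    np.zeros((240, 320, 3), dtype=np.uint8),
]
face_points = np.array([[250.0, 100.0], [310.0, 230.0]])

# frame.shape[:2] is (height, width) of the frame being processed.
boxes = [bbox_from_points(face_points, 1.5, frame.shape[:2]) for frame in frames]
```

With `frame.shape[:2]` the second box is clamped to the smaller frame. Passing `frames[0].shape` for every frame would let that box extend past the second frame's edges.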

each_keypoint = body_key_points[each_index]
if None is each_keypoint:
    if use_valid_body_keypoints:
        points = self._get_body_prompt_points(kp2ds[key_frame_index], kp2ds[0]["width"], kp2ds[0]["height"])

medium

It is better to use the width and height from the current frame's pose metadata (kp2ds[key_frame_index]) instead of the first frame of the chunk (kp2ds[0]) for better consistency and robustness.

Suggested change
- points = self._get_body_prompt_points(kp2ds[key_frame_index], kp2ds[0]["width"], kp2ds[0]["height"])
+ points = self._get_body_prompt_points(kp2ds[key_frame_index], kp2ds[key_frame_index]["width"], kp2ds[key_frame_index]["height"])

wh = np.array([[kp2ds[0]["width"], kp2ds[0]["height"]]])
points = (keypoints_body * wh).astype(np.int32)
keypoints_body = np.array(keypoints_body_list)[:, :2]
wh = np.array([[kp2ds[0]["width"], kp2ds[0]["height"]]])

medium

Similar to the previous comment, use the width and height from the current frame's pose metadata instead of the first frame of the chunk.

Suggested change
- wh = np.array([[kp2ds[0]["width"], kp2ds[0]["height"]]])
+ wh = np.array([[kp2ds[key_frame_index]["width"], kp2ds[key_frame_index]["height"]]])
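The two suggestions above amount to scaling each frame's keypoints by that same frame's dimensions. A minimal sketch, assuming each `kp2ds` entry is a dict holding normalized `(x, y, confidence)` rows under a `"keypoints_body"` key alongside `"width"` and `"height"`. The `"keypoints_body"` key and the helper name `body_points_in_pixels` are assumptions made for this example.

```python
import numpy as np


def body_points_in_pixels(kp2ds, frame_index):
    """Scale normalized (x, y) body keypoints by that frame's own width and height."""
    meta = kp2ds[frame_index]
    # Keep only the x and y columns; any confidence column is dropped.
    keypoints_body = np.array(meta["keypoints_body"])[:, :2]
    # Shape (1, 2) so it broadcasts across every keypoint row.
    wh = np.array([[meta["width"], meta["height"]]])
    return (keypoints_body * wh).astype(np.int32)


# Hypothetical pose metadata with a different size per frame.
kp2ds = [
    {"keypoints_body": [[0.5, 0.5, 0.9]], "width": 640, "height": 480},
    {"keypoints_body": [[0.5, 0.5, 0.9]], "width": 320, "height": 240},
]
first = body_points_in_pixels(kp2ds, 0)
second = body_points_in_pixels(kp2ds, 1)
```

When every frame shares one resolution, this gives the same result as indexing `kp2ds[0]`. The per-frame lookup only matters if the metadata ever differs between frames, which is the robustness the review asks for.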

2 participants