Cosmos Transfer2.5 Auto-Regressive Inference Pipeline #13114

Merged
yiyixuxu merged 3 commits into huggingface:main from miguelmartin75:cosmos/transfer2.5-ar
Feb 26, 2026

@miguelmartin75 (Contributor) commented Feb 10, 2026

What does this PR do?

This builds on #13066 by adding auto-regressive inference for Cosmos Transfer2.5. The pipeline does not require a ControlNet or control inputs. From the documentation:

The call function can be used in two modes: with or without controls.
When controls are not provided (controls is None), inference works in the same manner as predict2.5 (see
Cosmos2_5_PredictPipeline). This mode strictly uses the base transformer (self.transformer) to perform
inference and accepts as input an optional image or video along with a prompt / negative_prompt, and
can be used in the following ways:
- Text2World: image=None, video=None, prompt provided.
- Image2World: image provided, video=None, prompt provided.
- Video2World: video provided, image=None, prompt provided.
When controls are provided and a ControlNet is attached, the controls drive the conditioning and video and
image are ignored. Controls are assumed to be pre-processed, e.g. edge maps are pre-computed.
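
The mode rules above can be sketched as a small dispatch function. This is only an illustration of the documented behaviour, not the pipeline's code: `select_mode`, the `has_controlnet` flag, and the returned mode names are hypothetical, and the two error branches are assumptions about cases the documentation does not spell out.

```python
from typing import Any, Optional


def select_mode(
    image: Optional[Any] = None,
    video: Optional[Any] = None,
    controls: Optional[Any] = None,
    has_controlnet: bool = False,  # hypothetical flag: is a ControlNet attached?
) -> str:
    """Hypothetical helper naming the inference mode implied by the inputs."""
    if controls is not None:
        if not has_controlnet:
            # Assumption: the documentation does not say what happens here.
            raise ValueError("controls were provided but no ControlNet is attached")
        # Controls drive the conditioning; image and video are ignored.
        return "controlnet"
    if image is not None and video is not None:
        # Assumption: the documented modes use at most one of image / video.
        raise ValueError("provide either image or video, not both")
    if video is not None:
        return "video2world"
    if image is not None:
        return "image2world"
    return "text2world"
```

For example, `select_mode(video=frames)` would return `"video2world"`, while passing `controls` together with `has_controlnet=True` returns `"controlnet"` regardless of `image` or `video`.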
Setting num_frames restricts the total number of output frames. If it is not provided or is set to None
(the default), the number of output frames matches the input video, image or controls respectively.
Auto-regressive inference is supported: a sliding window of num_frames_per_chunk frames is used per
denoising loop. In addition, when auto-regressive inference is performed, the previous
num_latent_conditional_frames or num_conditional_frames frames are used to condition the following
denoising loops.
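
To make the sliding window concrete, here is a minimal sketch of how overlapping chunk boundaries could be computed in pixel-frame space. It assumes each chunk after the first reuses the last `num_conditional_frames` frames of the previous chunk as conditioning; `chunk_windows` is a hypothetical helper, and the actual pipeline may differ (for example by working in latent space or padding the final chunk).

```python
from typing import List, Tuple


def chunk_windows(
    total_frames: int,
    num_frames_per_chunk: int,
    num_conditional_frames: int,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) frame ranges, one per denoising loop."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= num_conditional_frames < num_frames_per_chunk:
        raise ValueError("need 0 <= num_conditional_frames < num_frames_per_chunk")

    # Each new chunk advances by the number of frames that are actually new.
    stride = num_frames_per_chunk - num_conditional_frames
    windows = []
    start = 0
    while True:
        end = min(start + num_frames_per_chunk, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        # The next chunk begins on the frames used as conditioning.
        start += stride
    return windows


print(chunk_windows(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

With 10 frames, 4 frames per chunk and 1 conditional frame, every chunk after the first starts on the last frame of the previous one, so three denoising loops cover the whole sequence.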

Who can review?

@yiyixuxu (Collaborator) left a comment

thanks for the PR!

my main question is would it make sense to make this pipeline strictly ControlNet-focused? looking at the pipeline code, this would simplify the pipeline quite a bit

@sayakpaul requested a review from DN6 on February 20, 2026 04:36

@yiyixuxu (Collaborator) left a comment

thanks!
i left a few more questions/feedback, let me know what you think
we can merge this soon!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) left a comment

thanks!
let's fix the CI too

@miguelmartin75 (Contributor, Author)

CI should be fixed now

@yiyixuxu merged commit 212db7b into huggingface:main on Feb 26, 2026
10 of 11 checks passed

3 participants