
fix: Validate spill read schema #21738

Open
2010YOUY01 wants to merge 3 commits into apache:main from 2010YOUY01:validate-spill-read

@2010YOUY01
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

Follow-up to a review comment in #21713 (comment).
This is not a bug fix. This PR makes the spill code more defensive so that potential bugs are caught early.

Previously, if a spill file was written by one SpillManager and then read by another SpillManager with a different schema, the read would succeed. This is not an expected usage pattern; the mismatch would only surface later as an error propagated to the caller, which is harder to debug.

This PR validates the schema when reading the first batch and fails fast if the schema does not match.

Note that only the schema is validated: if two different SpillManagers with the same schema write and read the same file, this is still allowed, even though it is not an expected usage pattern either. Detecting that case would require assigning each SpillManager a unique ID (UID) and storing it in the Arrow IPC file metadata, which can be tricky, so it is left as a TODO for simplicity.
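The fail-fast check described above can be sketched in a few lines of Rust. This is a minimal, std-only model under stated assumptions, not DataFusion's implementation: `Schema`, `SpillFile`, `SpillReader` and `SpillError` are hypothetical stand-ins (a real spill file is an Arrow IPC stream and a real schema is an Arrow schema). The reader compares the file's schema with the schema it expects on the first read and returns an error instead of any batch.

```rust
use std::fmt;

/// Hypothetical, simplified stand-in for an Arrow schema: ordered (name, type) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, String)>,
}

impl Schema {
    pub fn new(fields: &[(&str, &str)]) -> Self {
        Schema {
            fields: fields
                .iter()
                .map(|(name, ty)| (name.to_string(), ty.to_string()))
                .collect(),
        }
    }
}

/// Hypothetical in-memory model of a spill file: the writer's schema plus the
/// batches it wrote (a real spill file is an Arrow IPC stream on disk).
pub struct SpillFile {
    pub schema: Schema,
    pub batches: Vec<Vec<i64>>,
}

#[derive(Debug, PartialEq)]
pub enum SpillError {
    SchemaMismatch { expected: Schema, actual: Schema },
}

impl fmt::Display for SpillError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SpillError::SchemaMismatch { expected, actual } => write!(
                f,
                "spill file schema {:?} does not match expected schema {:?}",
                actual.fields, expected.fields
            ),
        }
    }
}

/// Hypothetical reader that validates the file schema on the first read,
/// before handing out any batch.
pub struct SpillReader<'a> {
    file: &'a SpillFile,
    expected: Schema,
    pos: usize,
    checked: bool,
}

impl<'a> SpillReader<'a> {
    pub fn new(file: &'a SpillFile, expected: Schema) -> Self {
        SpillReader { file, expected, pos: 0, checked: false }
    }
}

impl<'a> Iterator for SpillReader<'a> {
    type Item = Result<Vec<i64>, SpillError>;

    fn next(&mut self) -> Option<Self::Item> {
        if !self.checked {
            self.checked = true;
            if self.file.schema != self.expected {
                // Fail fast: return the mismatch as the first item and make
                // every later call return None, so no batch is ever yielded.
                self.pos = self.file.batches.len();
                return Some(Err(SpillError::SchemaMismatch {
                    expected: self.expected.clone(),
                    actual: self.file.schema.clone(),
                }));
            }
        }
        // Yield the next batch, or None once all batches have been read.
        let batch = self.file.batches.get(self.pos)?.clone();
        self.pos += 1;
        Some(Ok(batch))
    }
}
```

With a matching schema the reader yields every batch unchanged; with a different schema the first `next()` returns `SpillError::SchemaMismatch` and the stream ends. Because only the schema is compared, a distinct writer and reader that happen to share a schema would still pass, which is the limitation the PR leaves as a TODO.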

What changes are included in this PR?

Are these changes tested?

Unit tests.

Are there any user-facing changes?

No

github-actions bot added the physical-plan label (Changes to the physical-plan crate) on Apr 20, 2026
Comment thread on datafusion/physical-plan/src/spill/mod.rs (outdated)
Contributor

@comphead left a comment


Thanks @2010YOUY01, makes sense.

2010YOUY01 and others added 2 commits April 21, 2026 12:39
Co-authored-by: Oleks V <comphead@users.noreply.github.com>
```rust
// schema of the current `SpillManager`
let actual_schema = reader.schema();

if actual_schema != expected_schema {
```
Member


The equality check also compares metadata. Do we need to be that strict about metadata?

Contributor Author


Good point. I'm not sure; some metadata may need to be preserved through the round trip.

What do you think about keeping the metadata comparison for now to stay conservative? If it proves unnecessarily strict later, we can relax it.

Member


makes sense
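The strict-versus-relaxed question raised in this thread can be illustrated with a small std-only Rust model. The names `Schema`, `schemas_match_strict` and `schemas_match_ignoring_metadata` are hypothetical and chosen for illustration only. The model carries only schema-level metadata, whereas real Arrow fields can carry their own metadata as well. Deriving `PartialEq` makes `==` include the metadata, mirroring the behaviour pointed out in the review.

```rust
use std::collections::HashMap;

/// Hypothetical schema model: fields plus schema-level key/value metadata.
/// Deriving PartialEq makes `==` compare the metadata as well as the fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, String)>,
    pub metadata: HashMap<String, String>,
}

/// Strict check (the conservative behaviour kept for now):
/// fields AND metadata must match.
pub fn schemas_match_strict(expected: &Schema, actual: &Schema) -> bool {
    expected == actual
}

/// Relaxed check (a possible later relaxation, hypothetical): compare only
/// field names and types, ignoring any metadata.
pub fn schemas_match_ignoring_metadata(expected: &Schema, actual: &Schema) -> bool {
    expected.fields == actual.fields
}
```

For example, two schemas with the same single `Int64` field, one of which carries an extra metadata entry, fail the strict check but pass the relaxed one. Keeping the strict comparison means an unexpected metadata difference is reported as an error instead of being silently ignored; relaxing it later only requires swapping the comparison.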


Labels

physical-plan Changes to the physical-plan crate


3 participants