Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 16 additions & 3 deletions datafusion/physical-plan/src/spill/spill_manager.rs
Original file line number Diff line number Diff line change
Expand Up @@ -160,9 +160,22 @@ impl SpillManager {
Ok(file.map(|f| (f, max_record_batch_size)))
}

/// Reads a spill file as a stream. The file must be created by the current `SpillManager`.
/// This method will generate output in FIFO order: the batch appended first
/// will be read first.
/// Reads a spill file as a stream. The file must be created by the current
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would happen if file created by another SpillManager? would the file be ignored or crash?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the code—it appears to succeed silently, and error would be propagated to the stream consumer due to a schema mismatch.
I’ve updated the comment to indicate this may lead to undefined behavior.
We could be more defensive here; I’ll send a follow-up patch to make it fail fast.

/// `SpillManager`; otherwise behavior is undefined.
///
/// Output is produced in FIFO order: the batch appended first is read first.
///
/// # Arg `max_record_batch_memory`
///
/// Most callers should pass `None`. This is mainly useful for the
/// memory-limited sort-preserving merge path.
///
/// When provided, this value is used only as a validation hint. If a
/// decoded batch exceeds this threshold, a debug-level log message is
/// emitted.
///
/// That path uses the maximum spilled batch size to conservatively estimate
/// the merge degree when merging multiple sorted runs.
pub fn read_spill_as_stream(
&self,
spill_file_path: RefCountedTempFile,
Expand Down
Loading