Split FileStreamMetrics into its own module#21340
Merged
comphead merged 3 commits into apache:main on Apr 3, 2026
Conversation
alamb
commented
Apr 3, 2026
}

/// A timer that can be started and stopped.
pub struct StartableTime {
Contributor
Author
This code is just moved into its own module in the first commit.
I also had to touch up the links in the documentation, which I did in a separate commit for easier review.
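For readers who have not seen the moved code, here is a minimal sketch of what "a timer that can be started and stopped" might look like. It is not the DataFusion `StartableTime` implementation: the `SketchTimer` name, its fields, and its methods are hypothetical, and it uses only the standard library rather than DataFusion's metrics types.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a start/stop timer (NOT the DataFusion type).
#[derive(Debug, Default)]
pub struct SketchTimer {
    /// Total time accumulated across all completed start/stop intervals.
    elapsed: Duration,
    /// `Some` while the timer is running, `None` otherwise.
    started_at: Option<Instant>,
}

impl SketchTimer {
    pub fn new() -> Self {
        Self::default()
    }

    /// Begin an interval. In this sketch, starting a running timer is a no-op.
    pub fn start(&mut self) {
        if self.started_at.is_none() {
            self.started_at = Some(Instant::now());
        }
    }

    /// End the current interval, if any, and add it to the running total.
    pub fn stop(&mut self) {
        if let Some(started) = self.started_at.take() {
            self.elapsed += started.elapsed();
        }
    }

    pub fn is_running(&self) -> bool {
        self.started_at.is_some()
    }

    /// Total time recorded by completed intervals.
    pub fn elapsed(&self) -> Duration {
        self.elapsed
    }
}
```

Keeping the start instant in an `Option` makes `stop` safe to call when the timer is not running, which is convenient when a stream can finish from several states.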
This was referenced Apr 3, 2026
comphead
approved these changes
Apr 3, 2026
Contributor
comphead
left a comment
Thanks, makes a lot of sense
Contributor
Author
Thanks @comphead
github-merge-queue bot
pushed a commit
that referenced
this pull request
Apr 14, 2026
Stacked on
- #21327
- #21340

## Which issue does this PR close?

- part of #20529
- Broken out of #20820

## Rationale for this change

The Morsel API allows for finer-grained parallelism (and IO). It is important to have the FileStream work in terms of the Morsel API to allow future features (like work stealing, etc.)

## What changes are included in this PR?

I apologize for the large diff. Note that about half of this PR is tests and a test framework to test the calling sequence of FileStream.

1. Rewrite FileStream in terms of the Morsel API
2. Add snapshot-driven tests to document the I/O and CPU patterns in FileStream
3. Add snapshot-based tests that show the ordering of files

## Are these changes tested?

Yes, by existing functional and benchmark tests, as well as new functional snapshot-based tests

## Are there any user-facing changes?

No (not yet)

---------

Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Apr 20, 2026
## Which issue does this PR close?

- Closes #20529
- Closes #20820

## Rationale for this change

This PR finally enables dynamic work scheduling in the FileStream (so that when a task finishes, it can pick up any remaining work).

This improves performance on queries that scan multiple files where the work is not balanced evenly across partitions in the plan (e.g. when dynamic filtering reduces the work significantly).

It is the last of a sequence of several PRs:
- #21342
- #21327
- #21340

## What changes are included in this PR?

1. Add shared state across sibling FileStreams and the wiring to connect them
2. Sibling streams put their file work into a shared queue when it can be reordered
3. Add a bunch of tests

Note there are a bunch of other things that are NOT included in this PR, including

1. Trying to limit concurrent IO (this PR has the same properties as main: up to one outstanding IO per partition)
2. Trying to issue multiple IOs by the same partition (i.e. to interleave IO and CPU work more)
3. Splitting files into smaller units (e.g. across row groups)

As @Dandandan proposes below, I expect we can work on those changes as follow-on PRs.

## Are these changes tested?

Yes, by existing functional and benchmark tests, as well as new functional tests

## Are there any user-facing changes?

Yes, faster performance (see benchmarks): #21351 (comment)

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
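The shared-queue idea described in that commit message can be illustrated with a small standard-library sketch. This is an assumption-laden toy, not the FileStream implementation: `SharedWorkQueue`, `next_file`, and `drain_with_workers` are hypothetical names, threads stand in for sibling streams, and "processing" a file is reduced to recording its name.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical queue of pending file names shared by sibling workers.
#[derive(Clone, Default)]
pub struct SharedWorkQueue {
    inner: Arc<Mutex<VecDeque<String>>>,
}

impl SharedWorkQueue {
    pub fn new(files: impl IntoIterator<Item = String>) -> Self {
        Self {
            inner: Arc::new(Mutex::new(files.into_iter().collect())),
        }
    }

    /// Take the next pending file, or `None` once all work is claimed.
    pub fn next_file(&self) -> Option<String> {
        self.inner.lock().unwrap().pop_front()
    }
}

/// Spawn `partitions` workers that pull from the same queue until it is
/// empty, and return the list of files each worker ended up handling.
pub fn drain_with_workers(queue: &SharedWorkQueue, partitions: usize) -> Vec<Vec<String>> {
    // Spawn every worker first so they run concurrently.
    let handles: Vec<_> = (0..partitions)
        .map(|_| {
            let queue = queue.clone();
            thread::spawn(move || {
                let mut handled = Vec::new();
                // A worker that finishes early simply asks for more work.
                while let Some(file) = queue.next_file() {
                    handled.push(file);
                }
                handled
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Because each worker claims files one at a time, a worker whose files turn out to be cheap naturally takes on more of them, which is the load-balancing effect the commit message describes. How files are split across workers in any given run is not deterministic.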
Which issue does this PR close?
Rationale for this change
@Dandandan, @adriangb and I are in the process of significantly reworking how FileStream works internally (morsels!)
As the FileStream gets more complicated, it needs a better structure than one giant module so that we can keep working with it. I am doing this as a series of incremental PRs to keep the review burden low.
What changes are included in this PR?
Are these changes tested?
Yes, by CI
Are there any user-facing changes?
No, just internal code motion