bench: Scale sort benchmarks to 1M rows to exercise merge path#21630
Merged: mbutrovich merged 3 commits into apache:main on Apr 15, 2026
Conversation
…ently takes the <1MB sort in-place path of the existing ExternalSorter. We want to see larger cases too.
Dandandan approved these changes (Apr 14, 2026)
Contributor (Author): Running locally, each 10M iteration takes like 30 seconds. I have to do …
Contributor (Author): Let me think whether 1M would be a more reasonable size that still hits what we want.
Contributor: Sounds good - let's try that first?
Which issue does this PR close?
Rationale for this change
Current sort benchmarks use 100K rows across 8 partitions (~12.5K rows per partition, ~100 KB for integers). This falls below `sort_in_place_threshold_bytes` (1 MB), so the "sort partitioned" benchmarks always take the concat-and-sort-in-place path and never exercise the sort-then-merge path that dominates real workloads.
What changes are included in this PR?
This PR parameterizes the sort benchmark on input size, running each case at both 100K rows (existing) and 1M rows (new). At 1M rows, each partition holds ~125K rows (~1 MB for integers), which exercises the merge path.
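The sizing argument above can be checked with plain arithmetic. The sketch below is illustrative, not code from the PR: `INPUT_SIZES` mirrors the array described in the change list, while `PARTITIONS`, `approx_partition_bytes`, and the 1 MB constant (matching DataFusion's default `sort_in_place_threshold_bytes`) are assumptions made for the example:

```rust
// Illustrative sketch of the benchmark sizing argument (not the PR's code).
const PARTITIONS: usize = 8;
const BYTES_PER_I64: usize = 8;
const THRESHOLD_BYTES: usize = 1024 * 1024; // assumed ~1 MB in-place threshold

// Mirrors the (rows, label) pairs from the PR's INPUT_SIZES array.
const INPUT_SIZES: &[(usize, &str)] = &[(100_000, "100k"), (1_000_000, "1M")];

// Raw i64 data bytes held by one partition, ignoring batch overhead.
fn approx_partition_bytes(total_rows: usize) -> usize {
    (total_rows / PARTITIONS) * BYTES_PER_I64
}

fn main() {
    for &(rows, label) in INPUT_SIZES {
        let bytes = approx_partition_bytes(rows);
        println!("sort partitioned i64 {label}: ~{bytes} bytes per partition");
    }

    // 100K rows: 12_500 rows * 8 bytes = 100_000 bytes, far below 1 MB,
    // so the concat-and-sort-in-place path is always taken.
    assert!(approx_partition_bytes(100_000) < THRESHOLD_BYTES);

    // 1M rows: 125_000 rows * 8 bytes = 1_000_000 bytes of raw data; with
    // Arrow batch overhead the in-memory size crosses the threshold,
    // pushing the benchmark onto the sort-then-merge path.
    assert_eq!(approx_partition_bytes(1_000_000), 1_000_000);
}
```

This makes it clear why 100K rows could never reach the merge path: each partition carries roughly 100 KB, an order of magnitude under the threshold.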
- `INPUT_SIZE` constant replaced with an `INPUT_SIZES` array: `[(100_000, "100k"), (1_000_000, "1M")]`
- `DataGenerator` takes `input_size` as a constructor parameter
- Benchmark names now include the input size (e.g. `sort partitioned i64 100k`, `sort partitioned i64 1M`)
Are these changes tested?
Benchmark compiles and runs. No functional test changes.
Are there any user-facing changes?
No.