arrow-select: fuse inline BinaryView filter coalescing#9755
arrow-select: fuse inline BinaryView filter coalescing#9755ClSlaid wants to merge 2 commits intoapache:mainfrom
Conversation
Avoid materialising an intermediate filtered RecordBatch when coalescing sparse inline BinaryView batches, and reuse shared filter indices for primitive columns in mixed batches. This addresses issue apache#9143 and adds focused tests and benchmarks for single-column and mixed BinaryView filter paths. Signed-off-by: cl <cailue@apache.org>
|
/cc @alamb PTAL. |
|
run benchmark coalesce_kernels |
Split push_batch_with_filter into an explicit fused path classifier and dedicated inline BinaryView dispatch helpers, and use iterator-based extension in the primitive fast path. This keeps unsupported BinaryView layouts on the existing filter path while making the optimised routing for issue apache#9143 clearer to review and maintain. Signed-off-by: cl <cailue@apache.org>
|
Optimising BinaryView filter will give you a slight overall performance regression on that benchmark. @Dandandan 🫠 Warp up
I reran the relevant Results:
I also checked which path the benchmark is actually taking.
So the main improvement is in the fully inline |
| .as_primitive::<T>(); | ||
|
|
||
| if let Some(nulls) = s.nulls().filter(|nulls| nulls.null_count() > 0) { | ||
| for &idx in indices { |
There was a problem hiding this comment.
I think there should be a path based on iterator here as well?
|
In my earlier experiments, #8951 I also got good performance on my local machine but was regressing quite a bit on the shared benchmarks based on the @alamb runner (which was running on Intel CPUs). I got some improvements from increasing the selectivity threshold https://github.com/apache/arrow-rs/pull/8951/files#diff-49658ee26403ad8cc710620108072c017373bea671df8d2a83543b9de13d0e8aL42 and there are some optimizations we can do to the filter iterators to make them faster. |
Summary
BinaryViewfilter and coalescing paths so primitive columns and inline views can be appended directly without materialising an intermediate filteredRecordBatchBinaryViewinputsBinaryViewfilter cases related to#9143Verification
cargo test -p arrow-select coalesce --libcargo clippy -p arrow-select --lib --tests -- -D warningscargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warningscargo bench -p arrow --bench coalesce_kernels --features test_utils -- --noplot single_binaryviewcargo bench -p arrow --bench coalesce_kernels --features test_utils -- --noplot mixed_binaryviewBenchmark Results
Measured against a clean
origin/mainworktree with the sameBinaryViewbenchmark additions. The figures below compare representative median times from the baseline worktree and this branch.Mixed primitive + BinaryView
mixed_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.001:23.16 ms->8.51 msmixed_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01:2.37 ms->1.31 msmixed_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.001:31.70 ms->14.33 msmixed_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.01:3.92 ms->2.44 msSingle BinaryView
single_binaryview, 8192, nulls: 0, selectivity: 0.01:4.86 ms->4.90 ms(roughly flat, slightly slower)single_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.001:34.72 ms->19.33 mssingle_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01:3.46 ms->2.03 mssingle_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.01:5.93 ms->3.97 mssingle_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.8:597 µs->619 µs(regression)single_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.8:1.78 ms->1.79 ms(roughly flat, slightly slower)In short, this change substantially improves the mixed primitive + inline
BinaryViewpath that motivated#9143, while the single-columnBinaryViewbenchmarks still show trade-offs: sparse inline cases improve, but dense inline cases are slightly slower and the non-inline single-column path is effectively unchanged.Closes #9143.