perf: Optimize dynamic IN list evaluation with vectorized Arrow eq kernel by zhangxffff · Pull Request #20428 · apache/datafusion

zhangxffff · 2026-02-19T08:48:24Z

Which issue does this PR close?

Closes #20427 (Optimize IN list with columns evaluation with vectorized Arrow eq kernel).

Rationale for this change

The dynamic IN list evaluation path in InListExpr::evaluate() is taken when the list contains non-constant expressions such as column references (e.g., a IN (b, c, d)). It currently compares values row by row via make_comparator, bypassing Arrow's vectorized SIMD kernels. This makes it orders of magnitude slower than the static filter path (HashSet) used for constant literals.
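To illustrate why the two paths differ, here is a minimal standard-library sketch. It is not DataFusion code: the function names `in_list_static` and `in_list_dynamic_rows` are hypothetical, and nulls are ignored for brevity. With constant literals a single set can be built once and probed for every row; with column references the candidates change per row, so a prebuilt set does not apply.

```rust
use std::collections::HashSet;

/// Model of the static path: the list is constant, so one set is built
/// up front and every row is a single hash probe.
fn in_list_static(value: &[i32], literals: &[i32]) -> Vec<bool> {
    let set: HashSet<i32> = literals.iter().copied().collect();
    value.iter().map(|v| set.contains(v)).collect()
}

/// Model of the dynamic path: each list item is itself a column, so the
/// candidates for row `r` are `list[0][r]`, `list[1][r]`, and so on.
/// This row-at-a-time loop mirrors the shape of the slow path described above.
fn in_list_dynamic_rows(value: &[i32], list: &[&[i32]]) -> Vec<bool> {
    (0..value.len())
        .map(|row| list.iter().any(|col| col[row] == value[row]))
        .collect()
}
```

For example, with value = [1, 5, 3], `a IN (1, 2, 3)` probes the same three literals for every row, while `a IN (b, c)` compares row 0 against b[0] and c[0], row 1 against b[1] and c[1], and so on.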

What changes are included in this PR?

In the dynamic IN list path (None => branch in InListExpr::evaluate()):

Vectorized comparison: Use arrow::compute::kernels::cmp::eq instead of per-row make_comparator for non-nested types (primitives, strings, binary, dictionary). For nested types (Struct, List, Map), fall back to make_comparator since arrow_eq intentionally rejects them due to ambiguous null semantics.
Short-circuit with break: Replace try_fold with an explicit for loop. Check found.true_count() == num_rows before each iteration and break as soon as every row has matched, which skips both evaluate() and the comparison for the remaining list items.
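The two changes above can be sketched together with the standard library alone. This is an illustrative model, not the PR's code: columns are `Option<i32>` slices instead of Arrow arrays, `eq_column`, `or_kleene`, and `in_list_columnar` are hypothetical names, and combining results with a Kleene (three-valued) OR is an assumption about how per-item results are merged. The second return value counts how many list items were actually compared, to make the early break visible.

```rust
/// Whole-column equality: a null on either side yields null.
fn eq_column(lhs: &[Option<i32>], rhs: &[Option<i32>]) -> Vec<Option<bool>> {
    lhs.iter()
        .zip(rhs)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(l == r),
            _ => None,
        })
        .collect()
}

/// Three-valued OR: true wins over null, null wins over false.
fn or_kleene(a: &[Option<bool>], b: &[Option<bool>]) -> Vec<Option<bool>> {
    a.iter()
        .zip(b)
        .map(|(a, b)| match (a, b) {
            (Some(true), _) | (_, Some(true)) => Some(true),
            (Some(false), Some(false)) => Some(false),
            _ => None,
        })
        .collect()
}

/// Evaluate `value IN (list[0], list[1], ...)` one list column at a time.
/// Returns the result and the number of list columns that were compared.
fn in_list_columnar(
    value: &[Option<i32>],
    list: &[Vec<Option<i32>>],
) -> (Vec<Option<bool>>, usize) {
    let num_rows = value.len();
    let mut found: Vec<Option<bool>> = vec![Some(false); num_rows];
    let mut compared = 0;
    for column in list {
        // Short-circuit: once every row is true, no later item can change the result.
        let true_count = found.iter().filter(|v| **v == Some(true)).count();
        if true_count == num_rows {
            break;
        }
        found = or_kleene(&found, &eq_column(value, column));
        compared += 1;
    }
    (found, compared)
}
```

Note that a row holding a null can never become true, so under this model the break only fires when the batch has no null results, which is consistent with the nulls=20% benchmark rows showing no early-termination benefit.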

Are these changes tested?

Yes. This PR adds 7 test cases covering the dynamic IN path, plus new criterion benchmarks (bench_dynamic_int32, bench_dynamic_utf8).

Benchmarked with cargo bench --bench in_list -- "in_list_dynamic" on 8192-row batches. Int32 improved 11–320x and Utf8 improved 2–10x. The largest Int32 gains are at match=100% with nulls=0%, where early termination applies; the largest Utf8 gains are at match=0%.

(zhangxffff) zhangxffff@95d3d60664da ~/W/datafusion (feat/opt_dynamic_in)> critcmp after before
group                                                 after                                  before
-----                                                 -----                                  ------
in_list_dynamic/Int32/list=28/match=0%/nulls=0%       1.00     92.8±0.87µs        ? ?/sec    11.80 1094.4±21.14µs        ? ?/sec
in_list_dynamic/Int32/list=28/match=0%/nulls=20%      1.00    107.9±0.90µs        ? ?/sec    16.57 1788.5±12.38µs        ? ?/sec
in_list_dynamic/Int32/list=28/match=100%/nulls=0%     1.00      3.5±0.04µs        ? ?/sec    319.51 1116.0±21.26µs        ? ?/sec
in_list_dynamic/Int32/list=28/match=100%/nulls=20%    1.00    107.3±1.97µs        ? ?/sec    16.63 1783.9±15.31µs        ? ?/sec
in_list_dynamic/Int32/list=28/match=50%/nulls=0%      1.00     49.6±0.31µs        ? ?/sec    32.46 1610.0±27.97µs        ? ?/sec
in_list_dynamic/Int32/list=28/match=50%/nulls=20%     1.00    107.2±1.15µs        ? ?/sec    19.54     2.1±0.01ms        ? ?/sec
in_list_dynamic/Int32/list=3/match=0%/nulls=0%        1.00     10.1±0.07µs        ? ?/sec    11.61   116.9±1.61µs        ? ?/sec
in_list_dynamic/Int32/list=3/match=0%/nulls=20%       1.00     11.4±0.11µs        ? ?/sec    15.09   172.6±0.97µs        ? ?/sec
in_list_dynamic/Int32/list=3/match=100%/nulls=0%      1.00      3.5±0.02µs        ? ?/sec    33.76   119.7±2.75µs        ? ?/sec
in_list_dynamic/Int32/list=3/match=100%/nulls=20%     1.00     11.4±0.16µs        ? ?/sec    15.12   173.1±0.97µs        ? ?/sec
in_list_dynamic/Int32/list=3/match=50%/nulls=0%       1.00     10.0±0.08µs        ? ?/sec    16.92   170.0±4.09µs        ? ?/sec
in_list_dynamic/Int32/list=3/match=50%/nulls=20%      1.00     11.6±0.19µs        ? ?/sec    17.49   202.5±2.60µs        ? ?/sec
in_list_dynamic/Int32/list=8/match=0%/nulls=0%        1.00     26.6±0.45µs        ? ?/sec    11.73   312.5±2.88µs        ? ?/sec
in_list_dynamic/Int32/list=8/match=0%/nulls=20%       1.00     30.4±0.43µs        ? ?/sec    16.23   493.6±3.45µs        ? ?/sec
in_list_dynamic/Int32/list=8/match=100%/nulls=0%      1.00      3.5±0.03µs        ? ?/sec    90.16   313.6±6.51µs        ? ?/sec
in_list_dynamic/Int32/list=8/match=100%/nulls=20%     1.00     31.2±0.23µs        ? ?/sec    16.04   499.9±4.44µs        ? ?/sec
in_list_dynamic/Int32/list=8/match=50%/nulls=0%       1.00     26.6±0.17µs        ? ?/sec    17.07   453.4±8.14µs        ? ?/sec
in_list_dynamic/Int32/list=8/match=50%/nulls=20%      1.00     30.8±0.37µs        ? ?/sec    19.16   590.6±2.55µs        ? ?/sec
in_list_dynamic/Utf8/list=28/match=0%                 1.00    149.9±2.70µs        ? ?/sec    10.51  1574.8±9.97µs        ? ?/sec
in_list_dynamic/Utf8/list=28/match=100%               1.00    725.9±9.09µs        ? ?/sec    2.13   1548.3±7.20µs        ? ?/sec
in_list_dynamic/Utf8/list=28/match=50%                1.00   1070.0±7.65µs        ? ?/sec    2.93      3.1±0.02ms        ? ?/sec
in_list_dynamic/Utf8/list=3/match=0%                  1.00     16.1±0.17µs        ? ?/sec    10.07   162.4±1.62µs        ? ?/sec
in_list_dynamic/Utf8/list=3/match=100%                1.00     65.8±1.22µs        ? ?/sec    2.39    157.1±1.79µs        ? ?/sec
in_list_dynamic/Utf8/list=3/match=50%                 1.00     93.4±2.15µs        ? ?/sec    3.32    309.6±1.23µs        ? ?/sec
in_list_dynamic/Utf8/list=8/match=0%                  1.00     43.0±0.29µs        ? ?/sec    10.37   445.6±3.09µs        ? ?/sec
in_list_dynamic/Utf8/list=8/match=100%                1.00    197.5±0.93µs        ? ?/sec    2.21    436.9±2.10µs        ? ?/sec
in_list_dynamic/Utf8/list=8/match=50%                 1.00    296.8±2.36µs        ? ?/sec    2.97    880.5±6.52µs        ? ?/sec

Are there any user-facing changes?

No.

adriangb · 2026-02-19T13:17:12Z

> Yes. Add 7 testcase to cover dynamic in path. Also add New criterion benchmarks (bench_dynamic_int32, bench_dynamic_utf8).

Could you make a PR adding just the benchmarks first? That way we can merge that and then show a clear improvement on this PR

adriangb · 2026-02-19T13:18:07Z

It would also be nice to see these seemingly unrelated changes (use kernel and break early) as separate PRs

zhangxffff · 2026-02-20T07:01:17Z

> > Yes. Add 7 testcase to cover dynamic in path. Also add New criterion benchmarks (bench_dynamic_int32, bench_dynamic_utf8).
>
> Could you make a PR adding just the benchmarks first? That way we can merge that and then show a clear improvement on this PR

Thank you so much for your review!

I’ve opened a separate PR (#20444) focused on adding the benchmarks first, and would greatly appreciate it if you could take a look at this code when you have some time.

I will follow up with further optimizations separately after #20444 is merged.

Dandandan · 2026-02-20T07:42:25Z

datafusion/physical-expr/src/expressions/in_list.rs

-                        let rhs = match expr? {
-                            ColumnarValue::Array(array) => {
+                let use_arrow_eq = !value.data_type().is_nested();
+                let mut found =


Instead of allocating the initial result as all-false (i.e. an extra allocation / memset), it could be initialized separately from the first list item.
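One way to read this suggestion, sketched with the standard library only: let the first comparison result become the accumulator, and OR the remaining items into it. The names `eq_column` and `in_list_seeded` are hypothetical, nulls are ignored, and this is an interpretation of the comment rather than the code that ended up in the PR.

```rust
/// Element-wise equality of two equally long columns (nulls ignored for brevity).
fn eq_column(value: &[i32], item: &[i32]) -> Vec<bool> {
    value.iter().zip(item).map(|(v, i)| v == i).collect()
}

/// `value IN (list...)` where the accumulator is seeded by the first comparison
/// instead of being allocated as all-false and then OR-ed with it.
fn in_list_seeded(value: &[i32], list: &[Vec<i32>]) -> Vec<bool> {
    let mut items = list.iter();
    let mut found = match items.next() {
        // The first comparison result is the initial accumulator.
        Some(first) => eq_column(value, first),
        // An empty list matches nothing; only here is an all-false buffer needed.
        None => return vec![false; value.len()],
    };
    for item in items {
        // Fold the remaining items into the existing buffer in place.
        for (f, (v, i)) in found.iter_mut().zip(value.iter().zip(item)) {
            *f |= v == i;
        }
    }
    found
}
```

Compared with starting from an all-false buffer, this saves one allocation, one fill, and one OR pass per batch.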

Dandandan · 2026-02-20T07:43:17Z

datafusion/physical-expr/src/expressions/in_list.rs

+                                    array.as_ref(),
+                                    SortOptions::default(),
+                                )?;
+                                (0..num_rows)


BooleanBuffer::collect_bool + cloning the original null buffer is much faster.
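The idea behind this comment can be modeled without Arrow: build the value bits directly from an index-based closure, and reuse an existing validity mask instead of producing an `Option<bool>` per row. The sketch below does not call `BooleanBuffer::collect_bool`; its `collect_bool`, `BoolColumn`, and `eq_rows` are hypothetical stand-ins, and which null buffer the reviewer means is not stated, so the caller supplies the mask to reuse.

```rust
/// Pack `len` booleans produced by `f` into 64-bit words, least significant bit first.
fn collect_bool(len: usize, f: impl Fn(usize) -> bool) -> Vec<u64> {
    let mut words = vec![0u64; (len + 63) / 64];
    for i in 0..len {
        if f(i) {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Read bit `i` from a packed bitmap.
fn bit(words: &[u64], i: usize) -> bool {
    ((words[i / 64] >> (i % 64)) & 1) == 1
}

/// A bit-packed boolean column with an optional validity mask (set bit = non-null).
struct BoolColumn {
    values: Vec<u64>,
    validity: Option<Vec<u64>>,
}

impl BoolColumn {
    fn get(&self, i: usize) -> Option<bool> {
        match &self.validity {
            Some(v) if !bit(v, i) => None,
            _ => Some(bit(&self.values, i)),
        }
    }
}

/// Compare two columns: value bits come from `collect_bool`, and the validity
/// mask is copied from the input rather than recomputed per row.
fn eq_rows(lhs: &[i32], rhs: &[i32], validity: Option<&[u64]>) -> BoolColumn {
    BoolColumn {
        values: collect_bool(lhs.len(), |i| lhs[i] == rhs[i]),
        validity: validity.map(|v| v.to_vec()),
    }
}
```

In this model the validity mask is copied; in Arrow, cloning a null buffer is a reference-count increment, so the real version of this step is cheaper than the sketch suggests.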

github-actions bot added the physical-expr Changes to the physical-expr crates label Feb 19, 2026

zhangxffff marked this pull request as draft February 19, 2026 08:52

zhangxffff marked this pull request as ready for review February 19, 2026 09:43

zhangxffff added 4 commits February 19, 2026 09:44

perf: Vectorize IN list evaluation with arrow eq kernel

7e16978

add dynamic in list test

93daecc

fix cargo fmt and clippy

9c2fe21

remove unused match_percent

61af0e2

zhangxffff force-pushed the feat/opt_dynamic_in branch from 6a309d5 to 61af0e2 Compare February 19, 2026 10:09

zhangxffff mentioned this pull request Feb 20, 2026

bench: Add IN list benchmarks for non-constant list expressions #20444

Open

Dandandan reviewed Feb 20, 2026

zhangxffff wants to merge 4 commits into apache:main from zhangxffff:feat/opt_dynamic_in
