Adds dynamic filter support for NestedLoopJoinExec #21851

Open
SubhamSinghal wants to merge 2 commits into apache:main from SubhamSinghal:nlj-dynamic-filter

@SubhamSinghal
Contributor

Which issue does this PR close?

  • Closes #16973 (partial — adds dynamic filter support for NestedLoopJoinExec)

Rationale for this change

NestedLoopJoinExec handles non-equi joins (range, temporal, inequality) but currently reads ALL probe-side data even
when the build side has a narrow range. For example:

SELECT * FROM events e JOIN windows w ON e.ts BETWEEN w.start AND w.end

If the build side (windows) has start values in [100, 300] and end values in [150, 400], the probe scan reads all
events even though only events with ts in [100, 400] can possibly match. With dynamic filters, the probe scan can skip
row groups outside this range.
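
To make the example above concrete, here is a minimal sketch of the bound derivation and the row-group check, assuming integer timestamps. `ProbeBounds`, `RowGroupStats`, `probe_bounds`, and `can_skip` are hypothetical names for illustration only; they are not taken from this PR or from DataFusion.

```rust
/// Inclusive range of `ts` values that could satisfy
/// `ts BETWEEN w.start AND w.end` for at least one build-side row.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ProbeBounds {
    pub lower: i64, // min(w.start) over the build side
    pub upper: i64, // max(w.end) over the build side
}

/// Build-side rows are `(start, end)` pairs.
/// Returns `None` when the build side is empty.
pub fn probe_bounds(windows: &[(i64, i64)]) -> Option<ProbeBounds> {
    let lower = windows.iter().map(|w| w.0).min()?;
    let upper = windows.iter().map(|w| w.1).max()?;
    Some(ProbeBounds { lower, upper })
}

/// Min/max statistics for `ts` in one probe-side row group
/// (hypothetical stand-in for file-level statistics).
pub struct RowGroupStats {
    pub min_ts: i64,
    pub max_ts: i64,
}

/// A row group can be skipped when its [min, max] interval
/// does not overlap the derived probe bounds.
pub fn can_skip(stats: &RowGroupStats, bounds: &ProbeBounds) -> bool {
    stats.max_ts < bounds.lower || stats.min_ts > bounds.upper
}
```

With build rows `(100, 150)`, `(200, 250)`, and `(300, 400)`, the derived bounds are `[100, 400]`, so a row group whose `ts` values all lie below 100 or above 400 never needs to be read.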

HashJoinExec already supports dynamic filter pushdown for equi-joins. This PR extends the same mechanism to NLJ for
non-equi joins by analyzing the JoinFilter expression to derive bounds from build-side data.

What changes are included in this PR?

New: nlj_filter_analysis.rs — Expression analysis module that walks the JoinFilter expression tree to extract
(probe_col, operator, build_col) pairs and derive probe-side bounds:

  • extract_bound_pairs() — finds BinaryExpr comparisons between build and probe columns
  • compute_build_bounds() — computes min/max from the merged build batch
  • build_probe_predicate() — converts bounds into probe-side filter (e.g., ts >= 100 AND ts <= 400)
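
The analysis steps listed above can be illustrated on a toy expression tree. This is a sketch under stated assumptions, not the code in `nlj_filter_analysis.rs`: `Expr`, `Side`, `Op`, `BoundPair`, `collect_pairs`, and `probe_bound` are hypothetical and stand in for DataFusion's physical expression types. The sketch only descends through `AND`, since a bound found under an `OR` could not be applied as a conjunctive probe-side filter.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Side { Build, Probe }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op { Lt, LtEq, Gt, GtEq }

impl Op {
    /// Operator after swapping operands: `a < b` is the same as `b > a`.
    fn flip(self) -> Op {
        match self {
            Op::Lt => Op::Gt,
            Op::LtEq => Op::GtEq,
            Op::Gt => Op::Lt,
            Op::GtEq => Op::LtEq,
        }
    }
}

/// Toy filter expression (hypothetical, not DataFusion's `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column { side: Side, name: String },
    Compare { left: Box<Expr>, op: Op, right: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
}

/// One comparison, normalized to read `probe_col <op> build_col`.
#[derive(Debug, Clone, PartialEq)]
pub struct BoundPair { pub probe_col: String, pub op: Op, pub build_col: String }

/// Walk the conjunctions and collect probe/build column comparisons.
pub fn collect_pairs(expr: &Expr, out: &mut Vec<BoundPair>) {
    match expr {
        Expr::And(l, r) => {
            collect_pairs(l, out);
            collect_pairs(r, out);
        }
        Expr::Compare { left, op, right } => {
            if let (Expr::Column { side: ls, name: ln }, Expr::Column { side: rs, name: rn }) =
                (left.as_ref(), right.as_ref())
            {
                match (ls, rs) {
                    (Side::Probe, Side::Build) => out.push(BoundPair {
                        probe_col: ln.clone(), op: *op, build_col: rn.clone(),
                    }),
                    // Build column on the left: swap operands and flip the operator.
                    (Side::Build, Side::Probe) => out.push(BoundPair {
                        probe_col: rn.clone(), op: op.flip(), build_col: ln.clone(),
                    }),
                    _ => {} // same-side comparisons give no probe bound
                }
            }
        }
        Expr::Column { .. } => {}
    }
}

/// Render the probe-side bound implied by one pair, given the min and max
/// of the build column. `probe >= build` can hold for some build row only
/// if `probe >= min(build)`; `probe <= build` only if `probe <= max(build)`.
pub fn probe_bound(pair: &BoundPair, build_min: i64, build_max: i64) -> String {
    match pair.op {
        Op::GtEq => format!("{} >= {}", pair.probe_col, build_min),
        Op::Gt => format!("{} > {}", pair.probe_col, build_min),
        Op::LtEq => format!("{} <= {}", pair.probe_col, build_max),
        Op::Lt => format!("{} < {}", pair.probe_col, build_max),
    }
}
```

For the filter `ts >= start AND end >= ts`, the second comparison is normalized to `ts <= end`, and with `start` in [100, 300] and `end` in [150, 400] the two rendered bounds are `ts >= 100` and `ts <= 400`.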

Modified: nested_loop_join.rs:

  • NLJDynamicFilter struct holding DynamicFilterPhysicalExpr and extracted bound pairs
  • gather_filters_for_pushdown() — creates dynamic filter in Post phase, routes parent filters
  • handle_child_pushdown_result() — captures the filter Arc reference
  • handle_buffering_left() — after build data is ready, computes bounds and pushes filter before probe starts

Are these changes tested?

Yes, with unit tests (UT).

Are there any user-facing changes?

No.

@github-actions (bot) added the sqllogictest (SQL Logic Tests (.slt)) and physical-plan (Changes to the physical-plan crate) labels on Apr 25, 2026
@2010YOUY01
Contributor

Thank you—this is an exciting optimization!

I am working on general infrastructure for NLJ dynamic filters and custom build indices that could help simplify this implementation. Would you (and other reviewers) be open to waiting until I submit that PR in the next 1-2 weeks, so we can coordinate and collaborate on this? I'd appreciate any thoughts on this direction!

Here is the preview and WIP draft:

The core idea is that most specialized joins (e.g., Piecewise Merge Join, IEJoin, Spatial Join, Array Set Joins) follow a standard pattern:

  1. Buffer: Collect all build-side data.
  2. Probe: Iterate row-by-row.

Specialization typically only requires:

  • Custom Dynamic Filters: To reduce probe-side size (as seen in this PR).
  • Custom Indices: To accelerate the probing process.

Taking this PR as an example: beyond the dynamic filter it implements, if we know a window range has a fixed maximum span, we could sort the build side and use a custom index to accelerate the probe further. So I'm hoping to add a common trait to support both custom dynamic filters and custom runtime indices.
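
A rough sketch of the sorted-build-side idea, to show why a bounded span helps. `SortedWindows` is a hypothetical type, not something from this PR or the draft branch, and it measures the maximum span from the data rather than taking it as known up front. A window containing `ts` must have `start <= ts` and `start >= ts - max_span`, so two binary searches narrow the candidates before the `end` check.

```rust
/// Build-side windows sorted by `start` (hypothetical index type).
pub struct SortedWindows {
    windows: Vec<(i64, i64)>, // (start, end), sorted by start
    max_span: i64,            // largest `end - start`, clamped to be non-negative
}

impl SortedWindows {
    pub fn new(mut windows: Vec<(i64, i64)>) -> Self {
        windows.sort_by_key(|w| w.0);
        // Assumption for this sketch: the span is measured from the data.
        let max_span = windows.iter().map(|w| w.1 - w.0).max().unwrap_or(0).max(0);
        Self { windows, max_span }
    }

    /// All windows with `start <= ts <= end`.
    pub fn matches(&self, ts: i64) -> Vec<(i64, i64)> {
        // Windows starting before `ts - max_span` end before `ts`: skip them.
        let lo = self.windows.partition_point(|w| w.0 < ts - self.max_span);
        // Windows starting after `ts` cannot contain it either.
        let hi = self.windows.partition_point(|w| w.0 <= ts);
        self.windows[lo..hi]
            .iter()
            .copied()
            .filter(|w| w.1 >= ts) // confirm the upper end of each candidate
            .collect()
    }
}
```

Each probe row then costs two binary searches plus a scan of the candidate slice, instead of a pass over every buffered build row.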

Introducing a common extension point can make adding similar optimizations easier: each specialization only needs to implement a small trait that specifies how to build/probe the index and how to build dynamic filters, and we won't need to touch the join core state machine each time.
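
One possible shape for such an extension point, purely as an illustration: the trait, its method names, and `run_join` below are hypothetical and are not the API in the linked draft. Rows are simplified to `(start, end)` pairs on the build side and a single `ts` value on the probe side.

```rust
/// Hypothetical extension point; NOT the API from the linked draft.
pub trait JoinAccelerator {
    type Index;

    /// Called once, after the build side has been fully buffered.
    fn build_index(&self, build_rows: &[(i64, i64)]) -> Self::Index;

    /// Optional probe-side filter as an inclusive `(lower, upper)` range.
    fn dynamic_filter(&self, index: &Self::Index) -> Option<(i64, i64)>;

    /// Positions of build rows that match one probe value.
    fn probe(&self, index: &Self::Index, ts: i64) -> Vec<usize>;
}

/// Example specialization for `ts BETWEEN start AND end`.
pub struct RangeAccelerator;

impl JoinAccelerator for RangeAccelerator {
    type Index = Vec<(i64, i64)>;

    fn build_index(&self, build_rows: &[(i64, i64)]) -> Self::Index {
        build_rows.to_vec() // no real index here, just the buffered rows
    }

    fn dynamic_filter(&self, index: &Self::Index) -> Option<(i64, i64)> {
        let lower = index.iter().map(|w| w.0).min()?;
        let upper = index.iter().map(|w| w.1).max()?;
        Some((lower, upper))
    }

    fn probe(&self, index: &Self::Index, ts: i64) -> Vec<usize> {
        index
            .iter()
            .enumerate()
            .filter(|(_, w)| w.0 <= ts && ts <= w.1)
            .map(|(i, _)| i)
            .collect()
    }
}

/// Generic join core: buffer, derive the filter, then probe row by row.
/// Returns `(build_row, probe_row)` index pairs.
pub fn run_join<A: JoinAccelerator>(
    acc: &A,
    build: &[(i64, i64)],
    probe: &[i64],
) -> Vec<(usize, usize)> {
    let index = acc.build_index(build);
    let filter = acc.dynamic_filter(&index);
    let mut out = Vec::new();
    for (p, &ts) in probe.iter().enumerate() {
        if let Some((lo, hi)) = filter {
            if ts < lo || ts > hi {
                continue; // pruned by the dynamic filter
            }
        }
        for b in acc.probe(&index, ts) {
            out.push((b, p));
        }
    }
    out
}
```

In this shape `run_join` never changes; a new specialization only supplies another `JoinAccelerator` implementation.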

I have a WIP draft of this infrastructure here (only the refactor and the rough API shape are done; I'm still working on adding an example implementation for both custom index and dynamic filter):
https://github.com/2010YOUY01/arrow-datafusion/tree/join-accelerator

Development

Successfully merging this pull request may close these issues.

Join DynamicFilter enhancements

2 participants