Adds dynamic filter support for NestedLoopJoinExec #21851
SubhamSinghal wants to merge 2 commits into apache:main
Thank you, this is an exciting optimization! I am working on general infrastructure for NLJ dynamic filters and custom build-side indexes that could help simplify this implementation. Would you (and other reviewers) be open to waiting until I submit that PR in the next 1-2 weeks, so we can coordinate and collaborate on this? I'd appreciate any thoughts on this direction! Here is the preview and WIP draft:
The core idea is that most specialized joins (e.g., Piecewise Merge Join, IEJoin, Spatial Join, Array Set Joins) follow a standard pattern:
Specialization typically only requires:
Taking this PR as an example: beyond the dynamic filter it implements, if we know a window range has a fixed maximum span, we could sort the build side and use a custom index to accelerate the probe further. So I'm hoping to add a common trait that supports both a custom dynamic filter and a custom runtime index. Introducing a common extension point can make similar optimizations easier to add: each specialization only needs to implement a small trait that specifies how to build and probe the index and how to build dynamic filters, and we won't need to touch the join core state machine each time. I have a WIP draft of this infrastructure here (only the refactor and the rough API shape are done; I am still working on an example implementation for both the custom index and the dynamic filter):
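To make the shape of such an extension point concrete, here is a rough, std-only sketch. Everything in it is hypothetical: the trait `JoinSpecialization`, the struct `SortedWindows`, the `max_span` field, and the `(start, end)` row representation are illustrative assumptions, not the API of the WIP draft or of DataFusion. It assumes a filter of the form `start <= ts AND ts <= end` with inclusive bounds.

```rust
/// Hypothetical extension point (an assumption for illustration only,
/// not an existing DataFusion trait).
trait JoinSpecialization {
    type Index;
    /// Build a runtime index from the collected build-side rows.
    fn build_index(&self, build_rows: &[(i64, i64)]) -> Self::Index;
    /// Return the original positions of build rows matching one probe value.
    fn probe(&self, index: &Self::Index, probe_value: i64) -> Vec<usize>;
    /// Derive inclusive (min, max) bounds that a probe scan could use.
    fn dynamic_filter(&self, build_rows: &[(i64, i64)]) -> Option<(i64, i64)>;
}

/// Specialization for `start <= ts AND ts <= end`, assuming every window
/// satisfies `end - start <= max_span`.
struct SortedWindows {
    max_span: i64,
}

impl JoinSpecialization for SortedWindows {
    /// (start, end, original row position), sorted by start.
    type Index = Vec<(i64, i64, usize)>;

    fn build_index(&self, build_rows: &[(i64, i64)]) -> Self::Index {
        let mut index: Vec<(i64, i64, usize)> = build_rows
            .iter()
            .enumerate()
            .map(|(i, &(start, end))| (start, end, i))
            .collect();
        index.sort_by_key(|&(start, _, _)| start);
        index
    }

    fn probe(&self, index: &Self::Index, probe_value: i64) -> Vec<usize> {
        // Only windows with start in [probe_value - max_span, probe_value]
        // can contain probe_value, so binary-search that slice.
        let lowest_start = probe_value.saturating_sub(self.max_span);
        let lo = index.partition_point(|&(start, _, _)| start < lowest_start);
        let hi = index.partition_point(|&(start, _, _)| start <= probe_value);
        index[lo..hi]
            .iter()
            .filter(|&&(_, end, _)| probe_value <= end)
            .map(|&(_, _, row)| row)
            .collect()
    }

    fn dynamic_filter(&self, build_rows: &[(i64, i64)]) -> Option<(i64, i64)> {
        let min = build_rows.iter().map(|row| row.0).min()?;
        let max = build_rows.iter().map(|row| row.1).max()?;
        Some((min, max))
    }
}
```

In this sketch the join core would only call the three trait methods, so a new specialization would not need to change the state machine; whether the real draft is organized this way is not stated here.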
Which issue does this PR close?
Rationale for this change
NestedLoopJoinExec handles non-equi joins (range, temporal, inequality) but currently reads ALL probe-side data even
when the build side has a narrow range. For example:
If the build side (windows) has start values in [100, 300] and end values in [150, 400], the probe scan reads all
events even though only events with ts in [100, 400] can possibly match. With dynamic filters, the probe scan can skip
row groups outside this range.
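The range reasoning above can be sketched in a few lines of std-only Rust. This is an illustration under assumptions, not the PR's code: it assumes a join filter of the form `ts >= start AND ts <= end` with inclusive bounds, and the names `Range`, `probe_bounds`, and `can_skip` are hypothetical.

```rust
/// Closed interval [min, max] over i64 values (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Range {
    min: i64,
    max: i64,
}

/// Derive the probe-side range for `ts` from the collected build side,
/// assuming the filter is `ts >= start AND ts <= end`.
/// Returns None for an empty build side, where no probe row can match.
fn probe_bounds(windows: &[(i64, i64)]) -> Option<Range> {
    let min = windows.iter().map(|&(start, _)| start).min()?;
    let max = windows.iter().map(|&(_, end)| end).max()?;
    Some(Range { min, max })
}

/// A row group can be skipped when its min/max statistics for `ts`
/// do not overlap the derived bounds.
fn can_skip(row_group: Range, bounds: Range) -> bool {
    row_group.max < bounds.min || row_group.min > bounds.max
}
```

With windows whose starts fall in [100, 300] and ends in [150, 400], `probe_bounds` yields [100, 400], and a row group whose `ts` statistics lie entirely below 100 or above 400 is skippable.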
HashJoinExec already supports dynamic filter pushdown for equi-joins. This PR extends the same mechanism to NLJ for
non-equi joins by analyzing the JoinFilter expression to derive bounds from build-side data.
What changes are included in this PR?
New: nlj_filter_analysis.rs — Expression analysis module that walks the JoinFilter expression tree to extract
(probe_col, operator, build_col) pairs and derive probe-side bounds:
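As a rough illustration of this kind of analysis (not the contents of nlj_filter_analysis.rs), the sketch below walks a toy expression tree and collects (probe_col, operator, build_col) triples. The `Expr`, `Side`, and `CmpOp` types and the helpers `col` and `cmp` are hypothetical stand-ins for DataFusion's physical expressions. It only descends through `AND`, since a comparison under any other node does not constrain every matching row.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Side {
    Probe,
    Build,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum CmpOp {
    Lt,
    LtEq,
    Gt,
    GtEq,
}

impl CmpOp {
    /// Operator obtained by swapping the operands: `a < b` becomes `b > a`.
    fn flip(self) -> CmpOp {
        match self {
            CmpOp::Lt => CmpOp::Gt,
            CmpOp::LtEq => CmpOp::GtEq,
            CmpOp::Gt => CmpOp::Lt,
            CmpOp::GtEq => CmpOp::LtEq,
        }
    }
}

/// Toy expression tree (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { side: Side, name: String },
    Compare { left: Box<Expr>, op: CmpOp, right: Box<Expr> },
    And(Box<Expr>, Box<Expr>),
    /// Any expression this sketch does not analyze (OR, arithmetic, ...).
    Other,
}

fn col(side: Side, name: &str) -> Expr {
    Expr::Column { side, name: name.to_string() }
}

fn cmp(left: Expr, op: CmpOp, right: Expr) -> Expr {
    Expr::Compare { left: Box::new(left), op, right: Box::new(right) }
}

/// Collect (probe_col, op, build_col) triples, normalized so the probe
/// column is always on the left-hand side of the operator.
fn extract_pairs(expr: &Expr, out: &mut Vec<(String, CmpOp, String)>) {
    match expr {
        Expr::And(left, right) => {
            extract_pairs(left, out);
            extract_pairs(right, out);
        }
        Expr::Compare { left, op, right } => match (left.as_ref(), right.as_ref()) {
            (
                Expr::Column { side: Side::Probe, name: probe },
                Expr::Column { side: Side::Build, name: build },
            ) => out.push((probe.clone(), *op, build.clone())),
            (
                Expr::Column { side: Side::Build, name: build },
                Expr::Column { side: Side::Probe, name: probe },
            ) => out.push((probe.clone(), op.flip(), build.clone())),
            _ => {}
        },
        // Anything else is ignored, so no bound is derived from it.
        _ => {}
    }
}
```

For a filter such as `ts >= start AND end >= ts`, this yields `(ts, GtEq, start)` and `(ts, LtEq, end)`; a lower bound on `ts` then follows from the minimum of `start` and an upper bound from the maximum of `end`.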
Modified: nested_loop_join.rs:
Are these changes tested?
Yes, with unit tests.
Are there any user-facing changes?
No.