feat(unparser): Keep inner join Filter → TableScan predicates to WHERE instead of moving to JOIN ON #21694
Open
sgrebnov wants to merge 3 commits into apache:main
Which issue does this PR close?
Partially addresses #13156 (inner joins only; outer joins require additional work)
When the DataFusion optimizer pushes filter predicates into `TableScan` nodes (e.g. via `FilterPushdown`), the unparser's `try_transform_to_simple_table_scan_with_filters` extracts those filters and then always folds them into the `JOIN ON` clause. This is problematic when the extracted filters contain subquery expressions (scalar subqueries, `IN`, `EXISTS`), because some SQL backends, notably BigQuery, reject subqueries inside `JOIN ON` predicates.
This currently breaks 5 TPC-H queries (Q2, Q16, Q17, Q18, Q21) when unparsed SQL is sent to BigQuery.
We previously attempted to fix this (#13496) by moving all filters to `WHERE`, which broke `LEFT`/`RIGHT`/`FULL` join semantics (moving a filter from `ON` to `WHERE` changes the result for outer joins, as demonstrated in #13132).
What changes are included in this PR?
For inner joins only, `table_scan_filters` extracted by `try_transform_to_simple_table_scan_with_filters` are now placed in the `WHERE` clause instead of the `JOIN ON` clause. This is safe because `ON` and `WHERE` are semantically equivalent for inner joins.
For non-inner joins (`LEFT`, `RIGHT`, `FULL`), the existing behavior is preserved: filters remain in `JOIN ON`, since moving them to `WHERE` would change query semantics.
Are these changes tested?
Yes.
`test_join_with_table_scan_filters`: constructs an inner join where the right side has a `table_scan_with_filters` containing a scalar subquery, and verifies that the subquery predicate appears in `WHERE`, not `JOIN ON`.
`test_join_with_table_scan_filters`: reflects that `table_scan_filters` now appear in `WHERE` for inner joins.
Are there any user-facing changes?
SQL generated by the unparser for inner joins may now place `TableScan` pushdown filters in the `WHERE` clause instead of the `JOIN ON` clause (similar to the changes in `test_join_with_table_scan_filters`).
Alternatives considered
An alternative approach considered is to introduce a `supports_subquery_in_join_predicate` dialect flag that only moves subquery-containing filters to `WHERE` when the dialect opts in (e.g. `BigQueryDialect`), preserving existing behavior for all other dialects.
Example implementation: spiceai#151
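To make the alternative concrete, here is a minimal, self-contained sketch of how such a flag could steer filter placement. It is not the code from spiceai#151 and does not use DataFusion's real types: `Predicate`, the `Dialect` trait shown here, `DefaultDialect`, `BigQueryLikeDialect`, and `split_filters` are all hypothetical stand-ins. The default value of the flag (`true`, meaning nothing moves) is an assumption, and outer-join handling is left out.

```rust
// Hypothetical model of the dialect-flag alternative.
// None of these types are DataFusion's real API.

/// Illustrative stand-in for a filter extracted from a TableScan.
#[derive(Debug, Clone, PartialEq)]
pub struct Predicate {
    pub sql: String,
    /// True if the expression contains a scalar, IN, or EXISTS subquery.
    pub has_subquery: bool,
}

/// Stand-in for an unparser dialect carrying the proposed flag.
pub trait Dialect {
    /// Assumed default: subqueries are accepted in JOIN ON, so nothing moves.
    fn supports_subquery_in_join_predicate(&self) -> bool {
        true
    }
}

/// Keeps the assumed default behavior.
pub struct DefaultDialect;
impl Dialect for DefaultDialect {}

/// A BigQuery-like dialect that rejects subqueries in JOIN ON.
pub struct BigQueryLikeDialect;
impl Dialect for BigQueryLikeDialect {
    fn supports_subquery_in_join_predicate(&self) -> bool {
        false
    }
}

/// Splits table-scan filters into (JOIN ON predicates, WHERE predicates).
/// Only subquery-containing filters move, and only when the dialect
/// reports that it cannot handle subqueries in JOIN ON.
pub fn split_filters(
    dialect: &dyn Dialect,
    filters: Vec<Predicate>,
) -> (Vec<Predicate>, Vec<Predicate>) {
    if dialect.supports_subquery_in_join_predicate() {
        return (filters, Vec::new());
    }
    // partition: predicates without a subquery stay in ON, the rest go to WHERE.
    filters.into_iter().partition(|p| !p.has_subquery)
}
```

Under this model, every dialect except the BigQuery-like one emits the same SQL as before, at the cost of an extra flag that each dialect has to get right.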
The current approach was chosen because filters in `WHERE` can trigger filter pushdown on the target backend, which is a potential performance win.
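The rule this PR adopts depends only on the join type, not on the dialect or on whether a filter contains a subquery. A minimal sketch of that rule follows, assuming hypothetical names (`JoinKind`, `place_table_scan_filters`) and plain strings in place of DataFusion expressions; it illustrates the decision described above, not the actual unparser code.

```rust
// Illustrative model only; these names are not DataFusion's real types.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum JoinKind {
    Inner,
    Left,
    Right,
    Full,
}

/// Returns (JOIN ON predicates, WHERE predicates) for one join.
/// `join_on` holds the join's own conditions; `table_scan_filters` holds
/// the filters extracted from the TableScan on one side of the join.
pub fn place_table_scan_filters(
    kind: JoinKind,
    mut join_on: Vec<String>,
    table_scan_filters: Vec<String>,
) -> (Vec<String>, Vec<String>) {
    let mut where_clause = Vec::new();
    match kind {
        // ON and WHERE are equivalent for inner joins, so the filters
        // can safely be emitted in WHERE.
        JoinKind::Inner => where_clause.extend(table_scan_filters),
        // For outer joins, moving a filter to WHERE would change the
        // result, so the filters stay in JOIN ON.
        JoinKind::Left | JoinKind::Right | JoinKind::Full => {
            join_on.extend(table_scan_filters)
        }
    }
    (join_on, where_clause)
}
```

Because the branch is on join type alone, all dialects get the same output for inner joins, which is why this variant needs no new dialect flag.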