Skip to content

feat(unparser): Keep inner join Filter → TableScan predicates to WHERE instead of moving to JOIN ON#21694

Open
sgrebnov wants to merge 3 commits intoapache:mainfrom
spiceai:sgrebnov/0417-inner-join-filter
Open

feat(unparser): Keep inner join Filter → TableScan predicates to WHERE instead of moving to JOIN ON#21694
sgrebnov wants to merge 3 commits intoapache:mainfrom
spiceai:sgrebnov/0417-inner-join-filter

Conversation

@sgrebnov
Copy link
Copy Markdown
Member

Which issue does this PR close?

Partially addresses #13156 (inner joins only; outer joins require additional work)

When the DataFusion optimizer pushes filter predicates into TableScan nodes (e.g. via FilterPushdown), the unparser's try_transform_to_simple_table_scan_with_filters extracts those filters and then always folds them into the JOIN ON clause. This is problematic when the extracted filters contain subquery expressions (scalar subqueries, IN, EXISTS), because some SQL backends — notably BigQuery — reject subqueries inside JOIN ON predicates.

This currently breaks 5 TPC-H queries (Q2, Q16, Q17, Q18, Q21) when unparsed SQL is sent to BigQuery.

We did attempt to fix this (#13496) by moving all filters to WHERE, which broke LEFT/RIGHT/FULL join semantics (moving a filter from ON to WHERE changes the result for outer joins, as demonstrated in #13132).

What changes are included in this PR?

For inner joins only, table_scan_filters extracted by try_transform_to_simple_table_scan_with_filters are now placed in the WHERE clause instead of the JOIN ON clause. This is safe because ON and WHERE are semantically equivalent for inner joins.

For non-inner joins (LEFT, RIGHT, FULL), the existing behavior is preserved — filters remain in JOIN ON — since moving them to WHERE would change query semantics.

Are these changes tested?

Yes.

  • Added a test case in test_join_with_table_scan_filters that constructs an inner join where the right side has a table_scan_with_filters containing a scalar subquery. Verifies the subquery predicate appears in WHERE, not JOIN ON.
  • Updated existing snapshots in test_join_with_table_scan_filters reflecting that table_scan_filters now appear in WHERE for inner joins.

Are there any user-facing changes?

SQL generated by the unparser for inner joins may now place TableScan pushdown filters in the WHERE clause instead of the JOIN ON clause (similar to changes in test_join_with_table_scan_filters)

Alternatives considered

An alternative approach considered is to introduce a supports_subquery_in_join_predicate dialect flag that only moves subquery-containing filters to WHERE when the dialect opts in (e.g. BigQueryDialect), preserving existing behavior for all other dialects.

Example implementation: spiceai#151

Current approach was chosen due to

  • simplicity: no new dialect flag, fewer code paths, simple change.
  • performance (potentially): as noted in #13156, placing filters in WHERE can trigger filter pushdown on the target backend, which is a potential performance win.

@github-actions github-actions bot added the sql SQL Planner label Apr 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant