Skip to content

feat: support TimestampType join keys in SortMergeJoin#3986

Open
andygrove wants to merge 5 commits intoapache:mainfrom
andygrove:feat/smj-timestamp-keys
Open

feat: support TimestampType join keys in SortMergeJoin#3986
andygrove wants to merge 5 commits intoapache:mainfrom
andygrove:feat/smj-timestamp-keys

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #3985.

Rationale for this change

CometSortMergeJoinExec rejected TimestampType join keys via supportedSortMergeJoinEqualType, forcing a fall-back to Spark whenever a SortMergeJoin keyed on a timestamp column. The restriction was historical — TimestampNTZType was already allowed but TimestampType had never been enabled.

Naively flipping the guard surfaces an upstream limitation: DataFusion's SMJ comparator (joins/sort_merge_join/stream.rs is_join_arrays_equal, joins/utils.rs compare_join_arrays) only pattern-matches Timestamp(_, None). Spark's TimestampType serializes to Timestamp(µs, "UTC"), hitting the fallthrough not_impl_err!. Because Spark stores TimestampType as UTC-normalized microseconds, casting Timestamp(µs, Some(_))Timestamp(µs, None) is a metadata-only operation that preserves both equality and row order, so it is safe to apply at the join-key level.

What changes are included in this PR?

  • JVM guard: add TimestampType to supportedSortMergeJoinEqualType in operators.scala.
  • Native planner workaround: in the OpStruct::SortMergeJoin arm of planner.rs, introduce a strip_timestamp_tz helper that wraps any Timestamp(_, Some(_)) join-key expression in a CastExpr to Timestamp(_, None) before handing the keys to SortMergeJoinExec::try_new. sort_options and the child plans are untouched.
  • Tests: replace the obsolete "falls back to Spark" test in CometJoinSuite with four positive tests exercising the new path:
    • Inner join with Asia/Kathmandu session timezone.
    • LEFT OUTER, RIGHT OUTER, FULL OUTER variants.
    • Composite (string, timestamp) join key.
    • Nullable timestamp keys (both inner and full-outer variants).

How are these changes tested?

All four new tests use checkSparkAnswerAndOperator with classOf[CometSortMergeJoinExec] to assert both correctness against Spark and native execution. The full CometJoinSuite passes (13/13). cargo clippy --all-targets --workspace -- -D warnings is clean and make format reports no changes.

@andygrove andygrove marked this pull request as ready for review April 18, 2026 00:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add SMJ support for Timestamp join keys

1 participant