chore: Improve shuffle fallback logic #3989

Open
andygrove wants to merge 3 commits into apache:main from andygrove:refactor-shuffle-fallback-coordinator
@andygrove andygrove commented Apr 18, 2026

Which issue does this PR close?

Closes #3984.

Rationale for this change

What changes are included in this PR?

  • Remove `STAGE_FALLBACK_TAG`, which duplicated the existing `withInfo` mechanism for tagging plans with fallback reasons
  • Improve the shuffle serde logic so the plan is not tagged until both the native and the columnar compatibility checks have run
  • Improve the documentation on the `withInfo` methods

How are these changes tested?

Existing tests, especially those added in #3982.

Replace the separate STAGE_FALLBACK_TAG with explain-info-based stickiness.
Shuffle path checks (`nativeShuffleFailureReasons`, `columnarShuffleFailureReasons`)
are now pure and return reasons instead of tagging eagerly. A new
`shuffleSupported` coordinator short-circuits on `hasExplainInfo`, tries native
then columnar, and tags via `withInfos` only on total failure. DPP fallback,
which disqualifies both paths, moves into the coordinator. This removes the
need for `CometFallback` and eliminates the semantic split where `withInfo`
could fire for a path-specific failure while the node still converted via a
different path.
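
The coordinator flow described above can be sketched roughly as follows. This is a simplified, self-contained model and not the code in this PR: `PlanNode`, the `usesDpp` flag, the function-typed parameters, and the DPP reason string are hypothetical stand-ins, and only the names `shuffleSupported`, `hasExplainInfo`, `withInfos`, `nativeShuffleFailureReasons`, and `columnarShuffleFailureReasons` come from the commit message.

```scala
object ShuffleFallbackSketch {

  // Hypothetical stand-in for a plan node; it only models the recorded fallback reasons.
  final class PlanNode(val name: String) {
    private var reasons: Set[String] = Set.empty
    def hasExplainInfo: Boolean = reasons.nonEmpty
    def withInfos(rs: Set[String]): Unit = { reasons ++= rs }
    def explainInfo: Set[String] = reasons
  }

  // The two path checks are passed in as pure functions: they return
  // failure reasons and never tag the node themselves.
  def shuffleSupported(
      node: PlanNode,
      usesDpp: Boolean,
      nativeShuffleFailureReasons: PlanNode => Seq[String],
      columnarShuffleFailureReasons: PlanNode => Seq[String]): Boolean = {
    if (node.hasExplainInfo) {
      // Sticky: a node that already carries fallback reasons stays on Spark.
      false
    } else if (usesDpp) {
      // DPP disqualifies both paths, so it is handled once, in the coordinator.
      // The reason text here is a placeholder, not the real message.
      node.withInfos(Set("DPP is not supported (placeholder reason)"))
      false
    } else {
      val nativeReasons = nativeShuffleFailureReasons(node)
      // Lazy, so the columnar check only runs when the native path fails.
      lazy val columnarReasons = columnarShuffleFailureReasons(node)
      if (nativeReasons.isEmpty || columnarReasons.isEmpty) {
        // One path works: convert without tagging anything.
        true
      } else {
        // Total failure: record the reasons from both paths.
        node.withInfos((nativeReasons ++ columnarReasons).toSet)
        false
      }
    }
  }
}
```

In this model, a node whose native check fails but whose columnar check passes is left untagged, which is the behaviour the commit message aims for: a path-specific failure no longer marks a node that still converts via the other path.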
…llup

Expand the doc comments on withInfo/withInfos/hasExplainInfo to make clear
that these record fallback reasons surfaced in extended explain output,
and that any call to withInfo is a signal that the node falls back to
Spark.

Also restore the child-expression rollup for native range-partitioning
sort orders that was lost in the earlier refactor: when exprToProto fails
on a sort-order expression, its own fallback reasons (e.g. strict
floating-point sort) are now copied onto the shuffle's reasons so they
surface alongside 'unsupported range partitioning sort order'.
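
A minimal sketch of the rollup idea, under stated assumptions: `Expr`, its `fallbackReasons` field, the `convertible` flag, and the `Option`-returning `exprToProto` below are hypothetical simplifications for illustration and do not reflect the real signatures in the codebase.

```scala
object SortOrderRollupSketch {

  // Hypothetical model of a sort-order expression together with the
  // fallback reasons recorded on it when conversion fails.
  final case class Expr(name: String, convertible: Boolean, fallbackReasons: Seq[String])

  // Hypothetical stand-in for exprToProto: None signals a failed conversion.
  def exprToProto(e: Expr): Option[String] =
    if (e.convertible) Some(s"proto(${e.name})") else None

  // Collect the shuffle-level reasons for a range partitioning. When a sort
  // order fails to convert, its own reasons are copied up next to the
  // generic message instead of being dropped.
  def rangePartitioningReasons(sortOrders: Seq[Expr]): Seq[String] =
    sortOrders.flatMap { e =>
      exprToProto(e) match {
        case Some(_) => Seq.empty[String]
        case None => "unsupported range partitioning sort order" +: e.fallbackReasons
      }
    }.distinct
}
```

With this shape, a failing sort order that carries the reason "strict floating-point sort" yields both that reason and the generic one, so both appear in the shuffle's fallback reasons.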

@andygrove changed the title from "feat: Improve shuffle fallback logic and reduce planning overhead" to "feat: Improve shuffle fallback logic" on Apr 18, 2026
@andygrove changed the title from "feat: Improve shuffle fallback logic" to "chore: Improve shuffle fallback logic" on Apr 18, 2026
@andygrove marked this pull request as ready for review on April 18, 2026 14:56