Skip to content

feat: Add native support for mode fn#3971

Open
pchintar wants to merge 1 commit intoapache:mainfrom
pchintar:mode-native
Open

feat: Add native support for mode fn#3971
pchintar wants to merge 1 commit intoapache:mainfrom
pchintar:mode-native

Conversation

@pchintar
Copy link
Copy Markdown

Which issue does this PR close?

Closes #3970 .

Rationale for this change

This PR adds native support for the MODE aggregate function in Comet, enabling fully native execution without fallback to Spark.

MODE returns the most frequent value in a group. Unlike simple aggregates, it requires maintaining full frequency state, which is implemented here in a mergeable and execution-friendly form.

What changes are included in this PR?

  • Added a native Rust implementation (mode.rs) using:

    • HashMap-based frequency tracking per aggregation unit
    • Mergeable binary state to support partial → final aggregation
  • Implemented both:

    • Accumulator (non-grouped execution)
    • GroupsAccumulator (vectorized grouped execution)
  • GroupsAccumulator maintains per-group frequency maps, ensuring:

    • correct grouping semantics
    • efficient batched updates
  • Added full planner + serde wiring:

    • proto (Mode)
    • Scala serde (CometMode)
    • planner integration (AggExprStruct::Mode)

The implementation preserves full correctness across all execution stages:

  • Partial + Final aggregation

    • state stores complete frequency maps (not just local winners)
    • guarantees correctness after merge
  • Null handling

    • nulls are ignored during counting
    • all-null inputs → NULL
  • Tie handling

    • deterministic selection via max_by_key on counts
  • Type coverage

    • supports all primitive and commonly used Comet-compatible types
  • Floating-point edge cases

    • normalized hashing for NaN values to ensure consistent grouping

How are these changes tested?

  • Verified across:

    • multiple data types
    • grouped and non-grouped queries
  • Edge cases tested:

    • empty input
    • all-null input
    • mixed null/non-null
    • skewed distributions (clear vs non-clear modes)
    • tie scenarios
  • Confirmed no fallback in execution plans

@pchintar
Copy link
Copy Markdown
Author

Hi, so I understand the issue with the formatting of rust files & so I ran cargo fmt to address the lint failure. But I don't know what's causing the remaining error of "[Spark SQL Tests / spark-sql-auto-sql_core-2/spark-4.0.1"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native support for MODE aggregate function

1 participant