fix: UNIQUE constraint with NULLs incorrectly collapses GROUP BY groups#21715

Open
xiedeyantu wants to merge 4 commits into apache:main from xiedeyantu:fd

@xiedeyantu
Member

Which issue does this PR close?

Rationale for this change

In SQL, a UNIQUE constraint permits multiple NULL values, so a nullable UNIQUE column does not guarantee that every row has a distinct key. The optimizer nevertheless treated nullable UNIQUE constraints as functional dependencies that could be used to reduce GROUP BY keys. Because GROUP BY places all NULL keys in the same group, rows that shared a NULL key but differed in other grouping columns were incorrectly collapsed into a single group.
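The collapse can be reproduced outside the optimizer with a minimal Rust sketch. It assumes a hypothetical table `t(a INT UNIQUE, b INT)` holding the rows (1, 10), (NULL, 20), and (NULL, 30), models SQL NULL as `Option::None`, and counts groups with a `HashMap`. `Row` and `count_groups` are illustrative names, not DataFusion code.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// One row of a hypothetical table `t(a INT UNIQUE, b INT)`; `None` stands for SQL NULL.
#[derive(Debug, Clone, Copy)]
struct Row {
    a: Option<i32>,
    b: i32,
}

/// Returns the number of groups produced by grouping `rows` on `key`.
/// Like SQL GROUP BY, all NULL keys land in one group, because `None == None` in Rust.
fn count_groups<K: Hash + Eq>(rows: &[Row], key: impl Fn(&Row) -> K) -> usize {
    let mut groups: HashMap<K, usize> = HashMap::new();
    for row in rows {
        *groups.entry(key(row)).or_insert(0) += 1;
    }
    groups.len()
}

fn main() {
    // The UNIQUE constraint on `a` holds: the non-NULL values are distinct,
    // and repeated NULLs are allowed.
    let rows = [
        Row { a: Some(1), b: 10 },
        Row { a: None, b: 20 },
        Row { a: None, b: 30 },
    ];

    // Correct: GROUP BY a, b keeps (NULL, 20) and (NULL, 30) apart.
    let full = count_groups(&rows, |r| (r.a, r.b));
    // Unsound reduction: treating `a -> b` as a functional dependency drops `b`
    // from the key and merges the two NULL rows into one group.
    let reduced = count_groups(&rows, |r| r.a);

    println!("GROUP BY a, b: {full} groups"); // 3 groups
    println!("GROUP BY a:    {reduced} groups"); // 2 groups
}
```

The two NULL rows disagree on `b`, so `a` does not determine `b` once NULLs are allowed, and dropping `b` from the key changes the result.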

What changes are included in this PR?

This PR updates functional-dependency handling so nullable dependencies derived from UNIQUE constraints are not used to eliminate GROUP BY expressions. It also adds a regression test covering the NULL case from the issue report.
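The rule described above can be illustrated with a simplified, hypothetical model; it is not DataFusion's actual functional-dependency API. The `Dependency` struct, its `nullable` flag, and `reduce_group_by` are assumed names for this sketch. A dependency may remove GROUP BY columns only when it is non-nullable and all of its source columns are still grouped.

```rust
/// Hypothetical, simplified functional dependency: the `source` columns
/// determine the `target` columns. `nullable` marks dependencies whose source
/// may contain NULL (for example, one derived from a UNIQUE constraint).
#[derive(Debug, Clone)]
struct Dependency {
    source: Vec<usize>,
    target: Vec<usize>,
    nullable: bool,
}

/// Removes GROUP BY columns implied by other grouped columns, using only
/// non-nullable dependencies. A nullable dependency is skipped because rows
/// with a NULL source value need not agree on the target columns.
fn reduce_group_by(group_by: &[usize], deps: &[Dependency]) -> Vec<usize> {
    let mut remaining = group_by.to_vec();
    for dep in deps.iter().filter(|d| !d.nullable) {
        // Apply a dependency only while its source columns are all still grouped.
        if dep.source.iter().all(|c| remaining.contains(c)) {
            // Keep the source columns; drop every other column the source determines.
            remaining.retain(|c| dep.source.contains(c) || !dep.target.contains(c));
        }
    }
    remaining
}

fn main() {
    // Hypothetical table t(a INT PRIMARY KEY, b INT UNIQUE, c INT); columns 0, 1, 2.
    let deps = vec![
        // PRIMARY KEY on `a`: never NULL, so a -> (a, b, c) may reduce GROUP BY keys.
        Dependency { source: vec![0], target: vec![0, 1, 2], nullable: false },
        // UNIQUE on `b`: may hold several NULLs, so b -> (a, b, c) must not be used.
        Dependency { source: vec![1], target: vec![0, 1, 2], nullable: true },
    ];

    println!("{:?}", reduce_group_by(&[0, 1, 2], &deps)); // [0]
    println!("{:?}", reduce_group_by(&[1, 2], &deps)); // [1, 2]
}
```

Applying dependencies one at a time against the shrinking key set keeps each determinant in the key, so two NOT NULL unique columns that determine each other cannot both be removed.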

Are these changes tested?

Yes. I ran:

  • cargo fmt --all
  • cargo clippy -p datafusion-common --all-targets -- -D warnings
  • cargo test -p datafusion-common functional_dependencies
  • cargo test -p datafusion-sqllogictest --test sqllogictests -- group_by

Are there any user-facing changes?

Yes. Queries that group by nullable UNIQUE columns will no longer return incorrect aggregated results when multiple NULL values are present.

@github-actions bot added the sqllogictest (SQL Logic Tests (.slt)), common (Related to common crate), logical-expr (Logical plan and expressions), and substrait (Changes to the substrait crate) labels on Apr 18, 2026

Labels

  • common: Related to common crate
  • logical-expr: Logical plan and expressions
  • sqllogictest: SQL Logic Tests (.slt)
  • substrait: Changes to the substrait crate

Development

Successfully merging this pull request may close these issues.

Incorrect query results for GROUP BY with UNIQUE constraint

1 participant