Skip to content

feat: add review-datafusion-pr Claude Code skill#3974

Open
andygrove wants to merge 3 commits intoapache:mainfrom
andygrove:feat/review-datafusion-pr-skill
Open

feat: add review-datafusion-pr Claude Code skill#3974
andygrove wants to merge 3 commits intoapache:mainfrom
andygrove:feat/review-datafusion-pr-skill

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #.

Rationale for this change

We already have a review-comet-pr skill that helps reviewers check PRs in this repo for Spark compatibility and implementation correctness. A similar workflow applies when reviewing PRs in the upstream apache/datafusion repository, particularly for the datafusion-spark compatible function library and for core DataFusion changes that may affect Comet.

The upstream repo has a different test approach. It uses .slt (sqllogictest) files written in DataFusion SQL syntax, so the tests cannot be run directly in Spark. A reviewer needs to manually run equivalent queries in Spark to verify that the DataFusion implementation produces the same result.

This skill packages that workflow so it is consistent across reviews and so new reviewers have a concrete checklist to follow.

What changes are included in this PR?

Adds a new Claude Code skill at .claude/skills/review-datafusion-pr/SKILL.md. The skill covers:

  • PR classification into a Spark expression track, a Comet API impact track, or both
  • Reading the Spark source and Spark tests as the canonical reference for expression behavior
  • Reviewing the Rust implementation under datafusion/spark/src/function/
  • Reviewing the .slt test file against the testing guide in datafusion/sqllogictest/test_files/spark/README.md
  • A manual Spark cross-check step with translation notes from DataFusion SQL to Spark SQL, since .slt tests cannot prove Spark equivalence on their own
  • A checklist for breaking API changes in the DataFusion crates that Comet depends on (datafusion, datafusion-datasource, datafusion-physical-expr-adapter, datafusion-spark)
  • CI status, documentation, and common review issues

How are these changes tested?

Manual review of the skill content. The skill is guidance for human reviewers and is not executed by CI.

@andygrove andygrove marked this pull request as ready for review April 17, 2026 13:23
@parthchandra
Copy link
Copy Markdown
Contributor

Would the DataFusion repo be a better place for this? (I've no objection to having it in the Comet repo too).

@andygrove
Copy link
Copy Markdown
Member Author

Would the DataFusion repo be a better place for this? (I've no objection to having it in the Comet repo too).

It's a question worth discussing. The skill is reviewing specifically for any impact on the Comet project.

@parthchandra
Copy link
Copy Markdown
Contributor

Would the DataFusion repo be a better place for this? (I've no objection to having it in the Comet repo too).

It's a question worth discussing. The skill is reviewing specifically for any impact on the Comet project.

My point exactly. DF reviewers may not always remember to consider impact on Comet and this would certainly be a useful for them and invaluable for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants