test: add SQL tests documenting Spark encode behavior #3975
Draft
andygrove wants to merge 1 commit into apache:main from
Which issue does this PR close?
Closes #.
Rationale for this change
There is an upstream DataFusion PR to add a Spark-compatible `encode` expression: apache/datafusion#21331. This PR adds tests to Comet to make it easier to review the DataFusion PR.
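For context, Spark's `encode(str, charset)` returns the binary encoding of a string in the given charset. A rough sketch of the kind of behavior the tests document, assuming Java's standard charset encoders (which is what Spark delegates to); results shown as hex bytes:

```sql
SELECT encode('abc', 'UTF-8');    -- 61 62 63
SELECT encode('abc', 'UTF-16BE'); -- 00 61 00 62 00 63
SELECT encode('abc', 'UTF-16');   -- big-endian BOM prepended: FE FF 00 61 00 62 00 63
```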
What changes are included in this PR?
Two files under `spark/src/test/resources/sql-tests/expressions/string/`:

- `encode.sql` runs on every supported Spark version. It covers UTF-8, US-ASCII, ISO-8859-1, UTF-16 (with BOM), UTF-16BE, UTF-16LE, and UTF-32 (no BOM), plus emoji / surrogate pairs, empty strings, NULL inputs for both arguments, case-insensitive charset names, column versus literal arguments, and binary input with both valid and invalid UTF-8 bytes.
- `encode_strict.sql` is gated by `MinSparkVersion: 4.0`. It pins Spark's charset whitelist (rejecting `UTF-32BE`, `UTF-32LE`, `UTF8`, `UTF16`, `UTF16BE`, `ASCII`, `LATIN1`, `ISO88591`, and `EBCDIC` with `expect_error(INVALID_PARAMETER_VALUE.CHARSET)`) and the raise-on-unmappable behavior (`expect_error(MALFORMED_CHARACTER_CODING)` for `é` in US-ASCII, `Ā` in ISO-8859-1, and an emoji in US-ASCII).

All positive queries use `query spark_answer_only` because Comet currently falls back to Spark for `encode`, and error cases use `query expect_error(...)`, which works through the fallback path as well. A sketch of both files follows below.
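To make the layout concrete, here is a minimal sketch of entries in the two files. The directive lines (`query spark_answer_only`, `query expect_error(...)`, `MinSparkVersion: 4.0`) are taken from the description above, but the exact grammar that `CometSqlFileTestSuite` parses is an assumption here, not copied from the diff:

```sql
-- Sketch of encode.sql (runs on all supported Spark versions):
query spark_answer_only
SELECT encode('hello', 'UTF-8');

-- Charset names are matched case-insensitively.
query spark_answer_only
SELECT encode('hello', 'utf-8');

-- NULL handling for both arguments.
query spark_answer_only
SELECT encode(NULL, 'UTF-8'), encode('hello', NULL);

-- Sketch of encode_strict.sql (Spark 4.0+ only):
MinSparkVersion: 4.0

-- Aliases outside Spark's charset whitelist are rejected.
query expect_error(INVALID_PARAMETER_VALUE.CHARSET)
SELECT encode('abc', 'UTF8');

-- Unmappable characters raise instead of being silently replaced.
query expect_error(MALFORMED_CHARACTER_CODING)
SELECT encode('é', 'US-ASCII');
```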
How are these changes tested?

Ran the new tests locally against both the default Spark 3.5 profile and the Spark 4.0 profile:

- `./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite encode" -Dtest=none` passes `encode.sql` and skips `encode_strict.sql` (as expected, since the strict file is gated by `MinSparkVersion: 4.0`).
- `./mvnw -Pspark-4.0 test -Dsuites="org.apache.comet.CometSqlFileTestSuite encode" -Dtest=none` passes both files.