The upstream deequ library released 2.0.14-spark-4.0 on March 23, 2026, adding official Apache Spark 4.0 support (see awslabs/deequ#676, awslabs/deequ#678).
pydeequ currently does not support Spark 4 because:
- SPARK_TO_DEEQU_COORD_MAPPING in pydeequ/configs.py only maps up to Spark 3.5
- The PySpark optional dependency in pyproject.toml is capped at <3.4.0
- Spark 4 uses Scala 2.13, which removed scala.collection.JavaConversions and changed how Seq.empty is accessed via reflection; both are used in pydeequ internals
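
To make the first point concrete, here is a minimal sketch of a version-to-coordinate lookup. The helper name deequ_coord_for is hypothetical and is not claimed to match pydeequ's actual lookup code. Only the 4.0 entry named in this issue is shown; the existing entries (up to Spark 3.5) are omitted rather than guessed.

```python
import re

# Only the entry proposed in this issue is listed. The existing entries
# (up to Spark 3.5) are intentionally omitted from this sketch.
SPARK_TO_DEEQU_COORD_MAPPING = {
    "4.0": "com.amazon.deequ:deequ:2.0.14-spark-4.0",
}


def deequ_coord_for(spark_version: str) -> str:
    """Hypothetical helper: resolve a full Spark version to a deequ Maven coordinate."""
    match = re.match(r"(\d+)\.(\d+)", spark_version)
    if match is None:
        raise ValueError(f"Cannot parse Spark version: {spark_version!r}")
    # The mapping is keyed by "major.minor", so "4.0.0" resolves through "4.0".
    key = f"{match.group(1)}.{match.group(2)}"
    try:
        return SPARK_TO_DEEQU_COORD_MAPPING[key]
    except KeyError:
        # Without a matching key there is no deequ artifact to load.
        raise RuntimeError(f"No deequ coordinate mapped for Spark {key}") from None
```

With a lookup of this shape, any Spark version whose major.minor key is missing from the mapping fails before a session can be configured, which is the situation for Spark 4.0 today.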
Required changes
- pydeequ/configs.py: add "4.0": "com.amazon.deequ:deequ:2.0.14-spark-4.0" to SPARK_TO_DEEQU_COORD_MAPPING
- pyproject.toml: widen pyspark optional dep from >=2.4.7,<3.4.0 to >=2.4.7,<5.0.0
- pydeequ/scala_utils.py: replace removed JavaConversions with JavaConverters (iterableAsScalaIterableConverter, mapAsJavaMapConverter)
- pydeequ/profiles.py: same JavaConversions → JavaConverters fix
- pydeequ/analyzers.py + pydeequ/checks.py: replace scala.collection.Seq.empty() (inaccessible via Py4J in Scala 2.13) with an empty java.util.ArrayList converted via to_scala_seq
- .github/workflows/base.yml: add Spark 4.0.0 to the test matrix with Java 17 (required by Spark 4); use include: matrix style to pair each Spark version with its Java version
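
The scala_utils.py, analyzers.py and checks.py items could look roughly like the sketch below. It assumes jvm is a Py4J JVM view (for example the one exposed by a running Spark session). The signature of to_scala_seq and the helper empty_scala_seq are assumptions for illustration and may not match the final patch.

```python
def to_scala_seq(jvm, iterable):
    """Convert a java.lang.Iterable (Py4J proxy) into a Scala Seq.

    Goes through JavaConverters, which is still present in Scala 2.13,
    instead of the removed JavaConversions.
    """
    converters = jvm.scala.collection.JavaConverters
    return converters.iterableAsScalaIterableConverter(iterable).asScala().toSeq()


def empty_scala_seq(jvm):
    """Hypothetical helper: build an empty Scala Seq without calling Seq.empty().

    An empty java.util.ArrayList is created on the JVM side and converted,
    so no reflective access to scala.collection.Seq.empty() is needed.
    """
    return to_scala_seq(jvm, jvm.java.util.ArrayList())
```

Call sites in analyzers.py and checks.py that currently pass scala.collection.Seq.empty() would pass the converted empty list instead. This needs a live JVM to run end to end, so it should be verified against both a Spark 3.x and a Spark 4.0 session.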
Additional context
Issue authored with assistance from Claude Code