What is the problem the feature request solves?
Apache Spark 4.0 introduced the VARIANT semi-structured data type.
Currently, Spark writes VARIANT columns to Parquet as opaque binary blobs
(value + metadata bytes per row). This means:
- No column pruning: the entire binary blob is read even if only one sub-field is needed
- No predicate pushdown: filters on VARIANT sub-fields (e.g. payload:temp > 20.0)
cannot be pushed into the Parquet scan
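To make the cost concrete, here is a toy sketch in plain Python of the two limitations above. It is not Spark, Parquet, or DataFusion code: JSON stands in for the Variant binary encoding, and the names `blob_column`, `temp_column`, `temp_stats`, `filter_unshredded`, and `filter_shredded` are hypothetical, as is the assumption of 8 bytes per stored double.

```python
import json

# Hypothetical rows: each carries a semi-structured `payload`.
rows = [
    {"temp": 18.5, "site": "a"},
    {"temp": 21.0, "site": "b"},
    {"temp": 25.5, "site": "c"},
]

# Unshredded layout: one opaque blob per row. JSON is only a stand-in
# here; the real Variant encoding is a different binary format.
blob_column = [json.dumps(r).encode() for r in rows]

# Shredded layout (illustrative): the `temp` sub-field is also stored as
# its own typed column, so min/max statistics can be kept for it.
temp_column = [r["temp"] for r in rows]
temp_stats = {"min": min(temp_column), "max": max(temp_column)}


def filter_unshredded(blobs, threshold):
    """Read and decode every blob in full just to test one sub-field."""
    bytes_read = sum(len(b) for b in blobs)
    matches = [i for i, b in enumerate(blobs) if json.loads(b)["temp"] > threshold]
    return matches, bytes_read


def filter_shredded(column, stats, threshold):
    """Skip the chunk via statistics, otherwise scan only the typed column."""
    if stats["max"] <= threshold:
        return [], 0  # chunk pruned without reading any values
    matches = [i for i, v in enumerate(column) if v > threshold]
    return matches, 8 * len(column)  # assumption: 8 bytes per double


if __name__ == "__main__":
    print(filter_unshredded(blob_column, 20.0))
    print(filter_shredded(temp_column, temp_stats, 20.0))
    print(filter_shredded(temp_column, temp_stats, 30.0))
```

In this sketch the unshredded filter touches every byte of every blob, while the shredded one reads only the `temp` values and can skip the chunk entirely when the statistics rule out a match.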
References
apache/arrow-rs#6736, apache/parquet-format#456
Details are available in apache/arrow-rs#6736, but I think DataFusion v53.0.0 does NOT depend on parquet-variant yet.
Describe the potential solution
No response
Additional context
No response