[FEATURE] Shredded Parquet Reader/Writer support for Variant type #3983

@Shekharrajak

What is the problem the feature request solves?

Apache Spark 4.0 introduced the VARIANT semi-structured data type.
Currently, Spark writes VARIANT columns to Parquet as opaque binary blobs
(value + metadata bytes per row). This means:

  • No column pruning: the entire binary blob is read even if only one sub-field is needed
  • No predicate pushdown: filters on VARIANT sub-fields (e.g. payload:temp > 20.0)
    cannot be pushed into the Parquet scan
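To make the two points above concrete, here is a minimal Python sketch of the idea behind shredding. It is only an illustration under loud assumptions: JSON bytes stand in for the real Variant binary encoding, and `rows`, `shred`, and `scan_gt` are hypothetical names, not Spark, Parquet, or arrow-rs APIs. One sub-field is pulled out into its own typed column, and a filter on that field then decodes the residual blob only for rows where the typed column is empty.

```python
import json

# Hypothetical rows of a VARIANT column named `payload`.
rows = [
    {"temp": 21.5, "unit": "C", "site": "a"},
    {"temp": 18.0, "unit": "C"},
    {"temp": "n/a", "site": "b"},  # wrong type for the shredded field
    {"site": "c"},                 # field missing
    {"temp": 25, "site": "d"},     # int, not float: stays in the blob
]


def shred(rows, field, typ):
    """Split each row into a typed value for `field` plus a residual blob.

    The residual is JSON bytes here, purely as a stand-in for the
    Variant binary encoding (assumption for illustration).
    """
    typed, residual = [], []
    for row in rows:
        rest = dict(row)
        if isinstance(rest.get(field), typ):
            typed.append(rest.pop(field))  # value moves to the typed column
        else:
            typed.append(None)             # value (if any) stays in the blob
        residual.append(json.dumps(rest, sort_keys=True).encode())
    return typed, residual


def scan_gt(typed, residual, field, threshold):
    """Evaluate `field > threshold`, decoding blobs only when necessary."""
    hits, blobs_decoded = [], 0
    for i, value in enumerate(typed):
        if value is not None:
            # Fast path: answered from the typed column alone.
            if value > threshold:
                hits.append(i)
            continue
        # Fallback: the field may still live in the residual blob.
        blobs_decoded += 1
        v = json.loads(residual[i]).get(field)
        is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
        if is_number and v > threshold:
            hits.append(i)
    return hits, blobs_decoded


typed_temp, residual = shred(rows, "temp", float)
hits, blobs_decoded = scan_gt(typed_temp, residual, "temp", 20.0)
print(hits, blobs_decoded)
```

In the unshredded layout every row would need its blob decoded to answer the same filter. With a real typed Parquet column, the reader could additionally use column statistics to skip whole row groups, which this sketch does not model.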

Ref
apache/arrow-rs#6736, apache/parquet-format#456

More details are in apache/arrow-rs#6736, but I think DataFusion v53.0.0 does NOT depend on parquet-variant yet.

Describe the potential solution

No response

Additional context

No response
