Avro write support - will you accept a patch? #615

@martin-traverse

Describe the enhancement requested

Hi,

We need full Avro read/write support in our project and are working on an implementation. I had a look at what already exists in arrow-java, and I think it would be fairly straightforward to extend it to get full read/write support in the Arrow Java project. Here is what I am proposing:

  • A set of producers to handle the Avro data structures, mirroring the existing consumers
  • Handle the high level file structure (header, embedded schema and block structure)
  • Support for compressed blocks (using the existing codecs in the Avro project)
  • High level APIs for read / write, including incremental read (block by block, corresponding to the VSR)

The last point is important for us because we handle streaming data: if we can check that a whole block is available before reading it, we should be able to avoid blocking on IO calls.
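To illustrate the block-availability check, here is a minimal sketch, not a proposed API. It relies on the block layout in the Avro object container file format: a zig-zag varint object count, a zig-zag varint byte size, the (possibly compressed) data, then the 16-byte sync marker. The class `AvroBlockProbe` and its method names are hypothetical and do not exist in arrow-java or Avro.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration only; not part of arrow-java or Avro. */
public final class AvroBlockProbe {

    /** Every block in an Avro container file ends with the file's 16-byte sync marker. */
    private static final int SYNC_MARKER_SIZE = 16;

    private AvroBlockProbe() {}

    /**
     * Reads one Avro long (variable-length, zig-zag encoded), advancing the buffer.
     * Returns null if the buffer runs out before the value is complete.
     */
    public static Long readVarLong(ByteBuffer buf) {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (!buf.hasRemaining()) {
                return null; // need more bytes from the stream
            }
            byte b = buf.get();
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return (raw >>> 1) ^ -(raw & 1); // zig-zag decode
            }
        }
        throw new IllegalStateException("Malformed varint: longer than 10 bytes");
    }

    /**
     * Returns true if the buffered bytes, starting at the current position,
     * contain one whole block: count, size, data and trailing sync marker.
     * The caller's buffer position is left unchanged.
     */
    public static boolean isCompleteBlockAvailable(ByteBuffer buffered) {
        ByteBuffer view = buffered.duplicate(); // independent position and limit
        Long objectCount = readVarLong(view);
        if (objectCount == null) {
            return false;
        }
        Long dataSize = readVarLong(view);
        if (dataSize == null) {
            return false;
        }
        if (objectCount < 0 || dataSize < 0) {
            throw new IllegalStateException("Negative block count or size");
        }
        return view.remaining() - SYNC_MARKER_SIZE >= dataSize;
    }
}
```

A streaming reader could call `isCompleteBlockAvailable` on whatever bytes have arrived so far and decode the block into a VSR only once it returns true, so the decode step never waits on IO.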

If I draft a PR along these lines, would there be interest in helping me refine it and get it into arrow-java? If not, we can do our own implementation, which will be simpler because we don't need all the features and data types. But I think the delta is not that large, and IMO it would be a good thing to have in the Arrow Java toolkit.

Thoughts welcome!
