Arrow + encryption: wire user serializer through EncryptionWrapper (currently silently substituted with MessagePack) #134

@27Bslash6

Background

The EncryptionWrapper constructor accepts any SerializerProtocol:
```python
def __init__(self, serializer: Optional[SerializerProtocol] = None, ...)
```
and the class docstring claims it 'Works with ANY serializer (StandardSerializer, OrjsonSerializer, ArrowSerializer)'.

However, the integration site in `CacheSerializationHandler` does not honor this:

```python
# src/cachekit/cache_handler.py:497-498
master_key_bytes = bytes.fromhex(self.master_key) if self.master_key else None
wrapper = EncryptionWrapper(tenant_id=tenant_id, master_key=master_key_bytes)
```

The user's chosen serializer is never passed in, so `EncryptionWrapper` falls back to `StandardSerializer()` (`encryption_wrapper.py:117`). A user who requests `@cache(serializer=ArrowSerializer())` with encryption therefore silently gets MessagePack encoding under the hood.
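A minimal model of the substitution, assuming the fallback at `encryption_wrapper.py:117` is a plain `None` check. The classes below are hypothetical stubs, not cachekit's real ones: they perform no encryption, and their `serialize` methods only tag the output. The point is to show which serializer ends up inside the wrapper with and without the `serializer=` argument (Scope item 1).

```python
from typing import Optional, Protocol


class SerializerProtocol(Protocol):
    def serialize(self, obj: object) -> bytes: ...


class StandardSerializer:
    """Hypothetical stand-in for the MessagePack-based default."""

    def serialize(self, obj: object) -> bytes:
        return b"msgpack:" + repr(obj).encode()


class ArrowSerializer:
    """Hypothetical stand-in for the Arrow IPC serializer."""

    def serialize(self, obj: object) -> bytes:
        return b"arrow:" + repr(obj).encode()


class EncryptionWrapper:
    """Hypothetical stub: models only the serializer fallback, not the crypto."""

    def __init__(
        self,
        serializer: Optional[SerializerProtocol] = None,
        tenant_id: str = "default",
        master_key: Optional[bytes] = None,
    ) -> None:
        # Assumed fallback: no serializer given means StandardSerializer.
        self.serializer = serializer if serializer is not None else StandardSerializer()
        self.tenant_id = tenant_id
        self.master_key = master_key


# What the user asked for via @cache(serializer=...).
base_serializer = ArrowSerializer()

# Current call site: the user's serializer is dropped.
buggy = EncryptionWrapper(tenant_id="t1", master_key=None)

# Proposed fix: thread the base serializer through.
fixed = EncryptionWrapper(serializer=base_serializer, tenant_id="t1", master_key=None)

print(type(buggy.serializer).__name__)  # StandardSerializer
print(type(fixed.serializer).__name__)  # ArrowSerializer
```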

The v0.6.0 validation rule (`cache_handler.py:351-362`) papered over this by rejecting non-default serializers outright. PR #133 removed two integration tests that asserted the silent-substitution behavior; they were testing something that never worked end-to-end.

What should work

`ArrowSerializer`, `OrjsonSerializer` (and `StandardSerializer`) are all cross-language wire formats, so they should compose with encryption: Arrow has Java/JS/C++/Rust implementations, JSON is universal, and MessagePack is what we use today. `AutoSerializer` uses Python-only msgpack ext types (numpy/pandas/datetime) and should remain blocked under encryption to preserve cross-SDK readability.
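One way to encode this split, sketched under the assumption that the marker is a class-level boolean on `SerializerProtocol` as Scope item 2 proposes. The serializer classes are stubs with their methods omitted, and `is_cross_sdk_compatible` is a hypothetical helper name.

```python
from typing import ClassVar, Protocol


class SerializerProtocol(Protocol):
    # Hypothetical marker: True only for wire formats that the
    # non-Python SDKs can decode.
    cross_sdk_compatible: ClassVar[bool]

    def serialize(self, obj: object) -> bytes: ...


# Stubs: serialize/deserialize omitted in this sketch.
class StandardSerializer:
    cross_sdk_compatible: ClassVar[bool] = True  # plain MessagePack


class OrjsonSerializer:
    cross_sdk_compatible: ClassVar[bool] = True  # JSON


class ArrowSerializer:
    cross_sdk_compatible: ClassVar[bool] = True  # Arrow IPC


class AutoSerializer:
    cross_sdk_compatible: ClassVar[bool] = False  # Python-only msgpack ext types


def is_cross_sdk_compatible(serializer: object) -> bool:
    """Read the marker from the class; unmarked classes count as incompatible."""
    return getattr(type(serializer), "cross_sdk_compatible", False) is True
```

Defaulting to `False` means a custom serializer has to opt in explicitly, which matches the rule that unmarked instances stay forbidden.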

Scope

  1. Thread `self._base_serializer` into `EncryptionWrapper(serializer=..., ...)` at `cache_handler.py:498`.
  2. Add a `cross_sdk_compatible: ClassVar[bool]` (or equivalent marker) to `SerializerProtocol`. `StandardSerializer`/`OrjsonSerializer`/`ArrowSerializer` mark True. `AutoSerializer` marks False.
  3. Loosen the validation rule in `cache_handler.py:351-362`: allow string serializers `{default, std, standard, orjson, arrow}` and serializer instances whose class marks `cross_sdk_compatible=True`. Keep blocking `'auto'` and unmarked custom instances.
  4. Update `strategy/saas-protocol-v1.0.md` to document which inner formats are valid envelope payloads when `encrypted=true`.
  5. Cross-SDK interop test: encrypt+Arrow on Python, decrypt+Arrow on TS or Rust SDK.
  6. Re-add (rewritten) the integration coverage removed in #133 ("test: fix MASTER_KEY env leak that broke 14 integration tests post #127"): Arrow DataFrame caching under encryption, end-to-end.
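The loosened check from Scope item 3 might look roughly like this. `validate_encryption_serializer` and `ENCRYPTION_SAFE_NAMES` are hypothetical names, the serializer classes are marker-only stubs, and raising `ValueError` is an assumption about how `cache_handler.py` reports a rejected combination.

```python
from typing import ClassVar

# String aliases that Scope item 3 would allow under encryption.
ENCRYPTION_SAFE_NAMES = frozenset({"default", "std", "standard", "orjson", "arrow"})


class ArrowSerializer:
    cross_sdk_compatible: ClassVar[bool] = True


class AutoSerializer:
    cross_sdk_compatible: ClassVar[bool] = False


class CustomSerializer:
    """User-supplied serializer without the marker."""


def validate_encryption_serializer(serializer: object) -> None:
    """Raise ValueError if `serializer` may not be combined with encryption."""
    if isinstance(serializer, str):
        # String form: only the known cross-SDK aliases pass ('auto' does not).
        if serializer not in ENCRYPTION_SAFE_NAMES:
            raise ValueError(
                f"serializer {serializer!r} is not allowed with encryption; "
                f"use one of {sorted(ENCRYPTION_SAFE_NAMES)}"
            )
        return
    # Instance form: require an explicit opt-in; unmarked classes fail closed.
    if getattr(type(serializer), "cross_sdk_compatible", False) is not True:
        raise ValueError(
            f"{type(serializer).__name__} is not marked cross_sdk_compatible "
            "and cannot be combined with encryption"
        )


for candidate in ["arrow", ArrowSerializer(), "auto", AutoSerializer(), CustomSerializer()]:
    label = candidate if isinstance(candidate, str) else type(candidate).__name__
    try:
        validate_encryption_serializer(candidate)
        print(f"{label}: allowed")
    except ValueError as exc:
        print(f"{label}: rejected ({exc})")
```

Checking the marker on the class rather than the instance keeps the decision static per serializer type, which is what the `ClassVar` marker implies.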

Out of scope

`AutoSerializer` + encryption stays forbidden (Python-only ext types break cross-SDK readability). Custom user-supplied serializer instances without the cross-SDK marker stay forbidden.

Surfaced by

PR #133 (CI fix for MASTER_KEY env leak).

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
