Implement __array_ufunc__ for Arkouda-backed pandas ExtensionArray #5437

@ajpotts

Summary

Implement NumPy ufunc interoperability for the Arkouda pandas ExtensionArray by adding a correct, well-scoped __array_ufunc__ implementation. This will allow common ufuncs (e.g., np.add, np.subtract, np.negative, np.logical_and, and the comparison ufuncs) to operate on Arkouda-backed Series/arrays without silently materializing to NumPy, while preserving pandas semantics where required.

Background / Motivation

Today, many NumPy ufunc operations on Arkouda-backed pandas objects do one of the following:

  • fall back to object/NumPy materialization (breaking scalability),
  • error in inconsistent ways, or
  • route through pandas code paths that expect ExtensionArrays to provide __array_ufunc__ and __array_priority__ behavior.

A minimal-but-correct __array_ufunc__ enables:

  • predictable behavior for arithmetic and elementwise operations,
  • better pandas compatibility (pandas frequently triggers ufunc paths),
  • clear errors for unsupported dtypes (e.g., Strings, Categorical) or unsupported ufuncs/methods.

Goals

  • Add __array_ufunc__ to the Arkouda ExtensionArray implementation.
  • Support elementwise ufuncs for numeric and boolean dtypes where there is a reasonable Arkouda mapping.
  • Handle method="__call__" and method="reduce" (as appropriate) with clear scoping.
  • Respect pandas expectations:
    • return an ExtensionArray (or Series via pandas) when appropriate,
    • propagate np.nan / missing values correctly (where applicable),
    • preserve dtype where possible.
  • Avoid accidental conversion to NumPy unless explicitly requested (e.g., when out is a NumPy array, or when the ufunc is not supported).

Non-goals (for this ticket)

  • Full coverage of every NumPy ufunc and method (accumulate, reduceat, outer, etc.).
  • Supporting ufuncs for Arkouda Strings and Categorical unless there is a clear, existing Arkouda primitive (should raise a helpful TypeError for now).
  • Implementing NumPy array protocol conversions beyond what is needed for ufunc interoperability.

Proposed Behavior

Supported inputs

  • self is the Arkouda ExtensionArray.
  • Additional inputs may include:
    • scalar Python numbers/bools,
    • NumPy scalars,
    • other Arkouda ExtensionArray instances,
    • pandas arrays/Series that wrap Arkouda arrays (unwrap as needed).
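
A minimal sketch of how the input kinds listed above could be normalized before dispatch. ToyArkoudaEA, ToySeries, and unwrap_input are hypothetical names used only for illustration, and a plain Python list stands in for the server-side array; the real class and unwrapping logic may look different.

```python
import numbers

import numpy as np


class ToyArkoudaEA:
    """Hypothetical stand-in for the Arkouda-backed ExtensionArray."""

    def __init__(self, data):
        # `_data` plays the role of the server-side array (assumption).
        self._data = list(data)


class ToySeries:
    """Hypothetical stand-in for a pandas Series; exposes the EA as `.array`."""

    def __init__(self, array):
        self.array = array


def unwrap_input(x):
    """Return ("array", data) or ("scalar", value); None if x is unsupported."""
    if isinstance(x, ToyArkoudaEA):
        return ("array", x._data)
    # Series-like wrapper around the extension array.
    inner = getattr(x, "array", None)
    if isinstance(inner, ToyArkoudaEA):
        return ("array", inner._data)
    # NumPy scalars are converted to the matching Python scalar.
    if isinstance(x, np.generic):
        return ("scalar", x.item())
    # Python numbers; bool is already a numbers.Number subclass.
    if isinstance(x, numbers.Number):
        return ("scalar", x)
    return None
```

In this sketch a None result is the signal for __array_ufunc__ to return NotImplemented, so that an input such as a plain NumPy array is never converted implicitly.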

Dispatch rules

  1. Reject unsupported method values by returning NotImplemented (or raising TypeError if pandas expects it), except for:
    • __call__ (required)
    • reduce (optional, only for a small safe subset such as np.add.reduce, np.logical_or.reduce if Arkouda equivalents exist)
  2. If any input is a higher-priority type that should handle the ufunc, return NotImplemented.
  3. Map ufuncs to Arkouda server-side ops:
    • Unary: negative, absolute, invert (for bool/int), etc.
    • Binary: add, subtract, multiply, true_divide, floor_divide, power (if supported), comparisons (equal, not_equal, less, greater, etc.), logical ops for bool.
  4. If out is provided:
    • If out contains Arkouda ExtensionArrays: write into those (if we support it), else reject with a clear error.
    • If out contains NumPy arrays: either materialize (explicit) or raise (preferred); pick one and document it.
  5. Return type:
    • For elementwise ops: return a new Arkouda ExtensionArray with the result.
    • For reduce: return a scalar (Python/NumPy scalar) or a 0-dim equivalent consistent with pandas expectations.
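
The dispatch rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the Arkouda implementation: ToyArkoudaEA is a hypothetical class, a NumPy array stands in for the server-side data, the lambdas stand in for Arkouda operations, and the out= policy shown (always raise) is just one of the two options in rule 4.

```python
import numpy as np


class ToyArkoudaEA:
    """Hypothetical stand-in; `_data` (an ndarray here) plays the server-side array."""

    # Hypothetical tables: ufunc -> stand-in for an Arkouda server-side op.
    _UFUNC_TABLE = {
        np.negative: lambda a: -a,
        np.add: lambda a, b: a + b,
        np.less: lambda a, b: a < b,
    }
    _REDUCE_TABLE = {np.add: lambda a: a.sum()}

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        # Rule 1: only __call__ and a small reduce subset are handled.
        if method not in ("__call__", "reduce"):
            return NotImplemented
        # Rule 4 (policy assumed for this sketch): reject out= instead of materializing.
        if out is not None:
            raise TypeError(
                f"NumPy ufunc '{ufunc.__name__}' does not support out= for Arkouda arrays"
            )
        # Extra keywords (where=, axis=, dtype=, ...) are out of scope here.
        if kwargs:
            return NotImplemented
        # Rule 2: defer when an input is a type this class does not understand.
        args = []
        for x in inputs:
            if isinstance(x, ToyArkoudaEA):
                args.append(x._data)
            elif isinstance(x, (bool, int, float, np.generic)):
                args.append(x)
            else:
                return NotImplemented
        # Rules 3 and 5: look up the mapped op and wrap the result.
        if method == "reduce":
            op = self._REDUCE_TABLE.get(ufunc)
            if op is None or len(args) != 1:
                return NotImplemented
            return op(args[0])  # scalar result
        op = self._UFUNC_TABLE.get(ufunc)
        if op is None:
            return NotImplemented
        return ToyArkoudaEA(op(*args))  # new array, still on the "backend"
```

With this sketch, np.add(ToyArkoudaEA([1, 2, 3]), 5) returns another ToyArkoudaEA, while an unmapped ufunc such as np.sin returns NotImplemented from every operand, which NumPy turns into a TypeError.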

Error messages

  • For unsupported dtypes (Strings/Categorical): raise TypeError like:
    • "NumPy ufunc '<name>' is not supported for Arkouda dtype '<dtype>'"
  • For unsupported ufuncs: raise NotImplementedError or return NotImplemented depending on pandas expectations; include a message guiding users to convert explicitly if they really want NumPy.
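
One possible way to keep the dtype error message consistent is a single helper that every ufunc path calls. The helper name check_ufunc_dtype and the dtype labels in _UNSUPPORTED_DTYPES are assumptions for illustration; the actual labels would come from the Arkouda extension dtypes.

```python
import numpy as np

# Hypothetical dtype labels (assumption); not taken from the Arkouda code base.
_UNSUPPORTED_DTYPES = {"string", "category"}


def check_ufunc_dtype(ufunc, dtype_name):
    """Raise the proposed TypeError when `dtype_name` cannot take part in a ufunc."""
    if dtype_name in _UNSUPPORTED_DTYPES:
        raise TypeError(
            f"NumPy ufunc '{ufunc.__name__}' is not supported "
            f"for Arkouda dtype '{dtype_name}'"
        )
```

Calling check_ufunc_dtype(np.add, "string") raises a TypeError with the message format proposed above, and supported dtypes pass through silently.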

Implementation Notes

  • Location: Arkouda pandas ExtensionArray class
  • Consider implementing a small internal dispatcher:
    • _UFUNC_TABLE: dict[np.ufunc, callable] or mapping by ufunc.__name__.
    • Centralize dtype checks and missing-value handling.
  • Ensure correct behavior with:
    • __array_priority__ (set high enough to win dispatch vs NumPy when appropriate),
    • __array__ (if implemented), which must not accidentally trigger conversions in the ufunc path.
  • Make sure __array_ufunc__ does not break Series ops that pandas already routes through its own arithmetic machinery.
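
A rough sketch of the internal dispatcher idea, with the dtype check centralized in one lookup. The table contents, the lookup helper, and the use of NumPy dtype kinds as the gating criterion are all assumptions for illustration; the lambdas stand in for Arkouda server-side operations.

```python
import numpy as np

# Hypothetical table: ufunc -> (allowed NumPy dtype kinds, stand-in operation).
# Kinds: "b" bool, "i" signed int, "u" unsigned int, "f" float.
_UFUNC_TABLE = {
    np.negative: ("if", lambda a: -a),
    np.invert: ("biu", lambda a: ~a),
    np.add: ("iuf", lambda a, b: a + b),
    np.logical_and: ("b", lambda a, b: a & b),
}


def lookup(ufunc, *dtypes):
    """Return the mapped operation, or None if the ufunc/dtype pair is unsupported."""
    entry = _UFUNC_TABLE.get(ufunc)
    if entry is None:
        return None  # ufunc is not mapped at all
    kinds, op = entry
    # Single place where operand dtypes are validated.
    if any(np.dtype(dt).kind not in kinds for dt in dtypes):
        return None
    return op
```

Keying the table by the ufunc object rather than by ufunc.__name__ avoids collisions with user-defined ufuncs that happen to share a name; keying by name is simpler to log and document. Either choice would fit the proposal.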

Repro / Expected UX

Example (should stay on Arkouda)

>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3], dtype="ak")
>>> np.add(s.array, 5).to_numpy()  # materialize only at the end
array([6, 7, 8])

Example (unsupported dtype gives helpful error)

>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["a", "b"], dtype="ak")
>>> np.add(s.array, "x")
TypeError: NumPy ufunc 'add' is not supported for Arkouda dtype 'string'

Tests

Add unit tests covering:

  • Unary ufunc: np.negative, np.absolute (numeric)
  • Binary ufunc: np.add, np.subtract, np.multiply, np.true_divide (numeric)
  • Comparisons: np.equal, np.less, etc. (numeric/bool)
  • Mixed scalar + EA and EA + EA
  • out= behavior (whatever policy is chosen)
  • Unsupported ufunc raises/returns NotImplemented in a predictable way
  • Unsupported dtype (Strings/Categorical) raises a clear TypeError
  • Ensure no silent to_numpy() / materialization occurs in the supported paths:
    • validate the result is an Arkouda ExtensionArray (or wraps one)
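
One way the "no silent materialization" check could be tested is with an array whose __array__ fails loudly, so any implicit conversion breaks the test. GuardedEA is a hypothetical stand-in that supports only np.add and keeps its data in a Python list; real tests would target the Arkouda ExtensionArray and could, for example, patch its to_numpy or __array__ in the same spirit.

```python
import numpy as np


class GuardedEA:
    """Hypothetical stand-in whose __array__ fails if NumPy tries to materialize it."""

    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None, copy=None):
        raise AssertionError("silent materialization to NumPy")

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only np.add via __call__ is supported in this sketch.
        if ufunc is not np.add or method != "__call__" or kwargs:
            return NotImplemented
        columns = []
        for x in inputs:
            if isinstance(x, GuardedEA):
                columns.append(x._data)
            elif isinstance(x, (int, float)):
                columns.append([x] * len(self._data))  # broadcast the scalar
            else:
                return NotImplemented
        return GuardedEA(a + b for a, b in zip(*columns))


def test_scalar_add_stays_on_backend():
    result = np.add(GuardedEA([1, 2, 3]), 5)
    assert isinstance(result, GuardedEA)
    assert result._data == [6, 7, 8]


def test_ea_plus_ea():
    result = np.add(GuardedEA([1, 2]), GuardedEA([10, 20]))
    assert isinstance(result, GuardedEA)
    assert result._data == [11, 22]


def test_unsupported_ufunc_raises_type_error():
    try:
        np.sin(GuardedEA([1.0]))
    except TypeError:
        return
    raise AssertionError("expected TypeError for an unsupported ufunc")
```

The functions follow the pytest naming convention but use plain assert statements, so they can also be called directly.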

Acceptance Criteria

  • __array_ufunc__ is implemented on the Arkouda ExtensionArray.
  • Core elementwise numeric ufuncs work end-to-end without NumPy materialization.
  • Unsupported ufuncs/dtypes produce clear, consistent errors.
  • Test suite includes coverage for supported, unsupported, and edge cases (including out=).
  • Documentation/comments explain the supported ufunc surface and rationale for exclusions.

Labels: enhancement
