Skip to content

Widen expression predicate term field type for mypy compatibility#3421

Open
aashish-g03 wants to merge 2 commits into
apache:mainfrom
aashish-g03:fix/mypy-expression-constructors
Open

Widen expression predicate term field type for mypy compatibility#3421
aashish-g03 wants to merge 2 commits into
apache:mainfrom
aashish-g03:fix/mypy-expression-constructors

Conversation

@aashish-g03
Copy link
Copy Markdown

Rationale for this change

Without the pydantic mypy plugin, mypy generates __init__ signatures from field declarations rather than from the explicit __init__ overrides on UnboundPredicate and its subclasses. This causes downstream users to get type errors when constructing predicates with string column names:

from pyiceberg.expressions import EqualTo

expr = EqualTo(term="my_column", value=42)
# mypy error: Argument "term" has incompatible type "str"; expected "UnboundTerm"

Closes #3101.

What changes were included in this PR?

Widen the term field declaration from UnboundTerm to Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)] in UnboundPredicate and LiteralPredicate. This uses the same Annotated + BeforeValidator pattern already established in partitioning.py for transform field coercion.

The BeforeValidator calls the existing _to_unbound_term helper, which coerces str to Reference. The stored value is always UnboundTerm at runtime.

Changes:

  • UnboundPredicate.term: UnboundTerm -> Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)]
  • LiteralPredicate.term: same widening (re-declares the field)
  • Added # type: ignore[union-attr] on 3 bind() call sites where self.term.bind() is called -- the validator guarantees the runtime type, but mypy sees the union
  • All existing __init__ methods are preserved unchanged (they handle more than just term coercion)

How was this patch tested?

All 757 existing expression and conversion tests pass. Verified with mypy that keyword construction EqualTo(term="col", value=42) no longer produces the arg-type error on term.

Note: Positional construction EqualTo("col", 42) still produces mypy errors (Too many positional arguments) because mypy without the pydantic plugin cannot infer positional signatures from explicit __init__ overrides. That is a broader issue affecting all Pydantic model constructors in the codebase and is outside the scope of this fix.

Without the pydantic mypy plugin, mypy generates __init__ signatures
from field declarations rather than from explicit __init__ overrides.
This caused EqualTo(term="col", value=42) to produce:

  error: Argument "term" has incompatible type "str"; expected "UnboundTerm"

Fix: widen field type from UnboundTerm to str | UnboundTerm with a
BeforeValidator that coerces str to Reference (matching the existing
_to_unbound_term helper). The validator ensures the stored value is
always UnboundTerm at runtime.

Closes apache#3101
Copilot AI review requested due to automatic review settings May 27, 2026 06:44
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates UnboundPredicate/LiteralPredicate term field validation to accept raw string terms via Pydantic v2 BeforeValidator, improving model parsing ergonomics.

Changes:

  • Introduces Annotated + BeforeValidator(_to_unbound_term) on term fields to coerce str into Reference.
  • Adds mypy type: ignore[union-attr] in bind() methods due to widened term type.
  • Updates imports to include Annotated and BeforeValidator.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

model_config = ConfigDict(arbitrary_types_allowed=True)

term: UnboundTerm
term: Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)]
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I considered Annotated[UnboundTerm, BeforeValidator(_to_unbound_term)] first, but it does not fix the issue.

Without the pydantic mypy plugin (which this project does not use), mypy generates __init__ signatures from the outer Annotated type. So Annotated[UnboundTerm, ...] still produces:

error: Argument "term" has incompatible type "str"; expected "UnboundTerm"  [arg-type]

I verified this with a minimal repro — only str | UnboundTerm as the outer type makes mypy accept str in the constructor without the plugin. The type: ignore[union-attr] on the 3 bind() calls is the trade-off, but the validator guarantees the runtime type is always UnboundTerm.


def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundUnaryPredicate:
bound_term = self.term.bind(schema, case_sensitive)
bound_term = self.term.bind(schema, case_sensitive) # type: ignore[union-attr]

def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundSetPredicate:
bound_term = self.term.bind(schema, case_sensitive)
bound_term = self.term.bind(schema, case_sensitive) # type: ignore[union-attr]
class LiteralPredicate(UnboundPredicate, ABC):
type: TypingLiteral["lt", "lt-eq", "gt", "gt-eq", "eq", "not-eq", "starts-with", "not-starts-with"] = Field(alias="type")
term: UnboundTerm
term: Annotated[str | UnboundTerm, BeforeValidator(_to_unbound_term)]

def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundLiteralPredicate:
bound_term = self.term.bind(schema, case_sensitive)
bound_term = self.term.bind(schema, case_sensitive) # type: ignore[union-attr]
SetPredicate.__getnewargs__ return annotation used UnboundTerm,
but the field is now str | UnboundTerm. Widen the tuple type to match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expression constructors are not mypy-compatible without the pydantic plugin

2 participants