Implement predicate functions: all(), any(), none(), single() #2359

Open

gregfelice wants to merge 4 commits into apache:master from gregfelice:feature_predicate_functions

@gregfelice (Contributor) commented Mar 25, 2026

Summary

Implements the four openCypher predicate functions (issues #552, #553, #555, #556):

  • all(x IN list WHERE predicate) — true if all elements match
  • any(x IN list WHERE predicate) — true if at least one matches
  • none(x IN list WHERE predicate) — true if no elements match
  • single(x IN list WHERE predicate) — true if exactly one matches

These are among the most requested Cypher features for AGE and are critical for users migrating from Neo4j and Kuzu (recently archived).

Implementation

Approach: Builds on the existing list comprehension infrastructure (unnest-based subqueries with child parsestates for variable scoping).

SQL transformation strategy: Each predicate function is lowered to an aggregate subquery over unnest(list) that preserves Cypher's three-valued (TRUE/FALSE/NULL) semantics:

  • all(), any(), none() → SELECT CASE WHEN bool_or(<decisive-condition>) THEN <decisive-result> WHEN bool_or(pred IS NULL) THEN NULL ELSE <default-result> END FROM unnest(list) AS x. Two bool_or() aggregates compute (1) whether any element satisfies the decisive condition and (2) whether any element's predicate is NULL; the CASE combines them into the correct three-valued result. On an empty list both aggregates return NULL, so the ELSE branch yields the vacuous-truth defaults (all()/none() → true, any() → false).
  • single() → (SELECT count(*) FROM unnest(list) AS x WHERE pred IS TRUE) = 1, an exact count of elements whose predicate is TRUE. (The LIMIT 2 short-circuit optimization is deferred; the current form evaluates the full list.)
  • All four are wrapped in a CASE WHEN <list-expr> IS NULL THEN NULL ELSE <subquery> END guard so a NULL list expression propagates NULL rather than collapsing to the empty-list defaults.
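The three-valued CASE/bool_or() shape for all(), any() and none() can be checked against a small C model. This is a sketch for illustration, not AGE code: the type `tv`, the enum `pf_kind` and the function `eval_bool_or_case()` are hypothetical, and the per-function decisive conditions (pred IS FALSE for all(), pred IS TRUE for any() and none()) are an assumption inferred from the bool_or(pred IS TRUE/FALSE) description rather than read from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-valued logic type: TRUE / FALSE / NULL. */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } tv;

/* Hypothetical kind enum (the PR's real enum is CPFK_ALL/ANY/NONE/SINGLE). */
typedef enum { PF_ALL, PF_ANY, PF_NONE } pf_kind;

/*
 * Models:
 *   SELECT CASE WHEN bool_or(<decisive>)   THEN <decisive result>
 *               WHEN bool_or(pred IS NULL) THEN NULL
 *               ELSE <default result> END
 *   FROM unnest(list) AS x
 * where preds[i] is the predicate's value for the i-th element.
 */
static tv eval_bool_or_case(pf_kind kind, const tv *preds, size_t n)
{
    /* Assumed decisive condition: all() looks for a FALSE, any()/none() for a TRUE. */
    tv decisive = (kind == PF_ALL) ? TV_FALSE : TV_TRUE;
    tv decisive_result = (kind == PF_ANY) ? TV_TRUE : TV_FALSE;
    tv default_result = (kind == PF_ANY) ? TV_FALSE : TV_TRUE;
    bool saw_decisive = false; /* bool_or(<decisive>) */
    bool saw_null = false;     /* bool_or(pred IS NULL) */

    assert(n == 0 || preds != NULL);
    for (size_t i = 0; i < n; i++) {
        if (preds[i] == decisive)
            saw_decisive = true;
        if (preds[i] == TV_NULL)
            saw_null = true;
    }
    /*
     * With zero rows both bool_or() calls yield SQL NULL, which CASE treats
     * as "not true"; the false flags reproduce that fall-through to ELSE
     * (vacuous truth for all()/none(), false for any()).
     */
    if (saw_decisive)
        return decisive_result;
    if (saw_null)
        return TV_NULL;
    return default_result;
}
```

Under this model, all() over the predicate results {FALSE, NULL} is FALSE, because one FALSE settles the answer regardless of the unknown element, while all() over {TRUE, NULL} is NULL.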

This aggregate-based shape (rather than EXISTS/NOT EXISTS) was chosen specifically to preserve correct NULL handling: WHERE in a SQL subquery drops rows where the predicate is NULL, which would incorrectly collapse any(x IN [NULL] WHERE x > 0) from NULL to false under an EXISTS shape. The aggregate form keeps NULL-producing rows visible and folds them into a proper three-valued result.
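The EXISTS pitfall described above can be reproduced with a similarly hypothetical C model (the names `any_via_exists()` and `any_via_aggregate()` are illustrative only). The first function mimics a WHERE filter, which keeps only rows whose predicate is TRUE; the second mimics the aggregate form, where NULL rows remain visible to bool_or(pred IS NULL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } tv; /* TRUE / FALSE / NULL */

/* EXISTS (SELECT 1 FROM unnest(list) AS x WHERE pred):
 * WHERE keeps only rows whose predicate is TRUE, so NULL rows vanish
 * and EXISTS can only ever answer TRUE or FALSE. */
static tv any_via_exists(const tv *preds, size_t n)
{
    assert(n == 0 || preds != NULL);
    for (size_t i = 0; i < n; i++)
        if (preds[i] == TV_TRUE)
            return TV_TRUE;
    return TV_FALSE;
}

/* CASE WHEN bool_or(pred IS TRUE) THEN true
 *      WHEN bool_or(pred IS NULL) THEN NULL ELSE false END:
 * NULL rows stay visible to the second aggregate. */
static tv any_via_aggregate(const tv *preds, size_t n)
{
    bool saw_true = false, saw_null = false;

    assert(n == 0 || preds != NULL);
    for (size_t i = 0; i < n; i++) {
        saw_true = saw_true || preds[i] == TV_TRUE;
        saw_null = saw_null || preds[i] == TV_NULL;
    }
    if (saw_true)
        return TV_TRUE;
    return saw_null ? TV_NULL : TV_FALSE;
}
```

For the predicate results {NULL}, which is what any(x IN [NULL] WHERE x > 0) produces, any_via_exists() returns TV_FALSE while any_via_aggregate() returns TV_NULL.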

Files changed (12):

| File | Change |
|---|---|
| cypher_nodes.h | New cypher_predicate_function node type with CPFK_ALL/ANY/NONE/SINGLE enum |
| ag_nodes.h / ag_nodes.c | Node registration |
| cypher_outfuncs.h / cypher_outfuncs.c | Serialization |
| cypher_kwlist.h | Three new keywords: ANY_P, NONE, SINGLE |
| cypher_gram.y | Grammar rules in expr_func_subexpr; build_predicate_function_node() helper with NULL-list guard; shared extract_iter_variable_name() helper (rejects qualified ColumnRefs); keywords added to safe_keywords |
| cypher_clause.c | transform_cypher_predicate_function() builds the query tree with bool_or() aggregates + CASE for three-valued semantics; make_bool_or_agg() helper |
| cypher_analyze.c | Expression walker for the new node type |
| Makefile | Register new regression test |
| regress/sql/predicate_functions.sql | New test file |
| regress/expected/predicate_functions.out | Expected output |

Backward compatibility:

  • ANY_P, NONE, SINGLE added to safe_keywords so they work as property keys and label names (e.g., {any: 1, none: 2, single: 3})
  • ALL was already a reserved keyword with a safe_keywords entry
  • No grammar conflicts (verified: zero new shift/reduce or reduce/reduce warnings)

Regression Tests

Test queries covering:

  • Basic true/false for each function
  • Empty list edge cases (vacuous truth for all()/none(), false for any()/single())
  • NULL list input (all four return NULL via the guard)
  • NULL predicate results within non-empty lists (three-valued semantics)
  • Graph data integration (MATCH (u) WHERE all(x IN u.vals WHERE ...))
  • Boolean combinations (any(...) AND all(...))
  • Nested predicates (any(x IN ... WHERE all(y IN ... WHERE ...)))
  • Keyword backward compatibility ({any: 1, none: 2, single: 3})
  • Deterministic ordering (all graph queries use ORDER BY)

All regression tests pass.

Implement the four openCypher predicate functions (issues apache#552, apache#553,
apache#555, apache#556) that test list elements against a predicate:

  all(x IN list WHERE predicate)    -- true if all elements match
  any(x IN list WHERE predicate)    -- true if at least one matches
  none(x IN list WHERE predicate)   -- true if no elements match
  single(x IN list WHERE predicate) -- true if exactly one matches

Implementation approach:
- Add cypher_predicate_function node type with CPFK_ALL/ANY/NONE/SINGLE
  kind enum, reusing the list comprehension's unnest-based transformation
- Grammar rules in expr_func_subexpr (alongside EXISTS, COALESCE, COUNT)
- Transform to efficient SQL sublinks:
  all()   -> NOT EXISTS (SELECT 1 FROM unnest WHERE NOT pred)
  any()   -> EXISTS (SELECT 1 FROM unnest WHERE pred)
  none()  -> NOT EXISTS (SELECT 1 FROM unnest WHERE pred)
  single() -> (SELECT count(*) FROM unnest WHERE pred) = 1
- Three new keywords (ANY_P, NONE, SINGLE) added to safe_keywords for
  backward compatibility as property keys and label names
- Shared extract_iter_variable_name() helper for variable validation

All 32 regression tests pass. New predicate_functions test covers basic
semantics, empty lists, graph data integration, boolean combinations,
nested predicates, and keyword backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Adds support for the openCypher predicate functions all(), any(), none(), and single() by introducing a new AST node and transforming it into unnest(...)-based SQL subqueries during parsing/analyzing, with a new regression test suite.

Changes:

  • Add new cypher_predicate_function AST node (+ enum kind) and register/serialize it as an ExtensibleNode.
  • Extend the Cypher grammar with all/any/none/single(variable IN list WHERE predicate) and add keywords to the lexer + safe_keywords.
  • Implement query-tree transformation for predicate functions and add regression tests (predicate_functions).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| src/include/parser/cypher_kwlist.h | Adds any/none/single as Cypher keywords. |
| src/backend/parser/cypher_gram.y | Adds grammar rules + helper to build predicate-function nodes and wraps them in SubLinks. |
| src/include/nodes/cypher_nodes.h | Introduces cypher_predicate_function node + kind enum. |
| src/include/nodes/ag_nodes.h | Registers new node tag for predicate functions. |
| src/backend/nodes/ag_nodes.c | Adds node name and ExtensibleNode methods entry for predicate-function node. |
| src/include/nodes/cypher_outfuncs.h | Declares serialization function for the new node. |
| src/backend/nodes/cypher_outfuncs.c | Implements serialization for cypher_predicate_function. |
| src/backend/parser/cypher_clause.c | Transforms predicate-function node into unnest-based subqueries (EXISTS / count). |
| src/backend/parser/cypher_analyze.c | Adds expression walker support for the new node type. |
| Makefile | Registers the new predicate_functions regression test. |
| regress/sql/predicate_functions.sql | Adds regression SQL coverage for predicate functions. |
| regress/expected/predicate_functions.out | Adds expected output for the new regression test. |

@jrgemignani (Contributor)

@gregfelice Please see the above comments by Copilot

… perf, tests

- Rewrite predicate functions from EXISTS_SUBLINK to EXPR_SUBLINK with
  aggregate-based CASE expressions (bool_or + IS TRUE/FALSE/NULL) to
  preserve three-valued Cypher NULL semantics
- Add list_length check in extract_iter_variable_name() to reject
  qualified names like x.y as iterator variables
- Add copy/read support for cypher_predicate_function ExtensibleNode
  to prevent query rewriter crashes
- Use IS TRUE filtering in single() count (LIMIT 2 optimization
  breaks correlated variable refs in graph contexts -- documented)
- Add 13 NULL regression tests: null list input, null elements,
  null predicates for all four functions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gregfelice (Contributor, Author)

Addressed all 4 Copilot suggestions:

  1. NULL semantics — Rewrote from EXISTS_SUBLINK to EXPR_SUBLINK with aggregate-based CASE expressions (bool_or(pred IS TRUE/FALSE) + bool_or(pred IS NULL)) to preserve three-valued Cypher NULL semantics. any(x IN [NULL] WHERE x > 0) now correctly returns NULL instead of false.

  2. Unqualified iterator check — Added list_length(cref->fields) != 1 validation in extract_iter_variable_name(). Qualified names like x.y now error with "qualified name not allowed as iterator variable".

  3. NULL regression tests — Added 13 new test cases covering: NULL list input for all four functions, null elements in lists, literal null predicates, and mixed null/non-null elements.

  4. single() performance — Applied IS TRUE filtering so NULL predicates aren't counted. The LIMIT 2 optimization breaks correlated variable references in graph property contexts (e.g., MATCH (u) WHERE single(x IN u.vals WHERE ...)), documented with TODO for future optimization pass.
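The IS TRUE filter in single() and the deferred LIMIT 2 idea can be sketched in C as below. This is a hypothetical model, not the patch: `single_full_count()` mirrors the count(*) form described in the PR, and `single_short_circuit()` only illustrates what an early exit after the second match would compute; it makes no claim about how a LIMIT 2 rewrite would interact with correlated references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } tv; /* TRUE / FALSE / NULL */

/* (SELECT count(*) FROM unnest(list) AS x WHERE pred IS TRUE) = 1 */
static bool single_full_count(const tv *preds, size_t n)
{
    size_t matches = 0;

    assert(n == 0 || preds != NULL);
    for (size_t i = 0; i < n; i++)
        if (preds[i] == TV_TRUE) /* FALSE and NULL are both skipped */
            matches++;
    return matches == 1;
}

/* Same answer, but stops scanning after the second TRUE (the LIMIT 2 idea). */
static bool single_short_circuit(const tv *preds, size_t n)
{
    size_t matches = 0;

    assert(n == 0 || preds != NULL);
    for (size_t i = 0; i < n && matches < 2; i++)
        if (preds[i] == TV_TRUE)
            matches++;
    return matches == 1;
}
```

Both functions agree on every input because once two matches are seen the answer is already false. The NULL-list case is outside this sketch; the PR handles it with the separate CASE WHEN <list-expr> IS NULL guard.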

Bonus: Added copy/read support for cypher_predicate_function ExtensibleNode to prevent "unexpected copyObject()" crashes when PostgreSQL's query rewriter copies expression trees.

All 32 regression tests pass (predicate_functions: ok).

@jrgemignani (Contributor)

@gregfelice We'll see what Copilot thinks ;) Btw, in the future, can you put your comments in the reply to Copilot, please.


Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.


…ate.h

1. Add NULL-list guard for all predicate functions (all/any/none/single).
   Wraps the result with CASE WHEN list IS NULL THEN NULL ELSE <result>
   END in the grammar layer.  This fixes single(x IN null WHERE ...)
   returning false instead of NULL.  The expr pointer is safely shared
   between the NullTest and the predicate function node because AGE's
   expression transformer creates new nodes without modifying the
   parse tree in-place.

2. Fix single() block comment in transform_cypher_predicate_function:
   described LIMIT 2 optimization but implementation uses plain
   count(*).  Updated comment to match actual implementation.

3. Keep #include "catalog/pg_aggregate.h" -- Copilot suggested removal
   but AGGKIND_NORMAL macro requires it (build fails without it).

Regression test: predicate_functions OK.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
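The effect of the NULL-list guard can be modelled in a few lines of C. The struct `pred_list` and both functions are hypothetical; the sketch assumes, as PostgreSQL does for a strict set-returning function, that unnest() of a NULL array yields zero rows, which is why an unguarded single() over a NULL list collapses to false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TV_FALSE, TV_TRUE, TV_NULL } tv; /* TRUE / FALSE / NULL */

/* Hypothetical list value: per-element predicate results plus a NULL flag. */
typedef struct {
    bool is_null;    /* models a NULL list expression */
    const tv *items; /* predicate result for each element */
    size_t len;
} pred_list;

/* Unguarded single(): unnest(NULL) yields zero rows, so count(*) = 0. */
static tv single_unguarded(pred_list list)
{
    size_t rows = list.is_null ? 0 : list.len;
    size_t matches = 0;

    assert(rows == 0 || list.items != NULL);
    for (size_t i = 0; i < rows; i++)
        if (list.items[i] == TV_TRUE)
            matches++;
    return matches == 1 ? TV_TRUE : TV_FALSE;
}

/* CASE WHEN <list-expr> IS NULL THEN NULL ELSE <subquery> END */
static tv single_guarded(pred_list list)
{
    if (list.is_null)
        return TV_NULL;
    return single_unguarded(list);
}
```

With the guard in place, single(x IN null WHERE ...) yields NULL, matching the behaviour of the other three functions, while a genuinely empty list still returns false.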
Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.


@jrgemignani (Contributor)

@gregfelice Please see Copilot above

…mprehensions

- Refactor build_list_comprehension_node() to reuse the shared
  extract_iter_variable_name() helper, so `var IN list` validation
  is consistent between list comprehensions and predicate functions
  (all/any/none/single). Qualified ColumnRefs like `x.y IN list`
  are now rejected in list comprehensions the same way they are
  in predicate functions.
- Update list_comprehension expected output for the normalized
  lowercase "syntax error at or near IN" message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
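The shared iterator check can be sketched in C as follows. Everything here is a hypothetical stand-in: `col_ref` models a ColumnRef as an array of dotted name parts (`x` has one part, `x.y` has two), and `iter_variable_name()` returns NULL where the real extract_iter_variable_name() raises an error. The point is only that one list_length(fields) != 1 test now guards both list comprehensions and the predicate functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a ColumnRef: its dotted name parts. */
typedef struct {
    const char *const *fields; /* e.g. {"x"} or {"x", "y"} */
    size_t nfields;
} col_ref;

/*
 * Returns the iterator variable name, or NULL (after printing a message)
 * when the reference is not a single bare name, mirroring the
 * list_length(cref->fields) != 1 check described in the PR. One helper
 * serves list comprehensions and all()/any()/none()/single() alike, so
 * `x.y IN list` is rejected the same way in both.
 */
static const char *iter_variable_name(const col_ref *ref)
{
    assert(ref != NULL);
    if (ref->nfields != 1) {
        fprintf(stderr, "qualified name not allowed as iterator variable\n");
        return NULL;
    }
    return ref->fields[0];
}
```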
@gregfelice (Contributor, Author)

@jrgemignani — all Copilot items addressed and threads resolved. Round 3 commit (507a2e5) refactors build_list_comprehension_node() to reuse extract_iter_variable_name(), so iterator-variable validation is consistent between list comprehensions and predicate functions. The PR description already describes the bool_or() aggregate approach correctly. Ready for re-review when you have a moment. Thanks!
