FOREACH clause support #2381

@gregfelice

Summary

Add openCypher FOREACH clause support to AGE. FOREACH is a common ETL / iterative-update construct and is one of the few remaining Phase 1 Cypher parity gaps — along with pattern expressions in WHERE (#1577, PR #2360), predicate functions (PR #2359), and MERGE ON CREATE/MATCH SET (PR #2347).

Cypher semantics (openCypher / Neo4j)

FOREACH (var IN list-expression | update-clause [update-clause ...])
  • Body may contain only update clauses: CREATE, MERGE, SET, REMOVE, DELETE, and nested FOREACH.
  • Body runs once per list element; binds var to the current element in scope for body clauses only.
  • Produces no new rows in the outer query — the outer row set passes through unchanged.
  • No read clauses (MATCH, WITH, RETURN) inside the body.
  • Empty or NULL list → no-op, outer rows preserved.
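
To make the rules above concrete, here is a minimal Python model of the intended semantics. It is only a sketch, not AGE code: `foreach`, `Row`, and `BodyClause` are hypothetical names, rows are plain dicts, and body clauses are modeled as callables with side effects.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
# A body clause is modeled as a callable that receives the bindings visible
# inside the FOREACH body and performs a side effect (hypothetical stand-in
# for CREATE / MERGE / SET / REMOVE / DELETE).
BodyClause = Callable[[Row], None]


def foreach(rows: list[Row], var: str,
            list_expr: Callable[[Row], Optional[list]],
            body: list[BodyClause]) -> list[Row]:
    """Toy model: run the body once per element, return the outer rows untouched."""
    for row in rows:
        elements = list_expr(row)
        if elements is None:              # NULL list behaves like an empty list
            continue
        for element in elements:
            scope = {**row, var: element}  # var is visible only inside the body
            for clause in body:
                clause(scope)
    return rows


# Usage: the first example below, run against a single outer row.
created: list[dict] = []
outer = [{"p": "p1"}]
result = foreach(outer, "name", lambda r: ["Alice", "Bob", "Carol"],
                 [lambda s: created.append({"label": "Person", "name": s["name"]})])
```

The outer row list is returned as-is, and `name` never appears in it; only the `created` list (the side effect) changes.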

Examples:

// Create nodes from a list
FOREACH (name IN ['Alice','Bob','Carol'] | CREATE (:Person {name: name}))

// Per-row iterative update
MATCH (p:Person)
FOREACH (tag IN p.tags | SET p.tag_count = p.tag_count + 1)

// Idempotent tag creation
MATCH (p:Person)
FOREACH (t IN p.tag_names | MERGE (tag:Tag {name: t}) MERGE (p)-[:HAS_TAG]->(tag))

Why FOREACH is not UNWIND

UNWIND flattens a list into the row stream: each element becomes its own outer row. FOREACH does the opposite: its body runs side-effecting update clauses once per element, but the outer row set is unchanged. The two can sometimes be rewritten into each other, but not when downstream projections must not be multiplied by the list length (an UNWIND over a three-element list triples the rows that every later clause sees, and an UNWIND over an empty list drops the row entirely).
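
The cardinality difference can be sketched with a small Python model. This is an illustration under simplifying assumptions (rows as dicts, a single list-valued property); `unwind` and `foreach` are hypothetical helper names, not AGE functions.

```python
def unwind(rows, var, list_key):
    """UNWIND model: one output row per (outer row, list element) pair."""
    return [{**row, var: element}
            for row in rows
            for element in (row[list_key] or [])]


def foreach(rows, var, list_key, body):
    """FOREACH model: the body runs per element; the outer rows pass through."""
    for row in rows:
        for element in (row[list_key] or []):
            body({**row, var: element})
    return rows


people = [{"name": "Alice", "tags": ["a", "b", "c"]},
          {"name": "Bob", "tags": []}]
side_effects = []

unwound = unwind(people, "tag", "tags")
passed_through = foreach(people, "tag", "tags",
                         lambda scope: side_effects.append(scope["tag"]))
```

In this model `unwound` holds three rows, all for Alice (Bob's empty list removes his row), while `passed_through` still holds the original two rows and the three tags show up only in `side_effects`.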

Existing AGE infrastructure that can be reused

  • cypher_unwind node + transform_cypher_unwind (src/backend/parser/cypher_clause.c:1440) — list iteration, element-variable binding, UNWIND expr AS var grammar shape.
  • transform_cypher_set_item_list (src/backend/parser/cypher_clause.c:1862) — per-item update list transform, already parameterized via cypher_update_item.
  • Existing CustomScan executor nodes for cypher_create, cypher_set, cypher_delete, cypher_merge — these are exactly the body clauses FOREACH needs to invoke per iteration.

Proposed implementation strategy

Two viable paths; happy to take maintainer input before writing code.

Option A — New cypher_foreach CustomScan node (preferred). Analogous to cypher_create. Holds (a) the list expression, (b) pre-built child update-clause plans, (c) a per-element tuple slot. Per outer tuple: iterate the list, bind var, invoke each child's executor in sequence; no tuples emitted. This matches AGE's existing architecture for write clauses and gives clean semantics (body runs, outer row passes through).

Option B — Lower to side-effecting SubPlan. Transform FOREACH (x IN list | body) into a correlated SubPlan that UNWINDs list and runs body clauses, attached as an init node to the outer query so it runs per outer row but discards its output. Less new code but harder to reason about row-preservation guarantees.

Option A is probably the path that fits AGE best.

Sketch of the code changes

Grammar (cypher_gram.y)

  • New foreach non-terminal mirroring unwind (line ~974).
  • New parse node cypher_foreach mirroring cypher_unwind with fields: target_name, expr, body_clauses.
  • Register in the clause alternation and the transform dispatch in transform_cypher_clause (cypher_clause.c:504).
  • Reject non-update body clauses at parse time with a location-bearing error.

Transform (transform_cypher_foreach)

  • Push a parse scope with var bound to the element type.
  • Recursively transform each body clause — each becomes its own Query, chained as children of the cypher_foreach node.
  • Validate body is update-only (cypher_create / cypher_set / cypher_delete / cypher_merge / nested cypher_foreach).
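
The update-only check, including recursion into nested FOREACH and a location-bearing error, could look roughly like the following Python model. `Clause`, `ForeachBodyError`, and `validate_foreach_body` are hypothetical names for illustration; the real check would operate on AGE parse nodes in C.

```python
from dataclasses import dataclass, field

UPDATE_KINDS = {"CREATE", "MERGE", "SET", "REMOVE", "DELETE"}


@dataclass
class Clause:
    kind: str                       # e.g. "CREATE", "MATCH", "FOREACH"
    location: int                   # character offset in the query text
    body: list["Clause"] = field(default_factory=list)  # used by FOREACH only


class ForeachBodyError(Exception):
    """Raised when a non-update clause appears inside a FOREACH body."""

    def __init__(self, kind: str, location: int):
        super().__init__(
            f"{kind} is not allowed inside FOREACH (at offset {location})")
        self.kind = kind
        self.location = location


def validate_foreach_body(clauses: list[Clause]) -> None:
    for clause in clauses:
        if clause.kind == "FOREACH":
            validate_foreach_body(clause.body)   # nested FOREACH: recurse
        elif clause.kind not in UPDATE_KINDS:
            raise ForeachBodyError(clause.kind, clause.location)


# Accepted: CREATE plus a nested FOREACH containing SET.
validate_foreach_body([Clause("CREATE", 10),
                       Clause("FOREACH", 30, [Clause("SET", 50)])])
```

A `MATCH` nested two levels deep is still reported with its own offset, which is what the "reject reads" regression test would assert on.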

Executor

  • New src/backend/executor/cypher_foreach.c analogous to cypher_create.c.
  • ExecCypherForeach iterates the evaluated list, sets the ecxt_scantuple element slot, calls each child update executor in sequence, performs per-iteration cleanup, and emits no tuples — the outer tuple passes through.
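
As a rough model of that control flow (not the actual CustomScan API), the per-tuple loop with nested FOREACH support might be sketched in Python as follows. `ForeachNode` and `CreateNode` are hypothetical stand-ins for the executor nodes, and the dict copy stands in for the per-element tuple slot.

```python
class CreateNode:
    """Toy stand-in for a child update executor such as cypher_create."""

    def __init__(self, store, props):
        self.store = store            # list acting as the graph
        self.props = props            # callable: bindings -> property dict

    def execute(self, bindings):
        self.store.append(self.props(bindings))


class ForeachNode:
    """Toy stand-in for the proposed cypher_foreach executor node."""

    def __init__(self, var, list_expr, children):
        self.var = var                # name bound to the current element
        self.list_expr = list_expr    # callable: bindings -> list or None
        self.children = children      # update executors or nested ForeachNode

    def execute(self, bindings):
        elements = self.list_expr(bindings) or []   # NULL list acts as empty
        for element in elements:
            slot = dict(bindings)     # fresh per-element slot
            slot[self.var] = element
            for child in self.children:
                child.execute(slot)
            # The slot goes out of use here, so the element binding never
            # leaks into the next iteration or into the outer tuple.
        return bindings               # outer tuple passes through unchanged


graph = []
inner = ForeachNode("y", lambda b: [10, 20],
                    [CreateNode(graph, lambda b: {"x": b["x"], "y": b["y"]})])
outer = ForeachNode("x", lambda b: [1, 2], [inner])
out = outer.execute({"n": "node-1"})
```

Treating a nested FOREACH as just another child with the same `execute` interface keeps the outer loop unaware of nesting depth; whether the real node can share an interface with the existing update executors this way is an open design point.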

Regression tests (regress/sql/cypher_foreach.sql)

  • Smoke: FOREACH (x IN [1,2,3] | CREATE (:N {v: x})) → count check.
  • Nested SET: MATCH (n:Person) FOREACH (tag IN n.tags | SET n.tag_count = n.tag_count + 1).
  • MERGE inside FOREACH: idempotent tag creation pattern.
  • Nested FOREACH.
  • Reject reads: FOREACH (x IN list | MATCH ...) → parse error with location.
  • Empty list: no-op, outer rows preserved.
  • NULL list: treat as empty (Neo4j semantics).

Open questions for maintainers

  1. Preference on Option A vs Option B above?
  2. Any concerns about adding a new CustomScan node in src/backend/executor/ vs slotting into an existing file?
  3. Should RETURN-inside-FOREACH produce a dedicated error message, or fall through the general "unexpected clause" path?

Happy to own this — wanted to file the issue first to align on strategy before writing code, given the scope.
