Skip to content

DestinationAppwrite: tolerate existing schema on Skip/Upsert re-migration#171

Open
premtsd-code wants to merge 38 commits intomainfrom
feat/skip-duplicates
Open

DestinationAppwrite: tolerate existing schema on Skip/Upsert re-migration#171
premtsd-code wants to merge 38 commits intomainfrom
feat/skip-duplicates

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

@premtsd-code premtsd-code commented Apr 22, 2026

Summary

Re-migration against a destination Appwrite project needs the destination to tolerate existing schema resources — otherwise the second run throws DuplicateException on every database/table/attribute/index that already exists. This PR makes DestinationAppwrite handle reconciliation locally, with no changes required in utopia-php/database.

Schema decisions are routed through a single method on the enum — OnDuplicate::resolveSchemaAction() — which returns one of three SchemaAction cases: Create, Tolerate, or UpdateInPlace. The decision uses updatedAt only:

  • Source updatedAt strictly newer than dest → UpdateInPlace
  • Source updatedAt equal or older than dest → Tolerate (preserve dest drift)
  • Unparseable / null / zero-date timestamps → Tolerate (conservative)
  • Existing dest doc not found → Create
  • Skip mode → always Tolerate on existing
  • Fail mode → always Create on existing (lets the library surface DuplicateException)

The destination then runs a spec-match guard that overrides UpdateInPlace to Tolerate whenever source and dest already have identical spec — see below.

Spec-match guard

Before dispatching to the match arm, every create method runs a full-spec equality check that overrides UpdateInPlace to Tolerate when source and dest already match:

  • databaseSpecMatches — name, enabled, type, originalId, database
  • tableSpecMatches — name, enabled, $permissions (sorted), documentSecurity
  • attributeSpecMatches — type, size, required, default, array, signed, format, formatOptions, filters; for relationships: structural fields + onDelete
  • indexSpecMatches — type, attributes, orders

Effect: a source-side drop+recreate that produces an identical spec on dest no longer triggers a wasteful drop+recreate cascade — Tolerate skips the work, and the row pass under Upsert refreshes data via upsertDocuments regardless. Comparisons use a valuesMatch helper that ksort()s associative arrays before === so formatOptions = {min:1, max:100} matches {max:100, min:1} even when utopia and source serialize keys in different orders. Lists (filters, columns) keep order-sensitive comparison since their order is semantically meaningful.

SDK-aligned UpdateInPlace

UpdateInPlace for attributes is constrained to changes the Appwrite SDK can express via updateXAttribute / updateRelationshipAttribute. Anything outside that surface falls through to drop+recreate. The boundary is encoded as two class constants on DestinationAppwrite:

private const ATTRIBUTE_NON_SDK_FIELDS = ['type', 'array', 'signed', 'format', 'formatOptions', 'filters'];
private const RELATIONSHIP_STRUCTURAL_FIELDS = ['relationType', 'twoWay', 'twoWayKey', 'relatedCollection'];

A source-side change touching one of these means the physical column shape is changing and no in-place SDK call can express it → drop+recreate. The AppwriteAttributeSdkBoundaryTest unit test reflects on appwrite/appwrite's TablesDB + Databases services and asserts these constants stay aligned with the SDK reality, so a new SDK endpoint surfaces in CI within one commit.

Per-resource behavior

Resource Skip Upsert (spec same OR dest ≥ source) Upsert (SDK-reachable change, source newer) Upsert (non-SDK change, source newer) Fail (default)
Database Tolerate (SKIPPED) Tolerate UpdateInPlace — name, enabled, type, originalId, database, $updatedAt (no non-SDK distinction; same as left) pre-check skipped → library throws
Table Tolerate (SKIPPED) Tolerate UpdateInPlace — name, enabled, $permissions, documentSecurity, $updatedAt (same) pre-check skipped → library throws
Attribute (regular) Tolerate (SKIPPED) Tolerate UpdateInPlace via updateAttribute drop+recreate via deleteAttribute + create pre-check skipped → library throws
Attribute (relationship, two-way) Tolerate (SKIPPED) Tolerate updateRelationshipAttribute for onDelete / newKey (refreshes partner meta) drop+recreate via deleteRelationship (clears partner mirror) pre-check skipped → library throws
Attribute (relationship, one-way) Tolerate (SKIPPED) Tolerate (one-way onDelete is gated off; falls through to drop+recreate) drop+recreate via deleteRelationship pre-check skipped → library throws
Index Tolerate (SKIPPED) Tolerate (no SDK in-place primitive) drop+recreate pre-check skipped → library throws
Row skipDuplicates(createDocuments) upsertDocuments upsertDocuments upsertDocuments plain createDocuments

Tolerate branches set Resource::STATUS_SKIPPED and return false so migration counters show skip instead of success for no-op re-migrations. Immutable container fields ($id, $createdAt, internal sequences) are never touched on UpdateInPlace.

Relationship drop uses deleteRelationship (both types)

One-way and two-way relationships both flow through Database::deleteRelationship() rather than deleteAttribute. Reason: utopia's deleteAttribute doesn't unwire the parent column for relationships at all — only deleteRelationship clears both metadata docs and the physical column on each side. The row pass restores partner data on recreate, so two-way drop is safe.

Two-way onDelete UpdateInPlace syncs the partner meta doc

utopia-php/database's updateRelationship updates the physical constraint on both sides of a two-way relationship, but the Appwrite-level attributes meta doc on the partner side is application-layer state utopia doesn't know about. updateRelationshipInPlace refreshes the partner meta's options.onDelete and purges the partner table's caches after the parent-side update, so reads from the related collection don't see stale onDelete. (One-way + onDelete change is gated off and falls through to drop+recreate, since utopia's updateRelationship partner-cascade throws on one-way.)

Upsert orphan cleanup

DestinationAppwrite tracks every attribute and index key source declares per (database, table) during the run in an $orphansByTable map. At cleanup time it queries the destination's attributes / indexes collections for that table and drops any whose key wasn't in the tracked set.

When cleanup fires:

  • Per-table, before rows land — triggered inside createRecord on first row batch flush for a given table. Ensures the Structure validator sees the post-cleanup schema, so rows aren't rejected on orphan required columns.
  • End-of-migration sweep — triggered from the run() override after source streaming completes. Only visits tables that had no rows.

Gating: tracking and cleanup are both short-circuited when $this->onDuplicate !== OnDuplicate::Upsert. Skip and Fail modes never touch destination state they weren't explicitly asked to write.

Fail-open on partial failure: run() calls cleanupUpsertOrphans() after parent::run() returns — if the migration throws mid-stream, the cleanup is skipped and the destination is preserved as-is.

Two-way relationships: orphan drops route through deleteRelationship, which clears the partner mirror automatically.

Two-way relationship pair-key dedup

Two-way relationships emit two source-side resources (one per side). Without dedup, the partner pass would re-process and either double-work or destroy partner-side row data already migrated.

Prevention: a canonical pair-key ({dbSeq}@{min(tuple)}<->{max(tuple)} where tuple = {tableId}::{attributeKey}) is recorded in $processedTwoWayPairs after the first side completes. The partner's second pass checks the set and short-circuits with STATUS_SKIPPED.

Why pre-check over try/catch

Migration is a sequential single-writer, so the race-condition justification for exception-driven control flow doesn't apply. Pre-check reads top-to-bottom without using exceptions for control flow. On OnDuplicate::Fail (the default) the pre-check is gated off entirely — fresh-migration paths pay zero overhead.

Zero-date and null-timestamp hardening

sourceIsNewer() treats null, empty, garbage, and MySQL zero-date ('0000-00-00 00:00:00') timestamps as "unknown" → no destructive action. PHP's strtotime('0000-00-00 00:00:00') returns a large negative epoch rather than false, so the naive === false check would have missed the sentinel — parseTimestamp() rejects non-positive epochs explicitly.

Maintainability — helpers, constants, and run-state isolation

Helpers extracted to keep create methods narrow:

  • META_DATABASES / META_ATTRIBUTES / META_INDEXES class constants — replace ~30 magic-string collection-name occurrences. Phpstan-checkable; typo-resistant.
  • databaseCollectionId / tableCollectionId / attributeIndexMetaId — single source of truth for the platform-DB collection ids.
  • purgeTableCaches — collapses 8+ duplicated cache-purge pairs.
  • resolveTwoWayPartner returning a typed array shape — single entry point for the 3 sites that previously extracted relatedCollection / twoWayKey / relatedTable with inconsistent fallbacks.
  • arraysDifferOnKeys — drives the SDK-boundary diff in updateAttributeInPlace and updateRelationshipInPlace.
  • valuesMatch — order-independent equality for associative arrays (formatOptions, options).
  • databaseSpecMatches / tableSpecMatches / attributeSpecMatches / indexSpecMatches — full-spec-equality checks driving the unified spec-match guard.
  • bestEffort — explicit "swallows all Throwable" name for cleanup-path errors.
  • resetRunState() — invoked at the top of run() so reusing a destination instance no longer leaks $rowBuffer, $orphansByTable, or $processedTwoWayPairs.

The action-dispatch shape in each create method:

$action = $this->onDuplicate->resolveSchemaAction(!$existing->isEmpty(), $updatedAt, $existing->getUpdatedAt());
// Spec match → skip work. Create excluded; nothing on dest to match against.
if ($action !== SchemaAction::Create && $this->XSpecMatches(...)) {
    $action = SchemaAction::Tolerate;
}
match ($action) { ... };

Zero library changes

All of this uses existing utopia-php/database APIs:

  • getDocument(collection, id) for pre-check
  • updateDocument for container UpdateInPlace
  • updateAttribute / updateRelationship for SDK-reachable in-place updates
  • deleteAttribute / deleteRelationship / deleteIndex for leaf drop+recreate and orphan drops
  • find('attributes' | 'indexes', ...) for orphan discovery
  • skipDuplicates() (5.3.22) for row-level Skip
  • upsertDocuments() (existing) for row-level Upsert

No database library bump required.

Tests

  • Unit — 20 tests / 32 assertions across two files:
    • OnDuplicateTest — exists × mode × updatedAt matrix for resolveSchemaAction, including null/zero-date/unparseable-timestamp datasets (DataProvider) and OnDuplicate::values() case ordering.
    • AppwriteAttributeSdkBoundaryTest — reflects on appwrite/appwrite's TablesDB and Databases services, verifies ATTRIBUTE_NON_SDK_FIELDS doesn't claim any SDK-reachable param as non-SDK. Catches SDK drift in CI when appwrite/appwrite adds new mutable params.
  • E2E (in appwrite/appwrite PR #11910):
    • testAppwriteMigrationRowsOnDuplicate — rows under all three modes
    • testAppwriteMigrationReRunIsIdempotent — unchanged source is a no-op (every resource Tolerated)
    • testAppwriteMigrationUpsertUpdatesContainerMetadata — Upsert reconciles database/table drift via UpdateInPlace
    • testAppwriteMigrationSkipPreservesContainerDrift — Skip preserves dest container drift
    • testAppwriteMigrationUpsertUpdatesAttributeInPlace — SDK-reachable attribute change propagates via updateAttributeInPlace; row data preserved
    • testAppwriteMigrationSkipPreservesAttributeDrift — Skip preserves dest attribute drift (leaf-level analog)
    • testAppwriteMigrationUpsertUpdatesRelationshipOnDeleteInPlace — two-way onDelete change updates in place on both sides; partner meta refreshed
    • testAppwriteMigrationUpsertTwoWayRecreateSkipsPartnerSide — pair-key dedup prevents partner drop+recreate from wiping rows already migrated this run
    • testAppwriteMigrationUpsertOneWayRelationshipDropAndRecreate — one-way + onDelete change falls through to drop+recreate (in-place gated off for one-way)
    • testAppwriteMigrationUpsertAttributeRecreateDropsAndRecreates — source drops+recreates with DIFFERENT spec → drop+recreate, row pass refills
    • testAppwriteMigrationUpsertSameSpecRecreateTolerates — source drops+recreates with SAME spec → spec-match guard forces Tolerate; dest meta untouched
    • testAppwriteMigrationUpsertDropsOrphanColumn — Upsert drops destination columns source no longer declares
    • testAppwriteMigrationSkipKeepsOrphanColumn — Skip preserves orphan columns
  • Verified green across all six adapter × mode combinations (MariaDB, PostgreSQL, MongoDB × dedicated, shared).

Test plan

  • Fresh migration paths unchanged (all three modes).
  • Skip re-run is a no-op for every resource type; counters show skip for tolerated resources.
  • Upsert re-run with unchanged source is idempotent (no drops, no updates, no row writes).
  • Source recreates a column with same spec → spec-match guard forces Tolerate; dest column meta untouched.
  • Source recreates a column with different spec, SDK-reachable change → UpdateInPlace via updateAttribute.
  • Source recreates a column with non-SDK change (e.g. type change) → drop+recreate; row pass refills.
  • Index whose spec matches → Tolerate (no needless drop+recreate).
  • One-way and two-way relationships both drop via deleteRelationship (parent column unwired correctly on each side).
  • Two-way onDelete UpdateInPlace refreshes the partner-side Appwrite meta doc.
  • Two-way relationship partner side is SKIPPED via pair-key set when first side was already reconciled.
  • Upsert drops destination attributes/indexes source no longer declares.
  • Skip mode preserves destination orphans and attribute/container drift.
  • Zero-date / null / unparseable updatedAt tolerate instead of dropping.
  • formatOptions ksort-equality survives utopia/source key-order differences.
  • SDK-boundary lock-in test enforces ATTRIBUTE_NON_SDK_FIELDS matches appwrite/appwrite Update endpoints.
  • vendor/bin/pint --test passes.
  • vendor/bin/phpstan analyse --level 3 src tests passes.

Re-migration needs the destination to tolerate existing schema resources.
The database library stays minimal — migration owns the reconciliation
decision since it has both source and destination metadata in hand.

Per-resource behavior:

- Database: pre-check `databases` metadata. If exists, hydrate sequence
  from the existing document and return. Both Skip and Upsert tolerate
  unconditionally — databases contain the entire tree of collections +
  rows, dropping would destroy everything.

- Table: pre-check `database_{sequence}` metadata. Same pattern —
  tolerate unconditionally, hydrate sequence, never drop. Attribute and
  index reconciliation happens per-resource at the lower layer.

- Attribute: pre-check `attributes` metadata by composite id
  ({databaseSeq}_{tableSeq}_{key}). Skip tolerates. Upsert compares
  source's updatedAt vs destination's — if source is strictly newer the
  column is dropped (via `dbForDatabases->deleteAttribute`) and recreated
  so the spec matches source. Column data is wiped on drop; rows will be
  repopulated by the row-level Upsert path that follows.

- Index: same pattern against `indexes` metadata. Drop is cheap because
  indexes carry no user data — only rebuild time.

Approach is pre-check rather than try/catch for control flow: migration
is sequential single-writer, so the race condition justification for
try/catch doesn't apply, and pre-check reads top-to-bottom with no
exception-as-control-flow.

Row-level dispatch (unchanged, already committed):
  OnDuplicate::Upsert → $dbForDatabases->upsertDocuments(...)
  OnDuplicate::Skip   → $dbForDatabases->skipDuplicates(fn () => createDocuments(...))
  OnDuplicate::Fail   → plain createDocuments

Both row-level primitives are existing library APIs — no database-library
changes are required for this feature.

Helpers:
  shouldTolerateSchemaDuplicate()  // onDuplicate !== Fail
  sourceSpecIsNewer($src, $dst)    // strtotime compare, false on empty
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR adds re-migration tolerance to DestinationAppwrite by routing all schema resources through OnDuplicate::resolveSchemaAction() which returns SchemaAction::{Create,Skip,Overwrite}. It also adds orphan-cleanup for Overwrite mode, a two-way relationship pair-key dedup mechanism, and moves index limit/validation checks after the pre-check drop to fix false-positive failures at capacity. Previously flagged P1 issues (stale $table after drop, IndexValidator ordering, missing pair-key guard, stale $table in createRecord) all appear addressed in this iteration.

Confidence Score: 4/5

Safe to merge for fresh migrations; Overwrite re-migration may trigger unnecessary drop+recreate for attributes where empty filters/formatOptions arrays are stored as null by Appwrite.

All previously flagged P1 issues are addressed in this revision. The two remaining comments are P2: a null-vs-empty-array spec mismatch that could trigger unnecessary drop+recreate in Overwrite mode, and a minor missing partner-table cache purge inside dropAttributeForRecreate. Neither breaks fresh migrations or causes data loss.

src/Migration/Destinations/Appwrite.php — valuesMatch / attributeSpecMatches null-vs-[] edge case and dropAttributeForRecreate partner cache purge.

Important Files Changed

Filename Overview
src/Migration/Destinations/Appwrite.php Core re-migration logic: resolveSchemaAction routing, drop+recreate for attributes/indexes/relationships, orphan cleanup, two-way pair dedup, and index pre-check reordering. All major P1 issues from prior review rounds appear addressed; one P2 null-vs-empty-array comparison issue in spec matching remains.
src/Migration/Destinations/OnDuplicate.php Adds SchemaAction enum, resolveSchemaAction(), and parseTimestamp() with zero-date hardening; renames Upsert→Overwrite (breaking semver change noted in prior threads).
tests/Migration/Unit/Destinations/AppwriteAttributeSdkBoundaryTest.php New lock-in test reflecting on appwrite/appwrite SDK to assert ATTRIBUTE_IMMUTABLE_FIELDS stays aligned with SDK update endpoints.
tests/Migration/Unit/General/OnDuplicateTest.php Comprehensive DataProvider coverage for the mode × exists × updatedAt matrix including zero-date and garbage timestamp edge cases.

Reviews (32): Last reviewed commit: "Rename SchemaAction cases to match Appwr..." | Re-trigger Greptile

Comment thread src/Migration/Destinations/Appwrite.php Outdated
Comment thread src/Migration/Destinations/Appwrite.php Outdated
Comment thread src/Migration/Destinations/Appwrite.php Outdated
Guard against malformed datetime strings (e.g. "0000-00-00 00:00:00")
that PHP's strtotime() returns false for. Without the explicit check,
false gets loose-compared as 0 and the helper silently returns false,
meaning a corrupted source updatedAt would always tolerate the existing
destination entry and never trigger drop+recreate.

Appwrite itself always emits parseable RFC 3339 timestamps, so this is
mainly defensive for non-Appwrite sources (Supabase, NHost, CSV).

Flagged by greptile P2.
On a first-run migration createField writes TWO metadata documents for a
two-way relationship attribute: parent-side at
{dbSeq}_{tableSeq}_{key} (line 763) and child-side at
{dbSeq}_{relatedTableSeq}_{twoWayKey} (line 819). Migration operates
directly on the database via $dbForProject — it doesn't rely on
Appwrite's attribute-event workers to derive the child side — so both
documents are physically written by migration itself.

The Upsert drop block was only removing the parent-side document +
column. On the recreate pass that followed, line 819 re-wrote the
child-side document and immediately hit DuplicateException because the
stale child doc from the previous migration run was still sitting there.
The inner catch rolled back the parent and the whole attribute aborted
with "Attribute already exists" — breaking Upsert re-migration for every
two-way relationship whose source spec had changed.

Fix: in the Upsert drop branch, when the resource is a two-way
relationship, also delete the child-side metadata document and the
child-side physical column on the related table, then purge caches.
Both deletes are wrapped in try/catch to tolerate the case where a prior
interrupted run (or utopia-php/database's relationship handling) already
cleaned one side — the goal is to guarantee a clean slate for the
downstream recreate, not to require both sides to still exist.

Flagged by greptile P1. Reproducing scenario:
  - OnDuplicate::Upsert
  - Destination already contains the attribute
  - Source's two-way relationship spec was modified between runs
    (source.updatedAt > dest.updatedAt)
DestinationAppwrite's shouldTolerateSchemaDuplicate() and
sourceSpecIsNewer() were private helpers that encoded mode-specific
behavior. Moving them onto OnDuplicate itself puts the behavior on the
type that owns it and makes the call sites self-documenting:

  $this->onDuplicate->toleratesSchemaDuplicate()
  $this->onDuplicate->shouldReconcileSchema($src, $dst)

shouldReconcileSchema() also subsumes the explicit Skip check at the
call sites — it returns false for Skip and Fail, so the compound
"`Skip || !sourceSpecIsNewer(...)`" condition collapses to a single
"`!shouldReconcileSchema(...)`".

No behavior change. Lint + PHPStan L3 clean.
OnDuplicate previously exposed two methods (toleratesSchemaDuplicate +
shouldReconcileSchema) that call sites had to combine. That split meant
the decision was spread across two methods and the call site — adding a
new mode later would require updating all three places in sync.

Replace with a single resolveSchemaAction() that returns one of three
SchemaAction enum cases (Create / Tolerate / DropAndRecreate). The
SchemaAction enum lives in the same file as OnDuplicate so the complete
decision surface is in one place.

For data-preserving containers (databases, tables) the method accepts
allowDrop=false so it can never return DropAndRecreate — callers that
use that path don't need to worry about accidentally dropping the
container. In practice the container call sites just use the inline
`$this->onDuplicate !== OnDuplicate::Fail` gate and then a simple
"is empty?" branch; resolveSchemaAction() is reserved for the leaves
(attributes, indexes) that actually benefit from the three-way choice.

Fail-mode short-circuit is preserved at every call site: no
getDocument pre-check when onDuplicate is Fail, so first-time
migrations (the common default-mode path) pay zero overhead. The
short-circuit stays at the call site rather than inside the enum method
so the "don't query on Fail" invariant is visible where the query is
issued.

Call sites:
  - createDatabase / createEntity: inline `!== Fail` gate, tolerate on
    existing. No drop ever.
  - createField: Fail short-circuit, else resolveSchemaAction + match.
    DropAndRecreate triggers the two-way relationship cleanup.
  - createIndex: same pattern, no two-way concern.
…havioral comments

Three review improvements to the schema-tolerance work:

1. OnDuplicate::resolveSchemaAction — remove the $allowDrop parameter.
   No caller passes it; all four call sites rely on the default. Containers
   (databases, tables) bypass the method entirely with an inline
   "!== Fail" check. The parameter advertised tuning that wasn't real,
   confusing readers about what was actually gated. Docstring tightened
   to mention that this method is only safe for leaf resources.

2. Extract deleteAttributeCompletely() as a reusable primitive that
   fully removes an attribute from destination — both metadata documents
   AND physical columns, mirroring createField's two-document write for
   two-way relationships. The Upsert drop block in createField shrinks
   from ~40 nested lines to a single method call; the two-way cleanup
   is now a named concern that future non-Upsert paths (Merge mode,
   explicit rollback, etc.) can reuse without duplication.

3. Replace line-number comments ("the first-run createField wrote TWO
   metadata docs (parent at line 763, child at line 819)") with
   behavioral descriptions ("createField writes a second `attributes`
   document on the related table"). Line references rot on every edit;
   behavioral descriptions don't.

Pure refactor — no functional change. Pint + PHPStan L3 clean.
… handling

resolveSchemaAction is the load-bearing decision point for re-migration
tolerance. Unit test locks the full mode × existence × timestamp matrix so
a future refactor can't silently shift the mapping. 14 tests, sub-millisecond
to run, no destination spin-up required.

Coverage:
- Not-exists returns Create for all modes.
- Fail + exists → Create (routed through create flow so library
  DuplicateException surfaces as designed, not silently tolerated).
- Skip ignores timestamps entirely — "don't touch" contract enforced
  by verifying both timestamp orderings produce Tolerate.
- Upsert with source strictly newer → DropAndRecreate.
- Upsert with dest newer or equal timestamps → Tolerate (strict > gate
  avoids unnecessary destructive rebuild when nothing has changed).
- Upsert with unparseable timestamps → Tolerate (conservative, dataset
  covers empty strings, '0000-00-00 00:00:00', and non-date garbage).
- OnDuplicate::values() case ordering guarded against accidental
  rename/reorder that would break SDK param validators.

Writing the test surfaced a real gap in sourceIsNewer: PHP's strtotime()
accepts '0000-00-00 00:00:00' leniently and returns a large negative
epoch (-62169984000) rather than false. The previous `=== false` check
missed the MySQL zero-date sentinel and would have returned true when
source was a valid date and dest was '0000-00-00', triggering a
destructive drop on garbage input. Harden sourceIsNewer to additionally
reject non-positive epochs (<= 0) — any legitimate Appwrite updatedAt
is well past 1970-01-01.
Extends the Upsert reconciliation to cover container metadata drift.
Previously databases and tables always tolerated an existing destination
under Upsert — source renames, enable toggles, and (for tables)
permission changes silently never reached destination on re-migration.
Now Upsert + source strictly newer triggers an updateDocument on the
container's own metadata doc, leaving children (rows, sub-resources)
untouched.

Mechanism:

- New SchemaAction::UpdateInPlace case alongside Create / Tolerate /
  DropAndRecreate.
- OnDuplicate::resolveSchemaAction gains a canDrop parameter (default
  false — safer default, destructive reconciliation must be opt-in):
    canDrop: true  → DropAndRecreate on Upsert-newer (attributes, indexes)
    canDrop: false → UpdateInPlace on Upsert-newer (databases, tables)
- createField / createIndex pass canDrop: true.
- createDatabase / createEntity rely on the default and handle
  SchemaAction::UpdateInPlace by calling updateDocument with a scoped
  field set:
    Database → name, search, enabled, type, originalId, database, $updatedAt
    Table    → name, search, enabled, $permissions, documentSecurity, $updatedAt
  Immutable fields ($id, $createdAt, internal sequences) are never touched.

Source-wins semantics for table permissions follow the full-Upsert
contract: if the caller wants destination drift preserved across re-runs,
they should pick Skip mode, not Upsert. Row Upsert already works this
way; containers now match.

Unit tests cover both canDrop branches and confirm Skip never returns
UpdateInPlace regardless of timestamps (reinforcing the "don't touch"
contract). 16 tests, pint + phpstan L3 clean.
@premtsd-code premtsd-code changed the title feat(DestinationAppwrite): tolerate existing schema on Skip/Upsert re-migration DestinationAppwrite: tolerate existing schema on Skip/Upsert re-migration Apr 23, 2026
- Track attribute/index keys seen from source per (db, table); after rows
  land per table (and in a final sweep at end-of-run) drop destination
  attributes/indexes whose keys source did not emit. Source rename
  (delete+create) on source now propagates as a drop on destination
  without leaving the old column orphaned. Gated to Upsert mode only.

- resolveSchemaAction now accepts nullable timestamps — attribute docs
  with missing $updatedAt no longer throw TypeError, they fall through
  to Tolerate (same conservative handling as zero-date/garbage input).

- Tolerate branches mark resources as SKIPPED instead of implicit
  SUCCESS so re-migration counters reflect 'already exists' no-ops.

- Internal refactor: consolidated 3 orphan-tracker state fields into
  one $orphanScopes map; extracted dropOrphansByKind(), tolerateMissing(),
  dropTwoWayMirror() helpers; renamed deleteAttributeCompletely to
  dropAttributeForRecreate.
- $orphanScopes      -> $orphansByTable
- orphanScopeKey()    -> tableIdentity()
- cleanupUpsertOrphansForScope() -> cleanupUpsertOrphansForTable()
- $scopeKey local    -> $tableId
- $scope local       -> $tracked (what source declared for this table)

No behavior change. The old names were consumer-oriented jargon; the
new names name the actual unit (one entry per table).
Processing one side of a two-way relationship already reconciles both
metadata docs and both physical columns (via createRelationship on
Create/DropAndRecreate, or via the mirror in dropTwoWayMirror on
DropAndRecreate). Re-processing the partner side would either waste
work or — worse — drop the mirror after the partner table's rows
already migrated, destroying relationship data.

Track a canonical pair-key (sorted (table, key) tuples scoped to the
database) as soon as createField finishes a two-way attribute. On the
partner's second pass the check at the top of createField sees the
pair in $processedTwoWayPairs and short-circuits with STATUS_SKIPPED.

Covers the scenario where one side's source _updatedAt is newer than
the other: whichever side is processed first wins; the partner is
SKIPPED regardless of its own timestamp. Data safety preferred over
honoring a second, potentially-divergent schema decision.
Comment thread src/Migration/Destinations/Appwrite.php
Relationships can't take the DropAndRecreate path cleanly:
- utopia's deleteAttribute throws for VAR_RELATIONSHIP
- Two-way drops cascade to the partner table's physical column
  and destroy its row data (partner's rows migrated earlier in
  the run won't get re-migrated)

Route relationships through SchemaAction::UpdateInPlace instead:
- canDrop: !$isRelationship on resolveSchemaAction
- New updateRelationshipInPlace() helper calls
  $dbForDatabases->updateRelationship() for the fields utopia
  supports (newTwoWayKey, twoWay, onDelete) + updateDocument()
  on Appwrite's application-level doc
- relationType changes can't be updated in place and would silently
  diverge utopia's internal metadata from Appwrite's doc; throw a
  clear error asking the caller to drop+recreate manually

Each side of a two-way is authoritative for its own Appwrite-level
doc. utopia's updateRelationship cascades internal metadata on both
sides in one call and is idempotent for same-target writes, so
neither side's pass stomps the other.

Dead-code cleanup: delete dropTwoWayMirror() and the relationship
branch of dropAttributeForRecreate — both were only reachable from
paths that no longer fire. Simplifies dropAttributeForRecreate
signature (drops unused $type and $relatedTable params).

Fix dropOrphanAttribute for relationship orphans: deleteAttribute
throws for VAR_RELATIONSHIP, so use deleteRelationship which
cascades both sides' physical columns + utopia metadata. Clean the
two Appwrite-level docs separately.

All 34 unit tests pass, pint + phpstan L3 clean.
… updates

resolveSchemaAction now distinguishes 'source resource was deleted+
recreated' (createdAt diff) from 'source resource was edited'
(updatedAt-newer with createdAt unchanged). The first triggers
DropAndRecreate; the second triggers UpdateInPlace. Containers
(databases, tables) get UpdateInPlace in both cases (canDrop=false)
since dropping would cascade child loss.

UpdateInPlace is now SDK-aligned. New helpers updateAttributeInPlace
(non-relationships) and refactored updateRelationshipInPlace return
bool: true if the source change maps to an SDK updateXAttribute
endpoint (required, default, size, min/max, elements, newKey,
onDelete), false to fall through to DropAndRecreate. Non-SDK fields
(type, array, signed, format, formatOptions, filters) and structural
relationship changes (relationType, twoWay toggle, twoWayKey,
relatedCollection) automatically route to drop+recreate.

Both relationship types (one-way and two-way) now use deleteRelationship
on the DropAndRecreate path. One-way is a clean drop (no partner data
to lose). Two-way drops both columns; partner row data is restored by
the subsequent Upsert row pass. canDrop on attributes is now uniformly
true.

Indexes have no SDK in-place primitive, so UpdateInPlace consults
indexSpecMatches: matching specs tolerate (filters phantom updatedAt
drift), differing specs drop and recreate.

Maintainability passes: extracted databaseCollectionId, tableCollectionId,
attributeIndexMetaId, purgeTableCaches, arraysDifferOnKeys helpers.
SDK-boundary fields lifted to ATTRIBUTE_NON_SDK_FIELDS and
RELATIONSHIP_STRUCTURAL_FIELDS class constants — single point of change
when SDK adds new endpoints.

Test matrix expanded: createdAt diff × canDrop × mode × unparseable
timestamp combinations.
Comment thread src/Migration/Destinations/Appwrite.php Outdated
utopia's updateRelationship updates the physical constraint on both
sides, but the Appwrite-level attributes meta doc on the partner side
keeps the old options.onDelete value. Reads from the related
collection's attribute list (Appwrite API, validators, subsequent
re-migration pre-checks) see stale data.

Mirrors the partner-side cleanup pattern already used in
dropAttributeForRecreate. One-way unaffected (onDelete change on
one-way already routes to DropAndRecreate).
Comment thread src/Migration/Destinations/Appwrite.php
Source emits both sides of a two-way relationship as separate Column
resources, but processing one side already reconciles both physical
columns + both Appwrite-level meta docs (utopia's createRelationship/
deleteRelationship cascade). Without dedup, the partner pass can fire
DropAndRecreate again — partner's source createdAt is independent of
parent's and may not match what we just stamped on dest's partner meta
(parent's createdAt).

Reintroduces the pair-key dedup that existed in 791dbb1 and was removed
in 6e6f825 when relationships routed exclusively through UpdateInPlace.
Commit 786a27b reopened the DropAndRecreate path for createdAt-different
without restoring the dedup; this fixes that gap.

Pair-key is canonical (sorted, scoped to dbSeq) so both sides resolve to
the same string. Set after Tolerate, after UpdateInPlace success, and
after the create-path completion. Skip check at the top of createField.
Four targeted fixes from a code review pass on the schema-reconciliation
code, picked for clear correctness or future-debug-cost improvements:

1. META_DATABASES / META_ATTRIBUTES / META_INDEXES class constants. ~30
   bare collection-name string literals are now phpstan-checkable and a
   typo no longer compiles. Avoided false-positive replacements in
   getAttribute('attributes', []) / collectionStructure['indexes'] etc.,
   which name UtopiaDocument fields, not collections.

2. resetRunState() called at top of run(). $rowBuffer, $orphansByTable,
   and $processedTwoWayPairs are instance state that previously leaked
   across run() invocations on a reused Appwrite destination instance.
   A second run on the same instance would inherit the first run's
   processed-pair set and silently SKIP unrelated two-way attributes.

3. tolerateMissing() renamed to bestEffort(). The previous name implied
   it caught only NotFound errors, but the implementation swallows all
   \Throwable. Honest naming prevents 'why didn't the orphan get
   cleaned up' debugging time later.

4. Inconsistent partner-side error message: 'Column limit exceeded' on
   line 949 was a copy-paste drift from the parent block's correct
   'Attribute limit exceeded'. Same conceptual failure should surface
   the same user-visible message.

5. TwoWayPartner value object + resolveTwoWayPartner() helper. Three
   sites (dropAttributeForRecreate, refreshTwoWayPartnerOnDelete,
   dropOrphanAttribute) independently extracted relatedCollection /
   twoWayKey from options with subtly different fallback handling
   (?? null vs ?? ''). Single helper returns ?TwoWayPartner with
   relatedTable / partnerKey / partnerMetaId; null when not two-way or
   the partner table has gone away. Inconsistent fallbacks were a P1
   bug waiting to ship.

All four are P1 reviewer concerns. Larger items (split createField,
template-method for the pre-check skeleton) deferred to a follow-up
PR per the user's explicit instruction.

31/31 unit tests pass, pint clean, phpstan level 3 clean.
Comment thread src/Migration/Destinations/Appwrite.php
The 3-field readonly value object was thin enough that an array shape
(typed via phpdoc) covers the same phpstan-checkable surface without a
separate class file. Helper return type narrows from ?TwoWayPartner to
?array{relatedTable, partnerKey, partnerMetaId} and call sites switch
from $partner->field to $partner['field'].
Reflects on appwrite/appwrite's TablesDB and Databases services to build
the set of meta-doc fields the SDK can mutate via update*Column /
update*Attribute endpoints, then asserts no overlap with our
ATTRIBUTE_NON_SDK_FIELDS constant.

When the SDK ships a new mutable param (e.g. exposing 'signed' or 'array'
on Update), CI fails so we sync the constant. Catches drift in CI rather
than at runtime.

PARAM_TO_META_FIELD documents the SDK-vs-meta-doc vocabulary mismatch:
- xdefault -> default (PHP keyword workaround)
- min/max/elements -> null (sub-fields of formatOptions, not direct)
- onDelete -> null (sub-field of relationship options)

Currently observed SDK-exposed top-level fields: required, default, size.
Constant correctly lists: type, array, signed, format, formatOptions,
filters. Zero overlap, test passes.
Comment thread src/Migration/Destinations/Appwrite.php Outdated
Adds an unconditional spec-match check after resolveSchemaAction returns
DropAndRecreate or UpdateInPlace. If source's full attribute/table/db/
index spec already matches dest, action is forced to Tolerate — no DDL,
no in-place call, no row-data wipe.

Closes the gap where a user dropping + recreating a column on source
with the same spec triggered a wasteful drop+recreate cascade on dest
(createdAt advanced even though spec was unchanged). Row pass under
Upsert mode still propagates source row values via upsertDocuments.

Helpers: databaseSpecMatches, tableSpecMatches, attributeSpecMatches
(handles relationship via RELATIONSHIP_STRUCTURAL_FIELDS + onDelete),
indexSpecMatches (existing). Inserted as a single override at each of
the four pre-check sites.

createIndex: removed the inline indexSpecMatches IIFE inside UpdateInPlace
since the unified outer guard now subsumes it. The 'Index spec unchanged'
status message merges into the generic 'Already exists on destination' —
slight loss of fidelity, gain in symmetry across the four create methods.

47/47 unit tests pass, pint clean, phpstan level 3 clean.
Comment thread src/Migration/Destinations/Appwrite.php Outdated
Mostly mechanical reduction of multi-line WHY blocks to one-liners and
removal of obvious WHAT comments. Keeps load-bearing rationale (e.g.
'utopia syncs both physical sides; partner meta has to be refreshed
by hand') as one short line per concern.
- valuesMatch helper: ksort assoc arrays before === so {min,max} matches
  {max,min}. Lists (filters, columns) keep order-sensitive comparison
  since order is semantically meaningful there.
- Apply via attributeSpecMatches (formatOptions, relationship options,
  onDelete) and arraysDifferOnKeys (used by updateAttributeInPlace and
  updateRelationshipInPlace).
- One-line comment on each spec-match guard documenting why the Create
  case is excluded (no existing dest doc to match against).
…ything

resolveSchemaAction's createdAt branch became redundant once the
spec-match guard landed. The flow:

  - resolveSchemaAction returns Create / Tolerate / UpdateInPlace
    based purely on existence and updatedAt-newer.
  - The destination's spec-match guard overrides UpdateInPlace to
    Tolerate when source and dest have identical spec.
  - Inner attribute/index spec checks fall through to drop+recreate
    when the SDK can't express the change in place.

End-state behavior on every scenario is identical (verified before
this commit). The createdAt path was extra work that produced the
same outcome via a longer route.

Removed:
- $sourceCreatedAt / $destCreatedAt / $canDrop params from
  resolveSchemaAction
- createdAtDiffers private helper
- SchemaAction::DropAndRecreate enum case (no longer returned by
  resolveSchemaAction; inner fallthrough drops directly)
- 4 call sites in Appwrite.php drop the createdAt args + canDrop:
  named arg + DropAndRecreate match arm + DropAndRecreate post-match
  if-clause
- ~13 createdAt-specific unit tests; remaining 19 test the
  exists × mode × updatedAt matrix
1. Stale $table after dropAttributeForRecreate caused false LimitException

   $table is loaded once at the top of createField. After
   dropAttributeForRecreate runs (UpdateInPlace fallthrough), the meta
   doc + physical column are gone from the DB but the in-memory
   $table object still has the dropped entry in its 'attributes'
   array. utopia's checkAttribute clones $table and appends the new
   attribute, then asks the adapter for getCountOfAttributes — at the
   adapter ceiling this throws LimitException even though the net
   change is zero.

   Fix: re-fetch $table after dropAttributeForRecreate.

2. Partner meta-doc deleted from wrong table when relatedCollection
   changes

   dropAttributeForRecreate looked up the partner meta via
   $resource->getOptions() — source's options. But it runs precisely
   when RELATIONSHIP_STRUCTURAL_FIELDS differ (relatedCollection,
   twoWayKey, etc), so source's options point to the NEW partner table
   while dest's existing partner meta lives on the OLD partner table.
   The OLD partner meta was left orphaned forever.

   Fix: pass $existingAttr through to dropAttributeForRecreate and
   use its 'options' (dest's current state) for partner resolution.
Comment thread src/Migration/Destinations/Appwrite.php
…ments

$existingAttr is always non-empty at the only call site (UpdateInPlace
fallthrough requires $exists=true). Drops the unreachable nullable
fallback path and the defensive isEmpty() check.

Comments collapsed to one line each — the WHY is small enough to fit.
Greptile #2: source-side filter (Sources/Appwrite::exportColumns) skips
RELATION_SIDE_CHILD, so the partner meta-doc created inline inside the
parent's createField never reaches the partner table's createField path
— and therefore never gets registered with trackOrphanCandidate. If
any other column on the partner table is migrated, orphan cleanup sees
the partner key absent from the tracked set and deletes it, breaking
the relationship's child side.

Adds a trackOrphanCandidate call right after the inline createDocument
so the partner key is registered against $relatedTable.

Phpstan note: adding the strict-typed call inside the existing
relationship block shifted phpstan's narrowing analysis and made it
flag two pre-existing $relatedTable usages (createRelationship at line
957 and purgeCachedDocument at line 1004). Both are reachable only when
type === VAR_RELATIONSHIP, where $relatedTable is guaranteed defined.
Suppressed with @phpstan-ignore-next-line on those two specific lines.
Comment thread src/Migration/Destinations/Appwrite.php
Greptile: $count (META_INDEXES query) and IndexValidator both ran
BEFORE the pre-check / drop block. On an Upsert recreate path:

- The to-be-dropped index was included in the count, so a table at
  getLimitForIndexes() falsely threw 'Index limit reached' even
  though the net change is zero (one drop, one add).
- IndexValidator was initialized with $tableIndexes from the
  in-memory $table — including the existing index — so the new
  index spec could be flagged as a conflict against itself ('Invalid
  index: …') before the drop ran.

Reordered createIndex so:
  1. Pre-check + drop block runs first (Tolerate returns early).
  2. $table is reloaded after a drop so embedded $tableIndexes
     reflects the post-drop state.
  3. count, validateFieldsForIndexes, build $index, IndexValidator
     all run AFTER the drop, against the fresh $table.

Mirrors the attribute path's structure where checkAttribute already
runs after dropAttributeForRecreate.
Comment thread src/Migration/Destinations/Appwrite.php Outdated
private const META_INDEXES = 'indexes';

/** Fields the SDK's per-type updateXAttribute endpoints don't expose; a change here forces drop+recreate. */
private const ATTRIBUTE_NON_SDK_FIELDS = [
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this ATTRIBUTE_IMMUTABLE_FIELDS

Comment thread src/Migration/Destinations/Appwrite.php Outdated
Comment on lines +66 to +69
/** Names of the platform-DB collections holding Appwrite schema metadata. */
private const META_DATABASES = 'databases';
private const META_ATTRIBUTES = 'attributes';
private const META_INDEXES = 'indexes';
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are project DB collections

Comment thread src/Migration/Destinations/Appwrite.php Outdated
];

/** Fields the SDK's updateRelationshipAttribute doesn't expose (only onDelete/newKey are SDK-reachable). */
private const RELATIONSHIP_STRUCTURAL_FIELDS = [
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, RELATIONSHIP_IMMUTABLE_FIELDS

Comment on lines +5 to +9
enum SchemaAction
{
case Create;
case Tolerate;
case UpdateInPlace;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make these match Appwrite terms; skip, overwrite and fail

Maintainer review feedback:

- ATTRIBUTE_NON_SDK_FIELDS -> ATTRIBUTE_IMMUTABLE_FIELDS
- RELATIONSHIP_STRUCTURAL_FIELDS -> RELATIONSHIP_IMMUTABLE_FIELDS
- META_* docblock 'platform-DB' -> 'project-DB' (these collections live
  in dbForProject, not the platform-level DB)

Naming shift: "NON_SDK"/"STRUCTURAL" describes the cause (SDK doesn't
expose them), "IMMUTABLE" describes the effect (can't change in place,
only via drop+recreate). The latter is the audience-facing semantic.

SDK-boundary lock-in test (AppwriteAttributeSdkBoundaryTest) still
reflects on the renamed constant; updated its references.
Maintainer review: align with Appwrite's terminology (skip / overwrite
/ fail). Public-API breaking change — value 'upsert' becomes
'overwrite', so any pre-1.9.3 consumer of OnDuplicate::Upsert or the
'upsert' string needs to update.

Renamed in this repo:
- enum case + value
- all OnDuplicate::Upsert references in DestinationAppwrite
- internal helpers cleanupUpsertOrphans(ForTable) -> cleanupOverwriteOrphans(ForTable)
- comment / docblock mentions of 'Upsert-mode' / 'Upsert recreate'
- unit test method names testUpsert* -> testOverwrite*

Cross-repo follow-up needed in appwrite/appwrite:
- app/controllers/api/migrations.php (3 endpoint param descriptions)
- src/Appwrite/Platform/Workers/Migrations.php (OnDuplicate reference)
- e2e tests passing 'onDuplicate' => 'upsert'
- console UI hardcoded option strings (separate repo)
Comment thread src/Migration/Destinations/OnDuplicate.php
Comment thread src/Migration/Destinations/Appwrite.php
premtsd-code added a commit to appwrite/appwrite that referenced this pull request Apr 30, 2026
Maintainer review on utopia-php/migration#171 renamed
OnDuplicate::Upsert -> OnDuplicate::Overwrite (value 'upsert' ->
'overwrite') to align with Appwrite terms (skip / overwrite / fail).
Applying the cross-repo ripple here:

- app/controllers/api/migrations.php: 3 endpoint param descriptions
  updated ('upsert' -> 'overwrite' in the help text). The validator
  still uses OnDuplicate::values() so it auto-picks up the new value.
- tests/e2e/Services/Migrations/MigrationsBase.php: all
  'onDuplicate' => 'upsert' -> 'overwrite'; method names
  testAppwriteMigrationUpsert* -> testAppwriteMigrationOverwrite*;
  comments / assertion messages / local var names switched.
- Left untouched: utopia's upsertDocuments operation, transaction
  TransactionState 'upsert' action, Operation validator — those refer
  to the database-level upsert primitive, not the OnDuplicate enum.

composer.lock: utopia-php/migration 7d71505 -> b8ae7bc.
Greptile #3167268161: cleanupOverwriteOrphansForTable removes orphan
attributes from dest's META_ATTRIBUTES + physical schema, but the
in-memory $table object still holds them in its 'attributes' array.
The field-strip loop right after iterates $table->getAttribute(
'attributes') to decide which row payload fields to keep — finds the
orphan as 'valid' and leaves it in the row. Subsequent upsertDocuments
hits the post-cleanup schema, structure validator rejects the row.

Fix: reload $table immediately after orphan cleanup. Mirrors the
attribute-path and index-path stale-$table fixes.
Jake's review on PR #171 (line 9 of OnDuplicate.php — SchemaAction
enum): align the SchemaAction case names with Appwrite's user-facing
terms (skip / overwrite / fail).

- SchemaAction::Tolerate -> SchemaAction::Skip
- SchemaAction::UpdateInPlace -> SchemaAction::Overwrite
- SchemaAction::Create kept (no Appwrite equivalent; means 'no
  existing dest doc, run the create flow')

Now the inner machinery uses the same vocabulary as the user-facing
OnDuplicate enum (Fail / Skip / Overwrite). Comments and test method
names updated to match (Tolerate -> Skip, UpdateInPlace -> Overwrite
where they refer to the action — the test name 'UpdatesInPlace' as a
verb phrase stays).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants