Skip to content

task(Migrate ESMappingAPIImpl) Refs:34933 #34938

Draft
fabrizzio-dotCMS wants to merge 37 commits intomainfrom
issue-34933-ESMappingAPIImpl
Draft

task(Migrate ESMappingAPIImpl) Refs:34933 #34938
fabrizzio-dotCMS wants to merge 37 commits intomainfrom
issue-34933-ESMappingAPIImpl

Conversation

@fabrizzio-dotCMS
Copy link
Member

@fabrizzio-dotCMS fabrizzio-dotCMS commented Mar 10, 2026

Proposed Changes

  • change 1
  • change 2

Checklist

  • Tests
  • Translations
  • Security Implications Contemplated (add notes if applicable)

Additional Info

** any additional useful context or info **

Screenshots

Original Updated
** original screenshot ** ** updated screenshot **

This PR fixes: #34933

fabrizzio-dotCMS and others added 10 commits March 11, 2026 20:41
- Add CONNECTION_RETRY_SLEEP_SECONDS property to OSIndexProperty to fix conflicting semantics with CONNECTION_TIMEOUT (HTTP ms vs retry seconds)
- Use CONNECTION_RETRY_SLEEP_SECONDS in OSIndexAPIImpl.waitUtilIndexReady()
- Fix IndexAPIImpl class Javadoc to reference FEATURE_FLAG_OPEN_SEARCH_PHASE and migration-phase-based routing instead of stale FEATURE_FLAG_OPEN_SEARCH_READ/WRITE flags
- Replace non-thread-safe SimpleDateFormat with DateTimeFormatter in VersionedIndicesAPIImpl (@ApplicationScoped bean)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…4728

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Mar 17, 2026

Claude finished @fabrizzio-dotCMS's task in 2m 31s —— View job


PR Review: ESMappingAPIImpl Migration

  • Gather context
  • Read key changed files
  • Review findings

Summary

Good structural direction — vendor types are properly confined to the implementation classes. But several issues range from dead code to bugs that will surface in dual-write or OS-only phases.


Issues

1. getProvider() is hardcoded to ES — OS path is unreachable

ContentletIndexAPIImpl.java:155-158:

ContentletIndexOperations getProvider() {
    //TODO: For now always default to ES
    return operationsES;
}

Every write operation (index, delete, bulk) routes through this. Even in Phase 3 (OS-only), all writes still go to ES. operationsOS is injected via CDI but never called. The dual-write fan-out that the architecture promises doesn't happen. Fix this →


2. appendBulkRemoveRequest* methods use ES-only IndiciesInfo — OS deletes silently dropped

ContentletIndexAPIImpl.java:1132-1146 and 1153-1168:

final IndiciesInfo info = legacyIndiciesAPI.loadIndicies(); // ES indices only
for (final String index : info.asMap().values()) {
    getProvider().addDeleteOp(req, index, id);
}

IndiciesInfo.asMap() only contains ES index names. In any non-Phase-0 state, OS indices are not covered here. Delete operations sent to ES indices via an OS provider would fail, and OS indices would never get the delete. Fix this →


3. indexReadyOS() and isOsWorkingIndexEmpty() check OS indices using the ES client

ContentletIndexAPIImpl.java:204-208:

final boolean hasWorking = indices.working()
        .map(name -> Try.of(() -> esIndexApi.indexExists(name)).getOrElse(false))
        .orElse(false);

esIndexApi.indexExists() talks to Elasticsearch. An OS index name checked against ES will always return false, so indexReadyOS() always returns false in dual-write phases. This means indexReady() always returns false when migration has started, and checkAndInitializeIndex() will keep trying to create indices and trigger spurious reindexes on every boot.

Same problem in isOsWorkingIndexEmpty()getIndexDocumentCount routes through getProvider() → ES, so it counts documents in an ES index for an OS index name (returns 0), triggering full reindex. Fix this →


4. Dead code in fullReindexAbort() — old indices are fetched but never deleted

ContentletIndexAPIImpl.java:1398-1399:

info.getReindexWorking();
info.getReindexLive();

Return values are discarded. The reindex indices are fetched, then legacyIndiciesAPI.point(newinfo) removes them from the pointer table, but the actual ES/OS index objects are never deleted. The old behavior presumably called delete() on these. Fix this →


5. ESIndexBulkRequest cast in ES impl has no type guard — asymmetric with OS

ContentletIndexOperationsES.java:290-292:

private static BulkRequest asBulkRequest(final IndexBulkRequest req) {
    return ((ESIndexBulkRequest) req).delegate; // bare ClassCastException on wrong type
}

The OS equivalent has an instanceof guard with a descriptive DotRuntimeException. If the wrong request type reaches the ES path, the error is opaque. Minor, but the asymmetry is a footgun when wiring up dual-write. Fix this →


6. OSIndexBulkProcessor is missing byte-size and flush-interval flush triggers

ContentletIndexOperationsOS.java:109-176: The manual OSIndexBulkProcessor only flushes on maxActions. The ES BulkProcessor also flushes on setBulkSize (byte threshold) and a flush interval. Under high-throughput reindex with many small docs, the OS path won't auto-flush until it hits the action count, potentially accumulating unbounded memory. Under low-throughput, it might never flush on its own if close() isn't called.


7. addBulkRequest and addBulkRequestToProcessor are near-duplicate (~130 lines each)

ContentletIndexAPIImpl.java:922-1052: Both methods contain the same contentlet → index mapping logic with only the accumulation target differing (IndexBulkRequest vs IndexBulkProcessor). Any fix to the indexing logic must be applied to both — a maintenance hazard. Consider extracting the shared mapping+routing logic.


8. AtomicReference.updateAndGet comment is wrong

ContentletIndexAPIImpl.java:137-140:

// Thread-safe: uses AtomicReference#updateAndGet to ensure
// exactly one instance is published without synchronization overhead.

updateAndGet may invoke the function multiple times in the presence of contention — it retries on CAS failure. "Exactly one instance" is not guaranteed; APILocator.getContentMappingAPI() could be called multiple times. In practice harmless since the return is stable, but the comment is misleading.


9. Vendor-specific exceptions still leaked in migration-path methods

ContentletIndexAPIImpl.java:282, 291, 308, 463, 498, 1403: Methods like createContentIndex, fullReindexStart, fullReindexAbort, and initIndexES catch and rethrow as ElasticsearchException. These are on code paths that will also serve OS operations. Either introduce a neutral DotIndexException for these or use DotRuntimeException consistently.


Previous Comment

The previous Claude review flagged the OSGi rollback risk (broken interface contracts). That concern stands — the two-phase removal pattern should be applied for ContentMappingAPI, checkAndInitialiazeIndex, and the removed BulkRequest/BulkResponse method signatures.

@claude
Copy link
Contributor

claude bot commented Mar 17, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-4 — OSGi Plugin API Breakage
  • Risk Level: 🟡 MEDIUM
  • Why it's unsafe: The ContentletIndexAPI public interface has had multiple breaking signature changes: Elasticsearch-specific types (BulkRequest, BulkResponse, BulkProcessor, ActionListener) have been replaced with vendor-neutral abstractions (IndexBulkRequest, IndexBulkProcessor, IndexBulkListener, IndexBulkItemResult). The method checkAndInitialiazeIndex() was renamed to checkAndInitializeIndex(), getRidOfOldIndex() was removed, putToIndex(BulkRequest, ActionListener) was removed, and timestampFormatter changed type from SimpleDateFormat to DateTimeFormatter. In addition, ContentMappingAPI was deleted entirely and replaced by ContentIndexMappingAPI. Any OSGi plugin compiled against the old interface will fail to activate after rollback to N-1 if it was recompiled against N.
  • Code that makes it unsafe:
    • dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentletIndexAPI.java — method getRidOfOldIndex() removed; checkAndInitialiazeIndex() renamed; putToIndex(BulkRequest, ActionListener<BulkResponse>) removed; createBulkRequest(), appendBulkRequest(), appendBulkRemoveRequest(), createBulkProcessor(BulkProcessorListener), appendToBulkProcessor(BulkProcessor, ...) all changed to use vendor-neutral types; timestampFormatter type changed from SimpleDateFormat to DateTimeFormatter
    • dotCMS/src/main/java/com/dotcms/content/business/ContentMappingAPI.java — interface deleted and replaced by ContentIndexMappingAPI
    • dotCMS/src/main/java/com/dotmarketing/common/reindex/BulkProcessorListener.java — now implements IndexBulkListener instead of BulkProcessor.Listener; method signatures beforeBulk and afterBulk changed
  • Alternative (if possible): Keep the old interface methods as @Deprecated default delegators so OSGi plugins compiled against N-1 can still activate after rollback. Introduce the new vendor-neutral types alongside the old ones for one release cycle, then remove the deprecated methods in the following release (two-phase removal pattern).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Not Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[TASK] Migrate ESMappingAPIImpl

1 participant