Fix Parallel Forks: Closes (ForkedContainerError and DuplicateBatchError) when hydrating twice and submitting in parallel (via Counter DDS)#26443
Conversation
There was a problem hiding this comment.
Pull request overview
This PR attempts to fix ForkedContainerError and DuplicateBatchError issues that occur when multiple containers are hydrated from the same pending state and submit operations in parallel. The fix modifies the ID Compressor to silently skip re-finalization of ID ranges that overlap with already-finalized IDs in fork scenarios, allowing higher-level fork detection mechanisms to handle the error.
Changes:
- Modified
IdCompressor.finalizeCreationRange()to detect and skip overlapping range re-finalization for local sessions - Removed local-only test restriction for the "Parallel Forks" test case
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| packages/runtime/id-compressor/src/idCompressor.ts | Added fork scenario detection logic (lines 331-349) to silently skip re-finalization of overlapping ID ranges instead of throwing an error |
| packages/test/test-end-to-end-tests/src/test/offline/stashedOps.spec.ts | Removed if (provider.driver.type !== "local") skip guard to enable the parallel forks test on all provider types |
| if ( | ||
| rangeBaseLocal > expectedRangeBaseLocal && | ||
| isLocal && | ||
| rangeBaseLocal <= lastCluster.baseLocalId | ||
| ) { | ||
| // This range overlaps with IDs that were already finalized from stashed ops. | ||
| // This is expected in a fork scenario - skip it and let the higher-level | ||
| // fork detection (via batchId mechanism) handle the error appropriately. | ||
| return; | ||
| } |
There was a problem hiding this comment.
The fork detection logic is too broad and will incorrectly allow non-fork scenarios where ranges are finalized twice or out of order within a single IdCompressor instance. This will break existing tests like "prevents attempts to finalize ranges twice" and "when there are multiple outstanding ranges".
The current conditions cannot distinguish between:
- A genuine fork (two IdCompressor instances loaded from same pending state)
- A programming error (finalizing the same range twice in one instance)
Both scenarios meet all three conditions and result in a silent return, which is incorrect for case 2. A more robust approach would be to track whether we're in a fork scenario explicitly (e.g., via a flag set during rehydration) or to validate that the range being finalized is actually a duplicate from stashed ops rather than just any overlapping range.
| // Check if we're trying to re-finalize a range that was already finalized. | ||
| // This can happen in a fork scenario where multiple containers load from the same | ||
| // pending state, both having the same local session ID, and both try to finalize | ||
| // the same range from their stashed ops. | ||
| // | ||
| // Validation criteria for a fork scenario (all must be true): | ||
| // 1. Range is out of sequence (rangeBaseLocal > expected) | ||
| // 2. Must be for the local session (forked containers share the same session ID) | ||
| // 3. The range must overlap with already-finalized IDs (not just out of sequence) | ||
| if ( | ||
| rangeBaseLocal > expectedRangeBaseLocal && | ||
| isLocal && | ||
| rangeBaseLocal <= lastCluster.baseLocalId | ||
| ) { | ||
| // This range overlaps with IDs that were already finalized from stashed ops. | ||
| // This is expected in a fork scenario - skip it and let the higher-level | ||
| // fork detection (via batchId mechanism) handle the error appropriately. | ||
| return; | ||
| } |
There was a problem hiding this comment.
The new fork detection logic (lines 340-349 in idCompressor.ts) silently returns for overlapping ranges, but there is no test coverage for this new behavior. Specifically:
- No test verifies that the fork scenario (two instances from same pending state) correctly skips re-finalization
- No test verifies that existing error detection still works correctly after this change
- The existing test "prevents attempts to finalize ranges twice" (line 467) will likely fail with this change
Add comprehensive test coverage for the fork detection logic, including tests that verify both the fork scenario and non-fork scenarios still behave correctly.
| const incrementValue = 3; | ||
| const pendingLocalState = await generatePendingState( | ||
| testContainerConfig_noSummarizer, |
There was a problem hiding this comment.
Removing the local-only skip guard suggests this test is now expected to pass on all provider types. However, without understanding the root cause of why this was local-only (likely timing-related), it's unclear if the idCompressor fix actually addresses the underlying issue.
Consider adding a comment explaining why this restriction is no longer needed, or verify through testing that the fix indeed makes the test stable on non-local providers. The comment at line 2262 suggests the original issue was "Ranges finalized out of order" from the ID Compressor, so if the fix resolves that, this change may be appropriate - but it should be verified.
|
🔗 No broken links found! ✅ Your attention to detail is admirable. linkcheck output |
TBD