Skip to content

Fix Parallel Forks: Closes (ForkedContainerError and DuplicateBatchError) when hydrating twice and submitting in parallel (via Counter DDS)#26443

Open
dannimad wants to merge 4 commits intomainfrom
fix-id
Open

Fix Parallel Forks: Closes (ForkedContainerError and DuplicateBatchError) when hydrating twice and submitting in parallel (via Counter DDS)#26443
dannimad wants to merge 4 commits intomainfrom
fix-id

Conversation

@dannimad
Copy link
Contributor

TBD

@dannimad dannimad requested a review from a team as a code owner February 13, 2026 22:06
Copilot AI review requested due to automatic review settings February 13, 2026 22:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to fix ForkedContainerError and DuplicateBatchError issues that occur when multiple containers are hydrated from the same pending state and submit operations in parallel. The fix modifies the ID Compressor to silently skip re-finalization of ID ranges that overlap with already-finalized IDs in fork scenarios, allowing higher-level fork detection mechanisms to handle the error.

Changes:

  • Modified IdCompressor.finalizeCreationRange() to detect and skip overlapping range re-finalization for local sessions
  • Removed local-only test restriction for the "Parallel Forks" test case

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
packages/runtime/id-compressor/src/idCompressor.ts Added fork scenario detection logic (lines 331-349) to silently skip re-finalization of overlapping ID ranges instead of throwing an error
packages/test/test-end-to-end-tests/src/test/offline/stashedOps.spec.ts Removed if (provider.driver.type !== "local") skip guard to enable the parallel forks test on all provider types

Comment on lines 340 to 349
if (
rangeBaseLocal > expectedRangeBaseLocal &&
isLocal &&
rangeBaseLocal <= lastCluster.baseLocalId
) {
// This range overlaps with IDs that were already finalized from stashed ops.
// This is expected in a fork scenario - skip it and let the higher-level
// fork detection (via batchId mechanism) handle the error appropriately.
return;
}
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fork detection logic is too broad and will incorrectly allow non-fork scenarios where ranges are finalized twice or out of order within a single IdCompressor instance. This will break existing tests like "prevents attempts to finalize ranges twice" and "when there are multiple outstanding ranges".

The current conditions cannot distinguish between:

  1. A genuine fork (two IdCompressor instances loaded from same pending state)
  2. A programming error (finalizing the same range twice in one instance)

Both scenarios meet all three conditions and result in a silent return, which is incorrect for case 2. A more robust approach would be to track whether we're in a fork scenario explicitly (e.g., via a flag set during rehydration) or to validate that the range being finalized is actually a duplicate from stashed ops rather than just any overlapping range.

Copilot uses AI. Check for mistakes.
Comment on lines 331 to 349
// Check if we're trying to re-finalize a range that was already finalized.
// This can happen in a fork scenario where multiple containers load from the same
// pending state, both having the same local session ID, and both try to finalize
// the same range from their stashed ops.
//
// Validation criteria for a fork scenario (all must be true):
// 1. Range is out of sequence (rangeBaseLocal > expected)
// 2. Must be for the local session (forked containers share the same session ID)
// 3. The range must overlap with already-finalized IDs (not just out of sequence)
if (
rangeBaseLocal > expectedRangeBaseLocal &&
isLocal &&
rangeBaseLocal <= lastCluster.baseLocalId
) {
// This range overlaps with IDs that were already finalized from stashed ops.
// This is expected in a fork scenario - skip it and let the higher-level
// fork detection (via batchId mechanism) handle the error appropriately.
return;
}
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fork detection logic (lines 340-349 in idCompressor.ts) silently returns for overlapping ranges, but there is no test coverage for this new behavior. Specifically:

  1. No test verifies that the fork scenario (two instances from same pending state) correctly skips re-finalization
  2. No test verifies that existing error detection still works correctly after this change
  3. The existing test "prevents attempts to finalize ranges twice" (line 467) will likely fail with this change

Add comprehensive test coverage for the fork detection logic, including tests that verify both the fork scenario and non-fork scenarios still behave correctly.

Copilot uses AI. Check for mistakes.
Comment on lines 2254 to 2256
const incrementValue = 3;
const pendingLocalState = await generatePendingState(
testContainerConfig_noSummarizer,
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the local-only skip guard suggests this test is now expected to pass on all provider types. However, without understanding the root cause of why this was local-only (likely timing-related), it's unclear if the idCompressor fix actually addresses the underlying issue.

Consider adding a comment explaining why this restriction is no longer needed, or verify through testing that the fix indeed makes the test stable on non-local providers. The comment at line 2262 suggests the original issue was "Ranges finalized out of order" from the ID Compressor, so if the fix resolves that, this change may be appropriate - but it should be verified.

Copilot uses AI. Check for mistakes.
@dannimad dannimad requested a review from a team as a code owner February 16, 2026 18:49
@github-actions
Copy link
Contributor

🔗 No broken links found! ✅

Your attention to detail is admirable.

linkcheck output


> fluid-framework-docs-site@0.0.0 ci:check-links /home/runner/work/FluidFramework/FluidFramework/docs
> start-server-and-test "npm run serve -- --no-open" 3000 check-links

1: starting server using command "npm run serve -- --no-open"
and when url "[ 'http://127.0.0.1:3000' ]" is responding with HTTP status code 200
running tests using command "npm run check-links"


> fluid-framework-docs-site@0.0.0 serve
> docusaurus serve --no-open

[SUCCESS] Serving "build" directory at: http://localhost:3000/

> fluid-framework-docs-site@0.0.0 check-links
> linkcheck http://localhost:3000 --skip-file skipped-urls.txt

Crawling...

Stats:
  257699 links
    1822 destination URLs
    2063 URLs ignored
       0 warnings
       0 errors


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant