BulkWriter produces IndexError: pop from an empty deque #16138

@mbyrne00

Description

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Happy to provide a PR if it's likely to be merged.

Summary of the issue

Context:
Using BulkWriter to bulk-write ~1M documents to Firestore from a Cloud Run job. Firestore returns retryable errors (contention) for multiple operations within the same batch window.

Expected Behaviour:
BulkWriter retries the failed operations and completes the flush successfully.

Actual Behaviour:
BulkWriter._schedule_ready_retries() crashes with IndexError: pop from an empty deque when ≥2 retries are ready simultaneously. The method interleaves popleft() with retry.retry(self), and retry.retry() re-enters the send loop (set() → _enqueue_current_batch → _ensure_sending → _send_until_queue_is_empty → _schedule_ready_retries), which drains the remaining retries from the deque. The outer loop then calls popleft() on the now-empty deque.

Related: #15385 (open refactor ticket for BulkWriter retries)
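The re-entrancy pattern can be demonstrated without Firestore at all. The following toy model (all names illustrative, not the real BulkWriter API) interleaves popleft() with a callback that re-enters the scheduling loop, and fails the same way:

```python
from collections import deque

# Minimal model of the re-entrancy bug. process_ready() mimics
# _schedule_ready_retries: it snapshots the number of ready items up front,
# then interleaves popleft() with a callback that may re-enter process_ready()
# and drain the remaining items.

class Scheduler:
    def __init__(self):
        self.retries = deque()

    def process_ready(self):
        take = len(self.retries)  # snapshot, like the bisect call in the real code
        for _ in range(take):
            item = self.retries.popleft()  # raises IndexError once drained
            item(self)  # callback may re-enter process_ready()

def reenter(scheduler):
    # Stands in for retry.retry(self) -> set() -> ... -> _schedule_ready_retries
    scheduler.process_ready()

s = Scheduler()
s.retries.extend([reenter, reenter])
try:
    s.process_ready()
except IndexError as e:
    print(e)  # pop from an empty deque
```

The first popleft() hands control to the callback, the inner call consumes the second item, and the outer loop's second iteration pops from an empty deque.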

API client name and version

google-cloud-firestore 2.25.0

Reproduction steps: code

file: test_bulkwriter_bug.py

import bisect
import concurrent.futures
import datetime
from unittest.mock import patch

import pytest
from google.cloud.firestore import Client
from google.cloud.firestore_v1.bulk_writer import (
    BulkWriterSetOperation,
    OperationRetry,
)


def test_bulkwriter_schedule_ready_retries_reentrancy_bug():
    """BulkWriter crashes when ≥2 retries are ready simultaneously.

    _schedule_ready_retries computes take_until_index=N, then interleaves
    popleft() with retry.retry(self). retry.retry() re-enters the send loop
    via set() → _enqueue_current_batch → _ensure_sending →
    _send_until_queue_is_empty → _schedule_ready_retries, which drains the
    remaining retries. The outer loop then calls popleft() on the empty deque.

    batch_size=1 forces _enqueue_current_batch on every set(), making the
    re-entrant path deterministic with just 2 retries.
    """
    client = Client(project="any-project")
    writer = client.bulk_writer()
    writer.batch_size = 1

    collection = client.collection("test")
    for i in range(2):
        op = BulkWriterSetOperation(
            reference=collection.document(f"doc-{i}"),
            document_data={"i": i},
            merge=False,
            attempts=1,
        )
        bisect.insort(
            writer._retries,
            OperationRetry(
                operation=op,
                run_at=datetime.datetime(2020, 1, 1, tzinfo=datetime.timezone.utc),
            ),
        )

    f = concurrent.futures.Future()
    f.set_result(None)

    with patch.object(writer, "_send_batch", return_value=f):
        with pytest.raises(IndexError, match="pop from an empty deque"):
            writer.flush()

Reproduction steps: supporting files

None required. The test is self-contained — no Firestore emulator, network access, or credentials needed.

Reproduction steps: actual results

IndexError: pop from an empty deque

Full traceback (from _schedule_ready_retries → re-entrant _send_until_queue_is_empty → _schedule_ready_retries):

File "google/cloud/firestore_v1/bulk_writer.py", line 511, in _schedule_ready_retries
    retry: OperationRetry = self._retries.popleft()
IndexError: pop from an empty deque

Reproduction steps: expected results

flush() completes without error. All retryable operations are re-queued and processed.

OS & version + platform

First observed on Cloud Run (GCP, australia-southeast1). Reproduced locally on macOS (Apple Silicon).

Python environment

Python 3.13.7

Python dependencies

google-cloud-firestore   2.25.0

(Only google-cloud-firestore is relevant; the bug is in synchronous retry scheduling logic with no external dependencies.)

Additional context

Root cause: _schedule_ready_retries (line 499 of bulk_writer.py) computes the number of ready retries via bisect, then loops that many times calling popleft() and retry.retry(self) interleaved. retry.retry(self) calls bulk_writer.set() which, if the batch is full, triggers _enqueue_current_batch → _ensure_sending → _send_until_queue_is_empty → _schedule_ready_retries re-entrantly. The inner call drains the remaining retries; the outer loop then calls popleft() on the empty deque.

Suggested fix: Drain all ready retries into a local list before processing:

def _schedule_ready_retries(self) -> None:
    take_until_index = bisect.bisect(
        self._retries, datetime.datetime.now(tz=datetime.timezone.utc)
    )
    ready = [self._retries.popleft() for _ in range(take_until_index)]
    for retry in ready:
        retry.retry(self)

No memory concern — ready holds the same OperationRetry references already in the deque (no data copying). The maximum number of simultaneous retries is bounded by the batch size (default 20).
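As a sanity check on the drain-first pattern, here is a minimal re-entrancy model (illustrative names, not the real BulkWriter API): because all ready items are moved into a local list before any callback runs, a re-entrant call finds an empty deque and becomes a no-op.

```python
from collections import deque

# Illustrative model of the scheduling loop after the suggested fix: drain
# every ready item into a local list first, then run the callbacks. A
# re-entrant call sees an empty deque and does nothing.

class FixedScheduler:
    def __init__(self):
        self.retries = deque()

    def process_ready(self):
        ready = [self.retries.popleft() for _ in range(len(self.retries))]
        for item in ready:
            item(self)  # callback may re-enter; the deque is already empty

def reenter(scheduler):
    # Stands in for retry.retry(self) re-entering the send loop
    scheduler.process_ready()

s = FixedScheduler()
s.retries.extend([reenter, reenter])
s.process_ready()  # completes; no IndexError
assert not s.retries
```

The same two-callback setup that crashes the current implementation now completes cleanly.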

Related: #15385 — open refactor ticket for BulkWriter retry internals. This bug is a concrete crash caused by the hand-rolled retry logic flagged in that issue.


Labels

api: firestore, priority: p2, type: bug
