feat: honor Retry-After, jitter backoff, bound response-header wait#22

Merged
acoshift merged 1 commit into main from feat/ingest-backpressure
Jun 13, 2026
@acoshift

The cheap reliability bundle from the review (#2, #4, #3). All in the flush/retryFlush/transport path next to the taxonomy from #21.

429 / Retry-After (#2)

flush() now reads Retry-After on a 429 (or 503), parses both the delta-seconds and HTTP-date forms, clamps the result to RetryAfterMax (60s), and paces the next retry to it instead of the fixed 100ms–1s backoff. The value is carried to retryFlush via a worker-local variable. The retry sleep is now a timer that closeSignal can interrupt, so neither a long Retry-After nor an in-progress backoff can hold up Close. Retry-After is intentionally ignored on the closing path so a server-requested delay can't stretch shutdown.

→ Stops the client from re-POSTing to a saturated Quickwit ingest queue once per second and lets it cooperate with server-side pacing.
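
For readers without the diff open, here is a minimal sketch of the two pieces described above: parsing and clamping the header, and a sleep that a close signal can interrupt. It is not the merged code. The names `parseRetryAfter`, `retryAfterMax`, and `sleepOrClose`, their signatures, and the choice to return 0 for unusable values are assumptions for illustration.

```go
package main

import (
	"net/http"
	"strconv"
	"strings"
	"time"
)

// retryAfterMax mirrors the 60s clamp described above (illustrative name).
const retryAfterMax = 60 * time.Second

// parseRetryAfter returns the delay requested by a Retry-After header value,
// clamped to [0, retryAfterMax]. Empty, malformed, zero, negative, and
// past-date values yield 0, which this sketch treats as "use normal backoff".
func parseRetryAfter(v string) time.Duration {
	v = strings.TrimSpace(v)
	if v == "" {
		return 0
	}
	// delta-seconds form, e.g. "Retry-After: 5".
	if secs, err := strconv.Atoi(v); err == nil {
		if secs <= 0 {
			return 0
		}
		// Clamp before multiplying so huge values cannot overflow.
		if secs >= int(retryAfterMax/time.Second) {
			return retryAfterMax
		}
		return time.Duration(secs) * time.Second
	}
	// HTTP-date form, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
	if t, err := http.ParseTime(v); err == nil {
		d := time.Until(t)
		if d <= 0 {
			return 0
		}
		if d > retryAfterMax {
			return retryAfterMax
		}
		return d
	}
	return 0
}

// sleepOrClose waits for d or until closeSignal is closed, whichever comes
// first. It reports true if the full delay elapsed and false if interrupted.
func sleepOrClose(d time.Duration, closeSignal <-chan struct{}) bool {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return true
	case <-closeSignal:
		return false
	}
}
```

A retry loop built on this would call `sleepOrClose(parseRetryAfter(h), closeSignal)` when the parsed value is non-zero, and fall back to its usual backoff otherwise.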

Backoff jitter (#3)

Both retry loops use equal jitter (sleep in [d/2, d], via math/rand/v2 — no new dep, concurrency-safe). De-synchronizes retries across workers so they don't arrive in lockstep waves that can re-tip a recovering server.
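
As a rough illustration of equal jitter, the sketch below draws a sleep uniformly from [d/2, d] using `math/rand/v2`. The function name `jitterBackoff` matches the one listed under Tests, but the signature and body here are assumptions, not the merged implementation.

```go
package main

import (
	"math/rand/v2"
	"time"
)

// jitterBackoff returns a duration drawn uniformly from [d/2, d].
// Durations too small to halve (d <= 1ns) are returned unchanged.
func jitterBackoff(d time.Duration) time.Duration {
	half := d / 2
	if half <= 0 {
		return d
	}
	// rand.N(n) returns a value in [0, n), so d-half+1 makes d inclusive.
	// The top-level math/rand/v2 functions are safe for concurrent use.
	return half + rand.N(d-half+1)
}
```

Keeping the lower bound at d/2 preserves at least half of the intended backoff while still spreading workers across the upper half of the window.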

ResponseHeaderTimeout (#4)

The default transport bounds the wait for response headers to getIngestTimeout()/3 (~5s default). A server that accepts the full request but stalls before sending response headers (typical of an L4 LB fronting an overloaded indexer) frees the worker in ~5s instead of holding it for the full 15s deadline. When the timeout fires, the request fails with a transport error, which is already retryable. User-supplied clients via SetHTTPClient are untouched.
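
A sketch of how such a transport could be configured with the standard library is shown below. The helper name `newDefaultTransport` and the choice to clone `http.DefaultTransport` are assumptions; only the `ingestTimeout/3` relationship comes from this PR.

```go
package main

import (
	"net/http"
	"time"
)

// newDefaultTransport returns a copy of the standard transport whose wait for
// response headers is bounded to one third of the overall ingest timeout.
// It assumes http.DefaultTransport has not been replaced with another type.
func newDefaultTransport(ingestTimeout time.Duration) *http.Transport {
	tr := http.DefaultTransport.(*http.Transport).Clone()
	// Applies only after the request has been fully written; time spent
	// reading the response body is not covered by this limit.
	tr.ResponseHeaderTimeout = ingestTimeout / 3
	return tr
}
```

With a 15s ingest timeout this yields a 5s header wait, matching the ~5s default quoted above.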

Invariants

429/503 stay retryable; the failure taxonomy, fire-and-forget batching, ordering, and the at-least-once/exactly-once-settle guarantees are unchanged. retryAfter is a worker-local (one per loop() goroutine), so no new shared state.

Tests

  • Unit: parseRetryAfter (seconds / HTTP-date / past-date / empty / garbage / zero / negative), jitterBackoff bounds over 1000 draws, ResponseHeaderTimeout == ingestTimeout/3.
  • End-to-end: a 429 with Retry-After: 1 paces the retry to ~1s (vs the 100ms backoff) and still delivers; held over -count=4 -race.
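
The end-to-end check can be pictured with a stub server, roughly as below. This is a simplified stand-in rather than the test in the PR: `deliverWithPacing` and `runPacedDelivery` are hypothetical names, and the client handles only the delta-seconds form.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync/atomic"
	"time"
)

// deliverWithPacing posts to url until it gets a non-429 response, sleeping
// for the server's Retry-After (delta-seconds only) after each 429.
// It returns the number of attempts made and gives up after five.
func deliverWithPacing(url string) (int, error) {
	for attempt := 1; attempt <= 5; attempt++ {
		resp, err := http.Post(url, "application/json", nil)
		if err != nil {
			return attempt, err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusTooManyRequests {
			return attempt, nil
		}
		secs, _ := strconv.Atoi(resp.Header.Get("Retry-After"))
		time.Sleep(time.Duration(secs) * time.Second)
	}
	return 5, fmt.Errorf("still throttled after 5 attempts")
}

// runPacedDelivery starts a stub server that answers the first request with
// 429 and Retry-After: 1, then 200, and reports attempts and elapsed time.
func runPacedDelivery() (attempts int, elapsed time.Duration, err error) {
	var hits atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if hits.Add(1) == 1 {
			w.Header().Set("Retry-After", "1")
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	start := time.Now()
	attempts, err = deliverWithPacing(srv.URL)
	return attempts, time.Since(start), err
}
```

A test built on this would assert two attempts and an elapsed time of at least one second, which distinguishes header-driven pacing from a 100ms backoff.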

Full suite green, go vet clean, race-clean across -count=2.

🤖 Generated with Claude Code

Three transport/retry hardening items from the review.

- 429/503 Retry-After: flush() now reads the Retry-After header on a 429 (or
  503), parses both delta-seconds and HTTP-date forms, clamps to RetryAfterMax
  (60s), and paces the next retry to it instead of the fixed 100ms-1s backoff.
  The value is carried to retryFlush via a worker-local; the retry sleep is now
  a closeSignal-interruptible timer, so a long Retry-After (or Close) cannot
  delay shutdown. Retry-After is deliberately ignored on the closing path.

- Backoff jitter: both retry loops use equal jitter (sleep in [d/2, d], via
  math/rand/v2) so retries across workers spread out instead of arriving in
  lockstep waves that can re-tip a recovering server.

- ResponseHeaderTimeout: the default transport now bounds the wait for response
  headers to getIngestTimeout()/3 (~5s default). A server that completes the
  request but stalls before responding frees the worker in ~5s instead of the
  full 15s deadline. Firing returns a transport error, already retryable.
  User-supplied clients via SetHTTPClient are untouched.

429/503 stay retryable; the failure taxonomy, fire-and-forget batching,
ordering, and the at-least-once/settle invariants are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@acoshift acoshift merged commit 6d7a8c3 into main Jun 13, 2026
1 check passed
@acoshift acoshift deleted the feat/ingest-backpressure branch June 13, 2026 07:12