BYOK Azure OpenAI: 429 throttling exhausts all retries in ~0.15s with no effective backoff; retry count/wait not configurable #3679

@daweins

Describe the bug

Using a BYOK Azure OpenAI provider (COPILOT_PROVIDER_TYPE=azure), transient throttling (429 Too Many Requests) causes Copilot CLI to burn all 5 retries near-instantly and abort, instead of backing off. This is especially impactful for self-hosted / air-gapped Azure OpenAI deployments with low quota ceilings (e.g. small TPM/PTU allocations), where brief, fully recoverable throttle windows are common.

I hit this quickly and often with an Azure-hosted GPT 5.4 deployment on a 270K token/min quota — even routine sessions trip the limit and then fail hard instead of waiting out the short throttle window.

Observed output:

Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....

Failed to get response from the AI model; retried 5 times (total retry wait time: 0.15 seconds).

All 5 retries are consumed in ~0.15s total (as low as ~0.02s each), so a recoverable throttle is treated as a hard failure.
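
For scale, the reported numbers average out to about 0.03 s per retry. The sketch below compares that with the total wait a plain doubling schedule would give over the same 5 retries. The 1-second base delay is an illustrative assumption, not a value taken from Copilot CLI.

```python
# Reported by the CLI: 5 retries, 0.15 s total retry wait.
RETRIES = 5
observed_total_s = 0.15
observed_avg_s = observed_total_s / RETRIES

# Hypothetical doubling schedule (the base delay is an assumption for illustration).
BASE_DELAY_S = 1.0
doubling_schedule_s = [BASE_DELAY_S * 2**i for i in range(RETRIES)]
doubling_total_s = sum(doubling_schedule_s)

print(f"observed average wait per retry: {observed_avg_s:.2f} s")
print(f"doubling schedule: {doubling_schedule_s} -> {doubling_total_s:.0f} s total")
```

Under these assumptions the doubling schedule waits 31 s in total (1 + 2 + 4 + 8 + 16), roughly 200 times longer than the observed 0.15 s, which gives a brief throttle window a realistic chance to clear.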

Affected version

Copilot CLI version: 1.0.51

Steps to reproduce the behavior

  1. Configure a BYOK Azure OpenAI deployment with a low rate-limit ceiling (e.g. GPT 5.4 at 270K token/min).
  2. Run Copilot CLI with that provider/model and issue requests that exceed the ceiling.
  3. Observe 5 immediate retries and abort with a sub-second total wait time.
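
For anyone without a low-quota deployment at hand, a local stub that always answers 429 might help isolate the client-side retry timing. The following is a minimal sketch using only the Python standard library. The handler name, the response body, and the 5-second Retry-After value are all hypothetical, and whether the azure provider type can be pointed at a plain-HTTP local endpoint has not been verified here.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

RETRY_AFTER_S = "5"  # assumed value, chosen only for illustration


class AlwaysThrottled(BaseHTTPRequestHandler):
    """Answer every POST with 429 and a Retry-After header."""

    def do_POST(self):
        # Drain the request body before replying.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b'{"error": "throttled (stub response)"}'  # placeholder payload
        self.send_response(429)
        self.send_header("Retry-After", RETRY_AFTER_S)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # One line per request with a monotonic timestamp, so the spacing
        # between retries can be read directly from the log.
        print(f"{time.monotonic():.3f}", fmt % args)


def start_stub(port=0):
    """Start the stub on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AlwaysThrottled)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_stub()` returns the server object, and `server.server_address[1]` gives the chosen port. If the client honors Retry-After, consecutive log lines should be about 5 seconds apart; gaps of a few hundredths of a second would match the behavior reported above.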

Expected behavior

  • On 429/503 from a BYOK provider, apply effective backoff:
    • Respect Retry-After when present (Azure OpenAI typically returns it) — confirm this path is wired for the azure/BYOK provider, not only GitHub-hosted CAPI.
    • When no Retry-After is present, fall back to a default exponential backoff with jitter (capped at a few minutes) rather than ~0.
  • Make retry behavior user-configurable, e.g. environment variables such as COPILOT_RETRY_MAX and COPILOT_RETRY_WAIT_MS for headless/CI use.
  • Don't reinterpret a long server-side wait as a transport failure (see the ~30–60s client-timeout interaction noted in #2661, "❌ Error: (query) Execution failed: CAPIError: 400 The requested model is not supported. (Request ID: F72A:17FAF7:84A361F:A2C39DA:69DCC02C)").
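
The policy requested above could look roughly like the sketch below. It is not Copilot CLI's implementation: `COPILOT_RETRY_MAX` and `COPILOT_RETRY_WAIT_MS` are only the names proposed in this issue and do not exist today, and the defaults, the 180-second cap, and the choice of "equal jitter" (half fixed, half random) are assumptions made for illustration.

```python
import os
import random

# Hypothetical variable names, taken from the proposal above; they do not exist today.
ENV_RETRY_MAX = "COPILOT_RETRY_MAX"
ENV_RETRY_WAIT_MS = "COPILOT_RETRY_WAIT_MS"

DEFAULT_MAX_RETRIES = 5       # matches the 5 retries observed in this report
DEFAULT_BASE_WAIT_MS = 1000   # assumed default, for illustration only
MAX_WAIT_S = 180.0            # "capped at a few minutes"; the exact cap is an assumption


def load_retry_config(env=None):
    """Read retry settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    max_retries = int(env.get(ENV_RETRY_MAX, DEFAULT_MAX_RETRIES))
    base_wait_s = int(env.get(ENV_RETRY_WAIT_MS, DEFAULT_BASE_WAIT_MS)) / 1000.0
    return max_retries, base_wait_s


def compute_wait_s(attempt, base_wait_s, retry_after_s=None, rng=random):
    """Seconds to sleep before retry number `attempt` (0-based).

    A server-provided Retry-After value is honored as-is. Without one,
    fall back to capped exponential backoff with equal jitter, so the
    wait is never close to zero.
    """
    if retry_after_s is not None and retry_after_s >= 0:
        return retry_after_s
    ceiling = min(base_wait_s * (2 ** attempt), MAX_WAIT_S)
    half = ceiling / 2
    return half + rng.uniform(0.0, half)
```

Equal jitter is used here instead of full jitter because full jitter can still draw waits near zero, which is the failure mode this issue describes. A real implementation would probably also bound very large Retry-After values, or give up instead of sleeping for them.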

Additional context

  • #2760 ("⭐ Implement proper HTTP retry logic for 429 responses", closed, v1.0.32) added Retry-After-based bounded exponential backoff, but on BYOK Azure OpenAI with 1.0.51 the retries still exhaust in ~0.15s. This suggests either that the backoff isn't applied on the BYOK/azure path, or that the throttle response is reaching the client without a usable Retry-After.
  • Docs confirm no retry settings exist today
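
To help tell the two possibilities apart, it may be worth checking what "usable" means for the header. The HTTP specification allows Retry-After to carry either a number of seconds or an HTTP-date, and a client that handles only one form would see the other as missing. The helper below is a hypothetical sketch of a parser covering both forms; `parse_retry_after` is not a Copilot CLI function.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the Retry-After delay in seconds, or None if it is unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():   # delay-seconds form, e.g. "20"
        return float(value)
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return None                           # neither form: treat as absent
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", not a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Returning None for an unparseable value lets the caller fall back to its default backoff instead of retrying immediately.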

Labels

  • area:models: Model selection, availability, switching, rate limits, and model-specific behavior
  • area:networking: Proxy, SSL/TLS, certificates, corporate environments, and connectivity issues
