BYOK Azure OpenAI: `429` throttling exhausts all retries in ~0.15s with no effective backoff; retry count/wait not configurable

### Describe the bug

Using a BYOK **Azure OpenAI** provider (`COPILOT_PROVIDER_TYPE=azure`), transient throttling (`429 Too Many Requests`) causes Copilot CLI to burn all 5 retries near-instantly and abort, instead of backing off. This is especially impactful for self-hosted / air-gapped Azure OpenAI deployments with low quota ceilings (e.g. small TPM/PTU allocations), where brief, fully recoverable throttle windows are common.

I hit this **quickly and often** with an Azure-hosted **GPT 5.4** deployment on a **270K token/min** quota — even routine sessions trip the limit and then fail hard instead of waiting out the short throttle window.

Observed output:

```text
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....

Failed to get response from the AI model; retried 5 times (total retry wait time: 0.15 seconds).
```

All 5 retries are consumed in ~0.15s total (as low as ~0.02s each), so a recoverable throttle is treated as a hard failure.
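
For scale, here is a rough comparison against a plain exponential schedule. The 1-second base delay and doubling factor are illustrative assumptions, not values taken from Copilot CLI.

```python
# Illustrative arithmetic only: the base delay and factor are assumptions,
# not Copilot CLI defaults.
RETRIES = 5
OBSERVED_TOTAL_S = 0.15  # total retry wait reported in the error message


def exponential_schedule(retries: int, base_s: float = 1.0, factor: float = 2.0) -> list[float]:
    """Return the wait before each retry for a plain exponential schedule."""
    return [base_s * factor**attempt for attempt in range(retries)]


schedule = exponential_schedule(RETRIES)
print(f"observed average wait per retry: {OBSERVED_TOTAL_S / RETRIES:.2f}s")
print(f"exponential schedule (s): {schedule}")
print(f"exponential total wait: {sum(schedule):.0f}s")
```

Under these assumptions the five retries would span 31 seconds rather than 0.15 seconds, which is the difference between riding out a short throttle window and failing inside it.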

### Affected version

**Copilot CLI version:** 1.0.51

### Steps to reproduce the behavior

1. Configure a BYOK Azure OpenAI deployment with a low rate-limit ceiling (e.g. GPT 5.4 at 270K token/min).
2. Run Copilot CLI with that provider/model and issue requests that exceed the ceiling.
3. Observe 5 immediate retries followed by an abort, with a sub-second total wait time.
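
If a real throttled deployment is not at hand, a local stub may help isolate the client-side behavior. The sketch below always answers `429` with a `Retry-After` header and prints one timestamped line per request, so the spacing between retries is visible. It assumes the BYOK provider's base URL can be pointed at a local endpoint, which this issue does not confirm. The port and the 5-second value are arbitrary, and the stub does not imitate Azure OpenAI's actual response format.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

RETRY_AFTER_SECONDS = 5  # arbitrary value chosen for the experiment


class ThrottleHandler(BaseHTTPRequestHandler):
    """Answer every request with 429 and a Retry-After header."""

    def _throttle(self) -> None:
        body = json.dumps(
            {"error": {"code": "429", "message": "Simulated throttle"}}
        ).encode()
        self.send_response(429)
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self) -> None:
        # Drain the request body before replying so the client sees a clean response.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self._throttle()

    def do_GET(self) -> None:
        self._throttle()

    def log_message(self, fmt: str, *args) -> None:
        # One timestamped line per request makes the retry spacing easy to read.
        print(f"{self.log_date_time_string()} {fmt % args}")


def make_server(port: int = 8429) -> ThreadingHTTPServer:
    """Create the stub server bound to localhost (port 0 picks a free port)."""
    return ThreadingHTTPServer(("127.0.0.1", port), ThrottleHandler)
```

Calling `make_server().serve_forever()` starts the stub. If the logged requests arrive milliseconds apart despite the header, the `Retry-After` value is not being honored on that path.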

### Expected behavior

- On `429`/`503` from a BYOK provider, apply effective backoff:
  - Respect `Retry-After` when present (Azure OpenAI typically returns it) — confirm this path is wired for the `azure`/BYOK provider, not only GitHub-hosted CAPI.
  - When no `Retry-After` is present, fall back to a **default exponential backoff with jitter** (capped at a few minutes) rather than ~0.
- Make retry behavior **user-configurable**, e.g. environment variables such as `COPILOT_RETRY_MAX` and `COPILOT_RETRY_WAIT_MS` for headless/CI use.
- Don't reinterpret a long server-side wait as a transport failure (the ~30–60s client-timeout interaction noted in #2661).
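
The delay selection described in the list above could look roughly like the following. This is a minimal sketch, not Copilot CLI's implementation. `COPILOT_RETRY_MAX` and `COPILOT_RETRY_WAIT_MS` are the names proposed in this issue and do not exist today, and the 120-second cap and "equal jitter" variant are assumptions made for illustration.

```python
import os
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Hypothetical settings: names proposed in this issue, not existing options.
MAX_RETRIES = int(os.environ.get("COPILOT_RETRY_MAX", "5"))
BASE_WAIT_S = int(os.environ.get("COPILOT_RETRY_WAIT_MS", "1000")) / 1000
MAX_WAIT_S = 120.0  # assumed cap for "a few minutes"


def parse_retry_after(
    headers: Mapping[str, str], now: Optional[datetime] = None
) -> Optional[float]:
    """Return the server-requested wait in seconds, or None if absent or unusable."""
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    try:
        return max(0.0, float(value))  # delay-seconds form, e.g. "7"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def retry_delay(attempt: int, headers: Mapping[str, str]) -> float:
    """Pick the wait before retry number `attempt` (0-based)."""
    hinted = parse_retry_after(headers)
    if hinted is not None:
        # Prefer the server's hint, bounded by the cap.
        return min(hinted, MAX_WAIT_S)
    # Fallback: exponential backoff with "equal jitter", so the wait is
    # never less than half of the current ceiling.
    ceiling = min(MAX_WAIT_S, BASE_WAIT_S * 2**attempt)
    return ceiling / 2 + random.uniform(0.0, ceiling / 2)
```

A retry loop would call `retry_delay(attempt, response_headers)` up to `MAX_RETRIES` times and sleep for the returned value. Equal jitter is used here instead of full jitter because full jitter can occasionally return a wait close to zero, which is the behavior this issue is reporting.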

### Additional context

- #2760 (closed, v1.0.32) added `Retry-After`-based bounded exponential backoff, but on BYOK **Azure OpenAI** with 1.0.51 the retries still exhaust in ~0.15s — suggesting the backoff isn't applied on the BYOK/`azure` path, or the throttle response is reaching the client without a usable `Retry-After`.
- Docs confirm no retry settings exist today.

BYOK Azure OpenAI: `429` throttling exhausts all retries in ~0.15s with no effective backoff; retry count/wait not configurable #3679
