You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Using a BYOK Azure OpenAI provider (COPILOT_PROVIDER_TYPE=azure), transient throttling (429 Too Many Requests) causes Copilot CLI to burn all 5 retries near-instantly and abort, instead of backing off. This is especially impactful for self-hosted / air-gapped Azure OpenAI deployments with low quota ceilings (e.g. small TPM/PTU allocations), where brief, fully recoverable throttle windows are common.
I hit this quickly and often with an Azure-hosted GPT 5.4 deployment on a 270K token/min quota — even routine sessions trip the limit and then fail hard instead of waiting out the short throttle window.
Observed output:
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Request failed due to a transient API error. Retrying....
Failed to get response from the AI model; retried 5 times (total retry wait time: 0.15 seconds).
All 5 retries are consumed in ~0.15s total (as low as ~0.02s each), so a recoverable throttle is treated as a hard failure.
Affected version
Copilot CLI version: 1.0.51
Steps to reproduce the behavior
Configure a BYOK Azure OpenAI deployment with a low rate-limit ceiling (e.g. GPT 5.4 at 270K token/min).
Run Copilot CLI with that provider/model and issue requests that exceed the ceiling.
Observe 5 immediate retries and abort with a sub-second total wait time.
Expected behavior
On 429/503 from a BYOK provider, apply effective backoff:
Respect Retry-After when present (Azure OpenAI typically returns it) — confirm this path is wired for the azure/BYOK provider, not only GitHub-hosted CAPI.
When no Retry-After is present, fall back to a default exponential backoff with jitter (capped at a few minutes) rather than ~0.
Make retry behavior user-configurable, e.g. environment variables such as COPILOT_RETRY_MAX and COPILOT_RETRY_WAIT_MS for headless/CI use.
⭐ Implement proper HTTP retry logic for 429 responses #2760 (closed, v1.0.32) added Retry-After-based bounded exponential backoff, but on BYOK Azure OpenAI with 1.0.51 the retries still exhaust in ~0.15s — suggesting the backoff isn't applied on the BYOK/azure path, or the throttle response is reaching the client without a usable Retry-After.
Describe the bug
Using a BYOK Azure OpenAI provider (
COPILOT_PROVIDER_TYPE=azure), transient throttling (429 Too Many Requests) causes Copilot CLI to burn all 5 retries near-instantly and abort, instead of backing off. This is especially impactful for self-hosted / air-gapped Azure OpenAI deployments with low quota ceilings (e.g. small TPM/PTU allocations), where brief, fully recoverable throttle windows are common.I hit this quickly and often with an Azure-hosted GPT 5.4 deployment on a 270K token/min quota — even routine sessions trip the limit and then fail hard instead of waiting out the short throttle window.
Observed output:
All 5 retries are consumed in ~0.15s total (as low as ~0.02s each), so a recoverable throttle is treated as a hard failure.
Affected version
Copilot CLI version: 1.0.51
Steps to reproduce the behavior
Expected behavior
429/503from a BYOK provider, apply effective backoff:Retry-Afterwhen present (Azure OpenAI typically returns it) — confirm this path is wired for theazure/BYOK provider, not only GitHub-hosted CAPI.Retry-Afteris present, fall back to a default exponential backoff with jitter (capped at a few minutes) rather than ~0.COPILOT_RETRY_MAXandCOPILOT_RETRY_WAIT_MSfor headless/CI use.Additional context
Retry-After-based bounded exponential backoff, but on BYOK Azure OpenAI with 1.0.51 the retries still exhaust in ~0.15s — suggesting the backoff isn't applied on the BYOK/azurepath, or the throttle response is reaching the client without a usableRetry-After.