Skip to content

Preserve typed source error on TwirpErrorResponse so callers can distinguish transport vs server-returned Unavailable #324

@tclem

Description

@tclem

Problem

TwirpErrorResponse stringifies the originating reqwest::Error into a rust_error: Option<String> field, which throws away the typed error and prevents downstream callers from distinguishing transport-level failures from server-returned errors with the same twirp code.

crates/twirp/src/error.rs:180:

rust_error: Option<String>,

crates/twirp/src/error.rs:215:

pub fn with_rust_error<E: std::error::Error>(self, err: E) -> Self {
    self.with_rust_error_string(format!("{err:?}"))
}

After #314, From<reqwest::Error> correctly maps connect/timeout/request errors to TwirpErrorCode::Unavailable. But on the receive path, a server that responds with a twirp body whose code is unavailable (HTTP 503 — load shed, drain, brief overload, panic recovery) round-trips into the same TwirpErrorCode::Unavailable. Callers cannot tell "I couldn't reach the host" from "the host told me it was unavailable."

Why this matters

We use blackbird's twirp client to fan out queries across a sharded index. We want to evict a host from our routing table only when it's actually unreachable (connection refused, DNS failure, etc.) — not when it returned a 503. The current API forces us to either:

  • Treat all Unavailable errors as eviction-worthy, which kicks healthy-but-slow hosts out of rotation and concentrates load on the survivors (which is what we hit and reverted in github/blackbird#15118).
  • Treat no Unavailable errors as eviction-worthy, which works but loses the ability to evict known-dead hosts on the hot path (we now rely on a ~1s background poller).

If we could downcast the underlying error and call reqwest::Error::is_connect() / is_timeout(), the distinction would be trivial.

Proposed fix

Replace rust_error: Option<String> with something that preserves the typed source. Two shapes I considered:

Option A — boxed std::error::Error:

pub struct TwirpErrorResponse {
    // ...
    source: Option<Box<dyn std::error::Error + Send + Sync + 'static>>,
}

impl TwirpErrorResponse {
    pub fn source(&self) -> Option<&(dyn std::error::Error + 'static)> { ... }
}

Callers downcast:

if let Some(src) = err.source() {
    if let Some(reqwest_err) = src.downcast_ref::<reqwest::Error>() {
        if reqwest_err.is_connect() { /* evict */ }
    }
}

Pros: flexible, no enum churn when new source types are added.
Cons: downcasting is a little awkward and only works when the caller knows the concrete type.

Option B — typed enum:

pub enum TwirpErrorSource {
    Reqwest(reqwest::Error),
    Url(url::ParseError),
    // ...
}

Pros: easier to pattern-match.
Cons: forces reqwest into the public API of the twirp crate (it's already a dep, but this makes it part of the surface), and the enum has to grow whenever a new From<...> impl is added.

I'd suggest Option A — it stays small and doesn't pin the public API to specific transport crates.

Either way, the existing rust_error() accessor that returns the stringified form can be preserved as a convenience: self.source.as_ref().map(|s| format!("{s:?}")).

Backward compatibility

The current rust_error: Option<String> field is private (accessed via rust_error()). Replacing the field while keeping the accessor returning Option<&String> is a breaking change for that accessor's return type but not for the rest of the API. If you'd rather avoid a return-type change, the accessor could return Option<String> by formatting on the fly.

Workaround we're using

Drop the per-query eviction entirely and rely on a background liveness poller. See github/blackbird#15118 for context.

  Generated via Copilot (Claude Opus 4.7) on behalf of @tclem

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions