Surface readTreehash errors instead of swallowing#118
Open
sbalabanov wants to merge 1 commit into
Open
Conversation
|
|
d0f73b9 to
fa8e46d
Compare
f7d9283 to
6bd1655
Compare
fa8e46d to
793d755
Compare
6bd1655 to
12fbcd7
Compare
readTreehash previously logged a warning and returned "" for any error,
collapsing two very different cases into one signal: a legitimate cache
miss (treehash not yet computed) and a real infra failure (e.g. storage
backend rejecting a missing-TTL request). The caller couldn't tell them
apart, so an outage that disabled the cache showed up only as silent
recomputation — visible only by tailing logs.
Change the signature to (string, error):
- NotFound -> ("", nil): expected miss, callers fall through
- any other failure -> ("", err): wrapped with the storage key
- success -> (treehash, nil)
Pre-cache lookup in GetChangedTargets now surfaces the error as a
request failure tagged failureReasonTreehashRead / ErrorTypeInfra so
storage breakage becomes a visible request-level failure rather than
silent degradation.
The cache-write goroutine can't return synchronously, so it logs at
Error level and abandons the write — losing the cache opportunity but
keeping the failure loud.
Tests:
- TestReadTreehash covers all four signature outcomes.
- TestGetChangedTargets_TreehashReadError (replaces the old
GetGraphError test, which only worked because errors were swallowed)
asserts the request fails with the new failure_reason classification.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
793d755 to
5b7fe0b
Compare
xytan0056
approved these changes
Jun 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
readTreehashcollapsed two very different cases — a legitimate cache miss (treehash not yet computed) and a real infra failure (e.g. storage backend rejecting a missing-TTL request) — into the same return value:"". Callers couldn't distinguish them, so an outage that disabled the cache showed up only as silent recomputation, visible only by tailing logs.Changes
readTreehashnow returns(string, error):NotFound→("", nil): the expected miss, callers fall through to recompute("", err): wrapped with the storage key for context(treehash, nil)Pre-cache lookup in
GetChangedTargetssurfaces the error as a request failure taggedfailureReasonTreehashRead/ErrorTypeInfra. Storage breakage now becomes a visible request-level failure rather than silent degradation, so dashboards and on-call alerts can see it.Cache-write goroutine can't return synchronously, so it logs at
Errorlevel (wasWarn) and abandons the write. The cache opportunity is lost but the failure stays loud.New
failureReasonTreehashReadfailure-reason constant for metric tagging.Tests
TestReadTreehashcovers the four signature outcomes: NotFound miss, generic storage error, nil response, success.TestGetChangedTargets_TreehashReadErrorreplaces the oldGetGraphErrortest (which only "worked" because errors were swallowed and the request still reached graph fetch) and asserts the request fails with the newfailureReasonTreehashReadclassification.Test plan
Stacking
Based on `fix-cache-goroutine-ctx` (#115). Merge order: #115 first, then this. Once #115 lands, this PR's base should be retargeted to `main`.
🤖 Generated with Claude Code