Detach cancellation in cache-write goroutine; log infra errors #117
Merged
xytan0056
commented
Jun 15, 2026
resp, err := st.Get(ctx, storage.DownloadRequest{Key: common.GetTreehashCachePath(buildDescription)})
// A genuine cache miss (not-found) is silent; any other storage error is logged so an
// infra failure that disables the cache (e.g. a missing-deadline "missing TTL" reject) is visible.
func readTreehash(ctx context.Context, st storage.Storage, logger *zap.Logger, buildDescription *pb.BuildDescription) string {
Contributor
Author
Added a log here for visibility.
yushan8
reviewed
Jun 15, 2026
// "InvalidArgument: missing TTL". Detach cancellation so the cache write
// survives client disconnect, but preserve the request values and set a
// fresh deadline so the storage reads/writes carry a valid TTL.
const cacheWriteTimeout = 2 * time.Minute
Contributor
Can this be configurable in the service config?
Contributor
Author
Removed it; decided to add the deadline in the storage implementation instead.
The compared-targets cache-write goroutine in GetChangedTargets used context.Background(), losing request values (tracing/identity). Switch to context.WithoutCancel(ctx) so the write survives client disconnect while keeping the request's baggage. Per-operation deadlines stay out of the controller — storage.Storage is backend-agnostic and must not encode any one backend's I/O budget. The TTL-rejection case (e.g. terrablob/YARPC requiring a deadline) is the storage implementation's responsibility to wrap with its own timeout. readTreehash now takes a *zap.Logger and logs any non-NotFound storage error; the silent swallow is what hid this bug. Genuine cache misses stay quiet. The cache-write goroutine also warns when it skips because a treehash was empty.
b52aa0d to
d39ef6e
Compare
yushan8
approved these changes
Jun 15, 2026
The compared-targets cache-write goroutine in GetChangedTargets used context.Background(), losing the request's tracing/identity baggage. Switch to context.WithoutCancel(ctx) so the write survives client disconnect while keeping request values.
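A minimal sketch of the behaviour this relies on, assuming Go 1.21+ (where `context.WithoutCancel` was added). The `traceKey` type, the `"trace-123"` value, and `simulateRequest` are hypothetical stand-ins for the real request baggage and handler; they are not taken from this PR.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// traceKey is a hypothetical context key standing in for request baggage
// (tracing/identity) carried by the real request context.
type traceKey struct{}

// simulateRequest builds a request-like context, detaches it the way the
// cache-write goroutine does, then cancels the parent to mimic a client
// disconnect. It reports what the detached context still carries.
func simulateRequest() (traceID string, hasDeadline bool, err error) {
	base := context.WithValue(context.Background(), traceKey{}, "trace-123")
	parent, cancel := context.WithTimeout(base, time.Second)

	// Keeps the parent's values, drops its cancellation and deadline.
	detached := context.WithoutCancel(parent)

	cancel() // client disconnects; parent is now canceled

	traceID, _ = detached.Value(traceKey{}).(string)
	_, hasDeadline = detached.Deadline()
	return traceID, hasDeadline, detached.Err()
}

func main() {
	id, hasDeadline, err := simulateRequest()
	fmt.Println(id, hasDeadline, err) // prints: trace-123 false <nil>
}
```

Note that the detached context has no deadline at all, which is why a backend that insists on one has to supply it itself.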
Per-operation deadlines stay out of the controller — storage.Storage is backend-agnostic and must not encode any one backend's I/O budget. Backends that need a deadline (e.g. YARPC/terrablob, which rejects deadline-less contexts with InvalidArgument: missing TTL) own that responsibility in their Put/Get implementations.
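One way a backend could own its deadline is sketched below. Everything here is an assumption for illustration: the `Storage` interface is a simplified stand-in for `storage.Storage` (whose real signatures are not shown in this PR), `strictBackend` only mimics a deadline-requiring backend, and `defaultTimeout` is an arbitrary value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Storage is a hypothetical, simplified stand-in for storage.Storage.
type Storage interface {
	Put(ctx context.Context, key string, value []byte) error
}

var errMissingTTL = errors.New("InvalidArgument: missing TTL")

// strictBackend mimics a backend that rejects deadline-less contexts.
type strictBackend struct{ data map[string][]byte }

func newStrictBackend() *strictBackend {
	return &strictBackend{data: map[string][]byte{}}
}

func (b *strictBackend) Put(ctx context.Context, key string, value []byte) error {
	if _, ok := ctx.Deadline(); !ok {
		return errMissingTTL
	}
	b.data[key] = value
	return nil
}

// defaultTimeout is an assumed per-operation budget owned by the backend.
const defaultTimeout = 2 * time.Second

// deadlineStorage adds a deadline only when the caller did not set one.
type deadlineStorage struct {
	inner   Storage
	timeout time.Duration
}

func wrapWithDeadline(inner Storage) Storage {
	return deadlineStorage{inner: inner, timeout: defaultTimeout}
}

func (s deadlineStorage) Put(ctx context.Context, key string, value []byte) error {
	if _, ok := ctx.Deadline(); !ok {
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, s.timeout)
		defer cancel()
	}
	return s.inner.Put(ctx, key, value)
}

// putDetached writes with a detached, deadline-less context, as the
// controller's cache-write goroutine would.
func putDetached(st Storage) error {
	ctx := context.WithoutCancel(context.Background())
	return st.Put(ctx, "k", []byte("v"))
}

func main() {
	raw := newStrictBackend()
	fmt.Println(putDetached(raw))                   // InvalidArgument: missing TTL
	fmt.Println(putDetached(wrapWithDeadline(raw))) // <nil>
}
```

With this shape the controller stays unaware of any backend's I/O budget, and a caller-supplied deadline is still respected when one exists.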
readTreehash now takes a *zap.Logger and logs any non-NotFound storage error. The silent swallow is what hid this bug — a genuine cache miss stays quiet, but an infra failure disabling the cache is now visible. The goroutine also warns when it skips because a treehash was empty.
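The logging rule can be sketched as follows. This is not the PR's code: the standard library's `log/slog` stands in for `*zap.Logger` so the example runs without dependencies, and `getter`, `failingStore`, `errNotFound`, `errInfra`, and the key string are hypothetical names.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"log/slog"
)

// Hypothetical sentinels: a genuine cache miss and an infra failure.
var (
	errNotFound = errors.New("not found")
	errInfra    = errors.New("InvalidArgument: missing TTL")
)

// getter is a hypothetical, simplified read side of the storage interface.
type getter interface {
	Get(ctx context.Context, key string) (string, error)
}

// failingStore always fails with the configured error.
type failingStore struct{ err error }

func (s failingStore) Get(context.Context, string) (string, error) {
	return "", s.err
}

// readTreehash returns the cached value, or "" on any error. A not-found
// error is treated as a quiet miss; every other error is logged.
func readTreehash(ctx context.Context, st getter, logger *slog.Logger, key string) string {
	val, err := st.Get(ctx, key)
	if err != nil {
		if !errors.Is(err, errNotFound) {
			logger.Warn("treehash cache read failed", "key", key, "error", err)
		}
		return ""
	}
	return val
}

// loggedOutput runs readTreehash against a store that fails with err and
// returns whatever was written to the log.
func loggedOutput(err error) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	readTreehash(context.Background(), failingStore{err: err}, logger, "treehash/example")
	return buf.String()
}

func main() {
	fmt.Println("miss logged:", loggedOutput(errNotFound) != "")    // false
	fmt.Println("infra failure logged:", loggedOutput(errInfra) != "") // true
}
```

Using `errors.Is` rather than `==` keeps the miss quiet even if the storage layer wraps its not-found error.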
Test Plan
Unit tests
Tested in Uber's internal environment
Issue