Skip to content

Detach cancellation in cache-write goroutine; log infra errors#117

Merged
xytan0056 merged 1 commit into
mainfrom
fix-cache-write-context-deadline
Jun 15, 2026
Merged

Detach cancellation in cache-write goroutine; log infra errors#117
xytan0056 merged 1 commit into
mainfrom
fix-cache-write-context-deadline

Conversation

@xytan0056

@xytan0056 xytan0056 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

The compared-targets cache-write goroutine in GetChangedTargets used context.Background(), losing the request's tracing/identity baggage. Switch to context.WithoutCancel(ctx) so the write survives client disconnect while keeping request values.
Per-operation deadlines stay out of the controller — storage.Storage is backend-agnostic and must not encode any one backend's I/O budget. Backends that need a deadline (e.g. YARPC/terrablob, which rejects deadline-less contexts with InvalidArgument: missing TTL) own that responsibility in their Put/Get implementations.

readTreehash now takes a *zap.Logger and logs any non-NotFound storage error. The silent swallow is what hid this bug — a genuine cache miss stays quiet, but an infra failure disabling the cache is now visible. The goroutine also warns when it skips because a treehash was empty.

Test Plan

Unit tests
tested in Uber's internal env

Issue

@xytan0056 xytan0056 requested review from a team as code owners June 15, 2026 05:55
resp, err := st.Get(ctx, storage.DownloadRequest{Key: common.GetTreehashCachePath(buildDescription)})
// A genuine cache miss (not-found) is silent; any other storage error is logged so an
// infra failure that disables the cache (e.g. a missing-deadline "missing TTL" reject) is visible.
func readTreehash(ctx context.Context, st storage.Storage, logger *zap.Logger, buildDescription *pb.BuildDescription) string {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a log here for verbosity

Comment thread controller/getchangedtargets.go Outdated
// "InvalidArgument: missing TTL". Detach cancellation so the cache write
// survives client disconnect, but preserve the request values and set a
// fresh deadline so the storage reads/writes carry a valid TTL.
const cacheWriteTimeout = 2 * time.Minute

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be configurable in the service config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it and decided to add deadline in storage implementation

The compared-targets cache-write goroutine in GetChangedTargets used
context.Background(), losing request values (tracing/identity). Switch
to context.WithoutCancel(ctx) so the write survives client disconnect
while keeping the request's baggage.

Per-operation deadlines stay out of the controller — storage.Storage is
backend-agnostic and must not encode any one backend's I/O budget. The
TTL-rejection case (e.g. terrablob/YARPC requiring a deadline) is the
storage implementation's responsibility to wrap with its own timeout.

readTreehash now takes a *zap.Logger and logs any non-NotFound storage
error; the silent swallow is what hid this bug. Genuine cache misses
stay quiet. The cache-write goroutine also warns when it skips because
a treehash was empty.
@xytan0056 xytan0056 force-pushed the fix-cache-write-context-deadline branch from b52aa0d to d39ef6e Compare June 15, 2026 18:29
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@xytan0056 xytan0056 changed the title Give cache-write goroutine a context with a deadline Detach cancellation in cache-write goroutine; log infra errors Jun 15, 2026
@xytan0056 xytan0056 merged commit 95b2719 into main Jun 15, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants