
Add OpenTelemetry tracing spans to VM startup pipeline#60

Merged

JAORMX merged 1 commit into main from jaosorior/add-otel-tracing on Apr 7, 2026

JAORMX commented Apr 7, 2026

Summary

  • Instrument the critical path through microvm.Run() with OpenTelemetry trace spans so consumers can identify performance bottlenecks in VM startup
  • When no TracerProvider is configured (the default), every span is a non-recording no-op, so the overhead is negligible
  • Promotes go.opentelemetry.io/otel and go.opentelemetry.io/otel/trace from indirect to direct dependencies (already in the dep tree via transitive deps)

Spans added

microvm.go — root span + 8 child spans covering each sequential phase:

  • microvm.Run (root) with image/name/cpus/memory attributes
  • microvm.Preflight, microvm.ImagePull, microvm.RootfsClone
  • microvm.RootfsHooks, microvm.BackendPrepare, microvm.NetworkStart
  • microvm.VMSpawn, microvm.PostBoot

image/pull.go — sub-spans in PullWithFetcher():

  • microvm.image.CacheLookup (with cache_hit attribute)
  • microvm.image.Fetch, microvm.image.Extract (with layered attribute)
  • microvm.image.CacheStore

ssh/client.go — span + events in WaitForReady():

  • microvm.SSHWaitReady with host/port/user attributes
  • ssh.probe_failed event for each failed poll iteration, carrying the probe count

preflight/checker.go — parent + per-check spans:

  • microvm.preflight.RunAll with check count
  • microvm.preflight.Check per check with name/required attributes

hypervisor/libkrun/backend.go — span in Start():

  • microvm.backend.Start with sub-spans for ResolveRuntime and ResolveFirmware

Motivation

Profiling brood-box startup showed "Sandbox ready" taking ~20s, and microvm.Run() was a black box. With these spans, a consumer that configures a TracerProvider can now see exactly where the time goes:

microvm.Run                                      21.451s
  microvm.Preflight                               0.000s
  microvm.ImagePull                               0.000s  (cache hit)
  microvm.RootfsClone                             5.354s  ← COW clone bottleneck
  microvm.RootfsHooks                             0.015s
  microvm.BackendPrepare                          0.000s
  microvm.NetworkStart                            0.000s
  microvm.VMSpawn                                 0.018s
  microvm.PostBoot                               16.062s
    microvm.SSHWaitReady                         16.061s  ← 8 probes @ 2s

Test plan

  • go test ./... — all existing tests pass (tracing is no-op without provider)
  • golangci-lint run ./... — 0 issues
  • Integration: brood-box consumer with --trace flag produces expected span hierarchy

🤖 Generated with Claude Code

Instrument the critical path through microvm.Run() with OTel trace
spans so consumers can identify performance bottlenecks. When no
TracerProvider is configured (the default), all tracing is a no-op
with negligible overhead.

Spans added:
- microvm.Run (root) with image/name/cpus/memory attributes
- microvm.Preflight, microvm.ImagePull, microvm.RootfsClone
- microvm.RootfsHooks, microvm.BackendPrepare, microvm.NetworkStart
- microvm.VMSpawn, microvm.PostBoot
- microvm.image.CacheLookup/Fetch/Extract/CacheStore (image/pull.go)
- microvm.SSHWaitReady with per-probe events (ssh/client.go)
- microvm.preflight.RunAll + per-check spans (preflight/checker.go)
- microvm.backend.Start + ResolveRuntime/ResolveFirmware (backend.go)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JAORMX merged commit b823f5c into main on Apr 7, 2026
7 checks passed
JAORMX deleted the jaosorior/add-otel-tracing branch on April 7, 2026 at 06:19