Conversation
Adds a top-level `databricks doctor` command that validates CLI setup by running sequential diagnostic checks: CLI version, config file readability, active profile, authentication, user identity, and network connectivity. Auth failures are reported as check results, not command errors. Supports both text output (colored status icons) and JSON output (`--output json`). Co-authored-by: Isaac
|
Commit: 4f9375e
18 interesting tests: 9 SKIP, 7 KNOWN, 2 flaky
Top 27 slowest tests (at least 2 minutes):
|
Co-authored-by: Isaac
The test used env.Set(ctx, ...) to set DATABRICKS_HOST and
DATABRICKS_TOKEN, but checkAuth creates a bare config.Config{}
that reads from real environment variables via os.Getenv, not
the context-based env layer. Use t.Setenv instead so the SDK
can see the values.
Co-authored-by: Isaac
shreyas-goenka
left a comment
There was a problem hiding this comment.
Note: This review was posted by Claude (AI assistant). Shreyas will do a separate, more thorough review pass.
Priority: HIGH — Config resolution diverges from real CLI auth path
MAJOR: resolveConfig diverges from real CLI auth
The resolveConfig function in databricks doctor constructs its own config resolution path instead of going through the standard SDK/CLI authentication flow. This means the doctor command could report "config is fine" while the real CLI fails (or vice versa). If the goal is to diagnose auth issues, it should use the same code path the CLI uses.
MEDIUM: Network check bypasses SDK HTTP client
The connectivity check uses http.DefaultClient directly instead of going through the SDK's configured HTTP client. In enterprise environments with proxies or custom TLS, this will give misleading results — the check might fail even though the SDK would succeed (or vice versa).
Other Observations
- Good idea for a diagnostic command overall
- The step-by-step output format is user-friendly
- Missing test coverage for the core diagnostic logic
When the workspace client is unavailable but config is resolved, the network check was falling back to http.DefaultClient. This ignores proxy and custom TLS settings from the SDK config, giving misleading results in enterprise environments. Use configuredNetworkHTTPClient(cfg) instead, which respects HTTPTransport and InsecureSkipVerify from the config.
…allback, skip status - Detect account-level configs (AccountID + account host) and use NewAccountClient instead of always using NewWorkspaceClient - Add 15s per-check deadline for auth and identity checks to prevent hangs on unresponsive IdP - Network check now tries even when config resolution fails, as long as a host URL is available from partial config resolution - Identity marked as 'skip' (not 'fail') when auth failed or when using account-level profile, avoiding double failures from one root cause - Add skip status rendering in text output
- isAccountLevelConfig now handles both AccountHost and UnifiedHost - checkIdentity uses isAccountLevelConfig instead of raw HostType check - Use http.MethodGet and authCfg.Host instead of empty strings - Remove hardcoded config file path (let SDK resolve it) - Add "skip" to CheckResult status doc comment - checkCurrentProfile shows resolved profile from config file Co-authored-by: Isaac
resolveConfig now checks for [__settings__].default_profile when no explicit profile is set via --profile flag or DATABRICKS_CONFIG_PROFILE. This mirrors the resolution logic in cmd/root/auth.go so users who rely on `databricks auth switch` see the correct profile in doctor output. Co-authored-by: Isaac
Table-driven tests for isAccountLevelConfig covering classic account host, unified host with/without account ID, workspace host, and missing host. Also adds a test for checkIdentity skipping the /me endpoint for unified-host account profiles. Co-authored-by: Isaac
Remove the custom Loaders override in resolveConfig. The CLI's env.NewConfigLoader panics on non-string config attributes (e.g. DATABRICKS_RATE_LIMIT which is an int), and the SDK's default loaders already handle env vars properly. This aligns with how cmd/root/auth.go creates config objects. Also add an acceptance test under acceptance/cmd/doctor/ that exercises the full doctor flow with a mock HTTP server. Co-authored-by: Isaac
…tion Use env.NewConfigLoader(ctx) in the doctor command's config loader chain so that all SDK config attributes (HOST, TOKEN, PROFILE, etc.) set via env.Set(ctx, ...) participate in auth resolution, not just CONFIG_FILE. Fix isAccountLevelConfig to treat unified-host profiles with both account_id and workspace_id as workspace-level, matching the profiler classification in libs/databrickscfg/profile/profiler.go. Co-authored-by: Isaac
The GetConfiguredDefaultProfile function is from the auth switch PR which has not been merged into this branch's base yet. Remove the dependency to fix the build. Co-authored-by: Isaac
Use SetS instead of Set to properly convert string env var values to the target type (int, bool, etc.) before assignment. Co-authored-by: Isaac
Use filepath.ToSlash so the config file path always uses forward slashes, making acceptance test output stable across OSes. Co-authored-by: Isaac
Why
Users debugging CLI setup issues (auth failures, config problems, network issues) have no single command to diagnose their environment. They must manually run separate commands to check auth, config, and connectivity.
Changes
Before: Users had to manually run separate commands to check auth, config, and connectivity.
Now: A new
databricks doctorcommand runs all diagnostic checks and reports results as a checklist:Text output uses colored status icons ([ok], [FAIL], etc.) to stdout. JSON output (
--output json) returns a structured array. Auth failures are reported as check results, not command errors.Post-review fixes
AccountIDis set ANDWorkspaceIDis emptyhttp.MethodGetwith proper URLfilepath.JoinCheckResultstatus doc commentacceptance/cmd/doctor/env.NewConfigLoader(ctx)to participate in full auth resolutionOpen item
doctorcommand name should be added to a deny list for new API names in the universe API linters, so future auto-generated API commands don't collide with it. Tracked separately.Test plan
make lintfullpassesmake checkspasses