
feat: SPOG (Single Point of Gateway) host support #1479

Merged
sd-db merged 30 commits into main from sd-db/spog-impl
Jun 2, 2026

@sd-db
Collaborator

@sd-db sd-db commented May 25, 2026

Summary

Adds SPOG (Single Point of Gateway) support — account-level vanity hosts (e.g. xyz.azuredatabricks.net) where workspaces are disambiguated by ?o=<workspace-id> on http_path. Matches the contract in databricks-sql-python (#767), databricks-sql-go, databricks-jdbc, and the ADBC Rust driver.

Opt-in via the dep-ceiling bumps in #1474: activates only when databricks-sql-connector ≥ 4.2.6 and databricks-sdk ≥ 0.76.0 are installed (Config.workspace_id was introduced in SDK 0.76.0; verified end-to-end against a SPOG vanity URL). Legacy hosts and older deps are unaffected.
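
The SDK half of that gate can be feature-detected instead of version-compared. A minimal sketch, assuming the check only needs to ask whether a constructor names a `workspace_id` parameter; `accepts_keyword`, `OldConfig`, and `NewConfig` are hypothetical stand-ins for illustration, not the adapter's or the SDK's code:

```python
import inspect


def accepts_keyword(factory, name: str) -> bool:
    """Return True if `factory` declares a parameter called `name`."""
    try:
        params = inspect.signature(factory).parameters
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature; treat as unsupported.
        return False
    # A **kwargs catch-all is deliberately not counted: it says nothing about
    # whether the keyword is actually understood.
    return name in params


# Hypothetical stand-ins for an older and a newer Config class.
class OldConfig:
    def __init__(self, host=None, token=None):
        self.host, self.token = host, token


class NewConfig:
    def __init__(self, host=None, token=None, workspace_id=None):
        self.host, self.token, self.workspace_id = host, token, workspace_id


old_ok = accepts_keyword(OldConfig, "workspace_id")
new_ok = accepts_keyword(NewConfig, "workspace_id")
```

Detecting the feature rather than the version number means a fork or wrapper that backports the parameter would also be reported as capable.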

Pre-flight checks

| Host type | ?o= in http_path | Outcome |
| --- | --- | --- |
| SPOG (unified) | present | proceed; workspace id passed to SDK |
| SPOG (unified) | missing | warning |
| non-SPOG | present | warning |
| non-SPOG | missing | proceed |
| host probe failed | | proceed (probe is non-fatal) |
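
The matrix above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `decision.py`: `preflight_outcome` is a hypothetical name, `"unified"` is taken from the table, and `"legacy"` is an assumed label for any non-SPOG host type.

```python
from typing import Optional


def preflight_outcome(host_type: Optional[str], has_org_param: bool) -> str:
    """Map (probed host type, ?o= present) to 'proceed' or 'warning'."""
    if host_type is None:
        # Probe failed: non-fatal, so carry on without judging the config.
        return "proceed"
    is_spog = host_type == "unified"
    # Proceed only when host shape and ?o= agree; any mismatch is a warning.
    return "proceed" if is_spog == has_org_param else "warning"


rows = [
    ("unified", True),
    ("unified", False),
    ("legacy", True),   # "legacy" is an assumed non-SPOG label
    ("legacy", False),
    (None, True),
]
outcomes = [preflight_outcome(host, has_o) for host, has_o in rows]
```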

Test plan

  • Unit (hatch run unit tests/unit -q) — 1174 passed
  • pre-commit run --all-files
  • dbt debug SPOG status block verified on both
  • New SPOG tests added (unit + functional)
  • Integration test run succeeds (🆗 link)
  • Integration test run succeeds at min-deps (🆗 link)
  • New workflow to run integration tests on SPOG URLs weekly as a sanity check (spog-integration.yml) succeeds
  • Follow up docs PR

sd-db added 2 commits May 25, 2026 11:08
Implements support for Databricks SPOG hosts — account-level vanity URLs
(e.g. peco.azuredatabricks.net) where workspaces are disambiguated by a
`?o=<workspace-id>` query parameter on http_path. Approach matches the
convention adopted by databricks-sql-python, databricks-sql-go,
databricks-jdbc, and the ADBC Rust driver: parse ?o= from http_path and
use it to set the X-Databricks-Org-Id header on non-OAuth endpoints.

Opt-in via the dependency ceiling bumps already landed: requires
`databricks-sql-connector >= 4.2.6` and `databricks-sdk >= 0.104.0` for
the SPOG code path to activate. Pre-SPOG dep versions continue to work
unchanged on legacy hosts.

- `extract.py` — pure parser; pulls ?o=<workspace-id> from http_path.
- `capabilities.py` — runtime detect SPOG support: `connector_supports_spog`
  (version-detect with packaging.version.Version), `sdk_supports_workspace_id`
  (feature-detect via inspect.signature(Config) so forks/wrappers report
  correctly).
- `probe.py` — one-shot per-host probe of /.well-known/databricks-config.
  3-attempt exponential backoff; on exhaust returns HostMetadata(host_type=None)
  + WARN. Probe failure is never fatal.
- `decision.py` — applies the spec §8 decision matrix at connection.open():
  raises DbtConfigError on every misconfiguration row with a pointed
  upgrade/fix message; returns the extracted workspace_id on the happy path.
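
A minimal sketch of the kind of pure parser described for `extract.py`, assuming the workspace id is simply the first non-empty `o` query parameter on http_path. `workspace_id_from_http_path` is a hypothetical name, and the sample paths are made up for illustration:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def workspace_id_from_http_path(http_path: Optional[str]) -> Optional[str]:
    """Return the value of ?o= on http_path, or None when it is absent."""
    if not http_path:
        return None
    query = urlsplit(http_path).query
    # parse_qs drops blank values by default, so "?o=" yields no entry.
    values = parse_qs(query).get("o")
    return values[0] if values else None


with_o = workspace_id_from_http_path("/sql/1.0/warehouses/abc123?o=1234567890")
without_o = workspace_id_from_http_path("/sql/1.0/warehouses/abc123")
blank_o = workspace_id_from_http_path("/sql/1.0/warehouses/abc123?o=")
```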

- `credentials.py`:
  - Cluster-ID regex tightened: `(.*)` -> `([^?&]+)` so the capture stops
    at any query string (independently useful even on legacy hosts).
  - DatabricksCredentialManager gains a `workspace_id` field populated by
    `extract_workspace_id(credentials.http_path)` in create_from.
  - All five `authenticate_with_*` methods now plumb workspace_id into
    `Config(...)` via a single `_config_kwargs` helper — gated on
    `sdk_supports_workspace_id()` so old SDKs are unaffected.
- `connections.py`: `DatabricksConnectionManager.open()` collects every
  http_path in play (default + per-compute) and invokes
  `check_spog_preconditions(host=..., http_paths=...)` before constructing
  conn_args. On legacy hosts the call is a no-op; on misconfiguration it
  raises a pointed DbtConfigError.
- `impl.py`: `DatabricksAdapter.debug_query` override emits a SPOG status
  block (host_type, workspace_id, dep-version suitability) before the
  standard `select 1 as id`. Makes 'is SPOG working here?' a one-command
  answer for support escalations.
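
The effect of the cluster-ID capture change can be seen on a path that carries a query string. A sketch only: the two capture groups are the ones named above, while the `sql/protocolv1/o/\d+/` prefix and the sample path are assumptions made for illustration.

```python
import re

# Assumed prefix; only the capture groups come from the change described above.
LOOSE = re.compile(r"/?sql/protocolv1/o/\d+/(.*)")
TIGHT = re.compile(r"/?sql/protocolv1/o/\d+/([^?&]+)")

path = "sql/protocolv1/o/1234567890/0101-123456-abcdefgh?o=1234567890"

# The greedy capture swallows the query string along with the cluster id.
loose_id = LOOSE.search(path).group(1)
# The tightened capture stops at the first '?' or '&'.
tight_id = TIGHT.search(path).group(1)
```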

- 35 unit tests under `tests/unit/spog/` covering every §8 matrix row,
  retry/backoff math, capability detection branches, both-deps-old
  ordering, HTTP-error retry fallback, and probe caching.
- 17 cross-module unit tests (workspace_id plumbing, connection.open
  wiring, dbt debug block, cluster-id regex).
- 3 functional tests under `tests/functional/adapter/spog/`:
  - test_spog_debug — assert dbt debug emits the SPOG block.
  - test_spog_missing_o_raises — strip ?o=, expect the §8 row-4 error.
  - test_spog_probe_failure_fallback — simulate probe failure; expect
    WARN + run still succeeds.
  All three skip when DBT_DATABRICKS_SPOG_* env vars are absent.

`.github/workflows/spog-integration.yml` runs the SPOG functional tests
against `peco.azuredatabricks.net?o=6436897454825492` using the existing
Azure secrets (same workspace; only host + ?o= suffix differ). Forces
SPOG-capable connector and SDK pins. Triggered weekly + workflow_dispatch.

Design spec at `docs/superpowers/specs/2026-05-19-dbt-databricks-spog-design.md`;
implementation plan at `docs/superpowers/plans/2026-05-19-dbt-databricks-spog.md`;
follow-up items tracked in `.claude/ideas/spog-future-tasks.md` (gitignored,
local-only). CHANGELOG entry added under `dbt-databricks next`.
- test_python_helpers: stub Mock() credentials.http_path with a real
  string so extract_workspace_id() (now called in
  DatabricksCredentialManager.create_from) doesn't trip on
  "argument of type 'Mock' is not iterable".
- test_auth (TestEnsureConfigTriggersTheRightAuth): autouse-patch
  sdk_supports_workspace_id() to False so the auth-routing
  assertions stay focused on auth_type. SPOG workspace_id plumbing
  has its own coverage in tests/unit/spog/.
@sd-db sd-db requested a review from jprakash-db as a code owner May 25, 2026 05:44
@github-actions

github-actions Bot commented May 25, 2026

Coverage report

| File | Lines missing |
| --- | --- |
| dbt/adapters/databricks/connections.py | |
| dbt/adapters/databricks/credentials.py | |
| dbt/adapters/databricks/impl.py | 1116-1117, 1121-1126, 1141-1142 |
| dbt/adapters/databricks/spog/capabilities.py | |
| dbt/adapters/databricks/spog/decision.py | |
| dbt/adapters/databricks/spog/extract.py | |
| dbt/adapters/databricks/spog/probe.py | |

This report was generated by python-coverage-comment-action

sd-db added 7 commits May 26, 2026 12:06
- ?o= is the only canonical SPOG opt-in marker. Drop /o/<id>/
  cluster-path extraction in spog/extract.py to match the connector's
  _extract_spog_headers contract; cluster paths must add ?o= explicitly.
- Short-circuit check_spog_preconditions when either dep is below the
  SPOG floor. Pre-SPOG dep installs are fully dormant — no probe, no
  matrix, no behavior change vs the pre-SPOG era.
- Downgrade "SPOG host without ?o=" and "non-SPOG host with stray ?o="
  from DbtConfigError to logger.warning. Only multi-compute ?o= conflict
  stays a hard raise.
- impl.py + decision.py: import the probe module qualified-style so tests
  only need one mock.patch site instead of patching each binding alias.
- Tests: localize the probe stub to TestDatabricksAdapter (the only class
  that exercises connection.open() end-to-end). Add a per-dir conftest
  under tests/unit/spog/ that raises AssertionError on any unmocked
  requests.get inside probe.py — keeps unit tests offline even where the
  parent probe stub doesn't apply.
- test_auth: drop the _no_workspace_id_plumbing fixture. With the ?o=-only
  extractor, _COMMON_KWARGS's cluster path no longer produces a
  workspace_id, so the SPOG branch in _config_kwargs is already inert.
- CHANGELOG: tighten the SPOG entry.
Per review:
- spog-integration.yml: drop the SPOG-specific DBT_DATABRICKS_SPOG_*
  workflow env vars and the "force SPOG-capable pin" step. The workflow
  now plugs new SPOG-specific GitHub secrets (DBT_DATABRICKS_SPOG_HOST_NAME
  + DBT_DATABRICKS_SPOG_HTTP_PATH) into the standard env var names, so
  the rest of the test machinery reuses the same `databricks_uc_sql_endpoint`
  profile and no SPOG-aware code lives in profile/conftest land.
- tests/profiles.py: drop `databricks_uc_sql_endpoint_spog_target` and its
  branch — the standard target already reads what the SPOG workflow now
  sets.
- tests/conftest.py: drop the `_spog` suffix carve-out in
  `skip_by_profile_type`; no SPOG profile names remain.
- test_spog_debug: skip when http_path lacks `?o=`; extract the expected
  workspace_id from the http_path env directly instead of reading a
  SPOG-specific env var.
- CHANGELOG: drop the redundant "new spog/ package" Under-the-Hood line;
  the Feature bullet already covers the user-facing change.
The block was paraphrasing what `secrets.X → env.Y` already says. Code
stands on its own; no hidden constraint to call out.
Reuse the existing DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH secret and append
?o=<workspace-id> inline rather than carrying a full duplicated SPOG http_path.
The two new SPOG-specific secrets are host (DBT_DATABRICKS_SPOG_HOST_NAME)
and workspace id (DBT_DATABRICKS_SPOG_WORKSPACE_ID); both plug into the
standard env var names the rest of the test machinery already reads.
Public ubuntu-latest can't reach internal PyPI / Databricks targets.
Match the integration.yml + min-deps-test-fast.yml shape: the protected
runner group, the setup-jfrog-pypi composite action for package installs,
plus pinned setup-python / setup-uv / hatch action SHAs and UV_FROZEN.
- Temporary push trigger scoped to sd-db/spog-impl so we can validate the
  workflow end-to-end before merge (workflow_dispatch needs the file on
  the default branch first; this trigger goes away once the PR lands).
- Drop docs/superpowers/{plans,specs}/2026-05-19-dbt-databricks-spog-*.md
  — internal planning artifacts that shouldn't live in the repo.
- Use OIDC service-principal auth (TEST_PECO_SP_*) instead of PAT, matching
  the integration workflow against the same workspace.
- Hardcode DBT_DATABRICKS_UC_INITIAL_CATALOG=peco (no secret of that name
  exists; the prior placeholder produced an empty value → "Invalid catalog
  name").
- Add `environment: azure-prod` to match the integration workflow scope.
The previous workflow plumbed `secrets.DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH`,
which is a stale 2022 secret pointing at a warehouse no longer reachable
("ENDPOINT_NOT_FOUND: SQL warehouse ... does not exist at all in the database").
Switch to `TEST_PECO_WAREHOUSE_HTTP_PATH` — the same warehouse the live
integration workflow uses — and set it on DBT_DATABRICKS_HTTP_PATH (the
fallback the test profile reads) to match integration.yml's wiring.
- test_spog_debug: accept either DBT_DATABRICKS_UC_ENDPOINT_HTTP_PATH or
  DBT_DATABRICKS_HTTP_PATH so the skipif and the assertion line up with
  whichever env var the workflow sets (live workflow uses HTTP_PATH,
  matching integration.yml).
- test_spog_probe_failure_fallback: patch probe_host directly instead of
  requests.get. `mock.patch("...spog.probe.requests.get")` walks to the
  shared requests module and patches it globally — that took out the
  SDK's auth-time HTTP calls and caused dbt debug to fail with
  "invalid_client". Patching probe_host is the correct surgical scope.
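
The patch-scope pitfall above can be reproduced with the standard library alone. In this sketch `json` plays the role of `requests`, and `fake_probe` is a hypothetical stand-in module; none of it is the adapter's code.

```python
import json
import sys
import types
from unittest import mock

# Hypothetical stand-in for a probe module that does `import json` at top level.
probe = types.ModuleType("fake_probe")
probe.json = json
probe.probe_host = lambda: probe.json.dumps({"host_type": "unified"})
sys.modules["fake_probe"] = probe

# Patching through the module's alias still lands on the shared json module,
# so every other caller of json.dumps sees the mock too.
with mock.patch("fake_probe.json.dumps", return_value="patched"):
    leaked = json.dumps({"a": 1})

# Patching the probe's own entry point leaves json untouched for everyone else.
with mock.patch.object(probe, "probe_host", return_value=None):
    scoped = json.dumps({"a": 1})
    stubbed = probe.probe_host()
```

Here `leaked` holds the mock's return value even though the caller never went through `fake_probe`, which mirrors how the SDK's auth-time HTTP calls were caught by the broader patch.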
Match the integration workflow's TEST_PECO_* secret-name convention
(TEST_PECO_SP_ID, TEST_PECO_WAREHOUSE_HTTP_PATH, etc.) instead of the
adapter-namespaced DBT_DATABRICKS_* prefix, which was misleading for
GitHub secrets (that prefix belongs to env vars the adapter reads).
Drop the SPOG-only subset and instead run all of tests/functional with
--profile databricks_uc_sql_endpoint, with env vars wired to the SPOG
vanity host + ?o= so every code path is exercised through SPOG routing.

- Schedule moved from Saturday 22:00 UTC to Sunday 21:30 UTC
  (Monday 03:00 IST) so results are visible at the start of the week.
- pytest gets -n 10 --dist=loadfile for parallelism and --reruns 1
  --reruns-delay 60 to absorb transient connectivity flakes.
- timeout-minutes bumped 25 → 90 to fit the full suite.
- Added DBT_TEST_USER and DBT_DATABRICKS_LOCATION_ROOT to match the
  env shape the existing integration job uses for the same profile.
Replace the SPOG-only mini-workflow with a near-identical copy of
integration.yml's prepare-shards + 3 sharded functional jobs (uc-cluster,
uc-sql-endpoint, cluster). Same sharding, parallelism, retry, log-upload
shape as the live integration matrix.

The only deltas vs integration.yml:
- triggers: workflow_dispatch + Sunday 21:30 UTC schedule (+ temporary
  push on sd-db/spog-impl). No PR-event / prepare / gate / report-status.
- env: host points at TEST_PECO_SPOG_HOST, the warehouse path carries
  ?o=<wsid>, DBT_DATABRICKS_SPOG_WORKSPACE_ID is exported so the cluster
  path builder uses the SPOG workspace id and appends ?o= to cluster
  paths (so cluster profile control-plane calls also route via SPOG).

build_cluster_http_path.py learns one new env var:
DBT_DATABRICKS_SPOG_WORKSPACE_ID. When set, it bypasses the legacy
hostname-regex derivation (which can't parse `peco.azuredatabricks.net`)
and appends ?o=<wsid> to the cluster/uc-cluster paths. Legacy mode is
untouched (env var unset → hostname-regex path → no suffix).
- test_spog_probe_failure_fallback: drop the hardcoded
  databricks_uc_sql_endpoint_target() fixture override. When the test
  ran in cluster/uc-cluster shards, that override pointed at a profile
  the job didn't have env vars for, causing the connection setup to
  hang in 5-minute retries before erroring. Inheriting the shard's
  active profile fixes it; the probe-failure scenario is
  profile-agnostic anyway.
- test_spog_debug: extend _resolved_http_path() to also check the
  cluster + uc-cluster path env vars (DBT_DATABRICKS_UC_CLUSTER_HTTP_PATH
  and DBT_DATABRICKS_CLUSTER_HTTP_PATH). Without this the skipif
  returned None on those shards and the test was silently skipped even
  though the cluster path now carries ?o=<wsid> under SPOG.
@sd-db
Collaborator Author

sd-db commented May 29, 2026

All SPOG Integration Tests are passing 🆗

Note: the failing SPOG Integration Tests / uc-sql-endpoint (0) (push) check is a regression in main caused by a server-side change. A fix is raised in #1489.

@sd-db
Collaborator Author

sd-db commented Jun 1, 2026

/integration-test min-deps

@github-actions

github-actions Bot commented Jun 1, 2026

Min-deps integration tests dispatched for PR #1479 by @sd-db. Track progress in the Actions tab.

@github-actions

github-actions Bot commented Jun 1, 2026

Min-deps integration results for PR #1479 — UC cluster ✅ success · SQL warehouse ✅ success · All-purpose cluster ✅ success · Shard coverage ✅ success

Run details.

@jprakash-db
Collaborator


🔴 High-Confidence / High-Severity

F1 — requests is imported at module load but not a declared runtime dependency [HIGH CONF · 2 reviewers]
dbt/adapters/databricks/spog/probe.py:13 does a top-level import requests. probe.py is imported transitively by decision.py → connections.py and by impl.py, so requests loads on every adapter import — not lazily. But pyproject.toml [project].dependencies (lines 24–27) lists only databricks-sdk and databricks-sql-connector[pyarrow]; requests appears only as the dev stub types-requests (line 65). It works today by transitive luck (the SDK/connector pull requests in), but that isn't contractually guaranteed across the allowed version ranges. If a future SDK drops transitive requests, the entire adapter fails to import for everyone, SPOG or not.
Fix: add requests to runtime dependencies, or import it lazily inside probe_host. Verified against pyproject.toml @ head.

F2 — The network probe runs on every open() for all capable-deps users — no host/?o= gate [HIGH CONF · 4 reviewers]
connections.py:493 calls check_spog_preconditions unconditionally in open(). The only short-circuit (decision.py:30) is the dep-version gate (connector_supports_spog() and sdk_supports_workspace_id()). There is no early-out for "no ?o= present anywhere." So any user whose resolver picks connector ≥ 4.2.6 + sdk ≥ 0.76.0 (the pyproject ceiling <4.3.0/<0.105.0 permits this) now pays a blocking HTTPS GET to https://{host}/.well-known/databricks-config on first connect to every host — including legacy, non-SPOG, no-?o= setups that get zero benefit. In a locked-down/air-gapped egress environment this is a guaranteed retry-budget stall on first open.
Fix: skip the probe entirely when no http_path contains ?o= (i.e. gate on extracted being non-empty), so legacy users aren't probed. Verified at decision.py:30/46 + connections.py:493.

⚠️ The PR description attributes the "opt-in via dep-ceiling bumps" to sibling PR #1474 — this PR's own pyproject.toml floors are unchanged (sdk>=0.68.0, connector>=4.1.1). Even if #1474 raises the floor, the gate remains dep-version, not host-shape — so "legacy hosts unaffected" is not true for the latency/network path regardless of where the floor lives.

F3 — Contract mismatch with the connector: cluster-path SPOG without ?o= splits data-plane vs control-plane routing [HIGH CONF · 2 reviewers · verified at the branching point]
dbt's extract_workspace_id (extract.py:7–12) reads only ?o=. The connector's _extract_spog_headers (databricks-sql-python session.py:212–226) inspects two sources in priority order: ?o= then the cluster-path segment /sql/protocolv1/o/<workspace-id>/. Consequence: on a SPOG host using an all-purpose cluster http_path without ?o=, the connector routes SQL (data-plane) correctly via the path segment, but dbt never extracts workspace_id, so the SDK Config(workspace_id=...) (control-plane: REST/Jobs/Workspace) is not plumbed: a silent split-brain, the exact thing the PR says it fixes. Worse: check_spog_preconditions will emit the "no ?o=" warning for a config the connector already handles, and the dbt test docstrings asserting "the connector only reads ?o=" are factually wrong and will mislead future maintainers.
Fix: either (a) make extract_workspace_id fall back to the /o/<workspace-id>/ path segment to match the connector, or (b) if cluster-path SPOG is out of scope, say so explicitly in the warning + CHANGELOG and correct the test docstrings. Verified at session.py:184–226.


🟡 Medium-Severity

F4 — ?o= value is never validated; asymmetric with connector's ^[0-9]+$ [MODERATE CONF · 2 reviewers]
The connector only injects the org-id header if the value matches _ORG_ID_RE = ^[0-9]+$ (session.py:218), silently dropping a non-numeric value. dbt's extract_workspace_id returns the raw string (?o=abc → "abc"), which then flows into Config(workspace_id="abc"). Result: dbt plumbs a bad value into the control plane while the connector drops it on the data plane — another silent asymmetry with no diagnostic. Fix: mirror the connector's numeric guard and warn on mismatch. Verified.
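
A rough sketch of what the suggested numeric guard could look like, assuming the policy is "warn and drop" for non-numeric values. `validated_workspace_id` and `ORG_ID_RE` are hypothetical names for illustration; the digits-only pattern is the one quoted above.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger(__name__)

# Digits only, mirroring the ^[0-9]+$ pattern quoted in the finding.
ORG_ID_RE = re.compile(r"[0-9]+")


def validated_workspace_id(raw: Optional[str]) -> Optional[str]:
    """Return raw if it is all digits; otherwise warn and return None."""
    if raw is None:
        return None
    if ORG_ID_RE.fullmatch(raw):
        return raw
    # Surface the mismatch instead of silently passing a bad value downstream.
    logger.warning("Ignoring non-numeric ?o= value %r", raw)
    return None


numeric = validated_workspace_id("1234567890")
non_numeric = validated_workspace_id("abc")
```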

F5 — Probe-failure logged at INFO while config smells log at WARNING [MODERATE CONF · 2 reviewers]
probe.py:50 logs the probe-exhaustion message (the case most likely to precede real routing failures) at logger.info, while the benign "looks like SPOG, no ?o=" smell logs at logger.warning (decision.py:49/60). The signal an operator most needs is the quietest. Fix: raise the probe-failure message to WARNING. Verified.

F6 — Existing connection-manager test now makes a live network call [MODERATE CONF · 2 reviewers]
tests/unit/test_connection_manager.py::test_open_calls_is_cluster_http_path_for_warehouse was modified to add compute = None and host = "example.cloud.databricks.com" but does not patch check_spog_preconditions / probe_host. With capable deps installed, open() reaches the real probe → live requests.get to example.cloud.databricks.com with 3 retries + real backoff. The _guard_unmocked_requests_get fixture only covers tests/unit/spog/, not this file. Fix: patch check_spog_preconditions (or stub probe_host) here, like TestOpenSpogIntegration does. Verified in diff.

F8 — Worst-case ~16s blocking connect latency on a hanging probe endpoint [MODERATE CONF · 3 reviewers]
requests.get(url, timeout=5) × 3 attempts + ~1.75s backoff ≈ ~16s of serial blocking added to first open() if /.well-known/databricks-config accepts the TCP connection but hangs. Mitigated by @cache (one probe per host per process) and non-fatal fallback, but for short-lived dbt run invocations or per-shard CI workers it's paid every run. Fix: tighten to a connect/read tuple e.g. timeout=(2, 3). (Performance scoped this LOW given caching; ops scoped it HIGH — reported at the conservative middle.) Verified.
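
The arithmetic behind that estimate, as a small worked example. `worst_case_probe_seconds` is a hypothetical helper; the 5 s timeout, 3 attempts, and ~1.75 s total backoff are the figures quoted above, and the second case assumes the TCP connection is accepted immediately so only a 3 s read timeout applies per attempt.

```python
def worst_case_probe_seconds(
    per_attempt_timeout: float, attempts: int, total_backoff: float
) -> float:
    """Serial blocking time if every attempt runs to its timeout."""
    return per_attempt_timeout * attempts + total_backoff


# timeout=5, three attempts, about 1.75 s of backoff in total.
current = worst_case_probe_seconds(5.0, 3, 1.75)

# With timeout=(2, 3) and an endpoint that accepts the connection at once but
# never responds, each attempt is bounded by the 3 s read timeout.
hanging_read = worst_case_probe_seconds(3.0, 3, 1.75)
```

Under these assumptions `current` comes to 16.75 s and `hanging_read` to 10.75 s.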

@sd-db
Collaborator Author

sd-db commented Jun 2, 2026

  • F1 - Unrelated to the current task and mostly fine
  • F2 - not required
  • F3 - Not true, this was fixed
  • F4 - Not required, we don't do the validations at our end. Better to bubble out whatever error server returns here
  • F5 - Maybe we can move everything to INFO, let me check again. EDIT - Current code is fine, not required
  • F6 - This is expected, not an issue
  • F8 - Not an issue

@sd-db sd-db merged commit 5295e13 into main Jun 2, 2026
9 checks passed
@sd-db sd-db deleted the sd-db/spog-impl branch June 2, 2026 08:25
2 participants