diff --git a/docs/rfc/0012-github-installations.mdx b/docs/rfc/0012-github-installations.mdx new file mode 100644 index 000000000..f06b51c6f --- /dev/null +++ b/docs/rfc/0012-github-installations.mdx @@ -0,0 +1,323 @@ +--- +title: "RFC 0012: First-Class GitHub Installations" +--- + +| Category | Status | Created | Author | +| ------------ | --------------------------------- | ---------- | ---------------- | +| Integrations | Draft | 2026-04-20 | Aditya Choudhari | + +## Summary + +Introduce a first-class `github_installation` table that binds a GitHub App +installation to a ctrlplane workspace, with creation restricted to a verified +OAuth flow. This closes a multi-tenancy gap where any workspace on a shared +ctrlplane instance can currently impersonate any GitHub installation by typing +its numeric ID into a job agent config. The RFC covers the schema, the linking +flow, and the migration of the existing job agent integration to reference +installations by foreign key. + +## Motivation + +### The current state + +Ctrlplane runs a single GitHub App (`GITHUB_BOT_APP_ID`, `GITHUB_BOT_PRIVATE_KEY`) +that users install on their GitHub org. The installation is represented inside +`job_agent.config` as an untyped JSON blob: + +```json +{ + "type": "github-app", + "installationId": 12345678, + "owner": "acme-corp" +} +``` + +There is no dedicated schema for GitHub installations. The zod validator in +`packages/trpc/src/routes/job-agents.ts` accepts any `installationId: +z.number()` without cross-checking it against the calling workspace or the +calling user's GitHub identity. At runtime, the workspace engine mints an +installation token using the App's private key and whatever `installationId` +the job agent config carries — the server has no way to know whether the +workspace should legitimately have access to that installation. + +### The multi-tenancy gap + +Installation IDs are not secret. They are visible in the URL of any GitHub App +installation settings page (`https://github.com/organizations//settings/installations/`) +and are sequential integers. + +On a shared ctrlplane instance, the following attack is trivial: + +1. A user in Workspace B discovers (or guesses) the installation ID of + Workspace A's GitHub org. +2. The user creates a job agent in Workspace B with that installation ID. +3. Ctrlplane's server, holding the App's private key, successfully mints an + installation token for Workspace A's org. +4. Workspace B can now list repos, dispatch workflows, and read repo metadata + for Workspace A's org. + +The only implicit defense today is "you have to know the installation ID," +which is not an auth boundary. + +### Why the fix has to happen at link-time + +The workspace engine's GitHub dispatcher at +`apps/workspace-engine/pkg/jobagents/github/workflow_dispatcher.go` is a thin +wrapper that calls `gh.CreateClientForInstallation(ctx, cfg.InstallationId)` +with whatever `InstallationId` is in the config. Runtime validation in the +engine is the wrong layer — by then, the config has already been persisted and +the damage is done. The check must happen at the point of creation, i.e. +when a workspace first claims an installation. + +## Goals + +- Give GitHub installations a first-class table scoped to a workspace. +- Eliminate free-text `installationId` input from every user-facing surface. +- Require GitHub-side proof of admin rights (via OAuth + `GET /user/installations`) + before a link is accepted. +- Allow one installation to be linked by multiple workspaces, as long as each + link is independently verified. +- Migrate the existing GitHub Actions job agent to reference installations by + FK instead of duplicating the ID in JSON. + +## Non-Goals + +- A CLI flow for linking. The UI flow is the v1 surface; CLI is a follow-up + that plugs into the same backend handler. +- Supporting user-level GitHub OAuth as a general-purpose login mechanism. The + OAuth token acquired during linking is used only to verify installation + access for that single request and is discarded immediately after. +- Terraform-based creation of installation rows. Linking requires a human + browser session; a data-source or CLI-assisted workflow will follow. + +## Proposal + +### Schema + +```sql +CREATE TABLE github_installation ( + id uuid PRIMARY KEY DEFAULT gen_random_uuid(), + workspace_id uuid NOT NULL REFERENCES workspace(id) ON DELETE CASCADE, + + installation_id bigint NOT NULL, -- GitHub's numeric ID + owner text NOT NULL, -- GitHub org or user login + account_type text NOT NULL, -- 'Organization' | 'User' + + created_by_user_id uuid NOT NULL REFERENCES "user"(id), + created_at timestamptz NOT NULL DEFAULT now(), + updated_at timestamptz NOT NULL DEFAULT now(), + + UNIQUE (workspace_id, installation_id) +); + +CREATE INDEX github_installation_installation_id_idx + ON github_installation(installation_id); +``` + +Key points: + +- **`UNIQUE (workspace_id, installation_id)`** — one installation can be + linked to many workspaces, but each `(workspace, installation)` pair exists + at most once. A shared-org scenario (e.g., dev workspace and prod workspace + both watching the same repos) is supported natively. +- **No global uniqueness on `installation_id`** — multiple workspaces + legitimately share a real-world org installation. +- **`created_by_user_id`** — records which ctrlplane user completed the link. + Used for audit and for the per-user OAuth verification flow. + +### Linking flow + +The **only** backend entrypoint that creates rows in `github_installation` is a +single Setup URL handler. No tRPC mutation, no REST endpoint, no job-agent +config path accepts a raw `installationId`. + +``` +apps/api/src/routes/github/setup.ts +``` + +**Step 1 — User initiates the link from the UI.** +A "Connect GitHub" action in workspace settings opens: + +``` +https://github.com/apps//installations/new?state= +``` + +The `state` is a short-lived (≤5 min), single-use, HMAC-signed token containing +`{ workspaceId, userId, nonce }`. This binds the subsequent redirect to the +initiating ctrlplane session and prevents CSRF. + +**Step 2 — GitHub install UX happens on github.com.** +The user picks an org, selects which repos to grant access to, and installs +(or reconfigures) the App. GitHub redirects the browser to ctrlplane's Setup +URL with `?installation_id=&setup_action=install&state=`. + +**Step 3 — Ctrlplane verifies state and requires GitHub user OAuth.** + +1. Verify `state` JWT signature, expiry, and single-use nonce. +2. Confirm the signed-in ctrlplane session matches `userId` in the state. +3. Redirect the user through the GitHub OAuth authorize endpoint to obtain a + GitHub user access token (scope: minimal, enough to call + `GET /user/installations`). The token is held only for the duration of the + current linking request — it is **not persisted**. Once verification + completes (success or failure), the token is discarded along with the + request context. +4. Call `GET /user/installations` with the user's OAuth token. If the + `installation_id` from the redirect is not in the returned list, **reject + the link**. This is the teeth of the check: even if a malicious user + replays a redirect with someone else's `installation_id`, the API will not + list an installation they don't admin. + +**Step 4 — Fetch installation metadata and insert.** +Call `GET /app/installations/` with App JWT auth to fetch `account.login` +(`owner`), `account.type` (`accountType`), and other metadata. Insert one row +into `github_installation` scoped to the workspace. + +### Why this flow is sufficient + +- **No free-text input path exists.** The API never accepts `installationId` + as user input. There is nothing to spoof. +- **`state` handles CSRF.** A user in Workspace B cannot mint a valid `state` + for Workspace A. +- **User-level OAuth handles authorization.** Even if a `state` is somehow + intercepted, the final `GET /user/installations` check fails unless the + signed-in user is an admin of the target org — in which case they could + install the App on that org themselves anyway. +- **`UNIQUE (workspace_id, installation_id)`** blocks duplicate claims within a + workspace and permits legitimate cross-workspace sharing. + +### Migration of the existing job agent + +Today, `job_agent.config` for type `github-app` contains +`{ installationId, owner }` inline. After this RFC: + +1. Add column `github_installation_id uuid NULL REFERENCES github_installation(id)` + to `job_agent` (or continue storing it inside `config` as a UUID — TBD + during implementation). +2. Update the zod schema in `packages/trpc/src/routes/job-agents.ts` to accept + `{ type: "github-app", githubInstallationId: z.string().uuid() }` instead + of `{ installationId, owner }`. The dropdown in the job agent creation UI + becomes "pick from your workspace's linked installations" rather than a + free-text ID field. +3. Update the workspace-engine dispatcher to resolve the `installationId` + through a getter keyed on `githubInstallationId`, scoped to the job's + workspace. The engine never sees a raw installation ID supplied by user + input. + +### UI changes + +- **Workspace settings → Integrations → GitHub**: a new page listing linked + installations, with "Connect GitHub" triggering the flow above and + "Disconnect" removing the workspace's row (without uninstalling the App on + GitHub). +- **Job agent creation (GitHub App type)**: the `installationId` free-text + input is removed. Replace with a dropdown populated from + `github_installation` rows in the current workspace. If the list is empty, + link to the GitHub integrations page. + +### Webhook handling + +Unchanged by this RFC. The existing `workflow_run` handler at +`apps/api/src/routes/github/workflow_run.ts` continues to operate on +`installation.id` from the webhook payload. Since `installation.id` comes from +GitHub (not from a workspace), no trust change is needed there. + +Lifecycle events (`installation.deleted`, `installation_repositories.*`) will +be handled alongside this RFC to keep `github_installation` rows in sync when +users uninstall the App or change repo access on GitHub. + +## Alternatives Considered + +### 1. Keep installation data in `job_agent.config`, add a validation step + +Rejected. Layering validation on top of a free-text schema means every new +caller must remember to apply the check. A malformed or forgotten validation +in any future code path reopens the gap. Making the schema itself the source +of truth eliminates the class of bug. + +### 2. Use installation ID as a globally unique primary key + +Rejected. Forbids the legitimate case of one GitHub org serving multiple +ctrlplane workspaces (dev/staging/prod separation, multi-team monorepos). +`UNIQUE (workspace_id, installation_id)` captures the right invariant. + +### 3. Rely on the `state` CSRF token alone, skip GitHub OAuth + +Rejected. The `state` parameter proves the redirect was initiated from a +ctrlplane session but does not prove the ctrlplane user has admin rights on +the target org. Without the `GET /user/installations` check, a determined +user can still initiate an install flow, swap the `installation_id` in the +redirect (if they intercept it), and claim an installation they don't own. +OAuth verification is what makes the boundary real. + +### 4. Allow Terraform / API creation of installation rows + +Rejected for v1. The verification fundamentally requires a browser session +for GitHub OAuth; any programmatic path would need to either skip the check +(defeating the purpose) or accept pre-issued OAuth tokens (significant added +complexity and credential-handling surface). IaC users can still manage +everything *downstream* of the installation (job agents, deployments, +deployment sources) via Terraform once the installation row exists. + +## Tradeoffs + +### Pros + +- Closes the cross-workspace installation access gap. +- Existing `job_agent.config` shape is simplified. +- Clear audit trail: `created_by_user_id` + timestamp on every link. + +### Cons + +- Introduces a user-level GitHub OAuth flow that ctrlplane did not previously + need. Additional client ID/secret management for the App's OAuth surface. +- Terraform and API users cannot create installations programmatically; they + must complete the UI flow once per workspace per installation. + +## Migration Strategy + +Ctrlplane is still in an internal-only phase, so the set of existing GitHub +installations in the wild is small and known. There is no need for an +automated backfill. + +1. Add the `github_installation` table as a purely additive schema change. + Nothing else in the system references it yet. +2. Ship the UI + Setup URL handler so new links go through the verified flow. +3. **Manually re-link each existing installation through the UI.** For the + handful of existing internal workspaces that have a GitHub installation + in their `job_agent.config` today, an authorized user in each workspace + clicks "Connect GitHub" and completes the same verified flow. This + produces a `github_installation` row with correct `created_by_user_id` + and audit metadata — no raw SQL inserts, no nullable-column carve-outs. +4. Once all known installations have corresponding `github_installation` + rows, flip the job agent config schema to require `githubInstallationId` + (the FK) and update the workspace-engine dispatcher to resolve via getter. + At this point the raw `installationId` free-text path is removed. +5. Remove the old `{ installationId, owner }` JSON shape from `job_agent.config`. + +Manual re-linking is acceptable precisely because the population is tiny and +each re-link takes less than a minute. If ctrlplane were public or had many +installations, we'd want an automated backfill instead; at current scale, +going through the UI is simpler than writing migration code. + +## Open Questions + +1. **OAuth client.** Does the GitHub App already have OAuth credentials + configured (`GITHUB_BOT_CLIENT_ID`/`_CLIENT_SECRET`), or do we need a + separate OAuth app registration? If the App's OAuth surface is sufficient, + we avoid an extra credential. +2. **Unlink semantics.** When a workspace disconnects an installation, should + linked job agents be disabled, deleted, or left in a broken state pointing + at a now-missing FK? Soft-disable seems safest. +3. **Reinstall handling.** If a user uninstalls the App on GitHub and + reinstalls it later, GitHub may issue a new `installation_id`. Should + ctrlplane detect this (via `installation.deleted` + subsequent + `installation.created` events from the same `account.id`) and offer a + "relink" UX, or require the user to re-run the flow manually? + +## Conclusion + +The GitHub integration today conflates "the App can reach this installation" +with "this workspace is entitled to use this installation." Adding a +first-class `github_installation` table, with a verified link flow as the +only creation path, separates the two and closes a real multi-tenancy gap on +shared instances. The work is small, contained, and additive.