Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
323 changes: 323 additions & 0 deletions docs/rfc/0012-github-installations.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,323 @@
---
title: "RFC 0012: First-Class GitHub Installations"
---

| Category | Status | Created | Author |
| ------------ | --------------------------------- | ---------- | ---------------- |
| Integrations | <Badge color="gray">Draft</Badge> | 2026-04-20 | Aditya Choudhari |

## Summary

Introduce a first-class `github_installation` table that binds a GitHub App
installation to a ctrlplane workspace, with creation restricted to a verified
OAuth flow. This closes a multi-tenancy gap where any workspace on a shared
ctrlplane instance can currently impersonate any GitHub installation by typing
its numeric ID into a job agent config. The RFC covers the schema, the linking
flow, and the migration of the existing job agent integration to reference
installations by foreign key.

## Motivation

### The current state

Ctrlplane runs a single GitHub App (`GITHUB_BOT_APP_ID`, `GITHUB_BOT_PRIVATE_KEY`)
that users install on their GitHub org. The installation is represented inside
`job_agent.config` as an untyped JSON blob:

```json
{
"type": "github-app",
"installationId": 12345678,
"owner": "acme-corp"
}
```

There is no dedicated schema for GitHub installations. The zod validator in
`packages/trpc/src/routes/job-agents.ts` accepts any `installationId:
z.number()` without cross-checking it against the calling workspace or the
calling user's GitHub identity. At runtime, the workspace engine mints an
installation token using the App's private key and whatever `installationId`
the job agent config carries — the server has no way to know whether the
workspace should legitimately have access to that installation.

### The multi-tenancy gap

Installation IDs are not secret. They are visible in the URL of any GitHub App
installation settings page (`https://github.com/organizations/<org>/settings/installations/<id>`)
and are sequential integers.

On a shared ctrlplane instance, the following attack is trivial:

1. A user in Workspace B discovers (or guesses) the installation ID of
Workspace A's GitHub org.
2. The user creates a job agent in Workspace B with that installation ID.
3. Ctrlplane's server, holding the App's private key, successfully mints an
installation token for Workspace A's org.
4. Workspace B can now list repos, dispatch workflows, and read repo metadata
for Workspace A's org.

The only implicit defense today is "you have to know the installation ID,"
which is not an auth boundary.

### Why the fix has to happen at link-time

The workspace engine's GitHub dispatcher at
`apps/workspace-engine/pkg/jobagents/github/workflow_dispatcher.go` is a thin
wrapper that calls `gh.CreateClientForInstallation(ctx, cfg.InstallationId)`
with whatever `InstallationId` is in the config. Runtime validation in the
engine is the wrong layer — by then, the config has already been persisted and
the damage is done. The check must happen at the point of creation, i.e.
when a workspace first claims an installation.

## Goals

- Give GitHub installations a first-class table scoped to a workspace.
- Eliminate free-text `installationId` input from every user-facing surface.
- Require GitHub-side proof of admin rights (via OAuth + `GET /user/installations`)
before a link is accepted.
- Allow one installation to be linked by multiple workspaces, as long as each
link is independently verified.
- Migrate the existing GitHub Actions job agent to reference installations by
FK instead of duplicating the ID in JSON.

## Non-Goals

- A CLI flow for linking. The UI flow is the v1 surface; CLI is a follow-up
that plugs into the same backend handler.
- Supporting user-level GitHub OAuth as a general-purpose login mechanism. The
OAuth token acquired during linking is used only to verify installation
access for that single request and is discarded immediately after.
- Terraform-based creation of installation rows. Linking requires a human
browser session; a data-source or CLI-assisted workflow will follow.

## Proposal

### Schema

```sql
CREATE TABLE github_installation (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id uuid NOT NULL REFERENCES workspace(id) ON DELETE CASCADE,

installation_id bigint NOT NULL, -- GitHub's numeric ID
owner text NOT NULL, -- GitHub org or user login
account_type text NOT NULL, -- 'Organization' | 'User'

created_by_user_id uuid NOT NULL REFERENCES "user"(id),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Resolve created_by_user_id nullability before implementation.

created_by_user_id is NOT NULL (Line 106), but the migration plan uses manual inserts (Line 282) and the RFC still leaves nullability undecided (Line 305). This is a schema/migration conflict that should be decided in the RFC now to avoid blocking rollout or forcing fake attribution.

Also applies to: 282-285, 305-307

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/rfc/0012-github-installations.mdx` at line 106, The RFC defines the
column created_by_user_id as NOT NULL but the migration plan uses manual inserts
without guaranteeing a value and the RFC text leaves nullability undecided; pick
and document a single resolution: either change the schema to allow
created_by_user_id NULL (alter the CREATE TABLE/column definition where
created_by_user_id uuid is declared) and update the migration steps to accept
NULL for legacy rows, or keep NOT NULL and add a deterministic attribution
strategy in the migration plan and RFC text (e.g., set a specific system user id
or map an existing user) and ensure every manual INSERT/seed referenced in the
migration plan supplies created_by_user_id; update all mentions in the RFC to
reflect the decided behavior.

created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),

UNIQUE (workspace_id, installation_id)
);

CREATE INDEX github_installation_installation_id_idx
ON github_installation(installation_id);
```

Key points:

- **`UNIQUE (workspace_id, installation_id)`** — one installation can be
linked to many workspaces, but each `(workspace, installation)` pair exists
at most once. A shared-org scenario (e.g., dev workspace and prod workspace
both watching the same repos) is supported natively.
- **No global uniqueness on `installation_id`** — multiple workspaces
legitimately share a real-world org installation.
- **`created_by_user_id`** — records which ctrlplane user completed the link.
Used for audit and for the per-user OAuth verification flow.

### Linking flow

The **only** backend entrypoint that creates rows in `github_installation` is a
single Setup URL handler. No tRPC mutation, no REST endpoint, no job-agent
config path accepts a raw `installationId`.

```
apps/api/src/routes/github/setup.ts
```

**Step 1 — User initiates the link from the UI.**
A "Connect GitHub" action in workspace settings opens:

```
https://github.com/apps/<app-slug>/installations/new?state=<signedJWT>
```

The `state` is a short-lived (≤5 min), single-use, HMAC-signed token containing
`{ workspaceId, userId, nonce }`. This binds the subsequent redirect to the
initiating ctrlplane session and prevents CSRF.

**Step 2 — GitHub install UX happens on github.com.**
The user picks an org, selects which repos to grant access to, and installs
(or reconfigures) the App. GitHub redirects the browser to ctrlplane's Setup
URL with `?installation_id=<id>&setup_action=install&state=<jwt>`.

**Step 3 — Ctrlplane verifies state and requires GitHub user OAuth.**

1. Verify `state` JWT signature, expiry, and single-use nonce.
2. Confirm the signed-in ctrlplane session matches `userId` in the state.
3. Redirect the user through the GitHub OAuth authorize endpoint to obtain a
GitHub user access token (scope: minimal, enough to call
`GET /user/installations`). The token is held only for the duration of the
current linking request — it is **not persisted**. Once verification
completes (success or failure), the token is discarded along with the
request context.
4. Call `GET /user/installations` with the user's OAuth token. If the
`installation_id` from the redirect is not in the returned list, **reject
the link**. This is the teeth of the check: even if a malicious user
replays a redirect with someone else's `installation_id`, the API will not
list an installation they don't admin.

**Step 4 — Fetch installation metadata and insert.**
Call `GET /app/installations/<id>` with App JWT auth to fetch `account.login`
(`owner`), `account.type` (`accountType`), and other metadata. Insert one row
into `github_installation` scoped to the workspace.

### Why this flow is sufficient

- **No free-text input path exists.** The API never accepts `installationId`
as user input. There is nothing to spoof.
- **`state` handles CSRF.** A user in Workspace B cannot mint a valid `state`
for Workspace A.
- **User-level OAuth handles authorization.** Even if a `state` is somehow
intercepted, the final `GET /user/installations` check fails unless the
signed-in user is an admin of the target org — in which case they could
install the App on that org themselves anyway.
- **`UNIQUE (workspace_id, installation_id)`** blocks duplicate claims within a
workspace and permits legitimate cross-workspace sharing.

### Migration of the existing job agent

Today, `job_agent.config` for type `github-app` contains
`{ installationId, owner }` inline. After this RFC:

1. Add column `github_installation_id uuid NULL REFERENCES github_installation(id)`
to `job_agent` (or continue storing it inside `config` as a UUID — TBD
during implementation).
2. Update the zod schema in `packages/trpc/src/routes/job-agents.ts` to accept
`{ type: "github-app", githubInstallationId: z.string().uuid() }` instead
of `{ installationId, owner }`. The dropdown in the job agent creation UI
becomes "pick from your workspace's linked installations" rather than a
free-text ID field.
3. Update the workspace-engine dispatcher to resolve the `installationId`
through a getter keyed on `githubInstallationId`, scoped to the job's
workspace. The engine never sees a raw installation ID supplied by user
input.

### UI changes

- **Workspace settings → Integrations → GitHub**: a new page listing linked
installations, with "Connect GitHub" triggering the flow above and
"Disconnect" removing the workspace's row (without uninstalling the App on
GitHub).
- **Job agent creation (GitHub App type)**: the `installationId` free-text
input is removed. Replace with a dropdown populated from
`github_installation` rows in the current workspace. If the list is empty,
link to the GitHub integrations page.

### Webhook handling

Unchanged by this RFC. The existing `workflow_run` handler at
`apps/api/src/routes/github/workflow_run.ts` continues to operate on
`installation.id` from the webhook payload. Since `installation.id` comes from
GitHub (not from a workspace), no trust change is needed there.

Lifecycle events (`installation.deleted`, `installation_repositories.*`) will
be handled alongside this RFC to keep `github_installation` rows in sync when
users uninstall the App or change repo access on GitHub.

## Alternatives Considered

### 1. Keep installation data in `job_agent.config`, add a validation step

Rejected. Layering validation on top of a free-text schema means every new
caller must remember to apply the check. A malformed or forgotten validation
in any future code path reopens the gap. Making the schema itself the source
of truth eliminates the class of bug.

### 2. Use installation ID as a globally unique primary key

Rejected. Forbids the legitimate case of one GitHub org serving multiple
ctrlplane workspaces (dev/staging/prod separation, multi-team monorepos).
`UNIQUE (workspace_id, installation_id)` captures the right invariant.

### 3. Rely on the `state` CSRF token alone, skip GitHub OAuth

Rejected. The `state` parameter proves the redirect was initiated from a
ctrlplane session but does not prove the ctrlplane user has admin rights on
the target org. Without the `GET /user/installations` check, a determined
user can still initiate an install flow, swap the `installation_id` in the
redirect (if they intercept it), and claim an installation they don't own.
OAuth verification is what makes the boundary real.

### 4. Allow Terraform / API creation of installation rows

Rejected for v1. The verification fundamentally requires a browser session
for GitHub OAuth; any programmatic path would need to either skip the check
(defeating the purpose) or accept pre-issued OAuth tokens (significant added
complexity and credential-handling surface). IaC users can still manage
everything *downstream* of the installation (job agents, deployments,
deployment sources) via Terraform once the installation row exists.

## Tradeoffs

### Pros

- Closes the cross-workspace installation access gap.
- Existing `job_agent.config` shape is simplified.
- Clear audit trail: `created_by_user_id` + timestamp on every link.

### Cons

- Introduces a user-level GitHub OAuth flow that ctrlplane did not previously
need. Additional client ID/secret management for the App's OAuth surface.
- Terraform and API users cannot create installations programmatically; they
must complete the UI flow once per workspace per installation.

## Migration Strategy

Ctrlplane is still in an internal-only phase, so the set of existing GitHub
installations in the wild is small and known. There is no need for an
automated backfill.

1. Add the `github_installation` table as a purely additive schema change.
Nothing else in the system references it yet.
2. Ship the UI + Setup URL handler so new links go through the verified flow.
3. **Manually re-link each existing installation through the UI.** For the
handful of existing internal workspaces that have a GitHub installation
in their `job_agent.config` today, an authorized user in each workspace
clicks "Connect GitHub" and completes the same verified flow. This
produces a `github_installation` row with correct `created_by_user_id`
and audit metadata — no raw SQL inserts, no nullable-column carve-outs.
4. Once all known installations have corresponding `github_installation`
rows, flip the job agent config schema to require `githubInstallationId`
(the FK) and update the workspace-engine dispatcher to resolve via getter.
At this point the raw `installationId` free-text path is removed.
5. Remove the old `{ installationId, owner }` JSON shape from `job_agent.config`.

Manual re-linking is acceptable precisely because the population is tiny and
each re-link takes less than a minute. If ctrlplane were public or had many
installations, we'd want an automated backfill instead; at current scale,
going through the UI is simpler than writing migration code.

## Open Questions

1. **OAuth client.** Does the GitHub App already have OAuth credentials
configured (`GITHUB_BOT_CLIENT_ID`/`_CLIENT_SECRET`), or do we need a
separate OAuth app registration? If the App's OAuth surface is sufficient,
we avoid an extra credential.
2. **Unlink semantics.** When a workspace disconnects an installation, should
linked job agents be disabled, deleted, or left in a broken state pointing
at a now-missing FK? Soft-disable seems safest.
3. **Reinstall handling.** If a user uninstalls the App on GitHub and
reinstalls it later, GitHub may issue a new `installation_id`. Should
ctrlplane detect this (via `installation.deleted` + subsequent
`installation.created` events from the same `account.id`) and offer a
"relink" UX, or require the user to re-run the flow manually?

## Conclusion

The GitHub integration today conflates "the App can reach this installation"
with "this workspace is entitled to use this installation." Adding a
first-class `github_installation` table, with a verified link flow as the
only creation path, separates the two and closes a real multi-tenancy gap on
shared instances. The work is small, contained, and additive.
Loading