Deployment-aware workflow orchestration for self-hosted Kubernetes environments.
Platformatic World solves the version-pinning problem for Workflow DevKit: when new code deploys, in-flight workflow runs must continue executing on the code version that started them. The Vercel world handles this via Vercel's infrastructure. Platformatic World provides the same guarantees for self-hosted environments by routing queue messages through a central service that pins each run to its originating deployment version.
graph LR
PodV1["Pod v1"]
PodV2["Pod v2"]
ICC["ICC"]
WF["Workflow Service
Fastify"]
PG[("PostgreSQL")]
PodV1 <-->|"HTTP/REST"| WF
PodV2 <-->|"HTTP/REST"| WF
ICC -->|"Admin API"| WF
WF -->|"SQL"| PG
style WF fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a5f
style PG fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a5f
style PodV1 fill:#d1fae5,stroke:#16a34a,stroke-width:2px,color:#14532d
style PodV2 fill:#d1fae5,stroke:#16a34a,stroke-width:2px,color:#14532d
style ICC fill:#fef3c7,stroke:#d97706,stroke-width:2px,color:#78350f
Two packages:
@platformatic/workflow(packages/workflow/) -- Fastify REST API that owns storage, queue routing, and deployment lifecycle. Multi-tenant with per-app isolation.@platformatic/world(packages/world/) -- Thin HTTP client implementing the@workflow/worldWorldinterface. Drop-in replacement for other world implementations.
The e2e/ directory contains a Next.js test app and end-to-end test suites (57 Vercel-compatible tests + our own integration tests).
- Node.js >= 22.19.0
- PostgreSQL 17
- pnpm >= 10
# Start PostgreSQL
docker run -d --name workflow-pg -e POSTGRES_USER=wf -e POSTGRES_PASSWORD=wf -e POSTGRES_DB=workflow -p 5434:5432 postgres:17-alpine
# Start the workflow service
npx @platformatic/workflow postgresql://wf:wf@localhost:5434/workflow# Clone and install
git clone https://github.com/platformatic/platformatic-world.git
cd platformatic-world
pnpm install
# Start PostgreSQL
docker compose up -d
# Start the workflow service (single-tenant mode, no auth)
cd packages/workflow
npx wattpm startThe service starts on http://localhost:3042 by default. Without K8s, it runs in single-tenant mode — no authentication, one implicit application.
Apps using the Vercel Workflow SDK need a queue handler registered on startup. In Kubernetes with ICC this is automatic, but for local development your app must call world.start() itself.
For Next.js, create an instrumentation.ts in your project root:
// instrumentation.ts
export async function register() {
if (process.env.PLT_WORLD_SERVICE_URL) {
const { createWorld } = await import('@platformatic/world')
const world = createWorld()
await world.start?.()
}
}Then start your app:
WORKFLOW_TARGET_WORLD=@platformatic/world \
PLT_WORLD_SERVICE_URL=http://localhost:3042 \
PORT=3000 \
npx next start -p 3000For other frameworks, call world.start() during your server's startup. See the User Guide for details.
import { createPlatformaticWorld } from '@platformatic/world'
const world = createPlatformaticWorld({
serviceUrl: 'http://localhost:3042',
appId: 'default',
deploymentVersion: 'v1',
})
// Create a workflow run
const { run } = await world.events.create(null, {
eventType: 'run_created',
eventData: {
workflowName: 'my-workflow',
deploymentId: 'v1',
input: { key: 'value' },
},
})
// Queue a message (routed to the correct deployment version)
await world.queue('__wkf_workflow_my-workflow', { runId: run.runId })
// Clean up
await world.close()| Environment Variable | Default | Description |
|---|---|---|
DATABASE_URL |
postgresql://wf:wf@localhost:5434/workflow |
PostgreSQL connection string |
PORT |
3042 |
HTTP listen port |
HOST |
0.0.0.0 |
HTTP listen host |
K8S_API_SERVER |
https://kubernetes.default.svc |
Kubernetes API server URL (multi-tenant only) |
K8S_CA_CERT |
/var/run/secrets/kubernetes.io/serviceaccount/ca.crt |
Path to K8s CA certificate (multi-tenant only) |
K8S_ADMIN_SERVICE_ACCOUNT |
K8s service account with admin access, format namespace:name (e.g. platformatic:icc) |
interface PlatformaticWorldConfig {
serviceUrl: string // Workflow Service base URL
appId: string // Application ID
deploymentVersion: string // Current deployment version
}The service auto-detects its operating mode. If a K8s service account token is present, it starts in multi-tenant mode with authentication. Otherwise, it starts in single-tenant mode with no auth.
No authentication. A single implicit application is auto-provisioned. Just set PLT_WORLD_SERVICE_URL and go.
Pods authenticate with their projected K8s ServiceAccount tokens. The service validates them via the K8s TokenReview API and maps the ServiceAccount identity to an application.
Admin endpoints (app provisioning, draining, version management) require a K8s identity configured as the admin service account via K8S_ADMIN_SERVICE_ACCOUNT (e.g. platformatic:icc).
Every authenticated request resolves to an application_id. All SQL queries include WHERE application_id = $appId, enforcing row-level isolation between tenants.
All app-scoped endpoints are prefixed with /api/v1/apps/:appId.
| Method | Path | Description |
|---|---|---|
POST |
/runs/:runId/events |
Create an event (main write path) |
GET |
/runs/:runId/events |
List events for a run |
GET |
/events/by-correlation |
List events by correlation ID |
Supported event types: run_created, run_started, run_completed, run_failed, run_cancelled, run_expired, step_created, step_started, step_completed, step_failed, step_retrying, hook_created, hook_received, hook_disposed, wait_created, wait_completed.
| Method | Path | Description |
|---|---|---|
GET |
/runs/:runId |
Get run by ID |
GET |
/runs |
List runs (filters: status, workflowName, deploymentId) |
POST |
/runs/:runId/replay |
Replay a completed run (creates new run with same input, targets original version) |
POST |
/runs/:runId/cancel |
Cancel a running run |
POST |
/runs/:runId/wake-up |
Cancel active sleeps for a run |
GET |
/workflows/:workflowName/template |
Get step template from most recent completed run (query: deploymentId) |
| Method | Path | Description |
|---|---|---|
GET |
/runs/:runId/steps/:stepId |
Get step by ID |
GET |
/runs/:runId/steps |
List steps for a run |
| Method | Path | Description |
|---|---|---|
GET |
/hooks/:hookId |
Get hook by ID |
GET |
/hooks/by-token/:token |
Get hook by token |
GET |
/hooks |
List hooks (filter: runId) |
| Method | Path | Description |
|---|---|---|
PUT |
/runs/:runId/streams/:name |
Write chunk(s) to a stream |
GET |
/streams/:name |
Read stream chunks |
GET |
/runs/:runId/streams |
List stream names for a run |
| Method | Path | Description |
|---|---|---|
POST |
/queue |
Enqueue a message |
Supports delaySeconds for deferred delivery and idempotencyKey for deduplication.
| Method | Path | Description |
|---|---|---|
POST |
/handlers |
Register a pod's queue handler endpoints |
DELETE |
/handlers/:podId |
Deregister a pod |
| Method | Path | Description |
|---|---|---|
GET |
/encryption-key |
Get per-run encryption key (HKDF-derived) |
| Method | Path | Description |
|---|---|---|
GET |
/dead-letters |
List dead-lettered messages |
POST |
/dead-letters/:messageId/retry |
Retry a dead-lettered message |
| Method | Path | Description |
|---|---|---|
POST |
/api/v1/apps |
Provision application |
POST |
/api/v1/apps/:appId/k8s-binding |
Create K8s ServiceAccount binding |
DELETE |
/api/v1/apps/:appId/k8s-binding |
Remove K8s binding |
GET |
/api/v1/apps/:appId/versions/:deploymentId/status |
Get version draining status |
POST |
/api/v1/apps/:appId/versions/:deploymentId/expire |
Force-expire a deployment version |
POST |
/api/v1/versions/notify |
Notify version status change |
GET |
/api/v1/apps/:appId/quotas |
Get quotas for an app (returns defaults if none set) |
PUT |
/api/v1/apps/:appId/quotas |
Set/update quotas (maxRuns, maxEventsPerRun, maxQueuePerMinute) |
| Method | Path | Auth | Description |
|---|---|---|---|
GET |
/ready |
No | Database connectivity check |
GET |
/status |
No | Service status |
GET |
/metrics |
No | Prometheus metrics |
The queue router pins messages to deployment versions:
- Each message carries a
deployment_versionfrom the run that created it - The router looks up registered handlers for that version
- Messages are dispatched via HTTP POST to the correct pod
- If a version is expired, messages are rejected
Messages with delaySeconds > 0 are stored with status='deferred' and a deliver_at timestamp. A background poller promotes them to pending when due.
Failed dispatches use exponential backoff: min(1000ms * 2^attempt, 60000ms), up to 10 attempts. After max attempts, messages move to dead status.
The poller detects runs stuck in running for over 15 minutes with no queued messages, marking them as failed with an ORPHANED error code.
The service provides APIs for ICC to manage deployment lifecycle:
- Version notification -- ICC notifies the service when a deployment version changes status (
active,draining,expired) - Draining status -- ICC queries the service for authoritative counts of active runs, pending hooks, pending waits, and queued messages for a version
- Force-expire -- ICC can force-expire a version, which cancels all in-flight runs, dead-letters queued messages, and deregisters handlers
This gives ICC a single authoritative source for "are there any non-terminal workflow runs for version X?" -- something that cannot be determined from pod heartbeats or queue depth alone, because hooks and waits are invisible at the infrastructure level.
Per-application quotas (configurable via the admin API or the workflow_app_quotas table):
| Quota | Default | Description |
|---|---|---|
max_runs |
10,000 | Maximum concurrent active runs |
max_events_per_run |
10,000 | Maximum events per run |
max_queue_per_minute |
1,000 | Queue message rate limit per minute |
Exceeding a quota returns HTTP 429.
The /metrics endpoint returns Prometheus-compatible metrics provided by the Platformatic runtime (HTTP request duration, status codes, Node.js runtime stats).
# Start PostgreSQL (port 5434)
docker compose up -d
# Install dependencies
pnpm install
# Run all unit/integration tests (87 workflow + 12 world)
pnpm test
# Run Vercel-compatible e2e tests (57 tests — requires PostgreSQL on port 5434)
cd e2e && node --test --test-force-exit test/vercel-e2e.test.ts
# Run our own e2e tests
cd e2e && node --test --test-force-exit test/workflow.test.tspackages/
workflow/
cli.js # CLI entrypoint (npx @platformatic/workflow)
watt.json # Platformatic Service configuration (dist/plugins for production)
watt-dev.json # Dev configuration (./plugins for local development)
lib/
db.ts # pg.Pool + Postgrator migrations
errors.ts # Typed HTTP errors (@fastify/error)
quotas.ts # Quota checks + rate limiting
auth/
index.ts # Auth plugin (onRequest hook)
k8s-token.ts # K8s ServiceAccount token validation
plugins/
db.ts # Database + auth initialization
auth.ts # Auth wiring
apps.ts # App provisioning + K8s bindings
events.ts # Event creation (main write path)
runs.ts # Run queries + workflow template API
run-actions.ts # Replay, cancel, wake-up
steps.ts # Step queries
hooks.ts # Hook queries
streams.ts # Stream read/write
queue.ts # Queue message ingestion
poller.ts # Poller lifecycle management
encryption.ts # Per-run encryption keys
handlers.ts # Pod handler registration
draining.ts # Version draining status + force-expire
versions.ts # Version status notifications
dead-letters.ts # Dead-letter management
quotas.ts # Quota admin API (GET/PUT)
queue/
router.ts # Deployment-aware message routing
dispatcher.ts # HTTP dispatch to pods
poller.ts # Deferred delivery + retry + orphan detection
retry.ts # Exponential backoff
migrations/
001.do.sql # Full schema (auth, core, queue, encryption, quotas)
test/ # 87 tests across 19 suites
world/
src/
index.ts # createPlatformaticWorld() + createWorld() factories
lib/
client.ts # undici Pool HTTP client
storage.ts # Storage interface (runs, events, steps, hooks)
queue.ts # Queue + handler registration
streamer.ts # Stream read/write
encryption.ts # Encryption key fetching
test/ # 12 tests
See PLATFORMATIC-WORLD-DESIGN.md for the full design rationale, including:
- Why all operations go through a central service (hooks and waits are invisible at the infrastructure level)
- Deployment-aware routing semantics
- Upgrade safety guarantees
- Database schema design
See UPGRADE-SEMANTICS.md for the analysis of Workflow DevKit's deterministic replay and why version pinning is required.