Add Command Health Check Endpoints#101

Open
Yash Shrivastava (alephys26) wants to merge 7 commits into main from alephys26/add-health-endpoint

@alephys26
Contributor

Background

Heimdall had no health checks for its commands. This PR introduces per-command and all-command health check endpoints, both for greater observability and to support features such as AWS blue/green deployments, which can probe a configurable test endpoint before shifting production traffic.

Changes

Health Check API and Logic:

  • Added new HTTP endpoints /command/health and /command/{id}/health to report health status for all commands or a specific command, respectively, including detailed per-command and per-cluster results.
  • Implemented the core health check logic in internal/pkg/heimdall/health.go, including parallel health probing, result aggregation, and error reporting.

Command and Plugin Structure Updates:

  • Added a HealthCheck boolean field to the Command struct to allow commands to opt in or out of health checks.
  • Introduced a HealthChecker interface in the plugin system, enabling plugins to provide custom health check logic.

Configuration and Miscellaneous:

  • Updated configs/local.yaml to set health_check: true for the test ping command, demonstrating the new feature.
  • Applied go fmt formatting changes.

Tests

Tested with the ping command on a local instance.

Test 1: /command/health

curl -s http://localhost:9090/api/v1/command/health | jq .

{
  "healthy": true,
  "checks": [
    {
      "command_id": "ping-0.0.1",
      "cluster_id": "localhost-0.0.1",
      "status": "ok",
      "latency_ms": 2
    }
  ]
}
curl -X PUT http://localhost:9090/api/v1/command/ping-0.0.1/status \
  -H 'X-Heimdall-User: test' \
  -d '{"status": "inactive"}'
{"updated_at":1775476118,"status":"INACTIVE"}
curl -s http://localhost:9090/api/v1/command/health | jq .

{
  "healthy": true,
  "checks": []
}

Test 2: /command/{id}/health

curl -s http://localhost:9090/api/v1/command/ping-0.0.1/health | jq .

{
  "healthy": true,
  "checks": [
    {
      "command_id": "ping-0.0.1",
      "cluster_id": "localhost-0.0.1",
      "status": "ok",
      "latency_ms": 0
    }
  ]
}
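For a caller such as a deployment gate, the responses above could be consumed along these lines. This is a sketch under assumptions: the readyForTraffic function and its policy of rejecting an empty checks list are hypothetical and not part of this PR. The policy is motivated by Test 1, where an inactive command yields healthy: true with no checks at all.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// check and healthReport follow the field names in the sample responses.
type check struct {
	CommandID string `json:"command_id"`
	ClusterID string `json:"cluster_id"`
	Status    string `json:"status"`
	LatencyMs int64  `json:"latency_ms"`
}

type healthReport struct {
	Healthy bool    `json:"healthy"`
	Checks  []check `json:"checks"`
}

// readyForTraffic is a hypothetical gate: it requires healthy == true,
// at least one check, and every check reporting status "ok".
func readyForTraffic(body []byte) (bool, error) {
	var r healthReport
	if err := json.Unmarshal(body, &r); err != nil {
		return false, fmt.Errorf("decode health response: %w", err)
	}
	if !r.Healthy || len(r.Checks) == 0 {
		return false, nil
	}
	for _, c := range r.Checks {
		if c.Status != "ok" {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Response bodies copied from the tests in this PR.
	withCheck := []byte(`{"healthy": true, "checks": [{"command_id": "ping-0.0.1",
		"cluster_id": "localhost-0.0.1", "status": "ok", "latency_ms": 2}]}`)
	noChecks := []byte(`{"healthy": true, "checks": []}`)

	for _, body := range [][]byte{withCheck, noChecks} {
		ok, err := readyForTraffic(body)
		fmt.Printf("ready=%v err=%v\n", ok, err)
	}
}
```

Whether an empty checks list should count as ready depends on the deployment; a stricter gate like this one avoids shifting traffic when nothing was actually probed.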

Copilot AI review requested due to automatic review settings April 6, 2026 13:10

wiz-55ccc8b716 bot commented Apr 6, 2026

Wiz Scan Summary

| Scanner | Findings |
| --- | --- |
| Vulnerabilities | - |
| Sensitive Data | - |
| Secrets | - |
| IaC Misconfigurations | - |
| SAST Findings | 1 Low |
| Software Management Findings | - |
| Total | 1 Low |

Copilot AI left a comment

Pull request overview

Adds command-level health check support to Heimdall to improve observability and enable deployment workflows (e.g., blue/green with test endpoints).

Changes:

  • Introduces new HTTP endpoints /command/health and /command/{id}/health.
  • Adds opt-in health_check flag on commands and a plugin.HealthChecker extension point.
  • Implements core health check orchestration (pair resolution, parallel probing, aggregation) and updates local config example.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/plugin/plugin.go | Adds HealthChecker interface for plugin-provided health probes. |
| pkg/object/command/command.go | Adds HealthCheck opt-in flag to command definition. |
| internal/pkg/heimdall/heimdall.go | Registers the new health check HTTP routes. |
| internal/pkg/heimdall/health.go | Implements health check resolution, probing, and response shaping. |
| configs/local.yaml | Enables health_check: true for the local ping command example. |
| internal/pkg/object/command/postgres/postgres.go | gofmt-only change to Cleanup signature formatting. |
| internal/pkg/object/command/clickhouse/column_types.go | gofmt-only formatting adjustments. |
| internal/pkg/janitor/janitor.go | gofmt-only alignment/formatting changes. |

Comment on lines +155 to +160
var err error
if hc, ok := pair.handler.(plugin.HealthChecker); ok {
	err = hc.HealthCheck(ctx, pair.cluster)
} else {
	err = h.pluginProbe(ctx, pair.cluster, pair.handler)
}

Copilot AI Apr 6, 2026

The fallback path for handlers that don’t implement plugin.HealthChecker calls handler.Execute(...) as a probe. This is risky because Execute may be stateful/side-effecting and may assume a fully-populated job.Job (some built-in commands call j.Context.Unmarshal(...) without a nil check, which would panic here). Safer options are to require HealthChecker for commands that opt into health checks, or make the fallback return a clear "health check not implemented" error instead of executing the command.
