Sidecar config-apply should populate chain-id in client.toml #246

@bdchatham

Problem

When a SeiNode is first deployed against a freshly-mounted BYOV data volume that doesn't already contain a config/client.toml, the cosmos-sdk seid init (run from the seid-init initContainer) writes a client.toml with chain-id = '' (empty).

On the next boot of the main seid container, the cosmos-sdk startup invariant compares genesis.json's chain_id against client.toml's chain-id and panics if they differ (the message names config.toml, but the empty value it prints is the one read from client.toml):

panic: genesis file chain-id=pacific-1 does not equal config.toml chain-id=

The sidecar's config-apply task currently rewrites config.toml and app.toml to match the desired controller-side config, but it does not populate chain-id in client.toml from spec.chainId. As a result, first boot ends in a guaranteed crashloop until an operator intervenes.
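
To make the failure mode concrete, here is a minimal Python sketch of the comparison described above. It is an illustration only, not the sei-cosmos code: the function names are hypothetical, and the regex handles only a simple top-level `chain-id = "..."` line of the kind seid init writes, not general TOML.

```python
import json
import re

# Matches a simple top-level line such as: chain-id = "pacific-1" or chain-id = ''
# (assumption: the value is a single-line quoted string)
_CHAIN_ID = re.compile(r"""^[ \t]*chain-id[ \t]*=[ \t]*(["'])(.*?)\1""", re.MULTILINE)


def client_chain_id(client_toml_text: str) -> str:
    """Return the chain-id found in client.toml text, or "" if it is absent."""
    match = _CHAIN_ID.search(client_toml_text)
    return match.group(2) if match else ""


def check_chain_id(genesis_json_text: str, client_toml_text: str) -> None:
    """Raise if the genesis chain_id and the client.toml chain-id differ."""
    genesis_id = json.loads(genesis_json_text)["chain_id"]
    client_id = client_chain_id(client_toml_text)
    if genesis_id != client_id:
        # Message text mirrors the panic quoted in this issue.
        raise RuntimeError(
            f"genesis file chain-id={genesis_id} "
            f"does not equal config.toml chain-id={client_id}"
        )
```

With a genesis chain_id of pacific-1 and the empty chain-id that seid init leaves behind, `check_chain_id` raises on every boot, which is the crashloop seen in the field.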

Impact

  • The crashloop blocks every fresh archive node deployment. It was hit during the pacific-1/archive-1 redeploy (PR sei-protocol/platform#519): the pod sat in CrashLoopBackOff until it was live-patched through a kubectl debug ephemeral container.
  • Workaround required for archive-2 redeploy: pre-seeded client.toml directly on the EBS volume before mounting (PR sei-protocol/platform#540), bypassing the controller's responsibility.
  • Future BYOV deployments will hit this on every first boot unless someone remembers the workaround, defeating the controller's "give me a chain + a volume and I'll run it" contract.

Relevant experts

  • @bdchatham (controller contributor, hit the bug twice in the field)
  • kubernetes-specialist (sidecar / seictl ownership)

Proposed approach

In the seictl sidecar's config-apply task, after rewriting config.toml/app.toml, also patch /sei/config/client.toml:

chain-id = "<spec.chainId>"

Use the same seictl config patch --target client machinery the sidecar already has. The chain ID is already in the bootstrap context (SeiNode.spec.chainId), so no new spec fields needed.
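
As a rough illustration of the patch itself, the following Python sketch rewrites only the chain-id line and leaves every other line of client.toml untouched. It does not use or describe seictl's actual patch machinery; `set_chain_id` is a hypothetical name, and the sketch assumes the chain ID needs no TOML escaping (true for IDs such as pacific-1).

```python
import re

# Any top-level line that assigns chain-id, whatever its current value.
_CHAIN_ID_LINE = re.compile(r"^[ \t]*chain-id[ \t]*=.*$", re.MULTILINE)


def set_chain_id(client_toml_text: str, chain_id: str) -> str:
    """Return client.toml text with chain-id set to chain_id.

    Other keys and comment lines are preserved. Assumes chain_id contains
    no characters that would need escaping inside a TOML basic string.
    """
    new_line = f'chain-id = "{chain_id}"'
    if _CHAIN_ID_LINE.search(client_toml_text):
        # Replace the first existing assignment in place.
        return _CHAIN_ID_LINE.sub(lambda _m: new_line, client_toml_text, count=1)
    # Key missing: prepend it so it stays a top-level key, ahead of any table.
    return new_line + "\n" + client_toml_text
```

Applying the function twice with the same chain ID returns identical text, which is the idempotence the acceptance criteria ask for.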

Verification: e2e test that boots a SeiNode against an empty data volume and asserts seid reaches committed state (not panic) without manual intervention.

Acceptance criteria

  • Sidecar's config-apply task writes chain-id into client.toml from spec.chainId (idempotent — re-running config-apply with the same chainId is a no-op)
  • If client.toml doesn't exist on disk yet (first boot before cosmos-sdk has run), config-apply either creates it with reasonable defaults including chain-id, or defers and re-runs after cosmos-sdk creates the skeleton
  • Unit test for the new patch behavior (chainId set, empty, conflicting value)
  • No regression on existing archive nodes that already have a working client.toml
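
The criteria above can be read as a small decision table, sketched here in Python. This is one interpretation, not the sidecar's design: `plan_chain_id_patch` and its return values are hypothetical, the "create" branch picks the first of the two options in the missing-file criterion, and rejecting an empty spec.chainId is an assumption rather than something this issue specifies.

```python
import re
from typing import Optional

# Simple top-level chain-id assignment with a single-line quoted value.
_CHAIN_ID = re.compile(r"""^[ \t]*chain-id[ \t]*=[ \t]*(["'])(.*?)\1""", re.MULTILINE)


def plan_chain_id_patch(existing: Optional[str], desired: str) -> str:
    """Decide what config-apply should do for client.toml.

    existing: current file contents, or None if client.toml is not on disk.
    desired:  the chain ID taken from spec.chainId.
    Returns "create", "noop", or "update".
    """
    if not desired:
        # Assumption: an empty desired chain ID is treated as a spec error.
        raise ValueError("spec.chainId must be non-empty")
    if existing is None:
        return "create"  # first boot, no skeleton written yet
    match = _CHAIN_ID.search(existing)
    current = match.group(2) if match else ""
    # Same value: re-running is a no-op. Empty or conflicting value: rewrite.
    return "noop" if current == desired else "update"
```

A unit test for the new behaviour could cover the same four inputs: no file, an empty chain-id, a matching chain-id, and a conflicting one.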

Out of scope

  • Other unpopulated client.toml fields (keyring-backend, node, etc.) — these have sensible cosmos-sdk defaults; only chain-id is the panic-trigger
  • Live migration / patching of already-deployed nodes — they have working client.toml already from the manual workaround; this is for future deployments

References

  • sei-protocol/platform#519 — archive-1 redeploy where this was first hit
  • sei-protocol/platform#540 — archive-2 redeploy where it was pre-empted via volume-level seeding
  • Cosmos-SDK startup panic: sei-cosmos/server/start.go:200 (StartCmd.func2 — genesis-vs-config chain-id check)
  • Sidecar code likely at: internal/task/config-apply (in seictl)
