Problem
When a SeiNode is first deployed against a freshly-mounted BYOV data volume that doesn't already contain a config/client.toml, the cosmos-sdk seid init (run from the seid-init initContainer) writes a client.toml with chain-id = '' (empty).
On the next boot of the main seid container, the cosmos-sdk startup invariant compares genesis.json's chain_id against client.toml's chain-id and panics if they differ:
panic: genesis file chain-id=pacific-1 does not equal config.toml chain-id=
The sidecar's config-apply task currently rewrites config.toml and app.toml to match the desired controller-side config, but does not populate chain-id in client.toml from spec.chainId — so first-boot ends in a guaranteed crashloop until an operator intervenes.
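The invariant that fails can be sketched as a small pre-flight check. This is an illustrative sketch only, not the sei-cosmos code: `checkChainID` and the regex-based TOML read are hypothetical, and it assumes `chain-id` appears as a quoted top-level key in client.toml.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// chainIDValue captures the quoted value of a top-level `chain-id = "..."` line.
var chainIDValue = regexp.MustCompile(`(?m)^[ \t]*chain-id[ \t]*=[ \t]*["']([^"']*)["']`)

// checkChainID is a hypothetical stand-in for the startup invariant: it returns
// an error when genesis.json's chain_id differs from client.toml's chain-id.
func checkChainID(genesisJSON []byte, clientTOML string) error {
	var genesis struct {
		ChainID string `json:"chain_id"`
	}
	if err := json.Unmarshal(genesisJSON, &genesis); err != nil {
		return fmt.Errorf("parse genesis: %w", err)
	}
	// A missing key is treated the same as an empty value.
	configured := ""
	if m := chainIDValue.FindStringSubmatch(clientTOML); m != nil {
		configured = m[1]
	}
	if genesis.ChainID != configured {
		return fmt.Errorf("genesis chain-id=%s does not equal configured chain-id=%s",
			genesis.ChainID, configured)
	}
	return nil
}

func main() {
	genesis := []byte(`{"chain_id": "pacific-1"}`)
	// Skeleton client.toml as written on first boot: empty chain-id, so this fails.
	fmt.Println(checkChainID(genesis, "chain-id = ''\n"))
	// After chain-id is populated the check passes.
	fmt.Println(checkChainID(genesis, "chain-id = \"pacific-1\"\n"))
}
```

With an empty `chain-id` the check can only pass if genesis.json's `chain_id` is also empty, which is why every fresh volume hits the panic.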
Impact
- Crashloop blocks every fresh archive node deployment. Hit during pacific-1/archive-1 redeploy (PR sei-protocol/platform#519): the pod sat in CrashLoopBackOff until it was live-patched via a kubectl debug ephemeral container.
- Workaround required for archive-2 redeploy: pre-seeded client.toml directly on the EBS volume before mounting (PR sei-protocol/platform#540), bypassing the controller's responsibility.
- Future BYOV deployments will hit this on every first boot unless someone remembers the workaround, defeating the controller's "give me a chain + a volume and I'll run it" contract.
Relevant experts
- @bdchatham (controller contributor, hit the bug twice in the field)
- kubernetes-specialist (sidecar / seictl ownership)
Proposed approach
In the seictl sidecar's config-apply task, after rewriting config.toml/app.toml, also patch /sei/config/client.toml:
chain-id = "<spec.chainId>"
Use the same seictl config patch --target client machinery the sidecar already has. The chain ID is already in the bootstrap context (SeiNode.spec.chainId), so no new spec fields are needed.
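A minimal sketch of the patch step, assuming a plain line-level rewrite of the file contents. `patchChainID` is a hypothetical helper for illustration; the real change would presumably go through the sidecar's existing patch machinery, whose internals are not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// chainIDLine matches a `chain-id = ...` assignment on a single line.
var chainIDLine = regexp.MustCompile(`(?m)^[ \t]*chain-id[ \t]*=.*$`)

// patchChainID returns client.toml text with chain-id set to chainID.
// Hypothetical helper: it is idempotent, so applying it twice with the
// same chainID yields identical text.
func patchChainID(toml, chainID string) string {
	line := fmt.Sprintf("chain-id = %q", chainID)
	if chainIDLine.MatchString(toml) {
		// Literal replacement so characters in chainID are never treated
		// as regexp expansion syntax.
		return chainIDLine.ReplaceAllLiteralString(toml, line)
	}
	// Key absent (or contents empty): prepend so it stays a top-level key.
	return line + "\n" + toml
}

func main() {
	skeleton := "# The network chain ID\nchain-id = ''\nkeyring-backend = \"os\"\n"
	once := patchChainID(skeleton, "pacific-1")
	twice := patchChainID(once, "pacific-1")
	fmt.Print(once)
	fmt.Println("idempotent:", once == twice)
}
```

Only the `chain-id` line changes; other keys such as keyring-backend keep whatever seid init wrote.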
Verification: e2e test that boots a SeiNode against an empty data volume and asserts seid reaches committed state (not panic) without manual intervention.
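Acceptance criteria
- The config-apply task writes chain-id into client.toml from spec.chainId (idempotent: re-running config-apply with the same chainId is a no-op).
- If client.toml doesn't exist on disk yet (first boot before cosmos-sdk has run), config-apply either creates it with reasonable defaults including chain-id, or defers and re-runs after cosmos-sdk creates the skeleton client.toml.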
Out of scope
- Other unpopulated client.toml fields (keyring-backend, node, etc.): these have sensible cosmos-sdk defaults; only chain-id is the panic trigger.
- Live migration / patching of already-deployed nodes: they already have a working client.toml from the manual workaround; this is for future deployments.
References
- sei-protocol/platform#519 — archive-1 redeploy where this was first hit
- sei-protocol/platform#540 — archive-2 redeploy where it was pre-empted via volume-level seeding
- Cosmos-SDK startup panic: sei-cosmos/server/start.go:200 (StartCmd.func2, the genesis-vs-config chain-id check)
- Sidecar code likely at: internal/task/config-apply (in seictl)