Restore "Fix race condition causing sshd start failure during provisioning" #489

Merged
beyhan merged 1 commit into cloudfoundry:ubuntu-noble from s4heid:fix-sshd-race on Mar 19, 2026

s4heid commented Mar 14, 2026

Following the revert in #488, this pull request restores #460 but uses the previous file-based method for creating and checking the `firstboot_done` file. It should be compatible with AWS and help address the issue noted in #485. However, I currently don't have access to an AWS environment to verify this.

On Azure:

systemctl status firstboot.service
○ firstboot.service - Run first boot tasks
     Loaded: loaded (/etc/systemd/system/firstboot.service; enabled; preset: enabled)
     Active: inactive (dead) since Fri 2026-03-13 23:36:30 UTC; 6min ago
   Main PID: 1006 (code=exited, status=0/SUCCESS)
        CPU: 1.086s
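
For readers unfamiliar with the file-based approach mentioned above, here is a minimal sketch of a marker-file guard in shell. It is not the code from this pull request: `run_first_boot_once` is a hypothetical helper name, and the assumption is that the marker is created only after the tasks succeed.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the PR): run a command only when the
# marker file is absent, and create the marker once the command succeeds.
run_first_boot_once() {
    marker="$1"
    shift
    if [ -e "$marker" ]; then
        # Marker present: the first-boot tasks already completed earlier.
        return 0
    fi
    # Record completion only if the tasks exit successfully.
    "$@" && touch "$marker"
}
```

Under these assumptions, a call such as `run_first_boot_once /root/firstboot_done /root/firstboot.sh` would run the script on the first boot and become a no-op on later boots.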

 * Run first-boot tasks via systemd so sshd never races with host-key
   regeneration. The old `rc.local` script ran after network.target, but
   in parallel with other regular system services, like ssh.service.
   Therefore, ssh.service often started (and restarted) while
   `/root/firstboot.sh` was deleting keys. cloud-init’s set-passwords
   module made this worse by restarting ssh mid-run.
 * Replace `rc.local` with a oneshot firstboot.service (delete keys,
   create new keys, reconfigure sysstat) that runs Before=ssh.service
   and leaves the `/root/firstboot_done` file as a marker.
 * Add a cloud-config.service drop-in so cloud-init's config stage waits
   for firstboot.service.
 * Update walinuxagent.service to wait for firstboot.service, ensuring
   ssh keys have been regenerated. Together, these changes ensure that
   sshd, cloud-init, and WALinuxAgent all start only after the
   first-boot tasks succeed.
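
The ordering described in the commit message can be sketched as a shell function that stages a oneshot unit and a cloud-config drop-in under a root directory, such as an image chroot. This is an illustration under stated assumptions, not the files merged in this pull request: the function name `install_firstboot_units`, the drop-in file name `10-firstboot.conf`, the `WantedBy=` target, and the use of `ConditionPathExists=` for the marker check are all hypothetical. Only the unit description, `Before=ssh.service`, the oneshot type, and the two paths under `/root` come from the text above.

```shell
#!/bin/sh
# Hypothetical sketch: write a firstboot unit and a cloud-config drop-in
# below ROOT. File names and the ConditionPathExists guard are assumptions.
install_firstboot_units() {
    root="$1"
    unit_dir="$root/etc/systemd/system"
    mkdir -p "$unit_dir/cloud-config.service.d"

    # Oneshot unit ordered before ssh.service; skipped when the marker exists.
    cat > "$unit_dir/firstboot.service" <<'EOF'
[Unit]
Description=Run first boot tasks
Before=ssh.service
ConditionPathExists=!/root/firstboot_done

[Service]
Type=oneshot
ExecStart=/root/firstboot.sh

[Install]
WantedBy=multi-user.target
EOF

    # Drop-in that orders cloud-init's config stage after firstboot.service.
    cat > "$unit_dir/cloud-config.service.d/10-firstboot.conf" <<'EOF'
[Unit]
After=firstboot.service
EOF
}
```

Calling `install_firstboot_units /tmp/scratch` writes both files into a scratch tree so they can be inspected without touching a running system.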
metskem commented Mar 18, 2026

We created a new stemcell from your branch, repaved a VM with it, and we can confirm the problem is fixed.
Thanks!

@github-project-automation github-project-automation bot moved this from Inbox to Pending Merge | Prioritized in Foundational Infrastructure Working Group Mar 18, 2026
@beyhan beyhan merged commit ccdb2d5 into cloudfoundry:ubuntu-noble Mar 19, 2026
2 checks passed
@github-project-automation github-project-automation bot moved this from Pending Merge | Prioritized to Done in Foundational Infrastructure Working Group Mar 19, 2026