Skip to content

test: (scriptless) Enable scriptless phase 3 in AB e2es#8453

Open
lilypan26 wants to merge 32 commits into
mainfrom
lily/scriptless/phase-3-e2e
Open

test: (scriptless) Enable scriptless phase 3 in AB e2es#8453
lilypan26 wants to merge 32 commits into
mainfrom
lily/scriptless/phase-3-e2e

Conversation

@lilypan26
Copy link
Copy Markdown
Contributor

@lilypan26 lilypan26 commented May 5, 2026

What this PR does / why we need it:

  • Enables scriptless phase 3 in ab e2es under tag EnableScriptlessANC
    • This tag means both ANC and NBC cse cmd will be provided, tests should validate that there are no diffs between the generated provisioning env vars
    • For now, NBC cse cmd will be used for provisioning until there are no more diffs, after which we can switch to using ANC
  • Adds CustomDataPhase3 to provide both ANC and NBC cse cmd to AKS node controller

Which issue(s) this PR fixes:

Fixes #

Copilot AI review requested due to automatic review settings May 5, 2026 16:25
@lilypan26 lilypan26 changed the title Lily/scriptless/phase 3 e2e test(scriptless): Enable scriptless phase 3 in AB e2es May 5, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables “scriptless phase 3” coverage in the AgentBaker e2e suite by adding a new scriptless_anc subtest path that provisions nodes using AKSNodeConfig/aks-node-controller, plus wiring many existing scenarios to provide an AKSNodeConfigMutator.

Changes:

  • Added a new scriptless_anc subtest variant and runtime flag (EnableScriptlessANC) to drive scriptless phase-3 execution.
  • Refactored/expanded the e2e “aks-node-controller hack” customData generation to optionally include AKSNodeConfig and/or an nbc-cmd script.
  • Updated many scenarios to set equivalent AKSNodeConfigMutator fields alongside existing NBC mutators.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
e2e/vmss.go Refactors customData hack generation and wires scriptless ANC + NBC cmd hack paths into VMSS creation.
e2e/types.go Adds EnableScriptlessANC and adjusts kubelet-config-file detection logic for scriptless ANC scenarios.
e2e/test_helpers.go Adds scriptless_anc subtest generation and new gating helper.
e2e/scenario_test.go Adds AKSNodeConfigMutator coverage across many existing scenarios.

Comment thread e2e/vmss.go Outdated
Comment thread e2e/vmss.go Outdated
Comment thread e2e/vmss.go Outdated
Comment thread e2e/vmss.go
Comment thread e2e/test_helpers.go
Comment thread e2e/vmss.go
Co-authored-by: lilypan26 <106703606+lilypan26@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 27, 2026 21:04
@lilypan26 lilypan26 review requested due to automatic review settings May 27, 2026 21:04
Co-authored-by: lilypan26 <106703606+lilypan26@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 27, 2026 21:09
@lilypan26 lilypan26 review requested due to automatic review settings May 27, 2026 21:09
Copilot AI review requested due to automatic review settings May 27, 2026 22:15
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 23 changed files in this pull request and generated 2 comments.

Comment thread parts/linux/cloud-init/artifacts/cse_cmd.sh
Comment thread e2e/validators.go Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 27, 2026 22:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 23 changed files in this pull request and generated 4 comments.

Comment thread parts/linux/cloud-init/artifacts/cse_cmd.sh
Comment thread e2e/vmss.go
Comment thread aks-node-controller/pkg/nodeconfigutils/utils.go
Comment thread e2e/test_helpers.go
Comment thread pkg/agent/baker.go
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia-container-runtime.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia-container-runtime.options]
Copy link
Copy Markdown
Collaborator

@Devinwong Devinwong May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why there is no icm on this before your change? maybe no one is using containerd.cri.v1?

Comment thread pkg/agent/baker.go
snapshotter = "tardev"
runtime_type = "io.containerd.kata-cc.v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many indents error are related to Kata. I am wondering if kata is never in production or these configs are never used.

Comment thread e2e/node_config.go
FeatureGates: map[string]bool{},
FailSwapOn: to.Ptr(false),
FeatureGates: map[string]bool{
"RotateKubeletServerCertificate": true,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have this set to true in KubeletFlags["--feature-gates".
If we've set EnableKubeletConfigFile: false,, do we need this?

Comment thread pkg/agent/baker.go
}

const sysctlTemplateString = `# This is a partial workaround to this upstream Kubernetes issue:
# https://github.com/kubernetes/kubernetes/issues/41916#issuecomment-312428731
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No one dared to remove this although it's 9 years ago. 😂
Indeed, do you know if the issue no longer exists?

Comment thread e2e/scenario_test.go
nbc.IsARM64 = true
},
AKSNodeConfigMutator: func(_ *Cluster, config *aksnodeconfigv1.Configuration) {
config.VmSize = "Standard_D2pds_v6"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the vmsize needs explicit set for some cases only?

"IDENTITY_BINDINGS_LOCAL_AUTHORITY_SNI": config.GetServiceAccountImagePullProfile().GetLocalAuthoritySni(),
"CSE_TIMEOUT": getCSETimeout(config),
"SKIP_WAAGENT_HOLD": "true",
"SKIP_WAAGENT_HOLD": "false",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be true, because scriptless mode we always never hold waagent

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this would be a difference compared to non scriptless path, may be we can have a list of keys which are ok to ignore

Comment thread e2e/types.go
VM *ScenarioVM
VMSSName string
EnableScriptlessNBCCSECmd bool
EnableScriptlessANC bool
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a new flag. Because RP needs to provide aksnodeconfig input somewhere which means we are changing the signature of GetBootstrappingData(). We will probably be switching over to aks-node-controller.utils.CustomData instead based on some feature flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants