test: (scriptless) Enable scriptless phase 3 in AB e2es#8453
test: (scriptless) Enable scriptless phase 3 in AB e2es#8453lilypan26 wants to merge 32 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Enables “scriptless phase 3” coverage in the AgentBaker e2e suite by adding a new scriptless_anc subtest path that provisions nodes using AKSNodeConfig/aks-node-controller, plus wiring many existing scenarios to provide an AKSNodeConfigMutator.
Changes:
- Added a new
scriptless_ancsubtest variant and runtime flag (EnableScriptlessANC) to drive scriptless phase-3 execution. - Refactored/expanded the e2e “aks-node-controller hack” customData generation to optionally include AKSNodeConfig and/or an nbc-cmd script.
- Updated many scenarios to set equivalent
AKSNodeConfigMutatorfields alongside existing NBC mutators.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| e2e/vmss.go | Refactors customData hack generation and wires scriptless ANC + NBC cmd hack paths into VMSS creation. |
| e2e/types.go | Adds EnableScriptlessANC and adjusts kubelet-config-file detection logic for scriptless ANC scenarios. |
| e2e/test_helpers.go | Adds scriptless_anc subtest generation and new gating helper. |
| e2e/scenario_test.go | Adds AKSNodeConfigMutator coverage across many existing scenarios. |
Co-authored-by: lilypan26 <106703606+lilypan26@users.noreply.github.com>
Co-authored-by: lilypan26 <106703606+lilypan26@users.noreply.github.com>
…e/AgentBaker into lily/scriptless/phase-3-e2e
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…e/AgentBaker into lily/scriptless/phase-3-e2e
| [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia-container-runtime.options] | ||
| BinaryName = "/usr/bin/nvidia-container-runtime" | ||
| SystemdCgroup = true | ||
| [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.nvidia-container-runtime.options] |
There was a problem hiding this comment.
I don't know why there is no icm on this before your change? maybe no one is using containerd.cri.v1?
| snapshotter = "tardev" | ||
| runtime_type = "io.containerd.kata-cc.v2" | ||
| privileged_without_host_devices = true | ||
| pod_annotations = ["io.katacontainers.*"] |
There was a problem hiding this comment.
Many indents error are related to Kata. I am wondering if kata is never in production or these configs are never used.
| FeatureGates: map[string]bool{}, | ||
| FailSwapOn: to.Ptr(false), | ||
| FeatureGates: map[string]bool{ | ||
| "RotateKubeletServerCertificate": true, |
There was a problem hiding this comment.
We already have this set to true in KubeletFlags["--feature-gates".
If we've set EnableKubeletConfigFile: false,, do we need this?
| } | ||
|
|
||
| const sysctlTemplateString = `# This is a partial workaround to this upstream Kubernetes issue: | ||
| # https://github.com/kubernetes/kubernetes/issues/41916#issuecomment-312428731 |
There was a problem hiding this comment.
No one dared to remove this although it's 9 years ago. 😂
Indeed, do you know if the issue no longer exists?
| nbc.IsARM64 = true | ||
| }, | ||
| AKSNodeConfigMutator: func(_ *Cluster, config *aksnodeconfigv1.Configuration) { | ||
| config.VmSize = "Standard_D2pds_v6" |
There was a problem hiding this comment.
Why the vmsize needs explicit set for some cases only?
| "IDENTITY_BINDINGS_LOCAL_AUTHORITY_SNI": config.GetServiceAccountImagePullProfile().GetLocalAuthoritySni(), | ||
| "CSE_TIMEOUT": getCSETimeout(config), | ||
| "SKIP_WAAGENT_HOLD": "true", | ||
| "SKIP_WAAGENT_HOLD": "false", |
There was a problem hiding this comment.
this needs to be true, because scriptless mode we always never hold waagent
There was a problem hiding this comment.
Yep this would be a difference compared to non scriptless path, may be we can have a list of keys which are ok to ignore
| VM *ScenarioVM | ||
| VMSSName string | ||
| EnableScriptlessNBCCSECmd bool | ||
| EnableScriptlessANC bool |
There was a problem hiding this comment.
Do we need a new flag. Because RP needs to provide aksnodeconfig input somewhere which means we are changing the signature of GetBootstrappingData(). We will probably be switching over to aks-node-controller.utils.CustomData instead based on some feature flag.
What this PR does / why we need it:
EnableScriptlessANCCustomDataPhase3to provide both ANC and NBC cse cmd to AKS node controllerWhich issue(s) this PR fixes:
Fixes #