[Multi_K8s-Plugin] Traffic Routing #6629

Open
mohammedfirdouss wants to merge 16 commits into pipe-cd:master from mohammedfirdouss:feat/k8s-multi-traffic-routing

@mohammedfirdouss mohammedfirdouss commented Mar 29, 2026

What this PR does: Adds the K8S_TRAFFIC_ROUTING stage to the kubernetes_multicluster plugin.

This stage controls what percentage of real user traffic reaches each variant (primary, canary, baseline) across all target clusters in parallel. Two routing methods are supported:

  • PodSelector (default): updates the Kubernetes Service selector to point at a single variant. This is all-or-nothing: either 100% to primary or 100% to canary. No service mesh required.
  • Istio: updates the VirtualService route weights to split traffic at any percentage across primary, canary, and baseline simultaneously. Requires Istio installed in each cluster.
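The difference between the two methods can be sketched as a validation rule on the requested split. This is a minimal illustration and not code from this PR: the `Method` and `Split` types, the `"podselector"` string, and the rule that percentages must sum to 100 are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Method is a hypothetical enum for the two routing methods named above.
type Method string

const (
	MethodPodSelector Method = "podselector" // assumed spelling
	MethodIstio       Method = "istio"
)

// Split holds the requested traffic percentage for each variant.
type Split struct {
	Primary, Canary, Baseline int
}

// Validate checks a split against the constraints described in the PR text.
// The "must sum to 100" rule is an assumption for this sketch.
func (s Split) Validate(m Method) error {
	for _, p := range []int{s.Primary, s.Canary, s.Baseline} {
		if p < 0 || p > 100 {
			return fmt.Errorf("percentage %d out of range [0, 100]", p)
		}
	}
	if s.Primary+s.Canary+s.Baseline != 100 {
		return errors.New("percentages must sum to 100")
	}
	if m == MethodPodSelector && s.Primary != 100 && s.Canary != 100 {
		// PodSelector points the Service at exactly one variant.
		return errors.New("podselector requires 100% to primary or 100% to canary")
	}
	return nil
}

// SelectedVariant returns the variant a PodSelector Service would point at.
// It assumes the split has already passed Validate(MethodPodSelector).
func (s Split) SelectedVariant() string {
	if s.Canary == 100 {
		return "canary"
	}
	return "primary"
}
```

Under these assumptions, `Split{Primary: 80, Canary: 10, Baseline: 10}` is accepted for Istio but rejected for PodSelector, which only accepts a split that sends everything to one of primary or canary.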

Both methods use errgroup to run concurrently across all deploy targets, consistent with every other stage in this plugin.

Why we need it: Without this stage, canary and baseline pods exist in the cluster but receive no real user traffic. You can deploy a canary and a baseline side by side, but if nothing is hitting them, your metric comparison has no signal. K8S_TRAFFIC_ROUTING is what turns "pods that exist" into "pods that receive production traffic", making canary analysis meaningful.

Which issue(s) this PR fixes: #6446

Where this fits in the full pipeline:

K8S_CANARY_ROLLOUT        create canary pods (new version)
K8S_BASELINE_ROLLOUT      create baseline pods (old version, for comparison)
K8S_TRAFFIC_ROUTING  ◄──  shift/split traffic to canary (and baseline with Istio)  
[observe metrics — is canary better than baseline?]
K8S_TRAFFIC_ROUTING       restore traffic (100% to primary)
K8S_PRIMARY_ROLLOUT       promote new version
K8S_CANARY_CLEAN          delete canary pods
K8S_BASELINE_CLEAN        delete baseline pods

Does this PR introduce a user-facing change?:

  • How are users affected by this change: Users can now add K8S_TRAFFIC_ROUTING to their pipeline config to route real traffic to canary or baseline variants during analysis. PodSelector works out of the box with any Kubernetes cluster. Istio support requires trafficRouting.method: istio and a VirtualService manifest in the application directory.

  • Is this breaking change: No.

  • How to migrate (if breaking change): N/A

Signed-off-by: Mohammed Firdous <124298708+mohammedfirdouss@users.noreply.github.com>
@mohammedfirdouss mohammedfirdouss requested a review from a team as a code owner March 29, 2026 10:55

mohammedfirdouss commented Mar 29, 2026

@Warashi For traffic routing, I added Test_generateVirtualServiceManifest as a pure unit test (7 cases, no cluster, no network). I have a question, though: TestPlugin_executeK8sMultiTrafficRoutingStageIstio mirrors the single-cluster plugin's test (linked below), but I left it commented out because it pulls the Istio Helm chart from the internet at test time. Should I enable it?
I also noticed that the single-cluster plugin has PodSelector edge cases we haven't ported (missing_variant, wrong_variant, no_service, multiple_services).
Should I add those here? I am not entirely clear on what these edge cases cover.

https://github.com/pipe-cd/pipecd/blob/master/pkg/app/pipedv1/plugin/kubernetes/deployment/traffic_test.go#L1600
https://github.com/pipe-cd/pipecd/tree/master/pkg/app/pipedv1/plugin/kubernetes/deployment/testdata
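A test can run with no cluster and no network when the weight calculation is a pure function over in-memory data. The sketch below illustrates that idea only. It is not the PR's `generateVirtualServiceManifest`; the `Destination` type, the `weightedDestinations` helper, and the convention that subsets are named after variants are all assumptions.

```go
package main

import "fmt"

// Destination is a simplified stand-in for one weighted destination in a
// VirtualService HTTP route (host, subset, weight).
type Destination struct {
	Host   string
	Subset string
	Weight int
}

// weightedDestinations returns one destination per variant, in the order
// primary, canary, baseline. It assumes subsets are named after variants
// and that the three weights must be non-negative and sum to 100.
func weightedDestinations(host string, primary, canary, baseline int) ([]Destination, error) {
	if primary < 0 || canary < 0 || baseline < 0 {
		return nil, fmt.Errorf("weights must be non-negative: %d/%d/%d", primary, canary, baseline)
	}
	if sum := primary + canary + baseline; sum != 100 {
		return nil, fmt.Errorf("weights must sum to 100, got %d", sum)
	}
	return []Destination{
		{Host: host, Subset: "primary", Weight: primary},
		{Host: host, Subset: "canary", Weight: canary},
		{Host: host, Subset: "baseline", Weight: baseline},
	}, nil
}
```

Because the function touches nothing outside its arguments, a table-driven test can list input weights and expected destinations and compare them directly.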

codecov bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.95%. Comparing base (91c6112) to head (a00024d).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6629      +/-   ##
==========================================
+ Coverage   29.24%   37.95%   +8.70%     
==========================================
  Files         582       12     -570     
  Lines       62103      664   -61439     
==========================================
- Hits        18164      252   -17912     
+ Misses      42541      396   -42145     
+ Partials     1398       16    -1382     
Flag Coverage Δ
. ?
.-pkg-app-pipedv1-plugin-analysis ?
.-pkg-app-pipedv1-plugin-ecs ?
.-pkg-app-pipedv1-plugin-kubernetes ?
.-pkg-app-pipedv1-plugin-kubernetes_multicluster ?
.-pkg-app-pipedv1-plugin-scriptrun ?
.-pkg-app-pipedv1-plugin-terraform 37.95% <ø> (ø)
.-pkg-app-pipedv1-plugin-wait ?
.-pkg-app-pipedv1-plugin-waitapproval ?
.-pkg-plugin-sdk ?
.-tool-actions-gh-release ?
.-tool-actions-plan-preview ?
.-tool-codegen-protoc-gen-auth ?

@mohammedfirdouss

@Warashi @khanhtc1202 I have enabled TestPlugin_executeK8sMultiTrafficRoutingStageIstio using local CRDs from testdata/istio_crds/ (no internet required), matching the single-cluster plugin. I also ported the four PodSelector edge cases from the single-cluster plugin: no_service, missing_variant, wrong_variant, and multiple_services.
