docs(networking): add network architecture overview #422
Conversation
Add comprehensive documentation covering the Cozystack networking stack: MetalLB load balancing (L2 and BGP modes), Cilium eBPF as kube-proxy replacement, Kube-OVN centralized IPAM, and tenant isolation with identity-based eBPF policies. All diagrams use Mermaid. Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>
✅ Deploy Preview for cozystack ready!
Summary of Changes

Hello @IvanHunters, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces new documentation that provides a detailed overview of the Cozystack cluster's network architecture. It clarifies how components like MetalLB, Cilium eBPF, and Kube-OVN work together to manage external load balancing, internal pod networking, and robust tenant isolation. The document aims to enhance understanding of the system's networking capabilities and security enforcement mechanisms.
Activity
📝 Walkthrough

Introduces comprehensive Network Architecture documentation detailing Cozystack's multi-layered networking stack, including external load balancing via MetalLB, service load balancing and policy enforcement via Cilium eBPF, pod networking via Kube-OVN, VM IP attachments, and observability through Hubble, with associated diagrams and configuration examples.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request adds a comprehensive and well-structured documentation page for Cozystack's network architecture. The document is clear, detailed, and makes excellent use of Mermaid diagrams to explain complex concepts like MetalLB modes, Cilium's eBPF-based processing, and tenant isolation. The explanations are accurate and easy to follow. I have a couple of minor suggestions to improve the clarity of two diagrams, but overall this is a great addition to the documentation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 277-292: The sentence "All of this happens in kernel space in
approximately 100 nanoseconds." is an unsupported precise latency claim; update
the text in the "Policy Enforcement in Kernel" section to either remove the
numeric value or qualify it and add a citation: e.g., replace with a softened
statement such as "All of this happens in kernel space and is typically
performed in sub-microsecond time on modern hardware" or "…in approximately 100
nanoseconds (hardware- and version-dependent; see [benchmark/source])" and
include a reference to the benchmark or paper if you keep the number; locate the
exact sentence in that section to edit.
🧹 Nitpick comments (1)
content/en/docs/v1/networking/architecture.md (1)
294-316: Avoid absolute security guarantees; qualify the statements.

Phrases like “No userspace bypass” / “no race conditions” / “cannot be bypassed” read as unconditional guarantees. Consider qualifying them (e.g., “by design” or “under correct configuration”) to avoid over-promising.
✏️ Suggested wording
```diff
-| **No userspace bypass** | All network traffic must pass through eBPF hooks |
-| **Atomic updates** | Policy changes are atomic — no race conditions |
+| **No userspace bypass (by design)** | All network traffic is expected to pass through eBPF hooks under correct configuration |
+| **Atomic updates** | Policy updates are applied atomically to reduce race windows |

-    EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
+    EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Not intended to be bypassed by userspace (with correct configuration)<br/>• Atomic policy updates"]
```
### Policy Enforcement in Kernel

When a packet is sent between pods, Cilium enforces policies entirely within kernel space:

```mermaid
flowchart TD
    PKT["Packet: 10.244.0.10 → 10.244.1.20"]
    STEP1["1. Lookup source identity:<br/>10.244.0.10 → ID 12345 (tenant-a)"]
    STEP2["2. Lookup destination identity:<br/>10.244.1.20 → ID 67890 (tenant-b)"]
    STEP3["3. Check policy map:<br/>(12345, 67890, TCP, 80) → DENY"]
    DROP["4. DROP packet"]

    PKT --> STEP1 --> STEP2 --> STEP3 --> DROP
```

All of this happens in kernel space in approximately 100 nanoseconds.
Soften or source the “~100 nanoseconds” performance claim.
This is a very specific latency figure and is likely hardware/version dependent. Consider removing the number, qualifying it, or citing a benchmark if you have one.
✏️ Suggested wording
```diff
-All of this happens in kernel space in approximately 100 nanoseconds.
+All of this happens in kernel space with very low per-packet overhead (exact latency depends on hardware, kernel, and policy complexity).
```
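To make the identity/policy-map lookup in the quoted section concrete: the ALLOW/DENY entries Cilium programs into its eBPF maps are derived from policy objects. A minimal hedged sketch of a namespace-scoped CiliumNetworkPolicy that would produce the same-tenant ALLOW and cross-tenant DENY outcomes shown in the diagram (the `tenant-a` namespace name is taken from the diagram's labels, not from the actual PR content):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: same-tenant-only   # illustrative name
  namespace: tenant-a
spec:
  endpointSelector: {}     # applies to every pod in tenant-a
  ingress:
    # Allow only peers whose security identity carries the tenant-a
    # namespace label; identities from other namespaces (e.g. tenant-b)
    # resolve to DENY in the policy map, as in the diagram.
    - fromEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: tenant-a
```

Once applied, the in-kernel lookup for a packet from a tenant-b identity to a tenant-a pod fails the `fromEndpoints` match and the packet is dropped without leaving kernel space.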
Add missing technical details: networking stack variants, GENEVE overlay, Join CIDR, cluster domain, Hubble observability section. Improve tenant isolation description with Cozystack-specific labels, automatic policy application, and CNI chaining details. Signed-off-by: ohotnikov.ivan <ohotnikov.ivan@e-queo.net>
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 420: The sentence "See Enabling Hubble for full configuration details."
is plain text and needs to be converted into a proper Hugo cross-reference;
update the text to use Hugo's relref or a direct URL pointing to the Enabling
Hubble page (replace the plain phrase with something like {{< relref
"path/to/enabling-hubble.md" >}} or the correct URL), ensuring the phrase
"Enabling Hubble" becomes a clickable link; locate this sentence in the content
for content/en/docs/v1/networking/architecture.md and replace it accordingly.
- Around line 6-8: The frontmatter "aliases" entry currently points to the wrong
section; locate the "aliases:" key in the networking/architecture document and
either remove the incorrect alias string
"/docs/v1/reference/applications/architecture" or replace it with the correct
networking URL; if the alias is intentionally a redirect from an old URL, add a
short inline comment in the frontmatter explaining that it's an intentional
redirect so reviewers know it's deliberate.
- Around line 344-357: The NOTE node ("NOTE") is incorrectly placed inside the
US subgraph (User Space) making its kernel-enforcement message semantically
wrong; move or re-declare the NOTE node so it is not inside US or KS (for
example place it after both subgraph declarations or attach it to the edge
between US and KS), and update the flowchart connections so the NOTE still
visually associates with the link (US --> KS) without being nested in the US
subgraph; keep the node identifier NOTE and the subgraph identifiers US and KS
to locate and adjust the diagram.
- Around line 285-299: The Mermaid node ID TB conflicts with the flowchart TD
direction keyword and can break rendering in older Mermaid versions; rename the
node ID (and any references to it) to a non-reserved identifier such as TenantB
or TB_tenant in the flowchart TD block — update the node declaration (TB ->
TenantB) and the edge from TenantB to ENGINE so the labels TA, TC, ENGINE,
ALLOW, and DENY remain unchanged and the diagram renders reliably.
---
Duplicate comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 330: Replace the unsupported precise latency claim in the sentence "All
of this happens in kernel space in approximately 100 nanoseconds." by removing
the numeric value or softening it; for example change to "All of this happens in
kernel space and completes in a very small amount of time" or "All of this
happens in kernel space and typically completes in micro- to nanoseconds
depending on hardware" so the statement is not presented as an unsourced,
hardware-specific 100 ns figure.
```yaml
aliases:
  - /docs/v1/reference/applications/architecture
---
```
Alias path references the wrong section.
/docs/v1/reference/applications/architecture belongs to the applications section, not networking. If this alias is intentional (redirect from an old URL), it should be noted; otherwise it looks like a copy-paste artefact and should be corrected or removed.
✏️ Suggested fix
```diff
 aliases:
-  - /docs/v1/reference/applications/architecture
+  - /docs/v1/reference/networking/architecture
```

(or remove entirely if no old URL needs redirecting)
```mermaid
flowchart TD
    TA["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
    TB["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
    TC["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]

    ENGINE{"Cilium eBPF Policy Engine"}

    TA --> ENGINE
    TB --> ENGINE
    TC --> ENGINE

    ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
    ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
```
Node ID TB collides with Mermaid's TB (Top-to-Bottom) direction keyword.
TB is an alias for the TD direction in Mermaid's flowchart grammar. Using it as a node ID inside flowchart TD works in most recent Mermaid releases, but has caused rendering failures in older versions. Rename to avoid the ambiguity.
✏️ Suggested fix

```diff
-    TA["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
-    TB["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
-    TC["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
+    TENANT_A["Tenant A — Namespace app-a<br/>Pods: 10.244.0.10, 10.244.0.11"]
+    TENANT_B["Tenant B — Namespace app-b<br/>Pods: 10.244.1.20, 10.244.1.21"]
+    TENANT_C["Tenant C — Namespace app-c<br/>Pods: 10.244.2.30, 10.244.2.31"]
     ENGINE{"Cilium eBPF Policy Engine"}
-    TA --> ENGINE
-    TB --> ENGINE
-    TC --> ENGINE
+    TENANT_A --> ENGINE
+    TENANT_B --> ENGINE
+    TENANT_C --> ENGINE
     ENGINE -->|"A ↔ A — ALLOWED"| ALLOW["Same-tenant traffic passes"]
     ENGINE -->|"A ↔ B — DENIED"| DENY["Cross-tenant traffic dropped"]
```
```mermaid
flowchart TD
    subgraph US["User Space"]
        PODA["Pod A<br/>(Tenant A)"]
        PODB["Pod B<br/>(Tenant B)"]
        NOTE["Cannot bypass policy —<br/>traffic MUST go through kernel"]
    end

    subgraph KS["Kernel Space"]
        EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
    end

    US -->|"all traffic"| KS
```
NOTE node about kernel enforcement is visually placed inside the "User Space" subgraph — semantically backwards.
The message "Cannot bypass policy — traffic MUST go through kernel" belongs outside both subgraphs (or on the edge), not inside the User Space box, where it implies the kernel-bypass note is a user-space property.
✏️ Suggested fix

```diff
 flowchart TD
     subgraph US["User Space"]
         PODA["Pod A<br/>(Tenant A)"]
         PODB["Pod B<br/>(Tenant B)"]
-        NOTE["Cannot bypass policy —<br/>traffic MUST go through kernel"]
     end

     subgraph KS["Kernel Space"]
         EBPF["eBPF Programs<br/>• Attached to network interfaces<br/>• Run in privileged kernel context<br/>• Verified by kernel<br/>• Cannot be bypassed by userspace<br/>• Atomic policy updates"]
     end

-    US -->|"all traffic"| KS
+    US -->|"all traffic (cannot bypass policy)"| KS
```
Signed-off-by: Ivan Stukov <stukov.ivan.time@gmail.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
content/en/docs/v1/networking/architecture.md (1)
223-228: "Context switches" row is ambiguous — netfilter/iptables is also kernel-space.

The "Required / None" comparison conflates two different planes: iptables packet forwarding (kernel-space netfilter, no per-packet context switches) vs. kube-proxy rule management (userspace daemon, context switches to update rules). Both aspects are real, but the single row label obscures this. Splitting or rewording would improve accuracy.
✏️ Suggested rewording
```diff
-| Context switches | Required | None |
+| Rule update overhead | Userspace daemon (kube-proxy) required | In-kernel atomic map update — no userspace round-trip |
```
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Line 82: The sentence in the Networking architecture doc that says "pods can
be rescheduled to different nodes while retaining their addresses" is
misleading; change it to state that pods do not retain IPs by default and that
IP persistence requires explicit static-IP annotation (or equivalent static IP
configuration in Kube-OVN). Update the line in the architecture.md text to
clarify the default behavior and mention the static-IP annotation as the
mechanism for keeping the same address after rescheduling.
- Around line 89-110: The diagram currently shows a single centralized Cilium
node (`CIL`) receiving traffic from all MetalLB speakers (`S1`, `S2`, `S3`), but
Cilium runs as a per-node DaemonSet; update the Mermaid flowchart so each
speaker connects to its own Cilium instance (e.g., `CIL1`, `CIL2`, `CIL3`)
co-located with `S1`, `S2`, `S3`, and each of those Cilium instances then DNATs
to the local `POD` (or the shared `POD` node as appropriate) to accurately
represent per-node Cilium behavior in `CIL`, `S1`, `S2`, `S3` relationships.
---
Duplicate comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 6-7: Update the YAML aliases entry: replace the incorrect alias
value '/docs/v1/reference/applications/architecture' under the aliases key with
the correct networking redirect (e.g.
'/docs/v1/reference/networking/architecture') or remove the aliases entry
entirely if no redirect is required; specifically edit the aliases mapping that
currently contains '/docs/v1/reference/applications/architecture' to point to
the proper networking path or delete it.
- Around line 285-299: Mermaid node ID "TB" conflicts with the Mermaid
top-to-bottom direction keyword and may break rendering; rename the node ID
(e.g., change TB to TB_tenant or TenantB) in the flowchart definition and update
all references to that ID (the node declaration TB["Tenant B — Namespace
app-b..."] and the edge TB --> ENGINE) so the label text remains the same but
the identifier no longer collides with reserved keywords (apply similarly if any
other node IDs like TA/TC conflict).
- Around line 344-357: The NOTE node is placed inside the US subgraph, which
contradicts its message; move the NOTE node out of both subgraphs (i.e., declare
NOTE at the top level of the flowchart rather than inside US) or instead place
that text on the edge label between US and KS (modify the edge US -->|"all
traffic"| KS to include the "traffic MUST go through kernel" text) so the visual
semantics align with the statement; update the mermaid nodes/edge accordingly
(referencing NOTE, US, KS, and the US -->|...| KS edge).
- Line 330: The sentence "All of this happens in kernel space in approximately
100 nanoseconds." makes an unsourced, hardware/configuration-dependent latency
claim; either remove the numeric "~100 nanoseconds" or replace it with a
qualified, sourced statement (e.g., "latency depends on hardware and
configuration; in some environments kernel-space operations can be on the order
of tens to hundreds of nanoseconds — see [source]"). Update the sentence in the
document (the exact quoted line) to either omit the specific latency or add a
clear qualifier and citation.
---
Nitpick comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 223-228: The "Context switches" table row is ambiguous and mixes
packet-path and control-plane behavior; update the table to separate or reword
that row (e.g., add two rows "Per-packet context switches" and "Control-plane /
rule management context switches" or change the label to "Packet vs
control-plane context switches") and fill values to reflect that
iptables/netfilter has no per-packet userland context switches but kube-proxy
introduces control-plane/userland context switches for rule management while
Cilium (eBPF) avoids userland per-packet context switches and minimizes
control-plane churn; edit the table rows around the existing "Context switches"
entry in the diff to implement this clearer split/rewording.
- All pods draw from a single shared IP pool (10.244.0.0/16)
- IP addresses are allocated centrally through Kube-OVN's IPAM
- There is no per-node CIDR splitting (unlike Calico or Flannel)
- Because IPs are not tied to node-specific CIDR blocks, pods can be rescheduled to different nodes while retaining their addresses
"Retaining their addresses" is misleading for standard pods.
By default, Kube-OVN still allocates a new IP to a pod when it is rescheduled; IP persistence requires explicit static-IP annotation. The current phrasing overstates the default behavior.
✏️ Suggested clarification
```diff
-- Because IPs are not tied to node-specific CIDR blocks, pods can be rescheduled to different nodes while retaining their addresses
+- Because IPs are not tied to node-specific CIDR blocks, pods can in principle retain their addresses across reschedules (e.g. with Kube-OVN static IP assignment); in the default dynamic allocation mode, pods still receive a new IP on rescheduling, but that IP can be drawn from any node's range
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@content/en/docs/v1/networking/architecture.md` at line 82, The sentence in
the Networking architecture doc that says "pods can be rescheduled to different
nodes while retaining their addresses" is misleading; change it to state that
pods do not retain IPs by default and that IP persistence requires explicit
static-IP annotation (or equivalent static IP configuration in Kube-OVN). Update
the line in the architecture.md text to clarify the default behavior and mention
the static-IP annotation as the mechanism for keeping the same address after
rescheduling.
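For reference, the static-IP mechanism this comment alludes to is a pod annotation. A minimal hedged example (pod name and image are placeholders; verify the annotation against your Kube-OVN version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-ip-pod            # hypothetical name
  annotations:
    # Kube-OVN honors this annotation at scheduling time; the address
    # must fall inside the subnet's CIDR (here, the shared 10.244.0.0/16)
    ovn.kubernetes.io/ip_address: "10.244.0.10"
spec:
  containers:
    - name: app
      image: nginx               # placeholder image
```

With this annotation the pod keeps 10.244.0.10 across reschedules; without it, Kube-OVN's IPAM hands out a fresh address from the shared pool each time.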
```mermaid
flowchart TD
    CLIENT["External Client"]
    RTR["Upstream Router"]

    subgraph CLUSTER["Kubernetes Cluster"]
        S1["Node 1<br/>MetalLB Speaker"]
        S2["Node 2<br/>MetalLB Speaker"]
        S3["Node 3<br/>MetalLB Speaker"]
        CIL["Cilium (eBPF)<br/>Service Load Balancing<br/>DNAT to Pod IP"]
        POD["Target Pod<br/>(Pod CIDR)"]
    end

    CLIENT -->|"Traffic to external IP<br/>(e.g. 10.x.x.20)"| RTR
    RTR -->|"L2 (ARP) or BGP"| S1
    RTR -->|"L2 (ARP) or BGP"| S2
    RTR -->|"L2 (ARP) or BGP"| S3
    S1 --> CIL
    S2 --> CIL
    S3 --> CIL
    CIL --> POD
```
Diagram implies Cilium is a centralized service, not a per-node DaemonSet.
All three speaker nodes (S1, S2, S3) converge to a single CIL entity. Cilium runs as a DaemonSet on each node, so each node has its own Cilium instance. This is especially important to represent accurately in an architecture doc.
✏️ Suggested fix (co-locate CIL with each speaker)

```diff
 subgraph CLUSTER["Kubernetes Cluster"]
-    S1["Node 1<br/>MetalLB Speaker"]
-    S2["Node 2<br/>MetalLB Speaker"]
-    S3["Node 3<br/>MetalLB Speaker"]
-    CIL["Cilium (eBPF)<br/>Service Load Balancing<br/>DNAT to Pod IP"]
+    subgraph N1["Node 1"]
+        S1["MetalLB Speaker"]
+        CIL1["Cilium (eBPF)<br/>DNAT to Pod IP"]
+    end
+    subgraph N2["Node 2"]
+        S2["MetalLB Speaker"]
+        CIL2["Cilium (eBPF)<br/>DNAT to Pod IP"]
+    end
+    subgraph N3["Node 3"]
+        S3["MetalLB Speaker"]
+        CIL3["Cilium (eBPF)<br/>DNAT to Pod IP"]
+    end
     POD["Target Pod<br/>(Pod CIDR)"]
 end

 CLIENT -->|"Traffic to external IP<br/>(e.g. 10.x.x.20)"| RTR
 RTR -->|"L2 (ARP) or BGP"| S1
 RTR -->|"L2 (ARP) or BGP"| S2
 RTR -->|"L2 (ARP) or BGP"| S3
-S1 --> CIL
-S2 --> CIL
-S3 --> CIL
-CIL --> POD
+S1 --> CIL1
+S2 --> CIL2
+S3 --> CIL3
+CIL1 --> POD
+CIL2 --> POD
+CIL3 --> POD
```
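For context on the MetalLB side of this flow, the external IP that the speakers announce comes from an address pool plus an advertisement object. A minimal hedged sketch of the L2 mode shown in the diagram (resource names and the address range are illustrative, not taken from the PR):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: external-pool            # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 10.100.0.20-10.100.0.50    # example external range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: external-l2              # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - external-pool
```

In L2 mode one elected speaker answers ARP for each assigned address; in BGP mode a `BGPAdvertisement` (plus `BGPPeer`) would replace the `L2Advertisement`, and all speakers announce the route to the upstream router, matching the diagram's "L2 (ARP) or BGP" edges.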
kvaps left a comment

Awesome. I'd also mention that we're also using a few more projects:

- https://github.com/cozystack/cozy-proxy/ is used to pass through IPs into virtual machines
- https://github.com/k8snetworkplumbingwg/multus-cni is used to attach secondary L2 interfaces to the virtual machines

Also I'd like to not publish a similar doc to v0, as it is not fully relevant.
| `kubeovn-cilium` | Kube-OVN + Cilium (default) | Talos Linux |
| `kubeovn-cilium-generic` | Kube-OVN + Cilium | kubeadm, k3s, RKE2 |
| `cilium` | Cilium only | Talos Linux |
| `cilium-generic` | Cilium only | kubeadm, k3s, RKE2 |
```diff
 | `cilium-generic` | Cilium only | kubeadm, k3s, RKE2 |
+| `cilium-kilo` | Cilium + Kilo | Talos Linux |
```
- Added cozy-proxy and Multus CNI to Overview table in the beginning of the article
- Added cilium-kilo string as proposed
- Deleted the article from v0 documentation
Force-pushed from ebd40e4 to 3d67bce.
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Ivan Stukov <stukov.ivan.time@gmail.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 64-73: The docs list a `cilium-kilo` variant but the prose only
covers Kube-OVN and "Cilium-only" variants; update the paragraph(s) after the
table to add a third paragraph specifically for `cilium-kilo` that explains
Kilo's role: state that `cilium-kilo` uses Cilium for CNI and service load
balancing while Kilo provides an encrypted WireGuard-based overlay mesh (secure
cross-node/cluster connectivity), how this differs from plain Cilium (adds
encrypted overlay and mesh routing rather than changing Cilium’s L3/L4
responsibilities), and adjust the existing "Cilium-only variants" sentence to
exclude `cilium-kilo` so it only refers to true Cilium-only variants. Reference
the `cilium-kilo` variant name, Kilo, and Cilium when making these edits.
- Around line 157-175: The sequenceDiagram currently shows R distributing
traffic to S1, S2, S3 but only shows DNAT on S1, making S2/S3 appear to drop
traffic; update the mermaid diagram (sequenceDiagram with participants S1, S2,
S3, R, CIL, P) to either (A) explicitly show S2->>CIL and S3->>CIL and
corresponding CIL->>P flows so all three node paths are represented, or (B)
collapse to a single representative path by keeping only S1->>CIL and CIL->>P
and add a clarifying note (e.g., Note over S1,S2,S3: Each node runs Cilium
(eBPF) performing DNAT) so readers understand S2/S3 are not dead-ends.
- Around line 383-394: The egress rule including toEntities: [kube-apiserver,
cluster] is ambiguous and may open cross-tenant paths; update the example in the
egress block (the egress: / toEntities: / toEndpoints: entries) to either remove
the cluster entity and only keep kube-apiserver if API-server access is
intended, or add a short explanatory paragraph immediately adjacent to the
policy showing what granting toEntities: cluster permits (host network and
non-Cilium-managed cluster IPs), why Cozystack’s clusterwide policies still
prevent cross-tenant pod-to-pod traffic, and how tenant isolation is preserved;
reference the
endpointSelector/ingress/egress/toEndpoints/toEntities/kube-apiserver/cluster
symbols when making the change.
---
Duplicate comments:
In `@content/en/docs/v1/networking/architecture.md`:
- Around line 348-361: The NOTE node is wrongly placed inside the US subgraph;
move the NOTE node definition out of the US subgraph and place it between the US
and KS blocks (or in an appropriate outer scope) so it visually sits between
User Space and Kernel Space; update the mermaid diagram by relocating the NOTE
declaration (the NOTE node identifier) outside the subgraph US while keeping its
label text intact and preserving the existing US -->|"all traffic"| KS edge.
- Around line 6-8: The two aliases in
content/en/docs/v1/networking/architecture.md currently point to the
applications section (aliases: /docs/v1/reference/applications/architecture and
/docs/reference/applications/architecture); update those alias values to
reference the networking section instead (e.g.,
/docs/v1/reference/networking/architecture and
/docs/reference/networking/architecture) so the aliases correctly map to the
networking/architecture page.
- Line 334: The sentence "All of this happens in kernel space in approximately
100 nanoseconds." makes an unsupported precise performance claim; update the
line in content/en/docs/v1/networking/architecture.md by either removing the
specific "~100 nanoseconds" number or qualifying it (e.g., "on the order of
micro- to nanoseconds") and add a citation or link to a benchmark or source that
supports the figure if you keep a numeric claim; locate and edit the exact
sentence to ensure the doc either cites evidence or uses conservative,
non-specific phrasing.
- Around line 289-303: The Mermaid diagram uses the node ID "TB" which collides
with Mermaid's built-in direction keyword "TB"; rename the node ID (for example
to "TenantB" or "TB_node") and update all references to it (the node declaration
and any edges like "TB --> ENGINE") so the diagram parses correctly; ensure the
displayed label remains "Tenant B — Namespace app-b<br/>Pods: 10.244.1.20,
10.244.1.21" and keep the other node IDs (TA, TC, ENGINE, ALLOW, DENY)
unchanged.
- Line 86: The sentence "Because IPs are not tied to node-specific CIDR blocks,
pods can be rescheduled to different nodes while retaining their addresses"
overstates Kube-OVN behavior; update that line to clarify that IP retention is
not guaranteed by default and depends on how pods are provisioned (e.g.,
StatefulSets, static IP annotations, or IP reservations in Kube-OVN). Replace
the original sentence with a concise note stating that while node-independent
CIDRs allow cross-node scheduling, retaining the same IP requires persistent
assignment mechanisms (mention "StatefulSet", "static IP annotation", or "IP
reservation" as options).
- Around line 93-114: The diagram currently shows a single centralized Cilium
node (CIL) inside CLUSTER which implies one Cilium instance; instead represent
Cilium as a per-node DaemonSet by moving the Cilium label into each node or by
creating per-node Cilium symbols (e.g., CIL1, CIL2, CIL3) and updating the
arrows from S1, S2, S3 to point to their local Cilium instances; update the CIL
label text ("Cilium (eBPF) Service Load Balancing DNAT to Pod IP") to be shown
per-node and ensure CIL/CIL1..CIL3 connect to POD as before so it reflects
per-node eBPF behavior.
```markdown
| `cilium-generic` | Cilium only               | kubeadm, k3s, RKE2 |
| `cilium-kilo`    | Cilium + Kilo             | Talos Linux        |
| `noop`           | None (bring your own CNI) | Any                |

In Kube-OVN variants, Cilium operates as a chained CNI (`generic-veth` mode):
Kube-OVN handles pod networking and IPAM, while Cilium provides service load
balancing, network policy enforcement, and optional observability via Hubble.

In Cilium-only variants, Cilium serves as both the CNI and the service load
balancer.
```
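For readers unfamiliar with chained-CNI mode, the Cilium side of such a setup is typically enabled through Helm values along these lines. This is a sketch only: `cni.chainingMode` and `cni.customConf` are standard Cilium Helm values, but Cozystack's actual chart configuration may differ.

```yaml
# Illustrative Cilium Helm values for generic-veth chaining (sketch).
# Kube-OVN remains the primary CNI and IPAM provider; Cilium is
# appended to the CNI conflist and attaches eBPF programs to the
# veth devices that Kube-OVN creates.
cni:
  chainingMode: generic-veth   # run Cilium as a chained plugin
  customConf: true             # do not overwrite the primary CNI's conflist
routingMode: native            # pod routing stays with Kube-OVN
enableIPv4Masquerade: false    # masquerading left to the primary CNI
```

In this mode Cilium does not allocate pod IPs or create interfaces; it only adds service load balancing, policy enforcement, and observability on top of what Kube-OVN provisions.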
`cilium-kilo` variant is listed but not described in the following prose.
The table introduces a cilium-kilo (Cilium + Kilo) variant, but lines 68–73 only describe Kube-OVN variants and "Cilium-only variants". The blanket statement "In Cilium-only variants, Cilium serves as both the CNI and the service load balancer" is inaccurate for cilium-kilo, where Kilo adds a WireGuard-based overlay on top of Cilium. Add a third paragraph explaining what Kilo provides and how it differs from plain Cilium.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@content/en/docs/v1/networking/architecture.md` around lines 64 - 73, The docs
list a `cilium-kilo` variant but the prose only covers Kube-OVN and
"Cilium-only" variants; update the paragraph(s) after the table to add a third
paragraph specifically for `cilium-kilo` that explains Kilo's role: state that
`cilium-kilo` uses Cilium for CNI and service load balancing while Kilo provides
an encrypted WireGuard-based overlay mesh (secure cross-node/cluster
connectivity), how this differs from plain Cilium (adds encrypted overlay and
mesh routing rather than changing Cilium’s L3/L4 responsibilities), and adjust
the existing "Cilium-only variants" sentence to exclude `cilium-kilo` so it only
refers to true Cilium-only variants. Reference the `cilium-kilo` variant name,
Kilo, and Cilium when making these edits.
```mermaid
sequenceDiagram
    participant S1 as Node 1 (Speaker)
    participant S2 as Node 2 (Speaker)
    participant S3 as Node 3 (Speaker)
    participant R as Upstream Router
    participant CIL as Cilium (eBPF)
    participant P as Pod

    S1->>R: BGP UPDATE: 10.x.x.20/32 via Node 1
    S2->>R: BGP UPDATE: 10.x.x.20/32 via Node 2
    S3->>R: BGP UPDATE: 10.x.x.20/32 via Node 3
    Note over R: ECMP: 3 next-hops for 10.x.x.20/32
    R->>S1: Traffic (1/3)
    R->>S2: Traffic (1/3)
    R->>S3: Traffic (1/3)
    S1->>CIL: Packet enters kernel
    CIL->>P: DNAT → Pod
```
BGP sequence diagram leaves S2 and S3 as dead-ends after ECMP distribution.
Lines 170–172 show the router distributing traffic 1/3 each to S1, S2, and S3. Then only `S1->>CIL: Packet enters kernel` and `CIL->>P: DNAT → Pod` are shown. S2 and S3 receive traffic but it never reaches CIL or Pod, making two-thirds of the traffic path appear to vanish.
Either show all three paths (each node has its own Cilium instance doing DNAT), or collapse to a representative single path with a note like `Note over S1,S3: Each node runs Cilium eBPF for DNAT`.
✏️ Suggested representative-path fix
```diff
  R->>S1: Traffic (1/3)
  R->>S2: Traffic (1/3)
  R->>S3: Traffic (1/3)
- S1->>CIL: Packet enters kernel
- CIL->>P: DNAT → Pod
+ Note over S1,S3: Each speaker forwards to its local<br/>Cilium eBPF instance for DNAT
+ S1->>CIL1: Packet enters kernel (Node 1)
+ CIL1->>P: DNAT → Pod
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@content/en/docs/v1/networking/architecture.md` around lines 157 - 175, The
sequenceDiagram currently shows R distributing traffic to S1, S2, S3 but only
shows DNAT on S1, making S2/S3 appear to drop traffic; update the mermaid
diagram (sequenceDiagram with participants S1, S2, S3, R, CIL, P) to either (A)
explicitly show S2->>CIL and S3->>CIL and corresponding CIL->>P flows so all
three node paths are represented, or (B) collapse to a single representative
path by keeping only S1->>CIL and CIL->>P and add a clarifying note (e.g., Note
over S1,S2,S3: Each node runs Cilium (eBPF) performing DNAT) so readers
understand S2/S3 are not dead-ends.
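As an aside, the per-flow pinning that makes this matter can be sketched in a few lines of Python. This is illustrative only: real routers use vendor-specific hash functions, and the IPs and node names here are made up.

```python
import hashlib

def ecmp_next_hop(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  next_hops: list[str]) -> str:
    """Pick a next-hop by hashing the flow 5-tuple, as an ECMP router does."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/tcp".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

nodes = ["node1", "node2", "node3"]

# A given flow is always pinned to the same speaker node...
assert ecmp_next_hop("203.0.113.7", 40000, "10.0.0.20", 443, nodes) == \
       ecmp_next_hop("203.0.113.7", 40000, "10.0.0.20", 443, nodes)

# ...but across many flows, traffic spreads over all next-hops, which is
# why every node needs its own local Cilium instance performing DNAT.
chosen = {ecmp_next_hop("203.0.113.7", port, "10.0.0.20", 443, nodes)
          for port in range(40000, 40100)}
print(sorted(chosen))
```

This is only a model of the hashing behavior; the takeaway for the diagram is that any of S1, S2, or S3 can receive a given flow, so each must be shown (or noted) as performing DNAT locally.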
```yaml
endpointSelector: {}
ingress:
  - fromEndpoints:
      - matchLabels:
          k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
egress:
  - toEndpoints:
      - matchLabels:
          k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
  - toEntities:
      - kube-apiserver
      - cluster
```
🧩 Analysis chain
🏁 Script executed:

```shell
# Search for CiliumClusterwideNetworkPolicy and CiliumNetworkPolicy files
rg -l "CiliumClusterwideNetworkPolicy\|CiliumNetworkPolicy" --type yaml
```

Repository: cozystack/website
Length of output: 43

🏁 Script executed:

```shell
# Search for any references to CiliumClusterwideNetworkPolicy in documentation
rg "CiliumClusterwideNetworkPolicy" --type markdown
# Also search for default-deny and isolation patterns
rg "default-deny|tenant.*isolation|cross-tenant" --type markdown
```

Repository: cozystack/website
Length of output: 2317

🏁 Script executed:

```shell
# Read the architecture.md file to understand the full context
wc -l content/en/docs/v1/networking/architecture.md
```

Repository: cozystack/website
Length of output: 110

🏁 Script executed:

```shell
# Read the architecture.md file, focusing on the area around lines 383-394
# and the sections explaining tenant isolation
sed -n '1,50p' content/en/docs/v1/networking/architecture.md
echo "---"
sed -n '300,394p' content/en/docs/v1/networking/architecture.md
echo "---"
sed -n '395,445p' content/en/docs/v1/networking/architecture.md
```

Repository: cozystack/website
Length of output: 6281
`toEntities: cluster` in egress requires clarification or removal.
The egress section has two independent rules (OR-semantics):
- `toEndpoints` matching the same-tenant label, which is correct.
- `toEntities: [kube-apiserver, cluster]`, which lacks justification.
While the documentation states that "Cozystack applies CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy resources automatically when a tenant is created" to "enforce namespace-level isolation," it does not explain the cluster-wide policy structure or whether default-deny rules mitigate the cluster entity.
Cilium's cluster identity can cover host network namespaces and non-Cilium-managed cluster IPs. Without explicit documentation of how cluster-wide policies prevent cross-tenant pod traffic through this entity, this egress rule risks creating an unintended cross-tenant path.
The example should either:
- Remove `cluster` and keep only `kube-apiserver` if API-server access is the sole intent, or
- Add explicit documentation explaining what cluster-level connectivity this grants, why it is safe given Cozystack's cluster-wide policy baseline, and how tenant isolation is maintained.
✏️ Suggested fix (API-server-only intent)
```diff
 egress:
   - toEndpoints:
       - matchLabels:
           k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
   - toEntities:
       - kube-apiserver
-      - cluster
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original (with `cluster`):

```yaml
endpointSelector: {}
ingress:
  - fromEndpoints:
      - matchLabels:
          k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
egress:
  - toEndpoints:
      - matchLabels:
          k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
  - toEntities:
      - kube-apiserver
      - cluster
```

Suggested (API-server access only):

```yaml
endpointSelector: {}
ingress:
  - fromEndpoints:
      - matchLabels:
          k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
egress:
  - toEndpoints:
      - matchLabels:
          k8s:io.cilium.k8s.namespace.labels.tenant.cozystack.io/tenant-example: ""
  - toEntities:
      - kube-apiserver
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@content/en/docs/v1/networking/architecture.md` around lines 383 - 394, The
egress rule including toEntities: [kube-apiserver, cluster] is ambiguous and may
open cross-tenant paths; update the example in the egress block (the egress: /
toEntities: / toEndpoints: entries) to either remove the cluster entity and only
keep kube-apiserver if API-server access is intended, or add a short explanatory
paragraph immediately adjacent to the policy showing what granting toEntities:
cluster permits (host network and non-Cilium-managed cluster IPs), why
Cozystack’s clusterwide policies still prevent cross-tenant pod-to-pod traffic,
and how tenant isolation is preserved; reference the
endpointSelector/ingress/egress/toEndpoints/toEntities/kube-apiserver/cluster
symbols when making the change.
Summary
Test plan
Summary by CodeRabbit