HYPERFLEET-633 - docs: Define release contract and integration testing strategy with GCP and ROSA teams #120
Walkthrough
A new documentation file defines the HyperFleet Release Contract and release handoff expectations with consumer teams (GCP Offering and ROSA Regional Platform). It lists required release artifacts (container images, OCI Helm charts, release notes, compatibility matrix, per-component/git tags), specifies a Helm chart distribution approach (short-term ROSA OCI registry, then Conflux publishing), defines Slack notification SLAs, states a roll-forward recovery posture with N-1 compatibility and targeted patch timelines, and documents nightly integration testing against main with ROSA overrides, test ownership, gaps, and mitigations.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/docs/release-contract.md`:
- Around line 108-111: Remove the contradictory "(exact times TBD)" qualifier
from the header "HyperFleet commits to (exact times TBD):" so the explicit SLAs
in the two bullets ("Producing a patch release within **48 hours** for
Blocker/Critical regressions" and "Producing a patch release within **1 week**
for Major regressions") are the single source of truth; update the header to
"HyperFleet commits to:" (or alternatively remove the bullets and keep the TBD
wording) so the document no longer contains both TBD and explicit times.
- Around line 134-142: The ownership matrix contradicts itself because the "GCP
integration" row is marked as "TBD" while the "Release gate" row requires "All
of the above must pass" before GA; update the document so the gate is operable
by either assigning a concrete owner/schedule to the "GCP integration" row
(replace "TBD (ref: GCP-334)" with the actual owner and cadence) or changing the
"Release gate" text to explicitly exclude items marked TBD (e.g., "All
applicable, non-TBD layers must pass"); edit the rows "GCP integration" and
"Release gate" in the table accordingly to remove the contradiction.
- Around line 56-59: Replace the "GCP Platform Architecture" TBD section with a
concise, concrete architecture description: under the "GCP Platform
Architecture" header enumerate the GCP projects/services used (e.g., VPCs,
GKE/clusters, Cloud SQL/Spanner, Pub/Sub, Cloud Storage, IAM roles), diagram key
integration points with Hyperfleet components, specify auth/identity boundaries
and network/firewall expectations, and list owner responsibilities for testing
and release (who deploys/configures each service and what tests gating
releases). Ensure the section names "GCP Platform Architecture" and any
referenced components (VPC, GKE, Cloud SQL, Pub/Sub, IAM) appear so downstream
teams can map responsibilities and tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 923fbbce-26b3-471a-8a72-4d2c5ef875aa
📒 Files selected for processing (1)
hyperfleet/docs/release-contract.md
HyperFleet commits to (exact times TBD):

- Producing a patch release within **48 hours** for Blocker/Critical regressions
- Producing a patch release within **1 week** for Major regressions
There was a problem hiding this comment.
Remove contradictory SLA wording in rollback commitments.
Line 108 says exact times are TBD, but Line 110-111 already set explicit times. Keep one source of truth to avoid interpretation drift.
Suggested edit
-HyperFleet commits to (exact times TBD):
+HyperFleet commits to:
There was a problem hiding this comment.
The times are just an example; they are yet to be confirmed.
There was a problem hiding this comment.
@rh-amarin, understood: the times are placeholders pending confirmation. In that case, a small tweak to make the provisional nature explicit inline could help avoid confusion, e.g.:
HyperFleet commits to (exact times TBD):
- Producing a patch release within **~48 hours** for Blocker/Critical regressions
- Producing a patch release within **~1 week** for Major regressions
The `~` prefix on the values signals they're approximate/provisional without requiring readers to re-read the header. That said, if the current phrasing is intentional and works for your audience, happy to leave it as-is.
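As an illustration of how the provisional windows in this thread could be kept in one place, here is a minimal Python sketch that derives a patch deadline from a severity-to-window table. The 48-hour and 1-week values are the unconfirmed placeholders quoted above; `PATCH_WINDOWS`, `patch_deadline`, and the sample report time are hypothetical and not part of HyperFleet.

```python
from datetime import datetime, timedelta, timezone

# Placeholder windows copied from the draft contract; exact times are still TBD.
PATCH_WINDOWS = {
    "blocker": timedelta(hours=48),
    "critical": timedelta(hours=48),
    "major": timedelta(weeks=1),
}


def patch_deadline(severity: str, reported_at: datetime) -> datetime:
    """Return the latest time a patch release should ship for a regression."""
    try:
        window = PATCH_WINDOWS[severity.lower()]
    except KeyError:
        raise ValueError(f"no patch window defined for severity {severity!r}") from None
    return reported_at + window


# Arbitrary example timestamp, used only to exercise the function.
reported = datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)
print(patch_deadline("Critical", reported).isoformat())
print(patch_deadline("Major", reported).isoformat())
```

Keeping the windows in a single table means that confirming the final numbers is a one-line change rather than an edit to prose in several places.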
| Layer | Owner | Scope | Runs on |
|-------|-------|-------|---------|
| Unit tests | HyperFleet | Each component in isolation | Every PR (presubmit) |
| Integration tests | HyperFleet | Cross-component API contracts | Every PR (presubmit) |
| HyperFleet E2E | HyperFleet | HyperFleet stack end-to-end | Nightly (main branch) |
| ROSA integration | ROSA Team | Full ROSA region + HyperFleet override | Nightly (HyperFleet main) |
| GCP integration | GCP Team | GCP deployment + HyperFleet | TBD (ref: GCP-334) |
| Release gate | HyperFleet | All of the above must pass | Before GA tag |
There was a problem hiding this comment.
Resolve the release-gate contradiction in the ownership matrix.
Line 140 marks GCP integration as TBD, while Line 141 requires all layers to pass before GA. This is internally inconsistent and makes GA criteria non-operable.
Suggested wording adjustment
-| GCP integration | GCP Team | GCP deployment + HyperFleet | TBD (ref: GCP-334) |
-| Release gate | HyperFleet | All of the above must pass | Before GA tag |
+| GCP integration | GCP Team | GCP deployment + HyperFleet | TBD (ref: GCP-334) |
+| Release gate | HyperFleet | All mandatory layers must pass; GCP integration is advisory until GCP-334 is complete | Before GA tag |
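One way to read the suggested wording is that each test layer is either mandatory or advisory, and only mandatory layers can block the GA tag. The following Python sketch shows that rule under stated assumptions: `LayerResult` and `release_gate` are hypothetical names, and treating a layer that has not run as "not passed" is an illustrative choice, not something the contract specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LayerResult:
    name: str
    passed: Optional[bool]  # None means the layer has not run (e.g. still TBD)
    mandatory: bool = True


def release_gate(results: list[LayerResult]) -> tuple[bool, list[str]]:
    """Return (gate_open, names of mandatory layers that have not passed)."""
    blocking = [r.name for r in results if r.mandatory and r.passed is not True]
    return (not blocking, blocking)


results = [
    LayerResult("Unit tests", True),
    LayerResult("Integration tests", True),
    LayerResult("HyperFleet E2E", True),
    LayerResult("ROSA integration", True),
    # Advisory until GCP-334 is complete, so a missing result does not block GA.
    LayerResult("GCP integration", None, mandatory=False),
]
gate_open, blocking = release_gate(results)
print(gate_open, blocking)
```

With the original "all of the above must pass" wording, the GCP row would be mandatory and the same input would keep the gate closed, which is the contradiction the comment points out.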
|----------|----------|--------|-------|
| Container images | `quay.io/openshift-hyperfleet/hyperfleet-{component}:{version}` | OCI image | Built automatically by Prow on GA tag |
| Helm charts | OCI registry (see [Helm Chart Distribution](#helm-chart-distribution)) | OCI artifact | Required for ROSA/Argo CD consumption |
| Release notes | `hyperfleet-release` repo, `releases/release-X.Y/` | Markdown | Compatibility matrix, breaking changes, upgrade guide |
There was a problem hiding this comment.
After our last discussion, I’m wondering whether the hyperfleet-release repo is intended to include breaking changes and upgrade guides. Or would it make more sense to keep these within each component to avoid duplication across multiple places?
There was a problem hiding this comment.
IMO content should go into the components repository.
But for the hyperfleet-release I think we can keep just a compatibility matrix that says
- HyperFleet v0.2.0
  - consist of:
    - hyperfleet-api v0.2.0
    - hyperfleet-sentinel v0.2.0
    - hyperfleet-adapter v0.2.0
  - tested on:
    - rosa v0.0.1
    - gcp v0.0.2
- consist of:
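The matrix outlined in this comment could be kept as structured data and rendered to Markdown, rather than hand-edited. Here is a minimal Python sketch of that idea; `CompatibilityEntry` and `render` are hypothetical names, and the versions are the illustrative ones from the comment, not real releases.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CompatibilityEntry:
    release: str                       # HyperFleet release, e.g. "v0.2.0"
    components: dict[str, str]         # component name -> component version
    tested_on: dict[str, str] = field(default_factory=dict)  # platform -> version


def render(entry: CompatibilityEntry) -> str:
    """Render one entry as a nested Markdown list."""
    lines = [f"- HyperFleet {entry.release}", "  - consists of:"]
    lines += [f"    - {name} {version}" for name, version in entry.components.items()]
    lines.append("  - tested on:")
    lines += [f"    - {name} {version}" for name, version in entry.tested_on.items()]
    return "\n".join(lines)


entry = CompatibilityEntry(
    release="v0.2.0",
    components={
        "hyperfleet-api": "v0.2.0",
        "hyperfleet-sentinel": "v0.2.0",
        "hyperfleet-adapter": "v0.2.0",
    },
    tested_on={"rosa": "v0.0.1", "gcp": "v0.0.2"},
)
print(render(entry))
```

Breaking-change details and upgrade guides stay in the component repositories, as proposed above; the entry only records which versions belong together and what they were tested against.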
| Git tags | Per-component repos + `hyperfleet-release` | `vX.Y.Z` / `release-X.Y` | See [Release Process](hyperfleet-release-process.md) |

When a GA release is published, it will have detail of which ROSA/GCP versions have passed the integration tests to use as compatibility matrix. This allows to potentially introduce a breaking change in one release, that may be only deployable by another pillar.
There was a problem hiding this comment.
How should we obtain ROSA/GCP version information? Do we need to manually check with those teams for every Hyperfleet release?
There was a problem hiding this comment.
ROSA/GCP team will provide a way to run their pipelines, with a specific stable version of their solution for testing ours.
So we will take the version running there for the compatibility matrix
There was a problem hiding this comment.
Question: will they have a 'version', or will it just be a point in time? This is something we struggled with in OCM: we have no 'version' for our environments, just SHAs at specific points in time.
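If the consumer platforms turn out to have no version numbers, the "tested on" entry could fall back to a commit SHA plus the date of the run. The Python sketch below illustrates that fallback; `PlatformSnapshot`, its fields, and the sample SHA are hypothetical, and nothing here reflects how the ROSA or GCP teams actually identify their environments.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PlatformSnapshot:
    name: str                      # e.g. "rosa" or "gcp"
    commit_sha: str                # revision deployed when the tests ran
    tested_at: datetime            # when the integration run finished
    version: Optional[str] = None  # only set if the platform publishes versions

    def label(self) -> str:
        """Prefer a published version; otherwise fall back to SHA and date."""
        if self.version:
            return f"{self.name} {self.version}"
        return f"{self.name} @{self.commit_sha[:7]} ({self.tested_at:%Y-%m-%d})"


run_time = datetime(2026, 4, 1, 2, 30, tzinfo=timezone.utc)  # arbitrary example
versioned = PlatformSnapshot("rosa", "0123456789abcdef", run_time, version="v0.0.1")
unversioned = PlatformSnapshot("gcp", "fedcba9876543210", run_time)
print(versioned.label())
print(unversioned.label())
```

Recording the SHA in both cases keeps the matrix reproducible even if a version tag is later moved or reused.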
**Agreed path**:

1. **Short-term (Q2 2026)**: ROSA team sets up a temporary OCI registry to publish HyperFleet Helm charts. This unblocks integration testing immediately.
2. **Q2 target**: HyperFleet team publishes charts to an OCI-compliant registry via Conflux as part of the release pipeline, eliminating the temporary workaround and the Git coupling.
There was a problem hiding this comment.
I think you mean Konflux not Conflux.
|-------|---------|----------|------------|
| Release candidate available | `#hyperfleet-releases` Slack | RC cut day | GCP team, ROSA team |
| GA release published | `#hyperfleet-releases` Slack | GA day | GCP team, ROSA team |
| Breaking change in next release | `#hyperfleet-releases` Slack | ≥ 1 sprint before GA | GCP team, ROSA team |
There was a problem hiding this comment.
Do you mean notifying about breaking changes via Slack at a summary level? Will the notifications link to each component repo for the breaking-change details?
There was a problem hiding this comment.
Yes, the detail will live in the component repos; no need to duplicate it.
But this way we have a trace of when we communicated these breaking changes to our consumers.
| Hotfix / patch release | `#hyperfleet-releases` Slack | Within 2 hours of GA tag | GCP team, ROSA team |

At this point in time (April 26) breaking changes are not blockers to HyperFleet releases as ROSA/GCP teams do not have to keep long running clusters and migrate data.
There was a problem hiding this comment.
Have we already aligned on this date April 26?
### Rollback / Recovery

HyperFleet uses a **roll-forward** strategy for MVP: issues are fixed via patch releases rather than rollback. See [Release Process - Release Recovery Strategy](hyperfleet-release-process.md#55-release-recovery-strategy).
There was a problem hiding this comment.
Do we plan to continue using a roll-forward strategy for a period after MVP, as @ciaranRoche mentioned? I’ll also update the release process doc accordingly.
There was a problem hiding this comment.
Yes, I think we should maintain a roll-forward strategy; we can confirm at the next office hours to ensure this aligns with their expectations.
- Start with **nightly runs** against HyperFleet `main` branch, not presubmit jobs
- Test against the **latest known-good stable version** of the ROSA regional platform (production Maestro version), replacing only the HyperFleet component under test
- The ROSA team will **temporarily enable OCI chart pushing** so the HyperFleet team can inject PR-built charts into the ROSA deployment pipeline
- Evaluate **non-blocking presubmit** integration with the HyperFleet release repository as a follow-up
There was a problem hiding this comment.
Would it be better to add similar prow jobs—like the E2E testing Prow job—as a standard requirement and include them in the release X.Y checklist?
| Gap | Owning Team | Mitigation |
|-----|-------------|------------|
| HyperFleet not yet onboarded to ROSA's pre-merge E2E mechanism | HyperFleet | Onboard to `openshift/release` Prow config + create `quay.io/rrp-dev-ci/` image repos (see onboarding steps above) |
| Helm chart override (OCI) not yet wired into ROSA CI | ROSA + HyperFleet | Temporary OCI setup by ROSA team (Q2 2026, immediate action); replaced by Conflux Q2 target |

| Unit tests | HyperFleet | Each component in isolation | Every PR (presubmit) |
| Integration tests | HyperFleet | Cross-component API contracts | Every PR (presubmit) |
| HyperFleet E2E | HyperFleet | HyperFleet stack end-to-end | Nightly (main branch) |
| ROSA integration | ROSA Team | Full ROSA region + HyperFleet override | Nightly (HyperFleet main) |
There was a problem hiding this comment.
Who is responsible for updating the nightly integration test configuration when there are breaking changes? And how are they notified?
The ROSA regional platform consumes HyperFleet as part of a GitOps deployment pipeline. Each deployment initiates three pipelines:
| ```mermaid |
There was a problem hiding this comment.
When I see you add a mermaid diagram it warms my heart 😆
Environment configuration is centralized in a `default` file that acts as the bill of materials for Argo CD reconciliation. Component versions, Git revisions, and domain names are defined there and can be overridden per environment.

### GCP Platform Architecture
There was a problem hiding this comment.
Should we include this? 🤔 Just thinking it might be hard to keep up to date; it would be good if we could link to their docs instead, if they have them.
| Artifact | Location | Format | Notes |
|----------|----------|--------|-------|
| Container images | `quay.io/openshift-hyperfleet/hyperfleet-{component}:{version}` | OCI image | Built automatically by Prow on GA tag |
There was a problem hiding this comment.
just leaving a comment for myself as a reminder we need to create a ticket post Konflux onboarding to update all references to the newest image registry
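A registry move of this kind is easier to absorb if image references are built from a single template instead of being spelled out in each place. The Python sketch below shows that pattern; the template string is the one from the artifact table, while `IMAGE_TEMPLATE`, `image_ref`, the `registry.example.com` host, and the `v`-prefixed tag format are assumptions for illustration only.

```python
# Template taken from the release contract's artifact table.
IMAGE_TEMPLATE = "quay.io/openshift-hyperfleet/hyperfleet-{component}:{version}"


def image_ref(component: str, version: str, template: str = IMAGE_TEMPLATE) -> str:
    """Build a container image reference for one HyperFleet component."""
    if not component or not version:
        raise ValueError("component and version must be non-empty")
    return template.format(component=component, version=version)


# Component names as listed in the compatibility-matrix discussion.
for component in ("api", "sentinel", "adapter"):
    print(image_ref(component, "v0.2.0"))

# After a registry change, only the template needs to be swapped.
# The host below is a made-up placeholder, not the post-Konflux registry.
new_template = "registry.example.com/hyperfleet/hyperfleet-{component}:{version}"
print(image_ref("api", "v0.2.0", template=new_template))
```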
…n testing strategy
7538ca9 to 25e7b50
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
hyperfleet/docs/release-contract.md (2)
134-137: ⚠️ Potential issue | 🟠 Major
Resolve rollback SLA contradiction.
Line 134 says “exact times TBD”, but Lines 136-137 already define explicit timelines. Keep one source of truth so this contract is unambiguous.
Suggested edit
-HyperFleet commits to (exact times TBD):
+HyperFleet commits to:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/docs/release-contract.md` around lines 134 - 137, The release-contract currently contains conflicting statements: the heading "HyperFleet commits to (exact times TBD):" contradicts the explicit timelines in the bullets; update the contract so there is a single source of truth by removing or replacing the "(exact times TBD)" text in the header and clarifying that the bullets define the SLA (e.g., change to "HyperFleet commits to the following timelines:"), ensuring the header and the bullet items (the 48-hour and 1-week patch-release commitments) are consistent and unambiguous.
167-177: ⚠️ Potential issue | 🟠 Major
Make the release gate operable with currently acknowledged coverage gaps.
Line 167 requires all layers to pass before GA, but Lines 175-177 document known unresolved gaps (notably GCP chart-override limitations and multi-component validation limits). Add an explicit exception/waiver rule, otherwise the gate is not consistently enforceable.
Suggested wording adjustment
-| Release gate | HyperFleet | All of the above must pass | Before GA tag |
+| Release gate | HyperFleet | All mandatory, currently implemented layers must pass; unresolved gaps require explicit release waiver from HyperFleet + affected consumer team | Before GA tag |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hyperfleet/docs/release-contract.md` around lines 167 - 177, Update the "Release gate" entry so it is operable despite known gaps by adding an explicit exception/waiver rule that ties to the "Testing Gaps Identified" section: change the Pass criterion text from "All of the above must pass" to include a clause like "All of the above must pass, except for documented, tracked gaps listed under 'Testing Gaps Identified' which may be waived by an approved exception"; add a short waiver process description (who can approve, required tracking ID/issue, and required mitigation timeline) and reference the existing gap items (e.g., the GCP chart-override and multi-component PR testing gaps) so the gate enforcement logic and readers can consistently apply the exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hyperfleet/docs/release-contract.md`:
- Line 127: The sentence "At this point in time (April 26) breaking changes are
not blockers to HyperFleet releases..." contains a hardcoded calendar date that
will go stale; replace the date with a versioned or status-based qualifier
(e.g., "as of release X.Y" or "currently" / "until further notice") and update
the sentence to read something like "Currently, breaking changes are not
blockers to HyperFleet releases; ROSA/GCP teams are not required to maintain
long‑running clusters or migrate data" so the policy remains evergreen—locate
the sentence by searching for the phrase "At this point in time (April 26)" in
release-contract.md and remove the explicit date, substituting a version/status
token.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 76710a70-aa4f-4b57-8851-ed9092ce43a6
📒 Files selected for processing (1)
hyperfleet/docs/release-contract.md
| Hotfix / patch release | `#hyperfleet-releases` Slack | Within 2 hours of GA tag | GCP team, ROSA team |

At this point in time (April 26) breaking changes are not blockers to HyperFleet releases as ROSA/GCP teams do not have to keep long running clusters and migrate data.
There was a problem hiding this comment.
Remove hardcoded date from normative policy text.
Line 127 embeds a point-in-time statement (April 26) in a rule section. This will go stale and can create policy ambiguity. Prefer a versioned/status-based qualifier instead of a calendar date in the sentence.
Suggested edit
-At this point in time (April 26) breaking changes are not blockers to HyperFleet releases as ROSA/GCP teams do not have to keep long running clusters and migrate data.
+For the current MVP phase, breaking changes are not blockers to HyperFleet releases, since ROSA/GCP teams do not maintain long-running clusters with data migration requirements.
🧰 Tools
🪛 LanguageTool
[grammar] ~127-~127: Use a hyphen to join words.
Context: ... ROSA/GCP teams do not have to keep long running clusters and migrate data. ###...
(QB_NEW_EN_HYPHEN)
https://redhat.atlassian.net/browse/HYPERFLEET-633
This is a first version of a contract for releasing HyperFleet for consumption by the ROSA and GCP teams.
Summary by CodeRabbit