This guide shows how to wire AgentOps into a complete GenAIOps CI/CD
pipeline on GitHub Actions, mapped to a classic GitFlow branching model
with three deployment environments (dev, qa, production).
Start with agentops workflow analyze. It reads the repo and recommends both
deployment wiring (azd, prompt-agent, or placeholder) and the eval runner.
Generate the PR gate first: agentops workflow generate --kinds pr. Add
DEV/QA/PROD after GitHub Environments and Azure OIDC are ready. Repos with
azure.yaml use azd-backed deploys; Foundry prompt agents can use
prompt-agent deploys and AgentOps cloud eval in Foundry when the dataset is
compatible.
The default scaffold ships four release-path templates. A scheduled Doctor
workflow is available separately when you explicitly generate --kinds doctor.
| File | Trigger | GitHub Environment | Purpose |
|---|---|---|---|
| `agentops-pr.yml` | PRs to `develop`, `release/**`, `main` | `dev` | Eval gate + Doctor gate (default blocks on critical findings; configurable via `--doctor-gate`) + PR comment |
| `agentops-deploy-dev.yml` | push to `develop` | `dev` | Eval → build → deploy DEV |
| `agentops-deploy-qa.yml` | push to `release/**` | `qa` | Eval → build → deploy QA |
| `agentops-deploy-prod.yml` | push to `main` | `production` | Safety eval → evidence → build → deploy PROD |
| `agentops-doctor.yml` | daily cron | `dev` | Optional scheduled Doctor + release evidence |
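
The push triggers in the table can be read as a small routing table. The sketch below restates that mapping in shell; the function name `agentops_route` is hypothetical and is not an AgentOps command.

```shell
# Hypothetical helper: map a pushed branch to "<workflow file> <environment>".
# It only restates the push triggers from the table; AgentOps does not ship it.
agentops_route() {
  case "$1" in
    develop)   echo "agentops-deploy-dev.yml dev" ;;
    release/*) echo "agentops-deploy-qa.yml qa" ;;
    main)      echo "agentops-deploy-prod.yml production" ;;
    *)         echo "none none" ;;  # other branches only reach the PR gate
  esac
}

agentops_route "release/1.4"   # prints: agentops-deploy-qa.yml qa
```

Feature branches fall through to `none none` because they only reach an environment through a PR into `develop`.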
flowchart LR
feat["feature/*"] -->|PR| prGate1{{"agentops-pr.yml<br/>(gate)"}}
prGate1 -->|merge| dev["develop"]
dev --> deployDev["agentops-deploy-dev.yml"]
deployDev --> DEV(["DEV"])
rel["release/*"] -->|push| deployQa["agentops-deploy-qa.yml"]
deployQa --> QA(["QA"])
rel -->|PR| prGate2{{"agentops-pr.yml<br/>(gate)"}}
prGate2 -->|merge| main["main"]
main --> deployProd["agentops-deploy-prod.yml"]
deployProd --> PROD(["PROD<br/>(required reviewers)"])
classDef gate fill:#fff3cd,stroke:#856404,color:#000;
classDef env fill:#d1ecf1,stroke:#0c5460,color:#000;
class prGate1,prGate2 gate;
class DEV,QA,PROD env;
If you are on trunk-based development, generate only the templates you
need: agentops workflow generate --kinds pr,dev,prod.
# 1. Analyze and fix eval setup before the first blocking run.
agentops eval analyze
# 2. Make sure your eval works locally first.
agentops eval run
# 3. Analyze the repo shape before generating workflows.
agentops workflow analyze
# 4. Generate the PR gate first.
agentops workflow generate --kinds pr
# 5. Configure GitHub (see sections below):
# - OIDC repo variables
# - dev environment
# - branch protection on develop and main
# 6. Commit and push the PR gate.
# 7. Only after deploy wiring is real, generate the full scaffold.
# auto uses azd when azure.yaml exists, or prompt-agent when agentops.yaml
# targets a Foundry prompt agent (name:version).
agentops workflow generate --kinds pr,dev,qa,prod --deploy-mode auto --force

The GitHub setup spans repository creation, Azure OIDC, Actions variables, GitHub Environments, and branch protection. For a smoother first run, install the AgentOps workflow skill and hand this setup to Copilot:
agentops skills install --platform copilot --force

Then open Copilot and run `/skills`. Confirm `agentops-workflow` is loaded
before continuing.
When the skill is loaded, ask Copilot:
Use the AgentOps workflow skill to get the generated AgentOps GitHub Actions
workflows running end to end.
This may be a new folder with no Git repo or GitHub remote yet. Create or
connect the GitHub repo if needed, wire Azure OIDC and required Actions
variables, verify the OIDC principal has Foundry User access, create only the
environments used by the generated workflows, show me the plan before changing
GitHub or Azure, and call out anything that needs owner/admin permission.
In Settings → Secrets and variables → Actions → Variables, add:
| Variable | Purpose |
|---|---|
| `AZURE_CLIENT_ID` | App registration / managed identity used for federated login |
| `AZURE_TENANT_ID` | Azure AD tenant |
| `AZURE_SUBSCRIPTION_ID` | Target subscription |
| `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` | Foundry project URL (used by the eval step) |
| `AZURE_OPENAI_DEPLOYMENT` | Model deployment used by local evaluators and AgentOps cloud eval judges |
| `APPLICATIONINSIGHTS_CONNECTION_STRING` | Optional fallback when the Foundry project's App Insights connection cannot be auto-discovered |
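
If you prefer the GitHub CLI to the Settings UI, the variables can also be created with `gh variable set`. The following is a minimal sketch, assuming `gh` is installed and authenticated. The wrapper name `set_actions_variable` and the `DRY_RUN` switch are illustrative, and every value shown is a placeholder.

```shell
# Sketch: create a repository-level variable, or an environment-level one
# when a third argument is given. DRY_RUN defaults to 1, which only prints
# the gh command; export DRY_RUN=0 to actually run it.
set_actions_variable() (
  name="$1"; value="$2"; env_name="${3:-}"
  set -- gh variable set "$name" --body "$value"
  if [ -n "$env_name" ]; then
    set -- "$@" --env "$env_name"
  fi
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
)

# Placeholder value; replace with your own client id.
set_actions_variable AZURE_CLIENT_ID "00000000-0000-0000-0000-000000000000"
```

Passing an environment name as the third argument writes the variable at the environment level instead of the repository level.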
Then on the Azure side, configure Workload Identity Federation (federated credentials) on the app registration so it can be assumed from GitHub Actions runs. See Microsoft's WIF docs.
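
As a rough illustration of the federated credential, the sketch below builds the JSON body for one GitHub Environment. It assumes the generated jobs declare `environment:`, in which case GitHub issues tokens whose subject is `repo:<owner>/<repo>:environment:<name>`. The function name, the credential name `github-<env>`, and `my-org/my-repo` are placeholders, not values from AgentOps.

```shell
# Sketch: JSON parameters for one GitHub Actions federated credential.
# Assumption: the workflow job sets `environment: <env_name>`.
federated_credential_json() {
  # $1 = owner/repo (placeholder), $2 = GitHub Environment name
  printf '{"name":"github-%s","issuer":"https://token.actions.githubusercontent.com","subject":"repo:%s:environment:%s","audiences":["api://AzureADTokenExchange"]}\n' \
    "$2" "$1" "$2"
}

# Example use (not executed here):
#   az ad app federated-credential create --id "$AZURE_CLIENT_ID" \
#     --parameters "$(federated_credential_json my-org/my-repo dev)"
federated_credential_json my-org/my-repo dev
```

Under that assumption you would create one credential per environment the workflows use (`dev`, `qa`, `production`).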
For Foundry prompt-agent gates, the same app registration / service principal
needs two Azure RBAC roles before the first workflow run. Both are required
and the eval step fails silently (every metric returns null) if only one is
in place:
- Foundry User on the Foundry project or Foundry resource. Azure `Reader` is not enough because the eval step calls Foundry data-plane APIs such as `Microsoft.CognitiveServices/accounts/AIServices/agents/read`.
- Cognitive Services OpenAI User on the underlying Azure AI Services account that hosts the evaluator model deployment. Foundry `azure_ai_evaluator` graders impersonate the OIDC principal to call OpenAI; without this role they fail with a 401 `PermissionDenied` on `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` and every metric returns `null` in the cloud eval report. AgentOps lifts that error into `results.json` and the orchestrator's "0 usable metric scores" warning so you can see the cause in CI logs, but the workflow still fails the gate.

The role ids are `53ca6127-db72-4b80-b1b0-d745d6d5456d` (Foundry User) and `5e0bd9bd-7b93-4f28-af87-19fc36ad61bd` (Cognitive Services OpenAI User).
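
The two assignments can be scripted with `az role assignment create`. The sketch below only prints the commands so you can review the scopes first. The function name is hypothetical and the three arguments are placeholders you must supply yourself.

```shell
# Role definition ids quoted in the text above.
FOUNDRY_USER_ROLE_ID="53ca6127-db72-4b80-b1b0-d745d6d5456d"
OPENAI_USER_ROLE_ID="5e0bd9bd-7b93-4f28-af87-19fc36ad61bd"

# Print (do not run) both role assignments for the OIDC principal.
#   $1 = object id of the service principal behind AZURE_CLIENT_ID
#   $2 = resource id of the Foundry project or Foundry resource
#   $3 = resource id of the Azure AI Services account hosting the evaluator model
print_role_assignments() {
  echo "az role assignment create --assignee-object-id $1 --assignee-principal-type ServicePrincipal --role $FOUNDRY_USER_ROLE_ID --scope $2"
  echo "az role assignment create --assignee-object-id $1 --assignee-principal-type ServicePrincipal --role $OPENAI_USER_ROLE_ID --scope $3"
}
```

Review the printed commands, then run them from an account that is allowed to assign roles at those scopes.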
The generated eval and doctor workflows install AgentOps telemetry support.
When AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is set, AgentOps first tries to
auto-discover the Foundry project's Application Insights resource. If that
is not available in your tenant, set APPLICATIONINSIGHTS_CONNECTION_STRING
as either a repository/environment variable or a secret. CI eval runs then
emit agentops.eval.* spans, and scheduled Doctor runs emit
agentops.agent.finding.* spans that the Cockpit can deep-link into Azure
Monitor Logs.
In Settings → Environments, create three:
- `dev`
  - Usually no protection rules.
  - Override env-specific variables here (e.g. dev resource group, dev ACA app name).
- `qa`
  - Optional: restrict deployment branches to `release/**`.
  - Override env-specific variables for QA infra.
- `production`
  - Required reviewers: at least one. Deploys to PROD pause until approved.
  - Optional: Wait timer for an extra cool-down.
  - Optional: Deployment branches: restrict to `main`.
  - Override env-specific variables for production infra.
Environment-level variables override repo-level ones automatically
when the workflow's environment: matches.
Before making eval a required PR/deploy gate, run:
agentops eval analyze --format markdown

This command is read-only and local-only. It checks whether agentops.yaml,
the target kind, and the dataset columns are ready for agentops eval run. If
the project looks like a RAG app, tool-using agent, HTTP/containerized app, or
other accelerator where deterministic inference is not enough, it recommends
using Copilot with agentops-config, agentops-dataset, and/or
agentops-eval before the first blocking run. When skills are missing from the
repo, the output includes the install command and a copy/paste Copilot handoff
prompt.
AgentOps is azd-first for deployment: AgentOps runs the evaluation gate,
while Azure Developer CLI owns infrastructure, packaging, deployment, and
hooks declared in azure.yaml.
Before choosing manually, run:
agentops workflow analyze --format markdown

The analyzer is read-only and local-only. It looks for azure.yaml, Bicep
files, agentops.yaml, Foundry prompt-agent shape, source-controlled prompt
files, landing-zone manifests, private-network terms, Docker/Container Apps
signals, and existing CI folders. README matches such as GPT-RAG, Live Voice, or
AI Landing Zone are treated as hints; structural files drive the recommendation.
workflow generate --deploy-mode auto uses the same recommendation, so the
analysis and generated templates do not drift. The analyzer also reports the
eval runner: AgentOps cloud eval in Foundry for compatible Foundry prompt
agents, otherwise AgentOps local eval. If you omit --deploy-mode, the default
is auto; the command output prints the selected effective mode, for example
azd (auto default) or placeholder (auto default).
Use one of these modes:
| Mode | When to use it |
|---|---|
| `--deploy-mode auto` | Pick azd, prompt-agent, or placeholders from repo signals. |
| `--deploy-mode azd` | Use `azd provision` / `azd deploy` templates. |
| `--deploy-mode prompt-agent` | Create, evaluate, and record a Foundry prompt-agent candidate. |
| `--deploy-mode placeholder` | Keep stack-agnostic build/deploy placeholders. |
For azd-managed repos:
agentops workflow generate --kinds pr,dev,qa,prod --deploy-mode azd --force

The generated deploy workflows:

- install `azd`;
- run `azd env new ... || azd env select ...` on each CI runner;
- run `azd provision --no-prompt` for DEV by default;
- run `azd provision --no-prompt` for QA/PROD only when manually requested;
- run the selected eval runner as the quality/safety gate;
- run `azd env refresh` on the deploy runner;
- run `azd deploy --no-prompt`.
For production deploys, generated templates also run
agentops doctor --evidence-pack after the eval gate and upload
.agentops/release/latest/evidence.json plus evidence.md. Warnings do not
change the exit-code contract; critical Doctor findings block because the
production templates run with --severity-fail critical.
Set AZURE_ENV_NAME per GitHub Environment if your azd env names differ
from dev, qa, and production. Set AZURE_LOCATION when the azd
template needs an explicit region.
When azure.yaml is missing or --deploy-mode placeholder is selected,
each agentops-deploy-*.yml ships with Build (placeholder) and
Deploy (placeholder) steps. Prefer creating an azd deployment first; if
that is not possible, replace the placeholders with project-specific
commands.
For the simplest Foundry prompt-agent workflow, keep the instructions in
source control and point agentops.yaml at them:
version: 1
agent: "quickstart-agent:2"
dataset: .agentops/data/smoke.jsonl
execution: cloud
prompt_file: .agentops/prompts/agent-instructions.md

Then generate prompt-agent deploy workflows:

agentops workflow generate --kinds pr,dev,qa,prod --deploy-mode prompt-agent --force

Each deploy workflow does this:

- stages a candidate Foundry prompt-agent version from `prompt_file`;
- writes `.agentops/deployments/agentops.candidate.yaml` pointing at the candidate `name:version`;
- runs `agentops eval run` against that candidate version, using Foundry cloud eval when supported or the local runner as the fallback;
- runs `agentops doctor --evidence-pack` so the exact candidate has release evidence;
- records `.agentops/deployments/foundry-agent.json` as a CI artifact only after the gate passes.
This keeps the invariant clear: the evaluated agent version is the deployed agent version. Foundry manages the candidate agent versions; AgentOps records the normalized results and always supplies the repo-side gate, deployment record, and Cockpit visibility.
Legacy workflows that explicitly use the Microsoft Foundry eval Action/task can still be regenerated by older AgentOps versions, but new prompt-agent gates use AgentOps cloud eval so threshold failures produce normalized PR evidence.
# Example: Azure Container Apps
# Build
- name: Build image
  run: |
    az acr build \
      --registry "${{ vars.ACR_NAME }}" \
      --image "myapp:${{ github.sha }}" \
      .
# Deploy
- name: Deploy to ACA
  run: |
    az containerapp update \
      --name "${{ vars.ACA_APP_NAME }}" \
      --resource-group "${{ vars.AZURE_RESOURCE_GROUP }}" \
      --image "${{ vars.ACR_NAME }}.azurecr.io/myapp:${{ github.sha }}"

# Example: Python web app deployed with azure/webapps-deploy
# Build
- uses: actions/setup-python@v6
  with: { python-version: "3.11" }
- run: pip install -r requirements.txt -t ./dist
- run: cp -r src ./dist/
# Deploy
- uses: azure/webapps-deploy@v3
  with:
    app-name: ${{ vars.WEBAPP_NAME }}
    package: ./dist

# Example: Foundry hosted agent
# Build is typically empty: hosted agents are configured, not packaged.
# Deploy: publish a new agent version with whatever your project uses
# to manage Foundry agents (project-specific tooling).

If you ask a coding agent to generate a zero-trust deployment, have it
create or adapt azure.yaml, infra/, and azd-native hooks such as
preprovision, postprovision, predeploy, and postdeploy. Do not
wire ad-hoc hook scripts directly into AgentOps workflows. After the azd
path is valid locally, regenerate the workflows with
--deploy-mode azd.
For copied accelerators such as GPT-RAG, Live Voice Practice, or apps based on the Azure AI Landing Zone pattern, use AgentOps to turn the deployment path into actionable readiness: landing-zone preflight, azd/Bicep workflow stages, Doctor checks, eval gates, and post-deploy evidence.
agentops workflow analyze --format markdown --out agentops-workflow-plan.md

Use the output as the plan for your coding agent:

- AgentOps owns repo-side eval gates, Doctor readiness checks, artifacts, and Cockpit visibility.
- `azd` owns `provision`, `deploy`, and hooks for app/infra lifecycle when `azure.yaml` is present or can be added.
- Foundry owns hosted agents, evaluations, traces, and operations.
- Project-specific steps such as indexing data, seeding search, building containers, updating app config, or running private-network post-provision work stay in the accelerator's azd hooks or existing deployment tooling.
When scripts/Invoke-PreflightChecks.ps1 is present, generated azd deploy
workflows run it with -Strict before azd provision. Doctor also reports
AI Landing Zone deployment readiness in the Operational Excellence findings,
including whether the preflight script, agentops.yaml, azd deploy workflow,
network isolation, and private-runner path are ready.
If the analyzer reports network isolation, private endpoints, jumpbox/Bastion, Azure Firewall, or ACR Tasks signals, plan where private data-plane work runs before making deployment automatic. GitHub-hosted runners usually cannot reach private endpoints; use a self-hosted runner in the VNet, a jumpbox handoff, or an ACR Tasks agent pool depending on the accelerator.
In Settings → Branches, add a rule for both develop and main:
- ✅ Require a pull request before merging.
- ✅ Require status checks to pass: select `AgentOps PR / Eval (PR gate)`.
- (Optional) Require linear history.
This makes the AgentOps eval a hard merge requirement.
The PR workflow uses the eval step as the hard merge gate. It still runs
Doctor and uploads evidence.json / evidence.md, but that PR-stage Doctor
evidence is advisory: it can say Release readiness: blocked without failing
the PR. Treat that as release-review guidance. Production deploy workflows still
run Doctor with a critical finding gate.
The GitHub run summary includes the same evidence.md content, including the
Doctor finding summary. When a release gate blocks, start there: the summary
lists the critical and warning finding IDs, categories, and titles before you
open the full artifact.
When agentops-local is selected, the eval step uses the AgentOps exit code
contract to gate deploys:
| Exit code | Meaning | Job result |
|---|---|---|
| `0` | Eval ran, all thresholds passed | ✅ pass |
| `2` | Eval ran, one or more thresholds failed | ❌ fail (deploy never runs) |
| `1` | Runtime / config error | ❌ fail |
For prompt-agent cloud eval, Foundry owns the managed evaluation run and
AgentOps owns the CI exit code. A threshold failure exits 2, so the PR/deploy
gate fails with the failing threshold rows in report.md.
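
If you wrap the eval step in your own script, the exit-code contract can be handled as sketched below. The helper name `eval_gate_result` is hypothetical. Only the meanings of codes `0`, `1`, and `2` come from the table, and any other code is treated as a failure by assumption.

```shell
# Sketch: translate the documented eval exit codes into a gate message.
eval_gate_result() {
  case "$1" in
    0) echo "pass: all thresholds passed" ;;
    2) echo "fail: one or more thresholds failed" ;;
    1) echo "fail: runtime or config error" ;;
    *) echo "fail: unexpected exit code $1" ;;  # assumption: treat unknown codes as failures
  esac
}

# Typical use inside a CI step (not executed here):
#   agentops eval run; code=$?
#   eval_gate_result "$code"
#   [ "$code" -eq 0 ] || exit "$code"
eval_gate_result 2   # prints: fail: one or more thresholds failed
```

Propagating the original code rather than a generic `1` keeps threshold failures distinguishable from runtime errors in the job log.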
Eval and deploy workflows upload (always - even on failure):
- `results.json` - machine-readable, versioned
- `report.md` - human-readable
- `cloud_evaluation.json` - present when using Foundry cloud evaluation; contains a deep link to the New Foundry Experience Evaluations page
- `.agentops/official-eval/input.json`, `metadata.json`, and `result.json` - present only for legacy official Action/task workflows
- `evidence.json` and `evidence.md` - present in PR, PROD, and optional Doctor workflows after `agentops doctor --evidence-pack`
Artifact names per workflow:
| Workflow | Artifact name |
|---|---|
| `agentops-pr.yml` | `agentops-pr-results` plus release evidence in the same artifact |
| `agentops-deploy-dev.yml` | `agentops-dev-results` |
| `agentops-deploy-qa.yml` | `agentops-qa-results` |
| `agentops-deploy-prod.yml` | `agentops-prod-results` plus release evidence |
| `agentops-doctor.yml` | `agentops-doctor-history` plus release evidence |
agentops eval analyze # inspect eval setup before first run
agentops eval promote-traces --source traces.jsonl --apply
agentops doctor --evidence-pack # write release evidence
agentops workflow analyze # inspect repo and recommend stages
agentops workflow analyze --format json # stable machine-readable analysis
agentops workflow generate --kinds pr # PR gate
agentops workflow generate # PR + DEV/QA/PROD; deploy mode defaults to auto
agentops workflow generate --kinds pr,dev,prod # subset (trunk-based)
agentops workflow generate --kinds doctor # optional scheduled Doctor workflow
agentops workflow generate --deploy-mode azd # delegate deploy to azd
agentops workflow generate --deploy-mode prompt-agent # Foundry prompt deployment
agentops workflow generate --doctor-gate warning # PR also blocks on warnings
agentops workflow generate --doctor-gate none # PR Doctor advisory (pre-1.x)
agentops workflow generate --platform azure-devops
agentops workflow generate --force # overwrite existing files
agentops workflow generate --dir <path> # different repo root

| Flag | Description | Default |
|---|---|---|
| `--kinds` | Comma-separated subset of `pr,dev,qa,prod,doctor` | `pr,dev,qa,prod` |
| `--platform` | `github` or `azure-devops` | `github` |
| `--deploy-mode` | `auto`, `placeholder`, `azd`, or `prompt-agent` | `auto` |
| `--doctor-gate` | Severity floor for the PR Doctor step: `critical`, `warning`, or `none` | `critical` |
| `--force` | Overwrite existing workflow files | `false` |
| `--dir` | Repository root | `.` |
The PR template runs agentops doctor --evidence-pack after the eval
step. The --doctor-gate flag controls how Doctor failures interact with
the PR merge check:
| `--doctor-gate` value | PR-template behavior |
|---|---|
| `critical` (default) | Doctor blocks the PR on critical findings. Notably this includes the `regression.<metric>` checks, which fire when an evaluator score drops by `>= 2 * threshold_drop` (default 0.20, i.e. a 20 % drop) versus the rolling baseline. Catches drift like groundedness moving 5.0 → 4.0 even when the configured eval thresholds technically still pass. |
| `warning` | Doctor blocks on warning or higher findings. Use when you also want the smaller (≥10 %) regression drops to block. |
| `none` | Doctor still writes release evidence and uploads it as a PR artifact, but does not block the PR (pre-1.x behavior). The eval step remains the only hard gate. |
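
To make the regression arithmetic concrete, here is a sketch of the classification the table describes. It rests on two assumptions that are inferred from the numbers above rather than confirmed against Doctor's implementation: the drop is measured relative to the baseline, and `threshold_drop` is 0.10, so that `2 * threshold_drop` gives the 0.20 critical floor. The function name is hypothetical.

```shell
# Sketch: classify a score change as critical, warning, or ok.
# Assumptions (not confirmed against Doctor): relative drop vs. baseline,
# threshold_drop = 0.10, critical at 2 * threshold_drop.
regression_severity() {
  # $1 = baseline score, $2 = current score, $3 = threshold_drop (optional)
  awk -v base="$1" -v cur="$2" -v t="${3:-0.10}" 'BEGIN {
    if (base <= 0) { print "ok"; exit }
    drop = (base - cur) / base
    eps = 1e-9   # tolerate floating-point rounding at the boundary
    if (drop + eps >= 2 * t)  print "critical"
    else if (drop + eps >= t) print "warning"
    else                      print "ok"
  }'
}

regression_severity 5.0 4.0   # groundedness 5.0 -> 4.0, a 20 % drop: prints critical
```

Under these assumptions a move from 5.0 to 4.4 (a 12 % drop) would only block with `--doctor-gate warning`.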
Deploy templates (agentops-deploy-dev.yml, …-qa.yml, …-prod.yml)
always run agentops doctor --severity-fail critical; the
--doctor-gate flag does not affect deploy templates. Existing workflows
keep their generated --severity-fail value until you re-generate with
--force.
- Tighten thresholds for QA / PROD - copy `agentops.yaml` to `agentops-qa.yaml` / `agentops-prod.yaml` and tighten the `thresholds:` block. Update the `inputs.config` default in the matching workflow file.
- Scheduled runs - add a `schedule:` entry in `agentops-pr.yml` (or a new file) to evaluate against `main` nightly.
- Matrix per scenario - if you have multiple AgentOps config files, extend the eval job with `strategy.matrix.config:` and reference `${{ matrix.config }}` in the eval step.
- Regression baseline - wire deploy templates to download the previous run's `results.json` artifact and call `agentops eval run --baseline <results.json>`.
If your repository still has agentops-eval.yml, agentops-eval-ci.yml,
or agentops-eval-cd.yml from a prior version of AgentOps:
- Delete the three old files.
- Run `agentops workflow generate`.
- Re-add Build / Deploy commands you had customised.
- Update branch-protection status checks to point at the new `AgentOps PR` job.
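
Before deleting anything, it can help to check which legacy files are actually present. The sketch below lists them, assuming they live in the standard `.github/workflows/` directory. The function name `list_legacy_workflows` is hypothetical.

```shell
# Sketch: list legacy AgentOps workflow files that still exist under a repo root.
# Assumption: the files sit in .github/workflows/, where GitHub Actions reads workflows.
list_legacy_workflows() (
  root="${1:-.}"
  for f in agentops-eval.yml agentops-eval-ci.yml agentops-eval-cd.yml; do
    if [ -f "$root/.github/workflows/$f" ]; then
      echo ".github/workflows/$f"
    fi
  done
)

# Example follow-up (not executed here):
#   list_legacy_workflows . | while read -r path; do git rm "$path"; done
#   agentops workflow generate
list_legacy_workflows .
```

The function prints nothing when none of the three files exist, so it is safe to run in an already-migrated repository.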