feat: Token count for agents #860

Open
Ayaz-Microsoft wants to merge 11 commits into dev from token-count

@Ayaz-Microsoft

Purpose

Count the total input and output tokens each agent uses at various stages, and show the totals in a workbook for analysis.

Does this introduce a breaking change?

  • Yes
  • No

Golden Path Validation

  • I have tested the primary workflows (the "golden path") to ensure they function correctly without errors.

Deployment Validation

  • I have validated the deployment process successfully and all services are running as expected with this change.

What to Check

Verify that the following are valid

  • ...

Other Information

- Implemented TokenUsageAccumulator to track per-request, per-agent, and per-model token usage.
- Emitted custom events to Azure Application Insights for monitoring.
- Created KQL queries for visualizing token usage metrics in Application Insights.
- Developed a workbook for easy access to token usage insights.
- Updated orchestrator to integrate token usage tracking during message processing and response handling.
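
As a rough illustration of the per-request, per-agent, and per-model tracking described above, here is a minimal sketch. It is not the PR's `TokenUsageAccumulator`: the class name `TokenUsageAccumulatorSketch`, its methods, and the agent and model names in the usage note are all hypothetical and chosen only to show the bookkeeping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenTotals:
    """Running input/output token totals for one bucket."""
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def add(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens


class TokenUsageAccumulatorSketch:
    """Hypothetical stand-in: one instance per request, keyed by (agent, model)."""

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self._by_key = defaultdict(TokenTotals)  # (agent, model) -> TokenTotals

    def record(self, agent: str, model: str, input_tokens: int, output_tokens: int) -> None:
        """Add one model call's usage to the matching (agent, model) bucket."""
        self._by_key[(agent, model)].add(input_tokens, output_tokens)

    def _rollup(self, index: int) -> dict:
        # index 0 groups by agent, index 1 groups by model.
        out = defaultdict(TokenTotals)
        for key, totals in self._by_key.items():
            out[key[index]].add(totals.input_tokens, totals.output_tokens)
        return dict(out)

    def per_agent(self) -> dict:
        return self._rollup(0)

    def per_model(self) -> dict:
        return self._rollup(1)

    def request_total(self) -> TokenTotals:
        """Sum every bucket into a single total for the request."""
        total = TokenTotals()
        for totals in self._by_key.values():
            total.add(totals.input_tokens, totals.output_tokens)
        return total
```

Under these assumptions, a caller would invoke `record("planner", "model-a", 100, 20)` after each model response and read `per_agent()`, `per_model()`, or `request_total()` once the request finishes, for example just before emitting a telemetry event.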
@github-actions
Contributor

github-actions Bot commented May 25, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| src/backend | | | | |
| app.py | 720 | 135 | 81% | 47, 64, 71–76, 79, 84, 119–120, 165, 244, 262, 281, 288, 412–413, 519, 522, 525–527, 536–537, 540, 542–544, 553–556, 566–567, 570–576, 579–581, 587–588, 590, 601, 603–608, 611–613, 620, 630–631, 633, 730–734, 742, 745, 753, 756–759, 768–769, 772, 781, 783, 811, 820–821, 835–836, 855–856, 858–859, 866–867, 870–871, 1015–1017, 1021–1023, 1062–1063, 1101, 1104, 1106, 1132–1133, 1135–1137, 1139, 1159–1160, 1162–1163, 1234–1235, 1266–1267, 1344–1345, 1605–1607, 1609–1610, 1617–1619, 1621, 1756, 1774, 1836–1837, 1841–1842 |
| llm_token_telemetry.py | 412 | 282 | 31% | 120–121, 123–126, 128, 141, 146, 153–156, 164–177, 183, 185, 195–196, 198–199, 202–204, 212–227, 229–233, 251, 257, 261–263, 270–283, 293–299, 307–309, 311–315, 317–318, 320, 331, 340–341, 354–370, 452–453, 459–465, 471, 489, 493, 498–504, 508–512, 519–526, 538–546, 556–566, 568–569, 573–577, 579–586, 591, 607–609, 618–620, 631–633, 649–651, 666–668, 682–684, 704–705, 708, 717–718, 720–721, 732–734, 765–766, 768–771, 776, 779, 785–786, 791–792, 798–799, 806–807, 817, 865–872, 877–878, 887–892, 894–897, 900, 903–904, 910, 915, 920, 925–927, 939–940, 948, 976, 978–979, 981–988, 990–996, 998–999, 1004, 1014, 1018 |
| orchestrator.py | 758 | 189 | 75% | 40–42, 546, 549–557, 561, 567–568, 579, 584–588, 595, 599–602, 611, 616–617, 623–624, 629–630, 637, 722–723, 743–744, 758, 926, 969–970, 974, 983, 985–987, 989, 995–997, 999, 1035–1036, 1039–1045, 1074, 1099–1102, 1104–1105, 1112–1113, 1115–1117, 1120–1122, 1124, 1134–1137, 1141–1142, 1144–1145, 1156–1157, 1159–1166, 1175–1176, 1179–1180, 1205, 1244–1245, 1264–1265, 1321–1322, 1325–1326, 1336–1338, 1424, 1450, 1492–1494, 1496–1498, 1528–1531, 1621–1622, 1627–1629, 1645–1647, 1649–1651, 1665–1668, 1704–1705, 1734, 1782–1783, 1804, 1808, 1846, 1865–1866, 1882, 1884–1889, 1892, 1917–1918, 1920–1922, 1939–1940, 1962, 1965, 1971–1973, 1978–1979, 2017, 2070, 2074, 2079, 2152–2153, 2164–2172, 2202–2204 |
| telemetry.py | 46 | 24 | 47% | 43–47, 54, 56–58, 60, 67–80 |
| TOTAL | 8332 | 761 | 90% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 422 | 0 💤 | 0 ❌ | 0 🔥 | 13.130s ⏱️ |

Contributor

Copilot AI left a comment

Pull request overview

Adds end-to-end LLM token usage telemetry for agent/workflow executions in the backend, plus Azure Monitor artifacts (workbook + KQL) to analyze usage by request, agent, model, and stage.

Changes:

  • Added TokenUsageAccumulator + extraction helpers to capture token usage from Agent Framework responses/stream updates and emit LLM_*_Token_Usage App Insights custom events.
  • Threaded user_id through orchestrator entrypoints and API handlers; added per-request ContextVar propagation to tag telemetry emitted from deeper helpers (e.g., image generation).
  • Added standalone Bicep deployments for monitoring add-on resources and a “Token Usage” workbook, plus workbook JSON, KQL query pack, and docs.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| src/backend/token_usage.py | New module to extract/accumulate token counts and emit App Insights custom events. |
| src/backend/orchestrator.py | Creates/records/flushes token usage across workflow streaming, brief parsing, generation, and image paths; propagates user_id. |
| src/backend/app.py | Passes user_id into orchestrator calls for telemetry correlation. |
| infra/workbook/workbook.bicep | Standalone deployment of the Token Usage workbook targeting an App Insights resource (optional binding). |
| infra/workbook/README.md | Deployment instructions for the standalone workbook template. |
| infra/monitoring/monitoring.bicep | Standalone “add monitoring later” deployment (LA + App Insights). |
| infra/monitoring/README.md | Instructions for post-deploy monitoring enablement and wiring. |
| infra/dashboards/token-usage-workbook.json | Serialized workbook definition with tiles/queries for token usage analysis. |
| infra/dashboards/token-usage-queries.kql | KQL query pack for App Insights / Log Analytics. |
| docs/TokenUsageTelemetry.md | Documentation for emitted events, enabling telemetry, and querying/visualizing usage. |
| infra/main.bicep | Notes workbook is deployed separately; adds ACI tag hashing to force restart on monitoring config change. |
| infra/main_custom.bicep | Notes workbook deployed separately; adds ACI tag hashing; changes default gptModelCapacity. |
| infra/main.json | Recompiled ARM output with additional infra deltas beyond token telemetry. |
| .gitignore | Fixes rai_results ignore entry and adds Python coverage artifacts. |

Comment thread src/backend/orchestrator.py Outdated
Comment thread src/backend/orchestrator.py
Comment thread src/backend/orchestrator.py
Comment thread src/backend/token_usage.py Outdated
Comment thread infra/workbook/README.md Outdated
Comment thread infra/main_custom.bicep
Comment thread infra/main.json
Comment thread docs/TokenUsageTelemetry.md Outdated
Comment thread src/backend/token_usage.py Outdated
Comment thread infra/dashboards/token-usage-workbook.json Outdated
… improve Application Insights event emission
Copilot AI review requested due to automatic review settings May 27, 2026 16:14

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Comment on lines +324 to +328
input_audio_tokens=_to_int(_get(in_details, "audio_tokens")),
input_text_tokens=_to_int(_get(in_details, "text_tokens")),
input_cached_tokens=_to_int(_get(in_details, "cached_tokens")),
output_audio_tokens=_to_int(_get(out_details, "audio_tokens")),
output_text_tokens=_to_int(_get(out_details, "text_tokens")),
Comment on lines +887 to +894
start_ns = time.perf_counter_ns()
try:
    found = extract_usage(source) or extract_usage_from_stream_chunk(source)
except Exception as exc:  # belt + braces; extractors are already safe
    logger.debug("TokenUsageScope.add failed: %s", exc, exc_info=True)
    return None
finally:
    self._extract_ns += time.perf_counter_ns() - start_ns
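
For readers who want to try the timing pattern in the snippet above in isolation, here is a small self-contained sketch. The class `TimedExtractorSketch` and its `failures` counter are hypothetical and exist only for illustration; the point is that the `finally` clause charges elapsed time to the running total whether the extractor returns or raises.

```python
import time
from typing import Callable, Optional


class TimedExtractorSketch:
    """Hypothetical wrapper that accumulates time spent in an extractor."""

    def __init__(self, extractor: Callable[[object], Optional[dict]]) -> None:
        self._extractor = extractor
        self.extract_ns = 0  # total nanoseconds spent extracting
        self.failures = 0    # number of calls that raised

    def add(self, source: object) -> Optional[dict]:
        start_ns = time.perf_counter_ns()
        try:
            return self._extractor(source)
        except Exception:
            # Swallow extractor errors so telemetry never breaks the caller.
            self.failures += 1
            return None
        finally:
            # Runs on both the return path and the exception path.
            self.extract_ns += time.perf_counter_ns() - start_ns
```

Because `time.perf_counter_ns()` returns integers, the running total avoids the floating-point drift that summing many small `perf_counter()` deltas could introduce.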
Comment thread infra/main.json
Comment on lines +18271 to 18276
  "jumpboxDcr": {
    "condition": "[and(variables('deployAdminAccessResources'), parameters('enableMonitoring'))]",
    "type": "Microsoft.Resources/deployments",
    "apiVersion": "2025-04-01",
-   "name": "[take(format('avm.res.network.private-dns-zone.{0}', replace(variables('privateDnsZones')[copyIndex()], '.', '-')), 64)]",
+   "name": "[take(format('avm.res.insights.data-collection-rule.{0}', variables('jumpboxDcrName')), 64)]",
    "properties": {
Comment on lines +96 to +99
- **Out of scope (intentional).** The current implementation does not persist
token totals to Cosmos DB and does not push real-time updates to the
frontend. Operators add cost-estimation queries as needed by multiplying
token counts by their negotiated per-1K-token rates.
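
The cost arithmetic described in that passage (token counts multiplied by per-1K-token rates) can be sketched as below. The function name `estimate_cost` and the rates in the usage note are placeholders for illustration, not real pricing and not part of the PR.

```python
from decimal import Decimal


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_1k: Decimal,
    output_rate_per_1k: Decimal,
) -> Decimal:
    """Estimate cost as (tokens / 1000) * rate, summed over input and output.

    Decimal is used so that currency amounts are not subject to binary
    floating-point rounding.
    """
    input_cost = (Decimal(input_tokens) / 1000) * input_rate_per_1k
    output_cost = (Decimal(output_tokens) / 1000) * output_rate_per_1k
    return input_cost + output_cost
```

With placeholder rates of 0.0025 per 1K input tokens and 0.01 per 1K output tokens, `estimate_cost(12000, 3000, Decimal("0.0025"), Decimal("0.01"))` works out to 12 × 0.0025 + 3 × 0.01 = 0.06 in whatever currency the rates are quoted in.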