Expose per-model token usage (model_usage array) in the status line payload #3405

@sinedied

Feature request: expose per-model token usage in the status line payload

Context

The configurable status line (statusLine.command) receives a JSON payload on stdin. Token usage is currently exposed in two ways, both under context_window:

context_window: {
  total_input_tokens,         // ← SUM across ALL models used in the session
  total_output_tokens,        // ← same
  total_cache_read_tokens,    // ← same
  total_cache_write_tokens,   // ← same
  total_reasoning_tokens,     // ← same
  ...
  current_usage: {            // ← ONLY for the currently selected model
    input_tokens,
    output_tokens,
    cache_creation_input_tokens,
    cache_read_input_tokens
  }
}

The runtime tracks usage per model in its internal modelMetrics map (this is visible in session.shutdown events and surfaced by /usage), but only the cross-model sum and the currently-selected model's slice are exposed to the status line script.

Problem

This makes it impossible for a status line script to compute an accurate session cost when the user switches models mid-session (e.g. starts in GPT-5.5, then uses /model to switch to Claude Opus 4.7). The script can only apply the currently selected model's rates to all cumulative tokens, which under- or overestimates the cost depending on whether the current model is cheaper or pricier than the ones used earlier.

Concrete example: spend 1M output tokens on GPT-5.5 ($30/M output), then switch to Opus 4.7 ($25/M output). True cost is $30. A status line script can only see "current model = Opus, total_output = 1M" and reports $25. That is $5 short, an underestimate of about 17% of the true cost.
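The arithmetic in this example can be checked with a short sketch. The rates are the illustrative figures quoted above, and the dictionary and function names are hypothetical, chosen only for this illustration.

```python
# Illustrative output rates from the example above, in USD per million tokens.
RATES_PER_M_OUTPUT = {"gpt-5.5": 30.0, "claude-opus-4.7": 25.0}


def output_cost(tokens: int, model_id: str) -> float:
    """Cost in USD of `tokens` output tokens billed at `model_id`'s rate."""
    return tokens / 1_000_000 * RATES_PER_M_OUTPUT[model_id]


# All 1M output tokens were actually produced by GPT-5.5.
true_cost = output_cost(1_000_000, "gpt-5.5")

# The status line only knows the current model (Opus) and the cross-model total.
reported_cost = output_cost(1_000_000, "claude-opus-4.7")

shortfall = true_cost - reported_cost
error_vs_true = shortfall / true_cost          # share of the true cost that is missed
error_vs_reported = shortfall / reported_cost  # how much higher the true cost is

print(f"true=${true_cost:.2f} reported=${reported_cost:.2f} "
      f"missed={error_vs_true:.1%} of true cost")
```

Measured against the true cost the script misses about one sixth of the spend; measured against the reported figure, the true cost is 20% higher.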

Proposal

Expose the per-model breakdown alongside the existing totals:

context_window: {
  total_input_tokens,
  total_output_tokens,
  total_cache_read_tokens,
  total_cache_write_tokens,
  total_reasoning_tokens,
  ...

  // NEW: array of per-model usage slices
  model_usage: [
    {
      model_id: "gpt-5.5",
      model_display_name: "GPT-5.5",
      input_tokens: 700000,
      output_tokens: 700000,
      cache_read_tokens: 700000,
      cache_write_tokens: 35000,
      reasoning_tokens: 0,
      requests: 12
    },
    {
      model_id: "claude-opus-4.7",
      model_display_name: "Claude Opus 4.7",
      input_tokens: 300000,
      output_tokens: 300000,
      cache_read_tokens: 300000,
      cache_write_tokens: 15000,
      reasoning_tokens: 5000,
      requests: 4
    }
  ]
}

This data already exists internally as modelMetrics and is exactly what /usage displays today. Exposing it would be a one-line payload change.

Use case

Any status line script that surfaces session cost in USD (using the published model pricing) would become correct for multi-model sessions. The script just iterates model_usage, looks up each model's rates, and sums.
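A minimal sketch of such a script follows, assuming the proposed context_window.model_usage array existed in the payload (it does not today). The PRICING table and the function names are hypothetical. Only the output rates quoted in this issue are filled in, so token fields without a rate contribute nothing to the total.

```python
import json

# USD per million tokens, keyed by model_id and then by token field name.
# Assumption: only the output rates quoted in this issue are listed; a real
# script would add input and cache rates from the published pricing.
PRICING = {
    "gpt-5.5": {"output_tokens": 30.0},
    "claude-opus-4.7": {"output_tokens": 25.0},
}


def session_cost(payload: dict, pricing: dict) -> float:
    """Sum cost over the proposed (not yet existing) model_usage array."""
    total = 0.0
    for usage in payload.get("context_window", {}).get("model_usage", []):
        # Unknown models have no rates and therefore add no cost.
        rates = pricing.get(usage.get("model_id"), {})
        for field, usd_per_million in rates.items():
            total += usage.get(field, 0) / 1_000_000 * usd_per_million
    return total


def render(raw: str) -> str:
    """Turn the raw stdin payload into the text shown in the status line."""
    payload = json.loads(raw) if raw.strip() else {}
    return f"${session_cost(payload, PRICING):.2f}"


# A status line script would finish with:
#     import sys
#     print(render(sys.stdin.read()))
```

With the sample values from the proposal (700,000 output tokens on GPT-5.5 and 300,000 on Opus 4.7), this yields $21.00 + $7.50 = $28.50 for output tokens alone.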

Current workaround (and why it's bad)

The only workaround today is parsing ~/.copilot/session-state/<id>/events.jsonl for assistant.message.outputTokens and model on each turn. This is fragile (race conditions with the writer), incomplete (input and cache tokens are NOT logged per call, only aggregated at session.shutdown), and forces every status line author to reinvent the same incremental-parsing and caching machinery.

Related

Companion request for total_files_modified: #3404

Labels

area:configuration: Config files, instruction files, settings, and environment variables
area:models: Model selection, availability, switching, rate limits, and model-specific behavior