
claude-opus-4.7-1m-internal with contextTier: long_context triggers compaction at 18% of real capacity — uses 128K prompt limit instead of 936K #3677

@aragorn18

Description


Describe the bug

Summary

CompactionProcessor triggers compaction (and the downstream summarizer) when the context is at roughly 18% of the model's actual capacity (about 183K tokens against a 1M-token window). Root cause: the CLI fetches model capabilities from two different sources and uses the smaller (non-long-context) blob for its local capacity check, ignoring the long_context tier the user has selected. Tool descriptions alone (~160K tokens with a standard MCP loadout) already exceed the artificial 128K cap, so compaction fires on the first turn of every session. This produces "empty session compaction" checkpoints in which the summarizer silently discards real conversation history.


Evidence (from a live session log)

Two model-capability blobs received during one session, same model ID:

[capabilities, from /v1/models listing, line 6233]
"capabilities": {
  "family": "claude-opus-4.7-1m-internal",
  "limits": {
    "max_context_window_tokens": 1000000,
    "max_prompt_tokens": 936000,
    ...
  }
}

[Got model info, used at runtime, line 6414]
"id": "claude-opus-4.7-1m-internal",
"capabilities": {
  "limits": {
    "max_prompt_tokens": 128000,
    "max_context_window_tokens": 200000,
    ...
  }
}

contextTier: long_context IS being sent on every request (29 occurrences in one session log):

"config_contextTier": "long_context",
"config_model": "claude-opus-4.7-1m-internal",

The compaction event itself:

CompactionProcessor: Context at 143.4% utilization - starting background compaction...
BasicTruncator truncated: {
  "preTruncationTokensInMessages": 183490,
  "tokenLimit": 128000,
  "toolDefinitionsTokenCount": 159955,
  ...
}
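The 143.4% figure is consistent with utilization being computed as preTruncationTokensInMessages / tokenLimit. The short sketch below reproduces that arithmetic and shows what the same message load looks like against the long-context prompt limit. The formula is inferred from the logged numbers, not taken from the CLI source, and utilizationPercent is a hypothetical helper.

```typescript
// Hypothetical helper: assumes utilization = message tokens / prompt token limit.
// This matches the logged figures but is an inference, not the CLI's actual code.
function utilizationPercent(messageTokens: number, tokenLimit: number): number {
  return (messageTokens / tokenLimit) * 100;
}

const preTruncationTokensInMessages = 183490; // from the BasicTruncator log
const runtimeLimit = 128000;                  // "Got model info" blob (runtime)
const longContextLimit = 936000;              // /v1/models listing blob

const atRuntimeLimit = utilizationPercent(preTruncationTokensInMessages, runtimeLimit);
const atLongContextLimit = utilizationPercent(preTruncationTokensInMessages, longContextLimit);

console.log(atRuntimeLimit.toFixed(1));     // 143.4
console.log(atLongContextLimit.toFixed(1)); // 19.6
```

Against the 936K limit, the same load is under 20% utilization, so nothing in this session should have been close to a compaction boundary.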

Recurrence:

  • Line 27094: first compaction at 143.4% (start of session, before any meaningful user turn)
  • Line 363795: second compaction at 140.4% (a few turns later)
  • The session has 4 checkpoint files written today (001-empty-session-pre-compaction.md, 002, 003, 004), all titled "empty session compaction" despite an extended, active conversation.

Actual behavior

  • CompactionProcessor uses tokenLimit: 128000 regardless of contextTier setting.
  • Compaction fires on turn 1 due to MCP tool descriptions alone exceeding the cap.
  • Compaction → summarizer discards real conversation; checkpoints titled "empty session compaction" persist mid-session.

Suspected fix locations (informed guesses)

  1. The Got model info code path (around the [DEBUG] Got model info: log line) — the structure returned here looks like a default model card and doesn't appear to be tier-aware. Either the request needs to include the tier when fetching capabilities, or the response merging needs to overlay long-context limits when the user's tier is long_context.
  2. CompactionProcessor's threshold computation — should read from the tier-resolved limit, not the base-model limit.
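Both suggested fixes come down to the same idea: resolve the prompt limit from the configured tier before computing the compaction threshold. A minimal sketch, assuming hypothetical names (ModelLimits, resolvePromptLimit, shouldCompact) and a hypothetical 0.9 threshold. The CLI's real types and threshold are not visible from the logs.

```typescript
// Hypothetical shape mirroring the "limits" objects quoted in the logs above.
interface ModelLimits {
  max_prompt_tokens: number;
  max_context_window_tokens: number;
}

// Pick the prompt limit that matches the configured tier instead of always
// trusting the runtime "Got model info" blob. Only "long_context" appears in
// the logs; any other tier value is treated here as the base tier (assumption).
function resolvePromptLimit(
  tier: string,
  runtimeLimits: ModelLimits,
  listingLimits: ModelLimits | undefined,
): number {
  if (tier === "long_context" && listingLimits !== undefined) {
    // Overlay the long-context value from the /v1/models listing,
    // never going below the runtime value.
    return Math.max(listingLimits.max_prompt_tokens, runtimeLimits.max_prompt_tokens);
  }
  return runtimeLimits.max_prompt_tokens;
}

// Hypothetical threshold check: compaction starts once usage reaches
// `threshold` (a fraction) of the resolved limit.
function shouldCompact(tokensUsed: number, promptLimit: number, threshold = 0.9): boolean {
  return tokensUsed >= promptLimit * threshold;
}

const runtime: ModelLimits = { max_prompt_tokens: 128000, max_context_window_tokens: 200000 };
const listing: ModelLimits = { max_prompt_tokens: 936000, max_context_window_tokens: 1000000 };

const limit = resolvePromptLimit("long_context", runtime, listing);
console.log(limit);                                            // 936000
console.log(shouldCompact(183490, limit));                     // false
console.log(shouldCompact(183490, runtime.max_prompt_tokens)); // true (current behavior)
```

With the long-context limit resolved, the 183,490-token load from the log stays well below the threshold. With the runtime 128K limit it trips immediately, which matches what the logs show.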

Impact

  • Users on long-context tiers effectively get a 128K prompt budget instead of the 936K they're paying a premium for.
  • Every session leaks real conversation history into "empty" summaries the user can't recover from.
  • High-tool-count MCP setups (Microsoft internal users with the Agency bundle are a large group) trip this on turn 1, making the bug unmissable but easy to misattribute to "the model forgot what we said".

Workarounds (none are good)

  • Aggressively /clear between unrelated tasks.
  • Disable MCP servers to shrink tool definitions below ~100K (loses functionality).
  • Manually /compact early so the user controls what gets summarized rather than letting the auto-compactor pick the boundaries.

Affected version

1.0.59

Steps to reproduce the behavior


  1. On Windows, install Copilot CLI 1.0.59.
  2. In ~/.copilot/settings.json, set "model": "claude-opus-4.7-1m-internal" and "contextTier": "long_context".
  3. Configure ~10 MCP servers totaling 80+ tools (sufficient to push toolDefinitionsTokenCount above 128K — easily reached with the official Agency bundle: ADO, ICMProd, Kusto, Mail, Teams, Calendar, SharePoint, ServiceTree, Memory, etc.).
  4. Start a new session, send any message.
  5. Observe in ~/.copilot/logs/process-*.log:
     • CompactionProcessor: Context at >100% utilization on the first turn
     • BasicTruncator reporting tokenLimit: 128000
     • A checkpoint file written to ~/.copilot/session-state//checkpoints/ titled "empty session compaction"
  6. Continue working. Multiple "empty session" checkpoints will be written even for conversations far below 1M tokens. When fed the conversation, the summarizer sometimes returns "no substantive work was performed", apparently because the amount of context it sees is so far out of line with the ceiling it expects.
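To check for the symptom in a given log, something like the following can scan process-log text for the signatures listed in step 5. It is a sketch: scanLog, collectNumbers, and showsShortLimitBug are hypothetical helpers, and the regular expressions only match the line shapes quoted in this issue (the sample condenses the BasicTruncator output onto one line).

```typescript
// Hypothetical result shape for the scan.
interface CompactionSummary {
  utilizations: number[]; // percentages from "Context at X% utilization"
  tokenLimits: number[];  // values from "tokenLimit": N
}

// Collect the first capture group of every match of a global regex as numbers.
function collectNumbers(text: string, re: RegExp): number[] {
  const out: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    out.push(Number(m[1]));
  }
  return out;
}

function scanLog(logText: string): CompactionSummary {
  // Fresh regexes per call so each scan starts at lastIndex 0.
  const utilizationRe = /CompactionProcessor: Context at ([\d.]+)% utilization/g;
  const tokenLimitRe = /"tokenLimit":\s*(\d+)/g;
  return {
    utilizations: collectNumbers(logText, utilizationRe),
    tokenLimits: collectNumbers(logText, tokenLimitRe),
  };
}

// Flags the reported signature: compaction above 100% utilization while the
// truncator is still using the 128K prompt limit.
function showsShortLimitBug(summary: CompactionSummary): boolean {
  const overFull = summary.utilizations.some((u) => u > 100);
  const shortLimit = summary.tokenLimits.some((l) => l === 128000);
  return overFull && shortLimit;
}

const sample = [
  "CompactionProcessor: Context at 143.4% utilization - starting background compaction...",
  'BasicTruncator truncated: { "preTruncationTokensInMessages": 183490, "tokenLimit": 128000 }',
  "CompactionProcessor: Context at 140.4% utilization - starting background compaction...",
].join("\n");

const summary = scanLog(sample);
console.log(summary.utilizations.join(", ")); // 143.4, 140.4
console.log(showsShortLimitBug(summary));     // true
```

To scan a real log, read a ~/.copilot/logs/process-*.log file (for example with Node's fs.readFileSync(path, "utf8")) and pass the text to scanLog.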

Expected behavior


With contextTier: long_context set:

  • The local CompactionProcessor should use max_prompt_tokens: 936000 (the long-context value).
  • Compaction should fire near the actual ceiling, not at 128K (~14% of it).
  • The runtime Got model info capability blob should reflect the selected tier, OR CompactionProcessor should select limits based on the configured tier rather than the default blob.


Metadata


Assignees

No one assigned

    Labels

    area:context-memory (Context window, memory, compaction, checkpoints, and instruction loading)
    area:models (Model selection, availability, switching, rate limits, and model-specific behavior)
