diff --git a/docs/adr/context-aware-token.md b/docs/adr/context-aware-token.md
new file mode 100644
index 00000000..172f558b
--- /dev/null
+++ b/docs/adr/context-aware-token.md
@@ -0,0 +1,415 @@
+# ADR: Context-Aware Token for Agent-Initiated Platform Operations
+
+- **Status:** Proposed
+- **Date:** 2026-05-03
+- **Author:** @chaodu-agent
+- **Related:** #339, PR #527 (superseded)
+
+---
+
+## 1. Context & Problem
+
+OAB agents today are **passive receivers** — they get a prompt from the adapter and return a response. But real-world usage reveals scenarios where agents need to **actively interact** with the platform:
+
+| Scenario | Current State | Desired State |
+|---|---|---|
+| Update thread title to reflect task status | Agent uses `curl` via steering doc hack | Agent calls Discord API directly |
+| Fetch a specific historical message | ❌ Not possible — agent only sees conversation window | Agent fetches any message by ID |
+| Notify a bot in another channel | ❌ Not possible — agent is confined to current channel | Agent sends cross-channel messages |
+| Ping another bot to trigger a reaction | ❌ Not possible | Agent mentions bot in target channel |
+
+### Why Not PR #527's Approach?
+
+PR #527 proposed always prepending quoted message content to the agent prompt at the OAB transport layer. While well-implemented, this approach:
+
+- **Always pays the cost** (~500 tokens per reply) even when the agent already has the context from conversation history
+- **Only solves one edge case** (reply/quote context) out of the broader set of agent-initiated operations
+- **Puts the decision in the wrong layer** — OAB (transport) decides what context the agent needs, instead of the agent deciding for itself
+
+A context-aware token lets the agent **pull context on demand** — only when it determines the context is needed.
+
+---
+
+## 2. Prior Art & Industry Research
+
+### OpenClaw
+
+**How it works:** OpenClaw uses a **mediated architecture** — agents never get direct API tokens or raw platform access. All platform interactions flow through the Gateway + Channel Adapter layer:
+
+```
+Agent → Gateway (tool call / response) → Channel Outbound Adapter → Platform API
+Platform API → Channel Monitor → Gateway (normalized MsgContext) → Agent
+```
+
+Key mechanisms:
+
+- **50-action message action system** — agents invoke named actions (`send`, `thread-create`, `read`, `edit`, `delete`, `search`, `pin`, `role-add`, etc.) dispatched to channel plugins. Each action is gated by `supportsAction` checks against a `ChannelCapabilities` object declaring what each platform supports.
+- **Session tools for cross-channel operations** — `sessions_list`, `sessions_history`, `sessions_send`, `sessions_spawn` let agents discover sessions, fetch transcripts, send messages across channels, and spawn sub-agents. Inter-session messages are tagged with `message.provenance.kind = "inter_session"` for auditability.
+- **Layered policy model** — security enforced at multiple levels: Tool Profile → Provider Profile → Global Policy → Provider Policy → Agent Policy → Group Policy → Sandbox Policy. Later sources override earlier ones.
+- **Send Policy** — configurable deny/allow rules for agent-initiated messages by channel and chat type, enforced at the gateway level.
+- **Session-based trust boundaries** — `main` sessions (operator) get full host access; `dm` and `group` sessions are sandboxed in Docker containers by default.
+
+**Security lessons from advisories:**
+
+| Advisory | Risk | Lesson for OpenAB |
+|---|---|---|
+| [GHSA-v3qc-wrwx-j3pw](https://github.com/openclaw/openclaw/security/advisories/GHSA-v3qc-wrwx-j3pw) (High) | LLM agent disabled exec approval via config.patch | Behavioral constraints alone are insufficient — agents can modify their own config to bypass them |
+| [GHSA-2rqg-gjgv-84jm](https://github.com/openclaw/openclaw/security/advisories/GHSA-2rqg-gjgv-84jm) (High, CVSS 8.8) | Workspace boundary override via attacker-controlled params | Gateway must enforce boundaries regardless of caller overrides |
+| [GHSA-7jx5-9fjg-hp4m](https://github.com/openclaw/openclaw/security/advisories/GHSA-7jx5-9fjg-hp4m) (Moderate) | ACP permission auto-approval bypass via untrusted metadata | Auto-approval heuristics trusting untrusted metadata are dangerous |
+
+**Key insight:** The mediated architecture IS the security boundary. Agents never touch platform APIs directly — the Gateway enforces what's allowed.
+
+References:
+- Channel plugin types: [`src/channels/plugins/types.plugin.ts`](https://github.com/openclaw/openclaw)
+- Session tools: [docs/concepts/session-tool](https://molty.finna.ai/docs/concepts/session-tool)
+- Security advisories: [github.com/openclaw/openclaw/security](https://github.com/openclaw/openclaw/security)
+
+### Hermes Agent
+
+**How it works:** Hermes Agent uses a **tool-based architecture** — platform operations are exposed as LLM tools with OpenAI function-calling schemas. The agent decides when to call them.
+
+Key mechanisms:
+
+- **Discord tool** ([`tools/discord_tool.py`](https://github.com/NousResearch/hermes-agent)) — 14 actions split into two toolsets: `discord` (core: `fetch_messages`, `search_members`, `create_thread`) and `discord_admin` (server management). Uses Discord REST API directly with the bot token.
+- **Cross-platform `send_message` tool** ([`tools/send_message_tool.py`](https://github.com/NousResearch/hermes-agent)) — supports 17+ platforms (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, etc.) with human-friendly name resolution via `gateway/channel_directory.py`.
+- **Dynamic schema based on capabilities** — Discord tool detects bot intents via `GET /applications/@me` and hides unavailable actions from the schema (e.g., `search_members` hidden if GUILD_MEMBERS intent is missing). This prevents the LLM from hallucinating calls to tools it can't use.
+- **Config-based action allowlist** — `discord.server_actions` in config.yaml restricts which actions the agent can perform. Runtime re-checks the allowlist even if a stale cached schema exposes a disabled action.
+- **Defense-in-depth** — three layers: (1) schema filtering removes unavailable actions, (2) runtime allowlist check at dispatch, (3) platform-level permission errors (Discord 403) handled gracefully with actionable guidance.
+- **Error redaction** — `_sanitize_error_text()` strips secrets (GitHub PATs, Bearer tokens, API keys) from error messages before they reach the LLM.
+
+**Gaps relevant to OpenAB:**
+- Only Discord has a comprehensive read tool (`fetch_messages`). Other platforms (Telegram, Slack, etc.) have no equivalent — agents can send to them but not read from them.
+- No formal capability/scope system beyond the Discord config allowlist. The agent either has the tool or doesn't.
+
+**Key insight:** Tools provide a structured, auditable interface vs raw API access. The schema itself acts as a capability declaration — the agent can only call what's in the schema.
+
+References:
+- Discord tool: [`tools/discord_tool.py`](https://github.com/NousResearch/hermes-agent)
+- Send message tool: [`tools/send_message_tool.py`](https://github.com/NousResearch/hermes-agent)
+- Tool registry: [`tools/registry.py`](https://github.com/NousResearch/hermes-agent)
+- Security docs: [hermes-agent.nousresearch.com/docs/user-guide/security](https://hermes-agent.nousresearch.com/docs/user-guide/security)
+
+### Comparison
+
+| Aspect | OpenClaw | Hermes Agent | OpenAB (this ADR) |
+|---|---|---|---|
+| Agent-to-platform interface | Mediated — Gateway dispatches named actions | Tool-based — LLM tools with function-calling schemas | Direct — agent calls platform API with token |
+| Security enforcement | Technical — Gateway enforces layered policy | Technical — schema gating + runtime allowlist + platform permissions | Behavioral — steering docs define allow/deny (see §5 for evolution path) |
+| Cross-channel operations | `sessions_send` with provenance tagging | `send_message` tool with 17+ platform support | Agent uses token + `curl` |
+| Message fetching | `sessions_history` + `read` action | `discord_tool.fetch_messages` (Discord only) | Agent calls REST API directly |
+| Audit trail | Inter-session provenance tags | Tool call logs | None built-in (see §5 for planned additions) |
+
+### Why OpenAB Diverges
+
+Both OpenClaw and Hermes use **mediated architectures** where the platform (Gateway or tool runtime) controls what agents can do. OpenAB's context-aware token takes a different path: **direct agent access with behavioral constraints**.
+
+This is a deliberate tradeoff:
+
+1. **OAB's architecture** — OAB is a passive transport layer by design. Adding a Gateway mediation layer or tool runtime would be a fundamental architecture change, not an incremental feature.
+2. **Agent diversity** — OAB supports 4+ different agent runtimes (Kiro CLI, Claude Code, Codex, Gemini). A mediated approach would require each runtime to integrate with OAB's tool/action system. Direct token access works with any agent that has shell access.
+3. **Pragmatism** — The pattern already works in production (超渡法師 uses `curl` + bot token for thread title updates). This ADR formalizes and hardens what's already happening.
+
+The security gap is real and acknowledged — §5 below describes the evolution path from behavioral-only to defense-in-depth.
+
+---
+
+## 3. Proposed Design
+
+### Core Concept
+
+Give the agent a **scoped platform token** (e.g., `DISCORD_CONTEXT_TOKEN`) that it can use to perform platform API calls when it judges them necessary. The token is configured by the user in their steering/tools definition, not by OAB core.
+
+```
+OAB Layer (transport)              Agent Layer (intelligence)
+─────────────────────              ────────────────────────
+BOT_TOKEN                          DISCORD_CONTEXT_TOKEN
+Passive: receive msg, send reply   Active: fetch, notify, update
+OAB doesn't change                 User defines allowed operations
+Adapter responsibility             Agent autonomy
+```
+
+### How It Works
+
+1. User sets `DISCORD_CONTEXT_TOKEN` in the agent's environment (same bot token or a separate scoped token)
+
+> ⚠️ **Security note:** Using the same bot token as `DISCORD_CONTEXT_TOKEN` grants the agent full bot permissions — this is a convenience shortcut suitable only for trusted, single-operator deployments. For production or multi-tenant environments, use a separate token with minimal scopes when the platform supports it. See §5 for the security evolution path.
+2. User defines allowed operations in `tools.md` or steering docs
+3. Agent decides at runtime when to use the token — e.g., "user said 'why?' and I'm not sure what they're referring to, let me fetch the referenced message"
+4. OAB core is unaware of this — it's purely an agent-side capability
+
+### Scope Definition (User-Controlled)
+
+The trust boundary is initially defined by the user in steering docs. **This is a documentation convention, not a security boundary** — the agent has the full token and is behaviorally constrained, not technically restricted. See §5 for the evolution path toward technical enforcement.
+
+```markdown
+# Discord Context Tools
+
+You have DISCORD_CONTEXT_TOKEN for platform operations.
+
+## Allowed
+- Update current thread title
+- Fetch messages in current channel/thread
+- Send messages to specified channels (cross-channel notify)
+- Add reactions
+
+## Not Allowed
+- Delete messages
+- Modify server settings
+- Manage roles/permissions
+- Create/delete channels
+```
+
+---
+
+## 4. Use Cases
+
+### 4a. Smart Quote Resolution (Replaces PR #527)
+
+Instead of always prepending quoted content:
+
+```
+User replies to a message: "why?"
+  │
+  ├─ Agent sees "why?" in prompt
+  ├─ Agent checks conversation history — enough context? → respond directly
+  ├─ Not enough context? → use token to fetch referenced message
+  └─ Now respond with full understanding
+```
+
+**Benefit:** Zero extra tokens when context is already available. Only fetches when genuinely needed.
+
+### 4b. Cross-Channel Bot Coordination
+
+```
+User: "ask 普渡法師 in #claude-room to review this code"
+  │
+  ├─ Agent uses token to send message to #claude-room
+  ├─ Message mentions 普渡法師 bot
+  └─ 普渡法師 receives the message and starts working
+```
+
+### 4c. Thread Title Management
+
+```
+Agent finishes reviewing PR #527
+  │
+  ├─ Agent uses token to update thread title
+  └─ "🔢 PR #527 reviewed"
+```
+
+This is already happening today via steering doc + `curl`. The token formalizes it.
+
+### 4d. Historical Context Retrieval
+
+```
+User: "what did Jack say about this yesterday?"
+  │
+  ├─ Agent searches conversation history — not in window
+  ├─ Agent uses token to fetch recent messages from channel
+  └─ Finds Jack's message and responds
+```
+
+---
+
+## 5. Security Model
+
+### Current State: Behavioral Constraints Only
+
+| Concern | Mitigation | Limitation |
+|---|---|---|
+| Token is same as BOT_TOKEN — full permissions | Steering docs define allow/deny list | Agent can ignore steering docs (prompt injection, hallucination) |
+| Agent could misuse token | Steering docs define explicit scope | No technical enforcement — relies on agent compliance |
+| Token leaked in logs | Agent instructed to reference by env var name | No redaction layer — agent could still echo the value |
+| Cross-channel abuse | Steering docs restrict target channels | No runtime validation of target channels |
+
+**This is insufficient for production use.** OpenClaw's security advisories demonstrate that behavioral constraints alone fail — their [GHSA-v3qc-wrwx-j3pw](https://github.com/openclaw/openclaw/security/advisories/GHSA-v3qc-wrwx-j3pw) showed an LLM agent disabling its own exec approval via config modification.
+
+### Evolution Path: Behavioral → Defense-in-Depth
+
+The security model evolves across four maturity levels (distinct from the [Rollout Plan in §10](#10-rollout-plan), which tracks implementation milestones):
+
+**Level 1 (Behavioral only — current):**
+- Steering docs define allowed operations
+- Suitable for trusted, operator-controlled agents only
+- Acceptable risk: operator is the user AND the admin
+
+**Level 2 (Audit logging):**
+- OAB logs all outbound HTTP calls from agent processes (network-level observation)
+- No enforcement, but provides visibility for post-incident analysis
+- Implementation: eBPF-based network monitoring or HTTP proxy with logging
+
+**Level 3 (Proxy enforcement):**
+- Agent's `DISCORD_CONTEXT_TOKEN` routes through an OAB-controlled HTTP proxy
+- Proxy validates each API call against a configured allowlist:
+  ```
+  # Example proxy allowlist (config.toml)
+  [agent_proxy]
+  allowed_endpoints = [
+    "GET /channels/*/messages",      # fetch messages
+    "PATCH /channels/*",             # update thread title
+    "POST /channels/*/messages",     # send messages
+    "PUT /channels/*/messages/*/reactions/*",  # add reactions
+  ]
+  denied_endpoints = [
+    "DELETE *",                      # no deletions
+    "PUT /guilds/*",                 # no server modifications
+  ]
+  ```
+- Denied calls return 403 with audit log entry
+- Inspired by Hermes Agent's defense-in-depth (schema filtering + runtime allowlist + platform permissions)
+
+**Level 4 (True scoped tokens):**
+- If Discord (or other platforms) introduce fine-grained token scopes, swap the token
+- The agent-side interface doesn't change — only the token's actual permissions narrow
+
+### Comparison with Prior Art
+
+| Layer | OpenClaw | Hermes Agent | OpenAB Level 1 | OpenAB Level 3 |
+|---|---|---|---|---|
+| Schema/capability gating | ✅ `supportsAction` | ✅ Dynamic schema | ❌ | ❌ |
+| Runtime allowlist | ✅ Send Policy | ✅ Config allowlist | ❌ Steering docs only | ✅ Proxy allowlist |
+| Platform permission errors | ✅ | ✅ Graceful 403 handling | ✅ (passthrough) | ✅ (passthrough) |
+| Audit trail | ✅ Provenance tags | ✅ Tool call logs | ❌ | ✅ Proxy logs |
+| Secret redaction | ❌ | ✅ `_sanitize_error_text()` | ❌ | ✅ Proxy strips tokens from errors |
+
+---
+
+## 6. Concrete Resolution Path for Issue #339
+
+Issue [#339](https://github.com/openabdev/openab/issues/339) requests reply/quote context in agent prompts. PR #527 implemented always-on prepending; this ADR supersedes that approach with on-demand fetching.
+
+### How #339 Gets Resolved
+
+There are two distinct scenarios for context resolution. The token is only needed for the second.
+
+**Scenario A — Discord reply (user uses reply/quote feature):**
+
+Discord Gateway already sends `referenced_message` (full message object) and `message_reference` (with `message_id`, `channel_id`, `guild_id`) on reply messages (type 19). OAB's adapter receives this data and can passthrough it to the agent at near-zero cost.
+
+**Step 1 — OAB passthroughs reply metadata (minimal transport-layer change):**
+
+OAB already receives `referenced_message` from Discord's gateway. Instead of prepending the full quoted content (PR #527's approach), OAB injects only the metadata:
+
+```
+[Reply context: message_id=1234567890, channel_id=9876543210, author=Jack]
+```
+
+This costs ~20 tokens (vs ~500 for full content) and gives the agent enough information to decide whether to fetch.
+
+**Step 2 — Agent decides whether to fetch:**
+
+- If the conversation history already contains the referenced message → respond directly (zero extra cost)
+- If the agent needs the full content of the referenced message → use `DISCORD_CONTEXT_TOKEN` to call `GET /channels/{channel_id}/messages/{message_id}` and fetch it
+- If no token is configured → agent responds based on available context (graceful degradation)
+
+> **Note:** Because Discord already provides `referenced_message` on reply messages, OAB could alternatively passthrough the full content directly (like PR #527). The metadata-only approach is preferred because it keeps the transport layer minimal and lets the agent decide. Either way, the token is not strictly required for this scenario.
+
+**Scenario B — Non-reply historical lookup (no `referenced_message` available):**
+
+When a user asks about past messages without using Discord's reply feature (e.g., "what did Jack say about this yesterday?"), there is no `referenced_message` in the Gateway event. This is where the context-aware token provides unique value — the agent uses it to call `GET /channels/{channel_id}/messages` and search for relevant messages.
+
+**Step 3 — Steering doc template:**
+
+```markdown
+# Reply Context Resolution
+
+When you see `[Reply context: message_id=..., channel_id=..., author=...]`:
+1. Check if the referenced message is already in your conversation history
+2. If not, and you need it to respond accurately, fetch it:
+   curl -s -H "Authorization: Bot $DISCORD_CONTEXT_TOKEN" \
+     "https://discord.com/api/v10/channels/{channel_id}/messages/{message_id}"
+3. If DISCORD_CONTEXT_TOKEN is not available, respond based on available context
+
+# Historical Context Retrieval (no reply metadata)
+
+When a user references past messages without using Discord reply:
+1. Use the token to fetch recent messages from the channel:
+   curl -s -H "Authorization: Bot $DISCORD_CONTEXT_TOKEN" \
+     "https://discord.com/api/v10/channels/{channel_id}/messages?limit=50"
+2. Search the results for relevant context
+3. If DISCORD_CONTEXT_TOKEN is not available, ask the user to quote or reply to the specific message
+```
+
+### Acceptance Criteria for #339
+
+- [ ] OAB injects reply metadata (`message_id`, `channel_id`, `author`) into agent prompt — small transport-layer PR
+- [ ] Steering doc template published for agents to use the token for on-demand fetching
+- [ ] At least one agent (超渡法師) validated end-to-end: reply in Discord → agent fetches referenced message → responds with full context
+- [ ] Issue #339 closed with reference to this ADR and the implementing PRs
+
+---
+
+## 7. What Changes in OAB?
+
+**Minimal.** The key design principle is preserved — OAB remains a passive transport layer:
+
+- **Reply metadata injection** (for #339): OAB adds `[Reply context: message_id=..., channel_id=..., author=...]` to the prompt when a Discord message has `referenced_message`. This is a small, targeted change in `discord.rs` (~10 lines).
+- **Token passthrough**: OAB already passes environment variables to agent processes. No change needed — user adds `DISCORD_CONTEXT_TOKEN` to their agent config.
+- **Proxy (Level 3, optional)**: If/when proxy enforcement is added, it would be a new optional component, not a change to OAB core.
+
+---
+
+## 8. Relationship to Existing Features
+
+| Feature | Relationship |
+|---|---|
+| PR #527 (reply context) | **Superseded** — context-aware token solves the same problem more efficiently (on-demand vs always-on) |
+| Custom Gateway ADR | **Complementary** — gateway handles inbound webhooks; context-aware token handles agent-initiated outbound operations |
+| Multi-Platform Adapters ADR | **Complementary** — each platform can have its own scoped token type |
+| Steering docs | **Extended** — steering docs gain a new responsibility: defining token scope (Level 1 only — Level 3 moves to proxy enforcement) |
+
+---
+
+## 9. Open Questions
+
+| Question | Options | Notes |
+|---|---|---|
+| ~~One token per platform or unified?~~ | **Decided: Per-platform** — simpler and more secure | Start with Discord, extend later |
+| Should OAB inject the token automatically? | No — user configures it in agent env | Keeps OAB uninvolved |
+| Rate limiting on agent-initiated calls? | Level 1: rely on platform rate limits; Level 3: proxy-level rate limiting | Proxy can enforce per-agent rate limits |
+| How to handle platforms without API tokens? | N/A until needed | LINE, Telegram have different auth models |
+| Should OAB provide a proxy from Level 1? | No — start with behavioral constraints for trusted operators, add proxy when multi-tenant or untrusted agents are needed | Complexity should match the threat model |
+
+---
+
+## 10. Rollout Plan
+
+| Phase | Scope | Target | Acceptance Criteria |
+|---|---|---|---|
+| **Phase 1** | Document the pattern — steering doc template for Discord context token | v0.9.x | Template published, validated with 超渡法師 |
+| **Phase 2** | Resolve #339 — OAB injects reply metadata, agent fetches on demand | v0.9.x | #339 closed, end-to-end validation with at least one agent |
+| **Phase 3** | Formalize as `tools.md` convention across OpenAB agents | v0.10.x | All 4 agent runtimes have working steering doc templates |
+| **Phase 4** | Add optional audit logging for agent-initiated API calls | v0.10.x | Operator can see what API calls agents make |
+| **Phase 5** | Evaluate proxy enforcement for multi-tenant / untrusted agent scenarios | v0.11.x+ | Design doc for proxy layer if demand exists |
+
+---
+
+## Consequences
+
+### Positive
+
+- Agent gets platform awareness without OAB core changes
+- On-demand context fetching is more token-efficient than always-on prepending
+- Enables cross-channel coordination — a capability that was previously impossible
+- User controls the scope — no one-size-fits-all behavior imposed by OAB
+- Pattern extends naturally to other platforms
+- Concrete path to resolve #339 with minimal OAB changes
+
+### Negative
+
+- Phase 1 trust boundary is behavioral (steering docs), not technical — relies on agent compliance. Acceptable for trusted operators; insufficient for multi-tenant deployments.
+- Each user must configure the token and define scope — more setup burden
+- Agent-initiated API calls add latency when they occur
+- No centralized audit of what agents do with the token until Phase 4
+- Diverges from industry standard (mediated architecture) — conscious tradeoff for OAB's passive transport philosophy
+
+---
+
+## References
+
+- [Issue #339](https://github.com/openabdev/openab/issues/339) — Original feature request for reply/quote context
+- [PR #527](https://github.com/openabdev/openab/pull/527) — Implementation of always-on quote prepending (superseded by this ADR)
+- [ADR: Custom Gateway](./custom-gateway.md) — Complementary architecture for inbound webhook handling
+- [ADR: Multi-Platform Adapters](./multi-platform-adapters.md) — Platform-agnostic adapter layer
+- [OpenClaw Security Advisories](https://github.com/openclaw/openclaw/security) — Real-world security lessons for agent-platform interactions
+- [Hermes Agent Tools Runtime](https://hermes-agent.nousresearch.com/docs/developer-guide/tools-runtime) — Tool-based agent interaction architecture