feat: echo STT transcripts to thread before agent reply by dogzzdogzz · Pull Request #571 · openabdev/openab

dogzzdogzz · 2026-04-26T03:17:59Z

Summary

When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. This works on Discord and Slack today; because the helper is platform-agnostic, future adapters get it for free.

One thread message per user message, containing one `> 🎤 <transcript>` line per clip.
On failure: a `> 🎤 (transcription failed)` line plus a ⚠️ reaction on the user's original message.
Opt out via `[stt] echo_transcript = false` (default `true`, mirrored as `stt.echoTranscript` in Helm values).

Closes #570.

Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560

Architecture

`stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig)` is the platform-agnostic helper. Discord (`src/discord.rs`) and Slack (`src/slack.rs`) each collect a `Vec<EchoEntry>` while iterating audio attachments and call the helper before the agent dispatch. Gateway-based platforms (LINE / Telegram / future Teams) are intentionally not wired today because their protocol carries text only. The helper signature should not need to change when audio plumbing lands there later.
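To make the shape of the helper concrete, here is a minimal sketch, not the PR's actual code. It assumes a synchronous, two-method `ChatAdapter` trait, plain `&str` identifiers in place of `ChannelRef` and `MessageRef`, a bare `bool` in place of `SttConfig`, and `EchoEntry` variant names chosen for illustration; the real helper takes `&Arc<dyn ChatAdapter>` and is presumably async. Joining the per-clip lines with newlines is also an assumption.

```rust
use std::cell::RefCell;

/// Outcome of transcribing one audio clip. Variant names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum EchoEntry {
    Transcript(String),
    Failed,
}

/// Hypothetical, synchronous stand-in for the project's `ChatAdapter` trait.
pub trait ChatAdapter {
    fn send_message(&self, thread: &str, text: &str) -> Result<(), String>;
    fn add_reaction(&self, message_id: &str, emoji: &str) -> Result<(), String>;
}

/// One quoted line per clip, joined into a single message (assumed layout).
pub fn format_echo_message(entries: &[EchoEntry]) -> String {
    entries
        .iter()
        .map(|entry| match entry {
            EchoEntry::Transcript(text) => format!("> 🎤 {text}"),
            EchoEntry::Failed => "> 🎤 (transcription failed)".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Posts the echo once per user message; a no-op when disabled or empty.
pub fn post_echo(
    adapter: &dyn ChatAdapter,
    thread: &str,
    user_message_id: &str,
    entries: &[EchoEntry],
    echo_transcript: bool,
) -> Result<(), String> {
    if !echo_transcript || entries.is_empty() {
        return Ok(());
    }
    adapter.send_message(thread, &format_echo_message(entries))?;
    // Any failed clip also flags the user's original message.
    if entries.contains(&EchoEntry::Failed) {
        adapter.add_reaction(user_message_id, "⚠️")?;
    }
    Ok(())
}

/// Test double that records calls instead of talking to a chat platform.
#[derive(Default)]
pub struct RecordingAdapter {
    pub sent: RefCell<Vec<String>>,
    pub reactions: RefCell<Vec<String>>,
}

impl ChatAdapter for RecordingAdapter {
    fn send_message(&self, _thread: &str, text: &str) -> Result<(), String> {
        self.sent.borrow_mut().push(text.to_string());
        Ok(())
    }
    fn add_reaction(&self, _message_id: &str, emoji: &str) -> Result<(), String> {
        self.reactions.borrow_mut().push(emoji.to_string());
        Ok(())
    }
}
```

Because the helper only sees the trait, a Discord or Slack handler would build its entry list and make one call, and a recording double like the one above is enough to exercise the success, failure, mixed, disabled, and empty cases.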

Files changed

`src/config.rs` — `SttConfig.echo_transcript: bool` (default `true`).
`src/stt.rs` — `EchoEntry` enum, `format_echo_message`, `post_echo` with `MockAdapter`-driven tests.
`src/discord.rs`, `src/slack.rs` — wire echo into the audio attachment loop, call `post_echo` before `router.handle_message`.
`charts/openab/values.yaml`, `charts/openab/templates/configmap.yaml` — expose `echoTranscript` (default `true`, `hasKey` guard preserves the default while distinguishing unset vs. explicit `false`).
`docs/stt.md`, `docs/config-reference.md` — document `echo_transcript`.
`docs/superpowers/specs/` and `docs/superpowers/plans/` — design spec + TDD-style implementation plan that drove this work.
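The `hasKey` guard exists because a plain "falsy means use the default" check would silently turn an explicit `false` back into `true`. The same distinction can be sketched in Rust with an `Option<bool>`. This is an illustration only: the single-field struct and the `resolve_echo_transcript` function are hypothetical, and the default of `true` follows the list above.

```rust
/// Illustrative subset of the STT config; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct SttConfig {
    pub echo_transcript: bool,
}

impl Default for SttConfig {
    fn default() -> Self {
        // Default taken from the "Files changed" list: echo is on unless disabled.
        Self { echo_transcript: true }
    }
}

/// Hypothetical resolver: `None` means the key was absent, so the default
/// applies; `Some(false)` is an explicit opt-out and must be preserved.
pub fn resolve_echo_transcript(raw: Option<bool>) -> SttConfig {
    SttConfig {
        echo_transcript: raw.unwrap_or(SttConfig::default().echo_transcript),
    }
}
```

With this shape, `resolve_echo_transcript(None)` keeps the default while `resolve_echo_transcript(Some(false))` disables the echo, which is the behavior the Helm template aims for.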

Test plan

`cargo test --bin openab` — 133/133 pass (10 in `stt::tests` cover format, post_echo success, failure, mixed, disabled config, empty entries).
`cargo clippy --all-targets -- -D warnings` — clean.
`helm lint charts/openab` — clean.
`helm template ...` with default values renders `echo_transcript = true`; with `--set agents.kiro.stt.echoTranscript=false` renders `echo_transcript = false`.
Manual smoke test: send a voice message in Discord and verify the bot posts `> 🎤 <transcript>` before the agent's reply.
Manual smoke test: same in Slack.
Manual smoke test: simulate STT failure (e.g. revoke API key briefly or attach an unsupported file) — verify the failure line + ⚠️ reaction.

Out of scope / follow-ups

LINE / Telegram / Teams via gateway — those need audio plumbing in the gateway protocol first. The helper signature accommodates them when that work lands.
Multi-clip ordering: `extra_blocks.insert(0, …)` reverses transcript order in the agent prompt while `echo_entries.push(…)` preserves upload order. Pre-existing in the agent-prompt path; out of scope for this PR.
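The ordering mismatch described above is easy to reproduce in isolation. The sketch below uses plain strings instead of the project's block and entry types, and `collect_orders` is a hypothetical name; only the `insert(0, …)` versus `push(…)` contrast comes from the PR.

```rust
/// Feeds the same clips through both collection strategies and returns
/// (agent-prompt order, echo order).
pub fn collect_orders(transcripts: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut extra_blocks: Vec<String> = Vec::new();
    let mut echo_entries: Vec<String> = Vec::new();
    for transcript in transcripts {
        // Prepending puts the newest clip first, reversing upload order.
        extra_blocks.insert(0, transcript.to_string());
        // Appending keeps upload order.
        echo_entries.push(transcript.to_string());
    }
    (extra_blocks, echo_entries)
}
```

For the uploads `["first", "second", "third"]`, the prompt side comes back as third, second, first while the echo side stays in upload order.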

🤖 Generated with Claude Code

When STT transcribes a voice message, optionally post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Default is OFF; opt in via [stt] echo_transcript = true.

- New config: [stt] echo_transcript (default false, opt-in)
- New helper: stt::post_echo with platform-agnostic ChatAdapter handle, so future LINE/Telegram/Teams adapters get echo for free
- Format: > 🎤 <transcript> per clip, all in one thread message
- Failure: > 🎤 (transcription failed) line + ⚠️ reaction on the user msg
- Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap
- Docs: docs/stt.md and docs/config-reference.md updated

Rebased on top of openabdev#567 (gateway config rendering).

Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.

shaun-agent · 2026-05-02T10:48:20Z

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Title: feat: echo STT transcripts to thread before agent reply
Source: feat: echo STT transcripts to thread before agent reply #571
Status: moved to PR-Screening
Generated at: 2026-05-02T10:48:19.455Z
Discord thread: https://discord.com/channels/1488041051187974246/1500086347174776893

Screening report

Intent

PR #571 makes voice-message STT behavior visible to users by posting the transcript back into the same Discord or Slack thread before the agent replies. The operator-visible problem is that users currently have no quick way to confirm what the bot heard before it acts on the transcription.

Feat

Feature. It adds configurable STT transcript echoing for Discord and Slack:

Successful clip: > 🎤 <transcript>
Failed clip: > 🎤 (transcription failed) plus a warning reaction on the original message
Config opt-out: [stt] echo_transcript = false
Helm exposure: stt.echoTranscript

Who It Serves

Primary beneficiaries: Discord and Slack end users.

Secondary beneficiaries: maintainers and deployers, because the behavior is centralized behind a platform-agnostic STT helper and configurable through normal config and Helm paths.

Rewritten Prompt

Implement configurable STT transcript echoing for voice-message workflows.

When Discord or Slack audio attachments are transcribed, post one transcript echo message into the same thread before dispatching the agent reply. Preserve upload order. Use the format > 🎤 <transcript> for successful clips and > 🎤 (transcription failed) for failed clips. On failure, also add a warning reaction to the original user message where the adapter supports it.

Add stt.echo_transcript, defaulting to true, with Helm value support as stt.echoTranscript. Keep the echo logic platform-agnostic so future adapters can reuse it when audio support exists. Add unit tests for formatting, disabled config, success, failure, mixed results, and empty input. Update STT and config documentation.

Merge Pitch

This is a useful UX improvement with a modest implementation surface. It makes STT behavior auditable in the conversation itself and reduces confusion when the agent responds to a misheard voice message.

Risk is moderate-low. The likely reviewer concerns are message ordering, accidental mentions or formatting injection, adapter-specific failure behavior, and whether the echo happens exactly once per user message before the agent response.
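As an illustration of the mention and formatting-injection concern, the sketch below shows one possible mitigation. It is not taken from the PR, which may instead rely on platform-level mention controls. The function name `quote_transcript` is hypothetical. It quotes every line of a multi-line transcript so nothing escapes the blockquote, and inserts a zero-width space after each `@` so strings such as `@everyone` are less likely to resolve as mentions. It does not cover every platform's mention syntax.

```rust
/// Hypothetical sanitizer for a single transcript; one possible approach only.
pub fn quote_transcript(transcript: &str) -> String {
    const ZERO_WIDTH_SPACE: char = '\u{200B}';
    let mut out = String::new();
    // An empty transcript yields no lines and therefore an empty string.
    for (index, line) in transcript.lines().enumerate() {
        if index > 0 {
            out.push('\n');
        }
        // Only the first line carries the microphone marker.
        out.push_str(if index == 0 { "> 🎤 " } else { "> " });
        for ch in line.chars() {
            out.push(ch);
            if ch == '@' {
                // Break up "@name" so it is not parsed as a mention.
                out.push(ZERO_WIDTH_SPACE);
            }
        }
    }
    out
}
```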

Best-Practice Comparison

OpenClaw principles that fit:

Explicit delivery routing is relevant. The echo must go to the same channel/thread as the original user message, not a guessed destination.
Run visibility is relevant in spirit. Echoing transcripts gives users a lightweight conversational audit trail of what STT produced.
Retry/backoff is only partly relevant. Echo failure should probably not block agent dispatch unless the product explicitly wants that.
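The non-blocking behavior suggested in the last point can be sketched as follows. This is an assumption about a desirable policy, not a description of what the PR does; `echo_then_dispatch` and its closure parameters are hypothetical stand-ins for the echo call and the agent dispatch.

```rust
/// Runs the echo first, then always dispatches to the agent.
/// Returns the echo error (if any) alongside the dispatch result.
pub fn echo_then_dispatch<E, D>(echo: E, dispatch: D) -> (Option<String>, String)
where
    E: FnOnce() -> Result<(), String>,
    D: FnOnce() -> String,
{
    let echo_error = echo().err();
    if let Some(err) = &echo_error {
        // Log and continue: a failed echo should not cost the user their reply.
        eprintln!("transcript echo failed, continuing with dispatch: {err}");
    }
    (echo_error, dispatch())
}
```

The ordering guarantee is kept because the echo still runs before the dispatch; only the failure handling changes.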

OpenClaw principles that do not strongly fit:

Gateway-owned scheduling, durable job persistence, and isolated executions are not central here because this is synchronous chat-message handling, not scheduled work.

Hermes Agent principles that fit:

Self-contained prompts are indirectly relevant. The agent should still receive a clear transcript block independent of whether the user-facing echo succeeds.
Atomic persisted state is not directly applicable unless transcript echo state becomes durable later.

Hermes Agent principles that do not strongly fit:

Gateway daemon ticks, file locking, fresh scheduled sessions, and schedule overlap prevention are not relevant to this PR’s core behavior.

Implementation Options

Option 1: Conservative adapter-local echo
Wire transcript echo directly in Discord and Slack handlers with minimal shared code. Keep config support but avoid a broad helper abstraction.

Option 2: Balanced shared STT helper
Use the current proposed shape: EchoEntry, shared formatting, post_echo, config default-on behavior, Discord/Slack wiring, Helm/docs/tests. Keep gateway platforms out of scope until they have audio plumbing.

Option 3: Ambitious cross-platform transcript event model
Add a first-class transcript event path across adapters/gateway, with durable echo state, retry/backoff, delivery logs, and future support for LINE, Telegram, and Teams when audio transport exists.

Comparison Table

Option	Speed to ship	Complexity	Reliability	Maintainability	User impact	Fit for OpenAB right now
Conservative adapter-local echo	High	Low	Medium	Medium-low	Medium	Good for quick patch, weaker long-term shape
Balanced shared STT helper	Medium-high	Medium	High	High	High	Best fit
Ambitious transcript event model	Low	High	Highest if completed	Medium-high	Highest long term	Too large for this PR

Recommendation

Advance the balanced shared-helper approach.

It gives users the visible transcript behavior now, keeps the implementation reviewable, and leaves a clean extension point for future adapters without forcing gateway/media architecture work into this PR. For merge discussion, focus review on ordering, no-mention formatting, failure behavior, and whether echo-post failures should be non-blocking.

Follow-up work should be split separately for gateway audio support, durable delivery tracking, and the pre-existing multi-clip prompt-ordering issue.

dogzzdogzz requested a review from thepagent as a code owner April 26, 2026 03:18

github-actions Bot added the pending-screening PR awaiting automated screening label Apr 26, 2026

dogzzdogzz force-pushed the feat/stt-transcript-echo branch 2 times, most recently from ee10184 to 7f74166 Compare April 26, 2026 03:34

dogzzdogzz force-pushed the feat/stt-transcript-echo branch from 7f74166 to 6bc70f6 Compare April 26, 2026 03:42

CHC-Agent mentioned this pull request Apr 27, 2026

STT: echo transcript to thread before agent reply (Discord + Slack) #570

Open

thepagent added p2 Medium — planned work feature labels Apr 28, 2026

github-actions Bot added needs-rebase pending-contributor and removed needs-rebase pending-contributor labels Apr 30, 2026

github-actions Bot added pending-contributor needs-rebase and removed needs-rebase pending-contributor labels May 1, 2026

github-actions Bot added needs-rebase pending-contributor labels May 2, 2026

dogzzdogzz wants to merge 1 commit into openabdev:main from dogzzdogzz:feat/stt-transcript-echo

3 participants
