feat: echo STT transcripts to thread before agent reply #571
dogzzdogzz wants to merge 1 commit into openabdev:main
Force-pushed from ee10184 to 7f74166.
When STT transcribes a voice message, optionally post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Default is OFF; opt in via [stt] echo_transcript = true.

- New config: [stt] echo_transcript (default false, opt-in)
- New helper: stt::post_echo with platform-agnostic ChatAdapter handle, so future LINE/Telegram/Teams adapters get echo for free
- Format: > 🎤 <transcript> per clip, all in one thread message
- Failure: > 🎤 (transcription failed) line + ⚠️ reaction on the user msg
- Helm: agents.<name>.stt.echoTranscript (camelCase) wired through configmap
- Docs: docs/stt.md and docs/config-reference.md updated

Rebased on top of openabdev#567 (gateway config rendering).

Tests: 133/133 cargo. helm-unittest: 28/28. Clippy --all-targets -D warnings clean.
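To make the echo format described in the commit message concrete, here is a minimal sketch of how the single thread message could be assembled. `ClipResult` and `format_echo` are hypothetical names introduced for illustration; they are not taken from the PR, which may structure this differently.

```rust
/// Outcome of transcribing one audio clip.
/// Hypothetical type for this sketch, not the PR's definition.
enum ClipResult {
    Transcribed(String),
    Failed,
}

/// Build one echo message: one quoted line per clip, in the order given
/// (assumed here to be upload order).
fn format_echo(clips: &[ClipResult]) -> String {
    clips
        .iter()
        .map(|clip| match clip {
            ClipResult::Transcribed(text) => format!("> 🎤 {}", text.trim()),
            ClipResult::Failed => "> 🎤 (transcription failed)".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let clips = vec![
        ClipResult::Transcribed("deploy the staging branch".to_string()),
        ClipResult::Failed,
    ];
    // Both clips end up in a single message body, one line each.
    println!("{}", format_echo(&clips));
}
```

Joining the lines into one string keeps the echo to a single thread message even when a user uploads several clips.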
Force-pushed from 7f74166 to 6bc70f6.
OpenAB PR Screening
This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Screening report

## Intent
PR #571 makes voice-message STT behavior visible to users by posting the transcript back into the same Discord or Slack thread before the agent replies. The operator-visible problem is that users currently have no quick way to confirm what the bot heard before it acts on the transcription.

## Feat
Feature. It adds configurable STT transcript echoing for Discord and Slack:
## Who It Serves
Primary beneficiaries: Discord and Slack end users. Secondary beneficiaries: maintainers and deployers, because the behavior is centralized behind a platform-agnostic STT helper and configurable through normal config and Helm paths.

## Rewritten Prompt
Implement configurable STT transcript echoing for voice-message workflows. When Discord or Slack audio attachments are transcribed, post one transcript echo message into the same thread before dispatching the agent reply. Preserve upload order. Use the format Add

## Merge Pitch
This is a useful UX improvement with a modest implementation surface. It makes STT behavior auditable in the conversation itself and reduces confusion when the agent responds to a misheard voice message. Risk is moderate-low. The likely reviewer concerns are message ordering, accidental mentions or formatting injection, adapter-specific failure behavior, and whether the echo happens exactly once per user message before the agent response.

## Best-Practice Comparison
OpenClaw principles that fit:
OpenClaw principles that do not strongly fit:
Hermes Agent principles that fit:
Hermes Agent principles that do not strongly fit:
## Implementation Options
Option 1: Conservative adapter-local echo
Option 2: Balanced shared STT helper
Option 3: Ambitious cross-platform transcript event model

Comparison Table
## Recommendation
Advance the balanced shared-helper approach. It gives users the visible transcript behavior now, keeps the implementation reviewable, and leaves a clean extension point for future adapters without forcing gateway/media architecture work into this PR. For merge discussion, focus review on ordering, no-mention formatting, failure behavior, and whether echo-post failures should be non-blocking. Follow-up work should be split separately for gateway audio support, durable delivery tracking, and the pre-existing multi-clip prompt-ordering issue.
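The screening report flags accidental mentions and formatting injection as review concerns. As an illustration only, one text-level way to reduce both is sketched below; `sanitize_transcript` is a hypothetical helper, and nothing here describes how the PR actually enforces "no mentions". Platform-level controls (for example Discord's `allowed_mentions` message field) are likely more robust than text rewriting, and this sketch covers only `@`-style mentions.

```rust
/// Hypothetical helper: make a transcript safe to place on one blockquote line.
/// An assumption for illustration, not the PR's approach.
fn sanitize_transcript(raw: &str) -> String {
    // Collapse every whitespace run (including newlines) into a single space
    // so the text cannot break out of the `> ` quote line.
    let flat = raw.split_whitespace().collect::<Vec<_>>().join(" ");
    // Insert a zero-width space after '@' so text such as "@everyone"
    // no longer matches the platform's mention syntax.
    flat.replace('@', "@\u{200B}")
}

fn main() {
    let heard = "ping @everyone\n# not a heading";
    println!("> 🎤 {}", sanitize_transcript(heard));
}
```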
Summary
When STT transcribes a voice message, post the transcript back to the thread (no mentions) before the agent reply so users can verify what was heard. Discord and Slack today; platform-agnostic helper means future adapters get it for free.
- `> 🎤 <transcript>` per clip.
- `> 🎤 (transcription failed)` line + ⚠️ reaction on the user message.
- `[stt] echo_transcript = false` (default `true`, mirrored as `stt.echoTranscript` in Helm values).

Closes #570.
Originally requested in Discord: https://discord.com/channels/1491295327620169908/1491365150664560881/1497784772230123560
Architecture
`stt::post_echo(&Arc<dyn ChatAdapter>, &ChannelRef, &MessageRef, &[EchoEntry], &SttConfig)` is the platform-agnostic helper. Discord (`src/discord.rs`) and Slack (`src/slack.rs`) collect a `Vec<EchoEntry>` while iterating audio attachments and call the helper before the agent dispatch. Gateway-based platforms (LINE / Telegram / future Teams) are intentionally not wired today because their protocol carries text only. The helper signature will not need to change when audio plumbing lands there later.

Files changed
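The helper shape described under Architecture can be sketched as follows. Only the parameter list of `post_echo` comes from the PR description. The trait methods, the struct fields, the synchronous signatures, and `RecordingAdapter` are all assumptions made so the example is self-contained; the real helper is presumably async and its types will differ.

```rust
use std::sync::{Arc, Mutex};

// Simplified stand-ins for the PR's types (assumed shapes, not the real ones).
struct ChannelRef(String);
struct MessageRef(String);
struct SttConfig { echo_transcript: bool }
struct EchoEntry { transcript: Option<String> } // None = transcription failed

// Hypothetical minimal adapter surface; the real trait likely has more methods.
trait ChatAdapter {
    fn send_message(&self, channel: &ChannelRef, text: &str) -> Result<(), String>;
    fn add_reaction(&self, channel: &ChannelRef, msg: &MessageRef, emoji: &str) -> Result<(), String>;
}

/// Post one echo message for all clips; react on the user message if any clip failed.
fn post_echo(
    adapter: &Arc<dyn ChatAdapter>,
    channel: &ChannelRef,
    user_msg: &MessageRef,
    entries: &[EchoEntry],
    cfg: &SttConfig,
) -> Result<(), String> {
    if !cfg.echo_transcript || entries.is_empty() {
        return Ok(()); // feature disabled or nothing to echo
    }
    let body = entries
        .iter()
        .map(|e| match &e.transcript {
            Some(t) => format!("> 🎤 {}", t),
            None => "> 🎤 (transcription failed)".to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n");
    adapter.send_message(channel, &body)?;
    if entries.iter().any(|e| e.transcript.is_none()) {
        adapter.add_reaction(channel, user_msg, "⚠️")?;
    }
    Ok(())
}

/// Test double that records calls instead of talking to a chat platform.
struct RecordingAdapter { log: Mutex<Vec<String>> }

impl ChatAdapter for RecordingAdapter {
    fn send_message(&self, channel: &ChannelRef, text: &str) -> Result<(), String> {
        self.log.lock().unwrap().push(format!("send {}: {}", channel.0, text));
        Ok(())
    }
    fn add_reaction(&self, channel: &ChannelRef, msg: &MessageRef, emoji: &str) -> Result<(), String> {
        self.log.lock().unwrap().push(format!("react {}/{}: {}", channel.0, msg.0, emoji));
        Ok(())
    }
}

fn main() {
    let rec = Arc::new(RecordingAdapter { log: Mutex::new(Vec::new()) });
    let adapter: Arc<dyn ChatAdapter> = rec.clone();
    let entries = [EchoEntry { transcript: Some("hello".into()) }, EchoEntry { transcript: None }];
    let cfg = SttConfig { echo_transcript: true };
    // A caller that wants echo failures to be non-blocking could log this
    // Result and continue to the agent dispatch instead of propagating it.
    let outcome = post_echo(&adapter, &ChannelRef("thread-1".into()), &MessageRef("msg-1".into()), &entries, &cfg);
    println!("{:?} {:?}", outcome, rec.log.lock().unwrap());
}
```

Because the helper only sees the trait object, a new adapter would need to implement the two methods and build its own list of entries; the helper itself stays untouched, which matches the extension point the description claims.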
Test plan
Out of scope / follow-ups
🤖 Generated with Claude Code