
SpeechConfig.language_code does not constrain input audio transcription language in Live API sessions #5542

@amitanshupanigrahi2704

Labels: bug, live-api

Description

When using the ADK's run_live() with native audio models (Gemini 2.0 Flash Live), setting language_code="en-US" in SpeechConfig does not constrain the input audio transcription language. The transcribed user input still intermittently appears in other languages (e.g., Hindi, Spanish, Japanese) even though the speaker is speaking English.

Configuration

run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        language_code="en-US",
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        ),
    ),
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
)

Observed behaviour

  • Output speech is correctly in English.
  • Input transcription (input_audio_transcription events) frequently returns non-English text even though the user is speaking English exclusively. For example, an English sentence might be transcribed in Devanagari script or mixed with non-Latin characters.
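A quick way to quantify the misbehaviour is to flag transcription chunks containing non-Latin letters, which should never appear for an English-only speaker. This is a minimal standalone helper (the `transcripts` list stands in for text collected from `input_audio_transcription` events; the event-collection plumbing is omitted):

```python
# Flag transcription chunks with alphabetic characters outside the Latin
# script -- unexpected when the user speaks only English.
import unicodedata

def has_non_latin_letters(text: str) -> bool:
    """True if any alphabetic character falls outside the Latin script."""
    return any(
        ch.isalpha() and "LATIN" not in unicodedata.name(ch, "")
        for ch in text
    )

# Sample chunks standing in for collected input-transcription text:
transcripts = ["hello world", "नमस्ते दुनिया"]
flagged = [t for t in transcripts if has_non_latin_letters(t)]  # second chunk only
```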

Expected behaviour

Setting language_code="en-US" in SpeechConfig should constrain both the output speech language and the input speech-to-text transcription language. At minimum, the input transcription should respect the configured language and not produce text in unrelated languages.

Root cause analysis

AudioTranscriptionConfig is currently an empty class (pass) with no language_code or similar field:

class AudioTranscriptionConfig(_common.BaseModel):
  """The audio transcription configuration in Setup."""
  pass

There appears to be no mechanism to pass the desired transcription language to the underlying STT model for input audio; SpeechConfig.language_code seems to affect only TTS output, not STT input.
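To make the asymmetry concrete, here is a hand-written sketch of roughly how this RunConfig serializes into the Live API setup message. This is my reading of the BidiGenerateContentSetup wire shape, not actual SDK output, and the exact field placement may be inexact:

```python
# Sketch of the Live API setup payload (assumed shape, for illustration):
# language_code travels inside generationConfig.speechConfig (the TTS side),
# while inputAudioTranscription is an empty object with nowhere to put it.
setup_message = {
    "setup": {
        "model": "models/gemini-2.5-flash-native-audio-preview-12-2025",
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "languageCode": "en-US",  # only constrains output speech
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Kore"}},
            },
        },
        "inputAudioTranscription": {},   # empty: no language field exists
        "outputAudioTranscription": {},
    }
}
```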

Suggested fix

Either:

  1. Propagate SpeechConfig.language_code to the input transcription pipeline so it constrains the STT language, or
  2. Add a language_code field to AudioTranscriptionConfig so developers can explicitly set the transcription language:
    input_audio_transcription=types.AudioTranscriptionConfig(language_code="en-US")
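Option 2 could look something like the following. This is an illustrative standalone dataclass mirroring the proposed field, not the actual google-genai pydantic model:

```python
# Hypothetical sketch of option 2: AudioTranscriptionConfig gaining an
# optional language_code field (name and shape are proposals, not the SDK).
from dataclasses import dataclass
from typing import Optional

@dataclass
class AudioTranscriptionConfig:
    """The audio transcription configuration in Setup (proposed shape)."""
    # e.g. "en-US"; None would preserve today's auto-detect behaviour
    language_code: Optional[str] = None

# Callers could then pin the STT language explicitly:
input_cfg = AudioTranscriptionConfig(language_code="en-US")
```

Defaulting language_code to None keeps the change backward compatible: existing callers that construct AudioTranscriptionConfig() with no arguments see no behaviour change.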

Environment

  • ADK version: google-adk 1.x (latest)
  • google-genai SDK version: latest
  • Model: gemini-2.5-flash-native-audio-preview-12-2025
  • Streaming mode: BIDI (WebSocket)

Labels

  • live: [Component] This issue is related to live, voice and video chat
  • request clarification: [Status] The maintainer needs clarification or more information from the author
