Labels: bug, live-api
Description
When using the ADK's run_live() with native audio models (Gemini 2.0 Flash Live), setting language_code="en-US" in SpeechConfig does not constrain the input audio transcription language. The transcribed user input still intermittently appears in other languages (e.g., Hindi, Spanish, Japanese) even though the speaker is speaking English.
Configuration
```python
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.genai import types

run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI,
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        language_code="en-US",
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
        ),
    ),
    input_audio_transcription=types.AudioTranscriptionConfig(),
    output_audio_transcription=types.AudioTranscriptionConfig(),
)
```
Observed behaviour
- Output speech is correctly in English.
- Input transcription (input_audio_transcription events) frequently returns non-English text even though the user is speaking English exclusively. For example, an English sentence might be transcribed in Devanagari script or mixed with non-Latin characters.
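Until the language constraint is honored server-side, a client-side heuristic can at least flag the mis-scripted transcripts so they can be dropped or retried. A minimal sketch (the function name and threshold are illustrative, not part of the ADK):

```python
import unicodedata


def looks_non_latin(text: str, threshold: float = 0.3) -> bool:
    """Return True if most alphabetic characters fall outside the Latin script.

    A cheap guard for the bug above: an "English" transcript that comes back
    in Devanagari (or another non-Latin script) will trip this check.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    non_latin = sum(
        1 for ch in letters if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters) >= threshold
```

This is only a mitigation, of course; it cannot recover what the user actually said.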
Expected behaviour
Setting language_code="en-US" in SpeechConfig should constrain both the output speech language and the input speech-to-text transcription language. At minimum, the input transcription should respect the configured language and not produce text in unrelated languages.
Root cause analysis
AudioTranscriptionConfig is currently an empty class (pass) with no language_code or similar field:

```python
class AudioTranscriptionConfig(_common.BaseModel):
    """The audio transcription configuration in Setup."""

    pass
```

There appears to be no mechanism to pass the desired transcription language to the underlying STT model for input audio. SpeechConfig.language_code seems to affect only TTS output, not STT input.
Suggested fix
Either:
- Propagate SpeechConfig.language_code to the input transcription pipeline so that it constrains the STT language, or
- Add a language_code field to AudioTranscriptionConfig so developers can explicitly set the transcription language:

```python
input_audio_transcription=types.AudioTranscriptionConfig(language_code="en-US")
```
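For the second option, a rough sketch of what the extended config could look like. This uses a plain dataclass as a stand-in for the google-genai Pydantic model, and the language_code field is the proposal, not an existing API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioTranscriptionConfig:
    """The audio transcription configuration in Setup (proposed shape)."""

    # Proposed field: BCP-47 code constraining the STT language;
    # None would preserve today's auto-detect behaviour.
    language_code: Optional[str] = None


# Developers could then pin input transcription to English:
cfg = AudioTranscriptionConfig(language_code="en-US")
```

The Live API setup message sent to the server would also need to carry and honor this field for it to have any effect.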
Environment
- ADK version: google-adk 1.x (latest)
- google-genai SDK version: latest
- Model: gemini-2.5-flash-native-audio-preview-12-2025
- Streaming mode: BIDI (WebSocket)