feat(cli): add --lang and auto-infer phonemizer locale from voice prefix #351

Merged
jrusso1020 merged 2 commits into main from feat/tts-multilingual-lang
Apr 20, 2026

Conversation

@jrusso1020
Collaborator

What

Fixes multilingual TTS output in hyperframes tts. Adds:

  • --lang <code> flag for explicit phonemizer locale override
  • Auto-detection of the phonemizer locale from the Kokoro voice ID prefix when --lang is omitted (the actual bug fix)
  • Graceful fallback when the installed kokoro-onnx version predates the lang kwarg

Closes #349.

Why

hyperframes tts was calling Kokoro's model.create(text, voice=, speed=) with no language argument. Kokoro's text frontend defaults to en-us regardless of voice, so picking a non-English voice like ef_dora (Spanish) or jf_alpha (Japanese) and feeding it native-language text produced English-phonemized output — every non-English voice was effectively broken.

Kokoro's own voice ID convention encodes the language in the first letter (a=American, b=British, e=Spanish, f=French, h=Hindi, i=Italian, j=Japanese, p=Brazilian Portuguese, z=Mandarin), so the default can be derived mechanically from the voice. Explicit --lang is kept for cases where users intentionally want a mismatch (stylized accent, specific locale like en-gb vs en-us).

How

packages/cli/src/tts/manager.ts

  • New SUPPORTED_LANGS readonly tuple of the nine valid phonemizer codes
  • New inferLangFromVoiceId(voiceId) helper mapping voice prefixes → locales
  • New isSupportedLang() type guard
  • Attach defaultLang to every VoiceInfo
  • Expand BUNDLED_VOICES with ef_dora, ff_siwis, jf_alpha, zf_xiaobei so --list surfaces multilingual options (it was English-only before)
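
The helpers listed above could look roughly like the following. This is a minimal sketch, not the code from the PR: the names `SUPPORTED_LANGS`, `VOICE_PREFIX_LANG`, `inferLangFromVoiceId`, and `isSupportedLang` come from this description, but the exact locale strings other than `en-us`, `en-gb`, and `es` are assumptions made for illustration.

```typescript
// Sketch only. Locale strings other than "en-us", "en-gb" and "es" are
// ASSUMED for illustration; the PR text does not spell them out.
const SUPPORTED_LANGS = [
  "en-us", "en-gb", "es", "fr-fr", "hi", "it", "ja", "pt-br", "zh",
] as const;

type SupportedLang = (typeof SUPPORTED_LANGS)[number];

// First letter of a Kokoro voice ID -> phonemizer locale.
const VOICE_PREFIX_LANG: { [prefix: string]: SupportedLang | undefined } = {
  a: "en-us", // American English
  b: "en-gb", // British English
  e: "es",    // Spanish
  f: "fr-fr", // French (assumed code)
  h: "hi",    // Hindi (assumed code)
  i: "it",    // Italian (assumed code)
  j: "ja",    // Japanese (assumed code)
  p: "pt-br", // Brazilian Portuguese (assumed code)
  z: "zh",    // Mandarin (assumed code)
};

// Type guard: narrows an arbitrary string to a known phonemizer code.
function isSupportedLang(value: string): value is SupportedLang {
  return SUPPORTED_LANGS.some((code) => code === value);
}

// Derives the default locale from the voice ID prefix, case-insensitively.
// Unknown or empty prefixes fall back to "en-us".
function inferLangFromVoiceId(voiceId: string): SupportedLang {
  const prefix = voiceId.charAt(0).toLowerCase();
  return VOICE_PREFIX_LANG[prefix] ?? "en-us";
}
```

Under these assumptions, `inferLangFromVoiceId("ef_dora")` yields `es`, and a voice ID with an unrecognized prefix yields `en-us`.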

packages/cli/src/tts/synthesize.ts

  • Accept optional lang in SynthesizeOptions; forward as argv[7] to Python
  • Python worker introspects Kokoro.create's signature and only passes lang= when the installed kokoro-onnx version supports it — older installs keep working with their default (English) phonemization
  • Returned metadata adds lang and langApplied so the caller can detect silent no-ops
  • Rename cached script to synth-v2.py so existing users automatically pick up the new script (the old cached copy doesn't know about argv[7])

packages/cli/src/commands/tts.ts

  • Add --lang, -l flag with validation against SUPPORTED_LANGS
  • Resolution order: explicit --lang > inferLangFromVoiceId(voice) > en-us
  • Dim-level hint when explicit --lang disagrees with voice-implied lang (legitimate for stylized accents, suppressed under --json)
  • Dim-level hint when kokoro-onnx silently ignores the lang kwarg (old installs)
  • --list adds a "Lang code" column so users can see what each voice phonemizes to by default
  • New multilingual examples in --help
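
The resolution order and the two hints can be sketched as below. The function names, the hint wording, and the shape of `LangChoice` are hypothetical; only the precedence (explicit `--lang`, then the voice-implied locale, then `en-us`), the `--json` suppression of the mismatch hint, and the `langApplied` flag are taken from this description.

```typescript
const FALLBACK_LANG = "en-us";

interface LangChoice {
  lang: string;
  hint?: string; // dim-level note for the terminal, if any
}

// Precedence: explicit --lang > locale implied by the voice > en-us.
// `voiceLang` stands in for the result of inferLangFromVoiceId(voice).
function resolveLang(
  explicit: string | undefined,
  voiceLang: string | undefined,
  json: boolean = false,
): LangChoice {
  const implied = voiceLang ?? FALLBACK_LANG;
  if (explicit === undefined) {
    return { lang: implied };
  }
  // A mismatch is allowed (stylized accents) but worth a hint,
  // except under --json where stdout must stay machine-readable.
  if (explicit !== implied && !json) {
    return {
      lang: explicit,
      hint: `--lang ${explicit} differs from the voice default (${implied})`,
    };
  }
  return { lang: explicit };
}

// Hint for old kokoro-onnx installs: --lang was requested, but the worker
// reported langApplied = false because Kokoro.create has no lang kwarg.
function ignoredLangHint(
  explicit: string | undefined,
  langApplied: boolean,
): string | undefined {
  if (explicit !== undefined && !langApplied) {
    return `--lang ${explicit} was ignored by the installed kokoro-onnx`;
  }
  return undefined;
}
```

Keeping the hints as return values rather than printing them directly makes the precedence easy to unit-test without capturing terminal output.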

Backward compatibility

| Invocation | Before | After |
| --- | --- | --- |
| tts "Hello" --voice af_heart | lang=en-us | lang=en-us (inferred), unchanged |
| tts "Bonjour" --voice ff_siwis | Phonemized as English (buggy) | Phonemized as French (correct) |
| tts "..." --voice ef_dora --lang en-us | N/A | Explicit override, works |
| kokoro-onnx without lang kwarg support | Works (English-only) | Works (English-only); --lang logs a dim note if requested |

English voices (a*/b* prefixes) are unchanged. Non-English voices now phonemize correctly by default — that's a bug fix, not a regression.

Test plan

  • Unit tests added: packages/cli/src/tts/manager.test.ts — every supported prefix, unknown-prefix fallback, case-insensitivity, isSupportedLang acceptance/rejection, regression guard that every bundled voice has a valid defaultLang matching its ID
  • Full CLI suite: bun --cwd packages/cli test → 128 passed (11 test files, 17 new)
  • Lint/format: bunx oxlint + bunx oxfmt --check clean on all changed files
  • Typecheck via lefthook pre-commit hook — clean
  • bun run build succeeds
  • Manual smoke tests:
    • npx tsx packages/cli/src/cli.ts tts --help renders --lang row and examples correctly
    • npx tsx packages/cli/src/cli.ts tts --list renders the new Lang code column and lists the expanded voice set
    • npx tsx packages/cli/src/cli.ts tts "hi" --lang notreal produces a clean validation error listing valid codes
    • npx tsx packages/cli/src/cli.ts tts --lang es (no input) falls through to the "provide text" error
  • End-to-end audio verification (ef_dora + Spanish): I didn't run this in CI because it requires Python + kokoro-onnx + espeak-ng + a ~340MB model download. Worth a reviewer doing one manual npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output /tmp/es.wav and confirming by ear that it pronounces "reunión" and "nueve" in Spanish rather than English.

Notes for reviewers

  • Non-English phonemization in kokoro-onnx requires espeak-ng installed at the system level (brew install espeak-ng / apt-get install espeak-ng). The docs/skill call this out. A preflight check for espeak-ng when --lang is non-English would be a nice follow-up, but the Kokoro error is already reasonably clear.
  • Validation is an exact match against the known Kokoro locales after the input is normalized with toLowerCase(), so --lang EN-US is accepted as en-us; invalid values such as --lang english or --lang de are rejected.
  • This PR fits cleanly as a single unit — it's small (+292/-23), every file change is directly motivated by either the bug fix or the --lang surface, and the inference logic + CLI surface ship together so users get the fix without needing to know the new flag.
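
The validation note above can be illustrated with a small sketch. `parseLangFlag` and its result type are hypothetical names, and the error wording is invented; the only behavior taken from the PR is lowercasing the input and then requiring an exact match against the supported codes.

```typescript
type ParseResult =
  | { ok: true; lang: string }
  | { ok: false; error: string };

// Lowercase first, then require an exact match against the supported codes.
function parseLangFlag(raw: string, supported: readonly string[]): ParseResult {
  const normalized = raw.toLowerCase();
  if (supported.indexOf(normalized) !== -1) {
    return { ok: true, lang: normalized };
  }
  return {
    ok: false,
    error: `Invalid --lang "${raw}". Valid codes: ${supported.join(", ")}`,
  };
}

// Usage with a subset of the codes named in this PR.
const codes = ["en-us", "en-gb", "es"];
const accepted = parseLangFlag("EN-US", codes);   // ok, normalized to "en-us"
const rejected = parseLangFlag("english", codes); // not ok, error lists valid codes
```

Passing the supported list as a parameter keeps the sketch independent of the real `SUPPORTED_LANGS` tuple.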

`hyperframes tts` was calling Kokoro's `model.create(text, voice=, speed=)`
with no language argument, so Kokoro's default phonemizer (en-us) was
applied regardless of the voice selected. Picking `ef_dora` or `jf_alpha`
and feeding it Spanish or Japanese text produced English-phonemized
output.

Closes #349.

- `manager.ts`: add `SUPPORTED_LANGS`, `inferLangFromVoiceId`, and
  `isSupportedLang`. Attach a `defaultLang` field to every bundled voice
  and expand the bundled list with `ef_dora`, `ff_siwis`, `jf_alpha`,
  `zf_xiaobei` so `--list` surfaces multilingual options.
- `synthesize.ts`: accept optional `lang: SupportedLang` in
  `SynthesizeOptions`, forward it to the Python worker as `argv[7]`.
  The worker introspects `Kokoro.create`'s signature and only passes
  `lang=` when the installed kokoro-onnx version supports it. Returned
  metadata now includes `lang` and `langApplied` so callers can detect
  silent no-ops. Bump the cached script filename to `synth-v2.py` so
  existing installs pick up the new script automatically.
- `commands/tts.ts`: add `--lang, -l` with validation against
  `SUPPORTED_LANGS`. Resolution order is explicit `--lang` > inferred
  from voice prefix > `en-us`. When explicit lang disagrees with the
  voice-implied lang (legitimate for stylized accents), emit a
  dim-level hint; suppress under `--json`. When kokoro-onnx silently
  ignores the kwarg, log that too. Update `--list` with a new
  "Lang code" column and add multilingual examples.
- Tests: new `manager.test.ts` covering every supported prefix, the
  unknown-prefix fallback, case-insensitivity, `isSupportedLang`
  validation, and a regression guard that every bundled voice has a
  valid `defaultLang` matching its ID.
- Docs: `docs/packages/cli.mdx` and `skills/hyperframes/references/tts.md`
  updated with the flag, examples, the espeak-ng dependency note for
  non-English phonemization, and the voice-prefix → lang table.

Backward compatibility:
- English voices (a*/b* prefixes) continue to phonemize as en-us / en-gb
  — no change.
- Non-English voices now phonemize correctly by default (bug fix, not a
  regression).
- Older kokoro-onnx versions that don't know the `lang` kwarg keep
  working via signature introspection; the CLI logs a dim note if
  `--lang` was requested but ignored.

Verification:
- `bun --cwd packages/cli test` — 128 tests pass (incl. 17 new).
- `bunx oxlint` and `bunx oxfmt --check` clean on changed files.
- `bun run build` succeeds.
- `npx tsx packages/cli/src/cli.ts tts --help` / `--list` render cleanly;
  invalid `--lang` produces a clean error with the valid-codes list.
@mintlify

mintlify bot commented Apr 20, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| hyperframes | 🟢 Ready | View Preview | Apr 20, 2026, 4:43 PM |

Post-review cleanup on #351. Net -21 lines.

- Drop `defaultLang` field + `makeVoice()` helper from VoiceInfo —
  compute via `inferLangFromVoiceId(v.id)` at read time in listVoices.
  The only reader was the --list table; caching the derived value on
  every voice added a self-consistency invariant we had to test.
- Drop redundant `lang` field from SynthesizeResult — caller already
  knows the requested lang since it passed it in; only `langApplied`
  carries information the caller can't derive.
- Use `errorBox` for --lang validation to match the house style in
  render.ts (other validation errors already use errorBox).
- Reuse existing `langList` module constant in the validation error
  instead of re-joining SUPPORTED_LANGS.
- Inline `DEFAULT_LANG` — used once in inferLangFromVoiceId.
- Trim WHAT-restating comments and the duplicate prefix-enumeration
  JSDoc on inferLangFromVoiceId (VOICE_PREFIX_LANG already carries
  per-row comments).
- Clean up orphaned `synth*.py` files in ~/.cache/hyperframes/tts
  when writing the current versioned script, so repeated upgrades
  don't leak files.
- Drop the `EN-US` case-sensitive-rejection test assertion — the CLI
  lowercases input before validation, so accepting mixed case is a
  feature, not a bug.
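
The orphaned-script cleanup described above might be sketched as follows. This is an illustration under assumptions, not the committed code: `findOrphanedScripts` is a hypothetical helper, and the file-name pattern is inferred from the `synth*.py` wording and the `synth-v2.py` name in this PR.

```typescript
// Name of the script version currently written to the cache (from this PR).
const CURRENT_SCRIPT = "synth-v2.py";

// Given the file names found in the TTS cache directory, return the stale
// worker scripts: anything matching synth*.py except the current version.
function findOrphanedScripts(
  fileNames: readonly string[],
  current: string = CURRENT_SCRIPT,
): string[] {
  return fileNames.filter(
    (name) => /^synth.*\.py$/.test(name) && name !== current,
  );
}

// Example directory listing (invented for illustration).
const stale = findOrphanedScripts([
  "synth.py",
  "synth-v1.py",
  "synth-v2.py",
  "model.onnx",
  "notes.py",
]);
```

A caller would then delete each returned name from the cache directory, for example with `fs.unlinkSync`, when writing the current versioned script. Keeping the selection logic pure makes it testable without touching the filesystem.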

Tests: 16/16 in `manager.test.ts`, 127/127 full CLI suite pass.
Lint + format + typecheck clean.
@BaltasarOrtiz

Tested locally end-to-end on this PR branch and can confirm the fix works as intended.

Local validation performed

  • Generated Spanish output with inferred language:
    • tts "La reunión empieza a las nueve" --voice ef_dora --output es-pr.wav
  • Generated control output forcing English phonemization:
    • tts "La reunión empieza a las nueve" --voice ef_dora --lang en-us --output es-pr-en-us.wav
  • Compared against current released behavior (0.4.9), which matches the forced en-us control.

Result

  • es-pr.wav is correctly phonemized as Spanish.
  • es-pr-en-us.wav reproduces the previous incorrect English-like phonemization.
  • Audible difference is especially clear in the phrase "...a las nueve": in es-pr.wav, nueve is realized with correct Spanish phonology as /ˈnwe.βe/ (bisyllabic, labial approximant in the intervocalic position), whereas in es-pr-en-us.wav it is anglicized and approximates an English-like /nuː/ ("new"), collapsing the expected Spanish syllabic and segmental structure.

From a practical E2E perspective, the fix behaves as expected and addresses the multilingual phonemization issue reported in #349.

@jrusso1020 jrusso1020 merged commit 4a55bc8 into main Apr 20, 2026
17 checks passed
@jrusso1020 jrusso1020 deleted the feat/tts-multilingual-lang branch April 20, 2026 17:51

Development

Successfully merging this pull request may close these issues.

tts: Improve multilingual phonemization with explicit --lang support

3 participants