feat(cli): add --lang and auto-infer phonemizer locale from voice prefix by jrusso1020 · Pull Request #351 · heygen-com/hyperframes

jrusso1020 · 2026-04-20T16:43:11Z

What

Fixes multilingual TTS output in hyperframes tts. Adds:

--lang <code> flag for explicit phonemizer locale override
Auto-detection of the phonemizer locale from the Kokoro voice ID prefix when --lang is omitted (the actual bug fix)
Graceful fallback when the installed kokoro-onnx version predates the lang kwarg

Closes #349.

Why

hyperframes tts was calling Kokoro's model.create(text, voice=, speed=) with no language argument. Kokoro's text frontend defaults to en-us regardless of voice, so picking a non-English voice like ef_dora (Spanish) or jf_alpha (Japanese) and feeding it native-language text produced English-phonemized output — every non-English voice was effectively broken.

Kokoro's own voice ID convention encodes the language in the first letter (a=American, b=British, e=Spanish, f=French, h=Hindi, i=Italian, j=Japanese, p=Brazilian Portuguese, z=Mandarin), so the default can be derived mechanically from the voice. Explicit --lang is kept for cases where users intentionally want a mismatch (stylized accent, specific locale like en-gb vs en-us).

How

packages/cli/src/tts/manager.ts

New SUPPORTED_LANGS readonly tuple of the nine valid phonemizer codes
New inferLangFromVoiceId(voiceId) helper mapping voice prefixes → locales
New isSupportedLang() type guard
Attach defaultLang to every VoiceInfo
Expand BUNDLED_VOICES with ef_dora, ff_siwis, jf_alpha, zf_xiaobei so --list surfaces multilingual options (it was English-only before)
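
The manager.ts helpers listed above could look roughly like the sketch below. The names SUPPORTED_LANGS, VOICE_PREFIX_LANG, inferLangFromVoiceId, and isSupportedLang come from this PR, but the bodies are a guess rather than the merged code. Only en-us, en-gb, and es appear as locale codes in this description; the other six codes are assumptions chosen for illustration.

```typescript
// Sketch only. Codes other than "en-us", "en-gb" and "es" are ASSUMED;
// the PR text names the nine languages but not every phonemizer code.
const SUPPORTED_LANGS = [
  "en-us", "en-gb", "es", "fr-fr", "hi", "it", "ja", "pt-br", "cmn",
] as const;

type SupportedLang = (typeof SUPPORTED_LANGS)[number];

// First letter of a Kokoro voice ID -> default phonemizer locale.
const VOICE_PREFIX_LANG: Record<string, SupportedLang> = {
  a: "en-us", // American English
  b: "en-gb", // British English
  e: "es",    // Spanish
  f: "fr-fr", // French (assumed code)
  h: "hi",    // Hindi (assumed code)
  i: "it",    // Italian (assumed code)
  j: "ja",    // Japanese (assumed code)
  p: "pt-br", // Brazilian Portuguese (assumed code)
  z: "cmn",   // Mandarin (assumed code)
};

// Type guard: narrows an arbitrary string to a known locale code.
function isSupportedLang(value: string): value is SupportedLang {
  return SUPPORTED_LANGS.some((lang) => lang === value);
}

// Map a voice ID to its default locale; unknown prefixes fall back to en-us.
function inferLangFromVoiceId(voiceId: string): SupportedLang {
  const prefix = voiceId.charAt(0).toLowerCase();
  return VOICE_PREFIX_LANG[prefix] ?? "en-us";
}
```

With this shape, `inferLangFromVoiceId("ef_dora")` yields `es` and an unrecognized prefix yields `en-us`, which matches the fallback behavior the unit tests in this PR describe.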

packages/cli/src/tts/synthesize.ts

Accept optional lang in SynthesizeOptions; forward as argv[7] to Python
Python worker introspects Kokoro.create's signature and only passes lang= when the installed kokoro-onnx version supports it — older installs keep working with their default (English) phonemization
Returned metadata adds lang and langApplied so the caller can detect silent no-ops
Rename cached script to synth-v2.py so existing users automatically pick up the new script (the old cached copy doesn't know about argv[7])

packages/cli/src/commands/tts.ts

Add --lang, -l flag with validation against SUPPORTED_LANGS
Resolution order: explicit --lang > inferLangFromVoiceId(voice) > en-us
Dim-level hint when explicit --lang disagrees with voice-implied lang (legitimate for stylized accents, suppressed under --json)
Dim-level hint when kokoro-onnx silently ignores the lang kwarg (old installs)
--list adds a "Lang code" column so users can see what each voice phonemizes to by default
New multilingual examples in --help
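
The resolution order and the mismatch hint described above can be sketched as follows. This is an illustration under assumptions, not the code in commands/tts.ts: `resolveLang`, `shouldShowMismatchHint`, and `demoInfer` are hypothetical names, and `demoInfer` is a stand-in for the real `inferLangFromVoiceId`.

```typescript
// Hypothetical result shape for illustration only.
interface LangResolution {
  lang: string;      // locale that will be sent to the phonemizer
  mismatch: boolean; // true when an explicit --lang disagrees with the voice
}

// Resolution order: explicit --lang > inferred from voice > "en-us".
function resolveLang(
  explicitLang: string | undefined,
  voiceId: string | undefined,
  infer: (voiceId: string) => string,
): LangResolution {
  const implied = voiceId !== undefined ? infer(voiceId) : undefined;
  if (explicitLang !== undefined) {
    const mismatch = implied !== undefined && implied !== explicitLang;
    return { lang: explicitLang, mismatch };
  }
  return { lang: implied ?? "en-us", mismatch: false };
}

// The hint is informational, so it is suppressed for machine-readable output.
function shouldShowMismatchHint(mismatch: boolean, jsonOutput: boolean): boolean {
  return mismatch && !jsonOutput;
}

// Stand-in inference that only knows the Spanish prefix (assumption).
const demoInfer = (voiceId: string): string =>
  voiceId.charAt(0).toLowerCase() === "e" ? "es" : "en-us";

const forced = resolveLang("en-us", "ef_dora", demoInfer);
const inferred = resolveLang(undefined, "ef_dora", demoInfer);
console.log(forced.lang, forced.mismatch, inferred.lang);
```

Keeping the mismatch flag separate from the printing decision means the `--json` suppression stays a one-line check at the call site.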

Backward compatibility

Invocation	Before	After
`tts "Hello" --voice af_heart`	`lang=en-us`	`lang=en-us` (inferred) — unchanged
`tts "Bonjour" --voice ff_siwis`	Phonemized as English (buggy)	Phonemized as French (correct)
`tts "..." --voice ef_dora --lang en-us`	N/A	Explicit override, works
Kokoro-onnx < `lang` kwarg support	Works (English-only)	Works (English-only); `--lang` logs a dim note if requested

English voices (a*/b* prefixes) are unchanged. Non-English voices now phonemize correctly by default — that's a bug fix, not a regression.

Test plan

Unit tests added: packages/cli/src/tts/manager.test.ts — every supported prefix, unknown-prefix fallback, case-insensitivity, isSupportedLang acceptance/rejection, regression guard that every bundled voice has a valid defaultLang matching its ID
Full CLI suite: bun --cwd packages/cli test → 128 passed (11 test files, 17 new)
Lint/format: bunx oxlint + bunx oxfmt --check clean on all changed files
Typecheck via lefthook pre-commit hook — clean
bun run build succeeds
Manual smoke tests:
- npx tsx packages/cli/src/cli.ts tts --help renders --lang row and examples correctly
- npx tsx packages/cli/src/cli.ts tts --list renders the new Lang code column and lists the expanded voice set
- npx tsx packages/cli/src/cli.ts tts "hi" --lang notreal produces a clean validation error listing valid codes
- npx tsx packages/cli/src/cli.ts tts --lang es (no input) falls through to the "provide text" error
End-to-end audio verification (ef_dora + Spanish): I didn't run this in CI because it requires Python + kokoro-onnx + espeak-ng + a ~340MB model download. Worth a reviewer doing one manual npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output /tmp/es.wav and confirming by ear that it pronounces "reunión" and "nueve" in Spanish rather than English.

Notes for reviewers

Non-English phonemization in kokoro-onnx requires espeak-ng installed at the system level (brew install espeak-ng / apt-get install espeak-ng). The docs/skill call this out. A preflight check for espeak-ng when --lang is non-English would be nice follow-up but the Kokoro error is already reasonably clear.
Validation is exact-match against the known Kokoro locales after normalizing the input with toLowerCase(), so --lang EN-US is accepted as en-us; invalid values like --lang english or --lang de are rejected.
This PR fits cleanly as a single unit — it's small (+292/-23), every file change is directly motivated by either the bug fix or the --lang surface, and the inference logic + CLI surface ship together so users get the fix without needing to know the new flag.
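
The --lang validation note above (lowercase first, then exact match) can be sketched like this. `normalizeLangFlag` and `invalidLangMessage` are hypothetical helper names, the error wording is invented for illustration, and the code list is trimmed to the three codes that appear in this description.

```typescript
// Lowercase the raw flag value, then require an exact match against the
// supported codes. Returns undefined when the value is not supported.
function normalizeLangFlag(
  raw: string,
  supported: readonly string[],
): string | undefined {
  const normalized = raw.toLowerCase();
  return supported.indexOf(normalized) !== -1 ? normalized : undefined;
}

// Hypothetical error text that lists the valid codes for the user.
function invalidLangMessage(raw: string, supported: readonly string[]): string {
  return `Invalid --lang "${raw}". Valid codes: ${supported.join(", ")}`;
}

// Subset of codes taken from this PR description.
const demoCodes = ["en-us", "en-gb", "es"];

for (const raw of ["EN-US", "english", "de"]) {
  const lang = normalizeLangFlag(raw, demoCodes);
  console.log(lang ?? invalidLangMessage(raw, demoCodes));
}
```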

`hyperframes tts` was calling Kokoro's `model.create(text, voice=, speed=)` with no language argument, so Kokoro's default phonemizer (en-us) was applied regardless of the voice selected. Picking `ef_dora` or `jf_alpha` and feeding it Spanish or Japanese text produced English-phonemized output. Closes #349.

- `manager.ts`: add `SUPPORTED_LANGS`, `inferLangFromVoiceId`, and `isSupportedLang`. Attach a `defaultLang` field to every bundled voice and expand the bundled list with `ef_dora`, `ff_siwis`, `jf_alpha`, `zf_xiaobei` so `--list` surfaces multilingual options.
- `synthesize.ts`: accept optional `lang: SupportedLang` in `SynthesizeOptions`, forward it to the Python worker as `argv[7]`. The worker introspects `Kokoro.create`'s signature and only passes `lang=` when the installed kokoro-onnx version supports it. Returned metadata now includes `lang` and `langApplied` so callers can detect silent no-ops. Bump the cached script filename to `synth-v2.py` so existing installs pick up the new script automatically.
- `commands/tts.ts`: add `--lang, -l` with validation against `SUPPORTED_LANGS`. Resolution order is explicit `--lang` > inferred from voice prefix > `en-us`. When explicit lang disagrees with the voice-implied lang (legitimate for stylized accents), emit a dim-level hint; suppress under `--json`. When kokoro-onnx silently ignores the kwarg, log that too. Update `--list` with a new "Lang code" column and add multilingual examples.
- Tests: new `manager.test.ts` covering every supported prefix, the unknown-prefix fallback, case-insensitivity, `isSupportedLang` validation, and a regression guard that every bundled voice has a valid `defaultLang` matching its ID.
- Docs: `docs/packages/cli.mdx` and `skills/hyperframes/references/tts.md` updated with the flag, examples, the espeak-ng dependency note for non-English phonemization, and the voice-prefix → lang table.

Backward compatibility:
- English voices (a*/b* prefixes) continue to phonemize as en-us / en-gb; no change.
- Non-English voices now phonemize correctly by default (bug fix, not a regression).
- Older kokoro-onnx versions that don't know the `lang` kwarg keep working via signature introspection; the CLI logs a dim note if `--lang` was requested but ignored.

Verification:
- `bun --cwd packages/cli test`: 128 tests pass (incl. 17 new).
- `bunx oxlint` and `bunx oxfmt --check` clean on changed files.
- `bun run build` succeeds.
- `npx tsx packages/cli/src/cli.ts tts --help` / `--list` render cleanly; invalid `--lang` produces a clean error with the valid-codes list.

Post-review cleanup on #351. Net -21 lines.

- Drop `defaultLang` field + `makeVoice()` helper from VoiceInfo; compute via `inferLangFromVoiceId(v.id)` at read time in listVoices. The only reader was the --list table; caching the derived value on every voice added a self-consistency invariant we had to test.
- Drop redundant `lang` field from SynthesizeResult. The caller already knows the requested lang since it passed it in; only `langApplied` carries information the caller can't derive.
- Use `errorBox` for --lang validation to match the house style in render.ts (other validation errors already use errorBox).
- Reuse existing `langList` module constant in the validation error instead of re-joining SUPPORTED_LANGS.
- Inline `DEFAULT_LANG`, which was used once in inferLangFromVoiceId.
- Trim WHAT-restating comments and the duplicate prefix-enumeration JSDoc on inferLangFromVoiceId (VOICE_PREFIX_LANG already carries per-row comments).
- Clean up orphaned `synth*.py` files in ~/.cache/hyperframes/tts when writing the current versioned script, so repeated upgrades don't leak files.
- Drop the `EN-US` case-sensitive-rejection test assertion. The CLI lowercases input before validation, so accepting mixed case is a feature, not a bug.

Tests: 16/16 in `manager.test.ts`, 127/127 full CLI suite pass. Lint + format + typecheck clean.
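
The orphaned-script cleanup mentioned in this commit can be sketched as a pure selection step, leaving the actual file deletion to the caller. `findOrphanedScripts` is a hypothetical name, and the file names in the example (other than `synth-v2.py`) are made up; the real implementation may match or delete files differently.

```typescript
// Given the file names found in the cache directory, return every synth*.py
// script that is not the current versioned one. Non-script files are ignored.
function findOrphanedScripts(
  fileNames: readonly string[],
  currentScript: string,
): string[] {
  const scriptPattern = /^synth.*\.py$/;
  return fileNames.filter(
    (name) => scriptPattern.test(name) && name !== currentScript,
  );
}

// Hypothetical cache contents; only synth-v2.py is named in this PR.
const cacheFiles = ["synth.py", "synth-v1.py", "synth-v2.py", "model.onnx"];
console.log(findOrphanedScripts(cacheFiles, "synth-v2.py"));
```

Separating selection from deletion keeps the matching rule easy to unit test without touching the real cache directory.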

BaltasarOrtiz · 2026-04-20T17:31:57Z

Tested locally end-to-end on this PR branch and can confirm the fix works as intended.

Local validation performed

Generated Spanish output with inferred language:
- tts "La reunión empieza a las nueve" --voice ef_dora --output es-pr.wav
Generated control output forcing English phonemization:
- tts "La reunión empieza a las nueve" --voice ef_dora --lang en-us --output es-pr-en-us.wav
Compared against current released behavior (0.4.9), which matches the forced en-us control.

Result

es-pr.wav sounds correctly phonemized as Spanish.
es-pr-en-us.wav reproduces the previous incorrect English-like phonemization.
Audible difference is especially clear in the phrase "...a las nueve": in es-pr.wav, nueve is realized with correct Spanish phonology as /ˈnwe.βe/ (bisyllabic, labial approximant in the intervocalic position), whereas in es-pr-en-us.wav it is anglicized and approximates an English-like /nuː/ ("new"), collapsing the expected Spanish syllabic and segmental structure.

From a practical E2E perspective, the fix behaves as expected and addresses the multilingual phonemization issue reported in #349.

miguel-heygen approved these changes Apr 20, 2026

jrusso1020 merged commit 4a55bc8 into main Apr 20, 2026
17 checks passed

jrusso1020 deleted the feat/tts-multilingual-lang branch April 20, 2026 17:51

jrusso1020 merged 2 commits into main from feat/tts-multilingual-lang
