Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961) by kirklandsign · Pull Request #18961 · pytorch/executorch

kirklandsign · 2026-04-17T00:10:22Z

Summary:

Restructure extension/asr/runner/ to support both Seq2Seq and Transducer ASR architectures:

Rename AsrRunner → Seq2SeqRunner (with backward-compat alias in runner.h)
- runner.cpp → seq2seq_runner.cpp, runner.h → seq2seq_runner.h
- Old runner.h kept as a thin include redirect for backward compatibility
Add TransducerRunner for RNN-T/TDT models
- Extracts the ~190-line greedy decode loop from examples/models/parakeet/main.cpp
- Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
- Supports both standard RNN-T (duration=1) and TDT (variable durations)
- Returns vector<Token> with frame offsets for downstream timestamp computation
- Includes preprocess() method for models with bundled preprocessor
- Exposes tokenizer via tokenizer() getter for downstream use
Unify Token type: parakeet::Token is now an alias for asr::Token
Migrate consumers
- Whisper main.cpp: AsrRunner → Seq2SeqRunner
- Parakeet main.cpp: inline decode → TransducerRunner::transcribe()
- Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in examples/models/parakeet/.

Differential Revision: D100892465

pytorch-bot · 2026-04-17T00:10:26Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18961

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Unrelated Failures

As of commit 67e6800 with merge base 7fdd306 ():

NEW FAILURES - The following jobs have failed:

Lint / lintrunner (gh)
>>> Lint for extension/asr/runner/transducer_runner.cpp:
MLX / test-mlx-parakeet / test-mlx-parakeet (gh)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 1
pull / test-parakeet-xnnpack-linux / linux-job (gh)
RuntimeError: Command docker exec -t d2077185a42efc05e7c44cace113d088186fb38aa67ee7a43747d93a4e5652d8 /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-04-17T00:11:04Z

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100892465.

kirklandsign · 2026-04-17T00:11:25Z

@claude review

github-actions · 2026-04-17T00:11:44Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

claude · 2026-04-17T00:12:13Z

Claude finished @kirklandsign's task in 1m 28s —— View job

Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

Read all changed files and gather context
Review TransducerRunner implementation
Review Seq2SeqRunner rename and backward compatibility
Review consumer migrations (Whisper, Parakeet, Android JNI)
Review build system changes
Post review feedback

Summary

Well-structured refactoring that cleanly separates Seq2Seq (Whisper) and Transducer (RNN-T/TDT) ASR architectures. The extraction of ~190 lines of greedy decode logic from examples/models/parakeet/main.cpp into a reusable TransducerRunner is a good move. The backward-compat layer for AsrRunner is handled correctly.

Issues

1. stats_ is never reset between calls in TransducerRunner::transcribe()

transducer_runner.cpp:245-271 — The stats_ member accumulates across multiple transcribe() calls without being reset. If a user calls transcribe() twice, model_load_start_ms / model_load_end_ms from the first load() persist, but inference_start_ms gets overwritten. This produces misleading statistics for the second call. The Seq2SeqRunner has the same pre-existing pattern, so this is not a regression, but worth noting for both runners.

2. Redundant model loading for timestamp metadata in parakeet/main.cpp

examples/models/parakeet/main.cpp:195-227 — A second Module is created just to read sample_rate, window_stride, and encoder_subsampling_factor constant methods. These are read via execute() on an empty input vector, which requires loading the full module again. Since TransducerRunner already loads the same module and reads similar metadata (blank_id, num_rnn_layers, pred_hidden), consider either:

Exposing a module() accessor on TransducerRunner for downstream constant_method queries, or
Having TransducerRunner::load_model_metadata() read these extra fields too (perhaps as optional metadata).

This duplicates the module load cost at runtime.

3. BUCK file has duplicated load statement

extension/asr/runner/BUCK:8,13 — load(":targets.bzl", "define_common_targets") appears twice. This appears to be intentional (one for non_fbcode_target, one for fbcode_target), but the duplicate load on line 13 is redundant since Starlark load is idempotent. Not a bug, but slightly unusual.

Minor / Nits

4. preprocess() return type constructs TensorPtr manually

transducer_runner.cpp:234-235:

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));

This works but is inconsistent with how TensorPtr is typically created elsewhere in the codebase (e.g., via from_blob or the tensor_ptr_maker utilities). It also moves from a local Tensor into a shared_ptr, which is fine functionally.

5. Potential out-of-bounds in encoder output access

transducer_runner.cpp:268:

int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];

No bounds check on the tensor having at least one element. If the encoder returns a 0-element length tensor, this would be undefined behavior. The check at line 261-265 verifies both outputs exist and are tensors, but not that the length tensor is non-empty.

6. g_proj shape assumes it matches proj_dim

transducer_runner.cpp:372-375 — The g_proj tensor is created with shape {1, 1, proj_dim} where proj_dim comes from the encoder's output. If the decoder's g_proj output has a different hidden dimension than the encoder's f_proj, this would silently produce incorrect results. In practice these likely always match for transducer architectures, but the assumption is implicit.

7. Missing #include <optional> in transducer_runner.h

transducer_runner.h:79 uses std::optional<std::string> but doesn't include <optional>. It likely gets pulled in transitively through other headers, but for correctness it should be explicitly included.

Looks Good

Backward-compat alias strategy (runner.h → thin redirect, AsrRunner = Seq2SeqRunner) is clean and low-risk.
Android JNI layer correctly uses the alias (asr::AsrRunner) so no API breakage.
Token type unification (parakeet::Token → alias for asr::Token) avoids duplication.
Build system (CMake and Buck/Bazel targets) properly separates the two runners as independent compilation units.
The greedy decode loop in TransducerRunner::transcribe() faithfully reproduces the original logic from parakeet/main.cpp with proper error handling via ET_CHECK_OK_OR_RETURN_ERROR.
TDT duration handling with bounds checking (transducer_runner.cpp:389-396) is correct.
The max_symbols_per_step guard (transducer_runner.cpp:446-449) prevents infinite loops on pathological inputs.

Copilot

Pull request overview

This PR restructures extension/asr/runner/ to support both encoder-decoder (Seq2Seq, e.g. Whisper) and Transducer (RNN‑T/TDT) ASR architectures by introducing separate runner implementations and migrating example/JNI consumers.

Changes:

Renames AsrRunner to Seq2SeqRunner (with backward-compatible runner.h redirect and type aliases).
Adds TransducerRunner implementing greedy transducer decoding and optional bundled-preprocessor execution.
Migrates Whisper/Parakeet examples and Android JNI includes to the new runner headers, and updates build files (CMake/Buck).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
extension/asr/runner/transducer_runner.h	Defines new TransducerRunner API, Token type, and TransducerConfig.
extension/asr/runner/transducer_runner.cpp	Implements model loading, optional preprocessor invocation, and transducer greedy decode loop.
extension/asr/runner/seq2seq_runner.h	Defines Seq2SeqRunner API and backward-compatible `AsrRunner` aliases.
extension/asr/runner/seq2seq_runner.cpp	Renames implementation from AsrRunner to Seq2SeqRunner and updates includes.
extension/asr/runner/runner.h	Becomes a thin backward-compat include redirect to seq2seq_runner.h.
extension/asr/runner/targets.bzl	Adds Buck targets for seq2seq_runner and transducer_runner libraries.
extension/asr/runner/CMakeLists.txt	Builds ASR runner library from seq2seq_runner.cpp + transducer_runner.cpp.
extension/asr/runner/BUCK	Adds migration BUCK wrapper for shared targets.
extension/android/jni/jni_layer_asr.cpp	Updates include to the new Seq2Seq runner header.
examples/models/whisper/main.cpp	Migrates usage from AsrRunner/AsrTranscribeConfig to Seq2SeqRunner/Seq2SeqTranscribeConfig.
examples/models/parakeet/types.h	Switches Parakeet Token to alias shared asr::Token.
examples/models/parakeet/main.cpp	Replaces inline greedy decode with TransducerRunner preprocess()+transcribe().
examples/models/parakeet/CMakeLists.txt	Links Parakeet example against extension_asr_runner.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T00:19:23Z

+
+#include <cstdint>
+#include <functional>
+#include <memory>


transducer_runner.h uses std::optional/std::nullopt in the TransducerRunner constructor but does not include . Add the missing include so this header compiles when included on its own (including from transducer_runner.cpp).

Suggested change

#include <memory>

#include <memory>

#include <optional>

Copilot · 2026-04-17T00:19:24Z

+load(":targets.bzl", "define_common_targets")
+
+


This second load(":targets.bzl", "define_common_targets") appears redundant (the symbol is already loaded above). Consider removing the duplicate to simplify the BUCK file.

Suggested change

load(":targets.bzl", "define_common_targets")

Copilot · 2026-04-17T00:19:24Z

+  ET_CHECK_OR_RETURN_ERROR(
+      blank_id_result.ok(),
+      Internal,
+      "Model must export 'blank_id' as a constant_method.");
+  blank_id_ = blank_id_result.get()[0].toInt();


load_model_metadata() indexes blank_id_result.get()[0] without verifying the constant_method returned at least one value (and the expected scalar type). If the model returns an empty vector or a tensor scalar, this can crash. Consider validating output count/type before calling toInt().

Copilot · 2026-04-17T00:19:24Z

+  // --- Encode ---
+  int64_t mel_len_value = preprocessed_features->size(1);
+  std::vector<int64_t> mel_len_data = {mel_len_value};
+  auto mel_len = ::executorch::extension::from_blob(
+      mel_len_data.data(), {1}, ::executorch::aten::ScalarType::Long);


transcribe() derives mel_len_value from preprocessed_features->size(1). If the preprocessor pads the mel tensor and provides an explicit mel_len, this will pass the padded length to the encoder. Consider threading the true mel_len through (e.g., from preprocess()) instead of inferring it from the tensor shape.

Copilot · 2026-04-17T00:19:24Z

+      std::vector<::executorch::runtime::EValue>{sos_token, h, c});
+  ET_CHECK_OK_OR_RETURN_ERROR(decoder_init_result.error());
+
+  auto& init_outputs = decoder_init_result.get();


decoder_init_result.get() is indexed at [0..2] without checking the output count/types. If decoder_step returns fewer outputs (or non-tensors), this will read out of bounds. Add an explicit output validation similar to the encoder check before indexing.

Suggested change

auto& init_outputs = decoder_init_result.get();

auto& init_outputs = decoder_init_result.get();

if (init_outputs.size() < 3 || !init_outputs[0].isTensor() ||

!init_outputs[1].isTensor() || !init_outputs[2].isTensor()) {

ET_LOG(

Error,

"Method %s returned invalid outputs: expected at least 3 tensor outputs, got %zu",

kDecoderStepMethodName,

init_outputs.size());

return Error::InvalidArgument;

}

Copilot · 2026-04-17T00:19:25Z

+struct Token {
+  uint64_t id;
+  int64_t start_offset; // Frame index in the encoder output
+  int64_t duration; // TDT duration value (0 for standard RNN-T)


Token::duration is documented as "0 for standard RNN-T", but the implementation uses dur=1 when durations is empty (and TransducerConfig also describes standard RNN-T as duration==1). Please reconcile the docs/contract so downstream timestamp code can interpret duration consistently.

Suggested change

int64_t duration; // TDT duration value (0 for standard RNN-T)

int64_t duration; // Duration in encoder frames (1 for standard RNN-T)

Copilot · 2026-04-17T00:19:25Z

+  // TDT duration values. Empty means standard RNN-T (duration always 1).
+  // If the model exports a "durations" constant_method, it will be used
+  // instead of this field.


TransducerConfig says a model-exported "durations" constant_method will override config.durations, but TransducerRunner::load()/load_model_metadata() never reads such a constant_method. Either implement that override or update the comment to match actual behavior.

Suggested change

// TDT duration values. Empty means standard RNN-T (duration always 1).

// If the model exports a "durations" constant_method, it will be used

// instead of this field.

// TDT duration values used by the decode loop. Empty means standard

// RNN-T (duration always 1).

Copilot · 2026-04-17T00:19:26Z

+#pragma once
+
+#include <functional>
+#include <memory>


seq2seq_runner.h uses std::optional in the Seq2SeqRunner constructor signature but doesn't include . It currently relies on transitive includes; please add to keep the header self-contained and avoid include-order fragility.

Suggested change

#include <memory>

#include <memory>

#include <optional>

Copilot · 2026-04-17T00:19:26Z

+      Internal,
+      "Preprocessor returned unexpected output.");
+
+  auto mel = outputs[0].toTensor();
+  return std::make_shared<::executorch::aten::Tensor>(std::move(mel));


preprocess() only consumes outputs[0] from the preprocessor, but the header documents the method as returning (mel, mel_len). If mel_len is provided (or mel is padded), dropping it can cause transcribe() to pass an incorrect length to the encoder. Consider consuming outputs[1] when present and using it to compute/override mel_len.

Copilot · 2026-04-17T00:19:26Z

+    int64_t k = joint_result.get()[0].toTensor().const_data_ptr<int64_t>()[0];
+
+    // Compute frame advance duration
+    int64_t dur = 1; // default for standard RNN-T
+    if (is_tdt) {


joint_result.get() is indexed at [0] (and [1] when is_tdt) without validating the number/type of outputs. Add a guard that checks output count and that the expected outputs are tensors/scalars before reading const_data_ptr<int64_t>().

Summary: Pull Request resolved: #18961 Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures: 1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h) - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h` - Old `runner.h` kept as a thin include redirect for backward compatibility 2. **Add** `TransducerRunner` for RNN-T/TDT models - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp` - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods - Supports both standard RNN-T (duration=1) and TDT (variable durations) - Returns `vector<Token>` with frame offsets for downstream timestamp computation - Includes `preprocess()` method for models with bundled preprocessor - Exposes tokenizer via `tokenizer()` getter for downstream use 3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token` 4. **Migrate consumers** - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner` - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()` - Android JNI: updated include (uses backward-compat alias, no API change) Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`. Differential Revision: D100892465

Copilot AI review requested due to automatic review settings April 17, 2026 00:10

kirklandsign requested review from larryliu0820 and lucylq as code owners April 17, 2026 00:10

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2026

Copilot started reviewing on behalf of kirklandsign April 17, 2026 00:10 View session

meta-codesync bot added fb-exported meta-exported labels Apr 17, 2026

Copilot AI reviewed Apr 17, 2026

View reviewed changes

kirklandsign force-pushed the export-D100892465 branch from abbbb8f to a29f1d7 Compare April 17, 2026 00:21

meta-codesync bot changed the title ~~Add TransducerRunner and rename AsrRunner to Seq2SeqRunner~~ Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961) Apr 17, 2026

kirklandsign force-pushed the export-D100892465 branch from a29f1d7 to 67e6800 Compare April 17, 2026 00:28

-  auto& init_outputs = decoder_init_result.get();
+  auto& init_outputs = decoder_init_result.get();
+  if (init_outputs.size() < 3 || !init_outputs[0].isTensor() ||
+      !init_outputs[1].isTensor() || !init_outputs[2].isTensor()) {
+    ET_LOG(
+        Error,
+        "Method %s returned invalid outputs: expected at least 3 tensor outputs, got %zu",
+        kDecoderStepMethodName,
+        init_outputs.size());
+    return Error::InvalidArgument;
+  }

	int64_t duration; // TDT duration value (0 for standard RNN-T)
	int64_t duration; // Duration in encoder frames (1 for standard RNN-T)

Conversation

kirklandsign commented Apr 17, 2026 • edited by meta-codesync bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18961

❌ 3 New Failures, 2 Unrelated Failures

Uh oh!

meta-codesync bot commented Apr 17, 2026

Uh oh!

kirklandsign commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026

This PR needs a release notes: label

Uh oh!

claude bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

Summary

Issues

Minor / Nits

Looks Good

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kirklandsign commented Apr 17, 2026 •

edited by meta-codesync bot

Loading

pytorch-bot bot commented Apr 17, 2026 •

edited

Loading

This PR needs a `release notes:` label

claude bot commented Apr 17, 2026 •

edited

Loading