Skip to content

Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961)#18961

Open
kirklandsign wants to merge 1 commit intomainfrom
export-D100892465
Open

Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961)#18961
kirklandsign wants to merge 1 commit intomainfrom
export-D100892465

Conversation

@kirklandsign
Copy link
Copy Markdown
Contributor

@kirklandsign kirklandsign commented Apr 17, 2026

Summary:

Restructure extension/asr/runner/ to support both Seq2Seq and Transducer ASR architectures:

  1. Rename AsrRunnerSeq2SeqRunner (with backward-compat alias in runner.h)

    • runner.cppseq2seq_runner.cpp, runner.hseq2seq_runner.h
    • Old runner.h kept as a thin include redirect for backward compatibility
  2. Add TransducerRunner for RNN-T/TDT models

    • Extracts the ~190-line greedy decode loop from examples/models/parakeet/main.cpp
    • Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
    • Supports both standard RNN-T (duration=1) and TDT (variable durations)
    • Returns vector<Token> with frame offsets for downstream timestamp computation
    • Includes preprocess() method for models with bundled preprocessor
    • Exposes tokenizer via tokenizer() getter for downstream use
  3. Unify Token type: parakeet::Token is now an alias for asr::Token

  4. Migrate consumers

    • Whisper main.cpp: AsrRunnerSeq2SeqRunner
    • Parakeet main.cpp: inline decode → TransducerRunner::transcribe()
    • Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in examples/models/parakeet/.

Differential Revision: D100892465

Copilot AI review requested due to automatic review settings April 17, 2026 00:10
@pytorch-bot
Copy link
Copy Markdown

pytorch-bot bot commented Apr 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18961

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Unrelated Failures

As of commit 67e6800 with merge base 7fdd306 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 17, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync bot commented Apr 17, 2026

@kirklandsign has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100892465.

@kirklandsign
Copy link
Copy Markdown
Contributor Author

@claude review

@github-actions
Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@claude
Copy link
Copy Markdown

claude bot commented Apr 17, 2026

Claude finished @kirklandsign's task in 1m 28s —— View job


Code Review: Add TransducerRunner and rename AsrRunner to Seq2SeqRunner

  • Read all changed files and gather context
  • Review TransducerRunner implementation
  • Review Seq2SeqRunner rename and backward compatibility
  • Review consumer migrations (Whisper, Parakeet, Android JNI)
  • Review build system changes
  • Post review feedback

Summary

Well-structured refactoring that cleanly separates Seq2Seq (Whisper) and Transducer (RNN-T/TDT) ASR architectures. The extraction of ~190 lines of greedy decode logic from examples/models/parakeet/main.cpp into a reusable TransducerRunner is a good move. The backward-compat layer for AsrRunner is handled correctly.

Issues

1. stats_ is never reset between calls in TransducerRunner::transcribe()

transducer_runner.cpp:245-271 — The stats_ member accumulates across multiple transcribe() calls without being reset. If a user calls transcribe() twice, model_load_start_ms / model_load_end_ms from the first load() persist, but inference_start_ms gets overwritten. This produces misleading statistics for the second call. The Seq2SeqRunner has the same pre-existing pattern, so this is not a regression, but worth noting for both runners.

2. Redundant model loading for timestamp metadata in parakeet/main.cpp

examples/models/parakeet/main.cpp:195-227 — A second Module is created just to read sample_rate, window_stride, and encoder_subsampling_factor constant methods. These are read via execute() on an empty input vector, which requires loading the full module again. Since TransducerRunner already loads the same module and reads similar metadata (blank_id, num_rnn_layers, pred_hidden), consider either:

  • Exposing a module() accessor on TransducerRunner for downstream constant_method queries, or
  • Having TransducerRunner::load_model_metadata() read these extra fields too (perhaps as optional metadata).

This duplicates the module load cost at runtime.

3. BUCK file has duplicated load statement

extension/asr/runner/BUCK:8,13load(":targets.bzl", "define_common_targets") appears twice. This appears to be intentional (one for non_fbcode_target, one for fbcode_target), but the duplicate load on line 13 is redundant since Starlark load is idempotent. Not a bug, but slightly unusual.

Minor / Nits

4. preprocess() return type constructs TensorPtr manually

transducer_runner.cpp:234-235:

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));

This works but is inconsistent with how TensorPtr is typically created elsewhere in the codebase (e.g., via from_blob or the tensor_ptr_maker utilities). It also moves from a local Tensor into a shared_ptr, which is fine functionally.

5. Potential out-of-bounds in encoder output access

transducer_runner.cpp:268:

int64_t encoded_len = enc_outputs[1].toTensor().const_data_ptr<int64_t>()[0];

No bounds check on the tensor having at least one element. If the encoder returns a 0-element length tensor, this would be undefined behavior. The check at line 261-265 verifies both outputs exist and are tensors, but not that the length tensor is non-empty.

6. g_proj shape assumes it matches proj_dim

transducer_runner.cpp:372-375 — The g_proj tensor is created with shape {1, 1, proj_dim} where proj_dim comes from the encoder's output. If the decoder's g_proj output has a different hidden dimension than the encoder's f_proj, this would silently produce incorrect results. In practice these likely always match for transducer architectures, but the assumption is implicit.

7. Missing #include <optional> in transducer_runner.h

transducer_runner.h:79 uses std::optional<std::string> but doesn't include <optional>. It likely gets pulled in transitively through other headers, but for correctness it should be explicitly included.

Looks Good

  • Backward-compat alias strategy (runner.h → thin redirect, AsrRunner = Seq2SeqRunner) is clean and low-risk.
  • Android JNI layer correctly uses the alias (asr::AsrRunner) so no API breakage.
  • Token type unification (parakeet::Token → alias for asr::Token) avoids duplication.
  • Build system (CMake and Buck/Bazel targets) properly separates the two runners as independent compilation units.
  • The greedy decode loop in TransducerRunner::transcribe() faithfully reproduces the original logic from parakeet/main.cpp with proper error handling via ET_CHECK_OK_OR_RETURN_ERROR.
  • TDT duration handling with bounds checking (transducer_runner.cpp:389-396) is correct.
  • The max_symbols_per_step guard (transducer_runner.cpp:446-449) prevents infinite loops on pathological inputs.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures extension/asr/runner/ to support both encoder-decoder (Seq2Seq, e.g. Whisper) and Transducer (RNN‑T/TDT) ASR architectures by introducing separate runner implementations and migrating example/JNI consumers.

Changes:

  • Renames AsrRunner to Seq2SeqRunner (with backward-compatible runner.h redirect and type aliases).
  • Adds TransducerRunner implementing greedy transducer decoding and optional bundled-preprocessor execution.
  • Migrates Whisper/Parakeet examples and Android JNI includes to the new runner headers, and updates build files (CMake/Buck).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
extension/asr/runner/transducer_runner.h Defines new TransducerRunner API, Token type, and TransducerConfig.
extension/asr/runner/transducer_runner.cpp Implements model loading, optional preprocessor invocation, and transducer greedy decode loop.
extension/asr/runner/seq2seq_runner.h Defines Seq2SeqRunner API and backward-compatible AsrRunner aliases.
extension/asr/runner/seq2seq_runner.cpp Renames implementation from AsrRunner to Seq2SeqRunner and updates includes.
extension/asr/runner/runner.h Becomes a thin backward-compat include redirect to seq2seq_runner.h.
extension/asr/runner/targets.bzl Adds Buck targets for seq2seq_runner and transducer_runner libraries.
extension/asr/runner/CMakeLists.txt Builds ASR runner library from seq2seq_runner.cpp + transducer_runner.cpp.
extension/asr/runner/BUCK Adds migration BUCK wrapper for shared targets.
extension/android/jni/jni_layer_asr.cpp Updates include to the new Seq2Seq runner header.
examples/models/whisper/main.cpp Migrates usage from AsrRunner/AsrTranscribeConfig to Seq2SeqRunner/Seq2SeqTranscribeConfig.
examples/models/parakeet/types.h Switches Parakeet Token to alias shared asr::Token.
examples/models/parakeet/main.cpp Replaces inline greedy decode with TransducerRunner preprocess()+transcribe().
examples/models/parakeet/CMakeLists.txt Links Parakeet example against extension_asr_runner.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


#include <cstdint>
#include <functional>
#include <memory>
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transducer_runner.h uses std::optional/std::nullopt in the TransducerRunner constructor but does not include . Add the missing include so this header compiles when included on its own (including from transducer_runner.cpp).

Suggested change
#include <memory>
#include <memory>
#include <optional>

Copilot uses AI. Check for mistakes.
Comment thread extension/asr/runner/BUCK
Comment on lines +13 to +15
load(":targets.bzl", "define_common_targets")


Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second load(":targets.bzl", "define_common_targets") appears redundant (the symbol is already loaded above). Consider removing the duplicate to simplify the BUCK file.

Suggested change
load(":targets.bzl", "define_common_targets")

Copilot uses AI. Check for mistakes.
Comment on lines +117 to +121
ET_CHECK_OR_RETURN_ERROR(
blank_id_result.ok(),
Internal,
"Model must export 'blank_id' as a constant_method.");
blank_id_ = blank_id_result.get()[0].toInt();
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_model_metadata() indexes blank_id_result.get()[0] without verifying the constant_method returned at least one value (and the expected scalar type). If the model returns an empty vector or a tensor scalar, this can crash. Consider validating output count/type before calling toInt().

Copilot uses AI. Check for mistakes.
Comment on lines +247 to +251
// --- Encode ---
int64_t mel_len_value = preprocessed_features->size(1);
std::vector<int64_t> mel_len_data = {mel_len_value};
auto mel_len = ::executorch::extension::from_blob(
mel_len_data.data(), {1}, ::executorch::aten::ScalarType::Long);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transcribe() derives mel_len_value from preprocessed_features->size(1). If the preprocessor pads the mel tensor and provides an explicit mel_len, this will pass the padded length to the encoder. Consider threading the true mel_len through (e.g., from preprocess()) instead of inferring it from the tensor shape.

Copilot uses AI. Check for mistakes.
std::vector<::executorch::runtime::EValue>{sos_token, h, c});
ET_CHECK_OK_OR_RETURN_ERROR(decoder_init_result.error());

auto& init_outputs = decoder_init_result.get();
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decoder_init_result.get() is indexed at [0..2] without checking the output count/types. If decoder_step returns fewer outputs (or non-tensors), this will read out of bounds. Add an explicit output validation similar to the encoder check before indexing.

Suggested change
auto& init_outputs = decoder_init_result.get();
auto& init_outputs = decoder_init_result.get();
if (init_outputs.size() < 3 || !init_outputs[0].isTensor() ||
!init_outputs[1].isTensor() || !init_outputs[2].isTensor()) {
ET_LOG(
Error,
"Method %s returned invalid outputs: expected at least 3 tensor outputs, got %zu",
kDecoderStepMethodName,
init_outputs.size());
return Error::InvalidArgument;
}

Copilot uses AI. Check for mistakes.
struct Token {
uint64_t id;
int64_t start_offset; // Frame index in the encoder output
int64_t duration; // TDT duration value (0 for standard RNN-T)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Token::duration is documented as "0 for standard RNN-T", but the implementation uses dur=1 when durations is empty (and TransducerConfig also describes standard RNN-T as duration==1). Please reconcile the docs/contract so downstream timestamp code can interpret duration consistently.

Suggested change
int64_t duration; // TDT duration value (0 for standard RNN-T)
int64_t duration; // Duration in encoder frames (1 for standard RNN-T)

Copilot uses AI. Check for mistakes.
Comment on lines +49 to +51
// TDT duration values. Empty means standard RNN-T (duration always 1).
// If the model exports a "durations" constant_method, it will be used
// instead of this field.
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TransducerConfig says a model-exported "durations" constant_method will override config.durations, but TransducerRunner::load()/load_model_metadata() never reads such a constant_method. Either implement that override or update the comment to match actual behavior.

Suggested change
// TDT duration values. Empty means standard RNN-T (duration always 1).
// If the model exports a "durations" constant_method, it will be used
// instead of this field.
// TDT duration values used by the decode loop. Empty means standard
// RNN-T (duration always 1).

Copilot uses AI. Check for mistakes.
#pragma once

#include <functional>
#include <memory>
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seq2seq_runner.h uses std::optional in the Seq2SeqRunner constructor signature but doesn't include . It currently relies on transitive includes; please add to keep the header self-contained and avoid include-order fragility.

Suggested change
#include <memory>
#include <memory>
#include <optional>

Copilot uses AI. Check for mistakes.
Comment on lines +231 to +235
Internal,
"Preprocessor returned unexpected output.");

auto mel = outputs[0].toTensor();
return std::make_shared<::executorch::aten::Tensor>(std::move(mel));
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preprocess() only consumes outputs[0] from the preprocessor, but the header documents the method as returning (mel, mel_len). If mel_len is provided (or mel is padded), dropping it can cause transcribe() to pass an incorrect length to the encoder. Consider consuming outputs[1] when present and using it to compute/override mel_len.

Copilot uses AI. Check for mistakes.
Comment on lines +382 to +386
int64_t k = joint_result.get()[0].toTensor().const_data_ptr<int64_t>()[0];

// Compute frame advance duration
int64_t dur = 1; // default for standard RNN-T
if (is_tdt) {
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

joint_result.get() is indexed at [0] (and [1] when is_tdt) without validating the number/type of outputs. Add a guard that checks output count and that the expected outputs are tensors/scalars before reading const_data_ptr<int64_t>().

Copilot uses AI. Check for mistakes.
kirklandsign added a commit that referenced this pull request Apr 17, 2026
Summary:
Pull Request resolved: #18961

Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures:

1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h)
   - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h`
   - Old `runner.h` kept as a thin include redirect for backward compatibility

2. **Add** `TransducerRunner` for RNN-T/TDT models
   - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp`
   - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
   - Supports both standard RNN-T (duration=1) and TDT (variable durations)
   - Returns `vector<Token>` with frame offsets for downstream timestamp computation
   - Includes `preprocess()` method for models with bundled preprocessor
   - Exposes tokenizer via `tokenizer()` getter for downstream use

3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token`

4. **Migrate consumers**
   - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner`
   - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()`
   - Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`.

Differential Revision: D100892465
@meta-codesync meta-codesync bot changed the title Add TransducerRunner and rename AsrRunner to Seq2SeqRunner Add TransducerRunner and rename AsrRunner to Seq2SeqRunner (#18961) Apr 17, 2026
Summary:
Pull Request resolved: #18961

Restructure `extension/asr/runner/` to support both Seq2Seq and Transducer ASR architectures:

1. **Rename** `AsrRunner` → `Seq2SeqRunner` (with backward-compat alias in runner.h)
   - `runner.cpp` → `seq2seq_runner.cpp`, `runner.h` → `seq2seq_runner.h`
   - Old `runner.h` kept as a thin include redirect for backward compatibility

2. **Add** `TransducerRunner` for RNN-T/TDT models
   - Extracts the ~190-line greedy decode loop from `examples/models/parakeet/main.cpp`
   - Auto-reads model metadata (blank_id, num_rnn_layers, pred_hidden) from constant_methods
   - Supports both standard RNN-T (duration=1) and TDT (variable durations)
   - Returns `vector<Token>` with frame offsets for downstream timestamp computation
   - Includes `preprocess()` method for models with bundled preprocessor
   - Exposes tokenizer via `tokenizer()` getter for downstream use

3. **Unify Token type**: `parakeet::Token` is now an alias for `asr::Token`

4. **Migrate consumers**
   - Whisper `main.cpp`: `AsrRunner` → `Seq2SeqRunner`
   - Parakeet `main.cpp`: inline decode → `TransducerRunner::transcribe()`
   - Android JNI: updated include (uses backward-compat alias, no API change)

Model-specific post-processing (timestamp computation) remains in `examples/models/parakeet/`.

Differential Revision: D100892465
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants