
[FEAT] Port SRT encoder to Rust#2227

Open
DhanushVarma-2 wants to merge 1 commit into CCExtractor:master from DhanushVarma-2:feat/srt-encoder-rust


@DhanushVarma-2
Contributor

DhanushVarma-2 commented Mar 24, 2026

Ported all 5 functions from ccx_encoders_srt.c to Rust:

  • write_stringz_as_srt → ccxr_write_stringz_as_srt (text subtitles)
  • write_cc_buffer_as_srt → ccxr_write_cc_buffer_as_srt (CEA-608 with autodash)
  • write_cc_subtitle_as_srt → ccxr_write_cc_subtitle_as_srt (subtitle chain + teletext multi-page)
  • write_cc_bitmap_as_srt → ccxr_write_cc_bitmap_as_srt (OCR, behind hardsubx_ocr feature)
  • write_stringz_as_srt_to_output (internal helper)
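For reference, the per-entry output of the text path (write_stringz_as_srt) can be expressed without raw pointers: a counter line, an `HH:MM:SS,mmm` timestamp line whose end time is pulled back by 1 ms (the "-1ms overlap prevention" named in the commit message), and the text split on the two-character `\n` escape. This is a minimal sketch assuming CRLF line endings and keeping empty fragments as empty lines; `format_srt_time`, `unescape_lines`, and `srt_entry` are hypothetical names, not the PR's functions.

```rust
/// Formats milliseconds as an SRT time code `HH:MM:SS,mmm`.
/// Negative values are clamped to zero (an assumption of this sketch).
fn format_srt_time(ms: i64) -> String {
    let ms = ms.max(0);
    let hours = ms / 3_600_000;
    let minutes = (ms / 60_000) % 60;
    let seconds = (ms / 1_000) % 60;
    let millis = ms % 1_000;
    format!("{:02}:{:02}:{:02},{:03}", hours, minutes, seconds, millis)
}

/// Splits subtitle text on the literal two-character escape `\n`
/// (a backslash followed by 'n'), not on a real newline character.
fn unescape_lines(text: &str) -> Vec<&str> {
    text.split("\\n").collect()
}

/// Builds one SRT entry. The end time is shortened by 1 ms so that an entry
/// ending at time T does not overlap one that starts at T.
fn srt_entry(counter: u32, start_ms: i64, end_ms: i64, text: &str) -> String {
    let mut entry = format!(
        "{}\r\n{} --> {}\r\n",
        counter,
        format_srt_time(start_ms),
        format_srt_time(end_ms - 1)
    );
    for line in unescape_lines(text) {
        entry.push_str(line);
        entry.push_str("\r\n");
    }
    // A blank line terminates the entry.
    entry.push_str("\r\n");
    entry
}
```

With this sketch, `srt_entry(1, 1_000, 2_500, "Hello\\nWorld")` produces an entry whose end time is `00:00:02,499`, one millisecond before a cue starting at 2.5 s.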

The Rust implementations are called by default; the C versions remain as a fallback behind #ifdef DISABLE_RUST.

Tested on the Matroska test files (Elephants Dream) with 8 subtitle tracks, including English and Hungarian with UTF-8 accented characters. All tracks extract correctly.

DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch 7 times, most recently from d9b3a3a to c6523ce on March 25, 2026 11:07
@DhanushVarma-2
Contributor Author

[Screenshot 2026-03-25 at 9:16:39 PM: output]

DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch 5 times, most recently from af58374 to a1dc78a on March 26, 2026 11:07
@DhanushVarma-2
Contributor Author

The bitmap function can't be wired up yet. CMake sets ENABLE_HARDSUBX for the C code but never calls corrosion_set_features to pass the hardsubx_ocr feature to the Rust crate. As a result, the Rust function isn't compiled in hardsubx builds, and calling it from C causes a linker error. That is what made the OCR Docker CI job fail earlier.

The Rust code is in place and ready. To actually wire it up, someone needs to add corrosion_set_features(ccx_rust FEATURES hardsubx_ocr) inside the WITH_HARDSUBX block in CMakeLists.txt. I'll do that in a follow-up PR to keep this one clean.
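The linker error described above is ordinary Cargo feature gating: an item marked `#[cfg(feature = "hardsubx_ocr")]` is not compiled at all unless the crate is built with that feature, so the exported symbol the C side references never exists in the library. A minimal illustration, assuming only the feature name from the comment; `ccxr_example_bitmap_writer` and `ocr_path_compiled_in` are hypothetical placeholders, not the PR's code.

```rust
/// Exported only when the crate is built with `--features hardsubx_ocr`.
/// Without the feature this item is removed before compilation, so a C caller
/// that references the symbol fails at link time.
#[cfg(feature = "hardsubx_ocr")]
#[no_mangle]
pub extern "C" fn ccxr_example_bitmap_writer() -> std::os::raw::c_int {
    0
}

/// Reports whether the OCR path was compiled into this build.
pub fn ocr_path_compiled_in() -> bool {
    cfg!(feature = "hardsubx_ocr")
}
```

On the CMake side, Corrosion's `corrosion_set_features` with the `FEATURES` keyword is what enables the feature for the imported crate, which is why the C and Rust sides currently disagree.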

And yes, get_teletext_output and get_teletext_srt_counter both exist in ccx_encoders_common.c (lines 1356 and 1460). They have been there since the teletext multi-page feature was added.

steel-bucket (Member) left a comment

Compare the SP test results on the master branch with yours.

Comment on lines +9 to +11
#ifndef DISABLE_RUST
extern int ccxr_write_stringz_as_srt(const char *string, struct encoder_ctx *context, LLONG ms_start, LLONG ms_end);
extern int ccxr_write_cc_buffer_as_srt(struct eia608_screen *data, struct encoder_ctx *context);
Member

DISABLE_RUST is deprecated

Comment on lines +10 to +18
extern "C" {
    fn get_decoder_line_encoded(
        ctx: *mut encoder_ctx,
        buffer: *mut c_uchar,
        line_num: c_int,
        data: *const eia608_screen,
    ) -> c_uint;
}

Member

Import externs the way we do in src/rust/src/lib.rs, not this way.

DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch 2 times, most recently from 43ef45f to 043901b on March 28, 2026 09:40
Implement ccxr_write_stringz_as_srt and ccxr_write_cc_buffer_as_srt
in src/rust/src/encoder/srt.rs. Covers:
- Subtitle counter and timestamp formatting with -1ms overlap prevention
- \n unescape handling for multi-line subtitles
- Encoding conversion (UTF-8, Latin1, UCS-2)
- Autodash detection for CEA-608 screen buffers
- Speaker name detection (colon-based)

Uses existing Rust encoder infrastructure (encode_line, write_wrapped)
and calls C get_decoder_line_encoded for CEA-608 line encoding until
that function is also ported.

Exported as #[no_mangle] extern C functions ready to replace the C
versions in ccx_encoders_srt.c.
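The "Autodash detection" and "Speaker name detection (colon-based)" items in the commit message describe layout heuristics over CEA-608 screen rows. The following is a simplified sketch only, assuming rows are space-padded byte arrays and that a dash is added only when a row's text does not line up with the previous row; the ported logic covers more cases, and how it uses the speaker colon is not shown here. `Extent`, `text_extent`, `speaker_colon`, and `needs_dash` are hypothetical names.

```rust
/// Column range occupied by visible text on one caption row.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Extent {
    pub first: usize,
    pub last: usize,
}

/// First and last non-space columns of a row, or None if the row is blank.
pub fn text_extent(row: &[u8]) -> Option<Extent> {
    let first = row.iter().position(|&c| c != b' ')?;
    let last = row.iter().rposition(|&c| c != b' ')?;
    Some(Extent { first, last })
}

/// Colon-based speaker detection: if the text starts with uppercase letters
/// followed by ':', return the colon's column (e.g. "BOB: hi" -> Some(3)).
pub fn speaker_colon(row: &[u8], ext: Extent) -> Option<usize> {
    for j in ext.first..=ext.last {
        match row[j] {
            b':' if j > ext.first => return Some(j),
            c if c.is_ascii_uppercase() => continue,
            _ => return None,
        }
    }
    None
}

/// Simplified autodash rule: prefix "- " only when the row does not line up
/// with the previous row (same left edge, same right edge, contained,
/// overlapping, or center within one column).
pub fn needs_dash(prev: Option<Extent>, cur: Extent) -> bool {
    let p = match prev {
        Some(p) => p,
        None => return false, // the first row never gets a dash
    };
    let center = |e: Extent| ((e.first + e.last) / 2) as i64;
    let left_aligned = cur.first == p.first;
    let right_aligned = cur.last == p.last;
    let contained = cur.first > p.first && cur.last < p.last;
    let overlaps = (cur.first > p.first && cur.first < p.last)
        || (cur.last > p.first && cur.last < p.last);
    let centered = (center(cur) - center(p)).abs() <= 1;
    !(left_aligned || right_aligned || contained || overlaps || centered)
}
```

Under these rules, a row at columns 20..30 following a row at columns 0..5 gets a dash, because the two rows share no alignment and are likely different speakers.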
DhanushVarma-2 force-pushed the feat/srt-encoder-rust branch from 043901b to c591f90 on March 29, 2026 07:59
@DhanushVarma-2
Contributor Author

DhanushVarma-2 commented Mar 29, 2026

@steel-bucket, can you review it again?

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit d56a6be...:
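| Report Name | Tests Passed |
|---|---|
| Broken | 9/13 |
| CEA-708 | 1/14 |
| DVB | 3/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 20/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 77/86 |
| Teletext | 20/21 |
| WTV | 13/13 |
| XDS | 31/34 |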

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=srt --latin1 b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --out=spupng c83f765c66...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit d56a6be...:
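| Report Name | Tests Passed |
|---|---|
| Broken | 9/13 |
| CEA-708 | 1/14 |
| DVB | 4/7 |
| DVD | 3/3 |
| DVR-MS | 2/2 |
| General | 22/27 |
| Hardsubx | 1/1 |
| Hauppage | 3/3 |
| MP4 | 3/3 |
| NoCC | 10/10 |
| Options | 81/86 |
| Teletext | 20/21 |
| WTV | 13/13 |
| XDS | 31/34 |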

Your PR breaks these cases:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
  • ccextractor --autoprogram --out=srt --latin1 b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

Comment on lines +1 to +2
use crate::bindings::{ccx_encoding_type_CCX_ENC_UNICODE, eia608_screen, encoder_ctx};
use crate::encoder::common::encode_line;
Member

Don't import bindings outside of libccxr_exports.
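One way to follow this (a sketch under assumptions, not the project's established pattern) is to give the encoder module a Rust-native encoding type and translate the bindgen constant, such as ccx_encoding_type_CCX_ENC_UNICODE, into it only inside libccxr_exports. The `Encoding` enum and `encode_text` helper are hypothetical; substituting `?` for unmappable characters and emitting little-endian UCS-2 are assumptions for illustration, and the real encoder's choices may differ.

```rust
/// Rust-side view of the output encodings; no bindgen types involved.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Encoding {
    Utf8,
    Latin1,
    Ucs2, // assumed little-endian in this sketch
}

/// Encodes a UTF-8 string into the requested output encoding.
pub fn encode_text(text: &str, enc: Encoding) -> Vec<u8> {
    match enc {
        Encoding::Utf8 => text.as_bytes().to_vec(),
        // Code points above U+00FF have no Latin-1 form; '?' is an assumed fallback.
        Encoding::Latin1 => text
            .chars()
            .map(|c| if (c as u32) <= 0xFF { c as u8 } else { b'?' })
            .collect(),
        // UCS-2 covers only the Basic Multilingual Plane; '?' is an assumed fallback.
        Encoding::Ucs2 => text
            .chars()
            .map(|c| if (c as u32) <= 0xFFFF { c as u16 } else { b'?' as u16 })
            .flat_map(|unit| unit.to_le_bytes())
            .collect(),
    }
}
```

With this split, a single `match` in libccxr_exports from the bindgen constants to `Encoding` would be the only place the SRT encoder depends on `crate::bindings`.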

Comment on lines +27 to +53
/// Helper: copy CRLF bytes from context so we don't hold a borrow.
unsafe fn copy_crlf(ctx: &encoder_ctx) -> Vec<u8> {
    let len = ctx.encoded_crlf_length as usize;
    let slice = std::slice::from_raw_parts(ctx.encoded_crlf.as_ptr(), len);
    slice.to_vec()
}

/// Helper: encode into a fresh Vec, avoiding borrow conflicts with ctx.buffer.
unsafe fn encode_to_vec(ctx: &mut encoder_ctx, text: &[u8]) -> Vec<u8> {
    let cap = ctx.capacity as usize;
    let buf = std::slice::from_raw_parts_mut(ctx.buffer, cap);
    let used = encode_line(ctx, buf, text) as usize;
    buf[..used].to_vec()
}

/// Core SRT writer: writes a single subtitle entry to a specific output.
///
/// # Safety
/// Accesses raw pointers from encoder context.
pub unsafe fn write_stringz_as_srt_to_output(
    string: *const i8,
    context: &mut encoder_ctx,
    ms_start: i64,
    ms_end: i64,
    out_fh: c_int,
    srt_counter: *mut u32,
) -> c_int {
Member

All of this is C-like. The underlying idea behind these migrations is that when we finally move to Rust completely, we can't still be passing raw pointers around.
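A sketch of the direction this comment points to, under assumptions: the SRT writer owns its counter and line terminator, takes already-decoded `&str` lines, and writes to any `std::io::Write`, so the `*const i8`, `*mut u32`, and raw file-descriptor arguments stay confined to the `extern "C"` shim. `SrtWriter` and its methods are hypothetical names, not code from the PR.

```rust
use std::io::{self, Write};

/// Owns the per-output state that the C version passes around as pointers.
pub struct SrtWriter<W: Write> {
    out: W,
    counter: u32,
    // Fixed to CRLF in this sketch; the real encoder takes it from encoded_crlf.
    crlf: &'static str,
}

impl<W: Write> SrtWriter<W> {
    pub fn new(out: W) -> Self {
        SrtWriter { out, counter: 0, crlf: "\r\n" }
    }

    /// Writes one entry; `lines` are already unescaped, UTF-8 text.
    /// The end time is shortened by 1 ms to avoid overlapping the next entry.
    pub fn write_entry(&mut self, start_ms: i64, end_ms: i64, lines: &[&str]) -> io::Result<()> {
        self.counter += 1;
        let ts = |ms: i64| {
            let ms = ms.max(0);
            format!(
                "{:02}:{:02}:{:02},{:03}",
                ms / 3_600_000,
                (ms / 60_000) % 60,
                (ms / 1_000) % 60,
                ms % 1_000
            )
        };
        write!(self.out, "{}{}", self.counter, self.crlf)?;
        write!(self.out, "{} --> {}{}", ts(start_ms), ts(end_ms - 1), self.crlf)?;
        for line in lines {
            write!(self.out, "{}{}", line, self.crlf)?;
        }
        write!(self.out, "{}", self.crlf)
    }

    /// Returns the underlying writer, e.g. to inspect the output in tests.
    pub fn into_inner(self) -> W {
        self.out
    }
}
```

In this shape, the exported C function would only convert its arguments and choose the output. Because each `SrtWriter` carries its own counter, per-page teletext outputs (what get_teletext_srt_counter tracks today) could each get an independent instance; that is a design option, not something the project has settled on.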

Comment on lines +495 to +506
struct CcBitmap {
    x: c_int,
    y: c_int,
    w: c_int,
    h: c_int,
    nb_colors: c_int,
    data0: *mut std::os::raw::c_uchar,
    data1: *mut std::os::raw::c_uchar,
    linesize0: c_int,
    linesize1: c_int,
}
let data_ptr = s.data as *mut CcBitmap;
Member

?
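The reviewer's "?" is not explained. One likely concern, and this is an assumption, is that the excerpt re-declares the C bitmap layout by hand instead of using the bindgen type. Without `#[repr(C)]`, Rust does not guarantee field order or padding, so casting `s.data` to `*mut CcBitmap` is not sound. A sketch of a safer shape: a `#[repr(C)]` mirror (normally supplied by bindgen) is read once at the FFI boundary and copied into an owned, pointer-free value. `RawBitmap`, `Bitmap`, and `from_raw` are hypothetical names, and copying only plane 0 with `linesize0` as the row stride is an assumption.

```rust
use std::os::raw::{c_int, c_uchar};

/// C-compatible mirror; #[repr(C)] fixes field order and padding to match C.
#[repr(C)]
pub struct RawBitmap {
    pub x: c_int,
    pub y: c_int,
    pub w: c_int,
    pub h: c_int,
    pub nb_colors: c_int,
    pub data0: *mut c_uchar,
    pub data1: *mut c_uchar,
    pub linesize0: c_int,
    pub linesize1: c_int,
}

/// Owned, pointer-free bitmap for the rest of the Rust code.
#[derive(Debug, PartialEq)]
pub struct Bitmap {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u8>, // row-major, `width` bytes per row
}

impl Bitmap {
    /// Copies plane 0 out of a C bitmap, dropping the per-row padding.
    ///
    /// # Safety
    /// `raw.data0` must point to at least `h` rows of `linesize0` bytes each.
    pub unsafe fn from_raw(raw: &RawBitmap) -> Option<Bitmap> {
        if raw.data0.is_null() || raw.w < 0 || raw.h < 0 || raw.linesize0 < raw.w {
            return None;
        }
        let (w, h, stride) = (raw.w as usize, raw.h as usize, raw.linesize0 as usize);
        let mut pixels = Vec::with_capacity(w * h);
        for row in 0..h {
            let src = std::slice::from_raw_parts(raw.data0.add(row * stride), w);
            pixels.extend_from_slice(src);
        }
        Some(Bitmap { width: w, height: h, pixels })
    }
}
```

After this one `unsafe` conversion, the OCR path can work on `Bitmap` by reference without touching raw pointers.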


3 participants