Skip to content

[TeleViz] Add QuadLayer + textured-quad pipeline + VizCudaArray#466

Merged
farbod-nv merged 7 commits into
mainfrom
fm/televiz_m3b
May 5, 2026
Merged

[TeleViz] Add QuadLayer + textured-quad pipeline + VizCudaArray#466
farbod-nv merged 7 commits into
mainfrom
fm/televiz_m3b

Conversation

@farbod-nv
Copy link
Copy Markdown
Contributor

@farbod-nv farbod-nv commented May 4, 2026

QuadLayer is the first concrete layer type: a fullscreen-blit textured quad sampling a CUDA-fed DeviceImage. Two submission paths:

  • submit(VizBuffer) Mode A: caller's CUDA buffer is copied
    into the layer's DeviceImage.
  • acquire() / release() Mode B: caller writes into the layer's
    tiled CUDA-Vulkan image directly. Zero
    copy. acquire() returns VizCudaArray.

Two image-shape view types now sit side-by-side in viz/core:

  • VizBuffer — linear pointer-backed memory (CPU bytes or CUDA
    device pointer). Exposes cuda_array_interface
    in Python for kDevice.
  • VizCudaArray — opaque CUDA cudaArray_t (tiled GPU memory). Does
    NOT expose cuda_array_interface — that protocol
    means pointer-backed memory and a cudaArray_t isn't
    one. Used as Mode B return.

Pipeline machinery (sampler, descriptor set layout, pipeline layout, VkPipeline, descriptor pool/set) lives inside QuadLayer. Pipelines build using a process-wide VkPipelineCache added to VkContext for driver-side compilation reuse.

Summary by CodeRabbit

  • New Features

    • QuadLayer: render CUDA-fed textures as fullscreen quads with two submission modes (copy and zero‑copy).
    • Vulkan pipeline cache exposed for faster pipeline reuse.
    • VizSession exposes a Vulkan context accessor.
    • Per-layer timeline-semaphore hooks for explicit GPU synchronization.
  • Tests

    • Added unit and GPU tests for QuadLayer and end-to-end CUDA↔Vulkan interop (including multi-frame).
  • Documentation

    • Clarified module docs on CUDA tiled arrays and static layer layout.

QuadLayer is the first concrete layer type: a fullscreen-blit textured
quad sampling a CUDA-fed DeviceImage. Two submission paths:

- submit(VizBuffer)        Mode A: caller's CUDA buffer is copied
                           into the layer's DeviceImage.
- acquire() / release()    Mode B: caller writes into the layer's
                           tiled CUDA-Vulkan image directly. Zero
                           copy. acquire() returns VizCudaArray.

Two image-shape view types now sit side-by-side in viz/core:

- VizBuffer    — linear pointer-backed memory (CPU bytes or CUDA
                 device pointer). Exposes __cuda_array_interface__
                 in Python for kDevice.
- VizCudaArray — opaque CUDA cudaArray_t (tiled GPU memory). Does
                 NOT expose __cuda_array_interface__ — that protocol
                 means pointer-backed memory and a cudaArray_t isn't
                 one. Used as Mode B return.

Pipeline machinery (sampler, descriptor set layout, pipeline layout,
VkPipeline, descriptor pool/set) lives inside QuadLayer. Pipelines
build using a process-wide VkPipelineCache added to VkContext for
driver-side compilation reuse.

Sync today is heavyweight: vkDeviceWaitIdle (wait for prior frame's
Vulkan reads) + cudaDeviceSynchronize (wait for our writes) inside
submit / release. Fine-grained CUDA-Vulkan binary semaphores ship
later when multi-frame parallelism actually matters.

VizSession::get_vk_context() exposes the underlying context (nullptr
after destroy) so layers can reach pipeline_cache(), cuda_device_id(),
etc. from session-driven code.

Validation rejects zero dimensions, null render pass, and non-kRGBA8
formats (kD32F would create a depth-aspect view that the textured-
quad pipeline can't sample as color).

Milestone tests (test_quad_milestone.cpp) run end-to-end CUDA →
Vulkan → readback round-trip in BOTH modes (Mode A submit + Mode B
acquire/release). 4-quadrant {0, 255}-only RGBA pattern survives the
sRGB attachment encoding because curve endpoints map to themselves.

40 unit + 34 GPU tests pass; unit tests also pass under ASAN+UBSAN.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 95c1b15d-c60e-45ed-973b-0a03fff7f4d2

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new viz::QuadLayer that renders a CUDA-backed 2D texture as a fullscreen quad and supports two ingestion modes: copy-based submit(VizBuffer, cudaStream_t) and zero-copy acquire(cudaStream_t)/release(cudaStream_t) returning VizCudaArray. Removes DeviceImage::view(), introduces VizCudaArray (opaque cudaArray_t view), adds Vulkan/CUDA timeline-semaphore interop and CUDA-side wait/signal helpers on DeviceImage, exposes VkContext::pipeline_cache(), converts viz_layers to a STATIC library, and wires per-layer timeline semaphores into VizCompositor submission. Extensive unit and end-to-end tests and test helpers for CUDA–Vulkan interop were added.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Session as VizSession
    participant Layer as QuadLayer
    participant DevImg as DeviceImage
    participant CUDA as CUDA Runtime
    participant VK as Vulkan

    Note over App,Layer: Mode A — submit(VizBuffer, stream)
    App->>Layer: submit(VizBuffer, stream)
    Layer->>CUDA: cudaSetDevice(ctx.cuda_id)
    Layer->>DevImg: cuda_wait_for_vk_read(stream)
    Layer->>CUDA: cudaMemcpy2DToArrayAsync(..., stream)
    Layer->>DevImg: cuda_signal_write_done(stream)

    Note over App,Layer: Mode B — acquire/release(stream)
    App->>Layer: acquire(stream)
    Layer->>CUDA: cudaSetDevice(ctx.cuda_id)
    Layer->>DevImg: cuda_wait_for_vk_read(stream)
    Layer-->>App: VizCudaArray (cudaArray_t)
    App->>CUDA: cudaMemcpy2DToArray(..., stream)
    App->>Layer: release(stream)
    Layer->>DevImg: cuda_signal_write_done(stream)

    Note over Session,Layer: Render + timeline semaphore submit
    Session->>Layer: get_wait_semaphores()
    Layer->>DevImg: provide cuda_done_writing() + value
    Session->>VK: build VkTimelineSemaphoreSubmitInfo (waits/signals)
    Session->>VK: vkQueueSubmit(...)
    VK-->>Layer: submission success
    Layer->>Layer: commit_pending_signals() (advance vk_done_reading value)
    VK->>DevImg: Vulkan reads shader-sampled image during render
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Prior changes that introduced DeviceImage CUDA–Vulkan interop and VkContext CUDA device matching (base interop infra that QuadLayer builds upon).
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately and concisely summarizes the primary change: adding QuadLayer, a textured-quad pipeline, and VizCudaArray. It captures the key deliverables without being vague.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fm/televiz_m3b

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/viz/layers_tests/cpp/test_quad_layer.cpp`:
- Around line 48-76: The three tests in test_quad_layer.cpp titled about
render_pass, resolution, and format never reach those checks because QuadLayer's
constructor first checks VkContext::is_initialized() on the default-constructed
VkContext; move the argument validation (render_pass, cfg.resolution,
cfg.format) to occur before the context initialization check in
QuadLayer::QuadLayer (so the ctor throws for invalid
render_pass/resolution/format even with an uninitialized VkContext), or
alternatively initialize a real VkContext in those TEST_CASEs and mark them as
[gpu] so the intended validations run; update references to QuadLayer::Config,
QuadLayer::QuadLayer, and VkContext::is_initialized() accordingly.

In `@src/viz/layers/cpp/quad_layer.cpp`:
- Around line 99-139: QuadLayer::destroy currently resets device_image_ but
leaves ctx_ and other state live, allowing later public methods (submit,
acquire, record) to operate on null resources; update destroy() to mark the
instance as destroyed (e.g., clear ctx_ or set a dedicated bool destroyed_),
ensure all Vulkan handles (descriptor_pool_, pipeline_, pipeline_layout_,
descriptor_set_layout_, sampler_) and device_image_ are nulled/cleared, and make
public entry points (submit, acquire, record, any public getters) check the
destroyed flag or ctx_ and return an explicit error or no-op instead of
dereferencing resources; also make destroy() idempotent so repeated calls are
safe and add tests for invalid input rejection, state-machine transitions,
exception recovery, and idempotent destroy as described.
- Around line 188-199: The acquire() method (QuadLayer::acquire) must set the
CUDA device before returning the VizCudaArray so subsequent CUDA ops run on the
correct context: call cudaSetDevice(ctx_->cuda_device_id()) at the start (or
immediately before returning) of QuadLayer::acquire(), mirroring the device
pinning done in submit() and release(); ensure you do this prior to exposing
device_image_->cuda_array() in the returned VizCudaArray and handle any
cudaSetDevice error consistently with existing error handling.

In `@src/viz/session_tests/cpp/test_quad_milestone.cpp`:
- Around line 35-49: The current gpu_available() helper inspects only Vulkan
devices via viz::VkContext::enumerate_physical_devices() which can falsely pass
when CUDA is missing; update the tests to use the canonical check
viz::testing::is_gpu_available() and early-exit with SKIP() when it returns
false (add the SKIP() call before any CUDA/Vulkan calls), or replace
gpu_available() usages with viz::testing::is_gpu_available(); ensure the test
functions that currently call gpu_available() (e.g., the functions referencing
gpu_available(), and any code paths that invoke CUDA after the gate) perform
SKIP() if viz::testing::is_gpu_available() is false.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: de5981b8-20a5-463c-8ffb-f3f63e900044

📥 Commits

Reviewing files that changed from the base of the PR and between 4da8221 and d8be634.

📒 Files selected for processing (17)
  • src/viz/AGENTS.md
  • src/viz/core/cpp/device_image.cpp
  • src/viz/core/cpp/inc/viz/core/device_image.hpp
  • src/viz/core/cpp/inc/viz/core/viz_types.hpp
  • src/viz/core/cpp/inc/viz/core/vk_context.hpp
  • src/viz/core/cpp/vk_context.cpp
  • src/viz/core_tests/cpp/test_device_image.cpp
  • src/viz/layers/cpp/CMakeLists.txt
  • src/viz/layers/cpp/inc/viz/layers/quad_layer.hpp
  • src/viz/layers/cpp/quad_layer.cpp
  • src/viz/layers_tests/cpp/CMakeLists.txt
  • src/viz/layers_tests/cpp/test_quad_layer.cpp
  • src/viz/session/cpp/inc/viz/session/viz_session.hpp
  • src/viz/session/cpp/viz_session.cpp
  • src/viz/session_tests/cpp/CMakeLists.txt
  • src/viz/session_tests/cpp/test_quad_milestone.cpp
  • src/viz/shaders/cpp/textured_quad.frag

Comment thread src/viz/layers_tests/cpp/test_quad_layer.cpp Outdated
Comment thread src/viz/layers/cpp/quad_layer.cpp
Comment thread src/viz/layers/cpp/quad_layer.cpp Outdated
Comment thread src/viz/session_tests/cpp/test_quad_milestone.cpp Outdated
…DA in acquire

Three review-driven fixes:

1. Constructor validation order: arg-shape checks (format, resolution,
   render_pass) now run before VkContext::is_initialized(). The unit
   tests for those rejection paths previously short-circuited on the
   ctx check and never reached their named check; with the new order
   each unit test exercises exactly the path it claims.

2. Use-after-destroy guard: submit / acquire / release / record now
   throw std::logic_error if device_image_ has been reset. Previously
   they would deref null resources and crash. Idempotent destroy
   stays a no-op. Added a [gpu] test covering the guard.

3. acquire() now calls cudaSetDevice on the calling thread before
   exposing the cudaArray_t, mirroring submit / release. Without
   this, a worker-thread caller's CUDA writes to the returned
   handle would target whatever device CUDA defaulted to.

Also tightened the gpu_available() helpers in test_quad_layer and
test_quad_milestone to probe CUDA, not just Vulkan — so a Vulkan-
only machine skips cleanly instead of crashing in
cudaImportExternalMemory. (The canonical viz::testing::is_gpu_available
stays Vulkan-only since not all viz tests need CUDA.)

40 unit + 35 GPU tests pass; unit tests pass under ASAN+UBSAN.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@farbod-nv farbod-nv changed the title Add QuadLayer + textured-quad pipeline + VizCudaArray [TeleViz] Add QuadLayer + textured-quad pipeline + VizCudaArray May 5, 2026
…shared test helpers

Folds the deferred fine-grained sync work into m3b plus the
substantive review fixes that surfaced from it.

DeviceImage:
- Vulkan timeline semaphores (VK_KHR_timeline_semaphore, initial
  value 0) imported into CUDA as TimelineSemaphoreFd. No first-frame
  primer; timeline waits for >= 0 are trivially satisfied.
- Counters split into reserve_*() / commit_*() pairs. reserve_*
  returns the next monotonic value; commit_* advances the public
  value via monotonic-max only after the signal has been queued
  successfully. A failed cuda*Async or vkQueueSubmit no longer
  poisons the timeline.
- DeviceImage exposes small CUDA-side primitives
  (cuda_wait_for_vk_read, cuda_signal_write_done) so QuadLayer
  orchestrates them rather than reimplementing the calls.

QuadLayer:
- acquire/release/submit/record state machine via atomic acquired_.
  Single-producer-thread contract documented in the header (multi-
  producer feeding the same layer is undefined; use multiple
  layers). Catches double-acquire, release-without-acquire, submit-
  while-acquired, record-while-acquired as std::logic_error.
- submit / acquire / release accept an optional cudaStream_t
  (default 0). Producers can pass their own stream so the
  wait/copy/signal sequence is correctly ordered after the
  producer's prior work on that stream.

LayerBase + VizCompositor:
- Layers reserve signal values via get_signal_semaphores() but don't
  commit until VizCompositor calls commit_pending_signals() after
  vkQueueSubmit returns success. A failed submit leaves counters
  un-advanced.
- VizCompositor snapshots the visible-layer set ONCE at the top of
  render() and uses that snapshot for record / sema-collect /
  commit. A mid-frame is_visible() toggle would otherwise let a
  layer record draws but skip semaphore wiring (or vice versa) and
  desync the timeline counters.

Tests:
- Canonical viz::testing::is_cuda_vulkan_interop_available() probe
  in test_helpers.hpp checks Vulkan-CUDA UUID overlap (mirroring
  VkContext::init's requirement). New viz::test_support INTERFACE
  CMake target so layers_tests / session_tests share one
  implementation; the duplicated gpu_available() helpers are gone.
- New tests: state-machine rejections (double-acquire,
  release-without-acquire, submit-while-acquired, release-after-
  release), non-default-stream submit, multi-frame submit/render/
  readback loop (16 frames, varying solid colors) verifying
  pipelining correctness.

40 unit + 38 GPU tests pass; unit tests pass under ASAN+UBSAN.

Signed-off-by: Farbod Motlagh <fmotlagh@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
@farbod-nv
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Lands the M3c review findings that were merged into the m3b commit
description but not actually present in the code: the producer side
of QuadLayer was still a single atomic<bool>, DeviceImage was still
single-format-per-image, and the milestone tests didn't exercise the
sRGB color-space round-trip.

QuadLayer:
- Replace acquired_ with std::atomic<ProducerState> (Idle / Submitting
  / Acquired). submit() and acquire() use compare_exchange_strong to
  enter their states; submit() takes an RAII guard so an exception
  doesn't strand the state machine. record() rejects unless Idle, so
  a Mode A submit in flight or a Mode B acquire-without-release can't
  race with sampling. Error messages quote the observed state so
  misuse is easy to triage.
- get_signal_semaphores() now refuses to reserve a new vk_done_reading
  value if the previous reservation hasn't been committed. That would
  orphan the first reservation and leave Vulkan signaling a value
  whose public counter never advances — silently breaking future
  CUDA waits.

DeviceImage:
- Storage VkImage is created with VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT
  in UNORM, sampling view in SRGB. CUDA writes raw bytes; the sampler
  decodes sRGB->linear; the SRGB color attachment encodes on write;
  net round-trip is identity for arbitrary RGBA byte values.
- create() now hard-rejects PixelFormat::kD32F. The depth-image
  interop contract (sample semantics, layout transitions, color-space
  view) isn't designed yet — better to refuse than half-implement it
  until ProjectionLayer ships.
- vk_format() now returns the SRGB sampling view format (the format
  layers and tests actually care about); storage format is internal.

Tests:
- test_quad_layer: rename "Mode B state machine" → "producer state
  machine" and add a record-while-Acquired case using a real cmd
  buffer (the state check fires before any vk command is issued).
  Use a real cudaMalloc'd device pointer so the submit() rejection
  path is reached for state reasons, not arg-validation reasons.
- test_quad_milestone: add a midtone-RGBA round-trip case (64, 128,
  200, 255). The {0, 255}-only end-to-end tests can't catch a wrong
  color-space wiring because those endpoints are fixed points of any
  gamma curve.
- test_device_image: vk_format() check updated for the SRGB view
  format with a comment pointing at the storage/view split.

Build / sanitizers:
- All 79 viz tests (40 unit + 39 gpu) pass locally.
- All 40 viz unit tests pass under ASAN+UBSAN (CI's sanitizer gate).

Co-authored-by: Cursor <cursoragent@cursor.com>
farbod-nv and others added 3 commits May 5, 2026 11:23
Mode A's submit() now routes into one of three CUDA-Vulkan interop
slots; record() promotes the most recent publish at frame start. The
producer never blocks and the renderer always samples the latest
finished frame, regardless of how producer rate compares to render
rate. The bidirectional vk_done_reading sync is gone — with three
slots the producer never targets a slot the renderer is currently
sampling, so it isn't needed.

DeviceImage:
- Drop vk_done_reading semaphore + reserve/commit/cuda_wait_for_vk_read.
- Keep cuda_done_writing as the one producer->consumer signal.
- Inline the reserve/commit dance into cuda_signal_write_done() (no
  external callers anymore).
- Comments reflect the one-direction sync.

LayerBase:
- Drop SignalSemaphore, get_signal_semaphores(), commit_pending_signals().
  No layer needs the compositor to signal a producer-facing semaphore
  anymore.
- get_wait_semaphores() stays — that's how QuadLayer hands the
  fragment-shader wait on cuda_done_writing[in_use] to the compositor.

QuadLayer:
- 3 DeviceImage slots, 3 descriptor sets (one per slot's image view),
  one shared sampler / pipeline / pipeline layout.
- Two atomic uint8_t indices (latest_, in_use_) with a kSlotNone
  sentinel for the pre-first-publish frame; record() skips the draw
  cleanly when nothing has been published.
- submit() picks the unique slot that is neither latest_ nor in_use_,
  cudaMemcpyAsync into it, signal cuda_done_writing, store latest_.
  No state machine, no RAII guards, no producer/render reconciliation.
- record() promotes latest_ -> in_use_ at frame start; previous
  in_use_ becomes free for the next submit(). get_wait_semaphores()
  reads in_use_ (set by record(), which the compositor calls first).
- VizCudaArray gone. cudaArray_t is internal to the mailbox now.
  Producers always present pixels through VizBuffer.

VizCompositor:
- Drop the signal-semaphore collection + commit_pending_signals
  fan-out. The submit-info side only needs waits.
- visible_layers snapshot stays — still important so a mid-frame
  visibility toggle doesn't desync record() from get_wait_semaphores().

Tests:
- test_quad_layer: drop the Mode B / ProducerState / acquire-release
  fixtures. Add a "back-to-back submits cycle through mailbox slots"
  test that confirms cuda_done_writing counters advance once per
  submit across the three slots. Add a kSlotCount out-of-range check
  on the diagnostic device_image(slot) accessor.
- test_quad_milestone: drop the Mode B end-to-end test. Mode A
  submit/render/readback, multi-frame palette loop, and midtone
  sRGB round-trip all stay (and now exercise the mailbox path).
- test_device_image: no changes needed — vk_format(), idempotent
  destroy, and the byte-pattern round-trip don't depend on the
  dropped APIs.

Net diff: -311 lines across QuadLayer + DeviceImage + LayerBase
+ tests + viz/AGENTS.md. A future zero-copy acquire/release variant
is documented under "Future: zero-copy acquire/release" in DESIGN.md
for revisit when a real producer (NVDEC, custom CUDA kernel) wants it.

Co-authored-by: Cursor <cursoragent@cursor.com>
Three follow-ups from the triple-buffer mailbox review.

Documentation:
- quad_layer.hpp: note that mailbox correctness depends on
  VizCompositor::render() being synchronous (frame_sync_->wait() at
  end of frame). Multi-frame-in-flight would require in_use_ to
  become per-in-flight-frame.
- viz_compositor.cpp: cross-reference the dependency on the trailing
  frame_sync_->wait() so anyone touching that wait sees what depends
  on it.

DeviceImage:
- Drop the dead CAS loop in cuda_signal_write_done. Single producer
  per DeviceImage means reserved is always > the previously committed
  _value_, so a release-store suffices. The reserve/commit split via
  cuda_done_writing_next_ stays — it's still what isolates a failed
  signal from advancing the public counter (and avoids reusing a
  timeline value on retry, which is UB).

Tests (test_quad_milestone.cpp):
- "with no submit yet renders the clear color" — pins the kSlotNone
  short-circuit in record() / get_wait_semaphores(). Configures a
  green clear and confirms readback is green when no submit() has run.
- "re-renders the same publish when no new submit arrives" — pins
  in_use_ stability across frames when latest_ doesn't change.
- "fast producer: render samples only the latest publish" — pins the
  core mailbox guarantee: 5 back-to-back submits with distinct colors,
  one render, readback equals only the last submit's color.

All 40 unit tests pass; the 3 new [gpu] tests register and skip
cleanly on a no-CUDA-Vulkan-interop machine via the existing
is_cuda_vulkan_interop_available() gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… scrub Mode B refs

Follow-up review-pass on the triple-buffer mailbox MR.

device_image.cpp:
- Inline the create_one lambda in create_interop_semaphores. It was
  factored out to dedup two semaphores (cuda_done_writing +
  vk_done_reading); after Mode B and the back-channel sync were
  dropped, only one semaphore is created, so the closure adds nothing.

test_helpers.hpp:
- Collapse is_gpu_available() (Vulkan-only) and
  is_cuda_vulkan_interop_available() (Vulkan + CUDA UUID-overlap)
  into a single strict is_gpu_available(). The "Vulkan-only" gate
  was actually too loose: VkContext::init() requires the UUID match,
  so any [gpu] test that uses GpuFixture would pass the loose check
  and then throw inside match_cuda_device_to_vulkan(). One canonical
  probe matches the actual init-time contract.

test_quad_layer.cpp / test_quad_milestone.cpp / test_offscreen_render.cpp:
- Drop the per-file `gpu_available()` wrappers that just forwarded
  to one of the two helpers. Call sites now use a `using
  viz::testing::is_gpu_available;` declaration plus
  `is_gpu_available()` directly. Behavior is unchanged; the offscreen
  tests get a slightly stricter gate (CUDA UUID required), which
  matches what GpuFixture already enforced at init.

CMakeLists.txt (core_tests/cpp):
- Update the helper-list comment to reflect the single probe.

viz_buffer.hpp:
- Comment talked about "Mode B submission (acquire/release)" — that
  API is gone. Reword to describe submit() (caller-owned source
  buffer copied into the layer's interop slot) and host-readback
  views.

test_quad_milestone.cpp:
- Comment in the midtone test still said "Mode A / Mode B tests".
  Reword to plain "{0,255}-only round-trip tests".

Net: -47 lines across 7 files. 40/40 unit tests pass; [gpu] tests
register and skip cleanly without GPU.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@junjeechao
Copy link
Copy Markdown

Do we need to change the MR summary? I see acquire() and release() was removed from QuadLayer

@farbod-nv farbod-nv merged commit 208395c into main May 5, 2026
46 checks passed
@farbod-nv farbod-nv deleted the fm/televiz_m3b branch May 5, 2026 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants