Skip to content

[https://nvbugs/6329227][fix] Use pkgutil.extend_path to merge the two flash_attn distributions before…#15498

Open
tensorrt-cicd wants to merge 2 commits into
NVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6329227
Open

[https://nvbugs/6329227][fix] Use pkgutil.extend_path to merge the two flash_attn distributions before…#15498
tensorrt-cicd wants to merge 2 commits into
NVIDIA:mainfrom
tensorrt-cicd:repair-bot-bug6329227

Conversation

@tensorrt-cicd

@tensorrt-cicd tensorrt-cicd commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Root cause: Co-installed flash-attn v2 (regular pkg with init.py) shadows the flash-attn-4 cute-only wheel so flash_attn.cute is invisible; additionally REPO_ROOT in test_visual_gen.py was not updated when the test moved one directory deeper.
  • Fix: Use pkgutil.extend_path to merge the two flash_attn distributions before importing flash_attn.cute.* in flash_attn4.py and parallel.py; add the missing ".." segment to REPO_ROOT in test_visual_gen.py.
  • Automated fix generated by repair-bot

Test plan

  • Verify fix on the same GPU type as the original failure
  • Check for regressions in related tests

Links

Summary by CodeRabbit

  • Bug Fixes

    • Improved flash attention library compatibility by implementing namespace merging to handle scenarios where both legacy and current versions are installed.
  • Tests

    • Re-enabled three previously waived multi-GPU visual generation test cases that validate tensor parallelism configurations.
    • Enhanced distributed test infrastructure to properly propagate Python import paths across spawned child processes.

The integration test test_wan22_t2v_lpips_against_golden_tp[*] failed in
post-merge CI with `ModuleNotFoundError: No module named
'tensorrt_llm.bindings'` raised inside a torch.multiprocessing.spawn
worker. The worker reaches `tensorrt_llm/_torch/visual_gen/mapping.py:
246` (`from tensorrt_llm.bindings.internal.process_group import init_pg`)
during VisualGenMapping.setup_communicators(), which the C++ allreduce
path requires for tp_size > 1.

mp.spawn launches a fresh interpreter that inherits env vars but does
not replay the parent's runtime sys.path mutations. In the failing CI
image, the compiled tensorrt_llm.bindings*.so is reachable from the
parent through a sys.path entry that the child does not rebuild,
so the import fails in the worker only. The matching non-TP variants
(test_wan22_t2v_lpips_against_golden_multi_gpu[*]) skip
setup_communicators() and therefore never hit this import.

Propagate the parent's resolved sys.path through PYTHONPATH before
mp.spawn so the spawn child sees the same import roots the parent used
and init_pg resolves identically in both processes. Removes the three
nvbugs/6329227 waivers added for these variants.

Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
…ualGen test relocation

The TP variants of test_wan22_t2v_lpips_against_golden_tp execute the FA4
attention backend, which imports flash_attn.cute.* from the flash-attn-4
wheel. When the legacy flash-attn (v2) wheel is co-installed and ships its
own __init__.py, Python pins flash_attn.__path__ to the v2 location and
never sees the cute/ subpackage. Use pkgutil.extend_path so flash_attn
behaves as a namespace-extending package and merges both installs.

Additionally, commit d202244 moved test_visual_gen.py one directory
deeper but did not update its REPO_ROOT computation, so the LPIPS eval
script lookup pointed at /code/tensorrt_llm/tests instead of the repo
root. Add the missing '..' so the eval script resolves correctly.

Signed-off-by: Repair Bot <noreply@nvidia.com>
Signed-off-by: tensorrt-cicd <90828364+tensorrt-cicd@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a _merge_flash_attn_namespace() utility that uses pkgutil.extend_path to merge co-installed flash-attn v2 and flash-attn-4 wheel namespaces; calls it in both flash_attn4.py and parallel.py before their respective flash_attn JIT imports. Also fixes REPO_ROOT path depth in the single-GPU test, propagates PYTHONPATH to spawned child processes in the multi-GPU test, and removes three test waivers.

Changes

Visual gen flash-attn namespace fix and test env repairs

Layer / File(s) Summary
flash_attn namespace merge utility and callers
tensorrt_llm/_torch/visual_gen/attention_backend/utils.py, tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py, tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py
Adds _merge_flash_attn_namespace() that re-extends flash_attn.__path__ via pkgutil.extend_path, then calls it in both flash_attn4.py and parallel.py inside their existing ImportError/OSError try-blocks before importing the flash_attn JIT symbols.
Test environment: PYTHONPATH propagation, REPO_ROOT fix, and waiver removal
tests/integration/defs/examples/visual_gen/test_visual_gen_multi_gpu.py, tests/integration/defs/examples/visual_gen/test_visual_gen.py, tests/integration/test_lists/waives.txt
Propagates parent sys.path as PYTHONPATH to mp.spawn child processes; corrects REPO_ROOT traversal depth by one level; removes three test_wan22_t2v_lpips_against_golden_tp waivers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • NVIDIA/TensorRT-LLM#15395: Both PRs modify tests/integration/test_lists/waives.txt to change which visual_gen WAN22 multi-GPU/t2v LPIPS test cases are skipped.
  • NVIDIA/TensorRT-LLM#15389: Both PRs modify tests/integration/test_lists/waives.txt by changing the set of skip/waive entries.
  • NVIDIA/TensorRT-LLM#15391: Both PRs modify the same tests/integration/test_lists/waives.txt file to adjust test skip entries.

Suggested reviewers

  • juney-nvidia
  • jieli-matrix
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: using pkgutil.extend_path to merge flash_attn distributions, which is the primary fix addressing the core issue.
Description check ✅ Passed The description covers the root cause, fix, test verification, and includes the bug link. All key sections are present and adequately detailed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/defs/examples/visual_gen/test_visual_gen_multi_gpu.py`:
- Around line 112-117: The code modifies os.environ["PYTHONPATH"] on line 115 to
set up the environment for the spawned workers but never restores the original
value after mp.spawn() completes, which can cause subsequent tests to inherit
this modified state and become order-dependent. Save the original PYTHONPATH
value before setting it (handle the case where it may not exist), then restore
it after the mp.spawn() call finishes to ensure the environment is clean for
subsequent tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f1a13485-c43c-4694-83ef-fc78270d1d05

📥 Commits

Reviewing files that changed from the base of the PR and between b9b132b and 422a575.

⛔ Files ignored due to path filters (1)
  • tests/integration/defs/examples/visual_gen/golden/visual_gen_lpips/visual_gen_lpips_golden_media.zip is excluded by !**/*.zip
📒 Files selected for processing (6)
  • tensorrt_llm/_torch/visual_gen/attention_backend/flash_attn4.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/parallel.py
  • tensorrt_llm/_torch/visual_gen/attention_backend/utils.py
  • tests/integration/defs/examples/visual_gen/test_visual_gen.py
  • tests/integration/defs/examples/visual_gen/test_visual_gen_multi_gpu.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

Comment on lines +112 to 117
# mp.spawn starts a fresh interpreter that does not inherit the parent's
# runtime sys.path mutations; without this, the child cannot import
# tensorrt_llm.bindings.internal.process_group (needed by the tp_size>1
# C++ allreduce path).
os.environ["PYTHONPATH"] = os.pathsep.join(filter(None, sys.path))
mp.spawn(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Restore PYTHONPATH after spawning workers.

Line 116 mutates process-global environment state and never restores it; later tests can inherit this synthetic value and become order-dependent/flaky.

🔧 Proposed fix
-    os.environ["PYTHONPATH"] = os.pathsep.join(filter(None, sys.path))
-    mp.spawn(
-        _distributed_worker,
-        args=(world_size, backend, test_fn, port, kwargs),
-        nprocs=world_size,
-        join=True,
-    )
+    previous_pythonpath = os.environ.get("PYTHONPATH")
+    os.environ["PYTHONPATH"] = os.pathsep.join(map(str, filter(None, sys.path)))
+    try:
+        mp.spawn(
+            _distributed_worker,
+            args=(world_size, backend, test_fn, port, kwargs),
+            nprocs=world_size,
+            join=True,
+        )
+    finally:
+        if previous_pythonpath is None:
+            os.environ.pop("PYTHONPATH", None)
+        else:
+            os.environ["PYTHONPATH"] = previous_pythonpath
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/defs/examples/visual_gen/test_visual_gen_multi_gpu.py`
around lines 112 - 117, The code modifies os.environ["PYTHONPATH"] on line 115
to set up the environment for the spawned workers but never restores the
original value after mp.spawn() completes, which can cause subsequent tests to
inherit this modified state and become order-dependent. Save the original
PYTHONPATH value before setting it (handle the case where it may not exist),
then restore it after the mp.spawn() call finishes to ensure the environment is
clean for subsequent tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants