Refactor: clean up CI log (deselect filters, ::group:: folding, buffered output) #647
Merged
ChaoWao merged 1 commit into hw-native-sys:main on Apr 22, 2026
…: folding

Makes the ST CI log readable again. Six coordinated changes land together because each amplifies the others.

conftest.pytest_collection_modifyitems:
- Static filter mismatches (wrong ``--level`` / ``--runtime`` / wrong platform) now go through ``config.hook.pytest_deselected`` instead of receiving a ``pytest.mark.skip``. Same path pytest's ``-k`` / ``-m`` use. Each L2 runtime subprocess previously re-collected ~50 items and reported them as ``N skipped`` in the summary; they now vanish cleanly under ``N deselected``.
- User-actionable problems (``--platform required``) intentionally stay as real skips so the reason still surfaces in the summary.

parallel_scheduler.run_jobs:
- Subprocesses are launched with ``stdout=PIPE, stderr=STDOUT, text=True, bufsize=1`` and a dedicated daemon pump thread drains the pipe into a buffer (avoids the 64 KB pipe-buffer deadlock a verbose child would otherwise hit).
- ``JobResult`` now carries ``output`` and ``duration_s``; ``_reap_one`` joins the pump, snapshots the buffer, records elapsed time.
- ``_RunningJob`` dataclass replaces the ``(Job, list[int])`` tuple so per-job book-keeping stays typed.

conftest._dispatch_test_phases:
- Each Resource job emits ``::group::<label> [PASS 8.3s, devices=[...]]`` then the captured subprocess output then ``::endgroup::``. GitHub Actions renders these as collapsible folds; locally they're plain stdout.
- L2 runtime subprocesses run serially, so their stdout still streams live - we just bracket ``subprocess.run`` with ``::group::`` / ``::endgroup::``.
- Failures get an out-of-group ``*** FAIL: <label> - expand group above ***`` line so a reviewer scanning the folded log still sees them.

examples/workers/* co-located tests:
- Promote each worker directory to a package by adding a minimal ``__init__.py``. The per-test 6-line ``SourceFileLoader(...).load_module()`` block collapses to a single ``from .main import run``. Side effect: kills the ``<frozen importlib._bootstrap>:533: DeprecationWarning`` that fired ~40 times per L2 runtime summary (``load_module`` is removed in Python 3.15).

pyproject.toml filterwarnings:
- Silence torch's ``UserWarning: Failed to initialize NumPy`` emitted once per pytest subprocess in CI (torch's CPU wheel on pytorch.org does not pull numpy transitively). Our code never touches numpy interop (``tensor.numpy()`` / ``torch.from_numpy()``) - we use pure torch paths - so the missing interop is harmless. Silencing at the reporter level is preferred over adding an unused ~30 MB dependency.

.github/workflows/ci.yml ``ut`` job:
- Narrow the scope from ``pytest tests`` to ``pytest tests/ut`` so the Python unit-test step stops collecting ``tests/st/`` SceneTestCase items and emitting ~40 ``SKIPPED ... --platform required`` lines.
- Pass ``--clone-protocol https`` to mirror the ST jobs: GH-hosted runners have no SSH keys, so pytest's ``pytest_configure`` pre-clone of pto-isa (default ssh, kept unchanged for local dev) would emit a ``Permission denied (publickey)`` git stderr line before the lazy fallback kicks in.

tests/ut/py/test_scene_test_cache.py:
- Drop ``test_clear_compile_cache_releases_chip_callable_refs``. The remaining ``test_clear_compile_cache_drops_cached_chip_callables`` already covers the core regression (cache empty after the cleanup call). The removed test relied on ``gc.get_referrers`` seeing the ``_compile_cache`` dict as a referrer of the ChipCallable instance, which depended on CPython's GC-tracking heuristic for dicts holding only untracked values - fine on Python 3.14 (passes locally) but empty-list on Python 3.10 (the UT-CI Python). Widening the regression surface via a fragile implementation-detail assertion was a bad trade.

Verified with ``pytest examples tests/st --platform a2a3sim --device 0-15 -v``: exit=0, 26 tests run, 25 deselected, no importlib deprecation warnings, each Resource job and each L2 runtime emits a single collapsible group. UT scope locally: 207 collected / 13 deselected / 0 skipped / 0 SSH-clone stderr.
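The buffered-subprocess launch described under parallel_scheduler.run_jobs can be sketched roughly as follows. This is a minimal illustration, not the project's code: ``run_buffered`` and the ``returncode`` field are hypothetical names, and only ``output`` and ``duration_s`` come from the commit message. The real scheduler runs several jobs at once and reaps them separately; this sketch runs a single child.

```python
import subprocess
import sys
import threading
import time
from dataclasses import dataclass


@dataclass
class JobResult:
    # ``output`` and ``duration_s`` are named in the commit message;
    # ``returncode`` is an assumption made for this sketch.
    returncode: int
    output: str
    duration_s: float


def run_buffered(cmd: list[str]) -> JobResult:
    """Run cmd and capture merged stdout/stderr without stalling the child."""
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered on the parent side
    )
    chunks: list[str] = []

    def pump() -> None:
        # Drain continuously so a verbose child never fills the OS pipe buffer.
        assert proc.stdout is not None
        for line in proc.stdout:
            chunks.append(line)

    pump_thread = threading.Thread(target=pump, daemon=True)
    pump_thread.start()
    returncode = proc.wait()
    pump_thread.join()  # the pipe reaches EOF once the child has exited
    return JobResult(returncode, "".join(chunks), time.monotonic() - start)


if __name__ == "__main__":
    # A child that writes well past a 64 KB pipe buffer.
    child = "for i in range(20000): print('line', i)"
    result = run_buffered([sys.executable, "-c", child])
    print(result.returncode, len(result.output.splitlines()))
```

Without the pump thread, a parent that only calls ``proc.wait()`` would block forever once the child has written more than the pipe can hold, which is the deadlock the commit message refers to.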
Motivation
The ST CI log became unreadable after #307 split execution into phases: each L2 runtime subprocess re-collected ~50 items and dumped ``N skipped`` + 40 importlib deprecation warnings + a NumPy-init warning per subprocess, while parallel Resource-phase subprocesses interleaved their stdout. Under ``-v`` it was unusable.
What this PR does
Five coordinated changes that each amplify the others:
- Deselect filters (``pytest_collection_modifyitems``): ``N skipped`` → ``N deselected``, no per-item noise under ``-v``
- Buffered output (``parallel_scheduler.run_jobs``)
- ``::group::`` folds (``_dispatch_test_phases``)
- ``load_module`` → ``exec_module`` (``examples/workers/*``): targets the ``<frozen importlib._bootstrap>:533: DeprecationWarning`` lines per L2 runtime
- ``numpy>=1.24`` in test extras (``pyproject.toml``): targets the ``UserWarning: Failed to initialize NumPy`` emitted once per pytest subprocess in CI
Before / After
Per-L2-runtime summary:
Per-Resource-job (folded view on GitHub Actions):
Failures stay visible even when folded:
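As a rough sketch of how a fold and its out-of-group failure line could be produced: the ``::group::`` / ``::endgroup::`` workflow commands and the two message formats are taken from the commit message, while ``format_group`` and its parameters are hypothetical names chosen for illustration.

```python
def format_group(label: str, passed: bool, duration_s: float,
                 devices: list[int], output: str) -> str:
    """Wrap one job's captured output in a GitHub Actions log group."""
    status = "PASS" if passed else "FAIL"
    lines = [f"::group::{label} [{status} {duration_s:.1f}s, devices={devices}]"]
    body = output.rstrip("\n")
    if body:
        lines.append(body)
    lines.append("::endgroup::")
    if not passed:
        # Printed after the group closes, so it stays visible when folded.
        lines.append(f"*** FAIL: {label} - expand group above ***")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_group("job-a", True, 8.3, [0, 1], "collected 3 items\n"))
    print(format_group("job-b", False, 2.0, [2], "AssertionError\n"))
```

Outside GitHub Actions the markers are just ordinary stdout lines, so the same function works for local runs.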
Verified locally
First push of this PR already passed all 14 CI checks; force-pushed to add the numpy dep fix.
Deliberately out of scope
- ``os.fork()`` DeprecationWarning in ``worker.py:728/745``: different root cause, not log-readability
- ``actions/checkout@v4``, ``actions/cache@v3``, ``actions/setup-python@v5`` Node.js 20 deprecation (GitHub's June 2026 cutover): scheduled maintenance, separate PR for action version bumps
- ``##[warning]gcc/ninja already installed`` setup noise: not fixable from our side
Tradeoffs noted
- ``-rs`` (show skip reasons) can no longer be used to debug filter behavior, because deselected items carry no skip reason. No one currently uses that.
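The deselect path behind this tradeoff can be sketched as below. ``pytest_collection_modifyitems`` and ``config.hook.pytest_deselected`` are real pytest hooks; the ``item_matches`` helper, the ``level`` attribute, and the option values are assumptions made for illustration, and the demo drives the hook with stub objects rather than a real pytest session.

```python
from types import SimpleNamespace


def item_matches(item, level: str) -> bool:
    # Hypothetical static filter: compare a ``level`` attribute on the item.
    return getattr(item, "level", None) == level


def pytest_collection_modifyitems(config, items):
    level = config.getoption("--level")
    selected, deselected = [], []
    for item in items:
        (selected if item_matches(item, level) else deselected).append(item)
    if deselected:
        # Same reporting path as -k / -m: counted under "N deselected".
        config.hook.pytest_deselected(items=deselected)
        items[:] = selected  # mutate in place; pytest keeps this list object


if __name__ == "__main__":
    reported = []
    stub_config = SimpleNamespace(
        getoption=lambda name: "l2",
        hook=SimpleNamespace(
            pytest_deselected=lambda items: reported.extend(items)
        ),
    )
    items = [SimpleNamespace(level="l2"), SimpleNamespace(level="l3")]
    pytest_collection_modifyitems(stub_config, items)
    print(len(items), len(reported))
```

Because deselected items never produce a skip report, nothing about them appears under ``-rs``; that is the cost accepted in exchange for a quieter summary.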