Refactor: clean up CI log (deselect filters, ::group:: folding, buffered output) #647

Merged
ChaoWao merged 1 commit into hw-native-sys:main from ChaoWao:refactor/clean-up-ci-log-folded-groups-buffered-out on Apr 22, 2026
@ChaoWao ChaoWao commented Apr 22, 2026

Motivation

The ST CI log became unreadable after #307 split execution into phases — each L2 runtime subprocess re-collected ~50 items and dumped N skipped + 40 importlib deprecation warnings + a NumPy-init warning per subprocess, while parallel Resource-phase subprocesses interleaved their stdout. Under -v it was unusable.

What this PR does

Five coordinated changes that each amplify the others:

| # | Change | Effect on log |
|---|--------|---------------|
| 1 | deselect instead of skip (pytest_collection_modifyitems) | L2 summary's N skipped → N deselected, no per-item noise under -v |
| 2 | buffer subprocess output (parallel_scheduler.run_jobs) | Concurrent Resource jobs no longer interleave stdout |
| 3 | ::group:: folds (_dispatch_test_phases) | GitHub Actions UI collapses each job's output; FAIL surfaces in an out-of-group summary |
| 4 | Fix load_module → exec_module (examples/workers/*) | Removes ~40 <frozen importlib._bootstrap>:533: DeprecationWarning lines per L2 runtime |
| 5 | Pin numpy>=1.24 in test extras (pyproject.toml) | Removes torch's UserWarning: Failed to initialize NumPy emitted once per pytest subprocess in CI |

Before / After

Per-L2-runtime summary:

-================== 2 passed, 51 skipped, 40 warnings in 7.90s ==================
+======================== 2 passed, ... in 7.40s ========================

Per-Resource-job (folded view on GitHub Actions):

▸ l3 TestL3Dependency (rt=tensormap_and_ringbuffer, dev=1) [PASS 8.3s, devices=[1]]

Failures stay visible even when folded:

▸ standalone test_multi_chip_dispatch (...) [FAIL rc=1, devices=[0,1]]   ← Actions marks red
*** FAIL: standalone test_multi_chip_dispatch (devices=[0,1]) - expand group above ***

Verified locally

pytest examples tests/st --platform a2a3sim --device 0-15 -v
→ exit=0
→ 26 tests run (3 L3 + 2 standalone + 2+4+15 L2)
→ 25 deselected (a5-only platform items)
→ 0 importlib deprecation warnings
→ 0 "Failed to initialize NumPy" warnings
→ 5 Resource ::group:: blocks + 3 L2 ::group:: blocks, nothing interleaved

First push of this PR already passed all 14 CI checks; force-pushed to add the numpy dep fix.

Deliberately out of scope

  • Terse reporter (one-line-per-case custom reporter) — explicitly vetoed
  • macOS runner bottleneck / matrix trim — separate issue
  • os.fork() DeprecationWarning in worker.py:728/745 — different root cause, not log-readability
  • actions/checkout@v4, actions/cache@v3, actions/setup-python@v5 Node.js 20 deprecation (GitHub's June 2026 cutover) — scheduled maintenance, separate PR for action version bumps
  • Homebrew ##[warning]gcc/ninja already installed setup noise — not fixable from our side

Tradeoffs noted

  • Buffered subprocess output means slow Resource jobs (L3 ~10 s) don't show incremental progress until completion. Acceptable — the whole Resource phase is ~30 s on sim.
  • Deselecting filter misses means -rs (show skip reasons) can't be used to debug filter behavior. No one currently uses that.

@ChaoWao force-pushed the refactor/clean-up-ci-log-folded-groups-buffered-out branch 4 times, most recently from f121763 to 1b5122d, on April 22, 2026 08:01
…: folding

Makes the ST CI log readable again. Six coordinated changes land
together because each amplifies the others.

conftest.pytest_collection_modifyitems:
- Static filter mismatches (wrong ``--level`` / ``--runtime`` / wrong
  platform) now go through ``config.hook.pytest_deselected`` instead of
  receiving a ``pytest.mark.skip``. Same path pytest's ``-k`` / ``-m``
  use. Each L2 runtime subprocess previously re-collected ~50 items and
  reported them as ``N skipped`` in the summary; they now vanish cleanly
  under ``N deselected``.
- User-actionable problems (``--platform required``) intentionally stay
  as real skips so the reason still surfaces in the summary.
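
A minimal sketch of the deselect path described above, assuming a hypothetical `matches_static_filters` predicate and a hypothetical `platforms` attribute on each item (neither name comes from this PR). Only `pytest_collection_modifyitems`, `config.getoption`, and `config.hook.pytest_deselected` are real pytest API; the actual conftest logic is likely more involved.

```python
def matches_static_filters(item, platform):
    """Hypothetical predicate: True when the item targets the requested platform."""
    return platform in getattr(item, "platforms", ())


def pytest_collection_modifyitems(config, items):
    """Report filter mismatches as deselected instead of marking them skipped."""
    platform = config.getoption("--platform")
    selected, deselected = [], []
    for item in items:
        bucket = selected if matches_static_filters(item, platform) else deselected
        bucket.append(item)
    if deselected:
        # Notifies reporters, which is what feeds the "N deselected" summary count.
        config.hook.pytest_deselected(items=deselected)
        # Mutate in place: pytest keeps using this exact list object.
        items[:] = selected
```

Because the items are removed from the list rather than marked, they produce no per-item lines under `-v` and no skip reasons under `-rs`.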

parallel_scheduler.run_jobs:
- Subprocesses are launched with ``stdout=PIPE, stderr=STDOUT,
  text=True, bufsize=1`` and a dedicated daemon pump thread drains the
  pipe into a buffer (avoids the 64 KB pipe-buffer deadlock a verbose
  child would otherwise hit).
- ``JobResult`` now carries ``output`` and ``duration_s``; ``_reap_one``
  joins the pump, snapshots the buffer, records elapsed time.
- ``_RunningJob`` dataclass replaces the ``(Job, list[int])`` tuple so
  per-job book-keeping stays typed.
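
The buffering scheme above can be sketched as follows. This is an illustration under assumptions, not the code in `parallel_scheduler`: `run_buffered` is a hypothetical helper that handles a single job and waits for it, whereas the real scheduler tracks several running jobs at once.

```python
import subprocess
import sys
import threading
import time


def run_buffered(cmd):
    """Run cmd; return (returncode, combined output, duration in seconds)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so relative ordering is kept
        text=True,
        bufsize=1,  # line-buffered on the parent side
    )
    chunks = []

    def pump():
        # Drain continuously so a chatty child never blocks on a full pipe.
        for line in proc.stdout:
            chunks.append(line)

    pump_thread = threading.Thread(target=pump, daemon=True)
    pump_thread.start()
    returncode = proc.wait()
    pump_thread.join()  # the loop ends once the pipe reaches EOF
    proc.stdout.close()
    return returncode, "".join(chunks), time.monotonic() - start


if __name__ == "__main__":
    rc, output, duration_s = run_buffered([sys.executable, "-c", "print('hello')"])
    print(f"rc={rc} duration={duration_s:.2f}s")
    print(output, end="")
```

Nothing is written to the parent's stdout until the job finishes, which is what keeps concurrent jobs from interleaving.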

conftest._dispatch_test_phases:
- Each Resource job emits ``::group::<label> [PASS 8.3s, devices=[...]]``
  then the captured subprocess output then ``::endgroup::``. GitHub
  Actions renders these as collapsible folds; locally they're plain
  stdout.
- L2 runtime subprocesses run serially, so their stdout still streams
  live - we just bracket ``subprocess.run`` with ``::group::`` /
  ``::endgroup::``.
- Failures get an out-of-group ``*** FAIL: <label> - expand group above
  ***`` line so a reviewer scanning the folded log still sees them.
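
As an illustration of the folding format, here is a small sketch that renders one finished job. `render_job` and its parameters are hypothetical names; the label, status, and FAIL line formats mirror the examples shown in this PR, and `::group::` / `::endgroup::` are GitHub Actions workflow commands.

```python
def render_job(label, passed, duration_s, devices, output, returncode=0):
    """Return the log text for one finished job as a single string."""
    devs = "[" + ",".join(str(d) for d in devices) + "]"
    if passed:
        status = f"PASS {duration_s:.1f}s"
    else:
        status = f"FAIL rc={returncode}"
    lines = [f"::group::{label} [{status}, devices={devs}]"]
    if output:
        lines.append(output.rstrip("\n"))
    lines.append("::endgroup::")
    if not passed:
        # Emitted after ::endgroup:: so it stays visible while the fold is collapsed.
        lines.append(f"*** FAIL: {label} (devices={devs}) - expand group above ***")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_job("l3 TestL3Dependency", True, 8.3, [1], "1 passed\n"))
```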

examples/workers/* co-located tests:
- Promote each worker directory to a package by adding a minimal
  ``__init__.py``. The per-test 6-line ``SourceFileLoader(...)
  .load_module()`` block collapses to a single ``from .main import
  run``. Side effect: kills the ``<frozen importlib._bootstrap>:533:
  DeprecationWarning`` that fired ~40 times per L2 runtime summary
  (``load_module`` is removed in Python 3.15).
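
For a file that cannot be made importable as a package, the standard-library replacement for `load_module()` looks roughly like this. It is a sketch of the general `exec_module` pattern, not what this commit does (the commit switches to a package import), and `load_module_from_path` is a hypothetical helper name.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name, path):
    """Load a .py file as module `name` without the deprecated load_module()."""
    spec = importlib.util.spec_from_file_location(name, Path(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so imports inside the module can find it.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```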

pyproject.toml filterwarnings:
- Silence torch's ``UserWarning: Failed to initialize NumPy`` emitted
  once per pytest subprocess in CI (torch's CPU wheel on pytorch.org
  does not pull numpy transitively). Our code never touches numpy
  interop (``tensor.numpy()`` / ``torch.from_numpy()``) - we use pure
  torch paths - so the missing interop is harmless. Silencing at the
  reporter level is preferred over adding an unused ~30 MB dependency.
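
To show what such a filter does, the sketch below applies the equivalent rule through the standard `warnings` module. The pytest-style filter string in the comment is an assumption about how the `filterwarnings` entry could be written, not a copy of this PR's `pyproject.toml`, and `visible_warnings` is a hypothetical helper.

```python
import warnings

# Assumed pytest form of the same rule (action:message-regex:category):
#   "ignore:Failed to initialize NumPy:UserWarning"


def visible_warnings(messages):
    """Emit each message as a UserWarning; return the texts that were not filtered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted ahead of the "always" rule, so it takes precedence.
        warnings.filterwarnings(
            "ignore",
            message="Failed to initialize NumPy",  # regex matched at message start
            category=UserWarning,
        )
        for text in messages:
            warnings.warn(text, UserWarning)
        return [str(w.message) for w in caught]
```

Only warnings whose text starts with the given pattern are dropped; any other `UserWarning` still reaches the summary.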

.github/workflows/ci.yml ``ut`` job:
- Narrow the scope from ``pytest tests`` to ``pytest tests/ut`` so the
  Python unit-test step stops collecting ``tests/st/`` SceneTestCase
  items and emitting ~40 ``SKIPPED ... --platform required`` lines.
- Pass ``--clone-protocol https`` to mirror the ST jobs: GH-hosted
  runners have no SSH keys, so pytest's ``pytest_configure`` pre-clone
  of pto-isa (default ssh, kept unchanged for local dev) would emit a
  ``Permission denied (publickey)`` git stderr line before the lazy
  fallback kicks in.

tests/ut/py/test_scene_test_cache.py:
- Drop ``test_clear_compile_cache_releases_chip_callable_refs``. The
  remaining ``test_clear_compile_cache_drops_cached_chip_callables``
  already covers the core regression (cache empty after the cleanup
  call). The removed test relied on ``gc.get_referrers`` seeing the
  ``_compile_cache`` dict as a referrer of the ChipCallable instance,
  which depended on CPython's GC-tracking heuristic for dicts holding
  only untracked values - fine on Python 3.14 (passes locally) but
  empty-list on Python 3.10 (the UT-CI Python). Widening the
  regression surface via a fragile implementation-detail assertion was
  a bad trade.

Verified with ``pytest examples tests/st --platform a2a3sim --device
0-15 -v``: exit=0, 26 tests run, 25 deselected, no importlib
deprecation warnings, each Resource job and each L2 runtime emits a
single collapsible group. UT scope locally: 207 collected / 13
deselected / 0 skipped / 0 SSH-clone stderr.
@ChaoWao force-pushed the refactor/clean-up-ci-log-folded-groups-buffered-out branch from 1b5122d to 6ee206c on April 22, 2026 08:23
@ChaoWao merged commit 52d41d6 into hw-native-sys:main on Apr 22, 2026
14 checks passed
@ChaoWao deleted the refactor/clean-up-ci-log-folded-groups-buffered-out branch on April 23, 2026 04:39