use llvm cov for coverage and justification#549
Conversation
- Move quality/coverage.bazelrc to quality/coverage/coverage.bazelrc for better organization alongside the coverage tooling. - Enable user.bazelrc support via try-import in .bazelrc. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…reports Replace the genhtml/lcov-based coverage report pipeline with a native llvm-cov implementation that produces HTML reports directly from profraw data. Key changes: - Add merger.py as --coverage_output_generator (per-test profraw→profdata+HTML) - Add reporter.py as --coverage_report_generator (aggregate all tests) - Add filter_regexes.txt config file for configurable source filtering - Update coverage.bazelrc with llvm-cov flags (--experimental_use_llvm_covmap, --dynamic_mode=off, continuous profiling mode) - Update generate_coverage_html.sh to extract from the new zip format The source filtering is configurable via filter_regexes.txt (loaded from runfiles) and can be overridden via --filter_sources CLI argument. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a justification system that allows developers to argue lines that cannot be covered by tests (defensive programming, tool false positives, platform-specific code). Components: - justify.py: YAML loader + source scanner → resolved manifest JSON - effective_coverage.py: HTML post-processor + effective coverage calculator - coverage_justifications.yaml: initial justification database - Integration into generate_coverage_html.sh (runs after report generation) Justifications can be applied via: - In-code markers: // COV_JUSTIFIED <id> - YAML locations: file + line range (no code change needed) Justified lines appear yellow in the HTML report, and effective coverage is calculated as (covered + justified) / instrumented. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add quality/coverage/README.md with pipeline architecture overview - Add quality/coverage/llvm_cov/README.md with tool-level documentation - Update quality/quality.md to reflect new llvm-cov pipeline and link to the detailed documentation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| # at the report level. | ||
|
|
||
| # Exclude mock files. | ||
| .*mock.*\.(h|hpp|cpp)$ |
There was a problem hiding this comment.
Better make it .*_mock.*\.(h|hpp|cpp)$. I'd prefer to start very strict and relax it when we find files that should be excluded.
| .*mock.*\.(h|hpp|cpp)$ | ||
|
|
||
| # Exclude external dependencies (anything under external/). | ||
| .*/external/.* |
There was a problem hiding this comment.
Not a big fan of the starting .*. What is the base path? Could we live with external/.*?
| return object_files | ||
|
|
||
|
|
||
| def find_llvm_bin() -> Path: |
There was a problem hiding this comment.
Uff... not really sure whether this is a good idea.
This could silently break without us being aware.
Two problems I see:
- This might become unhermetic if we ever end up with a llvm-toolchain that does not provide llvm-cov for some reason
- globbing through all of "external" invites usage of an llvm-cov that ends up there for a different reason (e.g. sysroot for a different tool)
| return llvm_cov.resolve().parent | ||
|
|
||
| # Search from ROOT. | ||
| root = Path(os.environ.get("ROOT", ".")) |
There was a problem hiding this comment.
Again, not sure whether . is a good fallback here. The fallback also makes no sense since it results in the same logic as ll254.
| # Also remove any other symlinks pointing into sandbox paths. | ||
| for entry in directory.iterdir(): | ||
| if entry.is_symlink(): | ||
| target = os.readlink(entry) | ||
| if "sandbox" in target: | ||
| entry.unlink() |
There was a problem hiding this comment.
Should we at least issue a warning for such symlinks?
Better to know which ones point in and then add them here explicitly than just removing everything that is there blindly?
| ┌───────────────────────────┐ | ||
| │ generate_coverage_html.sh │ ◄── bazel run //quality/coverage:generate_coverage_html | ||
| └────┬──────────────────────┘ | ||
| │ Extracts html_report/ → cpp_coverage/ | ||
| ▼ | ||
| ┌─────────────┐ coverage_justifications.yaml | ||
| │ justify.py │ ◄── + source files (COV_JUSTIFIED markers) | ||
| └────┬────────┘ | ||
| │ manifest.json: {file → {line → justification}} | ||
| ▼ | ||
| ┌───────────────────────┐ | ||
| │ effective_coverage.py │ ◄── manifest.json + html_report/ | ||
| └────┬──────────────────┘ | ||
| │ • Modifies HTML in-place (restyled justified lines/branches) | ||
| │ • report.json + summary.txt | ||
| ▼ | ||
| Console output: effective coverage summary |
There was a problem hiding this comment.
I think it would help to argue why we do not call this from reporter.py.
If it is because we want to retain the original data without the justifications applied, we could just output the modified report next to the original report.
|
|
||
| ## Requirements | ||
|
|
||
| The coverage pipeline was built to satisfy the following requirements: |
There was a problem hiding this comment.
Should we put those directly into TRLC? Nit-pick can be done in a follow-up.
| // COV_JUSTIFIED_STOP | ||
| ``` | ||
|
|
||
| Both methods can be combined. A justification covers both the line and any branches on that line. |
There was a problem hiding this comment.
Add some lines that strongly suggest that using code markes is the preferred way.
Putting locations in the YAML always runs the risk of justifications being outdated without us being aware, because code is shifted.
| Code coverage is generated using LLVM's source-based coverage instrumentation. The instrumentation filter is configured in [`quality/coverage/coverage.bazelrc`](coverage/coverage.bazelrc) to cover `//score/message_passing` and `//score/mw/com` while excluding test and benchmark code. | ||
|
|
||
| ### Running Coverage | ||
| HTML reports are generated directly by `llvm-cov show` — no intermediate LCOV conversion or `genhtml` is needed. |
There was a problem hiding this comment.
Please just drop everything after the dash. It just confuses.
| #### Adding a Justification | ||
|
|
||
| 1. Add an entry to `coverage_justifications.yaml`: | ||
|
|
||
| ```yaml | ||
| justifications: | ||
| - id: my-justification-id | ||
| category: defensive_programming # or: tool_false_positive, platform_specific, other | ||
| reason: > | ||
| Explanation of why these lines cannot be covered. | ||
| locations: | ||
| - file: score/mw/com/impl/some_file.cpp | ||
| line_start: 42 | ||
| line_end: 45 | ||
| ``` | ||
|
|
||
| 2. Alternatively, reference the justification from code (no `locations` needed in YAML): | ||
|
|
||
| ```cpp | ||
| unreachable_code(); // COV_JUSTIFIED my-justification-id | ||
|
|
||
| // COV_JUSTIFIED_START my-justification-id | ||
| defensive_block(); | ||
| more_defensive_code(); | ||
| // COV_JUSTIFIED_STOP | ||
| ``` | ||
|
|
||
| #### Effective Coverage | ||
|
|
||
| The coverage pipeline calculates: | ||
| - **Raw line coverage**: actual lines hit / total instrumented lines | ||
| - **Effective line coverage**: (lines hit + justified lines) / total instrumented lines | ||
| - **Raw branch coverage**: actual branches hit / total branches | ||
| - **Effective branch coverage**: (branches hit + justified branches) / total branches | ||
|
|
||
| The index page shows a banner with overall effective coverage and updates per-file percentages. The `generate_coverage_html` script prints the full summary. Set `COVERAGE_THRESHOLD` to enforce a minimum (default: 100%): | ||
|
|
||
| ```bash | ||
| COVERAGE_THRESHOLD=95 bazel run //quality/coverage:generate_coverage_html | ||
| ``` |
There was a problem hiding this comment.
Not really a huge fan of this duplication. Can we not just link to the actual readme for coverage?
No description provided.