
feat(bench): index performance benchmark suite#173

Open

tobyhede wants to merge 28 commits into main from index-performance-scheduled-benchmarks

Conversation

Contributor

@tobyhede tobyhede commented Apr 22, 2026

Adds a three-tier benchmarking infrastructure for EQL index performance.

What's included

Tier 0 — fixtures (tests/sqlx/fixtures/bench_*.sql, migration 007)

  • bench table with Zipf-skewed 10K-row opt-in fixture
  • Indexes: hmac_256 hash, bloom_filter GIN, ORE btree for text/int/bigint

Tier 1 — regression tests (tests/sqlx/tests/bench_{plan,regression,data}_tests.rs)

  • Plan assertions for ORE range queries and P0 ignored patterns
  • Magnitude regression tests with timing thresholds

Tier 2 — scheduled benchmarks (tests/benchmarks/, .github/workflows/benchmark.yml)

  • Docker compose: Postgres 17 + CipherStash Proxy on ports 7433/6433
  • generate.sh inserts 100K plaintext rows through Proxy (which encrypts them)
  • bench_perf_tests.rs — 6 patterns × 1000 runs, reads pg_stat_statements (see the sketch after this list)
  • Weekly Monday 03:00 UTC cron + workflow_dispatch, 90-day artifact retention
  • mise run bench:{up,down,generate,full} for local runs
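
For context, here is a minimal sketch of how a run might read timings back from pg_stat_statements with sqlx. It is illustrative only: the helper name, query shape, and return type are assumptions, not the PR's actual code.

```rust
use sqlx::PgPool;

/// Hypothetical helper (not the PR's exact API): after a statement has been
/// executed N times, read its call count and mean execution time from
/// pg_stat_statements, scoped to the current database.
async fn read_stats(pool: &PgPool, pattern: &str) -> sqlx::Result<(i64, f64)> {
    sqlx::query_as(
        "SELECT calls, mean_exec_time
         FROM pg_stat_statements
         WHERE dbid = (SELECT oid FROM pg_database WHERE datname = current_database())
           AND query LIKE $1",
    )
    .bind(pattern)
    .fetch_one(pool)
    .await
}
```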

Requirements

Scheduled workflow needs repo secrets: CS_CLIENT_ACCESS_KEY, CS_DEFAULT_KEYSET_ID, CS_CLIENT_KEY, CS_CLIENT_ID, CS_WORKSPACE_CRN.

Test plan

  • Tier 1: mise run test passes
  • Tier 2 local: populate tests/benchmarks/.env, then mise run bench:{up,generate,full} produces report in tests/benchmarks/reports/
  • Tier 2 CI: workflow_dispatch trigger succeeds end-to-end once secrets are provisioned

Summary by CodeRabbit

  • New Features

    • Scheduled weekly Tier‑2 benchmark workflow plus manual trigger; local tasks to provision services, generate a 100K encrypted dataset, and run full benchmarks.
  • Reporting

    • Benchmarks produce JSON and Markdown reports per run and upload them as CI artifacts (retained 90 days).
  • Tests

    • New performance, plan, and regression benchmark suites added; several slow tests are skipped by default.
  • Documentation

    • Added benchmarking guide, env template, and local Docker setup instructions.

tobyhede added 24 commits April 21, 2026 13:24
…tion tests

Migration 007 creates bench table with encrypted_text, encrypted_int,
encrypted_bigint columns and seeds 10K rows via create_encrypted_json()
with cycling offsets for distribution variety. Fixture creates 5 indexes
(hash/btree/GIN) split from migration for before/after testing. 9 tests
verify row count, column population, index term extraction, index usage
via EXPLAIN, and seq scan baseline.
Addresses code review feedback: the literal 10000 appeared in 5 assert_eq
calls. A single constant makes it easy to adjust if the row count changes for CI.
The INSERT ... generate_series in migration 007 bloated the sqlx template
database, making every test pay the copy cost and causing disk space
crashes. Now migration 007 is DDL-only (CREATE TABLE bench) and the 10K
row seed lives in bench_data.sql fixture. Only bench tests opt in.

Also fixes remaining BENCH_ROW_COUNT literals missed by earlier replace.
Database OID was passed as 3rd arg (queryid) instead of 2nd arg (dbid).
read_pg_stat_statements correctly filters by dbid, confirming the intent.
The reset now scopes to the current database instead of matching a
non-existent query ID.
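
For illustration, the corrected call shape might look like this with sqlx, assuming PG 12+'s three-argument pg_stat_statements_reset(userid, dbid, queryid); the helper name here is made up for the sketch.

```rust
use sqlx::PgPool;

/// Sketch only: reset pg_stat_statements counters for the current database
/// by passing its OID as the second (dbid) argument, not the third (queryid).
async fn reset_current_db_stats(pool: &PgPool) -> sqlx::Result<()> {
    let dbid: i32 = sqlx::query_scalar(
        "SELECT oid::int4 FROM pg_database WHERE datname = current_database()",
    )
    .fetch_one(pool)
    .await?;
    sqlx::query("SELECT pg_stat_statements_reset(0, $1::oid, 0)")
        .bind(dbid)
        .execute(pool)
        .await?;
    Ok(())
}
```
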
- Fix misleading bench_setup.sql comment (DDL-only migration, rows from fixture)
- Fix off-by-one in bench_data.sql offset documentation comments
- Add missing ORE term extraction test for encrypted_bigint
- Add missing index-usage tests for bench_text_ore_idx and bench_bigint_ore_idx
- Document bench_data and bench_setup fixtures in FIXTURE_SCHEMA.md
- Update migrations README to list all migrations 002-007
- Use BIGINT GENERATED ALWAYS AS IDENTITY for bench.id (matches
  documented schema in FIXTURE_SCHEMA.md and other tables in suite)
- Fix stale migration range in README (001-004 → 001-007)
- Add comment clarifying int/bigint ORE tests verify data seeding,
  not distinct encoding paths
- bench_data.sql, FIXTURE_SCHEMA.md: correct offset labels from +33/+66
  to +34/+67 (formulas unchanged; labels now match the id sequences)
- FIXTURE_SCHEMA.md: remove self-contradictory ore_data.sql fixture
  references; the ore table is migration-only, not a fixture
- migrations/README.md: update stale example filename from
  005_my_fixture.sql (slot occupied) to 008_my_fixture.sql
Replaces the uniform, phase-locked cycling (each column got 100 rows
each of 100 distinct ids, offsets +34/+67) with a deterministic
Zipf-like skew seeded via setseed(0.42) and transformed through
random()^2.

Key differences:
- Skewed distribution — top id gets ~5% of rows, tail ids ~0.5%
  (~10x ratio), giving the planner realistic histograms instead
  of a perfectly flat distribution.
- Three independent draws per row decorrelate the columns; previously
  all three were the same cycle with fixed phase shifts.
- Id range tightened to [1, 99] — create_encrypted_json(id) looks up
  ore.id = 10*id, so id=100 previously resolved to a missing ore row.

Existing bench tests are distribution-agnostic (row-count / non-null /
read-and-query-by-id=1) and continue to pass.
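
To make the shape of that transform concrete, here is a std-only Rust sketch that mimics the random()^2 skew with a fixed seed and tallies the resulting id frequencies. It only illustrates the distribution; the fixture's actual SQL (setseed plus a per-draw random()^2) may bucket ids slightly differently.

```rust
use std::collections::BTreeMap;

fn main() {
    // Deterministic 64-bit LCG standing in for setseed(0.42) + random().
    let mut state: u64 = 42;
    let mut uniform = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 11) as f64 / (1u64 << 53) as f64
    };

    let mut counts: BTreeMap<u32, u32> = BTreeMap::new();
    for _ in 0..10_000 {
        let r = uniform();
        // Squaring a uniform draw piles mass near zero, so small ids dominate;
        // mapping into [1, 99] keeps every id resolvable to an ore row
        // (ore.id = 10 * id).
        let id = (r * r * 99.0) as u32 + 1;
        *counts.entry(id.min(99)).or_insert(0) += 1;
    }

    let mut by_freq: Vec<(u32, u32)> = counts.into_iter().collect();
    by_freq.sort_by(|a, b| b.1.cmp(&a.1));
    for (id, n) in by_freq.iter().take(3) {
        println!("id {id:>2}: {n:>4} rows ({:.1}%)", *n as f64 / 100.0);
    }
}
```
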
- Add get_bench_encrypted_int/text helpers to helpers.rs and re-export from lib.rs
- Replace duplicate inline fetch queries with helpers in bench_plan_tests.rs
- Add BENCH_INT_ORE_IDX constant to eliminate repeated index name literals
- Fix long combined-range format string with line continuation
- Add ANALYZE fixture dependency comment to bench_plan_tests.rs module doc
- Clarify #[ignore] tests: 10K rows sufficient for validation, CIP-2831 is one root cause
- Fix module doc in bench_regression_tests.rs: acknowledge ore_order_by uses 4x not ~100x
- Add stats.node_type to all 4 regression assert messages
- Add cardinality comments for id=1 and id=50 probe row choices
- Add NULL context to get_bench_encrypted_int/text helpers (mirrors ore helper pattern)
- Add BENCH_TEXT_HMAC_IDX constant to bench_plan_tests.rs for consistency with BENCH_INT_ORE_IDX
- Clarify id=50 midpoint comment: bench row midpoint, not ORE value midpoint (+33 offset)
- Split long use statement in bench_regression_tests.rs onto two lines
…SE_URL

The bench:full mise task overrides DATABASE_URL to point at the bench
Postgres on port 7433, so tests can read the standard env var like
every other test in tests/sqlx/. The mise task is the canonical entry point.

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a scheduled Tier‑2 benchmarking system: CI workflow, mise tasks, Docker Compose (Postgres + CipherStash Proxy), dataset generation and schema/migration, deterministic fixtures, many Rust benchmark/plan/regression tests, and a report aggregation/upload pipeline.

Changes

  • CI Workflow (.github/workflows/benchmark.yml): New weekly/manual GitHub Actions workflow to run Tier‑2 benchmarks, install toolchain/caching, write proxy creds from secrets, run mise tasks, always teardown, and upload reports artifact (90d retention).
  • Task orchestration (mise.toml, tasks/bench.toml): Included tasks/bench.toml in mise includes; added bench:up, bench:down, bench:generate, and bench:full tasks to orchestrate docker, data generation, and benchmark execution.
  • Benchmark infra (tests/benchmarks/.env.example, .gitignore, README.md, docker-compose.yml): Added credential template, gitignore, documentation, and docker‑compose for Postgres 17 + CipherStash Proxy (ports, healthchecks, bench network).
  • Data generation & schema (tests/benchmarks/generate.sh, tests/benchmarks/schema.sql, tests/sqlx/migrations/007_install_bench_data.sql): Script installs EQL, applies schema, and populates encrypted dataset; schema and migration add bench table with eql_v2_encrypted columns and Proxy search configs.
  • Fixtures & docs (tests/sqlx/fixtures/FIXTURE_SCHEMA.md, bench_data.sql, bench_setup.sql; tests/sqlx/migrations/README.md): Added deterministic 10k fixture, index/setup fixture, and updated fixture/migration docs to include migrations up to 007.
  • Test helpers & reporting (tests/sqlx/src/helpers.rs, lib.rs, reports.rs): Added helpers to fetch bench encrypted fields, bound EXPLAIN helper, adjusted pg_stat_statements reset call; introduced PerfResult/BenchmarkReport, append_result, and write_reports (JSON + Markdown).
  • Benchmark test suites (tests/sqlx/tests/bench_data_tests.rs, bench_perf_tests.rs, bench_plan_tests.rs, bench_regression_tests.rs, order_by_*.rs): Added many new benchmark, plan, and regression tests (data integrity, plan assertions, performance thresholds). Several long/slow tests were marked #[ignore]; perf orchestrator tests are ignored by default for scheduled/manual runs.
  • Build config (tests/sqlx/Cargo.toml): Added time = "0.3" dependency with formatting and std features for the test crate.

Sequence Diagram(s)

sequenceDiagram
    participant GH as GitHub Actions
    participant Runner as CI Runner
    participant Mise as mise action
    participant Docker as Docker Compose
    participant Tests as Test suite
    participant Store as Artifact storage

    GH->>Runner: checkout repo
    Runner->>Runner: install deps (pg client, node, mise action)
    Runner->>Mise: run bench:up
    Mise->>Docker: docker compose up -d --wait (postgres -> proxy)
    Docker-->>Mise: services healthy
    Runner->>Mise: run bench:generate (install EQL, apply schema, insert rows)
    Runner->>Mise: run bench:full (run benchmarks)
    Tests->>Docker: execute queries, collect pg_stat_statements
    Tests->>Mise: append_result in-memory
    Runner->>Tests: write_reports -> write JSON/MD to reports dir
    Runner->>Store: upload artifact benchmark-report-${{github.run_id}}
    Runner->>Mise: run bench:down (always)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • freshtonic
  • auxesis

Poem

🐇 I seeded rows beneath the moon,
Proxies hummed and queries tuned,
I counted runs and wrote a chart,
A carrot graph, a tester's art,
Reports delivered — hop, commune. 🥕📊

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'feat(bench): index performance benchmark suite' clearly summarizes the main change—adding a comprehensive benchmarking infrastructure for EQL index performance across three tiers (fixtures, regression tests, and scheduled benchmarks).
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



coderabbitai Bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (1)
tests/benchmarks/docker-compose.yml (1)

26-26: Pin the Proxy image by digest for reproducible benchmark runs.

cipherstash/proxy:latest can change independently of this repository, making benchmark results hard to compare and potentially introducing unreviewed behavior into CI. Semantic version tags are not available for this image; instead, pin to a specific content digest (e.g., cipherstash/proxy@sha256:424a88567...) and update it intentionally.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/benchmarks/docker-compose.yml` at line 26, Replace the floating image
tag "cipherstash/proxy:latest" with an exact content digest (format:
cipherstash/proxy@sha256:<digest>) to pin the Proxy image for reproducible
benchmarks; locate the Docker Compose service that currently uses the image key
"image: cipherstash/proxy:latest", fetch the image digest you want to pin (or
obtain the approved sha256 value), and update that image line to use the `@sha256`
digest instead of :latest, committing the intentional digest string so
CI/benchmarks use a fixed image.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 62-64: Split the combined assignment+export for BENCH_REPORT_DATE
so the shell doesn't mask the date command's exit status: first assign
BENCH_REPORT_DATE="$(date -u +%Y-%m-%d)-${{ github.run_id }}" on its own line,
then run export BENCH_REPORT_DATE on the following line before executing mise
run bench:full; this keeps ShellCheck/SC2155 happy and preserves the date
command's failure visibility.
- Line 20: The workflow uses an invalid runner label "ubuntu-latest-m" in the
runs-on field; update the runs-on value in .github/workflows/benchmark.yml to a
valid GitHub-hosted runner (e.g., "ubuntu-latest", "ubuntu-24.04",
"ubuntu-22.04", or "ubuntu-slim") or ensure a custom runner with the label
"ubuntu-latest-m" is registered in your org so the scheduled job can acquire a
runner.

In `@tests/benchmarks/docker-compose.yml`:
- Around line 11-12: The ports mapping in docker-compose.yml currently exposes
the service on all host interfaces ("7433:5432"); change the host-side binding
to localhost-only by prefixing the host port with 127.0.0.1 (e.g.,
"127.0.0.1:7433:5432") so Postgres/Proxy do not listen publicly, and apply the
same localhost binding pattern to the other port mappings mentioned (the
mappings around lines 28-29) to keep benchmark services bound to the
developer/CI host only.

In `@tests/sqlx/fixtures/bench_data.sql`:
- Around line 20-22: The fixture is generating ids that don't map to existing
ORE rows because create_encrypted_json(id) looks up ore.id = id * 10; update the
fixture to either (A) constrain the random id generation so values produce valid
ore.id in the 1..99 ORE table (e.g., sample ids that, after multiplying by 10,
fall within the existing ORE id range) or (B) add a bench-specific helper (e.g.,
create_encrypted_json_for_ore or similar) that accepts or maps directly to an
ORE id in 1..99 and use that helper in place of create_encrypted_json in the
three fixture lines; ensure the change references create_encrypted_json (or the
new helper name) and the ORE id mapping logic so benchmark rows always have
valid ORE blocks.

In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md`:
- Around line 154-159: The documentation in FIXTURE_SCHEMA.md claims
bench_data.sql uses 99 distinct ORE ids but create_encrypted_json(id) currently
only supports ORE lookups for ids 1..9; either extend the fixture generator to
produce valid ORE values for ids 1..99 or update the doc to reflect the true id
range. Modify the generator that implements create_encrypted_json(id) so its ORE
helper accepts and returns correct encrypted/indexed values for ids 1..99
(matching the Zipf distribution code used by bench_data.sql), and then update
the FIXTURE_SCHEMA.md section to reference the corrected range (or, if you
choose not to extend the generator, change the doc to state the actual supported
id range). Ensure references to bench_data.sql, create_encrypted_json(id), and
the ORE helper are consistent.

In `@tests/sqlx/tests/bench_perf_tests.rs`:
- Around line 70-78: The benchmark currently hard-codes PerfResult.plan_type
("Index Scan") when calling append_result; change this to record the actual
planner output instead: run EXPLAIN (or EXPLAIN (FORMAT JSON)) for each
benchmark query within the same test harness, capture the planner node type (or
full plan) and populate a new field in PerfResult (e.g., actual_plan or
actual_plan_json), and either rename the existing plan_type to
expected_plan_type or stop hard-coding it so expected vs. actual are both
available; update all call sites (the append_result calls and the PerfResult
definition) at the locations around the append_result invocations to use the new
fields.
- Around line 16-17: Replace the fragile alphabetical-ordering approach
(zz_write_reports) with an explicit orchestrator test: create a new test
function (e.g., orchestration_write_reports) marked #[test] and #[ignore], have
it directly invoke each benchmark helper function used in this module (the same
helpers called by the individual bench_* tests) in sequence, then call
write_reports(), and remove any reliance on the zz_write_reports name or
test-thread serialization; this way the orchestrator controls execution ordering
and report generation even when tests are filtered or run in parallel.

In `@tests/sqlx/tests/bench_regression_tests.rs`:
- Around line 25-29: The test interpolates encrypted JSON into SQL strings which
breaks on embedded quotes; update the explain_analyze_avg helper to accept bind
parameters (e.g., change its signature or add explain_analyze_avg_with_params to
take a &[&(dyn ToSql + Sync)] or Vec<Value>/params) and use parameterized
execution so callers can use "$1" placeholders; then change the calls in
bench_regression_tests.rs (the three occurrences building sql with format!) to
use a parameterized SQL like "SELECT ... =
eql_v2.hmac_256($1::jsonb::eql_v2_encrypted)" and pass the encrypted value in
the new params argument to explain_analyze_avg.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a89f433-6f56-4501-8cf5-0bfd8136b2a4

📥 Commits

Reviewing files that changed from the base of the PR and between c89356f and 6d98632.

📒 Files selected for processing (22)
  • .github/workflows/benchmark.yml
  • mise.toml
  • tasks/bench.toml
  • tests/benchmarks/.env.example
  • tests/benchmarks/.gitignore
  • tests/benchmarks/README.md
  • tests/benchmarks/docker-compose.yml
  • tests/benchmarks/generate.sh
  • tests/benchmarks/reports/.gitkeep
  • tests/benchmarks/schema.sql
  • tests/sqlx/fixtures/FIXTURE_SCHEMA.md
  • tests/sqlx/fixtures/bench_data.sql
  • tests/sqlx/fixtures/bench_setup.sql
  • tests/sqlx/migrations/007_install_bench_data.sql
  • tests/sqlx/migrations/README.md
  • tests/sqlx/src/helpers.rs
  • tests/sqlx/src/lib.rs
  • tests/sqlx/src/reports.rs
  • tests/sqlx/tests/bench_data_tests.rs
  • tests/sqlx/tests/bench_perf_tests.rs
  • tests/sqlx/tests/bench_plan_tests.rs
  • tests/sqlx/tests/bench_regression_tests.rs

Default cargo test now skips 9 tests that dominated CI time:

  order_by_sort_tests (30 tests → 25, 401.9s → 8.98s):
    - filtered_inner_query_correct_order
    - sort_compare_faster_than_correlated_subquery
    - filtered_inner_query_faster_than_unfiltered
    - sort_compare_performance_at_scale
    - filtered_inner_query_performance_at_scale

  order_by_no_opclass_tests (12 tests → 8, ~88s → 1.06s):
    - correlated_subquery_ranking_{asc,desc,with_limit,with_where}_without_opclass

These tests assert timing relationships or demonstrate O(n²) behaviour
over 1000-row ORE fixtures; they don't catch correctness regressions on
the PR path. Run them with `cargo test -- --ignored` (all 9 pass in
~7m 28s).

Measured on local PG17 for the two affected binaries combined:
  before: ~491s
  after:   ~10s (~49× faster)

Projected CI impact on test-eql.yml: cargo test step drops from ~22m
to ~2m per Postgres version.
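
The opt-out uses the standard #[ignore] attribute; a representative shape (stub body, reason text paraphrased):

```rust
#[tokio::test]
#[ignore = "observational perf comparison over 1000-row ORE fixture; run with --ignored"]
async fn sort_compare_performance_at_scale() {
    // real body elided; skipped by plain `cargo test`,
    // included by `cargo test -- --ignored`
}
```
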
coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/sqlx/tests/order_by_sort_tests.rs (1)

606-792: Observational perf tests have no timing assertions — consider dropping the relative-timing framing in the ignore reason.

sort_compare_faster_than_correlated_subquery, filtered_inner_query_faster_than_unfiltered, sort_compare_performance_at_scale, and filtered_inner_query_performance_at_scale only eprintln! timings and assert on row counts/ids — there is no assert! on elapsed, so the test names/ignore reasons ("faster_than", "relative timing assertion") overstate what the tests actually verify. These will never fail due to timing, which is fine for --ignored bench runs but means regressions won't be caught automatically. Consider either adding a loose upper-bound timing assertion (with generous slack to avoid CI flakiness) or renaming/reasoning them as purely observational to match sort_compare_text_performance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/order_by_sort_tests.rs` around lines 606 - 792, The test
names/ignore reasons claim timing assertions but the functions
(sort_compare_faster_than_correlated_subquery,
filtered_inner_query_faster_than_unfiltered, sort_compare_performance_at_scale,
filtered_inner_query_performance_at_scale) only print elapsed times and assert
row counts; either change the #[ignore = "..."] reason to reflect
observational/bench-only (e.g., "perf: observational timings; run with
--ignored") or add a loose timing assertion using the existing elapsed variables
(sort_elapsed, correlated_elapsed, filtered_elapsed, unfiltered_elapsed) — for
example assert that the expected faster elapsed is not more than N× slower
(choose a generous factor to avoid CI flakiness) and include a clear eprintln on
failure; update each function accordingly.
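
If the loose-upper-bound route is taken, an assertion helper along these lines would do; the name and factor are illustrative:

```rust
use std::time::Duration;

/// Sketch: assert `fast` is within `factor` of `slow` rather than strictly
/// faster, leaving generous slack for CI jitter.
fn assert_not_much_slower(fast: Duration, slow: Duration, factor: f64) {
    assert!(
        fast.as_secs_f64() <= slow.as_secs_f64() * factor,
        "expected {fast:?} to be within {factor}x of {slow:?}"
    );
}
```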

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ea45e17c-e325-441b-80d2-b35d6b7617ca

📥 Commits

Reviewing files that changed from the base of the PR and between 6d98632 and c2dc431.

📒 Files selected for processing (2)
  • tests/sqlx/tests/order_by_no_opclass_tests.rs
  • tests/sqlx/tests/order_by_sort_tests.rs

Pure whitespace: collapses/expands await chains and use-imports to
match rustfmt default. No behaviour change.

Unblocks test-eql.yml lint step (cargo fmt --check was failing).
coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/sqlx/src/reports.rs (2)

52-57: RESULTS accumulator is never cleared after write_reports.

write_reports clones the Vec but leaves the global populated. If the test binary is re-invoked within the same process (unusual for cargo test, but possible for custom harnesses, repeated write_reports calls, or future refactors), subsequent reports will double-count prior runs. Consider draining instead of cloning:

♻️ Suggested change
-    let results = RESULTS.lock().expect("results mutex poisoned").clone();
+    let results = std::mem::take(&mut *RESULTS.lock().expect("results mutex poisoned"));

Also applies to: 73-73

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/src/reports.rs` around lines 52 - 57, The global RESULTS Vec is
never cleared after write_reports, causing double-counting on subsequent runs;
change write_reports to take ownership of and clear the accumulator under the
Mutex (e.g., lock RESULTS and use std::mem::take(&mut *guard) or
Vec::drain(..).collect() to extract/clone the results and leave RESULTS empty)
instead of merely cloning, so append_result (and the static RESULTS) won't
retain prior entries after write_reports completes.

95-108: Markdown table cells aren't escaped.

If any name, priority, or plan_type string ever contains | or a newline, the generated Markdown table will break. Values are currently hard-coded in the benchmark suite so this is low risk, but a small escape helper (s.replace('|', "\\|").replace('\n', " ")) would make the writer robust to future additions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/src/reports.rs` around lines 95 - 108, The Markdown table cells in
render_markdown are not escaped, so values like r.name, r.priority, and
r.plan_type containing '|' or newlines will break the table; add a small helper
(e.g., escape_md) and call it when formatting each cell to replace '|' with "\|"
and replace newlines with a space (or otherwise sanitize), then use the escaped
strings in the format call inside render_markdown.
tests/sqlx/tests/bench_perf_tests.rs (2)

282-289: Shelling out to date -u is brittle; .unwrap() can panic on non-UTF8.

Minor, but this won't work on Windows dev machines, and String::from_utf8(out.stdout).unwrap() panics if the locale produces unexpected bytes. Since chrono is already a common transitive dep (via sqlx), you likely get it for free — or use time which sqlx also pulls in. Alternatively check out.status.success() and return Result<String> rather than expect/unwrap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 282 - 289, The today_utc()
helper currently shells out to `date -u` and uses expect/unwrap which is brittle
and can panic on non-UTF8 output or non-Unix platforms; change today_utc to
avoid shelling out and instead use a proper time library (e.g.,
`chrono::Utc::now().format("%Y-%m-%d").to_string()` or `time` APIs) to produce
the YYYY-MM-DD string, or at minimum check `out.status.success()` and return a
Result<String> instead of calling expect/unwrap; update callers/tests to handle
the Result if you choose the error-returning approach.

269-272: pg_major parsing breaks for pre-10 Postgres versions.

server_version_num format is MMmmpp pre-10 (e.g. 90605 → major 9.6) and MMmmpp with 2-digit major post-10 (17000417). The current slice len - 4 happens to work for PG 10+ but would silently yield "9" for PG 9.6. Not a blocker since the bench stack pins Postgres 17, but a brief comment calling out the PG≥10 assumption (or parsing via SHOW server_version) would prevent future surprises.
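
A sketch of version parsing that covers both encodings, assuming a numeric server_version_num string as input (function name illustrative):

```rust
/// Parse a Postgres `server_version_num` string into a display major version:
/// "170004" -> "17" (PG 10+), "90605" -> "9.6" (pre-10). Sketch only.
fn pg_major(server_version_num: &str) -> String {
    let n: u64 = server_version_num.trim().parse().expect("numeric version");
    if n >= 100_000 {
        // PG 10+: the major is everything above the last four digits.
        format!("{}", n / 10_000)
    } else {
        // Pre-10: first digit is the major, next two the minor (90605 -> 9.6).
        format!("{}.{}", n / 10_000, (n / 100) % 100)
    }
}
```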

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 269 - 272, The current
pg_major slicing assumes PG≥10 and breaks for pre-10 versions; update the
parsing around pg_version/pg_major in bench_perf_tests.rs to handle both formats
by: if pg_version contains a dot, split on '.' and set pg_major to the first two
components joined (e.g., "9.6" or "17"); else if it's a numeric
server_version_num string, if length > 4 keep the existing
prefix-excluding-last-4 behavior, otherwise derive major as the first digit, a
dot, and the next digit(s) (e.g., "90605" → "9.6"); alternatively, replace this
logic with a call to SHOW server_version and parse that string to produce
pg_major — ensure changes reference the pg_version variable and assign to
pg_major accordingly and add a brief comment noting the PG≥10 vs pre-10 formats.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/sqlx/src/reports.rs`:
- Around line 42-43: The field `timestamp` on the report struct is misnamed and
misdocumented because its value is derived as `{date}T00:00:00Z` (a date-bucket)
rather than the actual write time; update the struct `timestamp` field name to
something like `date_bucket` (or `date_midnight`) and change its doc comment to
clearly state it is a UTC date-bucket formatted as `YYYY-MM-DDT00:00:00Z`
derived from the caller-supplied date; additionally, validate the
caller-supplied `date` against the `YYYY-MM-DD` pattern (return an error if
invalid) before formatting, and update the corresponding occurrences referenced
around the other usage at the same file (lines noted in the review) to use the
new name and behavior.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d12e4427-211c-4b9e-b544-43893c27eee1

📥 Commits

Reviewing files that changed from the base of the PR and between c2dc431 and 511d97b.

📒 Files selected for processing (2)
  • tests/sqlx/src/reports.rs
  • tests/sqlx/tests/bench_perf_tests.rs

- use valid `ubuntu-latest` runner label and split SC2155 export
- bind docker-compose host ports to 127.0.0.1
- capture actual plan_type per benchmark via EXPLAIN instead of hard-coding
- replace zz_write_reports alphabetical-ordering hack with explicit
  `run_all_benchmarks` orchestrator
- parameterize bench_regression_tests via explain_analyze_avg_bound
  to avoid encrypted-value string interpolation
- capture report timestamp at write-time via OffsetDateTime::now_utc
- correct FIXTURE_SCHEMA bench_data ORE id mapping documentation
coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/sqlx/fixtures/FIXTURE_SCHEMA.md (1)

109-110: ⚠️ Potential issue | 🔴 Critical

Fix documentation: ore table contains 1000 records (ids 1-1000), not 99.

Lines 109-110 claim "99 records (ids 1-99)", but the migration creates 1000 INSERT statements with ids 1-1000. Additionally, line 123's documentation of the lookup range for create_encrypted_json(id) should be updated to reflect the actual id range (e.g., "valid ids: 1-100 → ore lookups: 10-1000" or equivalent depending on intended behavior).

Update the ore table documentation to match the actual migration contents.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md` around lines 109 - 110, Update the
FIXTURE_SCHEMA.md documentation for the ore table to match the actual migration:
change the data count to "1000 records (ids 1-1000)" and adjust any lookup range
examples referencing create_encrypted_json(id) so the described valid id range
matches the real data (e.g., "valid ids: 1-1000 → ore lookups: 10-1000" or the
correct mapping used by the migration). Ensure the text mentions the ore table
and the create_encrypted_json(id) function explicitly so readers can correlate
the documented ranges with the migration-generated INSERT ids.
🧹 Nitpick comments (4)
tests/sqlx/fixtures/FIXTURE_SCHEMA.md (1)

123-123: Add range validation to create_encrypted_json(id) function.

The implementation at 004_install_test_helpers.sql:271-334 lacks validation for the id parameter despite documentation stating "valid ids: 1-99". Currently, out-of-range calls silently fail to find matching ore records and return NULL in the encrypted JSON.

All current test usage (including bench_data.sql) stays within the 1-99 range, but adding an explicit check would prevent silent errors:

IF id < 1 OR id > 99 THEN
  RAISE EXCEPTION 'create_encrypted_json: id must be in range 1-99, got %', id;
END IF;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md` at line 123, Add explicit range
validation inside the create_encrypted_json(id) function: detect when id < 1 OR
id > 99 and raise an exception with a clear message (e.g.,
"create_encrypted_json: id must be in range 1-99, got %") so out-of-range calls
fail loudly instead of returning NULL; update the body of create_encrypted_json
(in the implementation currently in 004_install_test_helpers.sql) to perform
this check at the start before any ore lookup logic.
tests/sqlx/src/reports.rs (1)

72-101: Optional: clear RESULTS after a successful write to make the API idempotent.

write_reports clones the accumulator but never drains it, so a second call in the same process — e.g. if the orchestrator is later split into multiple phases, or a developer calls both run_all_benchmarks and a wrapper in the same cargo test invocation — will re-include every previous PerfResult. Current usage is one-shot so this is benign, but a one-liner eliminates the foot-gun.

♻️ Proposed fix
 pub fn write_reports(
     output_dir: &str,
     date: &str,
     postgres_version: &str,
     dataset_rows: i64,
 ) -> Result<(PathBuf, PathBuf)> {
-    let results = RESULTS.lock().expect("results mutex poisoned").clone();
+    let results = {
+        let mut guard = RESULTS.lock().expect("results mutex poisoned");
+        std::mem::take(&mut *guard)
+    };
     let timestamp = OffsetDateTime::now_utc()

Alternatively, document at the module level that write_reports must be called at most once per process.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/src/reports.rs` around lines 72 - 101, write_reports currently
clones the shared accumulator RESULTS and leaves it populated, causing
subsequent calls to include prior PerfResult entries; after creating the report
(or immediately after cloning), acquire the same RESULTS mutex and replace or
drain the inner Vec (e.g., swap with an empty Vec or use mem::take) so the
global accumulator is cleared for idempotency — locate the lock calls around
RESULTS.lock() and the creation of BenchmarkReport to perform the clear while
still holding the mutex to avoid races.
tests/sqlx/tests/bench_perf_tests.rs (2)

290-330: Individual wrappers + orchestrator can collide under libtest parallelism.

cargo test --test bench_perf_tests -- --ignored (no name filter) will execute all 7 #[tokio::test] #[ignore] tests in parallel against the same Postgres instance. Because every bench_* helper calls reset_pg_stat_statements → runs RUNS queries → reads back pg_stat_statements, concurrent helpers will:

  • reset each other's counters mid-measurement,
  • see sibling queries in read_pg_stat_statements (causing assert_eq!(stats.calls, RUNS) to flap),
  • double-append to the global RESULTS vec if the orchestrator also runs (12 entries instead of 6, written to the report).

The mise run bench:full task is fine because it filters to run_all_benchmarks, but the failure mode is easy for a developer to trip into when iterating locally. Consider documenting the safe invocation at module level (and/or adding serial_test::serial on the wrappers) so the collision mode is explicit.

📝 Suggested doc clarification
 // ============================================================================
 // Individual test wrappers — allow running one benchmark in isolation via
 // `cargo test --test bench_perf_tests <name> -- --ignored`. These do NOT
 // flush reports; only `run_all_benchmarks` does that.
+//
+// NOTE: Do NOT run these with a bare `cargo test -- --ignored`. libtest will
+// execute them in parallel against the same Postgres, and the shared
+// pg_stat_statements view + global RESULTS accumulator will cause spurious
+// assertion failures and duplicated report rows. Always pass an explicit test
+// name filter, or use `--test-threads=1`.
 // ============================================================================
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 290 - 330, The individual
tokio test wrappers (hmac_256_equality, bloom_filter_containment,
eql_cast_equality, ore_equality_opclass, ore_range_lt_limit, ore_order_by_limit)
can run in parallel and interfere with
reset_pg_stat_statements/read_pg_stat_statements and the global RESULTS when run
without a name filter; fix by making these wrappers run serially or documenting
the required invocation: either add the serial_test::serial attribute to each
wrapper (or a module-level enforcement) so they never run concurrently with
run_all_benchmarks, or add a clear top-of-file comment explaining that only
run_all_benchmarks should be executed without filters and that running the
individual ignored tests in bulk will corrupt pg_stat_statements and
double-append to RESULTS; ensure references to reset_pg_stat_statements,
read_pg_stat_statements, run_all_benchmarks and RESULTS are mentioned in the
comment if you choose the documentation route.

263-270: Consider using the time crate instead of shelling out to date.

The time crate is already a dependency of this crate (used in tests/sqlx/src/reports.rs for OffsetDateTime::now_utc().format(&Rfc3339)) with the "formatting" feature enabled, so the subprocess invocation and .unwrap() on String::from_utf8 can both be avoided.

♻️ Proposed fix
fn today_utc() -> String {
-    // Avoid adding the `chrono` dep; shell out to `date -u` for UTC.
-    let out = std::process::Command::new("date")
-        .args(["-u", "+%Y-%m-%d"])
-        .output()
-        .expect("invoking date");
-    String::from_utf8(out.stdout).unwrap().trim().to_string()
+    use time::format_description::well_known::Iso8601;
+    // `time` is already a dep (used by reports.rs); prefer it over shelling out.
+    time::OffsetDateTime::now_utc()
+        .date()
+        .format(&Iso8601::DATE)
+        .expect("formatting UTC date")
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 263 - 270, Replace the
shell-out in today_utc() with the time crate: remove the std::process::Command
invocation and String::from_utf8 unwrap, and return
OffsetDateTime::now_utc().format(&time::format_description::parse("[year]-[month]-[day]").unwrap()).to_string()
(or use the format_description! macro if preferred). This keeps today_utc() as a
String, references the OffsetDateTime type and avoids spawning a subprocess and
UTF-8 unwraps.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/benchmark.yml:
- Line 21: The CI job timeout of 60 minutes in .github/workflows/benchmark.yml
is too low for the bench_ore_order_by_limit workload (see
tests/sqlx/tests/bench_perf_tests.rs where RUNS = 1000 and per-run time is
several seconds at 100K), so either increase timeout-minutes (e.g. to 120) in
the benchmark.yml workflow, or change the benchmark harness to reduce work for
that specific test (e.g., lower RUNS for the unindexed-order-by test in
tests/sqlx/tests/bench_perf_tests.rs or add a conditional smaller RUNS when
running in CI) so the total runtime stays under the CI timeout.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 66bc5a3c-abf5-4d51-a715-d309caf318f8

📥 Commits

Reviewing files that changed from the base of the PR and between 511d97b and e5fff4d.

⛔ Files ignored due to path filters (1)
  • tests/sqlx/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • .github/workflows/benchmark.yml
  • tasks/bench.toml
  • tests/benchmarks/docker-compose.yml
  • tests/sqlx/Cargo.toml
  • tests/sqlx/fixtures/FIXTURE_SCHEMA.md
  • tests/sqlx/src/helpers.rs
  • tests/sqlx/src/lib.rs
  • tests/sqlx/src/reports.rs
  • tests/sqlx/tests/bench_perf_tests.rs
  • tests/sqlx/tests/bench_regression_tests.rs
✅ Files skipped from review due to trivial changes (2)
  • tests/sqlx/Cargo.toml
  • tests/benchmarks/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (3)
  • tasks/bench.toml
  • tests/sqlx/tests/bench_regression_tests.rs
  • tests/sqlx/src/helpers.rs

Worst-case bench_ore_order_by_limit takes several seconds per run at
100K rows; with RUNS=1000 the single benchmark could exceed the 60-min
CI timeout before flush_reports writes the artifact. Drop RUNS to 10
to keep the scheduled job well under budget while still capturing
mean/stddev via pg_stat_statements.

Addresses CodeRabbit review feedback on PR #173.
coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/benchmarks/README.md (1)

33-38: Document required GitHub secrets in CI usage section.

Lines 35-37 explain the schedule and manual dispatch, but the required secrets are not listed here. Without them, manual or scheduled runs will fail even if the workflow is triggered correctly.

📝 Suggested doc patch
 ## CI usage
 
 Runs automatically every Monday at 03:00 UTC via
 `.github/workflows/benchmark.yml`. Also manually invocable from the
 GitHub Actions UI (Run workflow button).
+
+Required repository secrets:
+- `CS_CLIENT_ACCESS_KEY`
+- `CS_DEFAULT_KEYSET_ID`
+- `CS_CLIENT_KEY`
+- `CS_CLIENT_ID`
+- `CS_WORKSPACE_CRN`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/benchmarks/README.md` around lines 33 - 38, Update the "CI usage"
section to list the GitHub Actions secrets required by the benchmark workflow
(referencing the workflow file .github/workflows/benchmark.yml), naming each
secret and a one-line purpose/usage for it (e.g., BENCHMARK_API_KEY: used to
upload results, AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY: for publishing
artifacts, GH_TOKEN: for dispatching or PR comments), and add a short note on
where to set them in the repository settings so manual or scheduled runs do not
fail due to missing secrets.
tests/sqlx/tests/bench_perf_tests.rs (1)

34-43: Acquire a single connection per benchmark body to ensure plan capture and execution use the same session.

Currently, fetch_plan_node_type(), the measured query loop, and stats readback all acquire from the same PgPool with max_connections(4). This can cause plan mismatches: EXPLAIN (FORMAT JSON) runs unprepared (simple protocol) on one physical connection, while sqlx::query().fetch_all() uses the extended protocol with prepared statements on a potentially different connection. PostgreSQL caches prepared statement plans per backend, and after ~5 executions may switch to a generic plan. The captured plan and the actual measured execution may differ, introducing noise into benchmarks. For week-over-week perf data stability, acquire a single persistent connection per benchmark body (or set pool to 1) so EXPLAIN and execution see identical plan cache state.

Also applies to: 94-100, 128-134, 161-167, 194-200, 223-230
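
A minimal sketch of the single-session pattern with sqlx; the URL, query, and flow are placeholders rather than the suite's actual helpers:

```rust
use sqlx::postgres::PgPoolOptions;

/// Sketch: one physical backend so the EXPLAIN capture and the measured loop
/// share the same session, and therefore the same plan-cache state.
async fn bench_on_one_session(url: &str) -> Result<(), sqlx::Error> {
    let pool = PgPoolOptions::new().max_connections(1).connect(url).await?;
    let mut conn = pool.acquire().await?;

    // Capture the plan on this session...
    sqlx::query("EXPLAIN (FORMAT JSON) SELECT 1")
        .fetch_one(&mut *conn)
        .await?;

    // ...then run the measured loop on the same session, so prepared-statement
    // plan caching affects the capture and the measurement identically.
    for _ in 0..10 {
        sqlx::query("SELECT 1").fetch_all(&mut *conn).await?;
    }
    Ok(())
}
```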

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 34 - 43, The benchmark is
using a pool with max_connections(4) which can hand out different physical
backends for EXPLAIN vs measured executions; change connect() to create a
single-backend pool (PgPoolOptions::new().max_connections(1)) and, in each
benchmark body, acquire one persistent connection from that pool (let mut conn =
pool.acquire().await?) and reuse that same connection for
fetch_plan_node_type(), the measured sqlx::query().fetch_all() loop, and the
stats readback (ensure_pg_stat_statements()/reads) instead of calling pool-level
helpers that may grab different connections so EXPLAIN and execution share the
same session/plan cache.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e104a6e-972b-4a62-a4f6-991501067944

📥 Commits

Reviewing files that changed from the base of the PR and between e5fff4d and 2ef4dec.

📒 Files selected for processing (2)
  • tests/benchmarks/README.md
  • tests/sqlx/tests/bench_perf_tests.rs

Comment on lines +31 to +32
const RUNS: i64 = 10;
const DATASET_ROWS: i64 = 100_000;

⚠️ Potential issue | 🟠 Major

Derive dataset_rows from the database before writing the report.

The artifact always reports 100_000 rows right now, but this suite explicitly allows DATABASE_URL overrides. If the target DB is reused, partially generated, or the fixture size changes later, the report metadata becomes wrong even when the queries still succeed.

Proposed fix
-const DATASET_ROWS: i64 = 100_000;
-
 async fn flush_reports(pool: &PgPool) -> Result<()> {
+    let dataset_rows: i64 = sqlx::query_scalar("SELECT COUNT(*) FROM bench")
+        .fetch_one(pool)
+        .await?;
+
     let pg_version: String = sqlx::query_scalar("SHOW server_version_num")
         .fetch_one(pool)
         .await?;
@@
     let date = std::env::var("BENCH_REPORT_DATE").unwrap_or_else(|_| today_utc());
     let output_dir = std::env::var("BENCH_REPORT_DIR")
         .unwrap_or_else(|_| "../../tests/benchmarks/reports".into());
-    let (json, md) = write_reports(&output_dir, &date, &pg_major, DATASET_ROWS)?;
+    let (json, md) = write_reports(&output_dir, &date, &pg_major, dataset_rows)?;
     eprintln!("wrote {} and {}", json.display(), md.display());
     Ok(())
 }

Also applies to: 245-258

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 31 - 32, Replace the
hard-coded DATASET_ROWS constant with a runtime value queried from the test
database before emitting the report: remove or stop using const DATASET_ROWS and
instead run a COUNT(*) query against the table(s) populated by these tests
(using the same sqlx connection/pool used by the suite) to produce a
dataset_rows i64, then use that dataset_rows value wherever DATASET_ROWS was
used for report metadata (including the other occurrences noted around lines
245-258); ensure the COUNT query runs once per test run and propagates the
resulting i64 into the report-writing code so DATABASE_URL overrides and
reused/partial fixtures are reflected correctly.
