feat(bench): index performance benchmark suite #173
…tion tests
Migration 007 creates the bench table with encrypted_text, encrypted_int, and encrypted_bigint columns and seeds 10K rows via create_encrypted_json() with cycling offsets for distribution variety. The fixture creates 5 indexes (hash/btree/GIN), split out from the migration for before/after testing. 9 tests verify row count, column population, index term extraction, index usage via EXPLAIN, and a seq scan baseline.
Addresses code review feedback: the literal 10000 appeared in 5 assert_eq! calls. A single constant makes it easy to adjust if the row count changes for CI.
The INSERT ... generate_series in migration 007 bloated the sqlx template database, making every test pay the copy cost and causing disk-space crashes. Migration 007 is now DDL-only (CREATE TABLE bench) and the 10K-row seed lives in the bench_data.sql fixture; only bench tests opt in. Also fixes remaining BENCH_ROW_COUNT literals missed by the earlier replace.
The database OID was passed as the 3rd argument (queryid) instead of the 2nd (dbid). read_pg_stat_statements correctly filters by dbid, confirming the intent. The reset now scopes to the current database instead of matching a non-existent query ID.
- Fix misleading bench_setup.sql comment (DDL-only migration, rows from fixture)
- Fix off-by-one in bench_data.sql offset documentation comments
- Add missing ORE term extraction test for encrypted_bigint
- Add missing index-usage tests for bench_text_ore_idx and bench_bigint_ore_idx
- Document bench_data and bench_setup fixtures in FIXTURE_SCHEMA.md
- Update migrations README to list all migrations 002-007
- Use BIGINT GENERATED ALWAYS AS IDENTITY for bench.id (matches documented schema in FIXTURE_SCHEMA.md and other tables in the suite)
- Fix stale migration range in README (001-004 → 001-007)
- Add comment clarifying int/bigint ORE tests verify data seeding, not distinct encoding paths
- bench_data.sql, FIXTURE_SCHEMA.md: correct offset labels from +33/+66 to +34/+67 (formulas unchanged; labels now match the id sequences)
- FIXTURE_SCHEMA.md: remove self-contradictory ore_data.sql fixture references; the ore table is migration-only, not a fixture
- migrations/README.md: update stale example filename from 005_my_fixture.sql (slot occupied) to 008_my_fixture.sql
Replaces the uniform, phase-locked cycling (each column got 100 rows each of 100 distinct ids, offsets +34/+67) with a deterministic Zipf-like skew seeded via setseed(0.42) and transformed through random()^2. Key differences:
- Skewed distribution — top id gets ~5% of rows, tail ids ~0.5% (~10x ratio), giving the planner realistic histograms instead of a perfectly flat distribution.
- Three independent draws per row decorrelate the columns; previously all three were the same cycle with fixed phase shifts.
- Id range tightened to [1, 99] — create_encrypted_json(id) looks up ore.id = 10*id, so id=100 previously resolved to a missing ore row.
Existing bench tests are distribution-agnostic (row-count / non-null / read-and-query-by-id=1) and continue to pass.
- Add get_bench_encrypted_int/text helpers to helpers.rs and re-export from lib.rs
- Replace duplicate inline fetch queries with helpers in bench_plan_tests.rs
- Add BENCH_INT_ORE_IDX constant to eliminate repeated index name literals
- Fix long combined-range format string with line continuation
- Add ANALYZE fixture dependency comment to bench_plan_tests.rs module doc
- Clarify #[ignore] tests: 10K rows sufficient for validation, CIP-2831 is one root cause
- Fix module doc in bench_regression_tests.rs: acknowledge ore_order_by uses 4x not ~100x
- Add stats.node_type to all 4 regression assert messages
- Add cardinality comments for id=1 and id=50 probe row choices
- Add NULL context to get_bench_encrypted_int/text helpers (mirrors ore helper pattern)
- Add BENCH_TEXT_HMAC_IDX constant to bench_plan_tests.rs for consistency with BENCH_INT_ORE_IDX
- Clarify id=50 midpoint comment: bench row midpoint, not ORE value midpoint (+33 offset)
- Split long use statement in bench_regression_tests.rs onto two lines
…SE_URL
The bench:full mise task overrides DATABASE_URL to point at the bench Postgres on port 7433, so tests can read the standard env var like every other test in tests/sqlx/. The mise task is the canonical entry point.
📝 Walkthrough
Adds a scheduled Tier‑2 benchmarking system: CI workflow, mise tasks, Docker Compose (Postgres + CipherStash Proxy), dataset generation and schema/migration, deterministic fixtures, many Rust benchmark/plan/regression tests, and a report aggregation/upload pipeline.
Sequence Diagram(s)
sequenceDiagram
participant GH as GitHub Actions
participant Runner as CI Runner
participant Mise as mise action
participant Docker as Docker Compose
participant Tests as Test suite
participant Store as Artifact storage
GH->>Runner: checkout repo
Runner->>Runner: install deps (pg client, node, mise action)
Runner->>Mise: run bench:up
Mise->>Docker: docker compose up -d --wait (postgres -> proxy)
Docker-->>Mise: services healthy
Runner->>Mise: run bench:generate (install EQL, apply schema, insert rows)
Runner->>Mise: run bench:full (run benchmarks)
Tests->>Docker: execute queries, collect pg_stat_statements
Tests->>Mise: append_result in-memory
Runner->>Tests: write_reports -> write JSON/MD to reports dir
Runner->>Store: upload artifact benchmark-report-${{github.run_id}}
Runner->>Mise: run bench:down (always)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 8
🧹 Nitpick comments (1)
tests/benchmarks/docker-compose.yml (1)
26-26: Pin the Proxy image by digest for reproducible benchmark runs.
cipherstash/proxy:latest can change independently of this repository, making benchmark results hard to compare and potentially introducing unreviewed behavior into CI. Semantic version tags are not available for this image; instead, pin to a specific content digest (e.g., cipherstash/proxy@sha256:424a88567...) and update it intentionally.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/benchmarks/docker-compose.yml` at line 26, Replace the floating image tag "cipherstash/proxy:latest" with an exact content digest (format: cipherstash/proxy@sha256:<digest>) to pin the Proxy image for reproducible benchmarks; locate the Docker Compose service that currently uses the image key "image: cipherstash/proxy:latest", fetch the image digest you want to pin (or obtain the approved sha256 value), and update that image line to use the `@sha256` digest instead of :latest, committing the intentional digest string so CI/benchmarks use a fixed image.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 62-64: Split the combined assignment+export for BENCH_REPORT_DATE
so the shell doesn't mask the date command's exit status: first assign
BENCH_REPORT_DATE="$(date -u +%Y-%m-%d)-${{ github.run_id }}" on its own line,
then run export BENCH_REPORT_DATE on the following line before executing mise
run bench:full; this keeps ShellCheck/SC2155 happy and preserves the date
command's failure visibility.
- Line 20: The workflow uses an invalid runner label "ubuntu-latest-m" in the
runs-on field; update the runs-on value in .github/workflows/benchmark.yml to a
valid GitHub-hosted runner (e.g., "ubuntu-latest", "ubuntu-24.04",
"ubuntu-22.04", or "ubuntu-slim") or ensure a custom runner with the label
"ubuntu-latest-m" is registered in your org so the scheduled job can acquire a
runner.
In `@tests/benchmarks/docker-compose.yml`:
- Around line 11-12: The ports mapping in docker-compose.yml currently exposes
the service on all host interfaces ("7433:5432"); change the host-side binding
to localhost-only by prefixing the host port with 127.0.0.1 (e.g.,
"127.0.0.1:7433:5432") so Postgres/Proxy do not listen publicly, and apply the
same localhost binding pattern to the other port mappings mentioned (the
mappings around lines 28-29) to keep benchmark services bound to the
developer/CI host only.
In `@tests/sqlx/fixtures/bench_data.sql`:
- Around line 20-22: The fixture is generating ids that don't map to existing
ORE rows because create_encrypted_json(id) looks up ore.id = id * 10; update the
fixture to either (A) constrain the random id generation so values produce valid
ore.id in the 1..99 ORE table (e.g., sample ids that, after multiplying by 10,
fall within the existing ORE id range) or (B) add a bench-specific helper (e.g.,
create_encrypted_json_for_ore or similar) that accepts or maps directly to an
ORE id in 1..99 and use that helper in place of create_encrypted_json in the
three fixture lines; ensure the change references create_encrypted_json (or the
new helper name) and the ORE id mapping logic so benchmark rows always have
valid ORE blocks.
In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md`:
- Around line 154-159: The documentation in FIXTURE_SCHEMA.md claims
bench_data.sql uses 99 distinct ORE ids but create_encrypted_json(id) currently
only supports ORE lookups for ids 1..9; either extend the fixture generator to
produce valid ORE values for ids 1..99 or update the doc to reflect the true id
range. Modify the generator that implements create_encrypted_json(id) so its ORE
helper accepts and returns correct encrypted/indexed values for ids 1..99
(matching the Zipf distribution code used by bench_data.sql), and then update
the FIXTURE_SCHEMA.md section to reference the corrected range (or, if you
choose not to extend the generator, change the doc to state the actual supported
id range). Ensure references to bench_data.sql, create_encrypted_json(id), and
the ORE helper are consistent.
In `@tests/sqlx/tests/bench_perf_tests.rs`:
- Around line 70-78: The benchmark currently hard-codes PerfResult.plan_type
("Index Scan") when calling append_result; change this to record the actual
planner output instead: run EXPLAIN (or EXPLAIN (FORMAT JSON)) for each
benchmark query within the same test harness, capture the planner node type (or
full plan) and populate a new field in PerfResult (e.g., actual_plan or
actual_plan_json), and either rename the existing plan_type to
expected_plan_type or stop hard-coding it so expected vs. actual are both
available; update all call sites (the append_result calls and the PerfResult
definition) at the locations around the append_result invocations to use the new
fields.
- Around line 16-17: Replace the fragile alphabetical-ordering approach
(zz_write_reports) with an explicit orchestrator test: create a new test
function (e.g., orchestration_write_reports) marked #[test] and #[ignore], have
it directly invoke each benchmark helper function used in this module (the same
helpers called by the individual bench_* tests) in sequence, then call
write_reports(), and remove any reliance on the zz_write_reports name or
test-thread serialization; this way the orchestrator controls execution ordering
and report generation even when tests are filtered or run in parallel.
In `@tests/sqlx/tests/bench_regression_tests.rs`:
- Around line 25-29: The test interpolates encrypted JSON into SQL strings which
breaks on embedded quotes; update the explain_analyze_avg helper to accept bind
parameters (e.g., change its signature or add explain_analyze_avg_with_params to
take a &[&(dyn ToSql + Sync)] or Vec<Value>/params) and use parameterized
execution so callers can use "$1" placeholders; then change the calls in
bench_regression_tests.rs (the three occurrences building sql with format!) to
use a parameterized SQL like "SELECT ... =
eql_v2.hmac_256($1::jsonb::eql_v2_encrypted)" and pass the encrypted value in
the new params argument to explain_analyze_avg.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9a89f433-6f56-4501-8cf5-0bfd8136b2a4
📒 Files selected for processing (22)
- .github/workflows/benchmark.yml
- mise.toml
- tasks/bench.toml
- tests/benchmarks/.env.example
- tests/benchmarks/.gitignore
- tests/benchmarks/README.md
- tests/benchmarks/docker-compose.yml
- tests/benchmarks/generate.sh
- tests/benchmarks/reports/.gitkeep
- tests/benchmarks/schema.sql
- tests/sqlx/fixtures/FIXTURE_SCHEMA.md
- tests/sqlx/fixtures/bench_data.sql
- tests/sqlx/fixtures/bench_setup.sql
- tests/sqlx/migrations/007_install_bench_data.sql
- tests/sqlx/migrations/README.md
- tests/sqlx/src/helpers.rs
- tests/sqlx/src/lib.rs
- tests/sqlx/src/reports.rs
- tests/sqlx/tests/bench_data_tests.rs
- tests/sqlx/tests/bench_perf_tests.rs
- tests/sqlx/tests/bench_plan_tests.rs
- tests/sqlx/tests/bench_regression_tests.rs
Default cargo test now skips 9 tests that dominated CI time:
order_by_sort_tests (30 tests → 25, 401.9s → 8.98s):
- filtered_inner_query_correct_order
- sort_compare_faster_than_correlated_subquery
- filtered_inner_query_faster_than_unfiltered
- sort_compare_performance_at_scale
- filtered_inner_query_performance_at_scale
order_by_no_opclass_tests (12 tests → 8, ~88s → 1.06s):
- correlated_subquery_ranking_{asc,desc,with_limit,with_where}_without_opclass
These tests assert timing relationships or demonstrate O(n²) behaviour
over 1000-row ORE fixtures; they don't catch correctness regressions on
the PR path. Run them with `cargo test -- --ignored` (all 9 pass in
~7m 28s).
Measured on local PG17 for the two affected binaries combined:
before: ~491s
after: ~10s (~49× faster)
Projected CI impact on test-eql.yml: cargo test step drops from ~22m
to ~2m per Postgres version.
🧹 Nitpick comments (1)
tests/sqlx/tests/order_by_sort_tests.rs (1)
606-792: Observational perf tests have no timing assertions — consider dropping the relative-timing framing in the ignore reason.
sort_compare_faster_than_correlated_subquery, filtered_inner_query_faster_than_unfiltered, sort_compare_performance_at_scale, and filtered_inner_query_performance_at_scale only eprintln! timings and assert on row counts/ids — there is no assert! on elapsed, so the test names/ignore reasons ("faster_than", "relative timing assertion") overstate what the tests actually verify. These will never fail due to timing, which is fine for --ignored bench runs but means regressions won't be caught automatically. Consider either adding a loose upper-bound timing assertion (with generous slack to avoid CI flakiness) or renaming/reasoning them as purely observational to match sort_compare_text_performance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/tests/order_by_sort_tests.rs` around lines 606 - 792, The test names/ignore reasons claim timing assertions but the functions (sort_compare_faster_than_correlated_subquery, filtered_inner_query_faster_than_unfiltered, sort_compare_performance_at_scale, filtered_inner_query_performance_at_scale) only print elapsed times and assert row counts; either change the #[ignore = "..."] reason to reflect observational/bench-only (e.g., "perf: observational timings; run with --ignored") or add a loose timing assertion using the existing elapsed variables (sort_elapsed, correlated_elapsed, filtered_elapsed, unfiltered_elapsed) — for example assert that the expected faster elapsed is not more than N× slower (choose a generous factor to avoid CI flakiness) and include a clear eprintln on failure; update each function accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea45e17c-e325-441b-80d2-b35d6b7617ca
📒 Files selected for processing (2)
- tests/sqlx/tests/order_by_no_opclass_tests.rs
- tests/sqlx/tests/order_by_sort_tests.rs
Pure whitespace: collapses/expands await chains and use-imports to match the rustfmt default. No behaviour change. Unblocks the test-eql.yml lint step (cargo fmt --check was failing).
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/sqlx/src/reports.rs (2)
52-57: RESULTS accumulator is never cleared after write_reports.
write_reports clones the Vec but leaves the global populated. If the test binary is re-invoked within the same process (unusual for cargo test, but possible for custom harnesses, repeated write_reports calls, or future refactors), subsequent reports will double-count prior runs. Consider draining instead of cloning:
♻️ Suggested change
-    let results = RESULTS.lock().expect("results mutex poisoned").clone();
+    let results = std::mem::take(&mut *RESULTS.lock().expect("results mutex poisoned"));
Also applies to: 73-73
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/src/reports.rs` around lines 52 - 57, The global RESULTS Vec is never cleared after write_reports, causing double-counting on subsequent runs; change write_reports to take ownership of and clear the accumulator under the Mutex (e.g., lock RESULTS and use std::mem::take(&mut *guard) or Vec::drain(..).collect() to extract/clone the results and leave RESULTS empty) instead of merely cloning, so append_result (and the static RESULTS) won't retain prior entries after write_reports completes.
95-108: Markdown table cells aren't escaped.
If any name, priority, or plan_type string ever contains | or a newline, the generated Markdown table will break. Values are currently hard-coded in the benchmark suite so this is low risk, but a small escape helper (s.replace('|', "\\|").replace('\n', " ")) would make the writer robust to future additions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/src/reports.rs` around lines 95 - 108, The Markdown table cells in render_markdown are not escaped, so values like r.name, r.priority, and r.plan_type containing '|' or newlines will break the table; add a small helper (e.g., escape_md) and call it when formatting each cell to replace '|' with "\|" and replace newlines with a space (or otherwise sanitize), then use the escaped strings in the format call inside render_markdown.
tests/sqlx/tests/bench_perf_tests.rs (2)
282-289: Shelling out to date -u is brittle; .unwrap() can panic on non-UTF8.
Minor, but this won't work on Windows dev machines, and String::from_utf8(out.stdout).unwrap() panics if the locale produces unexpected bytes. Since chrono is already a common transitive dep (via sqlx), you likely get it for free — or use time, which sqlx also pulls in. Alternatively check out.status.success() and return Result<String> rather than expect/unwrap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 282 - 289, The today_utc() helper currently shells out to `date -u` and uses expect/unwrap which is brittle and can panic on non-UTF8 output or non-Unix platforms; change today_utc to avoid shelling out and instead use a proper time library (e.g., `chrono::Utc::now().format("%Y-%m-%d").to_string()` or `time` APIs) to produce the YYYY-MM-DD string, or at minimum check `out.status.success()` and return a Result<String> instead of calling expect/unwrap; update callers/tests to handle the Result if you choose the error-returning approach.
269-272: pg_major parsing breaks for pre-10 Postgres versions.
server_version_num format is Mmmpp pre-10 (e.g. 90605 → major 9.6) and MMmmpp with a 2-digit major from PG 10 onward (170004 → 17). The current slice len - 4 happens to work for PG 10+ but would silently yield "9" for PG 9.6. Not a blocker since the bench stack pins Postgres 17, but a brief comment calling out the PG≥10 assumption (or parsing via SHOW server_version) would prevent future surprises.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 269 - 272, The current pg_major slicing assumes PG≥10 and breaks for pre-10 versions; update the parsing around pg_version/pg_major in bench_perf_tests.rs to handle both formats: if pg_version contains a dot, split on '.' and take the major component(s) (e.g., "9.6" or "17"); else if it's a numeric server_version_num string, treat a 6-digit value as PG≥10 and strip the last 4 digits ("170004" → "17"), and treat a 5-digit value as pre-10, deriving major as the first digit, a dot, and the minor digits ("90605" → "9.6"); alternatively, replace this logic with a call to SHOW server_version and parse that string to produce pg_major — ensure changes reference the pg_version variable and assign to pg_major accordingly and add a brief comment noting the PG≥10 vs pre-10 formats.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/sqlx/src/reports.rs`:
- Around line 42-43: The field `timestamp` on the report struct is misnamed and
misdocumented because its value is derived as `{date}T00:00:00Z` (a date-bucket)
rather than the actual write time; update the struct `timestamp` field name to
something like `date_bucket` (or `date_midnight`) and change its doc comment to
clearly state it is a UTC date-bucket formatted as `YYYY-MM-DDT00:00:00Z`
derived from the caller-supplied date; additionally, validate the
caller-supplied `date` against the `YYYY-MM-DD` pattern (return an error if
invalid) before formatting, and update the corresponding occurrences referenced
around the other usage at the same file (lines noted in the review) to use the
new name and behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d12e4427-211c-4b9e-b544-43893c27eee1
📒 Files selected for processing (2)
- tests/sqlx/src/reports.rs
- tests/sqlx/tests/bench_perf_tests.rs
- use valid `ubuntu-latest` runner label and split SC2155 export - bind docker-compose host ports to 127.0.0.1 - capture actual plan_type per benchmark via EXPLAIN instead of hard-coding - replace zz_write_reports alphabetical-ordering hack with explicit `run_all_benchmarks` orchestrator - parameterize bench_regression_tests via explain_analyze_avg_bound to avoid encrypted-value string interpolation - capture report timestamp at write-time via OffsetDateTime::now_utc - correct FIXTURE_SCHEMA bench_data ORE id mapping documentation
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/sqlx/fixtures/FIXTURE_SCHEMA.md (1)
109-110: ⚠️ Potential issue | 🔴 Critical — Fix documentation: ore table contains 1000 records (ids 1-1000), not 99.
Lines 109-110 claim "99 records (ids 1-99)", but the migration creates 1000 INSERT statements with ids 1-1000. Additionally, line 123's documentation of the lookup range for
create_encrypted_json(id) should be updated to reflect the actual id range (e.g., "valid ids: 1-100 → ore lookups: 10-1000" or equivalent depending on intended behavior). Update the ore table documentation to match the actual migration contents.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md` around lines 109 - 110, Update the FIXTURE_SCHEMA.md documentation for the ore table to match the actual migration: change the data count to "1000 records (ids 1-1000)" and adjust any lookup range examples referencing create_encrypted_json(id) so the described valid id range matches the real data (e.g., "valid ids: 1-1000 → ore lookups: 10-1000" or the correct mapping used by the migration). Ensure the text mentions the ore table and the create_encrypted_json(id) function explicitly so readers can correlate the documented ranges with the migration-generated INSERT ids.
🧹 Nitpick comments (4)
tests/sqlx/fixtures/FIXTURE_SCHEMA.md (1)
123-123: Add range validation to the create_encrypted_json(id) function.
The implementation at 004_install_test_helpers.sql:271-334 lacks validation for the id parameter despite documentation stating "valid ids: 1-99". Currently, out-of-range calls silently fail to find matching ore records and return NULL in the encrypted JSON.
All current test usage (including bench_data.sql) stays within the 1-99 range, but adding an explicit check would prevent silent errors:
IF id < 1 OR id > 99 THEN
  RAISE EXCEPTION 'create_encrypted_json: id must be in range 1-99, got %', id;
END IF;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md` at line 123, Add explicit range validation inside the create_encrypted_json(id) function: detect when id < 1 OR id > 99 and raise an exception with a clear message (e.g., "create_encrypted_json: id must be in range 1-99, got %") so out-of-range calls fail loudly instead of returning NULL; update the body of create_encrypted_json (in the implementation currently in 004_install_test_helpers.sql) to perform this check at the start before any ore lookup logic.
tests/sqlx/src/reports.rs (1)
72-101: Optional: clear RESULTS after a successful write to make the API idempotent.
write_reports clones the accumulator but never drains it, so a second call in the same process — e.g. if the orchestrator is later split into multiple phases, or a developer calls both run_all_benchmarks and a wrapper in the same cargo test invocation — will re-include every previous PerfResult. Current usage is one-shot so this is benign, but a one-liner eliminates the foot-gun.
♻️ Proposed fix
pub fn write_reports(
    output_dir: &str,
    date: &str,
    postgres_version: &str,
    dataset_rows: i64,
) -> Result<(PathBuf, PathBuf)> {
-    let results = RESULTS.lock().expect("results mutex poisoned").clone();
+    let results = {
+        let mut guard = RESULTS.lock().expect("results mutex poisoned");
+        std::mem::take(&mut *guard)
+    };
     let timestamp = OffsetDateTime::now_utc()
write_reportsmust be called at most once per process.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/src/reports.rs` around lines 72 - 101, write_reports currently clones the shared accumulator RESULTS and leaves it populated, causing subsequent calls to include prior PerfResult entries; after creating the report (or immediately after cloning), acquire the same RESULTS mutex and replace or drain the inner Vec (e.g., swap with an empty Vec or use mem::take) so the global accumulator is cleared for idempotency — locate the lock calls around RESULTS.lock() and the creation of BenchmarkReport to perform the clear while still holding the mutex to avoid races.
tests/sqlx/tests/bench_perf_tests.rs (2)
290-330: Individual wrappers + orchestrator can collide under libtest parallelism.
cargo test --test bench_perf_tests -- --ignored (no name filter) will execute all 7 #[tokio::test] #[ignore] tests in parallel against the same Postgres instance. Because every bench_* helper calls reset_pg_stat_statements → runs RUNS queries → reads back pg_stat_statements, concurrent helpers will:
- reset each other's counters mid-measurement,
- see sibling queries in
read_pg_stat_statements(causingassert_eq!(stats.calls, RUNS)to flap),- double-append to the global
RESULTSvec if the orchestrator also runs (12 entries instead of 6, written to the report).The
mise run bench:fulltask is fine because it filters torun_all_benchmarks, but the failure mode is easy for a developer to trip into when iterating locally. Consider documenting the safe invocation at module level (and/or addingserial_test::serialon the wrappers) so the collision mode is explicit.📝 Suggested doc clarification
```diff
 // ============================================================================
 // Individual test wrappers — allow running one benchmark in isolation via
 // `cargo test --test bench_perf_tests <name> -- --ignored`. These do NOT
 // flush reports; only `run_all_benchmarks` does that.
+//
+// NOTE: Do NOT run these with a bare `cargo test -- --ignored`. libtest will
+// execute them in parallel against the same Postgres, and the shared
+// pg_stat_statements view + global RESULTS accumulator will cause spurious
+// assertion failures and duplicated report rows. Always pass an explicit test
+// name filter, or use `--test-threads=1`.
 // ============================================================================
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 290 - 330, The individual tokio test wrappers (hmac_256_equality, bloom_filter_containment, eql_cast_equality, ore_equality_opclass, ore_range_lt_limit, ore_order_by_limit) can run in parallel and interfere with reset_pg_stat_statements/read_pg_stat_statements and the global RESULTS when run without a name filter; fix by making these wrappers run serially or documenting the required invocation: either add the serial_test::serial attribute to each wrapper (or a module-level enforcement) so they never run concurrently with run_all_benchmarks, or add a clear top-of-file comment explaining that only run_all_benchmarks should be executed without filters and that running the individual ignored tests in bulk will corrupt pg_stat_statements and double-append to RESULTS; ensure references to reset_pg_stat_statements, read_pg_stat_statements, run_all_benchmarks and RESULTS are mentioned in the comment if you choose the documentation route.
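If the attribute route is taken, each wrapper would gain `serial_test::serial`. A sketch only — it assumes `serial_test` is added as a dev-dependency, and the wrapper body is elided:

```rust
use serial_test::serial;

#[tokio::test]
#[ignore]
#[serial] // never runs concurrently with run_all_benchmarks or sibling wrappers
async fn hmac_256_equality() {
    // ... existing wrapper body unchanged ...
}
```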
263-270: Consider using the `time` crate instead of shelling out to `date`.

The `time` crate is already a dependency of this crate (used in `tests/sqlx/src/reports.rs` for `OffsetDateTime::now_utc().format(&Rfc3339)`) with the "formatting" feature enabled, so the subprocess invocation and the `.unwrap()` on `String::from_utf8` can both be avoided.

♻️ Proposed fix
```diff
 fn today_utc() -> String {
-    // Avoid adding the `chrono` dep; shell out to `date -u` for UTC.
-    let out = std::process::Command::new("date")
-        .args(["-u", "+%Y-%m-%d"])
-        .output()
-        .expect("invoking date");
-    String::from_utf8(out.stdout).unwrap().trim().to_string()
+    use time::format_description::well_known::Iso8601;
+    // `time` is already a dep (used by reports.rs); prefer it over shelling out.
+    time::OffsetDateTime::now_utc()
+        .date()
+        .format(&Iso8601::DATE)
+        .expect("formatting UTC date")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 263 - 270, Replace the shell-out in today_utc() with the time crate: remove the std::process::Command invocation and String::from_utf8 unwrap, and return OffsetDateTime::now_utc().format(&time::format_description::parse("[year]-[month]-[day]").unwrap()).to_string() (or use the format_description! macro if preferred). This keeps today_utc() as a String, references the OffsetDateTime type and avoids spawning a subprocess and UTF-8 unwraps.
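The `std::mem::take` drain suggested for `write_reports` earlier in this review can be exercised in isolation with plain `std`; `RESULTS` and `drain_results` here are stand-ins for the real accumulator, not the project's code:

```rust
use std::sync::Mutex;

// Stand-in for the global accumulator in reports.rs.
static RESULTS: Mutex<Vec<u32>> = Mutex::new(Vec::new());

// Drain instead of clone: the accumulator is left empty, so a second
// call in the same process cannot re-include earlier results.
fn drain_results() -> Vec<u32> {
    let mut guard = RESULTS.lock().expect("results mutex poisoned");
    std::mem::take(&mut *guard)
}

fn main() {
    RESULTS.lock().unwrap().extend([1, 2, 3]);
    assert_eq!(drain_results(), vec![1, 2, 3]);
    // The second drain sees an empty Vec — no duplicated report rows.
    assert!(drain_results().is_empty());
}
```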
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/benchmark.yml:
- Line 21: The CI job timeout of 60 minutes in .github/workflows/benchmark.yml
is too low for the bench_ore_order_by_limit workload (see
tests/sqlx/tests/bench_perf_tests.rs where RUNS = 1000 and per-run time is
several seconds at 100K), so either increase timeout-minutes (e.g. to 120) in
the benchmark.yml workflow, or change the benchmark harness to reduce work for
that specific test (e.g., lower RUNS for the unindexed-order-by test in
tests/sqlx/tests/bench_perf_tests.rs or add a conditional smaller RUNS when
running in CI) so the total runtime stays under the CI timeout.
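The first option is a one-line workflow change. A sketch only — the job name `benchmark` is an assumption and should be matched to the actual workflow:

```yaml
# .github/workflows/benchmark.yml — sketch; job name is an assumption
jobs:
  benchmark:
    runs-on: ubuntu-latest
    timeout-minutes: 120  # raised from 60 to cover bench_ore_order_by_limit at 100K rows
```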
---
Outside diff comments:
In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md`:
- Around line 109-110: Update the FIXTURE_SCHEMA.md documentation for the ore
table to match the actual migration: change the data count to "1000 records (ids
1-1000)" and adjust any lookup range examples referencing
create_encrypted_json(id) so the described valid id range matches the real data
(e.g., "valid ids: 1-1000 → ore lookups: 10-1000" or the correct mapping used by
the migration). Ensure the text mentions the ore table and the
create_encrypted_json(id) function explicitly so readers can correlate the
documented ranges with the migration-generated INSERT ids.
---
Nitpick comments:
In `@tests/sqlx/fixtures/FIXTURE_SCHEMA.md`:
- Line 123: Add explicit range validation inside the create_encrypted_json(id)
function: detect when id < 1 OR id > 99 and raise an exception with a clear
message (e.g., "create_encrypted_json: id must be in range 1-99, got %") so
out-of-range calls fail loudly instead of returning NULL; update the body of
create_encrypted_json (in the implementation currently in
004_install_test_helpers.sql) to perform this check at the start before any ore
lookup logic.
In `@tests/sqlx/src/reports.rs`:
- Around line 72-101: write_reports currently clones the shared accumulator
RESULTS and leaves it populated, causing subsequent calls to include prior
PerfResult entries; after creating the report (or immediately after cloning),
acquire the same RESULTS mutex and replace or drain the inner Vec (e.g., swap
with an empty Vec or use mem::take) so the global accumulator is cleared for
idempotency — locate the lock calls around RESULTS.lock() and the creation of
BenchmarkReport to perform the clear while still holding the mutex to avoid
races.
In `@tests/sqlx/tests/bench_perf_tests.rs`:
- Around line 290-330: The individual tokio test wrappers (hmac_256_equality,
bloom_filter_containment, eql_cast_equality, ore_equality_opclass,
ore_range_lt_limit, ore_order_by_limit) can run in parallel and interfere with
reset_pg_stat_statements/read_pg_stat_statements and the global RESULTS when run
without a name filter; fix by making these wrappers run serially or documenting
the required invocation: either add the serial_test::serial attribute to each
wrapper (or a module-level enforcement) so they never run concurrently with
run_all_benchmarks, or add a clear top-of-file comment explaining that only
run_all_benchmarks should be executed without filters and that running the
individual ignored tests in bulk will corrupt pg_stat_statements and
double-append to RESULTS; ensure references to reset_pg_stat_statements,
read_pg_stat_statements, run_all_benchmarks and RESULTS are mentioned in the
comment if you choose the documentation route.
- Around line 263-270: Replace the shell-out in today_utc() with the time crate:
remove the std::process::Command invocation and String::from_utf8 unwrap, and
return
OffsetDateTime::now_utc().format(&time::format_description::parse("[year]-[month]-[day]").unwrap()).to_string()
(or use the format_description! macro if preferred). This keeps today_utc() as a
String, references the OffsetDateTime type and avoids spawning a subprocess and
UTF-8 unwraps.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 66bc5a3c-abf5-4d51-a715-d309caf318f8
⛔ Files ignored due to path filters (1)
`tests/sqlx/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (10)
- .github/workflows/benchmark.yml
- tasks/bench.toml
- tests/benchmarks/docker-compose.yml
- tests/sqlx/Cargo.toml
- tests/sqlx/fixtures/FIXTURE_SCHEMA.md
- tests/sqlx/src/helpers.rs
- tests/sqlx/src/lib.rs
- tests/sqlx/src/reports.rs
- tests/sqlx/tests/bench_perf_tests.rs
- tests/sqlx/tests/bench_regression_tests.rs
✅ Files skipped from review due to trivial changes (2)
- tests/sqlx/Cargo.toml
- tests/benchmarks/docker-compose.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- tasks/bench.toml
- tests/sqlx/tests/bench_regression_tests.rs
- tests/sqlx/src/helpers.rs
Worst-case bench_ore_order_by_limit takes several seconds per run at 100K rows; with RUNS=1000 the single benchmark could exceed the 60-min CI timeout before flush_reports writes the artifact. Drop RUNS to 10 to keep the scheduled job well under budget while still capturing mean/stddev via pg_stat_statements. Addresses CodeRabbit review feedback on PR #173.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/benchmarks/README.md (1)
33-38: Document required GitHub secrets in CI usage section.

Lines 35-37 explain schedule/dispatch, but the required secrets are not listed here. Without them, manual or scheduled runs will fail even if the workflow is triggered correctly.
📝 Suggested doc patch
```diff
 ## CI usage

 Runs automatically every Monday at 03:00 UTC via `.github/workflows/benchmark.yml`.
 Also manually invocable from the GitHub Actions UI (Run workflow button).
+
+Required repository secrets:
+- `CS_CLIENT_ACCESS_KEY`
+- `CS_DEFAULT_KEYSET_ID`
+- `CS_CLIENT_KEY`
+- `CS_CLIENT_ID`
+- `CS_WORKSPACE_CRN`
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/benchmarks/README.md` around lines 33 - 38, Update the "CI usage" section to list the GitHub Actions secrets required by the benchmark workflow (referencing the workflow file .github/workflows/benchmark.yml), naming each secret and a one-line purpose/usage for it (e.g., BENCHMARK_API_KEY: used to upload results, AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY: for publishing artifacts, GH_TOKEN: for dispatching or PR comments), and add a short note on where to set them in the repository settings so manual or scheduled runs do not fail due to missing secrets.

tests/sqlx/tests/bench_perf_tests.rs (1)
34-43: Acquire a single connection per benchmark body to ensure plan capture and execution use the same session.Currently,
fetch_plan_node_type(), the measured query loop, and stats readback all acquire from the samePgPoolwithmax_connections(4). This can cause plan mismatches:EXPLAIN (FORMAT JSON)runs unprepared (simple protocol) on one physical connection, whilesqlx::query().fetch_all()uses the extended protocol with prepared statements on a potentially different connection. PostgreSQL caches prepared statement plans per backend, and after ~5 executions may switch to a generic plan. The captured plan and the actual measured execution may differ, introducing noise into benchmarks. For week-over-week perf data stability, acquire a single persistent connection per benchmark body (or set pool to 1) so EXPLAIN and execution see identical plan cache state.Also applies to: 94-100, 128-134, 161-167, 194-200, 223-230
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 34 - 43, The benchmark is using a pool with max_connections(4) which can hand out different physical backends for EXPLAIN vs measured executions; change connect() to create a single-backend pool (PgPoolOptions::new().max_connections(1)) and, in each benchmark body, acquire one persistent connection from that pool (let mut conn = pool.acquire().await?) and reuse that same connection for fetch_plan_node_type(), the measured sqlx::query().fetch_all() loop, and the stats readback (ensure_pg_stat_statements()/reads) instead of calling pool-level helpers that may grab different connections so EXPLAIN and execution share the same session/plan cache.
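A minimal sketch of the single-backend setup, assuming sqlx with the `postgres` feature and a `DATABASE_URL` env var; everything beyond `PgPoolOptions` and `acquire` is illustrative:

```rust
use sqlx::postgres::PgPoolOptions;

// Sketch only: capping the pool at one connection guarantees that EXPLAIN
// and the measured query loop hit the same backend, and therefore the same
// prepared-statement plan cache.
async fn single_backend_pool() -> anyhow::Result<()> {
    let pool = PgPoolOptions::new()
        .max_connections(1)
        .connect(&std::env::var("DATABASE_URL")?)
        .await?;

    // Hold one connection for the whole benchmark body and pass `&mut *conn`
    // to every query (plan capture, measured loop, stats readback).
    let mut conn = pool.acquire().await?;
    let one: i64 = sqlx::query_scalar("SELECT 1::bigint")
        .fetch_one(&mut *conn)
        .await?;
    assert_eq!(one, 1);
    Ok(())
}
```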
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/sqlx/tests/bench_perf_tests.rs`:
- Around line 31-32: Replace the hard-coded DATASET_ROWS constant with a runtime
value queried from the test database before emitting the report: remove or stop
using const DATASET_ROWS and instead run a COUNT(*) query against the table(s)
populated by these tests (using the same sqlx connection/pool used by the suite)
to produce a dataset_rows i64, then use that dataset_rows value wherever
DATASET_ROWS was used for report metadata (including the other occurrences noted
around lines 245-258); ensure the COUNT query runs once per test run and
propagates the resulting i64 into the report-writing code so DATABASE_URL
overrides and reused/partial fixtures are reflected correctly.
---
Nitpick comments:
In `@tests/benchmarks/README.md`:
- Around line 33-38: Update the "CI usage" section to list the GitHub Actions
secrets required by the benchmark workflow (referencing the workflow file
.github/workflows/benchmark.yml), naming each secret and a one-line
purpose/usage for it (e.g., BENCHMARK_API_KEY: used to upload results,
AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY: for publishing artifacts, GH_TOKEN: for
dispatching or PR comments), and add a short note on where to set them in the
repository settings so manual or scheduled runs do not fail due to missing
secrets.
In `@tests/sqlx/tests/bench_perf_tests.rs`:
- Around line 34-43: The benchmark is using a pool with max_connections(4) which
can hand out different physical backends for EXPLAIN vs measured executions;
change connect() to create a single-backend pool
(PgPoolOptions::new().max_connections(1)) and, in each benchmark body, acquire
one persistent connection from that pool (let mut conn = pool.acquire().await?)
and reuse that same connection for fetch_plan_node_type(), the measured
sqlx::query().fetch_all() loop, and the stats readback
(ensure_pg_stat_statements()/reads) instead of calling pool-level helpers that
may grab different connections so EXPLAIN and execution share the same
session/plan cache.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3e104a6e-972b-4a62-a4f6-991501067944
📒 Files selected for processing (2)
tests/benchmarks/README.mdtests/sqlx/tests/bench_perf_tests.rs
```rust
const RUNS: i64 = 10;
const DATASET_ROWS: i64 = 100_000;
```
Derive dataset_rows from the database before writing the report.
The artifact always reports 100_000 rows right now, but this suite explicitly allows DATABASE_URL overrides. If the target DB is reused, partially generated, or the fixture size changes later, the report metadata becomes wrong even when the queries still succeed.
Proposed fix
```diff
-const DATASET_ROWS: i64 = 100_000;
-
 async fn flush_reports(pool: &PgPool) -> Result<()> {
+    let dataset_rows: i64 = sqlx::query_scalar("SELECT COUNT(*) FROM bench")
+        .fetch_one(pool)
+        .await?;
+
     let pg_version: String = sqlx::query_scalar("SHOW server_version_num")
         .fetch_one(pool)
         .await?;
@@
     let date = std::env::var("BENCH_REPORT_DATE").unwrap_or_else(|_| today_utc());
     let output_dir = std::env::var("BENCH_REPORT_DIR")
         .unwrap_or_else(|_| "../../tests/benchmarks/reports".into());
-    let (json, md) = write_reports(&output_dir, &date, &pg_major, DATASET_ROWS)?;
+    let (json, md) = write_reports(&output_dir, &date, &pg_major, dataset_rows)?;
     eprintln!("wrote {} and {}", json.display(), md.display());
     Ok(())
 }
```

Also applies to: 245-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/sqlx/tests/bench_perf_tests.rs` around lines 31 - 32, Replace the
hard-coded DATASET_ROWS constant with a runtime value queried from the test
database before emitting the report: remove or stop using const DATASET_ROWS and
instead run a COUNT(*) query against the table(s) populated by these tests
(using the same sqlx connection/pool used by the suite) to produce a
dataset_rows i64, then use that dataset_rows value wherever DATASET_ROWS was
used for report metadata (including the other occurrences noted around lines
245-258); ensure the COUNT query runs once per test run and propagates the
resulting i64 into the report-writing code so DATABASE_URL overrides and
reused/partial fixtures are reflected correctly.
Adds a three-tier benchmarking infrastructure for EQL index performance.
What's included
Tier 0 — fixtures (`tests/sqlx/fixtures/bench_*.sql`, migration 007)
- `bench` table with Zipf-skewed 10K-row opt-in fixture

Tier 1 — regression tests (`tests/sqlx/tests/bench_{plan,regression,data}_tests.rs`)

Tier 2 — scheduled benchmarks (`tests/benchmarks/`, `.github/workflows/benchmark.yml`)
- `generate.sh` inserts 100K plaintext rows through Proxy (encrypts)
- `bench_perf_tests.rs` — 6 patterns × 1000 runs, reads `pg_stat_statements`
- `workflow_dispatch`, 90-day artifact retention
- `mise run bench:{up,down,generate,full}` for local runs

Requirements
Scheduled workflow needs repo secrets:
`CS_CLIENT_ACCESS_KEY`, `CS_DEFAULT_KEYSET_ID`, `CS_CLIENT_KEY`, `CS_CLIENT_ID`, `CS_WORKSPACE_CRN`.

Test plan
- `mise run test` passes
- `tests/benchmarks/.env`, then `mise run bench:{up,generate,full}` produces report in `tests/benchmarks/reports/`
- `workflow_dispatch` trigger succeeds end-to-end once secrets are provisioned

Summary by CodeRabbit
New Features
Reporting
Tests
Documentation