feat: periodic soak test mode for RAC fuzz framework #22

Open
rophy wants to merge 6 commits into master from feat/soak-test-periodic

rophy (Owner) commented Apr 29, 2026

Summary

  • Evolve fuzz test into repeatable periodic soak mode — each cycle is a finite fuzz run with cursor-based resume
  • Make workload idempotent across cycles (repopulate tracked IDs from existing rows, continue sequences from MAX)
  • Move table cleanup into workload PL/SQL so DELETEs flow through CDC and get validated
  • Replace host-side archive cleanup with containerized cron service
  • Drop streaming soak design (Phase B) — replaced by periodic fuzz model
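
The idempotency bullet above is spelled out in the commit notes further down: g_next_id continues from MAX(id) while preserving parity, and g_event_seq continues from the largest numeric tail of the existing event_ids. The real logic lives in PL/SQL in fuzz-workload.sql; the following is only a minimal Python sketch of that arithmetic. The function names, the assumption that each node owns one ID parity, and the sample event_id shapes are all hypothetical.

```python
import re
from typing import Iterable

# Assumption: an event_id ends in a run of digits (its "numeric tail").
_TAIL_DIGITS = re.compile(r"(\d+)$")


def next_id_preserving_parity(max_id: int, parity: int) -> int:
    """Return the smallest id greater than max_id with id % 2 == parity."""
    candidate = max_id + 1
    if candidate % 2 != parity:
        candidate += 1
    return candidate


def next_event_seq(event_ids: Iterable[str]) -> int:
    """Continue after the largest numeric tail; start at 1 on a cold start."""
    tails = [int(m.group(1)) for e in event_ids if (m := _TAIL_DIGITS.search(e))]
    return max(tails, default=0) + 1


# Hypothetical example: a node that owns odd ids resumes after MAX(id) = 10.
print(next_id_preserving_parity(10, parity=1))
print(next_event_seq(["n1-7", "n2-42"]))
```

Because both values are derived from what is already in the tables, a cycle started against non-empty tables cannot reuse an id or event_id issued by an earlier cycle.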

Test plan

  • 8-hour soak run (36 cycles, 10min each): ~1.1M events validated, 0 mismatches
  • Back-to-back cycles with no teardown: cursor advances correctly each cycle
  • Existing single-cycle fuzz test (run N + validate) still works

Summary by CodeRabbit

Release Notes

  • New Features

    • Added archive cleanup service for automated removal of old logs
    • Introduced soak testing script for continuous fuzz testing with time-based cycling
    • Added resumable validation with checkpoint cursor support
  • Improvements

    • Enhanced test failure diagnostics with detailed logs and better readiness detection
    • Implemented TTL-based in-memory cleanup with configurable purge intervals
    • Improved database connection robustness with timeout configuration
  • Tests

    • Added workspace persistence for soak-state carryover between test cycles

rophy added 5 commits April 29, 2026 20:45
Phase A - Storage cleanup (works with existing finite runs):
- Add created_at column to all FUZZ_* tables for TTL-based purge
- Add FUZZ_WKL.cleanup() procedure + DBMS_SCHEDULER job (every 30min)
- Add archive log cleanup loop in fuzz-test.sh up (hourly)
- Add seq dict pruning in kafka-consumer.py (every 10min, 24h TTL)
- Add SQLite event purge in validator.py after each validation cycle

Phase B - Continuous soak operation:
- Add FUZZ_WKL.run_forever() with rate limiting via DBMS_SESSION.SLEEP
- Add SOAK_MODE to validator.py: continuous validate-purge cycles
- Add stall detection (exit if no new events for 5min)
- Add fuzz-test.sh soak subcommand with health monitoring
- Add SQLite busy_timeout to handle concurrent writer contention

Tested: 2-min finite run (11,238 events, 0 mismatches) and 5-min soak
run (2,407 events validated in cycle 1, 0 mismatches) on RAC VM.
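
A minimal sketch of the two SQLite-side mechanisms listed in this commit, the busy timeout and the post-validation purge, assuming Python's standard sqlite3 module. The 30000 ms value comes from the later commit notes; the `events` table, its `received_at` column, and the helper names are hypothetical and not taken from validator.py.

```python
import sqlite3
import time
from typing import Optional


def open_db(path: str, busy_timeout_ms: int = 30000) -> sqlite3.Connection:
    """Open a connection that waits for a locked database instead of failing."""
    conn = sqlite3.connect(path, timeout=busy_timeout_ms / 1000)
    # Set the timeout explicitly as well, so it is visible via PRAGMA.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn


def purge_old_events(conn: sqlite3.Connection, ttl_hours: float,
                     now: Optional[float] = None) -> int:
    """Delete rows older than the TTL and return how many were removed.

    Assumes a hypothetical table events(received_at REAL) holding epoch seconds.
    """
    cutoff = (time.time() if now is None else now) - ttl_hours * 3600
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM events WHERE received_at < ?", (cutoff,))
    return cur.rowcount
```

With the consumer writing and the validator reading and purging the same file, the timeout lets the second connection wait out a short lock rather than raising "database is locked" immediately.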
Phase B (continuous soak) is being redesigned as periodic fuzz runs;
remove the streaming implementation:
- Drop run_forever() in FUZZ_WKL package
- Drop SOAK_MODE path in validator.py (stall detection, continuous loop)
- Drop soak subcommand and action_soak from fuzz-test.sh
- Remove SOAK-TEST.md design doc

Replace host-side archive cleanup loop with a docker-compose service:
- New archive-cleanup/ directory (Dockerfile + crontab)
- Alpine + openssh-client + supercronic v0.2.44 (SHA1 pinned)
- Hourly find -mtime +1 -print -delete, deletions logged to stdout
- Lifecycle tied to docker-compose up/down, visible via docker logs
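
One detail worth keeping in mind for the hourly job: under GNU find semantics, `-mtime +1` discards the fractional part of a file's age in days before comparing, so a file has to be at least two full days old to match. The sketch below models that rule in Python. It assumes the find on the RAC VM follows GNU behaviour, and the function name is hypothetical.

```python
import math

SECONDS_PER_DAY = 86400


def matches_mtime_plus(age_seconds: float, n: int) -> bool:
    """Mimic `find -mtime +n`: whole days of age (fraction dropped) must exceed n."""
    return math.floor(age_seconds / SECONDS_PER_DAY) > n


# Ages in hours checked against `-mtime +1`.
for hours in (25, 47, 48, 72):
    print(hours, matches_mtime_plus(hours * 3600, 1))
```

If that assumption holds, archive logs may linger for up to roughly two days before the job removes them, which matters when sizing the archive destination for a long soak.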

Phase A cleanup mechanisms retained:
- created_at columns + FUZZ_WKL.cleanup() + DBMS_SCHEDULER job
- Consumer seq dict pruning (10min interval, 24h TTL)
- validator.py purge_old_events() called after each one-shot cycle
- SQLite busy_timeout=30000 on both consumer and validator
- Move FUZZ_* table cleanup into workload PL/SQL (every 5min) so
  DELETEs flow through CDC and get validated like normal DML.
- Remove DBMS_SCHEDULER FUZZ_CLEANUP job from up/down (replaced by
  in-workload cleanup).
- Fix grep -q + pipefail SIGPIPE false-failure in OLR/Debezium
  readiness waits.
…ator

- fuzz-workload.sql: make FUZZ_WKL.run() idempotent across cycles.
  Repopulate per-node tracked-ID arrays from existing rows (parity-
  filtered), continue g_next_id from MAX(id) preserving parity, and
  continue g_event_seq from MAX numeric tail of existing event_ids so
  cycle N+1 event_ids never collide with cycle N. Seed INSERTs only
  run on cold start (empty tables).
- validator.py: accept START_CURSOR env and emit
  '[validator] final_cursor=...' (safe frontier only). Lets a soak
  loop resume past already-validated events without re-scanning.
- fuzz-test.sh: forward START_CURSOR to the validator container; dump
  sqlplus output when FUZZ_DONE summary is missing (diagnostic).

Verified: 3 back-to-back cycles with no teardown, 32k events, 0
mismatches, cursor advances each cycle.
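
The cursor hand-off described in this commit can be sketched as follows. This is an illustration only, not the code in soak.sh or validator.py: the cursor is treated as an opaque string, the regular expression assumes the line looks exactly like `[validator] final_cursor=<value>`, and `run_cycle` is a hypothetical callable standing in for one finite fuzz run plus validation.

```python
import re
from typing import Callable, Optional

# Assumed shape of the line the validator prints at the end of a cycle.
FINAL_CURSOR_RE = re.compile(r"^\[validator\] final_cursor=(\S+)\s*$", re.MULTILINE)


def extract_final_cursor(validator_output: str) -> Optional[str]:
    """Return the last final_cursor value found in the output, or None."""
    matches = FINAL_CURSOR_RE.findall(validator_output)
    return matches[-1] if matches else None


def run_cycles(run_cycle: Callable[[Optional[str]], str], cycles: int) -> Optional[str]:
    """Run finite cycles back to back, feeding each cursor into the next.

    run_cycle receives the START_CURSOR value (None on a cold start) and
    returns the validator's captured output for that cycle.
    """
    cursor: Optional[str] = None
    for n in range(1, cycles + 1):
        output = run_cycle(cursor)
        new_cursor = extract_final_cursor(output)
        if new_cursor is None:
            # Stop rather than silently re-validating from the beginning.
            raise RuntimeError(f"cycle {n}: validator emitted no final_cursor")
        cursor = new_cursor
    return cursor
```

Failing the loop when no cursor is emitted is a design choice in this sketch: continuing with a stale cursor would re-scan old events, and continuing with none would restart validation from scratch.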

coderabbitai Bot commented Apr 29, 2026

Warning

Rate limit exceeded

@rophy has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 37 minutes and 38 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9702d258-261d-4e61-8e09-78b1a2c2de5e

📥 Commits

Reviewing files that changed from the base of the PR and between d938471 and c95225c.

📒 Files selected for processing (1)
  • tests/dbz-twin/rac/soak.sh
📝 Walkthrough

The PR enhances the RAC fuzz testing framework with archive cleanup infrastructure, resumable cursor support, TTL-based event purging, database resilience improvements, and a soak test orchestration script to enable continuous testing with cycle recovery.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Archive Cleanup Infrastructure (tests/dbz-twin/rac/archive-cleanup/Dockerfile, tests/dbz-twin/rac/archive-cleanup/crontab) | New Docker image and hourly cron job that uses SSH to remotely delete redo archive logs older than 1 day from the RAC VM, with output prefixed for container log visibility. |
| Compose & Environment (tests/dbz-twin/rac/docker-compose-fuzz.yaml) | Adds archive-cleanup service with VM connectivity configuration and introduces PURGE_TTL_HOURS environment variable (default 24) for the consumer container. |
| Fuzz Test Orchestration (tests/dbz-twin/rac/fuzz-test.sh, tests/dbz-twin/rac/soak.sh) | Improves readiness detection with stricter grep semantics; adds WORK_DIR for state persistence; expands failure diagnostics with full sqlplus output; introduces new soak script that cycles fuzz runs with cursor resumption and metric aggregation. |
| Workload & Sequence Management (tests/dbz-twin/rac/perf/fuzz-workload.sql) | Adds created_at column to all fuzz tables; introduces cleanup procedure for 24-hour row deletion; enables soak carryover by repopulating tracked-ID arrays from existing rows and continuing sequence generation without reset. |
| Kafka Consumer & Validation (tests/dbz-twin/rac/kafka-consumer.py, tests/dbz-twin/rac/validator.py) | Adds TTL-based cleanup of in-memory event maps (purging every 10 minutes); improves SQLite resilience with 30s timeouts and busy pragmas; refactors validator with resumable cursor support (START_CURSOR), deterministic cursor parsing/formatting, per-cycle validation logic, and final cursor emission for next cycle resumption. |
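
As an illustration of the TTL-based in-memory cleanup summarized for the consumer, here is a minimal sketch of a map whose entries are timestamped on insert and purged at most once per interval. The class name, method names, and injectable clock are hypothetical; the 24-hour TTL and 10-minute interval mirror the values mentioned in this PR.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TtlMap:
    """Dict-like store that drops entries older than a TTL, checked periodically."""

    def __init__(self, ttl_seconds: float, purge_interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._interval = purge_interval_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self._last_purge = clock()

    def put(self, key: Hashable, value: Any) -> None:
        # Store the insertion time next to the value.
        self._entries[key] = (self._clock(), value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._entries.get(key)
        return default if entry is None else entry[1]

    def maybe_purge(self) -> int:
        """Purge stale entries if the interval has elapsed; return the count removed."""
        now = self._clock()
        if now - self._last_purge < self._interval:
            return 0
        self._last_purge = now
        cutoff = now - self._ttl
        stale = [k for k, (ts, _) in self._entries.items() if ts < cutoff]
        for k in stale:
            del self._entries[k]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)


# Values from the PR: PURGE_TTL_HOURS default of 24, purge every 10 minutes.
seq_map = TtlMap(ttl_seconds=24 * 3600, purge_interval_seconds=600)
```

Calling `maybe_purge()` from the consume loop keeps the cost of the scan bounded to one pass per interval, whatever the message rate.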

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, purge, and persist through the night,
Old archive logs cleaned up just right!
Cursors resume where cycles left off,
Timeouts prevent the soak test scoff.
Round and round the fuzzing goes—
Where cleanup ends, resilience flows!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: periodic soak test mode for RAC fuzz framework' directly and clearly describes the main change: adding periodic soak testing capability to the RAC fuzz framework. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

rophy (Owner, Author) commented Apr 29, 2026

@coderabbitai review

coderabbitai Bot commented Apr 29, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/dbz-twin/rac/archive-cleanup/Dockerfile (1)

1-14: Consider adding a non-root user for security best practices.

The container runs as root by default. While this is acceptable for test infrastructure, adding a non-root user would align with container security best practices.

🛡️ Optional: Add non-root user
 RUN apk add --no-cache openssh-client curl \
  && curl -fsSL -o /usr/local/bin/supercronic "${SUPERCRONIC_URL}" \
  && echo "${SUPERCRONIC_SHA}  /usr/local/bin/supercronic" | sha1sum -c - \
  && chmod +x /usr/local/bin/supercronic \
- && apk del curl
+ && apk del curl \
+ && adduser -D -u 1000 cleanup

+USER cleanup
 ENTRYPOINT ["/usr/local/bin/supercronic", "-passthrough-logs"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/dbz-twin/rac/archive-cleanup/Dockerfile` around lines 1 - 14, Add a
non-root user and switch to it at the end of the Dockerfile: create a group/user
(e.g., addgroup -S super && adduser -S super -G super), change ownership of
/usr/local/bin/supercronic (and any other runtime paths like /etc/crontab if
needed) to that user, and add a USER super line before ENTRYPOINT/CMD; ensure
permissions allow execution of /usr/local/bin/supercronic and that ENTRYPOINT
("/usr/local/bin/supercronic", "-passthrough-logs") continues to work under the
non-root user.
tests/dbz-twin/rac/fuzz-test.sh (1)

463-463: Quote $_SSH_OPTS to prevent word splitting.

Shellcheck SC2086 correctly identifies that $_SSH_OPTS should be quoted. While it likely works in practice, quoting is safer.

🔧 Suggested fix
-    ssh $_SSH_OPTS "${VM_USER}@${VM_HOST}" \
+    ssh "$_SSH_OPTS" "${VM_USER}@${VM_HOST}" \

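Note: If $_SSH_OPTS holds multiple space-separated options, quoting it as a single string passes all of them to ssh as one argument and breaks the call. In that case, store the options in a Bash array and expand it instead: ssh "${_SSH_OPTS[@]}" ...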

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/dbz-twin/rac/fuzz-test.sh` at line 463, The ssh invocation in
fuzz-test.sh uses an unquoted variable $_SSH_OPTS which can undergo word
splitting (SC2086); update the ssh call to quote the variable (use
"${_SSH_OPTS}" or, if _SSH_OPTS is intended as an array, use "${_SSH_OPTS[@]}")
so options are preserved correctly when invoking ssh "${_SSH_OPTS}"
"${VM_USER}@${VM_HOST}" ...; ensure references to _SSH_OPTS, VM_USER, VM_HOST
and the ssh command are updated accordingly.
tests/dbz-twin/rac/validator.py (2)

434-434: Minor: Unused variable nf in unpacking.

Prefix with underscore to indicate it's intentionally unused.

🧹 Fix unused variable warning
-            (v, m, mm, mo, ml, to_, tl, lmc, oc, nf) = result
+            (v, m, mm, mo, ml, to_, tl, lmc, oc, _nf) = result
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/dbz-twin/rac/validator.py` at line 434, The tuple unpacking currently
binds an unused variable named `nf`; change the unpack target `nf` to `_nf` (or
`_`) in the unpack expression `(v, m, mm, mo, ml, to_, tl, lmc, oc, nf) =
result` so the unused value is clearly marked and silences the warning while
leaving the rest of the bindings (`v, m, mm, mo, ml, to_, tl, lmc, oc`)
unchanged.

358-361: Minor: Remove unnecessary f-string prefixes.

Lines 359 and 382 use f-strings without placeholders.

🧹 Fix linting warnings
-    print(f"\n{'='*60}", flush=True)
-    print(f"  Fuzz Test Validation Summary", flush=True)
-    print(f"{'='*60}", flush=True)
+    print(f"\n{'='*60}", flush=True)
+    print("  Fuzz Test Validation Summary", flush=True)
+    print(f"{'='*60}", flush=True)

And at line 382:

-    print(f"Validator starting", flush=True)
+    print("Validator starting", flush=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/dbz-twin/rac/validator.py` around lines 358 - 361, Two print
statements use f-strings that contain no placeholders: the summary title
print(f"  Fuzz Test Validation Summary", ...) at line 359 and
print(f"Validator starting", ...) at line 382. Remove the leading f from those
two string literals only. Keep the f-prefix on the separator prints that
interpolate {'='*60} and on any print that interpolates a variable, so the
output remains identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/dbz-twin/rac/soak.sh`:
- Line 6: The hardcoded absolute path in the source command inside
tests/dbz-twin/rac/soak.sh should be replaced with a path computed relative to
the script location (the source invocation in soak.sh), e.g., compute the script
directory via $0 or ${BASH_SOURCE[0]} and source the vm-env.sh from
../environments/rac/vm-env.sh relative to that directory so the script is
portable across machines.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ab34e29b-3e2d-4124-a533-7cdad113d68b

📥 Commits

Reviewing files that changed from the base of the PR and between 21955ba and d938471.

📒 Files selected for processing (8)
  • tests/dbz-twin/rac/archive-cleanup/Dockerfile
  • tests/dbz-twin/rac/archive-cleanup/crontab
  • tests/dbz-twin/rac/docker-compose-fuzz.yaml
  • tests/dbz-twin/rac/fuzz-test.sh
  • tests/dbz-twin/rac/kafka-consumer.py
  • tests/dbz-twin/rac/perf/fuzz-workload.sql
  • tests/dbz-twin/rac/soak.sh
  • tests/dbz-twin/rac/validator.py

Comment thread tests/dbz-twin/rac/soak.sh Outdated