fix COMMIT/ROLLBACK retry data-loss footgun (PostgreSQL & MySQL)#1216
Open
skunkworker wants to merge 3 commits into
Open
fix COMMIT/ROLLBACK retry data-loss footgun (PostgreSQL & MySQL)#1216skunkworker wants to merge 3 commits into
skunkworker wants to merge 3 commits into
Conversation
Contributor
Author
|
Looks like the specs are failing because of the i18n |
…handling Transaction control statements were defined in the shared ArJdbc::Abstract::TransactionSupport mixin with allow_retry: true. That is safe for the idempotent BEGIN/savepoint statements but dangerous for COMMIT/ROLLBACK: if the backend connection drops at commit time (pgbouncer reaping a connection, a network blip, a server restart), AR's with_raw_connection machinery reconnects, replays BEGIN (but not the data writes, which died with the old backend) and re-runs COMMIT against an empty transaction - reporting success while silently losing the writes. This became reachable on PostgreSQL once backend disconnects started being translated to a retryable ActiveRecord::ConnectionFailed. Override commit_db_transaction / exec_rollback_db_transaction in the PostgreSQL and MySQL adapters to force allow_retry: false (matching AR's native adapters); BEGIN and savepoints intentionally keep allow_retry: true. MySQL additionally lacked translation of dropped connections: a lost backend surfaced as a plain JDBCError (SQLState class 08 / errorCode 0), which AR's retry path never matched. Add a connection_lost? helper (SQLState 08*, server-gone vendor codes, message patterns, and wrapped SQLRecoverableException/SQLNonTransientConnectionException causes) and short-circuit translate_exception to ConnectionFailed. Also: cap i18n < 1.15.0 (that release needs Ruby 3.2+, which we can't assume across supported JRubies) and fix a timezone test in test/simple.rb where with_timezone_config must wrap Time.use_zone to avoid the global-state leak guard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r JDBC Correct savepoint behaviour in the shared transaction support and fix the PostgreSQL#client_min_messages getter, which did not work under the JDBC adapter (it does not inherit the native pg adapter implementation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…upport 1.1 Connection leak: PostgreSQLRubyJdbcConnection#newConnection opened a physical backend via super.newConnection() but then ran unwrap()/addDataType() without cleanup. Any failure there leaked the open server-side connection, since the caller only saw the exception and never got a handle to close it. Wrap the post-connect setup so the connection is closed (suppressing close errors) before re-raising, mirroring the native adapter discarding a connection it could not fully establish. 1.2 SQL warnings (AR 7.2 db_warnings_action): capture statement.getWarnings() in RubyJdbcConnection#execute before the statement closes and expose them via #last_warnings. PostgreSQLRubyJdbcConnection#newWarning maps PSQLWarning's ServerErrorMessage to [message, sqlstate, severity]. The PG raw_execute override dispatches them through handle_warnings to ActiveRecord .db_warnings_action and normalises the Integer update-count return to [] for PG::Result API parity. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
c3bbd43 to
5bd86de
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
NOTE: Used Opus 4.8 to identify this specific bug and add a test.
Fix: COMMIT/ROLLBACK retry data-loss footgun (PostgreSQL & MySQL)
The bug
The shared ArJdbc::Abstract::TransactionSupport mixin (lib/arjdbc/abstract/transaction_support.rb) implements every transaction control statement — BEGIN, COMMIT,
ROLLBACK, savepoints — with the same options:
allow_retry: truetells ActiveRecord's with_raw_connection machinery that the operation is safe to replay if the connection drops. That's fine for BEGIN(idempotent) but dangerous for COMMIT on a networked database. The failure sequence:
llow_retry: true, AR catches the retryable ConnectionFailed, callsreconnect!(restore_transactions: true) — which replays BEGIN and savepoints but notthe data writes (they died with the old backend).
This diverges from ActiveRecord's own adapters, which deliberately use
allow_retry: falsefor COMMIT/ROLLBACK on both PostgreSQL and MySQL (only SQLite, a localfile, uses true).
Why it became newly reachable on PostgreSQL
The recent pgbouncer patch (06e7afa) started translating backend disconnects into a retryable ActiveRecord::ConnectionFailed. Before that translation, a dropped
COMMIT surfaced as a raw PSQLException that AR never retried — so the footgun was latent. After it, the retry path is live, making silent data loss reachable in
normal pgbouncer deployments.
The fix
Override commit_db_transaction and exec_rollback_db_transaction in each adapter to force
allow_retry: false, matching ActiveRecord's native adapters.(materialize_transactions: true also matches AR's native contract for these methods.)
PostgreSQL — lib/arjdbc/postgresql/adapter.rb:585. The overrides live in the ArJdbc::PostgreSQL module, which is included after TransactionSupport, so they win in
the method resolution order.
MySQL — lib/arjdbc/mysql/adapter.rb:215. Here ArJdbc::MySQL is a native Java module, so the overrides are placed directly in the Mysql2Adapter class body
(class-level definitions override the included mixin).
Both adapters now use:
BEGIN and savepoint methods are intentionally left with
allow_retry: true— replaying them is safe and is what enables clean reconnection after an idle-pool drop.MySQL caveat (why its fix is defensive)
Empirically verified: on MySQL/MariaDB a dropped-socket COMMIT currently raises a plain ActiveRecord::JDBCError, not a retryable ConnectionFailed. Since
retryable_connection_error? only matches ConnectionFailed/ConnectionNotEstablished, AR won't retry a failed MySQL commit today regardless of the flag — the
footgun is masked on MySQL. The MySQL fix therefore pins the safe contract so it can't be silently reintroduced if MySQL ever gains ConnectionFailed translation
(as PostgreSQL just did).
Also patch mysql
The gap: Last week's Postgres patch (06e7afa) translates backend disconnects into a retryable ActiveRecord::ConnectionFailed. MySQL had no equivalent —
Mysql2Adapter#translate_exception only matched two message patterns then called super. super (native AbstractMysqlAdapter) dispatches on error_number, so it only
catches the vendor "server gone" codes (2006/2013/…). But Connector/J and the MariaDB driver raise communications failures with SQLState class 08 (e.g. 08S01
"Communications link failure") and errorCode 0 — which neither path caught. So a proxy/server dropping an idle connection surfaced as a raw
JDBCError/StatementInvalid and AR's reconnect-and-retry never fired. (This is also why the earlier COMMIT-retry fix was latent on MySQL.)
The fix (lib/arjdbc/mysql/adapter.rb), mirroring the PG structure:
2013, 1053, 1927, 4031), connection-failure message patterns, and a wrapped SQLRecoverableException/SQLNonTransientConnectionException cause.
Also fix: PostgreSQL connection leak on failed establishment
PostgreSQLRubyJdbcConnection#newConnectionopened a physical backend viasuper.newConnection()and then ranconnection.unwrap(PGConnection.class)plus sixpgConnection.addDataType(...)registrations with no cleanup. If any of that post-connect setup threw, the already-open server-side connection was leaked — the caller only sees the exception and never receives a handle it could close.Leaked backends accumulate against PostgreSQL's
max_connections, which is exactly the kind of self-poisoning that produces a "too many clients already" /ConnectionTimeoutErrorcascade across a test suite or a long-running app.The fix (
src/java/arjdbc/postgresql/PostgreSQLRubyJdbcConnection.java) wraps the post-connect setup in atry/catchthat closes the physical connection (suppressing any close error onto the original exception) before re-raising — mirroring how the native adapter discards a connection it could not fully establish.Verified: a full PostgreSQL Rails test-suite run shows zero "too many clients already" occurrences (the only timeout markers trace to
connection_pool_test.rb, a test that deliberately uses a 0.4s timeout).Also add: SQL warnings support (AR 7.2
db_warnings_action)The JDBC PostgreSQL adapter does not inherit the native pg adapter, so libpq's
set_notice_receiverwarning path never existed —db_warnings_actionwas a no-op and warning-related tests failed withundefined method 'to_a' for 0:Integer. PostgreSQLRAISE WARNING/ NOTICE messages land on the JDBC Statement's warning chain and are lost when the statement closes, so they must be captured on the Java side.RubyJdbcConnection#executenow capturesstatement.getWarnings()before the statement closes and exposes them via#last_warnings.PostgreSQLRubyJdbcConnection#newWarningmaps eachPSQLWarning'sServerErrorMessageto[message, sqlstate, severity](clean primary message, SQLSTATE, and "WARNING"/"NOTICE" level).raw_executeoverride (lib/arjdbc/postgresql/database_statements.rb) dispatches captured warnings throughhandle_warningstoActiveRecord.db_warnings_action, applieswarning_ignored?(NOTICE-level and thedb_warnings_ignoreallowlist are skipped), and normalises the Integer update-count return to[]forPG::ResultAPI parity.This clears the
db_warnings_actionlog/raise/proc/allowlist tests. (The jarlib/arjdbc/jdbc/adapter_java.jaris gitignored and rebuilt during packaging/CI, so only the Java/Ruby source is committed.)