Skip to content

fix COMMIT/ROLLBACK retry data-loss footgun (PostgreSQL & MySQL)#1216

Open
skunkworker wants to merge 3 commits into
jruby:72-stablefrom
skunkworker:fix_retry_rollback_commit_dataloss
Open

fix COMMIT/ROLLBACK retry data-loss footgun (PostgreSQL & MySQL)#1216
skunkworker wants to merge 3 commits into
jruby:72-stablefrom
skunkworker:fix_retry_rollback_commit_dataloss

Conversation

@skunkworker

@skunkworker skunkworker commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

NOTE: Used Opus 4.8 to identify this specific bug and add a test.

Fix: COMMIT/ROLLBACK retry data-loss footgun (PostgreSQL & MySQL)

The bug

The shared ArJdbc::Abstract::TransactionSupport mixin (lib/arjdbc/abstract/transaction_support.rb) implements every transaction control statement — BEGIN, COMMIT,
ROLLBACK, savepoints — with the same options:

  def commit_db_transaction
    log('COMMIT', 'TRANSACTION') do
      with_raw_connection(allow_retry: true, materialize_transactions: false) do |conn|
        conn.commit
      end
    end
  end

allow_retry: true tells ActiveRecord's with_raw_connection machinery that the operation is safe to replay if the connection drops. That's fine for BEGIN
(idempotent) but dangerous for COMMIT on a networked database. The failure sequence:

  1. App opens a transaction, writes rows.
  2. The backend connection is dropped right at commit time (pgbouncer reaping an idle/server connection, network blip, server restart).
  3. With allow_retry: true, AR catches the retryable ConnectionFailed, calls reconnect!(restore_transactions: true) — which replays BEGIN and savepoints but not
    the data writes (they died with the old backend).
  4. AR re-runs COMMIT on the fresh connection, against an empty transaction. It succeeds.
  5. The caller is told the commit succeeded. The writes are silently gone.

This diverges from ActiveRecord's own adapters, which deliberately use allow_retry: false for COMMIT/ROLLBACK on both PostgreSQL and MySQL (only SQLite, a local
file, uses true).

Why it became newly reachable on PostgreSQL

The recent pgbouncer patch (06e7afa) started translating backend disconnects into a retryable ActiveRecord::ConnectionFailed. Before that translation, a dropped
COMMIT surfaced as a raw PSQLException that AR never retried — so the footgun was latent. After it, the retry path is live, making silent data loss reachable in
normal pgbouncer deployments.

The fix

Override commit_db_transaction and exec_rollback_db_transaction in each adapter to force allow_retry: false, matching ActiveRecord's native adapters.
(materialize_transactions: true also matches AR's native contract for these methods.)

PostgreSQL — lib/arjdbc/postgresql/adapter.rb:585. The overrides live in the ArJdbc::PostgreSQL module, which is included after TransactionSupport, so they win in
the method resolution order.

MySQL — lib/arjdbc/mysql/adapter.rb:215. Here ArJdbc::MySQL is a native Java module, so the overrides are placed directly in the Mysql2Adapter class body
(class-level definitions override the included mixin).

Both adapters now use:

  def commit_db_transaction
    log('COMMIT', 'TRANSACTION') do
      with_raw_connection(allow_retry: false, materialize_transactions: true) do |conn|
        conn.commit
      end
    end
  end

  def exec_rollback_db_transaction
    log('ROLLBACK', 'TRANSACTION') do
      with_raw_connection(allow_retry: false, materialize_transactions: true) do |conn|
        conn.rollback
      end
    end
  end

BEGIN and savepoint methods are intentionally left with allow_retry: true — replaying them is safe and is what enables clean reconnection after an idle-pool drop.

MySQL caveat (why its fix is defensive)

Empirically verified: on MySQL/MariaDB a dropped-socket COMMIT currently raises a plain ActiveRecord::JDBCError, not a retryable ConnectionFailed. Since
retryable_connection_error? only matches ConnectionFailed/ConnectionNotEstablished, AR won't retry a failed MySQL commit today regardless of the flag — the
footgun is masked on MySQL. The MySQL fix therefore pins the safe contract so it can't be silently reintroduced if MySQL ever gains ConnectionFailed translation
(as PostgreSQL just did).

Also patch mysql

The gap: Last week's Postgres patch (06e7afa) translates backend disconnects into a retryable ActiveRecord::ConnectionFailed. MySQL had no equivalent —
Mysql2Adapter#translate_exception only matched two message patterns then called super. super (native AbstractMysqlAdapter) dispatches on error_number, so it only
catches the vendor "server gone" codes (2006/2013/…). But Connector/J and the MariaDB driver raise communications failures with SQLState class 08 (e.g. 08S01
"Communications link failure") and errorCode 0 — which neither path caught. So a proxy/server dropping an idle connection surfaced as a raw
JDBCError/StatementInvalid and AR's reconnect-and-retry never fired. (This is also why the earlier COMMIT-retry fix was latent on MySQL.)

The fix (lib/arjdbc/mysql/adapter.rb), mirroring the PG structure:

  • A connection_lost?(exception) helper checking, in order: SQLState class 08 (08000/08001/08003/08004/08006/08007/08S01), the server-gone vendor codes (2006,
    2013, 1053, 1927, 4031), connection-failure message patterns, and a wrapped SQLRecoverableException/SQLNonTransientConnectionException cause.
  • translate_exception now short-circuits to ConnectionFailed when connection_lost? is true, before the existing message cases and super.

Also fix: PostgreSQL connection leak on failed establishment

PostgreSQLRubyJdbcConnection#newConnection opened a physical backend via super.newConnection() and then ran connection.unwrap(PGConnection.class) plus six pgConnection.addDataType(...) registrations with no cleanup. If any of that post-connect setup threw, the already-open server-side connection was leaked — the caller only sees the exception and never receives a handle it could close.

Leaked backends accumulate against PostgreSQL's max_connections, which is exactly the kind of self-poisoning that produces a "too many clients already" / ConnectionTimeoutError cascade across a test suite or a long-running app.

The fix (src/java/arjdbc/postgresql/PostgreSQLRubyJdbcConnection.java) wraps the post-connect setup in a try/catch that closes the physical connection (suppressing any close error onto the original exception) before re-raising — mirroring how the native adapter discards a connection it could not fully establish.

Verified: a full PostgreSQL Rails test-suite run shows zero "too many clients already" occurrences (the only timeout markers trace to connection_pool_test.rb, a test that deliberately uses a 0.4s timeout).

Also add: SQL warnings support (AR 7.2 db_warnings_action)

The JDBC PostgreSQL adapter does not inherit the native pg adapter, so libpq's set_notice_receiver warning path never existed — db_warnings_action was a no-op and warning-related tests failed with undefined method 'to_a' for 0:Integer. PostgreSQL RAISE WARNING / NOTICE messages land on the JDBC Statement's warning chain and are lost when the statement closes, so they must be captured on the Java side.

  • RubyJdbcConnection#execute now captures statement.getWarnings() before the statement closes and exposes them via #last_warnings.
  • PostgreSQLRubyJdbcConnection#newWarning maps each PSQLWarning's ServerErrorMessage to [message, sqlstate, severity] (clean primary message, SQLSTATE, and "WARNING"/"NOTICE" level).
  • The PG raw_execute override (lib/arjdbc/postgresql/database_statements.rb) dispatches captured warnings through handle_warnings to ActiveRecord.db_warnings_action, applies warning_ignored? (NOTICE-level and the db_warnings_ignore allowlist are skipped), and normalises the Integer update-count return to [] for PG::Result API parity.

This clears the db_warnings_action log/raise/proc/allowlist tests. (The jar lib/arjdbc/jdbc/adapter_java.jar is gitignored and rebuilt during packaging/CI, so only the Java/Ruby source is committed.)

@skunkworker

Copy link
Copy Markdown
Contributor Author

Looks like the specs are failing because of the i18n v1.15.0 Fiber bug.

skunkworker and others added 3 commits June 18, 2026 18:43
…handling

Transaction control statements were defined in the shared
ArJdbc::Abstract::TransactionSupport mixin with allow_retry: true. That is
safe for the idempotent BEGIN/savepoint statements but dangerous for
COMMIT/ROLLBACK: if the backend connection drops at commit time (pgbouncer
reaping a connection, a network blip, a server restart), AR's
with_raw_connection machinery reconnects, replays BEGIN (but not the data
writes, which died with the old backend) and re-runs COMMIT against an empty
transaction - reporting success while silently losing the writes. This
became reachable on PostgreSQL once backend disconnects started being
translated to a retryable ActiveRecord::ConnectionFailed.

Override commit_db_transaction / exec_rollback_db_transaction in the
PostgreSQL and MySQL adapters to force allow_retry: false (matching AR's
native adapters); BEGIN and savepoints intentionally keep allow_retry: true.

MySQL additionally lacked translation of dropped connections: a lost backend
surfaced as a plain JDBCError (SQLState class 08 / errorCode 0), which AR's
retry path never matched. Add a connection_lost? helper (SQLState 08*,
server-gone vendor codes, message patterns, and wrapped
SQLRecoverableException/SQLNonTransientConnectionException causes) and
short-circuit translate_exception to ConnectionFailed.

Also: cap i18n < 1.15.0 (that release needs Ruby 3.2+, which we can't assume
across supported JRubies) and fix a timezone test in test/simple.rb where
with_timezone_config must wrap Time.use_zone to avoid the global-state leak
guard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…r JDBC

Correct savepoint behaviour in the shared transaction support and fix the
PostgreSQL#client_min_messages getter, which did not work under the JDBC
adapter (it does not inherit the native pg adapter implementation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…upport

1.1 Connection leak: PostgreSQLRubyJdbcConnection#newConnection opened a
physical backend via super.newConnection() but then ran unwrap()/addDataType()
without cleanup. Any failure there leaked the open server-side connection,
since the caller only saw the exception and never got a handle to close it.
Wrap the post-connect setup so the connection is closed (suppressing close
errors) before re-raising, mirroring the native adapter discarding a
connection it could not fully establish.

1.2 SQL warnings (AR 7.2 db_warnings_action): capture statement.getWarnings()
in RubyJdbcConnection#execute before the statement closes and expose them via
#last_warnings. PostgreSQLRubyJdbcConnection#newWarning maps PSQLWarning's
ServerErrorMessage to [message, sqlstate, severity]. The PG raw_execute
override dispatches them through handle_warnings to ActiveRecord
.db_warnings_action and normalises the Integer update-count return to [] for
PG::Result API parity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@skunkworker skunkworker force-pushed the fix_retry_rollback_commit_dataloss branch from c3bbd43 to 5bd86de Compare June 19, 2026 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant