Skip to content
/ server Public

MDEV-38497 Remove all slave related configurations on RESET SLAVE ALL#4641

Open
deo002 wants to merge 1 commit intoMariaDB:12.3from
deo002:fix/reset_slave_all
Open

MDEV-38497 Remove all slave related configurations on RESET SLAVE ALL#4641
deo002 wants to merge 1 commit intoMariaDB:12.3from
deo002:fix/reset_slave_all

Conversation

@deo002
Copy link
Contributor

@deo002 deo002 commented Feb 11, 2026

  • The Jira issue number for this PR is: MDEV-38497

Description

RESET SLAVE ALL on the default unnamed slave does not all clear of its configurations from the server. So, certain configurations get carried onto the subsequent CHANGE MASTER.

This PR fixes it with a proper clean up of all the config options.

Class variables and their reset status:

  • user, password, host : Set in CHANGE MASTER and must be reset in RESET SLAVE command.
  • master_log_file, master_log_pos : Set in CHANGE MASTER . master_log_file can also be set using server init options. Must be reset in RESET SLAVE command.
  • ssl config options: Set in CHANGE MASTER and must be reset in RESET SLAVE ALL command.
  • rpl_filter: Set using server init options ex. replicate_do_db, replicate_ignore_db etc. Should not be reset in RESET SLAVE as it is server level configuration.
  • domain_id_filter - IGNORE_DOMAIN_IDS, DO_DOMAIN_IDS: Set in CHANGE MASTER. Should be reset in RESET SLAVE ALL.
  • rli: Stores info about the replication thread. Initialization handled by the handle_slave_sql, the slave sql thread entry point. Relevant things are reinitialized in the reset_slave, get_master_version_and_clock functions. No need to add something in this PR.
  • checksum_alg_before_fd: to hold checksum alg in use until IO thread has received FD. No need to be touched in RESET SLAVE.
  • master_connect_retry: Set in CHANGE MASTER. Needs to be setback to default value in RESET SLAVE ALL.
  • master_retry_count: Set in CHANGE MASTER. Needs to be setback to default value in RESET SLAVE ALL. Setting it to 0 will make the slave try infinitely to connect to master resulting in timeout errors in some tests.
  • connects_tried: Set to 0 in reset_slave function. No need to add something in this PR.
  • binlog_storage_engine: Set using server init option --binlog-storage-engine of the master in get_master_version_and_clock. Nothing to be done in RESET SLAVE.
  • inited, abort_slave, slave_running, slave_run_id: Set to know the slave lifecycle state. Nothing to be done in RESET SLAVE ALL.
  • clock_diff_with_master: Clock diff with master. Not to be touched in RESET SLAVE
  • master_id: Set in get_master_version_and_clock on server connect. Needs to be reset in RESET SLAVE ALL.
  • prev_master_id: Set in rotate log event. Needs to be reset in RESET SLAVE ALL.
  • sync_counter: Set using server init option --sync-master-info. Not to be touched in RESET SLAVE ALL.
  • recieved_heartbeats: Incremented during a HEARTBEAT_LOG_EVENT event. It is reset in CHANGE MASTER but also should be reset in RESET SLAVE ALL.
  • ignore_server_ids: Set in CHANGE MASTER. Needs to be reset in RESET SLAVE ALL.
  • master_use_gtid: Set in CHANGE MASTER. Needs to be reset to default value in RESET SLAVE ALL.
  • gtid_current_pos: Used when connecting to a master server with GTID. Set to last_queued_gtid when the entire gtid event is queued without issues. Needs to be reset in RESET SLAVE ALL.
  • events_queued_since_last_gtid: Set to 0 on START SLAVE, on recieving a rotate event and when gtid event is queued without issues. As START SLAVE resets it, we don't need to reset it in RESET SLAVE ALL. But it was in code previously, so not touching it.
  • last_queued_gtid, last_queued_gtid_standalone: Set during logging a GTID_EVENT. It is not cleared anywhere, it should be though, in START SLAVE . RESET SLAVE ALL doesn't seem to be a appropriate place for it.
  • gtid_reconnect_event_skip_count: Set to events_queued_since_last_gtid when a slave io thread reconnects and to 0 in START SLAVE. As START SLAVE resets it, we don't need to reset it in RESET SLAVE ALL. But it was in code previously, so not touching it.
  • gtid_event_seen: Set to true when a GTID_EVENT occurs. Set to false when slave io thread reconnects via gtid to master. RESET SLAVE ALL doesn't seem to be a appropriate place for it. But it was in code previously, so not touching it.
  • rows_event_tracker: Reset when a slave io thread starts. No need to reset it in RESET SLAVE ALL.
  • in_start_all_slaves, in_stop_all_slaves, in_flush_all_relay_logs, users, killed: Used for operations on the master info index. Not to be reset in RESET SLAVE.
  • total_ddl_groups, total_non_trans_groups, total_trans_groups: Set when a gtid event is read from the relay log. Needs to be cleaned up in RESET SLAVE ALL. These values are also shown in SHOW SLAVE STATUS.
  • parallel_mode: Set via server init option --slave-parallel-mode. No need to be reset in RESET SLAVE.
  • semi_ack: Set after parsing the semi-sync packet header when slave io thread is started. No need to be reset in RESET SLAVE.
  • do_accept_own_server_id: Initialized when the slave io thread is started. Computed when a gtid event is recieved. No need to be reset in RESET SLAVE.
  • semi_sync_reply_enabled: Not to be reset in RESET SLAVE.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@deo002 deo002 force-pushed the fix/reset_slave_all branch 2 times, most recently from ce7d020 to 3bbf843 Compare February 11, 2026 19:23
@deo002 deo002 marked this pull request as draft February 11, 2026 19:56
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Feb 12, 2026
Copy link
Contributor

@ParadoxV5 ParadoxV5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello~ Thanks for taking this on. Here’s some early feedback before I go on break.

Comment on lines +198 to +200
mi->master_ssl_ca= nullptr; mi->master_ssl_capath= nullptr; mi->master_ssl_cert= nullptr;
mi->master_ssl_cipher= nullptr; mi->master_ssl_key= nullptr; mi->master_ssl_crl= nullptr;
mi->master_ssl_crlpath= nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should fix the earlier LTS versions as well.

In either case, this section is unique to a refactor on 12.3.

sql/rpl_mi.cc Outdated
Comment on lines +186 to +188
init_ssl_config(this);
init_connection_config(this);
init_group_counters_config(this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The project isn’t just about adding clears to RESET SLAVE, but also checking which one is done where and whether they could’ve been the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked it and added tests accordingly. I'll verify it once again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll jot down the list of class variables with a comment on their "resetability" in the PR description to ensure my understanding is correct and we are on the same page.
cc: @gkodinov @bnestere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I saw RESET SLAVE already tested somewhere.
Maybe it didn’t cover RESET SLAVE ALL well.

# Cleanup
RESET REPLICA ALL;

# --source include/rpl_end.inc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please use SLAVE over REPLICA to match the other tests for the time being.
  • Perhaps the second RESET is for clearing the MASTER_HOST='localhost' part.
    But MTR is okay if you leave certain configurations set to their defaults.
  • Interesting that the CI kept working with rpl_end.inc commented out.
    Well, at least until this rpl.rpl_slave_status you reported on Zulip.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked it! The rpl.rpl_slave_status was failing due to a faulty code change. I had set mi->retry_count TO 0, which made the replica try to establish a connection infinitely, causing a timeout.

@deo002 deo002 force-pushed the fix/reset_slave_all branch 2 times, most recently from 713d688 to 30c79d9 Compare February 22, 2026 12:22
@deo002 deo002 marked this pull request as ready for review February 22, 2026 16:27
@deo002 deo002 force-pushed the fix/reset_slave_all branch 3 times, most recently from a446da1 to e1df985 Compare February 23, 2026 04:09
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. This is a preliminary review.

Please update the commit message and the text to say exactly what gets reset.
Also, please fix the failing tests in build-bot.

@deo002 deo002 force-pushed the fix/reset_slave_all branch from e1df985 to 55c1bba Compare February 24, 2026 04:14
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for working on this with me! Please stand by for the final review.

And yes, the test failure is indeed unrelated. I've re-run the failing host now.

@deo002
Copy link
Contributor Author

deo002 commented Feb 24, 2026

LGTM. Thank you for working on this with me! Please stand by for the final review.

And yes, the test failure is indeed unrelated. I've re-run the failing host now.

Always a pleasure @gkodinov !

@deo002 deo002 force-pushed the fix/reset_slave_all branch from 55c1bba to 0f7dbc6 Compare February 27, 2026 20:37
Fix RESET SLAVE ALL to fully clear replica configuration of the
default unnamed slave, including SSL options, connection config,
logging info and group info so subsequent CHANGE MASTER starts
from a clean state.

Signed-off-by: Dearsh Oberoi <oberoidearsh@gmail.com>
@deo002 deo002 force-pushed the fix/reset_slave_all branch from 0f7dbc6 to 4e254df Compare February 27, 2026 20:39
@ParadoxV5 ParadoxV5 added the Replication Patches involved in replication label Feb 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

4 participants