MDEV-38624: prevent parallel slave from skipping ER_CONNECTION_KILLED error#4658
Conversation
andrelkin
left a comment
There was a problem hiding this comment.
Hi!
Thanks for the patch, please check my two notes.
More to that, find out what possible tests cover these changes, and if none, please add them too.
Cheers,
Andrei
sql/log_event_server.cc
Outdated
| inline int ignored_error_code(int err_code) | ||
| { | ||
| // Force slave to stop on this error even if in --slave-skip-errors | ||
| if (err_code == ER_CONNECTION_KILLED) |
There was a problem hiding this comment.
While this is a possible place to mask ER_CONNECTION_KILLED error, it is less specific than
concurrency_error_code() to which ER_CONNECTION_KILLED belong just naturally.
Please consider that.
There was a problem hiding this comment.
At any rate the error-to-skip init_slave_skip_errors() function needs a warning about the supplied ER_CONNECTION_KILLED will be ignored.
There was a problem hiding this comment.
I checked concurrency_error_code(), and I think it is better than ignored_error_code(), but I need to ask about the function name, the ER_CONNECTION_KILLED, is not a concurrency error, right?
Is it good to add a connection error to concurrency errors?
There was a problem hiding this comment.
And I will check the tests and add a warning message.
Thanks a lot :)
There was a problem hiding this comment.
ER_CONNECTION_KILLED is a sort of concurrency error in the slave parallel setup. Yet its semantic is not about error, at facing it the parallel slave won't stop, rather it will retry.
However, let me adjust my yesterday point and by that to simplify this work.
Along with the init_slave_skip_errors() warning let's simply skip setting its bit in slave_error_mask.
Notice the single-threaded "legacy" slave is fine with this. There is nothing wrong with a requirement your patch would impose: if its lonely thread is connection-killed, it then must exit.
There was a problem hiding this comment.
I made the fix in init_slave_skip_errors() and added a warning in all option -where I cleared the bit map from connection error- and another warning in skip arg if it had the connection error.
Also, I made 2 tests for each senario please check them
gkodinov
left a comment
There was a problem hiding this comment.
This is a preliminary review. Please make sure about the following:
- you have buildbot tests running fine: right now you have compilation errors
- you have some extra commits showing up. Please re-clone your branch from the latest main branch, then cherry pick the diff you need (preferably in a single commit) and override push this into your tree.
- get a commit message that conforms to the coding standards please.
I will leave the technical review to Andrei. He has already started.
|
I did not mention last time, but the PR contains unrelated commits. Those are better be stripped off sooner. |
|
The preferred way to update the base branch around here is with rebase. See https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch. This guarantees that only your changes will be visible in the PR. I would suggest the following to make a clean single-commit PR:
This should ensure that your branch is based on the latest main from the upstream repo and that it has a single commit delta that contains only your changes. |
gkodinov
left a comment
There was a problem hiding this comment.
testing my script. please ignore.
af749ce to
1f1a08e
Compare
|
I cleared my branch, and it has only my commit now, but I couldn't clear the other 9 commits. I need to do |
gkodinov
left a comment
There was a problem hiding this comment.
According to https://github.com/MariaDB/server/blob/main/CODING_STANDARDS.md "Bug fixes should be based against the earliest maintained branch in which the bug can be reproduced". This is a bug fix. I'd say go for 10.6 if you can reproduce it there.
On the PR content: I'm not the best one to ask, but I believe gh will just display whatever extra commits you have in your branch. This is why I believe dropping the branch and re-creating it with a single commit containing only your change will do the trick.
The important outcome should be that your branch (as pushed in your cloned tree) should not contain more than 1 commit compared to its base branch.
andrelkin
left a comment
There was a problem hiding this comment.
Hi Nada!
The patch is close to be perfect. I have few suggestions for the test part.
Cheers,
Andrei
f308a94 to
ad82034
Compare
andrelkin
left a comment
There was a problem hiding this comment.
Thanks Nada!
The commit looks very good.
Cheers,
Andrei
gkodinov
left a comment
There was a problem hiding this comment.
Great that you managed to do a single commit. Please now add a commit message compliant to CODING_STANDARDS.md.
Also, please re-base to 10.11: this is not a critical bug fix.
Prevent ER_CONNECTION_KILLED (1927) from being added to the slave_error_mask bitmap in init_slave_skip_errors(), even when explicitly listed in --slave-skip-errors or when using --slave-skip-errors=all. A warning is emitted in the error log when either is attempted.
ad82034 to
f25ff47
Compare
|
I have rebased the commit and updated its message. Please check. |
|
Thanks a lot. Are there any other changes to get merged? |
|
Hello @gkodinov, are there any other changes? |
No description provided.