Skip to content

Small fixes for reconnect policy and hearbeat#834

Open
Lorak-mmk wants to merge 3 commits intoscylladb:masterfrom
Lorak-mmk:fix-host-reconnect
Open

Small fixes for reconnect policy and hearbeat#834
Lorak-mmk wants to merge 3 commits intoscylladb:masterfrom
Lorak-mmk:fix-host-reconnect

Conversation

@Lorak-mmk
Copy link
Copy Markdown

When looking into #295 and https://scylladb.atlassian.net/browse/SCYLLADB-1251 I found some minor issues that this PR fixes.

  1. Heartbeat exceptions showed some absurd values like Connection heartbeat timeout after -56.143085956573486 seconds, last_host=127.0.16.3:19042. This is because they used timeout argument passed to wait, which is not the whole timeout duration, but the duration LEFT after waiting for other futures.
  2. Reconnect policies had by default a maximum number of attempts. After that, the driver would just give up connecting.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk self-assigned this Apr 28, 2026
@Lorak-mmk Lorak-mmk requested a review from dkropachev April 28, 2026 21:18
Comment thread cassandra/policies.py
Comment thread cassandra/connection.py Outdated
Lorak-mmk added 2 commits May 4, 2026 10:58
This was kept this way to preserve legacy behavior, but I think changing
the behavior will be less of a problem than what the current behavior
causes.
The policy is used for reconnections (for example, reconnecting control
connection). If reconnect policy finishes generation (it will do so
after 64 attempts before my change), then the reconnector finish and the
driver won't attempt reconnection anymore. This would be a terrible
situation.
This better conveys what this is: not a timeut duration from config, but
how much of this timeout is left right now.
@Lorak-mmk Lorak-mmk force-pushed the fix-host-reconnect branch from 3465ec8 to da707be Compare May 4, 2026 09:43
@Lorak-mmk
Copy link
Copy Markdown
Author

Addressed review comments

@Lorak-mmk Lorak-mmk requested a review from dkropachev May 4, 2026 09:43
@dkropachev
Copy link
Copy Markdown
Collaborator

@Lorak-mmk , you need to fix tests that assert this exception

The timeout argument in `wait` tells how much we need to wait taking
into consideration that we already waited for some other futures.
The total wait time that this future had available to complete is
different: it includes time we spent waiting for other futures.
This created confusing hearbeat messages, that could even show negative
wait times.
I fixed it by putting both timeouts in the error message. The `timeout`
parameter of `OperationTimedOut` I changed to the original timeout
because I think it is more useful and relevant here.
@Lorak-mmk Lorak-mmk force-pushed the fix-host-reconnect branch from da707be to 4e2e9bb Compare May 4, 2026 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants