
fix: gsoc and pss checks #557

Open
martinconic wants to merge 3 commits into master from fix/gsoc-pss-checks


@martinconic (Contributor) commented Jan 2, 2026

The pt-pss check was failing intermittently with "500 Internal Server Error" or timeouts, even when messages were successfully delivered. This PR adds retry logic and decouples the send confirmation from the message receipt verification.

The pt-gsoc check was timing out under high load when sending many chunks in parallel. This PR introduces a configurable Chunks option, reduces the default parallel chunk count, and implements a smart retry mechanism that only re-broadcasts missing chunks.

PSS Check (pkg/check/pss)

Reliability: Added retry logic (5 attempts) to SendPSSMessage.
False Positive Fix: Modified the check to succeed if the PSS message is received by the listener, even if the SendPSSMessage call returns an error (e.g., timeout).
Configuration: Updated default PostageDepth to 22 to align with testnet configuration.

GSOC Check (pkg/check/gsoc)

Reliability: Implemented a retry mechanism that detects missing SOC updates after the initial run and re-broadcasts only the missing chunks.
Context Safety: Added ctx.Done() checks in the retry loop to ensure the check respects context cancellation immediately.
Configuration:
  - Added a Chunks option to control the number of parallel updates (default: 3).
  - Updated default PostageDepth to 22.
  - Added strict validation: returns an error if Chunks is ≤ 0 (previously it silently defaulted to 3).

martinconic marked this pull request as ready for review January 5, 2026 21:48
@akrem-chabchoub (Contributor)

Should the chunks field be added to the checks and local config?

@gacevicljubisa (Member)

Any idea if this relates to the same issue? #524

@nugaon, @martinconic?

@martinconic (Contributor, Author)

> Any idea if this relates to the same issue? #524
>
> @nugaon, @martinconic?

I am not sure what that PR tries to solve; I was just trying to address the errors that occur when running these checks.

gacevicljubisa requested review from nugaon March 9, 2026 12:28

@nugaon (Member) commented Mar 10, 2026

The feature/test logic should be fixed instead of loosening the criteria for a successful test, e.g. by re-uploading undelivered chunks an arbitrary number of times.

The GSOC use case does not exactly mimic a real scenario: multiple chunk uploads can happen from different nodes instead of one.
More importantly, the GSOC address should be the closest one to the receiver/listener node on the network; as @gacevicljubisa pointed out, #524 addressed this issue.
Nevertheless, I just looked into the PRs on this, and the expected behavior is not live in Bee (ethersphere/bee#5081).
As a result, GSOC payloads can be lost between two pull syncs, since Bee cannot store different payloads for the same SOC: in its current state, only one update per pull-sync period is supported.

My guess for the PSS problem is the hard-coded AddressPrefix field value. It only defines the first 4 bits of the target, which may be too shallow for the actual storage depth of the test network. Moreover, the Bee side does not even handle a hex value of length 1, so I wonder how PSS sending could ever have worked in this test. I suggest setting it to 2 characters instead, to align with the proposed target depth of 8 for GSOC.

Could you link some CI errors regarding the PSS and GSOC tests that fail?


Labels

None yet

Projects

None yet


5 participants