rptest: add list_offsets leader epoch test #30285
Open
nguyen-andrew wants to merge 4 commits into redpanda-data:dev from
Pull request overview
Adds a new rptest that exercises Kafka ListOffsets v4 leader-epoch semantics across earliest/latest/timequery and empty-partition paths, running the same procedure against Apache Kafka (as baseline) and Redpanda.
Changes:
- Introduces custom ListOffsets v4 request/response schema helpers to query leader_epoch from protocol responses.
- Adds shared test logic to create a record-epoch vs leader-epoch gap by restarting leaders and then validating returned epochs for different ListOffsets timestamp modes.
- Adds Redpanda and Apache Kafka test classes that run the same assertions with cluster-specific leader-restart mechanics.
Exercises ListOffsets v4 for the earliest, latest, timequery, and empty-partition paths. Produces records at a single epoch, advances the leader epoch 3 times via leader restart, and asserts the returned epoch against the expected value for each cluster: Kafka returns the record's historical epoch for earliest/timequery (the correct behavior we compare against), while Redpanda currently returns the current leader epoch. The empty-partition variant drives the same flow with zero records to cover the no-records branch. Covers CORE-12505: Redpanda returns the current leader epoch instead of the record's historical epoch for earliest, timequery, and empty partition paths. The Redpanda test pins today's buggy behavior so it will fail once the bug is fixed and can be flipped to assert correct behavior.
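The expectation for the earliest and timequery paths can be sketched with a small in-memory model. This is not the test code from this PR: `Record`, `list_offsets_epoch`, and the `returns_record_epoch` flag are hypothetical names, the starting epoch of 1 is illustrative, and the latest and empty-partition paths are deliberately left out of the model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp_ms: int
    leader_epoch: int  # epoch of the leader that appended this record


def list_offsets_epoch(
    records: Sequence[Record],
    current_leader_epoch: int,
    target_ts_ms: Optional[int] = None,
    returns_record_epoch: bool = True,
) -> Optional[int]:
    """Model the leader_epoch returned for the earliest / timequery paths.

    target_ts_ms=None models "earliest"; otherwise the first record with
    timestamp >= target_ts_ms is matched. Returns None when nothing matches
    (that case is outside this model).
    """
    if target_ts_ms is None:
        match = records[0] if records else None
    else:
        match = next((r for r in records if r.timestamp_ms >= target_ts_ms), None)
    if match is None:
        return None
    # returns_record_epoch=True: the matched record's historical epoch.
    # returns_record_epoch=False: the current leader epoch (behavior pinned
    # for Redpanda under CORE-12505).
    return match.leader_epoch if returns_record_epoch else current_leader_epoch


# Records produced at a single (illustrative) epoch, then three epoch advances.
records = [Record(offset=i, timestamp_ms=1_000 + i, leader_epoch=1) for i in range(5)]
current_epoch = 1 + 3

kafka_like = list_offsets_epoch(records, current_epoch)
redpanda_today = list_offsets_epoch(records, current_epoch, returns_record_epoch=False)
```

With this setup the two modes disagree by exactly the number of epoch advances, which is the gap the tests assert on.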
44c733a to 5deffbd
Adds optional kwargs `topics: list[str] | None` and `allow_new_topics: bool` (default False) to RpkTool.group_seek_to. Backward-compatible: existing callers passing only (group, to) are unchanged.

Needed by the CORE-12505 e2e tests' throwaway-hack flow, which seeks a fresh consumer group to a topic the group has not yet consumed -- rpk requires --allow-new-topics for that case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
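A minimal sketch of how the new kwargs could translate into command-line arguments while leaving existing calls unchanged. `build_group_seek_args` is a hypothetical helper, not the actual `RpkTool.group_seek_to` body; `--allow-new-topics` is the flag named above, while passing the topic list through a comma-separated `--topics` flag is an assumption.

```python
from typing import List, Optional


def build_group_seek_args(
    group: str,
    to: str,
    topics: Optional[List[str]] = None,
    allow_new_topics: bool = False,
) -> List[str]:
    """Build the argument list for an `rpk group seek` invocation (sketch)."""
    args = ["group", "seek", group, "--to", to]
    if topics:
        # Assumption: topics are passed as one comma-separated --topics value.
        args += ["--topics", ",".join(topics)]
    if allow_new_topics:
        # Needed when the group has not yet consumed from the topic.
        args.append("--allow-new-topics")
    return args
```

Callers that pass only `(group, to)` get the same argument list as before, which is what keeps the change backward-compatible.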
Adds `clean_shutdown: bool = False` to RpkConsumer.__init__ and threads it into stop_node's kill_process call. Default preserves the existing SIGKILL behavior; opt in via clean_shutdown=True to send SIGTERM, which triggers rpk's signal handler at consume.go:118-152 -> franz-go client.Close() -> final commit + LeaveGroup before the process exits.

Used by the CORE-12505 e2e tests, where sequential consumers in the same group would otherwise wait ~45s for the prior member's session timeout before joining.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
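The signal selection described above amounts to a one-line choice. The sketch below uses a hypothetical `stop_signal` helper rather than the real `stop_node` / `kill_process` plumbing, and assumes a POSIX host where `SIGKILL` is defined.

```python
import signal


def stop_signal(clean_shutdown: bool = False) -> signal.Signals:
    """Pick the signal used to stop the consumer process (sketch).

    SIGKILL (the default) cannot be caught, so the consumer gets no chance to
    commit or leave the group. SIGTERM can be handled, which lets the process
    shut down gracefully before exiting.
    """
    return signal.SIGTERM if clean_shutdown else signal.SIGKILL
```

Keeping `False` as the default means existing users of the consumer see no behavior change.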
Adds two ducktape tests on ListOffsetsLeaderEpochRedpandaTest:

- test_seek_to_start_poisons_commit -- end-to-end reproduction of the bug via `rpk group seek --to start`. With a stagnant topic and an epoch gap, the seek writes (0, current_epoch) into __consumer_offsets; an `rpk topic consume --group` consumer (franz-go AutoCommitMarks) reads all records but commits don't advance; restart triggers a full replay. The bug is broader than this single trigger -- it manifests for any seek that resolves to records produced at an older leader epoch (`--to start`, `--to <past-timestamp>`, or any tool that does ListOffsets earliest/timequery -> OffsetCommit). `--to start` is exercised here because it's the most reproducible trigger.
- test_throwaway_hack_mitigates_seek_to_start -- mirror test verifying the --to-file-with-throwaway-group hack as a mitigation. Seeded commit is (0, -1); franz-go's EpochOffset.Less comparator clamp lets real marks advance the head; restart reads zero records.

Supporting additions in the same file:

- OffsetFetchRequest_v5 / OffsetFetchResponse_v5 protocol classes, needed because `rpk group describe` exposes only CURRENT-OFFSET, not committed_leader_epoch.
- _get_committed helper on the base class -- issues OffsetFetch v5 directly via the kafka admin client and returns (offset, epoch).
- _consume_and_wait_for_autocommit helper on the Redpanda subclass -- wraps RpkConsumer with clean_shutdown=True for graceful LeaveGroup.
- _apply_throwaway_hack helper -- runs the literal 5-step hack with try/finally cleanup of the throwaway group and the local seek file.

_setup_topic_with_epoch_gap is parameterized with num_epoch_advances (default 3 to preserve the existing epoch-correctness tests' behavior; the new tests pass 1, since the bug only requires current_epoch > initial_epoch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
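Why a seeded commit of (0, current_epoch) stalls while (0, -1) does not can be illustrated with a simplified model of an epoch-first ordering. This is a Python sketch, not franz-go's code: `EpochOffset`, `less`, and `advance_head` are illustrative names, the ordering (epoch first, then offset) is an assumption about how the comparator behaves, and the epoch values are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EpochOffset:
    epoch: int
    offset: int


def less(a: EpochOffset, b: EpochOffset) -> bool:
    """Assumed ordering: compare epochs first, then offsets within an epoch."""
    return a.epoch < b.epoch or (a.epoch == b.epoch and a.offset < b.offset)


def advance_head(head: EpochOffset, marks: Iterable[EpochOffset]) -> EpochOffset:
    """Move the commit head forward only when a mark orders after it."""
    for mark in marks:
        if less(head, mark):
            head = mark
    return head


# Ten records produced at epoch 1; the leader epoch has since advanced to 2.
marks = [EpochOffset(epoch=1, offset=o) for o in range(1, 11)]

# Seek wrote (offset 0, current epoch 2): every mark orders before the head.
poisoned = advance_head(EpochOffset(epoch=2, offset=0), marks)

# Throwaway-group hack seeded (offset 0, epoch -1): marks order after the head.
mitigated = advance_head(EpochOffset(epoch=-1, offset=0), marks)
```

In this model the poisoned head never moves off offset 0, so a restarted consumer replays everything, while the head seeded with epoch -1 advances to the last mark.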
This PR adds ducktape coverage for CORE-12505: Redpanda returns the current leader epoch instead of the record's historical epoch on the ListOffsets earliest, timequery-match, and empty-partition paths.
Server-side response shape.

`test_list_offsets_epoch` and `test_list_offsets_epoch_empty_partition` exercise ListOffsets v4 against both Kafka and Redpanda for the earliest, latest, timequery, and empty-partition paths. Produces records at a single epoch, advances the leader epoch 3 times via leader restart, and asserts the returned epoch per path. Kafka returns the record's historical epoch, which is the correct behavior we compare against. Redpanda currently returns the current leader epoch. The Redpanda test pins today's buggy behavior so it will fail once the bug is fixed and can be flipped to assert correct behavior. The empty-partition variant covers the no-records branch.

Consumer-level end-to-end.

`test_seek_to_start_poisons_commit` reproduces the bug end-to-end via `rpk group seek --to start`. With a stagnant topic and an epoch gap, the seek writes `(0, current_epoch)` into `__consumer_offsets`. An `rpk topic consume --group` consumer (franz-go `AutoCommitMarks`) reads all records but commits don't advance. Restart triggers a full replay. The bug applies to any seek that resolves to records at an older leader epoch; `--to start` is the most reproducible trigger.

`test_throwaway_hack_mitigates_seek_to_start` verifies the `--to-file`-with-throwaway-group mitigation. The seeded commit is `(0, -1)`. franz-go's `EpochOffset.Less` comparator clamp lets real marks advance the head. Restart reads zero records.

Supporting changes.

`RpkTool.group_seek_to` gains optional `topics` / `allow_new_topics` kwargs (backward-compatible). `RpkConsumer` gains an optional `clean_shutdown` kwarg (default `False`; opt-in `True` sends SIGTERM so franz-go can flush a final commit and send `LeaveGroup`). New `OffsetFetchRequest_v5` / `OffsetFetchResponse_v5` protocol classes plus a `_get_committed` helper, since `rpk group describe` doesn't expose `committed_leader_epoch`.

Backports Required
Release Notes