Skip to content

HDDS-12864. All commit semantics in replication writes#10365

Open
ss77892 wants to merge 1 commit into
apache:masterfrom
ss77892:HDDS-12864
Open

HDDS-12864. All commit semantics in replication writes#10365
ss77892 wants to merge 1 commit into
apache:masterfrom
ss77892:HDDS-12864

Conversation

@ss77892
Copy link
Copy Markdown
Contributor

@ss77892 ss77892 commented May 26, 2026

What changes were proposed in this pull request?

That's WiP patch for All commit replication writes.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12864

How was this patch tested?

Later will be covered with UTs.

Copy link
Copy Markdown

@yandrey321 yandrey321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there enough tests to cover new logic and error condition?

* <p>{@link #applyTransaction(TransactionContext)} becomes a no-op for PutBlock because
* {@code persistPutBlock()} detects the already-written BCSID and returns early.
*/
private CompletableFuture<Message> writePutBlockData(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be refactored to smaller functions to improve readability

stateMachineDataCache.put(entryIndex, write.getData());
// Cache the full proto bytes so read() can return them for slow followers
// without having to reconstruct the proto from raw bytes.
stateMachineDataCache.put(entryIndex, requestProto.toByteString());
Copy link
Copy Markdown

@yandrey321 yandrey321 May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how cache would be cleaned in case if write to disk fails?

How long data would be stored in cache and how it would be evicted over the time?

executor);

writeChunkFutureMap.put(entryIndex, new WriteFutures(future, raftFuture, startTime));
if (LOG.isDebugEnabled()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are no expensive calls or string formatting for LOG.debug() call, so it would check the log level inside and shortcircuit the call.

@ivandika3 ivandika3 self-requested a review May 27, 2026 02:29
Copy link
Copy Markdown
Contributor

@szetszwo szetszwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ss77892 , thanks for working on this! Please see the comments inlined.

// Strip data for the WAL entry (logProto).
protoBuilder.setWriteChunk(WriteChunkRequestProto.newBuilder(write).clearData());
// stateMachineData carries the full proto so followers can parse it directly.
builder.setStateMachineData(fullProtoWithData.toByteString());
Copy link
Copy Markdown
Contributor

@szetszwo szetszwo May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems an incompatible change since it changes the format of the stateMachineData. Then, an old datanode will not work with a new datanode.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a case like rolling upgrade when different versions of datanodes would be present at the same time? Let me think about it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we must rolling upgrade for datanodes.

Comment on lines +937 to +939
case PutBlock:
return writePutBlockData(requestProto, entry.getIndex(),
entry.getTerm(), writeStateMachineStartTime);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here is to write PutBlock to RocksDb when writing the RaftLog entry (via DataApi.write(..)). If the RaftLog entry is truncated, what will happen to the RocksDb record?

@adoroszlai
Copy link
Copy Markdown
Contributor

That's WiP patch for All commit replication writes.
Later will be covered with UTs.

Please create PR as draft if further updates are expected (new tests planned to be added), or inevitable (test failure seen in fork CI run should be fixed). This saves Apache resources by skipping most CI checks until PR is ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants