netty: Preserve early server handshake failure cause in logs #12626

Open
becomeStar wants to merge 2 commits into grpc:master from becomeStar:netty/propagate-handshake-failure
@becomeStar (Contributor) commented Jan 25, 2026

Early server-side negotiation failures may terminate a transport before NettyServerHandler is fully active in the pipeline. In those cases, the original handshake failure can be missing from transport termination logging because termination may rely on connectionError(), which can be null on this early path.

This change adds a server-side NOOP write in NettyServerTransport.start() (analogous to the existing client-side NOOP write path). If that write fails, its cause is passed to notifyTerminated(), preserving and logging the original transport termination reason for debugging.
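
The probe-write idea can be sketched in isolation as follows. This is a minimal standalone model, not the code in this PR: `ProbeWriteSketch`, its `Channel` interface, and its `Transport` class are hypothetical stand-ins, and `CompletableFuture` takes the place of Netty's write future. Only the names `start()` and `notifyTerminated()` mirror the description above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of the probe write; not the real NettyServerTransport. */
final class ProbeWriteSketch {
  /** Sentinel playing the role of the NOOP message. */
  static final Object NOOP = new Object();

  /** Assumed channel abstraction: a write completes normally or exceptionally. */
  interface Channel {
    CompletableFuture<Void> write(Object msg);
  }

  static final class Transport {
    private final AtomicReference<Throwable> terminationCause = new AtomicReference<>();
    private final Channel channel;

    Transport(Channel channel) {
      this.channel = channel;
    }

    /** Issues the probe write and reports its failure cause, if any. */
    void start() {
      channel.write(NOOP).whenComplete((ignored, cause) -> {
        if (cause != null) {
          // The write failed, so the cause of that failure is the best
          // available explanation for why the transport is going away.
          notifyTerminated(cause);
        }
      });
    }

    /** Keeps only the first reported cause. */
    void notifyTerminated(Throwable cause) {
      terminationCause.compareAndSet(null, cause);
    }

    Throwable terminationCause() {
      return terminationCause.get();
    }
  }
}
```

In this model a failed negotiation is represented by a channel whose write future completes exceptionally; the transport then holds that exception as its termination cause instead of null.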

To support this, NettyServerHandler now accepts NOOP_MESSAGE writes by writing an empty buffer, and tests are added to verify:

  • transport failure logging for plaintext-client to TLS-server failure
  • server NOOP write handling in NettyServerHandler
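
For illustration only, the NOOP handling can be modeled as a write dispatch that forwards an empty buffer for the sentinel. The class `NoopDispatchSketch`, its `Handler`, and the `downstream` list are hypothetical; the real handler writes into the Netty pipeline rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of NOOP handling; not the real NettyServerHandler. */
final class NoopDispatchSketch {
  /** Sentinel playing the role of NOOP_MESSAGE. */
  static final Object NOOP_MESSAGE = new Object();

  static final class Handler {
    /** Stand-in for whatever sits after this handler in the pipeline. */
    final List<byte[]> downstream = new ArrayList<>();

    void write(Object msg) {
      if (msg == NOOP_MESSAGE) {
        // Forward an empty buffer: nothing goes on the wire, but the write
        // still travels the pipeline and can fail with the pending cause.
        downstream.add(new byte[0]);
      } else if (msg instanceof byte[]) {
        downstream.add((byte[]) msg);
      } else {
        throw new IllegalArgumentException(
            "unsupported message type: " + msg.getClass().getName());
      }
    }
  }
}
```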

Fixes #8495

When a handshake failure occurs before any writes are buffered on the server
side, WriteBufferingAndExceptionHandler can record the failure internally
but never surface it to downstream inbound handlers.

This makes the original handshake error unobservable and complicates
debugging and instrumentation.

Propagate only the first failure via exceptionCaught, gated on the absence
of a previous failure, so that the canonical error becomes observable while
avoiding duplicate propagation and preserving existing close semantics.
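
A minimal sketch of the "first failure only" gate described in this commit message, written as a standalone model. `FirstFailureSketch` and its `propagated` list are hypothetical; in the real handler, propagation would go to the next inbound handler rather than into a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of first-failure propagation; not the real handler. */
final class FirstFailureSketch {
  static final class Handler {
    private Throwable failCause;

    /** Stand-in for downstream inbound handlers observing the failure. */
    final List<Throwable> propagated = new ArrayList<>();

    void exceptionCaught(Throwable cause) {
      if (failCause == null) {
        // First failure: record it and make it observable downstream.
        failCause = cause;
        propagated.add(cause);
      }
      // Later failures are neither recorded nor propagated, so the
      // canonical error stays the first one.
    }

    Throwable failCause() {
      return failCause;
    }
  }
}
```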

@kannanjgithub (Contributor) commented:

I replied with my thoughts on issue #8495.

@becomeStar (Contributor, Author) commented:

@kannanjgithub

Thank you very much for your detailed analysis and for taking the time to simulate the failure. Your observation about the object handle changing is incredibly helpful and provides a clear clue as to why the original root cause may be getting lost.

It does seem that failCause can effectively be reset when a new instance of WriteBufferingAndExceptionHandler is introduced into the pipeline, which explains why a secondary exception ends up being surfaced instead of the original handshake failure.

I’ll dig further into where and why the handler instance is being replaced and look for a way to ensure the first meaningful exception is preserved across instances.

Based on your feedback, I’ll work toward a refined solution that addresses this state-loss issue directly. Once I have a clearer fix, I can either update this PR or follow up with a new one, depending on what you think makes the most sense.

Thanks again for the detailed investigation and guidance; it has been extremely helpful.

Add a server-side NOOP write path to surface early negotiation failures through notifyTerminated().

When negotiation fails before NettyServerHandler becomes active in the pipeline, connectionError() can remain null and the transport may log an unhelpful termination reason.

This preserves the original failure cause in transport logs and adds coverage for the server NOOP write path.

@becomeStar changed the title from "netty: Propagate initial handshake failure before close" to "netty: Preserve early server handshake failure cause in logs" on Feb 19, 2026

@becomeStar (Contributor, Author) commented:

@kannanjgithub

Thanks again for the guidance.

I added one new commit to this PR following the direction we discussed: a server-side NOOP write path in NettyServerTransport.start() that reports the write failure cause to notifyTerminated(...), plus the corresponding server-side NOOP handling and tests.

Could you please take another look when you have a moment?

Development

Successfully merging this pull request may close these issues.

Netty server loses exception during handshake