netty: Preserve early server handshake failure cause in logs#12626
becomeStar wants to merge 2 commits into grpc:master
When a handshake failure occurs before any writes are buffered on the server side, WriteBufferingAndExceptionHandler can record the failure internally but never surface it to downstream inbound handlers. This makes the original handshake error unobservable and complicates debugging and instrumentation. Propagate only the first failure via exceptionCaught, gated on the absence of a previous failure, so that the canonical error becomes observable while avoiding duplicate propagation and preserving existing close semantics.
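The gating described above can be illustrated with a minimal sketch in plain Java. It has no Netty dependency, and `FirstFailureGate` and its `downstream` callback are hypothetical stand-ins for illustration, not the actual WriteBufferingAndExceptionHandler code. It assumes that only the first failure is recorded and forwarded, and that later failures are dropped.

```java
import java.util.function.Consumer;

/**
 * Hypothetical stand-in for the gating logic; not the real
 * WriteBufferingAndExceptionHandler.
 */
final class FirstFailureGate {
  // Stand-in for "downstream inbound handlers" (assumption for illustration).
  private final Consumer<Throwable> downstream;
  // Null until the first failure is seen.
  private Throwable failCause;

  FirstFailureGate(Consumer<Throwable> downstream) {
    this.downstream = downstream;
  }

  /** Records and forwards the failure only if no earlier failure exists. */
  void exceptionCaught(Throwable cause) {
    Throwable previousFailure = failCause;
    if (previousFailure == null) {
      failCause = cause;
      downstream.accept(cause); // surface the canonical error exactly once
    }
    // Later failures are neither recorded nor forwarded in this sketch.
  }

  Throwable failCause() {
    return failCause;
  }
}
```

Calling `exceptionCaught` twice on one gate forwards only the first cause, which is the "no duplicate propagation" property the commit message describes.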
I replied with my thoughts on issue #8495.
Thank you very much for your detailed analysis and for taking the time to simulate the failure. Your observation about the object handle changing is incredibly helpful and provides a clear clue as to why the original root cause may be getting lost. It does seem that failCause can effectively be reset when a new instance of WriteBufferingAndExceptionHandler is introduced into the pipeline, which explains why a secondary exception ends up being surfaced instead of the original handshake failure. I’ll dig further into where and why the handler instance is being replaced and look for a way to ensure the first meaningful exception is preserved across instances. Based on your feedback, I’ll work toward a refined solution that addresses this state-loss issue directly. Once I have a clearer fix, I can either update this PR or follow up with a new one, depending on what you think makes the most sense. Thanks again for the detailed investigation and guidance; it’s been extremely helpful.
Add a server-side NOOP write path to surface early negotiation failures through notifyTerminated(). When negotiation fails before NettyServerHandler becomes active in the pipeline, connectionError() can remain null and the transport may log an unhelpful termination reason. This preserves the original failure cause in transport logs and adds coverage for the server NOOP write path.
Thanks again for the guidance. I added one new commit to this PR following the direction we discussed: a server-side NOOP write path. Could you please take another look when you have a moment?
Early server-side negotiation failures may terminate a transport before NettyServerHandler is fully active in the pipeline. In those cases, the original handshake failure can be missing from transport termination logging because termination may rely on connectionError(), which can be null on this early path.
This change adds a server-side NOOP write in NettyServerTransport.start() (analogous to the existing client-side NOOP write path). If that write fails, its cause is passed to notifyTerminated(), preserving and logging the original transport termination reason for debugging.
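As a rough illustration of that flow, the sketch below models the NOOP write as a `CompletableFuture` instead of a Netty `ChannelFuture`, so it runs without any Netty dependency. `NoopWriteProbe` and its methods are hypothetical names chosen for this example. They are not the actual NettyServerTransport API, and the real change may differ in detail.

```java
import java.util.concurrent.CompletableFuture;

/**
 * Hypothetical model of a transport that probes the pipeline with a NOOP
 * write; not the real NettyServerTransport.
 */
final class NoopWriteProbe {
  // The first termination cause reported, or null if none yet.
  private Throwable terminationCause;

  /**
   * Registers a listener on the (simulated) NOOP write. If the write fails,
   * for example because negotiation failed early, the failure cause is
   * handed to notifyTerminated.
   */
  void start(CompletableFuture<Void> noopWrite) {
    noopWrite.whenComplete((ignored, cause) -> {
      if (cause != null) {
        notifyTerminated(cause);
      }
    });
  }

  /** Keeps the first cause so the original failure is what gets logged. */
  void notifyTerminated(Throwable cause) {
    if (terminationCause == null) {
      terminationCause = cause;
    }
  }

  Throwable terminationCause() {
    return terminationCause;
  }
}
```

In this model, failing the write future with the handshake exception leaves that same exception as the recorded termination cause, without relying on a separate connection-error lookup that may be null on the early path.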
To support this, NettyServerHandler now accepts NOOP_MESSAGE writes by writing an empty buffer, and tests are added to verify:
Fixes #8495