[None][fix] Make NIXL port-lock path configurable via TRTLLM_NIXL_PORT_LOCK_PATH#15500
[None][fix] Make NIXL port-lock path configurable via TRTLLM_NIXL_PORT_LOCK_PATH#15500CodersAcademy006 wants to merge 3 commits into
Conversation
NixlTransferAgent hardcodes the port-selection file lock at /tmp/trtllm_nixl_port.lock. On a shared /tmp (multi-tenant hosts, colocated containers) independent deployments contend on the same lock, serializing unrelated agents' startup. Add getEnvNixlPortLockPath(), overridable via TRTLLM_NIXL_PORT_LOCK_PATH, defaulting to the existing path so behavior is unchanged unless set. The lock still serializes port selection among agents that share the path, preserving the coordination it exists for. Fixes NVIDIA#13950 Signed-off-by: Srijan Upadhyay <srjnupadhyay@gmail.com>
Split a pure resolveNixlPortLockPath() out of getEnvNixlPortLockPath() so the resolution logic is testable without the once-cached env read, mirroring the populateIterationIndexesImpl pattern in cudaProfilerUtils. Covers the default (null), set-but-empty, and override cases. Signed-off-by: Srijan Upadhyay <srjnupadhyay@gmail.com>
📝 WalkthroughWalkthroughAdds ChangesConfigurable NIXL Port Lock Path
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
cpp/tensorrt_llm/common/envUtils.cpp (1)
312-314: ⚡ Quick winExtract the default lock path into a named constant.
Line 314 embeds the default path literal directly; please move it to a
k-prefixed constant and reuse it in the resolver.Proposed change
+namespace +{ +char const* const kDefaultNixlPortLockPath = "/tmp/trtllm_nixl_port.lock"; +} // namespace + std::string resolveNixlPortLockPath(char const* envValue) { - return (envValue && *envValue) ? std::string(envValue) : std::string("/tmp/trtllm_nixl_port.lock"); + return (envValue && *envValue) ? std::string(envValue) : std::string(kDefaultNixlPortLockPath); }As per coding guidelines, non-exempt literals should be extracted into named constants, and constants should use
k-prefixed camelCase names.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tensorrt_llm/common/envUtils.cpp` around lines 312 - 314, The function `resolveNixlPortLockPath` contains a hardcoded string literal "/tmp/trtllm_nixl_port.lock" in its return statement on line 314. Extract this literal into a module-level constant using the k-prefix naming convention (e.g., kDefaultNixlPortLockPath or similar) and replace the embedded string in the function with a reference to this constant. Ensure the constant is defined before the function and follows the k-prefixed camelCase naming pattern as per coding guidelines.Source: Coding guidelines
cpp/tests/unit_tests/common/envUtilsTest.cpp (1)
26-27: ⚡ Quick winUse named constants for expected lock paths in assertions.
Please initialize the expected paths as
k-prefixed constants and reference them inEXPECT_EQcalls instead of inline literals.Proposed change
TEST(EnvUtils, ResolveNixlPortLockPathDefault) { + std::string const kDefaultNixlPortLockPath = "/tmp/trtllm_nixl_port.lock"; // Unset (nullptr) and set-but-empty both fall back to the default path. - EXPECT_EQ(tc::resolveNixlPortLockPath(nullptr), "/tmp/trtllm_nixl_port.lock"); - EXPECT_EQ(tc::resolveNixlPortLockPath(""), "/tmp/trtllm_nixl_port.lock"); + EXPECT_EQ(tc::resolveNixlPortLockPath(nullptr), kDefaultNixlPortLockPath); + EXPECT_EQ(tc::resolveNixlPortLockPath(""), kDefaultNixlPortLockPath); } TEST(EnvUtils, ResolveNixlPortLockPathOverride) { + std::string const kOverrideNixlPortLockPath = "/run/deployA/nixl_port.lock"; // A non-empty value is used verbatim. - EXPECT_EQ(tc::resolveNixlPortLockPath("/run/deployA/nixl_port.lock"), "/run/deployA/nixl_port.lock"); + EXPECT_EQ(tc::resolveNixlPortLockPath(kOverrideNixlPortLockPath.c_str()), kOverrideNixlPortLockPath); }As per coding guidelines, non-exempt literals should be extracted into named constants, and constants should use
k-prefixed camelCase names.Also applies to: 33-33
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/unit_tests/common/envUtilsTest.cpp` around lines 26 - 27, The test file contains hardcoded string literals "/tmp/trtllm_nixl_port.lock" repeated in multiple EXPECT_EQ assertions when testing the resolveNixlPortLockPath function. Extract this literal into a k-prefixed named constant (e.g., kExpectedNixlPortLockPath) at the top of the test file or test class, then replace all inline string literals in the EXPECT_EQ calls with references to this constant to follow the coding guidelines for non-exempt literals.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cpp/tensorrt_llm/common/envUtils.cpp`:
- Around line 312-314: The function `resolveNixlPortLockPath` contains a
hardcoded string literal "/tmp/trtllm_nixl_port.lock" in its return statement on
line 314. Extract this literal into a module-level constant using the k-prefix
naming convention (e.g., kDefaultNixlPortLockPath or similar) and replace the
embedded string in the function with a reference to this constant. Ensure the
constant is defined before the function and follows the k-prefixed camelCase
naming pattern as per coding guidelines.
In `@cpp/tests/unit_tests/common/envUtilsTest.cpp`:
- Around line 26-27: The test file contains hardcoded string literals
"/tmp/trtllm_nixl_port.lock" repeated in multiple EXPECT_EQ assertions when
testing the resolveNixlPortLockPath function. Extract this literal into a
k-prefixed named constant (e.g., kExpectedNixlPortLockPath) at the top of the
test file or test class, then replace all inline string literals in the
EXPECT_EQ calls with references to this constant to follow the coding guidelines
for non-exempt literals.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 194b5dc1-2adf-49eb-9a53-d556445c291a
📒 Files selected for processing (5)
cpp/tensorrt_llm/common/envUtils.cppcpp/tensorrt_llm/common/envUtils.hcpp/tensorrt_llm/executor/cache_transmission/nixl_utils/transferAgent.cppcpp/tests/unit_tests/common/CMakeLists.txtcpp/tests/unit_tests/common/envUtilsTest.cpp
Address review: replace the repeated /tmp/trtllm_nixl_port.lock literal with kDefaultNixlPortLockPath (k-prefixed, per coding guidelines) in the resolver, and use named constants in the test assertions. Signed-off-by: Srijan Upadhyay <srjnupadhyay@gmail.com>
|
Thanks for the review. Addressed in 8b12101: Nitpicks (both applied):
Docstring Coverage check (warning): holding on this one. The 0% is the file-wide baseline — none of the existing For context on the design: the |
Description
NixlTransferAgenthardcodes the port-selection file lock at/tmp/trtllm_nixl_port.lock. On a shared/tmp(multi-tenant hosts, colocated containers), independent deployments contend on the same lock, serializing the startup of otherwise-unrelated agents.This adds
getEnvNixlPortLockPath(), overridable via theTRTLLM_NIXL_PORT_LOCK_PATHenv var and defaulting to the existing/tmp/trtllm_nixl_port.lockso behavior is unchanged unless the var is set. The lock still serializes port selection among agents that share the path, preserving the coordination it exists for — operators just scope it per deployment to avoid cross-tenant contention.An env var (rather than a config field) keeps parity with the existing NIXL env helpers (
TRTLLM_NIXL_PORT,TRTLLM_NIXL_INTERFACE, …) and is a deployment-time ops knob, not a model parameter.Fixes #13950
Changes
common/envUtils.{h,cpp}: addgetEnvNixlPortLockPath()(samestd::call_oncepattern as the other NIXL env helpers), with a pureresolveNixlPortLockPath()underneath for testability — mirrorspopulateIterationIndexesImplincudaProfilerUtils.nixl_utils/transferAgent.cpp: use it for theFileLockand the throw message (which now prints the resolved path).Test Coverage
cpp/tests/unit_tests/common/envUtilsTest.cpp(new):resolveNixlPortLockPathreturns the default on null and on set-but-empty, and the override verbatim when set.PR Checklist
Summary by CodeRabbit
New Features
Tests