Conversation
| // the guest Linux stacks – to the logrus logger tagged with the sandbox ID. | ||
| // This provides an out-of-band diagnostic snapshot without requiring the shim | ||
| // to be paused or restarted. | ||
| func ETWCallback(sourceID guid.GUID, state etw.ProviderState, level etw.Level, matchAnyKeyword uint64, matchAllKeyword uint64, filterData uintptr) { |
If we do the ETW setup here, do we need to export this function, or can we just have the plugin do the ETW registration and callback registration?
| events chan interface{} | ||
|
|
||
| // sandboxID is the unique identifier for the sandbox managed by this Service instance. | ||
| sandboxID string |
In LCOW's case the sandbox is always the UVM ID, right? Can we just call it that here, or at least say that in the comment?
So, the UVM ID is not this identifier, though it is derived from this one. This sandboxID corresponds to the ID which is stored and used by containerd. The VM's ID is "SandboxID@vm".
Maybe we can add this to the comments:
For LCOW shim, sandboxID corresponds 1-1 with the UtilityVM managed by the shim.
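To make the derivation concrete, here is a minimal sketch of how the VM ID could be built from the containerd sandbox ID. The helper name `vmID` is hypothetical; only the "SandboxID@vm" shape comes from the discussion above.

```go
package main

import "fmt"

// vmID derives the VM's identifier from the containerd sandbox ID by
// appending the "@vm" suffix described in the thread. The function name
// is illustrative, not the shim's actual API.
func vmID(sandboxID string) string {
	return fmt.Sprintf("%s@vm", sandboxID)
}
```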
| sandboxID string | ||
|
|
||
| // vmHcsDocument holds the HCS compute-system document used to create the VM. | ||
| vmHcsDocument *hcsschema.ComputeSystem |
I thought the whole point of the VM controller was to have things like this hidden away, so they are always up to date and always ref-counted when a modification is made?
| span.AddAttributes( | ||
| trace.StringAttribute("sandbox-id", request.SandboxID), | ||
| trace.StringAttribute("bundle", request.BundlePath), | ||
| trace.StringAttribute("net-ns-path", request.NetnsPath)) |
The property is netns, one word. Should we use a property-constants file to control this for later updates and consistency?
internal/controller/vm/status.go
| // atomicState is a concurrency-safe VM state holder backed by an atomic int32. | ||
| // All reads and writes go through atomic operations, so no mutex is required | ||
| // for state itself. | ||
| type atomicState struct { |
This is fine but seems excessive
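For reference, a minimal sketch of what such a holder looks like with `sync/atomic`; the state names are assumed from the surrounding diff, and the types are illustrative rather than the PR's exact code.

```go
package main

import "sync/atomic"

type State int32

const (
	StateNotCreated State = iota
	StateCreated
	StateRunning
	StateStopped
)

// atomicState holds the VM state in an atomic int32. Single reads and
// writes need no mutex, though compound check-then-act sequences on the
// state still require external synchronization.
type atomicState struct {
	v atomic.Int32
}

func (s *atomicState) store(st State) { s.v.Store(int32(st)) }
func (s *atomicState) load() State    { return State(s.v.Load()) }
```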
internal/controller/vm/vm.go
| // We will create the guest connection via GuestManager during StartVM. | ||
| c.guest = guestmanager.New(ctx, uvm) | ||
|
|
||
| c.vmState.store(StateCreated) |
See, you already had the mutex, so setting this didn't need to be atomic.
internal/controller/vm/vm.go
| c.mu.Lock() | ||
| defer c.mu.Unlock() | ||
|
|
||
| if c.vmState.load() == StateRunning { |
For these I'd prefer a single read and then handle the if cases, if we stay with atomic. If it's just a property guarded by the mutex, it's fine.
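A sketch of the single-read pattern being suggested, assuming an atomic state field; the type and method names here are hypothetical.

```go
package main

import (
	"errors"
	"sync/atomic"
)

type State int32

const (
	StateNotCreated State = iota
	StateCreated
	StateRunning
	StateStopped
)

type controller struct {
	vmState atomic.Int32
}

// startCheck reads the atomic state exactly once into a local value and
// branches on that copy. Loading it separately in each condition could
// observe two different values if another goroutine stores in between.
func (c *controller) startCheck() error {
	switch State(c.vmState.Load()) {
	case StateRunning:
		return errors.New("VM is already running")
	case StateStopped:
		return errors.New("VM has already stopped")
	default:
		return nil
	}
}
```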
| // use parent context, to prevent 2 minute timeout (set above) from overriding terminate operation's | ||
| // timeout and erroring out prematurely | ||
| _ = c.uvm.Terminate(pCtx) | ||
| return fmt.Errorf("failed to start VM: %w", err) |
Shouldn't cases like this set it to state stopped as well? I wonder if you want any error from this method to set state stopped?
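One way to implement the "any error sets state stopped" idea is a single deferred check on a named error return. This is a sketch under assumed type and state names; `startVM` stands in for the real boot sequence.

```go
package main

import (
	"errors"
	"sync/atomic"
)

type State int32

const (
	StateNotCreated State = iota
	StateCreated
	StateRunning
	StateStopped
)

type Manager struct {
	vmState atomic.Int32
}

var errBoot = errors.New("boot failure")

// Start moves the VM to StateStopped on any error return via a single
// deferred check, so individual failure paths need not each remember to
// do it. startVM is a hypothetical stand-in for the real boot logic.
func (m *Manager) Start(startVM func() error) (err error) {
	defer func() {
		if err != nil {
			m.vmState.Store(int32(StateStopped))
		}
	}()
	m.vmState.Store(int32(StateCreated))
	if err := startVM(); err != nil {
		return err
	}
	m.vmState.Store(int32(StateRunning))
	return nil
}
```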
internal/controller/vm/vm.go
| } | ||
|
|
||
| for _, driver := range drivers { | ||
| _ = driver |
In this commit, we are adding the initial draft of `containerd-shim-lcow-v1`. This shim is used for running Linux Containers on Windows. As part of this commit, we are adding the following:
- The main entrypoint and supporting logic to start and serve the shim server.
- The stubs for the Task service and Shimdiag service.
- The implementation for the `Sandbox` service.
- The implementation for the `VM Controller`, which manages the VM lifecycle and operations.

Signed-off-by: Harsh Rawat <harshrawat@microsoft.com>
| events chan interface{} | ||
|
|
||
| // sandboxID is the unique identifier for the sandbox managed by this Service instance. | ||
| // For LCOW shim, sandboxID corresponds 1-1 with the UtilityVM managed by the shim. |
Are we not planning on adding multi-pod support?
So multi-pod support will be added using podControllers, where each controller owns the lifecycle of a specific pod. The sandboxID corresponds with the UVM which hosts the pods.
The flow from CRI would be:
- Call SandboxController.Create -> creates the UVM (aka the sandbox) to host all the pods.
- Call NewTask while passing in the SandboxID via WithSandbox -> creates a pod in the sandbox.

This way, the io.kubernetes.cri.sandbox-id annotation on the CreateTask request tells the story of the pod wherein the container needs to be added.
| func (s *Service) diagTasksInternal(ctx context.Context, request *shimdiag.TasksRequest) (*shimdiag.TasksResponse, error) { | ||
| _ = ctx | ||
| _ = request | ||
| return nil, errdefs.ErrNotImplemented | ||
| } |
| func (s *Service) diagTasksInternal(ctx context.Context, request *shimdiag.TasksRequest) (*shimdiag.TasksResponse, error) { | |
| _ = ctx | |
| _ = request | |
| return nil, errdefs.ErrNotImplemented | |
| } | |
| func (s *Service) diagTasksInternal(_ context.Context, _ *shimdiag.TasksRequest) (*shimdiag.TasksResponse, error) { | |
| return nil, errdefs.ErrNotImplemented | |
| } |
Same for the two below, and the methods in service_task_internal.go.
| // diagExecInHostInternal is the implementation for DiagExecInHost. | ||
| // | ||
| // It is used to create an exec session into the hosting UVM. | ||
| func (s *Service) diagExecInHostInternal(ctx context.Context, request *shimdiag.ExecProcessRequest) (*shimdiag.ExecProcessResponse, error) { |
| // document used to create an LCOW Utility VM. | ||
| func BuildSandboxConfig( | ||
| ctx context.Context, | ||
| owner string, |
Why hard-code the shim name here instead of passing it from the cmd/containerd-shim-lcow-v2 code?
We do set different values for owners in uvmboot and internal test code.
Right. Let me revert it back.
internal/controller/vm/vm.go
| vmState State | ||
|
|
||
| // mu guards the concurrent access to the Manager's fields and operations. | ||
| mu sync.Mutex |
sync.RWMutex might make sense here
Awesome suggestion. I have incorporated the same.
internal/controller/vm/vm.go
| c.mu.Unlock() | ||
| return "", fmt.Errorf("cannot dump stacks: VM is in incorrect state %s", c.vmState) | ||
| } | ||
| c.mu.Unlock() |
It's probably best to keep c.mu locked while making the DumpStacks guest request, so the uVM doesn't get mutated/terminated during the operation.
That makes sense. Implemented the change.
| Width = "width" | ||
| Height = "height" |
Do we use these anywhere besides the one location in service_task.go?
Nope, they are not used elsewhere, but Justin was of the opinion that we should standardize them in logfields.
internal/logfields/fields.go
| Version = "version" | ||
| ShimPid = "shim-pid" | ||
| TaskPid = "task-pid" | ||
| ExecSpanID = "exec-id" |
There's already an ExecID above, and ExecSpanID doesn't really make sense since it's used to log the exec ID, not an associated trace/span.
Makes sense. Implemented the change.
shreyanshjain7174
left a comment
Review of the critical items — each comment is a question for clarification.
| func NewController() *Manager { | ||
| return &Manager{ | ||
| logOutputDone: make(chan struct{}), | ||
| vmState: StateNotCreated, | ||
| } | ||
| } |
So this returns *Manager instead of the Controller interface?
Yes, this aligns with Go’s principle of “accept interfaces, return concrete types.”
| close(c.logOutputDone) | ||
| } | ||
| }() | ||
| } |
The if _, ok := <-c.logOutputDone; ok { close(c.logOutputDone) } pattern looks broken — logOutputDone is an unbuffered chan struct{} that nobody ever sends on, only closes. So the receive here blocks forever until someone else closes it, at which point ok == false and the close branch is skipped.
This means Wait() will hang on WCOW because logOutputDone is never signaled from this goroutine?
Should this just be close(c.logOutputDone) after wg.Wait(), matching the LCOW pattern in vm_lcow.go?
Nice catch. This seems broken in the old shim too.
Fixed at both places.
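A runnable sketch of the fixed pattern, assuming the shim's WaitGroup-plus-done-channel shape; the type and field names are illustrative.

```go
package main

import "sync"

// logForwarder mimics the shim's log-output lifecycle: workers are tracked
// by a WaitGroup, and logOutputDone is closed exactly once after they all
// finish, instead of the broken `if _, ok := <-ch; ok { close(ch) }` form,
// whose receive blocks until someone else closes the channel and whose
// close branch is then skipped because ok is false.
type logForwarder struct {
	wg            sync.WaitGroup
	logOutputDone chan struct{}
}

func newLogForwarder() *logForwarder {
	return &logForwarder{logOutputDone: make(chan struct{})}
}

func (f *logForwarder) run(workers int) {
	for i := 0; i < workers; i++ {
		f.wg.Add(1)
		go func() {
			defer f.wg.Done()
			// ... process one logging connection ...
		}()
	}
	// Close the channel once all workers exit; receivers in Wait unblock.
	go func() {
		f.wg.Wait()
		close(f.logOutputDone)
	}()
}

// Wait blocks until all log processing has completed.
func (f *logForwarder) Wait() {
	<-f.logOutputDone
}
```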
| logrus.WithError(err).Fatal("failed to listen for windows logging connections") | ||
| } | ||
|
|
||
| // Use a WaitGroup to track active log processing goroutines. |
logrus.Fatal calls os.Exit(1) — this kills the entire shim process if the log listener fails to bind. Isn't this a recoverable/non-critical failure? Should this use logrus.Error + close(c.logOutputDone) instead, so Wait() doesn't block and the shim can continue operating without log forwarding?
| Args: request.Args, | ||
| Workdir: request.Workdir, | ||
| Terminal: request.Terminal, | ||
| Stdin: request.Stdin, |
DumpStacks only reads vmState and calls guest.DumpStacks() — there's no state mutation here. Should this be mu.RLock() / mu.RUnlock() like ExecIntoHost and Wait do, to avoid unnecessarily blocking concurrent readers?
Yes, sure. We can implement that.
| return "", nil | ||
| } | ||
|
|
||
| // Wait blocks until the VM exits and all log output processing has completed. |
Stats() takes mu.Lock() because of the lazy vmmemProcess initialization, but this blocks all other operations (including ExecIntoHost, Wait, DumpStacks). Would it make sense to initialize vmmemProcess during StartVM, or use a separate sync.Once for the lazy init so Stats() can use RLock for the state check?
I like the idea of RLock in Stats, but the vmmemProcess init should remain in this method, as that simplifies the LM scenario.
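A sketch of how the RLock state check can coexist with lazy init kept inside Stats, using sync.Once (which carries its own internal synchronization, so it is safe under the read lock). findVmmemPID and the field names are hypothetical.

```go
package main

import "sync"

type Manager struct {
	mu        sync.RWMutex
	vmmemOnce sync.Once
	vmmemPID  int
	initCount int // counts lazy-init runs, for demonstration only
}

// findVmmemPID is a hypothetical stand-in for locating the vmmem process.
func (m *Manager) findVmmemPID() int {
	m.initCount++
	return 4242
}

// Stats takes only the read lock for the state check; the lazy vmmem
// lookup is guarded by its own sync.Once, so concurrent Stats calls no
// longer serialize on the write lock while still initializing exactly once.
func (m *Manager) Stats() int {
	m.mu.RLock()
	defer m.mu.RUnlock()
	m.vmmemOnce.Do(func() {
		m.vmmemPID = m.findVmmemPID()
	})
	return m.vmmemPID
}
```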
| // Create the serve command. | ||
| cmd, err := newCommand(ctx, id, opts.Address, address, f) | ||
| if err != nil { | ||
| return params, err |
This uses stdlib log.Fatalf (not logrus), which calls os.Exit(1). Shouldn't this return the error instead — return params, fmt.Errorf("failed to create event: %w", err) — so the caller can handle it gracefully?
Yes, sure, that works.
Summary
This pull request introduces a new implementation of `containerd-shim-lcow-v1` for Windows, used for running Linux Containers on Windows. The changes cover the main entrypoint, shim manager logic, plugin registration, shared service implementation, and a manifest for proper Windows compatibility. The major changes are:
- The implementation for the `Sandbox` service.
- The implementation for the `VM Controller`, which manages the VM lifecycle and operations.

This shim follows the upstream pattern of using `containerd/pkg/shim` for managing its lifecycle. It uses the plugin mechanism for registering the service with containerd and is initialized during startup.

This PR is dependent upon PR #2612 and PR #2616.