Skip to content

fix: shutdown logger after container process exits#2337

Closed
mharwani wants to merge 6 commits intocontainerd:mainfrom
mharwani:mharwani/shutdown-logger-after-exit
Closed

fix: shutdown logger after container process exits#2337
mharwani wants to merge 6 commits intocontainerd:mainfrom
mharwani:mharwani/shutdown-logger-after-exit

Conversation

@mharwani
Copy link
Copy Markdown

Addresses #2313

Container task is not deleted after it exits, leaving the runtime shim and logger open throughout the lifecycle of a container. This can be a problem when there is unprocessed stdout data after a container task finishes and the logger is blocked forever waiting for a newline or EOF.

This PR fixes the issue by piping container output instead of reading directly from stdout and stderr file descriptors. Closing the io pipes upon task exit allows the logger to shutdown gracefully and process leftover data without a newline .

@mharwani mharwani marked this pull request as ready for review June 29, 2023 19:07
@fahedouch fahedouch self-requested a review July 1, 2023 22:35
@fahedouch fahedouch added this to the v1.5.0 milestone Jul 1, 2023
@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 97ef2b7 to 9d8a14f Compare July 10, 2023 17:31
@fahedouch
Copy link
Copy Markdown
Member

fahedouch commented Jul 10, 2023

Hi @mharwani, as mentioned here closing task io io.Close() after task exiting should close FIFOs => send EOF to stdio => logger will receive EOF

@mharwani
Copy link
Copy Markdown
Author

Hi @mharwani, as mentioned here closing task io io.Close() after task exiting should close FIFOs => send EOF to stdio => logger will receive EOF

Hi @fahedouch, I tried closing the task IO upon container exit with task.IO().Close(), but it did not fix the issue. The scanner did not receive EOF. I posted my thoughts here:

From what I understand, when a task is running in detached mode it uses logURI as the task IO, which doesn't have any FIFOs open and does nothing on close.

Comment thread cmd/nerdctl/container_logs_test.go Outdated
base := testutil.NewBase(t)
containerName := testutil.Identifier(t)
defer base.Cmd("rm", containerName).Run()
expected := []string{}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected := []string{}
expected := make([]string, 10)

}

base.Cmd("run", "-d", "--name", containerName, testutil.CommonImage,
"sh", "-euc", "for i in `seq 1 10`; do echo $i; sleep 1; done").AssertOK()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"sh", "-euc", "for i in `seq 1 10`; do echo $i; sleep 1; done").AssertOK()
"sh", "-euc", "for i in `seq 1 10`; do echo $i; done").AssertOK()

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added sleep to have both the container task and the logs command running concurrently

Comment thread cmd/nerdctl/container_logs_test.go Outdated
base.Cmd("run", "-d", "--name", containerName, testutil.CommonImage,
"sh", "-euc", "for i in `seq 1 10`; do echo $i; sleep 1; done").AssertOK()
base.Cmd("logs", "-f", containerName).AssertOutContainsAll(expected...)
base.Cmd("rm", "-f", containerName).AssertOK()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

useless regarding defer

Comment thread cmd/nerdctl/container_logs_test.go Outdated
t.Parallel()
base := testutil.NewBase(t)
containerName := testutil.Identifier(t)
defer base.Cmd("rm", containerName).Run()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
defer base.Cmd("rm", containerName).Run()
defer base.Cmd("rm", "-f", containerName).Run()

base.Cmd("rm", "-f", containerName).AssertOK()
}

func TestLogsWithRunningContainer(t *testing.T) {
Copy link
Copy Markdown
Member

@fahedouch fahedouch Jul 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what this test covers comparing to this one ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other test runs the log command after the container has stopped running, but this one tests logs during the task execution and as logs are being sent to the logger with the --follow option.

Comment thread cmd/nerdctl/container_logs_test.go Outdated
Comment thread pkg/cmd/container/logs.go Outdated
go func() {
<-waitCh
// wait 100ms to let logViewer process data sent after container exit, if any
time.Sleep(100 * time.Millisecond)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you experienced data not processed by the LogViewer?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the logger may process additional data after container exits and there is no newline or EOF. For example, the following will not print any logs without time.Sleep:

sudo nerdctl run -d --name test alpine sh -c 'sleep 10; echo -n hello'
sudo nerdctl logs -f test

Sleeping for 100ms is undeterministic. There should be a better way to do this. Maybe somehow wait for the logger to finish instead of container wait?

Copy link
Copy Markdown
Member

@fahedouch fahedouch Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, waiting for the logger to complete instead of waiting for the container to exit gives a more accurate result

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't get a chance to work on it. I will send a new revision later today.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fahedouch To wait for the logger to complete, it has to send a signal to the LogViewer. I'm using file locks to accomplish this IPC. PTAL here and here. Thanks!

Comment thread pkg/logging/logging.go Outdated
if err := driver.PreProcess(dataStore, config); err != nil {
return err
}
exit, err := getContainerWait(ctx, hostAddress, config)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exit, err := getContainerWait(ctx, hostAddress, config)
exitCh, err := getContainerWait(ctx, hostAddress, config)

Comment thread pkg/logging/logging.go Outdated
stderrR, stderrW := io.Pipe()
copyStream := func(reader io.Reader, writer *io.PipeWriter) {
// copy using a buffer of size 16K
buf := make([]byte, 16*1024)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
buf := make([]byte, 16*1024)
buf := make([]byte, 16<<10)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why choose a 16k buffer?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to 32k as that is what containerd uses to copy stdio:
https://github.com/containerd/containerd/blob/main/cio/io.go#L34

Comment thread pkg/logging/logging.go
Comment thread pkg/cmd/container/create.go Outdated

// generateLogConfig creates a LogConfig for the current container store
func generateLogConfig(dataStore string, id string, logDriver string, logOpt []string, ns string) (logConfig logging.LogConfig, err error) {
func generateLogConfig(dataStore string, id string, logDriver string, logOpt []string, gOpt *types.GlobalCommandOptions) (logConfig logging.LogConfig, err error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why passing a pointer here gOpt *types.GlobalCommandOptions?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 9d8a14f to 8763da2 Compare July 19, 2023 14:58
@mharwani
Copy link
Copy Markdown
Author

It looks like the environment setup is failing for some of the checks because curl cannot reach the endpoint https://raw.githubusercontent.com/AkihiroSuda/containerized-systemd/v0.1.1/docker-entrypoint.sh:
SSL: no alternative certificate subject name matches target host name 'raw.githubusercontent.com'

Any ideas?

@mharwani mharwani requested a review from fahedouch July 19, 2023 15:41
@AkihiroSuda
Copy link
Copy Markdown
Member

CI is now green for Linux.
Please try rebasing to make Windows CI green

Copy link
Copy Markdown
Member

@fahedouch fahedouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 8763da2 to 23b24a6 Compare July 31, 2023 05:08
@AkihiroSuda
Copy link
Copy Markdown
Member

Windows tests are failing

=== NAME  TestLogs
    container_logs_test.go:42: assertion failed: 
        Command:  C:\Windows\system32\config\systemprofile\go\bin\nerdctl.exe --namespace=nerdctl-test logs --since 10s nerdctl-testlogs
        ExitCode: 0
        Stdout:   
        Stderr:   
        Failures:
        Expected stdout to contain "foo\nbar"
=== CONT  TestLogsWithoutNewlineOrEOF
--- FAIL: TestLogs (4.60s)
=== NAME  TestLogsAfterRestartingContainer

https://cirrus-ci.com/task/4610614887186432?logs=build#L843

Comment thread pkg/logging/logging.go Outdated
}
if err := ready(); err != nil {

lockFile := getLockPath(dataStore, config.Namespace, config.ID)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be loggerLock is more explicit here

@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 3ce8c17 to 1df7878 Compare August 1, 2023 21:18
Signed-off-by: Mrudul Harwani <mharwani@amazon.com>
Signed-off-by: Mrudul Harwani <mharwani@amazon.com>
…ocessed data

Signed-off-by: Mrudul Harwani <mharwani@amazon.com>
Signed-off-by: Mrudul Harwani <mharwani@amazon.com>
Signed-off-by: Mrudul Harwani <mharwani@amazon.com>
@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 1df7878 to 39fe0b3 Compare August 1, 2023 21:20
Signed-off-by: Mrudul Harwani <mharwani@amazon.com>
@mharwani mharwani force-pushed the mharwani/shutdown-logger-after-exit branch from 39fe0b3 to 0050ac5 Compare August 1, 2023 21:30
Comment thread pkg/logging/logging.go
@fahedouch fahedouch self-requested a review August 8, 2023 20:37
Comment thread cmd/nerdctl/container_logs_test.go
Comment thread pkg/logging/logging.go
Comment thread pkg/logging/logging.go
for {
select {
case <-ctx.Done():
return nil, fmt.Errorf("timed out waiting for container task to start")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("timed out waiting for container task to start")
return nil, errors.New("timed out waiting for container task to start")

Comment thread pkg/logging/logging.go

// If task was not found, it's possible that the container runtime is still being created.
// Retry every 100ms.
ticker := time.NewTicker(100 * time.Millisecond)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ticker := time.NewTicker(100 * time.Millisecond)
ticker := time.NewTicker(100 * time.Millisecond)
defer ticker.Stop()

@fahedouch
Copy link
Copy Markdown
Member

@mharwani PTAL

@AkihiroSuda AkihiroSuda modified the milestones: v1.5.1, v1.5.2 Sep 11, 2023
@AkihiroSuda
Copy link
Copy Markdown
Member

ping @mharwani

@apostasie
Copy link
Copy Markdown
Contributor

Hey @mharwani

Would you still be interested in carrying this PR through?
If so, could you rebase (a lot has changed) and see about @fahedouch comments?

@fahedouch
Copy link
Copy Markdown
Member

Sorry for ping again @mharwani

Would you still be interested in carrying this PR through?

AkihiroSuda added a commit that referenced this pull request Mar 10, 2025
…xits

[carry #2337] ensure logger completion after container exit
@fahedouch fahedouch closed this Apr 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants