Add resource usage monitoring for build steps #3860
Conversation
Force-pushed 9b68967 to 853b11e
Is this something we could do with containerd's cgroups package? Does this mean we now have 3 separate implementations (runc, containerd, buildkit)?

I don't see what could be reused from there.

Isn't the "stats" code in there that collects metrics of the container's cgroup? Would be nice if the structs could be shared. But they are not the same, for example:

We could take this one type https://pkg.go.dev/github.com/containerd/cgroups/v3@v3.0.1/cgroup2/stats#CPUStat and embed it into our type that has the pressure support. But I don't think it makes sense to include a big package for just one struct with 6 fields.
Needs rebase.
Can we add comments linking to the specifications for the underlying file formats in https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html?
```go
	*cpuStat.UserNanos = value * 1000
case cpuSystemUsec:
	cpuStat.SystemNanos = new(uint64)
	*cpuStat.SystemNanos = value * 1000
```
We can use uint64Ptr (from io.go) to simplify this.
```go
	}
	return nil, releaseContainer(context.TODO())
	// …
	return rec, rec.CloseAsync(releaseContainer)
```
Instead of doing this, can we not spawn a goroutine here, so we can have a standard Close function:

```go
go func() {
	// errors are explicitly discarded in this example
	_ = rec.Close()
	releaseContainer()
}()
return rec, nil
```
I already did a new implementation before I realized why I did it this way.
If the daemon is closed before releaseContainer can be called, then it can leak resources. Monitor will take care of this and will block the daemon from shutting down before all the release calls have been invoked. So if releaseContainer has not been called synchronously, we still need to register it so that it is guaranteed to be called before shutdown. In a new goroutine this guarantee would not exist. 451e18c
This would be cleaner if Executor itself would have a Shutdown function. Then in here, it could make it block until release has been called.
We should add this context in a comment, I think it's too easy to accidentally remove in a refactor or similar.
I'll be honest, I'm not sure I have a good grasp on why we need both per-step resource usage and sysUsage - what are they both tracking, and why are they different? Is sysUsage capturing from BuildKit, while the per-step is from each ExecOp? Ideally we could have some comments inline that could elaborate 😄
@jedevc |
Force-pushed 853b11e to e0a8e08

Needs rebase again.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

This can be used to convert step usage to relative units.
Force-pushed e0a8e08 to 262b708
```go
	}
	// …
	withUsage := false
	if v, ok := attrs["capture-usage"]; ok {
```
jedevc left a comment
A couple of unresolved comments on the PR (would be nice to leave some comments/links inline to make it easier to read later, but shouldn't be a blocker).
This ends up crashing on Windows due to the fact that there is no …
This adds the possibility of capturing resource usage of build steps (CPU, memory, IO, network) so it can be used for performance analysis or, in the future, for resource controls (#2108).
The usage data is available in the provenance attestation. To opt in, the user needs to set `capture-usage=true`. For provenance available via the history API, usage is available automatically.

Some examples (search for resourceUsage and sysUsage):
https://explore.ggcr.dev/?blob=tonistiigi/buildkit@sha256:c771192c6f500ab46dc75c8aeced2d45cd564ca18dc230458aa225fbdd718936&mt=application%2Fvnd.in-toto%2Bjson&size=101624
https://explore.ggcr.dev/?blob=tonistiigi/buildkit@sha256:55904ca0b7a18fe76c10438c2a2c756eb73302628f6027138cd430b37de13d24&mt=application%2Fvnd.in-toto%2Bjson&size=109356
https://explore.ggcr.dev/?blob=docker.io/tonistiigi/buildkit@sha256:84fc0b9059458cce035df1d49402e30b441ff2a1ebe79f1d7aad6afa3e80754d&mt=application%2Fvnd.in-toto%2Bjson&size=61853
This feature requires cgroupv2. Pressure fields require a kernel with `CONFIG_PSI` enabled. The `memory.peak` file requires kernel 5.19+. Related fields are empty if requirements are not met or some cgroup controllers are not enabled. I don't think it makes sense to add any fallbacks for cgroupv1 or non-PSI kernels; these will be requirements for #2108 anyway in the future.

Network monitoring only works with the CNI provider.
Samples are taken at the end of the step, and also during execution if the step takes a long time. The minimum sample interval is 2 sec and the maximum number of samples is 10 per build step.
Please check if I'm missing any fields that could be useful, or if you think some of the fields are useless. I didn't add all fields, but I think it is better to add more than to miss out on something that could become useful in the future.
System samples for CPU/memory are added as well so they can be compared against step information to understand how much of the available resources a step used. The maximum number of system samples for the whole build is 20 (same 2 sec minimum interval).