Resilience in adding of exec tasks to cgroups #1950
Merged
bbbcd28 to 37fa8c3
The kernel will sometimes return EINVAL when writing a pid to a cgroup.procs file. It does so when the task being added still has the state TASK_NEW.

See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286

Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
37fa8c3 to bdf3524
BooleanCat commented Dec 17, 2018
BooleanCat pushed a commit to cloudfoundry/garden-runc-release that referenced this pull request Dec 17, 2018

The patch contains the commit from PR opencontainers/runc#1950

[#162607734]

Co-authored-by: Danail Branekov <danailster@gmail.com>
Author
Friendly bump
Author
@cyphar @crosbymichael Hey friends! Garden currently ships with this change applied as a patch, because the underlying issue was causing persistent application push failures in some Cloud Foundry deployments (which this change fixes). It would be great to get this merged so we can stop relying on patching runc! Please let me know what you think, thanks!
Member
Reviewing this today. Sorry, I lost track of it. Thanks for the ping!
Contributor
Does it solve #1326?
Author
We've had this running in prod in Cloud Foundry for a while now (even as a patch before this got merged) and have not seen the issue recur.
This is related to #1884.
This fix allows more leniency for setting up cgroups on exec when cgroup namespaces are not used.
The issue linked above was partly resolved by #1916: it can be avoided entirely by using cgroup namespaces. Cloud Foundry's use of runc does not yet use cgroup namespaces, and we encounter the issue frequently.
We believe that, since running without cgroup namespaces is a valid config.json configuration, runc should be more resilient in that case. This PR attempts the write to cgroup.procs a few times when exec'ing, retrying only on EINVAL (which, unless you're using realtime processes, means the task's state in the kernel is still TASK_NEW: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286).
We replaced ioutil.WriteFile with an explicit Open/Write sequence because we don't want to reopen the file on each attempt to write to cgroup.procs, and because it reduces the surface for error types changing underneath us within Go.