systemd: add SetUnified for direct cgroupfs writes bypassing dbus#59
systemd: add SetUnified for direct cgroupfs writes bypassing dbus#59sohankunkerkar wants to merge 1 commit intoopencontainers:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a targeted API to update cgroup v2 “unified” (cgroupfs file) values for systemd-managed cgroups without using systemd’s SetUnitProperties DBus call, to avoid systemd resetting unrelated cgroup properties during property updates.
Changes:
- Add
(*UnifiedManager).SetUnified(map[string]string)to write selected unified keys directly via the cgroupfs (fs2) path, bypassing DBus. - Add an integration test to verify unified keys (e.g.,
memory.min,memory.low) can be set and later cleared using the new method.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
systemd/v2.go |
Adds SetUnified to perform direct cgroupfs unified writes via the fs2 manager. |
systemd/systemd_test.go |
Adds TestSetUnified integration coverage for setting/clearing unified memory protection knobs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add SetUnified method on UnifiedManager that writes unified cgroup v2 resource values directly to cgroupfs without going through systemd's SetUnitProperties dbus path. When callers use Set() to update specific unified keys (e.g. memory.min, memory.low), the current implementation bundles those updates with all other resource properties into a single SetUnitProperties dbus call. This can cause systemd to reset unrelated cgroup properties on the unit. SetUnified avoids this by writing only the specified unified values via the existing fs2 manager path. This is needed by kubelet to clear stale MemoryQoS cgroup protection values during feature rollback without triggering systemd side effects on CPU and memory limit properties. Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
4fd5f5e to
e21361b
Compare
|
hm if we do this is there a chance systemd clobbers the changes? We've had situations in the past where writing directly to cgroups without going through dbus means systemd doesn't become aware of the change and later undoes it. |
Yup, systemd will clobber these writes on the next The caller ( So yes, the clobbering is expected and the caller is built around it. The API intentionally leaves that responsibility to the caller rather than trying to solve it here. |
|
I guess do we understand why setting the memory.low is unsetting cpu.max/cpu.weight/memory.max? I feel like we should address that rather than doing all of this overwriting all the time |
systemd’s SetUnitProperties dbus API reconciles the entire unit state, not just the provided properties. When kubelet calls |
|
so maybe we need to keep track of the state of the cgroup in kubelet so each time we pass the properties we don't clobber our own values? or in this library read them before we write them. if that's too complicated ig this would work but it feels like bad practice to be fighting with systemd so much on this |
When the gate is on this already works! The problem is only when the gate is off, The alternative of doing read-modify-write in the cgroups library to snapshot the full unit state before |
|
I will take a closer look next week, but for now:
|
Dug deeper and confirmed why the dbus path can't work for this. Ran a test on systemd 257: # Set CPU and memory on a slice via dbus
systemctl set-property --runtime test-mask.slice CPUQuota=80% CPUWeight=500 MemoryMax=1073741824
# Overwrite values via cgroupfs (simulating what enforceNodeAllocatableCgroups + external process does)
echo "50000 100000" > test-mask.slice/cpu.max
echo "200" > test-mask.slice/cpu.weight
echo "536870912" > test-mask.slice/memory.max
# Send ONLY MemoryMin=0 via dbus (what kubelet's setMemoryQoS would do)
systemctl set-property --runtime test-mask.slice MemoryMin=0
# Result: ALL values reverted to dbus-stored values
cpu.max: 80000 100000 (reverted from 50000)
cpu.weight: 500 (reverted from 200)
memory.max: 1073741824 (reverted from 536870912)
memory.min: 0 (correctly cleared) SetUnitProperties triggers full unit realization as systemd rewrites all cgroup properties from its stored context, not just the one you changed. There's no dbus API to update a single cgroup property without triggering this. This is the same dbus method kubelet uses. |
It's on the systemd driver because the problem is systemd-specific. On cgroupfs driver, Set() writes unified keys directly to cgroupfs without side effects. On systemd driver, Set() calls setUnitProperties via dbus first, which triggers full unit realization as systemd rewrites all cgroup properties from its stored context, not just the ones you sent. #59 (comment)
The caller (kubelet) could technically get the path via |
|
Closing this in favor of kubernetes/kubernetes#138903 |
Add SetUnified method on UnifiedManager that writes unified cgroup v2 resource values directly to cgroupfs without going through systemd's SetUnitProperties dbus path.
When callers use Set() to update specific unified keys (e.g. memory.min, memory.low), the current implementation bundles those updates with all other resource properties into a single SetUnitProperties dbus call. This can cause systemd to reset unrelated cgroup properties on the unit. SetUnified avoids this by writing only the specified unified values via the existing fs2 manager path.
This is needed by kubelet to clear stale MemoryQoS cgroup protection values during feature rollback without triggering systemd side effects on CPU and memory limit properties.
Context