Auto switch cgroup monitors from v1 to v2#18705
Conversation
capistrant
left a comment
There was a problem hiding this comment.
What do we want to do about documentation in https://druid.apache.org/docs/latest/configuration/#metrics-monitors? Do we want to explicitly call out that the auto conversion will happen for the v1 monitors? Do we want to remove experimental from the v2 monitors docs?
|
|
||
| package org.apache.druid.java.util.metrics.cgroups; | ||
|
|
||
| import org.junit.Assert; |
There was a problem hiding this comment.
should new test file be written with junit5?
| } | ||
|
|
||
| @Test | ||
| public void testBackwardsCompatibilityWithV1Parsing() throws IOException |
There was a problem hiding this comment.
could you point out how this format is different than all the tests above? I'm missing what is unique about it compared to the rest of the v2 tests
|
|
||
| package org.apache.druid.java.util.metrics.cgroups; | ||
|
|
||
| import org.junit.Assert; |
There was a problem hiding this comment.
same as other new UT file, should we write it in junit5?
| } | ||
|
|
||
| @Test | ||
| public void testMicrosecondToJiffiesConversion() throws IOException |
There was a problem hiding this comment.
| public void testMicrosecondToJiffiesConversion() throws IOException | |
| public void testV2doesNotConvertMicrosecondToJiffies() throws IOException |
nit: existing name feels misleading since there is no conversion, jiffies just aren't provided
| } | ||
| } | ||
| catch (Exception e) { | ||
| LOG.debug(e, "Could not determine cgroups version, assuming v1"); |
There was a problem hiding this comment.
should this actually be a warn so operators can investigate and get things properly configured?
There was a problem hiding this comment.
This method is removed
| private static final String | ||
| CPU_MAX_FILE = "cpu.max"; |
There was a problem hiding this comment.
| private static final String | |
| CPU_MAX_FILE = "cpu.max"; | |
| private static final String CPU_MAX_FILE = "cpu.max"; |
| * @param cgroupDiscoverer the discoverer to check | ||
| * @return true if running on cgroups v2, false for v1 or unknown | ||
| */ | ||
| public static boolean isRunningOnCgroupsV2(CgroupDiscoverer cgroupDiscoverer) |
There was a problem hiding this comment.
This method looks like it will use cgroupv1 if both exist. in processing/src/main/java/org/apache/druid/java/util/metrics/cgroups/ProcSelfCgroupDiscoverer.java detectCgroupVersion it looks like it prefers v2 even if both exist. Is this discrepancy intentional?
There was a problem hiding this comment.
This method is no longer needed. Adjusted the code.
| } | ||
| } | ||
|
|
||
| public static boolean doMonitorInternal( |
There was a problem hiding this comment.
short javadoc might be helpful to note that this is a static utility method used by v1 and v2 monitor. I don't fully understand the "internal" part of the name either
There was a problem hiding this comment.
Does this require an entry in https://druid.apache.org/docs/latest/configuration/#metrics-monitors? Will cover the docs topic more in the top level review comment
cryptoe
left a comment
There was a problem hiding this comment.
Thanks for the review @capistrant .
Responded to all your queries.
Regarding junit 4 vs junit5 frameworks, I hope that change may not block this PR.
Will need to adjust lot of the testing boiler plate code for that to work which I was hoping can be punted down the line :)
| * @param cgroupDiscoverer the discoverer to check | ||
| * @return true if running on cgroups v2, false for v1 or unknown | ||
| */ | ||
| public static boolean isRunningOnCgroupsV2(CgroupDiscoverer cgroupDiscoverer) |
There was a problem hiding this comment.
This method is no longer needed. Adjusted the code.
| } | ||
| } | ||
| catch (Exception e) { | ||
| LOG.debug(e, "Could not determine cgroups version, assuming v1"); |
There was a problem hiding this comment.
This method is removed
|
|
||
| // doMonitor should return true but skip actual monitoring | ||
| Assert.assertTrue(monitor.doMonitor(emitter)); | ||
| Assert.assertEquals(4, emitter.getNumEmittedEvents()); |
| // doMonitor should return true | ||
| Assert.assertTrue(monitor.doMonitor(emitter)); | ||
|
|
||
| Assert.assertEquals(2, emitter.getEvents().size()); |
| } | ||
|
|
||
| @Test | ||
| public void testZeroMicrosecondValues() throws IOException |
There was a problem hiding this comment.
Wanted to test if the values are initialized properly.
Added more asserts to check that.
capistrant
left a comment
There was a problem hiding this comment.
Left a suggestion that you should be able to batch accept to fix spell checker GHA
| |`org.apache.druid.java.util.metrics.CgroupV2CpuMonitor`| **EXPERIMENTAL** Reports CPU usage from `cpu.stat` file. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupV2DiskMonitor`| **EXPERIMENTAL** Reports disk usage from `io.stat` file. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automatically switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| |
| |`org.apache.druid.java.util.metrics.CgroupV2DiskMonitor`| **EXPERIMENTAL** Reports disk usage from `io.stat` file. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automatically switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| |
| |`org.apache.druid.java.util.metrics.CgroupV2MemoryMonitor`| **EXPERIMENTAL** Reports memory usage from `memory.current` and `memory.max` files. Only applicable to `cgroupv2`.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automaticaly switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automaticaly switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automatically switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| |
| |`org.apache.druid.java.util.metrics.CgroupCpuMonitor`|Reports CPU shares and quotas as per the `cpu` cgroup. Automaticaly switches to `CgroupV2CpuMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupCpuSetMonitor`|Reports CPU core/HT and memory node allocations as per the `cpuset` cgroup. Automaticaly switches to `CgroupV2CpuSetMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupDiskMonitor`|Reports disk statistic as per the blkio cgroup. Automaticaly switches to `CgroupV2DiskMonitor` in case `cgroupv2` type is detected.|Any| | ||
| |`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup. Automaticaly switches to `CgroupV2MemoryMonitor` in case `cgroupv2` type is detected.|Any| |
There was a problem hiding this comment.
| |`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup. Automaticaly switches to `CgroupV2MemoryMonitor` in case `cgroupv2` type is detected.|Any| | |
| |`org.apache.druid.java.util.metrics.CgroupMemoryMonitor`|Reports memory statistic as per the memory cgroup. Automatically switches to `CgroupV2MemoryMonitor` in case `cgroupv2` type is detected.|Any| |
Summary
This PR introduces auto-switching cgroup monitors that can dynamically detect and switch between cgroup v1 and v2 implementations, along with necessary code style fixes.
Changes
-Now CgroupCpuMonitor , CgroupCpuSetMonitor , CgroupDiskMonitor ,MemoryMonitor autoswitch to cgroups v2 and no longer fail in case cgroups v2 is detected.
cgroup/cpu/sharesandcgroup/cpu/cores_quotaTest Plan
This enhancement allows Druid to seamlessly work across different containerization environments that use either cgroup v1 or v2, improving compatibility and reducing configuration overhead.
Release notes
cgroup/cpu/sharesandcgroup/cpu/cores_quota