[WIP] k8s memory resource hotplug #580
Conversation
Signed-off-by: Yanqiang Miao <miao.yanqiang@zte.com.cn>
PSS Measurement: Memory inside container:
Build failed (third-party-check pipeline) integration testing with
@miaoyq Can you add a bit more detail about what problem you are solving here and how you are solving it?
|
@amshinde This is related to #400. In a k8s pod, resource limits are set in the app containers' config, so we should dynamically increase or decrease the VM's memory according to that config (a rough sketch of the idea follows). kata-runtime already supports memory hotplug on the hypervisor side, see #470.
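To make the idea concrete, here is a rough, hypothetical sketch (not this PR's actual code; all names are invented) of computing how much memory would need to be hot-added when a pod's containers ask for more than the VM currently has:

```go
package main

import "fmt"

// hotplugSizeMB illustrates the idea discussed above: given the VM's current
// memory and the sum of the app containers' memory limits (both in bytes),
// return how many MiB would need to be hot-added. It returns 0 when no
// hot-add is required.
func hotplugSizeMB(currentVMBytes, requestedBytes int64) int {
	if requestedBytes <= currentVMBytes {
		return 0
	}
	deltaMB := int((requestedBytes - currentVMBytes) / 1024 / 1024)
	// Round up to an even number of MiB, mirroring the sizeMB adjustment
	// in the diff discussed later in this thread.
	deltaMB += deltaMB % 2
	return deltaMB
}

func main() {
	// Example: VM booted with 2048 MiB, pod containers request 3 GiB in total.
	fmt.Println(hotplugSizeMB(2048*1024*1024, 3*1024*1024*1024)) // prints 1024
}
```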
|
You need to add
}
}
if ocispec.Linux.Resources.Memory != nil {
	if ocispec.Linux.Resources.Memory.Limit != nil {
You can shorten this to a single condition:
if ocispec.Linux.Resources.Memory != nil && ocispec.Linux.Resources.Memory.Limit != nil {
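For illustration, a self-contained sketch of the combined check; ContainerResources here is a simplified stand-in for this PR's own type, and the caller is assumed to have already ensured ocispec.Linux and ocispec.Linux.Resources are non-nil, as the surrounding code in the diff does:

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// ContainerResources is a simplified stand-in for the resources struct used
// in this PR.
type ContainerResources struct {
	Mem uint32 // memory limit in bytes, as in the diff under review
}

// setMemoryLimit copies the OCI memory limit into the container resources,
// using the single combined condition suggested above. Callers are assumed
// to have checked that ocispec.Linux and ocispec.Linux.Resources are non-nil.
func setMemoryLimit(ocispec specs.Spec, resources *ContainerResources) {
	if ocispec.Linux.Resources.Memory != nil && ocispec.Linux.Resources.Memory.Limit != nil {
		resources.Mem = uint32(*ocispec.Linux.Resources.Memory.Limit)
	}
}

func main() {
	limit := int64(512 * 1024 * 1024)
	spec := specs.Spec{
		Linux: &specs.Linux{
			Resources: &specs.LinuxResources{
				Memory: &specs.LinuxMemory{Limit: &limit},
			},
		},
	}
	res := ContainerResources{}
	setMemoryLimit(spec, &res)
	fmt.Println(res.Mem) // 536870912
}
```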
sizeMB := int(mem / 1024 / 1024)
// sizeMB needs to be divisible by 2
sizeMB = sizeMB + sizeMB%2
_, err := c.sandbox.hypervisor.hotplugRemoveDevice(&memoryDevice{1, sizeMB}, memoryDev)
To support hot-unplug memory devices, we need to support guest ballooning first (which I don't think we have added yet).
We don't support hot memory unplug. This code should be removed from this PR as it will make the code fail.
|
@miaoyq I played with your PR, but I can't see the expected effect with your test POD spec. The POD's memory is always 2048MiB, as specified in the kata configuration file. Am I missing something? What's the expected behaviour with this PR?
|
@WeiZhang555 @bergwolf @jodh-intel
|
@miaoyq @jodh-intel @WeiZhang555 @bergwolf We should think about this carefully. We face 3 problems:
|
|
Indeed, it's a better solution to hot-add memory aligned to the guest OS memory block size. As things stand now, we have this fact:
We can get the size of the guest OS memory block by reading it from the guest (a hedged sketch of one way to do this via sysfs follows).
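For reference, a sketch of reading the memory block size and aligning a hot-add request to it. It assumes the standard Linux sysfs file /sys/devices/system/memory/block_size_bytes (a hexadecimal string), would have to run inside the guest (for example via the agent), and uses invented helper names:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// guestMemoryBlockSize reads the guest kernel's memory block size in bytes
// from sysfs, where it is exposed as a hex string without a leading "0x".
func guestMemoryBlockSize() (uint64, error) {
	data, err := os.ReadFile("/sys/devices/system/memory/block_size_bytes")
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(data)), 16, 64)
}

// alignUp rounds size up to the next multiple of blockSize, so that
// hot-added memory stays aligned to the guest's memory block size.
func alignUp(size, blockSize uint64) uint64 {
	if blockSize == 0 {
		return size
	}
	return (size + blockSize - 1) / blockSize * blockSize
}

func main() {
	block, err := guestMemoryBlockSize()
	if err != nil {
		// Not running inside a suitable guest; fall back to a common x86_64
		// value (128 MiB) purely for demonstration.
		block = 128 * 1024 * 1024
	}
	// A 300 MiB request would be rounded up to 384 MiB with a 128 MiB block.
	fmt.Println(alignUp(300*1024*1024, block))
}
```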
|
My two cents: there are still some points that haven't been discussed.
Since memory hot-remove is almost impossible (or at least not reliable), we should track how much memory was assigned to or required by each container. For example, if a container has 2GB but is updated or removed (the pod scenario), its memory can still be used by other new or existing containers.
Maybe we can make the memory section size configurable, so that for different architectures kata will use a different "memblock size".
Two ways to handle the "memblock size":
|
|
@miaoyq
|
@linzichang @clarecch
If so, the first option is a little better I think; we can add an interface in the agent.
|
@miaoyq @devimc @linzichang Maybe we can get the memory block size from the agent when kata-runtime create or kata-runtime start runs, then store it in a file. Then we don't need to ask the agent for the memory block size on every hotplug.
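A hedged sketch of that caching idea from the runtime's point of view; the agent query is a hypothetical callback (this PR defines no such API), and the cache file location is only an example:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// guestBlockSizeQuerier stands in for a hypothetical agent call that asks the
// guest for its memory block size in bytes.
type guestBlockSizeQuerier func() (uint64, error)

// cachedGuestBlockSize returns the guest memory block size, asking the agent
// only the first time (e.g. at create/start) and reading the cached value on
// subsequent hotplugs.
func cachedGuestBlockSize(sandboxDir string, query guestBlockSizeQuerier) (uint64, error) {
	path := filepath.Join(sandboxDir, "guest-memory-block-size")
	if data, err := os.ReadFile(path); err == nil {
		return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
	}
	size, err := query()
	if err != nil {
		return 0, err
	}
	if err := os.WriteFile(path, []byte(strconv.FormatUint(size, 10)), 0o600); err != nil {
		return 0, err
	}
	return size, nil
}

func main() {
	dir, _ := os.MkdirTemp("", "sandbox")
	defer os.RemoveAll(dir)

	fakeAgent := func() (uint64, error) { return 128 * 1024 * 1024, nil }
	size, err := cachedGuestBlockSize(dir, fakeAgent)
	fmt.Println(size, err) // 134217728 <nil>
}
```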
|
@clarecch Agreed.
@grahamwhaley Do you have any ideas about memory slots, which I'm not very familiar with? I have read clearcontainers/runtime#380. Is there any result from the memory slot discussion?
Hi @clarecch - heh, good digging, finding that 1yr old thread! I think any measures we did before on number-of-slot overheads (~1.5yr ago now) will be out of date - we should/would have to re-run them if we need that data. But, I think this will always be difficult - I very much doubt we can find a default that will suit all situations and users. I think we need to make this configurable (in the runtime toml config file), and set some sensible defaults. The only sensible default I can think of might be:
In the config file I think we should probably offer two config options to give the user flexibility:
|
|
@grahamwhaley very enlightening!
|
Maybe I'm thinking about this wrong as well... I don't think slots have a fixed size, do they? (That is, I think you can plug different sized DIMMs into each slot in QEMU?)
|
I think in some cases we will hotplug multiple DIMMs, one for each container in a pod that contains multiple containers.
|
@devimc I change
|
Thanks @linzichang for grabbing that data. 11Mb over how much? What is the total PSS (so we can see what % increase there is)? Our PSS footprint can be between maybe 48Mb (20 containers, KSM) and 135Mb (20 containers, no KSM), for instance on my local system, in which case 11Mb could represent anywhere from a 23% to an 8% footprint increase. Both of those I would call significant enough to need a discussion :-) You could maybe use the reportgen from the tests repo to generate us a report.
|
virtLog.Debugf("hot adding %d B memory", mem)
sizeMB := int(mem / 1024 / 1024)
// sizeMB needs to be divisible by 2
sizeMB = sizeMB + sizeMB%2
It might be better to move the log call to here and log sizeMB (and maybe mem too):
virtLog.WithField("memory-mb", sizeMB).Info("hot adding memory")
Same comment for the log call in removeResources().
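A compact, self-contained sketch of the suggested structured logging; virtLog is assumed to be a logrus entry as in the existing code, and the wrapper function is invented for illustration:

```go
package main

import "github.com/sirupsen/logrus"

// virtLog stands in for the package-level logger used in the diff above.
var virtLog = logrus.WithField("source", "virtcontainers")

// logHotplugSize mirrors the suggestion: compute sizeMB first, then emit one
// structured log entry that records both sizeMB and the raw byte count.
func logHotplugSize(mem int64) int {
	sizeMB := int(mem / 1024 / 1024)
	// sizeMB needs to be divisible by 2
	sizeMB = sizeMB + sizeMB%2

	virtLog.WithFields(logrus.Fields{
		"memory-mb":    sizeMB,
		"memory-bytes": mem,
	}).Info("hot adding memory")
	return sizeMB
}

func main() {
	logHotplugSize(300 * 1024 * 1024)
}
```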
}
if ocispec.Linux.Resources.Memory != nil {
	if ocispec.Linux.Resources.Memory.Limit != nil {
		resources.Mem = uint32(*ocispec.Linux.Resources.Memory.Limit)
This is casting an int64 into a uint32:
Maybe ContainerResources.Mem should be changed to int64 too?
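A small sketch of the suggested type change (the struct is a simplified stand-in); with Mem widened to int64 the OCI limit is stored without truncation:

```go
package main

import "fmt"

// ContainerResources with Mem widened to int64, as suggested, so the OCI
// memory limit (an int64 number of bytes) fits without a narrowing cast.
type ContainerResources struct {
	Mem int64 // memory limit in bytes
}

func main() {
	limit := int64(6 * 1024 * 1024 * 1024) // 6 GiB does not fit in a uint32
	res := ContainerResources{Mem: limit}  // no cast needed with an int64 field
	fmt.Println(res.Mem, uint32(limit))    // 6442450944 vs. 2147483648 after truncation
}
```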
|
@devimc You have @'d the wrong guy twice :)
|
@linzichang 😄 I'm sorry |
|
@sboeuf @linzichang @clarecch I'm working on this.
|
@miaoyq I need this feature on a short timescale. We have already talked about the solution in #624 (comment). I will implement some of the methods we both need for the memory update to help speed up this PR. Let's get this done quicker together. Thank you very much for reworking this :)
|
@miaoyq I think you should first take care of the memory footprint we mention in #580 (comment) #580 (comment) #580 (comment), because I haven't tested it thoroughly yet.
|
@linzichang I think you're a little bit more familiar with this, so in order not to affect your usage, you can rework this freely. :-)
|
Hi @linzichang @miaoyq, any updates on this? Thx!
|
@raravena80
|
@miaoyq @raravena80 I will open a new PR in the next few days.
|
@linzichang Thanks. :-)
|
Reworked in PR #786.
@linzichang I will close this PR, thanks!
At present, kata-runtime already supports memory hotplug on the hypervisor side; however, the pod resource limits in k8s cannot yet be satisfied.
This PR tries to satisfy that k8s use case.
Signed-off-by: Yanqiang Miao miao.yanqiang@zte.com.cn