Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

agent: add support for online memory and cpu separately#331

Merged
jodh-intel merged 1 commit intokata-containers:masterfrom
cedriccchen:master
Aug 28, 2018
Merged

agent: add support for online memory and cpu separately#331
jodh-intel merged 1 commit intokata-containers:masterfrom
cedriccchen:master

Conversation

@cedriccchen
Copy link
Contributor

@cedriccchen cedriccchen commented Aug 23, 2018

agent: add support for online memory and cpu separately.

In func OnlineCPUMem, cpu is always onlined,
which is not expected. So we add "CpuOnly"
field to support separating memory and cpu online.

Fixes #332

Signed-off-by: Clare Chen clare.chenhui@huawei.com
Signed-off-by: Zichang Lin linzichang@huawei.com

@cedriccchen
Copy link
Contributor Author

This PR is related to kata-containers/runtime#624

grpc.go Outdated

func (a *agentGRPC) onlineCPUMem(req *pb.OnlineCPUMemRequest) error {
if req.NbCpus <= 0 {
if req.NbCpus < 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@jodh-intel
Copy link

jodh-intel commented Aug 23, 2018

Hi @clarecch - thanks for raising!

It might be worth changing the commit message a little. Previously -- as you have said -- CPU onlining was always attempted (even when zero cpus were specified). But also, it looks like because of that initial test, previously memory was only ever onlined when CPUs were also onlined? With this change, we online memory "always", not only when the number of cpus is zero I think?

It might even be clearer to change the code to online the memory first - we could then keep the original req.NbCpus test?

lgtm

Approved with PullApprove

@devimc
Copy link

devimc commented Aug 23, 2018

CI is not happy, @clarecch can you fix it?

grpc.go:196:5:warning: unsigned values are never < 0 (SA4003) (staticcheck)

@cedriccchen
Copy link
Contributor Author

@devimc I will fix it as soon as possible.

@linzichang
Copy link
Contributor

@jodh-intel @devimc I think there're three solutions:

  1. The modify in this PR. It's very simple. But maybe not so graceful.
  2. Adding member to "req *pb.OnlineCPUMemRequest" to decide online cpu, mem or both.
  3. Separating onlineCPUMem to two function, onlineCPU and onlineMem.

@devimc
Copy link

devimc commented Aug 23, 2018

@linzichang I like the number 3

Separating onlineCPUMem to two function, onlineCPU and onlineMem.

@devimc
Copy link

devimc commented Aug 23, 2018

@clarecch what do you think ?

@jodh-intel
Copy link

Hi @linzichang - I agree and I think for clarity it makes sense to have separate OnlineCPUs() and OnlineMemory() rpc calls.

However, presumably we created the OnlineCPUMem() for efficiency, so that a single gRPC call could handle both? Is that right @sboeuf, @bergwolf, @gnawux?

Aside: on a general point, how about renaming OnlineCPUMem() to OnlineResources()?

@cedriccchen
Copy link
Contributor Author

cedriccchen commented Aug 23, 2018

@devimc I like number 3 too, but it seems to need a lot of modification

@jodh-intel
Copy link

Yep - I think option 3 is the "cleanest", but let's get some more input from others....

@codecov
Copy link

codecov bot commented Aug 23, 2018

Codecov Report

Merging #331 into master will decrease coverage by 0.23%.
The diff coverage is 62.5%.

@@            Coverage Diff             @@
##           master     #331      +/-   ##
==========================================
- Coverage   44.92%   44.68%   -0.24%     
==========================================
  Files          15       15              
  Lines        2393     2354      -39     
==========================================
- Hits         1075     1052      -23     
+ Misses       1176     1167       -9     
+ Partials      142      135       -7

@devimc
Copy link

devimc commented Aug 23, 2018

yep, I think hyper guys wrote this protocol
cc @bergwolf @laijs @gnawux

@bergwolf
Copy link
Member

@devimc @jodh-intel @linzichang @clarecch We put onlinecpu and onlinememory into one call since they are mostly called together, and we save one grpc call when they actually are. Also we did want to only use a small set of grpc APIs to keep it simple.

And if we split/rename the onlinecpumem API, it will break grpc backward compatibility. So I would suggest we go with option 2 and name the new field CpuOnly so that it defaults to false and old client can still work properly.

@cedriccchen
Copy link
Contributor Author

@bergwolf If we want to keep grpc backward compatibility from being broken, I think the option 2 is the only choice. There are two solutions I think:

  1. Add a new field named CpuOnly, use NbCpus to judge online cpu&memory or online memory only. In this case, we cannot keep the orign comparation, which is that if we want to online cpu while NbCpus=0 we should return an error.
  2. Add two new field named CpuOnline and MemOnline so that we can judge online cpu&memory or memory only or cpu only. We can keep the orign comparation at the same time.

@bergwolf
Copy link
Member

@clarecch, I don't think we need an extra field. The new grpc onlinecpumem handler can:

  1. NbCpus > 0: online CPU
  2. NbCpus == 0 && CpuOnly: return error
  3. !CpuOnly: online memory

And all cases are covered. WDYT?

@cedriccchen
Copy link
Contributor Author

@bergwolf Yes, this solution is quite good, which covers most cases, except one case NbCpus=0 and CpuOnly=false. In this case we don't know NbCpus=0 is an input error or really want hotplug memory only. Is that matters? ^^

@bergwolf
Copy link
Member

@clarecch I think we can rely on NbCpus>0 to say that we want to online CPU, and CpuOnly=false to say that we want to online memory, and do not specially care about the cpu/memory only cases. The combination of NbCpus<=0 && CpuOnly == true means we do not want to online cpu nor memory, in which case caller should not call the API in the first place, and we return error for it.

So to answer your question, NbCpus == 0 && CpuOnly == false means we only online memory. Maybe we should rename CpuOnly to SkipMemory to make it non-relevant to CPU ;). I wanted to make online memory the default behavior so I picked CpuOnly but it seems to cause more confusion than I expected.

@jodh-intel
Copy link

Adding a CpuOnly bool wfm...

diff --git a/protocols/grpc/agent.proto b/protocols/grpc/agent.proto
index b930d94..865fe0f 100644
--- a/protocols/grpc/agent.proto
+++ b/protocols/grpc/agent.proto
@@ -330,11 +330,13 @@ message ListRoutesRequest {
 message OnlineCPUMemRequest {
        // Wait specifies if the caller waits for the agent to online all resources.
        // If true the agent returns once all resources have been connected, otherwise all
        // resources are connected asynchronously and the agent returns immediately.
        bool wait = 1;
 
        // NbCpus specifies the number of CPUs that were added and the agent has to online.
        uint32 nb_cpus = 2;
+
+       bool cpu_only = 3;
 }

... as long as we document in that file every possible scenario (as we have done on this issue) to maximise understanding.

Related: #150.

@cedriccchen cedriccchen changed the title agent: add support for online memory only. agent: add support for online memory and cpu separately. Aug 25, 2018
@cedriccchen cedriccchen changed the title agent: add support for online memory and cpu separately. agent: add support for online memory and cpu separately Aug 25, 2018
@linzichang
Copy link
Contributor

I have no idea why the ubuntu-ci failed. May someone give a help?


func (a *agentGRPC) onlineCPUMem(req *pb.OnlineCPUMemRequest) error {
if req.NbCpus <= 0 {
if req.NbCpus == 0 && req.CpuOnly {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about NbCpus < 0? We should fail in such case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NbCpus is uint32. I think there is no need to consider NbCpus <0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, good point! I thought it was int32 as we had <= there ;)

@bergwolf
Copy link
Member

CI failed with

not ok 2 Running within memory constraints
# (from function `waitForProcess' in file ../../.ci/lib.sh, line 161,
#  in test file k8s-memory.bats, line 47)
#   `waitForProcess "$wait_time" "$sleep_time" "$pod_status_cmd"' failed

@cedriccchen
Copy link
Contributor Author

@bergwolf I have no idea what happen when Ci run k8s-memory.bats and failed. Could you please help to find out the reason, if convenient?

@caoruidong
Copy link
Member

This PR kata-containers/runtime#643 also fails on this test. But it just bumps version of golang. Kind of wired. I think there is something wrong with CI.

@bergwolf
Copy link
Member

bergwolf commented Aug 27, 2018

LGTM!

Approved with PullApprove

@bergwolf
Copy link
Member

CI green now. It needs one more approval on the protocol change though.

@caoruidong
Copy link
Member

Hold on. @clarecch Do you generate pb.go file by proto-gen-go tool?

@devimc
Copy link

devimc commented Aug 27, 2018

good catch @caoruidong! @clarecch please run hack/update-generated-agent-proto.sh to generate the pb.go file

In func OnlineCPUMem, cpu is always onlined,
which is not expected. So we add "CpuOnly"
field to support separating memory and cpu online.

Fixes #332

Signed-off-by: Clare Chen <clare.chenhui@huawei.com>
Signed-off-by: Zichang Lin <linzichang@huawei.com>
@cedriccchen
Copy link
Contributor Author

@devimc @caoruidong @bergwolf Commit is updated, and CI is green. PTAL!

@bergwolf
Copy link
Member

Thanks @clarecch ! Still need one more approval on the protocol change though. ping @kata-containers/agent

@caoruidong
Copy link
Member

caoruidong commented Aug 28, 2018

LGTM

Approved with PullApprove Approved with PullApprove

@jodh-intel
Copy link

jodh-intel commented Aug 28, 2018

Thanks @clarecch!

lgtm

Approved with PullApprove

@jodh-intel jodh-intel merged commit a0136e7 into kata-containers:master Aug 28, 2018
@@ -193,21 +193,25 @@ func updateContainerCpuset(cgroupPath string, newCpuset string, cookies cookie)
}

func (a *agentGRPC) onlineCPUMem(req *pb.OnlineCPUMemRequest) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@linzichang @bergwolf
For sure this patch works, but why don't we simply define two different functions for CPU and memory ?
This way, no need for specific flags, we would simply call the appropriate function:

  • onlineCPU
  • onlineMemory

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did discuss this a bit above (#331 (comment)) and I agree it would be a lot clearer (but potentially slower as multiple calls might be required).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can see that!
@bergwolf I understand the concern about multiple calls, but I think it's definitely negligible compared to the whole stack.
And about the API compatibility breakage, we better break it now to make it right, as the project is still not completely stable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants