Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

static-build: build the real static binary for clh#1099

Merged
likebreath merged 1 commit into
kata-containers:masterfrom
likebreath:fix_1033
Aug 27, 2020
Merged

static-build: build the real static binary for clh#1099
likebreath merged 1 commit into
kata-containers:masterfrom
likebreath:fix_1033

Conversation

@likebreath
Copy link
Copy Markdown
Contributor

@likebreath likebreath commented Jul 21, 2020

This patch reuses the 'dev_cli.sh' from cloud-hypervisor repo to build a
static binary of clh without creating our own docker image nor
installing extra dependencies.

Fixes: #1033

Signed-off-by: Bo Chen chen.bo@intel.com

Note: As we don't have full CI coverage in this repo, I am using a dummy PR from the tests repo (kata-containers/tests#2733) to verify this change is passing all CI jobs related to cloud-hypervisor.

@likebreath
Copy link
Copy Markdown
Contributor Author

/test-ubuntu

@likebreath
Copy link
Copy Markdown
Contributor Author

/test-clh

@likebreath
Copy link
Copy Markdown
Contributor Author

As pointed by @jcvenegas, the clh CI now is broken and is being fixed by two pending PRs (#1098 and kata-containers/runtime#2843). We will need to land those two PRs before we can merge this one.

git fetch || true
git checkout "${cloud_hypervisor_version}"
"${script_dir}/docker-build/build.sh"
./scripts/dev_cli.sh build --release --libc musl
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see the dev_cli.sh file in this repo. Are we adding it in this PR?
Also, prefer to use ${script_dir} instead os ./scripts since this script could be invoked from another directory.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This patch reuses the building script from clh repo: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/scripts/dev_cli.sh. It relies on the container image created/maintained as a part of the clh project. I think it is better to leverage their efforts, instead of maintaining another docker image to build clh in Kata.

Also, using the hard-coded path to /scripts/dev_cli.sh should be good given the context of this build-static-clh.sh . I assume this build-static-clh.sh will work independent of where it is invoked. Please correct me if I am wrong. Any other thoughts, please also let me know. :)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @sboeuf

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me. The relative path is within the repo we just pulled -- it's handled by the cd $repo_dir.

We just have to hope that they don't reorganize within their repos structure. If they do, we'd need to redo our build logic anyway. lgtm.

./scripts/dev_cli.sh build --release --libc musl
rm -f cloud-hypervisor
cp ./target/release/cloud-hypervisor .
cp build/cargo_target/x86_64-unknown-linux-musl/release/cloud-hypervisor .
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the docker file is removed below. Are the builds no longer performed inside a container?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the build is inside a container, but with the cloud-hypervisor source folder mounted. The final build artifact (static binary) will be in that location (build/cargo_target/x86_64-unknown-linux-musl/release/cloud-hypervisor relative to the clh src root). I chose to keep the cp after the build assuming this will make this change transparent to the rest of the CI/building flow.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace that hard-coded architecture with uname -m?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel Good point. Updated the PR.

@likebreath
Copy link
Copy Markdown
Contributor Author

/test-clh

@likebreath
Copy link
Copy Markdown
Contributor Author

/test-clh

@likebreath
Copy link
Copy Markdown
Contributor Author

/test

likebreath added a commit to likebreath/kata-tests that referenced this pull request Jul 22, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
likebreath added a commit to likebreath/kata-tests that referenced this pull request Jul 22, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
likebreath added a commit to likebreath/kata-tests that referenced this pull request Jul 22, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
likebreath added a commit to likebreath/kata-tests that referenced this pull request Jul 28, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@amshinde amshinde closed this Aug 12, 2020
@amshinde amshinde reopened this Aug 12, 2020
@amshinde
Copy link
Copy Markdown
Member

/test-clh
/test-ubuntu
/AzurePipelines run

likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 13, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@jcvenegas
Copy link
Copy Markdown
Member

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 13, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath
Copy link
Copy Markdown
Contributor Author

/test-ubuntu

@likebreath
Copy link
Copy Markdown
Contributor Author

/test-clh

likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 14, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath
Copy link
Copy Markdown
Contributor Author

/test-clh
/test-ubuntu
/AzurePipelines run

@jcvenegas
Copy link
Copy Markdown
Member

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@likebreath likebreath added the do-not-merge PR has problems or depends on another label Aug 14, 2020
@likebreath
Copy link
Copy Markdown
Contributor Author

Although the PR itself passed all necessary checks, we are still facing a metrics CI failure as tested by the dummy PR from the test repo (kata-containers/tests#2733). So I marked it as DNM for now.

The metrics CI is failing again on the check over the minimal memory footprint (and only for KSM enabled instances): 11:35:00 time="2020-08-14T18:35:00Z" level=warning msg="Failed Minval ( 269742.458500(minimal mean expected) > 269741.400000(minimal mean on dataset)) for [memory-footprint-ksm]". Full log is here: Log is here: http://jenkins.katacontainers.io/job/kata-metrics-tests-ubuntu-clh-18-04-PR/85/console.

One interesting observation is that the numbers are very close (1 out of 269000). The question is: how do we want to adjust the metrics CI to make sure it is working with static build of cloud-hypervisor? We either can 1) remove the check over the minimal memory footprint, or 2) adjust the reference number from the metrics CI. WDYT? @GabyCT @jcvenegas @amshinde @egernst

@egernst
Copy link
Copy Markdown
Member

egernst commented Aug 14, 2020

Once we understand the reason for the increased footprint, it would make sense to update the reference value.

We have more details on why? @likebreath @sameo?

@likebreath
Copy link
Copy Markdown
Contributor Author

Once we understand the reason for the increased footprint, it would make sense to update the reference value.

We have more details on why? @likebreath @sameo?

@egernst Thanks for the prompt response. My understanding is that the checks is failing not because of increased memory footprint, but the opposite: the actual minimal memory footprint now is smaller than the reference value. This is how I understand the error message: 11:35:00 time="2020-08-14T18:35:00Z" level=warning msg="Failed Minval ( 269742.458500(minimal mean expected) > 269741.400000(minimal mean on dataset)) for [memory-footprint-ksm]. Please correct me if I am wrong.

@likebreath
Copy link
Copy Markdown
Contributor Author

@kata-containers/packaging Please comment on the above discussions about how we want to proceed with the metrics CI failure, where the current minimum memory footprint is slightly smaller than the reference value.

Also, as we are preparing for next release, do we want to have it as a part of that release?

Thanks.

@amshinde
Copy link
Copy Markdown
Member

@likebreath Is the decrease in memory footprint only in case of KSM?
I thought that you were seeing an increase in memory footprint for static binary, which makes sense to me.

In any case, I feel we should remove the check on the min value, we should'nt really trigger a failure when we are seeing lower memory overhead. I would also readjust the min limits.
@egernst @kata-containers/packaging Any thoughts on this?

@likebreath
Copy link
Copy Markdown
Contributor Author

likebreath commented Aug 20, 2020

@likebreath Is the decrease in memory footprint only in case of KSM?

Right.

I thought that you were seeing an increase in memory footprint for static binary, which makes sense to me.

That is also what I thought when we got the initial metrics CI failure with this PR. For now (after several changes from the metrics CI/runtime/packaging), check in minval is the only failure.

In any case, I feel we should remove the check on the min value, we should'nt really trigger a failure when we are seeing lower memory overhead. I would also readjust the min limits.

This makes sense to me. Any pointers/help to adjust this in the CI?

@amshinde
Copy link
Copy Markdown
Member

This makes sense to me. Any pointers/help to adjust this in the CI?

@GabyCT would be the best person to help you here.

@chavafg
Copy link
Copy Markdown
Contributor

chavafg commented Aug 21, 2020

This is the file that needs to be updated: https://github.com/kata-containers/tests/blob/master/cmd/checkmetrics/ci_slaves/checkmetrics-json-clh-baremetal-kata-metric3.toml

@likebreath
Copy link
Copy Markdown
Contributor Author

FYI: the pending PR kata-containers/tests#2809 to loos the check on minval, which should resolve the metrics CI failure we have with this PR.

likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 25, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath
Copy link
Copy Markdown
Contributor Author

Please help re-run the AzurePipelines. Just Rebased on master.
I will run the CLH related CI jobs from the dummy PR kata-containers/tests#2733 to make sure all CI jobs are passing (especially the metrics CI).
/test

This patch reuses the 'dev_cli.sh' from cloud-hypervisor repo to build a
static binary of clh without creating our own docker image nor
installing extra dependencies.

Fixes: kata-containers#1033

Signed-off-by: Bo Chen <chen.bo@intel.com>
likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 25, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@likebreath
Copy link
Copy Markdown
Contributor Author

I see some instability of our clh-docker CI, which mostly can be resolved by restarting the CI. After talking with @jcvenegas, we believe it might be related to the seccomp filters defined in clh.

The static build of cloud-hypervisor is using a different set of syscalls from its dynamic build. The static build of clh is less tested in our kata CI, so we may see more instabilities when using the static build of clh. I am setting up a local environment to manually run the clh-docker CI tests, from which I am hoping to find the missing syscalls from the seccomp list in clh if any.

Meanwhile, for the sake of stability, we may need to keep using the dynamic build of clh in our current beta release of Kata.

Also, I am thinking we may want to disable the seccomp functionality of clh in Kata for now while we are completing the seccomp filter list based on Kata's CI jobs/workload.

/cc @egernst @jcvenegas @amshinde @sameo @sboeuf

likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 25, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099
Depends-on: github.com/kata-containers/runtime#2899

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
likebreath added a commit to likebreath/kata-tests that referenced this pull request Aug 25, 2020
Again this is a dummy PR to the test repo, and is used to validate the
changes to another PR from the packaging PR related to
cloud-hypervisor. The following linked issue is just a place holder.

Depends-on: github.com/kata-containers/packaging#1099
Depends-on: github.com/kata-containers/runtime#2900

Fixes: kata-containers#2546

Signed-off-by: Bo Chen <chen.bo@intel.com>
@sboeuf
Copy link
Copy Markdown

sboeuf commented Aug 26, 2020

@likebreath thanks for the summary about your investigations. I agree disabling seccomp while you try to find out the missing syscalls is acceptable.

@likebreath likebreath removed the do-not-merge PR has problems or depends on another label Aug 26, 2020
@likebreath
Copy link
Copy Markdown
Contributor Author

@egernst @amshinde Please run the AzurePipeline and I think the PR is ready for being merged.

The seccomp option of clh in kata-runtime is now disabled temperately in kata-runtime, I no longer see the instability of the clh-docker CI (tested in kata-containers/tests#2733). So I removed the DNM from this PR.

@jodh-intel
Copy link
Copy Markdown

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@likebreath likebreath merged commit 20a89c7 into kata-containers:master Aug 27, 2020
@likebreath likebreath added the port-to-2.0 PRs that need to be ported to kata 2.0-dev branch label Aug 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

port-to-2.0 PRs that need to be ported to kata 2.0-dev branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

clh: Use Static build of cloud-hypervisor

7 participants