Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

add kernel config for gpu#938

Merged
devimc merged 4 commits into
kata-containers:masterfrom
Jimmy-Xu:kernel-support-gpu
Mar 20, 2020
Merged

add kernel config for gpu#938
devimc merged 4 commits into
kata-containers:masterfrom
Jimmy-Xu:kernel-support-gpu

Conversation

@Jimmy-Xu
Copy link
Copy Markdown

@Jimmy-Xu Jimmy-Xu commented Feb 17, 2020

To support VFIO passthrough GPU (for example Nvidia Tesla P100)

  • enable mmio
  • enable load module(load NVIDIA Driver)

Usage example:

$ ./build-kernel.sh -v 4.19.86 -g nvidia -f setup
$ ./build-kernel.sh -v 4.19.86 -g nvidia build
$ sudo -E ./build-kernel.sh -v 4.19.86 -g nvidia install

-g <gpu_vendor> means enable gpu support, the option can be nvidia and intel

Output guest kernel:

/usr/share/kata-containers/vmlinux-nvidia-gpu.container -> vmlinux-4.19.86-70-nvidia-gpu
/usr/share/kata-containers/vmlinuz-nvidia-gpu.container -> vmlinuz-4.19.86-70-nvidia-gpu

To build NVIDIA Driver in kata container, kernel-devel is required:

$ cd kata-linux-4.19.86-70
$ make rpm-pkg

Output RPMs:

~/rpmbuild/RPMS/x86_64/kernel-devel-4.19.86_nvidia_gpu-1.x86_64.rpm

Comment thread kernel/configs/fragments/x86_64/gpu/version.conf Outdated
@Jimmy-Xu Jimmy-Xu force-pushed the kernel-support-gpu branch 2 times, most recently from f3503b8 to 673c94a Compare February 21, 2020 07:48
Copy link
Copy Markdown

@devimc devimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Jimmy-Xu

lgtm, but I'd like what other folks think about this

cc @grahamwhaley @amshinde @jcvenegas

@jodh-intel
Copy link
Copy Markdown

/test

@devimc
Copy link
Copy Markdown

devimc commented Feb 25, 2020

@Jimmy-Xu CI is not happy

ERROR: Please bump version in kernel/kata_config_version

@Jimmy-Xu Jimmy-Xu force-pushed the kernel-support-gpu branch 2 times, most recently from f078453 to b292ee3 Compare February 26, 2020 10:35
@Jimmy-Xu
Copy link
Copy Markdown
Author

/test

@chavafg
Copy link
Copy Markdown
Contributor

chavafg commented Feb 26, 2020

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@Jimmy-Xu
Copy link
Copy Markdown
Author

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Commenter does not have sufficient privileges for PR 938 in repo kata-containers/packaging

@jodh-intel
Copy link
Copy Markdown

@grahamwhaley, @jcvenegas - any idea what's up with Azure? ^^

@grahamwhaley
Copy link
Copy Markdown
Contributor

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@grahamwhaley
Copy link
Copy Markdown
Contributor

Let's try from me - it may be a perms/group thing.... that we can fix if we see that is true.

@grahamwhaley
Copy link
Copy Markdown
Contributor

Sooo, @jcvenegas @chavafg - how does the Azure pipe check user perms - @Jimmy-Xu is in the kata org, I guess maybe he would need to be in one of the teams as well?

@chavafg
Copy link
Copy Markdown
Contributor

chavafg commented Feb 27, 2020

@jcvenegas should know more, but I see the next note in the "Controlling the CI" wiki page

Note: Only works if you are a member of the packaging team and the phrase is the only text in the comment!

@chavafg
Copy link
Copy Markdown
Contributor

chavafg commented Feb 27, 2020

I see @Jimmy-Xu is already part of the packaging team, so not sure what else is missing.

Comment thread kernel/configs/fragments/x86_64/gpu/iommu.conf Outdated
Comment thread kernel/configs/fragments/x86_64/gpu/vfio.conf Outdated
Comment thread kernel/configs/fragments/x86_64/gpu/x86.conf Outdated
@Jimmy-Xu Jimmy-Xu force-pushed the kernel-support-gpu branch 2 times, most recently from bae6d24 to d4f4319 Compare February 28, 2020 14:47
@Jimmy-Xu
Copy link
Copy Markdown
Author

/test

# Support to load driver module
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_SIG=y
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @grahamwhaley @devimc Dont remember if we ran into issues with CONFIG_MODULE_SIG in the past.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about this..... in theory I think MODULE_SIG is probably a good thing, as then you can only load modules actually built for your kernel.... but, it will add some overhead I think.
I was also thinking it would probably hard-tie the exact nvidia module used to the exact kernel you are loading... which, I'm not sure is the intention, or exactly what you don't want.

@Jimmy-Xu - can you tell us why you enabled it?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember, @Jimmy-Xu is this option really needed?

Copy link
Copy Markdown
Author

@Jimmy-Xu Jimmy-Xu Feb 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grahamwhaley @devimc
There is a parameter CONFIG_CRYPTO_FIPS in kernel/configs/fragments/common/crypto.conf.
It depends on CRYPTO [=y] && (CRYPTO_ANSI_CPRNG [=y] || CRYPTO_DRBG [=n]) && !CRYPTO_MANAGER_DISABLE_TESTS [=n] && (MODULE_SIG [=n] || !MODULES [=y])
Therefore, if CONFIG_MODULES is enabled, CONFIG_MODULE_SIG is also required.

If CONFIG_CRYPTO_FIPS is removed, CONFIG_MODULE_SIG can also be removed.

I ran a test and after removing these two parameters the Nvidia GPU worked fine in a kata container.

Is the CONFIG_CRYPTO_FIPS really needed?

On the other hand, I think it's safer to use CONFIG_MODULE_SIG.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I remember now .... @amshinde added FIPS, which is somewhat entangled with MODULE. There is a conversation over at #891 (comment) where I dug into it...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grahamwhaley No wonder that config looked familiar :) Now that I see the comments on the related issue, I am fine with having CONFIG_MODULE_SIG.

@Jimmy-Xu Jimmy-Xu force-pushed the kernel-support-gpu branch from 88f48d4 to 5680e62 Compare March 14, 2020 07:56
@Jimmy-Xu
Copy link
Copy Markdown
Author

@devimc Ah, I see.
I can't put the gpu directory under x86_64 due to https://github.com/kata-containers/packaging/blob/master/obs-packaging/linux-container/debian.rules#L27
Fixed.
Thank you!

@Jimmy-Xu
Copy link
Copy Markdown
Author

/test

@Jimmy-Xu
Copy link
Copy Markdown
Author

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

@jcvenegas
Copy link
Copy Markdown
Member

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@Jimmy-Xu
Copy link
Copy Markdown
Author

@sameo
kata-containers/documentation/pull/615 is blocked on this PR, can you PTAL?

@grahamwhaley
Copy link
Copy Markdown
Contributor

I'm going to stick a DNM label on here whilst we do a little bit of review....

@grahamwhaley grahamwhaley added the do-not-merge PR has problems or depends on another label Mar 19, 2020
Copy link
Copy Markdown
Contributor

@grahamwhaley grahamwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, looking very good @Jimmy-Xu :-)
A few minor comments.
I think we should probably add some documentation in https://github.com/kata-containers/packaging/blob/master/kernel/README.md about the command line arguments though - something we have been missing so far, but now I think with the addition of the gpu item, it is more important we have them.
/cc @jcvenegas

Comment thread kernel/build-kernel.sh Outdated
kernel_config_path=""
#Force generate config when setup
force_setup_generate_config="false"
#Nvidia kernel support
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Nvidia specific, or does it also work for other GPUs?
We should either fix the comment, or change the variable name I suspect.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think GPU kernel support sounds more generic

Copy link
Copy Markdown
Author

@Jimmy-Xu Jimmy-Xu Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. support Intel and Nvidia GPU, -g intel, -g nvidia

Comment thread kernel/build-kernel.sh
info "kernel path does not exist, will download kernel"
download_kernel="true"
[ -n "$kernel_version" ] || die "failed to get kernel version: Kernel version is emtpy"
if [[ "${force_setup_generate_config}" != "true" ]];then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would have added the 'force build' and 'bash debug' bits as separate commits in this PR, just to make it clearer for the reviewers ;-)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment thread kernel/build-kernel.sh Outdated
vmlinuz="vmlinuz-${kernel_version}-${config_version}"
vmlinux="vmlinux-${kernel_version}-${config_version}"

sufix=""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - I think you inherited this, but - s/sufix/suffix/

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread kernel/build-kernel.sh
kernel_config_path="${OPTARG}"
;;
d)
PS4=' Line ${LINENO}: '
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sweet ;-)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS5 is coming 😉

Copy link
Copy Markdown
Author

@Jimmy-Xu Jimmy-Xu Mar 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PS4 defined the $LINENO. So I used PS4 here.

@devimc
Copy link
Copy Markdown

devimc commented Mar 19, 2020

thanks @Jimmy-Xu , just a couple of comments/questions and we could merge this

Jimmy Xu added 3 commits March 20, 2020 02:18
Add option '-g' in build-kernel.sh to build a guest kernel that supports Intel/Nvidia GPU

Fixes: kata-containers#979

Signed-off-by: Jimmy Xu <junming.xjm@antfin.com>
Add option '-d' in build-kernel.sh to enable bash debug.

Signed-off-by: Jimmy Xu <junming.xjm@antfin.com>
Add option '-f' in build-kernel.sh to force the generation of .config

Signed-off-by: Jimmy Xu <junming.xjm@antfin.com>

n 请为您的变更输入提交说明。以 '#' 开始的行将被忽略,而一个空的提交
@Jimmy-Xu Jimmy-Xu force-pushed the kernel-support-gpu branch from 5680e62 to 0a875ff Compare March 19, 2020 18:49
Add usage of build-kernel.sh to the readme

Signed-off-by: Jimmy Xu <junming.xjm@antfin.com>
@Jimmy-Xu Jimmy-Xu force-pushed the kernel-support-gpu branch from 0a875ff to 12d351d Compare March 19, 2020 18:54
@Jimmy-Xu
Copy link
Copy Markdown
Author

/test

@devimc
Copy link
Copy Markdown

devimc commented Mar 19, 2020

/AzurePipelines run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 2 pipeline(s).

@jodh-intel
Copy link
Copy Markdown

Hi @Jimmy-Xu - when you are happy, please remove the do-not-merge label (as that is currently stopping this PR from being merged).

@Jimmy-Xu Jimmy-Xu removed the do-not-merge PR has problems or depends on another label Mar 20, 2020
@Jimmy-Xu
Copy link
Copy Markdown
Author

Jimmy-Xu commented Mar 20, 2020

@jodh-intel do-not-merge label has been removed:)

Jimmy-Xu pushed a commit to Jimmy-Xu/documentation that referenced this pull request Mar 20, 2020
@devimc devimc merged commit 607931c into kata-containers:master Mar 20, 2020
dong-liuliu pushed a commit to dong-liuliu/kata-documentation that referenced this pull request Mar 30, 2020
dong-liuliu pushed a commit to dong-liuliu/kata-documentation that referenced this pull request Mar 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants