support-vsock: load vhost_vsock module if it isn't built-in #1513
jodh-intel merged 3 commits into kata-containers:master from
Conversation
|
/test |
|
Hi~ @grahamwhaley @chavafg @devimc @jodh-intel sorry to bother. I know a lot of CIs failed. I really tried to find the failing spot, but I literally have no clue. Could anyone lend a hand here? Thanks a lot!!!!!! ;) |
|
Hi @Pennyzct :-) Yeah, the error is not obvious ;-) We should also get some opinions on whether we should be auto-loading the |
|
@grahamwhaley I don't recall the discussion about the decision, but the main drawback is that having this |
|
Hi~ @grahamwhaley soooooo many thanks for the help!!!! The ARM CI passing also confirms the result. |
|
Hi~ @sboeuf sorry to bother, could you please explain what a "hot path" is?? Or just link me to a similar issue?? Thanks a lot. ;) |
By "hot path" I mean the code path that is used when calling into
That being said, if the modprobe is not too slow and if others think that is not too critical, then we could consider having the |
|
Hi~ @sboeuf thanks for the detailed explanation! ;) |
|
@Pennyzct - you should probably use the report tool to do a before/after comparison of boot times (
I have to agree that if it has an impact (and I think it probably will, as it has to exec() another binary which goes and does some work), then we may not want to do the probe. |
|
Hi~ @grahamwhaley Do you think it is also a feasible solution? ;) |
|
Hi @Pennyzct - looks feasible. Yes, running >=20 instances and taking the average is the right way, as we do get spikes due to other system activity etc. Yours:
/me very happy to see somebody running metrics :-) |
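The advice above — average over many runs because individual boot times spike with other system activity — can be sketched as below. This is a minimal Go illustration with made-up sample values, not the actual metrics tooling:

```go
package main

import (
	"fmt"
	"math"
)

// meanStddev returns the mean and population standard deviation of xs.
// Comparing means (with their spread) over >=20 runs smooths out spikes
// caused by unrelated system activity.
func meanStddev(xs []float64) (mean, stddev float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		stddev += (x - mean) * (x - mean)
	}
	stddev = math.Sqrt(stddev / float64(len(xs)))
	return
}

func main() {
	// Hypothetical boot-time samples in seconds, before and after the patch.
	before := []float64{1.92, 2.10, 1.98, 2.35, 2.01}
	after := []float64{1.95, 2.08, 2.40, 1.99, 2.03}
	mb, sb := meanStddev(before)
	ma, sa := meanStddev(after)
	fmt.Printf("before: %.3f±%.3f s, after: %.3f±%.3f s\n", mb, sb, ma, sa)
}
```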
Force-pushed from a0a4dc7 to 48485ab
|
Hi~ @grahamwhaley @sboeuf I have changed my solution, from probing every time a container is created to probing only the first time, if on the same host machine. ;) I'm afraid that if I use the report tool @grahamwhaley suggested, it couldn't provide the same environment before each run. ;) |
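Within a single process, the "probe only the first time" idea could be sketched with `sync.Once` as below. All names here are illustrative, not the actual kata-runtime code, and a real change would also need to handle the runtime being re-exec'd per container (e.g. via on-disk state):

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// Cache the vsock probe so repeated checks within one process pay the
// cost only once (illustrative sketch, not the kata-runtime implementation).
var (
	probeOnce    sync.Once
	vsockSupport bool
)

func hostSupportsVsocks() bool {
	probeOnce.Do(func() {
		// Cheap existence check; a real probe might fall back to
		// `modprobe vhost_vsock` when the device node is absent.
		_, err := os.Stat("/dev/vhost-vsock")
		vsockSupport = err == nil
	})
	return vsockSupport
}

func main() {
	fmt.Println(hostSupportsVsocks())
}
```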
|
Hi~ @grahamwhaley @sboeuf After (with my patch loaded): I will try to find an x86_64 machine to do the comparison. ;) |
|
/test |
|
@Pennyzct - you should be able to use the report tool to tell if the PR makes a difference to the real use case - that is, the module would only get inserted once for the first container, but the check would happen for every container - so you would do something like:

# Have the last kata release installed
$ modprobe -r *vsock* (or whatever, to ensure we have a clean start - maybe even a reboot)
$ grabdata.sh -t
$ mkdir ../results/run1
$ mv ../results/*.json ../results/run1/
# Now check out, build and install your PR into your system
$ modprobe -r *vsock* (or whatever, to ensure we have a clean start - maybe even a reboot)
$ grabdata.sh -t
$ mkdir ../results/run2
$ mv ../results/*.json ../results/run2/
$ ./makereport.sh
# and view report in output dir

The idea is to compare the current version or HEAD performance with the performance when the new code is installed. You do not want to remove the module between each container launch, as that is not how it would work in 'the real world' - the module would only get loaded once, the first time the probe is done. What we are really checking for is whether the module/vsock check makes any difference to the performance. |
|
It looks to me that the
10:20:02 /usr/local/bin/kata-runtime
10:20:02 Logging kata-env information:
10:20:02 Build step 'Execute shell' marked build as failure |
|
Hi~ @grahamwhaley I found an x86_64 machine and patched my code. When |
|
Hi~ @grahamwhaley this is the performance report on AArch64. Since it is a pdf report, it isn't convenient to paste it here, so I uploaded it to my own repository Pennyzct/metrics_report; you can see it there. |
|
/test |
|
/test |
|
Hi~ @grahamwhaley finally I know why a lot of CIs failed; I'm getting this error output from the metrics CI: |
|
Ah, good spot @Pennyzct - that might be because I think @jcvenegas added some extra debug output to the logs in the fail case? Which machine (distro/arch) is that on? /cc @devimc |
|
Hi~ @grahamwhaley Besides the old kernel on the metrics CI (vsock needs 4.8+), the others failed because |
|
I'm wondering: do all the failed CIs run in a VMware guest? If that's the reason, maybe we should check if it is |
|
Good question @Pennyzct. The other question that occurs to me is: is a failure to insert the module a fatal error? I think it probably should not be. So, in theory the |
|
/test |
|
Hi~ @grahamwhaley same error three times. I'm afraid this is not sporadic. Have you seen this error anywhere besides here? |
|
Hi @Pennyzct. Aha, so, if you look in the .json files generated as part of the metrics tests (they are attached as artifacts to the Jenkins builds), then you find this in them:

},
"kata-env" :
warning: check host vhost_vsock ability failed: modprobe -i vhost_vsock failed: modprobe: FATAL: Module vhost_vsock not found in directory /lib/modules/4.4.0-134-generic
{

which is invalid JSON :-) The big question then is, I think: what should we do if the |
|
Agreed - Speaking of which, tal at how |
|
Any update on this @Pennyzct? |
|
Hi~ @jodh-intel I've been offline attending the Open Infrastructure Summit. I'll try to update asap. ;) |
|
/test |
|
Hi~ Sorry for the delayed update. ;) @grahamwhaley @egernst @devimc
I updated the |
|
/test |
|
|
// SupportsVsocks returns true if vsocks are supported, otherwise false
func SupportsVsocks() bool {
	if _, err := os.Stat(VSockDevicePath); err != nil {
		return false
	}
	return true
}
It looks like you can remove VSockDevicePath entirely now as it's only being used by the tests?
yes, already updated. ;)
|
@mnaser, @chavafg - and the same problem on Ubuntu 16.04: |
Force-pushed from 1b29131 to c7cdf16
|
Hi~ @grahamwhaley @jodh-intel For a better understanding of the |
|
/test |
|
On sles 12, I see that the issue is: maybe the sles kernel version does not support vsocks? I don't see the Ubuntu failure... Well, it failed, but on the integration tests... |
QEMU opens /dev/vhost-vsock and this causes vhost_vsock.ko to be automatically loaded. So, checking the existence of /dev/vhost-vsock is enough.

Fixes: kata-containers#1512

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Since we prefer vsock over the virtio serial port, we add 'vhost_vsock' to the kernel modules list. But vhost_vsock.ko shouldn't be a strictly required kernel module; after all, we could also use the virtio serial port. If kata-env shows SupportsVsocks as false, users can run kata-check to manually load vhost_vsock.ko and get detailed info (errors).

Fixes: kata-containers#1512

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
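The "optional kernel module" idea in this commit can be sketched as below. The types and names are hypothetical, not the actual kata-runtime kernelModule struct: a missing optional module degrades the feature (fall back to virtio serial) instead of failing the whole check.

```go
package main

import "fmt"

// kernelModule marks whether a module is strictly required or merely
// preferred (hypothetical sketch of the idea, not the kata-runtime type).
type kernelModule struct {
	name     string
	required bool
}

// checkModules fails only when a *required* module is missing; missing
// optional modules just produce a note about the fallback.
func checkModules(loaded map[string]bool, mods []kernelModule) error {
	for _, m := range mods {
		if loaded[m.name] {
			continue
		}
		if m.required {
			return fmt.Errorf("required module %s is not loaded", m.name)
		}
		fmt.Printf("optional module %s is not loaded; falling back\n", m.name)
	}
	return nil
}

func main() {
	mods := []kernelModule{
		{name: "kvm", required: true},
		{name: "vhost_vsock", required: false},
	}
	if err := checkModules(map[string]bool{"kvm": true}, mods); err != nil {
		fmt.Println("check failed:", err)
	}
}
```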
We should refine the unit tests that involve func SupportsVsocks and the newly reconstructed struct kernelModule.

Fixes: kata-containers#1512

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
|
/test |
|
Looks like the sles CI has stalled before it's even been scheduled (so there is no restart link). All other CIs have passed. This code looks distro-agnostic to me, so let's just merge to avoid having to re-crank every CI for that one issue... |
Description of problem
On a few host machines, when vhost_vsock isn't built-in but is loadable, the value SupportsVsocks from kata-env still outputs false. We add modprobe -i vhost_vsock in func SupportsVsocks to try to avoid the above scenario.

Expected result
Actual result
Related: #1195 #581
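The check originally proposed in the description above can be sketched as follows. The function name is illustrative; `modprobe -i` (ignore install commands) is the invocation quoted in the description. Note this exec()s modprobe on every call, which is the hot-path cost discussed in the review:

```go
package main

import (
	"fmt"
	"os/exec"
)

// supportsVsocksViaModprobe reports vsock support by attempting to load the
// module: a zero exit status means vhost_vsock is built-in or now loaded.
// The -i flag tells modprobe to ignore install commands in its configuration.
func supportsVsocksViaModprobe() bool {
	return exec.Command("modprobe", "-i", "vhost_vsock").Run() == nil
}

func main() {
	fmt.Println(supportsVsocksViaModprobe())
}
```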