This repository was archived by the owner on May 12, 2021. It is now read-only.

support-vsock: load vhost_vsock module if it isn't built-in #1513

Merged
jodh-intel merged 3 commits into kata-containers:master from Pennyzct:vsock
May 14, 2019

@Pennyzct
Contributor

Description of problem

On some host machines, where vhost_vsock isn't built in but is loadable as a module, kata-env still reports SupportVSocks as false.
We add modprobe -i vhost_vsock to func SupportsVsocks to try to avoid this scenario.

Expected result

[Host]
  Kernel = "5.0.7"
  Architecture = "arm64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "18.04"
  [Host.CPU]
    Vendor = "3rd Party Limited"
    Model = "v8"

Actual result

[Host]
  Kernel = "5.0.7"
  Architecture = "arm64"
  VMContainerCapable = true
  SupportVSocks = false
  [Host.Distro]
    Name = "Ubuntu"
    Version = "18.04"
  [Host.CPU]
    Vendor = "3rd Party Limited"
    Model = "v8"

Related: #1195 #581

@Pennyzct
Contributor Author

/test

@Pennyzct
Contributor Author

Hi~ @grahamwhaley @chavafg @devimc @jodh-intel sorry to bother you. I know a lot of the CI jobs failed. I really tried to find the failing spot, but I have no clue. Could anyone lend a hand here? Thanks a lot! ;)

@grahamwhaley
Contributor

Hi @Pennyzct :-) Yeah, the error is not obvious ;-)
I think what has happened is the script has tried to run kata-runtime kata-env, and that has failed, and thus failed the build. Could you try to run that locally?

Logging kata-env information:
Build step 'Execute shell' marked build as failure

We should also get some opinion on if we should be auto-loading the vsock module, and execing a modprobe call. I think we probably originally decided to not do this when we first wrote the code - now we need to discuss/remember why...
/cc @sboeuf @egernst @gnawux @jodh-intel

@sboeuf

sboeuf commented Apr 10, 2019

@grahamwhaley I don't recall the discussion about the decision, but the main drawback is that having this modprobe performed by kata-runtime would affect the "hot path" of creating a container, which would affect performance.

@Pennyzct
Contributor Author

Hi~ @grahamwhaley thanks so much for the help!
On AArch64 there's no problem. Here is the output from my local machine:

root@entos-thunderx2-desktop:~# kata-runtime kata-env
[Meta]
  Version = "1.0.21"

[Runtime]
  Debug = true
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  Path = "/usr/local/bin/kata-runtime"
  [Runtime.Version]
    Semver = "1.7.0-alpha0"
    Commit = "1684332d059f3698f4a1a7f6d18cf32c4f79a732"
    OCI = "1.0.1-dev"
  [Runtime.Config]
    Path = "/etc/kata-containers/configuration.toml"

[Hypervisor]
  MachineType = "virt"
  Version = "QEMU emulator version 3.0.92 (v3.1.0-rc2-dirty)\nCopyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers"
  Path = "/snap/kata-containers/140/usr/bin/qemu-system-aarch64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  Msize9p = 8192
  MemorySlots = 10
  Debug = true
  UseVSock = true

[Image]
  Path = ""

[Kernel]
  Path = "/snap/kata-containers/140/usr/share/kata-containers/vmlinuz-4.19.28.container"
  Parameters = "agent.log=debug initcall_debug"

[Initrd]
  Path = "/usr/share/kata-containers/kata-containers-alpine-3.7-osbuilder-edd7d9c-agent-74639b7.initrd"

[Proxy]
  Type = "noProxy"
  Version = ""
  Path = ""
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.6.1-39896627000d0305490e26440880a68f766ce45c"
  Path = "/snap/kata-containers/140/usr/libexec/kata-containers/kata-shim"
  Debug = true

[Agent]
  Type = "kata"

[Host]
  Kernel = "5.0.7"
  Architecture = "arm64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "18.04"
  [Host.CPU]
    Vendor = "3rd Party Limited"
    Model = "v8"

[Netmon]
  Version = "kata-netmon version 1.6.1"
  Path = "/snap/kata-containers/current/usr/libexec/kata-containers/kata-netmon"
  Debug = true
  Enable = false

The ARM CI passing also confirms the result.
I will try to find an x86_64 machine to run the test. ;)

@Pennyzct
Contributor Author

Hi~ @sboeuf sorry to bother, could you please explain what is "hot path"?? or just link me the similar issue?? a lot thanks. ;)

@sboeuf

sboeuf commented Apr 11, 2019

@Pennyzct

could you please explain what is "hot path"?? or just link me the similar issue??

By "hot path" I mean the code path that is used when calling into kata-runtime create. This is a critical path since we expect some performance from there, and by having the module loading part of this code flow, we might slow it down.

That being said, if the modprobe is not too slow and if others think it is not too critical, then we could consider having the kata-runtime create command check whether the module is loaded, and load it if it is not.

@Pennyzct
Contributor Author

Hi~ @sboeuf thanks for the detailed explanation! ;)

@grahamwhaley
Contributor

@Pennyzct - you should probably use the report tool to do a before/after comparison of boot times (grabdata.sh -t with the report tool), so we can see if there is any visible impact to the probe.

I have to agree, that if it has an impact (and I think it probably will, as it has to exec() another binary which goes and does some work), then we may not want to do the probe.

@Pennyzct
Contributor Author

Hi~ @grahamwhaley
Thanks for the report tool, I learn something new every day. ;) I will try it and paste the comparison later.
Previously, I was writing a simple test to catch the difference in the execution time of kata create.
Here is the code. It just measures the execution time of the same kata create 100 times and takes the mean.

#!/bin/bash
# Time `kata-runtime create` 100 times and print the mean, in seconds.
count=0.000000
for ((i = 1; i <= 100; i++)); do
    startTime=$(date +"%s.%N")
    kata-runtime create --bundle /tmp/bundle067106738 vsock
    endTime=$(date +"%s.%N")
    # Let awk subtract the timestamps directly. Splitting out the %N field
    # and feeding it to shell arithmetic breaks on leading zeros (octal).
    value=$(awk -v end="$endTime" -v start="$startTime" 'BEGIN { printf "%.6f", end - start }')
    echo "$value"
    count=$(awk -v x1="$value" -v x2="$count" 'BEGIN { printf "%.6f", x1 + x2 }')
    echo "$count"
    kata-runtime delete vsock
done
mean=$(awk -v x1="$count" 'BEGIN { printf "%.6f\n", x1 / 100 }')
echo "mean: $mean"

Do you think it is also a feasible solution? ;)

@grahamwhaley
Contributor

Hi @Pennyzct - looks feasible. Yes, running >=20 instances and taking the average is the right way, as we do get spikes due to other system activity etc.
That test is quite similar to the one the report tool makes use of: https://github.com/kata-containers/tests/blob/master/metrics/time/launch_times.sh
Be careful that you may be measuring the complete create/run/delete cycle - you may not want to measure that 'delete' bit ;-)

Yours:

  • uses the runtime directly, whereas the launch_times uses docker. In some ways that is good, as it just measures kata parts.
  • Only measures the complete cycle. the launch_times tries to measure some other times as well - time for the kernel to boot, time to get to the workload etc.

/me very happy to see somebody running metrics :-)
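
As a hedged illustration of the advice above (take at least 20 samples, and keep the delete step out of the measurement), here is a minimal Go sketch. `meanDuration` is a hypothetical helper, not part of the metrics tooling; the two commands are the same `kata-runtime create` and `kata-runtime delete` calls used in the script above, and the bundle directory is taken from the command line.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// meanDuration runs step n times and returns the average wall-clock time of
// step alone. cleanup runs after every sample and is not timed.
func meanDuration(n int, step, cleanup func() error) (time.Duration, error) {
	if n <= 0 {
		return 0, fmt.Errorf("need at least one sample, got %d", n)
	}
	var total time.Duration
	for i := 0; i < n; i++ {
		start := time.Now()
		err := step()
		total += time.Since(start)
		if err != nil {
			return 0, fmt.Errorf("sample %d: %w", i+1, err)
		}
		if err := cleanup(); err != nil {
			return 0, fmt.Errorf("cleanup %d: %w", i+1, err)
		}
	}
	return total / time.Duration(n), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: timecreate <bundle-dir>")
		return
	}
	bundle := os.Args[1]
	create := func() error {
		return exec.Command("kata-runtime", "create", "--bundle", bundle, "vsock").Run()
	}
	remove := func() error {
		return exec.Command("kata-runtime", "delete", "vsock").Run()
	}
	mean, err := meanDuration(20, create, remove)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("mean create time over 20 runs: %.3fs\n", mean.Seconds())
}
```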

Pennyzct force-pushed the vsock branch 2 times, most recently from a0a4dc7 to 48485ab on April 12, 2019 06:35
@Pennyzct
Contributor Author

Pennyzct commented Apr 12, 2019

Hi~ @grahamwhaley @sboeuf I have changed my solution: instead of probing every time a container is created, it now probes only the first time on a given host machine. ;)
I will still do the performance test using the following script. It runs modprobe -r vhost_vsock after every iteration to make sure each run starts from the same environment:

#!/bin/bash

ns() {
        s=$(echo $1 | cut -d '.' -f 1)
        n=$(echo $1 | cut -d '.' -f 2)
        ns=$(( (s * 1000000000) + n ))

        echo $ns
}

count=0.000000
for((i=1;i<=20;i++))
do
startTime=`date +"%-s.%-N"`
kata-runtime create --bundle /tmp/bundle067106738 vsock
endTime=`date +"%-s.%-N"`
startTime=$(ns $startTime)
endTime=$(ns $endTime)
value=$(echo "scale=3 ; ($endTime - $startTime) / 1000000000" | bc)
echo $value
count=$(echo "scale=3 ; $count + $value" | bc)
kata-runtime delete vsock
while :
do
        refs=$(lsmod | awk '$1 == "vhost_vsock" { print $3 }')
        # Nothing to unload if the module is not loaded at all.
        [ -z "$refs" ] && break
        if [ "$refs" == "0" ]; then
                modprobe -r vhost_vsock
                break
        fi
        sleep 0.1
done
done
mean=$(echo "scale=3 ; $count / 20" | bc)
echo "mean: $mean"

I'm afraid that if I use the report tool @grahamwhaley suggested, it can't provide the same environment before each run. ;)

@Pennyzct
Contributor Author

Hi~ @grahamwhaley @sboeuf
On AArch64, each run has 20 samples (values in seconds).
Before:

2.924
2.942
2.948
2.942
2.941
2.944
2.932
2.931
2.944
2.941
2.939
2.951
2.947
2.942
2.936
2.952
2.942
2.937
2.939
2.932
mean: 2.940

After (with my patch applied):

2.942
2.962
2.962
2.948
2.954
2.950
2.954
2.946
2.943
2.940
2.939
2.950
2.950
2.931
2.958
2.956
2.941
2.960
2.946
2.943
mean: 2.948

I will try to find one x86_64 machine to do the comparison. ;)
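
To make the comparison explicit, the two sample sets above can be averaged without the truncation that `bc` applies with `scale=3`. The sketch below copies the posted samples, converted to milliseconds so the sums stay exact; the variable and function names are only illustrative.

```go
package main

import "fmt"

// Samples copied from the two runs above, converted from seconds to ms.
var before = []int{
	2924, 2942, 2948, 2942, 2941, 2944, 2932, 2931, 2944, 2941,
	2939, 2951, 2947, 2942, 2936, 2952, 2942, 2937, 2939, 2932,
}

var after = []int{
	2942, 2962, 2962, 2948, 2954, 2950, 2954, 2946, 2943, 2940,
	2939, 2950, 2950, 2931, 2958, 2956, 2941, 2960, 2946, 2943,
}

// sum adds up the samples using integer arithmetic.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// meanMs returns the arithmetic mean in milliseconds.
func meanMs(xs []int) float64 {
	return float64(sum(xs)) / float64(len(xs))
}

func main() {
	b, a := meanMs(before), meanMs(after)
	fmt.Printf("before: %.2f ms\n", b)
	fmt.Printf("after:  %.2f ms\n", a)
	fmt.Printf("delta:  %.2f ms (%.2f%%)\n", a-b, (a-b)/b*100)
}
```

The means come out as 2940.30 ms and 2948.75 ms, a difference of 8.45 ms (about 0.3%). Note that this run removed the module after every iteration, so the module load cost is paid on each of the 20 creates.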

@Pennyzct
Contributor Author

/test

@grahamwhaley
Contributor

@Pennyzct - you should be able to use the report tool to tell if the PR makes a difference to the real use case - that is, the module would only get inserted once for the first container, but the check would happen for every container - so you would do something like:

# Have the last kata release installed
$ modprobe -r *vsock* (or whatever, to ensure we have a clean start - maybe even a reboot)
$ grabdata.sh -t
$ mkdir ../results/run1
$ mv ../results/*.json ../results/run1/
# Now check out, build and install your PR into your system
$ modprobe -r *vsock* (or whatever, to ensure we have a clean start - maybe even a reboot)
$ grabdata.sh -t
$ mkdir ../results/run2
$ mv ../results/*.json ../results/run2/
$ ./makereport.sh
# and view report in output dir

The idea is to compare the current version or HEAD performance, and the performance when the new code is installed. You do not want to remove the module between each container launch, as that is not how it would work in 'the real world' - the module would only get loaded once, the first time the probe is done. What we are really checking for is if the module/vsock check makes any difference to the performance.

@grahamwhaley
Contributor

It looks to me that the kata-runtime kata-env is still failing, on the metrics machine at least?

10:20:02 /usr/local/bin/kata-runtime
10:20:02 Logging kata-env information:
10:20:02 Build step 'Execute shell' marked build as failure

@Pennyzct
Contributor Author

Pennyzct commented Apr 15, 2019

Hi~ @grahamwhaley I found an x86_64 machine and applied my patch.
Whether use_vsock is turned on or not, kata-runtime kata-env works well.
When use_vsock is turned on:

$ kata-runtime kata-env                
[Meta]
  Version = "1.0.21"

[Runtime]
  Debug = false
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  Path = "/usr/local/bin/kata-runtime"
  [Runtime.Version]
    Semver = "1.7.0-alpha0"
    Commit = "8d711073fe242ff01aee89e540eee1633705f92f"
    OCI = "1.0.1-dev"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration-qemu.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  Msize9p = 8192
  MemorySlots = 10
  Debug = false
  UseVSock = true

[Image]
  Path = "/usr/share/kata-containers/kata-containers-clearlinux-28830-osbuilder-edd7d9c-agent-7720b93.img"

[Kernel]
  Path = "/usr/share/kata-containers/vmlinuz-4.19.28-33"
  Parameters = "init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.mask=systemd-journald.service systemd.mask=systemd-journald.socket systemd.mask=systemd-journal-flush.service systemd.mask=systemd-udevd.service systemd.mask=systemd-udevd.socket systemd.mask=systemd-udev-trigger.service systemd.mask=systemd-timesyncd.service systemd.mask=systemd-update-utmp.service systemd.mask=systemd-tmpfiles-setup.service systemd.mask=systemd-tmpfiles-cleanup.service systemd.mask=systemd-tmpfiles-cleanup.timer systemd.mask=tmp.mount systemd.mask=systemd-random-seed.service"

[Initrd]
  Path = ""

[Proxy]
  Type = "noProxy"
  Version = ""
  Path = ""
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.6.0-rc2-ea27044"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = false

[Agent]
  Type = "kata"

[Host]
  Kernel = "4.15.0-43-generic"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "18.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"

[Netmon]
  Version = "kata-netmon version 1.7.0-alpha0"
  Path = "/usr/libexec/kata-containers/kata-netmon"
  Debug = false
  Enable = false

when use_vsock is turned off:

$ kata-runtime kata-env
[Meta]
  Version = "1.0.21"

[Runtime]
  Debug = false
  Trace = false
  DisableGuestSeccomp = true
  DisableNewNetNs = false
  Path = "/usr/local/bin/kata-runtime"
  [Runtime.Version]
    Semver = "1.7.0-alpha0"
    Commit = "8d711073fe242ff01aee89e540eee1633705f92f"
    OCI = "1.0.1-dev"
  [Runtime.Config]
    Path = "/usr/share/defaults/kata-containers/configuration-qemu.toml"

[Hypervisor]
  MachineType = "pc"
  Version = "QEMU emulator version 2.11.0\nCopyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers"
  Path = "/usr/bin/qemu-lite-system-x86_64"
  BlockDeviceDriver = "virtio-scsi"
  EntropySource = "/dev/urandom"
  Msize9p = 8192
  MemorySlots = 10
  Debug = false
  UseVSock = false

[Image]
  Path = "/usr/share/kata-containers/kata-containers-clearlinux-28830-osbuilder-edd7d9c-agent-7720b93.img"

[Kernel]
  Path = "/usr/share/kata-containers/vmlinuz-4.19.28-33"
  Parameters = "init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.mask=systemd-journald.service systemd.mask=systemd-journald.socket systemd.mask=systemd-journal-flush.service systemd.mask=systemd-udevd.service systemd.mask=systemd-udevd.socket systemd.mask=systemd-udev-trigger.service systemd.mask=systemd-timesyncd.service systemd.mask=systemd-update-utmp.service systemd.mask=systemd-tmpfiles-setup.service systemd.mask=systemd-tmpfiles-cleanup.service systemd.mask=systemd-tmpfiles-cleanup.timer systemd.mask=tmp.mount systemd.mask=systemd-random-seed.service"

[Initrd]
  Path = ""

[Proxy]
  Type = "kataProxy"
  Version = "kata-proxy version 1.6.0-rc2-d4aa0b2"
  Path = "/usr/libexec/kata-containers/kata-proxy"
  Debug = false

[Shim]
  Type = "kataShim"
  Version = "kata-shim version 1.6.0-rc2-ea27044"
  Path = "/usr/libexec/kata-containers/kata-shim"
  Debug = false

[Agent]
  Type = "kata"

[Host]
  Kernel = "4.15.0-43-generic"
  Architecture = "amd64"
  VMContainerCapable = true
  SupportVSocks = true
  [Host.Distro]
    Name = "Ubuntu"
    Version = "18.04"
  [Host.CPU]
    Vendor = "GenuineIntel"
    Model = "Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz"

[Netmon]
  Version = "kata-netmon version 1.7.0-alpha0"
  Path = "/usr/libexec/kata-containers/kata-netmon"
  Debug = false
  Enable = false

@Pennyzct
Contributor Author

Pennyzct commented Apr 15, 2019

Hi~ @grahamwhaley this is the performance report on AArch64. Since it is a PDF report, it isn't convenient to paste here, so I uploaded it to my own repository, Pennyzct/metrics_report, where you can view it.
The repository also has report_orig.sh, for collecting the current version (HEAD) performance, and report.sh, for collecting the performance with my code installed.

@Pennyzct
Contributor Author

/test

@Pennyzct
Contributor Author

/test

@Pennyzct
Contributor Author

Hi~ @grahamwhaley I finally know why a lot of the CI jobs failed. This is the error output from the metrics CI:
modprobe -i vhost_vsock failed: modprobe: FATAL: Module vhost_vsock not found in directory /lib/modules/4.4.0-134-generic

@grahamwhaley
Contributor

Ah, good spot @Pennyzct - that might be as I think @jcvenegas added some extra debug in the fail case to the logs? Which machine (distro/arch) is that on? /cc @devimc

@Pennyzct
Contributor Author

Pennyzct commented Apr 15, 2019

Hi~ @grahamwhaley Besides the old kernel on the metrics CI (vsock needs 4.8+), the others failed because of modprobe -i vhost_vsock failed: modprobe: ERROR: could not insert 'vhost_vsock': Operation not permitted.
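
The kernel version condition mentioned above (vsock needing 4.8+) could be checked with something like the following sketch. `kernelAtLeast` is a hypothetical helper, not a function from the runtime; the release strings in `main` are the ones that appear in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a release string such as "4.4.0-134-generic"
// is at least major.minor. Only the first two numeric fields are compared.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// Drop any suffix after the minor number, e.g. the "-rc1" in "0-rc1".
	minorStr := parts[1]
	notDigit := func(r rune) bool { return r < '0' || r > '9' }
	if i := strings.IndexFunc(minorStr, notDigit); i >= 0 {
		minorStr = minorStr[:i]
	}
	mnr, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	for _, rel := range []string{"4.4.0-134-generic", "4.15.0-43-generic", "5.0.7"} {
		ok, err := kernelAtLeast(rel, 4, 8)
		fmt.Printf("%s: at least 4.8 = %v (err: %v)\n", rel, ok, err)
	}
}
```

Of the three kernels, only the metrics CI one (4.4.0-134-generic) falls below 4.8.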

@Pennyzct
Contributor Author

I'm wondering, do all the failed CI jobs run in a VMware guest?
Copying from VSOCK.md, we need to

sudo systemctl stop vmware-tools
sudo modprobe -r vmw_vsock_vmci_transport
sudo modprobe -i vhost_vsock

If that's the reason, maybe we should check whether we are RunningOnVMM. Doing the steps above in the runtime is too much, so we may just give users a heads-up and let them do the work themselves.

@grahamwhaley
Contributor

Good question @Pennyzct . The other question that occurs to me is - is a failure to insert the module a fatal error? I think it probably should not be. So, in theory the kata-runtime kata-env in the CI scripts probably should have 'worked', even if it cannot insert the vsock module for whatever reason.

@Pennyzct
Contributor Author

/test

@Pennyzct
Contributor Author

Pennyzct commented Apr 24, 2019

Hi~ @grahamwhaley

16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="Failed to run [jq .\"boot-times\".Results | .[] | .\"to-workload\".Result /home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests/metrics/results/boot-times.json][exit status 4]"
16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="[/home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests/metrics/results/boot-times.json][exit status 4]"
16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="Failed to run [jq .\"memory-footprint\".Results | .[] | .average.Result /home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests/metrics/results/memory-footprint.json][exit status 4]"
16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="[/home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests/metrics/results/memory-footprint.json][exit status 4]"
16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="Failed to run [jq .\"memory-footprint-ksm\".Results | .[] | .average.Result /home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests/metrics/results/memory-footprint-ksm.json][exit status 4]"
16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="[/home/jenkins/workspace/workspace/kata-metrics-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests/metrics/results/memory-footprint-ksm.json][exit status 4]"
16:46:58 time="2019-04-23T08:46:58Z" level=warning msg="Overall we failed"

same error three times. I'm afraid this is not sporadic. Do you find this error anywhere besides here?

@grahamwhaley
Contributor

Hi @Pennyzct. Aha, so, if you look in the .json files generated as part of the metrics tests (they are attached as artifacts to the Jenkins builds), then you find this in them:

},
	"kata-env" :
	warning: check host vhost_vsock ability failed: modprobe -i vhost_vsock failed: modprobe: FATAL: Module vhost_vsock not found in directory /lib/modules/4.4.0-134-generic
{

which, is invalid json :-)

The big question then is I think, what should we do if the modprobe fails? I suspect we should silently fail, as some systems may not have vsock.
I'm open for some input from @egernst @jodh-intel here as well :-)

@jodh-intel

Agreed - kata-env should always show as many details as possible. Failing would stop that happening unless we are super-careful about how we handle failures (as we are in kata-check).

Speaking of which, take a look at how kata-env displays Host.VMContainerCapable - it essentially calls kata-check. So we could modify kata-env to add a new Host.VSockCapable (or Host.VHostCapable, ...) option for example. If that shows as false, users could run kata-check to get the full details (and errors)?

@jodh-intel

Any update on this @Pennyzct?

@Pennyzct
Contributor Author

Pennyzct commented May 7, 2019

Hi~ @jodh-intel I've been offline attending the Open Infrastructure Summit. I'll try to update asap. ;)

@teawater
Member

teawater commented May 7, 2019

/test

@Pennyzct
Contributor Author

Pennyzct commented May 10, 2019

Hi~ Sorry for the delayed update. ;) @grahamwhaley @egernst @devimc
Thanks for the suggestion from @stefanha, quoting our discussion here,

This happens because /dev/vsock only exists after vhost_vsock.ko has been loaded. QEMU opens /dev/vhost-vsock and this causes vhost_vsock.ko to be loaded. Therefore SupportsVsocks() shouldn't call os.Stat(VSockDevicePath), just checking os.Stat(VHostVSockDevicePath) is enough.

Hi~ @stefanha I'm Penny. ;) Correct me if I'm wrong: as you say, even if vhost_vsock.ko hasn't been loaded on the host beforehand, we don't need to use modprobe -i vhost_vsock to load it before using it, because it will be loaded automatically when QEMU opens /dev/vhost-vsock.

@Pennyzct Hi Penny! Yes, that's correct since Linux 4.13. Earlier Linux versions required manual loading of vhost_vsock.ko.

I updated SupportsVsocks to only check /dev/vhost-vsock, and I added vhost_vsock.ko to the kernel modules checked by kata-check, just like @jodh-intel suggested, so that kata-check can be used to get the full details (and errors).
But I only cover two archs, amd64 and arm64, since I don't know the status of vsock on s390x and ppc64le.

@Pennyzct
Contributor Author

/test


// SupportsVsocks returns true if vsocks are supported, otherwise false
func SupportsVsocks() bool {
if _, err := os.Stat(VSockDevicePath); err != nil {

It looks like you can remove VSockDevicePath entirely now as it's only being used by the tests?

Contributor Author

yes, already updated. ;)

@jodh-intel

@chavafg, @mnaser - check out this sles 12 CI failure:

time="2019-05-10T08:40:21Z" level=error msg="ERROR: System is not capable of running Kata Containers" arch=amd64 name=kata-runtime pid=66647 source=runtime

@jodh-intel

@mnaser, @chavafg - and the same problem on Ubuntu 16.04:

09:25:16 time="2019-05-10T08:25:16Z" level=error msg="kernel property not found" arch=amd64 description="Host Support for Linux VM Sockets" name=vhost_vsock pid=15988 source=runtime type=module
09:25:16 time="2019-05-10T08:25:16Z" level=info msg="kernel property found" arch=amd64 description="Intel KVM" name=kvm_intel pid=15988 source=runtime type=module
09:25:16 time="2019-05-10T08:25:16Z" level=error msg="ERROR: System is not capable of running Kata Containers" arch=amd64 name=kata-runtime pid=15988 source=runtime

Pennyzct force-pushed the vsock branch 2 times, most recently from 1b29131 to c7cdf16 on May 13, 2019 07:50
@Pennyzct
Contributor Author

Hi~ @grahamwhaley @jodh-intel To make modprobe -i module failures easier to understand, I tried to deliver more specific error info in this patch series.
As for modprobe -i vhost_vsock, I found that if the host is itself running under a VMM, especially as a VMware guest, the whole insertion procedure is a bit complicated, so I also print a warning message to let users deal with this scenario manually.

@Pennyzct
Contributor Author

/test

@chavafg
Contributor

chavafg commented May 13, 2019

On sles 12, I see that the issue is:

time="2019-05-13T08:34:45Z" level=error msg="kernel property not found" arch=amd64 description="Host Support for Linux VM Sockets" name=vhost_vsock pid=64711 source=runtime type=module

maybe the sles kernel version does not support vsocks?

I don't see the Ubuntu failure... Well, it failed but on the integration tests...

Pennyzct added 3 commits May 14, 2019 13:30
QEMU opens /dev/vhost-vsock and this causes vhost_vsock.ko to be
automatically loaded.
So, checking the existence of /dev/vhost-vsock is enough.

Fixes: kata-containers#1512

Signed-off-by: Penny Zheng <penny.zheng@arm.com>

Since we prefer vsock over virtio serial port, we add 'vhost_vsock'
to the kernel modules list.
But vhost_vsock.ko shouldn't be a strictly required kernel module;
after all, we could also use the virtio serial port.
If kata-env shows SupportVSocks as false, users can run kata-check
to manually load vhost_vsock.ko and get detailed info (errors).

Fixes: kata-containers#1512

Signed-off-by: Penny Zheng <penny.zheng@arm.com>

We should refine the unit tests which involve func SupportsVsocks and the newly
restructured struct kernelModule.

Fixes: kata-containers#1512

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
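
The second commit message describes vhost_vsock as a module that is checked but not strictly required. A hedged sketch of that distinction follows; the `kernelModule` fields and the `countMissingRequired` helper are assumptions made for illustration and are not the struct from this PR. The module descriptions are the ones shown in the CI logs above.

```go
package main

import (
	"fmt"
	"os"
)

// kernelModule is a hypothetical shape for a kata-check entry.
type kernelModule struct {
	desc     string
	required bool
}

// countMissingRequired returns how many required modules are absent. A
// missing optional module only produces a warning on stderr.
func countMissingRequired(mods map[string]kernelModule, present func(string) bool) int {
	missing := 0
	for name, m := range mods {
		if present(name) {
			continue
		}
		if m.required {
			missing++
			fmt.Fprintf(os.Stderr, "error: required module %s (%s) not found\n", name, m.desc)
		} else {
			fmt.Fprintf(os.Stderr, "warning: optional module %s (%s) not found\n", name, m.desc)
		}
	}
	return missing
}

func main() {
	mods := map[string]kernelModule{
		"kvm_intel":   {desc: "Intel KVM", required: true},
		"vhost_vsock": {desc: "Host Support for Linux VM Sockets", required: false},
	}
	// Pretend only vhost_vsock is missing, as on the Ubuntu 16.04 CI host.
	present := func(name string) bool { return name != "vhost_vsock" }
	if n := countMissingRequired(mods, present); n > 0 {
		fmt.Println("System is not capable of running Kata Containers")
		os.Exit(1)
	}
	fmt.Println("System is capable of running Kata Containers")
}
```

With the module marked optional, a host without vhost_vsock still passes the check and can fall back to the virtio serial port.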
@Pennyzct
Contributor Author

/test

@jodh-intel

Looks like the sles CI has stalled but before it's been scheduled (so there is no restart link). All other CI's have passed. This code looks distro-agnostic to me so let's just merge to avoid having to re-crank every CI for that one issue...

jodh-intel merged commit 576b8a5 into kata-containers:master May 14, 2019
ganeshmaharaj mentioned this pull request Jun 3, 2019
9 participants