This repository was archived by the owner on May 12, 2021. It is now read-only.

virtcontainers: refactor device.go to device manager #282

Merged
egernst merged 4 commits into kata-containers:master from WeiZhang555:device-manager
May 8, 2018


@WeiZhang555
Member

Fixes #50

This is done to decouple the device management part from other parts.
It separates device.go into several directories and files:

```
virtcontainers/device
├── api
│   └── interface.go
├── config
│   └── config.go
├── drivers
│   ├── block.go
│   ├── generic.go
│   ├── utils.go
│   ├── vfio.go
│   ├── vhost_user_blk.go
│   ├── vhost_user.go
│   ├── vhost_user_net.go
│   └── vhost_user_scsi.go
└── manager
    ├── manager.go
    └── utils.go
```
  • api contains the interface definitions for device management: upper-level callers
    should import and use the interfaces, and lower-level code should implement them.
    It is the bridge between device drivers and callers.
  • config contains structured exported data.
  • drivers contains the specific device drivers, including block, VFIO and vhost-user
    devices.
  • manager exposes an external management package with a DeviceManager.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
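The dependency direction described above can be illustrated with a minimal Go sketch: callers hold only the `api`-level interfaces, and a driver implements them by asking a receiver to hotplug it. This is a simplified model, not the code in this PR: the method sets are assumptions, the device type is a plain string instead of `config.DeviceType`, and `recordingReceiver` is a hypothetical stand-in for the `Sandbox`.

```go
package main

import "fmt"

// ---- api layer (simplified, hypothetical method sets) ----

// DeviceReceiver is whatever a device gets plugged into.
type DeviceReceiver interface {
	HotplugAddDevice(dev Device, devType string) error
	HotplugRemoveDevice(dev Device, devType string) error
}

// Device is the interface every driver implements.
type Device interface {
	Attach(r DeviceReceiver) error
	Detach(r DeviceReceiver) error
	DeviceType() string
}

// ---- drivers layer ----

// BlockDevice is a toy driver: it only knows how to ask a receiver to plug it.
type BlockDevice struct {
	HostPath string
}

func (b *BlockDevice) DeviceType() string { return "block" }

func (b *BlockDevice) Attach(r DeviceReceiver) error {
	return r.HotplugAddDevice(b, b.DeviceType())
}

func (b *BlockDevice) Detach(r DeviceReceiver) error {
	return r.HotplugRemoveDevice(b, b.DeviceType())
}

// ---- caller side ----

// recordingReceiver stands in for the Sandbox and records hotplug calls.
type recordingReceiver struct {
	log []string
}

func (r *recordingReceiver) HotplugAddDevice(dev Device, devType string) error {
	r.log = append(r.log, "add "+devType)
	return nil
}

func (r *recordingReceiver) HotplugRemoveDevice(dev Device, devType string) error {
	r.log = append(r.log, "remove "+devType)
	return nil
}

func main() {
	var dev Device = &BlockDevice{HostPath: "/dev/sdb"}
	r := &recordingReceiver{}
	if err := dev.Attach(r); err != nil {
		fmt.Println("attach failed:", err)
		return
	}
	if err := dev.Detach(r); err != nil {
		fmt.Println("detach failed:", err)
		return
	}
	fmt.Println(r.log)
}
```

Because `BlockDevice` only talks to the `DeviceReceiver` interface, the driver package in this sketch does not need to import any sandbox or container code.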

@amshinde amshinde added the review label May 2, 2018
@WeiZhang555
Member Author

CI still has some problems; I'm working on resolving the test case failures.

@sboeuf sboeuf requested a review from amshinde May 2, 2018 15:38
@WeiZhang555 WeiZhang555 force-pushed the device-manager branch 4 times, most recently from 98ce4fd to caebef2 on May 3, 2018 14:19
@WeiZhang555 WeiZhang555 changed the title [WIP] virtcontainers: refactor device.go to device manager virtcontainers: refactor device.go to device manager May 3, 2018
@WeiZhang555
Member Author

I think the code is ready for review, but CI is still failing. I guess I broke something, but I can't find the exact problem.

```
# time="2018-05-03 15:22:18.236110022Z" level=debug msg="ExecRequest &ExecRequest{ContainerId:62db7baa8f4bbb932f141de79fe3c1d226a8eaec23a8668308fcb42240a85be0,Cmd:[env],Tty:false,Stdin:false,Stdout:true,Stderr:true,}"
# time="2018-05-03T15:22:18Z" level=fatal msg="execing command in container failed: unable to upgrade connection: 404 page not found"
# 1
# rm: cannot remove '/tmp/tmp.LnOZCkmCCS/crio-run/devicemapper-containers/8e219ccc8357ddb06fb754d4f7ffd3aa25af2945986580b5d92624ab10c65412/userdata/shm': Device or resource busy
# rm: cannot remove '/tmp/tmp.LnOZCkmCCS/crio/devicemapper/mnt/4c80bed2c0250903123348ef400653aca641cc615639b8389321a57a5897cbf6': Device or resource busy
# rm: cannot remove '/tmp/tmp.LnOZCkmCCS/crio/devicemapper/mnt/1c36578b9b17d9fe463014ec12826dabf34f6bb81d9ef921df0298aeaeb40b7a': Device or resource busy
Build timed out (after 5 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :Build was aborted : True
Could not match :Build step 'Execute shell' marked build as failure  : False
Logical operation result is TRUE
Running script  : #!/bin/bash

export GOPATH=$WORKSPACE/go

cd $GOPATH/src/github.com/kata-containers/tests
.ci/teardown.sh "$WORKSPACE"
[kata-containers-runtime-ubuntu-16-04-PR] $ /bin/bash /tmp/jenkins5969148565439900524.sh
~/jenkins_slave/workspace/kata-containers-runtime-ubuntu-16-04-PR ~/jenkins_slave/workspace/kata-containers-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests
~/jenkins_slave/workspace/kata-containers-runtime-ubuntu-16-04-PR/go/src/github.com/kata-containers/tests
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Archiving artifacts
Setting status of caebef228f5cf1642afdcb6d9fd48437c906d665 to FAILURE with url http://kata-jenkins-ci.westus2.cloudapp.azure.com/job/kata-containers-runtime-ubuntu-16-04-PR/370/ and message: 'Build finished. '
Using context: jenkins-ci-ubuntu-16-04
Finished: ABORTED
```

Does anyone happen to know why? @sboeuf @amshinde @jodh-intel

@bergwolf
Member

bergwolf commented May 3, 2018

@WeiZhang555 I've seen this error with other, unrelated PRs too. I think it is a transient error, since it disappeared after the Jenkins jobs were re-triggered.

@codecov

codecov bot commented May 4, 2018

Codecov Report

Merging #282 into master will decrease coverage by 0.73%.
The diff coverage is 36.23%.


```diff
@@            Coverage Diff             @@
##           master     #282      +/-   ##
==========================================
- Coverage   65.36%   64.62%   -0.74%
==========================================
  Files          76       84       +8
  Lines        8185     8043     -142
==========================================
- Hits         5350     5198     -152
- Misses       2262     2284      +22
+ Partials      573      561      -12
```

| Impacted Files | Coverage Δ |
|---|---|
| virtcontainers/hypervisor.go | 72.78% <ø> (+0.36%) ⬆️ |
| virtcontainers/mount.go | 80.17% <ø> (+0.31%) ⬆️ |
| virtcontainers/device/drivers/vhost_user_net.go | 0% <0%> (ø) |
| virtcontainers/device/drivers/generic.go | 0% <0%> (ø) |
| virtcontainers/device/drivers/vhost_user_scsi.go | 0% <0%> (ø) |
| virtcontainers/device/drivers/vhost_user.go | 0% <0%> (ø) |
| virtcontainers/device/drivers/utils.go | 0% <0%> (ø) |
| virtcontainers/hyperstart_agent.go | 60.71% <0%> (+1.68%) ⬆️ |
| virtcontainers/device/drivers/block.go | 0% <0%> (ø) |
| virtcontainers/device/drivers/vhost_user_blk.go | 0% <0%> (ø) |

... and 31 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 81503d7...f4a453b.

@WeiZhang555
Member Author

@bergwolf @sboeuf @amshinde @egernst
Wow, CI is all green now. I guess those were random failures yesterday.

Can you help review this so I can continue with the work on #245?

I know this PR is large and most of us would prefer not to merge refactoring before 1.0, but without it #245 would be too ugly, and I believe you wouldn't want to merge it.


```go
// DeviceReceiver is an interface used for accepting devices
// a device should be attached/added/plugged to a DeviceReceiver
type DeviceReceiver interface {
```
Member

@WeiZhang555 I'm trying to understand why you need Sandbox to implement these interfaces. Is it because we do not export proper hypervisor APIs? The layering looks a bit strange but if it is temporary until we have a hypervisor module, I'm ok with it.

Member Author

Yes, this is temporary; if we can build a good hypervisor module, things will be easier.
We need more refactoring, including separating out the hypervisor management part.
The existence of GetAndSetSandboxBlockIndex() and DecrementSandboxBlockIndex() means we have to use Sandbox as the DeviceReceiver, but a Hypervisor would actually be a better DeviceReceiver.

The implementation is neither complete nor perfect and needs further work, but I don't have enough time to do more.

Member

Agreed, this interface currently looks strange and forced, combining the calls for Sandbox and Hypervisor.
@WeiZhang555 I understand that you are trying to decouple devices completely from the rest of the virtcontainers components, but I would like to understand more clearly why this is required and how it will help the storage hotplug case. What would storage hotplug look like without this decoupling?

Like you pointed out, devices do depend on calls from both the sandbox and the hypervisor. I am not sure we can get away with decoupling from the hypervisor completely, as we may have to call into some hypervisor-specific details.

Member

@amshinde Devices should only need to depend on the hypervisor once we have a hypervisor module. They call into the sandbox only because the hypervisor is part of the sandbox right now. There are two dependencies on the sandbox: the hotplug API and virtio-scsi device index management. The hotplug API is just a wrapper around the hypervisor hotplug API, and the virtio-scsi device index should be moved inside the hypervisor as well, because it is hypervisor-specific data.

If we all agree this code needs refactoring/decoupling, this looks like a reasonable step towards a properly modular set of components in virtcontainers.

Member Author

WeiZhang555 commented May 5, 2018

Thanks @bergwolf for answering this for me! You said exactly what I want to say!

The problem I ran into while working on the storage hotplug feature is that devices, sandbox, hypervisor and containers are all tightly coupled. All I want to do is add a "device" to the "hypervisor", and at that point I don't even need a container, so the first thing I want to do is remove "container" from the device manager API.

This PR tries to separate out the "hotplug device to hypervisor" part, but as @bergwolf said, the block index lives in the sandbox structure, so I have to use both sandbox and sandbox.hypervisor. That's why the sandbox, rather than the hypervisor, is the "DeviceReceiver". In my head the hypervisor could be a good enough DeviceReceiver; that can be done in the future, but it needs more modifications.
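A rough sketch of the arrangement described here, where `Sandbox` satisfies `DeviceReceiver` only because it owns the block index, while its hotplug methods simply forward to the hypervisor. It is a simplified model under stated assumptions: devices are plain strings, `fakeHypervisor` and `attachBlock` are hypothetical names, and any persistence of sandbox state done by the real `GetAndSetSandboxBlockIndex()` is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// DeviceReceiver: two hotplug calls plus the block-index helpers that force
// Sandbox into this role. Signatures are simplified assumptions.
type DeviceReceiver interface {
	HotplugAddDevice(dev string) error
	HotplugRemoveDevice(dev string) error
	GetAndSetSandboxBlockIndex() (int, error)
	DecrementSandboxBlockIndex() error
}

// fakeHypervisor stands in for sandbox.hypervisor.
type fakeHypervisor struct {
	failAdd bool
	plugged []string
}

func (h *fakeHypervisor) hotplugAddDevice(dev string) error {
	if h.failAdd {
		return errors.New("hotplug rejected by hypervisor")
	}
	h.plugged = append(h.plugged, dev)
	return nil
}

func (h *fakeHypervisor) hotplugRemoveDevice(dev string) error {
	for i, p := range h.plugged {
		if p == dev {
			h.plugged = append(h.plugged[:i], h.plugged[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("device %q is not plugged", dev)
}

// Sandbox is the receiver only because it owns blockIndex; the hotplug
// methods are thin wrappers around the hypervisor.
type Sandbox struct {
	hypervisor *fakeHypervisor
	blockIndex int
}

func (s *Sandbox) HotplugAddDevice(dev string) error    { return s.hypervisor.hotplugAddDevice(dev) }
func (s *Sandbox) HotplugRemoveDevice(dev string) error { return s.hypervisor.hotplugRemoveDevice(dev) }

// GetAndSetSandboxBlockIndex returns the current index and advances it.
func (s *Sandbox) GetAndSetSandboxBlockIndex() (int, error) {
	idx := s.blockIndex
	s.blockIndex++
	return idx, nil
}

// DecrementSandboxBlockIndex gives back the most recently reserved index.
func (s *Sandbox) DecrementSandboxBlockIndex() error {
	if s.blockIndex == 0 {
		return errors.New("block index is already zero")
	}
	s.blockIndex--
	return nil
}

// attachBlock is what a block driver would do: reserve an index, hotplug,
// and give the index back if hotplug fails (assumes no concurrent callers).
func attachBlock(r DeviceReceiver, dev string) (int, error) {
	idx, err := r.GetAndSetSandboxBlockIndex()
	if err != nil {
		return -1, err
	}
	if err := r.HotplugAddDevice(dev); err != nil {
		if derr := r.DecrementSandboxBlockIndex(); derr != nil {
			return -1, fmt.Errorf("%v (and rollback failed: %v)", err, derr)
		}
		return -1, err
	}
	return idx, nil
}

func main() {
	s := &Sandbox{hypervisor: &fakeHypervisor{}}
	for _, dev := range []string{"/dev/sdb", "/dev/sdc"} {
		idx, err := attachBlock(s, dev)
		fmt.Println(dev, idx, err)
	}
}
```

Moving `blockIndex` and the two index methods onto the hypervisor type would let the hypervisor satisfy the interface directly, which is the direction discussed in this thread.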

@amshinde

Member

@bergwolf @WeiZhang555 Thanks for the explanation, it makes sense to me now. I agree we should move the virtio-scsi index to the hypervisor.
@WeiZhang555 Can we create an issue for this so that we can keep track of it?

```go
HotplugAddDevice(Device, config.DeviceType) error
HotplugRemoveDevice(Device, config.DeviceType) error

// this is only for virtio-blk support
```
Member

bergwolf commented May 4, 2018

This is really only needed for virtio-scsi now that we have merged #267. And hopefully we can remove this in future by refactoring the scsi lun management code.

Member Author

I agree with you. These two interfaces really don't belong here.
This part needs more refactoring, as you said; I am trying to keep this PR a little simpler so that it won't be too hard to review.

Member

@WeiZhang555 Then this comment needs to be updated to mention virtio-scsi.

```go
var devices []api.Device

for _, devInfo := range devInfos {
	hostPath, err := config.GetHostPathFunc(devInfo)
```
Member

Please move this inside dm.CreateDevice() so that the two dm APIs accept the same DeviceInfo structure.

Member Author

There's one thing I'm not sure about here: config.GetHostPathFunc can return a different hostPath, and NewDevices() modifies devInfo.HostPath, while CreateDevice() doesn't modify devInfo. That is a difference between NewDevices() and CreateDevice() (whether hostPath is modified or not).

Were these two APIs designed that way? Since I'm not certain, I didn't change it.
Do you think it's safe to move it inside dm.CreateDevice()?

Member

CreateDevice() is only called inside NewDevices(), so I think it is safe, and we should let them share similar semantics w.r.t. DeviceInfo, unless you have other plans to use CreateDevice() differently in your storage hotplug PR.

Member Author

Oh, you're right! And I think then we only need to expose NewDevices and don't need to expose CreateDevice now!
Great suggestion! I will modify this part.

Member Author

@bergwolf Updated

@WeiZhang555 WeiZhang555 force-pushed the device-manager branch 2 times, most recently from e3456c9 to 6458833 on May 4, 2018 13:22
```go
)

// Defining these as a variable instead of a const, to allow
// overriding this in the tests.
```
Member

Not sure if this comment was in the original code, but I wanted to say that I appreciate you being explicit here about why you're not using a const! Thanks!

Member Author

Haha, this is from the original code. But I appreciate it too; that's why I kept it after the refactor :-)

```go
// FIXME: this is duplicate code from virtcontainers/hypervisor.go
const maxDevIDSize = 31

// FIXME: this is duplicate code from virtcontainers/hypervisor.go
```
Member

Sorry, why can't we remove this from hypervisor.go, @WeiZhang555?

Member Author

Because my original idea is that we don't expose the virtcontainers/device/drivers package directly to the hypervisor at all (or at least not too much of it), and I don't want hypervisor.go to invoke drivers.MakeNameID(); that's why I didn't remove it from hypervisor.go.

Maybe we can find a better way to handle these common functions with more refactoring and further work; currently I don't have a clear idea for this.

Member

@WeiZhang555 This is really a generic function for creating name ids that will be used by the hypervisor, not just for device specific name-ids.

```go
	return devices, nil
}

// NewDeviceManager creates a deviceManager object bahaved as DeviceManager
```
Member

nit: s/bahaved/behaved

Member Author

Oh, nice catch! Will modify!

Member

egernst left a comment

Looks pretty good to me, minus a couple of queries/nits. I still have a bit more I want to look into in this (large) change, and @amshinde will definitely need to review it.

Thanks for the big effort, @WeiZhang555


```go
var devLogger = logrus.FieldLogger(logrus.New())

// SetLogger sets the logger for virtcontainers package.
```
Member

s/virtcontainers/device api/? package

```go
}

// FIXME: this is duplicate code from virtcontainers/hypervisor.go
const maxDevIDSize = 31
```
Member

This lived in virtcontainers/qemu_arch_base.go. Different hypervisor implementations will have their own specific limits, this should really be living in the hypervisor implementations.

Member Author

WeiZhang555 commented May 5, 2018

I'll move makeNameID and maxDevIDSize to the virtcontainers/utils package then; otherwise there will be a dependency cycle if I use it from the virtcontainers package. Later, if we get a hypervisor management package, we can set a different maxDevIDSize there if necessary.
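A generic helper along these lines could live in virtcontainers/utils, with the length limit passed in so that each hypervisor can supply its own. The signature with a `maxSize` parameter and the `qemuMaxDevIDSize` name are hypothetical; only the limit of 31 comes from this PR.

```go
package main

import "fmt"

// qemuMaxDevIDSize reuses the limit of 31 from this PR; another hypervisor
// could pass its own value. (Name is hypothetical.)
const qemuMaxDevIDSize = 31

// MakeNameID builds "<namedType>-<id>" and truncates it to maxSize bytes.
// Byte slicing is fine for the ASCII hex IDs used for containers and devices.
// A maxSize <= 0 means "no limit".
func MakeNameID(namedType, id string, maxSize int) string {
	nameID := fmt.Sprintf("%s-%s", namedType, id)
	if maxSize > 0 && len(nameID) > maxSize {
		nameID = nameID[:maxSize]
	}
	return nameID
}

func main() {
	id := "62db7baa8f4bbb932f141de79fe3c1d226a8eaec23a8668308fcb42240a85be0"
	fmt.Println(MakeNameID("drive", id, qemuMaxDevIDSize))
}
```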


```go
// bindDevicetoVFIO binds the device to vfio driver after unbinding from host.
// Will be called by a network interface or a generic pcie device.
func bindDevicetoVFIO(bdf, hostDriver, vendorDeviceID string) error {
```
Member

@WeiZhang555 This should live in the vfio driver. Though it is currently only used by network.go, it will potentially be used by other devices in the future.

```go
// Sandbox implement DeviceReceiver interface from device/api/interface.go
func (s *Sandbox) HotplugAddDevice(device api.Device, devType config.DeviceType) error {
	switch devType {
	case config.DeviceVFIO:
```
Member

Could you add an assertion on VFIODevice here?

```go
func (s *Sandbox) HotplugRemoveDevice(device api.Device, devType config.DeviceType) error {
	switch devType {
	case config.DeviceVFIO:
		return s.hypervisor.hotplugRemoveDevice(device, vfioDev)
```
Member

Same as above: missing assertion.
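One way to add the requested assertion is a checked type assertion, shared by the add and remove paths, before anything reaches the hypervisor. The types below (`DeviceType`, `VFIODevice`, `fakeHypervisor`, `asVFIO`) are simplified, hypothetical stand-ins for the ones in this PR.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for config.DeviceType and api.Device.
type DeviceType string

const (
	DeviceVFIO  DeviceType = "vfio"
	DeviceBlock DeviceType = "block"
)

type Device interface {
	DeviceID() string
}

type VFIODevice struct{ BDF string }

func (d *VFIODevice) DeviceID() string { return d.BDF }

type BlockDevice struct{ Path string }

func (d *BlockDevice) DeviceID() string { return d.Path }

// fakeHypervisor records which VFIO devices are plugged.
type fakeHypervisor struct{ vfio map[string]bool }

type Sandbox struct{ hypervisor *fakeHypervisor }

// asVFIO asserts that a device declared as DeviceVFIO really is one, so a
// mismatch fails here instead of deep inside the hypervisor code.
func asVFIO(device Device) (*VFIODevice, error) {
	vfioDev, ok := device.(*VFIODevice)
	if !ok {
		return nil, fmt.Errorf("device %q is a %T, not a *VFIODevice", device.DeviceID(), device)
	}
	return vfioDev, nil
}

func (s *Sandbox) HotplugAddDevice(device Device, devType DeviceType) error {
	switch devType {
	case DeviceVFIO:
		vfioDev, err := asVFIO(device)
		if err != nil {
			return err
		}
		s.hypervisor.vfio[vfioDev.BDF] = true
		return nil
	default:
		return fmt.Errorf("hotplug of device type %q is not supported", devType)
	}
}

func (s *Sandbox) HotplugRemoveDevice(device Device, devType DeviceType) error {
	switch devType {
	case DeviceVFIO:
		vfioDev, err := asVFIO(device)
		if err != nil {
			return err
		}
		delete(s.hypervisor.vfio, vfioDev.BDF)
		return nil
	default:
		return fmt.Errorf("hot unplug of device type %q is not supported", devType)
	}
}

func main() {
	s := &Sandbox{hypervisor: &fakeHypervisor{vfio: map[string]bool{}}}
	fmt.Println(s.HotplugAddDevice(&VFIODevice{BDF: "0000:01:00.0"}, DeviceVFIO))
	fmt.Println(s.HotplugAddDevice(&BlockDevice{Path: "/dev/sdb"}, DeviceVFIO))
}
```

Using the two-value form `device.(*VFIODevice)` turns a mismatch between `devType` and the concrete device into an ordinary error instead of a panic.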



@WeiZhang555
Member Author

@amshinde Your comments are addressed.

Member

amshinde left a comment

lgtm

@amshinde
Member

amshinde commented May 7, 2018

@WeiZhang555 Thanks for addressing comments, can you rebase your changes?

Fixes kata-containers#50

This is done to decouple the device management part from other parts.
It separates device.go into several directories and files:

```
virtcontainers/device
├── api
│   └── interface.go
├── config
│   └── config.go
├── drivers
│   ├── block.go
│   ├── generic.go
│   ├── utils.go
│   ├── vfio.go
│   ├── vhost_user_blk.go
│   ├── vhost_user.go
│   ├── vhost_user_net.go
│   └── vhost_user_scsi.go
└── manager
    ├── manager.go
    └── utils.go
```

* `api` contains the interface definitions for device management: upper-level callers
should import and use the interfaces, and lower-level code should implement them.
It is the bridge between device drivers and callers.
* `config` contains structured exported data.
* `drivers` contains the specific device drivers, including block, VFIO and vhost-user
devices.
* `manager` exposes an external management package with a `DeviceManager`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>

`CreateDevice()` is only used by `NewDevices()`, so we can make it private;
there's no need to export it.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>

Fix typo.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>

* Move `makeNameID()` to the virtcontainers/utils file, as it's a generic
function for making names and IDs.
* Move `bindDevicetoVFIO()` and `bindDevicetoHost()` to the vfio driver package.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
@WeiZhang555
Member Author

@amshinde Thank you for helping review.

@egernst What do you think about this? Any more comments?

@egernst
Member

egernst commented May 8, 2018

Statically it looks fine. I was going to wait for CI to finish (wow, it is taking >51 minutes?)

@egernst egernst merged commit fa848ba into kata-containers:master May 8, 2018
@WeiZhang555 WeiZhang555 deleted the device-manager branch May 8, 2018 06:28
zklei pushed a commit to zklei/runtime that referenced this pull request Jun 13, 2019
Now cgroups are mounted at /sys/fs/cgroup; in the past, when initrd was used,
cgroups were mounted at /proc/cgroups.

fixes kata-containers#282

Signed-off-by: Julio Montes <julio.montes@intel.com>

4 participants