virtcontainers: support pre-add storage for frakti#301
virtcontainers: support pre-add storage for frakti#301WeiZhang555 merged 10 commits intokata-containers:masterfrom
Conversation
|
work in progress and replace #245 |
d112985 to
d119d8c
Compare
virtcontainers/container.go
Outdated
| return nil, err | ||
| } | ||
|
|
||
| // TODO: remove dependency of package drivers |
There was a problem hiding this comment.
If this TODO is going to be retained, please could you change it to reference a github issue so we don't forget to do this.
Same comment for the other TODO's.
There was a problem hiding this comment.
Addressed in #306 and self-assigned.
Will handle this soon
| attached := device.DevManager.AttachedDevices(device) | ||
| switch len(attached) { | ||
| case 0: | ||
| // this should never happen, report error if it occurs |
There was a problem hiding this comment.
Could you add a return fmt.Errorf("BUG: ...") or similar?
There was a problem hiding this comment.
Addressed: and find a wrong test case via fixing this!
| d.DevManager = dm | ||
| dm.devices[d.DeviceID()] = d | ||
| return d | ||
| case *drivers.GenericDevice: |
There was a problem hiding this comment.
As these three cases are identical, you could add explicit fallthrough's. Or something like:
switch d := dev.(type) {
case *drivers.BlockDevice:
case *drivers.VFIODevice:
case *drivers.GenericDevice:
default:
return nil
}
d.DevManager = dm
dm.devices[d.DeviceID()] = d
return d
There was a problem hiding this comment.
fallthrough isn't supported in type switch in my test. But latter one looks good, will modify, thank!
There was a problem hiding this comment.
Oops, sorry, just found that the latter one doesn't work either.
I need to use d in type switch, if I put the logic at below, it can't visit d anymore.
So have to keep the original way.
virtcontainers/hyperstart_agent.go
Outdated
| func fsMapFromDevices(devices []api.Device) ([]*hyperstart.FsmapDescriptor, error) { | ||
| var fsmap []*hyperstart.FsmapDescriptor | ||
| for _, device := range devices { | ||
| // skip if it's not block device |
There was a problem hiding this comment.
Nit: this comment is redundant I think as the test is very clear.
virtcontainers/kata_agent.go
Outdated
| } | ||
| d, ok := device.GetDeviceDrive().(*config.BlockDrive) | ||
| if !ok || d == nil { | ||
| k.Logger().Error("malformed block drive") |
There was a problem hiding this comment.
I'd add details of the invalid device. Something like:
k.Logger().WithField("device", device).Error("malformed block drive")
Same comment to error below.
virtcontainers/sandbox.go
Outdated
| // adding a group of VFIO devices | ||
| for _, dev := range vfioDevices { | ||
| if err = s.hypervisor.hotplugAddDevice(dev, vfioDev); err != nil { | ||
| s.Logger().WithError(err). |
There was a problem hiding this comment.
You could define a new logger variable here to avoid the WithFields() duplication:
logger := s.Logger().WithFields(...)
defer func () {
:
logger.WithError(rollbackErr).Error("failed to remove vfio device for rolling back")
:
}()
:
logger.WithError(err).Error("failed to hotplug VFIO device")
Same comment for code below.
There was a problem hiding this comment.
Oops, this is in for loop, and I need to display the detailed information for each dev, so I'm afraid a logger template won't be good enough.
Codecov Report
@@ Coverage Diff @@
## master #301 +/- ##
==========================================
- Coverage 66.64% 65.88% -0.77%
==========================================
Files 93 92 -1
Lines 9574 9743 +169
==========================================
+ Hits 6381 6419 +38
- Misses 2506 2618 +112
- Partials 687 706 +19
Continue to review full report at Codecov.
|
|
Add commit 2: |
|
ping @bergwolf Please check commit 2 with the new sandbox interface, see if this is what you want. Note: new interface isn't fully tested, I am adding new test cases to check if it works. And by the way, did anyone update the gometalinter? I see lots of new warnings for gometalinter. |
30debb5 to
bbf8fb3
Compare
virtcontainers/interfaces.go
Outdated
| SignalProcess(containerID, processID string, signal syscall.Signal, all bool) error | ||
| WinsizeProcess(containerID, processID string, height, width uint32) error | ||
| IOStream(containerID, processID string) (io.WriteCloser, io.Reader, io.Reader, error) | ||
| PreAddDevices(containerID ContainerConfig) ([]api.Device, error) |
There was a problem hiding this comment.
@WeiZhang555 Can you please decouple devices from container here? The original idea is to make storage and devices sandbox level resource, and AddStorage() would be able to add storage to an empty sandbox that has no containers. The AddStorage API need to do the following two things:
- add storage resource info to sandbox, and save it in persistent storage
- hotplug the storage devices if necessary
And it does not need to return a Device slice. The storage resources are saved in sandbox and VCSandbox.CreateContainer() can reference existing storage/devices with e.g. the device id.
some pseudo code of PreAddDevices()
func (s *Sandbox) PreAddStorage(devices []DeviceConfig) error {
for _, d := range devices {
s.hypervisor.hotplugDevice(d)
s.storages = append(s.devices, d)
}
s.filesystem.saveDeviceInfo()
}
And in VCSandbox.CreateContainer(), we allow containerConfig.Mount to reference an existing device with its ID, and update corresponding container.Mount.BlockDevice to match the referenced device.
There was a problem hiding this comment.
I agree with you in every point. The difficult point here is even after I did lots of reworks, the devices are still coupled to container in some point.
First one is device is created based on input of config.DeviceInfo which contains a containerPath in it so it has to bind to a container unless we modify the DeviceInfo construct.
Second is every device is store in sandbox/containerDir/devices.json and sandbox/containerDir/mount.json, we might need to move sandbox/containerDir/devices.json to sandbox/devices.json.
I agree with you this is the right way, but I don't think I can finish it before 1.0. That's why I choose another way, keep devices.json under the containerDir/ as before, and hotplug the devices for container in PreAddStorage function, then while we are creating the container, it can find the device.json on disk and skip the device hotplug process.
Do you want me to modify to the new way you described? It's a right approach, but there's a big risk that I can't finish it in several days. @bergwolf
There was a problem hiding this comment.
@WeiZhang555 Please change it so that we do not need to maintain the PreAddStorage API in this form post 1.0. And frakti won't be able to use it anyway. frakti wants to be able to add storage/device to an empty sandbox. If we cannot make it 1.0, let do post release like 1.1.0.
There was a problem hiding this comment.
I am gonna try to summarize my understanding here. We want an API that allows to hotplug blindly some storage to the VM without having to relate this to any container semantic.
Given the fact that the caller knows what it's doing, we could have a simple AddStorage() invoking some hypervisor function in order to hotplug what is needed as a first step.
As a second step, the caller (frakti most likely) will call later into CreateContainer(), providing the appropriate volume description expected for this container. If it wants the container to use the storage previously hotplugged, it will have to describe it through an appropriate Mount structure. We could add a specific field NeedToPlug = false to this structure to specify nothing needs to be done on the host, but a bind mount, or something else has to happen inside the guest.
Am I missing something or this could simply work ?
There was a problem hiding this comment.
@sboeuf You description is correct. One thing to note is that NeedToPlug is not necessary in device config and virtcontainers can decide if hotplug is needed depending on the device types.
| // driver has an underlying block storage device. | ||
| type BlockDrive struct { | ||
|
|
||
| // Path to the disk-image/device which will be used with this drive |
There was a problem hiding this comment.
comment should start with File, isn't it?
There was a problem hiding this comment.
Correct. I'll fix it, thanks!
| // SCSI address is in the format SCSI-Id:LUN | ||
| SCSIAddr string | ||
|
|
||
| // Path at which the device appears inside the VM, outside of the container mount namespace |
| // Path at which the device appears inside the VM, outside of the container mount namespace | ||
| VirtPath string | ||
|
|
||
| // Device path inside the container |
virtcontainers/sandbox.go
Outdated
| return nil, err | ||
| } | ||
|
|
||
| // TODO: hotplug rootfs |
There was a problem hiding this comment.
A lots of more work need to be done for this, so I added a TODO tag here.
virtcontainers/sandbox.go
Outdated
| // if err happens,roll back and remove added device! | ||
| if err != nil { | ||
| for _, dev := range removedDev { | ||
| if rollbackErr := s.hypervisor.hotplugAddDevice(dev, vfioDev); rollbackErr != nil { |
There was a problem hiding this comment.
comment says // if err happens,roll back and remove added device!, but here you are adding devices ?
There was a problem hiding this comment.
The comment is wrong, thanks! It's a typical copy-and-paste error 😉
virtcontainers/sandbox.go
Outdated
| ) | ||
|
|
||
| defer func() { | ||
| if err != nil { |
There was a problem hiding this comment.
to reduce 1 indentation level and improve code readability, you should check if err == nil and return in all rolling back functions, what do you think?
There was a problem hiding this comment.
I see your point, nice suggestion. Will fix it
virtcontainers/container.go
Outdated
| HostPath: m.Source, | ||
| ContainerPath: m.Destination, | ||
| DevType: "b", | ||
| Major: int64(stat.Rdev / 256), |
There was a problem hiding this comment.
@WeiZhang555 Can you use https://godoc.org/golang.org/x/sys/unix#Major instead? Same for minor number of device.
There was a problem hiding this comment.
Nice suggestion, thanks
| // DeviceManager can be used to create a new device, this can be used as single | ||
| // device management object. | ||
| type DeviceManager interface { | ||
| NewDevices(devInfos []config.DeviceInfo) ([]Device, error) |
There was a problem hiding this comment.
Any reason for handling a single device rather than the entire set of devices here?
Along those lines, can you split the refactoring in another commit with the reasoning, so that this change is easier to digest.
There was a problem hiding this comment.
I don't see any necessity to create many devices in one API, I was hoping to break this API into smaller granularity and let upper function call to decide whether to operate on many or one.
There was a problem hiding this comment.
Well, the reason for NewDevices() was to abstract the for loop from container.go. If nothing uses NewDevice() except newContainer(), I prefer abstraction and optimizations through NewDevices(). And the good thing with NewDevices() is that it covers the use case where you have only one device, which is not valid the other way around.
There was a problem hiding this comment.
Actually there is a reason:
I modified the container.go: https://github.com/kata-containers/runtime/pull/301/files#diff-8a56f5a1fa5971cc43e650ffa24f1e2bR613
+ for _, info := range contConfig.DeviceInfos {
+ dev, err := sandbox.devManager.NewDevice(info)
+ if err != nil {
+ return &Container{}, err
+ }
+ c.devices = append(c.devices, ContainerDevice{
+ ID: dev.DeviceID(),
+ ContainerPath: info.ContainerPath,
+ })
}
}
The ContainerPath is in each contConfig.DeviceInfos and I need to do a map with each device created by NewDevice and info.ContainerPath, if I use NewDevices() then there's no way to find which info links to which device, that's why I moved NewDevices() to a NewDevice(). @amshinde @sboeuf
| ContainerPath string | ||
| } | ||
|
|
||
| // VFIODrive represents a VFIO drive used for hotplugging |
There was a problem hiding this comment.
Drive here does not sound right. VFIO could be any sort of device. Again, if this is moved into its own commit, explaining why these structures are introduced, it will be helpful.
There was a problem hiding this comment.
Any good suggestion?
I am intending to abstract vfio device's ID and BDF to a standard struct but I'm not good at naming it:
runtime/virtcontainers/device/drivers/vfio.go
Lines 62 to 79 in 813c8c3
There was a problem hiding this comment.
Maybe a simple VFIODev because VFIO can be related to any type of device, not only drives for storage.
There was a problem hiding this comment.
Yeah, I would go with VFIODev too.
There was a problem hiding this comment.
Could you do the change regarding VFIODrive into VFIODev directly into this commit ? This way you fix it directly where it has been introduced.
| customOptions := device.DeviceInfo.DriverOptions | ||
| if customOptions != nil && customOptions["block-driver"] == "virtio-blk" { | ||
| device.VirtPath = filepath.Join("/dev", driveName) | ||
| device.PCIAddr = drive.PCIAddr |
There was a problem hiding this comment.
Why is the PCIAddr assignment deleted here? kata-agent relies on the PCIAddr for determining the virtio-block device name inside the guest. The assignment to Virtpath is not used by kata-agent, but was retained for the clear-container agent until that becomes obsolete.
There was a problem hiding this comment.
It's not removed, just moved to struct BlockDrive:
| if err != nil { | ||
| return err | ||
| } | ||
|
|
There was a problem hiding this comment.
you are not checking here if the device has been already attached here? We should avoid attaching VFIO devices more than once as well.
| // NewDevices creates bundles of devices based on array of DeviceInfo | ||
| func (dm *deviceManager) NewDevices(devInfos []config.DeviceInfo) ([]api.Device, error) { | ||
| // NewDevice creates bundles of devices based on array of DeviceInfo | ||
| func (dm *deviceManager) NewDevice(devInfo config.DeviceInfo) (api.Device, error) { |
There was a problem hiding this comment.
Why not just retain NewDevices? that way you can lock and unlock just once.
| // with input paramater device | ||
| func (dm *deviceManager) AttachedDevices(device api.Device) []api.Device { | ||
| var devices []api.Device | ||
| devInfo := device.GetDeviceInfo() |
| func (device *VFIODevice) Detach(devReceiver api.DeviceReceiver) error { | ||
| device.DevManager.Lock() | ||
| defer device.DevManager.Unlock() | ||
| device.DeviceInfo.Hotplugged = true |
There was a problem hiding this comment.
Oh, right. A critical mistake. Will modify
|
I am trying to do more and bigger refactoring on device manager part, so adding "[WIP]" back and will post the new codes one by one later. |
a6d02bc to
2ab0bbd
Compare
|
Thanks @WeiZhang555 for addressing all the comments. |
Fixes kata-containers#50 Previously the devices are created with device manager and laterly attached to hypervisor with "device.Attach()", this could work, but there's no way to remember the reference count for every device, which means if we plug one device to hypervisor twice, it's truly inserted twice, but actually we only need to insert once but use it in many places. Use device manager as a consolidated entrypoint of device management can give us a way to handle many "references" to single device, because it can save all devices and remember it's use count. Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
The interface "VhostUserDevice" has duplicate functions and fields with Device, so we can merge them into one interface and manage them with one group of interfaces. Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Instead of using drivers.XXXDevice directly, we should use exported struct from device structure. package drivers should be internal struct and other package should avoid read it's struct content directly. Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Fixes kata-containers#50 This commit imports a big logic change: * host device to be attached or appended now is sandbox level resources, one device should bind to sandbox/hypervisor first, then container could reference it via device's unique ID. * attach or detach device should go through the device manager interface instead of the device interface. * allocate device ID in global device mapper to guarantee every device has a uniq device ID and there won't be any ID collision. With this change, there will some changes on data format on disk for sandbox and container, these changes also make a breakage of backward compatibility. New persist data format: * every sandbox will get a new "devices.json" file under "/run/vc/sbs/<sid>/" which saves detailed device information, this also conforms to the concept that device should be sandbox level resource. * every container uses a "devices.json" file but with new data format: ``` [ { "ID": "b80d4736e70a471f", "ContainerPath": "/dev/zero" }, { "ID": "6765a06e0aa0897d", "ContainerPath": "/dev/null" } ] ``` `ID` should reference to a device in a sandbox, `ContainerPath` indicates device path inside a container. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Fix kata-containers#50 Fix unit tests Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
cleanup: remove ununsed device interface function "GetDeviceInfo()" Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Rename VFIODrive to VFIODev, also rename device interface "GetDeviceDrive()" to "GetDeviceInfo()". Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
Address some review comments: * remove unnecessary rollback logics * add vfio hot unplug handling. Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
Add test cases for device manager reworks. Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
b55e2f8 to
66fb6e7
Compare
|
PSS Measurement: Memory inside container: |
|
Build failed (third-party-check pipeline) integration testing with
|
|
CI's failing. The F27 at least looks like it might be a network glitch, so I will re-kick that one and we'll see how it goes: |
|
I think it turns out to be some random test case failure. @bergwolf, do you want to take a look at this PR? |
Fix typo. Signed-off-by: z00280905 <zhangwei555@huawei.com>
66fb6e7 to
b3015dd
Compare
|
PSS Measurement: Memory inside container: |
|
Build failed (third-party-check pipeline) integration testing with
|
bergwolf
left a comment
There was a problem hiding this comment.
@WeiZhang555 Thanks for the change! It looks mostly good. Just one suggestion on the virtcontainers API so that frakti can make a sandbox use the same device manager that it is using.
|
|
||
| // DeviceManager can be used to create a new device, this can be used as single | ||
| // device management object. | ||
| type DeviceManager interface { |
There was a problem hiding this comment.
@WeiZhang555 I'm wondering how this can be consumed by frakti. Currently sandbox has its private device manager and that will manage all device hotplug internally. If frakti wants to use this API, it will have to create a device manager on its own -- so that it can call functions in the DeviceManager interface. But that seems to conflict with the internal device manager in the sandbox structure. Would you please add device manager as an argument to CreateSandbox/RunSandbox and FetchSandbox, and use that to set sandbox.devManager if it's non-nil?
Another way is to make sandbox.HotplugAddDevice/sandbox.HotplugRemoveDevice top level API and use device manager underneath. But since you have already chosen a different path, there is no need to go back there. Thanks for the hard work!
There was a problem hiding this comment.
Another way is to make sandbox.HotplugAddDevice/sandbox.HotplugRemoveDevice top level API and use device manager underneath.
This is what I intended to do in seperate PR , and because this PR is too huge, my initial plan was only keeping the rework part in this one.
What do you think of the sandbox.HotplugAddDevice/sandbox.HotplugRemoveDevice ?
There was a problem hiding this comment.
And sandbox.HotplugAddDevice will be simply wrapper of DeviceManager.HotplugAddDevice
|
LGTM! Thanks @WeiZhang555 ! |
|
I'll merge this as we have 3 LGTM now, there will be a new Sandbox API for Frakti support in following PR, I'll do it ASAP. |
|
@WeiZhang555 I see that you merged this PR. I agree it was ready from a code review perspective, but usually, we don't merge when the CI is not passing... Zuul CI is not needed for now as it might not be completely stable, but |
protocols/client: close yamux session when closing the stream
Commit1: virtcontainers: same device can only attach once
Fix #50
This commit enhanced device manager in several aspects:
allocate device ID in global device mapper to guarantee every device
has a uniq device ID and there won't be any ID collision.
block device can only be attached once, first reason is that it
doesn't make sense to attach one block twice, second is for pre-add
storage for frakti, if one block is identified as hotplugged before, we
can skip and don't need to do it again. To identify if two block device
points to same one, we check the HostPath of device information, they
are regarded same when their HostPath is same.
Signed-off-by: Zhang Wei zhangwei555@huawei.com