[WIP][RFC] persist: baseline persist data format #874
174e0dd to
09c2284
Compare
|
@WeiZhang555 Nice! I think we should also clarify a few things in order to roll out the v1 persistent data format:
Also since the PR is an incompatible change by its own, it should be merged as a new release and we can take the opportunity to revisit existing fields to see if we want to drop or re-organize any of them. |
|
@bergwolf I agree with every point 😄 |
|
@WeiZhang555, I spent some time studying the PR, and I understand and appreciate the intent. I think a good goal would be for the kata-runtime persistent storage to align with the data structures you are presenting here. It looks like in this PR you are simply copying the data structures that are currently used in virtcontainers, correct? When discussing with Archana and Sebastien, we agreed that these structures probably contain more than the minimum we need to persist. Would you agree? In kata-runtime we are a bit naïve and store more than is minimally necessary. Even more fields could be removed after a container is started (they are only required between the time the container is created and the time it is started). That would be a further optimization, reducing how much we need to serialize/deserialize. How do you see the project using the version field? I wonder if most of the presence/unexpected fields can be handled by the marshalling of the JSON itself? |
Regarding the last question in your comment: it's for breaking changes. E.g. v1: Then we find that int isn't right and what we actually need is a map, so v2: And the new handling code should be:
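The snippets from this comment are not preserved above, so the following is only a minimal sketch of the idea, not the original code. It assumes a JSON on-disk format and uses hypothetical names (`stateV1`, `stateV2`, `loadState`, and the `"default"` map key): the loader reads the version first, then decodes with the matching layout and upgrades old data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stateV1 and stateV2 are hypothetical layouts for this sketch: the same
// field changes from an int (v1) to a map (v2), which is a breaking change.
type stateV1 struct {
	Version uint `json:"version"`
	Data    int  `json:"data"`
}

type stateV2 struct {
	Version uint           `json:"version"`
	Data    map[string]int `json:"data"`
}

// loadState peeks at the version, decodes with the matching layout,
// and converts v1 data into the v2 shape.
func loadState(raw []byte) (stateV2, error) {
	var header struct {
		Version uint `json:"version"`
	}
	if err := json.Unmarshal(raw, &header); err != nil {
		return stateV2{}, err
	}
	switch header.Version {
	case 1:
		var old stateV1
		if err := json.Unmarshal(raw, &old); err != nil {
			return stateV2{}, err
		}
		// "default" is an arbitrary key chosen for this sketch.
		return stateV2{Version: 2, Data: map[string]int{"default": old.Data}}, nil
	case 2:
		var cur stateV2
		err := json.Unmarshal(raw, &cur)
		return cur, err
	default:
		return stateV2{}, fmt.Errorf("unsupported persist version %d", header.Version)
	}
}

func main() {
	s, err := loadState([]byte(`{"version":1,"data":7}`))
	fmt.Println(s.Version, s.Data, err)
}
```

Without the version field, the loader would have to guess the layout from the shape of the data, which only works while the layouts stay easy to tell apart.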
|
@WeiZhang555 I'd like to support what @egernst said here:
For now, it is fine to go with this PR, but it showed us that a cleanup is really needed, and by cleanup I mean identifying what strictly needs to be stored. Here is an example of what we discussed with @egernst and @amshinde:
// Bridge is a bridge where devices can be hot plugged
type Bridge struct {
	// Address contains information about devices plugged and its address in the bridge
	DeviceAddr map[uint32]string
	// Type is the type of the bridge (pci, pcie, etc)
	Type string
	// ID is used to identify the bridge in the hypervisor
	ID string
	// Addr is the PCI/e slot of the bridge
	Addr int
}
would become:
type BridgeState struct {
	// ID is used to identify the bridge in the hypervisor
	ID string
}
// Bridge is a bridge where devices can be hot plugged
type Bridge struct {
	// Address contains information about devices plugged and its address in the bridge
	DeviceAddr map[uint32]string
	// Type is the type of the bridge (pci, pcie, etc)
	Type string
	// State holds all information that needs to be stored
	State BridgeState
	// Addr is the PCI/e slot of the bridge
	Addr int
}
This way, we would apply the same logic to every structure, and we would end up eventually with a list of structures |
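As a rough illustration of the split proposed above, here is a minimal sketch that writes out only the `State` part of each bridge. The `saveBridges` helper and the use of JSON are assumptions made for this example, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BridgeState holds the only field that would be written to disk.
type BridgeState struct {
	ID string
}

// Bridge mirrors the structure from the comment above; everything outside
// State is treated as runtime-only in this sketch.
type Bridge struct {
	DeviceAddr map[uint32]string
	Type       string
	State      BridgeState
	Addr       int
}

// saveBridges is a hypothetical helper that serializes only the State
// part of each bridge.
func saveBridges(bridges []Bridge) ([]byte, error) {
	states := make([]BridgeState, 0, len(bridges))
	for _, b := range bridges {
		states = append(states, b.State)
	}
	return json.Marshal(states)
}

func main() {
	out, err := saveBridges([]Bridge{
		{Type: "pci", State: BridgeState{ID: "pci-bridge-0"}, Addr: 2},
	})
	fmt.Println(string(out), err)
}
```

With this shape, deciding that another field needs to be persisted means moving it into the `*State` struct, which keeps the on-disk format visible in one place.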
First one is better. The example I gave is not very good, but the real world can be very complicated. In my example, we can detect different versions from Another example: suppose one day we want to add a restriction: "containers in the same pod must be created and started by the same runtime version". To achieve this, a
|
09c2284 to
ffbb691
Compare
|
#883 is a demo to show how to make use of this package later. |
I don't think this should be done as we store in
I am fine with that, the example you mentioned sounds reasonable.
No, no I was simply using this as an example, but if we think DeviceAddr needs to be stored, then it should be part of the
Yes, virtcontainers would rely on this package (persistapi), and the structures such as So, it'd be nice to identify only what's strictly needed to be stored, so that |
|
@WeiZhang555 Sorry for the delay, was away for a couple of days.
id - id used to pass the netdev option to qemu
With this, we would need translation logic to go from this data to the actual endpoint type, since the endpoint structures are themselves composed of structures. (I think some of the endpoint structures need to be revisited and can be simplified further.) |
Nit: we should be careful not to mention QEMU specific things as part of the description of this persist API. |
How about we make a rule to generate id used by qemu? |
|
I think |
|
|
// Major, minor numbers for device.
Major int64
Minor int64
With HostPath, do we still need DevType/Major/Minor?
I think Major/Minor is more important since this is compliant with OCI spec: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#devices
Maybe we should remove the HostPath ?
I think it depends on how we plan on using the information. Right now we use it to reconstruct the device slice in memory. And we are not using HostPath directly; createDevice() looks up the host path with the Major/Minor pair instead. So yes, I agree we should remove HostPath.
c4024ea to
0011bf3
Compare
Yes I agree. My point was to avoid mentioning QEMU from the networking structures since it's supposed to be independent from the hypervisor. |
|
@sboeuf Oh, I get your point now. Yes, I agree with you! |
0011bf3 to
3c7c66d
Compare
Fixes kata-containers#803 The disk persist data should be "versioned" and baselined, any modification in persist data should be considered potential break of backward compatibility. Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
3c7c66d to
0a99ce3
Compare
|
ping @kata-containers/runtime |
|
Talked about in 12/17/18 meeting (kata-herding) |
|
Hi @WeiZhang555 - Thanks for raising! I think it would be interesting if you took this RFC a little further so we can see how this API might work. For example, you've created As noted on kata-containers/kata-containers#25, I think it is best not to rely on the version number and have the code which reads state off the disk for an X-1 version of Kata "fill in the gaps" to set defaults where it is sensible to do so (if it can't, that would be an error of course). One reason for doing this being that unless we are really careful, we may forget to increase |
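To make the "fill in the gaps" alternative concrete, here is a minimal sketch under stated assumptions: the `sandboxState` struct, its fields, and the `loadWithDefaults` helper are hypothetical and do not come from this PR. Pointer fields let the loader distinguish "absent in older data" from an explicit zero value.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// sandboxState is a hypothetical persisted structure for this sketch.
// BlockIndex is a pointer so a missing field decodes to nil.
type sandboxState struct {
	ID         string `json:"id"`
	BlockIndex *int   `json:"blockIndex,omitempty"`
}

// loadWithDefaults decodes state written by an older release and sets
// defaults where that is sensible; it fails when no default makes sense.
func loadWithDefaults(raw []byte) (sandboxState, error) {
	var s sandboxState
	if err := json.Unmarshal(raw, &s); err != nil {
		return s, err
	}
	if s.ID == "" {
		return s, errors.New("persisted state has no id and no sensible default exists")
	}
	if s.BlockIndex == nil {
		zero := 0 // assumed default for data written before the field existed
		s.BlockIndex = &zero
	}
	return s, nil
}

func main() {
	s, err := loadWithDefaults([]byte(`{"id":"sb1"}`))
	fmt.Println(s.ID, *s.BlockIndex, err)
}
```

This approach needs no version bump for additive changes, but it cannot by itself handle a field whose type changes between releases.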
|
Also, as we start to restructure the codebase, what impact will this have on the persistence API design? See for example #1096. |
|
@WeiZhang555 - ah! I've just found #874 ;) |
|
Err, I mean I just found #883 ;) |
|
@WeiZhang555 ping. Any updates? |
Fixes #803
The disk persist data should be "versioned" and baselined, any modification in
persist data should be considered potential break of backward compatibility.
Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
TODO LIST:
/var/lib/vc/sbs/<sid>/ /var/run/kata-containers can be re-organized or merged with /var/run/vc