Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

[WIP][RFC]persist: baseline persist data format#874

Closed
WeiZhang555 wants to merge 1 commit into
kata-containers:masterfrom
WeiZhang555:rfc-persist-data-standard
Closed

[WIP][RFC]persist: baseline persist data format#874
WeiZhang555 wants to merge 1 commit into
kata-containers:masterfrom
WeiZhang555:rfc-persist-data-standard

Conversation

@WeiZhang555
Copy link
Copy Markdown
Member

@WeiZhang555 WeiZhang555 commented Oct 31, 2018

Fixes #803

The disk persist data should be "versioned" and baselined, any modification in
persist data should be considered potential break of backward compatibility.

Signed-off-by: Wei Zhang zhangwei555@huawei.com

TODO LIST:

  • network endpoints data isn't formalized
  • config.json from /var/lib/vc/sbs/<sid>/
  • /var/run/kata-containers can be re-organized or merged with /var/run/vc

@WeiZhang555
Copy link
Copy Markdown
Member Author

@amshinde Can you help me fix the network endpoint persist data? It was content from "network.json", currently it saves quite a lot of data which contains some unnecessary part, e.g. "EndpointProperties".

https://github.com/kata-containers/runtime/pull/874/files#diff-38d341573c08cdfaf94b6efd91f2ed8dR13

I think you should be right person who is most familiar with "*_endpoints.go" @amshinde

@WeiZhang555 WeiZhang555 force-pushed the rfc-persist-data-standard branch 3 times, most recently from 174e0dd to 09c2284 Compare November 1, 2018 08:26
@bergwolf
Copy link
Copy Markdown
Member

bergwolf commented Nov 1, 2018

@WeiZhang555 Nice! I think we should also clarify a few things in order to roll out the v1 persistent data format:

  1. The target of backward compatibility is to let newer runtime be able to read and process persistent data of old runtime
  2. When removing a field, we should keep obsolete fields for a few releases even if they are not used, so that the json package can still unpack the old format
  3. When adding a new field, it should have always have a default working value so that it still works when new runtime reads from old persistent data

Also since the PR is an incompatible change by its own, it should be merged as a new release and we can take the opportunity to revisit existing fields to see if we want to drop or re-organize any of them.

@WeiZhang555
Copy link
Copy Markdown
Member Author

@bergwolf I agree with every point 😄
So far we haven't got a conclusion about our compabitility policy, kata-containers/documentation#196 can give us some basic reference but it's not merged yet.

@egernst
Copy link
Copy Markdown
Member

egernst commented Nov 2, 2018

@WeiZhang555 , I spent some time studying the PR and understand and appreciate the intent. I think a good goal will be for the kata-runtime persistent storage to align with the resulting data-structures that you are presenting here.

It looks like in this PR you are simply copying the data-structures which are currently used in virtcontainers, correct? When discussing with Archana and Sebastien, we agreed that perhaps these structures contain more than what we’d minimally need to have persist. Would you agree? In kata-runtime we are a bit naïve and store more than is minimally necessary.

Even more fields could be removed after a container is started (but would be required between the time the container is created and before it is started). Further optimization, reducing how much we’d need to serialize/derserialize.

How do you see the project using the version field? I wonder if most of the presence/unexpected fields can be handled by marshalling of the json itself?

@WeiZhang555
Copy link
Copy Markdown
Member Author

WeiZhang555 commented Nov 2, 2018

How do you see the project using the version field? I wonder if most of the presence/unexpected fields can be handled by marshalling of the json itself?

Regarding your last question in comment:

It's for some breaking change. e.g.

v1:

BlockIndex int

Then we find that int isn't right , we actually need is a map, then

v2:

BlockIndex int  // legacy
BlockIndexMap map[int]bool

And new handling code should be :

if v1 {
  // this container is created by old runtime, go to legacy handling logic
  // handle BlockIndex
} else if v2 {
  // moder runtime, handling modern fields
  // handle BlockIndexMap
}

@sboeuf
Copy link
Copy Markdown

sboeuf commented Nov 2, 2018

@WeiZhang555
Based on your example where you're adding a new field between v1 and v2, I don't think the versioning is needed here.
When you upgrade an existing pod with the runtime v2, it will simply unmarshal the structure from the disk, and JSON unmarshalling will take care of populating only the BlockIndex since it was stored by a runtime v1. So now your runtime v2 does not have the information he expects about the BlockIndexMap because this field was not known by the runtime v1 at the time it stored the information. We have two choices, either the runtime v2 knows how to fallback and use BlockIndex, or it graciously fails saying that BlockIndexMap is empty.

I'd like to support what @egernst said here:

It looks like in this PR you are simply copying the data-structures which are currently used in virtcontainers, correct? When discussing with Archana and Sebastien, we agreed that perhaps these structures contain more than what we’d minimally need to have persist. Would you agree? In kata-runtime we are a bit naïve and store more than is minimally necessary.

For now, it is fine to go with this PR, but it showed us that a cleanup is really needed, and by cleanup I mean identify what is strictly needed to be stored. Here is an example of what we discussed with @egernst and @amshinde:
For instance, from the structure Bridge, all we need is to store the ID. This means that we should not define the entire structure Bridge as part of this package. Instead, we should rework the existing codebase to split Bridge:

// Bridge is a bridge where devices can be hot plugged
type Bridge struct {
	// Address contains information about devices plugged and its address in the bridge
	DeviceAddr map[uint32]string
	// Type is the type of the bridge (pci, pcie, etc)
	Type string
	//ID is used to identify the bridge in the hypervisor
	ID string
	// Addr is the PCI/e slot of the bridge
	Addr int
}

would become:

type BridgeState struct {
        //ID is used to identify the bridge in the hypervisor
	ID string
}

// Bridge is a bridge where devices can be hot plugged
type Bridge struct {
	// Address contains information about devices plugged and its address in the bridge
	DeviceAddr map[uint32]string
	// Type is the type of the bridge (pci, pcie, etc)
	Type string
        // State holds all information that need to be stored
	State BridgeState
	// Addr is the PCI/e slot of the bridge
	Addr int
}

This way, we would apply the same logic to every structure, and we would end up eventually with a list of structures *****State defined from the persist API. This would make virtcontainers a consumer of this persistapi package too.

@WeiZhang555
Copy link
Copy Markdown
Member Author

WeiZhang555 commented Nov 3, 2018

@sboeuf

We have two choices, either the runtime v2 knows how to fallback and use BlockIndex, or it graciously fails saying that BlockIndexMap is empty.

First one is better. The example I gave is not very good, but real world can be very complicated, in my example, we can detect different versions from BlockIndex or BlockIndexMap without help of PersistVersion, but I would prefer we keep it in case we need a very clear version check.

Another example, suppose one day we want to add a restriction:

"containers in same POD must be created&started by same runtime version"

To achieve this, a PersistVersion is very necessary. Without this, we cannot find a reliable way to detect "different" runtime version.

For instance, from the structure Bridge, all we need is to store the ID. This means that we should not define the entire structure Bridge as part of this package. Instead, we should rework the existing codebase to split Bridge:

  1. In this example, is ID the only required field? DeviceAddr saves device pci address, in future, if we want to support flexible hot plug/unplug, we need to know free pci slot, so I think maybe DeviceAddr need to persist too. Same consideration to other fields, I think we need to take care of them carefully.

  2. This would make virtcontainers a consumer of this persistapi package too This isn't required but it's OK. virtcontainers is internal implementation, it can do anything with any structure, what we required is a stable fundamental persist data, and virtcontainer implementation change shouldn't influence persist data format. Another choice is we can use a converter, as I suggested to virtcontainers <-> agent GRPC structure translation (ref: virtcontainers: Rely on new interface LinkType field #867), this will requires less modification(maybe) in virtcontainers/

@WeiZhang555 WeiZhang555 force-pushed the rfc-persist-data-standard branch from 09c2284 to ffbb691 Compare November 5, 2018 02:59
@WeiZhang555
Copy link
Copy Markdown
Member Author

#883 is a demo to show how to make use of this package laterly.

@sboeuf
Copy link
Copy Markdown

sboeuf commented Nov 6, 2018

@WeiZhang555

/var/run/kata-containers can be re-organized or merged with /var/run/vc

I don't think this should be done as we store in /var/run/kata-containers what is specific to the kata-runtime cli, whilst the /var/run/vc path is used by the Kata API.

To achieve this, a PersistVersion is very necessary. Without this, we cannot find a reliable way to detect "different" runtime version.

I am fine with that, the example you mentioned sounds reasonable.

In this example, is ID the only required field? DeviceAddr saves device pci address, in future, if we want to support flexible hot plug/unplug, we need to know free pci slot, so I think maybe DeviceAddr need to persist too. Same consideration to other fields, I think we need to take care of them carefully.

No, no I was simply using this as an example, but if we think DeviceAddr needs to be stored, then it should be part of the BridgeState structure too.

This would make virtcontainers a consumer of this persistapi package too This isn't required but it's OK. virtcontainers is internal implementation, it can do anything with any structure, what we required is a stable fundamental persist data, and virtcontainer implementation change shouldn't influence persist data format.

Yes, virtcontainers would rely on this package (persistapi), and the structures such as Bridge, Container,... would still be defined by virtcontainers. But each of these structures would include (or embed?) the corresponding ****State structure defined by persistapi package.

So, it'd be nice to identify only what's strictly needed to be stored, so that persistapi package would define only BridgeState, ContainerState, SandboxState, and so on.

@amshinde
Copy link
Copy Markdown
Member

amshinde commented Nov 6, 2018

@WeiZhang555 Sorry for the delay, was away for a couple of days.
In case of endpoints, if we want to go with minimal data that needs to stored, I think the following fields should be sufficient:

id - id used to pass the netdev option to qemu
index - to deduce the tap Name(if tap/macvtap interface is used) and the bridge Name if bridging is used.
ifName - the interface name

With this, we would need translation logic to go from this data to the actual endpoint type, since the endpoint structures are themselves composed of structures. (I think some of the endpoint structures needs to be revisited and can be simplified further.)

@sboeuf
Copy link
Copy Markdown

sboeuf commented Nov 6, 2018

@amshinde

id - id used to pass the netdev option to qemu

Nit: we should be careful not to mention QEMU specific things as part of the description of this persist API.

@caoruidong
Copy link
Copy Markdown
Member

@amshinde

id - id used to pass the netdev option to qemu

Nit: we should be careful not to mention QEMU specific things as part of the description of this persist API.

How about we make a rule to generate id used by qemu?

@WeiZhang555
Copy link
Copy Markdown
Member Author

I think id should be also persisted since we'll need it later for removing from qemu.


// Major, minor numbers for device.
Major int64
Minor int64
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With HostPath, do we still need DevType/Major/Minor?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Major/Minor is more important since this is compliant with OCI spec: https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#devices

Maybe we should remove the HostPath ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on how we plan on using the information. Right now we use it to reconstruct the device slice in memory. And we are not using HostPath directly, -- createDevice() looks for host path with Major/Minor pair instead. So yes, I agree we should remove HostPath.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@WeiZhang555 WeiZhang555 force-pushed the rfc-persist-data-standard branch from c4024ea to 0011bf3 Compare November 7, 2018 07:39
@sboeuf
Copy link
Copy Markdown

sboeuf commented Nov 7, 2018

@WeiZhang555

I think id should be also persisted since we'll need it later for removing from qemu.

Yes I agree. My point was to avoid mentioning QEMU from the networking structures since it's supposed to be independent from the hypervisor.

@WeiZhang555
Copy link
Copy Markdown
Member Author

@sboeuf Oh, I get your point now. Yes, I agree with you!

@WeiZhang555 WeiZhang555 force-pushed the rfc-persist-data-standard branch from 0011bf3 to 3c7c66d Compare November 13, 2018 08:45
Fixes kata-containers#803

The disk persist data should be "versioned" and baselined, any modification in
persist data should be considered potential break of backward compatibility.

Signed-off-by: Wei Zhang <zhangwei555@huawei.com>
@WeiZhang555 WeiZhang555 force-pushed the rfc-persist-data-standard branch from 3c7c66d to 0a99ce3 Compare November 21, 2018 03:31
@WeiZhang555
Copy link
Copy Markdown
Member Author

ping @kata-containers/runtime

@raravena80
Copy link
Copy Markdown
Member

Talked about in 12/17/18 meeting (kata-herding)

@jodh-intel
Copy link
Copy Markdown

Hi @WeiZhang555 - Thanks for raising! I think it would be interesting if you took this RFC a little further so we can see how this API might work. For example, you've created virtcontainers/persist/api/, but this would need to link up to virtcontainers/resource_storage.go somehow I think?

As noted on kata-containers/kata-containers#25, I think it is best not to rely on the version number and have the code which reads state off the disk for an X-1 version of Kata "fill in the gaps" to set defaults where it is sensible to do so (if it can't, that would be an error of course). One reason for doing this being that unless we are really careful, we may forget to increase CurPersistVersion when we change the on-disk API files. But that doesn't matter if the file parsing code can handle missing data itself.

@jodh-intel
Copy link
Copy Markdown

Also, as we start to restructure the codebase, what impact will this have on the persistence API design? See for example #1096.

@jodh-intel
Copy link
Copy Markdown

@WeiZhang555 - ah! I've just found #874 ;)

@jodh-intel
Copy link
Copy Markdown

Err, I mean I just found #883 ;)

@raravena80
Copy link
Copy Markdown
Member

@WeiZhang555 ping. Any updates?

@WeiZhang555
Copy link
Copy Markdown
Member Author

Everying is moved to #883 with implementation codes.
So closing this, thanks everyone for review, please move to #883 for more detailed info :-)

@WeiZhang555 WeiZhang555 deleted the rfc-persist-data-standard branch May 13, 2019 01:51
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants