
Add a new RFC for cloud event data mapping #3

Merged
ctron merged 2 commits into main from feature/cloudevents_1
May 20, 2021

Conversation

@ctron
Member

@ctron ctron commented Feb 5, 2021

This is a work in progress. You are welcome to comment, propose changes, or even fork the repository and create PRs for the branch this PR is based on.

Also see: drogue-iot/drogue-cloud#29

@ctron ctron added the "help wanted" label Feb 5, 2021
@garyedwards

Hi @ctron

I have done a bit more digging into the GCP equivalent (by no means an exemplar). The UDMI envelope, as well as other examples, is on the CloudEvents adapters page.

Google adopts a layered approach: Eventarc provides the CloudEvents headers; next comes Google's own custom event data format, generated by the MQTT endpoint and carried inside the payload together with the additional IoT attributes; last comes the device payload itself. I think that if Drogue populates the CloudEvents headers more completely, the middle event data format can be removed.

A lot of this information can be included in the source attribute as per your draft document. This could mitigate the need for the deviceid extension attribute. Regarding the existing device_id attribute I believe if it is kept the snake case should be dropped as per cloudevents/spec#321.
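The naming point can be checked mechanically. CloudEvents 1.0 restricts attribute names to lower-case ASCII letters and digits, so `device_id` is not a valid name. Below is a minimal sketch of such a check; the helper name `is_valid_attribute_name` is made up for illustration and is not part of any SDK.

```python
import re

# CloudEvents 1.0: attribute names consist of lower-case ASCII letters
# ('a' to 'z') and digits ('0' to '9') only.
_ATTRIBUTE_NAME = re.compile(r"[a-z0-9]+")


def is_valid_attribute_name(name: str) -> bool:
    """Return True if `name` only uses characters the spec allows.

    Hypothetical helper for illustration; real SDKs perform their own checks.
    """
    return _ATTRIBUTE_NAME.fullmatch(name) is not None


print(is_valid_attribute_name("device_id"))  # False: underscore is not allowed
print(is_valid_attribute_name("deviceid"))   # True
```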

It looks like the type should be something Drogue IoT specific e.g. io.drogue.message_published.

The subject could be passed through from the subFolder property of the MQTT topic as per the UDMI envelope. This could include things like telemetry and command as used by Hono or pointset, config, metadata etc. as used by UDMI.

The examples below are in JSON, but the CloudEvents headers would most likely be sent to Knative as HTTP headers.

Proposed Drogue IoT:

{
  "specversion": "1.0",
  "id": "b280e845-b3c5-458f-af84-896f90bd2019",
  "source": "//example.com/my-tenant/my-device",
  "type": "io.drogue.message_published",
  "datacontenttype": "application/json",
  "time": "2021-02-11T13:52:25.646Z",
  "subject": "pointset",
  "dataschema": "https://raw.githubusercontent.com/faucetsdn/udmi/1.3.6/schema/event_pointset.json",
  "data": {
    "version": 1,
    "timestamp": "2021-02-11T13:52:25Z",
    "points": {
      "recalcitrant_angle": { "present_value": 40 },
      "faulty_finding": { "present_value": false },
      "superimposition_reading": { "present_value": 72 }
    }
  }
}

GCP IoT Core + eventarc:

{
 "specversion": "1.0",
 "id": "2008878765719495",
 "source": "//pubsub.googleapis.com/projects/my-project/topics/my-iot-project",
 "type": "google.cloud.pubsub.topic.v1.messagePublished",
 "datacontenttype": "application/json",
 "time": "2021-02-11T13:52:25.646Z",
 "data": {
   "message": {
     "attributes": {
       "deviceId": "my-device",
       "deviceNumId": "2852723095039291",
       "deviceRegistryId": "my-iot-project",
       "deviceRegistryLocation": "europe-west1",
       "projectId": "my-project",
       "subFolder": "pointset"
     },
     "data": {
       "version": 1,
       "timestamp": "2021-02-11T13:52:25Z",
       "points": {
         "recalcitrant_angle": { "present_value": 40 },
         "faulty_finding": { "present_value": false },
         "superimposition_reading": { "present_value": 72 }
       }
     },
     "messageId": "2008878765719495",
     "message_id": "2008878765719495",
     "publishTime": "2021-02-11T13:52:25.646Z",
     "publish_time": "2021-02-11T13:52:25.646Z"
   },
   "subscription": "projects/my-project/subscriptions/eventarc-europe-west1-trigger-pubsub-sub-249"
 }
}
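As noted above, the attributes would most likely travel as HTTP headers rather than inside a JSON envelope. In the binary content mode of the CloudEvents HTTP binding, each context attribute becomes a `ce-` prefixed header, `datacontenttype` maps to `Content-Type`, and `data` becomes the request body. A rough sketch of that translation follows; the function name `to_binary_mode` is hypothetical, and it assumes a JSON payload and plain ASCII attribute values.

```python
import json
from typing import Dict, Tuple


def to_binary_mode(event: dict) -> Tuple[Dict[str, str], bytes]:
    """Split a structured-mode event into HTTP headers and a body.

    Illustrative only: assumes JSON data and ASCII attribute values,
    so no percent-encoding of header values is performed.
    """
    headers = {}
    for name, value in event.items():
        if name == "data":
            continue  # the payload goes into the body, not a header
        if name == "datacontenttype":
            headers["content-type"] = str(value)
        else:
            headers["ce-" + name] = str(value)
    body = json.dumps(event["data"]).encode("utf-8") if "data" in event else b""
    return headers, body


headers, body = to_binary_mode({
    "specversion": "1.0",
    "id": "b280e845-b3c5-458f-af84-896f90bd2019",
    "source": "//example.com/my-tenant/my-device",
    "type": "io.drogue.message_published",
    "datacontenttype": "application/json",
    "subject": "pointset",
    "data": {"version": 1},
})
for name, value in headers.items():
    print(name + ": " + value)
```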

| `source` | ✓ | The [device id](#device-id) |
| `specversion` | ✓ | Always contains `1.0` |
| `type` | ✓ | ?? |
| `datacontenttype` | ✓ | The mime type of the payload. As provided by the device. Defaults to `application/octet-stream`. |
Member


If there is a default, maybe it should not be required? According to the CloudEvents spec it is optional.

Member Author


From a CloudEvents perspective, it is optional. I wanted to make it mandatory from our perspective, so that consuming applications can rely on that attribute.
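To make the "mandatory with a default" idea concrete: the attribute is always present in the outgoing event, and the default from the table row is used when the device reports nothing. A minimal sketch, where the function name `with_content_type` is invented for illustration:

```python
from typing import Optional

# Default taken from the RFC table row under review.
DEFAULT_CONTENT_TYPE = "application/octet-stream"


def with_content_type(attributes: dict, reported: Optional[str] = None) -> dict:
    """Return a copy of `attributes` that always carries `datacontenttype`.

    `reported` is the mime type provided by the device, if any.
    """
    result = dict(attributes)
    result["datacontenttype"] = reported or DEFAULT_CONTENT_TYPE
    return result


print(with_content_type({"specversion": "1.0"}))
print(with_content_type({"specversion": "1.0"}, "application/json"))
```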

@ctron
Member Author

ctron commented Feb 16, 2021

@garyedwards This is great, I really like it!

What do you think about adding application (the field formerly known as tenant) and device additionally to the extension attributes? I understand that it would not be necessary, but that way you can directly extract them (e.g. for filtering) without the need to split strings.

I already changed device_id to device in the current main branch. That was a mistake, one that the Rust CloudEvents SDK didn't spot, but the Java SDK rejected 😁
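The trade-off between parsing `source` and reading dedicated extension attributes might look like this. The sketch assumes the `//<instance>/<application>/<device>` layout from the example earlier in the thread; `split_source` is a hypothetical helper, and it would break if an ID itself contained a slash, which is one argument for the extra attributes.

```python
from typing import Tuple


def split_source(source: str) -> Tuple[str, str]:
    """Extract (application, device) from a source such as
    "//example.com/my-application/my-device".

    Assumes exactly three slash-separated parts; this layout is taken
    from the example in this thread, not from a finished specification.
    """
    parts = source.lstrip("/").split("/")
    if len(parts) != 3:
        raise ValueError("unexpected source layout: " + source)
    _instance, application, device = parts
    return application, device


event = {
    "source": "//example.com/my-application/my-device",
    "application": "my-application",
    "device": "my-device",
}

# Without extension attributes: string splitting is needed.
print(split_source(event["source"]))

# With extension attributes: a direct lookup, which is easy to filter on.
print((event["application"], event["device"]))
```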

@ctron
Member Author

ctron commented Feb 16, 2021

I also think that we should support the partitioning extension: https://github.com/cloudevents/spec/blob/v1.0.1/extensions/partitioning.md … and probably fill it with an app/device ID combination.

@garyedwards

garyedwards commented Feb 16, 2021

I think device and application would make sense to include to allow filtering to be as simple as possible.

The UDMI envelope.json does a pretty good job at picking out useful additional attributes for IoT.

I am not sure if projectId and deviceNumId make sense in Drogue but could be pretty handy if available.

I have not used partitioning; it looks like it would be specific to the internal workings of Drogue. I guess if used it would also make sense to include the aforementioned projectId if available. How would it look different from the source?

Regarding naming, you could follow UDMI and just drop the camel case. I think your proposed naming looks cleaner but could be an XKCD 927 situation. I guess other systems use different naming anyway and it would be simple to map between:

  • projectId = instance?? Id of Drogue instance? Maybe the hostname although this may differ per endpoint?
  • deviceRegistryId = application
  • deviceNumId = deviceuuid?? Does Drogue give a device a UUID in addition to the name provided on creation?
  • deviceId = device
  • subFolder = subject, part of the main CloudEvents spec, derived from the MQTT topic as per the previous comment.

Drogue flavour:

{
  "specversion": "1.0",
  "id": "b280e845-b3c5-458f-af84-896f90bd2019",
  "source": "//example.com/my-application/my-device",
  "type": "io.drogue.message_published",
  "datacontenttype": "application/json",
  "time": "2021-02-11T13:52:25.646Z",
  "subject": "pointset",
  "dataschema": "https://raw.githubusercontent.com/faucetsdn/udmi/1.3.5/schema/event_pointset.json",
  "instance": "example.com",
  "application": "my-application",
  "device": "my-device",
  "deviceuuid": "40b6a495-b92c-4a71-b5f2-ea8b9ac6fbcf",
  "data": {
    "version": 1,
    "timestamp": "2021-02-11T13:52:25Z",
    "points": {
      "recalcitrant_angle": { "present_value": 40 },
      "faulty_finding": { "present_value": false },
      "superimposition_reading": { "present_value": 72 }
    }
  }
}

Drogue UDMI flavour:

{
  "specversion": "1.0",
  "id": "b280e845-b3c5-458f-af84-896f90bd2019",
  "source": "//example.com/my-application/my-device",
  "type": "io.drogue.message_published",
  "datacontenttype": "application/json",
  "time": "2021-02-11T13:52:25.646Z",
  "subject": "pointset",
  "dataschema": "https://raw.githubusercontent.com/faucetsdn/udmi/1.3.5/schema/event_pointset.json",
  "projectid": "example.com",
  "deviceregistryid": "my-application",
  "deviceid": "my-device",
  "devicenumid": "40b6a495-b92c-4a71-b5f2-ea8b9ac6fbcf",
  "data": {
    "version": 1,
    "timestamp": "2021-02-11T13:52:25Z",
    "points": {
      "recalcitrant_angle": { "present_value": 40 },
      "faulty_finding": { "present_value": false },
      "superimposition_reading": { "present_value": 72 }
    }
  }
}
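Since the two flavours differ only in the names of four extension attributes, mapping between them is a simple key rename. A minimal sketch based on the two examples above; the table and function names are made up for illustration:

```python
# Attribute names taken from the "Drogue flavour" and "Drogue UDMI flavour"
# examples in this comment.
DROGUE_TO_UDMI = {
    "instance": "projectid",
    "application": "deviceregistryid",
    "device": "deviceid",
    "deviceuuid": "devicenumid",
}


def to_udmi_flavour(event: dict) -> dict:
    """Rename the Drogue extension attributes; leave everything else as-is."""
    return {DROGUE_TO_UDMI.get(name, name): value for name, value in event.items()}


drogue_event = {
    "specversion": "1.0",
    "type": "io.drogue.message_published",
    "instance": "example.com",
    "application": "my-application",
    "device": "my-device",
    "deviceuuid": "40b6a495-b92c-4a71-b5f2-ea8b9ac6fbcf",
}
print(to_udmi_flavour(drogue_event))
```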

@lulf
Member

lulf commented Feb 16, 2021

> I also think that we should support the partitioning extension: https://github.com/cloudevents/spec/blob/v1.0.1/extensions/partitioning.md … and probably fill it with an app/device ID combination.

In most messaging systems, messages within a partition are guaranteed to be delivered in order, so I think a particular application consuming events will prefer some notion of ordering within an app + device.

@ctron
Member Author

ctron commented Feb 18, 2021

@garyedwards This is great input! I really appreciate it. I hope this goes in the right direction for you as well!

> Regarding naming, you could follow UDMI and just drop the camel case. I think your proposed naming looks cleaner but could be an XKCD 927 situation.

Yea, there are lots of other mappings. If considering UDMI, then we would also need to evaluate other standards, and find a common denominator. And I am not sure if that time is well invested. I think we should spend the time on making this look nice in the context of Drogue, and then spend some time on exemplary mappings to other standards. Same for OPC UA, probably Sparkplug, and a "few" others 😀

> Does Drogue give a device a UUID in addition to the name provided on creation?

Currently no, but that might be an interesting idea. Currently the ID of the device is assigned by the user. However, every device has a set of aliases, one being the ID itself. That allows you to create and replace a device later on, keeping the ID stable on the cloud side. If we made the initial ID optional and put a UUID in its place, then we would have the same feature, I guess. That means we would need to put fields in the event for the "reported device id" (as reported by the device) and the "stored device id" (as stored in the registry). The "stored" ID should go out on the cloud side, while we could add the "reported" ID to the events as well. That should allow for the device and deviceuuid fields, if needed.

> I am not sure if projectId and deviceNumId make sense in Drogue but could be pretty handy if available.

I am not sure what the meaning of deviceRegistryId is, or projectid. It sounds a bit like project would be the application. And device registry would be the instance. In any case, if we add some static "instance id", then we would have both values, and could map them to e.g. UDMI. So the new information would be instance I guess.

As Ulf mentioned, the partitionkey is more of an internal field, which we would need to set for Kafka to keep an order that makes sense for the consumer. A combination of application/device makes sense, as that would mean that an application would receive messages in order per device. When we move to a per-application Kafka topic in the future, we could drop the application part from the key and simply use the device ID as the key.
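The key choice described above could be sketched as follows. The `/` separator and the function name `partition_key` are assumptions made for illustration; the thread only says "a combination of application/device".

```python
def partition_key(application: str, device: str,
                  per_application_topic: bool = False) -> str:
    """Build the value for the `partitionkey` extension attribute.

    With a shared topic, the key combines application and device so that
    events are ordered per device within an application. With one topic
    per application, the device ID alone is enough. The "/" separator is
    an assumption, not something the RFC has fixed.
    """
    if per_application_topic:
        return device
    return application + "/" + device


print(partition_key("my-application", "my-device"))
print(partition_key("my-application", "my-device", per_application_topic=True))
```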

I guess two things we didn't really think about are timestamp (source, received, …) and gateways. Currently there is no "received by gateway X" indication in the event. As a gateway is just another device in the same application, I guess a simple gateway (ID) field would be sufficient.

@garyedwards

@ctron definitely going in the right direction, this is a really great project. Agreed, there are many standards; better to make it work for Drogue and map later. I think projectid = instance and deviceRegistryId = application per the Drogue glossary, so instance would be the new extension attribute, as you say:

  • instance
  • application
  • device

deviceuuid would be a nice to have depending on how data is being used downstream. It sounds like the registry is not set up in this way at the moment and application/device would be unique anyway.

I think time is handled correctly at the moment being when the message is received by Drogue. If the device generates a timestamp this should be in the payload which Drogue will not necessarily understand. For the gateway this could refer to the "authenticated device" which would always be populated by the gateway or device. This may be less legible though and gateway is just turned on and off in the attributes as needed.

I did notice that a lot of my suggestions were already implemented in 0.2.0, so I have reinvented the wheel a bit in my comments. They seem to generally match, which is comforting. Looking forward to spinning up 0.3.0 for a test run on GKE and k3s.

@ctron
Member Author

ctron commented Feb 19, 2021

> I think time is handled correctly at the moment being when the message is received by Drogue. If the device generates a timestamp this should be in the payload which Drogue will not necessarily understand.

Good point 👍

> For the gateway this could refer to the "authenticated device" which would always be populated by the gateway or device. This may be less legible though and gateway is just turned on and off in the attributes as needed.

That is a great idea! So we would have device and gateway … where gateway would be equal to device if there is no gateway involved. Which might make it hard to understand :) … however, if we find a proper name for this field, then this would be a much better overall solution I think. authenticateddevice feels a bit too much though … via could be a candidate. peer, by, … not sure …

We already had a discussion in the chat about writing a "book", or at least some pages of documentation. I think this would help, as "documentation" is currently spread all over the place. And is hard to track down.

> Looking forward to spinning up 0.3.0 for a test run on GKE and k3s.

Awesome! I would love some feedback!

@dejanb
Member

dejanb commented Feb 23, 2021

> That is a great idea! So we would have device and gateway … where gateway would be equal to device if there is no gateway involved. Which might make it hard to understand :) … however, if we find a proper name for this field, then this would be a much better overall solution I think. authenticateddevice feels a bit too much though … via could be a candidate. peer, by, … not sure …

If it's hard to find a new name, I think gateway would work just fine.

@ctron
Member Author

ctron commented Mar 3, 2021

Sorry for the long delay. However, I already started to adopt the ideas from this RFC in a development branch of mine:

  • I added the instance field
  • I also used the suggested field names: instance, application, device
  • I added an internal ID (borrowed uid from Kubernetes) to implement the functionality that @garyedwards described, to have an internal and a stable ID. That helps with detecting a delete + create operation with the same name, and also makes it easier to add "names" to the entity. The special use of the id field did feel a bit awkward in the past.

I need to finish up a few things, and will come back to this in a few days.

@ctron
Member Author

ctron commented Mar 5, 2021

> That is a great idea! So we would have device and gateway … where gateway would be equal to device if there is no gateway involved. Which might make it hard to understand :) … however, if we find a proper name for this field, then this would be a much better overall solution I think. authenticateddevice feels a bit too much though … via could be a candidate. peer, by, … not sure …

> If it's hard to find a new name, I think gateway would work just fine.

So yes, that would work. However, it would somehow imply that the event traveled through a gateway, which wouldn't always be the case.

Taking a look at SMTP, we have the Received header: https://tools.ietf.org/html/rfc5321#section-3.7.2 … However, it doesn't feel like a good fit. It is more of a debugging tool, with each mail gateway adding itself to the list of hops. So that would be more the gateway adding itself, rather than the cloud side adding the gateway.

However, peeking at the spec, I noticed the sender header, which sounded like a great match to me. It indicates who "sent" the message. That could be a gateway, the device itself, or another service.

So my proposal would be to use the sender field for that, always filling this with the ID/name of the device that sent the information.
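A small sketch of how the proposal would play out for a direct connection versus a gateway. The function name `device_attributes` and the `authenticated_as` parameter are invented for illustration and are not Drogue's actual API:

```python
from typing import Optional


def device_attributes(device: str, authenticated_as: Optional[str] = None) -> dict:
    """Build the `device` and `sender` extension attributes.

    `device` is the device the event is about. `authenticated_as` is the
    device that actually connected (for example a gateway); when it is
    omitted, the device sent the event itself, so `sender` equals `device`.
    """
    return {"device": device, "sender": authenticated_as or device}


print(device_attributes("my-device"))                # sent directly
print(device_attributes("my-device", "my-gateway"))  # sent via a gateway
```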

@ctron ctron force-pushed the feature/cloudevents_1 branch from 124d675 to de3aa40 Compare March 5, 2021 08:53
@ctron
Member Author

ctron commented Mar 5, 2021

I found some time (and need 😁) to update the RFC, integrating the comments/feedback. Thanks for that!

I think this looks good. The only thing I am not completely sure about is the *uid variants of application and device, because not only the device has a UID, but the application too. It adds to the overall event size; is it worth it? I don't know. More information is better. Fewer bytes too :)

I still kept it in there, as I don't see another way to obtain this information.

ctron added a commit to drogue-iot/drogue-vorto-converter that referenced this pull request Mar 5, 2021
@garyedwards

> I am only not completely sure about the *uid variants of application and device

I think deviceuid is the key one. application and instance are likely to be more limited in number and be more deliberately controlled / named when compared to devices. Ideally care is taken when naming everything... I think it is reasonable to keep deviceuid and drop the other *uid to keep the size down. Also you can find the application and instance from the device registry with the deviceuid but not the other way around.

@ctron
Member Author

ctron commented Apr 6, 2021

Ok … so I think 0.4.0 already contains most of the things we discussed here :)

Only the deviceuid is currently missing, but I think we could easily add this in 0.5.

@garyedwards Unless the uid variant is super important to you, I would encourage you to give 0.4 (or the sandbox) a try and see if that goes in the right direction.

@ctron ctron marked this pull request as ready for review May 6, 2021 15:17
@ctron ctron merged commit 23d08dd into main May 20, 2021


4 participants