Pubsub: add DirectRunner support for id_label and timestamp_attribute in Python SDK #18939

@kennknowles

Description

At least for publishing (and possibly for pulling) messages, the non-Dataflow sources and sinks for Pub/Sub use the public Pub/Sub API, which doesn't support the id_label and timestamp_attribute settings. The expected behavior of each setting is:

Publishing:
id_label - add an attribute to each message with a unique value
timestamp_attribute - add an attribute to each message with the publishing time as its value
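
A minimal sketch of the publishing side, in plain Python with no Beam or Pub/Sub client dependency. The helper name `add_publish_attributes` and its `now` parameter are hypothetical, and encoding the timestamp as milliseconds since the Unix epoch is an assumption; the issue itself does not specify a format.

```python
import time
import uuid


def add_publish_attributes(attributes, id_label=None,
                           timestamp_attribute=None, now=None):
    """Return a copy of `attributes` with the optional attributes filled in.

    Hypothetical helper: a sink would call this once per message before
    handing the message to the public Pub/Sub publish API.
    """
    out = dict(attributes or {})
    if id_label is not None:
        # A random UUID stands in for "a unique value" (assumption).
        out[id_label] = uuid.uuid4().hex
    if timestamp_attribute is not None:
        # `now` is in seconds; it is injectable so the function is testable.
        seconds = time.time() if now is None else now
        # Assumed encoding: integer milliseconds since the Unix epoch.
        out[timestamp_attribute] = str(int(seconds * 1000))
    return out
```

Attribute values are kept as strings because Pub/Sub message attributes are string key/value pairs.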

Pulling:
id_label - use the value of this message attribute to deduplicate messages
timestamp_attribute - use the value of this message attribute as the element's timestamp
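
The pulling side could be sketched as below, again in plain Python. The function name `process_pulled` and the `(data, attributes, publish_time_ms)` tuple shape are hypothetical, as are three behaviors the issue leaves open: messages without the id attribute pass through undeduplicated, the timestamp attribute is parsed as integer milliseconds, and the publish time is used as a fallback when the attribute is missing.

```python
def process_pulled(messages, id_label=None, timestamp_attribute=None):
    """Yield (data, timestamp_ms) pairs from pulled messages.

    `messages` is an iterable of (data, attributes, publish_time_ms)
    tuples, a hypothetical stand-in for whatever the pull API returns.
    """
    seen_ids = set()  # unbounded here; a real source would need to expire entries
    for data, attributes, publish_time_ms in messages:
        if id_label is not None:
            message_id = attributes.get(id_label)
            if message_id is not None:
                if message_id in seen_ids:
                    continue  # duplicate delivery: drop it
                seen_ids.add(message_id)
        if timestamp_attribute is not None and timestamp_attribute in attributes:
            # Assumed encoding: integer milliseconds since the Unix epoch.
            timestamp_ms = int(attributes[timestamp_attribute])
        else:
            timestamp_ms = publish_time_ms
        yield data, timestamp_ms
```

For example, two deliveries carrying the same id value collapse into a single element, and each element's timestamp comes from the attribute rather than from the publish time.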

Implementation details: this could probably live in a pubsubio.py module that other runners can reuse (i.e. implement Pub/Sub IO as PTransforms rather than as NativeSinks and Sources).

Imported from Jira BEAM-4275. Original Jira may contain additional context.
Reported by: udim.
