thiagotnunes (Contributor) commented Jan 14, 2022

Adds ReadChangeStreamPartitionDoFn, a Splittable DoFn (SDF) that reads a single change stream partition and processes its records. The component receives a change stream name, a partition, and a start and end time to query, then runs a change stream query with those parameters.

A change stream query can return three types of records:

  1. A Data record
  2. A Heartbeat record
  3. A Child partitions record

Upon receiving (1), the function updates the watermark with the record's commit timestamp and emits the record into the output PCollection.
Upon receiving (2), the function updates the watermark with the record's timestamp, but it does not emit any record into the PCollection.
Upon receiving (3), the function updates the watermark with the record's timestamp and writes the new child partitions into the metadata table. The DetectNewPartitions component later schedules these partitions.
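The per-record handling above can be sketched as a simple type dispatch. This is a minimal, hypothetical Java model (Java 16+), not the Beam SpannerIO API: the record types `DataRecord`, `HeartbeatRecord` and `ChildPartitionsRecord`, their single `long` timestamp, and the in-memory lists standing in for the output PCollection and the metadata table are all assumptions made for illustration. Restriction tracking and checkpointing, which a real SDF needs, are omitted.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of the per-record handling in a change stream
 * partition reader. These types are illustrative only, not Beam classes.
 */
public class ChangeStreamRecordHandler {

  /** Marker for the three record kinds a change stream query can return. */
  interface ChangeStreamRecord {}

  /** Hypothetical data record: carries a commit timestamp and a payload. */
  record DataRecord(long commitTimestamp, String payload) implements ChangeStreamRecord {}

  /** Hypothetical heartbeat record: carries only a timestamp. */
  record HeartbeatRecord(long timestamp) implements ChangeStreamRecord {}

  /** Hypothetical child partitions record: a timestamp plus new partition tokens. */
  record ChildPartitionsRecord(long timestamp, List<String> childTokens)
      implements ChangeStreamRecord {}

  // Watermark for this partition (assumed unit: a long timestamp).
  private long watermark = Long.MIN_VALUE;
  // Stand-in for the output PCollection.
  final List<DataRecord> output = new ArrayList<>();
  // Stand-in for the partition metadata table (stores child partition tokens).
  final List<String> metadataTable = new ArrayList<>();

  void process(ChangeStreamRecord record) {
    if (record instanceof DataRecord data) {
      // (1) Advance the watermark and emit the record.
      advanceWatermark(data.commitTimestamp());
      output.add(data);
    } else if (record instanceof HeartbeatRecord heartbeat) {
      // (2) Advance the watermark only; nothing is emitted.
      advanceWatermark(heartbeat.timestamp());
    } else if (record instanceof ChildPartitionsRecord children) {
      // (3) Advance the watermark and persist the children for later scheduling.
      advanceWatermark(children.timestamp());
      metadataTable.addAll(children.childTokens());
    } else {
      throw new IllegalArgumentException("Unknown record type: " + record);
    }
  }

  private void advanceWatermark(long timestamp) {
    // Never move the watermark backwards.
    watermark = Math.max(watermark, timestamp);
  }

  long watermark() {
    return watermark;
  }
}
```

Keeping the watermark non-decreasing with `Math.max` is a choice made in this sketch so an out-of-order timestamp cannot move it backwards. The description above only says the watermark is updated with each record's timestamp.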

Once the change stream query for the input element's partition finishes, the function marks the partition as finished in the metadata table and terminates.
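The lifecycle of a single partition (run the query, hand every returned record to a handler, then mark the partition as finished) could look roughly like the following. This is a hypothetical Java sketch: `ChangeStreamQuery`, `MetadataTable`, `readPartition` and the partition token string are illustrative names, not the Beam SpannerIO API, and the sketch ignores the restriction tracking and checkpointing a real SDF performs.

```java
import java.time.Instant;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of one partition's lifecycle: run the change stream query,
 * hand every record to a handler, then mark the partition as finished.
 * Not the Beam SpannerIO API; restriction tracking and checkpointing are omitted.
 */
public class PartitionLifecycle {

  /** In-memory stand-in for the partition metadata table. */
  static final class MetadataTable {
    private final Set<String> finished = new HashSet<>();

    void markFinished(String partitionToken) {
      finished.add(partitionToken);
    }

    boolean isFinished(String partitionToken) {
      return finished.contains(partitionToken);
    }
  }

  /** Hypothetical query: returns all records for one partition within [start, end]. */
  @FunctionalInterface
  interface ChangeStreamQuery {
    List<Object> run(String changeStreamName, String partitionToken, Instant start, Instant end);
  }

  static void readPartition(
      String changeStreamName,
      String partitionToken,
      Instant start,
      Instant end,
      ChangeStreamQuery query,
      Consumer<Object> recordHandler,
      MetadataTable metadata) {
    // Issue the query with the parameters received for this element.
    for (Object record : query.run(changeStreamName, partitionToken, start, end)) {
      recordHandler.accept(record);
    }
    // The query returned every record in the time range, so record completion.
    metadata.markFinished(partitionToken);
  }
}
```

Marking the partition as finished only after the loop completes means a failure partway through the query leaves the partition unfinished in the metadata table.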

thiagotnunes (Contributor Author) commented

retest this please

thiagotnunes (Contributor Author) commented Jan 14, 2022

R: @pabloem

pabloem (Member) commented Jan 21, 2022

ah this PR is surprisingly easy to follow. I think it makes sense to me. LGTM

pabloem merged commit f43789a into apache:master on Jan 21, 2022