RFC: Design notes for integration syncing #1189

@joshsmith

Description

Problem

Even with only the GitHub integration, we have had difficulty keeping the sync up/down with the external service reliable.

  • Writes are coupled to the external API
  • Multiple task and comment records are created
  • Records become easily out of sync

Our basic goals:

  • Eventual consistency for the destination records in our database
  • Eventual consistency with the external APIs
  • Events to/from the external APIs are fully concurrent:
    • enqueued in order
    • dropped when more recent events come in
    • restartable when errored

Each Event can have a:

  • direction - :inbound | :outbound
  • integration_ fields
    • integration_external_id - the id of the integration resource from the external provider
    • integration_updated_at - the last updated at timestamp of the integration resource from the external provider
    • integration_record_id - the id of our cached record for the resource
    • integration_record_type - the type of our cached record for the resource, as the table name
  • record_ fields
    • record_id - the id of the record for the resource connected to this integration
    • record_type - the type of the record for the resource connected to this integration as the table name
  • canceled_by - the id of the Event that canceled this one
  • duplicate_of - the id of the Event that this is a duplicate of
  • ignored_for_id - the id of the record that caused this event to be ignored
  • ignored_for_type - the type of the record (table name) that caused this event to be ignored
  • state - :queued | :processing | :completed | :errored | :canceled | :ignored | :duplicate | :disabled
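The fields above could be sketched as a data structure. This is a minimal illustration in Python (the project itself appears to be Elixir, given the `:atom` syntax and changesets); the field names come from this RFC, but the types and defaults are assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Direction(Enum):
    INBOUND = "inbound"
    OUTBOUND = "outbound"


class State(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    ERRORED = "errored"
    CANCELED = "canceled"
    IGNORED = "ignored"
    DUPLICATE = "duplicate"
    DISABLED = "disabled"


@dataclass
class Event:
    direction: Direction
    # integration_ fields: the external provider's view of the resource
    integration_external_id: str
    integration_updated_at: datetime
    integration_record_id: Optional[int] = None
    integration_record_type: Optional[str] = None  # table name of our cached record
    # record_ fields: our own record connected to this integration
    record_id: Optional[int] = None
    record_type: Optional[str] = None  # table name
    # bookkeeping for why this event stopped processing
    canceled_by: Optional[int] = None       # id of the Event that canceled this one
    duplicate_of: Optional[int] = None      # id of the Event this one duplicates
    ignored_for_id: Optional[int] = None    # id of the record that caused the ignore
    ignored_for_type: Optional[str] = None  # table name of that record
    state: State = State.QUEUED
```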

We may want our own writes to our own records, even without integrations, to also go through this process. Not sure.

When an event comes in we should:

  • check if there is any event for the integration_external_id where:
    • the integration_updated_at is after our event's last updated timestamp (limit 1)
      • if yes, set state to :ignored, set ignored_for_id to the id of the event found by the limit 1 query and ignored_for_type to this event table's name, and stop processing
    • the integration_updated_at timestamp for the relevant record_ is equal to our event's last updated timestamp (limit 1)
      • if yes, set state to :duplicate, set duplicate_of to the id of the event found by the limit 1 query, and stop processing
    • the modified_at timestamp for the relevant record_ is after our event's last updated timestamp
      • if yes, set state to :ignored, set ignored_for_id to the record_id and ignored_for_type to the record_type, and stop processing
  • check if there are any events for the integration_external_id where:
    • integration_updated_at is before our event's last updated timestamp
      • if yes, set state of those events to :canceled and set canceled_by to the id of this event
  • check if there is any other :queued event or :processing event for the integration_external_id
    • if yes, set state to :queued
  • when :processing, create or update the relevant record (matching record_id and record_type) through the relationship on the record for integration_record_id and integration_record_type
  • when :completed, kick off a process to look for the next :queued item with the oldest integration_updated_at timestamp
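The intake rules above could be sketched roughly as follows. A minimal Python sketch for illustration only: `intake` is a hypothetical name, the `Event` shape is trimmed to just the fields the rules touch, and the real implementation would run these checks as database queries inside a transaction:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Event:
    id: int
    integration_external_id: str
    integration_updated_at: datetime
    state: str = "queued"  # :queued | :processing | :canceled | :ignored | :duplicate | ...
    canceled_by: Optional[int] = None
    duplicate_of: Optional[int] = None
    ignored_for_id: Optional[int] = None
    ignored_for_type: Optional[str] = None


def intake(event, existing, record_modified_at=None, record_id=None, record_type=None):
    """Apply the ignore/duplicate/cancel/queue rules from the RFC to a new event."""
    peers = [e for e in existing
             if e.integration_external_id == event.integration_external_id]

    # 1. A newer event already exists -> ignore this one.
    newer = next((e for e in peers
                  if e.integration_updated_at > event.integration_updated_at), None)
    if newer:
        event.state = "ignored"
        event.ignored_for_id = newer.id
        event.ignored_for_type = "events"
        return event

    # 2. An event with the same timestamp already exists -> duplicate.
    same = next((e for e in peers
                 if e.integration_updated_at == event.integration_updated_at), None)
    if same:
        event.state = "duplicate"
        event.duplicate_of = same.id
        return event

    # 3. Our own record was modified after this event -> ignore.
    if record_modified_at and record_modified_at > event.integration_updated_at:
        event.state = "ignored"
        event.ignored_for_id = record_id
        event.ignored_for_type = record_type
        return event

    # 4. Cancel any stale (older) peer events in favor of this one.
    for e in peers:
        if e.integration_updated_at < event.integration_updated_at:
            e.state = "canceled"
            e.canceled_by = event.id

    # 5. Queue behind any remaining in-flight peer, else start processing.
    busy = any(e.state in ("queued", "processing") for e in peers)
    event.state = "queued" if busy else "processing"
    return event
```

Note that step 4 runs before the queued/processing check in step 5, so an event only queues behind peers that survived cancellation.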

Within the logic for updating the given record, we would also need to check whether the record's updated timestamp is after the event's timestamp. If it is, we need to bubble up the changeset validation error and mark the event :ignored, as above.
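That record-level guard could look something like this sketch. Plain dicts stand in for the record and event, and `apply_event_to_record` is a hypothetical name; in the actual app this check would live in a changeset validation:

```python
from datetime import datetime


def apply_event_to_record(record, event):
    """Apply the event's attributes to the record unless the event is stale.

    `record` and `event` are dicts here for brevity; `attrs` holds the
    attribute changes carried by the event (an assumed key).
    """
    if record.get("updated_at") and record["updated_at"] > event["integration_updated_at"]:
        # The record changed after the event was emitted: bubble the error
        # up and mark the event :ignored, pointing at the offending record.
        event["state"] = "ignored"
        event["ignored_for_id"] = record["id"]
        event["ignored_for_type"] = record["type"]
        return False
    record.update(event["attrs"])
    record["updated_at"] = event["integration_updated_at"]
    return True
```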
