KafkaIndexTask can delete published segments on restart #6124

@jihoonson

Description

This can happen in the following scenario.

  1. A Kafka index task starts publishing segments.
  2. The task successfully publishes the segments but is stopped immediately afterward (e.g., the machine restarts).
  3. When the task is restored, it restores all the sequences it kept in memory before the restart.
  4. After reading some more events from Kafka, the task tries to publish segments again. These include the segments that were already published before the restart, because the restored sequences still contain them.
  5. Since those segments are already stored in the metastore, the publish fails.
  6. The set of published segments in the metastore differs from the set the task is trying to publish, because the task has read more data since the restart.
  7. The task concludes that the publish actually failed and removes the published segments from deep storage.
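The failure mode in the steps above can be sketched as follows. This is not Druid's actual code; the class and method names (`Metastore`, `DeepStorage`, `publish_and_cleanup`) are illustrative assumptions that only model the flawed "compare segment sets, then clean up" decision:

```python
class AlreadyPublishedError(Exception):
    """Raised when a segment being published already exists in the metastore."""


class Metastore:
    """Toy stand-in for the metadata store (illustrative, not Druid's API)."""
    def __init__(self):
        self.segments = set()

    def publish(self, segments):
        # Publishing a segment that is already stored fails (step 5).
        if self.segments & set(segments):
            raise AlreadyPublishedError()
        self.segments |= set(segments)

    def get_published_segments(self):
        return set(self.segments)


class DeepStorage:
    """Toy stand-in for deep storage holding the segment files."""
    def __init__(self, segments):
        self.segments = set(segments)

    def delete(self, segment):
        self.segments.discard(segment)


def publish_and_cleanup(metastore, deep_storage, segments_to_publish):
    """Try to publish; on failure, decide whether 'our' publish already
    succeeded by comparing segment sets, and clean up if it did not."""
    try:
        metastore.publish(segments_to_publish)
        return True
    except AlreadyPublishedError:
        # If the metastore holds exactly the set we tried to publish, the
        # task assumes its own earlier publish succeeded and does nothing.
        if metastore.get_published_segments() == set(segments_to_publish):
            return True
        # Step 6: after the restart the task read more data, so its set is
        # a strict superset of what was published before the restart.
        # Step 7: the sets differ, so the task concludes the publish truly
        # failed and deletes its segments from deep storage -- including
        # the ones that WERE successfully published before the restart.
        for segment in segments_to_publish:
            deep_storage.delete(segment)
        return False


# Replaying the scenario: seg1/seg2 were published before the restart; the
# restored task has since also built seg3.
metastore = Metastore()
metastore.publish({"seg1", "seg2"})                 # steps 1-2, before restart
deep = DeepStorage({"seg1", "seg2", "seg3"})
ok = publish_and_cleanup(metastore, deep, {"seg1", "seg2", "seg3"})  # steps 4-7
```

After this runs, `ok` is `False` and `deep.segments` is empty: the already-published seg1 and seg2 were deleted from deep storage even though the metastore still references them, which is the data-loss bug this issue reports.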
