KafkaIndexTask can delete published segments on restart #6124

@jihoonson

Description

This can happen in the following scenario.

  1. A Kafka index task starts publishing segments.
  2. The task successfully publishes the segments but is stopped immediately afterward (e.g., the machine restarts).
  3. When the task is restored, it restores all the sequences it kept in memory before the restart.
  4. After reading some more events from Kafka, the task tries to publish segments again. These include the segments that were already published before the restart, because the restored sequences still contain them.
  5. Since those segments are already stored in the metastore, the publish fails.
  6. The set of published segments in the metastore differs from the set the task is trying to publish, because the task has read more data since the restart.
  7. The task concludes that the publish actually failed and removes the published segments from deep storage.
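The failure mode in the steps above can be sketched as follows. This is not Druid's actual code; the class and method names (`Metastore`, `DeepStorage`, `publish_and_cleanup`) are illustrative assumptions that only model the flawed "compare segment sets, then clean up" decision:

```python
class AlreadyPublishedError(Exception):
    """Raised when a segment being published already exists in the metastore."""


class Metastore:
    """Toy stand-in for the metadata store (illustrative, not Druid's API)."""
    def __init__(self):
        self.segments = set()

    def publish(self, segments):
        # Publishing a segment that is already stored fails (step 5).
        if self.segments & set(segments):
            raise AlreadyPublishedError()
        self.segments |= set(segments)

    def get_published_segments(self):
        return set(self.segments)


class DeepStorage:
    """Toy stand-in for deep storage holding the segment files."""
    def __init__(self, segments):
        self.segments = set(segments)

    def delete(self, segment):
        self.segments.discard(segment)


def publish_and_cleanup(metastore, deep_storage, segments_to_publish):
    """Try to publish; on failure, decide whether 'our' publish already
    succeeded by comparing segment sets, and clean up if it did not."""
    try:
        metastore.publish(segments_to_publish)
        return True
    except AlreadyPublishedError:
        # If the metastore holds exactly the set we tried to publish, the
        # task assumes its own earlier publish succeeded and does nothing.
        if metastore.get_published_segments() == set(segments_to_publish):
            return True
        # Step 6: after the restart the task read more data, so its set is
        # a strict superset of what was published before the restart.
        # Step 7: the sets differ, so the task concludes the publish truly
        # failed and deletes its segments from deep storage -- including
        # the ones that WERE successfully published before the restart.
        for segment in segments_to_publish:
            deep_storage.delete(segment)
        return False


# Replaying the scenario: seg1/seg2 were published before the restart; the
# restored task has since also built seg3.
metastore = Metastore()
metastore.publish({"seg1", "seg2"})                 # steps 1-2, before restart
deep = DeepStorage({"seg1", "seg2", "seg3"})
ok = publish_and_cleanup(metastore, deep, {"seg1", "seg2", "seg3"})  # steps 4-7
```

After this runs, `ok` is `False` and `deep.segments` is empty: the already-published seg1 and seg2 were deleted from deep storage even though the metastore still references them, which is the data-loss bug this issue reports.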
