This can happen in the following scenario:
- A Kafka index task starts publishing segments.
- The task succeeds in publishing the segments but is stopped immediately afterward (e.g., because the machine is restarted).
- When the task is restored, it restores all the sequences it had kept in memory before the restart (see the first sketch after this list).
- After reading more events from Kafka, the task tries to publish segments again. Because the restored sequences still contain them, this set includes segments that were already published before the restart.
- Since those segments already exist in the metadata store, this second publish fails.
- The set of segments recorded in the metadata store differs from the set the task is trying to publish, because the task has read more data since the restart.
- The task concludes that the publish genuinely failed and removes the published segments from deep storage (see the second sketch after this list).
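
A minimal sketch of the restore step, assuming a hypothetical `SequenceMetadata` type (Druid's real classes differ). The point is that restore re-adds every persisted sequence without consulting the metadata store, which is why already-published segments end up in the next publish attempt:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the restore step; SequenceMetadata is an
// illustrative stand-in, not Druid's real class.
final class RestoreSketch
{
  /** Per-sequence bookkeeping, including the segments the sequence produced. */
  interface SequenceMetadata {}

  private final List<SequenceMetadata> sequences = new ArrayList<>();

  /** Called on startup with whatever state was persisted before the stop. */
  void restore(List<SequenceMetadata> persisted)
  {
    // Every persisted sequence is restored verbatim. Nothing here checks the
    // metadata store to drop sequences whose segments were already published
    // just before the restart, so those segments reappear in the next publish.
    sequences.addAll(persisted);
  }
}
```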
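And a minimal sketch of the failure check itself, again with hypothetical stand-ins (`MetadataStore`, `DeepStorage`, `SegmentId` are not Druid's real types). Because the task read more data after the restart, the set it tries to publish is a strict superset of what the metadata store contains, so an equality check misclassifies the duplicate publish as a hard failure and the cleanup path deletes segments from deep storage:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the publish-failure check; MetadataStore,
// DeepStorage, and SegmentId are illustrative stand-ins.
final class PublishCheckSketch
{
  interface SegmentId {}

  interface MetadataStore
  {
    /** Atomically inserts segments; returns false if any already exist. */
    boolean insertSegments(Set<SegmentId> segments);

    /** Returns which of the given segments are already recorded as published. */
    Set<SegmentId> findPublishedSegments(Set<SegmentId> segments);
  }

  interface DeepStorage
  {
    void delete(SegmentId segment);
  }

  private final MetadataStore metadataStore;
  private final DeepStorage deepStorage;

  PublishCheckSketch(MetadataStore metadataStore, DeepStorage deepStorage)
  {
    this.metadataStore = metadataStore;
    this.deepStorage = deepStorage;
  }

  void publish(Set<SegmentId> segmentsToPublish)
  {
    // The insert fails because some of these segments were already
    // published before the restart.
    if (metadataStore.insertSegments(segmentsToPublish)) {
      return; // normal success path
    }

    // The insert may have failed because a prior run already published
    // exactly these segments, so check the metadata store. After a restart
    // the task has read more data, so segmentsToPublish is a strict superset
    // of what the store contains and this equality check fails.
    Set<SegmentId> alreadyPublished =
        new HashSet<>(metadataStore.findPublishedSegments(segmentsToPublish));

    if (!alreadyPublished.equals(segmentsToPublish)) {
      // The publish is judged a hard failure, and cleanup deletes the
      // segments from deep storage -- including the ones that *were*
      // successfully published before the restart.
      for (SegmentId segment : segmentsToPublish) {
        deepStorage.delete(segment);
      }
      throw new IllegalStateException("Failed to publish segments " + segmentsToPublish);
    }
  }
}
```

The equality check exists so that a retried publish matching exactly what is already stored can be treated as a success; it breaks down here only because the retried set has grown since the first publish.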