In the following scenario, the committed datasource metadata will not match the segments:
- task starts at offset X
- supervisor tells it to stop at Y
- task uploads some of its segments but fails halfway through
- new task starts at X again
- supervisor tells new task to end at Z
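The sequence above can be simulated with a minimal sketch. This is illustrative pseudocode, not Druid's actual pusher or metadata API: `push`, `deep_storage`, and `metadata` are hypothetical stand-ins for the real components.

```python
# Simulated deep storage and datasource metadata store (hypothetical names).
deep_storage = {}   # segment_id -> segment contents
metadata = {}       # datasource -> committed ending offset

def push(segment_id, records):
    # Mirrors the current pusher behavior described below:
    # a segment that already exists in deep storage is not written again.
    if segment_id not in deep_storage:
        deep_storage[segment_id] = records

# Task 1: starts at X, told to stop at Y, uploads part of its segments, then fails.
push("seg_0", ["data X..Y, part 1"])   # uploaded before the crash
# -- crash before pushing "seg_1" and before committing metadata --

# Task 2: restarts at X with the same sequence name (same segment IDs),
# told to end at Z.
push("seg_0", ["data X..Z, part 1"])   # silently skipped: already exists
push("seg_1", ["data X..Z, part 2"])
metadata["ds"] = "Z"                   # commit succeeds

# Result: metadata says Z, but seg_0 still holds the stale X..Y contents.
print(deep_storage["seg_0"])   # ['data X..Y, part 1']
print(metadata["ds"])          # Z
```

The mismatch falls out directly: the commit records offset Z while one of the committed segments was built from data ending at Y.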
Since the new task has the same sequence name as the previous failed one, it will reuse the same segment IDs. However, the contents of these segments will not match what was uploaded before the failure, because the new task reads to offset Z rather than offset Y.
The issue arises when the new task pushes its segments to deep storage and commits metadata. Because some of the segments already exist in deep storage, those segments are not written again, and the old versions (which stop at offset Y) remain. The push and commit nevertheless succeed, so the datasource metadata is written with Z as the ending offset, which does not match what the segments actually contain.
A possible solution is to allow the deep storage pushers to be configured to overwrite existing segments. This needs to be a configurable option because, in the case of Tranquility replicas, generated segments are not guaranteed to be identical, and you want to avoid a situation where historicals load different versions of the same segment.
In the case of exactly-once ingestion using the datasource table for tracking stream position, segments generated by replicas will be identical, so overwriting is safe and would prevent this issue from happening.
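Continuing the earlier sketch, the proposed fix could look like a flag on the pusher. Again, `push`, `deep_storage`, and `overwrite_existing` are hypothetical names for illustration, not Druid's actual SegmentPusher interface:

```python
deep_storage = {}   # segment_id -> segment contents (simulated deep storage)

def push(segment_id, records, overwrite_existing=False):
    # overwrite_existing=False preserves today's behavior (safe for
    # Tranquility replicas, whose segments may differ).
    # overwrite_existing=True lets a restarted exactly-once task replace
    # the stale segment left behind by the failed task.
    if overwrite_existing or segment_id not in deep_storage:
        deep_storage[segment_id] = records

push("seg_0", ["data X..Y, part 1"])                       # failed task's upload
push("seg_0", ["data X..Z, part 1"], overwrite_existing=True)  # restarted task

# Deep storage now holds the segment that matches the committed offset Z.
print(deep_storage["seg_0"])   # ['data X..Z, part 1']
```

With the flag enabled only for exactly-once ingestion, the committed metadata and segment contents stay consistent, while Tranquility-style deployments keep the existing skip-if-present behavior.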