Druid ingestion can leave loose segments lying around on deep storage in situations where an ingestion job has pushed some segments but then fails before those segments can be published to the metadata store. It would be nice to have a tool that cleans these up.
The tool would need to be careful not to get rid of any segments that were recently created, since they may just not have been inserted into the metadata store yet.
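As a rough sketch of the safety check described above, the selection step could look something like the following. Everything here is an assumption for illustration: `StoredSegment`, `find_loose_segments`, the `grace_period` parameter, and the sample data are hypothetical and not part of Druid. It assumes the tool can list segments on deep storage with a last-modified time and can fetch the set of segment IDs known to the metadata store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredSegment:
    """A segment found on deep storage (hypothetical shape, for illustration only)."""
    segment_id: str
    path: str
    last_modified: datetime


def find_loose_segments(stored, published_ids, now, grace_period):
    """Return stored segments that are unpublished AND older than the grace period.

    Segments newer than the cutoff are skipped even if unpublished, because
    they may simply not have been inserted into the metadata store yet.
    """
    cutoff = now - grace_period
    return [
        s for s in stored
        if s.segment_id not in published_ids and s.last_modified < cutoff
    ]


# Made-up example data: one published segment, one old loose segment,
# and one recent unpublished segment that must be left alone.
now = datetime(2018, 5, 1, 12, 0, tzinfo=timezone.utc)
stored = [
    StoredSegment("seg-published", "/deep/seg-published", now - timedelta(days=7)),
    StoredSegment("seg-loose-old", "/deep/seg-loose-old", now - timedelta(days=7)),
    StoredSegment("seg-loose-new", "/deep/seg-loose-new", now - timedelta(minutes=5)),
]
loose = find_loose_segments(stored, {"seg-published"}, now, timedelta(days=1))
print([s.segment_id for s in loose])  # ['seg-loose-old']
```

A real tool would presumably want the grace period to comfortably exceed the longest expected gap between pushing and publishing a segment, and might default to a dry run that only reports candidates.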
See also: discussion on #5692. Note that the change in #5692 is not believed to meaningfully increase the number of loose segments. Although it creates segments at unique-per-task paths, it cleans them up when it notices they could not be published. It only leaves loose segments behind if the task fails before it can do this cleanup.