Deep storage "fsck"-like tool #5716

@gianm

Druid ingestion can leave loose segments on deep storage when an ingestion job pushes some segments but then fails before they can be published to the metadata store. It would be nice to have a tool that cleans these up.

The tool would need to be careful not to get rid of any segments that were recently created, since they may just not have been inserted into the metadata store yet.
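To make the idea concrete, here is a rough sketch of the selection logic such a tool might use. Everything in it is hypothetical and not an existing Druid API: the `StoredSegment` record, the `find_loose_segments` function, and the `grace_period` parameter are illustrative names only, and it assumes the caller has already listed the segment paths on deep storage and fetched the set of paths known to the metadata store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


@dataclass(frozen=True)
class StoredSegment:
    """Hypothetical record for one segment found on deep storage."""
    path: str
    modified_at: datetime


def find_loose_segments(
    stored: Iterable[StoredSegment],
    published_paths: Set[str],
    now: datetime,
    grace_period: timedelta,
) -> List[str]:
    """Return paths that are on deep storage but not in the metadata store.

    Segments modified within `grace_period` of `now` are skipped, since
    they may simply not have been published yet.
    """
    cutoff = now - grace_period
    return [
        seg.path
        for seg in stored
        if seg.path not in published_paths and seg.modified_at <= cutoff
    ]


if __name__ == "__main__":
    now = datetime(2018, 5, 1, 12, 0, tzinfo=timezone.utc)
    stored = [
        StoredSegment("ds/2018-04-01/v1/0", now - timedelta(days=10)),
        StoredSegment("ds/2018-04-02/v1/0", now - timedelta(days=9)),
        StoredSegment("ds/2018-05-01/v1/0", now - timedelta(minutes=5)),
    ]
    published = {"ds/2018-04-01/v1/0"}
    # Only the old, unpublished segment is reported as loose.
    print(find_loose_segments(stored, published, now, timedelta(days=1)))
```

A real tool would probably want a dry-run mode that only lists candidates, and a grace period comfortably longer than the longest expected ingestion task.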

See also: discussion on #5692. Note that the change on #5692 is not believed to meaningfully increase the number of loose segments. Even though it involves creating segments at unique-per-task paths, it cleans them up when it notices they could not be published. It will only leave loose segments if the task fails before it can do this cleanup.
