IngestSegmentFirehoseFactory race between tasks #3608

@gianm

IngestSegmentFirehoseFactory uses a fixed task id, "reindex", when creating the toolbox factory that it uses to fetch segments. This creates a race condition: two tasks can download the same segment into the same directory at the same time, and one of them will get clobbered and fail.
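To make the collision concrete, here is a minimal sketch. It assumes segments are cached under a directory derived from the task id; the `<base>/<taskId>/work/<segmentId>` layout, the `segmentDir` helper, and the example segment id are all hypothetical and not taken from the Druid source.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class FixedTaskIdCollision {
    // Hypothetical layout: <baseTaskDir>/<taskId>/work/<segmentId>.
    // The real directory layout in Druid may differ; this only illustrates the idea.
    public static Path segmentDir(Path baseTaskDir, String taskId, String segmentId) {
        return baseTaskDir.resolve(taskId).resolve("work").resolve(segmentId);
    }

    public static void main(String[] args) {
        Path base = Paths.get("task-base");           // assumed base directory
        String segment = "input_datasource_segment";  // made-up segment id

        // Two independent tasks both pass the fixed id "reindex" ...
        Path firstTask = segmentDir(base, "reindex", segment);
        Path secondTask = segmentDir(base, "reindex", segment);

        // ... so both resolve to the same download location.
        System.out.println("same directory: " + firstTask.equals(secondTask));
    }
}
```

Since the task id is the only component that could differ between the two tasks, hard-coding it leaves nothing to keep their downloads apart.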

One situation where this can happen easily is reindexing one datasource into two datasources (for example, rolling up to two different granularities). The tasks will run simultaneously, since there's no cross locking between them, but they'll be downloading the same segments from the input datasource.

One fix is to have it use the real task id (which would need to get plumbed in). This has the advantage of keeping each task's files together and reusing the existing mechanisms for cleaning up task work directories.
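A rough sketch of what that could look like, under the same assumed `<base>/<taskId>/work/<segmentId>` layout. The class, the helper names, and the task ids are hypothetical, and `cleanUpTask` only stands in for whatever the existing task work directory cleanup actually does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class PerTaskSegmentDirs {
    // Hypothetical: derive the download directory from the real task id and create it.
    public static Path segmentDir(Path baseTaskDir, String taskId, String segmentId) throws IOException {
        Path dir = baseTaskDir.resolve(taskId).resolve("work").resolve(segmentId);
        return Files.createDirectories(dir);
    }

    // Stand-in for the existing task cleanup: remove everything under <base>/<taskId>.
    public static void cleanUpTask(Path baseTaskDir, String taskId) throws IOException {
        Path taskDir = baseTaskDir.resolve(taskId);
        if (!Files.exists(taskDir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(taskDir)) {
            // Reverse order visits children before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("task-base");
        // Two made-up task ids reading the same input segment.
        Path hourly = segmentDir(base, "reindex_hourly_task", "input_segment");
        Path daily = segmentDir(base, "reindex_daily_task", "input_segment");

        // Cleaning up one task leaves the other task's copy untouched.
        cleanUpTask(base, "reindex_hourly_task");
        System.out.println(Files.exists(hourly) + " " + Files.exists(daily));

        cleanUpTask(base, "reindex_daily_task");
        Files.delete(base);
    }
}
```

The tradeoff is that each task downloads its own copy of the segment, so there's no sharing between concurrent tasks, but there's also no shared directory left to race on.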
