[PROPOSAL] Support only finiteFirehose for native batch ingestion #7071

@jihoonson

Motivation

Currently, native batch tasks (local and parallel index tasks) accept any firehose implementation. However, this isn't very useful when the firehose is infinite, because these tasks have no context for stream ingestion.

Proposed changes

I propose changing the type of the firehose in IndexIOConfig and ParallelIndexIOConfig from FirehoseFactory to FiniteFirehoseFactory.

Rationale

FiniteFirehoseFactory is designed for any type of batch ingestion. It assumes that the input data is finite (and provides an optional hint for parallel indexing). It makes more sense for native batch tasks to support only FiniteFirehoseFactory than to extend them to support any kind of FirehoseFactory, which may be designed for streaming input data.
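To make the idea concrete, here is a minimal, self-contained sketch of what narrowing the config type buys. All types below are simplified stand-ins written for this illustration (they are not the real Druid interfaces, and `SketchIOConfig`, `InMemoryFirehoseFactory`, and `FiniteFirehoseSketch` are hypothetical names). It assumes the "optional hint for parallel indexing" can be modeled as a list of splits.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the real Druid interfaces.
interface Firehose {
    boolean hasMore();
    String nextRow();
}

interface FirehoseFactory {
    Firehose connect();
}

// Assumed shape: a finite factory can also enumerate its input as splits.
interface FiniteFirehoseFactory<S> extends FirehoseFactory {
    List<S> getSplits();
    FiniteFirehoseFactory<S> withSplit(S split);
}

// Hypothetical finite source: each "file" is an in-memory list of lines.
final class InMemoryFirehoseFactory implements FiniteFirehoseFactory<List<String>> {
    private final List<List<String>> files;

    InMemoryFirehoseFactory(List<List<String>> files) {
        this.files = files;
    }

    @Override
    public List<List<String>> getSplits() {
        return files;
    }

    @Override
    public FiniteFirehoseFactory<List<String>> withSplit(List<String> split) {
        // Restrict the factory to a single split (one "file").
        return new InMemoryFirehoseFactory(Collections.singletonList(split));
    }

    @Override
    public Firehose connect() {
        final Iterator<String> rows = files.stream().flatMap(List::stream).iterator();
        return new Firehose() {
            @Override public boolean hasMore() { return rows.hasNext(); }
            @Override public String nextRow() { return rows.next(); }
        };
    }
}

// Stand-in for IndexIOConfig: the field type is the finite factory, so an
// infinite (stream-oriented) factory is rejected at compile time.
final class SketchIOConfig {
    private final FiniteFirehoseFactory<?> firehoseFactory;

    SketchIOConfig(FiniteFirehoseFactory<?> firehoseFactory) {
        this.firehoseFactory = firehoseFactory;
    }

    FiniteFirehoseFactory<?> getFirehoseFactory() {
        return firehoseFactory;
    }
}

final class FiniteFirehoseSketch {
    // Reads every row; terminates because the input is finite.
    static int countRows(FirehoseFactory factory) {
        Firehose firehose = factory.connect();
        int count = 0;
        while (firehose.hasMore()) {
            firehose.nextRow();
            count++;
        }
        return count;
    }

    // Processes each split independently, as a parallel task could.
    static <S> int countRowsPerSplit(FiniteFirehoseFactory<S> factory) {
        int total = 0;
        for (S split : factory.getSplits()) {
            total += countRows(factory.withSplit(split));
        }
        return total;
    }

    public static void main(String[] args) {
        SketchIOConfig config = new SketchIOConfig(new InMemoryFirehoseFactory(
            Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"))));
        System.out.println(countRows(config.getFirehoseFactory()));
        System.out.println(countRowsPerSplit(config.getFirehoseFactory()));
    }
}
```

In this sketch, a factory that implements only `FirehoseFactory` cannot be passed to `SketchIOConfig`, which mirrors the intent of the proposal: the restriction is enforced by the type rather than by runtime checks.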

Operational impact

There's no change in the task spec because the variable name isn't changed.

Custom FirehoseFactory implementations used with native batch tasks need to be updated to implement FiniteFirehoseFactory.

Future work

This change effectively limits native batch tasks to text file formats by default, because all implementations of FiniteFirehoseFactory use StringInputRowParser. #5584 should be resolved to support other file formats.
