Motivation
Currently, native batch tasks (local and parallel index tasks) accept any firehose implementation. However, this isn't very useful when the firehose is infinite, because these tasks have no context about stream ingestion.
Proposed changes
I propose changing the type of the firehose field of IndexIOConfig and ParallelIndexIOConfig from FirehoseFactory to FiniteFirehoseFactory.
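To make the effect of the type change concrete, here is a minimal, self-contained sketch. Every type in it is a simplified stand-in written for illustration, not the actual Druid classes: the real interfaces have more methods and generic parameters, and LocalFilesFirehoseFactory and StreamFirehoseFactory are hypothetical implementations. It only aims to show that, once the field is typed as FiniteFirehoseFactory, a stream-oriented factory is rejected at compile time while the field name stays the same.

```java
import java.util.Arrays;
import java.util.List;

public class FiniteFirehoseSketch {
    // Simplified stand-in for FirehoseFactory: the input may be finite or infinite.
    public interface FirehoseFactory {
        String describe();
    }

    // Simplified stand-in for FiniteFirehoseFactory: the input is finite and
    // can optionally report how it may be split for parallel indexing.
    public interface FiniteFirehoseFactory extends FirehoseFactory {
        boolean isSplittable();
        int getNumSplits();
    }

    // Hypothetical finite implementation backed by a fixed list of files.
    public static final class LocalFilesFirehoseFactory implements FiniteFirehoseFactory {
        private final List<String> files;

        public LocalFilesFirehoseFactory(List<String> files) {
            this.files = files;
        }

        @Override
        public String describe() {
            return "local files: " + files;
        }

        @Override
        public boolean isSplittable() {
            return true;
        }

        @Override
        public int getNumSplits() {
            // One split per file in this sketch.
            return files.size();
        }
    }

    // Hypothetical stream-oriented implementation: not finite, so it only
    // implements the base interface.
    public static final class StreamFirehoseFactory implements FirehoseFactory {
        @Override
        public String describe() {
            return "unbounded stream";
        }
    }

    // Stand-in for IndexIOConfig after the proposed change: the field is still
    // called "firehose", but its type is narrowed to FiniteFirehoseFactory.
    public static final class IndexIOConfig {
        private final FiniteFirehoseFactory firehose;

        public IndexIOConfig(FiniteFirehoseFactory firehose) {
            this.firehose = firehose;
        }

        public FiniteFirehoseFactory getFirehose() {
            return firehose;
        }
    }

    public static void main(String[] args) {
        IndexIOConfig config = new IndexIOConfig(
            new LocalFilesFirehoseFactory(Arrays.asList("a.json", "b.json")));
        System.out.println(config.getFirehose().getNumSplits()); // prints 2

        // The following line would not compile, because StreamFirehoseFactory
        // is not a FiniteFirehoseFactory:
        // new IndexIOConfig(new StreamFirehoseFactory());
    }
}
```

In this sketch, a custom factory written against the base interface would have to implement the finite interface as well before it could be used in a native batch task.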
Rationale
FiniteFirehoseFactory is designed for any type of batch ingestion. It assumes that the input data is finite (and provides an optional hint for parallel indexing). It makes more sense to support only FiniteFirehoseFactory for native batch tasks than to extend them to support any kind of FirehoseFactory, which may be designed for streaming input data.
Operational impact
There's no change in the task spec because the variable name isn't changed.
Custom FirehoseFactory implementations used with native batch tasks need to be updated to implement FiniteFirehoseFactory.
Future work
This change effectively makes native batch tasks support only text file formats by default, because all implementations of FiniteFirehoseFactory use StringInputRowParser. #5584 should be resolved to support other file formats.