Fix supported file formats for Hadoop vs Native batch doc #7069
fjy merged 3 commits into apache:master
| | Supported partitioning methods | [Both Hash-based and range partitioning](http://druid.io/docs/latest/ingestion/hadoop.html#partitioning-specification) | N/A | Hash-based partitioning (when `forceGuaranteedRollup` = true) | | ||
| | Supported input locations | All locations accessible via HDFS client or Druid dataSource | All implemented [firehoses](./firehose.html) | All implemented [firehoses](./firehose.html) | | ||
| | Supported file formats | All implemented Hadoop InputFormats | Currently only text file format (CSV, TSV, JSON) | Currently only text file format (CSV, TSV, JSON) | | ||
| | Supported file formats | All implemented Hadoop InputFormats | Currently text file formats (CSV, TSV, JSON) by default. Any custom implementation can be supported. See [FiniteFirehoseFactory](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FiniteFirehoseFactory.java) for custom implementation. | Currently text file formats (CSV, TSV, JSON) by default. Any custom implementation can be supported. See [FiniteFirehoseFactory](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FiniteFirehoseFactory.java) for custom implementation. | |
I don't think the final column should refer to FiniteFirehoseFactory? It doesn't split.
Please check my comment. FiniteFirehoseFactory is for any type of batch indexing, and splitting is just an optional hint for parallel indexing. I think it's better to recommend using it for the local index task as well, because it assumes that the input data is finite, as opposed to FirehoseFactory.
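To illustrate the distinction above, here is a minimal Java sketch of a finite source whose splits are only an optional hint. All names in it (`FiniteSource`, `ListSource`, `ingestWhole`, `ingestBySplits`) are hypothetical and chosen for illustration; they are not Druid's actual `FiniteFirehoseFactory` API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative only: these types are hypothetical and do not mirror Druid's real interfaces.
class FiniteSourceSketch {

    /** A source whose rows always run out; splitting is an optional extra. */
    interface FiniteSource {
        Iterator<String> rows();

        /** Optional hint: by default a source is one unsplit unit of work. */
        default List<FiniteSource> splits() {
            return List.of(this);
        }
    }

    /** In-memory source that offers fixed-size chunks as splits. */
    static final class ListSource implements FiniteSource {
        private final List<String> data;
        private final int chunkSize;

        ListSource(List<String> data, int chunkSize) {
            if (chunkSize < 1) {
                throw new IllegalArgumentException("chunkSize must be >= 1");
            }
            this.data = List.copyOf(data);
            this.chunkSize = chunkSize;
        }

        @Override
        public Iterator<String> rows() {
            return data.iterator();
        }

        @Override
        public List<FiniteSource> splits() {
            List<FiniteSource> out = new ArrayList<>();
            for (int i = 0; i < data.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, data.size());
                out.add(new ListSource(data.subList(i, end), chunkSize));
            }
            return out;
        }
    }

    /** Analogue of a local index task: reads everything and never asks for splits. */
    static int ingestWhole(FiniteSource source) {
        int count = 0;
        Iterator<String> it = source.rows();
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    /** Analogue of a parallel task: one unit of work per split (run in sequence here). */
    static int ingestBySplits(FiniteSource source) {
        int count = 0;
        for (FiniteSource split : source.splits()) {
            count += ingestWhole(split);
        }
        return count;
    }

    public static void main(String[] args) {
        FiniteSource src = new ListSource(List.of("a", "b", "c", "d", "e"), 2);
        System.out.println(ingestWhole(src));     // rows read without splitting
        System.out.println(ingestBySplits(src));  // rows read split by split
        System.out.println(src.splits().size());  // number of splits offered
    }
}
```

Both ingestion paths read the same rows. The only difference is whether the caller takes advantage of the split hint, which is why a finite source is usable from a non-parallel task too.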
I see.
Looking through the built-in hierarchy, IngestSegment seems to be the only FirehoseFactory that isn't a FiniteFirehoseFactory but that you'd still want to use from a local index task, and we are fixing that.
Right. Thanks for your understanding! I also raised a proposal about this: #7071.
Suggest making it clearer that this would be done with an extension, maybe something like:
Additional formats can be added through a [custom extension](link to extension making docs) implementing [`FiniteFirehoseFactory`](https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/data/input/FiniteFirehoseFactory.java)
* Fix supported file formats
* address comment
See the discussion at #7044 (comment).