It would benefit some users if the kafka-indexing service supported putting multiple Druid `InputRow`s inside a single Kafka record.
This would let users batch rows while still using the Kafka sync producer, which sends only one Kafka record at a time.
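For illustration, here is a hypothetical producer-side helper (not part of this proposal) that packs several newline-delimited JSON rows into one record value, so a single synchronous `send()` still ships a whole batch. The class and method names are made up for the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical producer-side helper: pack several JSON rows into one
// Kafka record value, so one synchronous send() carries a whole batch.
public class RowBatcher
{
  // Join individual JSON rows with newlines; a batch-aware parser on
  // the consumer side would split them back apart.
  public static String pack(List<String> jsonRows)
  {
    return String.join("\n", jsonRows);
  }

  public static void main(String[] args)
  {
    String value = pack(Arrays.asList(
        "{\"ts\":1,\"metric\":10}",
        "{\"ts\":2,\"metric\":20}"
    ));
    // A sync producer would then send this single value, e.g.:
    //   producer.send(new ProducerRecord<>("topic", value)).get();
    System.out.println(value);
  }
}
```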
I would imagine adding the following method to the `InputRowParser` interface:

```java
default List<InputRow> parseBatch(T input)
{
  return ImmutableList.of(parse(input));
}
```
The current `InputRow parse(T input)` method would be deprecated, and all of the Druid code would be adjusted to use `parseBatch(input)` instead.
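To make the idea concrete, here is a minimal sketch of a parser overriding `parseBatch`. The interfaces are trimmed stand-ins (the real Druid `InputRow` and `InputRowParser` carry many more methods), and `LineDelimitedParser` is a hypothetical example that splits one newline-delimited record into several rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Trimmed stand-ins for Druid's interfaces, for illustration only.
interface InputRow {}

interface InputRowParser<T>
{
  InputRow parse(T input);

  // The proposed default keeps existing single-row parsers working
  // unchanged: a batch of one.
  default List<InputRow> parseBatch(T input)
  {
    return Collections.singletonList(parse(input));
  }
}

// A hypothetical parser that overrides parseBatch() to yield several
// rows from one newline-delimited Kafka record value.
class LineDelimitedParser implements InputRowParser<String>
{
  static class SimpleRow implements InputRow
  {
    final String raw;
    SimpleRow(String raw) { this.raw = raw; }
  }

  @Override
  public InputRow parse(String input)
  {
    return new SimpleRow(input);
  }

  @Override
  public List<InputRow> parseBatch(String input)
  {
    List<InputRow> rows = new ArrayList<>();
    for (String line : input.split("\n")) {
      rows.add(new SimpleRow(line));
    }
    return rows;
  }
}
```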
The kafka-indexing service will need to persist an `<offset, row-number-in-record>` pair instead of just the offset to support exactly-once ingestion. I believe this will end up being the biggest change.
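A composite offset of that shape could look like the sketch below. The class name and fields are assumptions for illustration, not the actual persistence format the service would use.

```java
// Hypothetical composite position: the Kafka offset plus the row index
// inside that record, so ingestion can resume mid-record after restore.
public class RecordOffset implements Comparable<RecordOffset>
{
  final long kafkaOffset;
  final int rowIndex;

  public RecordOffset(long kafkaOffset, int rowIndex)
  {
    this.kafkaOffset = kafkaOffset;
    this.rowIndex = rowIndex;
  }

  @Override
  public int compareTo(RecordOffset other)
  {
    // Order by Kafka offset first, then by row position within the record.
    int byOffset = Long.compare(kafkaOffset, other.kafkaOffset);
    return byOffset != 0 ? byOffset : Integer.compare(rowIndex, other.rowIndex);
  }
}
```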
This approach would add batching support in general, not just for the kafka-indexing service.