[Proposal] Add batching support in kafka-indexing-service #4373

@himanshug

Description

It would benefit some users if the kafka-indexing-service supported putting multiple Druid InputRow objects inside a single Kafka record.
This would let users batch rows while still using the Kafka sync producer, which sends only one Kafka record at a time.

I would imagine adding the following method to the InputRowParser interface.

  default List<InputRow> parseBatch(T input)
  {
    // By default, wrap the single parsed row so existing parsers keep working.
    return ImmutableList.of(parse(input));
  }

The current InputRow parse(T input) method would be deprecated, and all of Druid's code would be adjusted to call parseBatch(input) instead.
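To illustrate the proposed default method and an override, here is a minimal sketch. It uses String as a stand-in for InputRow so it is self-contained; BatchingParser and NewlineDelimitedParser are hypothetical names, not actual Druid classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in for Druid's InputRowParser, using String in place of InputRow,
// to show how the proposed parseBatch() default would behave.
interface BatchingParser<T> {
    String parse(T input);

    // Proposed default: single-row parsers keep working unchanged.
    default List<String> parseBatch(T input) {
        return Collections.singletonList(parse(input));
    }
}

// Hypothetical parser whose Kafka record packs newline-delimited rows;
// it overrides parseBatch() to emit several rows from one record.
class NewlineDelimitedParser implements BatchingParser<String> {
    @Override
    public String parse(String input) {
        return input; // a real parser would build an InputRow here
    }

    @Override
    public List<String> parseBatch(String input) {
        return Arrays.asList(input.split("\n"));
    }
}
```

A parser that never overrides parseBatch() behaves exactly as today, returning a one-element list, which is what makes the change backward compatible.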

The kafka-indexing-service would need to persist an <offset, row-number-in-record> pair instead of just the offset to support exactly-once ingestion. I believe this will end up being the biggest change.
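One way to picture the persisted pair is as a composite position that orders rows first by Kafka offset, then by the row's index inside the record; on restore, only rows strictly past the persisted position are replayed. This is a sketch under that assumption; RecordPosition is a hypothetical name, not part of Druid.

```java
// Hypothetical composite position: Kafka offset plus the row's index
// inside that record, so a restart can skip rows already ingested from
// a partially consumed batched record.
final class RecordPosition implements Comparable<RecordPosition> {
    final long offset;      // Kafka record offset within the partition
    final int rowInRecord;  // index of the InputRow inside that record

    RecordPosition(long offset, int rowInRecord) {
        this.offset = offset;
        this.rowInRecord = rowInRecord;
    }

    @Override
    public int compareTo(RecordPosition other) {
        int c = Long.compare(offset, other.offset);
        return c != 0 ? c : Integer.compare(rowInRecord, other.rowInRecord);
    }

    // A row is replayed only if it lies strictly past the persisted position.
    boolean isAfter(RecordPosition persisted) {
        return compareTo(persisted) > 0;
    }
}
```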

This approach would add batching support in general, not just for the kafka-indexing-service.
