Clarification regarding Appenderator's behavior when persisting a subset of segments #4781

@pjain1


AppenderatorImpl provides a persist(Collection<SegmentIdentifier> identifiers, Committer committer) method that persists only the segments passed in as the input argument. If this method is called with a subset of the segments that AppenderatorImpl is handling, the commitHydrants metadata persisted on disk will contain hydrants only for that subset. Is this the expected behavior, or should persist try to merge the previously committed hydrant information with the hydrants currently being committed?
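To make the question concrete, here is a minimal toy model of the overwrite behavior described above. It is not the real AppenderatorImpl: the class OverwritingCommitLog, its methods, and the use of plain strings for segment identifiers are all hypothetical, and the "on-disk" metadata is just an in-memory map.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names); it mimics how the commit metadata is
// rewritten on each persist and is NOT the actual AppenderatorImpl code.
public class OverwritingCommitLog {
    // Stand-in for the in-memory hydrant count per segment identifier.
    private final Map<String, Integer> liveHydrants = new HashMap<>();
    // Stand-in for the commit metadata stored on disk.
    private Map<String, Integer> committedHydrants = new HashMap<>();

    public void addHydrant(String segmentId) {
        liveHydrants.merge(segmentId, 1, Integer::sum);
    }

    // Persist only the given segments; the stored metadata is replaced wholesale.
    public void persist(Collection<String> segmentIds) {
        Map<String, Integer> snapshot = new HashMap<>();
        for (String id : segmentIds) {
            snapshot.put(id, liveHydrants.getOrDefault(id, 0));
        }
        committedHydrants = snapshot;
    }

    public Map<String, Integer> committed() {
        return committedHydrants;
    }

    public static void main(String[] args) {
        OverwritingCommitLog log = new OverwritingCommitLog();
        log.addHydrant("segA");
        log.addHydrant("segB");
        log.persist(Arrays.asList("segA", "segB")); // both segments are recorded
        log.persist(Arrays.asList("segA"));         // subset persist drops segB's entry
        System.out.println(log.committed().containsKey("segB")); // prints "false"
    }
}
```

In this model, the second call leaves the committed metadata with no record of segB, which is the situation the question is about.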

To give some more context: once KafkaIndexTask supports incremental handoffs, it might call persist with only the subset of segments that are being published. There are two ways to handle this case:

  1. Merge the hydrants metadata being committed with the previously committed metadata, and make sure that the caller's commit metadata (the next partitionOffsets information) is consistent with what is being committed. This is what I have done for now, but it can be a little confusing to follow.
  2. Keep the current behavior of persist, in which it simply overwrites the previously committed metadata. To enable publishing a subset of segments, add a new method pushWithoutPersist to Appenderator that pushes without persisting (currently persistAll is called implicitly). In this case the task must call persistAll explicitly before calling pushWithoutPersist, so that all segments that need to be published are guaranteed to be persisted.
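The two options can be sketched side by side in a small toy model. Everything here is an assumption made for illustration: the class CommitStrategies, the method persistMerging, and the string segment identifiers are hypothetical, and pushWithoutPersist only mirrors the name proposed above, not an existing Appenderator method. The sketch also leaves out the partitionOffsets consistency that option 1 requires.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the two proposals (hypothetical names, not Druid code).
public class CommitStrategies {
    // Stand-in for the in-memory hydrant count per segment identifier.
    private final Map<String, Integer> liveHydrants = new HashMap<>();
    // Stand-in for the commit metadata stored on disk.
    private Map<String, Integer> committedHydrants = new HashMap<>();

    public void addHydrant(String segmentId) {
        liveHydrants.merge(segmentId, 1, Integer::sum);
    }

    // Option 1: update entries for the subset, keep previously committed ones.
    public void persistMerging(Collection<String> segmentIds) {
        Map<String, Integer> merged = new HashMap<>(committedHydrants);
        for (String id : segmentIds) {
            merged.put(id, liveHydrants.getOrDefault(id, 0));
        }
        committedHydrants = merged;
    }

    // Option 2, step 1: the task persists every segment explicitly.
    public void persistAll() {
        committedHydrants = new HashMap<>(liveHydrants);
    }

    // Option 2, step 2: push a subset without persisting; refuse to push a
    // segment whose committed state is behind its in-memory state.
    public Map<String, Integer> pushWithoutPersist(Collection<String> segmentIds) {
        Map<String, Integer> pushed = new TreeMap<>();
        for (String id : segmentIds) {
            Integer live = liveHydrants.get(id);
            Integer committed = committedHydrants.get(id);
            if (live == null || !live.equals(committed)) {
                throw new IllegalStateException("Segment not fully persisted: " + id);
            }
            pushed.put(id, committed);
        }
        return pushed;
    }

    public Map<String, Integer> committed() {
        return committedHydrants;
    }
}
```

In this model, option 1 keeps the metadata complete at the cost of merge logic inside persist, while option 2 keeps persist simple and moves the responsibility for calling persistAll first onto the task.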

Any opinions? @gianm @jihoonson
