
Allow reordered segment allocation in kafka indexing service #5805

Merged: gianm merged 3 commits into apache:master from jihoonson:fix-kis-non-determ-seg-order on Jul 2, 2018

Conversation

@jihoonson (Contributor) commented May 25, 2018

Fix #5761.

Major changes are:

  • Enforce at most one active segment per sequence and per interval (see SegmentsOfInterval)
  • Fix IndexerSQLMetadataStorageCoordinator.allocatePendingSegment() to respect skipSegmentLineageCheck and avoid the unique constraint violation on sequence_name_prev_id_sha1. If skipSegmentLineageCheck is true, sequence_prev_id is always an empty string and sequence_name_prev_id_sha1 is computed as below.
    final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
        Hashing.sha1()
               .newHasher()
               .putBytes(StringUtils.toUtf8(sequenceName))
               .putByte((byte) 0xff)
               .putLong(interval.getStartMillis())
               .putLong(interval.getEndMillis())
               .hash()
               .asBytes()
    );
  • skipSegmentLineageCheck can still be false for backward compatibility.
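
For illustration, the interval-based key can be computed with only the JDK. This is a sketch, not Druid's code: java.security.MessageDigest and manual hex encoding stand in for Guava's Hashing and BaseEncoding.base16(), and the class/method names here are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SequenceKeySketch
{
  // Stand-in for the PR's computation when skipSegmentLineageCheck is true:
  // SHA-1 over sequenceName + a 0xff separator + interval start/end millis,
  // hex-encoded in uppercase like Guava's BaseEncoding.base16().
  static String sequenceNamePrevIdSha1(String sequenceName, long startMillis, long endMillis)
      throws Exception
  {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
    sha1.update((byte) 0xff); // separator so different field splits cannot collide
    sha1.update(ByteBuffer.allocate(16).putLong(startMillis).putLong(endMillis).array());
    StringBuilder hex = new StringBuilder();
    for (byte b : sha1.digest()) {
      hex.append(String.format("%02X", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception
  {
    // SHA-1 digests are 20 bytes, so the key is always 40 hex characters
    System.out.println(sequenceNamePrevIdSha1("index_kafka_test", 0L, 3600000L).length());
  }
}
```

Because the interval bounds (not a previous segment id) feed the hash, two allocations for the same sequence and interval map to the same unique key regardless of arrival order.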


void setAppendingSegment(SegmentWithState appendingSegment)
{
  // There should be only one appending segment at any time
  Preconditions.checkState(this.appendingSegment == null);
gianm (Contributor):
Please include an error message here. (Probably a "WTF?!" message if it should never happen.)

jihoonson (Contributor Author):
Added.

this.appendingSegment = appendingSegment;
}
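
The added message itself isn't shown in the diff. As a stdlib-only sketch of what such a precondition looks like: the PR uses Guava's Preconditions.checkState(boolean, Object), which throws the same IllegalStateException; the message text and class name here are made up.

```java
public class PreconditionSketch
{
  private Object appendingSegment;

  // Mirrors Preconditions.checkState(this.appendingSegment == null, "...") in plain Java.
  void setAppendingSegment(Object segment)
  {
    if (this.appendingSegment != null) {
      throw new IllegalStateException(
          "WTF?! Expected no appendingSegment, but one is already set"
      );
    }
    this.appendingSegment = segment;
  }

  public static void main(String[] args)
  {
    PreconditionSketch holder = new PreconditionSketch();
    holder.setAppendingSegment(new Object()); // first set succeeds
    try {
      holder.setAppendingSegment(new Object()); // second set violates the invariant
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```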

void addAppendFinishedSegment(SegmentWithState appendFinishedSegment)
gianm (Contributor):
Is this only supposed to be used during bootstrapping (startJob)? It doesn't seem like it would make sense otherwise. It could be clearer if this was made into a constructor instead: something that takes a list of initial segments. (Up to you though - this is just a suggestion.)

jihoonson (Contributor Author):
Sounds good. Fixed.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
// Using a single column instead of (sequence_name, sequence_prev_id) as some MySQL storage engines
// have difficulty with large unique keys (see https://github.com/druid-io/druid/issues/2319)
final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
gianm (Contributor):
There's a bit too much code duplication here. Please share some more code between this method and the other similar one. I know it is slightly different, but it seems close enough that it could be shared. Perhaps take a string for the secondary key and have that either be the previousId (in one path) or the interval (in another path).

jihoonson (Contributor Author):
Refactored.
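
The refactor itself isn't shown in this thread. As a hedged sketch of the reviewer's suggestion: one shared helper takes a secondary-key string that is the previous segment id on the lineage-checking path and the interval bounds on the skipping path. The names below are hypothetical, not the actual Druid method names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SharedKeySketch
{
  // Shared hashing for both allocatePendingSegment() paths: the caller picks the
  // secondary key, and the hashing/encoding logic stays in one place.
  static String hashSequenceKey(String sequenceName, String secondaryKey) throws Exception
  {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
    sha1.update((byte) 0xff);
    sha1.update(secondaryKey.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : sha1.digest()) {
      hex.append(String.format("%02X", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception
  {
    // Lineage check enabled: the previous segment id is the secondary key.
    String withLineage = hashSequenceKey("seq", "prevSegmentId");
    // Lineage check skipped: the interval bounds are the secondary key.
    String withoutLineage = hashSequenceKey("seq", "0/3600000");
    System.out.println(withLineage.equals(withoutLineage)); // prints false
  }
}
```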

// Avoiding ON DUPLICATE KEY since it's not portable.
// Avoiding try/catch since it may cause inadvertent transaction-splitting.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
gianm (Contributor):
This comment is no longer accurate: the key's purpose is no longer "ensuring sequences do not fork in two directions"; it now ensures we don't have more than one segment per sequence per interval.

jihoonson (Contributor Author):
Fixed.

.asBytes()
);

handle.createStatement(
gianm (Contributor):
This code seems shareable too.

jihoonson (Contributor Author):
Done.

) throws IOException
{
return append(row, sequenceName, null, false, true);
return append(row, sequenceName, null, true, true);
gianm (Contributor):
Why is the BatchAppenderatorDriver skipping the lineage check now? I thought it could still make more than one segment per interval if it's running in non-incremental-publishing mode.

jihoonson (Contributor Author):
My bad. Thanks.

.filter(segmentWithState -> segmentWithState.getState() == SegmentState.APPENDING)
.map(SegmentWithState::getSegmentIdentifier)
.collect(Collectors.toList());
final Map<SegmentIdentifier, SegmentWithState> requestedSegmentIdsForSequences = getAppendingSegments(sequenceNames)
gianm (Contributor):
What is the reason for moving the creation of requestedSegmentIdsForSequences from after the push, to before the push? Is it fixing something?

jihoonson (Contributor Author):
I don't think it fixes anything, but it's more reliable and understandable.

@gianm (Contributor) left a comment:

LGTM, thanks @jihoonson!

@gianm gianm merged commit b6c957b into apache:master Jul 2, 2018
@jihoonson jihoonson added this to the 0.12.2 milestone Jul 3, 2018
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 3, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to jihoonson/druid that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
fjy pushed a commit that referenced this pull request Jul 5, 2018
…ce (#5943)

* Allow reordered segment allocation in kafka indexing service (#5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug

* commit remaining changes

2 participants