
Allow reordered segment allocation in kafka indexing service #5805

Merged: gianm merged 3 commits into apache:master from jihoonson:fix-kis-non-determ-seg-order on Jul 2, 2018

Conversation

@jihoonson (Contributor) commented May 25, 2018

Fix #5761.

Major changes are:

  • Enforce at most one active segment per sequence and per interval (see SegmentsOfInterval)
  • Fix IndexerSQLMetadataStorageCoordinator.allocatePendingSegment() to respect skipSegmentLineageCheck and avoid the unique constraint violation on sequence_name_prev_id_sha1. If skipSegmentLineageCheck is true, sequence_prev_id is always an empty string and sequence_name_prev_id_sha1 is computed as below.
    final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
        Hashing.sha1()
               .newHasher()
               .putBytes(StringUtils.toUtf8(sequenceName))
               .putByte((byte) 0xff)
               .putLong(interval.getStartMillis())
               .putLong(interval.getEndMillis())
               .hash()
               .asBytes()
    );
  • skipSegmentLineageCheck can still be false for backward compatibility.
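
For illustration, the interval-based key can be computed with only the JDK. This is a sketch, not Druid's code: java.security.MessageDigest and manual hex encoding stand in for Guava's Hashing and BaseEncoding.base16(), and the class/method names here are made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SequenceKeySketch
{
  // Stand-in for the PR's computation when skipSegmentLineageCheck is true:
  // SHA-1 over sequenceName + a 0xff separator + interval start/end millis,
  // hex-encoded in uppercase like Guava's BaseEncoding.base16().
  static String sequenceNamePrevIdSha1(String sequenceName, long startMillis, long endMillis)
      throws Exception
  {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
    sha1.update((byte) 0xff); // separator so different field splits cannot collide
    sha1.update(ByteBuffer.allocate(16).putLong(startMillis).putLong(endMillis).array());
    StringBuilder hex = new StringBuilder();
    for (byte b : sha1.digest()) {
      hex.append(String.format("%02X", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception
  {
    // SHA-1 digests are 20 bytes, so the key is always 40 hex characters
    System.out.println(sequenceNamePrevIdSha1("index_kafka_test", 0L, 3600000L).length());
  }
}
```

Because the interval bounds (not a previous segment id) feed the hash, two allocations for the same sequence and interval map to the same unique key regardless of arrival order.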


void setAppendingSegment(SegmentWithState appendingSegment)
{
  // There should be only one appending segment at any time
  Preconditions.checkState(this.appendingSegment == null);
gianm (Contributor):
Please include an error message here. (Probably a "WTF?!" message if it should never happen.)

jihoonson (Contributor Author):
Added.

this.appendingSegment = appendingSegment;
}
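
The added message itself isn't shown in the diff. As a stdlib-only sketch of what such a precondition looks like: the PR uses Guava's Preconditions.checkState(boolean, Object), which throws the same IllegalStateException; the message text and class name here are made up.

```java
public class PreconditionSketch
{
  private Object appendingSegment;

  // Mirrors Preconditions.checkState(this.appendingSegment == null, "...") in plain Java.
  void setAppendingSegment(Object segment)
  {
    if (this.appendingSegment != null) {
      throw new IllegalStateException(
          "WTF?! Expected no appendingSegment, but one is already set"
      );
    }
    this.appendingSegment = segment;
  }

  public static void main(String[] args)
  {
    PreconditionSketch holder = new PreconditionSketch();
    holder.setAppendingSegment(new Object()); // first set succeeds
    try {
      holder.setAppendingSegment(new Object()); // second set violates the invariant
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```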

void addAppendFinishedSegment(SegmentWithState appendFinishedSegment)
gianm (Contributor):
Is this only supposed to be used during bootstrapping (startJob)? It doesn't seem like it would make sense otherwise. It could be clearer if this was made into a constructor instead: something that takes a list of initial segments. (Up to you though - this is just a suggestion.)

jihoonson (Contributor Author):
Sounds good. Fixed.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
// Using a single column instead of (sequence_name, sequence_prev_id) as some MySQL storage engines
// have difficulty with large unique keys (see https://github.com/druid-io/druid/issues/2319)
final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
gianm (Contributor):
There's a bit too much code duplication here. Please share some more code between this method and the other similar one. I know it is slightly different, but it seems close enough that it could be shared. Perhaps take a string for the secondary key and have that either be the previousId (in one path) or the interval (in another path).

jihoonson (Contributor Author):
Refactored.
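
The refactor itself isn't shown in this thread. As a hedged sketch of the reviewer's suggestion: one shared helper takes a secondary-key string that is the previous segment id on the lineage-checking path and the interval bounds on the skipping path. The names below are hypothetical, not the actual Druid method names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SharedKeySketch
{
  // Shared hashing for both allocatePendingSegment() paths: the caller picks the
  // secondary key, and the hashing/encoding logic stays in one place.
  static String hashSequenceKey(String sequenceName, String secondaryKey) throws Exception
  {
    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
    sha1.update((byte) 0xff);
    sha1.update(secondaryKey.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : sha1.digest()) {
      hex.append(String.format("%02X", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception
  {
    // Lineage check enabled: the previous segment id is the secondary key.
    String withLineage = hashSequenceKey("seq", "prevSegmentId");
    // Lineage check skipped: the interval bounds are the secondary key.
    String withoutLineage = hashSequenceKey("seq", "0/3600000");
    System.out.println(withLineage.equals(withoutLineage)); // prints false
  }
}
```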

// Avoiding ON DUPLICATE KEY since it's not portable.
// Avoiding try/catch since it may cause inadvertent transaction-splitting.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
gianm (Contributor):
This comment is no longer accurate: the key's purpose is no longer "ensuring sequences do not fork in two directions"; it now ensures we don't have more than one segment per sequence per interval.

jihoonson (Contributor Author):
Fixed.

.asBytes()
);

handle.createStatement(
gianm (Contributor):
This code seems shareable too.

jihoonson (Contributor Author):
Done.

) throws IOException
{
return append(row, sequenceName, null, false, true);
return append(row, sequenceName, null, true, true);
gianm (Contributor):
Why is the BatchAppenderatorDriver skipping the lineage check now? I thought it could still make more than one segment per interval if it's running in non-incremental-publishing mode.

jihoonson (Contributor Author):
My bad. Thanks.

.filter(segmentWithState -> segmentWithState.getState() == SegmentState.APPENDING)
.map(SegmentWithState::getSegmentIdentifier)
.collect(Collectors.toList());
final Map<SegmentIdentifier, SegmentWithState> requestedSegmentIdsForSequences = getAppendingSegments(sequenceNames)
gianm (Contributor):
What is the reason for moving the creation of requestedSegmentIdsForSequences from after the push, to before the push? Is it fixing something?

jihoonson (Contributor Author):
I don't think it fixes anything, but it's more reliable and understandable.

@gianm (Contributor) left a comment:

LGTM, thanks @jihoonson!

@gianm gianm merged commit b6c957b into apache:master Jul 2, 2018
@jihoonson jihoonson added this to the 0.12.2 milestone Jul 3, 2018
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 3, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to jihoonson/druid that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
fjy pushed a commit that referenced this pull request Jul 5, 2018
…ce (#5943)

* Allow reordered segment allocation in kafka indexing service (#5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug

* commit remaining changes

2 participants