Fix three bugs with segment publishing. #6155
Conversation
1. In AppenderatorImpl: always use a unique path if requested, even if the segment was already pushed. This is important because, if we don't, it causes the issue mentioned in apache#6124 (KafkaIndexTask can delete published segments on restart).
2. In IndexerSQLMetadataStorageCoordinator: fix a bug that could cause it to return a "not published" result, instead of throwing an exception, when one metadata update failure was followed by some other exception. This is done by resetting the AtomicBoolean that tracks which case we're in, each time the callback runs.
3. In BaseAppenderatorDriver: only kill segments if we get an affirmative false publish result; skip killing if we just got some exception. The reason is that we want to avoid killing segments that are in an unknown state.

Two other changes clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result must mean the publish _definitely_ did not happen; unknown states must be exceptions. This helps BaseAppenderatorDriver do the right thing.
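The kill decision in point 3 can be sketched as follows. This is a minimal, self-contained illustration, not BaseAppenderatorDriver's actual code: publish and killSegments here are hypothetical stand-ins for the driver's real collaborators.

```java
import java.util.List;

// Sketch of point 3: cleanup runs only on an affirmative "false" publish
// result; an exception leaves the segments alone because their state is
// unknown. All names below are illustrative, not Druid's real API.
class DriverSketch {
    int killed = 0; // exposed so the behavior is easy to observe

    boolean publishAndCleanup(List<String> segments) {
        final boolean published;
        try {
            published = publish(segments);
        } catch (RuntimeException e) {
            // Publish state is unknown: do NOT kill the segments, since
            // they may actually have been published. Just propagate.
            throw e;
        }
        if (!published) {
            // Affirmative false: the publish definitely did not happen,
            // so the pushed segment files are safe to delete.
            killSegments(segments);
        }
        return published;
    }

    boolean publish(List<String> segments) {
        return false; // stand-in: pretend the metadata compare-and-swap failed cleanly
    }

    void killSegments(List<String> segments) {
        killed += segments.size();
    }
}
```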
👍

Please check this:
    final TransactionStatus transactionStatus
    ) throws Exception
    {
      definitelyNotUpdated.set(false);
Would you add a comment that this overwrites definitelyNotUpdated on retrying?
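For context, here is a minimal sketch of the reset-on-retry pattern under discussion. The retry loop and tryMetadataUpdate are simplified assumptions standing in for the real transactional callback in IndexerSQLMetadataStorageCoordinator; only the AtomicBoolean handling mirrors the change.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of point 2: reset the flag each time the callback runs.
class RetrySketch {
    private final AtomicBoolean definitelyNotUpdated = new AtomicBoolean(false);

    boolean publishWithRetries(int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Overwrites definitelyNotUpdated on each retry: a stale "true"
            // from an earlier attempt must not make a later, unrelated
            // exception look like a clean "not published" result.
            definitelyNotUpdated.set(false);
            try {
                if (tryMetadataUpdate()) {
                    return true;
                }
                // Affirmative compare-and-swap failure on this attempt.
                definitelyNotUpdated.set(true);
            } catch (RuntimeException e) {
                // Unknown state on this attempt; loop and retry.
            }
        }
        if (definitelyNotUpdated.get()) {
            return false; // the publish definitely did not happen
        }
        throw new IllegalStateException("Publish state unknown after retries");
    }

    boolean tryMetadataUpdate() {
        return false; // stand-in: always loses the compare-and-swap
    }
}
```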
      log.info("Our segments really do exist, awaiting handoff.");
    } else {
    -  throw new ISE("Failed to publish segments[%s]", segmentsAndMetadata.getSegments());
    +  throw new ISE("Failed to publish segments.");
Is this change to avoid overly large logs? I feel this log sometimes helps.
I was thinking it's not necessary, since they will get logged in the catch statement via:

    log.warn(e, "Failed publish, not removing segments: %s", segmentsAndMetadata.getSegments());

    -if (txnFailure.get()) {
    -  return new SegmentPublishResult(ImmutableSet.of(), false);
    +if (definitelyNotUpdated.get()) {
    +  return SegmentPublishResult.fail();
What do you think about adding an exception to SegmentPublishResult on failure, so that callers can figure out why it failed?
I think it's not necessary, since there is supposed to be only one reason: compare-and-swap failure with the metadata update.
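To make the contract being discussed concrete, here is a simplified model of a SegmentPublishResult-style type. This is a sketch, not Druid's actual class: the real one holds DataSegment instances, and plain Strings stand in here.

```java
import java.util.Collections;
import java.util.Set;

// Simplified model of the publish-result contract under discussion.
final class PublishResultSketch {
    private final Set<String> segments;
    private final boolean success;

    private PublishResultSketch(Set<String> segments, boolean success) {
        this.segments = segments;
        this.success = success;
    }

    static PublishResultSketch ok(Set<String> segments) {
        return new PublishResultSketch(segments, true);
    }

    // fail() carries no exception because, per the discussion above, there
    // is only one expected failure mode: losing the metadata
    // compare-and-swap. success == false must mean the publish DEFINITELY
    // did not happen; unknown outcomes are thrown as exceptions instead.
    static PublishResultSketch fail() {
        return new PublishResultSketch(Collections.emptySet(), false);
    }

    boolean isSuccess() { return success; }
    Set<String> getSegments() { return segments; }
}
```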
This is happening because I referenced it in a javadoc. Apparently that's not good enough for the plugin. I removed the reference.

@gianm please check the build failure.

@jihoonson thanks, I pushed again.

👍

There are still some build failures.

OMG, sorry, I'll check more thoroughly before I push again.

Hmm, now some unit tests are failing, and they look legitimate.
* Fix three bugs with segment publishing.
* Remove javadoc-only import.
* Updates.
* Fix test.
* Fix tests.
…che#6187)

* [Backport] Fix three bugs with segment publishing. (apache#6155)
* Fix KafkaIndexTask