Skip to content

Fix Appenderator.push() to commit the metadata of all segments#5730

Merged
b-slim merged 2 commits intoapache:masterfrom
jihoonson:remove-persist
May 2, 2018
Merged

Fix Appenderator.push() to commit the metadata of all segments#5730
b-slim merged 2 commits intoapache:masterfrom
jihoonson:remove-persist

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson commented May 1, 2018

Whenever the appenderator needs to persist any data, it should always persist all segments because persisting segments involves committing metadata about persisted segments.

Also removed Appenderator.persist() since it's not used anymore.


This change is Reviewable

@jihoonson jihoonson added the Bug label May 1, 2018
@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented May 2, 2018

@jihoonson i think it is still useful to keep that method public maybe in some other cases you want to persist one by one, i think we should keep that method public and fix whatever need to be fixed.

@jihoonson
Copy link
Copy Markdown
Contributor Author

@b-slim I can't imagine any case. Do you have anything in your mind?

If we don't know exactly when it can be used, I think it would be better to remove to avoid mistakenly using a wrong method.

@jihoonson
Copy link
Copy Markdown
Contributor Author

BTW, the unit test failure might be related even though it passed when I ran it locally. I'll check it.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented May 2, 2018

IMO, the method is broken by design and it makes sense to get rid of it (at least if you are using commit metadata). If you are using commit metadata, you need to persist all segments at once, since the commit metadata is not related to any particular segment.

It could make sense to call persist for one segment at a time if you don't care about tracking metadata, but I don't see a reason to keep it around just for this. We don't have a use case for this behavior so it's all basically dead code.

{
return persist(getSegments(), committer);
}
ListenableFuture<Object> persistAll(@Nullable Committer committer);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also adjust the javadocs for other methods to stop referring to the "persist" method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Updated javadoc.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented May 2, 2018

LGTM other than the adjustment to the javadocs.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented May 2, 2018

If you are using commit metadata, you need to persist all segments at once, since the commit metadata is not related to any particular segment.

@gianm please help me to understand this, if we can not do a persist of some segments (in-memory) how we can do a push of subset of segments via ListenableFuture<SegmentsAndMetadata> push(Collection<SegmentIdentifier> identifiers, @Nullable Committer committer);

@gianm
Copy link
Copy Markdown
Contributor

gianm commented May 2, 2018

@gianm please help me to understand this, if we can not do a persist of some segments (in-memory) how we can do a push of subset of segments via ListenableFuture push(Collection identifiers, @nullable Committer committer);

With persisting, the appenderator is trying to guarantee that the data persisted on-disk matches the Committer metadata. The purpose of this stuff is mainly so tasks can be restarted and relaunch without losing all data since the last handoff (they can restore from disk). Since the Committer metadata contains the current state of kafka offsets across all segments, it's important to persist every in-memory IncrementalIndex to disk before writing down the Committer metadata.

Pushing is a bit different, in this case you are pushing segments to deep storage that have already been persisted to disk and finalized. So the Committer metadata is already in sync and there is no reason to push every segment.

@b-slim
Copy link
Copy Markdown
Contributor

b-slim commented May 2, 2018

@gianm thanks!

@gianm
Copy link
Copy Markdown
Contributor

gianm commented May 2, 2018

@b-slim no problem! Are you ok with merging this?

@b-slim b-slim merged commit 2c8296f into apache:master May 2, 2018
@jihoonson
Copy link
Copy Markdown
Contributor Author

Since we will create another rc for 0.12.1, I think this is also worthwhile to backport.

sathishsri88 pushed a commit to sathishs/druid that referenced this pull request May 8, 2018
@jihoonson jihoonson added this to the 0.12.1 milestone Jul 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants