Fix Appenderator.push() to commit the metadata of all segments #5730
b-slim merged 2 commits into apache:master
|
@jihoonson I think it is still useful to keep that method public; in some other cases you may want to persist segments one by one. We should keep it public and fix whatever needs to be fixed.
|
@b-slim I can't imagine any such case. Do you have anything in mind? If we don't know exactly when it would be used, I think it would be better to remove it so that nobody calls the wrong method by mistake.
|
BTW, the unit test failure might be related even though it passed when I ran it locally. I'll check it. |
|
IMO, the method is broken by design and it makes sense to get rid of it (at least if you are using commit metadata). If you are using commit metadata, you need to persist all segments at once, since the commit metadata is not related to any particular segment. It could make sense to call |
{
  return persist(getSegments(), committer);
}
ListenableFuture<Object> persistAll(@Nullable Committer committer);
Please also adjust the javadocs for other methods to stop referring to the "persist" method.
Thanks. Updated javadoc.
|
LGTM other than the adjustment to the javadocs. |
@gianm please help me understand this: if we cannot persist only some of the (in-memory) segments, how can we push a subset of segments via
With persisting, the appenderator is trying to guarantee that the data persisted on disk matches the Committer metadata. The purpose of this is mainly to let tasks be restarted and relaunched without losing all data since the last handoff (they can restore from disk). Since the Committer metadata contains the current state of Kafka offsets across all segments, it's important to persist every in-memory IncrementalIndex to disk before writing down the Committer metadata. Pushing is a bit different: in this case you are pushing segments to deep storage that have already been persisted to disk and finalized. So the Committer metadata is already in sync, and there is no reason to push every segment.
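To make the persist-versus-commit argument concrete, here is a minimal toy sketch in plain Java. It does not use Druid's classes: `ToyAppenderator`, `persistSome`, `persistAll`, and `lostRowsAfterRestart` are all hypothetical names invented for illustration, and the "offset" is just a counter standing in for the Kafka offsets carried by the Committer metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PersistAllSketch {
    /** Hypothetical stand-in for an appenderator; not Druid's real API. */
    public static final class ToyAppenderator {
        private final Map<String, List<String>> inMemory = new LinkedHashMap<>();
        private final Map<String, List<String>> onDisk = new LinkedHashMap<>();
        private long committedOffset = -1; // last offset recorded in commit metadata

        public void add(String segment, String row) {
            inMemory.computeIfAbsent(segment, k -> new ArrayList<>()).add(row);
        }

        /** Problematic pattern: flush a subset, but commit metadata that covers all segments. */
        public void persistSome(Collection<String> segments, long offset) {
            for (String segment : segments) {
                flush(segment);
            }
            committedOffset = offset;
        }

        /** Safe pattern: flush every in-memory segment before recording the metadata. */
        public void persistAll(long offset) {
            for (String segment : new ArrayList<>(inMemory.keySet())) {
                flush(segment);
            }
            committedOffset = offset;
        }

        private void flush(String segment) {
            List<String> rows = inMemory.remove(segment);
            if (rows != null) {
                onDisk.computeIfAbsent(segment, k -> new ArrayList<>()).addAll(rows);
            }
        }

        /**
         * After a restart only on-disk rows survive, and reading resumes after
         * committedOffset. Rows at or below that offset that never reached disk are lost.
         */
        public long lostRowsAfterRestart() {
            long rowsOnDisk = onDisk.values().stream().mapToLong(List::size).sum();
            return (committedOffset + 1) - rowsOnDisk;
        }
    }

    /** Three rows at offsets 0, 1, 2 spread over two segments. */
    public static ToyAppenderator withThreeRows() {
        ToyAppenderator appenderator = new ToyAppenderator();
        appenderator.add("seg1", "row@0");
        appenderator.add("seg2", "row@1");
        appenderator.add("seg1", "row@2");
        return appenderator;
    }

    public static void main(String[] args) {
        ToyAppenderator partial = withThreeRows();
        partial.persistSome(Arrays.asList("seg1"), 2);
        System.out.println("lost after partial persist: " + partial.lostRowsAfterRestart());

        ToyAppenderator full = withThreeRows();
        full.persistAll(2);
        System.out.println("lost after persistAll: " + full.lostRowsAfterRestart());
    }
}
```

In this toy model, persisting only `seg1` while committing offset 2 leaves the `seg2` row unrecoverable after a restart, whereas `persistAll` loses nothing. Pushing has no equivalent problem in the model, because it only moves data that is already on disk.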
|
@gianm thanks! |
|
@b-slim no problem! Are you ok with merging this? |
|
Since we will create another RC for 0.12.1, I think this is also worth backporting.
…e#5730) * Remove persist from Appenderator * fix javadoc
Whenever the appenderator needs to persist any data, it should always persist all segments because persisting segments involves committing metadata about persisted segments.
Also removed Appenderator.persist() since it's not used anymore.