
Fix load-drop-load sequence for same segment and historical in http loadqueue peon #11717

Merged
abhishekagarwal87 merged 3 commits into apache:master from rohangarg:http_loadqueue_fix
Jan 31, 2022

Conversation

@rohangarg (Member)

Fixes an issue where a load-drop-load sequence for the same segment and historical doesn't work correctly with the HTTP-based load queue peon. The first load-drop cycle works fine; the problem comes when there is an attempt to reload the segment. The historical caches the load success for some recent segments and treats the reload as a no-op, without considering the fact that the segment was also dropped in between the two load requests.
This change invalidates the cache entry once a client fetches a success result.
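Roughly, the intended behavior can be sketched as follows (a minimal illustration rather than the actual patch; the class, field, and method names below are placeholders, and the cache size is the assumed default of 100):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class StatusCacheSketch
{
  enum State { PENDING, SUCCESS, FAILED }

  // Bounded cache of recent change-request outcomes, keyed by change request.
  private final Cache<String, State> requestStatuses =
      CacheBuilder.newBuilder().maximumSize(100).build();

  // Record the outcome of a change request, e.g. a completed segment load.
  void markSuccess(String changeRequestKey)
  {
    requestStatuses.put(changeRequestKey, State.SUCCESS);
  }

  // Serve the cached status; once a SUCCESS has been handed out, drop the entry so
  // that a later request for the same segment (e.g. a reload after a drop) is
  // evaluated afresh instead of being short-circuited.
  State getAndMaybeInvalidate(String changeRequestKey)
  {
    State state = requestStatuses.getIfPresent(changeRequestKey);
    if (state == State.SUCCESS) {
      requestStatuses.invalidate(changeRequestKey);
    }
    return state;
  }
}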

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…oadqueue peon

With the HTTP load queue peon, the success of a load operation is cached. Because of this, after a load-drop sequence
the peon treats the subsequent load (reload) as a no-op due to caching. The fix clears the cached success
while serving it to a client.
},
this::resolveWaitingFutures
);
} else if (status.get().getState() == Status.STATE.SUCCESS) {
@kfaraz (Contributor), Sep 16, 2021

I feel a better approach could be to augment the possible states to denote whether the last success was a drop success or a load success, rather than invalidating here (e.g. maybe we could add a new state SUCCESS_DROPPED, etc.).
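For illustration, the augmented set of states might look roughly like this (hypothetical; only SUCCESS_DROPPED is named above, the other names are made up):

enum STATE
{
  PENDING,
  FAILED,
  SUCCESS_LOADED,   // the last successful change for this segment was a load
  SUCCESS_DROPPED   // the last successful change for this segment was a drop
}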

The current approach could cause a problem if we send three subsequent requests, each to Load the same segment.

  • Request 1 would honor the request and mark it as PENDING. The request would then complete and be marked SUCCESS.
  • Request 2 would invalidate the status of the previous one.
  • Request 3 would again try to Load the Segment, even though it's already loaded.

Please correct me if my understanding is not right.

Member

I wonder if the cache should be keyed by DataSegment instead of DataSegmentChangeRequest, maybe something like Cache<DataSegment, AtomicReference<NonnullPair<DataSegmentChangeRequest, Status>>>, so that you could check whether the change request is the same type of change. A different type of change for the same segment would replace the old status, so a reload would not be incorrectly cached. DataSegmentChangeRequest itself would probably need to be modified to have a new method:

@Nullable
default DataSegment getSegment()
{
  return null;
}

This would be a bit more disruptive a change though, and I haven't fully thought it through, so maybe there is some reason it wouldn't work well.
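A rough sketch of this idea, using simplified stand-in types in place of Druid's DataSegment, DataSegmentChangeRequest, and Status (the names and structure below are illustrative only):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class SegmentKeyedStatusCacheSketch
{
  enum ChangeType { LOAD, DROP }

  static class CachedStatus
  {
    final ChangeType changeType;
    final String state; // e.g. "PENDING" or "SUCCESS"

    CachedStatus(ChangeType changeType, String state)
    {
      this.changeType = changeType;
      this.state = state;
    }
  }

  // Keyed by segment id rather than by the full change request.
  private final Cache<String, CachedStatus> statusBySegment =
      CacheBuilder.newBuilder().maximumSize(100).build();

  // Returns the cached status only if it came from the same kind of change request;
  // a different kind of change for the same segment (e.g. a load after a drop)
  // would replace the old entry rather than be answered from it.
  CachedStatus lookup(String segmentId, ChangeType incomingType)
  {
    CachedStatus cached = statusBySegment.getIfPresent(segmentId);
    return (cached != null && cached.changeType == incomingType) ? cached : null;
  }
}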

Contributor

Segment loads are idempotent: the node can be asked to load the same segment many times, but it will only load it once. If it gets asked multiple times, it will generate some log spew but won't create a correctness bug. The cache here is just an optimization on top of that idempotency, to keep things out of synchronized blocks when not needed. The intent of the change is to "invalidate" the cache as soon as the node believes that it has accurately reported back a meaningful success response.
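As a loose illustration of that idempotency (not Druid's actual load/drop handler; the names below are placeholders):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentLoaderSketch
{
  private final Set<String> loadedSegments = ConcurrentHashMap.newKeySet();

  void load(String segmentId)
  {
    // Loading an already-loaded segment is a no-op: some log spew, no correctness issue.
    if (!loadedSegments.add(segmentId)) {
      System.out.println("Segment already loaded, ignoring: " + segmentId);
      return;
    }
    // ... download the segment and announce it ...
  }

  void drop(String segmentId)
  {
    // Dropping a segment that is not loaded is likewise a no-op.
    if (loadedSegments.remove(segmentId)) {
      // ... unannounce the segment and delete the local copy ...
    }
  }
}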

Contributor

Note that this is not just an issue with dropping segments, but an issue in the load case too.

Contributor

@kfaraz, what is the harm if the segment is attempted to be loaded again? It may be wasteful if it occurs, but how often do you think this would happen?

Contributor

If we only invalidate the cache after the second call to reload, then that call itself is a no-op, right? However, since there was already a drop before it, that call should not have been a no-op and should have reloaded the segment. In this case, we would have to do something like a load-drop-load-load to actually reload the segment?

Contributor

So, the thing that is going on here is that the HTTP protocol allows for batches of change requests. The request will return as soon as any one of those requests completes: in a batch of 5, one might complete immediately while the others are still running. When one completes, the coordinator will issue a new request. In the meantime, another of the requested changes might have completed; the node caches that completion so that when the new request comes in it can immediately respond that the load has completed, which causes the coordinator to issue yet another request.

So, even in the normal case, once SUCCESS (meaning that the request has been completed) has been reported, there is no further need to keep the entry in the cache. This change makes those semantics explicit.

There are still potential corner cases (as there are with any distributed protocol). Those corner cases are covered by the idempotency of the API. If, due to some corner case, the same thing gets requested multiple times, then it will only be loaded or dropped once (once it has been loaded, there's nothing more to load; once it has been dropped, there's nothing more to drop).

The coordinator assignment algorithm also doesn't expect that a request to load/drop will actually succeed. It starts over from fresh state every time it wakes up, because failures can happen. This means that if something is lost, it will identify the correct current state and eventually recover on its own. I.e. even if there is some point in time when a "load-drop-load-load" occurs, that's not actually a problem, because in the end the segment will actually be loaded. There is only a problem if it goes into an infinite loop of "load-load-load" without ever actually making progress on the work.
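A loose illustration (not Druid's actual API) of the "return as soon as any one completes" behavior described above:

import java.util.List;
import java.util.concurrent.CompletableFuture;

class BatchedChangeRequestSketch
{
  // Given a batch of in-flight change requests, the returned future completes as soon
  // as the first of them finishes while the others keep running, so the caller (the
  // coordinator, in the description above) can immediately follow up with a new request.
  static CompletableFuture<Object> firstCompletion(List<CompletableFuture<String>> inFlight)
  {
    return CompletableFuture.anyOf(inFlight.toArray(new CompletableFuture[0]));
  }
}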

@zachjsh (Contributor), Jan 27, 2022

@cheddar, for the coordinator assignment algorithm that you noted, I'm confused how it protects against a segment never being loaded properly in the face of the bug that this PR is fixing. The coordinator will likely notice that a particular segment it asked to be loaded hasn't been loaded, so it may request that the historical load the segment again, but if the historical still has the cache entry for an earlier successful load request for the same segment, how would it ever load the segment? And in the worst case, if the segment has at some point in time been loaded by all historicals in the deployment, and none of them have been restarted since, I believe that no historical will load the segment no matter how many times the coordinator asks, until they are restarted and that cache is cleared. Please correct any misunderstandings I may have here.

Contributor

So, the cache is set to have a maximum number of entries of 100 by default:

requestStatuses = CacheBuilder.newBuilder()
    .maximumSize(config.getStatusQueueMaxSize())
    .initialCapacity(8)
    .build();

config.getStatusQueueMaxSize basically defaults to 100. I.e. the reason this isn't hit too often in production environments is that there is usually enough other segment activity going on to cause the stale cache entries to get evicted.

The point I was trying to make is that invalidating the entry as soon as some valid response is returned is correct, because we then have a point where we are guaranteed that the cache entry has been cleared. Generally speaking, this should also mean that the success response is only ever returned once per request, which is what we want. If there is a corner case such that success is returned a second time, we are at least guaranteed that it won't be returned a third time (as the entry should be removed on the second return). And the coordinator's normal protocol is resilient to occasionally being lied to.

Contributor

Thanks for clarifying, @cheddar. Makes sense.

@abhishekagarwal87 merged commit c4fa3cc into apache:master on Jan 31, 2022
@abhishekagarwal87 (Contributor)

Merged since the CI failure is unrelated. Thank you, @rohangarg.
