
Fix load-drop-load sequence for same segment and historical in http loadqueue peon #11717

Merged
abhishekagarwal87 merged 3 commits into apache:master from rohangarg:http_loadqueue_fix
Jan 31, 2022

Conversation

@rohangarg (Member)

Fixes an issue where a load-drop-load sequence for the same segment and historical doesn't work correctly with the HTTP-based load queue peon. The first load-drop cycle works fine; the problem comes when there is an attempt to reload the segment. The historical caches the load success for some recent segments and treats the reload as a no-op, without considering the fact that the segment was also dropped in between the two load requests.
This change invalidates the cache entry once a client fetches a success result.
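Roughly, the intended behavior can be sketched as follows (a minimal illustration rather than the actual patch; the class, field, and method names below are placeholders, and the cache size is the assumed default of 100):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class StatusCacheSketch
{
  enum State { PENDING, SUCCESS, FAILED }

  // Bounded cache of recent change-request outcomes, keyed by change request.
  private final Cache<String, State> requestStatuses =
      CacheBuilder.newBuilder().maximumSize(100).build();

  // Record the outcome of a change request, e.g. a completed segment load.
  void markSuccess(String changeRequestKey)
  {
    requestStatuses.put(changeRequestKey, State.SUCCESS);
  }

  // Serve the cached status; once a SUCCESS has been handed out, drop the entry so
  // that a later request for the same segment (e.g. a reload after a drop) is
  // evaluated afresh instead of being short-circuited.
  State getAndMaybeInvalidate(String changeRequestKey)
  {
    State state = requestStatuses.getIfPresent(changeRequestKey);
    if (state == State.SUCCESS) {
      requestStatuses.invalidate(changeRequestKey);
    }
    return state;
  }
}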

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…oadqueue peon

With the HTTP load queue peon, the success of a load operation is cached. Because of this, after a load-drop sequence
the peon treats the subsequent load (reload) as a no-op due to caching. The fix clears the cached success
while serving it to a client.
},
this::resolveWaitingFutures
);
} else if (status.get().getState() == Status.STATE.SUCCESS) {
@kfaraz (Contributor), Sep 16, 2021

I feel a better approach could be to augment the possible states to denote whether the last success was a drop success or a load success, rather than invalidating here (e.g. maybe we could add a new state SUCCESS_DROPPED, etc.).
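For illustration, the augmented set of states might look roughly like this (hypothetical; only SUCCESS_DROPPED is named above, the other names are made up):

enum STATE
{
  PENDING,
  FAILED,
  SUCCESS_LOADED,   // the last successful change for this segment was a load
  SUCCESS_DROPPED   // the last successful change for this segment was a drop
}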

The current approach could cause a problem if we send three subsequent requests, each to Load the same segment.

  • Request 1 would honor the request and mark it as PENDING. The request would then complete and be marked SUCCESS.
  • Request 2 would invalidate the status of the previous one.
  • Request 3 would again try to Load the Segment, even though it's already loaded.

Please correct me if my understanding is not right.

Member

I wonder if the cache should be keyed by DataSegment instead of DataSegmentChangeRequest, maybe something like Cache<DataSegment, AtomicReference<NonnullPair<DataSegmentChangeRequest, Status>>>, so that you could check whether the change request is the same type of change. A different type of change for the same segment would replace the old status, so a reload would not be incorrectly cached. DataSegmentChangeRequest itself would probably need to be modified to have a new method:

@Nullable
default DataSegment getSegment()
{
  return null;
}

This would be a bit more disruptive a change though, and I haven't fully thought it through, so maybe there is some reason it wouldn't work well.
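A rough sketch of this idea, using simplified stand-in types in place of Druid's DataSegment, DataSegmentChangeRequest, and Status (the names and structure below are illustrative only):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

class SegmentKeyedStatusCacheSketch
{
  enum ChangeType { LOAD, DROP }

  static class CachedStatus
  {
    final ChangeType changeType;
    final String state; // e.g. "PENDING" or "SUCCESS"

    CachedStatus(ChangeType changeType, String state)
    {
      this.changeType = changeType;
      this.state = state;
    }
  }

  // Keyed by segment id rather than by the full change request.
  private final Cache<String, CachedStatus> statusBySegment =
      CacheBuilder.newBuilder().maximumSize(100).build();

  // Returns the cached status only if it came from the same kind of change request;
  // a different kind of change for the same segment (e.g. a load after a drop)
  // would replace the old entry rather than be answered from it.
  CachedStatus lookup(String segmentId, ChangeType incomingType)
  {
    CachedStatus cached = statusBySegment.getIfPresent(segmentId);
    return (cached != null && cached.changeType == incomingType) ? cached : null;
  }
}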

Contributor

Segment loads are idempotent: the node can be asked to load the same segment many times, but it will only load it once. If it gets asked multiple times, it will generate some log spew but won't create a correctness bug. The cache here is just an optimization on top of that idempotency, to keep things out of synchronized blocks when not needed. The intent of the change is to "invalidate" the cache as soon as the node believes that it has accurately reported back a meaningful success response.
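As a loose illustration of that idempotency (not Druid's actual load/drop handler; the names below are placeholders):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentLoaderSketch
{
  private final Set<String> loadedSegments = ConcurrentHashMap.newKeySet();

  void load(String segmentId)
  {
    // Loading an already-loaded segment is a no-op: some log spew, no correctness issue.
    if (!loadedSegments.add(segmentId)) {
      System.out.println("Segment already loaded, ignoring: " + segmentId);
      return;
    }
    // ... download the segment and announce it ...
  }

  void drop(String segmentId)
  {
    // Dropping a segment that is not loaded is likewise a no-op.
    if (loadedSegments.remove(segmentId)) {
      // ... unannounce the segment and delete the local copy ...
    }
  }
}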

Contributor

Note that this is not just an issue with dropping segments, but an issue in the load case too.

Contributor

@kfaraz, what is the harm if the segment is attempted to be loaded again? It may be wasteful if it occurs, but how often do you think this would happen?

Contributor

If we only invalidate the cache after the second call to reload, then that call itself is a no-op, right? However, since there was already a drop before it, that call should not have been a no-op and should have reloaded the segment. In this case, we would have to do something like a load-drop-load-load to actually reload the segment?

Contributor

So, the thing that is going on here is that the HTTP protocol allows for batches of change requests. The request will return as soon as any one of those requests completes: in a batch of 5, one might complete immediately while the others are still running. When one completes, the coordinator will issue a new request. In the meantime, another of the requested changes might have completed; the node caches that completion so that when the new request comes in it can immediately respond that the load has completed, which causes the coordinator to issue yet another request.

So, even in the normal case, once SUCCESS (meaning that the request has been completed) has been reported, there is no further need to keep the entry in the cache. This change makes those semantics explicit.

There are still potential corner cases (as there are with any distributed protocol). Those corner cases are covered by the idempotency of the API. If, due to some corner case, the same thing gets requested multiple times, then it will only be loaded or dropped once (once it has been loaded, there's nothing more to load; once it has been dropped, there's nothing more to drop).

The coordinator assignment algorithm also doesn't expect that a request to load/drop will actually succeed. It starts over from fresh state every time it wakes up, because failures can happen. This means that if something is lost, it will identify the correct current state and eventually recover on its own. I.e. even if there is some point in time when a "load-drop-load-load" occurs, that's not actually a problem, because in the end the segment will actually be loaded. There is only a problem if it goes into an infinite loop of "load-load-load" without ever actually making progress on the work.
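A loose illustration (not Druid's actual API) of the "return as soon as any one completes" behavior described above:

import java.util.List;
import java.util.concurrent.CompletableFuture;

class BatchedChangeRequestSketch
{
  // Given a batch of in-flight change requests, the returned future completes as soon
  // as the first of them finishes while the others keep running, so the caller (the
  // coordinator, in the description above) can immediately follow up with a new request.
  static CompletableFuture<Object> firstCompletion(List<CompletableFuture<String>> inFlight)
  {
    return CompletableFuture.anyOf(inFlight.toArray(new CompletableFuture[0]));
  }
}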

@zachjsh (Contributor), Jan 27, 2022

@cheddar, for the coordinator assignment algorithm that you noted, I'm confused how it protects against a segment never being loaded properly in the face of the bug that this PR is fixing. The coordinator will likely notice that a particular segment it asked to be loaded hasn't been loaded, so it may request that the historical load the segment again, but if the historical still has the cache entry for an earlier successful load request for the same segment, how would it ever load the segment? And in the worst case, if the segment has at some point in time been loaded by all historicals in the deployment, and none of them have been restarted since, I believe that no historical will load the segment no matter how many times the coordinator asks, until they are restarted and that cache is cleared. Please correct any misunderstandings I may have here.

Contributor

So, the cache is set to have a maximum number of entries of 100 by default:

requestStatuses = CacheBuilder.newBuilder()
    .maximumSize(config.getStatusQueueMaxSize())
    .initialCapacity(8)
    .build();

config.getStatusQueueMaxSize basically defaults to 100. I.e. the reason this isn't hit too often in production environments is that there is usually enough other segment activity going on to cause the stale cache entries to get evicted.

The point I was trying to make is that invalidating the entry as soon as some valid response is returned is correct, because we then have a point where we are guaranteed that the cache entry has been cleared. Generally speaking, this should also mean that the success response is only ever returned once per request, which is what we want. If there is a corner case such that success is returned a second time, we are at least guaranteed that it won't be returned a third time (as the entry should be removed on the second return). And the coordinator's normal protocol is resilient to occasionally being lied to.

Contributor

Thanks for clarifying, @cheddar. Makes sense.

@abhishekagarwal87 merged commit c4fa3cc into apache:master on Jan 31, 2022
@abhishekagarwal87 (Contributor)

Merged since the CI failure is unrelated. Thank you, @rohangarg.
