Fix load-drop-load sequence for same segment and historical in http loadqueue peon #11717
Conversation
…oadqueue peon With the http loadqueue peon, the success of a load operation is cached. Because of this, after a load-drop sequence the peon treats the subsequent load (reload) as a no-op due to caching. The fix clears the cached success while serving it to a client.
},
this::resolveWaitingFutures
);
} else if (status.get().getState() == Status.STATE.SUCCESS) {
I feel a better approach could be to augment the possible states to denote whether the last success was a Drop or a Load, rather than invalidating here (e.g. maybe we could add a new state such as SUCCESS_DROPPED; a rough sketch of the idea follows below).
The current approach could cause a problem if we send three consecutive requests, each asking to load the same segment:
- Request 1 would be honored and marked as PENDING. The request would then complete and be marked SUCCESS.
- Request 2 would invalidate the status of the previous one.
- Request 3 would again try to load the segment, even though it is already loaded.
Please correct me if my understanding is not right.
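A minimal sketch of that state augmentation, assuming the peon's existing Status.STATE values include PENDING and SUCCESS as mentioned in this thread (FAILED and the SUCCESS_DROPPED name are assumptions for illustration):
// Hypothetical extension of the peon's Status.STATE enum. Recording that the
// last completed request was a drop lets a later load request for the same
// segment bypass the cached success instead of being treated as a no-op.
enum STATE
{
  PENDING,
  SUCCESS,
  SUCCESS_DROPPED, // hypothetical: the last completed request was a drop
  FAILED           // assumed; not shown in this thread
}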
I wonder if the cache should be keyed by DataSegment instead of DataSegmentChangeRequest, maybe something like Cache<DataSegment, AtomicReference<NonnullPair<DataSegmentChangeRequest, Status>>>, so that you could check whether the change request is the same type of change. A different type of change for the same segment would replace the old status, so a reload would not be incorrectly cached. DataSegmentChangeRequest itself would probably need to be modified to add a new method:
@Nullable
default DataSegment getSegment()
{
return null;
}
This would be a more disruptive change though, and I haven't fully thought it through, so maybe there is some reason this wouldn't be good.
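For illustration, a hedged sketch of how a lookup against such a segment-keyed cache might distinguish request types (requestStatuses, the NonnullPair lhs/rhs fields, and the surrounding logic are assumptions sketched from this suggestion, not the actual patch):
// Illustrative only: trust the cached SUCCESS only when the cached entry was
// produced by the same kind of change request (load vs. drop).
AtomicReference<NonnullPair<DataSegmentChangeRequest, Status>> cached =
    requestStatuses.getIfPresent(changeRequest.getSegment());
if (cached != null
    && cached.get().lhs.getClass().equals(changeRequest.getClass())
    && cached.get().rhs.getState() == Status.STATE.SUCCESS) {
  return cached.get().rhs; // the same type of change already succeeded
}
// A different change type (e.g. a load after a drop) falls through here and
// is processed afresh, replacing the stale entry.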
Segment loads are idempotent: the node can be asked to load the same segment many times, but it will only load it once. If it gets asked multiple times, it will generate some log spew, but it won't create a correctness bug. The cache here is just an optimization on top of that idempotency, to keep things out of synchronized blocks when not needed. The intent of the change is to "invalidate" the cache as soon as the node believes that it has accurately reported back a meaningful success response.
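A minimal sketch of what that idempotency looks like on the node (the method and field names here are hypothetical, not the actual handler code):
// Hypothetical: re-requesting a load is harmless because the handler first
// checks whether the segment is already being served.
void handleLoadRequest(DataSegment segment)
{
  if (segmentManager.isSegmentLoaded(segment)) { // assumed check
    log.info("Segment[%s] is already loaded, skipping.", segment.getId());
    return; // nothing to do; only log spew, no correctness issue
  }
  segmentManager.loadSegment(segment); // the actual work happens at most once
}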
Note that this is not just an issue with dropping segments, but an issue in the load case too.
@kfaraz what is the harm if the segment is attempted to be loaded again? Maybe wasteful, if it occurs, but how often do you think this would happen?
If we only invalidate the cache after the second call to reload, then that call itself is a no-op, right? However, since there was already a drop before it, that call should not have been a no-op and should have reloaded the segment. In this case we would have to do something like a load-drop-load-load to actually reload the segment?
So, the thing that is going on here is that the HTTP protocol allows for batches of change requests. The request returns as soon as any one of those requests completes: in a batch of 5, one might complete immediately while the others are still running. When that one completes, the coordinator will issue a new request. In the meantime, another of the outstanding requests might have completed; the peon caches that completion so that when the new request comes in, it can immediately respond that the load completed, which in turn triggers another request from the coordinator.
So, even in the normal case, once SUCCESS (meaning that the request has been completed) has been achieved, there is no longer any need to keep the entry in the cache. This change makes those semantics explicit.
There are still potential corner cases (as there are with any distributed protocol). Those corner cases are covered by the idempotency of the API. If, due to some corner case, the same thing gets requested multiple times, then it will only be loaded or dropped once (once it has been loaded, there's nothing more to load; once it has been dropped, there's nothing more to drop).
The coordinator assignment algorithm also doesn't expect that a request to load/drop will actually succeed. It starts over from fresh state every time it wakes up, because failures can happen. This means that if something is lost, it will identify the correct current state and eventually recover on its own. I.e. even if there is some point in time when a "load-drop-load-load" occurs, that's not actually a problem, because at the end the segment will actually be loaded. There is only a problem if it goes into an infinite loop of "load-load-load" without ever actually making progress on the work.
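Conceptually, the coordinator's run loop behaves something like the following simplified sketch (all names here are hypothetical; this shows the shape of the reconciliation, not the actual Druid code):
// Each cycle recomputes desired vs. actual state from scratch, so a lost or
// incorrectly acknowledged load/drop is simply retried on a later cycle.
while (running) {
  Set<DataSegment> desired = computeDesiredSegments();      // from metadata
  Set<DataSegment> served = fetchCurrentlyServedSegments(); // from historicals
  for (DataSegment segment : Sets.difference(desired, served)) {
    loadQueuePeon.loadSegment(segment); // asking again is safe: loads are idempotent
  }
  for (DataSegment segment : Sets.difference(served, desired)) {
    loadQueuePeon.dropSegment(segment);
  }
  sleepQuietly(coordinatorPeriod);
}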
@cheddar, for the coordinator assignment algorithm that you noted, I'm confused how it protects against a segment never being loaded properly in the face of the bug that this PR is fixing. The coordinator will likely notice that a particular segment that was asked to be loaded hasn't been loaded, so it may request that the historical load the segment again; but if the historical still has the cache entry for an earlier successful load request for the same segment, how would it ever load the segment? And in the worst case, if the segment has at some point in time been loaded by all historicals in the deployment, and none of them have been rebooted since, I believe that no historical will load the segment no matter how many times the coordinator requests them to do so, until they are restarted and that cache is cleared. Please correct any misunderstandings I may have here.
So, the cache is set to have a maximum number of entries of 100 by default:
requestStatuses = CacheBuilder.newBuilder().maximumSize(config.getStatusQueueMaxSize()).initialCapacity(8).build();
config.getStatusQueueMaxSize() basically defaults to 100. I.e. the reason this isn't hit too much in production environments is that there is usually enough other segment activity to evict stale entries from the cache.
The point I was trying to make is that invalidating the entry as soon as some valid response is returned is correct, as we then have a point at which we are guaranteed that the cache has been cleared. Generally speaking, this should also only ever return the success response once per request, which is also what we want. If there is a corner case such that success is returned a second time, we are at least guaranteed that it won't be returned a third time (as it should be removed on the second return). And the coordinator's normal protocol is resilient to being lied to sometimes.
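A minimal sketch of that invalidate-on-read behavior, assuming the Guava Cache shown in the snippet above (pollStatus and the surrounding shape are illustrative, not the actual patch):
// Sketch: report a cached SUCCESS at most once, then drop the entry so a
// later request for the same segment is processed afresh.
@Nullable
Status pollStatus(DataSegmentChangeRequest changeRequest)
{
  AtomicReference<Status> cached = requestStatuses.getIfPresent(changeRequest);
  if (cached == null) {
    return null;
  }
  Status status = cached.get();
  if (status.getState() == Status.STATE.SUCCESS) {
    requestStatuses.invalidate(changeRequest); // clear once it has been served
  }
  return status;
}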
Merged since CI failure is unrelated. Thank you @rohangarg
Fixes an issue where a load-drop-load sequence for a segment and historical doesn't work correctly with the HTTP-based load queue peon. The first load-drop cycle works fine; the problem comes when there is an attempt to reload the segment. The historical caches the load success for some recent segments and treats the reload as a no-op, but it doesn't consider the fact that the segment was also dropped in between the load requests.
This change invalidates the cache entry after a client fetches a success result.