Handle task location fetch from overlord during rolling upgrades#16227
kfaraz merged 7 commits into apache:master
I don't think it is desirable to fall back to another API within the OverlordClientImpl. It would make more sense to have the fallback logic in the specific task service locator class.
Would it be OK to add a parameter within the Overlord client to fall back to the older API? The locations are needed not only in
I feel it is okay to have duplication in this case for the time being.
    ).get(workerId);

    if (taskStatus != null
        && TaskLocation.unknown().equals(taskStatus.getLocation())) {
Typo:
    - && TaskLocation.unknown().equals(taskStatus.getLocation())) {
    + && !TaskLocation.unknown().equals(taskStatus.getLocation())) {
kfaraz left a comment:
Changes LGTM. @AmatyaAvadhanula, could you please check the CI failures?
…des (apache#16227)" This reverts commit ad6bd62.
Bug:
#15724 introduced a bug where, during a rolling upgrade, all task locations returned by an Overlord on an older version would be unknown.
Prior to #15724,
- getTaskStatus for individual tasks fetched a TaskStatusResponse containing the location.
- getMultipleTaskStatuses fetched task statuses in a batch from the metadata store. The metadata store doesn't contain the current location of an active task. Completed tasks do contain their location.
After the changes,
- getTaskStatus remains unchanged.
- getMultipleTaskStatuses fetches task statuses for in-memory tasks from the TaskQueue and enhances them with the location from the task runner. The method fetches task statuses for completed tasks from the db.
The Overlord client was also changed to rely on the 2nd API to fetch the task status and location from memory.
During a rolling upgrade, a task running on a version with the changes from #15724 queries the 2nd API. The Overlord is still on the older version and fails to return the correct location for active tasks. This can lead to task failures during rolling upgrades.
Fix
The Overlord client now falls back to the original API, which always returns the task location, if the 2nd API fails to return it during the rolling upgrade. After the rolling upgrade, the active tasks' statuses will be fetched from memory as expected.
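The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Location`, `Overlord`, `batchLocations`, `singleLocation`, `locate`, and `stub` are all hypothetical names standing in for the real Druid classes and the two APIs, and an unknown location is modeled simply as a null host.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

final class TaskLocationFallback {

    /** Simplified stand-in for a task location; a null host models "unknown". */
    static final class Location {
        static final Location UNKNOWN = new Location(null, -1);
        final String host;
        final int port;

        Location(String host, int port) {
            this.host = host;
            this.port = port;
        }

        boolean isUnknown() {
            return host == null;
        }
    }

    /** Hypothetical, minimal view of the two Overlord APIs described above. */
    interface Overlord {
        /** Batch API: on an older Overlord, active tasks come back with an unknown location. */
        Map<String, Location> batchLocations(Set<String> taskIds);

        /** Single-task API: returns the location on both old and new Overlords. */
        Location singleLocation(String taskId);
    }

    /** Prefer the batch API; fall back to the single-task API when no usable location comes back. */
    static Location locate(Overlord overlord, String taskId) {
        Location fromBatch = overlord.batchLocations(Collections.singleton(taskId)).get(taskId);
        if (fromBatch != null && !fromBatch.isUnknown()) {
            return fromBatch;
        }
        Location fromSingle = overlord.singleLocation(taskId);
        return fromSingle != null ? fromSingle : Location.UNKNOWN;
    }

    /** Test stub (hypothetical): answers every task ID with the same two fixed locations. */
    static Overlord stub(Location batch, Location single) {
        return new Overlord() {
            @Override
            public Map<String, Location> batchLocations(Set<String> taskIds) {
                return Collections.singletonMap(taskIds.iterator().next(), batch);
            }

            @Override
            public Location singleLocation(String taskId) {
                return single;
            }
        };
    }
}
```

With `stub(Location.UNKNOWN, real)` standing in for an older Overlord, `locate` returns `real` through the fallback path. With an upgraded Overlord whose batch API already returns a known location, the single-task API is never called, so the extra request is only paid while the rolling upgrade is in progress.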
Testing
The new overlord client was used on an upgraded Indexer / MM while the Overlord was on a version prior to #15724. The tasks succeeded as expected. (They would fail with a newer Indexer without this patch).
This PR has: