Move lock for concurrency policies into scheduler #4537
Conversation
Force-pushed fdf62a2 to 793c2aa
CHANGELOG.rst (outdated):
    * Changed the ``inquiries`` API path from ``/exp`` to ``/api/v1`` #4495
    * Moved the lock from concurrency policies into the scheduler. #4481 (bug fix)
Please also clarify in the changelog entry what bug it fixes. Otherwise, people going over the changelog will have no idea what this change does or whether it affects them.
Fixed
    # Concurrency policies require scheduler to acquire a distributed lock to prevent race
    # in scheduling when there are multiple scheduler instances.
    POLICY_TYPES_REQUIRING_LOCK = [
👍
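As a side note for readers skimming the diff: the list contents are truncated in the quoted hunk above, so the values and the helper below are illustrative assumptions about how such a constant is typically consulted before locking, not code from this PR.

```python
# Hypothetical example values and usage; the actual list contents are not shown above.
POLICY_TYPES_REQUIRING_LOCK = [
    'action.concurrency',
    'action.concurrency.attr',
]

def requires_lock(policy_type_refs):
    # Only take the distributed lock when the execution is governed by a policy
    # type that can race across multiple scheduler instances.
    return any(ref in POLICY_TYPES_REQUIRING_LOCK for ref in policy_type_refs)

print(requires_lock(['action.retry']))        # False
print(requires_lock(['action.concurrency']))  # True
```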
    if policy_types:
        query_params['policy_type__in'] = policy_types

    policy_dbs = pc_db_access.Policy.query(**query_params)
Adding .count() to the end would probably be a bit more efficient, since the count is calculated and returned server side, so we don't need to evaluate and load the whole result set into memory like we do with len().
Not a huge issue here since those documents are not large, but still an easy change :)
Fixed
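For illustration, a minimal before/after sketch of the suggestion, reusing the names from the quoted hunk (query_params and the surrounding function are assumed context, and the availability of .count() on the queryset is an assumption about the persistence layer):

```python
# Before: len() evaluates the queryset and loads every matching document into memory.
policy_dbs = pc_db_access.Policy.query(**query_params)
has_concurrency_policies = len(policy_dbs) > 0

# After: the count is computed by MongoDB server side and only an integer is returned.
num_policies = pc_db_access.Policy.query(**query_params).count()
has_concurrency_policies = num_policies > 0
```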
Thanks for working on this change, LGTM 👍 On a related note - would it somehow be possible for us to write end to end / integration tests which actually try to emulate the race and verify it's not there (probably quite hard to do end to end)? Maybe spawn two scheduler processes as part of an integration test?
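To make the idea concrete, here is a rough, hypothetical sketch of such a test: two processes stand in for scheduler instances contending for the same tooz lock, and the test asserts the concurrency limit is never exceeded. The backend URL, lock name, and helpers are illustrative assumptions, not existing st2 test fixtures.

```python
import multiprocessing
import time

from tooz import coordination

BACKEND_URL = 'memcached://127.0.0.1:11211'  # assumed test backend
CONCURRENCY_LIMIT = 1

def scheduler_worker(member_id, running, violations):
    # Each worker emulates one scheduler instance.
    coordinator = coordination.get_coordinator(BACKEND_URL, member_id.encode('utf-8'))
    coordinator.start()
    lock = coordinator.get_lock(b'policy-demo-action')
    for _ in range(20):
        with lock:
            running.value += 1
            if running.value > CONCURRENCY_LIMIT:
                violations.value += 1
            time.sleep(0.01)  # simulate the scheduling work done under the lock
            running.value -= 1
    coordinator.stop()

def test_two_schedulers_respect_concurrency_limit():
    running = multiprocessing.Value('i', 0)
    violations = multiprocessing.Value('i', 0)
    workers = [
        multiprocessing.Process(target=scheduler_worker,
                                args=('scheduler-%d' % i, running, violations))
        for i in range(2)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    assert violations.value == 0
```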
    COORDINATOR = coordinator_setup()

    return COORDINATOR
I assume this commented out code will be removed?
Yes, it will be removed.
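For context, a hypothetical sketch of the lazy, module-level coordinator singleton the quoted hunk points at; coordinator_setup here is a stand-in for the project's real helper, and the backend URL is an assumption.

```python
from tooz import coordination

COORDINATOR = None

def coordinator_setup():
    # Stand-in for the real setup helper, which would read the backend URL from config.
    coordinator = coordination.get_coordinator('memcached://127.0.0.1:11211', b'st2')
    coordinator.start()
    return coordinator

def get_coordinator():
    # Create the coordinator once and reuse it for the lifetime of the process.
    global COORDINATOR
    if COORDINATOR is None:
        COORDINATOR = coordinator_setup()
    return COORDINATOR
```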
Move the lock for coordinating concurrency policies into the scheduler. With the current approach, when there is more than one scheduler, there is a race in scheduling that results in a failure to enforce the concurrency limit accurately.
Use the count method instead of len so the querying is done server side in MongoDB.
… bug. Update the changelog entry to be more descriptive about the scheduler race bug that this fixes.
Clean up and remove commented-out code from the coordination service.
Force-pushed 4338989 to a4f8b44
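To summarize the change for readers of this thread, here is a minimal, hypothetical sketch of the pattern the commits describe: the scheduler itself takes a distributed lock before admitting an execution governed by a concurrency policy. The coordinator URL, lock-name scheme, and helper names are illustrative, not the PR's actual code.

```python
from tooz import coordination

coordinator = coordination.get_coordinator('zookeeper://127.0.0.1:2181', b'st2-scheduler-1')
coordinator.start()

def schedule(execution_id, action_ref, has_concurrency_policy):
    if not has_concurrency_policy:
        dispatch(execution_id)
        return
    # Holding the lock prevents two scheduler instances from both reading a stale
    # count of running executions and over-admitting past the concurrency limit.
    with coordinator.get_lock(('policy-%s' % action_ref).encode('utf-8')):
        dispatch(execution_id)

def dispatch(execution_id):
    print('dispatching %s' % execution_id)  # stand-in for handing off to a runner
```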