
Conversation

@m4dcoder
Contributor

@m4dcoder commented Feb 6, 2019

Move the lock for coordinating concurrency policies into the scheduler. With the current approach, when there is more than one scheduler, there is a race in scheduling that results in a failure to enforce the concurrency policy accurately.
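
For context, a minimal sketch of the idea, assuming a tooz-style coordination backend like the one wrapped by st2common's coordination service (the lock name and the helper functions below are illustrative, not the actual scheduler code):

```python
from tooz import coordination

# Every scheduler instance connects to the same coordination backend
# (e.g. Redis or ZooKeeper), so they all compete for a shared named lock.
coordinator = coordination.get_coordinator("redis://localhost:6379", b"scheduler-1")
coordinator.start()

def schedule(execution):
    # Serialize the check-and-dispatch step across all scheduler instances so
    # two schedulers cannot read the running-execution count at the same time
    # and both conclude the concurrency threshold has not been reached yet.
    lock = coordinator.get_lock(b"policy-" + execution.action_ref.encode("utf-8"))
    with lock:
        if under_concurrency_threshold(execution):  # illustrative helper
            dispatch(execution)                     # illustrative helper
        else:
            delay(execution)                        # illustrative helper
```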

@m4dcoder requested review from Kami and bigmstone February 6, 2019 01:03
@m4dcoder force-pushed the fix-scheduler-concurrency branch 2 times, most recently from fdf62a2 to 793c2aa February 6, 2019 01:10
@Kami added this to the 2.10.2 milestone Feb 6, 2019
CHANGELOG.rst Outdated
~~~~~~~

* Changed the ``inquiries`` API path from ``/exp`` to ``/api/v1`` #4495
* Moved the lock from concurrency policies into the scheduler. #4481 (bug fix)
Member

Please also clarify in the changelog entry what bug it fixes. Otherwise, if people go over the changelog they will have no idea what this change does and whether it affects them or not.

Contributor Author

Fixed


# Concurrency policies require the scheduler to acquire a distributed lock to prevent a race
# in scheduling when there are multiple scheduler instances.
POLICY_TYPES_REQUIRING_LOCK = [
Member

👍

if policy_types:
query_params['policy_type__in'] = policy_types

policy_dbs = pc_db_access.Policy.query(**query_params)
Member

Adding .count() to the end would probably be a bit more efficient, since the count is calculated and returned server side and we don't need to evaluate and load the whole result set into memory like we do with len().

Not a huge issue here since those documents are not large, but still an easy change :)
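
For illustration, a quick sketch of the difference, assuming a mongoengine-style queryset like the one st2common's persistence layer returns (the names mirror the diff above):

```python
# len() evaluates the queryset and loads every matching Policy document into
# memory just to count them.
policy_dbs = pc_db_access.Policy.query(**query_params)
if len(policy_dbs) > 0:
    ...

# count() asks MongoDB for the number of matching documents, so only an
# integer comes back over the wire.
if pc_db_access.Policy.query(**query_params).count() > 0:
    ...
```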

Contributor Author

Fixed

@Kami added the policies label Feb 6, 2019
@Kami
Member

Kami commented Feb 6, 2019

Thanks for working on this change, LGTM 👍

On a related note - would it somehow be possible for us to write end to end / integration tests which actually try to emulate the race and verify it's not there (probably quite hard to do end to end)?

Maybe spawn two scheduler processes as part of an integration test?
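
Purely as a sketch of what such a test could look like (the scheduler binary name, action ref, and helper functions here are hypothetical placeholders, not existing st2 test utilities):

```python
import subprocess
import time

# Start two scheduler processes so both compete for the same scheduling work.
schedulers = [
    subprocess.Popen(["st2scheduler", "--config-file", "conf/st2.tests.conf"])
    for _ in range(2)
]

try:
    # Fire a burst of executions for an action governed by a concurrency
    # policy of 1, then assert that at no point more than one of them is
    # in the running state (run_action / count_running are hypothetical).
    for _ in range(10):
        run_action("examples.concurrency-test")

    deadline = time.time() + 60
    while time.time() < deadline:
        assert count_running("examples.concurrency-test") <= 1
        time.sleep(0.5)
finally:
    for proc in schedulers:
        proc.terminate()
```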

COORDINATOR = coordinator_setup()

return COORDINATOR

Member

I assume this commented out code will be removed?

Contributor Author

Yes, it will be removed.

Move the lock for coordinating concurrency policies into the scheduler. With the current approach, when there is more than one scheduler, there is a race in scheduling that results in a failure to enforce the concurrency policy accurately.
Use the count method instead of len so the querying is done server side at MongoDB.
Update the changelog entry to be more descriptive about the fix for the scheduler race bug.
Clean up and remove commented out code from the coordination service.
@m4dcoder force-pushed the fix-scheduler-concurrency branch from 4338989 to a4f8b44 February 7, 2019 20:18
@m4dcoder merged commit 958eaeb into master Feb 7, 2019
@m4dcoder deleted the fix-scheduler-concurrency branch February 7, 2019 21:13