Description
SUMMARY
A race condition causes an execution's status to change back to running after it has completed successfully.
Action Execution logs:
"log" : [
    { "timestamp" : ISODate("2021-03-27T13:24:52.932Z"), "status" : "requested" },
    { "timestamp" : ISODate("2021-03-27T13:24:54.938Z"), "status" : "scheduled" },
    { "timestamp" : ISODate("2021-03-27T13:25:00.200Z"), "status" : "running" },
    { "timestamp" : ISODate("2021-03-27T13:25:40.857Z"), "status" : "succeeded" },
    { "timestamp" : ISODate("2021-03-27T13:25:43.470Z"), "status" : "running" }
]
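A log like this can be checked mechanically for a status regression. The sketch below is only an illustration: `find_status_regressions` is a hypothetical helper, not part of st2, and the set of completed states is an assumption that may need adjusting for your st2 version.

```python
from datetime import datetime

# Assumed set of completed states; verify against your st2 version.
COMPLETED_STATES = {"succeeded", "failed", "canceled", "timeout", "abandoned"}


def find_status_regressions(log):
    """Return (completed_status, later_status) pairs for entries that
    follow a completed state with a different status."""
    regressions = []
    completed = None
    for entry in sorted(log, key=lambda e: e["timestamp"]):
        if completed is not None and entry["status"] != completed:
            regressions.append((completed, entry["status"]))
        elif entry["status"] in COMPLETED_STATES:
            completed = entry["status"]
    return regressions


# The log from this report, with ISODate values written as datetimes.
log = [
    {"timestamp": datetime(2021, 3, 27, 13, 24, 52, 932000), "status": "requested"},
    {"timestamp": datetime(2021, 3, 27, 13, 24, 54, 938000), "status": "scheduled"},
    {"timestamp": datetime(2021, 3, 27, 13, 25, 0, 200000), "status": "running"},
    {"timestamp": datetime(2021, 3, 27, 13, 25, 40, 857000), "status": "succeeded"},
    {"timestamp": datetime(2021, 3, 27, 13, 25, 43, 470000), "status": "running"},
]

print(find_status_regressions(log))  # [('succeeded', 'running')]
```

Running a check like this over stored executions could help estimate how often the regression occurs.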
This appears to be a race condition that occurs when https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/workflows.py#L980 completes before https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/container/base.py#L380 is executed. When working with thousands of executions, some delay is expected while running an action or updating records in the DB, and that delay leads to this race condition. We have been seeing this issue quite frequently.
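One interleaving consistent with the log above can be sketched with an in-memory stand-in. Everything here is hypothetical: the `execution` dict and `update_execution` function only mimic an unconditional, last-writer-wins status update and are not the real st2 models or services.

```python
# In-memory stand-in for the execution record (not the real st2 DB model).
execution = {"status": "scheduled", "log": ["requested", "scheduled"]}


def update_execution(status):
    """Overwrite the status unconditionally, as a last-writer-wins update would."""
    execution["status"] = status
    execution["log"].append(status)


# A writer captures the status while the workflow is still running.
stale_status = "running"
update_execution("running")

# The workflow finishes and its completion handler records the final state.
update_execution("succeeded")

# The delayed writer now applies the value it captured earlier.
update_execution(stale_status)

print(execution["status"])  # running
print(execution["log"])
```

Because nothing compares the new status with the stored one, the late write silently replaces the final state.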
STACKSTORM VERSION
Paste the output of st2 --version:
st2 3.5dev, on Python 3.6.9
OS, environment, install method
Linux
Steps to reproduce the problem
I added a few sleep statements to the code to reproduce the race condition.
Add this before the executions.update_execution( call at https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/container/base.py#L380:
if liveaction_db.action_is_workflow:
    concurrency.sleep(seconds=120)
Add this before the if parent_user: line at https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/action.py#L80. (This keeps the action execution from completing quickly, simulating the delay that is expected when running a large number of executions.)
concurrency.sleep(seconds=30)
Run any workflow.
Expected Results
Status transitions should stop at succeeded even if the race condition occurs. Could adding a step that checks whether a transition is valid help?
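A minimal sketch of such a check follows, assuming that no transition out of a completed state should be allowed. The names `is_valid_transition` and `apply_status` and the set of completed states are hypothetical and are not taken from the st2 code base.

```python
# Assumed set of completed states; verify against your st2 version.
COMPLETED_STATES = frozenset(
    {"succeeded", "failed", "canceled", "timeout", "abandoned"}
)


def is_valid_transition(current, new):
    """Allow a change unless it would leave a completed state."""
    return current not in COMPLETED_STATES or current == new


def apply_status(record, new_status):
    """Update record['status'] only if the transition is valid.

    Returns True if the update was applied, False if it was rejected.
    """
    if not is_valid_transition(record["status"], new_status):
        return False
    record["status"] = new_status
    return True


record = {"status": "running"}
print(apply_status(record, "succeeded"))  # True
print(apply_status(record, "running"))    # False, the late update is rejected
print(record["status"])                   # succeeded
```

Against a real database, the check and the write would need to happen atomically (for example as a single conditional update). Otherwise the same race could occur between reading the current status and writing the new one.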
Actual Results
The action execution is stuck in the running state forever, whereas the live action and workflow execution objects are in the succeeded state.