Race condition causing orquesta workflow status to change to running after it completed successfully #5222

@khushboobhatia01

SUMMARY

A race condition causes the action execution status to change back to running after the execution has completed successfully.
Action execution log:

"log" : [
    { "timestamp" : ISODate("2021-03-27T13:24:52.932Z"), "status" : "requested" },
    { "timestamp" : ISODate("2021-03-27T13:24:54.938Z"), "status" : "scheduled" },
    { "timestamp" : ISODate("2021-03-27T13:25:00.200Z"), "status" : "running" },
    { "timestamp" : ISODate("2021-03-27T13:25:40.857Z"), "status" : "succeeded" },
    { "timestamp" : ISODate("2021-03-27T13:25:43.470Z"), "status" : "running" }
]

The issue appears to be a race condition that occurs when https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/workflows.py#L980 completes before https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/container/base.py#L380 is executed. When working with thousands of executions, some delay is expected while running an action or updating records in the DB, and that delay leads to this race condition. We have been seeing this issue quite frequently.
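To illustrate the ordering described above, here is a minimal, self-contained sketch. It is not StackStorm code: the in-memory record, the function names, and the use of threading.Event to stand in for the DB delay are all assumptions made for illustration. It only shows how a delayed writer holding a stale "running" status can overwrite "succeeded".

```python
import threading


def simulate():
    """Force the ordering from the issue: the completion write lands first,
    then a delayed writer applies its stale "running" status."""
    # Hypothetical in-memory stand-in for the action execution record.
    record = {"status": "running", "log": ["running"]}
    record_lock = threading.Lock()
    workflow_finished = threading.Event()

    def write_status(status):
        # Unconditional write: nothing checks the current status first.
        with record_lock:
            record["status"] = status
            record["log"].append(status)

    def delayed_runner_update():
        stale_status = "running"   # decided before the workflow finished
        workflow_finished.wait()   # stands in for the DB/scheduling delay
        write_status(stale_status)

    def workflow_completion():
        write_status("succeeded")
        workflow_finished.set()

    threads = [
        threading.Thread(target=delayed_runner_update),
        threading.Thread(target=workflow_completion),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return record


if __name__ == "__main__":
    print(simulate()["log"])
```

The final status ends up as "running" even though "succeeded" was recorded, which matches the tail of the log shown earlier.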

STACKSTORM VERSION

Paste the output of st2 --version:
st2 3.5dev, on Python 3.6.9

OS, environment, install method

Linux

Steps to reproduce the problem

I added a few sleep statements to the code to reproduce the race condition.

Add this before the executions.update_execution( call at https://github.com/StackStorm/st2/blob/master/st2actions/st2actions/container/base.py#L380:

# Delay the execution update for workflow actions only.
if liveaction_db.action_is_workflow:
    concurrency.sleep(seconds=120)

Add this before the if parent_user: line at https://github.com/StackStorm/st2/blob/master/st2common/st2common/services/action.py#L80. This keeps the action execution from completing quickly, which mimics the delay expected when running a large number of executions:

concurrency.sleep(seconds=30)

Run any workflow.

Expected Results

Status transitions should stop at succeeded even if the race condition occurs. Adding a step that checks whether a transition is valid could help.
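A minimal sketch of what such a transition check could look like. This is an illustration, not the StackStorm implementation: the function names are hypothetical, and the set of terminal statuses below is an assumption based on the statuses seen in the log plus a few common completed states.

```python
# Assumed set of terminal statuses, for illustration only.
TERMINAL_STATUSES = {"succeeded", "failed", "timeout", "canceled"}


def is_valid_transition(current, new):
    """Reject any move out of a terminal status to a different status."""
    if current in TERMINAL_STATUSES and new != current:
        return False
    return True


def apply_status(log, new_status):
    """Append new_status to the log only if the transition is valid.

    Returns True if the status was applied, False if it was rejected.
    """
    current = log[-1] if log else None
    if not is_valid_transition(current, new_status):
        return False
    log.append(new_status)
    return True


if __name__ == "__main__":
    # Replay the status sequence from the issue's execution log.
    log = []
    for status in ["requested", "scheduled", "running", "succeeded", "running"]:
        applied = apply_status(log, status)
        print(status, "applied" if applied else "rejected")
```

In a real datastore the check and the write would need to happen atomically (for example as a conditional update), otherwise the same race can slip in between the read and the write.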

Actual Results

The action execution is stuck in the running state forever, whereas the live action and workflow execution objects are in the succeeded state.

Thanks!
