SUMMARY
Using StackStorm 3.0.1, if something kills the st2actionrunner process supervising a python-script action, and that action execution is part of a workflow execution, the action execution remains in the `running` state forever, regardless of the `timeout` runner parameter set in the workflow.
What I'd like to see is the action being rescheduled to another st2actionrunner, or at the very least timed out, so that a retry in the workflow can deal with the problem.
(It is also unclear how StackStorm deals with the death of an st2actionrunner supervising an orquesta action runner.)
This is not an HA setup, but nothing in the code or documentation suggests that the expected behavior is for a workflow execution to simply hang when the underlying action runner supervisor process is gone. Consider a machine in an HA setup that crashes while workflows are executing actions on it: every workflow whose action was running there just hangs, never even timing out.
We expect to be able to run StackStorm for weeks on end, with long-running workflows that survive the death or reboot of a machine that is part of the StackStorm cluster.
OS / ENVIRONMENT / INSTALL METHOD
Standard non-HA recommended setup on Ubuntu 16.04
STEPS TO REPRODUCE
Create a workflow with one Python action that runs `sleep 60` via subprocess.
Start the workflow with `st2 run`.
Kill the st2actionrunner process supervising the Python action.
Wait forever.
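For reference, the Python action from step 1 can be sketched as below. The class and parameter names are hypothetical, and the `Action` base class here is a minimal stand-in so the snippet runs standalone; a real pack action would instead subclass `st2common.runners.base_action.Action`:

```python
import subprocess


class Action(object):
    """Minimal stand-in for st2common.runners.base_action.Action."""

    def __init__(self, config=None):
        self.config = config or {}


class SleepAction(Action):
    """Python-script action that blocks in a child process.

    While the child sleeps, killing the supervising st2actionrunner
    leaves the action execution stuck in the 'running' state.
    """

    def run(self, seconds=60):
        # Block until the external sleep process exits.
        subprocess.check_call(["sleep", str(seconds)])
        return True
```

With this action registered in a pack and wrapped in a workflow, killing the supervising st2actionrunner while `sleep 60` is still running reproduces the hang described above.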