Skip to content

Death of st2actionrunner process causes action to remain running forever #4716

@Rudd-O

Description

@Rudd-O
SUMMARY

Using StackStorm 3.0.1, if something kills an st2actionrunner process supervising a python-script action runner, and this action execution is part of a workflow execution, the action execution remains forever in state running regardless of parameters: timeout setting in the workflow.

What I'd like to see, is the action being rescheduled to another st2actionrunner, or at the very least, timed out so that a retry in the workflow can deal with that problem.

(It is not clear either how StackStorm deals with the death of a st2actionrunner supervising an orquesta action runner.)

This is not an HA setup, but nothing in the code or documentation leads me to believe that the expected behavior is to just hang a workflow execution when the underlying action runner supervisor process is gone. I'm thinking a machine in an HA setup crashes while ongoing workflows are executing actions on that machine, and then all workflows whose actions were running there, just hang, never to even timeout.

We expect to be able to run StackStorm for weeks on end, with long-running workflows that survive the death or reboot of a machine that is part of the StackStorm cluster.

OS / ENVIRONMENT / INSTALL METHOD

Standard non-HA recommended setup in Ubuntu 16.04

STEPS TO REPRODUCE

Create workflow with one Python action that runs sleep 60 via subprocess.
Start workflow with st2 run.
Kill st2actionrunner supervising the Python action.
Wait forever.

Metadata

Metadata

Assignees

No one assigned

    Labels

    HAStackStorm in High AvailabilityK8sbugrunners

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions