Skip to content

Executions stuck in requested state #4539

@nmaludy

Description

@nmaludy
SUMMARY

Earlier this week we noticed that our instance had a bunch of executions stuck in the requested state and none were ever being executed. Restarting the host resolved the problem.

ISSUE TYPE
  • Bug Report
STACKSTORM VERSION
$ st2 --version
st2 2.10.1, on Python 2.7.5
OS / ENVIRONMENT / INSTALL METHOD
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# install method: puppet-st2

# single host installed "all in one" with NO MongoDB clustering/replica-set
# the MongoDB instance is a single instance that lives on the same host as StackStorm
STEPS TO REPRODUCE

We've been unable to come up with steps that reproduce the issue consistently. It has been noticed, usually over night, that cron rules were not producing the expected output (emails and tickets). These executions were stuck in the requested state.

EXPECTED RESULTS

Executions run when invoked.

Logs

As stated above, restarted the host fixed the issue. Unfortunately the engineer who found the issue didn't debug any further and try restarting individual services or pin point which service was the issue.

Post-mortem, we found the following in the /var/log/st2/st2scheduler.log


2019-02-04 14:48:45,229 140516293681552 INFO scheduler [-] (PID=8536) Scheduler started.
2019-02-04 14:48:45,418 140516293681552 INFO consumers [-] Starting SchedulerEntrypoint...
2019-02-04 14:48:45,429 140516109106224 INFO mixins [-] Connected to amqp://xxx:**@127.0.0.1:5672//
2019-02-04 14:51:05,599 140516198248688 ERROR traceback [-] Traceback (most recent call last):

2019-02-04 14:51:05,599 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/eventlet/hubs/poll.py", line 109, in wait

2019-02-04 14:51:05,600 140516198248688 ERROR traceback [-]     listener.cb(fileno)

2019-02-04 14:51:05,600 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/eventlet/greenthread.py", line 219, in main

2019-02-04 14:51:05,600 140516198248688 ERROR traceback [-]     result = function(*args, **kwargs)

2019-02-04 14:51:05,600 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2actions/scheduler/handler.py", line 67, in run

2019-02-04 14:51:05,601 140516198248688 ERROR traceback [-]     execution_queue_item_db = self._get_next_execution()

2019-02-04 14:51:05,601 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2actions/scheduler/handler.py", line 125, in _get_next_execution

2019-02-04 14:51:05,601 140516198248688 ERROR traceback [-]     execution_queue_item_db = ActionExecutionSchedulingQueue.query(**query).first()

2019-02-04 14:51:05,601 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 294, in first

2019-02-04 14:51:05,602 140516198248688 ERROR traceback [-]     result = queryset[0]

2019-02-04 14:51:05,602 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/mongoengine/queryset/base.py", line 194, in __getitem__

2019-02-04 14:51:05,602 140516198248688 ERROR traceback [-]     queryset._cursor[key],

2019-02-04 14:51:05,602 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/pymongo/cursor.py", line 610, in __getitem__

2019-02-04 14:51:05,603 140516198248688 ERROR traceback [-]     for doc in clone:

2019-02-04 14:51:05,603 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/pymongo/cursor.py", line 1189, in next

2019-02-04 14:51:05,603 140516198248688 ERROR traceback [-]     if len(self.__data) or self._refresh():

2019-02-04 14:51:05,603 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/pymongo/cursor.py", line 1104, in _refresh

2019-02-04 14:51:05,603 140516198248688 ERROR traceback [-]     self.__send_message(q)

2019-02-04 14:51:05,604 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/pymongo/cursor.py", line 982, in __send_message

2019-02-04 14:51:05,604 140516198248688 ERROR traceback [-]     helpers._check_command_response(first)

2019-02-04 14:51:05,604 140516198248688 ERROR traceback [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/pymongo/helpers.py", line 132, in _check_command_response

2019-02-04 14:51:05,604 140516198248688 ERROR traceback [-]     raise NotMasterError(errmsg, response)

2019-02-04 14:51:05,604 140516198248688 ERROR traceback [-] NotMasterError: interrupted at shutdown

2019-02-04 14:51:05,605 140516198248688 ERROR hub [-] Removing descriptor: 9

2019-02-05 03:28:30,413 140516198248688 INFO misc [-] Re-opening log file "/var/log/st2/st2scheduler.log" with mode "a"

2019-02-05 03:28:30,422 140516198248688 INFO misc [-] Re-opening log file "/var/log/st2/st2scheduler.audit.log" with mode "a"

It appears that the st2scheduler encountered an exception and the greenthread died and caused the service to no longer function.

In the next log file (after rotation) you can see when we fixed the issue:

2019-02-05 03:28:31,993 140516198248688 INFO misc [-] Re-opening log file "/var/log/st2/st2scheduler.log" with mode "a"

2019-02-05 03:28:31,996 140516198248688 INFO misc [-] Re-opening log file "/var/log/st2/st2scheduler.audit.log" with mode "a"

2019-02-05 10:29:53,767 140516293681552 INFO scheduler [-] (PID=8536) Scheduler stopped.
2019-02-05 10:29:53,767 140516293681552 INFO consumers [-] Shutting down SchedulerEntrypoint...
2019-02-05 10:31:30,168 139946447700368 INFO __init__ [-] Connecting to database "st2" @ "127.0.0.1:27017" as user "xxx".
2019-02-05 10:31:30,769 139946447700368 INFO __init__ [-] Successfully connected to database "st2" @ "127.0.0.1:27017" as user "xxx".
2019-02-05 10:31:46,988 139946447700368 AUDIT runnersregistrar [-] RunnerType updated. RunnerType RunnerTypeDB(d

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions