Coordinator errors if session expires  #4015

@nmaludy

While standing up our HA setup, I came across an interesting corner case.

We chose Consul as our coordination backend for tooz. After a short while, the auth session stored in the coordinator appears to have expired. When this happens and the coordinator is then used to obtain a lock, an exception is thrown. The exception below appears in /var/log/st2/st2api.log when trying to add a key to the k/v store with `st2 key set testkey testvalue`.

To restore functionality, the st2api service has to be restarted.

2018-02-27 21:29:56,704 87179824 ERROR router [-] Failed to call controller function "put" for operation "st2api.controllers.v1.keyvalue:key_value_pair_controller.put": 500 rpc error making call: invalid session "2e20ab1d-4c3c-f224-02e9-f8261024ac1b"
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/router.py", line 450, in __call__
    resp = func(**kw)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2api/controllers/v1/keyvalue.py", line 192, in put
    with self._coordinator.get_lock(lock_name):
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tooz/locking.py", line 52, in __enter__
    acquired = self.acquire(*args, **kwargs)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tooz/drivers/consul.py", line 67, in acquire
    return _acquire()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tenacity/__init__.py", line 214, in wrapped_f
    return self.call(f, *args, **kw)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tenacity/__init__.py", line 295, in call
    start_time=start_time)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tenacity/__init__.py", line 252, in iter
    return fut.result()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/concurrent/futures/_base.py", line 455, in result
    return self.__get_result()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/tooz/drivers/consul.py", line 58, in _acquire
    acquire=self._session_id)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/consul/base.py", line 605, in put
    CB.json(), '/v1/kv/%s' % key, params=params, data=value)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/consul/std.py", line 28, in put
    cert=self.cert)))
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/consul/base.py", line 207, in cb
    CB.__status(response, allow_404=allow_404)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/consul/base.py", line 164, in __status
    raise ConsulException("%d %s" % (response.code, response.body))
ConsulException: 500 rpc error making call: invalid session "2e20ab1d-4c3c-f224-02e9-f8261024ac1b"
2018-02-27 21:29:56,708 87179824 ERROR error_handling [-] API call failed: 500 rpc error making call: invalid session "2e20ab1d-4c3c-f224-02e9-f8261024ac1b"
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/middleware/error_handling.py", line 46, in __call__
    return self.app(environ, start_response)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/middleware/streaming.py", line 47, in __call__
    return self.app(environ, start_response)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/router.py", line 499, in as_wsgi
    resp = self(req)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2common/router.py", line 454, in __call__
    raise e
ConsulException: 500 rpc error making call: invalid session "2e20ab1d-4c3c-f224-02e9-f8261024ac1b" (_exception_data={},_exception_class='ConsulException',_exception_message='500 rpc error making call: invalid session "2e20ab1d-4c3c-f224-02e9-f8261024ac1b"')

This may be a bug in tooz, but I wanted to start the conversation here. Should StackStorm refresh or retry the coordinator when an error occurs?
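
To make the question concrete, here is a rough sketch of what a refresh-and-retry could look like. It is not how st2 or tooz behave today. `RefreshingCoordinator`, `coordinator_factory` and `retriable_exceptions` are hypothetical names made up for illustration. The sketch only assumes the coordinator exposes `get_lock()` and `stop()`, and that the lock exposes `acquire()` and `release()`, as tooz coordinators and locks do. With tooz, the factory would presumably call `tooz.coordination.get_coordinator(...)` and then `start()`. Which exception types are safe to retry on is also an assumption; the traceback above shows `ConsulException`, but other drivers will raise something else.

```python
import contextlib


class RefreshingCoordinator(object):
    """Hypothetical wrapper: rebuild the coordinator once if taking a lock fails."""

    def __init__(self, coordinator_factory, retriable_exceptions=(Exception,)):
        # coordinator_factory: zero-argument callable returning a started
        # coordinator (assumed; with tooz it would wrap get_coordinator + start).
        self._factory = coordinator_factory
        self._retriable = retriable_exceptions
        self._coordinator = coordinator_factory()

    def _refresh(self):
        # Best-effort shutdown of the stale coordinator, then build a new one.
        try:
            self._coordinator.stop()
        except Exception:
            pass
        self._coordinator = self._factory()

    @contextlib.contextmanager
    def lock(self, name):
        lock = self._coordinator.get_lock(name)
        try:
            acquired = lock.acquire()
        except self._retriable:
            # The session may have expired: refresh and retry exactly once.
            # A second failure propagates to the caller.
            self._refresh()
            lock = self._coordinator.get_lock(name)
            acquired = lock.acquire()
        if not acquired:
            raise RuntimeError("could not acquire lock %r" % (name,))
        try:
            yield lock
        finally:
            lock.release()
```

A caller would then write `with coordinator.lock(lock_name):` instead of `with self._coordinator.get_lock(lock_name):`. Retrying only once keeps a genuinely broken backend from turning every API call into a retry loop.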

Labels: HA (StackStorm in High Availability)
