Refactor scheduler process to exit properly #4543
Add retries in the scheduler handler to temporarily handle DB connection failures. Refactor how threads exit for the process to return proper code.
    eventlet.greenthread.sleep(cfg.CONF.scheduler.gc_interval)
    self._handle_garbage_collection()

    @retrying.retry(
I'm personally still not too sure about this retry here. It seems like a one-off, i.e. we don't do it in other similar services.
Besides that, the linking looks good to me, let's please just add some test cases for it.
I think having the retries there is OK. For a single-server install without any complex service management, this prevents temporary hiccups with the MongoDB connection from taking the service down. If users want to fail fast, they can reconfigure the retries. If you're worried about consistency, we can revisit. After this issue, our service pattern needs an overhaul.
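To make the trade-off in this comment concrete, here is a minimal sketch of a configurable retry around a periodic task. It uses only the standard library as a stand-in for the `retrying` decorator seen in the diff; `retry`, `TransientConnectionError`, `collect_garbage`, and the parameter names are hypothetical and are not the PR's actual code or config options.

```python
import functools
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a temporary database connection failure."""


def retry(max_attempts, wait_seconds, retry_on):
    """Retry the wrapped callable when it raises one of the `retry_on` exceptions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # retries exhausted: let the caller fail
                    time.sleep(wait_seconds)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=3, wait_seconds=0.01, retry_on=(TransientConnectionError,))
def collect_garbage():
    # Fails twice, then succeeds, to mimic a short connection hiccup.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientConnectionError("connection dropped")
    return "collected"


result = collect_garbage()
```

Setting `max_attempts=1` in this sketch gives fail-fast behaviour, which mirrors the point that users who prefer to fail fast can reconfigure the retries.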
entrypoint exceptions. NOTE: Tests currently fail because issue hasn't been fully fixed yet.
I added some test cases in 442c57e. It wasn't totally trivial to write them because we need to emulate the async nature of those services and throw an exception in a specific place ( If we don't emulate that async nature and exception will be thrown inside Those tests, Problem lies in this line of code - https://github.com/StackStorm/st2/pull/4543/files#diff-4d1c13310fc6ebda8f5e5d2ec414c2f9R71. Doing In short - the same original issue still exists. You can also replicate the issue manually in the same manner by adding Again, it's important you add it there. If you add it inside start(), run() or similar, the exception will be correctly propagated and the process will exit because that happens before the blocking
EDIT: So I was actually wrong about We should be fine as long as we call Having said that,
It's the same as doing:

    thread1.wait()  # blocks until thread1 finishes / returns
    thread2.wait()  # blocks until thread2 finishes / returns

And it means we will always block for So if there is a chance that thread2 can exit / finish / throw an exception before thread1 and we want to consider that error as fatal and exit the whole service, we can't use such an approach.
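The blocking problem described here can be illustrated with a small sketch. It deliberately uses standard-library threads and `concurrent.futures.wait` rather than eventlet green threads, so it is an analogy and not the approach taken in this PR; `long_running` and `fails_early` are hypothetical stand-ins for the two service threads.

```python
import concurrent.futures
import threading
import time

stop = threading.Event()


def long_running():
    # Stand-in for thread1: keeps running until asked to stop.
    stop.wait(timeout=5)
    return "stopped"


def fails_early():
    # Stand-in for thread2: dies shortly after start.
    time.sleep(0.05)
    raise RuntimeError("handler failed")


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    thread1 = pool.submit(long_running)
    thread2 = pool.submit(fails_early)
    # Calling thread1.result() first would block here even though thread2
    # has already failed. Waiting on both returns on the first failure.
    done, pending = concurrent.futures.wait(
        [thread1, thread2],
        return_when=concurrent.futures.FIRST_EXCEPTION,
    )
    failed = [f for f in done if f.exception() is not None]
    stop.set()  # treat the failure as fatal and ask the survivor to exit

exit_code = 1 if failed else 0
```

With sequential waits the failure of the second thread would go unnoticed until the first one returned; waiting on both lets the process pick a non-zero exit code as soon as either fails.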
Yeah. That's why I commented that the handler.wait needs to be evaluated first since entrypoint is more durable. I agree, eventlet gives us very few options for waiting on multiple threads. Either we use
Regenerated the sample st2 config with the scheduler retry configuration options.
Add a unit test to cover failure in the handler cleanup. This should signal the run method to also pause and exit the scheduler handler process.
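A minimal sketch of the behaviour this commit message describes, where a failure in cleanup signals the run loop to stop. It uses standard-library threads instead of eventlet, and `SchedulerHandlerSketch` with all of its attributes is hypothetical, not the real scheduler handler.

```python
import threading
import time


class SchedulerHandlerSketch:
    """Hypothetical handler with a run loop and a cleanup loop."""

    def __init__(self):
        self._shutdown = threading.Event()
        self.error = None
        self.processed = 0

    def run(self):
        # Main loop: keeps working until a shutdown is signalled.
        while not self._shutdown.is_set():
            self.processed += 1
            time.sleep(0.01)

    def cleanup(self):
        try:
            time.sleep(0.03)
            raise RuntimeError("cleanup failed")  # simulated failure
        except RuntimeError as exc:
            # Record the error and tell run() to exit as well.
            self.error = exc
            self._shutdown.set()


handler = SchedulerHandlerSketch()
run_thread = threading.Thread(target=handler.run)
cleanup_thread = threading.Thread(target=handler.cleanup)
run_thread.start()
cleanup_thread.start()
cleanup_thread.join()
run_thread.join()

# A recorded failure maps to a non-zero process exit code.
exit_code = 1 if handler.error is not None else 0
```

A unit test for this shape can assert that both threads have finished and that the exit code is non-zero after the simulated cleanup failure.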
Add unit tests to cover the retries in the run and cleanup in the scheduler handler.
Add or move the parsing of test configs to the top of affected test modules and make sure the scheduler default config options do not conflict with test configs.
Let's please add a changelog entry. Besides that, LGTM 👍
Add retries in the scheduler handler to temporarily handle DB connection failures. Refactor how threads exit for the process to return proper code. Fixes #4539