Description
Problem
When the Odoo server crashes or is otherwise force-stopped, running jobs are interrupted while the runner has no chance to know they have been aborted. In such situations, jobs may remain in started or enqueued state after the Odoo server is halted. Since the runner has no way to know whether they are actually running, and cannot be sure it is safe to restart them, it does not attempt to restart them automatically. Such stale jobs therefore fill the running queue and prevent other jobs from starting. You must therefore requeue them manually, either from the Jobs view, or by running the following SQL statement before starting Odoo:
update queue_job set state='pending' where state in ('started', 'enqueued')
The result: the channel's capacity is lost, because the system thinks the job is started while it is not actually doing anything.
Solution
This problem has existed since the beginning of time (that is, since the beginning of queue_job). So I guess lots of brilliant brains have considered different solutions. But I think the problem does not sound so hard to solve with the approach below.
- Currently we start the Job Runner thread on Odoo startup using this patch: https://github.com/OCA/queue/blob/14.0/queue_job/jobrunner/__init__.py#L69
- Then the initialize_database() method is called before we start processing jobs: https://github.com/OCA/queue/blob/14.0/queue_job/jobrunner/runner.py#L501
- And at that point we are already able to connect to the database: https://github.com/OCA/queue/blob/14.0/queue_job/jobrunner/runner.py#L414
So I suggest adding a method that runs the statement below at that point, before job processing starts. We can of course use "SELECT FOR UPDATE" to be on the safe side and avoid conflicting with other processes that may query the same records.
update queue_job set state='pending' where state in ('started', 'enqueued')
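A minimal sketch of what such a method could look like, for discussion only. The function name `reset_stale_jobs` and the `demo` helper are hypothetical and not part of queue_job. The sketch takes any DB-API connection and uses an in-memory SQLite table so it can run standalone; the real runner connects to PostgreSQL, and the optional "SELECT FOR UPDATE" step is left out here because SQLite does not support it.

```python
import sqlite3

# Same statement as above, kept free of parameters so it works with any
# DB-API driver regardless of its placeholder style.
RESET_STALE_JOBS_SQL = """
    UPDATE queue_job
       SET state = 'pending'
     WHERE state IN ('started', 'enqueued')
"""


def reset_stale_jobs(conn):
    """Requeue jobs left in 'started' or 'enqueued' state (hypothetical helper).

    `conn` is any DB-API connection. Returns the number of rows updated.
    """
    cur = conn.cursor()
    cur.execute(RESET_STALE_JOBS_SQL)
    updated = cur.rowcount
    conn.commit()
    cur.close()
    return updated


def demo():
    """Build a toy queue_job table and reset its stale jobs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue_job (id INTEGER PRIMARY KEY, state TEXT)")
    conn.executemany(
        "INSERT INTO queue_job (state) VALUES (?)",
        [("started",), ("enqueued",), ("done",), ("pending",)],
    )
    conn.commit()
    updated = reset_stale_jobs(conn)
    states = [row[0] for row in conn.execute("SELECT state FROM queue_job ORDER BY id")]
    conn.close()
    return updated, states


if __name__ == "__main__":
    print(demo())
```

Jobs in other states (for example done) are left untouched, so only the stale ones go back to pending.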
To me this fix sounds safe.
But I believe @guewen you have considered this already. So before I open a PR, do you see any issues with the above method?