-
-
Notifications
You must be signed in to change notification settings - Fork 532
[10.0][IMP] queue_job: requeue zombie jobs after hard shutdown #423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @guewen, |
|
This is unfortunately not safe, because the job runner may be running on a different Odoo instance (possibly on another machine) than the jobs themselves. So it could be that the runner has crashed but the jobs are actually still running elsewhere. |
|
@sbidoul |
That is an interesting idea. Currently the RunJobController does not keep a lock on job records while executing the job, but we could experiment reacquiring the lock around here. |
|
isn't this tackled by this cron queue/queue_job/models/queue_job.py Line 279 in e0c5096
|
|
@simahawk |
|
@len-foss I wouldnt' say horrible :) it's a hard problem for which no-one contributed a complete solution so far. @simahawk I did not know that cron. For detecting stuck So yeah @len-foss's idea looks interesting to me: keep a lock on started jobs, and have something in the job runner loop that resets started jobs on which there is no lock. This new lock might have unexpected effects and backward compatibility implications, not sure, to be tested. |
c9c6106 to
057c916
Compare
|
@sbidoul |
|
@len-foss fair enough. Would you create an issue with your idea from #423 (comment) and close this PR ? |
|
@sbidoul |
| http.request.env.cr.commit() | ||
|
|
||
| _logger.debug('%s started', job) | ||
| job.lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is a small window between the commit and here where the job state could be reset, maybe we should do job.lock(expected_state=STARTED) and not exit if the expected state is not correct.
Ah I had not seen your last commit. It looks good to me. So let's keep this PR around until someone has a chance to battle test it. |
|
@len-foss @sbidoul we tested this PR and we got a lot of blocked queries on
Each time we had to kill these blocked queries and requeue the jobs. Without this PR we don't face anymore any issue. It could be reproduced again locally with jobs raising an exception I guess, we didn't take time for that. |
|
@sebalix you actually tested before we had a chance to do it :) I believe the behavior you observe is because in case of exception we try to write on the job in a different transaction and that creates a deadlock because the main transaction still holds a lock on the job record. So yeah, this PR is not going to work as is. Resetting to draft. |
|
There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days. |
No description provided.