Locker doesn't release locks for processed jobs under some circumstances #290

Hi 👋

I'm noticing an unexpected accumulation of locks by our worker processes. It seems that under some circumstances, advisory locks are not being released by the locker. Here's a query that hopefully illustrates the problem:

```sql
with
  pid_locks as
    (select pl.pid, count(*) from pg_locks pl inner join que_lockers ql on pl.pid = ql.pid group by pl.pid),
  pid_locked_jobs as
    (select pl.pid, count(*) from pg_locks pl inner join que_jobs qj on pl.objid = qj.id group by pl.pid)
select ql.pid, worker_count, queues, listening, pid_locks.count as active_locks, pid_locked_jobs.count as active_jobs
from que_lockers ql
left join pid_locks on ql.pid = pid_locks.pid
left join pid_locked_jobs on pid_locked_jobs.pid = ql.pid;
```

(Our job IDs are still in 32-bit range, so I didn't need to use `classid` in the `pid_locked_jobs` join.)

```
  pid  | worker_count |  queues   | listening | active_locks | active_jobs
-------+--------------+-----------+-----------+--------------+-------------
 15649 |           12 | {default} | t         |         4255 |           3
 19095 |           12 | {default} | t         |          673 |           4
 16646 |           12 | {default} | t         |         1148 |           6
 31188 |           12 | {default} | t         |         7909 |          10
(4 rows)
```
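A narrower variant of the query might help separate leaked locks from everything else a locker connection holds. This is only a sketch. It assumes Que takes its job locks with the single-`bigint` form of `pg_try_advisory_lock`, in which case `pg_locks` shows `objsubid = 1` and splits the key across `classid` (high 32 bits) and `objid` (low 32 bits). It also assumes that a lock whose key matches no row in `que_jobs` belongs to a job that has already been processed and deleted.

```sql
-- Sketch: advisory locks held by Que lockers, and how many of them
-- no longer correspond to any row in que_jobs (leak candidates).
-- Assumption: job locks use the single-bigint advisory lock form.
select
  pl.pid,
  count(*) as advisory_locks,
  count(*) filter (where qj.id is null) as locks_without_job
from pg_locks pl
inner join que_lockers ql on ql.pid = pl.pid
left join que_jobs qj
  -- Rebuild the 64-bit key from its two 32-bit halves.
  on qj.id = ((pl.classid::bigint << 32) | pl.objid::bigint)
where pl.locktype = 'advisory'
  and pl.objsubid = 1
  -- pg_locks is cluster-wide, so restrict to the current database.
  and pl.database = (select oid from pg_database where datname = current_database())
group by pl.pid
order by locks_without_job desc;
```

If `locks_without_job` accounts for most of `advisory_locks`, that would support the idea that the locks outlive their jobs rather than being held for jobs that are still queued.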

As you can see, at the moment we have 4 Que processes running (on Heroku, running on 4 different dynos), each with 12 workers. Our throughput is on average 400 jobs/minute, peaking occasionally at 800 jobs/minute. Normally our queue is almost empty.

However, every locker process holds hundreds or thousands of unreleased locks. I've seen the total number of locks exceed 50k, and I suspect this led to the `out of shared memory` errors we experienced earlier this week. I can see the problem locally as well, but I haven't yet found a simple way to reproduce it.
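To gauge how close the server is to running out of lock space, the current lock count can be compared with the approximate size of the shared lock table. According to the PostgreSQL documentation, that table is sized for roughly `max_locks_per_transaction * (max_connections + max_prepared_transactions)` objects. The query below is a rough sketch of that comparison, not an exact accounting of shared memory:

```sql
-- Sketch: approximate capacity of the shared lock table versus
-- the number of locks currently held across the whole cluster.
select
  current_setting('max_locks_per_transaction')::int
    * (current_setting('max_connections')::int
       + current_setting('max_prepared_transactions')::int) as approx_lock_slots,
  (select count(*) from pg_locks) as locks_currently_held;
```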

Has anybody seen similar behavior?

Any ideas on what to explore/test/log would be appreciated!

Stack details:

* Que 1.0.0.beta3 (using the ActiveJob adapter)
* Rails 5.2.4.3
* PostgreSQL 12
* We have a custom job middleware for reporting job metrics, but I can see the problem happening even when the job middleware is disabled

