Closed
Description
I have a single-threaded worker defined as:
w = Worker(queues=['etherbi_decode'],
           concurrency=1,
           executor=ThreadPoolExecutor)
It works for 2-3 hours and then at some point falls into an infinite loop: it stops fetching new jobs and stops sending heartbeats to the Faktory server.
Logs from the worker before it falls into the infinite loop:
INFO:faktory.connection:Connecting to faktory-faktory:7419 (with password None)
ERROR:faktory.worker:Task failed: 50c676346b5a4307826de2245caf5593
Traceback (most recent call last):
File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "etherbi/decoder/decoder_worker.py", line 225, in decode_bucket
_enqueue_block_decoded_task(task_number, 'calculate_burn_rate', 'etherbi_burn_rate_calculation')
File "etherbi/decoder/decoder_worker.py", line 187, in _enqueue_block_decoded_task
if faktory_client.queue(task, queue=queue, args=[task_number]):
File "/app/src/faktory/faktory/client.py", line 32, in queue
self.connect()
File "/app/src/faktory/faktory/client.py", line 21, in connect
self.is_connected = self.faktory.connect()
File "/app/src/faktory/faktory/_proto.py", line 64, in connect
self.socket.connect((self.host, self.port))
socket.timeout: timed out
INFO:__main__:Transaction to 0xaf30d2a7e90d7dc361c8c4585e9bb7d2f6f15bc7 is recognized in block 3917149 on position 98.
INFO:__main__:Transaction to 0xaf30d2a7e90d7dc361c8c4585e9bb7d2f6f15bc7 is recognized in block 3917154 on position 2.
<SOME_WORK_RELATED_LOGS_AS_THE_ABOVE>
INFO:faktory.connection:Connecting to faktory-faktory:7419 (with password None)
INFO:faktory.connection:Disconnected
Running sudo strace -p <PID> -e trace=network -f -s 10000 against the process, I get an infinite stream of:
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
[pid 16254] recvfrom(3, "", 4096, 0, NULL, NULL) = 0
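A recvfrom call that returns 0 bytes on a TCP socket means the peer has closed the connection, so a read loop that keeps calling recv without checking for an empty result will spin forever, which matches the trace above. The following is a minimal sketch of a read loop that treats an empty read as a disconnect. It is not the faktory client's actual code: read_line is a hypothetical helper, and the socket pair only stands in for a server connection.

```python
import socket


def read_line(sock, bufsize=4096):
    """Read one CRLF-terminated line; raise if the peer closes the socket.

    Hypothetical helper for illustration only.
    """
    buffer = b""
    while b"\r\n" not in buffer:
        chunk = sock.recv(bufsize)
        if not chunk:
            # recv() returning b"" means EOF (peer closed the connection).
            # Without this check the loop would call recv() forever.
            raise ConnectionError("socket closed by peer")
        buffer += chunk
    # A real client would keep any bytes after the first line for later reads.
    line, _, _rest = buffer.partition(b"\r\n")
    return line


# Demonstration with a local socket pair (no Faktory server involved).
a, b = socket.socketpair()
b.sendall(b"+OK\r\n")
first = read_line(a)

b.close()  # simulate the server dropping the connection
try:
    read_line(a)
    closed_detected = False
except ConnectionError:
    closed_detected = True
finally:
    a.close()
```

With a check like this, the caller could catch the error, reconnect, and resume fetching jobs instead of looping on a dead socket.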
We use Python and Elixir clients against the same Faktory server, and so far this behavior is observed only with the Python workers.
Worker version: c5cb89b
Server version: 0.7.0