I'm occasionally seeing AWSIoTMQTTClient.connect() hang indefinitely. This seems to be the same issue as reported in #40, but no proper resolution was found there.
Logs when this happens:
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,819 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Initializing MQTT layer...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,824 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Registering internal event callbacks to MQTT layer...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,825 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - MqttCore initialized
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,825 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Client id: 020000035
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,826 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Protocol version: MQTTv3.1.1
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,827 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Authentication type: TLSv1.2 certificate based Mutual Auth.
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,828 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring endpoint...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,828 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring certificates...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,830 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring offline requests queueing: max queue size: 0
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,834 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring offline requests queue draining interval: 0.500000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,836 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring connect/disconnect time out: 10.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,837 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Configuring MQTT operation time out: 30.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,838 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Performing sync connect...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,839 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Performing async connect...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,839 - AWSIoTPythonSDK.core.protocol.mqtt_core - INFO - Keep-alive: 600.000000 sec
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,846 - AWSIoTPythonSDK.core.protocol.internal.workers - DEBUG - Event consuming thread started
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,847 - AWSIoTPythonSDK.core.protocol.mqtt_core - DEBUG - Passing in general notification callbacks to internal client...
Mar 28 19:29:35 my_application.py[1937]: 2019-03-28 19:29:35,848 - AWSIoTPythonSDK.core.protocol.internal.clients - DEBUG - Filling in fixed event callbacks: CONNACK, DISCONNECT, MESSAGE
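Comparing this to the logs when everything works, it looks as though it's hanging before the "Starting network I/O thread..." log message is printed from clients.py (aws-iot-device-sdk-python/AWSIoTPythonSDK/core/protocol/internal/clients.py, line 126 in 832f074):

    self._logger.debug("Starting network I/O thread...")

This leads me to believe it's probably hanging on one of the Lock.acquire calls in either reconnect (aws-iot-device-sdk-python/AWSIoTPythonSDK/core/protocol/paho/client.py, line 736 in 832f074) or connect_async (line 704 in 832f074 of the same file):

    def connect_async(self, host, port=1883, keepalive=60, bind_address=""):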
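To illustrate the failure mode I suspect (this is not the SDK's code, just a minimal sketch): a plain Lock.acquire() with no timeout blocks forever if the lock is never released, whereas passing a timeout lets the caller notice and raise. The names `LockTimeoutError` and `acquire_or_raise` are made up for this example, and the `timeout` argument assumes Python 3.2 or newer.

```python
import threading


class LockTimeoutError(RuntimeError):
    """Hypothetical exception for this sketch; not part of the SDK."""


def acquire_or_raise(lock, timeout_sec):
    """Acquire `lock`, raising instead of blocking forever."""
    # Lock.acquire(timeout=...) returns False if the timeout elapses (Python 3.2+).
    if not lock.acquire(timeout=timeout_sec):
        raise LockTimeoutError(
            "could not acquire lock within %.1f sec" % timeout_sec
        )


# Simulate contention: the lock is already held and never released.
stuck_lock = threading.Lock()
stuck_lock.acquire()

try:
    acquire_or_raise(stuck_lock, 0.1)
    timed_out = False
except LockTimeoutError:
    timed_out = True
```

With a bare `stuck_lock.acquire()` in place of `acquire_or_raise`, the last block would never return, which matches the symptom in the logs above.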
It's a frustrating issue because it's difficult to detect when it has occurred. If there is contention for the locks, I'd much rather the SDK throw an exception than hang forever, so that my application can still recover.
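As a stopgap on the application side, something like the following might work: run the blocking call in a daemon thread and give up after a deadline. This is only a sketch under assumptions; `call_with_deadline` and `ConnectTimeoutError` are hypothetical names of my own, not SDK APIs, and the stuck thread is not killed, it is merely abandoned.

```python
import threading


class ConnectTimeoutError(RuntimeError):
    """Hypothetical exception raised by this wrapper; not part of the SDK."""


def call_with_deadline(func, timeout_sec):
    """Run func() in a daemon thread; raise if it has not finished in time."""
    outcome = {}

    def runner():
        try:
            outcome["value"] = func()
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner)
    worker.daemon = True  # a hung worker must not keep the process alive
    worker.start()
    worker.join(timeout_sec)

    if worker.is_alive():
        # The worker is still blocked; it is left behind, not terminated.
        raise ConnectTimeoutError(
            "call did not return within %.1f sec" % timeout_sec
        )
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

Usage would be along the lines of `call_with_deadline(client.connect, 30)`, where `client` is the AWSIoTMQTTClient instance. Since the abandoned thread may still hold whatever lock it was waiting on or owning, the safest reaction to the exception is probably to discard the client or restart the process rather than retry on the same object.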
I also haven't found a way to reproduce it yet.
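Since I can't reproduce it on demand, one way to at least capture where it is stuck the next time it happens might be the standard library's faulthandler module (Python 3.3+), which can dump the stack of every thread if a block of code has not finished after a given time. The helper name `dump_stacks_if_stuck` is my own, and the target file must be a real file with a file descriptor.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_stuck(timeout_sec, file=None):
    """Dump all thread tracebacks if the with-block runs past timeout_sec."""
    if file is None:
        file = sys.stderr
    # Arms a watchdog thread; it writes the tracebacks but does not interrupt the block.
    faulthandler.dump_traceback_later(timeout_sec, file=file)
    try:
        yield
    finally:
        # Disarm the watchdog once the block has finished (or raised).
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the call as `with dump_stacks_if_stuck(60): client.connect()` would then log which line each thread is blocked on if connect() is still running after 60 seconds, which should confirm or rule out the Lock.acquire theory.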
Appreciate any help or insight!