Skip to content

fix: ble: Race condition in receive for notifications#82

Merged
JPHutchins merged 1 commit intointercreate:mainfrom
randyscott:fix/ble/smp_notif_race
Oct 27, 2025
Merged

fix: ble: Race condition in receive for notifications#82
JPHutchins merged 1 commit intointercreate:mainfrom
randyscott:fix/ble/smp_notif_race

Conversation

@randyscott
Copy link
Contributor

Fix a race condition relating to BLE notifications. If the notification arrives before the receive() method is called, the notification will be lost and the transfer will time out. This happens because self._notify_condition.notify() does nothing without a pending self._notify_condition.wait().

Only wait for notifications if the buffer does not contain enough data for the header.

Fix a race condition relating to BLE notifications. If the notification arrives before the `receive()` method is called, the notification will be lost and the transfer will time out. This happens because `self._notify_condition.notify()` does nothing without a pending `self._notify_condition.wait()`.

Only wait for notifications if the buffer does not contain enough data for the header.
@JPHutchins JPHutchins self-requested a review October 23, 2025 03:08
@JPHutchins
Copy link
Collaborator

I think that this makes sense - does it happen when using the request method which normally calls send_and_receive?

If we had to, we could start waiting for the response before sending the request. Or have a receive task always running, carefully.

@randyscott
Copy link
Contributor Author

I think that this makes sense - does it happen when using the request method which normally calls send_and_receive?

Yes, exactly. This was an image upload, uses send_and_receive. Was able to catch a debug log showing the notification arriving before the wait. This was the cause of pretty consistent failures when programming one of my boards.

If we had to, we could start waiting for the response before sending the request. Or have a receive task always running, carefully.

Would be a bigger change, of course. I do like the simplicity of this change here and it seems like a pretty safe way to handle the race condition.

@JPHutchins
Copy link
Collaborator

JPHutchins commented Oct 25, 2025

I think that this makes sense - does it happen when using the request method which normally calls send_and_receive?

Yes, exactly. This was an image upload, uses send_and_receive. Was able to catch a debug log showing the notification arriving before the wait. This was the cause of pretty consistent failures when programming one of my boards.

If we had to, we could start waiting for the response before sending the request. Or have a receive task always running, carefully.

Would be a bigger change, of course. I do like the simplicity of this change here and it seems like a pretty safe way to handle the race condition.

I was worried that the fix is relying on subsequent bytes being received, but looking again I think that it's not a problem. If I understand correctly, in the original error case of response arriving before wait(), this fix means that the wait call never happens (good). Is that correct? Is the same bug present in the code that receives the rest of the packet?

I found this feature, and I'm curious if it would help: https://docs.python.org/3/library/asyncio-sync.html#asyncio.Condition.wait_for

https://github.com/python/cpython/blob/main/Lib/asyncio/locks.py#L303-L315

If not, I'm happy to merge as is.

Thanks again for finding and fixing this! I'll bump smpclient and smpmgr as soon as this is merged.

@randyscott
Copy link
Contributor Author

If I understand correctly, in the original error case of response arriving before wait(), this fix means that the wait call never happens (good). Is that correct?

Yes, correct. That initial wait never happens (assuming that you got at least 8 bytes for the header in the initial notification).

Is the same bug present in the code that receives the rest of the packet?

As far as I can tell, no, the bug doesn't exist in that second half. That code always checks the buffer length before waiting, which is what the first wait failed to do.

I found this feature, and I'm curious if it would help: https://docs.python.org/3/library/asyncio-sync.html#asyncio.Condition.wait_for

Condition.wait_for does appear to be potential solution, since it allows checking for a predicate before waiting. In this case, it would check on the length of the buffer. Again, it would take a little more rework since the actual wait on self._notify_condition happens inside of self._notify_or_disconnect(). The change that was implemented in this PR is does almost exactly what wait_for would do, but it does it for both the _nofify_condition and _disconnected_event objects.

I think that this is still probably the most straightforward approach. Thanks for digging into this. I definitely appreciate a second opinion!

@JPHutchins JPHutchins merged commit bf5787d into intercreate:main Oct 27, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants