Summary
packages/nvidia_nat_memmachine/tests/test_memmachine_api_calls.py::TestDataTransformation::test_conversation_messages_preserved_in_order is a flaky test that fails non-deterministically under CI load (observed on Python 3.13, amd64).
Root Cause
MemMachineEditor.add_items wraps each conversation message in asyncio.to_thread(add_memory) and dispatches all tasks concurrently via asyncio.gather(*tasks) (memmachine_editor.py:176-202). asyncio.gather makes no ordering guarantee — whichever thread completes first records its call first in the spy's list.
The test then asserts by index position:
assert add_calls[0]['kwargs']['content'] == "First message"
assert add_calls[1]['kwargs']['content'] == "Second message" # ← can be "Third message"
assert add_calls[2]['kwargs']['content'] == "Third message"
Under a loaded thread pool (Python 3.13 CI), thread 3 can complete before thread 2, causing add_calls[1] to hold "Third message" instead of "Second message".
Observed failure
FAILED packages/nvidia_nat_memmachine/tests/test_memmachine_api_calls.py::TestDataTransformation::test_conversation_messages_preserved_in_order
AssertionError: assert 'Third message' == 'Second message'
Fix
The parallel dispatch in add_items is intentional for performance. The test should assert by content lookup (the same pattern already used in test_add_conversation_calls_add_with_correct_parameters) rather than by insertion index.
Proposed fix: replace positional index assertions with content-based lookups that verify all three messages are present with correct roles, regardless of completion order.
Summary
packages/nvidia_nat_memmachine/tests/test_memmachine_api_calls.py::TestDataTransformation::test_conversation_messages_preserved_in_orderis a flaky test that fails non-deterministically under CI load (observed on Python 3.13, amd64).Root Cause
MemMachineEditor.add_itemswraps each conversation message inasyncio.to_thread(add_memory)and dispatches all tasks concurrently viaasyncio.gather(*tasks)(memmachine_editor.py:176-202).asyncio.gathermakes no ordering guarantee — whichever thread completes first records its call first in the spy's list.The test then asserts by index position:
Under a loaded thread pool (Python 3.13 CI), thread 3 can complete before thread 2, causing
add_calls[1]to hold"Third message"instead of"Second message".Observed failure
Fix
The parallel dispatch in
add_itemsis intentional for performance. The test should assert by content lookup (the same pattern already used intest_add_conversation_calls_add_with_correct_parameters) rather than by insertion index.Proposed fix: replace positional index assertions with content-based lookups that verify all three messages are present with correct roles, regardless of completion order.