Skip to content

Flaky test: test_conversation_messages_preserved_in_order races on asyncio.gather ordering #1855

@fede-kamel

Description

@fede-kamel

Summary

packages/nvidia_nat_memmachine/tests/test_memmachine_api_calls.py::TestDataTransformation::test_conversation_messages_preserved_in_order is a flaky test that fails non-deterministically under CI load (observed on Python 3.13, amd64).

Root Cause

MemMachineEditor.add_items wraps each conversation message in asyncio.to_thread(add_memory) and dispatches all tasks concurrently via asyncio.gather(*tasks) (memmachine_editor.py:176-202). asyncio.gather makes no ordering guarantee — whichever thread completes first records its call first in the spy's list.

The test then asserts by index position:

assert add_calls[0]['kwargs']['content'] == "First message"
assert add_calls[1]['kwargs']['content'] == "Second message"   # ← can be "Third message"
assert add_calls[2]['kwargs']['content'] == "Third message"

Under a loaded thread pool (Python 3.13 CI), thread 3 can complete before thread 2, causing add_calls[1] to hold "Third message" instead of "Second message".

Observed failure

FAILED packages/nvidia_nat_memmachine/tests/test_memmachine_api_calls.py::TestDataTransformation::test_conversation_messages_preserved_in_order
AssertionError: assert 'Third message' == 'Second message'

Fix

The parallel dispatch in add_items is intentional for performance. The test should assert by content lookup (the same pattern already used in test_add_conversation_calls_add_with_correct_parameters) rather than by insertion index.

Proposed fix: replace positional index assertions with content-based lookups that verify all three messages are present with correct roles, regardless of completion order.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions