Relax lint failures and fix async tests #7

Merged
santoshkumarradha merged 3 commits into main from santosh/async-relations
Nov 11, 2025
@santoshkumarradha
Member

Summary

  • allow control-plane lint and python formatting steps to continue-on-error so main jobs can report results
  • add auto-format step to python workflow to keep diffs small
  • update python SDK tests so async flows are mocked deterministically

Testing

  • cd control-plane && go test ./...
  • cd sdk/python && pytest

santoshkumarradha merged commit d0be017 into main on Nov 11, 2025
4 checks passed
santoshkumarradha deleted the santosh/async-relations branch on November 11, 2025 18:41
AbirAbbas added a commit that referenced this pull request Apr 10, 2026
Adds 9 additional regression tests covering scenarios closer to the
production failure shape and edge cases that could surface more bugs.

Concurrency / realistic workloads (test_agent_ai_deadlock_recovery.py):

- test_parallel_hangs_some_succeed_some_recover_via_pool_reset:
    The exact production scenario from extract_all_entities. Spawns 10
    concurrent ai() calls via asyncio.gather; calls #3 and #7 hang on a
    "stale socket". Asserts the 8 healthy ones return successfully (a
    hung call's safety-net firing must not corrupt unrelated in-flight
    requests), the 2 hung ones surface as TimeoutError, the pool is
    reset, AND a follow-up batch of 5 calls all succeed (recovery is
    durable, not one-shot).

- test_cascading_sequential_hangs_each_recover_independently:
    Three calls in a row each hang then time out, then a fourth call
    succeeds. Catches a future regression where reset accidentally
    caches state (e.g. someone adds a `_already_reset` flag). Each
    timeout must trigger a fresh reset.

- test_fallback_model_used_after_primary_hangs:
    Primary model hangs, fallback model is configured, fallback must be
    invoked successfully and the pool reset must run between them.
    Exercises the interplay between `_make_litellm_call` and
    `_execute_with_fallbacks`.

- test_tool_calling_loop_recovers_from_hang:
    The tool-calling loop has its own copy of the timeout + reset logic
    in `_tool_loop_completion._make_call`. This test routes through
    `execute_tool_call_loop` (mocked to just call make_completion once)
    and verifies the tool-loop path also recovers from hangs. Catches
    the regression where someone fixes one path and forgets the other.
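
The timeout-then-reset pattern these concurrency tests exercise can be sketched roughly as follows. This is a minimal illustration using only asyncio, not the SDK's code: `fake_ai_call`, `call_with_safety_net`, `reset_pool`, and `client_pool` are hypothetical stand-ins, and the "stale socket" is simulated with an event that is never set.

```python
import asyncio

# Hypothetical stand-in for a shared client pool; not the SDK's real object.
client_pool = {"client": object()}
reset_count = 0


def reset_pool():
    """Drop cached clients so the next call would open a fresh connection."""
    global reset_count
    client_pool.clear()
    reset_count += 1


async def fake_ai_call(index, hang):
    """Simulated ai() call; a stale socket is modelled as a wait that never ends."""
    if hang:
        await asyncio.Event().wait()  # never set, so this blocks until cancelled
    return f"result-{index}"


async def call_with_safety_net(index, hang, timeout=0.05):
    """Bound the call with a timeout and reset the pool before re-raising."""
    try:
        return await asyncio.wait_for(fake_ai_call(index, hang), timeout)
    except asyncio.TimeoutError:
        reset_pool()  # no "already reset" flag: every timeout resets again
        raise


async def run_batch(size, hung):
    """Run `size` calls concurrently; indices in `hung` simulate hangs."""
    calls = [call_with_safety_net(i, i in hung) for i in range(size)]
    # return_exceptions=True keeps one timeout from discarding healthy results.
    return await asyncio.gather(*calls, return_exceptions=True)


first = asyncio.run(run_batch(10, hung={3, 7}))
follow_up = asyncio.run(run_batch(5, hung=set()))
```

In this sketch the healthy calls in `first` return normally, the two hung calls surface as timeout exceptions, and `follow_up` succeeds in full, which mirrors the "recovery is durable" assertion described above.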

Reset robustness:

- test_concurrent_resets_are_safe:
    20 concurrent resets via asyncio.gather. None should raise; all
    should observe the final cleared state. Pins down the property so
    a future "optimization" cannot break it.

- test_reset_swallows_exceptions_from_broken_cache:
    If `in_memory_llm_clients_cache.clear()` raises (e.g. third-party
    plugin replaced the cache with a broken object), the reset must
    NOT propagate — otherwise it would mask the original TimeoutError
    the caller is trying to surface.

- test_reset_does_not_clobber_unrelated_module_attrs:
    Catches a regression where someone changes the implementation to
    iterate over `dir(litellm_module)` and accidentally wipes config
    flags, callbacks, or api_key. Asserts suppress_debug_info,
    set_verbose, api_key, and success_callback all survive.
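
One way to get the three reset properties above is to clear a single named cache attribute and swallow any failure. The sketch below assumes that design and is not the actual implementation: `reset_client_cache`, `make_fake_module`, and `BrokenCache` are hypothetical, and a `SimpleNamespace` stands in for the litellm module so nothing here depends on litellm's real API.

```python
import asyncio
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


class BrokenCache:
    """Hypothetical cache whose clear() always fails."""

    def clear(self):
        raise RuntimeError("cache replaced by a broken object")


def make_fake_module(cache):
    """Stand-in for the litellm module with a cache and a few config attributes."""
    return SimpleNamespace(
        in_memory_llm_clients_cache=cache,
        suppress_debug_info=True,
        set_verbose=False,
        api_key="sk-test",
        success_callback=["log"],
    )


async def reset_client_cache(module):
    """Clear only the named cache attribute and never raise."""
    cache = getattr(module, "in_memory_llm_clients_cache", None)
    try:
        if cache is not None:
            cache.clear()
    except Exception:
        # Swallow and log so the caller's original TimeoutError is not masked.
        logger.warning("client cache reset failed", exc_info=True)


async def main():
    healthy = make_fake_module({"client": object()})
    # 20 concurrent resets; clearing is idempotent, so ordering does not matter.
    await asyncio.gather(*(reset_client_cache(healthy) for _ in range(20)))
    broken = make_fake_module(BrokenCache())
    await reset_client_cache(broken)  # must not propagate the RuntimeError
    return healthy, broken


healthy, broken = asyncio.run(main())
```

Because the sketch touches one attribute by name rather than iterating over the module, the config flags, callbacks, and API key are left alone by construction.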

/debug/tasks robustness (test_agent_server.py):

- test_debug_tasks_endpoint_survives_cancelled_and_done_tasks:
    The endpoint must remain responsive even when the live task set
    contains tasks in pathological states (cancelled, done with
    exception). JSON must still be well-formed.

- test_debug_tasks_endpoint_reports_done_and_cancelled_state:
    Pins down the schema: each task entry must include `done=` and
    `cancelled=` markers so operators can quickly distinguish "stuck on
    await" from "finished but not yet collected" when diagnosing a hang.
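
A task snapshot that tolerates cancelled and failed tasks might look like the sketch below. The field names and JSON shape are assumptions for illustration only; the real /debug/tasks schema is not shown in this message. The key detail is that `Task.exception()` raises on cancelled or unfinished tasks, so it is only called when the task is done and not cancelled.

```python
import asyncio
import json


def describe_task(task):
    """Summarise one task without raising, whatever state it is in."""
    entry = {
        "name": task.get_name(),
        "done": task.done(),
        "cancelled": task.cancelled(),
    }
    # exception() raises for pending or cancelled tasks, so guard the call.
    if task.done() and not task.cancelled():
        exc = task.exception()
        entry["exception"] = repr(exc) if exc is not None else None
    return entry


async def build_snapshot():
    """Create tasks in awkward states and serialise a snapshot of them."""

    async def stuck():
        await asyncio.Event().wait()  # never set: simulates "stuck on await"

    async def boom():
        raise ValueError("boom")

    async def victim():
        await asyncio.sleep(60)

    tasks = [
        asyncio.create_task(stuck(), name="stuck"),
        asyncio.create_task(boom(), name="failed"),
        asyncio.create_task(victim(), name="cancelled"),
    ]
    await asyncio.sleep(0)  # let every task run its first step
    tasks[2].cancel()
    await asyncio.wait([tasks[1], tasks[2]])  # both are now finished

    payload = json.dumps({"tasks": [describe_task(t) for t in tasks]})

    # Clean up the still-pending task so the event loop shuts down quietly.
    tasks[0].cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return payload


snapshot = json.loads(asyncio.run(build_snapshot()))
```

With `done` and `cancelled` reported separately, an entry with `done` false reads as "stuck on await", while `done` true with no cancellation reads as "finished but not yet collected".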

Total: 63 passing tests across test_agent_ai.py + test_agent_server.py +
test_agent_ai_deadlock_recovery.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>