
[aw][test audit] _infer_model_from_metrics is untested — tie-breaking non-determinism and missing branch coverage #238

@microsasa

Description

Root cause

_infer_model_from_metrics() in parser.py is a standalone function with three distinct branches, but it is never imported or directly tested. It is only exercised implicitly when build_session_summary calls it for completed sessions that have no currentModel on the shutdown event. The most important branch — picking the model with the highest requests.count when multiple models are present — is never explicitly covered, and the tie-breaking behaviour when two models share the same count is entirely undocumented.

The function

# parser.py
def _infer_model_from_metrics(metrics: dict[str, ModelMetrics]) -> str | None:
    if not metrics:
        return None
    if len(metrics) == 1:
        return next(iter(metrics))
    return max(metrics, key=lambda m: metrics[m].requests.count)
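The tie-sensitive final branch can be demonstrated with plain dicts (a sketch using plain int counts in place of ModelMetrics):

```python
# Sketch: max() with a key returns the FIRST maximal element it meets,
# and dict iteration follows insertion order (guaranteed since 3.7).
# Plain int counts stand in for ModelMetrics.requests.count here.
counts_a_first = {"model-a": 5, "model-b": 5}
counts_b_first = {"model-b": 5, "model-a": 5}

winner_a = max(counts_a_first, key=lambda m: counts_a_first[m])
winner_b = max(counts_b_first, key=lambda m: counts_b_first[m])

print(winner_a)  # model-a: insertion order decides the tie
print(winner_b)  # model-b: same counts, different winner
```

Two dicts with identical contents but different insertion order produce different winners, which is exactly the non-determinism this issue describes.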

Branches with no test

| Branch | Condition | Current coverage |
|---|---|---|
| Empty dict | `not metrics` | ❌ never called with `{}` directly |
| Single key | `len(metrics) == 1` | ❌ never called directly; only indirectly via `build_session_summary` |
| Multi-key (clear winner) | one model has a higher count | ❌ not verified that the correct model is returned |
| Multi-key (tie) | two models have identical counts | ❌ untested: `max()` returns the first maximum found in iteration order (insertion order); the result is implementation-defined and could silently change |

Tie-breaking non-determinism

When requests.count is equal across models, Python's max() returns the first maximal value encountered while iterating the dict, and dict iteration order is insertion order (a language guarantee since Python 3.7). This means the model shown in the CLI depends on the order in which events happened to be recorded: a perfectly valid alternative session replay could produce a different model name. There is no test to document which model wins or to flag the behaviour if it changes.

Related gap: build_session_summary multi-model completed session without currentModel

The only path that reaches _infer_model_from_metrics in integration is a completed session where neither ev.currentModel nor data.currentModel is set. This specific combination (shutdown event missing currentModel at the top level AND in data, but having modelMetrics with two or more models) has no test in test_parser.py.


Expected behaviour to assert

Direct unit tests for _infer_model_from_metrics

from copilot_usage.parser import _infer_model_from_metrics  # pyright: ignore[reportPrivateUsage]
from copilot_usage.models import ModelMetrics, RequestMetrics

def test_infer_empty_returns_none() -> None:
    assert _infer_model_from_metrics({}) is None

def test_infer_single_key_returns_it() -> None:
    metrics = {"claude-sonnet-4": ModelMetrics(requests=RequestMetrics(count=5))}
    assert _infer_model_from_metrics(metrics) == "claude-sonnet-4"

def test_infer_multi_key_returns_highest_count() -> None:
    metrics = {
        "claude-sonnet-4": ModelMetrics(requests=RequestMetrics(count=3)),
        "claude-opus-4.6": ModelMetrics(requests=RequestMetrics(count=10)),
    }
    assert _infer_model_from_metrics(metrics) == "claude-opus-4.6"

def test_infer_tie_returns_a_model_deterministically() -> None:
    """When counts are equal, a specific model must always win.
    
    This test documents (and locks in) the current tie-breaking behaviour.
    If the behaviour changes, the test should be updated intentionally.
    """
    metrics = {
        "model-a": ModelMetrics(requests=RequestMetrics(count=5)),
        "model-b": ModelMetrics(requests=RequestMetrics(count=5)),
    }
    result = _infer_model_from_metrics(metrics)
    # With the current implementation the first-inserted model wins:
    # max() keeps the first maximal key in insertion order.
    assert result == "model-a"
    # Run again to confirm stability within a single process
    assert _infer_model_from_metrics(metrics) == result

Integration test: completed session without currentModel

import json

# Assumption: parse_events lives alongside build_session_summary in parser.py
from copilot_usage.parser import build_session_summary, parse_events

def test_build_summary_infers_model_when_current_model_absent(tmp_path):
    """Shutdown event with no currentModel → _infer_model_from_metrics is used."""
    # Shutdown event with two models; neither currentModel field is present
    shutdown = json.dumps({
        "type": "session.shutdown",
        "data": {
            "shutdownType": "routine",
            "totalPremiumRequests": 8,
            "totalApiDurationMs": 5000,
            "modelMetrics": {
                "claude-sonnet-4": {"requests": {"count": 3, "cost": 3}, "usage": {...}},
                "claude-opus-4.6": {"requests": {"count": 8, "cost": 8}, "usage": {...}},
            },
            # no "currentModel" in data
        },
        # no "currentModel" at top level
    })
    events = parse_events(...)
    summary = build_session_summary(events)
    assert summary.model == "claude-opus-4.6"  # highest count wins

Why this matters

The inferred model name drives two user-facing outputs:

  1. The Model column in every rendered table.
  2. The estimated premium cost calculation in render_cost_view and render_live_sessions (via _estimate_premium_cost, which calls lookup_model_pricing(model)).

A wrong model (e.g., a free model selected over a premium one due to a tie) would cause the cost estimate to be silently wrong. Direct tests for _infer_model_from_metrics provide a safety net for this invariant.
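To illustrate the failure mode (hypothetical model names and per-request prices; the real lookup_model_pricing table is not shown in this issue):

```python
# Hypothetical pricing table: a tie that resolves to a free model
# silently zeroes the cost estimate despite premium requests being made.
PRICING = {"free-model": 0.0, "claude-opus-4.6": 0.04}  # assumed $/premium request

def estimate_cost(model: str, premium_requests: int) -> float:
    # Stand-in for _estimate_premium_cost + lookup_model_pricing.
    return PRICING.get(model, 0.0) * premium_requests

print(estimate_cost("claude-opus-4.6", 8))  # 0.32
print(estimate_cost("free-model", 8))       # 0.0, silently wrong if the tie mispicked
```

No exception is raised in the mispicked case, which is what makes the tie behaviour worth pinning down in a test.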

Generated by Test Suite Analysis.

Labels: aw (created by agentic workflow), aw-dispatched (dispatched to implementer), test-audit (test coverage gaps)