[aw][test audit] Test gaps: parser utilities — session naming edge cases, model-metric merging, OS-error recovery #19


Five utility functions in parser.py and report.py that contain non-trivial branching logic have no direct unit tests. They are exercised only indirectly through higher-level integration paths, so regressions in the individual functions would be masked.

Untested functions / branches

1. _extract_session_name (parser.py ~L98)

Three branches are never exercised by the current tests:

plan.md content                                                    Expected return
----------------------------------------------------------------   ----------------------
First line is plain text (no # prefix)                             None
File is completely empty                                           None
File exists but raises OSError on read (e.g., permission denied)   None (error swallowed)

The OSError path is important for graceful degradation in environments where the agent-state directory is not fully readable.

2. _aggregate_model_metrics (report.py ~L544)

This function merges per-model token/request data across multiple sessions. It has no direct unit tests at all — it is only exercised implicitly through render_summary and render_full_summary.

Specific behaviours to assert directly:

  • Two sessions using the same model → requests.count, requests.cost, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens are all summed correctly.
  • Two sessions using different models → each model appears independently in the result.
  • Empty session list → empty dict returned.
  • Session with empty model_metrics → does not break the merge; other sessions' data is unaffected.
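The behaviours above amount to an additive merge keyed by model name. A minimal stand-in, using plain dicts in place of the project's SessionSummary/ModelMetrics types (which are not shown in this issue):

```python
def aggregate_model_metrics(sessions: list) -> dict:
    """Sum per-model numeric counters across sessions.

    Plain-dict sketch: each session is {"model_metrics": {model: {field: int}}}.
    The key property under test is that fields are ADDED, never overwritten.
    """
    merged: dict = {}
    for session in sessions:
        for model, metrics in session.get("model_metrics", {}).items():
            bucket = merged.setdefault(model, {})
            for field, value in metrics.items():
                bucket[field] = bucket.get(field, 0) + value  # add, don't overwrite
    return merged
```

An accidental `bucket[field] = value` here would reproduce exactly the overwrite regression described below, and only a direct unit test on the merge would catch it.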

3. get_all_sessions OS-error skip (parser.py ~L300)

except (FileNotFoundError, OSError) as exc:
    logger.warning("Skipping vanished session %s: %s", events_path, exc)
    continue

This path — a file that exists at discovery time but is gone (or unreadable) when parse_events is called — has no test. The expected behaviour is: the vanished session is silently skipped and the remaining sessions are returned normally.

4. _filter_sessions with None start times (report.py ~L523)

When a date filter is active, sessions without a start_time are silently excluded:

if s.start_time is None:
    continue

No test verifies this. A session with start_time=None should disappear from the filtered list even if it would otherwise match.

5. _render_totals singular grammar (report.py ~L587)

pr_label = "premium request" if total_premium == 1 else "premium requests"
session_label = "session" if total_sessions == 1 else "sessions"

The singular form ("1 premium request", "1 session") is never asserted. A test calling render_summary (or _render_totals directly) with exactly one session and one premium request should confirm the singular labels appear in output.

Tests to add

tests/copilot_usage/test_parser.py

class TestExtractSessionName:
    def test_plain_text_line_returns_none(self, tmp_path):
        plan = tmp_path / "plan.md"
        plan.write_text("Just plain text\n", encoding="utf-8")
        from copilot_usage.parser import _extract_session_name
        assert _extract_session_name(tmp_path) is None

    def test_empty_file_returns_none(self, tmp_path):
        plan = tmp_path / "plan.md"
        plan.write_text("", encoding="utf-8")
        from copilot_usage.parser import _extract_session_name
        assert _extract_session_name(tmp_path) is None

    def test_oserror_returns_none(self, tmp_path, monkeypatch):
        from copilot_usage.parser import _extract_session_name
        plan = tmp_path / "plan.md"
        plan.write_text("# Title\n", encoding="utf-8")

        def raise_oserror(*args, **kwargs):
            raise OSError("denied")

        # Patch Path.read_text so the read fails even though the file exists.
        monkeypatch.setattr(plan.__class__, "read_text", raise_oserror)
        assert _extract_session_name(tmp_path) is None


class TestGetAllSessionsOsError:
    def test_vanished_session_is_skipped(self, tmp_path):
        """A session that disappears after discovery is silently skipped."""
        from copilot_usage.parser import get_all_sessions
        # Create two valid sessions
        for sid in ["sess-a", "sess-b"]:
            d = tmp_path / sid
            d.mkdir()
            (d / "events.jsonl").write_text(
                '{"type":"session.start","data":{"sessionId":"' + sid + '"}}\n'
            )
        # Make one file unreadable
        (tmp_path / "sess-a" / "events.jsonl").chmod(0o000)
        try:
            results = get_all_sessions(tmp_path)
            # Only one session should come back; the unreadable one is skipped
            assert len(results) == 1
        finally:
            (tmp_path / "sess-a" / "events.jsonl").chmod(0o644)

tests/copilot_usage/test_report.py

class TestAggregateModelMetrics:
    """Direct unit tests for _aggregate_model_metrics."""

    # SessionSummary, ModelMetrics, RequestMetrics, and TokenUsage below are
    # the project's own model types; import them from wherever they live
    # (the import path is not shown in this issue).

    def test_same_model_two_sessions_sums_fields(self):
        from copilot_usage.report import _aggregate_model_metrics
        s1 = SessionSummary(session_id="s1", model_metrics={
            "claude-sonnet-4": ModelMetrics(
                requests=RequestMetrics(count=3, cost=2),
                usage=TokenUsage(inputTokens=100, outputTokens=50,
                                 cacheReadTokens=10, cacheWriteTokens=5))})
        s2 = SessionSummary(session_id="s2", model_metrics={
            "claude-sonnet-4": ModelMetrics(
                requests=RequestMetrics(count=7, cost=4),
                usage=TokenUsage(inputTokens=200, outputTokens=80,
                                 cacheReadTokens=20, cacheWriteTokens=15))})
        merged = _aggregate_model_metrics([s1, s2])
        m = merged["claude-sonnet-4"]
        assert m.requests.count == 10
        assert m.requests.cost == 6
        assert m.usage.inputTokens == 300
        assert m.usage.outputTokens == 130
        assert m.usage.cacheReadTokens == 30
        assert m.usage.cacheWriteTokens == 20

    def test_different_models_kept_separate(self):
        from copilot_usage.report import _aggregate_model_metrics
        s1 = SessionSummary(session_id="s1", model_metrics={
            "model-a": ModelMetrics(usage=TokenUsage(outputTokens=100))})
        s2 = SessionSummary(session_id="s2", model_metrics={
            "model-b": ModelMetrics(usage=TokenUsage(outputTokens=200))})
        merged = _aggregate_model_metrics([s1, s2])
        assert "model-a" in merged and "model-b" in merged
        assert merged["model-a"].usage.outputTokens == 100

    def test_empty_list_returns_empty(self):
        from copilot_usage.report import _aggregate_model_metrics
        assert _aggregate_model_metrics([]) == {}


class TestFilterSessionsNoneStartTime:
    def test_none_start_time_excluded_when_filtering(self):
        from copilot_usage.report import _filter_sessions
        from datetime import UTC, datetime
        no_time = SessionSummary(session_id="no-time")
        with_time = SessionSummary(
            session_id="with-time",
            start_time=datetime(2025, 6, 1, tzinfo=UTC))
        since = datetime(2025, 1, 1, tzinfo=UTC)
        result = _filter_sessions([no_time, with_time], since=since, until=None)
        ids = [s.session_id for s in result]
        assert "no-time" not in ids
        assert "with-time" in ids


class TestRenderTotalsSingularLabels:
    def test_one_session_one_premium_request(self, tmp_path):
        """render_summary with 1 session / 1 premium request uses singular labels."""
        import json
        from click.testing import CliRunner
        from copilot_usage.cli import main
        d = tmp_path / "single"
        d.mkdir()
        events = [
            {"type": "session.start", "timestamp": "2025-01-15T10:00:00Z",
             "data": {"sessionId": "single-sess-0001",
                      "startTime": "2025-01-15T10:00:00Z",
                      "context": {"cwd": "/"}}},
            {"type": "session.shutdown", "timestamp": "2025-01-15T11:00:00Z",
             "currentModel": "claude-sonnet-4",
             "data": {"shutdownType": "normal", "totalPremiumRequests": 1,
                      "totalApiDurationMs": 1000,
                      "modelMetrics": {"claude-sonnet-4": {
                          "requests": {"count": 1, "cost": 1},
                          "usage": {"inputTokens": 100, "outputTokens": 50,
                                    "cacheReadTokens": 0, "cacheWriteTokens": 0}}}}},
        ]
        (d / "events.jsonl").write_text(
            "\n".join(json.dumps(e) for e in events), encoding="utf-8")
        result = CliRunner().invoke(main, ["summary", "--path", str(tmp_path)])
        out = result.output
        # Plain substring checks would also match the plural forms
        # ("1 premium requests" contains "1 premium request"), so rule
        # the plurals out explicitly.
        assert "1 premium request" in out and "1 premium requests" not in out
        assert "1 session" in out and "1 sessions" not in out

Regression scenarios

  • If the _aggregate_model_metrics merge loop is accidentally changed to overwrite instead of add, a multi-session summary would show wrong totals; the direct unit test would catch this.
  • The _extract_session_name OSError path matters when plan.md exists but is owned by another user; without a test it could be silently broken.
  • The _filter_sessions None-time exclusion, if removed, would raise a TypeError at runtime (a None < datetime comparison) for any user who has a session without a start event.

Generated by Test Suite Analysis
