Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions .ai-team/agents/gandalf/history.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,3 +114,67 @@
- Approval gates: Gandalf (architecture), Gimli (quality), Legolas (infrastructure).
- No merges without review. The squad model requires visibility and ownership.
- Team structure is defined in `.ai-team/routing.md`—contributors should understand domain boundaries and who to ask.

---

## 2026-02-19: PR #14 Review — CI/CD Workflow Architecture

**Learning:** When reviewing CI/CD changes, verify the fix targets the correct workflow file.

**Key Insight:**
- IssueManager has multiple workflow files with different purposes:
- `test.yml`: Comprehensive test suite with 6 parallel jobs (unit, architecture, bunit, integration, aspire, e2e)
- `squad-ci.yml`: Simple single-job build+test verification for basic checks
- Other squad-*.yml files for docs, releases, issue triage, etc.

**PR #14 Issue:**
- Attempted to fix E2E test failures (missing Playwright browsers)
- Modified `squad-ci.yml` but E2E tests actually run in `test.yml` (test-e2e job, lines 346-398)
- Would not fix the actual problem

**Review Process Validated:**
1. Read PR description (problem statement clear)
2. Examine diff (change conceptually correct)
3. Check workflow runs (test.yml failed, not squad-ci.yml)
4. Cross-reference file structure (E2E tests exist in tests/E2E with Playwright dependency)
5. Identify architectural mismatch (wrong workflow file)

**Lesson:** Always verify:
- Which workflow file runs which tests
- Where the failing tests actually execute
- Whether the fix targets the correct execution path

**Escalation:** Legolas (DevOps) to fix by moving Playwright install to test.yml test-e2e job

---

## 2026-02-19: E2E Test Infrastructure Removed

**Context:** Aspire project constraints — web application not deployable for testing at current stage of development.

**Architectural Decision:**
- E2E tests require running web application endpoint (Playwright browser automation)
- Aspire orchestrated services not yet configured for external endpoint exposure
- Premature to maintain E2E infrastructure without ability to execute tests
- Other test layers (Unit, Integration, Architecture, Blazor, Aspire) provide adequate coverage

**Actions Taken:**
- Deleted `tests/E2E/` directory (entire E2E test project)
- Removed E2E project reference from `IssueManager.sln` using `dotnet sln remove`
- Verified solution builds successfully in Release configuration (no broken references)

**Build Validation:**
- `dotnet build IssueManager.sln --configuration Release` — ✅ SUCCESS
- All remaining test projects (Unit, Architecture, Blazor, Integration, Aspire) — intact
- No cascading reference failures

**Next Steps:**
- Legolas (DevOps) to update `.github/workflows/test.yml` — remove `test-e2e` job
- Gimli (Tester) to update test documentation — remove E2E references, adjust coverage strategy
- Future: Re-introduce E2E tests when Aspire endpoints are stable and web UI is deployable

**Trade-offs Accepted:**
- Lost E2E coverage of user workflows (critical path testing)
- Gained faster CI/CD execution (removed longest-running test job)
- Reduced maintenance burden (no Playwright version management)
- Aligns with Aspire project maturity level
40 changes: 36 additions & 4 deletions .ai-team/agents/legolas/history.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,9 +41,9 @@
### CI/CD Pipeline Design — Test Execution (2026-02-17)

#### Parallelization Strategy
- **6 independent test jobs** (Unit, Architecture, bUnit, Integration, Aspire, E2E) run simultaneously
- **5 independent test jobs** (Unit, Architecture, bUnit, Integration, Aspire) run simultaneously
- **Single shared build job** with NuGet cache reduces redundancy (~5-10 min)
- **Total execution time: ~12-15 minutes** (parallel much faster than sequential ~30 min)
- **Total execution time: ~10-12 minutes** (parallel much faster than sequential ~25 min)
- Safe to parallelize because test suites have no shared state; each job is idempotent

#### Coverage Gates & Reporting
Expand All @@ -62,7 +62,6 @@
- **CI env vars:** Dummy Auth0 values, test MongoDB connection string
- **Production:** Secrets stored in GitHub Secrets or Key Vault, rotated regularly
- **Aspire configuration:** Different manifests for dev (localhost), test (CI container), prod (cloud)
- **E2E tests:** Run in headless mode in CI (Playwright --headless flag)

#### Artifact & Reporting Strategy
- **Per-job TRX uploads** named by test type (unit-test-results, integration-test-results, etc.)
Expand All @@ -80,4 +79,37 @@
- **NuGet cache hit:** 50-60% reduction in restore time (subsequent jobs benefit)
- **Build cache:** `dotnet build` is incremental; most builds skip unchanged projects
- **Timeout margins:** 15 min build + 10 min tests + 2 min overhead = 27 min total, well below 30 min runner default
- **Parallelism limit:** 6 jobs OK for standard GitHub runner; cost scales linearly
- **Parallelism limit:** 5 jobs OK for standard GitHub runner; cost scales linearly

---

### E2E Test Removal — 2026-02-17

#### Why Removed
- **Aspire projects do not expose a static web endpoint** suitable for Playwright E2E testing
- **Service discovery is dynamic** — Aspire uses ephemeral ports and service mesh routing
- **E2E tests cannot reliably target the Blazor UI** without complex orchestration or external deployment
- **CI build time reduced** by ~5 minutes (removed Playwright browser install + E2E execution)

#### Changes Made
- **Removed `test-e2e` job** from `.github/workflows/test.yml` (lines 366-431)
- **Updated `report` job dependencies** — removed `test-e2e` from `needs` list
- **Updated job summary script** — removed `e2e_status` variable and E2E Tests line from output
- **Verified `squad-ci.yml`** — no E2E references found (already clean)

#### Remaining Test Jobs
1. `build` — Solution build with NuGet caching
2. `test-unit` — Unit tests with coverage
3. `test-architecture` — NetArchTest rules (no coverage)
4. `test-bunit` — Blazor component tests with coverage
5. `test-integration` — Integration tests with MongoDB service
6. `test-aspire` — Aspire orchestration tests with coverage
7. `coverage` — Aggregate coverage from all sources
8. `report` — Unified test result summary

#### Future E2E Strategy (if needed)
- Deploy Aspire app to temporary container/K8s environment in CI
- Wait for service readiness via health checks
- Run Playwright tests against deployed endpoints
- Teardown after test completion
- **Trade-off:** Adds 10-15 min to CI pipeline vs current in-process testing
33 changes: 33 additions & 0 deletions .ai-team/decisions.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,4 +16,37 @@

---

## 2026-02-19: PR #14 Review — Playwright CI Integration (REJECTED)

**By:** Gandalf (Lead)
**What:** Reviewed PR #14 attempting to fix E2E test failures by installing Playwright browsers in CI
**Result:** NEEDS FIXES — Critical issue found: wrong workflow file modified

**Analysis:**
- **Root cause identified correctly:** E2E tests failing due to missing Chromium browser binaries in GitHub Actions
- **Solution approach is sound:** Install Playwright CLI tool and Chromium browsers before running tests
- **CRITICAL FLAW:** PR modifies `squad-ci.yml` but E2E tests actually run in `test.yml` workflow (test-e2e job)
- **squad-ci.yml** is a single-job workflow for simple build+test verification (line 30: "build-and-test" job)
- **test.yml** is the comprehensive test suite with 6 parallel jobs including test-e2e (lines 346-398)
- The Playwright install step must be added to test.yml (test-e2e job) not squad-ci.yml

**Minor issues:**
- Uses `dotnet tool update --global ... || dotnet tool install --global ... --ignore-failed-sources` which is overly complex
- Should use simple `dotnet tool install --global Microsoft.Playwright.CLI` for consistency with test.yml conventions
- Uses `--with-deps` flag which is actually better than separate install/install-deps commands (improvement)

**Impact if merged:**
- E2E tests in test.yml will continue to fail (no Playwright browsers installed)
- squad-ci.yml will have an unnecessary Playwright install step (no E2E tests run there)
- Test failures will persist, blocking future PRs

**Decision:**
NEEDS FIXES before merge. Legolas (DevOps) should:
1. Remove Playwright install from squad-ci.yml
2. Add Playwright install to test.yml in test-e2e job after "Build solution" step (line 373)
3. Use command: `dotnet tool install --global Microsoft.Playwright.CLI && playwright install --with-deps chromium`
4. Re-run tests to verify E2E tests now pass

---

*End of decisions. Append new decisions below as the team works.*
132 changes: 132 additions & 0 deletions .ai-team/decisions/inbox/gandalf-e2e-removal.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# Decision: Remove E2E Test Infrastructure

**Date:** 2026-02-19
**Author:** Gandalf (Lead Architect)
**Status:** Implemented

---

## Context

IssueManager is an Aspire-orchestrated .NET 10 application with multiple test layers:
- Unit tests (business logic, validators, handlers)
- Architecture tests (layer boundaries, naming conventions)
- Integration tests (MongoDB TestContainers, end-to-end data flows)
- Blazor tests (bUnit component testing)
- Aspire tests (orchestration health)
- E2E tests (Playwright browser automation)

**Problem:** The E2E test infrastructure requires a running web application endpoint to execute Playwright browser automation tests. However:
- Aspire project is in early development stage
- Web application endpoints are not yet stable or externally accessible
- E2E tests cannot execute without a deployed web URL
- CI/CD pipeline fails when attempting to run E2E tests (missing browser dependencies, no endpoint to test against)

---

## Decision

**Remove the E2E test infrastructure entirely:**
1. Delete `tests/E2E/` directory and all E2E test code
2. Remove E2E project reference from `IssueManager.sln`
3. Update CI/CD workflow to remove `test-e2e` job (delegated to Legolas/DevOps)
4. Update test documentation to reflect adjusted testing strategy (delegated to Gimli/Tester)

**Rationale:**
- E2E tests provide value only when web application is deployable and accessible
- Maintaining E2E infrastructure without ability to execute tests creates false confidence
- Other test layers (Unit, Integration, Architecture, Blazor) provide adequate coverage during early development
- Faster CI/CD execution without longest-running test job (E2E typically 3-5 minutes)
- Reduced maintenance burden (no Playwright browser version management)

---

## Alternatives Considered

### 1. Keep E2E tests but disable them
**Pros:** Infrastructure ready when web app is deployable
**Cons:** Dead code creates maintenance burden, CI/CD confusion, false impression of coverage
**Rejected:** Violates "don't maintain what you can't execute" principle

### 2. Mock or stub web application endpoint
**Pros:** E2E tests can run without real web app
**Cons:** Mocked E2E tests are not true end-to-end tests, defeats purpose of browser automation
**Rejected:** Playwright tests without real web app provide no value over bUnit component tests

### 3. Wait for Aspire endpoints to stabilize
**Pros:** Keep all test infrastructure in place
**Cons:** Timeline unclear, CI/CD fails in meantime, blocks other development
**Rejected:** Premature optimization, revisit when web app is deployable

---

## Implementation

**Completed Actions:**
- ✅ Deleted `tests/E2E/` directory (entire project)
- ✅ Removed E2E project reference from solution using `dotnet sln remove tests\E2E\E2E.csproj`
- ✅ Verified solution builds successfully: `dotnet build IssueManager.sln --configuration Release` (exit code 0)
- ✅ Updated Gandalf history (`history.md`) with architectural decision and context

**Pending Actions (delegated):**
- ⏳ Legolas (DevOps): Remove `test-e2e` job from `.github/workflows/test.yml`
- ⏳ Gimli (Tester): Update test documentation to remove E2E references and adjust coverage strategy

---

## Consequences

### Positive
- **Faster CI/CD:** Removed longest-running test job (3-5 minutes saved per run)
- **Reduced complexity:** No Playwright browser management, no endpoint configuration
- **Honest coverage:** Test suite reflects actual executable tests, no false positives
- **Clear signal:** When web app is deployable, E2E infrastructure will be re-introduced with purpose

### Negative
- **Lost coverage:** No end-to-end user workflow validation via browser automation
- **Manual testing required:** Critical paths must be manually tested until E2E tests are restored
- **Re-work later:** Will need to recreate E2E infrastructure when Aspire endpoints are stable

### Neutral
- **No impact on other test layers:** Unit, Integration, Architecture, Blazor tests remain intact
- **No blocking issues:** Solution builds, other tests run successfully

---

## Monitoring & Success Criteria

**How we'll know this was the right decision:**
1. CI/CD pipeline passes consistently without E2E test failures
2. Development velocity increases (no debugging E2E infrastructure issues)
3. Other test layers provide sufficient confidence for early development

**When to revisit:**
- Aspire web application is deployed to a stable endpoint (dev/staging environment)
- Web UI is mature enough for user workflow testing
- Team requests E2E coverage for critical paths

**Re-introduction criteria:**
- Web app accessible via stable URL (local, dev, or staging)
- Playwright browser automation can successfully navigate application
- E2E tests provide value beyond existing bUnit component tests

---

## Related Work Items

- **Legolas (DevOps):** Update `.github/workflows/test.yml` to remove `test-e2e` job
- **Gimli (Tester):** Update test documentation (`docs/testing/`) to reflect adjusted strategy
- **Future (TBD):** Re-introduce E2E tests when Aspire endpoints are stable

---

## References

- [Playwright Documentation](https://playwright.dev/dotnet/)
- [Aspire Orchestration Docs](https://learn.microsoft.com/en-us/dotnet/aspire/)
- [Test Pyramid Pattern](https://martinfowler.com/articles/practical-test-pyramid.html)

---

**Sign-off:** Gandalf (Lead Architect)
**Review:** Pending (Legolas for CI/CD impact, Gimli for test strategy impact)
22 changes: 22 additions & 0 deletions .ai-team/decisions/inbox/gandalf-pr14-workflow-review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
### 2026-02-20: PR #14 Review — Incorrect Workflow Target Identified

**By:** Gandalf (Lead)

**What:** PR #14 attempted to fix E2E test failures by installing Playwright browsers, but targeted the WRONG workflow file (squad-ci.yml instead of test.yml).

**Why:**
- E2E tests run in `.github/workflows/test.yml` (test-e2e job), not squad-ci.yml
- E2E tests were failing due to missing Chromium browser binaries
- Installing Playwright in squad-ci.yml would not fix the actual problem
- The correct fix requires modifying test.yml's test-e2e job

**Decision:**
1. ✅ CLOSED PR #14 (incorrect target workflow)
2. ✅ OPENED PR #15 with the CORRECT fix (test.yml, test-e2e job)
3. ✅ Reassigned to Aragorn to execute proper fix

**Learning for Future:**
When E2E or other specialized test suites fail:
- Always verify which workflow job actually runs those tests
- Don't assume single unified test step — workflows often have dedicated jobs per test category
- Check workflow YAML to map test failures to correct job before implementing fix
Loading
Loading