From 2466ecee3903c0405142ed84576acdc7d2411dd7 Mon Sep 17 00:00:00 2001
From: DJ
Date: Mon, 30 Mar 2026 15:53:02 -0700
Subject: [PATCH 1/4] feat: add Structured Logging and CQRS standards to
 AGENTS.md

Add two new organization-wide sections with agentic-friendly directives:

- Structured Logging: JSON format, canonical fields, correlation/tracing,
  log levels, what to log/not log, Go (slog) and TypeScript (pino) patterns
- CQRS: when to apply, command/query/event naming conventions, separation
  rules, idempotency, eventual consistency, GraphQL integration, testing

Both sections include numbered "Agentic Directives" blocks with deterministic
rules that AI coding agents can follow without ambiguity.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 AGENTS.md | 190 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index d5d5f13..afacc5a 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -644,6 +644,196 @@ Before starting a stacked Epic/Feature workflow, verify:
 
 ---
 
+## Structured Logging
+
+All services MUST use structured logging. Structured logs are machine-parseable, correlatable across services, and essential for observability. These rules apply to every language and framework in the stack.
+
+### Format & Fields
+
+- **Emit all logs as JSON objects in production.** Never use `fmt.Println`, `console.log`, or unstructured string interpolation for application logs.
+- **Every log line MUST include these baseline fields:**
+
+| Field | Format | Example |
+|-------|--------|---------|
+| `timestamp` | ISO 8601 / RFC 3339, always UTC | `2026-03-29T14:22:01.123Z` |
+| `level` | `debug`, `info`, `warn`, `error`, `fatal` | `info` |
+| `msg` | Static, human-readable string (no interpolated variables) | `order placed` |
+| `service` | Name of the emitting service | `markets-api` |
+| `version` | Deployed version or git SHA | `a1b2c3d` |
+
+- **Put variable data in fields, not in the message string.**
+  - WRONG: `logger.Info("User 1234 placed order 5678")`
+  - RIGHT: `logger.Info("order placed", "user_id", "1234", "order_id", "5678")`
+- **Use `snake_case` for all field names.** Check existing logs in the codebase for canonical field names before inventing new ones.
+- **Canonical field names:** `user_id`, `request_id`, `trace_id`, `span_id`, `duration_ms`, `http_method`, `http_path`, `http_status`, `error_message`, `error_stack`.
+
+### Correlation & Tracing
+
+- **Every inbound request MUST be assigned a `request_id`.** If the caller provides one via `X-Request-ID` header, propagate it. Otherwise, generate a UUIDv4.
+- **Attach `request_id` to the logger context at the middleware layer.** All subsequent logs within that request lifecycle inherit it automatically.
+- **When making outbound calls (HTTP, gRPC, queues), propagate `request_id` in headers.**
+- **If OpenTelemetry is in use, include `trace_id` and `span_id` in every log line.** These fields bridge logs to distributed traces.
+
+### Log Levels
+
+| Level | Use when | Examples |
+|-------|----------|----------|
+| **DEBUG** | Detailed diagnostics, useful only during development. **Disabled in production by default.** | Variable values, SQL queries, cache hit/miss |
+| **INFO** | Normal operational events confirming the system works as expected | Request completed, job finished, service started |
+| **WARN** | Something unexpected but the system recovered or can continue | Retry attempted, deprecated API called, slow query detected |
+| **ERROR** | An operation failed and could not be completed. Every ERROR must be actionable | External service call failed after retries, database write failed |
+| **FATAL** | The process cannot continue and will exit. Use extremely sparingly | Failed to bind port, required config missing at startup |
+
+**Rules:**
+- A 404 for a missing resource is INFO, not ERROR.
+- Validation failures caused by user input are WARN at most, usually INFO.
+- Every ERROR and FATAL log MUST include `error_message` and, where available, `error_stack`.
+- If an error is caught and handled with a fallback, log at WARN, not ERROR.
+
+### What to Log
+
+- **Request/response boundaries:** method, path, status, `duration_ms`, request size.
+- **State transitions:** job started/completed/failed, circuit breaker changes, cache invalidation.
+- **Errors with full context:** the attempted operation, input parameters (sanitized), error message and type, retry count, stack trace for unexpected errors.
+- **Security events:** authentication success/failure (without credentials), authorization denial, rate limiting triggered, admin operations.
+- **Performance data:** external call durations, database query durations above threshold.
+
+### What NOT to Log
+
+- **NEVER log passwords, API keys, secrets, tokens (JWT, session, bearer), or private keys.**
+- **NEVER log full request/response bodies that may contain `Authorization`, `Cookie`, `Set-Cookie`, or `X-API-Key` headers.**
+- **NEVER log PII in plain text:** email addresses, phone numbers, physical addresses, SSNs, credit card numbers.
+- **NEVER log at DEBUG level in production by default.** DEBUG must be gated behind a runtime flag.
+- **NEVER log inside tight loops** (per-item in a batch). Log once before and once after with a count.
+- **NEVER use `log.Fatal` or `log.Panic` in library code.** Return errors; let the caller decide.
+
+### Agentic Directives
+
+These rules are deterministic — apply them whenever writing or modifying code:
+
+1. When creating any new HTTP/GraphQL handler, add structured logging middleware that logs: method, path/operation, status, `duration_ms`, `request_id`.
+2. When adding a logger call, ALWAYS use structured key-value pairs. NEVER concatenate or interpolate variables into the message string.
+3. When handling an error, log with at minimum: `level=error`, `msg`, `error_message`, and all context IDs (`request_id`, `user_id`) available in scope.
+4. When calling an external service, log before (at DEBUG) and after (ERROR on failure with `duration_ms`, INFO/DEBUG on success with `duration_ms`).
+5. When a function receives a `context.Context` (Go) or request object (Node), extract the logger from it. NEVER create a new bare logger inside a handler.
+6. NEVER log any variable named or containing: `password`, `secret`, `token`, `key`, `authorization`, `cookie`, `ssn`, `credit_card`. When in doubt, omit it.
+7. When adding retry logic, log each retry at WARN with `attempt_number` and `max_retries`.
+8. When choosing a log level, ask: "Will someone be paged?" → ERROR. "Degradation?" → WARN. "Normal operation?" → INFO. "Only useful debugging?" → DEBUG.
+
+### Go Patterns
+
+- **Use `log/slog` (Go 1.21+) as the default.** Use `slog.NewJSONHandler` for production, `slog.NewTextHandler` for local development.
+- **Propagate loggers via `context.Context`.** Use `slog.InfoContext(ctx, ...)` so OpenTelemetry bridges can extract trace/span IDs.
+- **Never use the global `log` package** from the standard library. Always use `slog` or a structured alternative.
+- **In tests, use `slog.New(slog.NewTextHandler(io.Discard, nil))` to suppress log output.**
+- **Log at the point of handling, not at every intermediate layer.** Wrap errors with `fmt.Errorf("operation: %w", err)` and log once at the top-level handler.
+
+### TypeScript / Node Patterns
+
+- **Use `pino` as the default structured logger.**
+- **Use `logger.child({...})` to create request-scoped loggers.** Never add fields to the root logger at runtime.
+- **In Express/Fastify, attach the child logger to the request object in middleware.** All downstream code uses `req.log`.
+- **Never use `console.log`, `console.error`, or `console.warn` in application code.** `console.*` is acceptable only in CLI tools and build scripts.
+- **When logging errors, pass the Error object in a field:** `logger.error({ msg: "payment failed", err })`.
+
+---
+
+## CQRS — Command Query Responsibility Segregation
+
+CQRS separates read and write models to allow independent optimization of each. These rules define when and how to apply CQRS across the organization. CQRS is a **per-use-case decision**, not an architectural mandate — some operations within a service may use CQRS while others use standard CRUD.
+
+### When to Apply
+
+- **Use CQRS** when read and write workloads have significantly different scaling, performance, or modeling requirements.
+- **Use CQRS** when the domain has complex business rules on the write side but simple, denormalized read requirements.
+- **Use CQRS** when multiple read representations (projections) of the same data are needed.
+- **Do NOT use CQRS** for simple CRUD domains where read and write models are nearly identical.
+- **Do NOT apply CQRS to every bounded context.** Apply it selectively to contexts that benefit from it.
+
+### Separation Rules
+
+- The **write model** (command side) is optimized for enforcing invariants and business rules. It is the source of truth.
+- The **read model** (query side) is optimized for query performance. It MAY be denormalized, pre-aggregated, or stored in a different database.
+- **Commands MUST NOT return domain data.** They return either nothing (success), an ID of a created resource, or an error.
+- **Queries MUST NOT mutate state.** A query handler is side-effect-free with respect to domain state. (Logging, metrics, and cache population are acceptable side effects.)
+
+### Naming Conventions
+
+| Concept | Convention | Examples |
+|---------|-----------|----------|
+| **Command** | Imperative verb + noun | `PlaceOrder`, `CancelSubscription`, `AssignRole` |
+| **Event** | Past-tense verb + noun | `OrderPlaced`, `SubscriptionCancelled`, `RoleAssigned` |
+| **Query** | `Get`/`List`/`Search`/`Count` + noun | `GetOrderById`, `ListActiveUsers`, `SearchProducts` |
+| **Command handler** | `Handle(cmd)` or `Handler` suffix | `PlaceOrderHandler` |
+| **Event handler** | `On` prefix or `Handler` suffix | `OnOrderPlaced` |
+| **Projection** | Noun describing the view | `OrderSummaryProjection`, `UserDashboardView` |
+
+Mixing these conventions (e.g., a command named `OrderPlaced` or an event named `PlaceOrder`) is a design error.
+
+### Command Patterns
+
+- A command MUST be a plain data structure (DTO) with no behavior. It carries the intent and the data needed to fulfill it.
+- Every command that creates or mutates state MUST be idempotent, or the system MUST detect and reject duplicates. Strategies: idempotency key from client, natural idempotency ("set X" not "append X"), or conditional/versioned updates (`expectedVersion`/`ETag`).
+- Each command MUST have exactly one handler.
+- A command handler's responsibilities are: (1) validate, (2) load aggregate/entity, (3) invoke domain logic, (4) persist changes, (5) publish domain events.
+- **One command = one transaction = one aggregate mutation.** Do NOT mutate multiple aggregates in one command handler. Use domain events and sagas for cross-aggregate coordination.
+- **Domain logic belongs in the aggregate/entity, not the handler.** If the handler has `if/else` business logic, move it into the domain model.
+
+### Command Validation
+
+- **Structural/syntactic validation** on the command object before it reaches the handler (required fields, format, lengths). This can be middleware.
+- **Domain/business validation** inside the aggregate/entity (e.g., "user has sufficient balance"). MUST NOT be in the handler or middleware.
+- Return validation errors as typed, structured responses — not exceptions for flow control.
+
+### Query Patterns
+
+- A query MUST be a plain data structure containing only retrieval parameters: filters, pagination, sorting, field selection.
+- Query handlers MUST NOT invoke command handlers or emit domain events.
+- Query handlers MAY read from a dedicated read database, cache, search index, or materialized view.
+- Build projections (materialized views) shaped exactly for the UI/API consumer. One projection per distinct read use case is acceptable.
+
+### Domain Events
+
+- Events MUST be immutable. Once published, an event's schema and data MUST NOT change. To evolve, publish new event types and deprecate old ones.
+- Events SHOULD carry enough data for consumers to process them without querying back to the source ("fat" events over "thin" events).
+- Every event MUST include: `event_id` (unique), `event_type`, `aggregate_id`, `aggregate_type`, `timestamp`, `version`/`sequence_number`, `payload`, `metadata` (`correlation_id`, `causation_id`, `user_id`).
+- Event handlers (projectors, reactors, sagas) MUST be idempotent. Use `sequence_number` or `event_id` to detect and skip already-processed events.
+- **Outbox pattern:** When a command handler writes to the database AND publishes events, use the transactional outbox pattern — write events to an outbox table in the same transaction as the aggregate, then publish from the outbox asynchronously. This prevents dual-write problems.
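+The outbox write path can be sketched as follows (pseudocode; the `outbox` table layout, the `orders` example, and the relay loop are illustrative assumptions, not a prescribed schema):
+
+```pseudocode
+// Command handler: one transaction covers the aggregate write AND the event write
+BEGIN TRANSACTION
+  UPDATE orders SET status = 'confirmed' WHERE id = :aggregate_id
+  INSERT INTO outbox (event_id, event_type, aggregate_id, payload, metadata)
+  VALUES (:new_uuid, 'OrderConfirmed', :aggregate_id, :payload, :metadata)
+COMMIT
+
+// Relay (separate process): polls the outbox and publishes asynchronously
+LOOP
+  rows = SELECT * FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100
+  FOR row IN rows
+    broker.publish(row.event_type, row.payload)  // consumers dedupe on event_id
+    UPDATE outbox SET published_at = now() WHERE id = row.id
+END LOOP
+```
+
+If the process crashes after COMMIT but before publishing, the event is not lost: the relay finds the unpublished row on its next poll. Because event handlers are idempotent, an at-least-once relay is safe.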
+### Eventual Consistency
+
+- Accept that the read model will lag behind the write model. Design UIs to handle this: optimistic updates, "your change is being processed" messaging, polling/subscriptions for completion.
+- Define and monitor consistency SLAs — the acceptable lag time between a command and the read model update.
+- Do NOT read from the write model after a command as a consistency workaround. This defeats CQRS and creates coupling.
+
+### GraphQL + CQRS Integration
+
+- **Mutations map to commands.** Each mutation field corresponds to a single command. The mutation name matches the command name.
+- **Queries map to read models.** Each query field reads from the query side. Never route a GraphQL query through a command handler.
+- **Subscriptions map to domain events** or projection change streams. Use event-driven push, not database polling.
+- Use GraphQL union types or typed error fields for command validation and domain errors. Do NOT rely solely on the `errors` array for business-logic errors.
+
+### Agentic Directives
+
+1. When implementing a new mutation, structure it as: parse input → construct command → dispatch to handler → return ID or error. Do NOT put business logic in the resolver.
+2. When implementing a new query, read from the query/projection store. NEVER load aggregates or call domain services from a query resolver.
+3. When naming a new command, use imperative form (`CreateMarket`). When naming an event, use past tense (`MarketCreated`). When naming a query, use `Get`/`List`/`Search` prefix (`GetMarketById`).
+4. When a command handler needs to affect another aggregate, publish a domain event and handle it in a separate event handler. Do NOT modify two aggregates in the same transaction.
+5. When creating an event handler, make it idempotent — check if the event has already been processed before applying side effects.
+6. When adding a new read use case with different shape requirements, create a new projection rather than overloading an existing query with conditional logic.
+7. When the domain is simple CRUD with no complex invariants or divergent read/write needs, use a standard repository pattern. Do NOT introduce CQRS for basic entity operations.
+8. When a command creates a new entity, return only the new entity's ID. Do NOT return the full read model — let the caller issue a separate query.
+
+### Testing CQRS
+
+- **Commands:** Test handlers in isolation with in-memory repositories. Verify correct events are produced, state changes are persisted, and invariants are enforced. Include idempotency tests.
+- **Queries:** Test against a pre-populated read store. Seed known data, assert query results.
+- **Event handlers:** Feed events and assert side effects. Test idempotency (same event twice = same result). Test ordering if handler depends on sequence.
+- **Projections:** Test that a projection can be rebuilt from scratch by replaying all events.
+- **Integration:** Send a command, wait for projection update (with polling/timeout, NOT arbitrary sleep), then query and assert.
+
+---
+
 ## References
 
 - AGENTS.md convention: https://agents.md/

From d6426a1649003d825149521b4939276b8d6fedf2 Mon Sep 17 00:00:00 2001
From: DJ
Date: Mon, 30 Mar 2026 16:01:09 -0700
Subject: =?UTF-8?q?feat:=20add=20E2E=20testing=20standards=20?=
 =?UTF-8?q?=E2=80=94=20validate=20real=20functionality,=20not=20smoke=20te?=
 =?UTF-8?q?sts?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add comprehensive E2E testing section to org-wide AGENTS.md that enforces
testing real business outcomes through the full stack. Key additions:

- Philosophy: every E2E test must answer "what would break for a real user?"
- Forbidden patterns table: smoke tests disguised as E2E, UI-only assertions,
  mocked backends, status-code-only checks, arbitrary sleeps, happy-path-only
- Required test structure: Arrange → Act → Assert → Verify → Cleanup
- Multi-layer assertions: UI + API response + database state
- GraphQL E2E: mutation→query round trips, auth on every resolver, pagination
- Go backend E2E: testcontainers for real databases, migration testing,
  concurrency/idempotency testing
- Mobile E2E: Detox/Maestro patterns, offline/online, testID selectors
- 12 agentic directives for deterministic agent behavior

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 AGENTS.md | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index afacc5a..e16223f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -55,6 +55,120 @@ If a dependency cannot be resolved, report the specific blocker and a workaround
 
 ---
 
+## End-to-End Testing — Validate Real Functionality
+
+E2E tests validate real user workflows through the full stack. They exist to catch bugs that would affect real users. **A test that does not verify a real business outcome is a test that provides false confidence and must not exist.**
+
+> Every E2E test must answer one question: **"What would break for a real user if this test didn't exist?"** If the test could pass while the feature is fundamentally broken, it is worthless and must be rewritten.
+
+### What E2E Tests MUST Do
+
+1. **Full round-trip verification.** Action → API call → database mutation → response → UI reflects new state. Not a subset — the whole chain.
+2. **Multi-layer assertions.** The UI shows correct data AND the database contains the correct record AND side effects occurred (events published, emails queued, cache invalidated).
+3. **Verify at the data layer.** After a form submission, query the database directly to verify the record exists with correct fields. After a delete, verify it's gone. After auth, verify the token's claims and scopes. Do NOT stop at "success toast appeared."
+4. **Test error paths.** For every happy-path test, write corresponding tests for: invalid input, unauthorized access, conflict/duplicate states, and not-found resources.
+5. **Test authorization boundaries.** Verify user A cannot access user B's resources. Verify regular users cannot hit admin endpoints. Verify expired/revoked tokens are rejected.
+6. **Use realistic data.** Factories that produce production-realistic data (unicode names, long strings, special characters, realistic cardinalities) — not `"test"` and `"foo"`.
+7. **Deterministic waits.** Wait for specific conditions (element visible, API response received, database row present) using polling with timeouts — never arbitrary sleeps.
+
+### Forbidden Patterns
+
+These are non-negotiable. Tests exhibiting these patterns MUST be rejected:
+
+| Anti-Pattern | Why It Fails | Fix |
+|---|---|---|
+| **Smoke test disguised as E2E** | Verifies the page loads, not that anything works | Add assertions on business outcomes after user actions |
+| **UI-only assertions** | Cached/stale UI can show "Success" while the write failed | Query the database or API to verify the actual state change |
+| **Mocking the entire backend** | Eliminates the integration being tested | Hit the real backend. Use testcontainers or test databases |
+| **Asserting only on HTTP status codes** | A 200 with empty body or wrong data is still a bug | Always verify response body fields and database state |
+| **Arbitrary sleeps** | Flaky, slow, hides timing bugs | Poll for a condition with a timeout |
+| **Happy path only** | Production bugs live in error paths and edge cases | Test invalid input, unauthorized access, and conflicts |
+| **No cleanup / test pollution** | Tests depend on execution order, fail in isolation | Each test creates and cleans up its own data |
+| **UI for preconditions** | 10x slower, couples test to unrelated UI flows | Use API calls or direct DB inserts for setup |
+| **Brittle selectors** | Breaks on any UI change | Use `data-testid` (web) or `testID` (React Native) exclusively |
+| **Placeholder assertions** | `expect(true).toBe(true)` proves nothing | Assert on specific field values and business outcomes |
+
+### Test Structure — Arrange, Act, Assert, Verify, Cleanup
+
+Every E2E test follows this structure:
+
+1. **Arrange** — Create preconditions via API or direct DB insert (never via UI)
+2. **Act** — Perform the user action under test
+3. **Assert** — Check the immediate response (HTTP status + body, or UI feedback)
+4. **Verify** — Check the database/state store to confirm the real outcome
+5. **Cleanup** — Remove test data (or use transactional rollback)
+
+### Test Design Patterns
+
+**Page Object Model (for UI E2E):**
+- Encapsulate page interactions in page objects. Tests read as user stories, not DOM manipulation.
+- Page objects expose user-intent methods (`loginAs(user)`, `submitOrder(items)`) — not element-level methods.
+- Selectors live in exactly one place (the page object). Use `data-testid` / `testID` exclusively.
+
+**Test Data Factories:**
+- Every test creates its own data. Never rely on pre-existing seed data.
+- Factories produce realistic, randomized data: `createUser({role: "admin"})`, `createOrder({status: "pending", items: 3})`.
+- Factories use the API or database — NOT the UI.
+
+**Multi-Layer Assertion Example:**
+```
+// WRONG — only checks UI
+await page.click("#submit-order");
+expect(await page.getText(".toast")).toBe("Order placed!");
+
+// RIGHT — checks UI + API response + database
+const response = await submitOrderAndCapture(orderData);
+expect(response.status).toBe(201);
+expect(response.body.orderId).toBeDefined();
+
+const dbOrder = await db.orders.findById(response.body.orderId);
+expect(dbOrder.status).toBe("confirmed");
+expect(dbOrder.items).toHaveLength(3);
+expect(dbOrder.total).toBe(expectedTotal);
+
+expect(await page.getText("[data-testid='order-id']")).toContain(dbOrder.orderId);
+```
+
+### GraphQL E2E
+
+- **Mutation → Query round trips.** Execute a mutation, then immediately query for the resource. Verify every field matches. This catches stale cache, serialization mismatches, and silent write failures.
+- **Authorization on every resolver.** For every query and mutation, test with: valid token (succeeds), no token (rejected), wrong user's token (rejected), insufficient scope (rejected).
+- **Subscription delivery.** Open a subscription, perform the triggering mutation, verify the subscription receives the correct payload within a timeout.
+- **Pagination edge cases.** Test: empty results, exactly one page, last page terminates correctly, cursor stability across inserts, invalid cursors return helpful errors.
+
+### Go Backend E2E
+
+- **Use `testcontainers-go` for real databases.** Spin up real PostgreSQL (and Redis, Kafka, etc.) per test suite. No mocking the database in E2E — ever.
+- **Use `httptest.Server` with the real application.** Tests hit real HTTP endpoints, which hit the real database.
+- **Test migrations.** Run migrations from scratch on test suite startup. If migrations fail, the test fails.
+- **Test concurrency.** Double-submit for idempotency verification. Two users editing the same resource for optimistic locking. Use `errgroup` to fire concurrent requests.
+- **Assert beyond status codes.** Verify response body fields, database state, audit log entries, published events.
+
+### Mobile / React Native E2E
+
+- **Use Detox or Maestro** for React Native. Run against real simulators/emulators.
+- **Test full navigation flows** — deep links, back button, tab switching with state preservation, modal dismissal.
+- **Test offline/online transitions** — disable network, verify cached data displays and writes queue, re-enable, verify sync.
+- **Use `testID` prop exclusively for selectors.** Never match on displayed text (changes with localization).
+- **Test on at least two form factors.** Never hardcode device dimensions.
+
+### Agentic Directives
+
+1. Before writing any E2E test, state the user workflow in a comment: `// Workflow: User creates a project, verifies it appears in the list, and can be accessed by direct URL.`
+2. Every test MUST include a database/state assertion. If the test only asserts on HTTP status or UI text, it is incomplete.
+3. Every test MUST create its own preconditions via API/DB. Never assume data exists from a previous test.
+4. Every test MUST clean up after itself. Prefer transactional cleanup.
+5. For every mutation test, write a corresponding verification query. Create → verify exists. Update → verify changed. Delete → verify gone.
+6. Test at least one error case per endpoint/workflow: invalid input, missing auth, forbidden access, not-found, duplicate/conflict.
+7. Use deterministic waits, not sleeps. Poll for a condition with a timeout.
+8. Use `data-testid` / `testID` for all element selection. If one doesn't exist, add it to the component.
+9. Name tests as behavioral specifications: not `test("submit form")` but `test("submitting a valid order creates a confirmed order record with correct line items and total")`.
+10. When testing auth flows, always test both positive AND negative: valid credentials succeed AND invalid credentials fail with the correct error.
+11. Never generate placeholder assertions. Every `expect`/`assert` must check a meaningful, specific value.
+12. When unsure whether a test is thorough enough, it is not. Add more assertions. Verify at more layers. Test one more error case.
+
+---
+
 ## Pre-Commit Quality Checks
 
 Before every commit, agents MUST run and pass the project's full check suite. At minimum:

From afb7032a73b70ec432b94797d5774db68502ba3d Mon Sep 17 00:00:00 2001
From: DJ
Date: Mon, 30 Mar 2026 16:05:30 -0700
Subject: =?UTF-8?q?feat:=20add=20breaking=20changes=20policy?=
 =?UTF-8?q?=20=E2=80=94=20require=20human=20approval,=20tests=20as=20contr?=
 =?UTF-8?q?acts?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add "Breaking Changes — Human Approval Required" subsection to Coding
Standards. Technology-agnostic rules covering all layers:

- What constitutes a breaking change (API, database, frontend, backend,
  shared contracts) with concrete examples table
- Tests as the primary detection mechanism — existing tests encode
  contracts, never modify a test to accommodate a breaking change
- Mandatory human approval gate: stop, describe, list impact, propose
  non-breaking alternative, wait for explicit approval
- Non-breaking alternatives in priority order: additive changes,
  deprecation, feature flags, adapters, staged database migrations
- 9 agentic directives (deterministic always/never rules)

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 AGENTS.md | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index e16223f..db1c62b 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -252,6 +252,55 @@ All repositories MUST follow these software engineering principles. They apply t
 - **Direction of dependencies.** Dependencies always point inward — infrastructure depends on application, application depends on domain. Never the reverse.
 - **No framework bleed.** Framework-specific types and annotations stay at the infrastructure/adapter layer. Domain and application layers must be framework-agnostic.
 
+### Breaking Changes — Human Approval Required
+
+Agents MUST NOT introduce breaking changes without explicit human approval. A breaking change is any modification that causes existing consumers, callers, or dependents to fail, produce incorrect results, or require changes on their end. **When in doubt, treat it as breaking.**
+
+**What constitutes a breaking change:**
+
+| Layer | Breaking changes include |
+|-------|------------------------|
+| **API** | Removing/renaming endpoints or response fields, changing field types or response shapes, adding required parameters, tightening validation, changing error formats, changing auth requirements |
+| **Database** | Dropping/renaming columns or tables, changing column types, adding NOT NULL without a default, removing defaults, dropping indexes that queries depend on, changing constraints |
+| **Frontend** | Removing/renaming exported components/hooks/utilities, changing prop types or function signatures, removing design tokens or CSS variables, changing route paths |
+| **Backend** | Removing/renaming exported functions/interfaces/types, changing function signatures or return types, changing event schemas, removing configuration options |
+| **Shared contracts** | Removing/renaming fields in API schemas, changing message payload structures, modifying shared type definitions consumed by multiple services |
+
+**Detection — tests are the primary safety net:**
+
+- Existing tests encode existing contracts. If your change causes an existing test to fail, you have likely introduced a breaking change.
+- **NEVER modify an existing test to make a breaking change pass.** Changing a test to accommodate new behavior is hiding the break, not fixing it. New behavior gets new tests; existing tests protect existing contracts.
+- If a test seems "wrong" or "outdated," flag it for human review. It may be the only documentation of a contract someone depends on.
+- Before removing or changing anything, search the entire codebase for all references (including string-based references like route strings, config keys, and dynamic imports). Report the consumer count.
+
+**When a breaking change is detected — stop and follow this protocol:**
+
+1. **Stop immediately.** Do not commit the breaking change.
+2. **Describe it clearly.** State exactly what is changing and why it qualifies as breaking.
+3. **List what will break and for whom.** Specific file names, function names, test names, consumer services — not "some things might break."
+4. **Propose a non-breaking alternative.** In most cases, one exists (see below).
+5. **Wait for explicit human approval.** Do not interpret silence or general task intent as approval.
+
+**Non-breaking alternatives (prefer in this order):**
+
+1. **Additive changes** — Add new fields/endpoints/functions/columns alongside old ones. Do not replace.
+2. **Deprecation + migration period** — Mark the old item as deprecated, add the replacement, document the migration path. Removal happens in a separate future change with human approval.
+3. **Feature flags / versioning** — Gate new behavior behind a flag so old behavior remains the default.
+4. **Adapter / compatibility layers** — Write a thin adapter mapping the old interface to the new implementation.
+5. **Staged database changes** — Add new column → migrate data → update application code → deploy and verify → drop old column in a separate, human-approved change. Never combine add and drop in a single migration.
+
+**Agentic directives:**
+
+1. NEVER remove or rename a public function, type, component, endpoint, field, column, route, event, or config option without human approval.
+2. NEVER change the signature of a public function or the shape of an API response, event payload, or shared data structure without human approval.
+3. NEVER tighten validation, add required fields, or add NOT NULL constraints without human approval.
+4. ALWAYS search for all consumers of any symbol, endpoint, or schema element before modifying or removing it.
+5. ALWAYS propose a non-breaking alternative first. Only present the breaking option alongside it.
+6. ALWAYS run the existing test suite after changes. Treat test failures as breaking change signals, not as "tests that need updating."
+7. ALWAYS use additive changes by default. Deprecate, then remove later in a separate change.
+8. NEVER combine a destructive database migration (drop column, drop table, change type) with other changes in the same commit or PR.
+9. When uncertain whether something is breaking, treat it as breaking. False positives are cheap. False negatives break production.
+
 ### Code Organization
 
 - **Co-locate related code.** Tests live next to the code they test. Types live near the code that uses them. Avoid scattering related files across distant directories.
From 1b6d9f9f7d470fe39fa9696ab36f07b6671d9b1b Mon Sep 17 00:00:00 2001 From: DJ Date: Mon, 30 Mar 2026 18:23:00 -0700 Subject: [PATCH 4/4] refactor: use "functional requirement" terminology, remove tech-specific refs, address review comments MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit E2E Testing section: - Replace "workflow" with "functional requirement" throughout - Remove all technology-specific references (Playwright, Detox, Maestro, testcontainers-go, httptest, errgroup, React Native, GraphQL) - Generalize subsections: "GraphQL E2E" → "API E2E", "Go Backend E2E" → "Backend E2E", "Mobile / React Native E2E" → "Frontend E2E (Web and Mobile)" - Use generic terms: "frontend", "backend", "database", "test-ID attributes" - Fix code block: add language identifier (pseudocode) for MD031/MD040 Logging section (addressing Copilot + CodeRabbit review comments): - Add logger initialization guidance for baseline fields (timestamp, service, version) — addresses Copilot comment on line 826 - Add correlation_id, causation_id to canonical field names with explicit relationship definitions linking to CQRS — addresses comments on lines 832, 962 - Clarify error_message vs err object serialization — addresses line 853 - Narrow sensitive field name matching from substring "key" to explicit suffixes (api_key, private_key, etc.) 
— addresses line 882 CQRS section: - Add "CQRS is not Event Sourcing" clarification — addresses CodeRabbit nitpick - Cross-reference correlation_id/causation_id back to Structured Logging Co-Authored-By: Claude Opus 4.6 (1M context) --- AGENTS.md | 122 ++++++++++++++++++++++++++++-------------------------- 1 file changed, 63 insertions(+), 59 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index db1c62b..ea9d42e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -55,16 +55,16 @@ If a dependency cannot be resolved, report the specific blocker and a workaround --- -## End-to-End Testing — Validate Real Functionality +## End-to-End Testing — Validate Real Functional Requirements -E2E tests validate real user workflows through the full stack. They exist to catch bugs that would affect real users. **A test that does not verify a real business outcome is a test that provides false confidence and must not exist.** +E2E tests validate real functional requirements through the full stack. They exist to catch bugs that would affect real users. **A test that does not verify a real business outcome is a test that provides false confidence and must not exist.** -> Every E2E test must answer one question: **"What would break for a real user if this test didn't exist?"** If the test could pass while the feature is fundamentally broken, it is worthless and must be rewritten. +> Every E2E test must answer one question: **"What functional requirement would be broken for a real user if this test didn't exist?"** If the test could pass while the requirement is fundamentally unmet, it is worthless and must be rewritten. ### What E2E Tests MUST Do -1. **Full round-trip verification.** Action → API call → database mutation → response → UI reflects new state. Not a subset — the whole chain. -2. **Multi-layer assertions.** The UI shows correct data AND the database contains the correct record AND side effects occurred (events published, emails queued, cache invalidated). +1. 
**Full round-trip verification.** Action → API call → database mutation → response → frontend reflects new state. Not a subset — the whole chain. +2. **Multi-layer assertions.** The frontend shows correct data AND the database contains the correct record AND side effects occurred (events published, notifications queued, cache invalidated). 3. **Verify at the data layer.** After a form submission, query the database directly to verify the record exists with correct fields. After a delete, verify it's gone. After auth, verify the token's claims and scopes. Do NOT stop at "success toast appeared." 4. **Test error paths.** For every happy-path test, write corresponding tests for: invalid input, unauthorized access, conflict/duplicate states, and not-found resources. 5. **Test authorization boundaries.** Verify user A cannot access user B's resources. Verify regular users cannot hit admin endpoints. Verify expired/revoked tokens are rejected. @@ -77,94 +77,95 @@ These are non-negotiable. Tests exhibiting these patterns MUST be rejected: | Anti-Pattern | Why It Fails | Fix | |---|---|---| -| **Smoke test disguised as E2E** | Verifies the page loads, not that anything works | Add assertions on business outcomes after user actions | -| **UI-only assertions** | Cached/stale UI can show "Success" while the write failed | Query the database or API to verify the actual state change | -| **Mocking the entire backend** | Eliminates the integration being tested | Hit the real backend. 
Use testcontainers or test databases | +| **Smoke test disguised as E2E** | Verifies the page loads, not that a functional requirement works | Add assertions on business outcomes after user actions | +| **Frontend-only assertions** | Cached/stale frontend can show "Success" while the write failed | Query the database or API to verify the actual state change | +| **Mocking the entire backend** | Eliminates the integration being tested | Hit the real backend with real databases (containers or dedicated test instances) | | **Asserting only on HTTP status codes** | A 200 with empty body or wrong data is still a bug | Always verify response body fields and database state | | **Arbitrary sleeps** | Flaky, slow, hides timing bugs | Poll for a condition with a timeout | | **Happy path only** | Production bugs live in error paths and edge cases | Test invalid input, unauthorized access, and conflicts | | **No cleanup / test pollution** | Tests depend on execution order, fail in isolation | Each test creates and cleans up its own data | -| **UI for preconditions** | 10x slower, couples test to unrelated UI flows | Use API calls or direct DB inserts for setup | -| **Brittle selectors** | Breaks on any UI change | Use `data-testid` (web) or `testID` (React Native) exclusively | +| **Frontend for preconditions** | 10x slower, couples test to unrelated frontend flows | Use API calls or direct database inserts for setup | +| **Brittle selectors** | Breaks on any frontend change | Use stable test-ID attributes exclusively — never CSS classes, DOM hierarchy, or text content | | **Placeholder assertions** | `expect(true).toBe(true)` proves nothing | Assert on specific field values and business outcomes | ### Test Structure — Arrange, Act, Assert, Verify, Cleanup Every E2E test follows this structure: -1. **Arrange** — Create preconditions via API or direct DB insert (never via UI) +1. **Arrange** — Create preconditions via API or direct database insert (never via the frontend) 2. 
**Act** — Perform the user action under test -3. **Assert** — Check the immediate response (HTTP status + body, or UI feedback) +3. **Assert** — Check the immediate response (HTTP status + body, or frontend feedback) 4. **Verify** — Check the database/state store to confirm the real outcome 5. **Cleanup** — Remove test data (or use transactional rollback) ### Test Design Patterns -**Page Object Model (for UI E2E):** -- Encapsulate page interactions in page objects. Tests read as user stories, not DOM manipulation. +**Page/Screen Object Model (for frontend E2E):** +- Encapsulate page interactions in page/screen objects. Tests read as functional requirements, not DOM/view manipulation. - Page objects expose user-intent methods (`loginAs(user)`, `submitOrder(items)`) — not element-level methods. -- Selectors live in exactly one place (the page object). Use `data-testid` / `testID` exclusively. +- Selectors live in exactly one place (the page object). Use stable test-ID attributes exclusively. **Test Data Factories:** - Every test creates its own data. Never rely on pre-existing seed data. - Factories produce realistic, randomized data: `createUser({role: "admin"})`, `createOrder({status: "pending", items: 3})`. -- Factories use the API or database — NOT the UI. +- Factories use the API or database — NOT the frontend. **Multi-Layer Assertion Example:** -``` -// WRONG — only checks UI -await page.click("#submit-order"); -expect(await page.getText(".toast")).toBe("Order placed!"); -// RIGHT — checks UI + API response + database -const response = await submitOrderAndCapture(orderData); -expect(response.status).toBe(201); -expect(response.body.orderId).toBeDefined(); +```pseudocode +// WRONG — only checks frontend +click("#submit-order") +assert getText(".toast") == "Order placed!" 
+ +// RIGHT — checks frontend + API response + database +response = submitOrderAndCapture(orderData) +assert response.status == 201 +assert response.body.orderId is not empty -const dbOrder = await db.orders.findById(response.body.orderId); -expect(dbOrder.status).toBe("confirmed"); -expect(dbOrder.items).toHaveLength(3); -expect(dbOrder.total).toBe(expectedTotal); +dbOrder = db.orders.findById(response.body.orderId) +assert dbOrder.status == "confirmed" +assert dbOrder.items.length == 3 +assert dbOrder.total == expectedTotal -expect(await page.getText("[data-testid='order-id']")).toContain(dbOrder.orderId); +assert getText("[test-id='order-id']") contains dbOrder.orderId ``` -### GraphQL E2E +### API E2E -- **Mutation → Query round trips.** Execute a mutation, then immediately query for the resource. Verify every field matches. This catches stale cache, serialization mismatches, and silent write failures. -- **Authorization on every resolver.** For every query and mutation, test with: valid token (succeeds), no token (rejected), wrong user's token (rejected), insufficient scope (rejected). -- **Subscription delivery.** Open a subscription, perform the triggering mutation, verify the subscription receives the correct payload within a timeout. -- **Pagination edge cases.** Test: empty results, exactly one page, last page terminates correctly, cursor stability across inserts, invalid cursors return helpful errors. +- **Write → Read round trips.** Execute a mutation/write, then immediately query for the resource. Verify every field matches. This catches stale cache, serialization mismatches, and silent write failures. +- **Authorization on every endpoint.** For every read and write operation, test with: valid token (succeeds), no token (rejected), wrong user's token (rejected), insufficient scope (rejected). 
+- **Real-time/subscription delivery.** If the API supports subscriptions or push, open a listener, perform the triggering write, verify the listener receives the correct payload within a timeout. +- **Pagination edge cases.** Test: empty results, exactly one page, last page terminates correctly, cursor/offset stability across inserts, invalid cursors return helpful errors. -### Go Backend E2E +### Backend E2E -- **Use `testcontainers-go` for real databases.** Spin up real PostgreSQL (and Redis, Kafka, etc.) per test suite. No mocking the database in E2E — ever. -- **Use `httptest.Server` with the real application.** Tests hit real HTTP endpoints, which hit the real database. -- **Test migrations.** Run migrations from scratch on test suite startup. If migrations fail, the test fails. -- **Test concurrency.** Double-submit for idempotency verification. Two users editing the same resource for optimistic locking. Use `errgroup` to fire concurrent requests. +- **Use containerized or dedicated test databases.** Spin up real database instances per test suite. No mocking the database in E2E — ever. +- **Run the real application server with test configuration.** Tests hit real API endpoints, which hit the real database. +- **Test migrations.** Run database migrations from scratch on test suite startup. If migrations fail, the test fails. +- **Test concurrency.** Double-submit for idempotency verification. Two users editing the same resource for optimistic locking. Fire concurrent requests from the test. - **Assert beyond status codes.** Verify response body fields, database state, audit log entries, published events. -### Mobile / React Native E2E +### Frontend E2E (Web and Mobile) -- **Use Detox or Maestro** for React Native. Run against real simulators/emulators. -- **Test full navigation flows** — deep links, back button, tab switching with state preservation, modal dismissal. 
-- **Test offline/online transitions** — disable network, verify cached data displays and writes queue, re-enable, verify sync. -- **Use `testID` prop exclusively for selectors.** Never match on displayed text (changes with localization). -- **Test on at least two form factors.** Never hardcode device dimensions. +- **Run against the real backend** — not a mocked API layer. The frontend E2E test environment connects to a real API backed by a real (test) database. +- **Test full navigation flows** — deep links, back navigation, tab switching with state preservation, modal dismissal. +- **Test offline/online transitions** (mobile) — disable network, verify cached data displays and writes queue, re-enable, verify sync. +- **Use stable test-ID attributes exclusively for selectors.** Never match on displayed text (changes with localization), CSS classes, or DOM/view hierarchy. +- **Test on at least two form factors** (mobile). Never hardcode device dimensions. ### Agentic Directives -1. Before writing any E2E test, state the user workflow in a comment: `// Workflow: User creates a project, verifies it appears in the list, and can be accessed by direct URL.` -2. Every test MUST include a database/state assertion. If the test only asserts on HTTP status or UI text, it is incomplete. -3. Every test MUST create its own preconditions via API/DB. Never assume data exists from a previous test. +1. Before writing any E2E test, state the functional requirement in a comment: `// Functional requirement: User creates a project, verifies it appears in the list, and can access it by direct URL.` +2. Every test MUST include a database/state assertion. If the test only asserts on HTTP status or frontend text, it is incomplete. +3. Every test MUST create its own preconditions via API/database. Never assume data exists from a previous test. 4. Every test MUST clean up after itself. Prefer transactional cleanup. -5. For every mutation test, write a corresponding verification query. 
Create → verify exists. Update → verify changed. Delete → verify gone. -6. Test at least one error case per endpoint/workflow: invalid input, missing auth, forbidden access, not-found, duplicate/conflict. +5. For every write operation test, write a corresponding verification read. Create → verify exists. Update → verify changed. Delete → verify gone. +6. Test at least one error case per endpoint/functional requirement: invalid input, missing auth, forbidden access, not-found, duplicate/conflict. 7. Use deterministic waits, not sleeps. Poll for a condition with a timeout. -8. Use `data-testid` / `testID` for all element selection. If one doesn't exist, add it to the component. -9. Name tests as behavioral specifications: not `test("submit form")` but `test("submitting a valid order creates a confirmed order record with correct line items and total")`. +8. Use stable test-ID attributes for all element selection. If one doesn't exist, add it to the component. +9. Name tests as functional requirement specifications: not `test("submit form")` but `test("submitting a valid order creates a confirmed order record with correct line items and total")`. 10. When testing auth flows, always test both positive AND negative: valid credentials succeed AND invalid credentials fail with the correct error. -11. Never generate placeholder assertions. Every `expect`/`assert` must check a meaningful, specific value. +11. Never generate placeholder assertions. Every assertion must check a meaningful, specific value. 12. When unsure whether a test is thorough enough, it is not. Add more assertions. Verify at more layers. Test one more error case. --- @@ -813,8 +814,8 @@ All services MUST use structured logging. Structured logs are machine-parseable, ### Format & Fields -- **Emit all logs as JSON objects in production.** Never use `fmt.Println`, `console.log`, or unstructured string interpolation for application logs. 
-- **Every log line MUST include these baseline fields:** +- **Emit all logs as JSON objects in production.** Never use unstructured print statements or string interpolation for application logs. +- **Every log line MUST include these baseline fields.** Configure your logging library at initialization to automatically include `timestamp`, `service`, and `version` in every log entry — do not rely on developers adding these per-call. | Field | Format | Example | |-------|--------|---------| @@ -828,7 +829,8 @@ All services MUST use structured logging. Structured logs are machine-parseable, - WRONG: `logger.Info("User 1234 placed order 5678")` - RIGHT: `logger.Info("order placed", "user_id", "1234", "order_id", "5678")` - **Use `snake_case` for all field names.** Check existing logs in the codebase for canonical field names before inventing new ones. -- **Canonical field names:** `user_id`, `request_id`, `trace_id`, `span_id`, `duration_ms`, `http_method`, `http_path`, `http_status`, `error_message`, `error_stack`. +- **Canonical field names:** `user_id`, `request_id`, `correlation_id`, `causation_id`, `trace_id`, `span_id`, `duration_ms`, `http_method`, `http_path`, `http_status`, `error_message`, `error_stack`. +- **Relationship between correlation IDs:** `request_id` is assigned per inbound request. `correlation_id` tracks a logical operation across multiple services/events (often equal to the originating `request_id`). `causation_id` identifies the direct parent event or command that triggered the current action. `trace_id` and `span_id` are OpenTelemetry-specific and bridge logs to distributed traces. When CQRS domain events include `correlation_id`/`causation_id` in metadata, these MUST use the same field names and values as the logging context. ### Correlation & Tracing @@ -850,7 +852,7 @@ All services MUST use structured logging. Structured logs are machine-parseable, **Rules:** - A 404 for a missing resource is INFO, not ERROR. 
- Validation failures caused by user input are WARN at most, usually INFO. -- Every ERROR and FATAL log MUST include `error_message` and, where available, `error_stack`. +- Every ERROR and FATAL log MUST include `error_message` (string) and, where available, `error_stack` (string). If the logging library supports automatic error serialization (e.g., passing an error object that the library expands into `error_message` and `error_stack` fields), use that mechanism — but verify the emitted JSON contains the canonical field names. - If an error is caught and handled with a fallback, log at WARN, not ERROR. ### What to Log @@ -879,7 +881,7 @@ These rules are deterministic — apply them whenever writing or modifying code: 3. When handling an error, log with at minimum: `level=error`, `msg=`, `error_message`, and all context IDs (`request_id`, `user_id`) available in scope. 4. When calling an external service, log before (at DEBUG) and after (ERROR on failure with `duration_ms`, INFO/DEBUG on success with `duration_ms`). 5. When a function receives a `context.Context` (Go) or request object (Node), extract the logger from it. NEVER create a new bare logger inside a handler. -6. NEVER log any variable named or containing: `password`, `secret`, `token`, `key`, `authorization`, `cookie`, `ssn`, `credit_card`. When in doubt, omit it. +6. NEVER log any variable whose name exactly matches or ends with: `password`, `secret`, `token`, `api_key`, `private_key`, `secret_key`, `access_key`, `authorization`, `cookie`, `ssn`, `credit_card`, `card_number`. When in doubt, omit the field rather than risk leaking sensitive data. 7. When adding retry logic, log each retry at WARN with `attempt_number` and `max_retries`. 8. When choosing a log level, ask: "Will someone be paged?" → ERROR. "Degradation?" → WARN. "Normal operation?" → INFO. "Only useful debugging?" → DEBUG. 
@@ -905,6 +907,8 @@ These rules are deterministic — apply them whenever writing or modifying code: CQRS separates read and write models to allow independent optimization of each. These rules define when and how to apply CQRS across the organization. CQRS is a **per-use-case decision**, not an architectural mandate — some operations within a service may use CQRS while others use standard CRUD. +**CQRS is not Event Sourcing.** CQRS only requires separate read and write models. Event Sourcing (persisting state as a sequence of events rather than current state) is an orthogonal pattern that pairs well with CQRS but is NOT required. Do not conflate them. Apply Event Sourcing only when there is an explicit requirement for full audit trails, temporal queries, or event replay — not as a default. + ### When to Apply - **Use CQRS** when read and write workloads have significantly different scaling, performance, or modeling requirements. @@ -959,7 +963,7 @@ Mixing these conventions (e.g., a command named `OrderPlaced` or an event named - Events MUST be immutable. Once published, an event's schema and data MUST NOT change. To evolve, publish new event types and deprecate old ones. - Events SHOULD carry enough data for consumers to process them without querying back to the source ("fat" events over "thin" events). -- Every event MUST include: `event_id` (unique), `event_type`, `aggregate_id`, `aggregate_type`, `timestamp`, `version`/`sequence_number`, `payload`, `metadata` (`correlation_id`, `causation_id`, `user_id`). +- Every event MUST include: `event_id` (unique), `event_type`, `aggregate_id`, `aggregate_type`, `timestamp`, `version`/`sequence_number`, `payload`, `metadata` (`correlation_id`, `causation_id`, `user_id`). The `correlation_id` and `causation_id` in event metadata use the same field names and semantics defined in the Structured Logging section — propagate them from the request/command context that triggered the event. 
- Event handlers (projectors, reactors, sagas) MUST be idempotent. Use `sequence_number` or `event_id` to detect and skip already-processed events. - **Outbox pattern:** When a command handler writes to the database AND publishes events, use the transactional outbox pattern — write events to an outbox table in the same transaction as the aggregate, then publish from the outbox asynchronously. This prevents dual-write problems.