diff --git a/AGENTS.md b/AGENTS.md index af0a2d1e5b..4a92691a42 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -772,6 +772,7 @@ Skills provide specialized, detailed knowledge on specific topics. **Use them on - **[console-rendering](skills/console-rendering/SKILL.md)** - Struct tag-based console rendering system for CLI output - **[error-messages](skills/error-messages/SKILL.md)** - Error message style guide for validation errors - **[error-pattern-safety](skills/error-pattern-safety/SKILL.md)** - Safety guidelines for error pattern regex +- **[error-recovery-patterns](specs/error-recovery-patterns.md)** - Error handling patterns, recovery strategies, and debugging techniques ### JavaScript & GitHub Actions - **[github-script](skills/github-script/SKILL.md)** - Best practices for GitHub Actions scripts using github-script diff --git a/specs/error-recovery-patterns.md b/specs/error-recovery-patterns.md new file mode 100644 index 0000000000..15d0b4201d --- /dev/null +++ b/specs/error-recovery-patterns.md @@ -0,0 +1,1137 @@ +# Error Recovery Patterns + +This document provides comprehensive guidance for error handling, common error scenarios, recovery strategies, and debugging techniques in GitHub Agentic Workflows (gh-aw). + +## Table of Contents + +1. [Error Handling Patterns](#error-handling-patterns) +2. [Common Error Scenarios](#common-error-scenarios) +3. [Error Message Templates](#error-message-templates) +4. [Debugging Runbook](#debugging-runbook) +5. [Error Categorization](#error-categorization) + +--- + +## Error Handling Patterns + +### Console Formatting Requirements + +All user-facing error messages **must** use console formatting helpers from the `pkg/console` package. This ensures consistent styling and proper output routing. + +**✅ CORRECT - Use console formatting:** +```go +import "github.com/githubnext/gh-aw/pkg/console" + +// Error messages +fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error())) + +// Success messages +fmt.Fprintln(os.Stderr, console.FormatSuccessMessage("Workflow compiled successfully")) + +// Info messages +fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Processing workflow...")) + +// Warning messages +fmt.Fprintln(os.Stderr, console.FormatWarningMessage("File has uncommitted changes")) +``` + +**❌ INCORRECT - Plain error output:** +```go +// Don't use plain fmt.Println +fmt.Println("Error:", err) + +// Don't use plain fmt.Fprintf without console formatting +fmt.Fprintf(os.Stderr, "Error: %v\n", err) +``` + +**Key rules:** +- **ALWAYS** use `fmt.Fprintln(os.Stderr, ...)` or `fmt.Fprintf(os.Stderr, ...)` for CLI logging +- **NEVER** use `fmt.Println()` or `fmt.Printf()` directly - all output should go to stderr +- Use console formatting helpers with `os.Stderr` for consistent styling +- For simple messages without console formatting: `fmt.Fprintf(os.Stderr, "message\n")` +- **Exception**: JSON output goes to stdout, all other output to stderr + +**Available console formatters:** +```go +console.FormatSuccessMessage(msg) // ✓ prefix, green +console.FormatInfoMessage(msg) // ℹ prefix, blue +console.FormatWarningMessage(msg) // ⚠ prefix, yellow +console.FormatErrorMessage(msg) // ✗ prefix, red +console.FormatCommandMessage(cmd) // $ prefix, dim +console.FormatProgressMessage(msg) // ⋯ prefix +console.FormatPromptMessage(msg) // ? prefix +console.FormatCountMessage(msg) // # prefix +console.FormatVerboseMessage(msg) // dim text +console.FormatLocationMessage(loc) // @ prefix +``` + +**Error messages with suggestions:** +```go +suggestions := []string{ + "Check that the workflow file exists", + "Verify the YAML syntax is correct", + "Ensure required fields are present", +} +fmt.Fprintln(os.Stderr, console.FormatErrorWithSuggestions( + "Failed to compile workflow", + suggestions, +)) +``` + +### Error Wrapping with `%w` + +Use `%w` to wrap errors when preserving the error chain is important for downstream code. However, when presenting errors to users, break the chain to avoid exposing internal error types. + +**Internal error handling (preserve chain):** +```go +// When error needs to be checked with errors.Is or errors.As +func processConfig(file string) error { + data, err := os.ReadFile(file) + if err != nil { + // Wrap with %w to preserve error type for os.IsNotExist checks + return fmt.Errorf("failed to read config file: %w", err) + } + return nil +} +``` + +**User-facing error handling (break chain):** +```go +// When presenting error to user +func compileWorkflow(file string) error { + config, err := loadConfig(file) + if err != nil { + // Format error for user, then create new error (breaks chain) + msg := console.FormatErrorMessage(err.Error()) + fmt.Fprintln(os.Stderr, msg) + // Create new error without %w to prevent internal types from leaking + return fmt.Errorf("compilation failed") + } + return nil +} +``` + +**Why break the error chain for users?** + +Internal error types (like `yaml.TypeError`, `os.PathError`) are implementation details that should not leak to user-facing error messages. Breaking the chain with `errors.New()` or `fmt.Errorf()` (without `%w`) prevents downstream code from making assumptions about internal error types. + +See `pkg/workflow/error_wrapping_test.go` for comprehensive examples of error wrapping patterns. + +**Key rules:** +- Use `%w` for internal errors that need type checking +- Use `%s` or `errors.New()` for user-facing errors to break the chain +- Always preserve context (file names, field names) in error messages +- Never expose internal error types (`yaml.TypeError`, `os.PathError`, etc.) to users + +### Debug Logging Standards + +Use the logger package for debug logging. Debug logs are only shown when the `DEBUG` environment variable matches the logger's namespace. + +**Creating a logger:** +```go +import "github.com/githubnext/gh-aw/pkg/logger" + +// Use pkg:filename convention for namespace +var log = logger.New("workflow:compiler") +``` + +**Logging debug messages:** +```go +// Simple message +log.Print("Starting compilation") + +// Formatted message +log.Printf("Processing %d workflows", count) + +// Check if enabled before expensive operations +if log.Enabled() { + result := expensiveOperation() + log.Printf("Result: %+v", result) +} +``` + +**Enabling debug logs:** +```bash +# Enable all debug logs +DEBUG=* gh aw compile workflow.md + +# Enable specific package +DEBUG=workflow:* gh aw compile workflow.md + +# Enable multiple packages +DEBUG=workflow:*,cli:* gh aw compile workflow.md + +# Exclude specific loggers +DEBUG=*,-workflow:test gh aw compile workflow.md + +# Disable colors (auto-disabled when piping) +DEBUG_COLORS=0 DEBUG=* gh aw compile workflow.md +``` + +**Category naming convention:** +- Pattern: `pkg:filename` (e.g., `cli:compile_command`, `workflow:compiler`) +- Use colon (`:`) as separator between package and file/component name +- Be consistent with existing loggers in the codebase + +**Key features:** +- Zero overhead when disabled (checked at logger construction) +- Time diff shown between log calls (e.g., `+50ms`, `+2.5s`) +- Auto-colors for each namespace in terminals +- Pattern matching with wildcards and exclusions + +See `pkg/logger/README.md` for complete documentation. + +### Panic Recovery + +Panics should be rare and only used for programming errors detected at initialization time. Never panic in user-facing code paths. + +**When to panic:** +- ✅ Embedded resource loading failure (schemas, action pins) +- ✅ JSON unmarshaling of hardcoded data structures +- ✅ Test setup failures in test helpers +- ✅ Programmer errors in test code + +**When NOT to panic:** +- ❌ User input validation errors +- ❌ File I/O errors +- ❌ Network errors +- ❌ Configuration errors +- ❌ Any runtime error that users might encounter + +**Examples of appropriate panic usage:** +```go +// Embedded schema loading (happens once at startup) +//go:embed schemas/workflow.json +var workflowSchema string + +func init() { + if workflowSchema == "" { + panic("failed to load embedded workflow schema") + } +} + +// Test setup helper (test code only) +func mustCreateTempFile(t *testing.T, content string) string { + tmpDir := t.TempDir() + file := filepath.Join(tmpDir, "test.md") + if err := os.WriteFile(file, []byte(content), 0644); err != nil { + panic(fmt.Sprintf("test setup failed: %v", err)) + } + return file +} +``` + +**Panic examples in the codebase:** +- `pkg/workflow/action_pins.go:51` - Failed to load embedded action pins +- `pkg/workflow/permissions_validator.go:47` - Failed to parse embedded permissions JSON +- `pkg/testutil/tempdir.go:30` - Test directory creation failed + +--- + +## Common Error Scenarios + +This section covers common errors users encounter, their causes, and step-by-step recovery procedures. + +### MCP Configuration Errors + +#### Scenario: Missing Required Fields + +**Error:** +``` +✗ tool 'my-server' has invalid MCP configuration: tool 'my-server' mcp configuration must specify either 'command' or 'container' +``` + +**Cause:** MCP server configuration is missing the required execution method. + +**Resolution:** +```yaml +# Add either 'command' or 'container' field +tools: + my-server: + command: "npx @my/server" # For stdio-based MCP + # OR + container: "ghcr.io/my-org/server:latest" # For containerized MCP +``` + +**Related validation:** `pkg/workflow/mcp_config_validation.go` + +#### Scenario: HTTP MCP with Container Field + +**Error:** +``` +✗ tool 'http-server' validation failed: http MCP servers cannot use 'container' field +``` + +**Cause:** HTTP-type MCP servers don't support container execution - they connect to existing HTTP endpoints. + +**Resolution:** +```yaml +# Remove 'container' field for HTTP servers +tools: + http-server: + type: http + url: "https://api.example.com" + # Remove: container: "..." +``` + +**Related validation:** `pkg/workflow/mcp_config_validation.go` + +#### Scenario: MCP Type Inference Confusion + +**Error:** +``` +✗ tool 'my-server' validation failed: missing required field 'url' for http type +``` + +**Cause:** The validator inferred HTTP type from the configuration, but the URL is missing. + +**Resolution:** + +Either add the URL for HTTP type: +```yaml +tools: + my-server: + type: http + url: "https://api.example.com" +``` + +Or switch to stdio type: +```yaml +tools: + my-server: + command: "node server.js" +``` + +**Debug steps:** +1. Enable debug logging: `DEBUG=workflow:mcp_config_validation gh aw compile workflow.md` +2. Check which type was inferred +3. Add explicit `type: stdio` or `type: http` to avoid confusion + +### Permission Validation Errors + +#### Scenario: Write Permissions in Strict Mode + +**Error:** +``` +✗ strict mode: write permission 'contents: write' is not allowed for security reasons +``` + +**Cause:** Strict mode (`--strict` flag) refuses write permissions to prevent security risks. + +**Resolution:** + +Use safe outputs instead of direct write permissions: +```yaml +# Remove write permissions +permissions: + contents: read # Change write to read + +# Use safe outputs for write operations +safe-outputs: + create-issue: + title: "Result" + body: "Workflow completed" +``` + +**Available safe output operations:** +- `create-issue` - Create GitHub issues +- `create-pull-request` - Create pull requests +- `add-comment` - Add comments to issues/PRs +- `update-issue` - Update existing issues + +**Related validation:** `pkg/workflow/strict_mode_validation.go` + +#### Scenario: Missing Permissions for GitHub Toolset + +**Error:** +``` +✗ validation failed: GitHub tool requires additional permissions: issues: write, pull-requests: write +``` + +**Cause:** The GitHub MCP toolset requires specific permissions that aren't granted. + +**Resolution:** +```yaml +# Add required permissions +permissions: + contents: read + issues: write # Add this + pull-requests: write # Add this + +tools: + github: + mode: remote + toolsets: [issues, pull_requests] +``` + +**Debug steps:** +1. Check which toolsets are enabled +2. Review required permissions in error message +3. Add missing permissions to `permissions:` section + +**Related validation:** `pkg/workflow/permissions_validator.go` + +### Network/Firewall Errors + +#### Scenario: Wildcard Network Access in Strict Mode + +**Error:** +``` +✗ strict mode: wildcard '*' is not allowed in network.allowed domains to prevent unrestricted internet access +``` + +**Cause:** Strict mode requires explicit domain allowlisting instead of wildcard access. + +**Resolution:** +```yaml +# Replace wildcard with explicit domains +network: + allowed: + - "api.github.com" + - "registry.npmjs.org" + # Or use ecosystem identifiers + - "python" # Allows PyPI and related domains + - "node" # Allows npm registry +``` + +**Available ecosystem identifiers:** +- `python` - PyPI, python.org +- `node` - npm registry +- `containers` - Docker Hub, GitHub Container Registry +- `go` - pkg.go.dev, proxy.golang.org + +**Related validation:** `pkg/workflow/strict_mode_validation.go` + +#### Scenario: Network Access Required but Not Configured + +**Error:** +``` +✗ MCP server 'my-server' requires network access but network configuration is missing +``` + +**Cause:** Custom MCP server needs network access, but no network permissions are configured. + +**Resolution:** +```yaml +# Add network configuration +network: + allowed: + - "api.example.com" # Add required domains + +tools: + my-server: + command: "node server.js" +``` + +**Debug steps:** +1. Identify which domains the MCP server needs to access +2. Add those domains to `network.allowed` +3. Test with `--strict` flag to ensure configuration is correct + +### Workflow Compilation Errors + +#### Scenario: Invalid YAML Syntax in Frontmatter + +**Error:** +``` +✗ failed to parse workflow frontmatter: invalid YAML syntax +``` + +**Cause:** The YAML frontmatter contains syntax errors. + +**Resolution:** + +1. Check for common YAML syntax errors: + - Incorrect indentation (must use spaces, not tabs) + - Missing colons after keys + - Unquoted strings with special characters + - Mismatched brackets/braces + +2. Use a YAML validator to check syntax: + ```bash + # Extract frontmatter and validate + head -n 20 workflow.md | grep -v '^---$' | yamllint - + ``` + +3. Common fixes: + ```yaml + # ❌ WRONG - tabs for indentation + tools: + github: + mode: remote + + # ✅ CORRECT - spaces for indentation + tools: + github: + mode: remote + + # ❌ WRONG - missing colon + tools + github: + mode: remote + + # ✅ CORRECT - colon after key + tools: + github: + mode: remote + ``` + +**Related validation:** `pkg/parser/frontmatter.go` + +#### Scenario: YAML 1.1 vs 1.2 Issues + +**Error:** +``` +✗ unexpected value for 'on' field: true (boolean) +``` + +**Cause:** GitHub Actions uses YAML 1.1, where `on` is parsed as boolean `true` instead of string "on". + +**Resolution:** + +Quote the `on` keyword: +```yaml +# ❌ WRONG - YAML 1.1 parses 'on' as boolean +on: + issues: + types: [opened] + +# ✅ CORRECT - quote 'on' to force string interpretation +"on": + issues: + types: [opened] +``` + +**Other YAML 1.1 keywords to quote:** +- `yes`, `no` (parsed as booleans) +- `on`, `off` (parsed as booleans) +- Numbers starting with 0 (parsed as octal) + +See `specs/yaml-version-gotchas.md` for complete guide. + +#### Scenario: Expression Size Limit Exceeded + +**Error:** +``` +✗ GitHub Actions expression exceeds size limit: 21000/20000 bytes +``` + +**Cause:** A single GitHub Actions expression is too large (limit is 20KB per expression). + +**Resolution:** + +1. Split large expressions into multiple steps: + ```yaml + # ❌ WRONG - one huge expression + - run: | + ${{ very.long.expression.that.exceeds.20KB }} + + # ✅ CORRECT - split into multiple steps + - name: Part 1 + run: ${{ expression.part1 }} + - name: Part 2 + run: ${{ expression.part2 }} + ``` + +2. Use environment variables to break up complex logic: + ```yaml + - env: + DATA: ${{ toJSON(github.event) }} + run: | + echo "$DATA" | jq '.issue.title' + ``` + +**Related validation:** `pkg/workflow/validation.go:validateExpressionSizes()` + +#### Scenario: Invalid Engine Specified + +**Error:** +``` +✗ invalid engine: 'chatgpt'. Valid engines are: copilot, claude, codex, custom +``` + +**Cause:** The specified engine is not supported. + +**Resolution:** +```yaml +# Use a valid engine +engine: copilot # or claude, codex, custom +``` + +**Available engines:** +- `copilot` - GitHub Copilot (default) +- `claude` - Anthropic Claude +- `codex` - OpenAI Codex +- `custom` - Custom engine configuration + +**Related validation:** `pkg/workflow/engine_validation.go` + +--- + +## Error Message Templates + +This section provides templates for writing clear, actionable error messages. Follow these templates when adding new validation or error handling. + +### Validation Error Template + +**Pattern:** `[what's wrong]. [what's expected]. [example of correct usage]` + +**Template:** +```go +return fmt.Errorf( + "invalid %s: %s. Expected %s. Example: %s", + fieldName, actualValue, expectedFormat, exampleUsage, +) +``` + +**Example:** +```go +return fmt.Errorf( + "invalid engine: %s. Valid engines are: copilot, claude, codex, custom. Example: engine: copilot", + engineID, +) +``` + +**Key elements:** +- State what's wrong (invalid value, missing field, wrong type) +- Explain what's expected (format, valid values, type) +- Provide concrete example of correct usage + +See `skills/error-messages/SKILL.md` for comprehensive style guide. + +### Runtime Error Template + +**Pattern:** `[operation failed]. [context]. [suggestion or next step]` + +**Template:** +```go +return fmt.Errorf( + "failed to %s: %s. Check that %s", + operation, err, suggestion, +) +``` + +**Example:** +```go +return fmt.Errorf( + "failed to read workflow file: %s. Check that the file exists and is readable", + err, +) +``` + +**Key elements:** +- Describe the operation that failed +- Include error context (which file, which step) +- Suggest how to fix or what to check + +### User-Actionable Error Template + +**Pattern:** `[error description]. [why it matters]. [how to fix]` + +**Template:** +```go +return fmt.Errorf( + "%s. This is required because %s. To fix: %s", + errorDescription, reason, fixInstructions, +) +``` + +**Example:** +```go +return fmt.Errorf( + "GitHub tool requires 'issues: write' permission. This is required because the workflow creates issues. To fix: add 'issues: write' to the permissions section", +) +``` + +**Key elements:** +- Clear error description +- Explain why it's a problem (security, correctness, compatibility) +- Provide step-by-step fix instructions + +### System/Internal Error Template + +**Pattern:** `[internal error]. [what user should do]` + +**Template:** +```go +return fmt.Errorf( + "internal error: %s. Please report this issue at https://github.com/githubnext/gh-aw/issues", + err, +) +``` + +**Example:** +```go +return fmt.Errorf( + "internal error: failed to load embedded schema. Please report this issue at https://github.com/githubnext/gh-aw/issues", +) +``` + +**Key elements:** +- Mark as internal error to distinguish from user errors +- Provide issue reporting link +- Don't expose implementation details to users + +--- + +## Debugging Runbook + +This section provides step-by-step debugging procedures for different error categories. + +### Enable DEBUG Logging + +Debug logging provides detailed information about what gh-aw is doing internally. + +**Enable all debug logs:** +```bash +DEBUG=* gh aw compile workflow.md 2>&1 | tee debug.log +``` + +**Enable specific packages:** +```bash +# Workflow compilation +DEBUG=workflow:* gh aw compile workflow.md + +# CLI commands +DEBUG=cli:* gh aw audit 123456 + +# Parser operations +DEBUG=parser:* gh aw compile workflow.md + +# Multiple packages +DEBUG=workflow:*,cli:* gh aw compile workflow.md +``` + +**Useful debug patterns:** +```bash +# Everything except tests +DEBUG=*,-*:test gh aw compile workflow.md + +# MCP-related only +DEBUG=*mcp* gh aw compile workflow.md + +# Validation-related only +DEBUG=*validation* gh aw compile workflow.md +``` + +**Debug output includes:** +- Namespace (e.g., `workflow:compiler`) +- Log message +- Time elapsed since last log (e.g., `+125ms`) + +**Tips:** +- Pipe to `tee` to save logs while viewing them: `DEBUG=* gh aw compile workflow.md 2>&1 | tee debug.log` +- Use `DEBUG_COLORS=0` when piping to files to remove color codes +- Look for validation log messages (e.g., `workflow:mcp_config_validation`) + +### Reading Error Chains with errors.As/errors.Is + +While user-facing errors should have broken chains, internal code may need to check error types. + +**Check for specific error type (errors.As):** +```go +var pathErr *os.PathError +if errors.As(err, &pathErr) { + // Handle file not found specifically + if pathErr.Err == os.ErrNotExist { + return fmt.Errorf("workflow file not found: %s", pathErr.Path) + } +} +``` + +**Check for sentinel error (errors.Is):** +```go +if errors.Is(err, os.ErrNotExist) { + return fmt.Errorf("file does not exist") +} +``` + +**When to use:** +- `errors.Is` - Check if error is or wraps a specific sentinel error +- `errors.As` - Check if error is or wraps a specific error type + +**Important:** User-facing errors in gh-aw intentionally break the error chain to prevent internal types from leaking. See `pkg/workflow/error_wrapping_test.go` for examples. + +### Analyzing Validation Failures + +When validation fails, follow this debugging procedure: + +**1. Identify the validation function:** +```bash +# Enable validation debug logs +DEBUG=*validation* gh aw compile workflow.md +``` + +Look for log messages like: +- `workflow:mcp_config_validation Validating MCP configurations for 3 tools` +- `workflow:strict_mode_validation Write permission validation failed: scope=contents` + +**2. Check the error message:** + +Error messages follow the pattern: `[what's wrong]. [what's expected]. [example]` + +Example: `invalid engine: 'chatgpt'. Valid engines are: copilot, claude, codex, custom. Example: engine: copilot` + +**3. Locate the validation code:** + +Based on error message or debug logs: +- MCP configuration → `pkg/workflow/mcp_config_validation.go` +- Strict mode → `pkg/workflow/strict_mode_validation.go` +- Permissions → `pkg/workflow/permissions_validator.go` +- Network → `pkg/workflow/firewall_validation.go` +- General → `pkg/workflow/validation.go` + +**4. Reproduce with minimal example:** + +Create a minimal workflow that triggers the error: +```yaml +--- +engine: copilot +tools: + github: + mode: invalid # Test invalid value +on: + issues: + types: [opened] +--- +# Minimal workflow +``` + +**5. Check validation logic:** + +Read the validation function to understand: +- What's being validated +- What values are allowed +- What the correct configuration should be + +**6. Fix and verify:** + +Update the workflow and recompile: +```bash +# Compile with debug logs to verify fix +DEBUG=*validation* gh aw compile workflow.md +``` + +### Troubleshooting MCP Server Issues + +MCP server issues are common. Follow this systematic approach: + +**1. Enable MCP debug logs:** +```bash +DEBUG=*mcp* gh aw compile workflow.md +``` + +**2. Check MCP configuration validation:** + +Common issues: +- Missing required fields (`command` or `container`) +- Wrong type inferred (stdio vs http) +- Invalid field combinations (e.g., http with container) + +**3. Verify MCP server execution:** + +For stdio MCP servers: +```bash +# Test command execution +npx @my/server --stdio + +# Check command exists +which npx +``` + +For containerized MCP servers: +```bash +# Test container image +docker pull ghcr.io/my-org/server:latest + +# Check Docker is available +docker ps +``` + +For HTTP MCP servers: +```bash +# Test HTTP endpoint +curl https://api.example.com/mcp/v1 + +# Check network access +DEBUG=*firewall* gh aw compile workflow.md +``` + +**4. Check network configuration:** + +If MCP server needs network access: +```yaml +network: + allowed: + - "api.example.com" # Add required domains + +tools: + my-server: + command: "node server.js" +``` + +**5. Verify permissions:** + +Some MCP servers require specific GitHub permissions: +```yaml +permissions: + contents: read + issues: write # If MCP server creates issues + +tools: + github: + mode: remote + toolsets: [issues] +``` + +**6. Common MCP error patterns:** + +| Error | Cause | Fix | +|-------|-------|-----| +| Missing command/container | No execution method | Add `command` or `container` field | +| HTTP with container | Invalid combination | Remove `container`, use `url` only | +| Network access required | Missing network config | Add `network.allowed` domains | +| Type inference wrong | Ambiguous configuration | Add explicit `type: stdio` or `type: http` | + +**7. Test with minimal configuration:** + +Start with simplest possible MCP config: +```yaml +tools: + test-server: + command: "echo 'test'" # Minimal stdio server +``` + +Then gradually add complexity: +```yaml +tools: + test-server: + command: "node server.js" + args: ["--port", "3000"] + env: + API_KEY: "${{ secrets.API_KEY }}" +``` + +**Related files:** +- `pkg/workflow/mcp_config_validation.go` - MCP validation logic +- `pkg/workflow/firewall_validation.go` - Network validation +- `pkg/workflow/tools.go` - Tool configuration handling + +--- + +## Error Categorization + +Errors in gh-aw fall into four main categories, each requiring different handling approaches. + +### User Errors + +**Definition:** Errors caused by incorrect user input or workflow configuration that users can fix themselves. + +**Examples:** +- Invalid YAML syntax in frontmatter +- Typos in engine names +- Missing required fields +- Invalid permission levels +- Wrong field types + +**Error message pattern:** +``` +✗ [what's wrong]. [what's expected]. [example of correct usage] +``` + +**Handling approach:** +- Provide clear, actionable error message +- Include example of correct usage +- Reference relevant documentation +- Use `console.FormatErrorMessage()` for output + +**Example:** +```go +return fmt.Errorf( + "invalid engine: %s. Valid engines are: copilot, claude, codex, custom. Example: engine: copilot", + engineID, +) +``` + +**Recovery:** User fixes the configuration and retries. + +### Configuration Errors + +**Definition:** Errors caused by incompatible or invalid workflow configuration that requires workflow changes. + +**Examples:** +- Write permissions in strict mode +- Wildcard network access in strict mode +- MCP server with missing network configuration +- Tool configuration without required permissions +- Expression size limit exceeded + +**Error message pattern:** +``` +✗ [what's wrong]. [why it matters]. [how to fix with configuration change] +``` + +**Handling approach:** +- Explain why the configuration is invalid +- Describe security or correctness implications +- Provide specific configuration changes needed +- Link to relevant documentation sections + +**Example:** +```go +return fmt.Errorf( + "strict mode: write permission 'contents: write' is not allowed for security reasons. Use 'safe-outputs.create-issue' or 'safe-outputs.create-pull-request' to perform write operations safely. See: https://githubnext.github.io/gh-aw/reference/safe-outputs/", +) +``` + +**Recovery:** User updates workflow configuration and recompiles. + +### System Errors + +**Definition:** Errors caused by environment or system issues that users need to fix in their environment. + +**Examples:** +- Docker not installed or not running +- File not found (missing workflow file) +- Network connectivity issues +- Insufficient disk space +- Permission denied (file system) + +**Error message pattern:** +``` +✗ [operation failed]: [system error]. Check that [environment requirement] +``` + +**Handling approach:** +- Describe what operation failed +- Include system error details +- Suggest environment checks +- Provide recovery steps + +**Example:** +```go +return fmt.Errorf( + "failed to validate Docker image: %s. Check that Docker is installed and running", + err, +) +``` + +**Recovery:** User fixes environment issue (install Docker, fix permissions, etc.) and retries. + +### Internal Errors + +**Definition:** Errors caused by bugs in gh-aw code that should be reported to maintainers. + +**Examples:** +- Failed to load embedded schemas +- Unexpected nil pointer +- Invalid state in compiler +- Schema validation bug +- Panic in production code path + +**Error message pattern:** +``` +✗ internal error: [brief description]. Please report this issue at https://github.com/githubnext/gh-aw/issues +``` + +**Handling approach:** +- Mark as internal error to distinguish from user errors +- Don't expose implementation details +- Provide issue reporting link +- Include minimal context for debugging (but not sensitive data) + +**Example:** +```go +return fmt.Errorf( + "internal error: failed to load embedded schema. Please report this issue at https://github.com/githubnext/gh-aw/issues", +) +``` + +**Recovery:** User reports issue with details. Maintainers fix bug in next release. + +### Error Category Decision Tree + +```mermaid +graph TD + A[Error Occurred] --> B{Who caused it?} + B -->|User| C{Can user fix it?} + B -->|System| D{Environment issue?} + B -->|gh-aw bug| E[Internal Error] + + C -->|Yes, in workflow| F[User Error] + C -->|Yes, but needs workflow change| G[Configuration Error] + + D -->|Yes| H[System Error] + D -->|No| E + + F --> I[Clear message + example] + G --> J[Explain implications + fix] + H --> K[Check environment + recovery] + E --> L[Report issue link] +``` + +### Error Handling Summary Table + +| Category | Output | Example Fix | Documentation | +|----------|--------|-------------|---------------| +| **User Error** | ✗ + clear message + example | Fix typo, add missing field | Link to reference docs | +| **Configuration Error** | ✗ + reason + how to reconfigure | Change permissions, update network | Link to guides | +| **System Error** | ✗ + check environment | Install Docker, fix permissions | Link to setup docs | +| **Internal Error** | ✗ + report issue | Report to GitHub | Link to issue tracker | + +--- + +## Additional Resources + +### Related Documentation + +- **[Error Message Style Guide](../skills/error-messages/SKILL.md)** - Comprehensive guide for writing validation error messages +- **[Console Formatting](../AGENTS.md#console-message-formatting)** - Console formatting requirements and helpers +- **[Debug Logging](../pkg/logger/README.md)** - Logger package documentation with DEBUG environment variable syntax +- **[Validation Architecture](validation-architecture.md)** - Overview of validation system organization +- **[GitHub Actions Security](github-actions-security-best-practices.md)** - Security best practices for error handling + +### Validation Files Reference + +| Domain | File | Purpose | +|--------|------|---------| +| **General** | `pkg/workflow/validation.go` | Cross-cutting validation concerns | +| **Strict Mode** | `pkg/workflow/strict_mode_validation.go` | Security policy enforcement | +| **MCP Config** | `pkg/workflow/mcp_config_validation.go` | MCP server configuration validation | +| **Permissions** | `pkg/workflow/permissions_validator.go` | GitHub permissions validation | +| **Network** | `pkg/workflow/firewall_validation.go` | Network access validation | +| **Python/pip** | `pkg/workflow/pip_validation.go` | Python package validation | +| **Node.js/npm** | `pkg/workflow/npm_validation.go` | NPM package validation | +| **Engine** | `pkg/workflow/engine_validation.go` | AI engine validation | + +### Testing Error Handling + +When adding new error handling: + +1. **Add validation tests** - Test both valid and invalid inputs +2. **Test error messages** - Verify error contains expected information +3. **Test error chains** - Ensure internal types don't leak (see `error_wrapping_test.go`) +4. **Add debug logging** - Help future debugging efforts + +**Example test:** +```go +func TestValidationErrorMessage(t *testing.T) { + err := validateEngine("invalid") + require.Error(t, err) + + // Error should explain what's wrong + assert.Contains(t, err.Error(), "invalid engine") + + // Error should list valid options + assert.Contains(t, err.Error(), "Valid engines are:") + + // Error should include example + assert.Contains(t, err.Error(), "Example:") +} +``` + +See `specs/testing.md` for complete testing guidelines. + +--- + +**Last Updated:** 2026-01-07