From 5c93a136c384bb670d6a13abc5118e60fd7affb1 Mon Sep 17 00:00:00 2001
From: Landon Cox
Date: Fri, 20 Mar 2026 09:18:29 -0700
Subject: [PATCH 1/3] docs: add GHE Cloud data residency debugging guide

Add docs/troubleshooting/debug-ghe.md with step-by-step setup and debugging instructions for running agentic workflows on GHE Cloud with data residency (*.ghe.com). Content sourced from the debugging discussion in #18480.

Cross-references added from:
- troubleshooting/common-issues.md (GHES section)
- reference/engines.md (api-target section)
- setup/cli.md (GHES support section)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/src/content/docs/reference/engines.md | 2 +
 docs/src/content/docs/setup/cli.md | 2 +
 .../docs/troubleshooting/common-issues.md | 3 +
 .../content/docs/troubleshooting/debug-ghe.md | 208 ++++++++++++++++++
 4 files changed, 215 insertions(+)
 create mode 100644 docs/src/content/docs/troubleshooting/debug-ghe.md

diff --git a/docs/src/content/docs/reference/engines.md b/docs/src/content/docs/reference/engines.md
index b9f27d1806..ece2083034 100644
--- a/docs/src/content/docs/reference/engines.md
+++ b/docs/src/content/docs/reference/engines.md
@@ -96,6 +96,8 @@ Environment variables can also be defined at workflow, job, step, and other scop

 The `api-target` field specifies a custom API endpoint hostname for the agentic engine. Use this when running workflows against GitHub Enterprise Cloud (GHEC), GitHub Enterprise Server (GHES), or any custom AI endpoint.

+For a complete setup and debugging walkthrough for GHE Cloud with data residency, see [Debugging GHE Cloud with Data Residency](/gh-aw/troubleshooting/debug-ghe/).
+
 ```yaml wrap
 engine:
   id: copilot
diff --git a/docs/src/content/docs/setup/cli.md b/docs/src/content/docs/setup/cli.md
index 865bcb3c9e..5e9db7b497 100644
--- a/docs/src/content/docs/setup/cli.md
+++ b/docs/src/content/docs/setup/cli.md
@@ -75,6 +75,8 @@ gh auth login --hostname github.enterprise.com # Authenticate
 gh aw logs workflow --repo github.enterprise.com/owner/repo # Use with commands
 ```

+For GHE Cloud with data residency (`*.ghe.com`), see the dedicated [Debugging GHE Cloud guide](/gh-aw/troubleshooting/debug-ghe/) for setup and troubleshooting steps.
+
 Commands that support `--create-pull-request` (such as `gh aw add`, `gh aw add-wizard`, `gh aw init`, `gh aw update`, and `gh aw upgrade`) automatically detect the enterprise host from the git remote and route PR creation to the correct GHES instance. No extra flags are needed.

 `gh aw audit` and `gh aw add-wizard` also auto-detect the GHES host from the git remote, so running them inside a GHES repository works without setting `GH_HOST` manually.
diff --git a/docs/src/content/docs/troubleshooting/common-issues.md b/docs/src/content/docs/troubleshooting/common-issues.md
index 9248a87254..b5ff646d56 100644
--- a/docs/src/content/docs/troubleshooting/common-issues.md
+++ b/docs/src/content/docs/troubleshooting/common-issues.md
@@ -307,6 +307,9 @@ If this command fails, the account associated with the token does not have a val

 ## GitHub Enterprise Server Issues

+> [!TIP]
+> For a complete walkthrough of setting up and debugging workflows on **GHE Cloud with data residency** (`*.ghe.com`), see [Debugging GHE Cloud with Data Residency](/gh-aw/troubleshooting/debug-ghe/).
+
 ### Copilot Engine Prerequisites on GHES

 Before running Copilot-based workflows on GHES, verify the following:
diff --git a/docs/src/content/docs/troubleshooting/debug-ghe.md b/docs/src/content/docs/troubleshooting/debug-ghe.md
new file mode 100644
index 0000000000..5276cc465a
--- /dev/null
+++ b/docs/src/content/docs/troubleshooting/debug-ghe.md
@@ -0,0 +1,208 @@
+---
+title: Debugging GHE Cloud with Data Residency
+description: Step-by-step guide for setting up and debugging agentic workflows on GitHub Enterprise Cloud with data residency (*.ghe.com).
+sidebar:
+  order: 300
+---
+
+This guide walks you through setting up and running agentic workflows on **GitHub Enterprise Cloud with data residency** (`*.ghe.com`). It reflects the configuration needed as of `gh aw` **v0.61.1+** for enterprises using data residency in the EU or other regions.
+
+> [!TIP]
+> The one thing you must do differently from github.com is set `api-target` in your workflow frontmatter to `copilot-api..ghe.com`. Everything else flows from that.
+
+Based on the debugging discussion in [github/gh-aw#18480](https://github.com/github/gh-aw/issues/18480).
+
+## Prerequisites
+
+- A repository on your GHE Cloud data residency instance (e.g., `yourorg.ghe.com`)
+- The `gh aw` CLI extension **v0.61.1 or later** (`gh extension install github/gh-aw`)
+- Copilot enabled for your enterprise
+- The `gh` CLI authenticated with your GHE instance:
+  ```bash
+  gh auth login --hostname yourorg.ghe.com
+  ```
+
+## Setup
+
+### Step 1: Initialize Your Repository
+
+```bash
+GH_HOST=yourorg.ghe.com gh aw init
+```
+
+### Step 2: Add a Workflow
+
+```bash
+GH_HOST=yourorg.ghe.com gh aw add-wizard githubnext/agentics/repo-assist
+```
+
+Follow the prompts to configure the workflow for your repository.
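The setup commands above repeat the `GH_HOST=` prefix on every call. As a convenience sketch, the host can be exported once per shell session instead. `copilot_api_target` is a hypothetical helper, not part of `gh aw`; it only mirrors the `copilot-api.` prefix convention this guide describes for `api-target`, and `yourorg.ghe.com` is the guide's placeholder host.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder host from this guide; substitute your own instance.
export GH_HOST="yourorg.ghe.com"

# Hypothetical helper: prefix the host with "copilot-api." to get the
# value this guide uses for `api-target`.
copilot_api_target() {
  printf 'copilot-api.%s\n' "$1"
}

echo "gh commands in this shell will target: $GH_HOST"
echo "api-target for the workflow frontmatter: $(copilot_api_target "$GH_HOST")"
```

With `GH_HOST` exported, later `gh aw` calls in the same shell no longer need the per-command prefix.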
+
+### Step 3: Configure the Engine for GHE (Critical)
+
+Open the generated workflow `.md` file (e.g., `.github/workflows/repo-assist.md`) and ensure the `engine` section in the YAML frontmatter includes `api-target` pointing to your enterprise's Copilot API subdomain:
+
+```aw
+engine:
+  id: "copilot"
+  api-target: "copilot-api.yourorg.ghe.com"
+```
+
+Replace `yourorg` with your enterprise's slug — the subdomain portion of `yourorg.ghe.com`.
+
+**Why this is required**: On GHE Cloud with data residency, Copilot inference runs on a dedicated subdomain (`copilot-api.yourorg.ghe.com`) rather than the default `api.githubcopilot.com`. Without `api-target`, the AWF api-proxy routes requests to the wrong host, resulting in authentication failures.
+
+See [Enterprise API Endpoint](/gh-aw/reference/engines/#enterprise-api-endpoint-api-target) for full `api-target` documentation.
+
+### Step 4: Compile
+
+```bash
+GH_HOST=yourorg.ghe.com gh aw compile repo-assist
+```
+
+The compiler (v0.61.1+) will automatically:
+- Add your GHE domains (`api.yourorg.ghe.com`, `copilot-api.yourorg.ghe.com`) to the firewall allow-list
+- Set `--copilot-api-target` for the AWF api-proxy
+- Configure `GH_HOST` so the `gh` CLI targets the correct host
+
+### Step 5: Commit, Push, and Run
+
+```bash
+git add .github/workflows/repo-assist.md .github/workflows/repo-assist.lock.yml
+git commit -m "Add repo-assist agentic workflow"
+git push
+
+# Dispatch the workflow
+GH_HOST=yourorg.ghe.com gh workflow run repo-assist.lock.yml --ref main
+```
+
+## Troubleshooting
+
+If the workflow fails, start by using the Copilot CLI to help diagnose the issue.
+
+### Debugging with Copilot CLI Locally
+
+The fastest way to diagnose failures is to use the Copilot CLI interactively from your local machine. This lets you confirm Copilot can authenticate against your GHE instance and then use Copilot itself to help debug workflow failures.
+
+1. **Ensure you're authenticated with your GHE instance**:
+   ```bash
+   GH_HOST=yourorg.ghe.com gh auth status
+   ```
+
+2. **Launch the Copilot CLI**:
+   ```bash
+   GH_HOST=yourorg.ghe.com copilot
+   ```
+
+3. **Select the agentic-workflows agent** — when Copilot starts, run `/agent` and choose `agentic-workflows` from the list.
+
+4. **Ask Copilot to run and debug the workflow** — trigger the workflow, wait for it to complete, and then ask Copilot to analyze the results. For example:
+   ```
+   Run the repo-assist workflow and check if it succeeds.
+   If it fails, help me debug the failure.
+   ```
+
+Copilot has access to your workflow files, run logs, and the `gh aw audit` tool, so it can inspect failures end-to-end and suggest fixes.
+
+### Common Errors
+
+#### "Authentication failed"
+
+```
+Error: Authentication failed
+Your GitHub token may be invalid, expired, or lacking the required permissions.
+```
+
+**Cause**: The `api-target` is missing or incorrect. The api-proxy is sending Copilot requests to the wrong endpoint.
+
+**Fix**: Verify your `.md` frontmatter has:
+
+```aw
+engine:
+  id: "copilot"
+  api-target: "copilot-api.yourorg.ghe.com"
+```
+
+Then recompile with `GH_HOST=yourorg.ghe.com gh aw compile`.
+
+#### "none of the git remotes point to a known GitHub host"
+
+**Cause**: `GH_HOST` is not set. The `gh` CLI doesn't recognize your GHE instance as a GitHub host.
+
+**Fix**: Upgrade to `gh aw` v0.61.1+ and recompile. The compiler now auto-configures `GH_HOST` for GHE instances.
+
+#### "Not Found" during checkout steps
+
+**Cause**: The lock file is trying to access `github.com` repositories with your GHE-scoped token. This can happen with local builds of the compiler that use `actions/checkout` instead of the published `github/gh-aw-actions` action reference.
+
+**Fix**: Always compile with the installed `gh aw` extension rather than a local binary:
+
+```bash
+GH_HOST=yourorg.ghe.com gh aw compile
+```
+
+See also [Copilot GHES: Common Error Messages](/gh-aw/troubleshooting/common-issues/#copilot-ghes-common-error-messages) for additional error patterns.
+
+### Advanced: Testing Copilot on the Runner Directly
+
+If you need to verify that Copilot auth works on the Actions runner itself (outside the AWF sandbox), add a temporary diagnostic step to the lock file before the Execute step:
+
+```yaml
+- name: Test Copilot CLI directly
+  env:
+    GH_HOST: yourorg.ghe.com
+    GH_TOKEN: ${{ github.token }}
+  run: |
+    echo "GH_HOST=$GH_HOST"
+    echo "GITHUB_SERVER_URL=$GITHUB_SERVER_URL"
+    /usr/local/bin/copilot --version
+    /usr/local/bin/copilot --prompt "Say hello" --log-level all 2>&1 | head -50
+```
+
+If this step succeeds but the Execute step fails, the problem is in the firewall or api-proxy configuration, not in Copilot auth.
+
+### Advanced: Capturing HTTP Traffic
+
+To see exactly which hosts the Copilot CLI contacts, add these environment variables to the Execute step:
+
+```yaml
+env:
+  NODE_DEBUG: fetch,undici
+  UNDICI_DEBUG: full
+```
+
+> [!IMPORTANT]
+> The Copilot CLI uses Node.js `fetch()`/`undici` internally, not the built-in `http`/`https` modules. Setting `NODE_DEBUG=http,https` will capture nothing. You must use `UNDICI_DEBUG=full`.
+
+The traffic capture reveals the four domains the CLI uses on data residency:
+
+| Domain | Purpose |
+|--------|---------|
+| `api.yourorg.ghe.com` | REST API, Copilot auth (`/copilot_internal/user`) |
+| `copilot-api.yourorg.ghe.com` | Inference, model listing, MCP |
+| `copilot-telemetry-service.yourorg.ghe.com` | Telemetry |
+| `api.githubcopilot.com` | Shared Copilot services |
+
+### Advanced: Checking Firewall Logs
+
+Download the workflow run artifacts and inspect `sandbox/firewall/logs/access.log`. Each line shows whether a domain was allowed (`TCP_TUNNEL`) or blocked (`DENIED`). Verify that your `yourorg.ghe.com` domains appear as `TCP_TUNNEL`.
+
+## Required Domains Reference
+
+For GHE Cloud with data residency, the following domains must be reachable from inside the AWF sandbox. The compiler adds most of these automatically when `api-target` is set:
+
+| Domain | Auto-added by compiler? | Required for |
+|--------|:-----------------------:|-------------|
+| `yourorg.ghe.com` | ✅ | Git, web UI |
+| `api.yourorg.ghe.com` | ✅ | REST API, Copilot auth |
+| `copilot-api.yourorg.ghe.com` | ✅ | Inference, models, MCP |
+| `copilot-telemetry-service.yourorg.ghe.com` | ❌ (add manually if needed) | Telemetry |
+
+To add the telemetry domain manually:
+
+```aw
+network:
+  allowed:
+    - defaults
+    - copilot-telemetry-service.yourorg.ghe.com
+```

From f132d3e9fcdc390a7945da1ad30eb010dccc9ea0 Mon Sep 17 00:00:00 2001
From: Landon Cox
Date: Fri, 20 Mar 2026 09:26:42 -0700
Subject: [PATCH 2/3] docs: add general workflow debugging guide

Add docs/troubleshooting/debugging.md with comprehensive debugging instructions for github.com users. Covers:
- Using the Copilot CLI to debug workflows (launch, /agent, paste URL)
- Alternative debugging via Copilot Chat and any coding agent
- CLI commands: gh aw audit, gh aw logs, gh aw health, gh aw mcp inspect
- Common errors: auth failures, missing tools, firewall blocks, safe-outputs issues, compilation errors
- Advanced: debug logging, Actions debug mode, firewall log inspection, artifact analysis

Cross-reference added from common-issues.md.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../docs/troubleshooting/common-issues.md | 2 +
 .../content/docs/troubleshooting/debugging.md | 313 ++++++++++++++++++
 2 files changed, 315 insertions(+)
 create mode 100644 docs/src/content/docs/troubleshooting/debugging.md

diff --git a/docs/src/content/docs/troubleshooting/common-issues.md b/docs/src/content/docs/troubleshooting/common-issues.md
index b5ff646d56..789934af0a 100644
--- a/docs/src/content/docs/troubleshooting/common-issues.md
+++ b/docs/src/content/docs/troubleshooting/common-issues.md
@@ -530,6 +530,8 @@ Recompile with `gh aw compile` after updating. If the workflow is consistently t

 Common causes: missing tokens, permission mismatches, network restrictions, disabled tools, or rate limits. Use `gh aw audit ` to investigate.

+For a comprehensive walkthrough of all debugging techniques, see the [Debugging Workflows](/gh-aw/troubleshooting/debugging/) guide.
+
 ### How Do I Debug a Failing Workflow?

 The fastest way to debug a failing workflow is to ask an agent. Load the `agentic-workflows` agent and give it the run URL — it will audit the logs, identify the root cause, and suggest targeted fixes.
diff --git a/docs/src/content/docs/troubleshooting/debugging.md b/docs/src/content/docs/troubleshooting/debugging.md
new file mode 100644
index 0000000000..4f4fe0859c
--- /dev/null
+++ b/docs/src/content/docs/troubleshooting/debugging.md
@@ -0,0 +1,313 @@
+---
+title: Debugging Workflows
+description: How to run, debug, and investigate agentic workflow failures using the Copilot CLI, gh aw audit, and log analysis.
+sidebar:
+  order: 250
+---
+
+This guide shows you how to debug agentic workflow failures on **github.com** using the Copilot CLI, the `gh aw` debugging commands, and manual investigation techniques.
+
+> [!TIP]
+> The fastest path to a fix is to let an AI agent debug it for you. Launch the Copilot CLI, load the agentic-workflows agent, and paste the failing run URL.
+
+## Debugging with the Copilot CLI
+
+The Copilot CLI can audit logs, trace failures, and suggest fixes interactively. This is the recommended first step for any workflow failure.
+
+### Step 1: Launch the Copilot CLI
+
+```bash
+copilot
+```
+
+### Step 2: Load the Agentic Workflows Agent
+
+Once inside the Copilot CLI, run:
+
+```text
+/agent
+```
+
+Select **agentic-workflows** from the list. This gives Copilot access to the `gh aw audit`, `gh aw logs`, and other debugging tools.
+
+### Step 3: Ask Copilot to Debug the Failure
+
+Paste the failing run URL and ask Copilot to investigate:
+
+```text
+Debug this workflow run: https://github.com/OWNER/REPO/actions/runs/RUN_ID
+```
+
+Copilot will:
+- Download and audit the run logs
+- Identify the root cause (missing tools, permission errors, network blocks, etc.)
+- Suggest targeted fixes or open a pull request with the fix
+
+You can also ask follow-up questions:
+
+```text
+What domains were blocked by the firewall?
+Show me the safe-outputs from this run.
+Why did the MCP server fail to connect?
+```
+
+### Alternative: Copilot Chat on GitHub.com
+
+If your repository is [configured for agentic authoring](/gh-aw/guides/agentic-authoring/), you can use Copilot Chat directly on GitHub.com:
+
+```text
+/agent agentic-workflows debug https://github.com/OWNER/REPO/actions/runs/RUN_ID
+```
+
+### Alternative: Any Coding Agent
+
+For coding agents that don't have the agentic-workflows agent pre-configured, use the standalone debug prompt:
+
+```text
+Debug this workflow run using https://raw.githubusercontent.com/github/gh-aw/main/debug.md
+
+The failed workflow run is at https://github.com/OWNER/REPO/actions/runs/RUN_ID
+```
+
+The agent will install `gh aw`, analyze logs, identify the root cause, and suggest a fix.
+
+## Debugging with CLI Commands
+
+### Auditing a Specific Run
+
+`gh aw audit` gives a comprehensive breakdown of a single run — overview, metrics, tool usage, MCP failures, firewall analysis, and artifacts:
+
+```bash
+# By run ID
+gh aw audit 12345678
+
+# By full URL
+gh aw audit https://github.com/OWNER/REPO/actions/runs/12345678
+
+# By job URL (extracts first failing step)
+gh aw audit https://github.com/OWNER/REPO/actions/runs/123/job/456
+
+# By step URL (extracts a specific step)
+gh aw audit https://github.com/OWNER/REPO/actions/runs/123/job/456#step:7:1
+
+# Parse to markdown for sharing
+gh aw audit 12345678 --parse
+```
+
+Audit output includes:
+- **Failure analysis** with error summary and root cause
+- **Tool usage** — which tools were called, which failed, and why
+- **MCP server status** — connection failures, timeout errors
+- **Firewall analysis** — blocked domains and allowed traffic
+- **Safe-outputs** — structured outputs the agent produced
+
+### Analyzing Workflow Logs
+
+`gh aw logs` downloads and analyzes logs across multiple runs with tool usage, network patterns, errors, and warnings:
+
+```bash
+# Download logs for a workflow
+gh aw logs my-workflow
+
+# Filter by count and date range
+gh aw logs my-workflow -c 10 --start-date -1w
+
+# Include firewall analysis
+gh aw logs my-workflow --firewall
+
+# Include safe-output details
+gh aw logs my-workflow --safe-output
+
+# JSON output for scripting
+gh aw logs my-workflow --json
+```
+
+Results are cached locally for 10–100× speedup on subsequent runs.
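When the same `gh aw logs` flags need to run across several workflows, a small wrapper can help. The sketch below is illustrative only: `collect_logs` and `DRY_RUN` are hypothetical names, the workflow names are the placeholders used in these guides, and combining `-c`, `--start-date`, and `--firewall` in one call is an assumption, since the examples above show `--firewall` only on its own. By default the script prints the commands rather than running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 (the default here) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper around `gh aw logs` for one workflow.
# Assumption: the three flags can be combined in a single invocation.
collect_logs() {
  local workflow="$1"
  local cmd=(gh aw logs "$workflow" -c 10 --start-date -1w --firewall)
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

for wf in repo-assist my-workflow; do
  collect_logs "$wf"
done
```

Running it with `DRY_RUN=0` would call `gh aw logs` for real, which requires the extension to be installed and authenticated.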
+
+### Checking Workflow Health
+
+`gh aw health` gives a quick overview of workflow status across all workflows in a repository:
+
+```bash
+gh aw health
+```
+
+### Inspecting MCP Configuration
+
+If you suspect MCP server issues, inspect the compiled configuration:
+
+```bash
+# List all workflows with MCP servers
+gh aw mcp list
+
+# Inspect MCP servers for a specific workflow
+gh aw mcp inspect my-workflow
+
+# Open the web-based MCP inspector
+gh aw mcp inspect my-workflow --inspector
+```
+
+## Common Errors
+
+### "Authentication failed"
+
+```text
+Error: Authentication failed
+Your GitHub token may be invalid, expired, or lacking the required permissions.
+```
+
+**Cause**: The Copilot token is missing, expired, or lacks required permissions.
+
+**Fix**:
+1. Verify you have an active Copilot subscription
+2. Check that the token has the **Copilot Requests** permission (for fine-grained PATs)
+3. If using a custom `COPILOT_GITHUB_TOKEN`, verify it's valid:
+   ```bash
+   gh auth status
+   ```
+4. See [Authentication Reference](/gh-aw/reference/auth/) for token setup details
+
+### "Tool not found" or Missing Tool Calls
+
+**Cause**: The workflow references a tool that isn't configured or the MCP server failed to connect.
+
+**Fix**:
+1. Run `gh aw mcp inspect my-workflow` to verify tool configuration
+2. Check that the MCP server version is compatible
+3. Ensure `tools:` section in frontmatter includes the required tool
+4. Run `gh aw audit ` to see which tools were available vs. requested
+
+### Network / Firewall Blocks
+
+```text
+DENIED CONNECT registry.npmjs.org:443
+```
+
+**Cause**: The agent tried to reach a domain not in the firewall allow-list.
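Given a log with lines like the one above, the blocked domains can be collected in one pass before editing the allow-list. This is a minimal sketch on sample data: `blocked_domains` is a hypothetical helper, and it assumes each denied entry ends with a `host:port` field, as in the example lines in this guide. Real firewall logs may carry additional fields.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sample data only, shaped like the example lines in this guide.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
TCP_TUNNEL/200 api.github.com:443
DENIED CONNECT registry.npmjs.org:443
DENIED CONNECT registry.npmjs.org:443
DENIED CONNECT pypi.org:443
EOF

# Hypothetical helper: print each blocked host once, without the port.
# Assumes the last whitespace-separated field of a DENIED line is host:port.
blocked_domains() {
  awk '/DENIED/ {print $NF}' "$1" | sed 's/:[0-9]*$//' | sort -u
}

blocked_domains "$sample_log"
```

On the sample data this prints `pypi.org` and `registry.npmjs.org`, one per line; each is a candidate for the allow-list.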
+
+**Fix**: Add the domain to the `network.allowed` list in your workflow frontmatter:
+
+```aw
+network:
+  allowed:
+    - defaults
+    - registry.npmjs.org
+```
+
+Or use an ecosystem shorthand:
+
+```aw
+network:
+  allowed:
+    - defaults
+    - node  # Adds npm, yarn, pnpm registries
+    - python  # Adds PyPI, conda registries
+```
+
+See [Network Configuration](/gh-aw/guides/network-configuration/) for common domain configurations.
+
+### Safe-Outputs Not Creating Issues / Comments
+
+**Cause**: The safe-outputs job failed, the agent didn't produce the expected output, or permissions are missing.
+
+**Fix**:
+1. Run `gh aw audit ` and check the safe-outputs section
+2. Verify the workflow has `permissions: issues: write` (or the relevant permission)
+3. Check that `GH_AW_GITHUB_TOKEN` or `GITHUB_TOKEN` has write access
+4. See [Safe Outputs Reference](/gh-aw/reference/safe-outputs/) for configuration details
+
+### Compilation Errors
+
+**Cause**: The workflow frontmatter has schema validation errors or unsupported fields.
+
+**Fix**:
+1. Run the compiler with verbose output:
+   ```bash
+   gh aw compile my-workflow --verbose
+   ```
+2. Run the fixer for auto-correctable issues:
+   ```bash
+   gh aw fix --write
+   ```
+3. Validate without compiling:
+   ```bash
+   gh aw compile --validate
+   ```
+4. See [Error Reference](/gh-aw/troubleshooting/errors/) for specific error messages
+
+## Advanced Debugging
+
+### Enable Debug Logging
+
+The `DEBUG` environment variable enables detailed internal logging for any `gh aw` command:
+
+```bash
+# All debug logs
+DEBUG=* gh aw compile my-workflow
+
+# CLI-specific logs
+DEBUG=cli:* gh aw audit 12345678
+
+# Workflow compilation logs
+DEBUG=workflow:* gh aw compile my-workflow
+
+# Multiple packages
+DEBUG=workflow:*,cli:* gh aw compile my-workflow
+```
+
+> [!TIP]
+> Debug output goes to `stderr`. Capture it with `2>&1 | tee debug.log`.
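To see why the `2>&1` matters, the sketch below uses a stand-in function instead of a real `gh aw` command. `fake_command` is hypothetical and simply writes one line to stdout and one to stderr; the redirection behaves the same way for any command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a `gh aw` command: normal output on stdout, debug on stderr.
fake_command() {
  echo "compiled ok"
  echo "debug: loading frontmatter" >&2
}

log_file="$(mktemp)"

# Without 2>&1, only stdout flows through the pipe into tee.
# (stderr is discarded here just to keep the demo output quiet.)
fake_command 2>/dev/null | tee "$log_file" >/dev/null
stdout_only_lines="$(grep -c '' "$log_file")"

# With 2>&1, stderr is merged into stdout before the pipe, so tee sees both.
fake_command 2>&1 | tee "$log_file" >/dev/null
merged_lines="$(grep -c '' "$log_file")"

echo "lines captured without 2>&1: $stdout_only_lines"
echo "lines captured with 2>&1:    $merged_lines"
```

The first capture holds only the stdout line; the second also contains the `debug:` line, which is what a saved `debug.log` needs.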
+
+### Enable GitHub Actions Debug Logging
+
+Set the `ACTIONS_STEP_DEBUG` secret to `true` in your repository to enable verbose step-level logging in GitHub Actions:
+
+1. Go to **Settings → Secrets and variables → Actions**
+2. Add a secret: `ACTIONS_STEP_DEBUG` = `true`
+3. Re-run the workflow
+
+This produces much more detailed logs in the Actions UI.
+
+### Inspecting Firewall Logs
+
+Download the workflow run artifacts and look for `sandbox/firewall/logs/access.log`. Each line shows whether a domain was allowed (`TCP_TUNNEL`) or blocked (`DENIED`):
+
+```text
+TCP_TUNNEL/200 api.github.com:443
+DENIED CONNECT blocked-domain.com:443
+```
+
+You can also use the CLI:
+
+```bash
+gh aw logs my-workflow --firewall
+gh aw audit # Includes firewall analysis
+```
+
+### Inspecting Artifacts
+
+Workflow runs produce several artifacts useful for debugging:
+
+| Artifact | Location | Contents |
+|----------|----------|----------|
+| `prompt.txt` | `/tmp/gh-aw/aw-prompts/` | The full prompt sent to the AI agent |
+| `agent_output.json` | `/tmp/gh-aw/safeoutputs/` | Structured safe-output data |
+| `agent-stdio.log` | `/tmp/gh-aw/` | Raw agent stdin/stdout log |
+| `firewall-logs/` | `/tmp/gh-aw/firewall-logs/` | Network access logs |
+
+Download artifacts from the GitHub Actions run page or via the CLI:
+
+```bash
+gh run download --repo OWNER/REPO
+```
+
+### Recompiling for a Quick Fix
+
+If you've identified the issue and made a change to the `.md` file, recompile and push:
+
+```bash
+gh aw compile my-workflow
+git add .github/workflows/my-workflow.md .github/workflows/my-workflow.lock.yml
+git commit -m "fix: update workflow configuration"
+git push
+```

From 58170ab8397a45156c8a7c7efc0645afa2590285 Mon Sep 17 00:00:00 2001
From: Copilot <198982749+Copilot@users.noreply.github.com>
Date: Fri, 20 Mar 2026 09:28:57 -0700
Subject: [PATCH 3/3] Initial plan (#21994)

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>