PR-context base-branch restore overwrites APM-restored .github/skills before Copilot starts #27566

@theletterf


Summary

In a gh-aw workflow using:

  • engine: copilot
  • APM-imported skills via shared/apm.md
  • a prompt that explicitly instructs the agent to invoke skills using the exact form skill(skill: <name>)

…the Copilot agent still emits the shorthand form skill(<name>), which then fails with Skill not found.

At the same time, the APM job succeeds and the restored bundle clearly contains the skills.

This makes it look like either:

  1. the Copilot runtime is ignoring the requested invocation syntax and rewriting it to the wrong call shape, or
  2. the skill tool contract for APM-restored skills under Copilot is different from what the prompt/runtime suggests.

What we verified

APM succeeds

The workflow imports skills through APM, and the APM pack/unpack steps succeed. The bundle contents show the expected skills are present:

AW_APM_PACKAGES: ["elastic/elastic-docs-skills/skills/review/docs-check-style", ...]
...
[+] github.com/elastic/elastic-docs-skills/skills/review/docs-check-style
...
skills/: applies-to-tagging, content-type-checker, docs-check-style, flag-jargon-skill, frontmatter-audit

The prompt explicitly instructs exact-name invocation

The released workflow prompt tells the Copilot agent to use:

- `skill(skill: docs-check-style)`
- `skill(skill: docs-flag-jargon-skill)`
- `skill(skill: docs-frontmatter-audit)`
- `skill(skill: docs-content-type-checker)`
- `skill(skill: docs-applies-to-tagging)`

It also explicitly says:

Do not guess alternate invocation formats.

The runtime still emits the old shorthand form

In the original failing agent log, the model says:

Now let me run the docs skills in parallel to review the changed content.
✗ skill(docs-check-style) Skill not found: docs-check-style

So despite the updated prompt, the runtime behavior is still:

  • emitting skill(docs-check-style)
  • not skill(skill: docs-check-style)
  • and failing with Skill not found


Why I think this is a gh-aw/runtime issue

This does not look like an APM packaging failure:

  • the APM job completes successfully,
  • the unpacked bundle contains the skills,
  • and the prompt delivered to the agent includes the exact invocation syntax.

But the agent still uses the wrong call shape at runtime.

Expected behavior

One of these should be true:

  1. If skill(skill: docs-check-style) is the correct invocation form for Copilot + APM-restored skills, the agent should actually emit that call and succeed.
  2. If a different invocation form is required, the runtime/tooling/docs should expose that clearly so workflows can instruct the agent correctly.

Actual behavior

The Copilot agent ignores the explicit prompt guidance and emits skill(docs-check-style), which fails with Skill not found.

Update

We now have a narrower public repro which suggests that the issue is specific to PR context rather than a blanket APM-skill failure.

Public repro repo:

What the smoke tests show

  1. workflow_dispatch succeeds

    • A minimal Copilot-based gh-aw workflow that imports one public Elastic Docs Skill through APM can invoke that skill successfully.
    • The smoke test verified skill-returned metadata against the live public skill definition.
  2. pull_request fails

    • The same repo, same engine, same public skill import, and same MCP setup fail in PR context.
    • In PR context the workflow reports:
      • Skill "docs-check-style" not found. Available skills: customizing-copilot-cloud-agents-environment

This points away from “Copilot can never invoke APM-restored skills” and toward a PR-specific interaction in gh-aw job setup.

Likely root cause in gh-aw

In PR context, gh-aw restores trusted base-branch agent config folders after PR checkout. That restore includes .github.

The relevant code path is in pkg/workflow/pr.go, which generates the step:

  • Restore agent config folders from base branch

And the restore script in actions/setup/sh/restore_base_github_folders.sh does:

  • rm -rf "${DEST}"
  • cp -r "${SNAPSHOT}" "${DEST}"

That means the trusted base snapshot fully replaces .github in the workspace: anything under .github that is not in the base snapshot is deleted.
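The effect of those two commands can be reproduced locally. The following is a minimal Python sketch, not the actual restore script. It uses shutil.rmtree and shutil.copytree as stand-ins for rm -rf and cp -r. The directory layout is a hypothetical assumption for illustration: the base snapshot contains only workflows/, and the workspace .github already holds a skills/ folder unpacked from APM.

```python
import shutil
import tempfile
from pathlib import Path


def restore_from_snapshot(snapshot: Path, dest: Path) -> None:
    """Stand-in for: rm -rf "${DEST}"; cp -r "${SNAPSHOT}" "${DEST}"."""
    shutil.rmtree(dest, ignore_errors=True)  # rm -rf "${DEST}"
    shutil.copytree(snapshot, dest)          # cp -r into the now-missing DEST


def skills_in(github_dir: Path) -> list[str]:
    """Names of skill directories under .github/skills, or [] if it is missing."""
    skills_dir = github_dir / "skills"
    if not skills_dir.is_dir():
        return []
    return sorted(p.name for p in skills_dir.iterdir() if p.is_dir())


def demo() -> tuple[list[str], list[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical trusted base snapshot of .github: workflows only, no skills.
        snapshot = root / "base-snapshot"
        (snapshot / "workflows").mkdir(parents=True)
        # Workspace .github after the APM bundle has been unpacked into it.
        dest = root / "workspace" / ".github"
        (dest / "skills" / "docs-check-style").mkdir(parents=True)

        before = skills_in(dest)
        restore_from_snapshot(snapshot, dest)
        after = skills_in(dest)
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)  # before: ['docs-check-style']
    print("after:", after)    # after: []
```

Because the destination is deleted before the copy, nothing from the APM restore survives unless the base branch itself commits those skills.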

Meanwhile, the shared APM workflow (.github/workflows/shared/apm.md) restores the APM bundle under steps:, that is, as ordinary imported steps. Imported steps are merged before the compiler-generated main job steps, whereas the PR-specific base-branch restore runs later.

So the effective order in PR context appears to be:

  1. restore APM bundle into the workspace
  2. restore trusted base-branch .github
  3. wipe out .github/skills restored from APM
  4. start the Copilot agent
  5. only builtin skills remain available

This ordering matches the observed behavior:

  • dispatch success when the base-branch restore step is skipped
  • PR failure when the restore step runs
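The hypothesised ordering can be expressed as a small model. This is a toy Python sketch, not gh-aw code: the workspace is reduced to the set of skill names under .github/skills, and every function name is hypothetical. It assumes the base snapshot contains no skills, which is inferred from the PR log above, where only the builtin customizing-copilot-cloud-agents-environment skill remains.

```python
from dataclasses import dataclass, field

# Taken from the PR-context log; treated here as always available.
BUILTIN_SKILLS = frozenset({"customizing-copilot-cloud-agents-environment"})
APM_SKILLS = frozenset({"docs-check-style"})
# Assumption: the trusted base branch does not commit any .github/skills.
BASE_SNAPSHOT_SKILLS = frozenset()


@dataclass
class Workspace:
    """Toy workspace: only the skill names present under .github/skills."""
    github_skills: set[str] = field(default_factory=set)


def restore_apm_bundle(ws: Workspace) -> None:
    # Imported step from shared/apm.md: unpack skills into .github/skills.
    ws.github_skills |= APM_SKILLS


def restore_base_github(ws: Workspace) -> None:
    # PR-only step: rm -rf + cp -r replaces .github with the base snapshot.
    ws.github_skills = set(BASE_SNAPSHOT_SKILLS)


def available_at_agent_start(is_pull_request: bool) -> set[str]:
    ws = Workspace()
    steps = [restore_apm_bundle]           # imported steps run first
    if is_pull_request:
        steps.append(restore_base_github)  # compiler-generated, runs later
    for step in steps:
        step(ws)
    return set(BUILTIN_SKILLS) | ws.github_skills


if __name__ == "__main__":
    # ['customizing-copilot-cloud-agents-environment', 'docs-check-style']
    print(sorted(available_at_agent_start(is_pull_request=False)))
    # ['customizing-copilot-cloud-agents-environment']
    print(sorted(available_at_agent_start(is_pull_request=True)))
```

Under these assumptions the model reproduces both smoke-test outcomes: the dispatch run keeps docs-check-style, while the PR run is left with only the builtin skill.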

Request

Could you investigate and clarify:

  • whether PR-context base-branch restoration is clobbering APM-restored .github/skills, and
  • whether the shared APM workflow should restore its bundle as pre-agent-steps instead of ordinary imported steps so trusted APM-restored skills survive until agent startup?

If helpful, I also have a small patch in my fork that changes the shared APM restore from steps: to pre-agent-steps: so the bundle is restored after the PR-specific base restore and immediately before agent execution.
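A toy model can illustrate why moving the restore from steps: to pre-agent-steps: should help. It assumes the phase order described in this issue: imported steps, then compiler-generated steps (including the PR-only base restore), then pre-agent steps, then the agent. All function names are hypothetical. The sketch does not reflect how the gh-aw compiler orders steps beyond that stated assumption.

```python
from typing import Callable

# A step maps the current .github/skills contents to new contents.
Step = Callable[[set[str]], set[str]]

APM_SKILLS = frozenset({"docs-check-style"})


def restore_apm_bundle(skills: set[str]) -> set[str]:
    # Unpack APM skills on top of whatever is already present.
    return skills | APM_SKILLS


def restore_base_github(skills: set[str]) -> set[str]:
    # rm -rf + cp -r from a base snapshot assumed to contain no skills.
    return set()


def run_job(imported_steps: list[Step], pre_agent_steps: list[Step],
            is_pull_request: bool) -> set[str]:
    """Return the .github/skills contents at agent start (assumed phase order)."""
    phases = list(imported_steps)
    if is_pull_request:
        phases.append(restore_base_github)
    phases.extend(pre_agent_steps)
    skills: set[str] = set()
    for step in phases:
        skills = step(skills)
    return skills


if __name__ == "__main__":
    current = run_job([restore_apm_bundle], [], is_pull_request=True)
    patched = run_job([], [restore_apm_bundle], is_pull_request=True)
    print(current)  # set()
    print(patched)  # {'docs-check-style'}
```

In this model the restore only survives when it runs after the base-branch restore, which is what the patch aims for.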
