fix: CI integration test corrections by danielmeppiel · Pull Request #109 · microsoft/apm

danielmeppiel · 2026-02-25T22:48:32Z

This PR fixes the integration tests in CI:

- Skills are not compiled to agent.md files; the tests are updated to reflect that.
- Skills are now integrated into the .github folder.
- Integration tests should no longer use .prompt.md files (deprecation is ongoing in favor of skills).
- Real skills for the tests are now sourced from anthropics/skills instead of ComposioHQ.
- Several other integration test CI fixes.

- Fix _auto_install_virtual_package to handle subdirectory paths correctly (don't append .prompt.md when path already has an extension)
- Add unit tests for subdirectory auto-install support
- Update test repos: ComposioHQ/awesome-claude-skills → anthropics/skills
- Correct skill integration assertions: .github/agents/ → .github/skills/
- Rename test_skill_compile.py → test_skill_integration.py with accurate class/method names reflecting that skills integrate at install, not compile
- Add unauthenticated API fallback in setup-codex.sh
- Remove stale metadata assertions in test_auto_integration.py
- Update guardrailing test to use github/awesome-copilot references
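
The extension check described in the first item can be sketched roughly as below. This is an illustration only, not the actual `_auto_install_virtual_package` code: `resolve_virtual_path` is a hypothetical helper name, and treating any non-empty suffix on the last path segment as "already has an extension" is an assumption.

```python
from pathlib import PurePosixPath


def resolve_virtual_path(ref: str) -> str:
    """Hypothetical helper: append .prompt.md only when the path has no extension.

    Assumption: any suffix on the final segment (e.g. ".md") means the
    reference already names a concrete file and must be left untouched.
    """
    if PurePosixPath(ref).suffix:
        return ref
    return f"{ref}.prompt.md"


# Illustrative references (made-up paths, not taken from the PR).
bare = resolve_virtual_path("owner/repo/prompts/review")
explicit = resolve_virtual_path("owner/repo/instructions/style.instructions.md")
```

A real implementation would also need to decide how to treat directory names that contain dots, which this sketch would mistake for file extensions.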

Copilot

Pull request overview

This PR updates APM’s CI integration tests (and a small part of ScriptRunner) to reflect the current “skills-first” integration strategy: skills are copied to .github/skills/ at install time (not compiled into .agent.md), .prompt.md usage is being phased out in tests, and real upstream skills are sourced from anthropics/skills.

Changes:

- Update ScriptRunner prompt discovery + virtual package auto-install to support virtual subdirectory skills and use canonical install paths.
- Replace ComposioHQ skill references with anthropics/skills across integration tests; add a new integration suite verifying .github/skills/ integration and compile non-modification.
- Adjust e2e/integration tests away from SKILL→agent expectations and remove prompt-metadata injection assertions.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/unit/test_script_runner.py | Adds unit coverage for SKILL.md discovery and virtual subdirectory auto-install. |
| src/apm_cli/core/script_runner.py | Extends discovery to find SKILL.md and updates virtual package install flow to use `DependencyReference.get_install_path()` and subdirectory downloads. |
| tests/integration/test_skill_integration.py | New integration tests validating skill install integrates into `.github/skills/` and compile doesn’t modify integrated skills. |
| tests/integration/test_skill_install.py | Updates skill source refs and assertions to expect `.github/skills/{name}/SKILL.md` rather than generated agent files. |
| tests/integration/test_skill_compile.py | Removes obsolete integration tests that assumed SKILL.md → `.agent.md` compilation. |
| tests/integration/test_mixed_deps.py | Updates mixed-dependency tests to use `anthropics/skills` and validate `.github/skills/` integration. |
| tests/integration/test_guardrailing_hero_e2e.py | Switches step 3 to install an instructions virtual file and updates assertions around AGENTS.md content. |
| tests/integration/test_auto_integration.py | Updates expectations to verify prompt integration copies content verbatim (no metadata injection). |
| scripts/runtime/setup-codex.sh | Adds a fallback unauthenticated GitHub API call when an authenticated “latest release” lookup yields no tag. |

Comments suppressed due to low confidence (5)

tests/integration/test_skill_integration.py:90

This test runs apm install but does not assert on the subprocess return code. Because it later pytest.skip()s when expected files are missing, an installation failure (network, auth, repo change) can be silently treated as a passing test. Capture the result and assert success (or skip based on an explicit/expected failure condition) before comparing file contents.

        subprocess.run(
            [apm_command, "install", "anthropics/skills/skills/brand-guidelines"],
            cwd=temp_project,
            capture_output=True,
            text=True,
            timeout=120
        )
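
One way to follow this suggestion is a small wrapper that captures the result and asserts on the return code, with stderr in the failure message. The sketch below is a minimal illustration: `run_checked` is a hypothetical helper name, and a Python one-liner stands in for the real `apm install` invocation so the example stays self-contained.

```python
import subprocess
import sys


def run_checked(cmd, **kwargs):
    """Run a command and fail loudly, including stderr, if it exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, **kwargs)
    assert result.returncode == 0, (
        f"{cmd!r} exited with {result.returncode}\nstderr:\n{result.stderr}"
    )
    return result


# Stand-in command; in the test this would be [apm_command, "install", ...].
ok = run_checked([sys.executable, "-c", "print('installed')"], timeout=120)
```

With this in place, an install failure surfaces as an assertion error at the call site instead of being absorbed by a later `pytest.skip()`.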

tests/integration/test_skill_integration.py:156

test_compile_does_not_modify_skills doesn't assert that apm install / apm compile succeeded, so a failing command could still lead to a passing test if the skill file isn't present or unchanged. Also, using st_mtime can be flaky on filesystems with coarse timestamp resolution; st_mtime_ns (or content hashing) is a more robust way to detect modifications.

        subprocess.run(
            [apm_command, "install", "anthropics/skills/skills/brand-guidelines"],
            cwd=temp_project,
            capture_output=True,
            text=True,
            timeout=120
        )
        
        skill_integrated = temp_project / ".github" / "skills" / "brand-guidelines" / "SKILL.md"
        
        if skill_integrated.exists():
            # Record modification time
            mtime_before = skill_integrated.stat().st_mtime
            
            # Run compile
            subprocess.run(
                [apm_command, "compile"],
                cwd=temp_project,
                capture_output=True,
                text=True,
                timeout=60
            )
            
            # Skill file should not be modified by compile
            mtime_after = skill_integrated.stat().st_mtime
            assert mtime_before == mtime_after, "Compile should not modify skill integrated at install"
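
The content-hashing alternative mentioned above could look roughly like this. It is a sketch under stated assumptions: `file_digest` is a hypothetical helper, the temporary `SKILL.md` stands in for the integrated skill file, and the point where `apm compile` would run is only marked by a comment.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"  # stand-in for the integrated skill file
    skill.write_text("# brand-guidelines\n")

    before = file_digest(skill)
    # ... the real test would run `apm compile` here ...
    unchanged = file_digest(skill) == before

    # Simulate an unwanted modification to show that the digest detects it.
    skill.write_text("# brand-guidelines (edited)\n")
    changed = file_digest(skill) != before
```

Comparing digests does not depend on filesystem timestamp resolution, at the cost of reading the file twice.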

tests/integration/test_skill_install.py:118

This test asserts on stdout content but doesn't assert the install command succeeded. If the install fails (auth/rate limit/etc.), stdout/stderr may still contain the searched strings and produce misleading results. Assert result.returncode == 0 (and include stderr in the assertion message) before checking output text.

This issue also appears on line 153 of the same file.

        result = subprocess.run(
            [apm_command, "install", "anthropics/skills/skills/brand-guidelines", "--verbose"],
            cwd=temp_project,
            capture_output=True,
            text=True,
            timeout=120
        )
        
        # Check for skill detection message
        assert "Claude Skill" in result.stdout or "SKILL.md detected" in result.stdout

tests/integration/test_skill_install.py:165

This test ignores the install subprocess return code and then pytest.skip()s if the skill path doesn't exist, which can hide unexpected install failures as skips. Capture the subprocess result and either assert success or skip only for explicitly recognized conditions (e.g., 404/not found).

        subprocess.run(
            [apm_command, "install", "anthropics/skills/skills/skill-creator", "--verbose"],
            cwd=temp_project,
            capture_output=True,
            text=True,
            timeout=120
        )
        
        skill_path = temp_project / "apm_modules" / "anthropics" / "skills" / "skills" / "skill-creator"
        
        if not skill_path.exists():
            pytest.skip("skill-creator not available")
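
Skipping only for recognized conditions could be sketched as a small classifier over the subprocess result. Everything here is hypothetical: `classify_install` is an invented helper, and the marker strings are assumptions rather than text taken from apm's real error output.

```python
# Assumed markers for "skill not available"; not taken from apm's actual output.
KNOWN_UNAVAILABLE_MARKERS = ("404", "not found")


def classify_install(returncode: int, stderr: str) -> str:
    """Map an install result to 'ok', 'skip' (recognized condition), or 'fail'."""
    if returncode == 0:
        return "ok"
    lowered = stderr.lower()
    if any(marker in lowered for marker in KNOWN_UNAVAILABLE_MARKERS):
        return "skip"
    return "fail"


outcome = classify_install(1, "HTTP 404: Not Found")
```

In a test, `"skip"` would map to `pytest.skip(...)` and `"fail"` to a failing assertion that includes stderr, so unexpected errors such as auth or rate-limit failures are no longer reported as skips.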

tests/integration/test_skill_install.py:209

The class/docstring imply VSCode integration is disabled when .github/ is missing, but the CLI now auto-creates .github/ (as the standard skills root) and this test expects .github/skills/... to exist. Consider renaming the class/test and updating the docstring to reflect the actual behavior being validated (auto-creation + skill integration).

class TestSkillInstallWithoutVSCodeTarget:
    """Test skill installation when VSCode is not the target."""
    
    def test_skill_install_without_github_folder(self, tmp_path, apm_command):
        """Skill installs but no agent.md generated without .github/ folder."""
        project_dir = tmp_path / "no-vscode-project"

src/apm_cli/core/script_runner.py

tests/integration/test_skill_integration.py

tests/integration/test_guardrailing_hero_e2e.py

tests/integration/test_skill_install.py

tests/integration/test_guardrailing_hero_e2e.py

- Assert returncode == 0 in test_skill_integration.py instead of silently skipping on missing files (masks real install failures)
- Update guardrailing test docstring to reflect actual flow (no longer claims to mirror README line-for-line)
- Add assertion that instruction file is actually downloaded, not just the directory

Copilot AI review requested due to automatic review settings February 25, 2026 22:48

Copilot started reviewing on behalf of danielmeppiel February 25, 2026 22:49

Merge branch 'main' into fix/integration-tests-and-auto-install

fec771e

Copilot AI reviewed Feb 25, 2026

danielmeppiel added the bug label Feb 25, 2026

danielmeppiel added this to the 0.7.4 milestone Feb 25, 2026

danielmeppiel requested review from SebastienDegodez and cteyton February 26, 2026 07:03

SebastienDegodez approved these changes Feb 26, 2026

danielmeppiel merged commit d8b546c into main Feb 26, 2026
5 of 6 checks passed

danielmeppiel deleted the fix/integration-tests-and-auto-install branch February 27, 2026 09:41

danielmeppiel mentioned this pull request Mar 3, 2026

release: v0.7.4 #139

Merged

danielmeppiel merged 3 commits into main from fix/integration-tests-and-auto-install

3 participants