Skip to content

fix(policy): fail closed on malformed manifest YAML#1053

Merged
sergio-sisternes-epam merged 3 commits intomainfrom
fix/936-policy-malformed-yaml
Apr 29, 2026
Merged

fix(policy): fail closed on malformed manifest YAML#1053
sergio-sisternes-epam merged 3 commits intomainfrom
fix/936-policy-malformed-yaml

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

@sergio-sisternes-epam sergio-sisternes-epam commented Apr 29, 2026

Description

Replace silent policy-enforcement bypass with failing manifest-parse CheckResult when apm.yml is malformed YAML or not a mapping. Four code paths in policy_checks.py and ci_checks.py swallowed parse errors and returned permissive defaults, allowing an attacker to bypass all policy enforcement by introducing a subtly malformed manifest.

Fixes #936

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass
  • Added tests for new functionality (if applicable)

Details

The vulnerability

Location Before (vulnerable) After (fixed)
run_policy_checks() except (ValueError, FileNotFoundError): return result -- empty CIAuditResult, all checks trivially pass Catches (ValueError, yaml.YAMLError), appends failing CheckResult(name="manifest-parse", passed=False)
run_baseline_checks() Same pattern -- skips all remaining baseline checks Same fix -- failing CheckResult instead of silent return
_check_lockfile_exists() except (ValueError, FileNotFoundError) returns passed=True Catches (ValueError, yaml.YAMLError), returns passed=False
_load_raw_apm_yml() except Exception: return None -- swallows everything silently Catches yaml.YAMLError and OSError specifically, logs WARNING

What's NOT changed

The org policy file (apm-policy.yml) loading path was already correct -- parser.py::load_policy() raises PolicyValidationError on malformed YAML, and outcome_routing.py handles malformed outcomes via the documented fetch_failure_default knob. This PR fixes the project manifest (apm.yml) parse path only.

Implementation (3 files)

  • src/apm_cli/policy/policy_checks.py -- fix _load_raw_apm_yml() and run_policy_checks()
  • src/apm_cli/policy/ci_checks.py -- fix _check_lockfile_exists() and run_baseline_checks()
  • src/apm_cli/policy/models.py -- add manifest-parse to _CHECK_ARTIFACT_MAP

Tests (2 files, 18 new tests)

  • tests/unit/policy/test_policy_checks.py -- TestLoadRawApmYml (6 tests), TestRunPolicyChecksMalformedManifest (5 tests incl. regression guard)
  • tests/unit/policy/test_ci_checks.py -- TestCheckLockfileExistsMalformedManifest (4 tests), TestRunBaselineChecksMalformedManifest (3 tests incl. monkeypatched second-catch path)

Documentation (5 files)

  • CHANGELOG.md -- Security entry under [Unreleased]
  • docs/src/content/docs/enterprise/security.md -- manifest integrity bullet
  • docs/src/content/docs/enterprise/policy-reference.md -- manifest_parse outcome row
  • docs/src/content/docs/enterprise/governance-guide.md -- failure semantics table row
  • packages/apm-guide/.apm/skills/apm-usage/governance.md -- clarifying paragraph

Validation

  • 526 policy tests pass (508 existing + 18 new)
  • 6,744 total unit tests pass (1 pre-existing audit_report.py syntax error excluded)
  • No non-ASCII characters in any changed file

Copilot AI review requested due to automatic review settings April 29, 2026 18:35
Replace silent bypass with failing manifest-parse CheckResult when
apm.yml cannot be parsed. Fixes four code paths:

- run_policy_checks(): catch (ValueError, yaml.YAMLError) instead of
  silently returning empty CIAuditResult
- run_baseline_checks(): same pattern in ci_checks.py
- _check_lockfile_exists(): return passed=False instead of passed=True
- _load_raw_apm_yml(): replace bare except Exception with yaml.YAMLError
  and OSError, log WARNING on parse failure

Add manifest-parse to _CHECK_ARTIFACT_MAP for SARIF output.

14 new tests cover malformed YAML, non-dict YAML, scalar YAML, and
regression guards. 522 policy tests pass, 6744 total unit tests pass.

Update security.md, policy-reference.md, governance-guide.md,
governance.md skill resource, and CHANGELOG.

Closes #936

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/936-policy-malformed-yaml branch from a7074a2 to 704e3d7 Compare April 29, 2026 18:39
@sergio-sisternes-epam sergio-sisternes-epam added the panel-review Trigger the apm-review-panel gh-aw workflow label Apr 29, 2026
@sergio-sisternes-epam sergio-sisternes-epam added this to the 0.12.0 milestone Apr 29, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes a security gap where malformed apm.yml could cause policy/baseline checks to be silently bypassed, ensuring audits fail closed when the manifest cannot be parsed.

Changes:

  • Fail-closed behavior added for malformed/non-mapping apm.yml across policy and baseline CI checks via a dedicated manifest-parse failing check.
  • _load_raw_apm_yml() tightened to avoid swallowing parse/read errors silently and to log warnings for visibility.
  • Docs and unit tests added/updated to describe and enforce the new failure semantics.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/unit/policy/test_policy_checks.py Adds regression/unit tests covering malformed manifest handling and _load_raw_apm_yml() behavior.
tests/unit/policy/test_ci_checks.py Adds tests ensuring baseline checks fail-closed on malformed manifests (including a patched path).
src/apm_cli/policy/policy_checks.py Implements fail-closed manifest parsing in run_policy_checks() and adds warning logs in _load_raw_apm_yml().
src/apm_cli/policy/ci_checks.py Implements fail-closed manifest parsing in _check_lockfile_exists() and run_baseline_checks().
src/apm_cli/policy/models.py Registers the new manifest-parse check artifact mapping to apm.yml.
packages/apm-guide/.apm/skills/apm-usage/governance.md Documents that malformed apm.yml is unconditionally fail-closed via manifest-parse.
docs/src/content/docs/enterprise/security.md Documents the new manifest integrity behavior for audits.
docs/src/content/docs/enterprise/policy-reference.md Adds a manifest_parse row to the outcomes table (but see doc consistency comment below).
docs/src/content/docs/enterprise/governance-guide.md Adds a failure semantics row for malformed project manifest (but see naming consistency comment below).
CHANGELOG.md Records the security fix under [Unreleased].

Comment thread src/apm_cli/policy/policy_checks.py
Comment thread src/apm_cli/policy/policy_checks.py
Comment thread src/apm_cli/policy/policy_checks.py
Comment thread src/apm_cli/policy/ci_checks.py Outdated
Comment thread docs/src/content/docs/enterprise/policy-reference.md Outdated
@sergio-sisternes-epam sergio-sisternes-epam added panel-review Trigger the apm-review-panel gh-aw workflow and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 29, 2026
- Split FileNotFoundError from OSError in _load_raw_apm_yml() to avoid
  noisy warnings for normal "missing file" conditions (TOCTOU)
- Add UnicodeDecodeError handling for non-UTF8/binary files
- Add OSError to TOCTOU catch in run_policy_checks() and
  _check_lockfile_exists() so file-disappear races map to manifest-parse
  failure instead of unhandled exceptions
- Fix docs: manifest_parse -> manifest-parse (hyphen) and clarify it is
  a local audit check, not a fetch outcome

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/936-policy-malformed-yaml branch from 477e8e9 to a40e911 Compare April 29, 2026 20:45
@github-actions
Copy link
Copy Markdown

APM Review Panel Verdict: REJECT

The fix correctly closes the malformed-YAML bypass path, but three structural defects -- a check-name contract violation, an unaddressed file-deletion bypass (identical threat model), and a TOCTOU dual-parse window -- block merge. CHANGELOG communication of the breaking behavior change is also required before this ships to 0.12.0 users.

Required before merge (10 items)

  • [python-architect] _check_lockfile_exists emits name="manifest-parse" instead of name="lockfile-exists" at src/apm_cli/policy/ci_checks.py:49

    • Why: SARIF routing, CI dashboards, and any downstream consumer keying on check names will silently misroute the result. The second manifest-parse guard in run_baseline_checks is also now unreachable, leaving dead code.
    • Suggested fix: Option A -- hoist manifest parsing out of _check_lockfile_exists and accept Optional[APMPackage] as a parameter; run_baseline_checks owns all parse-error handling. Option B -- always return name="lockfile-exists" from _check_lockfile_exists (with parse-error detail in message) and remove the dead guard from run_baseline_checks.
  • [python-architect] apm.yml is parsed twice in ci_checks.run_baseline_checks with a clear_apm_yml_cache() between reads at src/apm_cli/policy/ci_checks.py:455

    • Why: O(2) I/O where O(1) is correct. More importantly, the explicit cache-bust creates a TOCTOU window: if apm.yml changes between the two reads (race condition or mocked FS in tests), the two parses see different data -- the first parse may pass, the second may produce inconsistent policy results.
    • Suggested fix: Parse once in run_baseline_checks, thread the result through _check_lockfile_exists via a parameter; remove the clear_apm_yml_cache() call.
  • [cli-logging-expert] CheckResult is constructed with details=str(exc) (bare string) where details is typed List[str], and omits the required message field at src/apm_cli/policy/policy_checks.py and src/apm_cli/policy/ci_checks.py

    • Why: At runtime the SARIF/JSON renderer iterates details, causing a TypeError on a bare string. Missing message causes a construction error. (Note: the CEO flags uncertainty -- verify the actual post-merge code matches the diff. If the correct form message='Cannot parse apm.yml', details=[str(exc)] is already present in the real file, demote this to a nit.)
    • Suggested fix: CheckResult(name="manifest-parse", passed=False, message="Cannot parse apm.yml", details=[str(exc)]) at all four call sites.
  • [cli-logging-expert] New logger = logging.getLogger(__name__) bindings in both policy_checks.py and ci_checks.py conflict with the modules' existing _logger convention at src/apm_cli/policy/policy_checks.py and src/apm_cli/policy/ci_checks.py

    • Why: Both files already declare _logger = logging.getLogger(__name__) and existing call sites use _logger.warning(...). The PR introduces a second, inconsistently named binding for the same logger. (Note: CEO sides with downgrading this to a nit since it is a style issue, but it produces a redundant module-level name and contradicts the established convention in every other APM policy module.)
    • Suggested fix: Remove the new logger = ... lines and replace call sites in the new code with _logger.
  • [devx-ux-expert] _check_lockfile_exists emits name="manifest-parse" into result.checks[0], breaking the implicit contract that checks[0].name == "lockfile-exists" for all callers at src/apm_cli/policy/ci_checks.py:50

    • Why: Silent contract break for any downstream consumer that parses apm audit --ci --format json output and keys on check names. This is a distinct framing of the same root cause as the python-architect finding above; both are required.
  • [devx-ux-expert] CHANGELOG does not flag the silent-pass to fail-closed change as a breaking behavior change; no upgrade path is communicated at CHANGELOG.md

    • Why: Any repo with a latently malformed apm.yml (stray tab, unquoted colon, non-mapping root) will go from "CI passes" to "CI fails with exit 1" on upgrade to 0.12.0 with no warning. Developers will blame APM for "breaking CI." The same CHANGELOG file already has a **Migration:** pattern for breaking changes -- this entry must follow it.
    • Suggested fix: Add a **Migration:** block to the Security entry: what changed (silent pass -> exit 1), what triggers it (malformed YAML or non-mapping root in apm.yml), and how to detect before upgrading (python -c "import yaml; yaml.safe_load(open('apm.yml'))" or equivalent apm command).
  • [supply-chain-security-expert] Missing apm.yml silently skips ALL policy and baseline checks -- identical bypass class to the malformed-YAML path this PR is fixing at src/apm_cli/policy/policy_checks.py

    • Why: An attacker with write access to apm.yml can delete the file instead of malforming it and achieve exactly the same outcome: every check returns permissive. The PR explicitly leaves FileNotFoundError permissive. The threat model is identical to the CVE this PR is closing. At minimum, run_policy_checks should append a failing CheckResult (or document this as an explicitly accepted risk with a tracking issue) rather than returning a fully-passing CIAuditResult.
    • Suggested fix: In run_policy_checks and run_baseline_checks, treat FileNotFoundError as a CheckResult(name="manifest-parse", passed=False, message="apm.yml not found") rather than returning an empty passing result. Keep _check_lockfile_exists permissive on missing file if that is intentional for the lockfile-specific check.
  • [supply-chain-security-expert] _load_raw_apm_yml returning None on I/O error still produces permissive results for compilation-target, compilation-strategy, and scripts-policy checks via (raw_yml or {}).get(...) fallback at src/apm_cli/policy/policy_checks.py:23

    • Why: If an OS-level I/O error occurs on the second read (after APMPackage.from_apm_yml succeeded on the first), _load_raw_apm_yml returns None, and all raw-field checks silently skip with passed=True. While the primary path (APMPackage.from_apm_yml short-circuits) prevents the main bypass, a TOCTOU window between the two reads (symlink swap, network-mounted path) can produce inconsistent enforcement. This is the same dual-parse hazard the python-architect identified.
    • Suggested fix: Raise from _load_raw_apm_yml on all parse/IO errors (do not return None) and let run_policy_checks catch and handle. Alternatively, pass the already-parsed dict from APMPackage.from_apm_yml to eliminate the second read entirely.
  • [oss-growth-hacker] CHANGELOG entry lacks an explicit upgrade-friction warning; teams with silently-malformed apm.yml files will see CI break on upgrade to 0.12.0 with no in-CHANGELOG pointer to remediation at CHANGELOG.md

    • Why: This is the growth-layer framing of the same breaking-change communication gap devx-ux-expert identified. Both panels agree this is a required fix before releasing 0.12.0.
    • Suggested fix: Same as devx-ux-expert suggestion -- add a **Migration:** block with a remediation command.
  • [oss-growth-hacker] Error message "Cannot parse apm.yml: <exc>" surfaces the raw YAML exception text with no remediation hint in terminal or CI output at src/apm_cli/policy/ci_checks.py and src/apm_cli/policy/policy_checks.py

    • Why: The policy-reference table correctly documents the remediation (Fix the YAML syntax error in apm.yml) but it never surfaces in the terminal. When a user's CI breaks on upgrade, the first thing they see is a raw YAML parse exception. Other policy check failures in the codebase include actionable hints.
    • Suggested fix: Append " -- Fix the YAML syntax error in apm.yml and re-run." to the message field of the failing CheckResult, or set it as a distinct details entry. Apply consistently in all four call sites.

Nits (13 items, skip if you want)

  • [python-architect] details=str(exc) diff artifact -- if actual merged code already uses message= correctly, this is resolved.
  • [python-architect] _load_raw_apm_yml in policy_checks.py is a second silent parse path living alongside APMPackage.from_apm_yml -- add a docstring explaining the ordering and why None return is acceptable there.
  • [cli-logging-expert] Log level WARNING may be too low for a security gate that unconditionally fails CI; consider _logger.error(...).
  • [cli-logging-expert] New log messages omit the file path ('Cannot parse apm.yml: %s', exc); the existing ci_checks.py pattern ('Malformed YAML in %s: %s', apm_yml_path, exc) is more actionable in monorepos.
  • [devx-ux-expert] _load_raw_apm_yml still swallows errors and returns None in policy_checks.py -- not a bypass risk given the ordering, but inconsistent with the fail-closed philosophy; add a docstring note.
  • [devx-ux-expert] FileNotFoundError -> lockfile-exists passed=True with no message may confuse users on freshly initialized repos that have not yet created apm.yml.
  • [devx-ux-expert] Minor style: %-formatting in some error messages vs. f-string elsewhere in the same file.
  • [supply-chain-security-expert] message=str(exc) / details=str(exc) may expose absolute filesystem paths in CI logs and SARIF reports; consider relativizing the path before including it.
  • [supply-chain-security-expert] _check_lockfile_exists returning name="manifest-parse" on parse error is surprising for a function named _check_lockfile_exists; add a doc comment explaining the dual return names (or fix the root cause per the required finding).
  • [supply-chain-security-expert] Bare except Exception: pass when reading the lockfile for local-content detection silently swallows all read/parse errors -- add at minimum _logger.warning('Could not read lockfile for gating: %s', exc).
  • [oss-growth-hacker] governance-guide.md row for malformed (org policy file) at line 354 could add a parenthetical (applies to the org policy file, not apm.yml) to prevent double-take with the adjacent manifest_parse row.
  • [oss-growth-hacker] CHANGELOG Security entry could add a brief severity framing -- (severity: medium -- policy bypass) -- to help security-conscious adopters triage and improve searchability.
  • [oss-growth-hacker] security.md update mentions "manifest integrity" but has no "What to do if this check fires" callout for users who land there from a search on the error message.

CEO arbitration

The panel has converged on five substantive findings, three of which form a tightly coupled correctness-and-security cluster. First, the check-name contract violation in _check_lockfile_exists -- emitting name="manifest-parse" instead of name="lockfile-exists" -- is confirmed required by python-architect and devx-ux-expert, and echoed by supply-chain as a nit. This is not cosmetic: SARIF routing, CI dashboard keying, and any downstream JSON consumer that keys on check names will silently misroute the result. The unreachable second manifest-parse guard in run_baseline_checks compounds the issue. Second, the dual-parse of apm.yml identified by python-architect is escalated to a security concern by supply-chain-security: the explicit cache-bust between reads creates a TOCTOU window. These two findings should be treated as a single required fix -- unify to a single parse with the result threaded through all callers. Third, and most critically, supply-chain-security surfaces that a missing apm.yml triggers FileNotFoundError which flows into the same permissive fallback as the malformed-YAML path this PR is closing. The threat model is identical -- an attacker or misconfigured repo can bypass all policy and baseline checks by ensuring apm.yml is absent rather than malformed. This finding is sound and uncontested on its merits. The fourth cluster -- CHANGELOG communication and upgrade friction -- is raised as required by both devx-ux-expert and oss-growth-hacker and is straightforward to address before the 0.12.0 release cut.

Dissent resolved: The cli-logging-expert's required finding on CheckResult field types (details=str(exc) vs. List[str]) is noted with uncertainty -- the diff artifact may not reflect the actual merged code. If the string-not-list construction is real, it is a genuine runtime TypeError and is required; if it is a diff artifact, it is a nit. Verify against the actual post-merge file before acting on it. The cli-logging-expert's second required finding (logger vs. _logger naming) is substantively a style issue; the CEO sides with treating it as a nit rather than a blocker, though it should still be fixed for consistency.

Growth/positioning note: This fix is being undersold. "APM silently skipped your security policy checks when apm.yml was malformed -- 0.12.0 closes that bypass" is a trust signal, not just a bug fix. Enterprise buyers and CISO-facing teams respond to exactly this framing: a project that finds, acknowledges, and closes a silent security bypass with a fail-closed default is demonstrating security posture maturity. The CHANGELOG Migration block -- once written correctly -- becomes the skeleton of a short release highlight framed as "APM 0.12 closes a silent policy bypass." No CVE bar is met, but the honest narrative of "here is what was happening, here is what we changed, here is how to validate your config" is the right posture. The governance-guide.md and security.md updates already in this PR are the right supporting artifacts; they just need the user-facing remediation callout that is currently missing.


Per-persona findings (full)

Python Architect

classDiagram
    namespace models {
        class CheckResult {
            <<dataclass>>
            +name: str
            +passed: bool
            +message: str
            +details: List[str]
        }
        class CIAuditResult {
            <<dataclass>>
            +checks: List[CheckResult]
            +passed: bool
            +failed_checks: List[CheckResult]
            +to_json() dict
            +to_sarif() dict
        }
    }
    namespace policy_checks {
        class PolicyCheckModule {
            <<module>>
            +_load_raw_apm_yml(project_root) Optional[dict]
            +run_policy_checks(project_root, policy) CIAuditResult
            +run_baseline_checks(project_root) CIAuditResult
        }
    }
    namespace ci_checks {
        class CICheckModule {
            <<module>>
            +_check_lockfile_exists(project_root) CheckResult
            +run_baseline_checks(project_root) CIAuditResult
        }
    }
    class APMPackage {
        <<model>>
        +from_apm_yml(path) APMPackage$
        +has_apm_dependencies() bool
        +get_apm_dependencies() List[DependencyReference]
    }
    class LockFile {
        <<model>>
        +read(path) Optional[LockFile]$
        +dependencies: Dict[str, LockedDependency]
    }
    class ApmPolicy {
        <<model>>
        +dependency_policy: DependencyPolicy
    }
    CIAuditResult "1" o-- "*" CheckResult : contains
    PolicyCheckModule ..> CheckResult : creates
    PolicyCheckModule ..> CIAuditResult : returns
    PolicyCheckModule ..> APMPackage : parses via
    PolicyCheckModule ..> LockFile : reads
    PolicyCheckModule ..> ApmPolicy : enforces
    CICheckModule ..> CheckResult : creates
    CICheckModule ..> CIAuditResult : returns
    CICheckModule ..> APMPackage : parses via
    CICheckModule ..> LockFile : reads
    note for CICheckModule "_check_lockfile_exists can emit name=manifest-parse\ncontract mismatch (Required finding #1)"
    note for PolicyCheckModule "Two parse paths:\n1. APMPackage.from_apm_yml (raises, fail-closed)\n2. _load_raw_apm_yml (returns None, silent)\nOnly path 1 is security-critical."
    class CheckResult:::touched
    class CIAuditResult:::touched
    class PolicyCheckModule:::touched
    class CICheckModule:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A([apm audit --ci]) --> B["ci_checks.run_baseline_checks\nci_checks.py:425"]
    B --> C["_check_lockfile_exists\nci_checks.py:25 [FS]"]
    C --> D{"apm.yml exists?\n[FS]"}
    D -- no --> E["CheckResult(lockfile-exists, passed=True)"]
    D -- yes --> F["APMPackage.from_apm_yml\n[FS] -- PARSE 1"]
    F -- "ValueError/YAMLError" --> G["CheckResult(name=manifest-parse, passed=False)\nWRONG NAME for this function"]
    F -- ok --> H{"has_deps?"}
    H -- no deps/no lockfile --> I["CheckResult(lockfile-exists, passed=True)"]
    H -- deps declared --> J{"lockfile_path.exists?\n[FS]"}
    J -- yes --> K["CheckResult(lockfile-exists, passed=True)"]
    J -- no --> L["CheckResult(lockfile-exists, passed=False)"]
    E & G & I & K & L --> M["result.checks.append\nrun_baseline_checks:437"]
    M --> N{"result.checks[0].passed?"}
    N -- false --> O([return result fail-closed])
    N -- true --> P["clear_apm_yml_cache()\nci_checks.py:455 CACHE BUST"]
    P --> Q["APMPackage.from_apm_yml\n[FS] -- PARSE 2 REDUNDANT/TOCTOU"]
    Q -- "ValueError/YAMLError" --> R["CheckResult(manifest-parse, passed=False)\nresult.checks.append, return"]
    Q -- ok --> S["LockFile.read\n[FS]"]
    S --> T["_check_ref_consistency\n_check_deployed_files_present\n_check_no_orphans\n_check_content_integrity\n[FS/LOCK]"]
    T --> U([return CIAuditResult])
    B2([policy_checks.run_policy_checks]) --> V{"apm.yml exists?\n[FS]"}
    V -- no --> W([return empty CIAuditResult PERMISSIVE])
    V -- yes --> X["APMPackage.from_apm_yml\n[FS] -- raises on malformed"]
    X -- "ValueError/YAMLError" --> Y["CheckResult(manifest-parse, passed=False), return"]
    X -- ok --> Z["_load_raw_apm_yml\n[FS] -- PARSE 2 returns None on error"]
    Z --> AA["raw-field checks: compilation-target\ncompilation-strategy, scripts-policy\n(raw_yml or {}).get(...) -- PERMISSIVE on None"]
    AA --> AB([return CIAuditResult])
Loading

Design patterns

  • Used in this PR: Collect-then-render -- manifest-parse is appended as a standard CheckResult entry into CIAuditResult.checks, consistent with the APM-wide collect-then-render pattern. No exception path; no special-casing.
  • Used in this PR: Fail-fast short-circuit -- run_baseline_checks gates all subsequent expensive checks on result.checks[0].passed; this PR participates correctly in that gate (when it returns the right check name, which it currently does not).
  • Pragmatic suggestion: Hoist manifest parsing out of _check_lockfile_exists and accept Optional[APMPackage] as a parameter. This resolves the double-parse, the check-name contract violation, and makes run_baseline_checks the single owner of parse-error policy -- consistent with how run_policy_checks already owns it. No new abstraction; small signature change.

CLI Logging Expert

Required findings above. No additional findings.

DevX UX Expert

Required findings above. No additional findings.

Supply Chain Security Expert

Required findings above. Additional nit: yaml.safe_load confirmed safe -- no code-execution risk from attacker-controlled YAML content; no action required.

Auth Expert

Inactive -- PR touches only policy YAML parsing files (policy_checks.py, ci_checks.py, models.py) and documentation -- no auth, token, credential, or host-classification paths are affected.

OSS Growth Hacker

Required findings above. No additional findings beyond what is captured in the CEO growth/positioning note.

Verdict computed deterministically: 10 required findings across 5 active panelists. APPROVE iff N == 0. Push a new commit to clear this verdict label automatically.

Generated by PR Review Panel for issue #1053 · ● 2.7M ·

@github-actions github-actions Bot added panel-rejected Apm-review-panel verdict: REJECT. Removed automatically on next push. and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 29, 2026
Architecture:
- Hoist manifest parsing out of _check_lockfile_exists into
  run_baseline_checks -- single parse eliminates TOCTOU window
- _check_lockfile_exists now accepts Optional[APMPackage] and always
  returns name="lockfile-exists" (contract violation fixed)
- Remove redundant clear_apm_yml_cache() + second from_apm_yml() call
- Add OSError to catch tuple in run_baseline_checks
- Add _logger + debug logging for lockfile read exceptions

UX:
- All manifest-parse error messages include remediation hint
- CHANGELOG: add Migration block with detection command
- governance-guide.md: disambiguate org policy vs project manifest rows
- security.md: add remediation callout

Docstrings:
- _load_raw_apm_yml: document ordering, defence-in-depth contract

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot removed the panel-rejected Apm-review-panel verdict: REJECT. Removed automatically on next push. label Apr 29, 2026
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator Author

Panel Findings -- Triage & Resolution

Thanks for the thorough review. All required findings have been triaged and addressed. Here's the breakdown:

Fixed (8/10 required items)

# Finding Resolution
1 _check_lockfile_exists emits name="manifest-parse" Fixed -- hoisted manifest parsing into run_baseline_checks; _check_lockfile_exists now accepts Optional[APMPackage] and always returns name="lockfile-exists"
2 Dual-parse + TOCTOU in run_baseline_checks Fixed -- single parse at top of run_baseline_checks, result threaded through. Removed redundant clear_apm_yml_cache() + second from_apm_yml()
3 details=str(exc) field type mismatch Dismissed -- diff artifact. Actual code uses message="Cannot parse apm.yml: ..." correctly, not details=
4 logger vs _logger naming conflict Dismissed -- diff artifact. Only _logger exists in the codebase
5 DevX framing of #1 Fixed -- same root cause as #1
6 CHANGELOG lacks Migration block Fixed -- added **Migration:** block with detection command (python -c "import yaml; ..." / apm audit --ci)
8 _load_raw_apm_yml None -> permissive on TOCTOU Fixed (docstring) -- enhanced docstring explaining ordering (runs after from_apm_yml succeeds), defence-in-depth contract, and that primary security gate is the earlier parse
9 Same as #6 (growth framing) Fixed -- same root cause
10 Error message lacks remediation hint Fixed -- all manifest-parse messages now include "-- fix the YAML syntax error in apm.yml and re-run."

Deferred (1/10)

# Finding Resolution
7 Missing apm.yml silently skips all checks Deferred to #1056 -- different threat model (deletion vs malformation). Missing apm.yml is a legitimate condition for non-APM projects. Scope expansion beyond #936. Created tracking issue with mitigation options.

Nits addressed

  • Added _logger + logging.getLogger(__name__) to ci_checks.py with debug logging for lockfile read exceptions (was bare except Exception: pass)
  • _load_raw_apm_yml docstring enhanced with ordering + defence-in-depth explanation
  • governance-guide.md: added (org policy file) parenthetical to distinguish from manifest-parse row
  • security.md: added remediation callout
  • CHANGELOG: added severity framing (severity: medium -- policy bypass)

Not addressed (rationale)

  • Log level WARNING vs ERROR: WARNING is correct -- the check itself is what fails CI (exit 1), the log is supplementary context for debugging
  • %-formatting vs f-string: existing codebase uses both; not a consistency issue within this PR
  • Relativizing filesystem paths in messages: the path comes from APMPackage.from_apm_yml exception text, which we don't control; would require wrapping all exception messages
  • FileNotFoundError -> lockfile-exists passed=True on fresh repos: this is intentional design -- no manifest means nothing to audit

@sergio-sisternes-epam sergio-sisternes-epam added this pull request to the merge queue Apr 29, 2026
Merged via the queue into main with commit 4e4df8d Apr 29, 2026
11 checks passed
@sergio-sisternes-epam sergio-sisternes-epam deleted the fix/936-policy-malformed-yaml branch April 29, 2026 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: policy_checks.py silently bypasses enforcement on malformed YAML

3 participants