fix(security): preserve leading BOM in strip_dangerous#372
Conversation
strip_dangerous() was stripping all BOM characters (U+FEFF), including a leading BOM at position 0. This contradicts its documented contract which states that info-level characters are preserved — and scan_text() classifies a leading BOM as info severity since it is standard practice for UTF-8 files. The fix checks whether the BOM is at position 0 before deciding to strip it. Mid-file BOMs (warning-level) are still stripped as before. Updated the corresponding test to assert the leading BOM is preserved.
|
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Pull request overview
This PR fixes ContentScanner.strip_dangerous() to preserve a legitimate leading BOM (U+FEFF) while still stripping mid-file BOMs, aligning behavior with the documented “info-level characters are preserved” contract.
Changes:
- Update
strip_dangerous()to keep BOM at index 0 and remove BOMs elsewhere. - Update the unit test to assert a leading BOM remains unchanged.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/apm_cli/security/content_scanner.py |
Adjust BOM handling in strip_dangerous() to preserve the leading BOM only. |
tests/unit/test_content_scanner.py |
Update test to validate preservation of a leading BOM. |
Restructure the BOM handling branch so the mid-file strip path uses an early continue and the leading-BOM path falls through to the common append, making the two behaviors self-documenting. Co-authored-by: Copilot <copilot@github.com>
danielmeppiel
left a comment
There was a problem hiding this comment.
Clean, well-scoped fix. strip_dangerous() now correctly preserves leading BOM (info-level) while still stripping mid-file BOM (warning-level), aligning with the documented security model contract.
Please ensure CI tests pass before merging. Thanks for the contribution! 🎉
|
@danielmeppiel All CI checks have passed, but the Integration Tests (PR) required check is stuck at "Waiting for status to be reported" and was never triggered. Could you re-run it or advise on how to trigger it? Thanks! |
|
It's a workflow gated on approval - it's running now, will merge. Thanks a lot for your contribution! |
Summary
ContentScanner.strip_dangerous()was stripping all BOM characters (U+FEFF), including a leading BOM at position 0. This contradicts its documented contract:scan_text()correctly classifies a leading BOM as"info"severity (standard practice for UTF-8 files), butstrip_dangerous()unconditionally stripped it anyway. Whenapm audit --stripruns on a file with a legitimate leading BOM, the BOM would be incorrectly removed.Fix
Files changed
src/apm_cli/security/content_scanner.py— 3-line fix instrip_dangerous()tests/unit/test_content_scanner.py— updated test to match corrected behaviorTest plan