4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added

 - Content security scanning: `apm audit` command with `--file`, `--strip`; install-time pre-deployment gate that blocks critical hidden Unicode characters (override with `--force`); advisory scanning in `compile` and `pack` (#313)
+- Detect hidden Unicode characters: variation selectors (Glassworm attack vector), invisible math operators, bidi marks, annotation markers, and deprecated formatting characters in `apm audit` and install-time scanning — by @raye-deng ([#320](https://github.com/microsoft/apm/issues/320))
+- `apm audit --strip` now removes all dangerous characters (critical + warning) while preserving legitimate content like emoji; improved help text and strip feedback messages
+- Context-aware ZWJ detection — zero-width joiners inside emoji sequences (e.g. 👨‍👩‍👧) are recognized as info-level and preserved by `--strip`
+- `apm audit --strip --dry-run` preview mode — shows per-file counts of strippable characters without modifying files
 - Native Cursor IDE integration — `apm install` deploys primitives to `.cursor/` when the directory exists: instructions→rules (`.mdc`), agents, skills, hooks (`hooks.json`), and MCP (`mcp.json`)
 - Native OpenCode integration — `apm install` deploys primitives to `.opencode/` when the directory exists: agents, commands (from prompts), skills, and MCP (`opencode.json`) — inspired by @timvw (#257, #306)
 - `TargetProfile` data layer (`src/apm_cli/integration/targets.py`) — data-driven target definitions for scalable multi-target architecture
9 changes: 5 additions & 4 deletions docs/src/content/docs/enterprise/governance.md
@@ -112,16 +112,17 @@ APM scans for hidden Unicode characters that can embed invisible instructions in
 apm audit # Scan all installed packages
 apm audit <package> # Scan a specific package
 apm audit --file .cursorrules # Scan any file (even non-APM-managed)
-apm audit --strip # Remove non-critical characters
+apm audit --strip # Remove hidden characters (preserves emoji)
+apm audit --strip --dry-run # Preview what --strip would remove
 ```

 ### Exit codes

 | Code | Meaning |
 |------|---------|
-| 0 | Clean — no findings, or info-only |
-| 1 | Critical findings — tag characters or bidi overrides detected |
-| 2 | Warnings only — zero-width characters or mid-file BOM |
+| 0 | Clean — no findings, info-only, or successful strip |
+| 1 | Critical findings — tag characters, bidi overrides, or variation selectors 17–256 |
+| 2 | Warnings only — zero-width characters, bidi marks, or other suspicious content |

 ### The `--file` escape hatch

16 changes: 12 additions & 4 deletions docs/src/content/docs/enterprise/security.md
@@ -65,15 +65,22 @@ APM does not use a package registry. Dependencies are specified as git repositor

 ### The threat

-Researchers have found hidden Unicode characters embedded in popular shared rules files. Tag characters (U+E0001–E007F) map 1:1 to invisible ASCII. Bidirectional overrides can reorder visible text. Zero-width joiners create invisible gaps. LLMs tokenize all of these individually, meaning models process instructions that developers cannot see on screen.
+Researchers have found hidden Unicode characters embedded in popular shared rules files. Tag characters (U+E0001–E007F) map 1:1 to invisible ASCII. Bidirectional overrides can reorder visible text. Zero-width joiners create invisible gaps. Variation selectors attach to visible characters, embedding invisible payload bytes that AST-based tools cannot detect. The Glassworm campaign (2026) exploited this mechanism to compromise repositories and VS Code extensions. LLMs tokenize all of these individually, meaning models process instructions that developers cannot see on screen.

 ### What APM detects

 | Severity | Characters | Risk |
 |----------|-----------|------|
 | Critical | Tag characters (U+E0001–E007F), bidi overrides (U+202A–E, U+2066–9) | Hidden instruction embedding. Zero legitimate use in prompt files. |
-| Warning | Zero-width spaces/joiners (U+200B–D), mid-file BOM (U+FEFF) | Common copy-paste debris, but can hide content. |
+| Critical | Variation selectors 17–256 (U+E0100–E01EF) | Glassworm attack vector — invisible payload encoding. Zero legitimate use in prompt files. |
+| Warning | Zero-width spaces/joiners (U+200B–D), mid-file BOM (U+FEFF) | Common copy-paste debris, but can hide content. ZWJ inside emoji sequences is downgraded to info. |
+| Warning | Variation selectors 1–15 (U+FE00–FE0E) | CJK typography / text presentation selectors. Uncommon in prompt files. |
+| Warning | Bidi marks (U+200E–F, U+061C) | Invisible directional marks. No legitimate use in prompt files. |
+| Warning | Invisible operators (U+2061–4) | Zero-width math operators. No legitimate use in prompt files. |
+| Warning | Annotation markers (U+FFF9–B) | Interlinear annotation delimiters that can hide text. |
+| Warning | Deprecated formatting (U+206A–F) | Deprecated since Unicode 3.0, invisible. |
 | Info | Non-breaking spaces (U+00A0), unusual whitespace (U+2000–200A) | Mostly harmless, flagged for awareness. |
+| Info | Emoji presentation selector (U+FE0F) | Common with emoji, informational only. |

 ### Pre-deployment gate

@@ -102,7 +109,8 @@ Content scanning extends beyond install:
 ```bash
 apm audit # Scan all installed packages
 apm audit --file .cursorrules # Scan any file
-apm audit --strip # Remove non-critical characters
+apm audit --strip # Remove hidden characters (preserves emoji)
+apm audit --strip --dry-run # Preview what --strip would remove
 ```

 The `--file` flag is useful for inspecting files obtained outside APM — downloaded rules files, copy-pasted instructions, or files from pull requests.
@@ -118,7 +126,7 @@ Content scanning detects hidden Unicode characters. It does not detect:
 - Semantic manipulation (subtly misleading but syntactically normal text)
 - Binary payload embedding

-`--strip` removes non-critical characters from deployed copies. It does not modify the source package — the next `apm install` restores them. For persistent remediation, fix the upstream package or pin to a clean commit.
+`--strip` removes dangerous and suspicious characters (critical and warning) from deployed copies while preserving legitimate content like emoji and whitespace. Zero-width joiners inside emoji sequences (e.g. 👨‍👩‍👧) are recognized and preserved. Use `--strip --dry-run` to preview what would be removed before modifying files. Strip does not modify the source package — the next `apm install` restores them. For persistent remediation, fix the upstream package or pin to a clean commit.

 ### Planned hardening

20 changes: 12 additions & 8 deletions docs/src/content/docs/reference/cli-commands.md
@@ -343,7 +343,8 @@ apm audit [PACKAGE] [OPTIONS]

 **Options:**
 - `--file PATH` - Scan an arbitrary file instead of installed packages
-- `--strip` - Strip non-critical hidden characters (zero-width spaces, unusual whitespace). Critical findings are preserved for manual review.
+- `--strip` - Remove dangerous characters (critical + warning severity) while preserving info-level content like emoji. ZWJ inside emoji sequences is preserved.
+- `--dry-run` - Preview what `--strip` would remove without modifying files (requires `--strip`)
 - `-v, --verbose` - Show info-level findings and file details

 **Examples:**
@@ -357,24 +358,27 @@ apm audit https://github.com/owner/repo
 # Scan any file (even non-APM-managed)
 apm audit --file .cursorrules

-# Auto-strip zero-width characters
+# Remove dangerous characters (preserves emoji)
 apm audit --strip

+# Preview what --strip would remove
+apm audit --strip --dry-run
+
 # Verbose output with info-level findings
 apm audit --verbose
 ```

 **Exit codes:**
 | Code | Meaning |
 |------|---------|
-| 0 | Clean — no findings, or info-only |
-| 1 | Critical findings — tag characters or bidi overrides detected |
-| 2 | Warnings only — zero-width characters or mid-file BOM |
+| 0 | Clean — no findings, info-only, or successful strip |
+| 1 | Critical findings — tag characters, bidi overrides, or variation selectors 17–256 |
+| 2 | Warnings only — zero-width characters, bidi marks, or other suspicious content |

 **What it detects:**
-- **Critical**: Unicode tag characters (U+E0001–E007F), bidirectional overrides — these have zero legitimate use in prompt files
-- **Warning**: Zero-width spaces/joiners, mid-file BOM — common copy-paste debris
-- **Info**: Non-breaking spaces, unusual whitespace — mostly harmless
+- **Critical**: Tag characters (U+E0001–E007F), bidi overrides (U+202A–E, U+2066–9), variation selectors 17–256 (U+E0100–E01EF, Glassworm attack vector)
+- **Warning**: Zero-width spaces/joiners (U+200B–D), variation selectors 1–15 (U+FE00–FE0E), bidi marks (U+200E–F, U+061C), invisible operators (U+2061–4), annotation markers (U+FFF9–B), deprecated formatting (U+206A–F), soft hyphen (U+00AD), mid-file BOM
+- **Info**: Non-breaking spaces, unusual whitespace, emoji presentation selector (U+FE0F). ZWJ between emoji characters is context-downgraded to info.

 ### `apm pack` - Create a portable bundle

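Reviewer note: the severity tiers in the "What it detects" list amount to a lookup from code point to severity. The following is a minimal sketch of that lookup, assuming only the ranges quoted in the documentation changes above. `classify_char` and `scan_text` are hypothetical names for illustration, not the `ContentScanner` API, and the context-aware ZWJ downgrade is deliberately left out.

```python
from typing import List, Optional, Tuple

# Inclusive code point ranges copied from the docs in this PR.
CRITICAL = [
    (0xE0001, 0xE007F),  # tag characters
    (0x202A, 0x202E),    # bidi overrides, first range
    (0x2066, 0x2069),    # bidi overrides, second range
    (0xE0100, 0xE01EF),  # variation selectors 17-256
]
WARNING = [
    (0x200B, 0x200D),  # zero-width space, non-joiner, joiner
    (0xFE00, 0xFE0E),  # variation selectors 1-15
    (0x200E, 0x200F),  # bidi marks
    (0x061C, 0x061C),  # Arabic letter mark
    (0x2061, 0x2064),  # invisible math operators
    (0xFFF9, 0xFFFB),  # interlinear annotation markers
    (0x206A, 0x206F),  # deprecated formatting characters
    (0x00AD, 0x00AD),  # soft hyphen
]
INFO = [
    (0x00A0, 0x00A0),  # non-breaking space
    (0x2000, 0x200A),  # unusual whitespace
    (0xFE0F, 0xFE0F),  # emoji presentation selector
]
_TIERS = [("critical", CRITICAL), ("warning", WARNING), ("info", INFO)]
BOM = "\ufeff"


def classify_char(ch: str) -> Optional[str]:
    """Return 'critical', 'warning', 'info', or None for an ordinary character."""
    cp = ord(ch)
    for severity, ranges in _TIERS:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return severity
    return None


def scan_text(text: str) -> List[Tuple[int, str, str]]:
    """List (index, code point label, severity) for every flagged character."""
    findings = []
    for index, ch in enumerate(text):
        if ch == BOM:
            # Assumption: a BOM is only suspicious when it is not the first character.
            severity = "warning" if index > 0 else None
        else:
            severity = classify_char(ch)
        if severity:
            findings.append((index, f"U+{ord(ch):04X}", severity))
    return findings
```

For example, `scan_text("a\u202eb")` reports a single critical finding at index 1. A table-driven layout like this keeps the documentation and the detector easy to compare range by range.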
139 changes: 111 additions & 28 deletions src/apm_cli/commands/audit.py
@@ -7,8 +7,8 @@

 Exit codes:
 0 — clean (no findings, or info-only)
-1 — critical findings (tag characters, bidi overrides)
-2 — warnings (zero-width chars, no critical)
+1 — critical findings detected
+2 — warnings only (no critical)
 """

 import sys
@@ -247,18 +247,18 @@ def _render_summary(
             color="red",
             bold=True,
         )
-        _rich_info(" Critical findings require manual review")
         _rich_info(" These characters may embed invisible instructions")
+        _rich_info(" Review file contents, then run 'apm audit --strip' to remove")
     elif warning > 0:
         _rich_warning(
             f"{STATUS_SYMBOLS['warning']} {warning} warning(s) in "
-            f"{affected} file(s) — zero-width or hidden characters"
+            f"{affected} file(s) — hidden characters detected"
         )
-        _rich_info(" Run 'apm audit --strip' to remove non-critical characters")
+        _rich_info(" Run 'apm audit --strip' to remove hidden characters")
     elif info > 0:
         _rich_info(
             f"{STATUS_SYMBOLS['info']} {info} info-level finding(s) in "
-            f"{affected} file(s) — unusual whitespace (use --verbose to see)"
+            f"{affected} file(s) — unusual characters (use --verbose to see)"
         )
     else:
         _rich_success(
@@ -276,17 +276,14 @@ def _apply_strip(
     findings_by_file: Dict[str, List[ScanFinding]],
     project_root: Path,
 ) -> int:
-    """Strip non-critical characters from affected files.
+    """Strip dangerous and suspicious characters from affected files.

     Only modifies files that resolve within *project_root* (for lockfile
     paths) or that are given as absolute paths (for ``--file`` mode).
     Returns number of files modified.
     """
     modified = 0
     for rel_path, findings in findings_by_file.items():
-        # Skip files with only critical findings (require manual review)
-        if all(f.severity == "critical" for f in findings):
-            continue

         abs_path = Path(rel_path)
         if not abs_path.is_absolute():
@@ -303,7 +300,7 @@ def _apply_strip(

         try:
             original = abs_path.read_text(encoding="utf-8")
-            cleaned = ContentScanner.strip_non_critical(original)
+            cleaned = ContentScanner.strip_dangerous(original)
             if cleaned != original:
                 abs_path.write_text(cleaned, encoding="utf-8")
                 modified += 1
@@ -314,6 +311,79 @@ def _apply_strip(
     return modified


+def _preview_strip(
+    findings_by_file: Dict[str, List[ScanFinding]],
+) -> int:
+    """Preview what --strip would remove without modifying files.
+
+    Shows a summary of strippable characters per file.
+    Returns the number of files that would be modified.
+    """
+    console = _get_console()
+    affected = 0
+
+    for rel_path, findings in findings_by_file.items():
+        # Only critical+warning chars are stripped
+        strippable = [f for f in findings if f.severity in ("critical", "warning")]
+        if not strippable:
+            continue
+        affected += 1
+
+    if affected == 0:
+        _rich_info("Nothing to clean — no strippable characters found")
+        return 0
+
+    _rich_echo("")
+    _rich_info(f"Dry run — the following would be removed by --strip:", symbol="search")
+    _rich_echo("")
+
+    if console:
+        try:
+            from rich.table import Table
+
+            table = Table(
+                show_header=True,
+                header_style="bold cyan",
+            )
+            table.add_column("File", style="white")
+            table.add_column("Critical", style="bold red", justify="right", width=10)
+            table.add_column("Warning", style="yellow", justify="right", width=10)
+            table.add_column("Total", style="bold white", justify="right", width=10)
+
+            for rel_path, findings in findings_by_file.items():
+                strippable = [f for f in findings if f.severity in ("critical", "warning")]
+                if not strippable:
+                    continue
+                crit = sum(1 for f in strippable if f.severity == "critical")
+                warn = sum(1 for f in strippable if f.severity == "warning")
+                table.add_row(
+                    rel_path,
+                    str(crit) if crit else "-",
+                    str(warn) if warn else "-",
+                    str(len(strippable)),
+                )
+
+            console.print(table)
+        except (ImportError, Exception):
+            # Fallback: plain text
+            for rel_path, findings in findings_by_file.items():
+                strippable = [f for f in findings if f.severity in ("critical", "warning")]
+                if not strippable:
+                    continue
+                _rich_echo(f" {rel_path}: {len(strippable)} character(s)", color="white")
+    else:
+        for rel_path, findings in findings_by_file.items():
+            strippable = [f for f in findings if f.severity in ("critical", "warning")]
+            if not strippable:
+                continue
+            _rich_echo(f" {rel_path}: {len(strippable)} character(s)", color="white")
+
+    _rich_echo("")
+    _rich_info(f"{affected} file(s) would be modified")
+    _rich_info("Run 'apm audit --strip' to apply")
+    return affected
+
+
 # ── Command ────────────────────────────────────────────────────────


@@ -328,29 +398,39 @@ def _apply_strip(
 @click.option(
     "--strip",
     is_flag=True,
-    help="Strip non-critical hidden characters (zero-width spaces, unusual whitespace)",
+    help="Remove hidden characters from scanned files (preserves emoji and whitespace)",
 )
 @click.option(
     "--verbose",
     "-v",
     is_flag=True,
-    help="Show info-level findings and file details",
+    help="Show all findings including harmless ones",
 )
+@click.option(
+    "--dry-run",
+    is_flag=True,
+    help="Preview what --strip would remove without modifying files",
+)
 @click.pass_context
-def audit(ctx, package, file_path, strip, verbose):
+def audit(ctx, package, file_path, strip, verbose, dry_run):
     """Scan deployed prompt files for hidden Unicode characters.

     Detects invisible characters that could embed hidden instructions in
-    prompt, instruction, and rules files. Critical findings (tag characters,
-    bidi overrides) require manual review. Warnings (zero-width chars) can
-    be removed with --strip.
+    prompt, instruction, and rules files. Dangerous and suspicious
+    characters can be removed with --strip.

+    \b
+    Exit codes:
+    0 Clean, info-only findings, or successful strip
+    1 Critical findings detected (hidden instructions)
+    2 Warning-only findings (suspicious but not critical)
+
     \b
     Examples:
     apm audit # Scan all installed packages
     apm audit my-package # Scan a specific package
     apm audit --file .cursorrules # Scan any file
-    apm audit --strip # Remove non-critical chars
+    apm audit --strip # Remove dangerous/suspicious chars
     """
     project_root = Path.cwd()

@@ -386,22 +466,25 @@ def audit(ctx, package, file_path, strip, verbose):
         _rich_info("No deployed files found in apm.lock.yaml")
         sys.exit(0)

+    # -- Warn if --dry-run used without --strip --
+    if dry_run and not strip:
+        _rich_info("--dry-run only works with --strip (e.g. apm audit --strip --dry-run)")
+
     # -- Strip mode --
-    if strip and findings_by_file:
-        has_critical = any(
-            ContentScanner.has_critical(f) for f in findings_by_file.values()
-        )
+    if strip:
+        if not findings_by_file:
+            _rich_info("Nothing to clean — no hidden characters found")
+            sys.exit(0)
+        if dry_run:
+            _preview_strip(findings_by_file)
+            sys.exit(0)
         modified = _apply_strip(findings_by_file, project_root)
         if modified > 0:
             _rich_success(
                 f"{STATUS_SYMBOLS['success']} Cleaned {modified} file(s)"
             )
-        if has_critical:
-            _rich_warning(
-                "Critical findings were preserved — they require manual review"
-            )
-            _rich_info(" Inspect flagged files and remove tag/bidi characters")
-            sys.exit(1)
+        else:
+            _rich_info("Nothing to clean — no strippable characters found")
         sys.exit(0)

     # -- Display findings --
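Reviewer note: the body of `ContentScanner.strip_dangerous` is not part of this diff, so the following is only a sketch of the behavior the docs describe: drop critical and warning characters, but keep a zero-width joiner that sits between two emoji. The emoji test is a rough block-range heuristic chosen for illustration, `strip_hidden_sketch` is a hypothetical name, and mid-file BOM handling is omitted.

```python
ZWJ = "\u200d"
VS16 = "\ufe0f"  # emoji presentation selector, info-level per the docs

# Critical and warning ranges quoted in the docs of this PR (inclusive).
DANGEROUS = [
    (0xE0001, 0xE007F), (0x202A, 0x202E), (0x2066, 0x2069), (0xE0100, 0xE01EF),
    (0x200B, 0x200D), (0xFE00, 0xFE0E), (0x200E, 0x200F), (0x061C, 0x061C),
    (0x2061, 0x2064), (0xFFF9, 0xFFFB), (0x206A, 0x206F), (0x00AD, 0x00AD),
]


def _is_dangerous(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DANGEROUS)


def _looks_like_emoji(ch: str) -> bool:
    """Rough heuristic (assumption): only the most common emoji blocks."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def _zwj_joins_emoji(text: str, i: int) -> bool:
    """True when the ZWJ at index i has an emoji on both sides."""
    prev = i - 1
    if prev >= 0 and text[prev] == VS16:
        prev -= 1  # skip a presentation selector, as in U+2764 U+FE0F U+200D
    has_prev = prev >= 0 and _looks_like_emoji(text[prev])
    has_next = i + 1 < len(text) and _looks_like_emoji(text[i + 1])
    return has_prev and has_next


def strip_hidden_sketch(text: str) -> str:
    """Remove critical and warning characters, keeping ZWJ inside emoji sequences."""
    kept = []
    for i, ch in enumerate(text):
        if ch == ZWJ and _zwj_joins_emoji(text, i):
            kept.append(ch)
        elif not _is_dangerous(ch):
            kept.append(ch)
    return "".join(kept)
```

With this sketch a family emoji built from three code points joined by ZWJ passes through unchanged, while `"a\u200bb\u200dc"` comes back as `"abc"`. A production scanner would likely use a proper emoji property table instead of fixed block ranges.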
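Reviewer note: `_preview_strip` rebuilds the same `strippable` filter in four separate loops. One possible simplification is to tally each file once and reuse the result for both the table and the plain-text fallback. The sketch below also derives the scan exit code described in the docstring (1 for any critical finding, 2 for warnings only, 0 otherwise). `Finding` is a hypothetical stand-in for `ScanFinding`, and both function names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Finding:
    """Hypothetical stand-in for ScanFinding; only `severity` is used here."""
    severity: str


def strippable_counts(
    findings_by_file: Dict[str, List[Finding]],
) -> Dict[str, Tuple[int, int]]:
    """Map each affected file to (critical, warning) counts; info-only files are skipped."""
    counts = {}
    for path, findings in findings_by_file.items():
        tally = Counter(f.severity for f in findings)
        critical, warning = tally["critical"], tally["warning"]
        if critical or warning:
            counts[path] = (critical, warning)
    return counts


def scan_exit_code(findings_by_file: Dict[str, List[Finding]]) -> int:
    """Exit code for a plain scan without --strip, per the documented table."""
    severities = {f.severity for fs in findings_by_file.values() for f in fs}
    if "critical" in severities:
        return 1
    if "warning" in severities:
        return 2
    return 0
```

A renderer could then iterate `strippable_counts(...)` once, and `len()` of the result gives the number of files that would be modified.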