microsoft · danielmeppiel · Mar 16, 2026
@@ -11,6 +11,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 
 - Content security scanning: `apm audit` command with `--file`, `--strip`; install-time pre-deployment gate that blocks critical hidden Unicode characters (override with `--force`); advisory scanning in `compile` and `pack` (#313)
+- Detect hidden Unicode characters: variation selectors (Glassworm attack vector), invisible math operators, bidi marks, annotation markers, and deprecated formatting characters in `apm audit` and install-time scanning — by @raye-deng ([#320](https://github.com/microsoft/apm/issues/320))
+- `apm audit --strip` now removes all dangerous characters (critical + warning) while preserving legitimate content like emoji; improved help text and strip feedback messages
+- Context-aware ZWJ detection — zero-width joiners inside emoji sequences (e.g. 👨‍👩‍👧) are recognized as info-level and preserved by `--strip`
+- `apm audit --strip --dry-run` preview mode — shows per-file counts of strippable characters without modifying files
 - Native Cursor IDE integration — `apm install` deploys primitives to `.cursor/` when the directory exists: instructions→rules (`.mdc`), agents, skills, hooks (`hooks.json`), and MCP (`mcp.json`)
 - Native OpenCode integration — `apm install` deploys primitives to `.opencode/` when the directory exists: agents, commands (from prompts), skills, and MCP (`opencode.json`) — inspired by @timvw (#257, #306)
 - `TargetProfile` data layer (`src/apm_cli/integration/targets.py`) — data-driven target definitions for scalable multi-target architecture

@@ -112,16 +112,17 @@ APM scans for hidden Unicode characters that can embed invisible instructions in
 apm audit                              # Scan all installed packages
 apm audit <package>                    # Scan a specific package
 apm audit --file .cursorrules          # Scan any file (even non-APM-managed)
-apm audit --strip                      # Remove non-critical characters
+apm audit --strip                      # Remove hidden characters (preserves emoji)
+apm audit --strip --dry-run            # Preview what --strip would remove
 ```
 
 ### Exit codes
 
 | Code | Meaning |
 |------|---------|
-| 0 | Clean — no findings, or info-only |
-| 1 | Critical findings — tag characters or bidi overrides detected |
-| 2 | Warnings only — zero-width characters or mid-file BOM |
+| 0 | Clean — no findings, info-only, or successful strip |
+| 1 | Critical findings — tag characters, bidi overrides, or variation selectors 17–256 |
+| 2 | Warnings only — zero-width characters, bidi marks, or other suspicious content |
 
 ### The `--file` escape hatch
 

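The exit-code table in the hunk above can be read as a simple precedence rule: any critical finding wins, then any warning, otherwise the scan is clean. The sketch below illustrates that rule only. `exit_code_for` is a hypothetical helper, not a function from the APM codebase, and it does not model the separate "successful strip" path, which the table documents as exiting with 0.

```python
from typing import Iterable


def exit_code_for(severities: Iterable[str]) -> int:
    """Map finding severities to the documented scan exit codes.

    Hypothetical helper for illustration; APM's real logic may differ.
    """
    seen = set(severities)
    if "critical" in seen:
        return 1  # critical findings take precedence over everything else
    if "warning" in seen:
        return 2  # warnings only, no critical
    return 0      # no findings, or info-only


if __name__ == "__main__":
    print(exit_code_for(["info", "warning"]))
```

In a CI script, a wrapper could treat exit code 2 as non-blocking and exit code 1 as a hard failure, if that matches the team's policy.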
@@ -65,15 +65,22 @@ APM does not use a package registry. Dependencies are specified as git repositor
 
 ### The threat
 
-Researchers have found hidden Unicode characters embedded in popular shared rules files. Tag characters (U+E0001–E007F) map 1:1 to invisible ASCII. Bidirectional overrides can reorder visible text. Zero-width joiners create invisible gaps. LLMs tokenize all of these individually, meaning models process instructions that developers cannot see on screen.
+Researchers have found hidden Unicode characters embedded in popular shared rules files. Tag characters (U+E0001–E007F) map 1:1 to invisible ASCII. Bidirectional overrides can reorder visible text. Zero-width joiners create invisible gaps. Variation selectors attach to visible characters, embedding invisible payload bytes that AST-based tools cannot detect. The Glassworm campaign (2026) exploited this mechanism to compromise repositories and VS Code extensions. LLMs tokenize all of these individually, meaning models process instructions that developers cannot see on screen.
 
 ### What APM detects
 
 | Severity | Characters | Risk |
 |----------|-----------|------|
 | Critical | Tag characters (U+E0001–E007F), bidi overrides (U+202A–E, U+2066–9) | Hidden instruction embedding. Zero legitimate use in prompt files. |
-| Warning | Zero-width spaces/joiners (U+200B–D), mid-file BOM (U+FEFF) | Common copy-paste debris, but can hide content. |
+| Critical | Variation selectors 17–256 (U+E0100–E01EF) | Glassworm attack vector — invisible payload encoding. Zero legitimate use in prompt files. |
+| Warning | Zero-width spaces/joiners (U+200B–D), mid-file BOM (U+FEFF) | Common copy-paste debris, but can hide content. ZWJ inside emoji sequences is downgraded to info. |
+| Warning | Variation selectors 1–15 (U+FE00–FE0E) | CJK typography / text presentation selectors. Uncommon in prompt files. |
+| Warning | Bidi marks (U+200E–F, U+061C) | Invisible directional marks. Rarely legitimate in prompt files outside right-to-left text. |
+| Warning | Invisible operators (U+2061–4) | Zero-width math operators. No legitimate use in prompt files. |
+| Warning | Annotation markers (U+FFF9–B) | Interlinear annotation delimiters that can hide text. |
+| Warning | Deprecated formatting (U+206A–F) | Deprecated since Unicode 3.0, invisible. |
 | Info | Non-breaking spaces (U+00A0), unusual whitespace (U+2000–200A) | Mostly harmless, flagged for awareness. |
+| Info | Emoji presentation selector (U+FE0F) | Common with emoji, informational only. |
 
 ### Pre-deployment gate
 
@@ -102,7 +109,8 @@ Content scanning extends beyond install:
 ```bash
 apm audit                        # Scan all installed packages
 apm audit --file .cursorrules    # Scan any file
-apm audit --strip                # Remove non-critical characters
+apm audit --strip                # Remove hidden characters (preserves emoji)
+apm audit --strip --dry-run      # Preview what --strip would remove
 ```
 
 The `--file` flag is useful for inspecting files obtained outside APM — downloaded rules files, copy-pasted instructions, or files from pull requests.
@@ -118,7 +126,7 @@ Content scanning detects hidden Unicode characters. It does not detect:
 - Semantic manipulation (subtly misleading but syntactically normal text)
 - Binary payload embedding
 
-`--strip` removes non-critical characters from deployed copies. It does not modify the source package — the next `apm install` restores them. For persistent remediation, fix the upstream package or pin to a clean commit.
+`--strip` removes dangerous and suspicious characters (critical and warning) from deployed copies while preserving legitimate content like emoji and whitespace. Zero-width joiners inside emoji sequences (e.g. 👨‍👩‍👧) are recognized and preserved. Use `--strip --dry-run` to preview what would be removed before modifying files. `--strip` does not modify the source package, so the next `apm install` restores the removed characters. For persistent remediation, fix the upstream package or pin to a clean commit.
 
 ### Planned hardening
 

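As a rough illustration of the severity table in the security hunk above, the sketch below classifies a single code point by range lookup. The ranges are copied from that table. The names `CRITICAL`, `WARNING`, `INFO`, and `classify` are assumptions made for this example, not the `ContentScanner` API. Two documented behaviors are deliberately left out: the BOM is flagged regardless of position (the table only flags a mid-file BOM), and ZWJ inside emoji sequences is not downgraded to info.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]

# Ranges transcribed from the severity table; hypothetical constants.
CRITICAL: List[Range] = [
    (0xE0001, 0xE007F),  # tag characters
    (0x202A, 0x202E),    # bidi overrides
    (0x2066, 0x2069),    # bidi isolates
    (0xE0100, 0xE01EF),  # variation selectors 17-256
]
WARNING: List[Range] = [
    (0x200B, 0x200D),    # zero-width space / non-joiner / joiner
    (0xFEFF, 0xFEFF),    # BOM (position check omitted in this sketch)
    (0xFE00, 0xFE0E),    # variation selectors 1-15
    (0x200E, 0x200F),    # LRM / RLM
    (0x061C, 0x061C),    # Arabic letter mark
    (0x2061, 0x2064),    # invisible math operators
    (0xFFF9, 0xFFFB),    # interlinear annotation markers
    (0x206A, 0x206F),    # deprecated formatting characters
]
INFO: List[Range] = [
    (0x00A0, 0x00A0),    # non-breaking space
    (0x2000, 0x200A),    # unusual whitespace
    (0xFE0F, 0xFE0F),    # emoji presentation selector
]


def classify(ch: str) -> Optional[str]:
    """Return 'critical', 'warning', 'info', or None for one character."""
    cp = ord(ch)
    for severity, ranges in (("critical", CRITICAL), ("warning", WARNING), ("info", INFO)):
        if any(lo <= cp <= hi for lo, hi in ranges):
            return severity
    return None


if __name__ == "__main__":
    print(classify("\u202e"))
```

Checking critical ranges first means a character can never be reported at a lower severity than the table assigns it, even if ranges were to overlap later.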
@@ -343,7 +343,8 @@ apm audit [PACKAGE] [OPTIONS]
 
 **Options:**
 - `--file PATH` - Scan an arbitrary file instead of installed packages
-- `--strip` - Strip non-critical hidden characters (zero-width spaces, unusual whitespace). Critical findings are preserved for manual review.
+- `--strip` - Remove dangerous characters (critical + warning severity) while preserving info-level content like emoji. ZWJ inside emoji sequences is preserved.
+- `--dry-run` - Preview what `--strip` would remove without modifying files (requires `--strip`)
 - `-v, --verbose` - Show info-level findings and file details
 
 **Examples:**
@@ -357,24 +358,27 @@ apm audit https://github.com/owner/repo
 # Scan any file (even non-APM-managed)
 apm audit --file .cursorrules
 
-# Auto-strip zero-width characters
+# Remove dangerous characters (preserves emoji)
 apm audit --strip
 
+# Preview what --strip would remove
+apm audit --strip --dry-run
+
 # Verbose output with info-level findings
 apm audit --verbose
 ```
 
 **Exit codes:**
 | Code | Meaning |
 |------|---------|
-| 0 | Clean — no findings, or info-only |
-| 1 | Critical findings — tag characters or bidi overrides detected |
-| 2 | Warnings only — zero-width characters or mid-file BOM |
+| 0 | Clean — no findings, info-only, or successful strip |
+| 1 | Critical findings — tag characters, bidi overrides, or variation selectors 17–256 |
+| 2 | Warnings only — zero-width characters, bidi marks, or other suspicious content |
 
 **What it detects:**
-- **Critical**: Unicode tag characters (U+E0001–E007F), bidirectional overrides — these have zero legitimate use in prompt files
-- **Warning**: Zero-width spaces/joiners, mid-file BOM — common copy-paste debris
-- **Info**: Non-breaking spaces, unusual whitespace — mostly harmless
+- **Critical**: Tag characters (U+E0001–E007F), bidi overrides (U+202A–E, U+2066–9), variation selectors 17–256 (U+E0100–E01EF, Glassworm attack vector)
+- **Warning**: Zero-width spaces/joiners (U+200B–D), variation selectors 1–15 (U+FE00–FE0E), bidi marks (U+200E–F, U+061C), invisible operators (U+2061–4), annotation markers (U+FFF9–B), deprecated formatting (U+206A–F), soft hyphen (U+00AD), mid-file BOM
+- **Info**: Non-breaking spaces, unusual whitespace, emoji presentation selector (U+FE0F). ZWJ between emoji characters is context-downgraded to info.
 
 ### `apm pack` - Create a portable bundle
 

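The `--strip` option described above keeps zero-width joiners that sit inside emoji sequences and removes them elsewhere. One way to approximate that context check is sketched below. This is not the implementation of `ContentScanner.strip_dangerous`. The function names are hypothetical, and the emoji test uses two coarse code point ranges as a stated assumption, because the Python standard library has no emoji property lookup.

```python
ZWJ = "\u200d"
VS16 = "\ufe0f"  # emoji presentation selector


def _is_emoji(ch: str) -> bool:
    """Coarse emoji test (assumption): pictograph and symbol blocks only."""
    cp = ord(ch)
    return 0x1F000 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def strip_zwj_outside_emoji(text: str) -> str:
    """Drop ZWJ unless it joins two emoji; all other characters are kept."""
    out = []
    for i, ch in enumerate(text):
        if ch == ZWJ:
            # Look back past an optional presentation selector (e.g. U+2764 U+FE0F).
            j = i - 1
            if j >= 0 and text[j] == VS16:
                j -= 1
            prev_ok = j >= 0 and _is_emoji(text[j])
            next_ok = i + 1 < len(text) and _is_emoji(text[i + 1])
            if not (prev_ok and next_ok):
                continue  # ZWJ is not joining two emoji: remove it
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
    print(strip_zwj_outside_emoji(family) == family)
```

A production scanner would likely rely on the Unicode emoji data files rather than fixed ranges, since emoji are scattered across more blocks than the two used here.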
@@ -7,8 +7,8 @@
 
 Exit codes:
     0 — clean (no findings, or info-only)
-    1 — critical findings (tag characters, bidi overrides)
-    2 — warnings (zero-width chars, no critical)
+    1 — critical findings detected
+    2 — warnings only (no critical)
 """
 
 import sys
@@ -247,18 +247,18 @@ def _render_summary(
             color="red",
             bold=True,
         )
-        _rich_info("  Critical findings require manual review")
         _rich_info("  These characters may embed invisible instructions")
+        _rich_info("  Review file contents, then run 'apm audit --strip' to remove")
     elif warning > 0:
         _rich_warning(
             f"{STATUS_SYMBOLS['warning']} {warning} warning(s) in "
-            f"{affected} file(s) — zero-width or hidden characters"
+            f"{affected} file(s) — hidden characters detected"
         )
-        _rich_info("  Run 'apm audit --strip' to remove non-critical characters")
+        _rich_info("  Run 'apm audit --strip' to remove hidden characters")
     elif info > 0:
         _rich_info(
             f"{STATUS_SYMBOLS['info']} {info} info-level finding(s) in "
-            f"{affected} file(s) — unusual whitespace (use --verbose to see)"
+            f"{affected} file(s) — unusual characters (use --verbose to see)"
         )
     else:
         _rich_success(
@@ -276,17 +276,14 @@ def _apply_strip(
     findings_by_file: Dict[str, List[ScanFinding]],
     project_root: Path,
 ) -> int:
-    """Strip non-critical characters from affected files.
+    """Strip dangerous and suspicious characters from affected files.
 
     Only modifies files that resolve within *project_root* (for lockfile
     paths) or that are given as absolute paths (for ``--file`` mode).
     Returns number of files modified.
     """
     modified = 0
     for rel_path, findings in findings_by_file.items():
-        # Skip files with only critical findings (require manual review)
-        if all(f.severity == "critical" for f in findings):
-            continue
 
         abs_path = Path(rel_path)
         if not abs_path.is_absolute():
@@ -303,7 +300,7 @@ def _apply_strip(
 
         try:
             original = abs_path.read_text(encoding="utf-8")
-            cleaned = ContentScanner.strip_non_critical(original)
+            cleaned = ContentScanner.strip_dangerous(original)
             if cleaned != original:
                 abs_path.write_text(cleaned, encoding="utf-8")
                 modified += 1
@@ -314,6 +311,79 @@ def _apply_strip(
     return modified
 
 
+def _preview_strip(
+    findings_by_file: Dict[str, List[ScanFinding]],
+) -> int:
+    """Preview what --strip would remove without modifying files.
+
+    Shows a summary of strippable characters per file.
+    Returns the number of files that would be modified.
+    """
+    console = _get_console()
+    affected = 0
+
+    for rel_path, findings in findings_by_file.items():
+        # Only critical+warning chars are stripped
+        strippable = [f for f in findings if f.severity in ("critical", "warning")]
+        if not strippable:
+            continue
+        affected += 1
+
+    if affected == 0:
+        _rich_info("Nothing to clean — no strippable characters found")
+        return 0
+
+    _rich_echo("")
+    _rich_info("Dry run — the following would be removed by --strip:", symbol="search")
+    _rich_echo("")
+
+    if console:
+        try:
+            from rich.table import Table
+
+            table = Table(
+                show_header=True,
+                header_style="bold cyan",
+            )
+            table.add_column("File", style="white")
+            table.add_column("Critical", style="bold red", justify="right", width=10)
+            table.add_column("Warning", style="yellow", justify="right", width=10)
+            table.add_column("Total", style="bold white", justify="right", width=10)
+
+            for rel_path, findings in findings_by_file.items():
+                strippable = [f for f in findings if f.severity in ("critical", "warning")]
+                if not strippable:
+                    continue
+                crit = sum(1 for f in strippable if f.severity == "critical")
+                warn = sum(1 for f in strippable if f.severity == "warning")
+                table.add_row(
+                    rel_path,
+                    str(crit) if crit else "-",
+                    str(warn) if warn else "-",
+                    str(len(strippable)),
+                )
+
+            console.print(table)
+        except Exception:
+            # Fallback: plain text
+            for rel_path, findings in findings_by_file.items():
+                strippable = [f for f in findings if f.severity in ("critical", "warning")]
+                if not strippable:
+                    continue
+                _rich_echo(f"  {rel_path}: {len(strippable)} character(s)", color="white")
+    else:
+        for rel_path, findings in findings_by_file.items():
+            strippable = [f for f in findings if f.severity in ("critical", "warning")]
+            if not strippable:
+                continue
+            _rich_echo(f"  {rel_path}: {len(strippable)} character(s)", color="white")
+
+    _rich_echo("")
+    _rich_info(f"{affected} file(s) would be modified")
+    _rich_info("Run 'apm audit --strip' to apply")
+    return affected
+
+
 # ── Command ────────────────────────────────────────────────────────
 
 
@@ -328,29 +398,39 @@ def _apply_strip(
 @click.option(
     "--strip",
     is_flag=True,
-    help="Strip non-critical hidden characters (zero-width spaces, unusual whitespace)",
+    help="Remove hidden characters from scanned files (preserves emoji and whitespace)",
 )
 @click.option(
     "--verbose",
     "-v",
     is_flag=True,
-    help="Show info-level findings and file details",
+    help="Show all findings including harmless ones",
+)
+@click.option(
+    "--dry-run",
+    is_flag=True,
+    help="Preview what --strip would remove without modifying files",
 )
 @click.pass_context
-def audit(ctx, package, file_path, strip, verbose):
+def audit(ctx, package, file_path, strip, verbose, dry_run):
     """Scan deployed prompt files for hidden Unicode characters.
 
     Detects invisible characters that could embed hidden instructions in
-    prompt, instruction, and rules files. Critical findings (tag characters,
-    bidi overrides) require manual review. Warnings (zero-width chars) can
-    be removed with --strip.
+    prompt, instruction, and rules files. Dangerous and suspicious
+    characters can be removed with --strip.
+
+    \b
+    Exit codes:
+        0  Clean, info-only findings, or successful strip
+        1  Critical findings detected (hidden instructions)
+        2  Warning-only findings (suspicious but not critical)
 
     \b
     Examples:
         apm audit                      # Scan all installed packages
         apm audit my-package           # Scan a specific package
         apm audit --file .cursorrules  # Scan any file
-        apm audit --strip              # Remove non-critical chars
+        apm audit --strip              # Remove dangerous/suspicious chars
     """
     project_root = Path.cwd()
 
@@ -386,22 +466,25 @@ def audit(ctx, package, file_path, strip, verbose):
                 _rich_info("No deployed files found in apm.lock.yaml")
             sys.exit(0)
 
+    # -- Warn if --dry-run used without --strip --
+    if dry_run and not strip:
+        _rich_warning("--dry-run only works with --strip (e.g. apm audit --strip --dry-run)")
+
     # -- Strip mode --
-    if strip and findings_by_file:
-        has_critical = any(
-            ContentScanner.has_critical(f) for f in findings_by_file.values()
-        )
+    if strip:
+        if not findings_by_file:
+            _rich_info("Nothing to clean — no hidden characters found")
+            sys.exit(0)
+        if dry_run:
+            _preview_strip(findings_by_file)
+            sys.exit(0)
         modified = _apply_strip(findings_by_file, project_root)
         if modified > 0:
             _rich_success(
                 f"{STATUS_SYMBOLS['success']} Cleaned {modified} file(s)"
             )
-        if has_critical:
-            _rich_warning(
-                "Critical findings were preserved — they require manual review"
-            )
-            _rich_info("  Inspect flagged files and remove tag/bidi characters")
-            sys.exit(1)
+        else:
+            _rich_info("Nothing to clean — no strippable characters found")
         sys.exit(0)
 
     # -- Display findings --
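
Review note: `_preview_strip` recomputes the same `strippable` filter in four places (the count loop, the Rich table, and both plain-text fallbacks). A possible simplification is to compute the per-file counts once and render from that result. The sketch below shows the idea under stated assumptions: `Finding` is a hypothetical stand-in for `ScanFinding` that carries only `severity`, and `strippable_counts` is not an existing function in the codebase.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Finding:
    """Hypothetical stand-in for ScanFinding; only severity is modeled."""
    severity: str


def strippable_counts(
    findings_by_file: Dict[str, List[Finding]],
) -> Dict[str, Tuple[int, int]]:
    """Return {path: (critical, warning)} for files --strip would touch."""
    counts: Dict[str, Tuple[int, int]] = {}
    for path, findings in findings_by_file.items():
        crit = sum(1 for f in findings if f.severity == "critical")
        warn = sum(1 for f in findings if f.severity == "warning")
        if crit or warn:  # info-only files are not strippable
            counts[path] = (crit, warn)
    return counts


if __name__ == "__main__":
    data = {
        "a.md": [Finding("critical"), Finding("warning"), Finding("info")],
        "b.md": [Finding("info")],
    }
    print(strippable_counts(data))
```

With this shape, the number of affected files is `len(counts)`, and both the table and the plain-text fallback can iterate over the same dictionary.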