fix: extend explicit UTF-8 encoding to 5 remaining open() call sites by danielmeppiel · Pull Request #1068 · microsoft/apm

danielmeppiel · 2026-04-30T12:51:20Z

Why

Windows users on cp950 / cp1252 locales hit UnicodeDecodeError on any apm operation that reads or writes UTF-8 content through the 5 modules listed below. Issue #604 covered the prompt-file path and was fixed by #607 (script_runner.py only). The original PR #635 covered the broader surface but was closed because the author did not sign CLA. Re-deriving here.

What

Adds encoding="utf-8" to the remaining open() call sites in:

src/apm_cli/config.py (5 sites)
src/apm_cli/marketplace/client.py (5 sites)
src/apm_cli/marketplace/registry.py (3 sites)
src/apm_cli/models/plugin.py (1 site)
src/apm_cli/runtime/copilot_runtime.py (1 site)

Adds round-trip unit tests per module that exercise non-ASCII content (e.g. 中文, 開始, cafe). Skipped binary-mode opens (encoding irrelevant) and subprocess.Popen (different code path; already passes encoding="utf-8").

Refs

bug: UnicodeDecodeError reading .prompt.md on Windows CP950 — open() missing encoding parameter #604 (closed)
Standardize Unicode encodings to explicitly use UTF-8 #635 (closed-no-CLA, re-derived here)
fix: add encoding="utf-8" to prompt file open() calls in script_runner (Windows CP950) #607 (merged, fixed script_runner.py only)

Validation

uv run ruff check src/ tests/: clean.
uv run pytest tests/unit tests/test_console.py: 6870 passed.
9 new round-trip tests added; manually verified they would fail on a cp1252 locale without the encoding additions (the default codec cannot decode UTF-8 multi-byte sequences such as e4 b8 ad for 中).

Successor to #635 (closed-no-CLA). #604 was fixed by #607 in script_runner.py only. This extends the same fix to the remaining files where Windows cp950 / cp1252 locales would crash on UTF-8 content: - src/apm_cli/config.py: global config read/write - src/apm_cli/marketplace/client.py: marketplace cache I/O - src/apm_cli/marketplace/registry.py: registry persistence - src/apm_cli/models/plugin.py: plugin metadata read - src/apm_cli/runtime/copilot_runtime.py: MCP config read Adds round-trip unit tests with non-ASCII content for each module to catch regressions. Refs #604 (closed). Closes #635 (closed-no-CLA, re-derived in microsoft/apm). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR standardizes text-mode file I/O to explicitly use UTF-8 across the remaining open() call sites that could trigger UnicodeDecodeError on Windows non-UTF-8 locales, and adds unit tests intended to guard the behavior.

Changes:

Add encoding="utf-8" to remaining text open() call sites in config, marketplace cache/registry, plugin metadata loading, and Copilot MCP config loading.
Add new/extended unit tests to exercise round-trips involving non-ASCII content for the affected modules.

Show a summary per file

File	Description
src/apm_cli/config.py	Add explicit UTF-8 encoding to config JSON reads/writes.
src/apm_cli/marketplace/client.py	Add explicit UTF-8 encoding to marketplace cache reads/writes.
src/apm_cli/marketplace/registry.py	Add explicit UTF-8 encoding to marketplace registry persistence.
src/apm_cli/models/plugin.py	Add explicit UTF-8 encoding when reading `plugin.json`.
src/apm_cli/runtime/copilot_runtime.py	Add explicit UTF-8 encoding when reading MCP config JSON.
tests/unit/test_config.py	New tests intended to validate UTF-8 config I/O behavior.
tests/unit/marketplace/test_marketplace_client.py	New tests intended to validate UTF-8 cache I/O behavior.
tests/unit/marketplace/test_marketplace_registry.py	New tests intended to validate UTF-8 registry I/O behavior.
tests/unit/test_plugin.py	New test intended to validate UTF-8 plugin metadata reads.
tests/unit/test_copilot_runtime.py	New test intended to validate UTF-8 MCP config reads.

Copilot's findings

Comments suppressed due to low confidence (6)

tests/unit/test_config.py:45

These tests go through update_config(), which writes JSON via json.dump with the default ensure_ascii=True. That will escape non-ASCII as \uXXXX, leaving the file ASCII-only and not exercising the UTF-8 decode path that was failing on Windows locales. To cover the regression, add a case that writes a raw UTF-8 config.json (e.g. ensure_ascii=False) and asserts get_config() can load it.

    def test_config_file_is_utf8_on_disk(self, isolated_config):
        non_ascii_value = "# 開始 -- cafe"
        config_mod.update_config({"note": non_ascii_value})

        # Read raw bytes and decode as UTF-8 to assert the on-disk encoding.
        raw = isolated_config.read_bytes()
        decoded = json.loads(raw.decode("utf-8"))

tests/unit/marketplace/test_marketplace_registry.py:110

This test path uses registry_mod.add_marketplace(), which persists via json.dump(...) with default ensure_ascii=True (non-ASCII becomes \uXXXX). That keeps the registry file ASCII-only, so it wouldn't have triggered the Windows decode error prior to adding encoding="utf-8". To make the regression test meaningful, load from a registry file written with raw UTF-8 (e.g. ensure_ascii=False).

        src = MarketplaceSource(name="cafe-mkt", owner="cafe-org", repo="plugins-開始")
        registry_mod.add_marketplace(src)

        # Force re-load from disk by clearing the cache.
        registry_mod._invalidate_cache()
        fetched = registry_mod.get_marketplace_by_name("cafe-mkt")

tests/unit/marketplace/test_marketplace_registry.py:126

This test writes raw UTF-8 JSON but includes non-ASCII literals directly in the Python source (e.g. repo name with CJK). The repo requires Python source files to remain printable ASCII; please convert these literals to ASCII-only escape sequences ("\u....") so the runtime JSON still contains non-ASCII bytes without violating the source encoding rule.

        payload = {
            "marketplaces": [
                {"name": "cafe-mkt", "owner": "o", "repo": "repo-中文"},
            ]

tests/unit/marketplace/test_marketplace_client.py:446

These tests call _write_cache(), but that function uses json.dump with default ensure_ascii=True, so non-ASCII becomes \uXXXX and the cache files remain ASCII-only. That means the tests wouldn't have failed before the encoding="utf-8" change and don't actually exercise the regression scenario. Consider writing the cache JSON with raw UTF-8 (e.g. ensure_ascii=False) and then asserting _read_cache() / _read_stale_cache() can load it.

        }
        client_mod._write_cache("utf8-mkt", data)

        cached = client_mod._read_cache("utf8-mkt")
        assert cached is not None

tests/unit/test_copilot_runtime.py:175

_json.dumps(servers) defaults to ensure_ascii=True, which escapes non-ASCII as \uXXXX. That means the bytes written here will be ASCII-only and won't reproduce the Windows default-encoding decode failure this PR is guarding against. Write raw UTF-8 JSON (e.g. ensure_ascii=False) so the on-disk content actually contains non-ASCII bytes.

                    "description": "中文 description -- cafe",
                }
            }
        }
        mcp_path.write_bytes(_json.dumps(servers).encode("utf-8"))

tests/unit/test_plugin.py:25

json.dumps(metadata) defaults to ensure_ascii=True, which will escape non-ASCII characters as \uXXXX and keep plugin.json ASCII-only. That means this test would likely pass even if Plugin.from_path() used a non-UTF-8 platform default encoding. To validate the encoding fix, write JSON with raw UTF-8 bytes (e.g. ensure_ascii=False) and then read via Plugin.from_path().

        plugin_json = tmp_path / "plugin.json"
        plugin_json.write_bytes(json.dumps(metadata).encode("utf-8"))

Files reviewed: 10/10 changed files
Comments generated: 5

Copilot · 2026-04-30T12:57:40Z

+                    "command": "node",
+                    "args": ["server.js"],
+                    "description": "中文 description -- cafe",
+                }


This test introduces non-ASCII string literals in the Python source (e.g. CJK characters in the JSON data). The repo requires Python source files to stay within printable ASCII; please use ASCII-only escape sequences ("\u....") or construct UTF-8 bytes via escapes so runtime values remain non-ASCII without non-ASCII source text.

This issue also appears on line 171 of the same file.

Copilot · 2026-04-30T12:57:40Z

+    def test_update_config_preserves_non_ascii(self, isolated_config):
+        non_ascii_value = "/Users/cafe/projets/開始"
+        config_mod.update_config({"copilot_cowork_skills_dir": non_ascii_value})
+


Non-ASCII literals are introduced in the test data (e.g. in the path/config values). The repo requires Python source files to be printable ASCII, so please express these values using ASCII-only escape sequences ("\u....") while still producing non-ASCII runtime strings.

This issue also appears on line 39 of the same file.

Copilot · 2026-04-30T12:57:41Z

+        # description-bearing source by writing a custom entry directly.
+        src = MarketplaceSource(name="cafe-mkt", owner="cafe-org", repo="plugins-開始")
+        registry_mod.add_marketplace(src)


This test adds non-ASCII string literals in the Python source (e.g. in the repo value). The repo requires Python source files to remain printable ASCII; please replace these with ASCII-only escape sequences so runtime values are still non-ASCII without non-ASCII source text.

This issue also appears in the following locations of the same file:

line 105

line 123

Copilot · 2026-04-30T12:57:41Z

+            "name": "Demo plugin -- cafe",
+            "version": "1.0.0",
+            "description": "Plugin de demo with 中文 description",
+            "author": "Cafe Author",


This test file includes non-ASCII string literals (CJK characters) in source. The repo requires Python source files to remain printable ASCII; please replace these literals with ASCII-only escape sequences so the runtime value is still non-ASCII without introducing non-ASCII characters into the .py file.

This issue also appears on line 23 of the same file.

Copilot · 2026-04-30T12:57:41Z

+        data = {
+            "name": "Marketplace -- cafe",
+            "description": "中文 description",
+            "plugins": [{"name": "skill-開始", "author": "cafe"}],
+        }


These new tests include non-ASCII literals in the Python source (e.g. in description / plugin names). The repo requires Python source files to remain printable ASCII; please use ASCII-only escape sequences ("\u....") or construct UTF-8 bytes via escapes so runtime values remain non-ASCII without non-ASCII source text.

This issue also appears on line 442 of the same file.

Addresses Copilot review on #1068. The repo's encoding rule requires Python source files to stay within printable ASCII. Replace the CJK / accented literals introduced for round-trip tests with \\uXXXX escapes so runtime values remain non-ASCII (still exercising the UTF-8 encoding fix) without putting non-ASCII bytes in .py source. Affects 5 test files; no behavioral change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel · 2026-04-30T13:04:55Z

Addressed Copilot review: replaced raw CJK / accented literals with \uXXXX escapes across all 5 test files. Same runtime values (still non-ASCII), ASCII-only source as the repo encoding rule requires. 57 affected tests still pass.

Bring in PR microsoft#1068 (utf-8 encoding) and other main updates so the PR diff shows only the focused fix. Earlier panel rejection cited a hook_integrator regression which is no longer present in this branch (removed via prior merges); the core claude_formatter.py fix that the panel approved remains intact and is the only logical change. Co-authored-by: Travis Illig <tillig@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 30, 2026 12:51

danielmeppiel requested a review from sergio-sisternes-epam as a code owner April 30, 2026 12:51

Copilot started reviewing on behalf of danielmeppiel April 30, 2026 12:52 View session

Copilot AI reviewed Apr 30, 2026

View reviewed changes

danielmeppiel merged commit cd9426f into main Apr 30, 2026
10 checks passed

danielmeppiel deleted the fix/utf8-encoding-non-script-files branch April 30, 2026 15:19

danielmeppiel mentioned this pull request Apr 30, 2026

[Summary] Next issues and PRs to solve, prioritized #1064

Closed

danielmeppiel mentioned this pull request May 1, 2026

[docs] Update documentation for features from 2026-05-01 #1095

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: extend explicit UTF-8 encoding to 5 remaining open() call sites#1068

fix: extend explicit UTF-8 encoding to 5 remaining open() call sites#1068
danielmeppiel merged 2 commits intomainfrom
fix/utf8-encoding-non-script-files

danielmeppiel commented Apr 30, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

Copilot AI Apr 30, 2026

Uh oh!

danielmeppiel commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danielmeppiel commented Apr 30, 2026

Why

What

Refs

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

danielmeppiel commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants