fix: extend explicit UTF-8 encoding to 5 remaining open() call sites#1068
fix: extend explicit UTF-8 encoding to 5 remaining open() call sites#1068danielmeppiel merged 2 commits intomainfrom
Conversation
Successor to #635 (closed-no-CLA). #604 was fixed by #607 in script_runner.py only. This extends the same fix to the remaining files where Windows cp950 / cp1252 locales would crash on UTF-8 content: - src/apm_cli/config.py: global config read/write - src/apm_cli/marketplace/client.py: marketplace cache I/O - src/apm_cli/marketplace/registry.py: registry persistence - src/apm_cli/models/plugin.py: plugin metadata read - src/apm_cli/runtime/copilot_runtime.py: MCP config read Adds round-trip unit tests with non-ASCII content for each module to catch regressions. Refs #604 (closed). Closes #635 (closed-no-CLA, re-derived in microsoft/apm). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR standardizes text-mode file I/O to explicitly use UTF-8 across the remaining open() call sites that could trigger UnicodeDecodeError on Windows non-UTF-8 locales, and adds unit tests intended to guard the behavior.
Changes:
- Add
encoding="utf-8"to remaining textopen()call sites in config, marketplace cache/registry, plugin metadata loading, and Copilot MCP config loading. - Add new/extended unit tests to exercise round-trips involving non-ASCII content for the affected modules.
Show a summary per file
| File | Description |
|---|---|
| src/apm_cli/config.py | Add explicit UTF-8 encoding to config JSON reads/writes. |
| src/apm_cli/marketplace/client.py | Add explicit UTF-8 encoding to marketplace cache reads/writes. |
| src/apm_cli/marketplace/registry.py | Add explicit UTF-8 encoding to marketplace registry persistence. |
| src/apm_cli/models/plugin.py | Add explicit UTF-8 encoding when reading plugin.json. |
| src/apm_cli/runtime/copilot_runtime.py | Add explicit UTF-8 encoding when reading MCP config JSON. |
| tests/unit/test_config.py | New tests intended to validate UTF-8 config I/O behavior. |
| tests/unit/marketplace/test_marketplace_client.py | New tests intended to validate UTF-8 cache I/O behavior. |
| tests/unit/marketplace/test_marketplace_registry.py | New tests intended to validate UTF-8 registry I/O behavior. |
| tests/unit/test_plugin.py | New test intended to validate UTF-8 plugin metadata reads. |
| tests/unit/test_copilot_runtime.py | New test intended to validate UTF-8 MCP config reads. |
Copilot's findings
Comments suppressed due to low confidence (6)
tests/unit/test_config.py:45
- These tests go through
update_config(), which writes JSON viajson.dumpwith the defaultensure_ascii=True. That will escape non-ASCII as\uXXXX, leaving the file ASCII-only and not exercising the UTF-8 decode path that was failing on Windows locales. To cover the regression, add a case that writes a raw UTF-8 config.json (e.g.ensure_ascii=False) and assertsget_config()can load it.
def test_config_file_is_utf8_on_disk(self, isolated_config):
non_ascii_value = "# 開始 -- cafe"
config_mod.update_config({"note": non_ascii_value})
# Read raw bytes and decode as UTF-8 to assert the on-disk encoding.
raw = isolated_config.read_bytes()
decoded = json.loads(raw.decode("utf-8"))
tests/unit/marketplace/test_marketplace_registry.py:110
- This test path uses
registry_mod.add_marketplace(), which persists viajson.dump(...)with defaultensure_ascii=True(non-ASCII becomes\uXXXX). That keeps the registry file ASCII-only, so it wouldn't have triggered the Windows decode error prior to addingencoding="utf-8". To make the regression test meaningful, load from a registry file written with raw UTF-8 (e.g.ensure_ascii=False).
src = MarketplaceSource(name="cafe-mkt", owner="cafe-org", repo="plugins-開始")
registry_mod.add_marketplace(src)
# Force re-load from disk by clearing the cache.
registry_mod._invalidate_cache()
fetched = registry_mod.get_marketplace_by_name("cafe-mkt")
tests/unit/marketplace/test_marketplace_registry.py:126
- This test writes raw UTF-8 JSON but includes non-ASCII literals directly in the Python source (e.g. repo name with CJK). The repo requires Python source files to remain printable ASCII; please convert these literals to ASCII-only escape sequences ("\u....") so the runtime JSON still contains non-ASCII bytes without violating the source encoding rule.
payload = {
"marketplaces": [
{"name": "cafe-mkt", "owner": "o", "repo": "repo-中文"},
]
tests/unit/marketplace/test_marketplace_client.py:446
- These tests call
_write_cache(), but that function usesjson.dumpwith defaultensure_ascii=True, so non-ASCII becomes\uXXXXand the cache files remain ASCII-only. That means the tests wouldn't have failed before theencoding="utf-8"change and don't actually exercise the regression scenario. Consider writing the cache JSON with raw UTF-8 (e.g.ensure_ascii=False) and then asserting_read_cache()/_read_stale_cache()can load it.
}
client_mod._write_cache("utf8-mkt", data)
cached = client_mod._read_cache("utf8-mkt")
assert cached is not None
tests/unit/test_copilot_runtime.py:175
_json.dumps(servers)defaults toensure_ascii=True, which escapes non-ASCII as\uXXXX. That means the bytes written here will be ASCII-only and won't reproduce the Windows default-encoding decode failure this PR is guarding against. Write raw UTF-8 JSON (e.g.ensure_ascii=False) so the on-disk content actually contains non-ASCII bytes.
"description": "中文 description -- cafe",
}
}
}
mcp_path.write_bytes(_json.dumps(servers).encode("utf-8"))
tests/unit/test_plugin.py:25
json.dumps(metadata)defaults toensure_ascii=True, which will escape non-ASCII characters as\uXXXXand keep plugin.json ASCII-only. That means this test would likely pass even ifPlugin.from_path()used a non-UTF-8 platform default encoding. To validate the encoding fix, write JSON with raw UTF-8 bytes (e.g.ensure_ascii=False) and then read viaPlugin.from_path().
plugin_json = tmp_path / "plugin.json"
plugin_json.write_bytes(json.dumps(metadata).encode("utf-8"))
- Files reviewed: 10/10 changed files
- Comments generated: 5
| "command": "node", | ||
| "args": ["server.js"], | ||
| "description": "中文 description -- cafe", | ||
| } |
There was a problem hiding this comment.
This test introduces non-ASCII string literals in the Python source (e.g. CJK characters in the JSON data). The repo requires Python source files to stay within printable ASCII; please use ASCII-only escape sequences ("\u....") or construct UTF-8 bytes via escapes so runtime values remain non-ASCII without non-ASCII source text.
This issue also appears on line 171 of the same file.
| def test_update_config_preserves_non_ascii(self, isolated_config): | ||
| non_ascii_value = "/Users/cafe/projets/開始" | ||
| config_mod.update_config({"copilot_cowork_skills_dir": non_ascii_value}) | ||
|
|
There was a problem hiding this comment.
Non-ASCII literals are introduced in the test data (e.g. in the path/config values). The repo requires Python source files to be printable ASCII, so please express these values using ASCII-only escape sequences ("\u....") while still producing non-ASCII runtime strings.
This issue also appears on line 39 of the same file.
| # description-bearing source by writing a custom entry directly. | ||
| src = MarketplaceSource(name="cafe-mkt", owner="cafe-org", repo="plugins-開始") | ||
| registry_mod.add_marketplace(src) |
There was a problem hiding this comment.
This test adds non-ASCII string literals in the Python source (e.g. in the repo value). The repo requires Python source files to remain printable ASCII; please replace these with ASCII-only escape sequences so runtime values are still non-ASCII without non-ASCII source text.
This issue also appears in the following locations of the same file:
- line 105
- line 123
| "name": "Demo plugin -- cafe", | ||
| "version": "1.0.0", | ||
| "description": "Plugin de demo with 中文 description", | ||
| "author": "Cafe Author", |
There was a problem hiding this comment.
This test file includes non-ASCII string literals (CJK characters) in source. The repo requires Python source files to remain printable ASCII; please replace these literals with ASCII-only escape sequences so the runtime value is still non-ASCII without introducing non-ASCII characters into the .py file.
This issue also appears on line 23 of the same file.
| data = { | ||
| "name": "Marketplace -- cafe", | ||
| "description": "中文 description", | ||
| "plugins": [{"name": "skill-開始", "author": "cafe"}], | ||
| } |
There was a problem hiding this comment.
These new tests include non-ASCII literals in the Python source (e.g. in description / plugin names). The repo requires Python source files to remain printable ASCII; please use ASCII-only escape sequences ("\u....") or construct UTF-8 bytes via escapes so runtime values remain non-ASCII without non-ASCII source text.
This issue also appears on line 442 of the same file.
Addresses Copilot review on #1068. The repo's encoding rule requires Python source files to stay within printable ASCII. Replace the CJK / accented literals introduced for round-trip tests with \\uXXXX escapes so runtime values remain non-ASCII (still exercising the UTF-8 encoding fix) without putting non-ASCII bytes in .py source. Affects 5 test files; no behavioral change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed Copilot review: replaced raw CJK / accented literals with \uXXXX escapes across all 5 test files. Same runtime values (still non-ASCII), ASCII-only source as the repo encoding rule requires. 57 affected tests still pass. |
Bring in PR microsoft#1068 (utf-8 encoding) and other main updates so the PR diff shows only the focused fix. Earlier panel rejection cited a hook_integrator regression which is no longer present in this branch (removed via prior merges); the core claude_formatter.py fix that the panel approved remains intact and is the only logical change. Co-authored-by: Travis Illig <tillig@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why
Windows users on cp950 / cp1252 locales hit
UnicodeDecodeErroron anyapmoperation that reads or writes UTF-8 content through the 5 modules listed below. Issue #604 covered the prompt-file path and was fixed by #607 (script_runner.py only). The original PR #635 covered the broader surface but was closed because the author did not sign CLA. Re-deriving here.What
Adds
encoding="utf-8"to the remainingopen()call sites in:src/apm_cli/config.py(5 sites)src/apm_cli/marketplace/client.py(5 sites)src/apm_cli/marketplace/registry.py(3 sites)src/apm_cli/models/plugin.py(1 site)src/apm_cli/runtime/copilot_runtime.py(1 site)Adds round-trip unit tests per module that exercise non-ASCII content (e.g.
中文,開始,cafe). Skipped binary-mode opens (encoding irrelevant) andsubprocess.Popen(different code path; already passesencoding="utf-8").Refs
Validation
uv run ruff check src/ tests/: clean.uv run pytest tests/unit tests/test_console.py: 6870 passed.e4 b8 adfor中).