Standardize Unicode encodings to explicitly use UTF-8#635
Standardize Unicode encodings to explicitly use UTF-8#635dvmukul wants to merge 5 commits intomicrosoft:mainfrom
Conversation
This change adds encoding='utf-8' to file open operations in the script runner, configuration manager, marketplace client, and other core modules. This resolves reported issues (e.g., microsoft#604) where UnicodeDecodeError occurs on Windows when handling prompts and configuration files with non-ASCII characters.
sergio-sisternes-epam
left a comment
There was a problem hiding this comment.
Looks good overall — the changes are correct and directly fix the Windows UnicodeDecodeError from #604. Two suggestions in the inline comments (missed open() calls + test coverage).
|
|
||
| if not os.path.exists(CONFIG_FILE): | ||
| with open(CONFIG_FILE, "w") as f: | ||
| with open(CONFIG_FILE, "w", encoding="utf-8") as f: |
There was a problem hiding this comment.
👍 Good fix. Note that an AST scan of the codebase found five more text-mode open() calls that still lack explicit encoding='utf-8':
| File | Line | Operation |
|---|---|---|
adapters/client/codex.py |
65 | toml.dump() — write config |
adapters/client/codex.py |
80 | toml.load() — read config |
adapters/client/copilot.py |
68 | json.dump() — write config |
adapters/client/copilot.py |
83 | json.load() — read config |
deps/github_downloader.py |
184 | write empty gitconfig |
Since the PR title says "Standardize … across the codebase," it would be great to cover these too (same pattern, same Windows risk).
| self.compiled_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| with open(prompt_path, "r") as f: | ||
| with open(prompt_path, "r", encoding="utf-8") as f: |
There was a problem hiding this comment.
This is the core fix for #604. Consider adding a small test that round-trips a non-ASCII string (e.g. CJK characters) through PromptCompiler.compile() to lock in the fix — the existing test_cli_encoding.py only covers console stream reconfiguration, not file I/O.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Standardizes text file I/O across apm_cli by explicitly using UTF-8 encodings when reading/writing JSON and configuration files, improving cross-platform behavior (notably Windows default encodings).
Changes:
- Add
encoding="utf-8"toopen()calls that read/write JSON config and cache files. - Ensure plugin metadata and MCP server configuration are read consistently as UTF-8.
- Standardize marketplace registry persistence to UTF-8 for both reads and writes.
Show a summary per file
| File | Description |
|---|---|
| src/apm_cli/runtime/copilot_runtime.py | Reads MCP config JSON using explicit UTF-8 encoding. |
| src/apm_cli/models/plugin.py | Reads plugin.json metadata using explicit UTF-8 encoding. |
| src/apm_cli/marketplace/registry.py | Reads/writes marketplace registry JSON using explicit UTF-8 encoding (including temp file). |
| src/apm_cli/marketplace/client.py | Reads/writes marketplace cache JSON/meta using explicit UTF-8 encoding. |
| src/apm_cli/config.py | Reads/writes global config JSON using explicit UTF-8 encoding. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 0
|
@dvmukul without you accepting the CLA as per the bot comment, we cannot merge this |
…1068) * fix: add explicit UTF-8 encoding to open() calls across 5 modules Successor to #635 (closed-no-CLA). #604 was fixed by #607 in script_runner.py only. This extends the same fix to the remaining files where Windows cp950 / cp1252 locales would crash on UTF-8 content: - src/apm_cli/config.py: global config read/write - src/apm_cli/marketplace/client.py: marketplace cache I/O - src/apm_cli/marketplace/registry.py: registry persistence - src/apm_cli/models/plugin.py: plugin metadata read - src/apm_cli/runtime/copilot_runtime.py: MCP config read Adds round-trip unit tests with non-ASCII content for each module to catch regressions. Refs #604 (closed). Closes #635 (closed-no-CLA, re-derived in microsoft/apm). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * test: use unicode escapes instead of raw non-ASCII literals Addresses Copilot review on #1068. The repo's encoding rule requires Python source files to stay within printable ASCII. Replace the CJK / accented literals introduced for round-trip tests with \\uXXXX escapes so runtime values remain non-ASCII (still exercising the UTF-8 encoding fix) without putting non-ASCII bytes in .py source. Affects 5 test files; no behavioral change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary of Changes
This Pull Request standardizes file operations across the codebase by adding explicit
encoding='utf-8'toopen()calls.Rationale
On Windows, the default system encoding is often locale-specific (e.g., CP1252 or CP950), which causes
UnicodeDecodeErrorwhen reading or writing files that contain UTF-8 characters (like prompt templates or localized marketplace metadata). This fix ensures thatapmbehaves consistently across Windows, macOS, and Linux.Key Fixes:
script_runner.py: Fixed issue in prompt compilation and script reading (resolves bug: UnicodeDecodeError reading .prompt.md on Windows CP950 — open() missing encoding parameter #604).config.py: Standardized global configuration file operations.marketplace/client.py®istry.py: Fixed local caching of marketplace manifests.models/plugin.py: Ensured stable reading ofplugin.jsonmetadata.runtime/copilot_runtime.py: Fixed MCP configuration retrieval.Verification
Changes were audited for text-based text vs binary-based operations to avoid data corruption. Existing unit tests were prioritized to ensure no regressions in core functionality.