Normalize UTF-8 BOM Marks and ps1 Indention #3546
Pull Request Overview
This pull request normalizes UTF-8 byte order mark (BOM) usage across the repository to eliminate noise from tools that don't preserve existing encoding. The PR establishes a "UTF-8 without BOM" policy, updates build tooling to respect this policy, and mechanically removes BOMs from over 1000 files.
Key changes:
- Establishes UTF-8 without BOM as the default encoding policy
- Updates PowerShell build scripts to write UTF-8 without BOM
- Adds new PowerShell utilities for BOM management and inspection
Reviewed Changes
Copilot reviewed 300 out of 1355 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .editorconfig | Adds charset configuration with UTF-8 without BOM default, except for .resx files |
| BuildTools/update-assemblyinfo.ps1 | Modified to write UTF-8 without BOM using explicit encoding |
| BuildTools/bom-strip.ps1 | New utility script to remove BOMs from text files |
| BuildTools/bom-classify-encodings.ps1 | New utility script to classify files by encoding type |
| All other files | Mechanical BOM removal from text files (over 1000 files) |
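As a rough sketch of the policy summarized in the table, the `.editorconfig` entries could look like the fragment below. `charset = utf-8` and `charset = utf-8-bom` are standard EditorConfig values; the actual sections and globs in this PR's file are not shown on this page, so the layout here is an assumption.

```ini
# Assumed layout; the PR's actual .editorconfig may differ.
[*]
charset = utf-8

[*.resx]
charset = utf-8-bom
```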
    if ((-not (Test-File $file.Output)) -or (((Get-Content $file.Output) -Join [System.Environment]::NewLine) -ne $out)) {
    $out | Out-File -Encoding utf8 $file.Output;
    $utf8NoBom = New-Object System.Text.UTF8Encoding($false);
indentation in ps1 should be spaces, I guess?
could be added to .editorconfig as well :)
Given that the existing ps1 files are a total mishmash of tabs vs spaces (even on the same line!), I think we should just choose.
And given that the project bucks the trend and uses tabs for .cs files (and I like tabs...), I've gone ahead and just standardized the ps1 files on tabs.
b3b3fa1 to b4fb59a
Thank you very much!
Problem
The repo mixes UTF-8 with and without a byte order mark (BOM). Some tools in my workflow write using their own defaults instead of preserving a file’s existing encoding, so routine edits flip the BOM and create noisy diffs. This PR establishes one policy, aligns scripts to it, then applies a mechanical cleanup.
Why you may not encounter this: if your tools preserve a file’s current encoding (as most of mine do...), you won’t hit churn. The noise appears only with non-preserving writers.
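To make the difference between preserving and non-preserving writers concrete, here is a minimal Python sketch (not code from this PR; the function names are hypothetical). It assumes a file's encoding policy can be read off its first three bytes.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def save_preserving(original: bytes, new_text: str) -> bytes:
    """Encode new_text as UTF-8, keeping a BOM only if the original had one."""
    body = new_text.encode("utf-8")
    return UTF8_BOM + body if original.startswith(UTF8_BOM) else body


def save_with_default(new_text: str, default_bom: bool) -> bytes:
    """Encode new_text using the tool's own default, ignoring the original file."""
    # "utf-8-sig" prepends a BOM on encode; "utf-8" does not.
    return new_text.encode("utf-8-sig" if default_bom else "utf-8")
```

A tool behaving like `save_with_default` flips the BOM whenever its default disagrees with the file on disk, which is the diff noise described above.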
Changes
No functional/runtime changes.
Commit 1: policy + tooling
- `.editorconfig`: default to UTF-8 without BOM; `.resx` stays UTF-8 with BOM. The ResX toolchain (ResXResourceWriter, Visual Studio designer, `resgen`) emits a BOM and ignores `.editorconfig`, so we configure `.resx` accordingly to keep editors and build tools aligned.
- `BuildTools/update-assemblyinfo.ps1`: write UTF-8 without BOM to match policy. On PowerShell before v6, `Out-File -Encoding utf8` always writes a BOM; this avoids that.
- `BuildTools/bom-strip.ps1`: remove BOMs from all text files except `.resx`.
- `BuildTools/bom-classify-encoding.ps1`: inspect BOM presence; also used to derive the extension list consumed by `bom-strip`.
- `dotnet format` honours `.editorconfig` and checks the charset (including BOM), so this should prevent BOM churn going forward by hooking into the existing formatting enforcement.
Commit 2: mechanical application
Review notes
Commit 2 is the mechanical output of running `buildtools/bom-strip.ps1` after Commit 1. Its diff should contain only BOM removals.
It was done this way to make verifying the giant (>1000 files) commit easier. I suggest parsing the diff/patch or running `buildtools/bom-strip.ps1` on commit 1 if you want to make sure for yourself that there are no other changes in there.
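One way to do that check, sketched in Python under the assumption that you have already collected each changed file's bytes before and after Commit 2 (for example via `git show <rev>:<path>`). The function names and the dictionary inputs are hypothetical and not part of this PR.

```python
from __future__ import annotations

import codecs

UTF8_BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def is_bom_only_change(before: bytes, after: bytes) -> bool:
    """True when the only difference is a removed leading UTF-8 BOM."""
    return before.startswith(UTF8_BOM) and before[len(UTF8_BOM):] == after


def unexpected_changes(before_files: dict[str, bytes],
                       after_files: dict[str, bytes]) -> list[str]:
    """List paths whose change is anything other than a BOM removal."""
    suspicious = []
    for path in sorted(set(before_files) | set(after_files)):
        old, new = before_files.get(path), after_files.get(path)
        if old is None or new is None:
            suspicious.append(path)  # file added or deleted
        elif old != new and not is_bom_only_change(old, new):
            suspicious.append(path)  # content changed beyond the BOM
    return suspicious
```

An empty result means every changed file differs only by its leading BOM.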