@petercrabtree
Contributor

Problem

The repo mixes UTF-8 with and without a byte order mark (BOM). Some tools in my workflow write using their own defaults instead of preserving a file’s existing encoding, so routine edits flip the BOM and create noisy diffs. This PR establishes one policy, aligns scripts to it, then applies a mechanical cleanup.

Why you may not encounter this: if your tools preserve a file’s current encoding (as most of mine do...), you won’t hit churn. The noise appears only with non-preserving writers.

Changes

No functional/runtime changes.

Commit 1 — policy + tooling

  • .editorconfig: default to UTF-8 without BOM; .resx stays UTF-8 with BOM. The ResX toolchain (ResXResourceWriter, Visual Studio designer, resgen) emits a BOM and ignores .editorconfig, so we configure .resx accordingly to keep editors and build tools aligned.
  • BuildTools/update-assemblyinfo.ps1: write UTF-8 without BOM to match policy. On PowerShell pre-v6, Out-File writes a BOM by default; this prevents that.
  • New: BuildTools/bom-strip.ps1: removes UTF-8 BOMs from all text files except .resx.
  • New: BuildTools/bom-classify-encoding.ps1: inspect BOM presence; also used to derive the extension list consumed by bom-strip.
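
As a rough illustration of what the two new scripts do, here is a minimal Python sketch of BOM detection and stripping. It is not the PowerShell in this PR: `has_utf8_bom`, `strip_utf8_bom`, and `strip_bom_in_tree` are hypothetical names, and the extension set is passed in by the caller rather than derived from a classification pass.

```python
import codecs
from pathlib import Path

UTF8_BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def has_utf8_bom(data: bytes) -> bool:
    """Return True when the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; all other bytes are untouched."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


def strip_bom_in_tree(root: Path, extensions: set[str]) -> list[Path]:
    """Strip the BOM from files under root whose suffix is in extensions.

    .resx is always skipped, mirroring the policy described above.
    Returns the files that were rewritten, in sorted order.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        suffix = path.suffix.lower()
        if not path.is_file() or suffix == ".resx" or suffix not in extensions:
            continue
        data = path.read_bytes()
        if has_utf8_bom(data):
            path.write_bytes(strip_utf8_bom(data))
            changed.append(path)
    return changed
```

Working on raw bytes rather than decoded text means a file's line endings and content are never re-encoded; only the three leading bytes can change.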

dotnet format honours .editorconfig and checks the charset (including BOM), so this should prevent BOM churn going forward by hooking into the existing formatting enforcement.
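
A minimal sketch of the charset policy described above, assuming plain glob sections; the actual .editorconfig in this PR may be organised differently. `utf-8` and `utf-8-bom` are the EditorConfig `charset` values for UTF-8 without and with a BOM.

```ini
# Hypothetical fragment, not copied from the PR.

# Default for every file: UTF-8 without BOM
[*]
charset = utf-8

# The ResX toolchain emits a BOM, so keep one here
[*.resx]
charset = utf-8-bom
```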

Commit 2 — mechanical application

  • Normalize UTF-8 BOMs across the repository by removing them from all text files other than resx

Review notes

Commit 2 is the mechanical output of running BuildTools/bom-strip.ps1 after Commit 1. Its diff should contain only BOM removals.

It was done this way to make the giant (>1000 files) commit easier to verify. To confirm for yourself that it contains no other changes, I suggest parsing the diff/patch or running BuildTools/bom-strip.ps1 on Commit 1 and comparing the result.
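
One way to do that check, sketched in Python under the assumption that you have already collected each changed file's bytes before and after Commit 2 (for example with `git show <commit>:<path>`). The function names and the `changes` mapping are hypothetical and not part of this PR.

```python
import codecs


def differs_only_by_bom(before: bytes, after: bytes) -> bool:
    """True when `after` is exactly `before` minus a leading UTF-8 BOM."""
    bom = codecs.BOM_UTF8
    return before.startswith(bom) and before[len(bom):] == after


def verify_bom_only_changes(changes: dict[str, tuple[bytes, bytes]]) -> list[str]:
    """Return the paths whose change is anything other than a BOM removal.

    `changes` maps a path to its (bytes before, bytes after) pair.
    An empty result means every change was a pure BOM removal.
    """
    return [
        path
        for path, (before, after) in sorted(changes.items())
        if not differs_only_by_bom(before, after)
    ]
```

Comparing bytes rather than decoded text keeps the check strict: a line-ending or whitespace change would be reported as a non-BOM change.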

Copilot AI review requested due to automatic review settings August 27, 2025 03:37
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request normalizes UTF-8 byte order mark (BOM) usage across the repository to eliminate noise from tools that don't preserve existing encoding. The PR establishes a "UTF-8 without BOM" policy, updates build tooling to respect this policy, and mechanically removes BOMs from over 1000 files.

Key changes:

  • Establishes UTF-8 without BOM as the default encoding policy
  • Updates PowerShell build scripts to write UTF-8 without BOM
  • Adds new PowerShell utilities for BOM management and inspection

Reviewed Changes

Copilot reviewed 300 out of 1355 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| .editorconfig | Adds charset configuration with UTF-8 without BOM default, except for .resx files |
| BuildTools/update-assemblyinfo.ps1 | Modified to write UTF-8 without BOM using explicit encoding |
| BuildTools/bom-strip.ps1 | New utility script to remove BOMs from text files |
| BuildTools/bom-classify-encodings.ps1 | New utility script to classify files by encoding type |
| All other files | Mechanical BOM removal from text files (over 1000 files) |


  if ((-not (Test-File $file.Output)) -or (((Get-Content $file.Output) -Join [System.Environment]::NewLine) -ne $out)) {
- 	$out | Out-File -Encoding utf8 $file.Output;
+ 	$utf8NoBom = New-Object System.Text.UTF8Encoding($false);
Member

indentation in ps1 should be spaces, I guess?

Member

could be added to .editorconfig as well :)

Contributor Author

@petercrabtree Aug 29, 2025

Given that the existing ps1 files are a total mishmash of tabs vs spaces (even on the same line!), I think we should just choose.

And given that the project bucks the trend and uses tabs for .cs files (and I like tabs...), I've gone ahead and just standardized the ps1 files on tabs.
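
For reference, pinning that choice in .editorconfig could look like the sketch below. The section glob is an assumption; the committed file may differ.

```ini
# Hypothetical fragment: indent PowerShell scripts with tabs
[*.ps1]
indent_style = tab
```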

@petercrabtree changed the title from "Normalize UTF-8 BOM Usage in the Repository" to "Normalize UTF-8 BOM marks and ps1 Indention" Aug 29, 2025
@petercrabtree changed the title from "Normalize UTF-8 BOM marks and ps1 Indention" to "Normalize UTF-8 BOM Marks and ps1 Indention" Aug 29, 2025
@petercrabtree force-pushed the dev/dev-env-clean-no-bom branch from b3b3fa1 to b4fb59a August 29, 2025 00:59
@siegfriedpammer merged commit 967de58 into icsharpcode:master Sep 4, 2025
5 checks passed
@siegfriedpammer
Member

Thank you very much!

@petercrabtree deleted the dev/dev-env-clean-no-bom branch October 18, 2025 17:48