diff --git a/.config/dotnet-tools.json b/.config/dotnet-tools.json index 8a6bed1..a6dbde4 100644 --- a/.config/dotnet-tools.json +++ b/.config/dotnet-tools.json @@ -15,6 +15,13 @@ "husky" ], "rollForward": false + }, + "dotnet-outdated-tool": { + "version": "4.6.9", + "commands": [ + "dotnet-outdated" + ], + "rollForward": false } } } diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 0780c5e..aa228eb 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -7,6 +7,10 @@ 1. **Data Publishing**: Provides ISO 639-2, ISO 639-3, and RFC 5646 language tag records in JSON and C# formats 2. **Tag Processing**: Implements IETF BCP 47 language tag construction and parsing per RFC 5646 semantic rules +**Current Version**: 1.2 (supports .NET 10.0, AOT compatible) + +**Important Note**: The implemented language tag parsing and normalization logic may be incomplete or inaccurate per RFC 5646. Always verify results for your specific use case + ## Solution Structure ### Projects @@ -38,9 +42,11 @@ - Updated weekly via GitHub Actions - **.github/workflows/** - - `update-languagedata.yml`: Weekly scheduled job to update language data + - `run-periodic-codegen-pull-request.yml`: Weekly scheduled job to update language data - `publish-release.yml`: Release and NuGet publishing workflow - - `merge-bot-pr.yml`: Automated PR merge workflow + - `merge-bot-pull-request.yml`: Automated PR merge workflow + - `build-release-task.yml`, `build-library-task.yml`: Build tasks + - `get-version-task.yml`, `build-datebadge-task.yml`: Version and badge generation ## Core Components @@ -50,9 +56,12 @@ The main public API for working with language tags: **Static Factory Methods:** - `Parse(string tag)`: Parse a language tag string, returns null on failure +- `Parse(string tag, Options? options)`: Parse with per-call logging options - `TryParse(string tag, out LanguageTag? 
result)`: Safe parsing with out parameter +- `TryParse(string tag, out LanguageTag? result, Options? options)`: Safe parsing with per-call logging options - `ParseOrDefault(string tag, LanguageTag? defaultTag = null)`: Parse with fallback to "und" - `ParseAndNormalize(string tag)`: Parse and normalize in one step +- `ParseAndNormalize(string tag, Options? options)`: Parse and normalize with per-call logging options - `CreateBuilder()`: Create a fluent builder instance - `FromLanguage(string language)`: Factory for simple language tags - `FromLanguageRegion(string language, string region)`: Factory for language+region tags @@ -71,6 +80,7 @@ The main public API for working with language tags: **Instance Methods:** - `Validate()`: Verify tag correctness - `Normalize()`: Return normalized copy of tag +- `Normalize(Options? options)`: Return normalized copy with per-call logging options - `ToString()`: String representation - `Equals()`: Equality comparison (case-insensitive) - `GetHashCode()`: Hash code for collections @@ -98,6 +108,7 @@ Fluent builder for constructing language tags: - `PrivateUseAddRange(IEnumerable values)`: Add multiple private use tags - `Build()`: Return constructed tag (no validation) - `Normalize()`: Return normalized tag (with validation) +- `Normalize(Options? options)`: Return normalized tag with per-call logging options ### LanguageTagParser Class (LanguageTagParser.cs) @@ -127,33 +138,39 @@ Provides language code conversion and matching: - `GetIsoFromIetf(string languageTag)`: Convert IETF to ISO format - `IsMatch(string prefix, string languageTag)`: Prefix matching for content selection +**Construction:** +- `new LanguageLookup(Options? 
options = null)`: Optional per-instance logging + ### Data Models #### Iso6392Data.cs - ISO 639-2 language codes (3-letter bibliographic/terminologic codes) - **Public Methods:** - `Create()`: Load embedded data - - `LoadData(string fileName)`: Load from file - - `LoadJson(string fileName)`: Load from JSON + - `LoadDataAsync(string fileName)`: Load from file + - `LoadJsonAsync(string fileName)`: Load from JSON - `Find(string? languageTag, bool includeDescription)`: Find record by tag + - `Find(string? languageTag, bool includeDescription, Options? options)`: Find record by tag with logging options - **Record Properties:** `Part2B`, `Part2T`, `Part1`, `RefName` #### Iso6393Data.cs - ISO 639-3 language codes (comprehensive language codes) - **Public Methods:** - `Create()`: Load embedded data - - `LoadData(string fileName)`: Load from file - - `LoadJson(string fileName)`: Load from JSON + - `LoadDataAsync(string fileName)`: Load from file + - `LoadJsonAsync(string fileName)`: Load from JSON - `Find(string? languageTag, bool includeDescription)`: Find record by tag + - `Find(string? languageTag, bool includeDescription, Options? options)`: Find record by tag with logging options - **Record Properties:** `Id`, `Part2B`, `Part2T`, `Part1`, `Scope`, `LanguageType`, `RefName`, `Comment` #### Rfc5646Data.cs - RFC 5646 / BCP 47 language subtag registry - **Public Methods:** - `Create()`: Load embedded data - - `LoadData(string fileName)`: Load from file - - `LoadJson(string fileName)`: Load from JSON + - `LoadDataAsync(string fileName)`: Load from file + - `LoadJsonAsync(string fileName)`: Load from JSON - `Find(string? languageTag, bool includeDescription)`: Find record by tag + - `Find(string? languageTag, bool includeDescription, Options? 
options)`: Find record by tag with logging options - **Properties:** `FileDate`, `RecordList` - **Record Properties:** `Type`, `Tag`, `SubTag`, `Description` (ImmutableArray), `Added`, `SuppressScript`, `Scope`, `MacroLanguage`, `Deprecated`, `Comments` (ImmutableArray), `Prefix` (ImmutableArray), `PreferredValue`, `TagAny` - **Enums:** @@ -184,83 +201,6 @@ Examples: - `zh-yue-hk`: Language with extended language and region - `en-latn-gb-boont-r-extended-sequence-x-private`: Full tag with all components -## Development Guidelines - -### Code Style - -- **C# Version**: 14.0 (latest features) -- **Target Framework**: .NET 10.0 -- Use modern C# features and syntax -- Follow .NET naming conventions -- Use collection expressions: `[]` instead of `new List<>()` -- Use `ImmutableArray` for public collections -- Use file-scoped namespaces -- **Required**: Include XML documentation (`///`) for ALL public APIs -- Use `init` accessors for immutable properties where appropriate -- Use `internal set` for properties that need internal mutability -- Use readonly fields where appropriate -- Prefer primary constructors where applicable - -### XML Documentation Requirements - -All public classes, methods, properties, enums, and operators **must** have XML documentation: - -```csharp -/// -/// Brief description of the member. -/// -/// Description of parameter. -/// Description of return value. -/// When this exception is thrown. 
-public ReturnType MethodName(ParamType paramName) -``` - -### Immutability and Thread Safety - -- All data classes (`Iso6392Data`, `Iso6393Data`, `Rfc5646Data`) are immutable -- Records can be safely shared across threads -- Use `ImmutableArray` for collections in public APIs -- Properties expose immutable collections; internal backing stores can be mutable - -### Testing Requirements - -- **100% coverage** of all public APIs required -- Write unit tests for: - - All public methods - - All static factory methods - - Property accessors - - Equality members - - Edge cases (null, empty, invalid inputs) - - Case-insensitive behavior - - Roundtrip scenarios (parse → normalize → toString) -- Tests are organized by component: - - `LanguageTagTests.cs`: 77+ tests for LanguageTag class - - `LanguageTagBuilderTests.cs`: Builder functionality - - `LanguageTagParserTests.cs`: Parser and normalization - - `LanguageLookupTests.cs`: Conversion and matching - - `Iso6392Tests.cs`, `Iso6393Tests.cs`, `Rfc5646Tests.cs`: Data access -- Use descriptive test method names that explain the scenario -- Leverage AwesomeAssertions for fluent assertions -- Use `[Theory]` with `[InlineData]` for parameterized tests - -### Tools and Formatting - -Available VS Code tasks: -- `.Net Build`: Build the solution -- `.Net Format`: Format code using dotnet format -- `CSharpier Format`: Format code using CSharpier -- `.Net Tool Update`: Update all .NET tools -- `Husky.Net Run`: Run Husky pre-commit hooks - -### Package Management - -- Uses Microsoft.SourceLink.GitHub for source linking -- Generates symbols package (.snupkg) for debugging -- Embeds untracked sources for complete debugging experience -- Package ID: `ptr727.LanguageTags` -- License: MIT -- Current version: 1.0.0-pre - ### Data Updates - Language data is updated weekly via GitHub Actions workflow @@ -339,15 +279,18 @@ LanguageTag tag = LanguageTag.ParseOrDefault(input); // Falls back to "und" - `VariantList` → `Variants` - `ExtensionList` 
→ `Extensions` - `TagList` → `Tags` -- `LoadData()` and `LoadJson()` changed from internal to public in data classes +- Data file APIs are async-only: `LoadDataAsync()`/`LoadJsonAsync()`; sync versions removed - Tag construction requires use of factory methods or builder (constructors are internal) ### Added (Non-Breaking) - `LanguageTag.ParseOrDefault()`: Safe parsing with fallback - `LanguageTag.ParseAndNormalize()`: Combined parse and normalize +- `LanguageTag.ParseAndNormalize(string, Options?)`: Combined parse and normalize with logging options - `LanguageTag.IsValid`: Property for validation - `LanguageTag.FromLanguage()`, `FromLanguageRegion()`, `FromLanguageScriptRegion()`: Factory methods - `IEquatable` implementation with operators +- Options-aware logging for parsing/normalization and lookup (`Options` + `LogOptions`) +- `LanguageLookup` supports optional logging via primary constructor - Comprehensive XML documentation for all public APIs ## Future Improvements @@ -361,17 +304,9 @@ Consider these areas for enhancement: ## Contributing -When contributing to this project: -1. Follow the existing code style and patterns -2. Add unit tests for ALL new public functionality (100% coverage required) -3. Add XML documentation for ALL public APIs -4. Run formatting tools before committing -5. Ensure all tests pass (211+ tests should pass) -6. Update the README if adding significant features -7. Do not expose constructors publicly - use factory methods or builder pattern -8. Prefer immutability - use `ImmutableArray` for collections -9. Follow the safe parsing patterns (TryParse, ParseOrDefault) -10. 
Maintain thread safety for all data structures +- Follow the authoritative coding standards and tooling in `CODESTYLE.md` and `.editorconfig` +- Add tests for new public behavior and keep API documentation complete +- Use factory methods or builders for tag creation; avoid public constructors ## Common Patterns diff --git a/.github/dependabot.yml b/.github/dependabot.yml index fbc6fce..21030f4 100644 --- a/.github/dependabot.yml +++ b/.github/dependabot.yml @@ -21,23 +21,3 @@ updates: actions-deps: patterns: - "*" - - # develop -- package-ecosystem: "nuget" - target-branch: "develop" - directory: "/" - schedule: - interval: "daily" - groups: - nuget-deps: - patterns: - - "*" -- package-ecosystem: "github-actions" - target-branch: "develop" - directory: "/" - schedule: - interval: "daily" - groups: - actions-deps: - patterns: - - "*" diff --git a/.github/workflows/build-datebadge-task.yml b/.github/workflows/build-datebadge-task.yml new file mode 100644 index 0000000..08eaed3 --- /dev/null +++ b/.github/workflows/build-datebadge-task.yml @@ -0,0 +1,31 @@ +name: Build BYOB date badge task + +env: + IS_MAIN_BRANCH: ${{ endsWith(github.ref, 'refs/heads/main') }} + +on: + workflow_call: + +jobs: + + date-badge: + name: Build BYOB date badge job + runs-on: ubuntu-latest + + steps: + + - name: Get current date step + id: date + run: | + echo "date=$(date)" >> $GITHUB_OUTPUT + + - name: Build BYOB date badge step + if: ${{ env.IS_MAIN_BRANCH == 'true' }} + uses: RubbaBoy/BYOB@v1 + with: + name: lastbuild + label: "Last Build" + icon: "github" + status: ${{ steps.date.outputs.date }} + color: "blue" + github_token: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/build-library-task.yml b/.github/workflows/build-library-task.yml new file mode 100644 index 0000000..6886014 --- /dev/null +++ b/.github/workflows/build-library-task.yml @@ -0,0 +1,73 @@ +name: Build library task + +env: + IS_MAIN_BRANCH: ${{ endsWith(github.ref, 'refs/heads/main') }} + PROJECT_FILE: 
./LanguageTags/LanguageTags.csproj + PROJECT_ARTIFACT: LanguageTags.7z + +on: + workflow_call: + inputs: + # Input to control whether to push the library to NuGet.org + push: + required: false + type: boolean + default: false + outputs: + # Output of the uploaded artifact id + artifact-id: + value: ${{ jobs.build-library.outputs.artifact-id }} + +jobs: + + get-version: + name: Get version information job + uses: ./.github/workflows/get-version-task.yml + secrets: inherit + + build-library: + name: Build library project job + runs-on: ubuntu-latest + outputs: + artifact-id: ${{ steps.artifact-upload-step.outputs.artifact-id }} + needs: [get-version] + + steps: + + - name: Setup .NET SDK step + uses: actions/setup-dotnet@v5 + with: + dotnet-version: 10.x + + - name: Checkout code step + uses: actions/checkout@v6 + + - name: Build library project step + run: >- + dotnet build ${{ env.PROJECT_FILE }} + --output ${{ runner.temp }}/publish + --configuration ${{ env.IS_MAIN_BRANCH == 'true' && 'Release' || 'Debug' }} + -property:Version=${{ needs.get-version.outputs.AssemblyVersion }} + -property:FileVersion=${{ needs.get-version.outputs.AssemblyFileVersion }} + -property:AssemblyVersion=${{ needs.get-version.outputs.AssemblyVersion }} + -property:InformationalVersion=${{ needs.get-version.outputs.AssemblyInformationalVersion }} + -property:PackageVersion=${{ needs.get-version.outputs.SemVer2 }} + + - name: Publish to NuGet.org step + if: ${{ inputs.push }} + run: >- + dotnet nuget push ${{ runner.temp }}/publish/*.nupkg + --source https://api.nuget.org/v3/index.json + --api-key ${{ secrets.NUGET_API_KEY }} + --skip-duplicate + + - name: Zip output step + run: | + 7z a -t7z ${{ runner.temp }}/${{ env.PROJECT_ARTIFACT }} ${{ runner.temp }}/publish/* + + - name: Upload build artifacts step + id: artifact-upload-step + uses: actions/upload-artifact@v6 + with: + name: library-build + path: ${{ runner.temp }}/${{ env.PROJECT_ARTIFACT }} diff --git 
a/.github/workflows/build-release-task.yml b/.github/workflows/build-release-task.yml new file mode 100644 index 0000000..17f4ad3 --- /dev/null +++ b/.github/workflows/build-release-task.yml @@ -0,0 +1,61 @@ +name: Build project release task + +env: + IS_MAIN_BRANCH: ${{ endsWith(github.ref, 'refs/heads/main') }} + +on: + workflow_call: + inputs: + # Input to control whether to create a GitHub release + github: + required: false + type: boolean + default: false + # Input to control whether to push the library to NuGet.org + nuget: + required: false + type: boolean + default: false + +jobs: + + get-version: + name: Get version information job + uses: ./.github/workflows/get-version-task.yml + secrets: inherit + + build-library: + name: Build library job + uses: ./.github/workflows/build-library-task.yml + secrets: inherit + with: + # Conditional push to NuGet.org + push: ${{ inputs.nuget }} + + github-release: + name: Publish GitHub release job + if: ${{ inputs.github }} + runs-on: ubuntu-latest + needs: [get-version, build-library] + + steps: + + - name: Checkout code step + uses: actions/checkout@v6 + + - name: Download library build artifacts job + uses: actions/download-artifact@v7 + with: + artifact-ids: ${{ needs.build-library.outputs.artifact-id }} + path: ./Publish + + - name: Create GitHub release job + uses: softprops/action-gh-release@v2 + with: + generate_release_notes: true + tag_name: ${{ needs.get-version.outputs.SemVer2 }} + prerelease: ${{ env.IS_MAIN_BRANCH != 'true' }} + files: | + LICENSE + README.md + ./Publish/* diff --git a/.github/workflows/get-version-task.yml b/.github/workflows/get-version-task.yml new file mode 100644 index 0000000..9130588 --- /dev/null +++ b/.github/workflows/get-version-task.yml @@ -0,0 +1,41 @@ +name: Get version information task + +on: + workflow_call: + outputs: + # Version information outputs + SemVer2: + value: ${{ jobs.get-version.outputs.SemVer2 }} + AssemblyVersion: + value: ${{ 
jobs.get-version.outputs.AssemblyVersion }} + AssemblyFileVersion: + value: ${{ jobs.get-version.outputs.AssemblyFileVersion }} + AssemblyInformationalVersion: + value: ${{ jobs.get-version.outputs.AssemblyInformationalVersion }} + +jobs: + + get-version: + name: Get version information job + runs-on: ubuntu-latest + outputs: + SemVer2: ${{ steps.nbgv.outputs.SemVer2 }} + AssemblyVersion: ${{ steps.nbgv.outputs.AssemblyVersion }} + AssemblyFileVersion: ${{ steps.nbgv.outputs.AssemblyFileVersion }} + AssemblyInformationalVersion: ${{ steps.nbgv.outputs.AssemblyInformationalVersion }} + + steps: + + - name: Setup .NET SDK step + uses: actions/setup-dotnet@v5 + with: + dotnet-version: 10.x + + - name: Checkout code step + uses: actions/checkout@v6 + with: + fetch-depth: 0 + + - name: Run Nerdbank.GitVersioning tool step + id: nbgv + uses: dotnet/nbgv@master diff --git a/.github/workflows/merge-bot-pr.yml b/.github/workflows/merge-bot-pull-request.yml similarity index 64% rename from .github/workflows/merge-bot-pr.yml rename to .github/workflows/merge-bot-pull-request.yml index 2ac8d07..615ed52 100644 --- a/.github/workflows/merge-bot-pr.yml +++ b/.github/workflows/merge-bot-pull-request.yml @@ -1,4 +1,4 @@ -name: Merge bot generated PRs +name: Merge bot pull request action on: pull_request: @@ -10,39 +10,39 @@ concurrency: jobs: merge-dependabot: - name: Merge dependabot PRs + name: Merge dependabot pull request job runs-on: ubuntu-latest - if: github.actor == 'dependabot[bot]' + if: github.actor == 'dependabot[bot]' && github.event.pull_request.head.repo.full_name == github.repository permissions: contents: write pull-requests: write steps: - - name: Get dependabot metadata + - name: Get dependabot metadata step id: metadata uses: dependabot/fetch-metadata@v2 with: github-token: "${{ secrets.GITHUB_TOKEN }}" - - name: Merge PR + - name: Merge pull request step if: steps.metadata.outputs.update-type != 'version-update:semver-major' run: gh pr merge --auto --merge 
"$PR_URL" env: PR_URL: ${{github.event.pull_request.html_url}} GH_TOKEN: ${{secrets.GITHUB_TOKEN}} - merge-languagedata: - name: Merge language data PRs + merge-codegen: + name: Merge codegen pull request job runs-on: ubuntu-latest - if: github.actor == 'github-actions[bot]' && github.event.pull_request.base.ref == 'update-languagedata' + if: github.actor == 'github-actions[bot]' && github.event.pull_request.head.ref == 'codegen-update' && github.event.pull_request.head.repo.full_name == github.repository permissions: contents: write pull-requests: write steps: - - name: Merge PR + - name: Merge pull request step run: gh pr merge --auto --merge "$PR_URL" env: PR_URL: ${{github.event.pull_request.html_url}} diff --git a/.github/workflows/publish-release.yml b/.github/workflows/publish-release.yml index 80b10c6..500fd9a 100644 --- a/.github/workflows/publish-release.yml +++ b/.github/workflows/publish-release.yml @@ -1,4 +1,4 @@ -name: Publish release +name: Publish project release action on: push: @@ -11,64 +11,21 @@ concurrency: jobs: - test: - name: Run tests - uses: ./.github/workflows/test-task.yml + create-release: + name: Publish project release job + uses: ./.github/workflows/build-release-task.yml + secrets: inherit + permissions: + contents: write + with: + # Push to GitHub and NuGet + github: true + nuget: true + + date-badge: + name: Create BYOB date badge job + needs: [create-release] + uses: ./.github/workflows/build-datebadge-task.yml secrets: inherit - - publish: - name: Publish release - runs-on: ubuntu-latest - needs: test permissions: contents: write - - steps: - - - name: Setup .NET SDK - uses: actions/setup-dotnet@v5 - with: - dotnet-version: 10.x - - - name: Checkout code - uses: actions/checkout@v6 - with: - fetch-depth: 0 - - - name: Run Nerdbank.GitVersioning - id: nbgv - uses: dotnet/nbgv@master - - - name: Build project - run: >- - dotnet build ./LanguageTags/LanguageTags.csproj - --output ./Publish/ - --configuration ${{ 
endsWith(github.ref, 'refs/heads/main') && 'Release' || 'Debug' }} - -property:Version=${{ steps.nbgv.outputs.AssemblyVersion }} - -property:FileVersion=${{ steps.nbgv.outputs.AssemblyFileVersion }} - -property:AssemblyVersion=${{ steps.nbgv.outputs.AssemblyVersion }} - -property:InformationalVersion=${{ steps.nbgv.outputs.AssemblyInformationalVersion }} - -property:PackageVersion=${{ steps.nbgv.outputs.SemVer2 }} - - - name: Publish to NuGet.org - run: >- - dotnet nuget push ${{ github.workspace }}/Publish/*.nupkg - --source https://api.nuget.org/v3/index.json - --api-key ${{ secrets.NUGET_API_KEY }} - --skip-duplicate - - - name: Zip output - run: | - cp ./LanguageData/*.json ./Publish/ - 7z a -t7z ./Publish/LanguageTags.7z ./Publish/* - - - name: Create GitHub release - uses: softprops/action-gh-release@v2 - with: - generate_release_notes: true - tag_name: ${{ steps.nbgv.outputs.SemVer2 }} - prerelease: ${{ !endsWith(github.ref, 'refs/heads/main') }} - files: | - LICENSE - README.md - ./Publish/LanguageTags.7z diff --git a/.github/workflows/run-codegen-pull-request-task.yml b/.github/workflows/run-codegen-pull-request-task.yml new file mode 100644 index 0000000..08efd05 --- /dev/null +++ b/.github/workflows/run-codegen-pull-request-task.yml @@ -0,0 +1,42 @@ +name: Run codegen and pull request task + +env: + PROJECT_FILE: ./LanguageTagsCreate/LanguageTagsCreate.csproj + +on: + workflow_call: + +jobs: + + codegen: + name: Run codegen and pull request job + runs-on: ubuntu-latest + + steps: + + - name: Setup .NET SDK step + uses: actions/setup-dotnet@v5 + with: + dotnet-version: 10.x + + - name: Checkout code step + uses: actions/checkout@v6 + + - name: Run codegen step + run: dotnet run --project ${{ env.PROJECT_FILE }} -- --codepath . + + - name: Format code step + run: | + dotnet tool restore + dotnet husky install + dotnet csharpier format --log-level=debug . 
+ git status + + - name: Create pull request step + uses: peter-evans/create-pull-request@v8 + with: + branch: codegen-update + title: 'Update codegen files' + body: 'This PR updates the codegen files.' + commit-message: 'Update codegen files' + delete-branch: true diff --git a/.github/workflows/run-periodic-codegen-pull-request.yml b/.github/workflows/run-periodic-codegen-pull-request.yml new file mode 100644 index 0000000..cb16fe8 --- /dev/null +++ b/.github/workflows/run-periodic-codegen-pull-request.yml @@ -0,0 +1,21 @@ +name: Run weekly CodeGen and Pull Request action + +on: + workflow_dispatch: + schedule: + # Run weekly on Mondays at 02:00 UTC + - cron: '0 2 * * MON' + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + + run-codegen: + name: Run codegen and pull request job + uses: ./.github/workflows/run-codegen-pull-request-task.yml + secrets: inherit + permissions: + contents: write + pull-requests: write diff --git a/.github/workflows/test-pr.yml b/.github/workflows/test-pr.yml deleted file mode 100644 index 06d78fd..0000000 --- a/.github/workflows/test-pr.yml +++ /dev/null @@ -1,17 +0,0 @@ -name: Test PRs - -on: - pull_request: - branches: [ main, develop ] - workflow_dispatch: - -concurrency: - group: ${{ github.workflow }}-${{ github.ref }} - cancel-in-progress: true - -jobs: - - test: - name: Run tests - uses: ./.github/workflows/test-task.yml - secrets: inherit diff --git a/.github/workflows/test-pull-request.yml b/.github/workflows/test-pull-request.yml new file mode 100644 index 0000000..05f4fa2 --- /dev/null +++ b/.github/workflows/test-pull-request.yml @@ -0,0 +1,36 @@ +name: Test pull request action + +on: + pull_request: + branches: [ main, develop ] + workflow_dispatch: + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + + test-release: + name: Test release job + uses: ./.github/workflows/test-release-task.yml + secrets: inherit + + # TODO: 
Workaround for GitHub Actions not supporting status checks on conditional jobs + # https://github.com/orgs/community/discussions/12395#discussioncomment-12970019 + check-workflow-status: + name: Check pull request workflow status + runs-on: ubuntu-latest + needs: + [ test-release ] + if: always() + steps: + - name: Check workflow results + run: | + exit_on_result() { + if [[ "$2" == "failure" || "$2" == "cancelled" ]]; then + echo "Job '$1' failed or was cancelled." + exit 1 + fi + } + exit_on_result "test-release" "${{ needs.test-release.result }}" diff --git a/.github/workflows/test-release-task.yml b/.github/workflows/test-release-task.yml new file mode 100644 index 0000000..e9e6737 --- /dev/null +++ b/.github/workflows/test-release-task.yml @@ -0,0 +1,40 @@ +name: Test release task + +on: + workflow_call: + workflow_dispatch: + +jobs: + + unit-test: + name: Run unit tests job + runs-on: ubuntu-latest + + steps: + + - name: Setup .NET SDK step + uses: actions/setup-dotnet@v5 + with: + dotnet-version: 10.x + + - name: Checkout code step + uses: actions/checkout@v6 + + - name: Check code style step + run: | + dotnet tool restore + dotnet husky install + dotnet husky run + + - name: Run unit tests step + run: dotnet test + + build-release: + name: Build release without publishing job + needs: [unit-test] + uses: ./.github/workflows/build-release-task.yml + secrets: inherit + with: + # Do not publish + github: false + nuget: false diff --git a/.github/workflows/test-task.yml b/.github/workflows/test-task.yml deleted file mode 100644 index fa1ebf4..0000000 --- a/.github/workflows/test-task.yml +++ /dev/null @@ -1,33 +0,0 @@ -name: Test build - -on: - workflow_call: - workflow_dispatch: - -jobs: - - test: - name: Run tests - runs-on: ubuntu-latest - - steps: - - - name: Setup .NET SDK - uses: actions/setup-dotnet@v5 - with: - dotnet-version: 10.x - - - name: Checkout code - uses: actions/checkout@v6 - - - name: Check code style - run: | - dotnet tool restore - dotnet 
csharpier check --log-level=debug . - dotnet format style --verify-no-changes --severity=info --verbosity=detailed - - - name: Run unit tests - run: dotnet test - - - name: Build - run: dotnet build diff --git a/.github/workflows/update-languagedata.yml b/.github/workflows/update-languagedata.yml deleted file mode 100644 index 52c6a2f..0000000 --- a/.github/workflows/update-languagedata.yml +++ /dev/null @@ -1,49 +0,0 @@ -name: Update language data - -on: - workflow_dispatch: - schedule: - - cron: '0 2 * * MON' - -concurrency: - group: ${{ github.workflow }}-${{ github.ref }} - cancel-in-progress: true - -jobs: - - update: - name: Update language data - runs-on: ubuntu-latest - permissions: - contents: write - pull-requests: write - - steps: - - - name: Setup .NET SDK - uses: actions/setup-dotnet@v5 - with: - dotnet-version: 10.x - - - name: Checkout code - uses: actions/checkout@v6 - - - name: Download language data and generate code - run: | - dotnet run --project ./LanguageTagsCreate/LanguageTagsCreate.csproj -- . - - - name: Format code - run: | - dotnet tool restore - dotnet husky install - dotnet csharpier format --log-level=debug . - git status - - - name: Create pull request - uses: peter-evans/create-pull-request@v8 - with: - branch: update-languagedata - title: 'Update language data and generated files' - body: 'This PR updates the language data files and regenerates the code.' 
- commit-message: 'Update language data and generated files' - delete-branch: true diff --git a/.gitignore b/.gitignore index 772ce15..193ff24 100644 --- a/.gitignore +++ b/.gitignore @@ -5,3 +5,6 @@ .idea .vs +.artifacts +.DS_Store +*.user diff --git a/.vscode/launch.json b/.vscode/launch.json index ae5d4cb..58bc348 100644 --- a/.vscode/launch.json +++ b/.vscode/launch.json @@ -8,6 +8,7 @@ "preLaunchTask": ".Net Build", "program": "${workspaceFolder}/LanguageTagsCreate/bin/Debug/net10.0/LanguageTagsCreate.dll", "args": [ + "--codepath", "${workspaceFolder}", ], "cwd": "${workspaceFolder}/LanguageTagsCreate/bin/Debug/net10.0", diff --git a/.vscode/tasks.json b/.vscode/tasks.json index 1261310..059f784 100644 --- a/.vscode/tasks.json +++ b/.vscode/tasks.json @@ -1,22 +1,19 @@ -// dotnet new tool-manifest -// dotnet tool install csharpier -// dotnet tool install husky -// dotnet husky install -// dotnet husky add pre-commit -c "dotnet husky run" -// winget install nektos.act - -// dotnet tool update --all -// winget upgrade nektos.act - { "version": "2.0.0", "tasks": [ { "label": ".Net Build", - "type": "dotnet", - "task": "build", + "type": "process", + "command": "dotnet", + "args": [ + "build", + "${workspaceFolder}", + "--verbosity=diagnostic" + ], "group": "build", - "problemMatcher": ["$msCompile"], + "problemMatcher": [ + "$msCompile" + ], "presentation": { "showReuseMessage": false, "clear": false @@ -40,7 +37,10 @@ "showReuseMessage": false, "clear": false }, - "dependsOn": [".Net Build"] + "dependsOn": [ + "CSharpier Format", + ".Net Build" + ] }, { "label": "CSharpier Format", diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..5ee89e1 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,55 @@ +# Instructions for AI Coding Agents + +**LanguageTags** is a C# .NET library for handling ISO 639-2, ISO 639-3, and RFC 5646 / BCP 47 language tags. + +The project serves two primary purposes: + +1. 
**Data Publishing**: Provides ISO 639-2, ISO 639-3, and RFC 5646 language tag records in JSON and C# formats +2. **Tag Processing**: Implements IETF BCP 47 language tag construction and parsing per RFC 5646 semantic rules + +For comprehensive coding standards and detailed conventions, refer to [`.github/copilot-instructions.md`](./.github/copilot-instructions.md) and [`CODESTYLE.md`](./CODESTYLE.md). + +## Solution Structure + +### Projects + +- **LanguageTags** (`LanguageTags/LanguageTags.csproj`) + - Core library project + - NuGet package: `ptr727.LanguageTags` + - Target framework: .NET 10.0 + - AOT compatible (`true`) + +- **LanguageTagsCreate** (`LanguageTagsCreate/LanguageTagsCreate.csproj`) + - CLI utility for downloading and generating language data + - Downloads from official sources (Library of Congress, SIL, IANA) + - Converts to JSON and generates C# code files + +- **LanguageTagsTests** (`LanguageTagsTests/LanguageTagsTests.csproj`) + - xUnit test suite with comprehensive test coverage + - Uses AwesomeAssertions for test assertions + +### Key Components + +**Public API Classes:** + +- `LanguageTag`: Main class for working with language tags (parse, build, normalize, validate) +- `LanguageTagBuilder`: Fluent builder for constructing language tags +- `LanguageLookup`: Language code conversion and matching (IETF ↔ ISO) +- `Iso6392Data`: ISO 639-2 language code data +- `Iso6393Data`: ISO 639-3 language code data +- `Rfc5646Data`: RFC 5646 / BCP 47 language subtag registry data +- `ExtensionTag`: Represents extension subtags +- `PrivateUseTag`: Represents private use subtags + +**Internal Classes:** + +- `LanguageTagParser`: Internal parser (use `LanguageTag.Parse()` instead) + +## Authoritative References + +For detailed specifications, see: + +- [`.github/copilot-instructions.md`](./.github/copilot-instructions.md) - Complete coding conventions and style guide +- [`CODESTYLE.md`](./CODESTYLE.md) - Code style and formatting rules +- 
[`.editorconfig`](./.editorconfig) - Automated style enforcement +- Project task definitions - `CSharpier Format`, `.Net Build`, `.Net Format`, `Husky.Net Run` diff --git a/CODESTYLE.md b/CODESTYLE.md new file mode 100644 index 0000000..3952ddd --- /dev/null +++ b/CODESTYLE.md @@ -0,0 +1,296 @@ +# Code Style and Formatting Rules + +## Build Requirements + +### Zero Warnings Policy + +**CRITICAL**: All builds must complete without warnings. The project enforces this through: + +1. **VS Code tasks** + - `CSharpier Format` → `.Net Build` → `.Net Format` + - `.Net Format` must pass with `--verify-no-changes` before commit + - Command: `dotnet format style --verify-no-changes --severity=info --verbosity=detailed` + +2. **Analyzer configuration** + - `latest-all` + - `true` + - Analyzer severity is `suggestion`, but all warnings must be addressed + +3. **Husky.Net pre-commit hooks** + - Automated checks run before commits + +### Build Tasks + +Available VS Code tasks (use via `run_task` tool): + +- `.Net Build`: Build with diagnostic verbosity +- `.Net Format`: Verify formatting and style (must pass) +- `CSharpier Format`: Auto-format code with CSharpier +- `.Net Tool Update`: Update dotnet tools +- `Husky.Net Run`: Run pre-commit hooks manually + +## Tooling and Editor + +### Code Formatting and Tooling + +1. **CSharpier**: Primary code formatter + - Run before committing: `dotnet csharpier format --log-level=debug .` + +2. **dotnet format**: Style verification + - Verify no changes: `dotnet format style --verify-no-changes --severity=info --verbosity=detailed` + +3. **Husky.Net**: Git hooks for automated checks + - Installed as a local dotnet tool (via `dotnet tool restore`) + - Install Git hooks locally with `dotnet husky install` + - Pre-commit hooks run formatting and style checks + +4. **Other tools** + - `dotnet-outdated-tool`: Dependency update checks + - Nerdbank.GitVersioning: Version management + +### Editor Baseline + +1. 
**Required VS Code extensions**: CSharpier, markdownlint, CSpell +2. **VS Code settings**: Use the workspace settings without overrides + +### Markdown Files + +1. **Linting**: All `.md` files must be linted with the VS Code `markdownlint` extension (local only; no CI) +2. **Zero warnings**: Markdown linting must be error and warning free + +### Spelling + +1. **CSpell**: All spelling checks must be error free using the CSpell VS Code integration +2. **Accepted spellings**: Words must be correctly spelled in US or UK English +3. **Allowed exceptions**: Project-specific terms must be added to the workspace CSpell config + +## Coding Standards and Conventions + +Note: Code snippets are illustrative examples only. Replace namespaces/types to match your project. + +### C# Language Features + +1. **File-scoped namespaces** + + ```csharp + namespace Example.Project.Library; + ``` + +2. **Nullable reference types**: Enabled (`enable`) + - Use nullable annotations appropriately + - Use `required` for mandatory properties + +3. **Modern C# features**: Prefer modern language constructs + - Primary constructors when appropriate + - Top-level statements for console apps + - Pattern matching over traditional checks + - Collection expressions when types loosely match + - Extension methods using `extension()` syntax + - Implicit object creation when type is apparent + - Range and index operators + +4. **Expression-bodied members**: Use for applicable members + - Methods, properties, accessors, operators, lambdas, local functions + +5. **`var` keyword**: Do NOT use `var` (always use explicit types) + + ```csharp + // Correct + int count = 42; + string name = "test"; + + // Incorrect + var count = 42; + var name = "test"; + ``` + +### Naming Conventions + +1. **Private fields**: underscore prefix with camelCase + + ```csharp + private readonly HttpClient _httpClient; + private int _counter; + ``` + +2. 
**Static fields**: `s_` prefix with camelCase + + ```csharp + private static int s_instanceCount; + ``` + +3. **Constants**: PascalCase + + ```csharp + private const int MaxRetries = 3; + ``` + +### Code Structure + +1. **Global usings**: Use `GlobalUsings.cs` for common namespaces + + ```csharp + global using System; + global using System.Net.Http; + global using System.Threading.Tasks; + global using Serilog; + ``` + +2. **Usings placement**: Outside namespace, sorted with `System` directives first + + ```csharp + using System.CommandLine; + using System.Runtime.CompilerServices; + using Example.Project.Library; + + namespace Example.Project.Console; + ``` + +3. **Braces**: Allman style + + ```csharp + public void Method() + { + if (condition) + { + // code + } + } + ``` + +4. **Indentation** + - C# files: 4 spaces + - XML/csproj files: 2 spaces + - YAML files: 2 spaces + - JSON files: 4 spaces + +5. **Line endings** + - C#, XML, YAML, JSON, Windows scripts: CRLF + - Linux scripts (`.sh`): LF + +6. **`#region`**: Do not use. Prefer logical file/folder/namespace organization. +7. **Member ordering (StyleCop-like)**: Constants → fields → constructors → properties → indexers → methods → events → operators → finalizers → delegates → nested types + +### Comments and Documentation + +1. **XML documentation** + - `true` + - Missing XML comments for public APIs are suppressed (`.editorconfig`) + - Single-line summaries + + ```csharp + /// + /// This property always returns a value < 1. + /// + ``` + +2. **Code analysis suppressions** + - Do not use `#pragma` sections to disable analyzers + - For one-off cases, use suppression attributes with justifications + - For project-wide suppressions, add rules to `.editorconfig` + + ```csharp + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Design", + "CA1034:Nested types should not be visible", + Justification = "https://github.com/dotnet/sdk/issues/51681" + )] + ``` + +### Error Handling and Logging + +1. 
**Serilog logging**: Use structured logging + + ```csharp + logger.Error(exception, "{Function}", function); + ``` + +2. **Library log configuration**: Libraries must expose logging configuration + - Provide options or settings to supply an `ILoggerFactory` and/or `ILogger` + - Offer a global fallback logger for static usage when needed + +3. **CallerMemberName**: Use for automatic function name tracking + + ```csharp + public bool LogAndPropagate( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + ``` + +4. **Logger extensions**: Use `Extensions.cs` for logger and other extension methods + + ```csharp + extension(ILogger logger) + { + public bool LogAndPropagate(Exception exception, ...) { } + } + ``` + +5. **Exceptions**: Do not swallow exceptions; log and rethrow or translate to a domain-specific exception + +### Code Patterns + +1. **Guard clauses**: Prefer early returns for validation and error handling +2. **Async all the way**: Avoid blocking calls (`.Result`, `.Wait()`); use `async`/`await` +3. **Cancellation tokens**: Accept `CancellationToken` as the last parameter and pass it through +4. **ConfigureAwait**: In library code, use `ConfigureAwait(false)` unless context is required + - Do not call `ConfigureAwait(false)` in xUnit tests (see xUnit1030) +5. **Disposables**: Use `await using` for async disposables; prefer `using` declarations +6. **LINQ vs loops**: Use LINQ for clarity, loops for hot paths or allocations +7. **HTTP**: Reuse `HttpClient` via factory; avoid per-request instantiation +8. **Collections**: Prefer `IReadOnlyList`/`IReadOnlyCollection` for public APIs +9. **Immutability**: Prefer immutable records; use init-only setters when records are not suitable; prefer immutable or frozen collections for read-only data +10. **Exceptions as control flow**: Avoid using exceptions for expected flow +11. **Sealing classes**: Seal classes that are not designed for inheritance +12. 
**Read-only data**: Use immutable or frozen collections for read-only data sets +13. **Lazy initialization**: Use `Lazy<T>` for static, thread-safe instantiation (e.g., logger factory, HTTP factory) + +### Testing Conventions + +1. **Framework**: xUnit with FluentAssertions + + ```csharp + [Fact] + public void MethodName_Scenario_ExpectedBehavior() + { + // Arrange + int expected = 42; + + // Act + int actual = GetValue(); + + // Assert + actual.Should().Be(expected); + } + ``` + +2. **Organization**: Arrange-Act-Assert pattern +3. **Naming**: Descriptive names with underscores +4. **Theory tests**: Use `[Theory]` with `[InlineData]` + +## Project Configuration + +1. **Target framework**: .NET 10.0 (`net10.0`) + +2. **AOT compatibility** + - `true` + - `true` + +3. **Assembly information** + - Use semantic versioning + - Include SourceLink: `true` + - Embed untracked sources: `true` + +4. **Internal visibility**: Use `InternalsVisibleTo` for test and benchmark access + + ```xml + + + + + ``` + +## Best Practices + +1. **Code reviews**: All changes go through pull requests diff --git a/HISTORY.md b/HISTORY.md new file mode 100644 index 0000000..2e7d830 --- /dev/null +++ b/HISTORY.md @@ -0,0 +1,17 @@ +# LanguageTags + +C# .NET library for ISO 639-2, ISO 639-3, RFC 5646 / BCP 47 language tags. + +## Release History + +- Version 1.2: + - Refactored the project to follow standard patterns across other projects. + - IO APIs are now async-only (`LoadDataAsync`, `LoadJsonAsync`, `SaveJsonAsync`, `GenCodeAsync`). + - Added logging support for `ILogger` or `ILoggerFactory` per class instance or statically. + - JSON load/save and codegen now stream directly to/from files, no intermediate text buffers. + +- Version 1.1: + - .NET 10 and AOT support. + - Refactored public surfaces to minimize internals exposure. +- Version 1.0: + - Initial standalone release. 
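The 1.2 release notes above say the IO APIs are now async-only. A minimal sketch of what a caller might look like after that change, assuming the `ptr727.LanguageTags` package is referenced and that `LoadJsonAsync` returns `Task<Iso6392Data?>` as its doc comments in this change describe. The file path is hypothetical and only for illustration.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using ptr727.LanguageTags;

// Hypothetical path; point this at a JSON file produced by the data generator.
string fileName = "LanguageData/iso6392.json";

// LoadJsonAsync may return null when deserialization yields no data,
// and it rethrows on IO or JSON errors, so only call it when the file exists.
Iso6392Data? loaded = File.Exists(fileName)
    ? await Iso6392Data.LoadJsonAsync(fileName)
    : null;

// Fall back to the generated in-code data set when nothing was loaded.
Iso6392Data data = loaded ?? Iso6392Data.Create();

// Look up by language code only (includeDescription: false).
Iso6392Record? record = data.Find("eng", false);
Console.WriteLine(record?.RefName ?? "not found");
```

Callers that previously used the synchronous `LoadJson` would need to become `async` themselves rather than block on `.Result` or `.Wait()`, which is consistent with the "async all the way" guidance in `CODESTYLE.md`.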
diff --git a/LICENSE b/LICENSE index 73e1c52..c07453d 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2025 Pieter Viljoen +Copyright (c) 2026 Pieter Viljoen Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/LanguageData/iso6393 b/LanguageData/iso6393 index 6220a91..029e57d 100644 --- a/LanguageData/iso6393 +++ b/LanguageData/iso6393 @@ -850,7 +850,7 @@ bnz I L Beezen boa I L Bora bob I L Aweer bod tib bod bo I L Tibetan -boe I L Mundabli +boe I L Mundabli-Mufu bof I L Bolon bog I L Bamako Sign Language boh I L Boma @@ -1751,6 +1751,7 @@ dyb I E Dyaberdyaber dyd I E Dyugun dyg I E Villa Viciosa Agta dyi I L Djimini Senoufo +dyl I L Bhutanese Sign Language dym I L Yanda Dom Dogon dyn I L Dyangadi dyo I L Jola-Fonyi @@ -3573,6 +3574,7 @@ lex I L Luang ley I L Lemolang lez lez lez I L Lezghian lfa I L Lefa +lfb I L Buu (Cameroon) lfn I C Lingua Franca Nova lga I L Lungga lgb I L Laghu @@ -3784,7 +3786,7 @@ lue I L Luvale luf I L Laua lug lug lug lg I L Ganda luh I L Leizhou Chinese -lui lui lui I E Luiseno +lui lui lui I E Luiseño luj I L Luna luk I L Lunanakha lul I L Olu'bo diff --git a/LanguageData/iso6393.json b/LanguageData/iso6393.json index e0ed638..9db09c8 100644 --- a/LanguageData/iso6393.json +++ b/LanguageData/iso6393.json @@ -5216,7 +5216,7 @@ "Id": "boe", "Scope": "I", "LanguageType": "L", - "RefName": "Mundabli" + "RefName": "Mundabli-Mufu" }, { "Id": "bof", @@ -10726,6 +10726,12 @@ "LanguageType": "L", "RefName": "Djimini Senoufo" }, + { + "Id": "dyl", + "Scope": "I", + "LanguageType": "L", + "RefName": "Bhutanese Sign Language" + }, { "Id": "dym", "Scope": "I", @@ -21965,6 +21971,12 @@ "LanguageType": "L", "RefName": "Lefa" }, + { + "Id": "lfb", + "Scope": "I", + "LanguageType": "L", + "RefName": "Buu (Cameroon)" + }, { "Id": "lfn", "Scope": "I", @@ -23261,7 +23273,7 @@ "Part2T": "lui", "Scope": "I", 
"LanguageType": "E", - "RefName": "Luiseno" + "RefName": "Luise\u00F1o" }, { "Id": "luj", diff --git a/LanguageTags.code-workspace b/LanguageTags.code-workspace index dee95ac..8f550b7 100644 --- a/LanguageTags.code-workspace +++ b/LanguageTags.code-workspace @@ -8,15 +8,20 @@ "cSpell.words": [ "ABNF", "acrolanguage", + "Allman", "alphanum", + "ANTLR", "arevela", "boont", "CLDR", + "codegen", "csdevkit", + "datebadge", "davidanson", "derbend", "dotnettools", "extlang", + "finalizers", "gruntfuggly", "istorical", "iving", @@ -26,6 +31,7 @@ "langtag", "languagedata", "languagetags", + "LINQ", "lojban", "macrolanguage", "Matroska", @@ -63,7 +69,7 @@ "xtinct", "xunit" ], - "dotnet.defaultSolution": "LanguageTags.sln", + "dotnet.defaultSolution": "LanguageTags.slnx", "files.trimTrailingWhitespace": true, "files.trimTrailingWhitespaceInRegexAndStrings": false, "diffEditor.ignoreTrimWhitespace": false, diff --git a/LanguageTags/.editorconfig b/LanguageTags/.editorconfig new file mode 100644 index 0000000..c9179fd --- /dev/null +++ b/LanguageTags/.editorconfig @@ -0,0 +1,7 @@ +root = false + +# C# files +[*.cs] + +# Ignore normalize strings to uppercase +dotnet_diagnostic.CA1308.severity = none diff --git a/LanguageTags/Extensions.cs b/LanguageTags/Extensions.cs new file mode 100644 index 0000000..b9e4b6e --- /dev/null +++ b/LanguageTags/Extensions.cs @@ -0,0 +1,128 @@ +using System.Runtime.CompilerServices; + +namespace ptr727.LanguageTags; + +internal static partial class LogExtensions +{ + extension(ILogger logger) + { + internal bool LogAndPropagate( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + { + LogCatchException(logger, function, exception); + return false; + } + + internal bool LogAndHandle( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + { + LogCatchException(logger, function, exception); + return true; + } + } + + [LoggerMessage(Message = "Exception in {Function}", Level = LogLevel.Error)] + internal 
static partial void LogCatchException( + this ILogger logger, + string function, + Exception exception + ); + + [LoggerMessage(Message = "{Message}", Level = LogLevel.Information)] + internal static partial void LogInformation(this ILogger logger, string message); + + [LoggerMessage( + Message = "Failed to parse language tag {LanguageTag}: {Reason}", + Level = LogLevel.Debug + )] + internal static partial void LogParseFailure( + this ILogger logger, + string? languageTag, + string reason + ); + + [LoggerMessage( + Message = "Normalized language tag {OriginalTag} to {NormalizedTag}", + Level = LogLevel.Debug + )] + internal static partial void LogNormalizedTag( + this ILogger logger, + string originalTag, + string normalizedTag + ); + + [LoggerMessage( + Message = "Language tag conversion returned undetermined for {LanguageTag} in {Operation}", + Level = LogLevel.Debug + )] + internal static partial void LogUndeterminedFallback( + this ILogger logger, + string languageTag, + string operation + ); + + [LoggerMessage( + Message = "Loaded {RecordCount} records for {DataKind} from {FileName}.", + Level = LogLevel.Information + )] + internal static partial void LogDataLoaded( + this ILogger logger, + string dataKind, + string fileName, + int recordCount + ); + + [LoggerMessage( + Message = "No data was loaded for {DataKind} from {FileName}.", + Level = LogLevel.Warning + )] + internal static partial void LogDataLoadEmpty( + this ILogger logger, + string dataKind, + string fileName + ); + + [LoggerMessage(Message = "Failed to load {DataKind} from {FileName}.", Level = LogLevel.Error)] + internal static partial void LogDataLoadFailed( + this ILogger logger, + string dataKind, + string fileName, + Exception exception + ); + + [LoggerMessage( + Message = "Found {DataKind} record for {LanguageTag} (include description: {IncludeDescription}).", + Level = LogLevel.Debug + )] + internal static partial void LogFindRecordFound( + this ILogger logger, + string dataKind, + string? 
languageTag, + bool includeDescription + ); + + [LoggerMessage( + Message = "No {DataKind} record found for {LanguageTag} (include description: {IncludeDescription}).", + Level = LogLevel.Debug + )] + internal static partial void LogFindRecordNotFound( + this ILogger logger, + string dataKind, + string? languageTag, + bool includeDescription + ); + + [LoggerMessage( + Message = "Language tag {LanguageTag} did not match prefix {Prefix}.", + Level = LogLevel.Debug + )] + internal static partial void LogPrefixMatchFailed( + this ILogger logger, + string prefix, + string languageTag + ); +} diff --git a/LanguageTags/GlobalUsings.cs b/LanguageTags/GlobalUsings.cs new file mode 100644 index 0000000..9b521ce --- /dev/null +++ b/LanguageTags/GlobalUsings.cs @@ -0,0 +1,11 @@ +global using System; +global using System.Collections.Generic; +global using System.Collections.Immutable; +global using System.Globalization; +global using System.IO; +global using System.Linq; +global using System.Text; +global using System.Text.Json; +global using System.Threading.Tasks; +global using Microsoft.Extensions.Logging; +global using Microsoft.Extensions.Logging.Abstractions; diff --git a/LanguageTags/Iso6392Data.cs b/LanguageTags/Iso6392Data.cs index 8bff0e1..efa0d11 100644 --- a/LanguageTags/Iso6392Data.cs +++ b/LanguageTags/Iso6392Data.cs @@ -1,29 +1,40 @@ -using System; -using System.Collections.Generic; -using System.Collections.Immutable; -using System.Globalization; -using System.IO; -using System.Linq; -using System.Text; -using System.Text.Json; +using System.Runtime.CompilerServices; namespace ptr727.LanguageTags; /// /// Provides access to ISO 639-2 language code data. /// -public partial class Iso6392Data +public sealed partial class Iso6392Data { internal const string DataUri = "https://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt"; internal const string DataFileName = "iso6392"; /// - /// Loads ISO 639-2 data from a file. 
+ /// Loads ISO 639-2 data from a file asynchronously. /// /// The path to the data file. /// The loaded . /// Thrown when the file contains invalid data. - public static Iso6392Data LoadData(string fileName) + public static Task LoadDataAsync(string fileName) => + LoadDataAsync(fileName, LogOptions.CreateLogger()); + + /// + /// Loads ISO 639-2 data from a file asynchronously using the specified options. + /// + /// The path to the data file. + /// The options used to configure logging. + /// The loaded . + /// Thrown when the file contains invalid data. + public static Task LoadDataAsync(string fileName, Options? options) => + LoadDataAsync(fileName, LogOptions.CreateLogger(options)); + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + private static async Task LoadDataAsync(string fileName, ILogger logger) { // https://www.loc.gov/standards/iso639-2/ascii_8bits.html // Alpha-3 (bibliographic) code @@ -34,107 +45,209 @@ public static Iso6392Data LoadData(string fileName) // | deliminator // LF line terminator - // Read line by line - List recordList = []; - using StreamReader lineReader = new(File.OpenRead(fileName)); - while (lineReader.ReadLine() is { } line) + try { - // Parse using pipe character - List records = [.. line.Split('|').Select(item => item.Trim())]; - if (records.Count != 5) + // Read line by line + List recordList = []; + await using FileStream fileStream = new( + fileName, + FileMode.Open, + FileAccess.Read, + FileShare.Read, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + using StreamReader lineReader = new(fileStream); + while (await lineReader.ReadLineAsync().ConfigureAwait(false) is { } line) { - throw new InvalidDataException($"Invalid data found in ISO 639-2 record: {line}"); + // Parse using pipe character + List records = [.. 
line.Split('|').Select(item => item.Trim())]; + if (records.Count != 5) + { + throw new InvalidDataException( + $"Invalid data found in ISO 639-2 record: {line}" + ); + } + + // Populate record + Iso6392Record record = new() + { + Part2B = string.IsNullOrEmpty(records[0]) ? null : records[0], + Part2T = string.IsNullOrEmpty(records[1]) ? null : records[1], + Part1 = string.IsNullOrEmpty(records[2]) ? null : records[2], + RefName = string.IsNullOrEmpty(records[3]) ? null : records[3], + }; + if (string.IsNullOrEmpty(record.Part2B) || string.IsNullOrEmpty(record.RefName)) + { + throw new InvalidDataException( + $"Invalid data found in ISO 639-2 record: {line}" + ); + } + recordList.Add(record); } - // Populate record - Iso6392Record record = new() + if (recordList.Count == 0) { - Part2B = string.IsNullOrEmpty(records[0]) ? null : records[0], - Part2T = string.IsNullOrEmpty(records[1]) ? null : records[1], - Part1 = string.IsNullOrEmpty(records[2]) ? null : records[2], - RefName = string.IsNullOrEmpty(records[3]) ? null : records[3], - }; - if (string.IsNullOrEmpty(record.Part2B) || string.IsNullOrEmpty(record.RefName)) - { - throw new InvalidDataException($"Invalid data found in ISO 639-2 record: {line}"); + logger.LogDataLoadEmpty(nameof(Iso6392Data), fileName); + throw new InvalidDataException($"No data found in ISO 639-2 file: {fileName}"); } - recordList.Add(record); + + Iso6392Data data = new() { RecordList = [.. recordList] }; + logger.LogDataLoaded(nameof(Iso6392Data), fileName, data.RecordList.Length); + return data; + } + catch (Exception exception) + { + logger.LogDataLoadFailed(nameof(Iso6392Data), fileName, exception); + throw; } - return recordList.Count == 0 - ? throw new InvalidDataException($"No data found in ISO 639-2 file: {fileName}") - : new Iso6392Data { RecordList = [.. recordList] }; } /// - /// Loads ISO 639-2 data from a JSON file. + /// Loads ISO 639-2 data from a JSON file asynchronously. /// /// The path to the JSON file. 
- /// The loaded or null if deserialization fails. - public static Iso6392Data? LoadJson(string fileName) => - JsonSerializer.Deserialize( - File.ReadAllText(fileName), - LanguageJsonContext.Default.Iso6392Data - ); + /// + /// The loaded , or null when deserialization yields no data. + /// + /// Thrown when the file cannot be read. + /// Thrown when the JSON is invalid. + public static Task LoadJsonAsync(string fileName) => + LoadJsonAsync(fileName, LogOptions.CreateLogger()); + + /// + /// Loads ISO 639-2 data from a JSON file asynchronously using the specified options. + /// + /// The path to the JSON file. + /// The options used to configure logging. + /// + /// The loaded , or null when deserialization yields no data. + /// + /// Thrown when the file cannot be read. + /// Thrown when the JSON is invalid. + public static Task LoadJsonAsync(string fileName, Options? options) => + LoadJsonAsync(fileName, LogOptions.CreateLogger(options)); + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + private static async Task LoadJsonAsync(string fileName, ILogger logger) + { + try + { + await using FileStream fileStream = new( + fileName, + FileMode.Open, + FileAccess.Read, + FileShare.Read, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + Iso6392Data? 
data = await JsonSerializer + .DeserializeAsync(fileStream, LanguageJsonContext.Default.Iso6392Data) + .ConfigureAwait(false); + if (data == null) + { + logger.LogDataLoadEmpty(nameof(Iso6392Data), fileName); + } + else + { + logger.LogDataLoaded(nameof(Iso6392Data), fileName, data.RecordList.Length); + } + + return data; + } + catch (Exception exception) + { + logger.LogDataLoadFailed(nameof(Iso6392Data), fileName, exception); + throw; + } + } - internal static void SaveJson(string fileName, Iso6392Data iso6392) => - File.WriteAllText( + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + internal static async Task SaveJsonAsync(string fileName, Iso6392Data iso6392) + { + await using FileStream fileStream = new( fileName, - JsonSerializer.Serialize(iso6392, LanguageJsonContext.Default.Iso6392Data) + FileMode.Create, + FileAccess.Write, + FileShare.None, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan ); + await JsonSerializer + .SerializeAsync(fileStream, iso6392, LanguageJsonContext.Default.Iso6392Data) + .ConfigureAwait(false); + } - internal static void GenCode(string fileName, Iso6392Data iso6392) + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + internal static async Task GenCodeAsync(string fileName, Iso6392Data iso6392) { ArgumentNullException.ThrowIfNull(iso6392); - StringBuilder stringBuilder = new(); - _ = stringBuilder - .Append( - """ - namespace ptr727.LanguageTags; - - /// - /// Provides access to ISO 639-2 language code data. 
- /// - public partial class Iso6392Data - { - public static Iso6392Data Create() => - new() - { - RecordList = - [ - """ - ) - .Append("\r\n"); + StreamWriter writer = new( + new FileStream( + fileName, + FileMode.Create, + FileAccess.Write, + FileShare.None, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ), + new UTF8Encoding(false) + ) + { + NewLine = "\r\n", + }; + await using ConfiguredAsyncDisposable writerScope = writer.ConfigureAwait(false); + + ConfiguredTaskAwaitable WriteLineAsync(string value) => + writer.WriteLineAsync(value).ConfigureAwait(false); + + await WriteLineAsync("namespace ptr727.LanguageTags;"); + await WriteLineAsync(string.Empty); + await WriteLineAsync("/// "); + await WriteLineAsync("/// Provides access to ISO 639-2 language code data."); + await WriteLineAsync("/// "); + await WriteLineAsync("public sealed partial class Iso6392Data"); + await WriteLineAsync("{"); + await WriteLineAsync(" public static Iso6392Data Create() =>"); + await WriteLineAsync(" new()"); + await WriteLineAsync(" {"); + await WriteLineAsync(" RecordList ="); + await WriteLineAsync(" ["); foreach (Iso6392Record record in iso6392.RecordList) { - _ = stringBuilder - .Append( - CultureInfo.InvariantCulture, - $$""" - new() - { - Part2B = {{LanguageSchema.GetCodeGenString(record.Part2B)}}, - Part2T = {{LanguageSchema.GetCodeGenString(record.Part2T)}}, - Part1 = {{LanguageSchema.GetCodeGenString(record.Part1)}}, - RefName = {{LanguageSchema.GetCodeGenString( - record.RefName - )}}, - }, - """ - ) - .Append("\r\n"); + await WriteLineAsync(" new()"); + await WriteLineAsync(" {"); + await WriteLineAsync( + $" Part2B = {LanguageSchema.GetCodeGenString(record.Part2B)}," + ); + await WriteLineAsync( + $" Part2T = {LanguageSchema.GetCodeGenString(record.Part2T)}," + ); + await WriteLineAsync( + $" Part1 = {LanguageSchema.GetCodeGenString(record.Part1)}," + ); + await WriteLineAsync( + $" RefName = {LanguageSchema.GetCodeGenString(record.RefName)}," + ); + 
await WriteLineAsync(" },"); } - _ = stringBuilder - .Append( - """ - ], - }; - } - """ - ) - .Append("\r\n"); - LanguageSchema.WriteFile(fileName, stringBuilder.ToString()); + await WriteLineAsync(" ],"); + await WriteLineAsync(" };"); + await WriteLineAsync("}"); } /// @@ -148,10 +261,24 @@ public static Iso6392Data Create() => /// The language code or description to search for. /// If true, searches in the reference name field; otherwise, only searches language codes. /// The matching or null if not found. - public Iso6392Record? Find(string? languageTag, bool includeDescription) + public Iso6392Record? Find(string? languageTag, bool includeDescription) => + Find(languageTag, includeDescription, LogOptions.CreateLogger()); + + /// + /// Finds an ISO 639-2 language record by language code or description using the specified options. + /// + /// The language code or description to search for. + /// If true, searches in the reference name field; otherwise, only searches language codes. + /// The options used to configure logging. + /// The matching or null if not found. + public Iso6392Record? Find(string? languageTag, bool includeDescription, Options? options) => + Find(languageTag, includeDescription, LogOptions.CreateLogger(options)); + + private Iso6392Record? Find(string? 
languageTag, bool includeDescription, ILogger logger) { if (string.IsNullOrEmpty(languageTag)) { + logger.LogFindRecordNotFound(nameof(Iso6392Data), languageTag, includeDescription); return null; } @@ -168,6 +295,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6392Data), languageTag, includeDescription); return record; } @@ -178,6 +306,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6392Data), languageTag, includeDescription); return record; } } @@ -192,6 +321,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6392Data), languageTag, includeDescription); return record; } } @@ -206,6 +336,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6392Data), languageTag, includeDescription); return record; } @@ -216,11 +347,13 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6392Data), languageTag, includeDescription); return record; } } // Not found + logger.LogFindRecordNotFound(nameof(Iso6392Data), languageTag, includeDescription); return null; } } @@ -228,7 +361,7 @@ record = RecordList.FirstOrDefault(item => /// /// Represents an ISO 639-2 language code record. /// -public record Iso6392Record +public sealed record Iso6392Record { /// /// Gets the ISO 639-2/B bibliographic code (3 letters). diff --git a/LanguageTags/Iso6392DataGen.cs b/LanguageTags/Iso6392DataGen.cs index 5a1cf77..ec4cf4a 100644 --- a/LanguageTags/Iso6392DataGen.cs +++ b/LanguageTags/Iso6392DataGen.cs @@ -3,7 +3,7 @@ namespace ptr727.LanguageTags; /// /// Provides access to ISO 639-2 language code data. 
/// -public partial class Iso6392Data +public sealed partial class Iso6392Data { public static Iso6392Data Create() => new() diff --git a/LanguageTags/Iso6393Data.cs b/LanguageTags/Iso6393Data.cs index e55f8f2..b204a1b 100644 --- a/LanguageTags/Iso6393Data.cs +++ b/LanguageTags/Iso6393Data.cs @@ -1,30 +1,41 @@ -using System; -using System.Collections.Generic; -using System.Collections.Immutable; -using System.Globalization; -using System.IO; -using System.Linq; -using System.Text; -using System.Text.Json; +using System.Runtime.CompilerServices; namespace ptr727.LanguageTags; /// /// Provides access to ISO 639-3 language code data. /// -public partial class Iso6393Data +public sealed partial class Iso6393Data { internal const string DataUri = "https://iso639-3.sil.org/sites/iso639-3/files/downloads/iso-639-3.tab"; internal const string DataFileName = "iso6393"; /// - /// Loads ISO 639-3 data from a file. + /// Loads ISO 639-3 data from a file asynchronously. /// /// The path to the data file. /// The loaded . /// Thrown when the file contains invalid data. - public static Iso6393Data LoadData(string fileName) + public static Task LoadDataAsync(string fileName) => + LoadDataAsync(fileName, LogOptions.CreateLogger()); + + /// + /// Loads ISO 639-3 data from a file asynchronously using the specified options. + /// + /// The path to the data file. + /// The options used to configure logging. + /// The loaded . + /// Thrown when the file contains invalid data. + public static Task LoadDataAsync(string fileName, Options? 
options) => + LoadDataAsync(fileName, LogOptions.CreateLogger(options)); + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + private static async Task LoadDataAsync(string fileName, ILogger logger) { // https://iso639-3.sil.org/code_tables/download_tables // Id char(3) NOT NULL, The three-letter 639-3 identifier @@ -36,134 +47,243 @@ public static Iso6393Data LoadData(string fileName) // Ref_Name varchar(150) NOT NULL, Reference language name // Comment varchar(150) NULL) Comment relating to one or more of the columns - // Read header - // Id Part2b Part2t Part1 Scope Language_Type Ref_Name Comment - List recordList = []; - using StreamReader lineReader = new(File.OpenRead(fileName)); - string? line = lineReader.ReadLine(); - if (string.IsNullOrEmpty(line)) - { - throw new InvalidDataException($"Missing header line in ISO 639-3 file: {fileName}"); - } - List records = [.. line.Split('\t').Select(item => item.Trim())]; - if (records.Count != 8) - { - throw new InvalidDataException($"Invalid data found in ISO 639-3 record: {line}"); - } - - // Read line by line - while ((line = lineReader.ReadLine()) is not null) + try { - // Parse using tab character - records = [.. line.Split('\t').Select(item => item.Trim())]; + // Read header + // Id Part2b Part2t Part1 Scope Language_Type Ref_Name Comment + List recordList = []; + await using FileStream fileStream = new( + fileName, + FileMode.Open, + FileAccess.Read, + FileShare.Read, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + using StreamReader lineReader = new(fileStream); + string? line = await lineReader.ReadLineAsync().ConfigureAwait(false); + if (string.IsNullOrEmpty(line)) + { + throw new InvalidDataException( + $"Missing header line in ISO 639-3 file: {fileName}" + ); + } + List records = [.. 
line.Split('\t').Select(item => item.Trim())]; if (records.Count != 8) { throw new InvalidDataException($"Invalid data found in ISO 639-3 record: {line}"); } - // Populate record - Iso6393Record record = new() + // Read line by line + while ((line = await lineReader.ReadLineAsync().ConfigureAwait(false)) is not null) { - Id = string.IsNullOrEmpty(records[0]) ? null : records[0], - Part2B = string.IsNullOrEmpty(records[1]) ? null : records[1], - Part2T = string.IsNullOrEmpty(records[2]) ? null : records[2], - Part1 = string.IsNullOrEmpty(records[3]) ? null : records[3], - Scope = string.IsNullOrEmpty(records[4]) ? null : records[4], - LanguageType = string.IsNullOrEmpty(records[5]) ? null : records[5], - RefName = string.IsNullOrEmpty(records[6]) ? null : records[6], - Comment = string.IsNullOrEmpty(records[7]) ? null : records[7], - }; - if ( - string.IsNullOrEmpty(record.Id) - || string.IsNullOrEmpty(record.Scope) - || string.IsNullOrEmpty(record.LanguageType) - || string.IsNullOrEmpty(record.RefName) - ) + // Parse using tab character + records = [.. line.Split('\t').Select(item => item.Trim())]; + if (records.Count != 8) + { + throw new InvalidDataException( + $"Invalid data found in ISO 639-3 record: {line}" + ); + } + + // Populate record + Iso6393Record record = new() + { + Id = string.IsNullOrEmpty(records[0]) ? null : records[0], + Part2B = string.IsNullOrEmpty(records[1]) ? null : records[1], + Part2T = string.IsNullOrEmpty(records[2]) ? null : records[2], + Part1 = string.IsNullOrEmpty(records[3]) ? null : records[3], + Scope = string.IsNullOrEmpty(records[4]) ? null : records[4], + LanguageType = string.IsNullOrEmpty(records[5]) ? null : records[5], + RefName = string.IsNullOrEmpty(records[6]) ? null : records[6], + Comment = string.IsNullOrEmpty(records[7]) ? 
null : records[7], + }; + if ( + string.IsNullOrEmpty(record.Id) + || string.IsNullOrEmpty(record.Scope) + || string.IsNullOrEmpty(record.LanguageType) + || string.IsNullOrEmpty(record.RefName) + ) + { + throw new InvalidDataException( + $"Invalid data found in ISO 639-3 record: {line}" + ); + } + recordList.Add(record); + } + + if (recordList.Count == 0) { - throw new InvalidDataException($"Invalid data found in ISO 639-3 record: {line}"); + logger.LogDataLoadEmpty(nameof(Iso6393Data), fileName); + throw new InvalidDataException($"No data found in ISO 639-3 file: {fileName}"); } - recordList.Add(record); + + Iso6393Data data = new() { RecordList = [.. recordList] }; + logger.LogDataLoaded(nameof(Iso6393Data), fileName, data.RecordList.Length); + return data; + } + catch (Exception exception) + { + logger.LogDataLoadFailed(nameof(Iso6393Data), fileName, exception); + throw; } - return recordList.Count == 0 - ? throw new InvalidDataException($"No data found in ISO 639-3 file: {fileName}") - : new Iso6393Data { RecordList = [.. recordList] }; } /// - /// Loads ISO 639-3 data from a JSON file. + /// Loads ISO 639-3 data from a JSON file asynchronously. /// /// The path to the JSON file. - /// The loaded or null if deserialization fails. - public static Iso6393Data? LoadJson(string fileName) => - JsonSerializer.Deserialize( - File.ReadAllText(fileName), - LanguageJsonContext.Default.Iso6393Data - ); + /// + /// The loaded , or null when deserialization yields no data. + /// + /// Thrown when the file cannot be read. + /// Thrown when the JSON is invalid. + public static Task LoadJsonAsync(string fileName) => + LoadJsonAsync(fileName, LogOptions.CreateLogger()); + + /// + /// Loads ISO 639-3 data from a JSON file asynchronously using the specified options. + /// + /// The path to the JSON file. + /// The options used to configure logging. + /// + /// The loaded , or null when deserialization yields no data. + /// + /// Thrown when the file cannot be read. 
+ /// Thrown when the JSON is invalid. + public static Task LoadJsonAsync(string fileName, Options? options) => + LoadJsonAsync(fileName, LogOptions.CreateLogger(options)); - internal static void SaveJson(string fileName, Iso6393Data iso6393) => - File.WriteAllText( + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + private static async Task LoadJsonAsync(string fileName, ILogger logger) + { + try + { + await using FileStream fileStream = new( + fileName, + FileMode.Open, + FileAccess.Read, + FileShare.Read, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + Iso6393Data? data = await JsonSerializer + .DeserializeAsync(fileStream, LanguageJsonContext.Default.Iso6393Data) + .ConfigureAwait(false); + if (data == null) + { + logger.LogDataLoadEmpty(nameof(Iso6393Data), fileName); + } + else + { + logger.LogDataLoaded(nameof(Iso6393Data), fileName, data.RecordList.Length); + } + + return data; + } + catch (Exception exception) + { + logger.LogDataLoadFailed(nameof(Iso6393Data), fileName, exception); + throw; + } + } + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + internal static async Task SaveJsonAsync(string fileName, Iso6393Data iso6393) + { + await using FileStream fileStream = new( fileName, - JsonSerializer.Serialize(iso6393, LanguageJsonContext.Default.Iso6393Data) + FileMode.Create, + FileAccess.Write, + FileShare.None, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan ); + await JsonSerializer + .SerializeAsync(fileStream, iso6393, LanguageJsonContext.Default.Iso6393Data) + .ConfigureAwait(false); + } - internal static void GenCode(string fileName, Iso6393Data iso6393) + 
[System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + internal static async Task GenCodeAsync(string fileName, Iso6393Data iso6393) { ArgumentNullException.ThrowIfNull(iso6393); - StringBuilder stringBuilder = new(); - _ = stringBuilder - .Append( - """ - namespace ptr727.LanguageTags; - - /// - /// Provides access to ISO 639-3 language code data. - /// - public partial class Iso6393Data - { - public static Iso6393Data Create() => - new() - { - RecordList = - [ - """ - ) - .Append("\r\n"); + + StreamWriter writer = new( + new FileStream( + fileName, + FileMode.Create, + FileAccess.Write, + FileShare.None, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ), + new UTF8Encoding(false) + ) + { + NewLine = "\r\n", + }; + await using ConfiguredAsyncDisposable writerScope = writer.ConfigureAwait(false); + + ConfiguredTaskAwaitable WriteLineAsync(string value) => + writer.WriteLineAsync(value).ConfigureAwait(false); + + await WriteLineAsync("namespace ptr727.LanguageTags;"); + await WriteLineAsync(string.Empty); + await WriteLineAsync("/// "); + await WriteLineAsync("/// Provides access to ISO 639-3 language code data."); + await WriteLineAsync("/// "); + await WriteLineAsync("public sealed partial class Iso6393Data"); + await WriteLineAsync("{"); + await WriteLineAsync(" public static Iso6393Data Create() =>"); + await WriteLineAsync(" new()"); + await WriteLineAsync(" {"); + await WriteLineAsync(" RecordList ="); + await WriteLineAsync(" ["); foreach (Iso6393Record record in iso6393.RecordList) { - _ = stringBuilder - .Append( - CultureInfo.InvariantCulture, - $$""" - new() - { - Id = {{LanguageSchema.GetCodeGenString(record.Id)}}, - Part2B = {{LanguageSchema.GetCodeGenString(record.Part2B)}}, - Part2T = {{LanguageSchema.GetCodeGenString(record.Part2T)}}, - Part1 = 
{{LanguageSchema.GetCodeGenString(record.Part1)}}, - Scope = {{LanguageSchema.GetCodeGenString(record.Scope)}}, - LanguageType = {{LanguageSchema.GetCodeGenString( - record.LanguageType - )}}, - RefName = {{LanguageSchema.GetCodeGenString( - record.RefName - )}}, - }, - """ - ) - .Append("\r\n"); + await WriteLineAsync(" new()"); + await WriteLineAsync(" {"); + await WriteLineAsync( + $" Id = {LanguageSchema.GetCodeGenString(record.Id)}," + ); + await WriteLineAsync( + $" Part2B = {LanguageSchema.GetCodeGenString(record.Part2B)}," + ); + await WriteLineAsync( + $" Part2T = {LanguageSchema.GetCodeGenString(record.Part2T)}," + ); + await WriteLineAsync( + $" Part1 = {LanguageSchema.GetCodeGenString(record.Part1)}," + ); + await WriteLineAsync( + $" Scope = {LanguageSchema.GetCodeGenString(record.Scope)}," + ); + await WriteLineAsync( + $" LanguageType = {LanguageSchema.GetCodeGenString(record.LanguageType)}," + ); + await WriteLineAsync( + $" RefName = {LanguageSchema.GetCodeGenString(record.RefName)}," + ); + await WriteLineAsync(" },"); } - _ = stringBuilder - .Append( - """ - ], - }; - } - """ - ) - .Append("\r\n"); - LanguageSchema.WriteFile(fileName, stringBuilder.ToString()); + await WriteLineAsync(" ],"); + await WriteLineAsync(" };"); + await WriteLineAsync("}"); } /// @@ -177,10 +297,24 @@ public static Iso6393Data Create() => /// The language code or description to search for. /// If true, searches in the reference name field; otherwise, only searches language codes. /// The matching or null if not found. - public Iso6393Record? Find(string? languageTag, bool includeDescription) + public Iso6393Record? Find(string? languageTag, bool includeDescription) => + Find(languageTag, includeDescription, LogOptions.CreateLogger()); + + /// + /// Finds an ISO 639-3 language record by language code or description using the specified options. + /// + /// The language code or description to search for. 
+ /// If true, searches in the reference name field; otherwise, only searches language codes. + /// The options used to configure logging. + /// The matching or null if not found. + public Iso6393Record? Find(string? languageTag, bool includeDescription, Options? options) => + Find(languageTag, includeDescription, LogOptions.CreateLogger(options)); + + private Iso6393Record? Find(string? languageTag, bool includeDescription, ILogger logger) { if (string.IsNullOrEmpty(languageTag)) { + logger.LogFindRecordNotFound(nameof(Iso6393Data), languageTag, includeDescription); return null; } @@ -197,6 +331,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6393Data), languageTag, includeDescription); return record; } @@ -207,6 +342,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6393Data), languageTag, includeDescription); return record; } @@ -217,6 +353,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6393Data), languageTag, includeDescription); return record; } } @@ -231,6 +368,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6393Data), languageTag, includeDescription); return record; } } @@ -245,6 +383,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6393Data), languageTag, includeDescription); return record; } @@ -255,11 +394,13 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Iso6393Data), languageTag, includeDescription); return record; } } // Not found + logger.LogFindRecordNotFound(nameof(Iso6393Data), languageTag, includeDescription); return null; } } @@ -267,7 +408,7 @@ record = RecordList.FirstOrDefault(item => /// /// Represents an ISO 639-3 language code record. 
/// -public record Iso6393Record +public sealed record Iso6393Record { /// /// Gets the ISO 639-3 identifier (3 letters). diff --git a/LanguageTags/Iso6393DataGen.cs b/LanguageTags/Iso6393DataGen.cs index 4bbfc31..df64bd4 100644 --- a/LanguageTags/Iso6393DataGen.cs +++ b/LanguageTags/Iso6393DataGen.cs @@ -3,7 +3,7 @@ namespace ptr727.LanguageTags; /// /// Provides access to ISO 639-3 language code data. /// -public partial class Iso6393Data +public sealed partial class Iso6393Data { public static Iso6393Data Create() => new() @@ -8528,7 +8528,7 @@ public static Iso6393Data Create() => Part1 = null, Scope = "I", LanguageType = "L", - RefName = "Mundabli", + RefName = "Mundabli-Mufu", }, new() { @@ -17531,6 +17531,16 @@ public static Iso6393Data Create() => RefName = "Djimini Senoufo", }, new() + { + Id = "dyl", + Part2B = null, + Part2T = null, + Part1 = null, + Scope = "I", + LanguageType = "L", + RefName = "Bhutanese Sign Language", + }, + new() { Id = "dym", Part2B = null, @@ -35751,6 +35761,16 @@ public static Iso6393Data Create() => RefName = "Lefa", }, new() + { + Id = "lfb", + Part2B = null, + Part2T = null, + Part1 = null, + Scope = "I", + LanguageType = "L", + RefName = "Buu (Cameroon)", + }, + new() { Id = "lfn", Part2B = null, @@ -37868,7 +37888,7 @@ public static Iso6393Data Create() => Part1 = null, Scope = "I", LanguageType = "E", - RefName = "Luiseno", + RefName = "Luiseño", }, new() { diff --git a/LanguageTags/LanguageLookup.cs b/LanguageTags/LanguageLookup.cs index d04f4d2..bb67b2f 100644 --- a/LanguageTags/LanguageLookup.cs +++ b/LanguageTags/LanguageLookup.cs @@ -1,20 +1,17 @@ -using System; -using System.Collections.Generic; -using System.Globalization; -using System.Linq; - namespace ptr727.LanguageTags; /// /// Provides language code lookup and conversion functionality between IETF and ISO standards. /// -public class LanguageLookup +/// The options used to configure logging. +public sealed class LanguageLookup(Options? 
options = null) { /// /// The language code for undetermined languages ("und"). /// public const string Undetermined = "und"; + private readonly ILogger _logger = LogOptions.CreateLogger(options); private readonly Iso6392Data _iso6392 = Iso6392Data.Create(); private readonly Iso6393Data _iso6393 = Iso6393Data.Create(); private readonly Rfc5646Data _rfc5646 = Rfc5646Data.Create(); @@ -110,7 +107,13 @@ public string GetIetfFromIso(string languageTag) // Try CultureInfo CultureInfo? cultureInfo = CreateCultureInfo(languageTag); - return cultureInfo != null ? cultureInfo.IetfLanguageTag : Undetermined; + if (cultureInfo != null) + { + return cultureInfo.IetfLanguageTag; + } + + _logger.LogUndeterminedFallback(languageTag, nameof(GetIetfFromIso)); + return Undetermined; } /// @@ -174,6 +177,7 @@ public string GetIsoFromIetf(string languageTag) CultureInfo? cultureInfo = CreateCultureInfo(languageTag); if (cultureInfo == null) { + _logger.LogUndeterminedFallback(languageTag, nameof(GetIsoFromIetf)); return Undetermined; } @@ -184,6 +188,7 @@ public string GetIsoFromIetf(string languageTag) // Return the Part 2B code return iso6393.Part2B!; } + _logger.LogUndeterminedFallback(languageTag, nameof(GetIsoFromIetf)); return Undetermined; } @@ -199,6 +204,9 @@ public bool IsMatch(string prefix, string languageTag) ArgumentNullException.ThrowIfNull(prefix); ArgumentNullException.ThrowIfNull(languageTag); + string originalPrefix = prefix; + string originalTag = languageTag; + // TODO: Conditional parse and normalize before processing // https://r12a.github.io/app-subtags/ @@ -244,6 +252,7 @@ public bool IsMatch(string prefix, string languageTag) } // No match + _logger.LogPrefixMatchFailed(originalPrefix, originalTag); return false; } } diff --git a/LanguageTags/LanguageSchema.cs b/LanguageTags/LanguageSchema.cs index d4da707..0bfd5b2 100644 --- a/LanguageTags/LanguageSchema.cs +++ b/LanguageTags/LanguageSchema.cs @@ -1,15 +1,10 @@ -using System; -using 
System.Collections.Generic; -using System.IO; -using System.Linq; -using System.Text.Json; using System.Text.Json.Serialization; namespace ptr727.LanguageTags; internal static class LanguageSchema { - internal static void WriteFile(string fileName, string value) + internal static async Task WriteFileAsync(string fileName, string value) { // Always write as CRLF with newline at the end if ( @@ -20,7 +15,7 @@ internal static void WriteFile(string fileName, string value) value = value.Replace("\n", "\r\n", StringComparison.Ordinal); } value = value.TrimEnd() + "\r\n"; - File.WriteAllText(fileName, value); + await File.WriteAllTextAsync(fileName, value).ConfigureAwait(false); } internal static string GetCodeGenString(string? text) => diff --git a/LanguageTags/LanguageTag.cs b/LanguageTags/LanguageTag.cs index a310cbe..b0a04a2 100644 --- a/LanguageTags/LanguageTag.cs +++ b/LanguageTags/LanguageTag.cs @@ -1,17 +1,11 @@ -using System; -using System.Collections.Generic; -using System.Collections.Immutable; using System.Diagnostics.CodeAnalysis; -using System.Globalization; -using System.Linq; -using System.Text; namespace ptr727.LanguageTags; /// /// Represents a language tag conforming to RFC 5646 / BCP 47. /// -public class LanguageTag : IEquatable +public sealed class LanguageTag : IEquatable { internal LanguageTag() { @@ -33,7 +27,7 @@ internal LanguageTag(LanguageTag languageTag) Region = languageTag.Region; _variants = [.. languageTag._variants]; _extensions = [.. languageTag._extensions]; - PrivateUse = new PrivateUseTag(languageTag.PrivateUse); + PrivateUse = languageTag.PrivateUse; } /// @@ -71,7 +65,7 @@ internal LanguageTag(LanguageTag languageTag) /// /// Gets the private use subtag. /// - public PrivateUseTag PrivateUse { get; init; } + public PrivateUseTag PrivateUse { get; internal set; } /// /// Parses a language tag string into a LanguageTag object. 
@@ -80,6 +74,15 @@ internal LanguageTag(LanguageTag languageTag) /// A parsed and normalized LanguageTag object, or null if parsing fails. public static LanguageTag? Parse(string tag) => new LanguageTagParser().Parse(tag); + /// + /// Parses a language tag string into a LanguageTag object using the specified options. + /// + /// The language tag string to parse (e.g., "en-US", "zh-Hans-CN"). + /// The options used to configure logging. + /// A parsed and normalized LanguageTag object, or null if parsing fails. + public static LanguageTag? Parse(string tag, Options? options) => + new LanguageTagParser(options).Parse(tag); + /// /// Parses a language tag string, returning a default tag if parsing fails. /// @@ -99,6 +102,15 @@ public static LanguageTag ParseOrDefault(string tag, LanguageTag? defaultTag = n /// A normalized language tag or null if parsing/normalization fails. public static LanguageTag? ParseAndNormalize(string tag) => Parse(tag)?.Normalize(); + /// + /// Parses and normalizes a language tag string using the specified options. + /// + /// The language tag string. + /// The options used to configure logging. + /// A normalized language tag or null if parsing/normalization fails. + public static LanguageTag? ParseAndNormalize(string tag, Options? options) => + Parse(tag, options)?.Normalize(options); + /// /// Tries to parse a language tag string into a LanguageTag object. /// @@ -111,6 +123,23 @@ public static bool TryParse(string tag, [NotNullWhen(true)] out LanguageTag? res return result != null; } + /// + /// Tries to parse a language tag string into a LanguageTag object using the specified options. + /// + /// The language tag string to parse (e.g., "en-US", "zh-Hans-CN"). + /// When this method returns, contains the parsed LanguageTag if successful, or null if parsing fails. + /// The options used to configure logging. + /// true if the tag was successfully parsed; otherwise, false. 
+ public static bool TryParse( + string tag, + [NotNullWhen(true)] out LanguageTag? result, + Options? options + ) + { + result = Parse(tag, options); + return result != null; + } + /// /// Creates a new LanguageTagBuilder for fluent construction of language tags. /// @@ -134,6 +163,14 @@ public static bool TryParse(string tag, [NotNullWhen(true)] out LanguageTag? res /// A normalized copy of this language tag. public LanguageTag? Normalize() => new LanguageTagParser().Normalize(this); + /// + /// Normalizes this language tag according to RFC 5646 rules using the specified options. + /// + /// The options used to configure logging. + /// A normalized copy of this language tag. + public LanguageTag? Normalize(Options? options) => + new LanguageTagParser(options).Normalize(this); + /// /// Converts this language tag to its string representation. /// @@ -171,7 +208,7 @@ public override string ToString() $"-{string.Join('-', _extensions.Select(item => item.ToString()))}" ); } - if (PrivateUse._tags.Count > 0) + if (!PrivateUse.Tags.IsEmpty) { if (stringBuilder.Length > 0) { @@ -258,66 +295,132 @@ string region /// /// Represents an extension subtag in a language tag. /// -public class ExtensionTag +/// The single-character prefix for the extension (e.g., 'u' for Unicode extensions). +/// The list of extension subtag values. +public sealed record ExtensionTag(char Prefix, ImmutableArray Tags) { - internal ExtensionTag() - { - Prefix = '\0'; - _tags = []; - } - - internal ExtensionTag(ExtensionTag extensionTag) - { - ArgumentNullException.ThrowIfNull(extensionTag); - Prefix = extensionTag.Prefix; - _tags = [.. extensionTag._tags]; - } - /// - /// Gets or sets the single-character prefix for the extension (e.g., 'u' for Unicode extensions). + /// Creates an extension tag with a prefix and tags from an enumerable collection. /// - public char Prefix { get; internal set; } + /// The single-character prefix. + /// The extension subtag values. 
+ public ExtensionTag(char prefix, IEnumerable tags) + : this(prefix, [.. tags]) { } /// - /// Gets the list of extension subtag values. + /// Creates an empty extension tag. /// - public ImmutableArray Tags => [.. _tags]; - internal List _tags { get; init; } + public ExtensionTag() + : this('\0', []) { } /// /// Converts this extension tag to its string representation. /// /// A string representation of the extension tag (e.g., "u-ca-buddhist"). - public override string ToString() => $"{Prefix}-{string.Join('-', _tags)}"; + public override string ToString() => + Tags.IsEmpty ? string.Empty : $"{Prefix}-{string.Join('-', Tags)}"; + + internal ExtensionTag Normalize() => + this with + { + Prefix = char.ToLowerInvariant(Prefix), + Tags = + [ + .. Tags.Select(t => t.ToLowerInvariant()).OrderBy(t => t, StringComparer.Ordinal), + ], + }; + + /// + /// Determines whether this instance is equal to another . + /// + /// The to compare with. + /// true if the extension tags are equal; otherwise, false. + public bool Equals(ExtensionTag? other) => + ReferenceEquals(this, other) + || ( + other is not null + && char.ToLowerInvariant(Prefix) == char.ToLowerInvariant(other.Prefix) + && Tags.SequenceEqual(other.Tags, StringComparer.OrdinalIgnoreCase) + ); + + /// + /// Returns the hash code for this extension tag. + /// + /// A hash code for the current extension tag. + public override int GetHashCode() + { + HashCode hashCode = new(); + hashCode.Add(char.ToLowerInvariant(Prefix)); + foreach (string tag in Tags) + { + hashCode.Add(tag, StringComparer.OrdinalIgnoreCase); + } + + return hashCode.ToHashCode(); + } } /// /// Represents a private use subtag in a language tag. /// -public class PrivateUseTag +/// The list of private use subtag values. +public sealed record PrivateUseTag(ImmutableArray Tags) { - internal PrivateUseTag() => _tags = []; - - internal PrivateUseTag(PrivateUseTag privateUseTag) - { - ArgumentNullException.ThrowIfNull(privateUseTag); - _tags = [.. 
privateUseTag._tags]; - } - /// /// The prefix character for private use subtags ('x'). /// public const char Prefix = 'x'; /// - /// Gets the list of private use subtag values. + /// Creates a private use tag from an enumerable collection. /// - public ImmutableArray Tags => [.. _tags]; - internal List _tags { get; init; } + /// The private use subtag values. + public PrivateUseTag(IEnumerable tags) + : this([.. tags]) { } + + /// + /// Creates an empty private use tag. + /// + public PrivateUseTag() + : this([]) { } /// /// Converts this private use tag to its string representation. /// /// A string representation of the private use tag (e.g., "x-private"). - public override string ToString() => $"{Prefix}-{string.Join('-', _tags)}"; + public override string ToString() => + Tags.IsEmpty ? string.Empty : $"{Prefix}-{string.Join('-', Tags)}"; + + internal PrivateUseTag Normalize() => + this with + { + Tags = + [ + .. Tags.Select(t => t.ToLowerInvariant()).OrderBy(t => t, StringComparer.Ordinal), + ], + }; + + /// + /// Determines whether this instance is equal to another . + /// + /// The to compare with. + /// true if the private use tags are equal; otherwise, false. + public bool Equals(PrivateUseTag? other) => + ReferenceEquals(this, other) + || (other is not null && Tags.SequenceEqual(other.Tags, StringComparer.OrdinalIgnoreCase)); + + /// + /// Returns the hash code for this private use tag. + /// + /// A hash code for the current private use tag. 
+ public override int GetHashCode() + { + HashCode hashCode = new(); + foreach (string tag in Tags) + { + hashCode.Add(tag, StringComparer.OrdinalIgnoreCase); + } + + return hashCode.ToHashCode(); + } } diff --git a/LanguageTags/LanguageTagBuilder.cs b/LanguageTags/LanguageTagBuilder.cs index 8296725..8e9fd78 100644 --- a/LanguageTags/LanguageTagBuilder.cs +++ b/LanguageTags/LanguageTagBuilder.cs @@ -1,12 +1,9 @@ -using System; -using System.Collections.Generic; - namespace ptr727.LanguageTags; /// /// Provides a fluent API for building RFC 5646 / BCP 47 language tags. /// -public class LanguageTagBuilder +public sealed class LanguageTagBuilder { private readonly LanguageTag _languageTag = new(); @@ -85,10 +82,17 @@ public LanguageTagBuilder VariantAddRange(IEnumerable values) /// The extension values. /// The builder instance for method chaining. /// Thrown when is null. + /// Thrown when is empty. public LanguageTagBuilder ExtensionAdd(char prefix, IEnumerable values) { ArgumentNullException.ThrowIfNull(values); - _languageTag._extensions.Add(new() { Prefix = prefix, _tags = [.. values] }); + ImmutableArray tags = [.. values]; + if (tags.IsEmpty) + { + throw new ArgumentException("Extension tags cannot be empty.", nameof(values)); + } + + _languageTag._extensions.Add(new ExtensionTag(prefix, tags)); return this; } @@ -99,7 +103,8 @@ public LanguageTagBuilder ExtensionAdd(char prefix, IEnumerable values) /// The builder instance for method chaining. public LanguageTagBuilder PrivateUseAdd(string value) { - _languageTag.PrivateUse._tags.Add(value); + List tags = [.. _languageTag.PrivateUse.Tags, value]; + _languageTag.PrivateUse = new PrivateUseTag(tags); return this; } @@ -112,7 +117,8 @@ public LanguageTagBuilder PrivateUseAdd(string value) public LanguageTagBuilder PrivateUseAddRange(IEnumerable values) { ArgumentNullException.ThrowIfNull(values); - _languageTag.PrivateUse._tags.AddRange(values); + List tags = [.. _languageTag.PrivateUse.Tags, .. 
values]; + _languageTag.PrivateUse = new PrivateUseTag(tags); return this; } @@ -127,4 +133,12 @@ public LanguageTagBuilder PrivateUseAddRange(IEnumerable values) /// /// A normalized or null if normalization fails. public LanguageTag? Normalize() => new LanguageTagParser().Normalize(_languageTag); + + /// + /// Builds and normalizes the constructed language tag according to RFC 5646 rules using the specified options. + /// + /// The options used to configure logging. + /// A normalized or null if normalization fails. + public LanguageTag? Normalize(Options? options) => + new LanguageTagParser(options).Normalize(_languageTag); } diff --git a/LanguageTags/LanguageTagParser.cs b/LanguageTags/LanguageTagParser.cs index 4bef535..e489098 100644 --- a/LanguageTags/LanguageTagParser.cs +++ b/LanguageTags/LanguageTagParser.cs @@ -1,7 +1,4 @@ -using System; -using System.Collections.Generic; using System.Diagnostics; -using System.Linq; namespace ptr727.LanguageTags; @@ -20,12 +17,16 @@ namespace ptr727.LanguageTags; // TODO: Implement subtag content validation by comparing values with the registry data -internal class LanguageTagParser +internal sealed class LanguageTagParser { + private readonly ILogger _logger; private readonly Rfc5646Data _rfc5646 = Rfc5646Data.Create(); private readonly List _tagList = []; private LanguageTag _languageTag = new(); + internal LanguageTagParser(Options? options = null) => + _logger = LogOptions.CreateLogger(options); + private string ParseGrandfathered(string languageTag) { // Grandfathered and Redundant Registrations @@ -52,7 +53,6 @@ .. 
_rfc5646.RecordList.Where(record => return languageTag; } -#pragma warning disable CA1308 private static void SetCase(LanguageTag languageTag) { // Language lowercase @@ -70,10 +70,9 @@ private static void SetCase(LanguageTag languageTag) // Script title case if (!string.IsNullOrEmpty(languageTag.Script)) { - languageTag.Script = - System.Globalization.CultureInfo.InvariantCulture.TextInfo.ToTitleCase( - languageTag.Script.ToLowerInvariant() - ); + languageTag.Script = CultureInfo.InvariantCulture.TextInfo.ToTitleCase( + languageTag.Script.ToLowerInvariant() + ); } // Region uppercase @@ -88,23 +87,15 @@ private static void SetCase(LanguageTag languageTag) languageTag._variants[i] = languageTag._variants[i].ToLowerInvariant(); } - // Extensions lowercase - foreach (ExtensionTag extension in languageTag._extensions) + // Extensions lowercase and normalize + for (int i = 0; i < languageTag._extensions.Count; i++) { - extension.Prefix = char.ToLowerInvariant(extension.Prefix); - for (int i = 0; i < extension._tags.Count; i++) - { - extension._tags[i] = extension._tags[i].ToLowerInvariant(); - } + languageTag._extensions[i] = languageTag._extensions[i].Normalize(); } - // Private use lowercase - for (int i = 0; i < languageTag.PrivateUse._tags.Count; i++) - { - languageTag.PrivateUse._tags[i] = languageTag.PrivateUse._tags[i].ToLowerInvariant(); - } + // Private use lowercase and normalize + languageTag.PrivateUse = languageTag.PrivateUse.Normalize(); } -#pragma warning restore CA1308 private static void Sort(LanguageTag languageTag) { @@ -114,9 +105,7 @@ private static void Sort(LanguageTag languageTag) // Sort extensions by prefix languageTag._extensions.Sort((x, y) => x.Prefix.CompareTo(y.Prefix)); - // Sort extensions and private use tags - languageTag._extensions.ForEach(extension => extension._tags.Sort()); - languageTag.PrivateUse._tags.Sort(); + // Note: Extension tags and private use tags are already sorted by Normalize() } private static bool 
ValidateLanguage(string tag) => @@ -301,7 +290,9 @@ private static bool ValidateExtensionPrefix(string tag) => private static bool ValidateExtension(string tag) => // 2 - 8 chars - !string.IsNullOrEmpty(tag) && tag.Length is >= 2 and <= 8; + !string.IsNullOrWhiteSpace(tag) + && tag.Length is >= 2 and <= 8 + && !tag.Any(char.IsWhiteSpace); private bool ParseExtension() { @@ -331,7 +322,7 @@ private bool ParseExtension() return false; } - ExtensionTag extensionTag = new() { Prefix = _tagList[0][0] }; + char prefix = _tagList[0][0]; _tagList.RemoveAt(0); // 1 or more tags remaining @@ -340,29 +331,32 @@ private bool ParseExtension() return false; } + // Collect tags for this extension + List extensionTags = []; + // 2 to 8 chars // Stop when no more tags match while (_tagList.Count > 0 && ValidateExtension(_tagList[0])) { // Tag may not repeat - if (extensionTag._tags.Contains(_tagList[0], StringComparer.OrdinalIgnoreCase)) + if (extensionTags.Contains(_tagList[0], StringComparer.OrdinalIgnoreCase)) { return false; } // Add extension tag - extensionTag._tags.Add(_tagList[0]); + extensionTags.Add(_tagList[0]); _tagList.RemoveAt(0); } // Must have some matches - if (extensionTag._tags.Count == 0) + if (extensionTags.Count == 0) { return false; } // Add extension tag - _languageTag._extensions.Add(extensionTag); + _languageTag._extensions.Add(new ExtensionTag(prefix, extensionTags)); } // Done @@ -394,7 +388,7 @@ private bool ParsePrivateUse() } // Prefix may not repeat - if (_languageTag.PrivateUse._tags.Count > 0) + if (!_languageTag.PrivateUse.Tags.IsEmpty) { return false; } @@ -408,6 +402,9 @@ private bool ParsePrivateUse() return false; } + // Collect all private use tags + List privateTags = []; + // Read all tags while (_tagList.Count > 0) { @@ -419,27 +416,25 @@ private bool ParsePrivateUse() } // Tag may not repeat - if ( - _languageTag.PrivateUse._tags.Contains( - _tagList[0], - StringComparer.OrdinalIgnoreCase - ) - ) + if (privateTags.Contains(_tagList[0], 
StringComparer.OrdinalIgnoreCase)) { return false; } // Add private use tag - _languageTag.PrivateUse._tags.Add(_tagList[0]); + privateTags.Add(_tagList[0]); _tagList.RemoveAt(0); } // Must have some matches - if (_languageTag.PrivateUse._tags.Count == 0) + if (privateTags.Count == 0) { return false; } + // Create private use tag + _languageTag.PrivateUse = new PrivateUseTag(privateTags); + // Done return true; } @@ -466,16 +461,19 @@ private bool ParsePrivateUse() // Init _languageTag = new(); _tagList.Clear(); + string originalTag = languageTag; // Must be non-empty if (string.IsNullOrEmpty(languageTag)) { + _logger.LogParseFailure(originalTag, "Tag is null or empty."); return null; } // Must be all ASCII if (languageTag.Any(c => !char.IsAscii(c))) { + _logger.LogParseFailure(originalTag, "Tag contains non-ASCII characters."); return null; } @@ -486,18 +484,21 @@ private bool ParsePrivateUse() _tagList.AddRange([.. languageTag.Split('-')]); if (_tagList.Count == 0) { + _logger.LogParseFailure(originalTag, "Tag split resulted in no segments."); return null; } // All parts must be non-empty if (_tagList.Any(string.IsNullOrEmpty)) { + _logger.LogParseFailure(originalTag, "Tag contains empty segments."); return null; } // Private use if (!ParsePrivateUse()) { + _logger.LogParseFailure(originalTag, "Invalid private use section."); return null; } if (_tagList.Count == 0) @@ -508,6 +509,7 @@ private bool ParsePrivateUse() // Language if (!ParseLanguage()) { + _logger.LogParseFailure(originalTag, "Invalid primary language subtag."); return null; } if (_tagList.Count == 0) @@ -518,6 +520,7 @@ private bool ParsePrivateUse() // Extended language if (!ParseExtendedLanguage()) { + _logger.LogParseFailure(originalTag, "Invalid extended language subtag."); return null; } if (_tagList.Count == 0) @@ -528,6 +531,7 @@ private bool ParsePrivateUse() // Script if (!ParseScript()) { + _logger.LogParseFailure(originalTag, "Invalid script subtag."); return null; } if (_tagList.Count == 
0) @@ -538,6 +542,7 @@ private bool ParsePrivateUse() // Region if (!ParseRegion()) { + _logger.LogParseFailure(originalTag, "Invalid region subtag."); return null; } if (_tagList.Count == 0) @@ -548,6 +553,7 @@ private bool ParsePrivateUse() // Variant if (!ParseVariant()) { + _logger.LogParseFailure(originalTag, "Invalid variant subtag."); return null; } if (_tagList.Count == 0) @@ -558,6 +564,7 @@ private bool ParsePrivateUse() // Extension if (!ParseExtension()) { + _logger.LogParseFailure(originalTag, "Invalid extension subtag."); return null; } if (_tagList.Count == 0) @@ -568,6 +575,7 @@ private bool ParsePrivateUse() // Private use if (!ParsePrivateUse()) { + _logger.LogParseFailure(originalTag, "Invalid private use subtag."); return null; } if (_tagList.Count == 0) @@ -576,6 +584,7 @@ private bool ParsePrivateUse() } // Should be done + _logger.LogParseFailure(originalTag, "Unexpected trailing segments."); return null; } @@ -595,6 +604,8 @@ private bool ParsePrivateUse() return null; } + string originalTag = languageTag.ToString(); + // Create a copy and do not modify the original LanguageTag normalizeTag = new(languageTag); @@ -733,6 +744,12 @@ .. 
_rfc5646.RecordList.Where(record => SetCase(normalizeTag); Sort(normalizeTag); + string normalizedTag = normalizeTag.ToString(); + if (!string.Equals(originalTag, normalizedTag, StringComparison.OrdinalIgnoreCase)) + { + _logger.LogNormalizedTag(originalTag, normalizedTag); + } + // Done return normalizeTag; } @@ -773,14 +790,15 @@ internal static bool Validate(LanguageTag languageTag) } if ( languageTag._extensions.Any(extension => - !ValidateExtensionPrefix(extension.Prefix.ToString()) - || extension._tags.Any(tag => !ValidateExtension(tag)) + extension.Tags.IsEmpty + || !ValidateExtensionPrefix(extension.Prefix.ToString()) + || extension.Tags.Any(tag => !ValidateExtension(tag)) ) ) { return false; } - if (languageTag.PrivateUse._tags.Any(tag => !ValidatePrivateUse(tag))) + if (languageTag.PrivateUse.Tags.Any(tag => !ValidatePrivateUse(tag))) { return false; } @@ -804,7 +822,7 @@ internal static bool Validate(LanguageTag languageTag) // No duplicate extensions per prefix if ( languageTag._extensions.Any(extension => - extension._tags.GroupBy(tag => tag).Any(group => group.Count() > 1) + extension.Tags.GroupBy(tag => tag).Any(group => group.Count() > 1) ) ) { @@ -812,7 +830,7 @@ internal static bool Validate(LanguageTag languageTag) } // No duplicate private use tags - if (languageTag.PrivateUse._tags.GroupBy(tag => tag).Any(group => group.Count() > 1)) + if (languageTag.PrivateUse.Tags.GroupBy(tag => tag).Any(group => group.Count() > 1)) { return false; } diff --git a/LanguageTags/LanguageTags.csproj b/LanguageTags/LanguageTags.csproj index 8bf1ac5..dabc6d9 100644 --- a/LanguageTags/LanguageTags.csproj +++ b/LanguageTags/LanguageTags.csproj @@ -34,6 +34,7 @@ $(NoWarn);1591 + @@ -43,17 +44,4 @@ - - - - diff --git a/LanguageTags/LogOptions.cs b/LanguageTags/LogOptions.cs new file mode 100644 index 0000000..cd6a85b --- /dev/null +++ b/LanguageTags/LogOptions.cs @@ -0,0 +1,121 @@ +using System.Threading; + +namespace ptr727.LanguageTags; + +/// +/// Provides global 
logging configuration for the library. +/// +public static class LogOptions +{ + private static ILoggerFactory s_loggerFactory = NullLoggerFactory.Instance; + private static ILogger s_logger = NullLogger.Instance; + + /// + /// Gets or sets the logger factory used to create category loggers. + /// + public static ILoggerFactory LoggerFactory + { + get => Volatile.Read(ref s_loggerFactory); + set => _ = Interlocked.Exchange(ref s_loggerFactory, value ?? NullLoggerFactory.Instance); + } + + /// + /// Gets or sets the global fallback logger. + /// + public static ILogger Logger + { + get => Volatile.Read(ref s_logger); + set => _ = Interlocked.Exchange(ref s_logger, value ?? NullLogger.Instance); + } + + /// + /// Creates a logger for the specified type using the current factory or fallback logger. + /// + /// The type used to derive the logger category. + /// The configured logger for the category. + public static ILogger CreateLogger() => CreateLogger(typeof(T).FullName ?? typeof(T).Name); + + /// + /// Creates a logger for the specified type using the provided options or global configuration. + /// + /// The type used to derive the logger category. + /// The options used to configure logging. + /// The configured logger for the category. + public static ILogger CreateLogger(Options? options) => + CreateLogger(typeof(T).FullName ?? typeof(T).Name, options); + + /// + /// Creates a logger for the specified category using the current factory or fallback logger. + /// + /// The category name for the logger. + /// The configured logger for the category. + public static ILogger CreateLogger(string categoryName) + { + ILoggerFactory loggerFactory = LoggerFactory; + return !ReferenceEquals(loggerFactory, NullLoggerFactory.Instance) + ? loggerFactory.CreateLogger(categoryName) + : Logger; + } + + /// + /// Creates a logger for the specified category using the provided options or global configuration. + /// + /// The category name for the logger. 
+ /// The options used to configure logging. + /// The configured logger for the category. + public static ILogger CreateLogger(string categoryName, Options? options) => + options is null ? CreateLogger(categoryName) + : options.LoggerFactory is not null ? options.LoggerFactory.CreateLogger(categoryName) + : options.Logger is not null ? options.Logger + : CreateLogger(categoryName); + + /// + /// Configures the library to use the specified logger factory. + /// + /// The factory to use for new loggers. + public static void SetFactory(ILoggerFactory loggerFactory) => LoggerFactory = loggerFactory; + + /// + /// Attempts to configure the library to use the specified logger factory if none is set. + /// + /// The factory to use for new loggers. + /// + /// true when the factory was set because no factory was configured; otherwise, false. + /// + public static bool TrySetFactory(ILoggerFactory loggerFactory) + { + ILoggerFactory candidate = loggerFactory ?? NullLoggerFactory.Instance; + ILoggerFactory original = Interlocked.CompareExchange( + ref s_loggerFactory, + candidate, + NullLoggerFactory.Instance + ); + + return ReferenceEquals(original, NullLoggerFactory.Instance); + } + + /// + /// Configures the library to use the specified global logger. + /// + /// The logger used as the global fallback. + public static void SetLogger(ILogger logger) => Logger = logger; + + /// + /// Attempts to configure the library to use the specified global logger if none is set. + /// + /// The logger used as the global fallback. + /// + /// true when the logger was set because no logger was configured; otherwise, false. + /// + public static bool TrySetLogger(ILogger logger) + { + ILogger candidate = logger ?? 
NullLogger.Instance; + ILogger original = Interlocked.CompareExchange( + ref s_logger, + candidate, + NullLogger.Instance + ); + + return ReferenceEquals(original, NullLogger.Instance); + } +} diff --git a/LanguageTags/Options.cs b/LanguageTags/Options.cs new file mode 100644 index 0000000..1dcd641 --- /dev/null +++ b/LanguageTags/Options.cs @@ -0,0 +1,17 @@ +namespace ptr727.LanguageTags; + +/// +/// Options used to configure the library. +/// +public sealed class Options +{ + /// + /// Gets the logger factory used to create per-instance loggers. + /// + public ILoggerFactory? LoggerFactory { get; init; } + + /// + /// Gets the logger used by the library. + /// + public ILogger? Logger { get; init; } +} diff --git a/LanguageTags/Rfc5646Data.cs b/LanguageTags/Rfc5646Data.cs index 14bc786..95ee1ba 100644 --- a/LanguageTags/Rfc5646Data.cs +++ b/LanguageTags/Rfc5646Data.cs @@ -1,11 +1,4 @@ -using System; -using System.Collections.Generic; -using System.Collections.Immutable; -using System.Globalization; -using System.IO; -using System.Linq; -using System.Text; -using System.Text.Json; +using System.Runtime.CompilerServices; using System.Text.Json.Serialization; namespace ptr727.LanguageTags; @@ -13,19 +6,37 @@ namespace ptr727.LanguageTags; /// /// Provides access to RFC 5646 / BCP 47 language subtag registry data. /// -public partial class Rfc5646Data +public sealed partial class Rfc5646Data { internal const string DataUri = "https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry"; internal const string DataFileName = "rfc5646"; /// - /// Loads RFC 5646 data from a file. + /// Loads RFC 5646 data from a file asynchronously. /// /// The path to the data file. /// The loaded . /// Thrown when the file contains invalid data. 
- public static Rfc5646Data LoadData(string fileName) + public static Task LoadDataAsync(string fileName) => + LoadDataAsync(fileName, LogOptions.CreateLogger()); + + /// + /// Loads RFC 5646 data from a file asynchronously using the specified options. + /// + /// The path to the data file. + /// The options used to configure logging. + /// The loaded . + /// Thrown when the file contains invalid data. + public static Task LoadDataAsync(string fileName, Options? options) => + LoadDataAsync(fileName, LogOptions.CreateLogger(options)); + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + private static async Task LoadDataAsync(string fileName, ILogger logger) { // File Format // https://www.rfc-editor.org/rfc/rfc5646#section-3.1 @@ -33,124 +44,230 @@ public static Rfc5646Data LoadData(string fileName) // https://www.w3.org/International/articles/language-tags // https://datatracker.ietf.org/doc/html/draft-phillips-record-jar-02 - List recordList = []; - Parser parser = new(); - using StreamReader lineReader = new(File.OpenRead(fileName)); + try + { + List recordList = []; + Parser parser = new(); + await using FileStream fileStream = new( + fileName, + FileMode.Open, + FileAccess.Read, + FileShare.Read, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + using StreamReader lineReader = new(fileStream); - // First record is file date - _ = parser.ReadAttributes(lineReader); - DateOnly fileDate = parser.GetFileDate(); + // First record is file date + _ = await parser.ReadAttributesAsync(lineReader).ConfigureAwait(false); + DateOnly fileDate = parser.GetFileDate(); - // Read all record attributes separated by %% until EOF - while (parser.ReadAttributes(lineReader)) - { + // Read all record attributes separated by %% until EOF + while (await 
parser.ReadAttributesAsync(lineReader).ConfigureAwait(false)) + { + recordList.Add(parser.GetRecord()); + } recordList.Add(parser.GetRecord()); - } - recordList.Add(parser.GetRecord()); - return new Rfc5646Data { FileDate = fileDate, RecordList = [.. recordList] }; + if (recordList.Count == 0) + { + logger.LogDataLoadEmpty(nameof(Rfc5646Data), fileName); + } + + Rfc5646Data data = new() { FileDate = fileDate, RecordList = [.. recordList] }; + logger.LogDataLoaded(nameof(Rfc5646Data), fileName, data.RecordList.Length); + return data; + } + catch (Exception exception) + { + logger.LogDataLoadFailed(nameof(Rfc5646Data), fileName, exception); + throw; + } } /// - /// Loads RFC 5646 data from a JSON file. + /// Loads RFC 5646 data from a JSON file asynchronously. /// /// The path to the JSON file. - /// The loaded or null if deserialization fails. - public static Rfc5646Data? LoadJson(string fileName) => - JsonSerializer.Deserialize( - File.ReadAllText(fileName), - LanguageJsonContext.Default.Rfc5646Data - ); + /// + /// The loaded , or null when deserialization yields no data. + /// + /// Thrown when the file cannot be read. + /// Thrown when the JSON is invalid. + public static Task LoadJsonAsync(string fileName) => + LoadJsonAsync(fileName, LogOptions.CreateLogger()); - internal static void SaveJson(string fileName, Rfc5646Data rfc5646) => - File.WriteAllText( + /// + /// Loads RFC 5646 data from a JSON file asynchronously using the specified options. + /// + /// The path to the JSON file. + /// The options used to configure logging. + /// + /// The loaded , or null when deserialization yields no data. + /// + /// Thrown when the file cannot be read. + /// Thrown when the JSON is invalid. + public static Task LoadJsonAsync(string fileName, Options? 
options) => + LoadJsonAsync(fileName, LogOptions.CreateLogger(options)); + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + private static async Task LoadJsonAsync(string fileName, ILogger logger) + { + try + { + await using FileStream fileStream = new( + fileName, + FileMode.Open, + FileAccess.Read, + FileShare.Read, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + Rfc5646Data? data = await JsonSerializer + .DeserializeAsync(fileStream, LanguageJsonContext.Default.Rfc5646Data) + .ConfigureAwait(false); + if (data == null) + { + logger.LogDataLoadEmpty(nameof(Rfc5646Data), fileName); + } + else + { + logger.LogDataLoaded(nameof(Rfc5646Data), fileName, data.RecordList.Length); + } + + return data; + } + catch (Exception exception) + { + logger.LogDataLoadFailed(nameof(Rfc5646Data), fileName, exception); + throw; + } + } + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + internal static async Task SaveJsonAsync(string fileName, Rfc5646Data rfc5646) + { + await using FileStream fileStream = new( fileName, - JsonSerializer.Serialize(rfc5646, LanguageJsonContext.Default.Rfc5646Data) + FileMode.Create, + FileAccess.Write, + FileShare.None, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan ); + await JsonSerializer + .SerializeAsync(fileStream, rfc5646, LanguageJsonContext.Default.Rfc5646Data) + .ConfigureAwait(false); + } - internal static void GenCode(string fileName, Rfc5646Data rfc5646) + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Reliability", + "CA2007:Consider calling ConfigureAwait on the awaited task", + Justification = "https://github.com/dotnet/roslyn-analyzers/issues/7185" + )] + internal 
static async Task GenCodeAsync(string fileName, Rfc5646Data rfc5646) { ArgumentNullException.ThrowIfNull(rfc5646); - StringBuilder stringBuilder = new(); - _ = stringBuilder - .Append( - CultureInfo.InvariantCulture, - $$""" - using System; + StreamWriter writer = new( + new FileStream( + fileName, + FileMode.Create, + FileAccess.Write, + FileShare.None, + 4096, + FileOptions.Asynchronous | FileOptions.SequentialScan + ), + new UTF8Encoding(false) + ) + { + NewLine = "\r\n", + }; + await using ConfiguredAsyncDisposable writerScope = writer.ConfigureAwait(false); - namespace ptr727.LanguageTags; + ConfiguredTaskAwaitable WriteLineAsync(string value) => + writer.WriteLineAsync(value).ConfigureAwait(false); - /// - /// Provides access to RFC 5646 / BCP 47 language subtag registry data. - /// - public partial class Rfc5646Data - { - public static Rfc5646Data Create() => - new() - { - FileDate = {{LanguageSchema.GetCodeGenString(rfc5646.FileDate)}}, - RecordList = - [ - """ - ) - .Append("\r\n"); + await WriteLineAsync("namespace ptr727.LanguageTags;"); + await WriteLineAsync(string.Empty); + await WriteLineAsync("/// "); + await WriteLineAsync( + "/// Provides access to RFC 5646 / BCP 47 language subtag registry data." 
+ ); + await WriteLineAsync("/// "); + await WriteLineAsync("public sealed partial class Rfc5646Data"); + await WriteLineAsync("{"); + await WriteLineAsync(" public static Rfc5646Data Create() =>"); + await WriteLineAsync(" new()"); + await WriteLineAsync(" {"); + await WriteLineAsync( + $" FileDate = {LanguageSchema.GetCodeGenString(rfc5646.FileDate)}," + ); + await WriteLineAsync(" RecordList ="); + await WriteLineAsync(" ["); foreach (Rfc5646Record record in rfc5646.RecordList) { - _ = stringBuilder - .Append( - CultureInfo.InvariantCulture, - $$""" - new() - { - Type = {{LanguageSchema.GetCodeGenString(record.Type)}}, - SubTag = {{LanguageSchema.GetCodeGenString(record.SubTag)}}, - Added = {{LanguageSchema.GetCodeGenString(record.Added)}}, - SuppressScript = {{LanguageSchema.GetCodeGenString( - record.SuppressScript - )}}, - Scope = {{LanguageSchema.GetCodeGenString(record.Scope)}}, - MacroLanguage = {{LanguageSchema.GetCodeGenString( - record.MacroLanguage - )}}, - Deprecated = {{LanguageSchema.GetCodeGenString( - record.Deprecated - )}}, - PreferredValue = {{LanguageSchema.GetCodeGenString( - record.PreferredValue - )}}, - Tag = {{LanguageSchema.GetCodeGenString(record.Tag)}}, - Description = {{LanguageSchema.GetCodeGenString( - record.Description - )}}, - Comments = {{LanguageSchema.GetCodeGenString( - record.Comments - )}}, - Prefix = {{LanguageSchema.GetCodeGenString(record.Prefix)}}, - }, - """ - ) - .Append("\r\n"); + await WriteLineAsync(" new()"); + await WriteLineAsync(" {"); + await WriteLineAsync( + $" Type = {LanguageSchema.GetCodeGenString(record.Type)}," + ); + await WriteLineAsync( + $" SubTag = {LanguageSchema.GetCodeGenString(record.SubTag)}," + ); + await WriteLineAsync( + $" Added = {LanguageSchema.GetCodeGenString(record.Added)}," + ); + await WriteLineAsync( + $" SuppressScript = {LanguageSchema.GetCodeGenString(record.SuppressScript)}," + ); + await WriteLineAsync( + $" Scope = {LanguageSchema.GetCodeGenString(record.Scope)}," + ); + 
await WriteLineAsync( + $" MacroLanguage = {LanguageSchema.GetCodeGenString(record.MacroLanguage)}," + ); + await WriteLineAsync( + $" Deprecated = {LanguageSchema.GetCodeGenString(record.Deprecated)}," + ); + await WriteLineAsync( + $" PreferredValue = {LanguageSchema.GetCodeGenString(record.PreferredValue)}," + ); + await WriteLineAsync( + $" Tag = {LanguageSchema.GetCodeGenString(record.Tag)}," + ); + await WriteLineAsync( + $" Description = {LanguageSchema.GetCodeGenString(record.Description)}," + ); + await WriteLineAsync( + $" Comments = {LanguageSchema.GetCodeGenString(record.Comments)}," + ); + await WriteLineAsync( + $" Prefix = {LanguageSchema.GetCodeGenString(record.Prefix)}," + ); + await WriteLineAsync(" },"); } - _ = stringBuilder - .Append( - """ - ], - }; - } - """ - ) - .Append("\r\n"); - LanguageSchema.WriteFile(fileName, stringBuilder.ToString()); + await WriteLineAsync(" ],"); + await WriteLineAsync(" };"); + await WriteLineAsync("}"); } internal sealed class Parser { private readonly List> _attributeList = []; + private string? _pendingLine; - public bool ReadAttributes(StreamReader lineReader) + public async Task ReadAttributesAsync(StreamReader lineReader) { // Read until %% or EOF _attributeList.Clear(); @@ -158,7 +275,7 @@ public bool ReadAttributes(StreamReader lineReader) while (true) { // Read next line - string? line = lineReader.ReadLine(); + string? line = await ReadLineAsync(lineReader).ConfigureAwait(false); if (string.IsNullOrEmpty(line)) { // End of file @@ -183,26 +300,20 @@ public bool ReadAttributes(StreamReader lineReader) // Peek at the next line an look for a space while (true) { - // There is no PeekLine(), so we only get 1 char look ahead - // -1 is EOF or error, else cast to Char - int peek = lineReader.Peek(); - if (peek == -1 || (char)peek != ' ') + // Read next line to check for multiline continuation + string? 
multiLine = await ReadLineAsync(lineReader).ConfigureAwait(false); + if (string.IsNullOrEmpty(multiLine)) { - // Done + _pendingLine = multiLine; break; } - // Append the next line to the current line - string? multiLine = lineReader.ReadLine(); - if ( - string.IsNullOrEmpty(multiLine) - || !multiLine.StartsWith(" ", StringComparison.Ordinal) - ) + if (!multiLine.StartsWith(" ", StringComparison.Ordinal)) { - throw new InvalidDataException( - $"Invalid data found in RFC 5646 record: {line}" - ); + _pendingLine = multiLine; + break; } + line = $"{line.Trim()} {multiLine.Trim()}"; } @@ -218,6 +329,18 @@ public bool ReadAttributes(StreamReader lineReader) return !eof; } + private async Task ReadLineAsync(StreamReader lineReader) + { + if (_pendingLine is not null) + { + string? line = _pendingLine; + _pendingLine = null; + return line; + } + + return await lineReader.ReadLineAsync().ConfigureAwait(false); + } + public Rfc5646Record GetRecord() { // Create a mutable tuple as placeholder @@ -375,10 +498,24 @@ private static Rfc5646Record.RecordScope ScopeFromString(string value) => /// The language tag, subtag, or description to search for. /// If true, searches in the description field; otherwise, only searches tags and subtags. /// The matching or null if not found. - public Rfc5646Record? Find(string? languageTag, bool includeDescription) + public Rfc5646Record? Find(string? languageTag, bool includeDescription) => + Find(languageTag, includeDescription, LogOptions.CreateLogger()); + + /// + /// Finds a language subtag record by tag, subtag, preferred value, or description using the specified options. + /// + /// The language tag, subtag, or description to search for. + /// If true, searches in the description field; otherwise, only searches tags and subtags. + /// The options used to configure logging. + /// The matching or null if not found. + public Rfc5646Record? Find(string? languageTag, bool includeDescription, Options? 
options) => + Find(languageTag, includeDescription, LogOptions.CreateLogger(options)); + + private Rfc5646Record? Find(string? languageTag, bool includeDescription, ILogger logger) { if (string.IsNullOrEmpty(languageTag)) { + logger.LogFindRecordNotFound(nameof(Rfc5646Data), languageTag, includeDescription); return null; } @@ -392,6 +529,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Rfc5646Data), languageTag, includeDescription); return record; } @@ -402,6 +540,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Rfc5646Data), languageTag, includeDescription); return record; } @@ -412,6 +551,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Rfc5646Data), languageTag, includeDescription); return record; } @@ -426,6 +566,7 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Rfc5646Data), languageTag, includeDescription); return record; } @@ -437,11 +578,13 @@ record = RecordList.FirstOrDefault(item => ); if (record != null) { + logger.LogFindRecordFound(nameof(Rfc5646Data), languageTag, includeDescription); return record; } } // Not found + logger.LogFindRecordNotFound(nameof(Rfc5646Data), languageTag, includeDescription); return null; } } @@ -449,7 +592,7 @@ record = RecordList.FirstOrDefault(item => /// /// Represents a record from the RFC 5646 / BCP 47 language subtag registry. /// -public record Rfc5646Record +public sealed record Rfc5646Record { /// /// Defines the type of language subtag record. diff --git a/LanguageTags/Rfc5646DataGen.cs b/LanguageTags/Rfc5646DataGen.cs index bea29bb..b0c3175 100644 --- a/LanguageTags/Rfc5646DataGen.cs +++ b/LanguageTags/Rfc5646DataGen.cs @@ -1,11 +1,9 @@ -using System; - namespace ptr727.LanguageTags; /// /// Provides access to RFC 5646 / BCP 47 language subtag registry data. 
/// -public partial class Rfc5646Data +public sealed partial class Rfc5646Data { public static Rfc5646Data Create() => new() diff --git a/LanguageTagsCreate/.editorconfig b/LanguageTagsCreate/.editorconfig new file mode 100644 index 0000000..797656b --- /dev/null +++ b/LanguageTagsCreate/.editorconfig @@ -0,0 +1,7 @@ +root = false + +# C# files +[*.cs] + +# Ignore types can be made internal +dotnet_diagnostic.CA1515.severity = none diff --git a/LanguageTagsCreate/AssemblyInfo.cs b/LanguageTagsCreate/AssemblyInfo.cs new file mode 100644 index 0000000..ac9e350 --- /dev/null +++ b/LanguageTagsCreate/AssemblyInfo.cs @@ -0,0 +1,45 @@ +using System.Reflection; +using System.Runtime.InteropServices; + +namespace ptr727.LanguageTags.Create; + +internal static class AssemblyInfo +{ + internal static string AppVersion => $"{AppName} : {FileVersion} ({BuildType})"; + + internal static string RuntimeVersion => + $"{RuntimeInformation.FrameworkDescription} : {RuntimeInformation.RuntimeIdentifier}"; + + internal static string BuildType => +#if DEBUG + "Debug"; +#else + "Release"; +#endif + + internal static string AppName => GetAssembly().GetName().Name ?? string.Empty; + + internal static string InformationalVersion => + // E.g. 1.2.3+abc123.abc123 + GetAssembly() + .GetCustomAttribute() + ?.InformationalVersion + ?? string.Empty; + + internal static string FileVersion => + // E.g. 1.2.3.4 + GetAssembly().GetCustomAttribute()?.Version + ?? string.Empty; + + internal static string ReleaseVersion => + // E.g. 1.2.3 part of 1.2.3+abc123.abc123 + // Use major.minor.build from informational version + InformationalVersion.Split('+', '-')[0]; + + private static Assembly GetAssembly() + { + Assembly? 
assembly = Assembly.GetEntryAssembly(); + assembly ??= Assembly.GetExecutingAssembly(); + return assembly; + } +} diff --git a/LanguageTagsCreate/CommandLine.cs b/LanguageTagsCreate/CommandLine.cs new file mode 100644 index 0000000..1501e0f --- /dev/null +++ b/LanguageTagsCreate/CommandLine.cs @@ -0,0 +1,106 @@ +using System.CommandLine; +using System.CommandLine.Parsing; + +namespace ptr727.LanguageTags.Create; + +internal sealed class CommandLine +{ + private readonly Option _logLevelOption = CreateLogLevelOption(); + private readonly Option _logFileOption = CreateLogFileOption(); + private readonly Option _logFileClearOption = CreateLogFileClearOption(); + private readonly Option _codePathOption = CreateCodePathOption(); + + private static readonly FrozenSet s_cliBypassList = FrozenSet.Create( + StringComparer.OrdinalIgnoreCase, + ["--help", "--version"] + ); + + internal CommandLine(string[] args) + { + Root = CreateRootCommand(); + Result = Root.Parse(args); + } + + internal RootCommand Root { get; init; } + internal ParseResult Result { get; init; } + + internal RootCommand CreateRootCommand() + { + RootCommand rootCommand = new("Download and generate language tag data and code") + { + _logLevelOption, + _logFileOption, + _logFileClearOption, + _codePathOption, + }; + rootCommand.SetAction( + (parseResult, cancellationToken) => + { + Program program = new(CreateOptions(parseResult), cancellationToken); + return program.ExecuteAsync(); + } + ); + + return rootCommand; + } + + internal Options CreateOptions(ParseResult parseResult) => + new() + { + LogOptions = new LoggerFactory.Options + { + Level = parseResult.GetValue(_logLevelOption), + File = parseResult.GetValue(_logFileOption) ?? 
string.Empty, + FileClear = parseResult.GetValue(_logFileClearOption), + }, + CodePath = parseResult.GetValue(_codePathOption)!, + }; + + private static Option CreateLogFileClearOption() => + new("--logfile-clear", "-c") + { + Description = "Clear the log file before writing (default: false).", + Recursive = true, + }; + + private static Option CreateLogLevelOption() => + new("--loglevel", "-l") + { + Description = "Set the log level (default: Information).", + DefaultValueFactory = _ => LogEventLevel.Information, + Recursive = true, + }; + + private static Option CreateLogFileOption() + { + Option option = new("--logfile", "-f") + { + Description = "Write logs to the specified file (optional).", + Recursive = true, + }; + return option.AcceptLegalFileNamesOnly(); + } + + private static Option CreateCodePathOption() + { + Option option = new("--codepath", "-p") + { + Description = "Path to the solution directory.", + Required = true, + }; + return option.AcceptExistingOnly(); + } + + internal static bool BypassStartup(ParseResult parseResult) => + parseResult.Errors.Count > 0 + || parseResult.CommandResult.Children.Any(symbolResult => + symbolResult is OptionResult optionResult + && s_cliBypassList.Contains(optionResult.Option.Name) + ); + + internal sealed class Options + { + internal required LoggerFactory.Options LogOptions { get; init; } + internal required DirectoryInfo CodePath { get; init; } + } +} diff --git a/LanguageTagsCreate/CreateTagData.cs b/LanguageTagsCreate/CreateTagData.cs new file mode 100644 index 0000000..fceea1e --- /dev/null +++ b/LanguageTagsCreate/CreateTagData.cs @@ -0,0 +1,129 @@ +using System.Runtime.CompilerServices; + +namespace ptr727.LanguageTags.Create; + +internal sealed class CreateTagData( + string dataDirectory, + string codeDirectory, + CancellationToken cancellationToken +) +{ + private Iso6392Data? _iso6392; + private string? _iso6392DataFile; + private string? _iso6392JsonFile; + private string? 
_iso6392CodeFile; + private Iso6393Data? _iso6393; + private string? _iso6393DataFile; + private string? _iso6393JsonFile; + private string? _iso6393CodeFile; + private Rfc5646Data? _rfc5646; + private string? _rfc5646DataFile; + private string? _rfc5646JsonFile; + private string? _rfc5646CodeFile; + + internal async Task DownloadDataAsync() + { + // Download all the data files + Log.Information("Downloading all language tag data files ..."); + + Log.Information("Downloading ISO 639-2 data ..."); + _iso6392DataFile = Path.Combine(dataDirectory, Iso6392Data.DataFileName); + await DownloadFileAsync(new Uri(Iso6392Data.DataUri), _iso6392DataFile) + .ConfigureAwait(false); + + Log.Information("Downloading ISO 639-3 data ..."); + _iso6393DataFile = Path.Combine(dataDirectory, Iso6393Data.DataFileName); + await DownloadFileAsync(new Uri(Iso6393Data.DataUri), _iso6393DataFile) + .ConfigureAwait(false); + + Log.Information("Downloading RFC 5646 data ..."); + _rfc5646DataFile = Path.Combine(dataDirectory, Rfc5646Data.DataFileName); + await DownloadFileAsync(new Uri(Rfc5646Data.DataUri), _rfc5646DataFile) + .ConfigureAwait(false); + + Log.Information("Language tag data files downloaded successfully."); + } + + internal async Task CreateJsonDataAsync() + { + ArgumentNullException.ThrowIfNull(_iso6392DataFile, nameof(_iso6392DataFile)); + ArgumentNullException.ThrowIfNull(_iso6393DataFile, nameof(_iso6393DataFile)); + ArgumentNullException.ThrowIfNull(_rfc5646DataFile, nameof(_rfc5646DataFile)); + + // Convert data files to JSON + Log.Information("Converting data files to JSON ..."); + + Log.Information("Converting ISO 639-2 data to JSON ..."); + _iso6392 = await Iso6392Data.LoadDataAsync(_iso6392DataFile).ConfigureAwait(false); + _iso6392JsonFile = Path.Combine(dataDirectory, Iso6392Data.DataFileName + ".json"); + Log.Information("Writing ISO 639-2 data to {JsonPath}", _iso6392JsonFile); + await Iso6392Data.SaveJsonAsync(_iso6392JsonFile, _iso6392).ConfigureAwait(false); + + 
Log.Information("Converting ISO 639-3 data to JSON ..."); + _iso6393 = await Iso6393Data.LoadDataAsync(_iso6393DataFile).ConfigureAwait(false); + _iso6393JsonFile = Path.Combine(dataDirectory, Iso6393Data.DataFileName + ".json"); + Log.Information("Writing ISO 639-3 data to {JsonPath}", _iso6393JsonFile); + await Iso6393Data.SaveJsonAsync(_iso6393JsonFile, _iso6393).ConfigureAwait(false); + + Log.Information("Converting RFC 5646 data to JSON ..."); + _rfc5646 = await Rfc5646Data.LoadDataAsync(_rfc5646DataFile).ConfigureAwait(false); + _rfc5646JsonFile = Path.Combine(dataDirectory, Rfc5646Data.DataFileName + ".json"); + Log.Information("Writing RFC 5646 data to {JsonPath}", _rfc5646JsonFile); + await Rfc5646Data.SaveJsonAsync(_rfc5646JsonFile, _rfc5646).ConfigureAwait(false); + + Log.Information("Data files converted to JSON successfully."); + } + + internal async Task GenerateCodeAsync() + { + ArgumentNullException.ThrowIfNull(_iso6392, nameof(_iso6392)); + ArgumentNullException.ThrowIfNull(_iso6393, nameof(_iso6393)); + ArgumentNullException.ThrowIfNull(_rfc5646, nameof(_rfc5646)); + + // Generate code files + Log.Information("Generating code files ..."); + + Log.Information("Generating ISO 639-2 code ..."); + _iso6392CodeFile = Path.Combine(codeDirectory, nameof(Iso6392Data) + "Gen.cs"); + Log.Information("Writing ISO 639-2 code to {CodePath}", _iso6392CodeFile); + await Iso6392Data.GenCodeAsync(_iso6392CodeFile, _iso6392).ConfigureAwait(false); + + Log.Information("Generating ISO 639-3 code ..."); + _iso6393CodeFile = Path.Combine(codeDirectory, nameof(Iso6393Data) + "Gen.cs"); + Log.Information("Writing ISO 639-3 code to {CodePath}", _iso6393CodeFile); + await Iso6393Data.GenCodeAsync(_iso6393CodeFile, _iso6393).ConfigureAwait(false); + + Log.Information("Generating RFC 5646 code ..."); + _rfc5646CodeFile = Path.Combine(codeDirectory, nameof(Rfc5646Data) + "Gen.cs"); + Log.Information("Writing RFC 5646 code to {CodePath}", _rfc5646CodeFile); + await 
Rfc5646Data.GenCodeAsync(_rfc5646CodeFile, _rfc5646).ConfigureAwait(false); + + Log.Information("Code files generated successfully."); + } + + private async Task DownloadFileAsync(Uri uri, string fileName) + { + ArgumentNullException.ThrowIfNull(uri, nameof(uri)); + ArgumentException.ThrowIfNullOrWhiteSpace(fileName, nameof(fileName)); + + Log.Information("Downloading \"{Uri}\" to \"{FileName}\" ...", uri.ToString(), fileName); + + Stream httpStream = await HttpClientFactory + .GetHttpClient() + .GetStreamAsync(uri, cancellationToken) + .ConfigureAwait(false); + await using ConfiguredAsyncDisposable httpStreamScope = httpStream.ConfigureAwait(false); + + FileStream fileStream = new( + fileName, + FileMode.Create, + FileAccess.Write, + FileShare.None, + 8192, + FileOptions.Asynchronous | FileOptions.SequentialScan + ); + await using ConfiguredAsyncDisposable fileStreamScope = fileStream.ConfigureAwait(false); + + await httpStream.CopyToAsync(fileStream, cancellationToken).ConfigureAwait(false); + } +} diff --git a/LanguageTagsCreate/Extensions.cs b/LanguageTagsCreate/Extensions.cs new file mode 100644 index 0000000..f55b44d --- /dev/null +++ b/LanguageTagsCreate/Extensions.cs @@ -0,0 +1,64 @@ +using System.Runtime.CompilerServices; + +namespace ptr727.LanguageTags.Create; + +internal static partial class LogExtensions +{ + extension(Serilog.ILogger logger) + { + internal bool LogAndPropagate( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + { + logger.Error(exception, "{Function}", function); + return false; + } + + internal bool LogAndHandle( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + { + logger.Error(exception, "{Function}", function); + return true; + } + + internal Serilog.ILogger LogOverrideContext() => logger.ForContext<LogOverride>(); + } + + extension(Microsoft.Extensions.Logging.ILogger logger) + { + internal bool LogAndPropagate( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + 
{ + LogCatchException(logger, function, exception); + return false; + } + + internal bool LogAndHandle( + Exception exception, + [CallerMemberName] string function = "unknown" + ) + { + LogCatchException(logger, function, exception); + return true; + } + } + + [LoggerMessage(Message = "Exception in {Function}", Level = LogLevel.Error)] + internal static partial void LogCatchException( + this Microsoft.Extensions.Logging.ILogger logger, + string function, + Exception exception + ); + + [System.Diagnostics.CodeAnalysis.SuppressMessage( + "Design", + "CA1812:Avoid uninstantiated internal classes", + Justification = "Used as a type marker for Serilog context filtering" + )] + internal sealed class LogOverride; +} diff --git a/LanguageTagsCreate/GlobalUsings.cs b/LanguageTagsCreate/GlobalUsings.cs new file mode 100644 index 0000000..4dbb9d0 --- /dev/null +++ b/LanguageTagsCreate/GlobalUsings.cs @@ -0,0 +1,11 @@ +global using System; +global using System.Collections.Frozen; +global using System.Globalization; +global using System.IO; +global using System.Linq; +global using System.Net.Http; +global using System.Threading; +global using System.Threading.Tasks; +global using Microsoft.Extensions.Logging; +global using Serilog; +global using Serilog.Events; diff --git a/LanguageTagsCreate/HttpClientFactory.cs b/LanguageTagsCreate/HttpClientFactory.cs new file mode 100644 index 0000000..db37c50 --- /dev/null +++ b/LanguageTagsCreate/HttpClientFactory.cs @@ -0,0 +1,75 @@ +using System.Net.Http.Headers; +using Microsoft.Extensions.Http.Resilience; +using Polly; + +namespace ptr727.LanguageTags.Create; + +internal static class HttpClientFactory +{ + private static readonly Lazy<HttpClient> s_httpClient = new(CreateHttpClient); + private static readonly Lazy<ResilienceHandler> s_resilienceHandler = new( + CreateResilienceHandler + ); + + internal static HttpClient GetHttpClient() => s_httpClient.Value; + + internal static ResilienceHandler GetResilienceHandler() => s_resilienceHandler.Value; + + private 
static ResilienceHandler CreateResilienceHandler() => + new( + new ResiliencePipelineBuilder<HttpResponseMessage>() + .AddRetry( + new Polly.Retry.RetryStrategyOptions<HttpResponseMessage> + { + MaxRetryAttempts = 3, + BackoffType = DelayBackoffType.Exponential, + UseJitter = true, + Delay = TimeSpan.FromSeconds(1), + MaxDelay = TimeSpan.FromSeconds(30), + ShouldHandle = args => + ValueTask.FromResult( + args.Outcome.Exception != null + || ( + args.Outcome.Result != null + && !args.Outcome.Result.IsSuccessStatusCode + ) + ), + } + ) + .AddCircuitBreaker( + new Polly.CircuitBreaker.CircuitBreakerStrategyOptions<HttpResponseMessage> + { + FailureRatio = 0.2, + MinimumThroughput = 10, + SamplingDuration = TimeSpan.FromSeconds(60), + BreakDuration = TimeSpan.FromSeconds(30), + ShouldHandle = args => + ValueTask.FromResult( + args.Outcome.Exception != null + || ( + args.Outcome.Result != null + && !args.Outcome.Result.IsSuccessStatusCode + ) + ), + } + ) + .AddTimeout(TimeSpan.FromSeconds(30)) + .Build() + ) + { + InnerHandler = new SocketsHttpHandler + { + PooledConnectionLifetime = TimeSpan.FromMinutes(15), + PooledConnectionIdleTimeout = TimeSpan.FromMinutes(2), + }, + }; + + private static HttpClient CreateHttpClient() + { + HttpClient httpClient = new(GetResilienceHandler()) { Timeout = TimeSpan.FromSeconds(120) }; + httpClient.DefaultRequestHeaders.UserAgent.Add( + new ProductInfoHeaderValue(AssemblyInfo.AppName, AssemblyInfo.InformationalVersion) + ); + return httpClient; + } +} diff --git a/LanguageTagsCreate/LanguageTagsCreate.csproj b/LanguageTagsCreate/LanguageTagsCreate.csproj index 5bc4a59..2e172c3 100644 --- a/LanguageTagsCreate/LanguageTagsCreate.csproj +++ b/LanguageTagsCreate/LanguageTagsCreate.csproj @@ -1,34 +1,37 @@  latest-all - LanguageTagsCreate - 1.0.0.0 - Pieter Viljoen - Pieter Viljoen true - Pieter Viljoen - LanguageTags create utility - true - 1.0.0.0 - true - en + 1.0.0-pre + false enable Exe - ptr727.LanguageTags.Create - MIT - https://github.com/ptr727/LanguageTags - true - true + false 
ptr727.LanguageTags.Create - snupkg net10.0 - 1.0.0.0 + 1.0.0 + + + true + true + true - + + + + + + + + diff --git a/LanguageTagsCreate/LoggerFactory.cs b/LanguageTagsCreate/LoggerFactory.cs new file mode 100644 index 0000000..bf604a4 --- /dev/null +++ b/LanguageTagsCreate/LoggerFactory.cs @@ -0,0 +1,58 @@ +using Serilog.Debugging; +using Serilog.Extensions.Logging; +using Serilog.Sinks.SystemConsole.Themes; + +namespace ptr727.LanguageTags.Create; + +internal static class LoggerFactory +{ + private static readonly Lazy s_serilogLoggerFactory = new(() => + new SerilogLoggerFactory(Log.Logger, dispose: false) + ); + + internal static Serilog.ILogger Create(Options options) + { + // Enable Serilog debug output to the console + SelfLog.Enable(Console.Error); + LoggerConfiguration loggerConfiguration = new LoggerConfiguration() + .MinimumLevel.Is(options.Level) + .MinimumLevel.Override( + typeof(LogExtensions.LogOverride).FullName!, + LogEventLevel.Verbose + ) + .Enrich.WithThreadId() + .WriteTo.Console( + theme: AnsiConsoleTheme.Code, + formatProvider: CultureInfo.InvariantCulture + ); + + // Add file sink if logFile is specified + if (!string.IsNullOrEmpty(options.File)) + { + if (options.FileClear && File.Exists(options.File)) + { + File.Delete(options.File); + } + _ = loggerConfiguration.WriteTo.File( + options.File, + formatProvider: CultureInfo.InvariantCulture + ); + } + + // Create logger + Log.Logger = loggerConfiguration.CreateLogger(); + return Log.Logger; + } + + internal static ILoggerFactory CreateLoggerFactory() => s_serilogLoggerFactory.Value; + + internal static Microsoft.Extensions.Logging.ILogger CreateLogger(string categoryName) => + s_serilogLoggerFactory.Value.CreateLogger(categoryName); + + internal sealed class Options + { + internal required LogEventLevel Level { get; init; } + internal required string File { get; init; } + internal required bool FileClear { get; init; } + } +} diff --git a/LanguageTagsCreate/Program.cs 
b/LanguageTagsCreate/Program.cs index ee89824..c904236 100644 --- a/LanguageTagsCreate/Program.cs +++ b/LanguageTagsCreate/Program.cs @@ -1,154 +1,69 @@ -using System; -using System.Globalization; -using System.IO; -using System.Net.Http; -using System.Net.Http.Headers; -using System.Reflection; -using System.Threading.Tasks; -using Serilog; -using Serilog.Sinks.SystemConsole.Themes; - namespace ptr727.LanguageTags.Create; -internal static class Program +internal sealed class Program( + CommandLine.Options commandLineOptions, + CancellationToken cancellationToken +) { - private const string LanguageDataDirectory = "LanguageData"; - private const string LanguageTagsDirectory = "LanguageTags"; - - private static HttpClient? s_httpClient; + private const string DataDirectory = "LanguageData"; + private const string CodeDirectory = "LanguageTags"; - private static async Task DownloadFileAsync(Uri uri, string fileName) + internal static async Task<int> Main(string[] args) { - Log.Information("Downloading \"{Uri}\" to \"{FileName}\" ...", uri.ToString(), fileName); - Stream httpStream = await GetHttpClient().GetStreamAsync(uri).ConfigureAwait(false); - await using (httpStream.ConfigureAwait(false)) - { - FileStream fileStream = File.Create(fileName); - await using (fileStream.ConfigureAwait(false)) - { - await httpStream.CopyToAsync(fileStream).ConfigureAwait(false); - } - } - } + // Parse commandline + CommandLine commandLine = new(args); - private static HttpClient GetHttpClient() - { - if (s_httpClient != null) + // Bypass startup for errors or help and version commands + if (CommandLine.BypassStartup(commandLine.Result)) { - return s_httpClient; + return await commandLine.Result.InvokeAsync().ConfigureAwait(false); } - s_httpClient = new() { Timeout = TimeSpan.FromSeconds(120) }; - s_httpClient.DefaultRequestHeaders.UserAgent.Add( - new ProductInfoHeaderValue( - Assembly.GetExecutingAssembly().GetName().Name!, - 
Assembly.GetExecutingAssembly().GetName().Version?.ToString() - ) - ); - return s_httpClient; + + // Create logger + _ = LoggerFactory.Create(commandLine.CreateOptions(commandLine.Result).LogOptions); + Log.Logger.LogOverrideContext().Information("Starting: {Args}", args); + + // Initialize library with static logger + LogOptions.SetFactory(LoggerFactory.CreateLoggerFactory()); + + // Invoke command + return await commandLine.Result.InvokeAsync().ConfigureAwait(false); } - internal static async Task<int> Main(string[] args) + internal async Task<int> ExecuteAsync() { - Log.Logger = new LoggerConfiguration() - .WriteTo.Console( - theme: AnsiConsoleTheme.Code, - formatProvider: CultureInfo.InvariantCulture - ) - .CreateLogger(); - - // args[0] : Root directory, defaults to entry assembly directory - string rootDirectory; - if (args.Length > 0) + try { - if (!Directory.Exists(args[0])) + // Data and code directories + string solutionDirectory = Path.GetFullPath(commandLineOptions.CodePath.FullName); + string dataDirectory = Path.Combine(solutionDirectory, DataDirectory); + string codeDirectory = Path.Combine(solutionDirectory, CodeDirectory); + if (!Directory.Exists(dataDirectory)) { - Log.Error("Directory does not exist: \"{Directory}\"", args[0]); + Log.Error("Data directory does not exist: {DataDirectory}", dataDirectory); + return 1; + } + if (!Directory.Exists(codeDirectory)) + { + Log.Error("Code directory does not exist: {CodeDirectory}", codeDirectory); return 1; } - rootDirectory = Path.GetFullPath(args[0]); - } - else - { - rootDirectory = Path.GetFullPath(AppContext.BaseDirectory); - } - - // Code directory - string codeDirectory = Path.Combine(rootDirectory, LanguageTagsDirectory); - if (!Directory.Exists(codeDirectory)) - { - Log.Error("Directory does not exist: \"{Directory}\"", codeDirectory); - return 1; - } - - // Data directory - string dataDirectory = Path.Combine(rootDirectory, LanguageDataDirectory); - Log.Information("Root directory: {RootDirectory}", 
rootDirectory); - if (!Directory.Exists(dataDirectory)) - { - Log.Information("Creating data directory: {DataDirectory}", dataDirectory); - _ = Directory.CreateDirectory(dataDirectory); - } - Log.Information("Data directory: {DataDirectory}", dataDirectory); - // Download all the data files - Log.Information("Downloading all language tag data files ..."); - Log.Information("Downloading ISO 639-2 data ..."); - await DownloadFileAsync( - new Uri(Iso6392Data.DataUri), - Path.Combine(dataDirectory, Iso6392Data.DataFileName) - ) - .ConfigureAwait(false); - Log.Information("Downloading ISO 639-3 data ..."); - await DownloadFileAsync( - new Uri(Iso6393Data.DataUri), - Path.Combine(dataDirectory, Iso6393Data.DataFileName) - ) - .ConfigureAwait(false); - Log.Information("Downloading RFC 5646 data ..."); - await DownloadFileAsync( - new Uri(Rfc5646Data.DataUri), - Path.Combine(dataDirectory, Rfc5646Data.DataFileName) - ) - .ConfigureAwait(false); - Log.Information("Language tag data files downloaded successfully."); + // Download data files + CreateTagData createTagData = new(dataDirectory, codeDirectory, cancellationToken); + await createTagData.DownloadDataAsync().ConfigureAwait(false); - // Convert data files to JSON - Log.Information("Converting data files to JSON ..."); - Log.Information("Converting ISO 639-2 data to JSON ..."); - Iso6392Data iso6392 = Iso6392Data.LoadData( - Path.Combine(dataDirectory, Iso6392Data.DataFileName) - ); - Iso6392Data.SaveJson( - Path.Combine(dataDirectory, Iso6392Data.DataFileName + ".json"), - iso6392 - ); - Log.Information("Converting ISO 639-3 data to JSON ..."); - Iso6393Data iso6393 = Iso6393Data.LoadData( - Path.Combine(dataDirectory, Iso6393Data.DataFileName) - ); - Iso6393Data.SaveJson( - Path.Combine(dataDirectory, Iso6393Data.DataFileName + ".json"), - iso6393 - ); - Log.Information("Converting RFC 5646 data to JSON ..."); - Rfc5646Data rfc5646 = Rfc5646Data.LoadData( - Path.Combine(dataDirectory, Rfc5646Data.DataFileName) - ); - 
Rfc5646Data.SaveJson( - Path.Combine(dataDirectory, Rfc5646Data.DataFileName + ".json"), - rfc5646 - ); - Log.Information("Data files converted to JSON successfully."); + // Convert data files to JSON + await createTagData.CreateJsonDataAsync().ConfigureAwait(false); - // Generate code files - Log.Information("Generating code files ..."); - Log.Information("Generating ISO 639-2 code ..."); - Iso6392Data.GenCode(Path.Combine(codeDirectory, nameof(Iso6392Data) + "Gen.cs"), iso6392); - Log.Information("Generating ISO 639-3 code ..."); - Iso6393Data.GenCode(Path.Combine(codeDirectory, nameof(Iso6393Data) + "Gen.cs"), iso6393); - Log.Information("Generating RFC 5646 code ..."); - Rfc5646Data.GenCode(Path.Combine(codeDirectory, nameof(Rfc5646Data) + "Gen.cs"), rfc5646); + // Generate code files + await createTagData.GenerateCodeAsync().ConfigureAwait(false); - return 0; + return 0; + } + catch (Exception ex) when (Log.Logger.LogAndHandle(ex)) + { + return 1; + } } } diff --git a/LanguageTagsTests/Fixture.cs b/LanguageTagsTests/Fixture.cs index 621b51b..5e35a36 100644 --- a/LanguageTagsTests/Fixture.cs +++ b/LanguageTagsTests/Fixture.cs @@ -1,8 +1,13 @@ -using System; -using System.IO; - namespace ptr727.LanguageTags.Tests; +[CollectionDefinition("DisableParallelDefinition", DisableParallelization = true)] +[System.Diagnostics.CodeAnalysis.SuppressMessage( + "Maintainability", + "CA1515:Consider making public types internal", + Justification = "https://xunit.net/docs/running-tests-in-parallel" +)] +public sealed class DisableParallelDefinition { } + internal static class Fixture // : IDisposable { // public void Dispose() => GC.SuppressFinalize(this); diff --git a/LanguageTagsTests/GlobalUsings.cs b/LanguageTagsTests/GlobalUsings.cs new file mode 100644 index 0000000..be16484 --- /dev/null +++ b/LanguageTagsTests/GlobalUsings.cs @@ -0,0 +1,6 @@ +global using System; +global using System.Collections.Generic; +global using System.IO; +global using System.Threading.Tasks; 
+global using AwesomeAssertions; +global using Xunit; diff --git a/LanguageTagsTests/Iso6392Tests.cs b/LanguageTagsTests/Iso6392Tests.cs index 209d069..d12b6bf 100644 --- a/LanguageTagsTests/Iso6392Tests.cs +++ b/LanguageTagsTests/Iso6392Tests.cs @@ -1,6 +1,3 @@ -using AwesomeAssertions; -using Xunit; - namespace ptr727.LanguageTags.Tests; public sealed class Iso6392Tests @@ -14,9 +11,9 @@ public void Create() } [Fact] - public void LoadData() + public async Task LoadData() { - Iso6392Data iso6392 = Iso6392Data.LoadData( + Iso6392Data iso6392 = await Iso6392Data.LoadDataAsync( Fixture.GetDataFilePath(Iso6392Data.DataFileName) ); _ = iso6392.Should().NotBeNull(); @@ -24,14 +21,37 @@ public void LoadData() } [Fact] - public void LoadJson() + public async Task LoadJson() { - Iso6392Data? iso6392 = Iso6392Data.LoadJson( + Iso6392Data? iso6392 = await Iso6392Data.LoadJsonAsync( Fixture.GetDataFilePath(Iso6392Data.DataFileName + ".json") ); _ = iso6392.Should().NotBeNull(); } + [Fact] + public async Task SaveJsonAsync_RoundTrip() + { + Iso6392Data iso6392 = Iso6392Data.Create(); + _ = iso6392.RecordList.Length.Should().BeGreaterThan(0); + + string tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.json"); + try + { + await Iso6392Data.SaveJsonAsync(tempFile, iso6392); + Iso6392Data? roundTrip = await Iso6392Data.LoadJsonAsync(tempFile); + _ = roundTrip.Should().NotBeNull(); + _ = roundTrip!.RecordList.Length.Should().Be(iso6392.RecordList.Length); + } + finally + { + if (File.Exists(tempFile)) + { + File.Delete(tempFile); + } + } + } + [Theory] [InlineData("afr", false, "Afrikaans")] [InlineData("af", false, "Afrikaans")] @@ -70,4 +90,14 @@ public void Find_Fail(string input) Iso6392Record? record = iso6392.Find(input, false); _ = record.Should().BeNull(); } + + [Theory] + [InlineData(null)] + [InlineData("")] + public void Find_NullOrEmpty_ReturnsNull(string? input) + { + Iso6392Data iso6392 = Iso6392Data.Create(); + Iso6392Record? 
record = iso6392.Find(input, false); + _ = record.Should().BeNull(); + } } diff --git a/LanguageTagsTests/Iso6393Tests.cs b/LanguageTagsTests/Iso6393Tests.cs index 774db8f..ecd914e 100644 --- a/LanguageTagsTests/Iso6393Tests.cs +++ b/LanguageTagsTests/Iso6393Tests.cs @@ -1,6 +1,3 @@ -using AwesomeAssertions; -using Xunit; - namespace ptr727.LanguageTags.Tests; public sealed class Iso6393Tests @@ -14,9 +11,9 @@ public void Create() } [Fact] - public void LoadData() + public async Task LoadData() { - Iso6393Data iso6393 = Iso6393Data.LoadData( + Iso6393Data iso6393 = await Iso6393Data.LoadDataAsync( Fixture.GetDataFilePath(Iso6393Data.DataFileName) ); _ = iso6393.Should().NotBeNull(); @@ -24,14 +21,37 @@ public void LoadData() } [Fact] - public void LoadJson() + public async Task LoadJson() { - Iso6393Data? iso6393 = Iso6393Data.LoadJson( + Iso6393Data? iso6393 = await Iso6393Data.LoadJsonAsync( Fixture.GetDataFilePath(Iso6393Data.DataFileName + ".json") ); _ = iso6393.Should().NotBeNull(); } + [Fact] + public async Task SaveJsonAsync_RoundTrip() + { + Iso6393Data iso6393 = Iso6393Data.Create(); + _ = iso6393.RecordList.Length.Should().BeGreaterThan(0); + + string tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.json"); + try + { + await Iso6393Data.SaveJsonAsync(tempFile, iso6393); + Iso6393Data? roundTrip = await Iso6393Data.LoadJsonAsync(tempFile); + _ = roundTrip.Should().NotBeNull(); + _ = roundTrip!.RecordList.Length.Should().Be(iso6393.RecordList.Length); + } + finally + { + if (File.Exists(tempFile)) + { + File.Delete(tempFile); + } + } + } + [Theory] [InlineData("afr", false, "Afrikaans")] [InlineData("af", false, "Afrikaans")] @@ -70,4 +90,14 @@ public void Find_Fail(string input) Iso6393Record? record = iso6393.Find(input, false); _ = record.Should().BeNull(); } + + [Theory] + [InlineData(null)] + [InlineData("")] + public void Find_NullOrEmpty_ReturnsNull(string? input) + { + Iso6393Data iso6393 = Iso6393Data.Create(); + Iso6393Record? 
record = iso6393.Find(input, false); + _ = record.Should().BeNull(); + } } diff --git a/LanguageTagsTests/LanguageLookupTests.cs b/LanguageTagsTests/LanguageLookupTests.cs index a6d6379..6cacae0 100644 --- a/LanguageTagsTests/LanguageLookupTests.cs +++ b/LanguageTagsTests/LanguageLookupTests.cs @@ -1,6 +1,3 @@ -using AwesomeAssertions; -using Xunit; - namespace ptr727.LanguageTags.Tests; public sealed class LanguageLookupTests @@ -64,4 +61,62 @@ public void IsMatch(string prefix, string tag, bool match) LanguageLookup languageLookup = new(); _ = languageLookup.IsMatch(prefix, tag).Should().Be(match); } + + [Theory] + [InlineData("en-US", "en-us", true)] + [InlineData("en-US", "EN-US", true)] + [InlineData("zh-Hans", "zh-hans", true)] + [InlineData("en-US", "en-GB", false)] + [InlineData("en", "fr", false)] + public void AreEquivalent_ComparesTagsCaseInsensitive( + string tag1, + string tag2, + bool expected + ) => _ = LanguageLookup.AreEquivalent(tag1, tag2).Should().Be(expected); + + [Theory] + [InlineData("en-latn-us", "en-US", true)] // Normalized tags match + [InlineData("zh-cmn-Hans-CN", "cmn-Hans-CN", true)] // Normalized tags match + [InlineData("en-US", "en-GB", false)] // Different regions + [InlineData("en", "fr", false)] // Different languages + public void AreEquivalentNormalized_NormalizesAndCompares( + string tag1, + string tag2, + bool expected + ) => _ = LanguageLookup.AreEquivalentNormalized(tag1, tag2).Should().Be(expected); + + [Fact] + public void Overrides_CanBeModified() + { + LanguageLookup languageLookup = new(); + _ = languageLookup.Overrides.Should().NotBeNull(); + _ = languageLookup.Overrides.Count.Should().Be(0); + + languageLookup.Overrides.Add(("custom_ietf", "custom_iso")); + _ = languageLookup.Overrides.Count.Should().Be(1); + } + + [Fact] + public void Undetermined_ConstantIsCorrect() => + _ = LanguageLookup.Undetermined.Should().Be("und"); + + [Fact] + public void IsMatch_ThrowsOnNullPrefix() + { + LanguageLookup languageLookup = 
new(); + _ = Assert + .Throws(() => languageLookup.IsMatch(null!, "en-US")) + .Should() + .NotBeNull(); + } + + [Fact] + public void IsMatch_ThrowsOnNullTag() + { + LanguageLookup languageLookup = new(); + _ = Assert + .Throws(() => languageLookup.IsMatch("en", null!)) + .Should() + .NotBeNull(); + } } diff --git a/LanguageTagsTests/LanguageTagBuilderTests.cs b/LanguageTagsTests/LanguageTagBuilderTests.cs index 3622740..1455ff8 100644 --- a/LanguageTagsTests/LanguageTagBuilderTests.cs +++ b/LanguageTagsTests/LanguageTagBuilderTests.cs @@ -1,6 +1,3 @@ -using AwesomeAssertions; -using Xunit; - namespace ptr727.LanguageTags.Tests; public sealed class LanguageTagBuilderTests @@ -94,6 +91,25 @@ public void Normalize_Pass() _ = languageTag.ToString().Should().Be("en-a-aaa-bbb-b-ccc-x-a-ccc"); } + [Fact] + public void Normalize_WithOptions_Pass() + { + Options options = new(); + + // en-Latn-GB-boont-r-extended-sequence-x-private + LanguageTag? languageTag = new LanguageTagBuilder() + .Language("en") + .Script("latn") + .Region("gb") + .VariantAdd("boont") + .ExtensionAdd('r', ["extended", "sequence"]) + .PrivateUseAdd("private") + .Normalize(options); + _ = languageTag.Should().NotBeNull(); + _ = languageTag!.Validate().Should().BeTrue(); + _ = languageTag.ToString().Should().Be("en-GB-boont-r-extended-sequence-x-private"); + } + [Fact] public void Build_Fail() { @@ -128,5 +144,63 @@ public void Build_Fail() // Extension prefix 1 char, not x languageTag = new LanguageTagBuilder().Language("en").ExtensionAdd('x', ["abcd"]).Build(); _ = languageTag.Validate().Should().BeFalse(); + + // Extension tags must not be whitespace + languageTag = new LanguageTagBuilder().Language("en").ExtensionAdd('a', [" "]).Build(); + _ = languageTag.Validate().Should().BeFalse(); + } + + [Fact] + public void VariantAddRange_AddsMultipleVariants() + { + LanguageTag languageTag = new LanguageTagBuilder() + .Language("en") + .VariantAddRange(["variant1", "variant2", "variant3"]) + .Build(); + + _ 
= languageTag.Variants.Length.Should().Be(3); + _ = languageTag.Variants[0].Should().Be("variant1"); + _ = languageTag.Variants[1].Should().Be("variant2"); + _ = languageTag.Variants[2].Should().Be("variant3"); + } + + [Fact] + public void VariantAddRange_ThrowsOnNull() + { + LanguageTagBuilder builder = new(); + _ = Assert + .Throws(() => builder.VariantAddRange(null!)) + .Should() + .NotBeNull(); + } + + [Fact] + public void ExtensionAdd_ThrowsOnNull() + { + LanguageTagBuilder builder = new(); + _ = Assert + .Throws(() => builder.ExtensionAdd('u', null!)) + .Should() + .NotBeNull(); + } + + [Fact] + public void ExtensionAdd_ThrowsOnEmpty() + { + LanguageTagBuilder builder = new(); + _ = Assert + .Throws(() => builder.ExtensionAdd('u', [])) + .Should() + .NotBeNull(); + } + + [Fact] + public void PrivateUseAddRange_ThrowsOnNull() + { + LanguageTagBuilder builder = new(); + _ = Assert + .Throws(() => builder.PrivateUseAddRange(null!)) + .Should() + .NotBeNull(); } } diff --git a/LanguageTagsTests/LanguageTagParserTests.cs b/LanguageTagsTests/LanguageTagParserTests.cs index 9f989cd..eb51c1c 100644 --- a/LanguageTagsTests/LanguageTagParserTests.cs +++ b/LanguageTagsTests/LanguageTagParserTests.cs @@ -1,6 +1,3 @@ -using AwesomeAssertions; -using Xunit; - namespace ptr727.LanguageTags.Tests; public class LanguageTagParserTests @@ -98,6 +95,7 @@ public void Normalize_Sort_Pass(string tag, string parsed) [InlineData("en-gb-abcde-abcde")] // Variant repeats [InlineData("en-gb-a-abcd-a-abcde")] // Extension prefix repeats [InlineData("en-gb-a-abcd-abcd")] // Extension tag repeats + [InlineData("en-a- ")] // Extension tag whitespace [InlineData("en-gb-x-abcd-x-abcd")] // Private prefix repeats [InlineData("en-gb-x-abcd-abcd")] // Private tag repeats public void Parse_Fail(string tag) => _ = new LanguageTagParser().Parse(tag).Should().BeNull(); diff --git a/LanguageTagsTests/LanguageTagTests.cs b/LanguageTagsTests/LanguageTagTests.cs index 0a578fd..bc30cd2 100644 --- 
a/LanguageTagsTests/LanguageTagTests.cs +++ b/LanguageTagsTests/LanguageTagTests.cs @@ -1,6 +1,4 @@ using System.Collections.Immutable; -using AwesomeAssertions; -using Xunit; namespace ptr727.LanguageTags.Tests; @@ -19,6 +17,20 @@ public void Parse_Static_Pass(string tag) _ = languageTag.ToString().Should().Be(tag); } + [Theory] + [InlineData("en-US")] + [InlineData("zh-Hans-CN")] + [InlineData("en-latn-gb-boont-r-extended-sequence-x-private")] + [InlineData("x-all-private")] + public void Parse_WithOptions_Pass(string tag) + { + Options options = new(); + LanguageTag? languageTag = LanguageTag.Parse(tag, options); + _ = languageTag.Should().NotBeNull(); + _ = languageTag!.Validate().Should().BeTrue(); + _ = languageTag.ToString().Should().Be(tag); + } + [Theory] [InlineData("")] // Empty string [InlineData("i")] // Too short @@ -33,6 +45,21 @@ public void Parse_Static_ReturnsNull(string tag) _ = languageTag.Should().BeNull(); } + [Theory] + [InlineData("")] // Empty string + [InlineData("i")] // Too short + [InlineData("abcdefghi")] // Too long + [InlineData("en--gb")] // Empty tag + [InlineData("en-€-extension")] // Non-ASCII + [InlineData("a-extension")] // Only start with x or grandfathered + [InlineData("en-gb-x")] // Private must have parts + public void Parse_WithOptions_ReturnsNull(string tag) + { + Options options = new(); + LanguageTag? languageTag = LanguageTag.Parse(tag, options); + _ = languageTag.Should().BeNull(); + } + [Theory] [InlineData("en-US")] [InlineData("zh-Hans-CN")] @@ -46,6 +73,20 @@ public void TryParse_Success(string tag) _ = languageTag.ToString().Should().Be(tag); } + [Theory] + [InlineData("en-US")] + [InlineData("zh-Hans-CN")] + [InlineData("en-latn-gb-boont-r-extended-sequence-x-private")] + public void TryParse_WithOptions_Success(string tag) + { + Options options = new(); + bool result = LanguageTag.TryParse(tag, out LanguageTag? 
languageTag, options); + _ = result.Should().BeTrue(); + _ = languageTag.Should().NotBeNull(); + _ = languageTag!.Validate().Should().BeTrue(); + _ = languageTag.ToString().Should().Be(tag); + } + [Theory] [InlineData("")] // Empty string [InlineData("i")] // Too short @@ -62,6 +103,23 @@ public void TryParse_Failure(string tag) _ = languageTag.Should().BeNull(); } + [Theory] + [InlineData("")] // Empty string + [InlineData("i")] // Too short + [InlineData("abcdefghi")] // Too long + [InlineData("en--gb")] // Empty tag + [InlineData("en-€-extension")] // Non-ASCII + [InlineData("a-extension")] // Only start with x or grandfathered + [InlineData("en-gb-x")] // Private must have parts + [InlineData("x")] // Private missing + public void TryParse_WithOptions_Failure(string tag) + { + Options options = new(); + bool result = LanguageTag.TryParse(tag, out LanguageTag? languageTag, options); + _ = result.Should().BeFalse(); + _ = languageTag.Should().BeNull(); + } + [Fact] public void CreateBuilder_Pass() { @@ -164,6 +222,28 @@ public void ParseAndNormalize_InvalidTag_ReturnsNull() _ = result.Should().BeNull(); } + [Theory] + [InlineData("en-latn-us", "en-US")] + [InlineData("zh-cmn-Hans-CN", "cmn-Hans-CN")] + public void ParseAndNormalize_WithOptions_ValidTag_ReturnsNormalized( + string tag, + string expected + ) + { + Options options = new(); + LanguageTag? result = LanguageTag.ParseAndNormalize(tag, options); + _ = result.Should().NotBeNull(); + _ = result!.ToString().Should().Be(expected); + } + + [Fact] + public void ParseAndNormalize_WithOptions_InvalidTag_ReturnsNull() + { + Options options = new(); + LanguageTag? 
result = LanguageTag.ParseAndNormalize("invalid-tag", options); + _ = result.Should().BeNull(); + } + [Fact] public void IsValid_Property_ValidTag_ReturnsTrue() { @@ -532,4 +612,88 @@ public void Validate_ComplexValidTag_ReturnsTrue() _ = tag.IsValid.Should().BeTrue(); _ = tag.Validate().Should().BeTrue(); } + + [Fact] + public void ExtensionTag_DefaultConstructor_CreatesEmptyTag() + { + ExtensionTag extension = new(); + _ = extension.Prefix.Should().Be('\0'); + _ = extension.Tags.IsEmpty.Should().BeTrue(); + } + + [Fact] + public void ExtensionTag_EmptyTags_ToStringReturnsEmpty() + { + ExtensionTag extension = new('u', []); + _ = extension.ToString().Should().Be(string.Empty); + } + + [Fact] + public void ExtensionTag_EnumerableConstructor_CreatesTag() + { + List<string> tags = ["tag1", "tag2", "tag3"]; + ExtensionTag extension = new('u', tags); + + _ = extension.Prefix.Should().Be('u'); + _ = extension.Tags.Length.Should().Be(3); + _ = extension.Tags[0].Should().Be("tag1"); + _ = extension.Tags[1].Should().Be("tag2"); + _ = extension.Tags[2].Should().Be("tag3"); + } + + [Fact] + public void ExtensionTag_RecordEquality_WorksCorrectly() + { + ExtensionTag ext1 = new('u', ["ca", "buddhist"]); + ExtensionTag ext2 = new('U', ["CA", "BUDDHIST"]); + ExtensionTag ext3 = new('t', ["ca", "buddhist"]); + + _ = ext1.Equals(ext2).Should().BeTrue(); + _ = (ext1 == ext2).Should().BeTrue(); + _ = ext1.GetHashCode().Should().Be(ext2.GetHashCode()); + + _ = ext1.Equals(ext3).Should().BeFalse(); + _ = (ext1 != ext3).Should().BeTrue(); + } + + [Fact] + public void PrivateUseTag_DefaultConstructor_CreatesEmptyTag() + { + PrivateUseTag privateUse = new(); + _ = privateUse.Tags.IsEmpty.Should().BeTrue(); + _ = privateUse.ToString().Should().Be(string.Empty); + } + + [Fact] + public void PrivateUseTag_EnumerableConstructor_CreatesTag() + { + List<string> tags = ["private1", "private2"]; + PrivateUseTag privateUse = new(tags); + + _ = privateUse.Tags.Length.Should().Be(2); + _ = 
privateUse.Tags[0].Should().Be("private1"); + _ = privateUse.Tags[1].Should().Be("private2"); + } + + [Fact] + public void PrivateUseTag_RecordEquality_WorksCorrectly() + { + PrivateUseTag priv1 = new(["private1", "private2"]); + PrivateUseTag priv2 = new(["PRIVATE1", "PRIVATE2"]); + PrivateUseTag priv3 = new(["other"]); + + _ = priv1.Equals(priv2).Should().BeTrue(); + _ = (priv1 == priv2).Should().BeTrue(); + _ = priv1.GetHashCode().Should().Be(priv2.GetHashCode()); + + _ = priv1.Equals(priv3).Should().BeFalse(); + _ = (priv1 != priv3).Should().BeTrue(); + } + + [Fact] + public void PrivateUseTag_EmptyTag_ToStringReturnsEmpty() + { + PrivateUseTag privateUse = new(); + _ = privateUse.ToString().Should().Be(string.Empty); + } } diff --git a/LanguageTagsTests/LanguageTagsTests.csproj b/LanguageTagsTests/LanguageTagsTests.csproj index 8e14fb6..e36b670 100644 --- a/LanguageTagsTests/LanguageTagsTests.csproj +++ b/LanguageTagsTests/LanguageTagsTests.csproj @@ -1,32 +1,16 @@ latest-all - LanguageTagsTests - 1.0.0.0 - Pieter Viljoen - Pieter Viljoen true - Pieter Viljoen - LanguageTags unit tests - true - 1.0.0.0 - false - true - en + false + true enable - ptr727.LanguageTags.Tests - MIT - https://github.com/ptr727/LanguageTags - true ptr727.LanguageTags.Tests - snupkg net10.0 - 1.0.0.0 - all diff --git a/LanguageTagsTests/LogOptionsTests.cs b/LanguageTagsTests/LogOptionsTests.cs new file mode 100644 index 0000000..df5806c --- /dev/null +++ b/LanguageTagsTests/LogOptionsTests.cs @@ -0,0 +1,258 @@ +using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Logging.Abstractions; + +namespace ptr727.LanguageTags.Tests; + +[Collection("DisableParallelDefinition")] +public sealed class LogOptionsTests +{ + [Fact] + public void CreateLogger_UsesFactory_WhenFactorySet() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + ILogger originalLogger = LogOptions.Logger; + using TestLoggerFactory testFactory = new(); + TestLogger testLogger = new(); + + try + { + 
LogOptions.LoggerFactory = testFactory; + LogOptions.Logger = testLogger; + + ILogger logger = LogOptions.CreateLogger("category"); + + _ = logger.Should().BeSameAs(testFactory.Logger); + _ = testFactory.LastCategory.Should().Be("category"); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + LogOptions.Logger = originalLogger; + } + } + + [Fact] + public void CreateLogger_UsesLogger_WhenFactoryDefault() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + ILogger originalLogger = LogOptions.Logger; + TestLogger testLogger = new(); + + try + { + LogOptions.LoggerFactory = NullLoggerFactory.Instance; + LogOptions.Logger = testLogger; + + ILogger logger = LogOptions.CreateLogger("category"); + + _ = logger.Should().BeSameAs(testLogger); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + LogOptions.Logger = originalLogger; + } + } + + [Fact] + public void CreateLogger_WithOptions_UsesOptionsFactory() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + ILogger originalLogger = LogOptions.Logger; + using TestLoggerFactory testFactory = new(); + TestLogger testLogger = new(); + Options options = new() { LoggerFactory = testFactory, Logger = testLogger }; + + try + { + LogOptions.LoggerFactory = NullLoggerFactory.Instance; + LogOptions.Logger = NullLogger.Instance; + + ILogger logger = LogOptions.CreateLogger("category", options); + + _ = logger.Should().BeSameAs(testFactory.Logger); + _ = testFactory.LastCategory.Should().Be("category"); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + LogOptions.Logger = originalLogger; + } + } + + [Fact] + public void CreateLogger_WithOptions_UsesOptionsLoggerWhenNoFactory() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + ILogger originalLogger = LogOptions.Logger; + TestLogger testLogger = new(); + Options options = new() { Logger = testLogger }; + + try + { + using TestLoggerFactory testFactory = new(); + LogOptions.LoggerFactory = 
testFactory; + LogOptions.Logger = new TestLogger(); + + ILogger logger = LogOptions.CreateLogger("category", options); + + _ = logger.Should().BeSameAs(testLogger); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + LogOptions.Logger = originalLogger; + } + } + + [Fact] + public void CreateLogger_WithOptions_FallsBackToGlobal() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + ILogger originalLogger = LogOptions.Logger; + using TestLoggerFactory testFactory = new(); + Options options = new(); + + try + { + LogOptions.LoggerFactory = testFactory; + LogOptions.Logger = new TestLogger(); + + ILogger logger = LogOptions.CreateLogger("category", options); + + _ = logger.Should().BeSameAs(testFactory.Logger); + _ = testFactory.LastCategory.Should().Be("category"); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + LogOptions.Logger = originalLogger; + } + } + + [Fact] + public void TrySetFactory_WhenUnset_ReturnsTrueAndSets() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + using TestLoggerFactory testFactory = new(); + + try + { + LogOptions.LoggerFactory = NullLoggerFactory.Instance; + + bool result = LogOptions.TrySetFactory(testFactory); + + _ = result.Should().BeTrue(); + _ = LogOptions.LoggerFactory.Should().BeSameAs(testFactory); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + } + } + + [Fact] + public void TrySetFactory_WhenAlreadySet_ReturnsFalseAndDoesNotOverwrite() + { + ILoggerFactory originalFactory = LogOptions.LoggerFactory; + using TestLoggerFactory testFactory = new(); + using TestLoggerFactory otherFactory = new(); + + try + { + LogOptions.LoggerFactory = testFactory; + + bool result = LogOptions.TrySetFactory(otherFactory); + + _ = result.Should().BeFalse(); + _ = LogOptions.LoggerFactory.Should().BeSameAs(testFactory); + } + finally + { + LogOptions.LoggerFactory = originalFactory; + } + } + + [Fact] + public void TrySetLogger_WhenUnset_ReturnsTrueAndSets() + { + 
ILogger originalLogger = LogOptions.Logger; + TestLogger testLogger = new(); + + try + { + LogOptions.Logger = NullLogger.Instance; + + bool result = LogOptions.TrySetLogger(testLogger); + + _ = result.Should().BeTrue(); + _ = LogOptions.Logger.Should().BeSameAs(testLogger); + } + finally + { + LogOptions.Logger = originalLogger; + } + } + + [Fact] + public void TrySetLogger_WhenAlreadySet_ReturnsFalseAndDoesNotOverwrite() + { + ILogger originalLogger = LogOptions.Logger; + TestLogger testLogger = new(); + TestLogger otherLogger = new(); + + try + { + LogOptions.Logger = testLogger; + + bool result = LogOptions.TrySetLogger(otherLogger); + + _ = result.Should().BeFalse(); + _ = LogOptions.Logger.Should().BeSameAs(testLogger); + } + finally + { + LogOptions.Logger = originalLogger; + } + } + + private sealed class TestLoggerFactory : ILoggerFactory + { + public ILogger Logger { get; } = new TestLogger(); + + public string? LastCategory { get; private set; } + + public void AddProvider(ILoggerProvider provider) { } + + public ILogger CreateLogger(string categoryName) + { + LastCategory = categoryName; + return Logger; + } + + public void Dispose() { } + } + + private sealed class TestLogger : ILogger + { + public IDisposable BeginScope<TState>(TState state) + where TState : notnull => NullScope.Instance; + + public bool IsEnabled(LogLevel logLevel) => true; + + public void Log<TState>( + LogLevel logLevel, + EventId eventId, + TState state, + Exception? 
exception, + Func<TState, Exception?, string> formatter + ) { } + + private sealed class NullScope : IDisposable + { + public static readonly NullScope Instance = new(); + + public void Dispose() { } + } + } +} diff --git a/LanguageTagsTests/Rfc5646Tests.cs b/LanguageTagsTests/Rfc5646Tests.cs index 839b9c6..ca98999 100644 --- a/LanguageTagsTests/Rfc5646Tests.cs +++ b/LanguageTagsTests/Rfc5646Tests.cs @@ -1,7 +1,3 @@ -using System; -using AwesomeAssertions; -using Xunit; - namespace ptr727.LanguageTags.Tests; public class Rfc5646Tests @@ -16,9 +12,9 @@ public void Create() } [Fact] - public void LoadData() + public async Task LoadData() { - Rfc5646Data rfc5646 = Rfc5646Data.LoadData( + Rfc5646Data rfc5646 = await Rfc5646Data.LoadDataAsync( Fixture.GetDataFilePath(Rfc5646Data.DataFileName) ); _ = rfc5646.Should().NotBeNull(); @@ -26,15 +22,38 @@ public void LoadData() } [Fact] - public void LoadJson() + public async Task LoadJson() { - Rfc5646Data? rfc5646 = Rfc5646Data.LoadJson( + Rfc5646Data? rfc5646 = await Rfc5646Data.LoadJsonAsync( Fixture.GetDataFilePath(Rfc5646Data.DataFileName + ".json") ); _ = rfc5646.Should().NotBeNull(); _ = rfc5646.RecordList.Length.Should().BeGreaterThan(0); } + [Fact] + public async Task SaveJsonAsync_RoundTrip() + { + Rfc5646Data rfc5646 = Rfc5646Data.Create(); + _ = rfc5646.RecordList.Length.Should().BeGreaterThan(0); + + string tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.json"); + try + { + await Rfc5646Data.SaveJsonAsync(tempFile, rfc5646); + Rfc5646Data? roundTrip = await Rfc5646Data.LoadJsonAsync(tempFile); + _ = roundTrip.Should().NotBeNull(); + _ = roundTrip!.RecordList.Length.Should().Be(rfc5646.RecordList.Length); + } + finally + { + if (File.Exists(tempFile)) + { + File.Delete(tempFile); + } + } + } + [Theory] [InlineData("af", false, "Afrikaans")] [InlineData("zh", false, "Chinese")] @@ -80,4 +99,22 @@ public void Find_Fail(string input) Rfc5646Record? 
record = rfc5646.Find(input, false); _ = record.Should().BeNull(); } + + [Theory] + [InlineData(null)] + [InlineData("")] + public void Find_NullOrEmpty_ReturnsNull(string? input) + { + Rfc5646Data rfc5646 = Rfc5646Data.Create(); + Rfc5646Record? record = rfc5646.Find(input, false); + _ = record.Should().BeNull(); + } + + [Fact] + public void FileDate_IsSet() + { + Rfc5646Data rfc5646 = Rfc5646Data.Create(); + _ = rfc5646.FileDate.Should().NotBeNull(); + _ = rfc5646.FileDate.Should().HaveValue(); + } } diff --git a/README.md b/README.md index a1479f1..6c869c1 100644 --- a/README.md +++ b/README.md @@ -2,115 +2,124 @@ C# .NET library for ISO 639-2, ISO 639-3, RFC 5646 / BCP 47 language tags. -## Build Status - -Code and Pipeline is on [GitHub](https://github.com/ptr727/LanguageTags)\ -![GitHub Last Commit](https://img.shields.io/github/last-commit/ptr727/LanguageTags?logo=github)\ -![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/ptr727/LanguageTags/publish-release.yml?logo=github) - -## NuGet Package - -Packages published on [NuGet](https://www.nuget.org/packages/ptr727.LanguageTags/)\ -![NuGet](https://img.shields.io/nuget/v/ptr727.LanguageTags?logo=nuget) - -## Version History - -- v1.1: - - .NET 10 and AOT support. - - Refactored public surfaces to minimize internals exposure. -- v1.0: - - Initial standalone release. - -## Introduction - -This project serves two primary purposes: - -- Publishing ISO 639-2, ISO 639-3, RFC 5646 language tag records in JSON and C# format. -- Code for IETF BCP 47 language tag construction and parsing per the RFC 5646 semantic rules. - -Terminology clarification: - -- An IETF BCP 47 language tag is a standardized code that is used to identify human languages on the Internet. -- The tag structure is standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47. -- RFC 5646 defines the BCP 47 language tag syntax and semantic rules. 
-- The subtags are maintained by Internet Assigned Numbers Authority (IANA) Language Subtag Registry. -- ISO 639 is a standard for classifying languages and language groups, and is maintained by the International Organization for Standardization (ISO). -- RFC 5646 incorporates ISO 639, ISO 15924, ISO 3166, and UN M.49 codes as the foundation for its language tags. - -Note that the implemented language tag parsing and normalization logic may be incomplete or inaccurate. - -Refer to [Language Tag Libraries](#language-tag-libraries) for other known implementations.\ -Refer to [References](#references) for specification details. - -## Build Artifacts - -The build [tool](./LanguageTagsCreate) downloads language tag data files, converts them into JSON files for easy consumption, and generates C# classes with all the tags for direct use in code. - -- ISO 639-2: [Source](https://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt), [Data](./LanguageData/iso6392), [JSON](./LanguageData/iso6392.json), [Code](./LanguageTags/Iso6392DataGen.cs) -- ISO 639-3: [Source](https://iso639-3.sil.org/sites/iso639-3/files/downloads/iso-639-3.tab), [Data](./LanguageData/iso6393), [JSON](./LanguageData/iso6393.json), [Code](./LanguageTags/Iso6393DataGen.cs) -- RFC 5646 : [Source](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), [Data](./LanguageData/rfc5646), [JSON](./LanguageData/rfc5646.json), [Code](./LanguageTags/rfc5646DataGen.cs) - -The data files are [updated](./LanguageTagsCreate) weekly using a scheduled [actions job](./.github/workflows/update-languagedata.yml). +## Build and Distribution + +- **Source Code**: [GitHub][github-link] - Source code, issues, discussions, and CI/CD pipelines. +- **Versioned Releases**: [GitHub Releases][releases-link] - Version tagged source code and build artifacts. +- **NuGet Packages**: [NuGet Packages][nuget-link] - .NET libraries published to NuGet.org. 
+ +### Build Status + +[![Release Status][releasebuildstatus-shield]][actions-link]\ +[![Last Commit][lastcommit-shield]][commits-link]\ +[![Last Build][lastbuild-shield]][actions-link] + +### Releases + +[![GitHub Release][releaseversion-shield]][releases-link]\ +[![GitHub Pre-Release][prereleaseversion-shield]][releases-link]\ +[![NuGet Release][nugetreleaseversion-shield]][nuget-link]\ +[![NuGet Pre-Release][nugetprereleaseversion-shield]][nuget-link] + +### Release Notes + +**Version: 1.2**: + +**Summary**: + +- Refactored the project to follow standard patterns across other projects. +- IO APIs are now async-only (`LoadDataAsync`, `LoadJsonAsync`, `SaveJsonAsync`, `GenCodeAsync`). +- Added logging support for `ILogger` or `ILoggerFactory` per class instance or statically. + +See [Release History](./HISTORY.md) for complete release notes and older versions. + +## Getting Started + +Get started with LanguageTags in two easy steps: + +1. **Add LanguageTags to your project**: + + ```shell + # Add the package to your project + dotnet add package ptr727.LanguageTags + ``` + +2. **Write some code**: + + ```csharp + LanguageLookup languageLookup = new(); + string iso = languageLookup.GetIsoFromIetf("af"); // "afr" + iso = languageLookup.GetIsoFromIetf("zh-cmn-Hant"); // "chi" + iso = languageLookup.GetIsoFromIetf("cmn-Hant"); // "chi" + ``` + + ```csharp + LanguageTag languageTag = LanguageTag.CreateBuilder() + .Language("en") + .Script("latn") + .Region("gb") + .VariantAdd("boont") + .ExtensionAdd('r', ["extended", "sequence"]) + .PrivateUseAdd("private") + .Build(); + string tag = languageTag.ToString(); // "en-latn-gb-boont-r-extended-sequence-x-private" + ``` + +See [Usage](#usage) for detailed usage instructions. 
+ +## Table of Contents + +- [LanguageTags](#languagetags) + - [Build and Distribution](#build-and-distribution) + - [Build Status](#build-status) + - [Releases](#releases) + - [Release Notes](#release-notes) + - [Getting Started](#getting-started) + - [Table of Contents](#table-of-contents) + - [Use Cases](#use-cases) + - [Usage](#usage) + - [Tag Lookup](#tag-lookup) + - [Tag Conversion](#tag-conversion) + - [Tag Matching](#tag-matching) + - [Tag Builder](#tag-builder) + - [Tag Parser](#tag-parser) + - [Tag Normalization](#tag-normalization) + - [Tag Validation](#tag-validation) + - [Installation](#installation) + - [Questions or Issues](#questions-or-issues) + - [Build Artifacts](#build-artifacts) + - [Tag Theory](#tag-theory) + - [Terminology](#terminology) + - [Format](#format) + - [References](#references) + - [Libraries](#libraries) + - [3rd Party Tools](#3rd-party-tools) + - [License](#license) + +## Use Cases + +> **ℹ️ TL;DR**: +> +> - Catalog of ISO 639-2, ISO 639-3, RFC 5646 language tags in JSON and C# record format. +> - Code for IETF BCP 47 language tag construction and parsing per the RFC 5646 semantic rules. +> +> **⚠️ Note**: The implemented language tag parsing and normalization logic may be incomplete or inaccurate. +> +> - Verify the results for your specific usage. +> - Refer to [Libraries](#libraries) for other known implementations. +> - Refer to [References](#references) for specification details. ## Usage -### Tag Format - -Refer to [RFC 5646 Section 2.1](https://www.rfc-editor.org/rfc/rfc5646#section-2.1) for complete language tag syntax and rules. 
- -IETF language tags are constructed from sub-tags in the form of: - -- Normal tags: - - `[Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]` - - Language: - - See [RFC 5646 Section 2.2.1](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.1) - - 2 - 3 alpha: Shortest ISO 639 code - - 4 alpha: Future use - - 5 - 8 alpha: Registered tag - - Extended language: - - See [RFC 5646 Section 2.2.2](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.2) - - 3 alpha: Reserved ISO 639 code - - Script: - - See [RFC 5646 Section 2.2.3](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.3) - - 4 alpha: [ISO 15924](https://unicode.org/iso15924/iso15924-codes.html) code - - Region: - - See [RFC 5646 Section 2.2.4](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.4) - - 2 alpha: [ISO 3166-1](https://en.wikipedia.org/wiki/ISO_3166-1) code - - 3 digit: [UN M.49](https://unstats.un.org/unsd/methodology/m49/) code - - Variant: - - See [RFC 5646 Section 2.2.5](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.5) - - 5 - 8 alphanumeric: Registered tag - - Extension: (`[singleton]-[extension]`) - - See [RFC 5646 Section 2.2.6](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.6) - - 1 alphanumeric: Singleton - - 2 - 8 alphanumeric: Extension - - Private Use: (`x-[private]`) - - See [RFC 5646 Section 2.2.7](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.7) - - `x`: Singleton - - 1 - 8 alphanumeric: Private use -- Grandfathered tags: - - See [RFC 5646 Section 2.2.8](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.8) - - Grandfathered tags are converted to current form tags - - E.g. `en-gb-oed` -> `en-GB-oxendict`, `i-klingon` -> `tlh`. 
-- Private use: - - All tags are private use - - `x-[private]-[private]` - -Examples: - -- `zh` : `[Language]` -- `zh-yue` : `[Language]-[Extended language]` -- `zh-yue-hk`: `[Language]-[Extended language]-[Region]` -- `hy-latn-it-arevela`: `[Language]-[Script]-[Region]-[Variant]` -- `en-a-bbb-x-a-ccc` : `[Language]-[Extension]-[Private Use]` -- `en-latn-gb-boont-r-extended-sequence-x-private` : `[Language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]` +> **ℹ️ Note**: Refer to the [Tag Theory](#tag-theory) section for an overview of terms and theory of operation. ### Tag Lookup -Tag records can be constructed by calling `Create()`, or loaded from data `LoadData()`, or loaded from JSON `LoadJson()`. The records and record collections are immutable and can safely be reused and shared across threads. +Tag records can be constructed by calling `Create()`, or loaded from data `LoadDataAsync()`, or loaded from JSON `LoadJsonAsync()`.\ +The records and record collections are immutable and can safely be reused and shared across threads. Each class implements a `Find(string languageTag, bool includeDescription)` method that will search all tags in all records for a matching tag.\ -This is mostly a convenience function and specific use cases should use specific tags. +This is mostly a convenience function, and specific use cases should use specific tags. ```csharp Iso6392Data iso6392 = Iso6392Data.Create(); @@ -123,7 +132,7 @@ record = iso6392.Find("zulu", true); ``` ```csharp -Iso6393Data iso6393 = Iso6393Data.LoadData("iso6393"); +Iso6393Data iso6393 = await Iso6393Data.LoadDataAsync("iso6393"); Iso6393Record? record = iso6393.Find("zh", false); // record.Id = "zho" // record.Part1 = "zh" @@ -134,7 +143,7 @@ record = iso6393.Find("yue chinese", true); ``` ```csharp -Rfc5646Data rfc5646 = Rfc5646Data.LoadJson("rfc5646.json"); +Rfc5646Data rfc5646 = await Rfc5646Data.LoadJsonAsync("rfc5646.json"); Rfc5646Record? 
record = rfc5646.Find("de", false); // record.SubTag = "de" // record.Description[0] = "German" @@ -167,10 +176,18 @@ iso = languageLookup.GetIsoFromIetf("cmn-Hant"); // "chi" ### Tag Matching -Tag matching can be used to select content based on preferred vs. available languages.\ -E.g. in HTTP [`Accept-Language`](https://www.rfc-editor.org/rfc/rfc9110.html#name-accept-language) and [`Content-Language`](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-language), or Matroska media stream [`LanguageIETF Element`](https://datatracker.ietf.org/doc/html/draft-ietf-cellar-matroska-07#name-language-codes). +Tag matching can be used to select content based on preferred vs. available languages. + +> **ℹ️ Examples**: +> +> - HTTP [`Accept-Language`][acceptlanguage-link] and [`Content-Language`](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-language). +> - Matroska media stream [`LanguageIETF Element`][matroskalanguage-link]. + +IETF language tags are in the form of: -IETF language tags are in the form of `[Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]`, and sub-tag matching happens left to right until a match is found. +> [Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use] + +Sub-tag matching happens left to right until a match is found. Examples: @@ -195,7 +212,8 @@ The `LanguageTagBuilder` class supports fluent builder style tag construction, a The `Build()` method will construct the tag, but will not perform any correctness validation or normalization.\ Use the `Validate()` method to test for shape correctness. See [Tag Validation](#tag-validation) for details. -The `Normalize()` method will build the tag and perform validation and normalization. See [Tag Normalization](#tag-normalization) for details. +The `Normalize()` method will build the tag and perform validation and normalization.\ +See [Tag Normalization](#tag-normalization) for details. 
```csharp LanguageTag languageTag = LanguageTag.CreateBuilder() @@ -230,14 +248,15 @@ string tag = languageTag?.ToString(); // "arb-Latn-DE-foobar-nedis" ### Tag Parser -The `LanguageTag` class static `Parse()` method will parse the text form language tag and return a constructed `LanguageTag` object, or `null` in case of parsing failure. +The `LanguageTag` class static `Parse()` method will parse the text form language tag and return a constructed `LanguageTag` object, or `null` in case of a parsing failure. Parsing will validate all subtags for correctness in type, length, and position, but not value, and case will not be modified. Grandfathered tags will be converted to their current preferred form and parsed as such.\ E.g. `en-gb-oed` -> `en-GB-oxendict`, `i-klingon` -> `tlh`. -The `Normalize()` method will parse the text tag, and perform validation and normalization. See [Tag Normalization](#tag-normalization) for details. +The `Normalize()` method will parse the text tag, and perform validation and normalization.\ +See [Tag Normalization](#tag-normalization) for details. ```csharp LanguageTag? languageTag = LanguageTag.Parse("en-latn-gb-boont-r-extended-sequence-x-private"); @@ -262,8 +281,8 @@ string tag = languageTag?.ToString(); // "en-GB-oxendict" ### Tag Normalization -The `LanguageTag` instance `Normalize()` method will convert tags to their canonical form.\ -See [RFC 5646 Section 4.5 for details](https://www.rfc-editor.org/rfc/rfc5646#section-4.5) +The `Normalize()` method will convert tags to their canonical form.\ +See [RFC 5646 Section 4.5][rfc5646section45-link] for details. 
Normalization includes the following: @@ -309,14 +328,14 @@ string normalizedString = normalizedTag?.ToString(); // "arb-Latn-DE-foobar-nedi ### Tag Validation -The `LanguageTag` class `Validate()` method will verify subtags for correctness.\ -See [RFC 5646 Section 2.1](https://www.rfc-editor.org/rfc/rfc5646#section-2.1) and [RFC 5646 Section 2.2.9](https://www.rfc-editor.org/rfc/rfc5646#section-2.2.9) for details. Refer to [Tag Format](#tag-format) for a summary. +The `Validate()` method will verify subtags for correctness.\ +See [RFC 5646 Section 2.1][rfc5646section21-link] and [RFC 5646 Section 2.2.9][rfc5646section229-link] for details. -Note that `LanguageTag` objects created by `Parse()` or `Normalize()` are already verified for form correctness during parsing, and `Validate()` is primarily of use when using the `LanguageTagBuilder` `Build()` method directly. +Note that `LanguageTag` objects created by `Parse()` or `Normalize()` are already verified for form correctness during parsing, and `Validate()` is primarily of use when using the `LanguageTagBuilder.Build()` method directly. Validation includes the following: -- Subtag shape correctness, see [Tag Format](#tag-format) for a summary. +- Subtag shape correctness, see [Format](#format) for a summary. - No duplicate variants, extension prefixes, extension tags, or private tags. - No missing subtags. 
@@ -330,53 +349,284 @@ bool isValid = languageTag.Validate(); // true isValid = languageTag.IsValid; // true ``` -## Testing +## Installation + +**Project integration**: + +```shell +# Add the package to your project +dotnet add package ptr727.LanguageTags +``` + +```csharp +// Include the namespace +using ptr727.LanguageTags; +``` + +**Debug log configuration**: + +```csharp +// Configure global logging (static fallback) +using Microsoft.Extensions.Logging; +using ptr727.LanguageTags; +using Serilog; +using Serilog.Extensions.Logging; + +Log.Logger = new LoggerConfiguration() + .MinimumLevel.Debug() + .WriteTo.Debug() + .CreateLogger(); + +ILoggerFactory loggerFactory = new SerilogLoggerFactory(Log.Logger, dispose: true); +LogOptions.SetFactory(loggerFactory); +``` + +```csharp +// Configure per-call logging (instance logger or factory) +using Microsoft.Extensions.Logging; +using ptr727.LanguageTags; +using Serilog; +using Serilog.Extensions.Logging; + +Log.Logger = new LoggerConfiguration() + .MinimumLevel.Debug() + .WriteTo.Debug() + .CreateLogger(); + +ILoggerFactory loggerFactory = new SerilogLoggerFactory(Log.Logger, dispose: true); +Options options = new() { LoggerFactory = loggerFactory }; + +LanguageTag? tag = LanguageTag.Parse("en-US", options); +LanguageLookup lookup = new(options); +``` + +## Questions or Issues + +**Tag testing**: + +- The [BCP47 language subtag lookup][r12asubtags-link] site offers convenient tag parsing and validation capabilities. +- Refer to the [unit tests](./LanguageTagsTests) for examples, do note that tests may pass but not be complete or accurate per the RFC spec. + +**General questions**: + +- Use the [Discussions][discussions-link] forum for general questions. + +**Bug reports**: + +- Ask in the [Discussions][discussions-link] forum if you are not sure if it is a bug. +- Check the existing [Issues][issues-link] tracker for known problems. 
+- If the issue is unique and a bug, file it in [Issues][issues-link], and include all pertinent steps to reproduce the issue. + +## Build Artifacts + +**Build process and artifacts**: + +- **[`LanguageTagsCreate`](./LanguageTagsCreate) project**: + - Downloads language tag data files. + - Converts the tag data into JSON files. + - Generates C# records of the tags. +- **[`LanguageData`](./LanguageData/) directory**: + - ISO 639-2: [Source](https://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt), [Data](./LanguageData/iso6392), [JSON](./LanguageData/iso6392.json), [Code](./LanguageTags/Iso6392DataGen.cs) + - ISO 639-3: [Source](https://iso639-3.sil.org/sites/iso639-3/files/downloads/iso-639-3.tab), [Data](./LanguageData/iso6393), [JSON](./LanguageData/iso6393.json), [Code](./LanguageTags/Iso6393DataGen.cs) + - RFC 5646 : [Source](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), [Data](./LanguageData/rfc5646), [JSON](./LanguageData/rfc5646.json), [Code](./LanguageTags/rfc5646DataGen.cs) +- A weekly [GitHub Actions](./.github/workflows/run-periodic-codegen-pull-request.yml) job keeps the data files up to date and automatically publishes new releases. + +## Tag Theory + +> **ℹ️ Note**: Refer to [References](#references) for complete specification details. + +### Terminology + +**Brief overview of tag terms**: + +- An IETF BCP 47 language tag is a standardized code that is used to identify human languages on the Internet. +- The tag structure is standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47. +- RFC 5646 defines the BCP 47 language tag syntax and semantic rules. +- The subtags are maintained by Internet Assigned Numbers Authority (IANA) Language Subtag Registry. +- ISO 639 is a standard for classifying languages and language groups, and is maintained by the International Organization for Standardization (ISO). 
+- RFC 5646 incorporates ISO 639, ISO 15924, ISO 3166, and UN M.49 codes as the foundation for its language tags. + +### Format + +> **ℹ️ TL;DR**: IETF language tags are constructed from sub-tags with specific rules. +> +> **ℹ️ Note**: Refer to [RFC 5646 Section 2.1][rfc5646section21-link] for complete language tag syntax and rules. + +**Normal tags**: + +> [Language]-[Extended language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use] + +- Language: + - 2 - 3 alpha: Shortest ISO 639 code + - 4 alpha: Future use + - 5 - 8 alpha: Registered tag + - See [RFC 5646 Section 2.2.1][rfc5646section221-link] +- Extended language: + - 3 alpha: Reserved ISO 639 code + - See [RFC 5646 Section 2.2.2][rfc5646section222-link] +- Script: + - 4 alpha: [ISO 15924][iso15924-link] code + - See [RFC 5646 Section 2.2.3][rfc5646section223-link] +- Region: + - 2 alpha: [ISO 3166-1][iso31661-link] code + - 3 digit: [UN M.49][unm49-link] code + - See [RFC 5646 Section 2.2.4][rfc5646section224-link] +- Variant: + - 5 - 8 alphanumeric starting with letter: Registered tag + - 4 - 8 alphanumeric starting with digit: Registered tag + - See [RFC 5646 Section 2.2.5][rfc5646section225-link] +- Extension: (`[singleton]-[extension]`) + - 1 alphanumeric: Singleton + - 2 - 8 alphanumeric: Extension + - See [RFC 5646 Section 2.2.6][rfc5646section226-link] + +**Private use tags**: + +> x-[private] + +- `x`: Singleton +- 1 - 8 alphanumeric: Private use +- See [RFC 5646 Section 2.2.7][rfc5646section227-link] + +**Grandfathered tags**: + + > [grandfathered] + +- Grandfathered tags are converted to current form tags. +- E.g. `en-gb-oed` -> `en-GB-oxendict` +- E.g. `i-klingon` -> `tlh`. 
+- See [RFC 5646 Section 2.2.8][rfc5646section228-link]
+
+**Examples**:
+
+- `zh` : `[Language]`
+- `zh-yue` : `[Language]-[Extended language]`
+- `zh-yue-hk`: `[Language]-[Extended language]-[Region]`
+- `hy-latn-it-arevela`: `[Language]-[Script]-[Region]-[Variant]`
+- `en-a-bbb-x-a-ccc` : `[Language]-[Extension]-[Private Use]`
+- `en-latn-gb-boont-r-extended-sequence-x-private` : `[Language]-[Script]-[Region]-[Variant]-[Extension]-[Private Use]`
 
-The [BCP47 language subtag lookup](https://r12a.github.io/app-subtags/) site offers convenient tag parsing and validation capabilities.
+### References
 
-Refer to [unit tests](./LanguageTagsTests) for code validation.\
-Note that testing attests to the desired behavior in code, but the implemented functionality may not be complete or accurate per the RFC 5646 specification.
+**References and documentation**:
 
-## References
+- [Wikipedia : Codes for constructed languages][wikipediacodes-link]
+- [Wikipedia : IETF language tag][ietflanguagetag-link]
+- [W3C : Choosing a Language Tag][w3cchoosingtag-link]
+- [W3C : Language tags in HTML and XML][w3ctags-link]
+- [W3C : BCP47 language subtag lookup][r12asubtags-link]
+- [IANA : Language Subtags, Tag Extensions, and Tags][ianatags-link]
+- [RFC : BCP47][bcp47-link]
+- [RFC : 4647 : Matching of Language Tags][rfc4647-link]
+- [RFC : 5646 : Tags for Identifying Languages][rfc5646-link]
+- [Unicode Consortium : Unicode Common Locale Data Repository (CLDR) Project][cldr-link]
+- [Library of Congress : ISO 639-2 Language Coding Agency][iso6392-link]
+- [SIL International : ISO 639-3 Language Coding Agency][iso6393-link]
 
-- [Wikipedia : Codes for constructed languages](https://en.wikipedia.org/wiki/Codes_for_constructed_languages)
-- [Wikipedia : IETF language tag](https://en.wikipedia.org/wiki/IETF_language_tag)
-- [W3C : Choosing a Language Tag](https://www.w3.org/International/questions/qa-choosing-language-tags)
-- [W3C : Language tags in HTML and XML](https://www.w3.org/International/articles/language-tags/)
-- [W3C : BCP47 language subtag lookup](https://r12a.github.io/app-subtags/)
-- [IANA : Language Subtags, Tag Extensions, and Tags](https://www.iana.org/assignments/language-subtags-tags-extensions/language-subtags-tags-extensions.xhtml)
-- [RFC : BCP47](https://www.rfc-editor.org/info/bcp47)
-- [RFC : 4647 : Matching of Language Tags](https://www.rfc-editor.org/info/rfc4647)
-- [RFC : 5646 : Tags for Identifying Languages](https://www.rfc-editor.org/info/rfc5646)
-- [Unicode Consortium : Unicode Common Locale Data Repository (CLDR) Project](https://cldr.unicode.org/)
-- [Library of Congress : ISO 639-2 Language Coding Agency](https://www.loc.gov/standards/iso639-2/)
-- [SIL International : ISO 639-3 Language Coding Agency](https://iso639-3.sil.org/)
+### Libraries
 
-## Language Tag Libraries
+**Other known language tag libraries**:
 
-- [github.com/rspeer/langcodes](https://github.com/rspeer/langcodes)
-- [github.com/oxigraph/oxilangtag](https://github.com/oxigraph/oxilangtag)
-- [github.com/pyfisch/rust-language-tags/](https://github.com/pyfisch/rust-language-tags/)
-- [github.com/DanSmith/languagetags-sharp](https://github.com/DanSmith/languagetags-sharp)
-- [github.com/jkporter/bcp47](https://github.com/jkporter/bcp47)
-- [github.com/mattcg/language-subtag-registry](https://github.com/mattcg/language-subtag-registry)
+- [github.com/rspeer/langcodes][rspeerlangcodes-link]
+- [github.com/oxigraph/oxilangtag][oxigraphoxilangtag-link]
+- [github.com/pyfisch/rust-language-tags/][pyfischrustlanguagetags-link]
+- [github.com/DanSmith/languagetags-sharp][dansmithlanguagetagssharp-link]
+- [github.com/jkporter/bcp47][jkporterbcp47-link]
+- [github.com/mattcg/language-subtag-registry][mattcglanguagesubtagregistry-link]
 
 ## 3rd Party Tools
 
-- [AwesomeAssertions](https://awesomeassertions.org/)
-- [Bring Your Own Badge](https://github.com/marketplace/actions/bring-your-own-badge)
-- [CSharpier](https://csharpier.com/)
-- [Create Pull Request](https://github.com/marketplace/actions/create-pull-request)
-- [GH Release](https://github.com/marketplace/actions/gh-release)
-- [Git Auto Commit](https://github.com/marketplace/actions/git-auto-commit)
-- [GitHub Actions](https://github.com/actions)
-- [GitHub Dependabot](https://github.com/dependabot)
-- [Husky.Net](https://alirezanet.github.io/Husky.Net/)
-- [Nerdbank.GitVersioning](https://github.com/marketplace/actions/nerdbank-gitversioning)
-- [Serilog](https://serilog.net/)
-- [xUnit.Net](https://xunit.net/)
+**3rd party tools used in this project**:
+
+- [AwesomeAssertions][awesomeassertions-link]
+- [Bring Your Own Badge][byob-link]
+- [Create Pull Request][createpr-link]
+- [CSharpier][csharpier-link]
+- [GH Release][ghrelease-link]
+- [Git Auto Commit][ghautocommit-link]
+- [GitHub Actions][ghactions-link]
+- [GitHub Dependabot][ghdependabot-link]
+- [Husky.Net][huskynet-link]
+- [Nerdbank.GitVersioning][nerbankgitversion-link]
+- [Serilog][serilog-link]
+- [xUnit.Net][xunit-link]
 
 ## License
 
-Licensed under the [MIT License](./LICENSE)\
-![GitHub](https://img.shields.io/github/license/ptr727/LanguageTags)
+Licensed under the [MIT License][license-link]\
+![GitHub License][license-shield]
+
+
+
+[github-link]: https://github.com/ptr727/LanguageTags
+[actions-link]: https://github.com/ptr727/LanguageTags/actions
+[discussions-link]: https://github.com/ptr727/LanguageTags/discussions
+[commits-link]: https://github.com/ptr727/LanguageTags/commits/main
+[issues-link]: https://github.com/ptr727/LanguageTags/issues
+[releases-link]: https://github.com/ptr727/LanguageTags/releases
+
+[license-link]: ./LICENSE
+[license-shield]: https://img.shields.io/github/license/ptr727/LanguageTags?label=License
+
+[lastbuild-shield]: https://byob.yarr.is/ptr727/LanguageTags/lastbuild
+[lastcommit-shield]: https://img.shields.io/github/last-commit/ptr727/LanguageTags?logo=github&label=Last%20Commit
+
+[releaseversion-shield]: https://img.shields.io/github/v/release/ptr727/LanguageTags?logo=github&label=GitHub%20Release
+[prereleaseversion-shield]: https://img.shields.io/github/v/release/ptr727/LanguageTags?include_prereleases&label=GitHub%20Pre-Release&logo=github
+[releasebuildstatus-shield]: https://img.shields.io/github/actions/workflow/status/ptr727/LanguageTags/publish-release.yml?logo=github&label=Releases%20Build
+
+[nuget-link]: https://www.nuget.org/packages/ptr727.LanguageTags/
+[nugetreleaseversion-shield]: https://img.shields.io/nuget/v/ptr727.LanguageTags?logo=nuget&label=NuGet%20Release
+[nugetprereleaseversion-shield]: https://img.shields.io/nuget/vpre/ptr727.LanguageTags?logo=nuget&&label=NuGet%20Pre-Release&color=orange
+
+
+
+[awesomeassertions-link]: https://awesomeassertions.org/
+[byob-link]: https://github.com/marketplace/actions/bring-your-own-badge
+[createpr-link]: https://github.com/marketplace/actions/create-pull-request
+[csharpier-link]: https://csharpier.com/
+[ghactions-link]: https://github.com/actions
+[ghautocommit-link]: https://github.com/marketplace/actions/git-auto-commit
+[ghdependabot-link]: https://github.com/dependabot
+[ghrelease-link]: https://github.com/marketplace/actions/gh-release
+[huskynet-link]: https://alirezanet.github.io/Husky.Net/
+[nerbankgitversion-link]: https://github.com/marketplace/actions/nerdbank-gitversioning
+[serilog-link]: https://serilog.net/
+[xunit-link]: https://xunit.net/
+
+
+
+[rfc5646section21-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.1
+[rfc5646section221-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.1
+[rfc5646section222-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.2
+[rfc5646section223-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.3
+[iso15924-link]: https://unicode.org/iso15924/iso15924-codes.html
+[rfc5646section224-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.4
+[iso31661-link]: https://en.wikipedia.org/wiki/ISO_3166-1
+[unm49-link]: https://unstats.un.org/unsd/methodology/m49/
+[rfc5646section225-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.5
+[rfc5646section226-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.6
+[rfc5646section227-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.7
+[rfc5646section228-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.8
+[r12asubtags-link]: https://r12a.github.io/app-subtags/
+[wikipediacodes-link]: https://en.wikipedia.org/wiki/Codes_for_constructed_languages
+[ietflanguagetag-link]: https://en.wikipedia.org/wiki/IETF_language_tag
+[w3cchoosingtag-link]: https://www.w3.org/International/questions/qa-choosing-language-tags
+[w3ctags-link]: https://www.w3.org/International/articles/language-tags/
+[ianatags-link]: https://www.iana.org/assignments/language-subtags-tags-extensions/language-subtags-tags-extensions.xhtml
+[rfc4647-link]: https://www.rfc-editor.org/info/rfc4647
+[rfc5646-link]: https://www.rfc-editor.org/info/rfc5646
+[iso6392-link]: https://www.loc.gov/standards/iso639-2/
+[cldr-link]: https://cldr.unicode.org/
+[iso6393-link]: https://iso639-3.sil.org/
+[bcp47-link]: https://www.rfc-editor.org/info/bcp47
+[rspeerlangcodes-link]: https://github.com/rspeer/langcodes
+[oxigraphoxilangtag-link]: https://github.com/oxigraph/oxilangtag
+[pyfischrustlanguagetags-link]: https://github.com/pyfisch/rust-language-tags/
+[dansmithlanguagetagssharp-link]: https://github.com/DanSmith/languagetags-sharp
+[jkporterbcp47-link]: https://github.com/jkporter/bcp47
+[mattcglanguagesubtagregistry-link]: https://github.com/mattcg/language-subtag-registry
+[rfc5646section229-link]: https://www.rfc-editor.org/rfc/rfc5646#section-2.2.9
+[acceptlanguage-link]: https://www.rfc-editor.org/rfc/rfc9110.html#name-accept-language
+[matroskalanguage-link]: https://datatracker.ietf.org/doc/html/draft-ietf-cellar-matroska-07#name-language-codes
+[rfc5646section45-link]: https://www.rfc-editor.org/rfc/rfc5646#section-4.5
diff --git a/version.json b/version.json
index 268e6f6..177e012 100644
--- a/version.json
+++ b/version.json
@@ -1,6 +1,6 @@
 {
   "$schema": "https://raw.githubusercontent.com/dotnet/Nerdbank.GitVersioning/master/src/NerdBank.GitVersioning/version.schema.json",
-  "version": "1.1",
+  "version": "1.2",
   "publicReleaseRefSpec": [
     "^refs/heads/main$"
   ],