
Add layout logic to replace duplicate files with links (Linux/MacOS only) #52044

Merged
MichaelSimons merged 14 commits into dotnet:main from MichaelSimons:file-deduplication
Feb 4, 2026

Conversation

@MichaelSimons
Member

@MichaelSimons MichaelSimons commented Dec 5, 2025

Summary

This PR reduces SDK size by deduplicating assemblies in the SDK layout using symbolic links on Linux and macOS. Duplicate .dll and .exe files are identified by content hash and replaced with symbolic links pointing to a single "master" copy.

This PR only affects the Linux and macOS tarballs. Installers will be handled in a separate PR.
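The content-hash grouping described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's task code: the names DuplicateFinder and FindDuplicateGroups are hypothetical, and SHA-256 from the base class library stands in for the XxHash64 hash (System.IO.Hashing) that the task uses, so the sketch needs no extra package. It requires .NET 6 or later for FileInfo.LinkTarget.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch: group byte-identical .dll/.exe files under a layout root.
// Assumption: SHA-256 is a stand-in for the XxHash64 hash used by the PR.
public static class DuplicateFinder
{
    public static List<List<string>> FindDuplicateGroups(string layoutRoot)
    {
        var byHash = new Dictionary<string, List<string>>(StringComparer.Ordinal);

        foreach (string path in Directory.EnumerateFiles(layoutRoot, "*", SearchOption.AllDirectories))
        {
            string extension = Path.GetExtension(path);
            bool isAssembly =
                extension.Equals(".dll", StringComparison.OrdinalIgnoreCase) ||
                extension.Equals(".exe", StringComparison.OrdinalIgnoreCase);

            // Only assemblies are candidates; files that are already links are skipped.
            if (!isAssembly || new FileInfo(path).LinkTarget != null)
            {
                continue;
            }

            // Identical content yields an identical hash, so the hash is the grouping key.
            string hash = Convert.ToHexString(SHA256.HashData(File.ReadAllBytes(path)));

            if (!byHash.TryGetValue(hash, out var files))
            {
                files = new List<string>();
                byHash[hash] = files;
            }
            files.Add(path);
        }

        // A group is a set of duplicates only if it holds two or more files.
        return byHash.Values.Where(candidates => candidates.Count > 1).ToList();
    }
}
```

Hashing reads every candidate file in full; a real implementation could first group by file length and hash only files whose size is shared with at least one other file.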

Key Changes

  • New DeduplicateAssembliesWithLinks MSBuild task that replaces duplicate assemblies with relative symbolic links
  • End-to-end tests verifying that archives and installers preserve symbolic links

Why Symbolic Links Instead of Hard Links?

Hard links would be preferred, but they are blocked by dotnet/arcade#16453, which affects RPM packaging. Symbolic links work correctly with the current packaging infrastructure, so they are used until that issue is resolved.
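Unlike a hard link, a symbolic link stores a path, so the target should be relative to the link's own directory for the link to keep resolving after a tarball is extracted somewhere else. The sketch below shows one way to do that replacement with File.CreateSymbolicLink (.NET 6+) and Path.GetRelativePath; LinkReplacer and ReplaceWithRelativeSymlink are hypothetical names, not the PR's implementation.

```csharp
using System;
using System.IO;

// Hypothetical helper (not the PR's code): replace a duplicate with a relative
// symbolic link to the master copy. Requires .NET 6+ for File.CreateSymbolicLink.
public static class LinkReplacer
{
    public static void ReplaceWithRelativeSymlink(string duplicatePath, string masterPath)
    {
        string fullDuplicate = Path.GetFullPath(duplicatePath);
        string linkDirectory = Path.GetDirectoryName(fullDuplicate)
            ?? throw new ArgumentException("Duplicate path has no parent directory.", nameof(duplicatePath));

        // Relative to the link's own directory (e.g. "../Foo.dll"), so the link still
        // resolves when the archive is extracted to a different location.
        string relativeTarget = Path.GetRelativePath(linkDirectory, Path.GetFullPath(masterPath));

        File.Delete(fullDuplicate);
        File.CreateSymbolicLink(fullDuplicate, relativeTarget);
    }
}
```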

Why No Windows Support?

Windows deduplication requires signing infrastructure changes and is blocked by dotnet/arcade#16484. This will be addressed in #52182.

Related Issues

@MichaelSimons MichaelSimons force-pushed the file-deduplication branch 6 times, most recently from e9c827c to c244f53 Compare January 12, 2026 16:23
@MichaelSimons MichaelSimons force-pushed the file-deduplication branch 2 times, most recently from 8dde6ac to 7a5f9bc Compare January 12, 2026 20:43
@MichaelSimons MichaelSimons force-pushed the file-deduplication branch 4 times, most recently from 707bf9f to 8cdddbf Compare January 22, 2026 15:30
@MichaelSimons MichaelSimons changed the title [WIP] Add layout logic to replace duplicate files with hard links [WIP] Add layout logic to replace duplicate files with links (linux only) Jan 22, 2026
@MichaelSimons MichaelSimons force-pushed the file-deduplication branch 4 times, most recently from 6ca1b07 to 5f75bc1 Compare January 22, 2026 16:47
@MichaelSimons MichaelSimons marked this pull request as ready for review January 29, 2026 14:29
@MichaelSimons MichaelSimons requested review from a team and Copilot January 29, 2026 14:29
/// <summary>
/// Shared helpers for verifying symbolic links and extracting archives in tests.
/// </summary>
internal static class SymbolicLinkHelpers
Member Author


This logic was factored into a shared helper so that it can be reused by the Linux/macOS installer tests, which are coming in a follow-up PR.
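A check in the spirit of the symbolic-link verification that SymbolicLinkHelpers provides might look like the following sketch. The class and method names (SymlinkVerification, CountValidRelativeSymlinks) are hypothetical and do not reflect the helper's actual API; the sketch counts symbolic links under an extracted directory and fails on any link whose target is absolute or does not resolve.

```csharp
using System;
using System.IO;

// Hypothetical test-side check (not the PR's SymbolicLinkHelpers API): count the
// symbolic links under a directory, failing if any link is absolute or dangling.
public static class SymlinkVerification
{
    public static int CountValidRelativeSymlinks(string directory)
    {
        int count = 0;
        foreach (string path in Directory.EnumerateFiles(directory, "*", SearchOption.AllDirectories))
        {
            string target = new FileInfo(path).LinkTarget;
            if (target == null)
            {
                continue; // regular file, not a link
            }

            if (Path.IsPathRooted(target))
            {
                throw new InvalidOperationException($"Absolute link target: {path} -> {target}");
            }

            // Resolve the relative target against the directory that contains the link.
            string linkDirectory = Path.GetDirectoryName(path) ?? directory;
            string resolved = Path.GetFullPath(Path.Combine(linkDirectory, target));
            if (!File.Exists(resolved))
            {
                throw new InvalidOperationException($"Dangling link: {path} -> {target}");
            }

            count++;
        }
        return count;
    }
}
```

A test could compare the returned count against the number of links the deduplication step is expected to create.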

Contributor

Copilot AI left a comment


Pull request overview

This PR adds automated deduplication of assemblies in the SDK layout for Linux and macOS builds by replacing duplicate .dll and .exe files with symbolic links. The implementation uses content hashing (XxHash64) to identify duplicates and selects a deterministic "master" file (closest to root, alphabetically first) to which all duplicates are linked using relative symbolic links.
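The deterministic master choice described above (closest to root, then alphabetically first) can be expressed as a two-key sort. This sketch is an assumption about how such an ordering could be written, not the task's actual code: MasterSelector and SelectMaster are hypothetical names, depth is measured as the number of directory separators in the root-relative path, and ties are broken with ordinal (culture-independent) comparison.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch: pick the master copy from a group of identical files.
// Order by depth below the layout root, then by ordinal path order as a tie-breaker.
public static class MasterSelector
{
    public static string SelectMaster(string layoutRoot, IEnumerable<string> duplicates)
    {
        return duplicates
            .Select(path => Path.GetRelativePath(layoutRoot, path))
            .OrderBy(relative => relative.Count(c => c == Path.DirectorySeparatorChar))
            .ThenBy(relative => relative, StringComparer.Ordinal)
            .Select(relative => Path.Combine(layoutRoot, relative))
            .First();
    }
}
```

Using an ordinal comparer keeps the choice identical across machines and cultures, which matters when the same layout is built on different agents.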

Changes:

  • New MSBuild task DeduplicateAssembliesWithLinks that scans a layout directory, groups assemblies by content hash, and replaces duplicates with links
  • End-to-end tests verifying that SDK archives contain the expected number of symbolic links with relative paths
  • Test infrastructure updates to locate and verify SDK acquisition artifacts

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/Tasks/sdk-tasks/DeduplicateAssembliesWithLinks.cs | New MSBuild task implementing assembly deduplication via hard/symbolic links |
| src/Tasks/sdk-tasks/sdk-tasks.csproj | Added System.IO.Hashing package dependency for content hashing |
| src/Tasks/sdk-tasks/sdk-tasks.InTree.targets | Registered the new DeduplicateAssembliesWithLinks task for Core runtime |
| src/Tasks/sdk-tasks/ReplaceFilesWithSymbolicLinks.cs | Fixed comment to correctly state "symbolic links" instead of "hard links" |
| src/Layout/redist/targets/GenerateInstallerLayout.targets | Integrated deduplication task into build pipeline for non-Windows platforms |
| test/sdk-tasks.Tests/DeduplicateAssembliesWithLinksTests.cs | Comprehensive unit tests for the deduplication task |
| test/EndToEnd.Tests/GivenSdkArchives.cs | End-to-end test verifying deduplication in SDK archives |
| test/EndToEnd.Tests/Utilities/SymbolicLinkHelpers.cs | Shared utilities for extracting archives and verifying symbolic links |
| test/Microsoft.NET.TestFramework/TestContext.cs | Added ShippingPackagesDirectory property and helper method to locate SDK artifacts |

Comment thread test/Microsoft.NET.TestFramework/TestContext.cs
Comment thread src/Tasks/sdk-tasks/DeduplicateAssembliesWithLinks.cs Outdated
Comment thread test/sdk-tasks.Tests/DeduplicateAssembliesWithLinksTests.cs Outdated
Comment thread test/EndToEnd.Tests/GivenSdkArchives.cs Outdated
Comment thread test/Microsoft.NET.TestFramework/TestContext.cs Outdated
Comment thread src/Tasks/sdk-tasks/DeduplicateAssembliesWithLinks.cs Outdated
@MichaelSimons MichaelSimons changed the title [WIP] Add layout logic to replace duplicate files with links (Linux/MacOS only) Add layout logic to replace duplicate files with links (Linux/MacOS only) Jan 30, 2026
Comment thread src/Tasks/sdk-tasks/sdk-tasks.InTree.targets Outdated
Co-authored-by: Chet Husk <baronfel@users.noreply.github.com>
Member

@dsplaisted dsplaisted left a comment


Looks nice!

Comment thread src/Layout/redist/targets/GenerateInstallerLayout.targets Outdated
Comment thread src/Tasks/sdk-tasks/ReplaceFilesWithSymbolicLinks.cs
Comment thread src/Tasks/sdk-tasks/sdk-tasks.InTree.targets
Comment thread src/Tasks/sdk-tasks/DeduplicateAssembliesWithLinks.cs Outdated
Comment thread src/Tasks/sdk-tasks/DeduplicateAssembliesWithLinks.cs Outdated
Comment thread test/sdk-tasks.Tests/DeduplicateAssembliesWithLinksTests.cs Outdated
@MichaelSimons MichaelSimons merged commit b664a8b into dotnet:main Feb 4, 2026
25 checks passed
@MichaelSimons MichaelSimons deleted the file-deduplication branch February 4, 2026 14:35


Development

Successfully merging this pull request may close these issues.

[Deduplicate SDK Files] Update SDK layout to utilize links for duplicate files

5 participants