diff --git a/text/0000-improved-binary-gem-support.md b/text/0000-improved-binary-gem-support.md new file mode 100644 index 0000000..d3da201 --- /dev/null +++ b/text/0000-improved-binary-gem-support.md @@ -0,0 +1,466 @@ +- Feature Name: Improved binary gem support in RubyGems +- Start Date: 2025-08-20 +- RFC PR: (leave this empty) +- Bundler Issue: (leave this empty) + +# Summary + +Add support to RubyGems for more fine-grained matching for binary gem distribution through ruby/api tags and platform tags (`gb-{abi_tags}-{platform_tags}`). This enables "platform matching" based on Ruby ABI version, os, os version, cpu architecture, and libc implementation (for generic linux platforms). + +# Motivation + +Binary gem distribution currently uses coarse-grained platform matching (e.g., "x86_64-linux") that cannot distinguish between: + +- Different Ruby language and ABI versions (Ruby 3.2 vs 3.3 native extension binaries are incompatible) +- Linux libc implementation versions (musl and glibc can introduce new symbols in new versions) + +This leads to native extension gems either failing at runtime, requiring compilation from source even when compatible binaries exist, or packaging _multiple_ extension binaries in a single gem (of which each installation will use only one). This proposal is heavily inspired by the [`wheel` format in the Python ecosystem](https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/) that addresses similar issues. 
+ +# Guide-level explanation + +When installing gems with native extensions, RubyGems will now automatically select the most appropriate pre-compiled binary based on your exact Ruby version and platform: + +```bash +$ gem install nokogiri +# On CRuby 3.3, x86_64 Linux with glibc 2.28: +# Installs nokogiri-1.16.0-gb-cr33-glibclinux_2_28_x86_64.gem (pre-compiled) + +# On Ruby 3.2, same platform: +# Falls back to nokogiri-1.16.0.gem (compiles from source) +``` + +Gem authors can build platform-specific gems: + +```sh +gem build --platform "gb-cr33-x86_64_linux" # Specific to CRuby 3.3 on Linux x86_64 +``` + + +The new binary gem platform format follows the pattern `gb-{abi_tags}-{platform_tags}`: + +- `abi_tags`: Ruby ABI version (e.g., `rb33` for Ruby 3.3, `any` for pure Ruby, `cr33` for CRuby 3.3, `jr92` for JRuby 9.2, `tr241` for TruffleRuby 24.1) +- `platform_tags`: Platform specification (e.g., `x86_64_linux`, `glibclinux_2_28_x86_64`, `arm64_darwin_23`). Based on `Gem::Platform#to_s`. + +Multiple tags can be combined with dots to express compatibility with multiple versions: + +``` +gb-rb33.rb32-x86_64_linux # Works with Ruby 3.3 or 3.2 on Linux x86_64 +``` + +# Tag Reference + +It is important to note that tags on published gems are intrinsically strings, with no specific meaning attached to them. + +Clients will generate lists of _compatible_ ruby ABI & platform tags, and will match gems based on the gem's declared tags. This reference serves to demonstrate _examples_ of the tags that RubyGems will list as compatible in the initial implementation. + +Due to this approach to tag matching, it is impossible to exhaustively enumerate every possible tag. + +## Ruby ABI Tags + +The implementation supports all Ruby interpreters with version tagging, with the most common engines getting short two-letter abbreviations. + +The `suffix` referenced below is the `static` appended in `Gem.extension_api_version` when `ENABLE_SHARED=no`. 
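As a rough illustration of the `gb-{abi_tags}-{platform_tags}` pattern and of matching a gem's declared tags against a client's ordered list of compatible pairs, here is a minimal sketch. The names `GemBinaryTags`, `parse_gem_binary`, and `best_match_index` are hypothetical and are not the proposed `Gem::Platform::GemBinary` API. Splitting the platform part on dots is also an assumption, borrowed from the wheel convention.

```ruby
# Hypothetical helpers for illustration only; not the RFC's implementation.
GemBinaryTags = Struct.new(:abi_tags, :platform_tags) do
  # Every (abi, platform) pair the gem declares support for.
  def pairs
    abi_tags.product(platform_tags)
  end
end

# Splits "gb-{abi_tags}-{platform_tags}" into its two dot-separated tag lists.
# Platform tags use underscores, so splitting on the first two dashes is enough.
def parse_gem_binary(platform_string)
  prefix, abi, platform = platform_string.split("-", 3)
  valid = prefix == "gb" && abi && platform && !abi.empty? && !platform.empty?
  raise ArgumentError, "not a gem binary platform: #{platform_string.inspect}" unless valid

  GemBinaryTags.new(abi.split("."), platform.split("."))
end

# Returns the position of the most preferred compatible pair that the gem
# declares, or nil when nothing matches. Lower positions are preferred.
def best_match_index(declared, compatible_pairs)
  compatible_pairs.index { |pair| declared.pairs.include?(pair) }
end

tags = parse_gem_binary("gb-rb33.rb32-x86_64_linux")
p tags.abi_tags      # => ["rb33", "rb32"]
p tags.platform_tags # => ["x86_64_linux"]
```

Because tags are compared as opaque strings, the client side carries all of the compatibility knowledge: a gem matches only if one of its declared pairs appears verbatim in the client's list.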
+ +### CRuby (MRI) +- **`RUBY_ENGINE`**: `ruby` +- **Format**: `cr{major}{minor}[_{suffix}]` +- **Examples**: `cr33`, `cr32`, `cr34_static` +- **Versioning**: Based on `RUBY_ENGINE_VERSION` (e.g., "3.3.1" → `cr33`) +- **Compatibility**: Matches CRuby major.minor versions +- **Caveat**: If the ABI version does not have `0` as its patch version, the patch version will also be included + +### JRuby +- **`RUBY_ENGINE`**: `jruby` +- **Format**: `jr{major}{minor}[_{suffix}]` +- **Examples**: `jr92`, `jr93`, `jr94_static` +- **Versioning**: Based on JRuby engine version (e.g., "9.2.8.0" → `jr92`) +- **Compatibility**: Matches JRuby major.minor engine versions + +### TruffleRuby +- **`RUBY_ENGINE`**: `truffleruby` +- **Format**: `tr{major}{minor}[_{suffix}]` +- **Examples**: `tr241`, `tr250`, `tr241_static` +- **Versioning**: Based on TruffleRuby version (e.g., "24.1.0" → `tr241`) +- **Compatibility**: Matches TruffleRuby major.minor versions + +### Other Ruby Engines +- **Format**: `{engine}{major}{minor}[_{suffix}]` +- **Examples**: `mysupercoolnewruby10`, `mysupercoolnewruby11`, `mysupercoolnewruby20_static` + +### Generic Ruby Tags +- **Format**: `rb{major}[{minor}]` +- **Examples**: `rb33`, `rb32`, `rb3` (any Ruby 3.x) +- **Compatibility**: Cross-interpreter compatibility for gems that work across Ruby implementations. This represents compatibility _only_ with a specific `RUBY_VERSION`, and implies that all `RUBY_ENGINE`s are supported. +- **Caveat**: This represents more of an API than ABI, but it is simplest to call this set of tags "ABI tags" +- `rb` is plain Ruby, not specific to any implementation, so matching against versions older than the currently running Ruby should be safe because such gems contain nothing precompiled +- This tag allows moving the `Gem::Specification#required_ruby_version` into the platform information, so you could have a different gem for 3.3 vs 3.2. 
An example of where this could be helpful is if the dependencies need to change between Ruby versions. + + +### Universal Tags + +- `any`: Compatible with any Ruby interpreter and version (pure Ruby gems that work on all Ruby versions) + +## Platform Tags + +All existing `Gem::Platform`s are transformed into platform tags via `to_s.gsub(/[.-]/, "_")` + +### Darwin/macOS Platforms +- **Standard**: `{arch}_darwin_{version}` (e.g., `x86_64_darwin_22`, `arm64_darwin_23`) +- **Universal**: `universal_darwin_{version}` (fat binaries for multiple architectures) +- **Legacy**: `{arch}_darwin` (version-agnostic, lower priority) + +### Linux Platforms + +#### Standard Linux Tags +- **Format**: `{arch}_linux` (e.g., `x86_64_linux`, `aarch64_linux`, `i686_linux`) +- **Compatibility**: Works with any Linux distribution but requires compatible libc + +#### GlibcLinux Tags (glibc-based) + +- **Format**: `glibclinux_{glibc_major}_{glibc_minor}_{arch}` +- **Examples**: + - `glibclinux_2_28_x86_64` (glibc 2.28+, x86_64) + - `glibclinux_2_17_aarch64` (glibc 2.17+, ARM64) + - `glibclinux_2_35_i686` (glibc 2.35+, 32-bit x86) +- **Compatibility**: Forward-compatible with newer glibc versions +- **Caveat**: Only the `_{major}_{minor}` format from PEP 600 is supported. The legacy manylinux formats (`_1`, `_2010`, `_2014`) are not recognized. 
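To make the tag transformation and the "forward-compatible with newer glibc versions" rule concrete, here is a minimal sketch. The helper names `platform_tag` and `glibc_tag_compatible?` and the `GLIBC_TAG` pattern are hypothetical and are not the proposed `Gem::Platform::GlibcLinux` API. The local glibc version and architecture are passed in explicitly rather than detected.

```ruby
# Hypothetical helpers for illustration only; not the RFC's implementation.

# Mirrors the transformation quoted above: dots and dashes become underscores.
def platform_tag(platform_string)
  platform_string.gsub(/[.-]/, "_")
end

# glibclinux_{glibc_major}_{glibc_minor}_{arch}
GLIBC_TAG = /\Aglibclinux_(\d+)_(\d+)_(.+)\z/

# True when the tag's architecture equals the local one and the local glibc
# is at least as new as the minimum version named in the tag.
def glibc_tag_compatible?(tag, local_glibc:, local_arch:)
  match = GLIBC_TAG.match(tag)
  return false unless match

  required = [match[1].to_i, match[2].to_i]
  local = local_glibc.split(".").first(2).map(&:to_i)
  # Arrays compare element by element, so [2, 31] sorts after [2, 28].
  match[3] == local_arch && (local <=> required) >= 0
end

puts platform_tag("arm64-darwin-23") # prints arm64_darwin_23
puts glibc_tag_compatible?("glibclinux_2_28_x86_64", local_glibc: "2.31", local_arch: "x86_64") # prints true
puts glibc_tag_compatible?("glibclinux_2_28_x86_64", local_glibc: "2.17", local_arch: "x86_64") # prints false
```

Comparing versions as integer pairs rather than strings avoids ordering mistakes such as treating "2.9" as newer than "2.28".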
+ +#### Musllinux Tags (musl-based) + +- **Format**: `musllinux_{musl_major}_{musl_minor}_{arch}` +- **Examples**: + - `musllinux_1_2_x86_64` (musl 1.2+, x86_64) + - `musllinux_1_1_aarch64` (musl 1.1+, ARM64) +- **Compatibility**: Forward-compatible with newer musl versions + +### Windows Platforms +- **Standard**: `{arch}_mingw32` (e.g., `x64_mingw32`, `x86_mingw32`) +- **MSVC**: `{arch}_mswin64` (e.g., `x64_mswin64`) + +### WebAssembly Platforms +- **WASI**: `wasm32_wasi` (WebAssembly System Interface) +- **Future WASM**: `wasm32_wasip1`, `wasm32_wasip2` (WASI preview versions) + +### Other Platforms +- **FreeBSD**: `{arch}_freebsd_{version}` (e.g., `x86_64_freebsd_14`) +- **OpenBSD**: `{arch}_openbsd_{version}` +- **NetBSD**: `{arch}_netbsd_{version}` +- **Solaris**: `{arch}_solaris_{version}` +- **AIX**: `{arch}_aix_{version}` + +### Universal Platform Tags +- `any`: Compatible with any platform (pure Ruby, no native extensions) + +## Examples + +### Multi-Engine Compatibility +``` +gb-cr33.jr93.tr241-x86_64_linux # CRuby 3.3, JRuby 9.3, or TruffleRuby 24.1 +gb-rb33-glibclinux_2_28_x86_64 # Any Ruby 3.3 implementation +gb-any-any # Universal compatibility +``` + +### Platform-Specific Builds +``` +gb-cr33-arm64_darwin_23 # CRuby 3.3 on macOS 14+ (Apple Silicon) +gb-jr93-x64_mingw32 # JRuby 9.3 on Windows x64 +gb-tr241-musllinux_1_2_x86_64 # TruffleRuby 24.1 on Alpine Linux +``` + +### Backward Compatibility +``` +gb-rb33.rb32.rb31-x86_64_linux # Ruby 3.1, 3.2, or 3.3 +gb-cr32-glibclinux_2_17_x86_64 # Any CRuby 3.2.x with old glibc +``` + +### Supported tag pairs on my machine + +From running `puts Gem::Platform::Specific.local; pp Gem::Platform::Specific.local.each_possible_match.to_a`, this is the (ordered) list of supported tag pairs. Pairs that appear earlier in the list are preferred over pairs that appear later. 
+ +``` +arm64-darwin-24 v:1 engine:ruby engine_version:3.4.4 ruby_version:3.4.4 abi_version:3.4.0-static +[["cr34_static", "arm64_darwin_24"], + ["cr34_static", "universal_darwin_24"], + ["cr34_static", "arm64_darwin"], + ["cr34_static", "universal_darwin"], + ["cr34_static", "arm64_darwin_23"], + ["cr34_static", "universal_darwin_23"], + ["cr34_static", "arm64_darwin_22"], + ["cr34_static", "universal_darwin_22"], + ["cr34_static", "arm64_darwin_21"], + ["cr34_static", "universal_darwin_21"], + ["cr34_static", "arm64_darwin_20"], + ["cr34_static", "universal_darwin_20"], + ["cr34_static", "darwin"], + ["rb34", "arm64_darwin_24"], + ["rb34", "universal_darwin_24"], + ["rb34", "arm64_darwin"], + ["rb34", "universal_darwin"], + ["rb34", "arm64_darwin_23"], + ["rb34", "universal_darwin_23"], + ["rb34", "arm64_darwin_22"], + ["rb34", "universal_darwin_22"], + ["rb34", "arm64_darwin_21"], + ["rb34", "universal_darwin_21"], + ["rb34", "arm64_darwin_20"], + ["rb34", "universal_darwin_20"], + ["rb34", "darwin"], + ["rb3", "arm64_darwin_24"], + ["rb3", "universal_darwin_24"], + ["rb3", "arm64_darwin"], + ["rb3", "universal_darwin"], + ["rb3", "arm64_darwin_23"], + ["rb3", "universal_darwin_23"], + ["rb3", "arm64_darwin_22"], + ["rb3", "universal_darwin_22"], + ["rb3", "arm64_darwin_21"], + ["rb3", "universal_darwin_21"], + ["rb3", "arm64_darwin_20"], + ["rb3", "universal_darwin_20"], + ["rb3", "darwin"], + ["rb33", "arm64_darwin_24"], + ["rb33", "universal_darwin_24"], + ["rb33", "arm64_darwin"], + ["rb33", "universal_darwin"], + ["rb33", "arm64_darwin_23"], + ["rb33", "universal_darwin_23"], + ["rb33", "arm64_darwin_22"], + ["rb33", "universal_darwin_22"], + ["rb33", "arm64_darwin_21"], + ["rb33", "universal_darwin_21"], + ["rb33", "arm64_darwin_20"], + ["rb33", "universal_darwin_20"], + ["rb33", "darwin"], + ["rb32", "arm64_darwin_24"], + ["rb32", "universal_darwin_24"], + ["rb32", "arm64_darwin"], + ["rb32", "universal_darwin"], + ["rb32", "arm64_darwin_23"], + ["rb32", 
"universal_darwin_23"], + ["rb32", "arm64_darwin_22"], + ["rb32", "universal_darwin_22"], + ["rb32", "arm64_darwin_21"], + ["rb32", "universal_darwin_21"], + ["rb32", "arm64_darwin_20"], + ["rb32", "universal_darwin_20"], + ["rb32", "darwin"], + ["rb31", "arm64_darwin_24"], + ["rb31", "universal_darwin_24"], + ["rb31", "arm64_darwin"], + ["rb31", "universal_darwin"], + ["rb31", "arm64_darwin_23"], + ["rb31", "universal_darwin_23"], + ["rb31", "arm64_darwin_22"], + ["rb31", "universal_darwin_22"], + ["rb31", "arm64_darwin_21"], + ["rb31", "universal_darwin_21"], + ["rb31", "arm64_darwin_20"], + ["rb31", "universal_darwin_20"], + ["rb31", "darwin"], + ["rb30", "arm64_darwin_24"], + ["rb30", "universal_darwin_24"], + ["rb30", "arm64_darwin"], + ["rb30", "universal_darwin"], + ["rb30", "arm64_darwin_23"], + ["rb30", "universal_darwin_23"], + ["rb30", "arm64_darwin_22"], + ["rb30", "universal_darwin_22"], + ["rb30", "arm64_darwin_21"], + ["rb30", "universal_darwin_21"], + ["rb30", "arm64_darwin_20"], + ["rb30", "universal_darwin_20"], + ["rb30", "darwin"], + ["rb34", "any"], + ["rb3", "any"], + ["rb33", "any"], + ["rb32", "any"], + ["rb31", "any"], + ["rb30", "any"], + ["any", "arm64_darwin_24"], + ["any", "universal_darwin_24"], + ["any", "arm64_darwin"], + ["any", "universal_darwin"], + ["any", "arm64_darwin_23"], + ["any", "universal_darwin_23"], + ["any", "arm64_darwin_22"], + ["any", "universal_darwin_22"], + ["any", "arm64_darwin_21"], + ["any", "universal_darwin_21"], + ["any", "arm64_darwin_20"], + ["any", "universal_darwin_20"], + ["any", "darwin"], + ["any", "any"]] +``` + +# Reference-level explanation + +The implementation adds four new classes to `lib/rubygems/platform/`: + +**`Gem::Platform::GemBinary`** - Parses gem binary-format platform strings (`gb-{abi}-{platform}`) +- Handles multi-tag specifications (e.g., `rb33.rb32` for multiple Ruby versions) +- Provides matching logic against Ruby environments +- Expands tag combinations for compatibility 
checking + +**`Gem::Platform::Specific`** - Represents the current Ruby environment with precision +- Detects Ruby engine, version, and ABI version +- Identifies Linux libc type and version through ELF analysis +- Generates appropriate compatible gem binary tags for the environment + +**`Gem::Platform::ELFFile`** - Analyzes ELF binaries on Linux +- Reads interpreter path to identify musl vs glibc +- Uses `Etc::CS_GNU_LIBC_VERSION` or `ldd` to determine minimum glibc version +- Uses the musl interpreter to determine minimum musl version +- Provides platform detection without external dependencies + +**`Gem::Platform::GlibcLinux`/`Gem::Platform::MuslLinux`** - Linux compatibility standards +- Implements a pared down equivalent of the PEP 600 (manylinux) and PEP 656 (musllinux) specifications +- Maps glibc/musl versions to compatibility tags +- Allows for compact representation of cross-distribution compatible binaries + +## Platform Resolution Changes + +The platform matching system now uses `Gem::Platform::Specific.local` instead of `Gem::Platform.local` for more precise matching. The `sort_priority` method in `Gem::Platform` now assigns: + +1. Ruby platform: priority -1 (highest, for pure Ruby gems) +2. gem binary platforms: priority 2 (preferred for binary gems) +3. Traditional platforms: priority 1 (fallback for legacy gems) + +This ensures matching gem binary-platformed gems are preferred when available while maintaining backward compatibility. + +# Drawbacks + +- **Increased complexity** - The resolver must handle more platform variants, though this complexity is isolated to the new gem binary classes. +- **Storage requirements** - Platform-specific gems will increase repository storage needs on rubygems.org. +- **Build matrix explosion** - Gem authors targeting multiple platforms need to build many more gem variants. 
+ +# Rationale and Alternatives + +The gem binary format was chosen because: + +- It's a proven solution in the Python ecosystem with years of real-world usage +- It provides the precision needed for Ruby ABI and libc compatibility +- The format is extensible for future platform requirements +- It maintains full backward compatibility with existing gems + +Alternatives considered: + +- **Extending the current platform system** - Would break backward compatibility for existing gems (e.g. loading specs or index files would blow up on old bundler/rubygems versions) +- **Runtime detection only** - Would fail after installation rather than selecting the right gem upfront + +# Unresolved questions + +- Should rubygems.org automatically build gem binary variants when gems with native extensions are pushed? +- What tooling should we provide to help gem authors build for multiple platforms? + +# Future Directions + +## Bundler Integration + +Initial testing reveals that Bundler's resolver needs updates to handle `Gem::Platform::GemBinary` objects: + +**Required Bundler Changes**: + +1. **Platform Object Handling** - Bundler's resolver currently expects `Gem::Platform` objects but receives `Gem::Platform::GemBinary` objects, causing `ArgumentError: invalid argument #`. The resolver needs to: + - Accept `Gem::Platform::GemBinary` in platform comparison methods + - Update `Bundler::Resolver#search_for` to handle gem binary platform matching + - Modify `Bundler::LazySpecification` platform validation + +2. **Lockfile Integration** - The `Bundler::LockfileGenerator` needs updates to: + - Serialize gem binary platforms correctly in gem specifications + - Record appropriate platforms in the PLATFORMS section + - Handle gem binary platform parsing when reading lockfiles + +3. 
**Platform Resolution Priority** - The resolver's `sort_by` logic in `search_for` uses `Gem::Platform.platform_specificity_match`, which needs gem binary platform awareness to properly prioritize gem binary gems over traditional platform gems. + +**PLATFORMS Section Recording** - The key unresolved question is how specific platform information should be recorded in the PLATFORMS section: + +```ruby +# Option 1: Record exact gem binary platforms (most specific) +PLATFORMS + gb-rb33-glibclinux_2_28_x86_64 + gb-rb33-x86_64_darwin23 + +# Option 2: Record specific platforms with Ruby version metadata +PLATFORMS + x86_64-linux (ruby 3.3.0) + x86_64-darwin23 (ruby 3.3.0) + +# Option 3: Hybrid approach with both formats +PLATFORMS + x86_64-linux + gb-rb33-glibclinux_2_28_x86_64 +``` + +The choice impacts lockfile portability - more specific platforms ensure exact binary reproducibility but may cause unnecessary gem recompilation when moving between similar environments. + +## Gemspec DSL for Platform-Specific Configuration + +Current gemspecs require Ruby-level conditionals to handle platform differences, leading to complex and error-prone code. 
A declarative DSL would improve maintainability and enable better tooling: + +**Current Approach** (using conditionals): + +```ruby +Gem::Specification.new do |spec| + if RUBY_PLATFORM =~ /linux/ + spec.add_dependency "linux-specific-gem" + spec.extensions << "ext/linux/extconf.rb" + elsif RUBY_PLATFORM =~ /darwin/ + spec.add_dependency "macos-specific-gem" + spec.extensions << "ext/darwin/extconf.rb" + end + + spec.files = Dir["lib/**/*"] + spec.files += Dir["ext/windows/**/*"] if RUBY_PLATFORM =~ /mingw/ +end +``` + +**Proposed DSL Approach**: + +```ruby +Gem::Specification.new do |spec| + spec.platform_specific platform: "x86_64_linux", abi: ["rb33", "rb32"] do |pl| + pl.add_runtime_dependency "linux-specific-gem" + pl.extensions << "ext/linux/extconf.rb" + pl.files.include "ext/linux/**/*" + end + + spec.platform_specific platform: ["x86_64_darwin", "arm64_darwin"], abi: "any" do |pl| + pl.add_runtime_dependency "macos-specific-gem" + pl.extensions << "ext/darwin/extconf.rb" + pl.files.include "ext/darwin/**/*" + end + + spec.platform_specific platform: "mingw32" do |pl| + pl.files.include "ext/windows/**/*" + pl.files.exclude "ext/linux/**/*", "ext/darwin/**/*" + end + + # Support glibclinux/musllinux tags + spec.platform_specific platform: ["glibclinux_2_28_x86_64", "musllinux_1_1_x86_64"] do |pl| + pl.add_runtime_dependency "linux-binary-compat" + end + + # Common files for all platforms + spec.files = Dir["lib/**/*"] +end +``` + +This DSL would: + +- Match against specific gem binary tag components (abi, platform) rather than string patterns +- Generate separate `.gemspec` files for each platform at build time +- Enable static analysis of platform requirements +- Support arrays for matching multiple tags +- Provide clear separation between common and platform-specific code + +**Build-Time Benefits** - The DSL would enable `gem build --platform=gb-cr33-x86_64_linux` to automatically select the correct dependencies, extensions, and files without executing 
platform detection code. Additionally, it would allow Bundler to support multiple platforms for a `gemspec` gem without clobbering the `Gemfile.lock`. + +## Additional Future Work + +**Automated Multi-Platform Building** - Provide GitHub Actions and CI templates for building gems across multiple Ruby versions and platforms automatically. + +## Inspiration + +- [PEP 600](https://peps.python.org/pep-0600/) +- [PEP 656](https://peps.python.org/pep-0656/) +- [Python packaging platform compatibility tags](https://packaging.python.org/en/latest/specifications/platform-compatibility-tags/)