Skip to content

Expand value interning optimization and fix XML factory transformer usage#2495

Merged
gnodet merged 1 commit intoapache:masterfrom
gnodet:feature/expand-value-interning
Jul 17, 2025
Merged

Expand value interning optimization and fix XML factory transformer usage#2495
gnodet merged 1 commit intoapache:masterfrom
gnodet:feature/expand-value-interning

Conversation

@gnodet
Copy link
Copy Markdown
Contributor

@gnodet gnodet commented Jun 18, 2025

Summary

This PR expands the value interning optimization in Maven's XML parsing and fixes transformer usage across all XML factories to improve memory efficiency during Maven builds.

Changes Made

1. Expanded InliningTransformer Coverage

Expanded the InliningTransformer in DefaultModelBuilder to intern 16 additional commonly repeated contexts (from 11 to 27 total):

Added contexts:

  • Dependency-related: type, classifier
  • Build and plugin-related: goal, execution
  • Common metadata: modelVersion, name, url, system, distribution, status
  • SCM fields: connection, developerConnection, tag
  • Common enum-like values: id, inherited, optional

2. Fixed XML Factory Transformer Usage

Fixed all XML factories to properly use the transformer from XmlReaderRequest:

  • DefaultSettingsXmlFactory - Now uses transformer
  • DefaultToolchainsXmlFactory - Now uses transformer
  • DefaultPluginXmlFactory - Now uses transformer
  • DefaultModelXmlFactory - Already working, verified

3. Added Comprehensive Tests

Added two new JUnit test files:

  • InliningTransformerTest.java - Tests interning logic and context coverage
  • XmlFactoryTransformerTest.java - Tests transformer usage across all XML factories

Benefits

  1. Memory Efficiency: String interning reduces memory usage by ensuring identical string values share the same object reference
  2. Performance: Faster string comparisons using == instead of .equals() for interned strings
  3. Comprehensive Coverage: All XML parsing in Maven (POMs, settings, toolchains, plugins) now benefits from interning
  4. Maven-specific Optimization: Targets the most commonly repeated values in Maven files

Technical Details

  • Backward Compatible: No breaking changes - optimization is transparent to users
  • Automatic Application: All XML parsing automatically benefits from interning
  • Proper Integration: Transformers are correctly passed through the XML factory chain
  • Conservative Approach: Only interns commonly repeated values to avoid memory overhead

Testing

  • ✅ All changes compile successfully
  • ✅ Comprehensive unit tests added and passing
  • ✅ Checkstyle violations fixed
  • ✅ Bytecode verification confirms correct implementation

This addresses the memory optimization request for repeated string values in Maven's XML processing, providing better memory efficiency during Maven builds and operations.

Files Changed

  • impl/maven-impl/src/main/java/org/apache/maven/impl/model/DefaultModelBuilder.java - Expanded CONTEXTS set
  • impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultSettingsXmlFactory.java - Added transformer usage
  • impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultToolchainsXmlFactory.java - Added transformer usage
  • impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultPluginXmlFactory.java - Added transformer usage
  • impl/maven-impl/src/main/java/org/apache/maven/impl/DefaultModelXmlFactory.java - Verified transformer usage
  • impl/maven-impl/src/test/java/org/apache/maven/impl/model/InliningTransformerTest.java - New test file
  • impl/maven-impl/src/test/java/org/apache/maven/impl/XmlFactoryTransformerTest.java - New test file

Pull Request opened by Augment Code with guidance from the PR author

@gnodet gnodet added this to the 4.0.0 milestone Jun 18, 2025
Copy link
Copy Markdown
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any benchmarks for this?

@gnodet
Copy link
Copy Markdown
Contributor Author

gnodet commented Jun 26, 2025

Are there any benchmarks for this?

I haven't seen any performance changes really. But the idea was more to save some memory when loading big projects, where lots of data is actually the same in all poms.

@gnodet gnodet force-pushed the feature/expand-value-interning branch from 773956d to 2558e2b Compare June 26, 2025 05:29
@gnodet gnodet modified the milestones: 4.0.0, 4.1.0 Jul 1, 2025
@gnodet gnodet force-pushed the feature/expand-value-interning branch from 2558e2b to 4515e4e Compare July 4, 2025 07:31
…erty

This PR expands the value interning optimization in Maven's XML parsing and fixes
transformer usage across all XML factories to improve memory efficiency during Maven
builds.

Expanded the InterningTransformer in DefaultModelBuilder to intern 27 commonly
repeated contexts:

**Core Maven coordinates:**
- groupId, artifactId, version, namespaceUri, packaging

**Dependency-related fields:**
- scope, type, classifier

**Build and plugin-related fields:**
- phase, goal, execution

**Repository-related fields:**
- layout, policy, checksumPolicy, updatePolicy

**Common metadata fields:**
- modelVersion, name, url, system, distribution, status

**SCM fields:**
- connection, developerConnection, tag

**Common enum-like values:**
- id, inherited, optional

Added MAVEN_MODEL_BUILDER_INTERNS session property to allow users to customize
which XML contexts are interned during POM parsing:

- Supports comma-separated list of field names
- User properties take precedence over system properties
- Falls back to default contexts when property not set
- Handles whitespace and empty values gracefully

Usage examples:
- mvn clean install -Dmaven.modelBuilder.interns="groupId,artifactId,version"
- maven.modelBuilder.interns=groupId,artifactId,version,scope,type

Fixed all XML factories to properly use the transformer from XmlReaderRequest:
- DefaultSettingsXmlFactory - Now uses transformer
- DefaultToolchainsXmlFactory - Now uses transformer
- DefaultPluginXmlFactory - Now uses transformer
- DefaultModelXmlFactory - Already working, verified

Added comprehensive test coverage:
- InterningTransformerTest.java - Tests interning logic and session property functionality
- XmlFactoryTransformerTest.java - Tests transformer usage across all XML factories

1. **Memory Efficiency**: String interning reduces memory usage by ensuring identical
   string values share the same object reference
2. **Performance**: Faster string comparisons using == instead of .equals() for interned strings
3. **Comprehensive Coverage**: All XML parsing in Maven (POMs, settings, toolchains, plugins)
   now benefits from interning
4. **Customizable**: Users can tailor interning to their specific use cases
5. **Maven-specific Optimization**: Targets the most commonly repeated values in Maven files

- **Backward Compatible**: No breaking changes - optimization is transparent to users
- **Automatic Application**: All XML parsing automatically benefits from interning
- **Proper Integration**: Transformers are correctly passed through the XML factory chain
- **Conservative Approach**: Only interns commonly repeated values to avoid memory overhead
- **Configurable**: Users can customize which fields are interned via session properties
@gnodet gnodet force-pushed the feature/expand-value-interning branch from 4515e4e to 968190c Compare July 16, 2025 12:28
@gnodet gnodet merged commit e5d985c into apache:master Jul 17, 2025
19 checks passed
@gnodet gnodet deleted the feature/expand-value-interning branch July 17, 2025 23:16
gnodet added a commit to gnodet/maven that referenced this pull request Jul 17, 2025
…erty (apache#2495)

This PR expands the value interning optimization in Maven's XML parsing and fixes
transformer usage across all XML factories to improve memory efficiency during Maven
builds.

Expanded the InterningTransformer in DefaultModelBuilder to intern 27 commonly
repeated contexts:

**Core Maven coordinates:**
- groupId, artifactId, version, namespaceUri, packaging

**Dependency-related fields:**
- scope, type, classifier

**Build and plugin-related fields:**
- phase, goal, execution

**Repository-related fields:**
- layout, policy, checksumPolicy, updatePolicy

**Common metadata fields:**
- modelVersion, name, url, system, distribution, status

**SCM fields:**
- connection, developerConnection, tag

**Common enum-like values:**
- id, inherited, optional

Added MAVEN_MODEL_BUILDER_INTERNS session property to allow users to customize
which XML contexts are interned during POM parsing:

- Supports comma-separated list of field names
- User properties take precedence over system properties
- Falls back to default contexts when property not set
- Handles whitespace and empty values gracefully

Usage examples:
- mvn clean install -Dmaven.modelBuilder.interns="groupId,artifactId,version"
- maven.modelBuilder.interns=groupId,artifactId,version,scope,type

Fixed all XML factories to properly use the transformer from XmlReaderRequest:
- DefaultSettingsXmlFactory - Now uses transformer
- DefaultToolchainsXmlFactory - Now uses transformer
- DefaultPluginXmlFactory - Now uses transformer
- DefaultModelXmlFactory - Already working, verified

Added comprehensive test coverage:
- InterningTransformerTest.java - Tests interning logic and session property functionality
- XmlFactoryTransformerTest.java - Tests transformer usage across all XML factories

1. **Memory Efficiency**: String interning reduces memory usage by ensuring identical
   string values share the same object reference
2. **Performance**: Faster string comparisons using == instead of .equals() for interned strings
3. **Comprehensive Coverage**: All XML parsing in Maven (POMs, settings, toolchains, plugins)
   now benefits from interning
4. **Customizable**: Users can tailor interning to their specific use cases
5. **Maven-specific Optimization**: Targets the most commonly repeated values in Maven files

- **Backward Compatible**: No breaking changes - optimization is transparent to users
- **Automatic Application**: All XML parsing automatically benefits from interning
- **Proper Integration**: Transformers are correctly passed through the XML factory chain
- **Conservative Approach**: Only interns commonly repeated values to avoid memory overhead
- **Configurable**: Users can customize which fields are interned via session properties

(cherry picked from commit e5d985c)

# Conflicts:
#	src/site/markdown/configuration.properties
@gnodet
Copy link
Copy Markdown
Contributor Author

gnodet commented Jul 17, 2025

💚 All backports created successfully

Status Branch Result
maven-4.0.x

Questions ?

Please refer to the Backport tool documentation

gnodet added a commit that referenced this pull request Jul 18, 2025
…erty (#2495) (#10932)

This PR expands the value interning optimization in Maven's XML parsing and fixes
transformer usage across all XML factories to improve memory efficiency during Maven
builds.

Expanded the InterningTransformer in DefaultModelBuilder to intern 27 commonly
repeated contexts:

**Core Maven coordinates:**
- groupId, artifactId, version, namespaceUri, packaging

**Dependency-related fields:**
- scope, type, classifier

**Build and plugin-related fields:**
- phase, goal, execution

**Repository-related fields:**
- layout, policy, checksumPolicy, updatePolicy

**Common metadata fields:**
- modelVersion, name, url, system, distribution, status

**SCM fields:**
- connection, developerConnection, tag

**Common enum-like values:**
- id, inherited, optional

Added MAVEN_MODEL_BUILDER_INTERNS session property to allow users to customize
which XML contexts are interned during POM parsing:

- Supports comma-separated list of field names
- User properties take precedence over system properties
- Falls back to default contexts when property not set
- Handles whitespace and empty values gracefully

Usage examples:
- mvn clean install -Dmaven.modelBuilder.interns="groupId,artifactId,version"
- maven.modelBuilder.interns=groupId,artifactId,version,scope,type

Fixed all XML factories to properly use the transformer from XmlReaderRequest:
- DefaultSettingsXmlFactory - Now uses transformer
- DefaultToolchainsXmlFactory - Now uses transformer
- DefaultPluginXmlFactory - Now uses transformer
- DefaultModelXmlFactory - Already working, verified

Added comprehensive test coverage:
- InterningTransformerTest.java - Tests interning logic and session property functionality
- XmlFactoryTransformerTest.java - Tests transformer usage across all XML factories

1. **Memory Efficiency**: String interning reduces memory usage by ensuring identical
   string values share the same object reference
2. **Performance**: Faster string comparisons using == instead of .equals() for interned strings
3. **Comprehensive Coverage**: All XML parsing in Maven (POMs, settings, toolchains, plugins)
   now benefits from interning
4. **Customizable**: Users can tailor interning to their specific use cases
5. **Maven-specific Optimization**: Targets the most commonly repeated values in Maven files

- **Backward Compatible**: No breaking changes - optimization is transparent to users
- **Automatic Application**: All XML parsing automatically benefits from interning
- **Proper Integration**: Transformers are correctly passed through the XML factory chain
- **Conservative Approach**: Only interns commonly repeated values to avoid memory overhead
- **Configurable**: Users can customize which fields are interned via session properties

(cherry picked from commit e5d985c)

# Conflicts:
#	src/site/markdown/configuration.properties
gnodet added a commit to gnodet/maven that referenced this pull request Jul 24, 2025
…erty (apache#2495)

This PR expands the value interning optimization in Maven's XML parsing and fixes
transformer usage across all XML factories to improve memory efficiency during Maven
builds.

Expanded the InterningTransformer in DefaultModelBuilder to intern 27 commonly
repeated contexts:

**Core Maven coordinates:**
- groupId, artifactId, version, namespaceUri, packaging

**Dependency-related fields:**
- scope, type, classifier

**Build and plugin-related fields:**
- phase, goal, execution

**Repository-related fields:**
- layout, policy, checksumPolicy, updatePolicy

**Common metadata fields:**
- modelVersion, name, url, system, distribution, status

**SCM fields:**
- connection, developerConnection, tag

**Common enum-like values:**
- id, inherited, optional

Added MAVEN_MODEL_BUILDER_INTERNS session property to allow users to customize
which XML contexts are interned during POM parsing:

- Supports comma-separated list of field names
- User properties take precedence over system properties
- Falls back to default contexts when property not set
- Handles whitespace and empty values gracefully

Usage examples:
- mvn clean install -Dmaven.modelBuilder.interns="groupId,artifactId,version"
- maven.modelBuilder.interns=groupId,artifactId,version,scope,type

Fixed all XML factories to properly use the transformer from XmlReaderRequest:
- DefaultSettingsXmlFactory - Now uses transformer
- DefaultToolchainsXmlFactory - Now uses transformer
- DefaultPluginXmlFactory - Now uses transformer
- DefaultModelXmlFactory - Already working, verified

Added comprehensive test coverage:
- InterningTransformerTest.java - Tests interning logic and session property functionality
- XmlFactoryTransformerTest.java - Tests transformer usage across all XML factories

1. **Memory Efficiency**: String interning reduces memory usage by ensuring identical
   string values share the same object reference
2. **Performance**: Faster string comparisons using == instead of .equals() for interned strings
3. **Comprehensive Coverage**: All XML parsing in Maven (POMs, settings, toolchains, plugins)
   now benefits from interning
4. **Customizable**: Users can tailor interning to their specific use cases
5. **Maven-specific Optimization**: Targets the most commonly repeated values in Maven files

- **Backward Compatible**: No breaking changes - optimization is transparent to users
- **Automatic Application**: All XML parsing automatically benefits from interning
- **Proper Integration**: Transformers are correctly passed through the XML factory chain
- **Conservative Approach**: Only interns commonly repeated values to avoid memory overhead
- **Configurable**: Users can customize which fields are interned via session properties
@gnodet gnodet added the bug Something isn't working label Oct 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants