@voonhous voonhous commented Jan 5, 2026

Describe the issue this Pull Request addresses

The current implementation of HoodieLogFormat.WriterBuilder is convoluted:

  1. Code Smell: The builder is implemented as a heavy, manual inner class within the HoodieLogFormat interface.
  2. Reflection Usage: It uses ReflectionUtils to load the default writer implementation via a String class name, which is brittle and bypasses compile-time checks. This was introduced in #11207 to decouple hudi-common from Hadoop dependencies.
  3. Maintenance: Any new field requires manual updates to both the builder methods and the instantiation logic.
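
To illustrate the reflection concern in point 2, here is a minimal sketch using plain JDK reflection. `SketchWriter` and `ReflectionSketch` are hypothetical names made up for this example; they are not Hudi classes, and the sketch does not reproduce what ReflectionUtils does internally. It only shows why a class name held in a String cannot be checked by the compiler.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a writer implementation; not a Hudi class.
class SketchWriter {
  final String parentPath;

  SketchWriter(String parentPath) {
    this.parentPath = parentPath;
  }
}

class ReflectionSketch {
  // Reflection-style loading: the class name is only a String, so a typo or a
  // changed constructor signature still compiles and fails when this runs.
  static Object loadByName(String className, String parentPath) throws Exception {
    Class<?> clazz = Class.forName(className);
    Constructor<?> ctor = clazz.getDeclaredConstructor(String.class);
    return ctor.newInstance(parentPath);
  }

  // Direct construction: the compiler verifies the class and its constructor.
  static SketchWriter createDirect(String parentPath) {
    return new SketchWriter(parentPath);
  }
}
```

Passing a misspelled class name to `loadByName` throws `ClassNotFoundException` at runtime, whereas the equivalent mistake in `createDirect` would not compile.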

This PR refactors the log writer to use Lombok's @Builder, standardizing the fluent API and improving type safety across the codebase.

Summary and Changelog

This refactor simplifies the construction of HoodieLogFormat writers by leveraging Lombok and improving the class hierarchy.

Key Changes:

  1. Refactored HoodieLogFormat.Writer from an interface to an abstract class to centralize shared fields and construction logic.
  2. Moved validation, default value assignment, and log version computation into the Writer base constructor. This ensures that all writer implementations follow the same versioning and path-generation logic.
  3. Applied Lombok @Builder to the HoodieLogFormatWriter constructor, replacing the manual WriterBuilder.
  4. Eliminated reflection-based instantiation, favoring direct constructor calls for better performance and safety.
  5. Updated call sites across the project (client, spark, flink, utilities) to use the new builder syntax, which standardizes on the with prefix.
  6. Updated existing tests and renamed TestHoodieLogWriterBuilder to TestHoodieLogFormatWriterBuilder to reflect the new structure.
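
The shape described in changes 1 to 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class names, the three fields, the default version of 1, and the `describe()` path format are all hypothetical, and the builder is written by hand so the example compiles without Lombok. With Lombok, a `@Builder(setterPrefix = "with")` annotation on the constructor would generate a comparable builder.

```java
import java.util.Objects;

// Hypothetical base class: shared fields, validation, and defaults live in one
// constructor, so every subclass gets the same behavior.
abstract class BaseLogWriterSketch {
  static final int DEFAULT_LOG_VERSION = 1; // assumed default, for illustration only

  protected final String parentPath;
  protected final String fileId;
  protected final int logVersion;

  protected BaseLogWriterSketch(String parentPath, String fileId, Integer logVersion) {
    this.parentPath = Objects.requireNonNull(parentPath, "parentPath is required");
    this.fileId = Objects.requireNonNull(fileId, "fileId is required");
    // Fall back to the default when the caller did not set a version.
    this.logVersion = logVersion == null ? DEFAULT_LOG_VERSION : logVersion;
  }

  // Made-up path format; the real log file naming is not shown here.
  String describe() {
    return parentPath + "/" + fileId + ".log." + logVersion;
  }
}

// Hypothetical concrete writer with a hand-written fluent builder.
final class FileLogWriterSketch extends BaseLogWriterSketch {
  private FileLogWriterSketch(String parentPath, String fileId, Integer logVersion) {
    super(parentPath, fileId, logVersion);
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    private String parentPath;
    private String fileId;
    private Integer logVersion;

    Builder withParentPath(String parentPath) {
      this.parentPath = parentPath;
      return this;
    }

    Builder withFileId(String fileId) {
      this.fileId = fileId;
      return this;
    }

    Builder withLogVersion(int logVersion) {
      this.logVersion = logVersion;
      return this;
    }

    // Direct constructor call: no class-name strings, no reflection.
    FileLogWriterSketch build() {
      return new FileLogWriterSketch(parentPath, fileId, logVersion);
    }
  }
}
```

A call site would then read `FileLogWriterSketch.builder().withParentPath("/tmp/logs").withFileId("file-1").build()`. Because validation runs in the base constructor, omitting a required field fails in `build()` regardless of which subclass is being constructed.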

Impact

  1. Internal API Change: Developers manually building a log writer will see changes in method names (e.g., onParentPath is now withParentPath).
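  2. Type Safety: The builder now constructs the writer through a direct constructor call, so a missing or mistyped implementation class is caught at compile time instead of causing the runtime failures that were possible with the reflection-based approach.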
  3. Codebase Health: Significant reduction in boilerplate code in HoodieLogFormat.

Risk Level

Low. This is a structural refactor. The core logic for log writing and versioning is unchanged; it has only been relocated to the base class constructor.

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Jan 5, 2026
@voonhous voonhous force-pushed the lombokify-builders-HoodieLogFormat branch 2 times, most recently from d21169c to fee094e Compare January 5, 2026 09:57
@apache apache deleted a comment from hudi-bot Jan 5, 2026
@voonhous voonhous force-pushed the lombokify-builders-HoodieLogFormat branch from fee094e to 342a043 Compare January 8, 2026 09:31
@hudi-bot
hudi-bot commented Jan 8, 2026

CI report:
