@suryaprasanna
Contributor

Describe the issue this Pull Request addresses

This PR adds support for hierarchical date partitioning: users specify date partition values in yyyy-MM-dd format, and the values are stored in a yyyy/MM/dd directory structure. This gives date-based partitions a more intuitive folder hierarchy.

Summary and Changelog

Users can now enable hierarchical date partitioning to automatically convert date values from yyyy-MM-dd format to a yyyy/MM/dd directory structure.

Changes:

  • Added HIERARCHICAL_DATE_PARTITIONING config property to enable date path transformation
  • Updated key generators (SimpleAvroKeyGenerator, BuiltinKeyGenerator) to support hierarchical date formatting
  • Modified partition path formatters to handle date hierarchy conversion
  • Updated partition path parsing logic to reverse the transformation when reading
  • Added validation to prevent conflicting configuration with hive-style partitioning
  • Added test coverage in TestSlashSeparatedPartitionValue.scala
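
For reviewers, the forward and reverse transformation described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name HierarchicalDatePathSketch and both method names are hypothetical, and it assumes partition values are plain calendar dates.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; not part of Hudi.
final class HierarchicalDatePathSketch {
    private static final DateTimeFormatter DASHED = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Write side: turn the user-supplied value into a nested directory path.
    static String toHierarchicalPath(String partitionValue) {
        return LocalDate.parse(partitionValue, DASHED).format(SLASHED);
    }

    // Read side: reverse the transformation to recover the original value.
    static String fromHierarchicalPath(String partitionPath) {
        return LocalDate.parse(partitionPath, SLASHED).format(DASHED);
    }

    public static void main(String[] args) {
        System.out.println(toHierarchicalPath("2026-01-05"));   // prints 2026/01/05
        System.out.println(fromHierarchicalPath("2026/01/05")); // prints 2026-01-05
    }
}
```

Parsing through LocalDate rather than replacing characters means a value that is not a date fails fast with a DateTimeParseException instead of silently producing an unexpected directory.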

Impact

Public API Changes:

  • New config: hoodie.datasource.write.hierarchial.date.partitionpath (default: false)
  • New table config property in HoodieTableConfig.HIERARCHICAL_DATE_PARTITIONING

User-facing Changes:

  • Users can enable hierarchical date partitioning via the new config property
  • Partition directories will be created as nested folders (e.g., 2026/01/05 instead of 2026-01-05)
  • Cannot be used together with hive-style partitioning
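
The incompatibility with hive-style partitioning could be enforced with a check along these lines. This is a sketch under stated assumptions, not the validation added in this PR: the class and method names are hypothetical, the options are modelled as a plain string map, and the hive-style key is assumed to be the existing hoodie.datasource.write.hive_style_partitioning option.

```java
import java.util.Map;

// Hypothetical validation helper for illustration only; not part of Hudi.
final class PartitioningConfigCheckSketch {
    // Key as spelled in this PR's description.
    static final String HIERARCHICAL_KEY = "hoodie.datasource.write.hierarchial.date.partitionpath";
    // Assumed to be the existing hive-style partitioning write option.
    static final String HIVE_STYLE_KEY = "hoodie.datasource.write.hive_style_partitioning";

    // Rejects option sets that enable both partition path styles at once.
    static void validate(Map<String, String> options) {
        boolean hierarchical = Boolean.parseBoolean(options.getOrDefault(HIERARCHICAL_KEY, "false"));
        boolean hiveStyle = Boolean.parseBoolean(options.getOrDefault(HIVE_STYLE_KEY, "false"));
        if (hierarchical && hiveStyle) {
            throw new IllegalArgumentException(
                HIERARCHICAL_KEY + " cannot be enabled together with " + HIVE_STYLE_KEY);
        }
    }
}
```

Both flags default to false in this sketch, so an option map that sets neither key passes the check.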

Risk Level

Low - Feature is opt-in via configuration flag. Existing tables are unaffected unless explicitly enabled. Added validation to prevent misconfiguration with hive-style partitioning.

Documentation Update

Config documentation needs to be updated to include:

  • hoodie.datasource.write.hierarchial.date.partitionpath configuration property
  • Usage examples showing date partition transformation
  • Note about incompatibility with hive-style partitioning

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions bot added the size:M label (PR with lines of changes in (100, 300]) on Jan 5, 2026
@suryaprasanna force-pushed the support-slash-separated-date-partitions branch from eb03cba to e4af4f8 on January 5, 2026 22:36
@suryaprasanna force-pushed the support-slash-separated-date-partitions branch from e4af4f8 to af1a4ab on January 7, 2026 19:29
@hudi-bot
Collaborator

hudi-bot commented Jan 7, 2026

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build
