Conversation

@TheR1sing3un
Member

@TheR1sing3un TheR1sing3un commented Jan 4, 2026

  1. Introduce the configuration for forcibly triggering timeline compaction

Describe the issue this Pull Request addresses

closes #17778

Summary and Changelog

  1. Introduce the configuration for forcibly triggering timeline compaction

Impact

none

Risk Level

low

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Jan 4, 2026
@TheR1sing3un TheR1sing3un requested a review from danny0405 January 5, 2026 04:01
log.info("No Instants to archive");
}
// run compact and clean if needed even no instants were archived
if (!instantsToArchive.isEmpty() || config.isTimelineCompactionForced()) {
Contributor

The current compaction strategy is rather eager (it compacts all layers in a cascade) whenever a new archived file is generated, so even if we trigger compaction for commits that do not archive anything, it will not take effect as expected.

I wonder whether we could introduce some lazy compaction strategies, or add an upper threshold on the target file size for a single round of compaction.

Member Author

@TheR1sing3un TheR1sing3un Jan 5, 2026

I wonder whether we could introduce some lazy compaction strategies, or add an upper threshold on the target file size for a single round of compaction.

I agree with this idea.
The current strategy is relatively simple, and in some corner cases compaction is blocked, so that none of the newly added archived files get compacted.
Therefore, I will propose two PRs later:

  1. Fix the current corner case that blocks normal compaction.
  2. Introduce more diverse compaction strategies, covering not only trigger timing (lazy/eager) but also how each level is compacted, such as picking at most one batch of candidates at each level or compacting each level as much as possible.
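
To make the two per-level options in item 2 concrete, here is a minimal sketch. The class, enum, and method names are hypothetical and are not part of Hudi; the sketch only assumes that a level is merged in fixed-size batches.

```java
public class CompactionStrategySketch {
    // Hypothetical names for the two per-level options discussed above.
    enum LevelStrategy { ONE_BATCH_PER_LEVEL, DRAIN_LEVEL }

    // Returns how many full batches would be merged at one level in a single round.
    static int batchesToMerge(int filesAtLevel, int batch, LevelStrategy strategy) {
        int fullBatches = filesAtLevel / batch;
        return strategy == LevelStrategy.ONE_BATCH_PER_LEVEL
            ? Math.min(1, fullBatches)   // merge at most one batch, then move on
            : fullBatches;               // merge every full batch at this level
    }

    public static void main(String[] args) {
        System.out.println(batchesToMerge(35, 10, LevelStrategy.ONE_BATCH_PER_LEVEL));
        System.out.println(batchesToMerge(35, 10, LevelStrategy.DRAIN_LEVEL));
    }
}
```

Keeping the choice behind a single enum would let trigger timing (lazy/eager) be configured independently of how much work each level does per round.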

Member Author

so even if we trigger compaction for commits that do not archive anything, it will not take effect as expected.

This is related to the current strategy.
Currently, each time compaction is triggered, each level merges at most one batch of files and then moves on to the next level.
So if I have 100 files at layer L0 that can be merged, but batch=10, that compaction will only merge the first ten. Therefore, if we don't provide a switch to force compaction, the remaining 90 files cannot be merged until some new instants are archived.
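
The arithmetic above can be checked with a small simulation. This is a simplified model, not Hudi's implementation: it assumes every level uses the same batch size and that each merged batch produces exactly one file in the next level. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class OneBatchPerLevelSketch {
    // levels.get(i) is the number of files at level i (hypothetical model).
    // One trigger merges at most one batch per level, then moves to the next level.
    static void compactOnce(List<Integer> levels, int batch) {
        for (int level = 0; level < levels.size(); level++) {
            if (levels.get(level) >= batch) {
                levels.set(level, levels.get(level) - batch);
                if (level + 1 == levels.size()) {
                    levels.add(0); // open the next level on first use
                }
                levels.set(level + 1, levels.get(level + 1) + 1);
            }
        }
    }

    // Counts how many triggers are needed before L0 has no full batch left.
    static int triggersToDrainL0(List<Integer> levels, int batch) {
        int triggers = 0;
        while (levels.get(0) >= batch) {
            compactOnce(levels, batch);
            triggers++;
        }
        return triggers;
    }

    public static void main(String[] args) {
        List<Integer> levels = new ArrayList<>(List.of(100));
        compactOnce(levels, 10);
        System.out.println(levels); // [90, 1]: 90 files remain at L0 after one trigger
    }
}
```

Under this model, draining 100 L0 files with batch=10 takes ten separate triggers, which is why the ability to trigger compaction without new archival matters here.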

Member Author

I agree with this idea. The current strategy is relatively simple, and in some corner cases compaction is blocked, so that none of the newly added archived files get compacted. Therefore, I will propose two PRs later:

  1. Fix the current corner case that blocks normal compaction.

First PR: #17784

Contributor

So if I have 100 files at layer L0 that can be merged, but batch=10, that compaction will only merge the first ten. Therefore, if we don't provide a switch to force compaction, the remaining 90 files cannot be merged until some new instants are archived.

This cannot happen: at most one file is generated in each layer, so we only need to merge one batch of files.

Member Author

This cannot happen: at most one file is generated in each layer, so we only need to merge one batch of files.

We encountered a scenario like this:

  1. Instants were written normally, and the archiver was invoked once as usual.
  2. The archiver archived the active instants.
  3. Because instants were archived, a timeline compaction was triggered. However, the timeline compaction failed due to an occasional exception, such as an HDFS timeout.
  4. At this point I re-ran archival, but since there were no instants to archive, timeline compaction could never be triggered until another active instant needed to be archived.

In this case, I do need logic to forcibly trigger timeline compaction.
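
The steps above can be sketched as a tiny model of the gate in the diff. The condition mirrors `!instantsToArchive.isEmpty() || config.isTimelineCompactionForced()`; the class, method, and parameter names below are hypothetical stand-ins rather than Hudi code.

```java
public class ForcedCompactionGateSketch {
    // Stand-in for the condition in the diff: compaction runs if something
    // was archived in this run, or if the force flag is set.
    static boolean shouldCompact(int instantsToArchive, boolean forced) {
        return instantsToArchive > 0 || forced;
    }

    public static void main(String[] args) {
        // Steps 1-3: instants are archived, so compaction is attempted (and fails).
        System.out.println(shouldCompact(5, false)); // true
        // Step 4 without the flag: nothing to archive, so no retry ever happens.
        System.out.println(shouldCompact(0, false)); // false
        // Step 4 with the flag: the re-run can retry the failed compaction.
        System.out.println(shouldCompact(0, true));  // true
    }
}
```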

Contributor

Do you think it makes more sense to add a Spark procedure to trigger timeline compaction instead of binding it to the write?

Member Author

Nice idea 👍 I agree; that would be clearer and more reasonable.

@TheR1sing3un TheR1sing3un requested a review from danny0405 January 5, 2026 08:34
…mpaction

1. Introduce the configuration for forcibly triggering timeline compaction

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

feat: enable timeline compaction by default

1. enable timeline compaction by default

Signed-off-by: TheR1sing3un <chaoyang@apache.org>

rerun
@TheR1sing3un TheR1sing3un force-pushed the feat_lsm_timeline_force_trigger_compact branch from ba986e3 to f5c5029 Compare January 8, 2026 03:45
@hudi-bot
Collaborator

hudi-bot commented Jan 8, 2026

CI report:
