[Proposal] The execution model of compaction needs to improve #3624

@vagetablechicken

Description

Current model

Each data dir creates M threads for base compaction and N threads for cumulative compaction. Each compaction thread executes at most one compaction per cycle (it may skip the cycle if no best candidate tablet is found).
Too many concurrent compaction tasks may run out of memory, so we limit the max concurrency of running compaction tasks with a semaphore.
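
The current model can be sketched roughly as follows. This is a minimal Python illustration, not the actual storage engine code: `ConcurrencyLimitedCompactor`, `MAX_RUNNING_COMPACTIONS`, and `demo` are hypothetical names, and a compaction is modeled as a plain callable. Note that every compaction takes exactly one slot, no matter how much memory it needs.

```python
import threading
import time

MAX_RUNNING_COMPACTIONS = 2  # hypothetical limit, not a real config name


class ConcurrencyLimitedCompactor:
    """Caps how many compactions run at once; ignores how heavy each one is."""

    def __init__(self, max_running):
        self._slots = threading.BoundedSemaphore(max_running)
        self._lock = threading.Lock()
        self._running = 0
        self.peak_running = 0  # highest concurrency observed so far

    def run(self, compaction):
        # One slot per compaction, regardless of its size or memory cost.
        with self._slots:
            with self._lock:
                self._running += 1
                self.peak_running = max(self.peak_running, self._running)
            try:
                compaction()
            finally:
                with self._lock:
                    self._running -= 1


def demo(num_tasks=6):
    """Start num_tasks compaction threads and return the peak concurrency."""
    compactor = ConcurrencyLimitedCompactor(MAX_RUNNING_COMPACTIONS)
    threads = [
        threading.Thread(target=compactor.run, args=(lambda: time.sleep(0.01),))
        for _ in range(num_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return compactor.peak_running
```

Calling `demo()` never reports more than `MAX_RUNNING_COMPACTIONS` compactions running together, but nothing in the sketch bounds what those running compactions cost.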

Problem

This only limits the number of running threads. If the running compactions themselves cost too much memory, we have no defense against it.
If we reduce concurrency to avoid OOM, compactions can't be done in time, and we may end up with heavier compactions later.
So limiting concurrency alone is not enough.

Proposal

The most desirable solution is to limit memory directly. But that assumes we can estimate the memory usage of a single compaction, which is difficult.
So we can only refer to the tablet score (the number of segments). It is positively correlated with memory usage, but we can't simply estimate memory usage by applying a scale factor to it.
What about a model that limits the total score?
A compaction needs to acquire permits equal to its score, and releases them when it finishes. So concurrency will be low while high-score compactions are running, and high while low-score compactions are running.
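
A minimal sketch of such a score-based limiter, in Python for illustration only. `ScorePermitLimiter` and its methods are hypothetical names, and the total permit budget is an assumed tunable. One extra assumption not stated in the proposal: a score larger than the whole budget is clamped to the budget, so that an oversized compaction can still run (alone) instead of waiting forever.

```python
import threading


class ScorePermitLimiter:
    """Weighted limiter: a compaction holds as many permits as its score."""

    def __init__(self, total_permits):
        if total_permits <= 0:
            raise ValueError("total_permits must be positive")
        self._total = total_permits
        self._available = total_permits
        self._cond = threading.Condition()

    def _clamp(self, score):
        # Assumption: clamp to [1, total] so an oversized compaction runs alone.
        return max(1, min(score, self._total))

    def acquire(self, score):
        """Block until enough permits are free; return the number taken."""
        need = self._clamp(score)
        with self._cond:
            while self._available < need:
                self._cond.wait()
            self._available -= need
        return need

    def try_acquire(self, score):
        """Non-blocking variant: return the permits taken, or 0 if unavailable."""
        need = self._clamp(score)
        with self._cond:
            if self._available < need:
                return 0
            self._available -= need
            return need

    def release(self, permits):
        """Return permits previously granted by acquire() or try_acquire()."""
        with self._cond:
            self._available += permits
            self._cond.notify_all()

    @property
    def available(self):
        with self._cond:
            return self._available
```

With a budget of 10, a single compaction of score 8 blocks another of score 5 from starting, while three compactions of scores 3, 3, and 4 can all run together. A compaction thread would call `permits = limiter.acquire(score)` before starting and `limiter.release(permits)` in a `finally` block once it finishes.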


Labels: area/storage, kind/feature, proposal
