Skip to content

[Enhance] Optimize version retrieval efficiency in high-frequency import/compaction scenarios. #5830

@killxdcj

Description

@killxdcj

Backgroud

For high-frequency import and high-frequency Compaction scenarios, the number of versions of the Tablet will be large(maybe thousands), and the version graph will be more complex. In this case, the current implementation of VersionGraph.capture_consistent_versions will have relatively large performance problems.

For example, the flowing is a flame diagram of one of our online scenes

image

After analyzing the relevant code, find the current version capture algorithm is BFS, and there are two main reasons for the poor performance

  1. Space complexity is O (n), which is related to the versions number of Tablet
  2. In most scenarios, the version retrieving needs to traverse all versions.

Suppose StreamLoad is performed once in 1 second and Compaction is performed once in 2 seconds, the version graph of the tablet may be as the following figure (the edges represents rowset, the nodes represents version)

image

If we want to retrieve the version (0 -> 8), the retrieval process is as follows (we need to traverse all nodes and edges)

0 -> (1, 2, 4, 6) -> (3, 5, 7) -> (8)

Solution

One solution is to change the retrieval algorithm from BFS to greedy strategy. when retrieving, we only find the version with the largest step size in each time, so that the space complexity can be reduced to O (1), and the nodes that need to be traversed when retrieving are also greatly reduced.

For example, for the above scenario, the retrieval process is as follows:

0 -> 6 -> 7 -> 8

The following is the flame diagram of BE in the same scene after switching to greedy strategy. The proportion of capture_consistent_versions has almost disappeared.

image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions