[Enhancement] [Vectorized] optimize pre-agg in stream load memtable #8656

@spaces-X

Search before asking

  • I have searched the issues and found no similar issues.

Description

The current version (#8561) of vectorized stream load uses a skip list to aggregate values. A skip list is a row-oriented data structure that keeps its entries in sorted order at all times.

Since the memtable only does pre-aggregation and queries never read from it, keeping the data sorted on every insert is unnecessarily expensive.

I plan to refactor this part of the code, building on PR #8561, and replace the existing skip list with one of the solutions below.

Solution

Solution 1:
First, sort the incoming data block.
Then, merge the sorted data block (aggregate rows that share the same key).
Then, append the merged data block to the final block.
Finally, finalize the final block (sort + merge) and flush it.
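
The steps of Solution 1 can be sketched roughly as follows. This is only an illustration, not the Doris implementation: it assumes rows are `(key, value)` pairs, that the aggregation is a SUM, and the names `sort_and_merge` and `SortMergeMemTable` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_merge(rows):
    """Sort rows by key, then collapse each run of equal keys into one row.

    Assumption: the aggregate function is SUM over the value column.
    """
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


class SortMergeMemTable:
    """Hypothetical memtable: per-block sort + merge, final sort + merge on flush."""

    def __init__(self):
        self.final_block = []

    def insert(self, block):
        # Steps 1-3: sort the incoming block, merge it, append to the final block.
        self.final_block.extend(sort_and_merge(block))

    def flush(self):
        # Step 4: finalize (sort + merge) the whole final block.
        out = sort_and_merge(self.final_block)
        self.final_block = []
        return out


memtable = SortMergeMemTable()
memtable.insert([("b", 1), ("a", 2), ("b", 3)])
memtable.insert([("a", 5), ("c", 1)])
print(memtable.flush())  # [('a', 7), ('b', 4), ('c', 1)]
```

Note that the final block is only sorted within each appended chunk, so the flush still needs a full sort (or a k-way merge of the sorted chunks) before the last merge pass.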

Solution 2:
First, aggregate each incoming data block with a hash table.
Finally, sort the whole aggregated block.
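
Solution 2 could look roughly like the sketch below. As before, this is an illustration under assumptions rather than the actual design: rows are `(key, value)` pairs, the aggregation is a SUM, a Python `dict` stands in for the hash table, and `HashAggMemTable` is a hypothetical name.

```python
from operator import itemgetter


class HashAggMemTable:
    """Hypothetical memtable: hash aggregation on insert, one sort on flush."""

    def __init__(self):
        # key -> running aggregate; no ordering is maintained while inserting.
        self.agg = {}

    def insert(self, block):
        # Step 1: aggregate the incoming block into the hash table.
        # Assumption: the aggregate function is SUM.
        for key, value in block:
            self.agg[key] = self.agg.get(key, 0) + value

    def flush(self):
        # Step 2: sort the aggregated rows once, by key.
        out = sorted(self.agg.items(), key=itemgetter(0))
        self.agg = {}
        return out


memtable = HashAggMemTable()
memtable.insert([("b", 1), ("a", 2), ("b", 3)])
memtable.insert([("a", 5), ("c", 1)])
print(memtable.flush())  # [('a', 7), ('b', 4), ('c', 1)]
```

Compared with Solution 1, sorting happens only once, at flush time, and only over the distinct keys. The trade-off is the per-row hashing cost and the memory held by the hash table.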

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
