Merging Statistics is slow when sum statistic is present #15809

@robert3005

Description

Is your feature request related to a problem or challenge?

If a data source reports a sum statistic, merging statistics for a file group is considerably slower than when no sum statistic is present. Profiling shows that most of the extra time is spent in Precision<ScalarValue>::add, which does a lot of extra work.

Describe the solution you'd like

Ideally we'd transpose the stats aggregation and aggregate the statistics column-wise instead of column-group-wise.
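To make the proposed loop order concrete, here is a minimal sketch. It uses simplified stand-in types, not DataFusion's actual ones: the `Precision` enum below wraps a plain `i64` rather than a `ScalarValue`, and `merge_file_wise` / `merge_column_wise` are hypothetical names. The first function folds the statistics file by file with a pairwise `add` on every value. The second walks one column at a time, keeps a plain typed accumulator, and builds a `Precision` only once per column. With a real `ScalarValue` the pairwise `add` is presumably where the extra work comes from, so the column-wise form would call it far less often.

```rust
// Simplified stand-in for a precision-tagged statistic.
// NOT DataFusion's Precision<ScalarValue>; an assumption for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(i64),
    Inexact(i64),
    Absent,
}

impl Precision {
    /// Pairwise add: exact only if both sides are exact,
    /// absent if either side is absent.
    fn add(self, other: Precision) -> Precision {
        match (self, other) {
            (Precision::Exact(a), Precision::Exact(b)) => Precision::Exact(a + b),
            (
                Precision::Exact(a) | Precision::Inexact(a),
                Precision::Exact(b) | Precision::Inexact(b),
            ) => Precision::Inexact(a + b),
            _ => Precision::Absent,
        }
    }
}

/// File-by-file merge: files[f][c] is the sum statistic of column c in file f.
/// Every value goes through a pairwise `add`.
fn merge_file_wise(files: &[Vec<Precision>]) -> Vec<Precision> {
    let (first, rest) = match files.split_first() {
        Some(split) => split,
        None => return Vec::new(),
    };
    let mut acc = first.clone();
    for file in rest {
        for (a, s) in acc.iter_mut().zip(file) {
            *a = (*a).add(*s);
        }
    }
    acc
}

/// Column-wise merge: one typed accumulator per column, wrapped in
/// `Precision` once at the end. Assumes every file has the same column count.
fn merge_column_wise(files: &[Vec<Precision>]) -> Vec<Precision> {
    let num_cols = files.first().map_or(0, Vec::len);
    (0..num_cols)
        .map(|c| {
            let mut total: i64 = 0;
            let mut exact = true;
            for file in files {
                match file[c] {
                    Precision::Exact(v) => total += v,
                    Precision::Inexact(v) => {
                        total += v;
                        exact = false;
                    }
                    // One absent value makes the whole column absent.
                    Precision::Absent => return Precision::Absent,
                }
            }
            if exact {
                Precision::Exact(total)
            } else {
                Precision::Inexact(total)
            }
        })
        .collect()
}
```

Both functions should produce the same result for the same input; only the iteration order and the number of `Precision` values constructed differ. Whether this translates into a real speedup depends on how the typed accumulator would be obtained from a `ScalarValue`, which this sketch does not model.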

Describe alternatives you've considered

No response

Additional context

Screenshots of Parquet and Vortex profiles from ClickBench Q0, which consists entirely of gathering statistics. Note that roughly 50% of the Vortex time is spent aggregating stats.

[Images: Parquet and Vortex profile screenshots]

Labels: enhancement (New feature or request)