Skip to content

[C++][Compute][Acero] Poor aggregate performance when there is a large number of batches on the build side #45847

@uchenily

Description

@uchenily

Describe the bug, including details regarding any error messages, version, and platform.

I am running a performance test on a plan as described below. This execution plan is fairly straightforward, involving only two source nodes, one hash join node, and one aggregation node.

         aggr
           +
         join
     +-----+------+
  source_0     source_1
  (probe)      (build)

However, I noticed that the performance is far below my expectations. So, I made the following table for further analysis.

In this table, the horizontal axis represents the number of batches generated on the build side, while the vertical axis denotes the number of batches produced on the probe side, with each batch size being 1<<15.

time(in seconds) build 1 build 2 build 4 build 16 build 32 build 64
probe 1 0.03 0.04 0.06 0.03 7.6 269.7
probe 2 0.04 0.04 0.06 7.7 59.1 176.0
probe 4 0.05 0.06 0.06 7.7 56.1 229.9
probe 16 0.11 0.12 0.12 3.2 32.2 145.8
probe 32 0.19 0.19 0.20 3.1 32.2 107.5
probe 64 0.36 0.36 0.36 3.4 44.9 134.9
probe 256 1.33 1.33 1.34 5.5 30.7 197.3
probe 1024 5.2 5.3 5.2 5.3 17.7 50.9

More information:

  1. I run these tests on a Intel x86_64 machine with about 100 cores.

  2. I have noticed that in scenarios where the execution time exceeds 10 seconds, the CPU utilization is very low, and in the most of time, only one CPU core is being used.

  3. I found that the arrow/compute/row/grouper.cc:ConsumeImpl method is quite time-consuming. (map phase)

  4. The process of generating data also consumes a considerable amount of time, but in the tests mentioned above, the time spent on data generation does not account for a significant portion.

  5. The distribution of data is also certain to affect the execution time, but I have not conducted any relevant verification.

  6. In some scenarios, the execution time is longer even with a smaller amount of data, which should be considered unreasonable.

  7. I tested on both version 19.0.1 and the main branch, and no significant differences were observed.(this table is based on v19.0.1)

  8. If we remove the aggregate node, everything becomes OK, so I guess the main problem is the hash aggr node; If we replace hash_count in the test code below with hash_sum or another hash aggregation function, the results were almost unanimous.

Component(s)

C++

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions