Backend
VL (Velox)
Bug description
Distinct aggregation merges all sorted spill files in getOutput() (SpillPartition::createOrderedReader). If there are too many spill files, reading the first batch of each file into memory consumes a significant amount of memory. In one of our internal cases, one task generated 300 spill files, which required close to 3 GB of memory.

Possible workarounds:

- kMaxSpillRunRows: the default of 1M will generate too many spill files for hundreds of millions of rows of input ([GLUTEN-7249][VL] Lower default overhead memory ratio and spill run size #7531).
- Set kSpillWriteBufferSize to 1M or lower. Why is it set to 4M by default? Is there any experience with performance tuning here?

Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response
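For reference, a back-of-the-envelope sketch of the memory estimate in the bug description above. The merge reader keeps one decoded batch per spill file in memory at once, so peak memory scales with the file count. The per-batch size below is a hypothetical assumption chosen to illustrate the observed numbers, not a value measured from Velox.

```python
# Rough model of memory needed when all spill files are merged at once:
# the ordered reader holds the first batch of every file simultaneously.

def merge_read_memory(num_spill_files: int, first_batch_bytes: int) -> int:
    """Estimate peak memory (bytes) for a k-way merge over spill files."""
    return num_spill_files * first_batch_bytes

# Internal case from the description: ~300 spill files.
# Assuming each file's first batch decodes to ~10 MB (hypothetical):
est = merge_read_memory(300, 10 * 1024 * 1024)
print(f"{est / (1024 ** 3):.1f} GB")  # ~2.9 GB, close to the observed 3 GB
```

The point of the sketch is that memory grows linearly with the number of spill files, which is why tuning kMaxSpillRunRows (fewer, larger files) or kSpillWriteBufferSize directly affects the peak.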