
Consider using a single spill file for multiple partitions #3859

@andygrove

Description


What is the problem the feature request solves?

When spilling occurs in native shuffle, Comet writes one spill file per partition. These files are then merged into the final shuffle file when the shuffle completes.

Gluten instead combines the spill data from multiple partitions into a single spill file, with the data ordered by partition. This results in fewer open file handles and fewer file-system metadata operations, which could be a benefit when shuffle data is stored on EBS.
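To make the proposed layout concrete, here is a minimal Rust sketch. It is not Comet's or Gluten's actual code, and every name in it (`PartitionRange`, `write_combined_spill`, `read_partition`, `merge_spills`) is hypothetical. It assumes each partition's buffered data is already serialized into a byte buffer and that the per-spill index of byte ranges is kept in memory. Each spill event writes all partitions, in partition order, into one file. The final merge then reads partition `p` from every spill file in turn, so it needs one handle per spill rather than one per partition per spill.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Byte range of one partition inside a combined spill file (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PartitionRange {
    pub offset: u64,
    pub len: u64,
}

/// Hypothetical: writes every partition's serialized bytes into ONE spill file,
/// in ascending partition order, and returns the byte range of each partition.
pub fn write_combined_spill<W: Write + Seek>(
    out: &mut W,
    partitions: &[Vec<u8>],
) -> io::Result<Vec<PartitionRange>> {
    let mut index = Vec::with_capacity(partitions.len());
    // Start at the writer's current position so a spill can be appended.
    let mut offset = out.stream_position()?;
    for data in partitions {
        out.write_all(data)?;
        index.push(PartitionRange { offset, len: data.len() as u64 });
        offset += data.len() as u64;
    }
    out.flush()?;
    Ok(index)
}

/// Hypothetical: reads back one partition's bytes by seeking to its range.
pub fn read_partition<R: Read + Seek>(input: &mut R, range: PartitionRange) -> io::Result<Vec<u8>> {
    input.seek(SeekFrom::Start(range.offset))?;
    let mut buf = vec![0u8; range.len as usize];
    input.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical: merges all spills into the final shuffle output, partition by
/// partition, and returns the total byte length written for each partition.
pub fn merge_spills<R: Read + Seek, W: Write>(
    spills: &mut [(R, Vec<PartitionRange>)],
    num_partitions: usize,
    out: &mut W,
) -> io::Result<Vec<u64>> {
    let mut lengths = Vec::with_capacity(num_partitions);
    for p in 0..num_partitions {
        let mut total = 0u64;
        // Partition p from spill 0, then spill 1, and so on.
        for (file, index) in spills.iter_mut() {
            let bytes = read_partition(file, index[p])?;
            out.write_all(&bytes)?;
            total += bytes.len() as u64;
        }
        lengths.push(total);
    }
    out.flush()?;
    Ok(lengths)
}
```

The functions are generic over `Read`/`Write`/`Seek`, so the same code works with `std::fs::File` or with an in-memory `std::io::Cursor` when testing. The returned lengths are what a shuffle index file would need in order to locate each partition in the final output.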

Describe the potential solution

No response

Additional context

No response
