[VL] Gluten write more shuffle data than vanilla Spark #8833

@boneanxs

Backend

VL (Velox)

Bug description

For the same number of records, Gluten consistently writes much more shuffle data than vanilla Spark. Why? My understanding is that Gluten should use less storage, since its shuffle data is written in a columnar format.

gluten

Image

vanilla

Image

Spark version

Spark-3.2.x

Spark configurations

No response

System information

gluten version: 1.3.0

Relevant logs

Labels: bug, triage
