[VL] Spark 4.1: Support memory shuffle spill by size threshold (SPARK-49386) #11922

@baibaichen

Backend

VL (Velox)

Gluten version: main branch

Description

Spark 4.1 introduced memory-based shuffle spill thresholds (SPARK-49386, JIRA type: Improvement). The new spillSizeThreshold parameter enables spilling by data size rather than only by row count. Gluten's shuffle implementation does not support this threshold.

Spark 4.1 only.
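
The difference between a row-count trigger and a size trigger can be illustrated with a minimal sketch. This is not Spark's or Gluten's implementation: `SpillPolicy`, `max_rows`, `max_bytes`, and `count_spills` are hypothetical names chosen for illustration, and the `>=` comparison is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SpillPolicy:
    # Hypothetical thresholds; these are NOT Spark or Gluten config keys.
    max_rows: int
    max_bytes: int

    def should_spill(self, buffered_rows: int, buffered_bytes: int) -> bool:
        # Spill when either the row count or the buffered size reaches its limit.
        return buffered_rows >= self.max_rows or buffered_bytes >= self.max_bytes


def count_spills(row_sizes: Iterable[int], policy: SpillPolicy) -> int:
    """Feed rows of the given byte sizes into a buffer and count spills."""
    rows = 0
    size = 0
    spills = 0
    for row_size in row_sizes:
        rows += 1
        size += row_size
        if policy.should_spill(rows, size):
            spills += 1
            rows = 0  # the buffer is emptied after a spill
            size = 0
    return spills


# A few wide rows: a row-count-only policy never spills, a size-aware one does.
wide_rows = [40] * 5
row_only = SpillPolicy(max_rows=1000, max_bytes=10**9)
size_aware = SpillPolicy(max_rows=1000, max_bytes=100)
print(count_spills(wide_rows, row_only), count_spills(wide_rows, size_aware))
```

With a small number of very wide rows, the row-count limit is never reached, so only the size limit bounds memory use. That is the case a size threshold is meant to cover.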

Parent issue: #11910 ([VL] Spark 4.x: Tracking new feature support)

Impact

| Suite | Exclude | spark40 | spark41 |
| --- | --- | --- | --- |
| GlutenDataFrameWindowFunctionsSuite | SPARK-49386 spill | 🟢 | 🔴 |
| GlutenJoinSuite | SPARK-49386 SortMergeJoin spill | 🟢 | 🔴 |

Note: GlutenSQLWindowFunctionSuite has a pre-existing spill issue ("low buffer spill threshold") that is unrelated to SPARK-49386 and out of scope for this issue.

Labels: enhancement (New feature or request)