@xinrong-meng
Member

@xinrong-meng xinrong-meng commented Jul 20, 2023

What changes were proposed in this pull request?

Implement Arrow self_destruct for toPandas in Spark Connect to save memory.

Now the Spark configuration spark.sql.execution.arrow.pyspark.selfDestruct.enabled can be used to enable PyArrow’s self_destruct feature in Spark Connect, which can save memory when creating a Pandas DataFrame via toPandas by freeing Arrow-allocated memory while building the Pandas DataFrame.

Why are the changes needed?

Reach parity with vanilla PySpark. The PR is a mirror of #29818 for Spark Connect.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

@xinrong-meng xinrong-meng changed the title from "[WIP][SPARK-44486][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas" to "[SPARK-44486][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas" Jul 20, 2023
@xinrong-meng xinrong-meng marked this pull request as ready for review July 21, 2023 00:18
@xinrong-meng
Member Author

cc @BryanCutler

@xinrong-meng
Member Author

The failing check, "Run / Run Spark on Kubernetes Integration test", is unrelated to this PR.

@HyukjinKwon
Member

Merged to master and branch-3.5.

HyukjinKwon pushed a commit that referenced this pull request Jul 25, 2023
…ure for `toPandas`

Closes #42079 from xinrong-meng/self_destruct.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 78b3345)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>