
Conversation

@dongjoon-hyun
Member

What changes were proposed in this pull request?

This PR aims to support fallback storage clean-up during stopping SparkContext.

Why are the changes needed?

SPARK-33545 added Support Fallback Storage during worker decommission for managed cloud storages with TTL support; the TTL is usually one day. This PR adds an additional clean-up step during SparkContext stop, both to save storage costs before the TTL expires and to cover HDFS-compatible storages that have no TTL support at all.
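For illustration, a minimal sketch of enabling the feature end to end; the s3a path is a placeholder, and the surrounding decommission settings are the pre-existing configs rather than anything added by this PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup: enable executor/storage decommissioning, point
// fallback storage at an HDFS-compatible path (placeholder bucket), and
// opt in to the clean-up added by this PR. Note the trailing "/" on the
// path, which the config validation requires.
val conf = new SparkConf()
  .setAppName("fallback-storage-cleanup-demo")
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
  .set("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")
  .set("spark.storage.decommission.fallbackStorage.cleanUp", "true")

val sc = new SparkContext(conf)
// ... run jobs; shuffle blocks from decommissioned executors can be
// migrated to the fallback path ...
sc.stop() // with cleanUp=true, this app's fallback data is removed here
```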

Does this PR introduce any user-facing change?

Yes, but this is a new feature.

How was this patch tested?

Pass the newly added UT.

@github-actions github-actions bot added the CORE label Jan 17, 2021
@SparkQA

SparkQA commented Jan 17, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38740/

@SparkQA

SparkQA commented Jan 17, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38740/

@SparkQA

SparkQA commented Jan 17, 2021

Test build #134157 has finished for PR 31215 at commit 44f3106.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```scala
    .checkValue(_.endsWith(java.io.File.separator), "Path should end with separator.")
    .createOptional

private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_CLEANUP =
```
Contributor

looks fine

@dongjoon-hyun
Member Author

Thank you for the review, @yikf.

@dongjoon-hyun
Member Author

Could you review this please, @viirya?

```scala
    .createOptional

private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_CLEANUP =
  ConfigBuilder("spark.storage.decommission.fallbackStorage.cleanUp")
```
Member

BTW, should we plan to document this fallback storage somewhere, like https://spark.apache.org/docs/latest/configuration.html#kubernetes or https://spark.apache.org/docs/latest/job-scheduling.html#graceful-decommission-of-executors? That would be better done separately, though.

Member Author

Sure, I'll do that separately. Thanks.
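For reference, the review snippet above shows only the opening lines of the new config definition. Following Spark's usual ConfigBuilder pattern, the complete definition plausibly looks like the sketch below; the doc string, version, and default are assumptions, not copied from the PR:

```scala
private[spark] val STORAGE_DECOMMISSION_FALLBACK_STORAGE_CLEANUP =
  ConfigBuilder("spark.storage.decommission.fallbackStorage.cleanUp")
    .doc("If true, Spark cleans up its fallback storage data when SparkContext stops.")
    .version("3.2.0")
    .booleanConf
    .createWithDefault(false)
```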

@HyukjinKwon
Member

Looks pretty good to me

…cala

Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
@dongjoon-hyun
Member Author

Thank you for the review and approval, @HyukjinKwon. As for the last commit, this PR is tested only by FallbackStorageSuite, and I verified it manually as follows.

```
$ build/sbt "core/testOnly *.FallbackStorageSuite"
...
[info] FallbackStorageSuite:
[info] - fallback storage APIs - copy/exists (1 second, 42 milliseconds)
[info] - SPARK-34142: fallback storage API - cleanUp (12 milliseconds)
[info] - migrate shuffle data to fallback storage (239 milliseconds)
[info] - Upload from all decommissioned executors (5 seconds, 437 milliseconds)
[info] - Upload multi stages (3 seconds, 177 milliseconds)
[info] - lz4 - Newly added executors should access old data from remote storage (9 seconds, 535 milliseconds)
[info] - lzf - Newly added executors should access old data from remote storage (9 seconds, 487 milliseconds)
[info] - snappy - Newly added executors should access old data from remote storage (9 seconds, 389 milliseconds)
[info] - zstd - Newly added executors should access old data from remote storage (9 seconds, 485 milliseconds)
[info] ScalaTest
[info] Run completed in 48 seconds, 870 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[info] Passed: Total 9, Failed 0, Errors 0, Passed 9
[success] Total time: 69 s (01:09), completed Jan 17, 2021 4:52:45 PM
```
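Conceptually, the clean-up amounts to a recursive delete of the application's directory under the configured fallback root when the context stops. A standalone sketch of that idea using the Hadoop FileSystem API; the helper name and parameters are illustrative, not the PR's actual code:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative helper: remove the per-application directory under the
// fallback storage root, e.g. "s3a://my-bucket/spark-fallback/<appId>".
def cleanUpFallbackStorage(fallbackRoot: String, appId: String): Unit = {
  val appPath = new Path(fallbackRoot, appId)
  val fs = FileSystem.get(appPath.toUri, new Configuration())
  // Recursive delete; returns false if the path did not exist.
  fs.delete(appPath, true)
}
```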

@dongjoon-hyun
Member Author

Merged to master for Apache Spark 3.2.0.

@dongjoon-hyun deleted the SPARK-34142 branch January 18, 2021 00:54
Member

@viirya left a comment

lgtm too.

@dongjoon-hyun
Member Author

Thank you, @viirya! :)

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…SparkContext

### What changes were proposed in this pull request?

This PR aims to support fallback storage clean-up during stopping `SparkContext`.

### Why are the changes needed?

SPARK-33545 added `Support Fallback Storage during worker decommission` for managed cloud storages with TTL support; the TTL is usually one day. This PR adds an additional clean-up step during stopping `SparkContext`, both to save storage costs before the TTL expires and to cover HDFS-compatible storages that have no TTL support at all.

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new feature.

### How was this patch tested?

Pass the newly added UT.

Closes apache#31215 from dongjoon-hyun/SPARK-34142.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 415506c)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>