Skip to content

Conversation

@Ted-Jiang
Copy link
Member

Which issue does this PR close?

Closes #1780 .

Rationale for this change

Avoid work_dir quickly fill up disk space.

What changes are included in this PR?

like spark standalone mode, Executor periodic spawn a task to clean work_dir, if all the files in job_dir not modified in executor_cleanup_ttl seconds, it will be deleted.
https://spark.apache.org/docs/latest/spark-standalone.html spark.worker.cleanup.enabled

.context("Could not connect to scheduler")?;

let scheduler_policy = opt.task_scheduling_policy;
let cleanup_ttl = opt.executor_cleanup_ttl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it might be possible to use NamedTempFile or some other struct from tempfile (already a dependency): https://crates.io/crates/tempfile

There are at least two benefits:

  1. The files are dropped immediately after they are no longer used so required intermediate diskspace is minimized
  2. They can't be accidentally dropped while still in use (which perhaps affects long running queries)

I did something similar in DataFusion here: #1680

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb Thanks for your advice 😊!
IMHP, if one job has 3 stage, stage2 read stage1 input then delete the file, but stage2 task fail,
In ballista, scheduler will start a task to reload stage1 input. I think using NamedTempFile will cause some trouble and complexity.
we need keep the file for task-recovery and stage retry (like spark). So i decide if all the files under job_dir not modified in TTL we can safely delete it.
If i am not right, Please correct me 🙈

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't honestly know enough about Ballista and its executor to know.

What do you think @yahoNanJing and @liukun4515 ?

@realno
Copy link
Contributor

realno commented Feb 9, 2022

@Ted-Jiang this is a great step toward making Ballista production ready 👍 Thanks for making the change.

@Ted-Jiang
Copy link
Member Author

cc @houqp @yahoNanJing

@Ted-Jiang
Copy link
Member Author

cc @liukun4515

name = "executor_cleanup_enable"
type = "bool"
doc = "Enable periodic cleanup of work_dir directories."
default = "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set the default value to false and open it after a long time of testing?
@Ted-Jiang
If the default is false, this feature will not impact the current status. Users who want to use this feature can open it.
Do you agree? @alamb

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to put this feature in turned off by default so it can be tested more thoroughly


[[param]]
name = "executor_cleanup_ttl"
type = "i64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better to use u64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks fix !

@alamb
Copy link
Contributor

alamb commented Mar 4, 2022

Once someone who focuses on Ballista is happy with this PR I am happy to merge it

@liukun4515
Copy link
Contributor

Once someone who focuses on Ballista is happy with this PR I am happy to merge it

I think @Ted-Jiang will address the comments and please hold the status.
If I think it is ok, I will give my LGTM.

Ted-Jiang and others added 4 commits March 7, 2022 15:19
fix info

Co-authored-by: Kun Liu <liukun@apache.org>
fix info

Co-authored-by: Kun Liu <liukun@apache.org>
fix

Co-authored-by: Kun Liu <liukun@apache.org>
Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Ted-Jiang
LGTM

@alamb alamb merged commit 432861e into apache:master Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable periodic cleanup of work_dir directories in ballista executor

5 participants