-
-
Notifications
You must be signed in to change notification settings - Fork 748
Do not use tempfile in utils_test #6651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use tempfile in utils_test #6651
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 10h 20m 15s ⏱️ - 12m 26s For more details on these failures, see this check. Results for commit 104d1ec. ± Comparison against base commit bd7f43a. |
hendrikmakait
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good to me. Regarding the concerns: Given that CI looks green, I'd say we risk it. Those problems, if existent, should be fairly easy to spot.
|
There are a few reasons why I introduced the top-level temporary directory in #6335:
I understand that CI is failing because the tests are leaking open file handles, but on Windows you can't delete a file if there are open file handles to it. I feel that this PR just sweeps the problem under the carpet - I'd much rather have proper cleanup, or even just a "retry untl timeout" design in the diskutils teardown. Thinking about it again now, n.2 above (local test failures due to open file handles to an NFS directory) are problably the same issue as the failures on Windows CI. |
|
|
||
| with dask.config.set( | ||
| { | ||
| "local_directory": tempfile.gettempdir(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crusaderky I'm still setting the local_directory option to this value, e.g. on my machine this yields
In [2]: import tempfile
In [3]: tempfile.gettempdir()
Out[3]: '/var/folders/h0/kd1gdptx7gzb6cbywplfrx1r0000gn/T'
which is the same location as TMPDIR so on linux you'll likely get a /tmp directory.
▶ echo $TMPDIR
/var/folders/h0/kd1gdptx7gzb6cbywplfrx1r0000gn/T/
when using tempfile.TemporaryDirectory it will also create a directory in this folder. The only actual change I am introducing is that the gen_cluster util will not actually create the directory and own the directory but instead the workers will.
We should get all the same benefits from this as with the TemporaryDirectory but we will not collide during exit/cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diskutils is
- creating the directory
distributed/distributed/diskutils.py
Lines 44 to 51 in ff250f2
def __init__(self, workspace, name=None, prefix=None): assert name is None or prefix is None if name is None: self.dir_path = tempfile.mkdtemp(prefix=prefix, dir=workspace.base_dir) else: self.dir_path = os.path.join(workspace.base_dir, name) os.mkdir(self.dir_path) # it shouldn't already exist - creating a bunch of lock files, e.g. global and purge lock and a directory specific lock
distributed/distributed/diskutils.py
Line 62 in ff250f2
self._lock_file = locket.lock_file(self._lock_path) - Most importantly, it is also cleaning up the workspace again which may conflict with the
distributed/distributed/diskutils.py
Lines 93 to 109 in ff250f2
def release(self): """ Dispose of this directory. """ self._finalizer() @classmethod def _finalize(cls, workspace, lock_path, lock_file, dir_path): try: workspace._purge_directory(dir_path) finally: if lock_file is not None: lock_file.release() if lock_path is not None: workspace._known_locks.remove(lock_path) safe_unlink(lock_path) TemporaryDirectorycontext manager on exit, particularly when it is called as a finalizer
In a follow up step, I also think we can think of disabling purging on CI since we don't really care about his but this may significantly slow us down (e.g. introduce a config option)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worker is initializing WorksSpace and directory here
distributed/distributed/worker.py
Lines 570 to 585 in ff250f2
| if not local_directory: | |
| local_directory = dask.config.get("temporary-directory") or os.getcwd() | |
| os.makedirs(local_directory, exist_ok=True) | |
| local_directory = os.path.join(local_directory, "dask-worker-space") | |
| with warn_on_duration( | |
| "1s", | |
| "Creating scratch directories is taking a surprisingly long time. ({duration:.2f}s) " | |
| "This is often due to running workers on a network file system. " | |
| "Consider specifying a local-directory to point workers to write " | |
| "scratch data to a local disk.", | |
| ): | |
| self._workspace = WorkSpace(os.path.abspath(local_directory)) | |
| self._workdir = self._workspace.new_work_dir(prefix="worker-") | |
| self.local_directory = self._workdir.dir_path |
where we read the
temporary-directory variable in. the os.getcwd() there should probably be replaced with a tempfile.gettempdir() as well
|
Interestingly, this PR seemed to have fixed It also had an effect on |


I realized we don't actually need the TemporaryDirectory here since the
diskutilsare taking care of cleanup.The diff is mostly reindent. I didn't change any logic of gen_cluster
This could be responsible for many of our CI failures lately since I expect there to be a problem if the workdir/diskutils is using the finalize
distributed/distributed/diskutils.py
Lines 99 to 109 in bd7f43a
This should be safe to be called if there is no directory anymore, but I don't know for sure.
If the worker is closing gracefully, we call this
distributed/distributed/worker.py
Line 1524 in bd7f43a
I don't think any of this should matter but I am not absolutely sure. If any of these or the tempdir implementation doesn't handle empty directories well, this could cause conflicts