mkdirs should set mode correctly #28367
With Path.mkdir, if parents is true, any missing parents of this path are created as needed; they are created with the default permissions without taking mode into account. See: https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir
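A minimal sketch of the documented behaviour, using only the standard library. The directory names, the 0o700 mode and the pinned umask of 0o022 are illustrative assumptions; the function name `demo_parents_mode` is hypothetical.

```python
import os
from pathlib import Path
from tempfile import TemporaryDirectory


def demo_parents_mode() -> tuple[str, str]:
    """Create parent/leaf with mode=0o700 and report both permission sets."""
    old_umask = os.umask(0o022)  # pin the umask so the result is reproducible
    try:
        with TemporaryDirectory() as tmpdir:
            parent = Path(tmpdir) / "parent"
            leaf = parent / "leaf"
            # mode is applied to the leaf only; the missing parent is
            # created with the default permissions (0o777 minus the umask).
            leaf.mkdir(mode=0o700, parents=True)
            parent_mode = parent.stat().st_mode & 0o777
            leaf_mode = leaf.stat().st_mode & 0o777
            return oct(parent_mode), oct(leaf_mode)
    finally:
        os.umask(old_umask)  # always restore the caller's umask


print(demo_parents_mode())
```

On a POSIX system with this umask, the parent should come out as 0o755 and the leaf as 0o700, which is the gap between "mode requested" and "mode obtained" that the description refers to.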
|
It may not be a good idea to reuse |
|
In additional |
@Taragolis it is true, it is not used. it was replaced by
However, it is incorrect since the @uranusjr the main gap here is the about the dir mode, see this code block: the |
@uranusjr i think the new implementation is no-op when the dir already exists. it is actually from the previous implementation, e.g. 1.10.15 Lines 55 to 73 in 5786dcd
|
|
This implementation completely ignored umask for subdirectories, see:

    import os
    from tempfile import TemporaryDirectory
    from pathlib import Path

    from airflow.utils.file import mkdirs


    def cur_umask() -> str:
        # os.umask() can only be read by setting it, so set a temporary
        # value and restore the original immediately.
        tmp = os.umask(0o022)
        os.umask(tmp)
        return oct(tmp)


    def sample():
        print(f" Current umask: {cur_umask()} ".center(72, "="))
        with TemporaryDirectory() as tmpdir:
            sub_path = Path(tmpdir) / "child1"
            path = sub_path / "child2"
            mkdirs(path, 0o770)
            print(f"{sub_path} mode {oct(os.stat(sub_path).st_mode)}")
            print(f"{path} mode {oct(os.stat(path).st_mode)}")


    # Run with the default umask of the process.
    sample()
    # child1 mode 0o40777
    # child2 mode 0o40770

    os.umask(0)
    sample()
    # child1 mode 0o40777
    # child2 mode 0o40770

    os.umask(0o002)
    sample()
    # child1 mode 0o40777
    # child2 mode 0o40770
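The sample above needs Airflow installed. The following is a self-contained sketch of the same effect, under the assumption that the deprecated helper temporarily sets the umask to 0 and delegates to `os.makedirs`. `legacy_style_mkdirs` and `modes_after_mkdirs` are hypothetical stand-ins written for this illustration, not Airflow's code.

```python
import os
from pathlib import Path
from tempfile import TemporaryDirectory


def legacy_style_mkdirs(path, mode: int) -> None:
    """Hypothetical stand-in: create path (and parents) with the umask cleared."""
    old_umask = os.umask(0)
    try:
        # Since Python 3.7, os.makedirs applies mode to the leaf only;
        # intermediate directories get the default 0o777 (minus the umask).
        os.makedirs(path, mode, exist_ok=True)
    finally:
        os.umask(old_umask)  # restore the caller's umask


def modes_after_mkdirs(mode: int) -> tuple[str, str]:
    """Return the permission bits of child1 and child1/child2."""
    with TemporaryDirectory() as tmpdir:
        child1 = Path(tmpdir) / "child1"
        child2 = child1 / "child2"
        legacy_style_mkdirs(child2, mode)
        return (
            oct(child1.stat().st_mode & 0o777),
            oct(child2.stat().st_mode & 0o777),
        )


print(modes_after_mkdirs(0o770))
```

Because the umask is cleared while the directories are created, the intermediate directory ends up world-writable (0o777) regardless of the caller's umask, while only the leaf receives the requested 0o770.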
|
@Taragolis thanks for the code sample, it boils down to the expectation that when users use this method To me, it is a little bit confusing if i want a folder with |
|
I think there is far too much confusion here in general. I'd rather say that having a "mkdir" which behaves differently from the Python and POSIX standard is confusing. This was a decision made by Python's creators, and while we might argue with it, it has a good reason and it is explained very clearly and explicitly in the Python docs https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir - so you might say a lot of users will be confused if Airflow's mkdir behaves differently.

I'd say having a different behaviour will only add to the confusion (umask behaviour is sufficiently complex without it). I would also say that if we want to change the behaviour in specific places and we have a good reason for it (do we?), undeprecating a method that was deprecated more than 2 years ago (#10117), which would add even more confusion, is a bad idea. Now we want to add a new mkdirs util method that not only creates parents by default (contrary to the system lib), but also does it differently. That's super-confusing, and there is not even an attempt to explain it in the docs of the newly added method. If we want to propose a change in the behaviour of some places where Path.mkdir(parents=True) is currently used, then it should be added as a new feature rather than silently changing it back. This ship has sailed. And rather than adding a similarly named function to utils, it should have a different name and signature to make it clear that it is not similar to mkdirs behaviour.

But this PR does not do that (it does not use the newly added method), so I am rather sceptical whether we want to add it at all, because it seems there is no intention to use it? If we just want to add a new util which is not used in Airflow anyway, why do we want to add it at all? If it's not used, there is no reason to add it. YAGNI. If it "just" needs to be used elsewhere, outside of Airflow (I guess AirBnB uses it), it should be added as a library there, not in Airflow.

We were just now discussing being very explicit about what the Public Airflow Interface is (#28300), and the idea there is to be very explicit about what is "public" and treat all the rest as "internal detail". Adding, effectively (after the old one has been deprecated for 2.5 years), a new util API which is unused makes very little sense to me. We only want to expose as external interface things that are "extremely" important to be exposed. We do not want others relying on random utils and small things in Airflow, because it makes it more difficult to maintain Airflow in the future.

Are there any particular uses of it in Airflow, @pingzh, that you want to add now or soon (and if so, why is it not part of this PR)? |
|
@potiuk thanks for chiming in. I agree with that:
A better method name with a very clear expectation is very important. As for the usage of this method, I was planning to update places that use For example, the place that I plan to change is airflow/airflow/utils/log/file_task_handler.py Lines 324 to 346 in b263dbc
To me, the mode of |
|
This is the only place in the project where we tried to set And a fix (if it is actually required) could be |
|
@Taragolis @potiuk it looks like it's better just to update the |
|
Yep. If this issue still exists, it is better to fix it. It could be that the initial issue was solved but the comments still exist. |
|
+1 |
|
close this in favor of #28477 |
With Path.mkdir, if mode is given, it is combined with the process’ umask value to determine the file mode and access flags. If parents is true, any missing parents of this path are created as needed; they are created with the default permissions without taking mode into account. However, in the mkdirs, it specifies the umask is ignored.
This mkdirs was replaced by Path(xx).mkdir in the current codebase, e.g. airflow/airflow/utils/log/file_task_handler.py
Line 345 in b263dbc
But I think we should use mkdirs, as the Path(xx).mkdir does not set the mode correctly, since it factors in the umask, see this code:
e.g. code:
The output is:
mode is 755, which is not 777
see: https://docs.python.org/3/library/pathlib.html#pathlib.Path.mkdir
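A rough sketch of the effect described above, assuming a umask of 0o022 (the reported 755 is consistent with that value, but the actual umask is not stated in the description). The directory name `logs` and the function `mkdir_then_chmod` are hypothetical, and the explicit `chmod` is shown only as one possible workaround, not necessarily what the thread settled on.

```python
import os
from pathlib import Path
from tempfile import TemporaryDirectory


def mkdir_then_chmod(mode: int = 0o777, umask: int = 0o022) -> tuple[str, str]:
    """Return the directory mode right after mkdir and again after chmod."""
    old_umask = os.umask(umask)  # assumed umask, pinned for reproducibility
    try:
        with TemporaryDirectory() as tmpdir:
            target = Path(tmpdir) / "logs"
            target.mkdir(mode=mode)  # requested mode is masked by the umask
            after_mkdir = oct(target.stat().st_mode & 0o777)
            target.chmod(mode)  # chmod is not subject to the umask
            after_chmod = oct(target.stat().st_mode & 0o777)
            return after_mkdir, after_chmod
    finally:
        os.umask(old_umask)  # restore the caller's umask


print(mkdir_then_chmod())
```

Under these assumptions the first value should be 0o755 and the second 0o777, which matches the "755, which is not 777" observation.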