Don't resolve symbolic links for dag_directory #42142
I think part of the problem is that we So we need to find a way to retain the original symbolic link when we are parsing the files - but that one might be tricky, as it will depend on how you actually got to the folder. Another option is to only compare the "relative" parts of the links. The thing is that - at least currently - fileloc / path will always be So it's ... tricky.
    """Return the dag_director as a string."""
    if isinstance(self._dag_directory, Path):
        return str(self._dag_directory.resolve())
    # we save dag.fileloc without resolving the symlink in the db, we should be consistent in resolving and
Should we do this in the deactivate-stale-DAGs code instead of here?
Should we revert this change and target the fix for another release?
If we can't find a solution quickly, likely yes. But I think what we should do is recognize that the abspath can contain ".worktrees", map it back to the symbolic link name, and store that in both fileloc and directory. It will likely need some special-case handling: get the DAG_FOLDER value and, if the path does not start with DAG_FOLDER, do something like (pseudo-code):

    if not path.startswith(DAG_FOLDER):
        # symbolic link for git-sync
        split_the_paths
        see where they start to differ (/opt/airflow/dags/ is common)
        see that the next two follow ".worktrees" + "looks_like_commit_hash"
        replace ".worktrees" + "commit_hash" with whatever next folder is in the DAG_FOLDER after /opt/airflow/dags

That should work if we assume (which I think is pretty valid) that we have git-sync. It will not work, however, if someone has symbolic links in their repos that were not created by git-sync. If there are other custom solutions where someone uses symlinks to switch between folders more "manually", it might not work either. Though I have not looked in detail yet (I am at Airflow Summit) at why we get the problem in the first place, i.e. why dag_directory is "resolved". Maybe we can avoid it in general?
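The pseudo-code above could be sketched in Python roughly as follows. This is only an illustration, not code from Airflow: the function name `map_worktree_to_link`, the commit-hash regex, and the assumption that git-sync checkouts live under `.worktrees/<commit hash>` next to the symlink are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: a git-sync checkout directory is named after a hex commit hash.
_COMMIT_HASH = re.compile(r"^[0-9a-f]{7,40}$")


def map_worktree_to_link(path: str, dag_folder: str) -> str:
    """Rewrite a resolved git-sync path so it goes through the DAG_FOLDER symlink."""
    p_parts = PurePosixPath(path).parts
    d_parts = PurePosixPath(dag_folder).parts

    # Nothing to do when the path is already under DAG_FOLDER.
    if p_parts[: len(d_parts)] == d_parts:
        return path

    # Find where the two paths start to differ.
    common = 0
    for a, b in zip(p_parts, d_parts):
        if a != b:
            break
        common += 1

    rest = p_parts[common:]
    looks_like_worktree = (
        len(rest) >= 2
        and rest[0] == ".worktrees"
        and _COMMIT_HASH.match(rest[1]) is not None
    )
    if not looks_like_worktree or common >= len(d_parts):
        return path  # not a layout this sketch recognises; leave it untouched

    # Replace ".worktrees/<hash>" with the next DAG_FOLDER component (the symlink name).
    new_parts = p_parts[:common] + (d_parts[common],) + rest[2:]
    return str(PurePosixPath(*new_parts))
```

As the comment itself notes, this only helps for git-sync style layouts; a path that goes through any other kind of symlink is returned unchanged.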
This looks more complicated than I initially thought. I agree with @ephraimbuddy that we should revert this PR and proceed with the release. That way we have more time to fix the issue properly.
I was also curious why we do it differently for fileloc and dag_directory. I didn't see anything in the initial PR that introduced this method. Also, the scheduler command doesn't resolve the symbolic links initially.
First this was planned for 2.10.2, then 2.10.3, now 2.11.0. Why?
I guess because it's difficult and not a high priority. But if you would like to step in and attempt it, maybe you could help with it, @belongwqz?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
related: #42111
With the recent fix we added a check that compares dag.fileloc and dag_directory. However, we never resolve the symbolic links in dag.fileloc, but we do resolve them for dag_directory. In Helm charts that use gitSync, we use symbolic links like:
Which results in a discrepancy:
and leads to failure of the check added in the PR. This PR aims to resolve this issue by ignoring the symbolic links in the dag_directory path.
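To make the discrepancy concrete, here is a small self-contained sketch that builds a git-sync-like layout in a temporary directory. The directory names (`.worktrees/abc123`, `repo`) and the prefix comparison are assumptions chosen for illustration; they stand in for the real layout and the real check, which are not reproduced here.

```python
import os
import tempfile
from pathlib import Path


def demo_symlink_discrepancy() -> dict:
    """Compare an unresolved file path against resolved and unresolved directories."""
    root = Path(tempfile.mkdtemp())

    # Hypothetical layout: <root>/.worktrees/abc123 is the real checkout,
    # and <root>/repo is a symlink pointing at it.
    real = root / ".worktrees" / "abc123"
    real.mkdir(parents=True)
    (real / "my_dag.py").write_text("# placeholder DAG file\n")
    link = root / "repo"
    link.symlink_to(real, target_is_directory=True)

    fileloc = str(link / "my_dag.py")   # kept unresolved, like dag.fileloc
    resolved_dir = str(link.resolve())  # symlink followed
    unresolved_dir = str(link)          # symlink kept

    return {
        "fileloc": fileloc,
        "resolved_dir": resolved_dir,
        "unresolved_dir": unresolved_dir,
        # Assumed stand-in for the check: is the file under the directory?
        "matches_resolved": fileloc.startswith(resolved_dir + os.sep),
        "matches_unresolved": fileloc.startswith(unresolved_dir + os.sep),
    }
```

With the directory resolved, the unresolved file path no longer appears to live under it; keeping both sides unresolved makes the comparison consistent.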