Skip to content

Conversation

@aleksandr-shel
Copy link
Contributor

• this commit allows users to write dag_id and task_id in their national characters, which are usually non-ASCIIs as the display names of their dags.
• dag_id and task_id will be modified to ASCII version with the usage of slugify and stored in dag_id or task_id fields of dag and task
• unmodified version (possibly non-ASCII) of dag_id and task_id will be stored in display_name fields of dag and task
• related: #22073

@boring-cyborg boring-cyborg bot added area:serialization area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues labels Dec 7, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Dec 7, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@potiuk
Copy link
Member

potiuk commented Dec 30, 2022

I like how the implementation is started (and I think this one is needed). Particularly, I like it's going to be 100% backwards compatible and will only impact new DAGs with new DAG_IDs.

I know however that there are people who have rather strong opinion on the subject so I will leave it for others to comment @ashb @uranusjr @dstandish @eladkal - I think it would be good to get your comments here.

And I would like to get more comments, before we decide to progress with this idea because there are a LOT more to be done in this PR in order to be approved.

But if we do, the more changes needed are I believe:

  • track down and update all the places where dag_id is displayed in the UI. I have a feeling that the current proposal is a very small subset of those and we cannot affort inconsistently displaying slugified/non-slugified names. And we should make a decision whether we display only the display id or maybe both of them.

  • IMHO we need to make sure that you can use either slugified or non-slugified dag_id (i.e. display name) everywhere where there is a user interaction involved: CLI, API, possibly also in the places where "public interface" is used - for example when there is an XCom retrieval). Otherwise we risk a lot of confusion from the user's point of view. It's extremely bad idea to expect the users to have to manually run slugify (and know how to do it) when they want to query or interface with dag that they only know from their display name. There might be more cases where this aspect should be considered (and if we decide to everywhere display both - display name and id in case they differ, this one might be not needed.

  • We likely need to handle and detect the case where two different non-slugified dag_ids produce the same slugified dag_ids. This will become much more difficult to track if we do slugification. Likely DAGFileProcessor should detect such case (by comparing display_name and dag_id) and if there is an existing dag_id with different display name, detected a new kind of error should be displayed for the users (similar to but different from ImportError)

All those should have comprehensive unit tests covering all the edge cases there before we approve it anyway.

@uranusjr
Copy link
Member

uranusjr commented Jan 3, 2023

I like the general direction this takes. The only part I don’t like is it silently slugifies the IDs. This does create some backward incompatibility issues (can be eliminated if we only slugify if the ID is not valid as-is), but more importantly, can result in som weird usability issues, such as ID conflicts when there are no actually conflicting IDs (because the slugified IDs conflict). This could be improved by improving error messages to report the original (user-supplied) value instead of the slugified one. Another further issue with this is it won’t be possible for the user to work around this slugified ID conflict. One possible solution would be to make the API more explicit, such as:

# This would generate an auto slugified ID.
DAG(dag_name="いろはにほへと")

# But I can supply my own ID.
DAG(dag_name="いろはにほへと", dag_id="my_dag")

# This automatically sets the display name to the same as the ID,
# and still adheres to the current ID rules.
DAG(dag_id="my_dag")

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only part I don’t like is it silently slugifies the IDs

Agreed. And for that reason I am vetoing the PR as it stands

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 27, 2023
@github-actions github-actions bot closed this Mar 5, 2023
@xgao1023
Copy link
Contributor

Is slugifying the IDs the only concern?

If so I will be happy to remove that part - task/dag ID should just work as before - either explicitly provided or from the function name of task flow.

I will be happy to continue the work if it makes sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:serialization area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants