Birne94 commented Feb 19, 2024

This PR adds secret caching for the Airflow CLI. Secret caching currently only works for the DAG-processing job and is ignored when DAG files are parsed through the CLI. As a result, the Airflow CLI responds slowly for DAGs that use top-level variables.

By initializing the cache as part of the CLI startup, we ensure the cache is properly enabled for parsing. If the user did not configure caching explicitly, the added cache initialization is a no-op and the behavior is the same as before.
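A minimal sketch of the idea, assuming a heavily simplified, hypothetical `main()` and a stand-in `SecretCache` that reads a hypothetical in-memory `CONFIG` dict instead of Airflow's real configuration. It only illustrates the ordering: the cache is initialized once at CLI startup, before any subcommand parses DAG files, and the call does nothing unless caching is enabled.

```python
from __future__ import annotations

import argparse

# Hypothetical in-memory stand-in for the [secrets] use_cache option.
CONFIG: dict[str, dict[str, bool]] = {"secrets": {"use_cache": False}}


class SecretCache:
    """Simplified stand-in for Airflow's SecretCache (an assumption, not the real class)."""

    _cache: dict[str, str] | None = None

    @classmethod
    def init(cls) -> None:
        # No-op unless the user explicitly enabled caching.
        if not CONFIG["secrets"]["use_cache"]:
            return
        if cls._cache is None:
            cls._cache = {}

    @classmethod
    def is_enabled(cls) -> bool:
        return cls._cache is not None


def main(argv: list[str]) -> str:
    """Hypothetical, heavily simplified CLI entry point."""
    parser = argparse.ArgumentParser(prog="airflow")
    parser.add_argument("subcommand")
    args = parser.parse_args(argv)
    # Initialize the cache once at startup, before any subcommand parses DAG files.
    SecretCache.init()
    return args.subcommand
```

With `use_cache` left at `False`, running `main(["dags"])` leaves the cache uninitialized, which matches the "no-op unless configured" behavior described above.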

closes: #37543

I did not add a test case yet because I could not find an existing test for __main__.main(). I am open to suggestions on how to properly test the added caching; however, since this is a very small change that does not alter behavior unless caching is explicitly configured, we may be fine without one.
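One possible shape for such a test, sketched with `unittest.mock` against hypothetical stand-ins for `SecretCache` and `__main__.main()`. In the real test suite, the patch target would be the `SecretCache` that `airflow/__main__.py` actually imports, and the rest of `main()`'s startup work would likely need mocking as well; neither is shown here.

```python
from unittest import mock


class SecretCache:
    """Hypothetical stand-in for Airflow's SecretCache."""

    @classmethod
    def init(cls) -> None:
        """Would create the cache if [secrets] use_cache is enabled."""


def main() -> None:
    """Hypothetical stand-in for __main__.main() with the proposed change."""
    # ... argument parsing and config setup would happen here ...
    SecretCache.init()


def test_main_initializes_secret_cache() -> None:
    # Replace init() with a mock so the test only checks that main() calls it.
    with mock.patch.object(SecretCache, "init") as init_mock:
        main()
    init_mock.assert_called_once_with()
```

Because the test only asserts that `init()` is called, it stays independent of whether caching is actually enabled in the test environment.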



boring-cyborg bot commented Feb 19, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Birne94 marked this pull request as ready for review February 19, 2024 19:39
conf = write_default_airflow_configuration_if_needed()
if args.subcommand in ["webserver", "internal-api", "worker"]:
    write_webserver_configuration_if_needed(conf)

Contributor

Why can't we use AIRFLOW__SECRETS__USE_CACHE instead of explicitly adding it to the CLI?

Author

Secret caching is currently implemented so that the SecretCache class is always used for lookups. If caching is not explicitly enabled, the cache is never initialized and is essentially a no-op.

The problem is that SecretCache.init() is only called in the DAG-processing job. This means that if we configure caching and process the DAG there, it works as expected. However, if we load a DAG from anywhere else (e.g. the CLI), our caching configuration is ignored and cache creation is skipped.

With the change I suggest, the secrets.use_cache option will be honored from the CLI and caching will be enabled as expected.
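The distinction between setting the option and initializing the cache can be sketched as follows. This is a simplified stand-in, not Airflow's implementation: `conf_getboolean` here is a hypothetical helper that only mimics the `AIRFLOW__{SECTION}__{KEY}` environment-variable convention, and the real `SecretCache` differs in detail (for example in how it reports a cache miss). What it illustrates is the point above: `AIRFLOW__SECRETS__USE_CACHE` only sets a flag, and that flag has no effect until something calls `init()`.

```python
from __future__ import annotations

import os


def conf_getboolean(section: str, key: str, fallback: bool = False) -> bool:
    """Hypothetical, simplified config lookup using the AIRFLOW__{SECTION}__{KEY} convention."""
    raw = os.environ.get(f"AIRFLOW__{section.upper()}__{key.upper()}")
    if raw is None:
        return fallback
    return raw.strip().lower() in {"t", "true", "1"}


class SecretCache:
    """Stand-in: the use_cache flag is only consulted inside init()."""

    _cache: dict[str, str] | None = None

    @classmethod
    def init(cls) -> None:
        # Reads the option; without this call the cache never comes into existence.
        if conf_getboolean("secrets", "use_cache") and cls._cache is None:
            cls._cache = {}

    @classmethod
    def save_variable(cls, key: str, value: str) -> None:
        if cls._cache is not None:  # silently skipped while uninitialized
            cls._cache[key] = value

    @classmethod
    def get_variable(cls, key: str) -> str | None:
        if cls._cache is None:
            return None  # uninitialized: every lookup is a miss
        return cls._cache.get(key)


if __name__ == "__main__":
    os.environ["AIRFLOW__SECRETS__USE_CACHE"] = "True"
    SecretCache.save_variable("my_var", "42")
    print(SecretCache.get_variable("my_var"))  # None: option set, but init() never ran
    SecretCache.init()
    SecretCache.save_variable("my_var", "42")
    print(SecretCache.get_variable("my_var"))  # 42
```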

Comment on lines 59 to 60
# Some tasks require parsing DAG files, so we enable secret caching for improved performance.
# If the user did not configure secret caching, this action is a no-op.
Member

The comment is too vague. What does “some” refer to, and why does secret caching affect DAG-parsing performance?

Author

I updated the comment, does it make more sense now?
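The performance point behind this thread can be sketched with a self-contained example, assuming a hypothetical secrets backend that counts its lookups and a simple TTL cache (an illustration, not Airflow's actual cache). A top-level `Variable.get()` runs every time the file is parsed, so without a cache each parse costs one backend round trip; with a cache, repeated parses within the TTL reuse the stored value.

```python
from __future__ import annotations

import time
from typing import Callable


class CountingBackend:
    """Hypothetical secrets backend; each lookup stands in for a network round trip."""

    def __init__(self, values: dict[str, str]) -> None:
        self.values = values
        self.calls = 0

    def get_variable(self, key: str) -> str | None:
        self.calls += 1
        return self.values.get(key)


class TTLCache:
    """Minimal time-based cache (an illustration, not Airflow's SecretCache)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[str | None, float]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], str | None]) -> str | None:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0]  # fresh hit: no backend call
        value = fetch(key)
        self._entries[key] = (value, now)
        return value


def parse_dag_file(backend: CountingBackend, cache: TTLCache | None = None) -> str | None:
    """Simulates a DAG file whose top-level code reads a Variable on every parse."""
    if cache is None:
        return backend.get_variable("my_var")
    return cache.get_or_fetch("my_var", backend.get_variable)
```

Parsing the simulated file ten times without a cache makes ten backend calls; with `TTLCache(ttl_seconds=900)` it makes one, because every later parse falls within the TTL.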

Contributor

Taragolis left a comment

Following the discussion on the dev list and in the PR, it was decided to enable it only for the DagManager processor.

Adding this change to the main CLI method would enable it everywhere in Airflow.

I would suggest opening a new discussion via https://airflow.apache.org/community/

To prevent an accidental merge, I am marking this as "request changes" until the proposal has been discussed with a wider audience.

github-actions bot commented Apr 7, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Apr 7, 2024
github-actions bot closed this Apr 13, 2024
Development

Linked issue: Secret caching is skipped when DAGs are parsed in Airflow CLI

4 participants