Skip to content

Conversation

@mik-laj
Copy link
Member

@mik-laj mik-laj commented Nov 17, 2019

Currently, CLI loads all packages even when the command is not used in any way, which takes CPU time and memory. I suggest that only commands that are executed be loaded. This can speed up task launch time and also improve user productivity.

Quick benchmark:
Test:

seq 1 20 | xargs -n 1 -I {} time airflow version 2>&1 | grep sys

Before;

        2.03 real         1.83 user         0.34 sys
        2.07 real         1.84 user         0.34 sys
        2.01 real         1.82 user         0.34 sys
        2.03 real         1.83 user         0.34 sys
        2.04 real         1.85 user         0.34 sys
        2.04 real         1.85 user         0.34 sys
        2.03 real         1.83 user         0.34 sys
        2.07 real         1.84 user         0.35 sys
        2.01 real         1.82 user         0.34 sys
        2.06 real         1.85 user         0.35 sys
        2.03 real         1.82 user         0.35 sys
        2.05 real         1.83 user         0.35 sys
        2.03 real         1.84 user         0.34 sys
        2.02 real         1.83 user         0.34 sys
        2.06 real         1.84 user         0.34 sys
        2.05 real         1.84 user         0.34 sys
        2.20 real         1.91 user         0.38 sys
        2.15 real         1.91 user         0.37 sys
        2.13 real         1.90 user         0.36 sys
        2.25 real         1.99 user         0.39 sys

After:

        1.50 real         1.23 user         0.20 sys
        1.51 real         1.24 user         0.19 sys
        1.53 real         1.25 user         0.20 sys
        1.49 real         1.23 user         0.20 sys
        1.51 real         1.25 user         0.20 sys
        1.53 real         1.24 user         0.20 sys
        1.55 real         1.27 user         0.20 sys
        1.50 real         1.24 user         0.20 sys
        1.50 real         1.23 user         0.20 sys
        1.50 real         1.24 user         0.19 sys
        1.49 real         1.23 user         0.19 sys
        1.51 real         1.25 user         0.20 sys
        1.55 real         1.25 user         0.21 sys
        1.52 real         1.23 user         0.20 sys
        1.54 real         1.26 user         0.20 sys
        1.50 real         1.24 user         0.20 sys
        1.49 real         1.23 user         0.19 sys
        1.50 real         1.23 user         0.20 sys
        1.49 real         1.23 user         0.20 sys
        1.52 real         1.24 user         0.20 sys

Now airflow version command executes 25% faster.


Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-6001
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@mik-laj mik-laj force-pushed the cli-split branch 2 times, most recently from 1c2072f to 911910e Compare November 17, 2019 17:02
@mik-laj mik-laj requested review from ashb, kaxil and potiuk November 17, 2019 17:19
@codecov-io
Copy link

codecov-io commented Nov 17, 2019

Codecov Report

Merging #6594 into master will decrease coverage by 0.22%.
The diff coverage is 65.98%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6594      +/-   ##
==========================================
- Coverage   83.69%   83.46%   -0.23%     
==========================================
  Files         650      669      +19     
  Lines       37427    37560     +133     
==========================================
+ Hits        31325    31350      +25     
- Misses       6102     6210     +108
Impacted Files Coverage Δ
airflow/__init__.py 95.83% <ø> (ø) ⬆️
airflow/cli/commands/rotate_fernet_key_command.py 0% <0%> (ø)
airflow/cli/commands/serve_logs_command.py 0% <0%> (ø)
airflow/cli/commands/scheduler_command.py 0% <0%> (ø)
airflow/cli/commands/kerberos_command.py 0% <0%> (ø)
airflow/cli/commands/flower_command.py 0% <0%> (ø)
airflow/api/client/__init__.py 100% <100%> (ø)
airflow/cli/commands/role_command.py 100% <100%> (ø)
airflow/models/__init__.py 100% <100%> (ø) ⬆️
airflow/cli/commands/version_command.py 100% <100%> (ø)
... and 47 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0acb034...b40e73e. Read the comment docs.

Copy link
Member

@turbaszek turbaszek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good but it's a big change. I like the approach, one thing I suggest is changing stuff_command.py to stuff_commands.py because often there are multiple commands in one file.

@mik-laj
Copy link
Member Author

mik-laj commented Nov 17, 2019

@nuclearpinguin Sometimes there is one command and sometimes there are several commands. It depends if it is a single command or a subcommand. Subcommend is a command that lets you invoke other commands, but it is still a command, although it groups other commands, so I'd rather stay with the current prefix.

This is a big change, but unfortunately it is very difficult to divide it into several PRs, because there will be a lot of conflicts in imports.I divided it very carefully into several commits so that everyone can do a review.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is indeed worth merging it fast. This is mainly restructuring/refactoring and moving code around and I think @mik-laj has done a reasonable job of splitting it into separate moves. I looked through it and I see that those changes make perfect sense for the future CLI changes and it's worth to take one-time pain of splitting this all up. But since it is such a huge change - maybe others @ashb @kaxil @KevinYang21 @Fokko @feluelle ? (and others) can take a look and see if they have no problem with merging such a big change.

@ashb
Copy link
Member

ashb commented Nov 18, 2019

@mik-laj Did you reverse before and after in your PR description?

@mik-laj
Copy link
Member Author

mik-laj commented Nov 18, 2019

@ashb yes. Fixed.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Couple of small comments

@mik-laj mik-laj force-pushed the cli-split branch 2 times, most recently from 621d460 to e4b8da4 Compare November 18, 2019 14:01
@mik-laj mik-laj changed the title [AIRFLOW-6001][depends on AIRFLOW-6000] Lazy load CLI commands [AIRFLOW-6001] Lazy load CLI commands Nov 18, 2019
@mik-laj
Copy link
Member Author

mik-laj commented Nov 19, 2019

I rebased the rebased, took into account the comments. Can I do something else?

@mik-laj mik-laj merged commit 4a344f1 into apache:master Nov 19, 2019
@kaxil
Copy link
Member

kaxil commented Nov 20, 2019

Really like this change. Good work @mik-laj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants