Lazily import many modules to improve import speed #24486
/cc @ianbuss
I really hope one day https://peps.python.org/pep-0690/ will be in place so that we don't have to do all of this any more.
Yeah, that would be nice.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
I recently listened to some of the folks behind PEP 690 discuss lazy imports and came away with the impression that even after it lands, Airflow is highly likely not to be able to use it. Lazy imports have a lot of implications, PEP 690 quite heavy-handedly makes lazy imports the default, and Airflow has so many dependencies that I can easily see it never being fully compatible.
Yeah. We've probably listened to the same Talk Python episode :). I think PEP 690 is unlikely to be around in 3.11, or maybe even 3.12, unless we roll our own sleeves up and help. I think it would be great to actually HELP with PEP 690 by making Airflow work with lazy imports (maybe by improving the PEP 690 implementation as well).
Shall I resume this and get the SQLA model loading working, do you think?
I think it might be worth it. It would be nice to see how much we gain in those cases, though.
Yup, let's try it out.
ec86106 to 711ecf1
711ecf1 to 68fab20
Wow, mypy is confusing/wrong sometimes: Edit: it turns out mypy gets confused by PEP 562 lazy imports (not a great surprise, really): something was importing BaseOperator from airflow.models, not airflow.models.baseoperator.
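For readers unfamiliar with the PEP 562 mechanism mentioned above, here is a minimal sketch of a module-level `__getattr__` that imports a name only on first access. It is not the code from this PR: the module name `lazy_models`, the helper `make_lazy_module`, and the mapping `_LAZY_ATTRS` are hypothetical, and stdlib classes stand in for Airflow models so the snippet runs on its own. In a real package the `__getattr__` function would simply be defined at the top level of `__init__.py`.

```python
import importlib
import types

# Hypothetical mapping: public attribute name -> module that really defines it.
# Stdlib targets are used here only so the sketch is self-contained.
_LAZY_ATTRS = {
    "JSONDecoder": "json.decoder",
    "Fraction": "fractions",
}


def make_lazy_module(name, lazy_attrs):
    """Build a module object whose attributes are imported on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails (PEP 562).
        target = lazy_attrs.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(target), attr)
        # Cache the value on the module so later lookups skip __getattr__.
        setattr(module, attr, value)
        return value

    # Storing the function in the module's namespace is what enables the hook.
    module.__getattr__ = __getattr__
    return module


lazy_models = make_lazy_module("lazy_models", _LAZY_ATTRS)
```

Accessing `lazy_models.Fraction` triggers the import of `fractions` at that moment. A static checker such as mypy only sees the `__getattr__` function, not the names it resolves, which is consistent with the confusion described in the comment above.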
a596456 to bb9fa8f
airflow/serialization/json_schema.py
Does this change much?
It was high up in the list of slow imports!
A schema validation library being slow… I guess that’s how third-party libraries work.
I've found an easy way to make sure all models are loaded when needed: This will ensure that the first time any model class is queried or instantiated, all other models are loaded. That avoids the "class not found in registry" error message while still keeping imports "quick".
43cec31 to 685498f
I wonder why this change causes these mypy errors to appear:
I was testing this with Before:
I tried adding a
# Mypy doesn't/can't recognize the PEP 562 style lazy imports, so we have to
# tell it about the imports
from .dag import DAG
from .base import ID_LEN
from .xcom import XCOM_RETURN_KEY
from .base import Base
from .baseoperator import BaseOperator
from .baseoperator import BaseOperatorLink
from .connection import Connection
from .dagbag import DagBag
from .dag import DagModel
from .dagpickle import DagPickle
from .dagrun import DagRun
from .dag import DagTag
from .db_callback_request import DbCallbackRequest
from .errors import ImportError
from .log import Log
from .mappedoperator import MappedOperator
from .operator import Operator
from .param import Param
from .pool import Pool
from .renderedtifields import RenderedTaskInstanceFields
from .skipmixin import SkipMixin
from .slamiss import SlaMiss
from .taskfail import TaskFail
from .taskinstance import TaskInstance
from .taskreschedule import TaskReschedule
from .trigger import Trigger
from .variable import Variable
from .xcom import XCom
from .taskinstance import clear_task_instances
def import_all_models(): ...
But I ended up with loads of errors along the lines of:
I wonder what is going on ... Any ideas?
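One common way to tell a type checker about names that are not imported at runtime, other than a separate stub file, is an `if TYPE_CHECKING:` block. The sketch below shows only that general pattern and is not necessarily what this PR ended up doing; `fractions.Fraction` and the function `half` are hypothetical stand-ins for a model class and its user.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated by mypy only; at runtime TYPE_CHECKING is False,
    # so this import costs nothing when the module is loaded.
    from fractions import Fraction


def half() -> "Fraction":
    # The string annotation avoids needing the name at definition time.
    # The real import is deferred until the function is actually called.
    from fractions import Fraction

    return Fraction(1, 2)
```

The checker sees an ordinary import and can resolve `Fraction`, while the runtime import still happens lazily.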
6955b6c to 5dc14ce
bd213c4 to 4817ccf
This may be "overkill" for the benefit it actually gives.
For some reason I couldn't get the typestub working: if I created `airflow/models/__init__.pyi` then Mypy couldn't find any of the submodules
4817ccf to c54280a
Rebasing to check this is still good.
potiuk
left a comment
Nice one :)


Goal here is to make the healthchecks for triggerer/scheduler quicker by having to do less
Roughly: