openlineage, common.sql: provide OL SQL parser as internal OpenLineage provider API #31398
d0e9764 to
774dc8f
Compare
ashb
left a comment
(I haven't looked at the tests in detail yet)
Is this generic enough to be a worthwhile default, or would making it a required field make more sense?
I'd say both are valuable default values that are common for lots of SQL databases.
generated/provider_dependencies.json
Outdated
Does this make it a required dep or optional?
It makes it optional, as an extra package for pip install.
I think we need to revert this PR.
This change adds an openlineage extra to common.sql, but openlineage has never been released. We cannot add such an extra until we release openlineage.
@eladkal So, I think we have a chicken-and-egg problem: the OL provider is not released yet, but it's useless to release it without any provider support. Can we not keep those changes, which do not change what common.sql does and just add methods to it, without releasing the OL provider first?
We often have cases where providers depend on each other. In such a scenario we release the providers in the same wave so users gain the functionality together.
So this PR should have been put on hold until the OL provider is ready, and then both merged in sequence so they are ready for the same release wave. In the providers release cycle we release from the main branch and do not cherry-pick commits, so anything merged will be released.
Maybe @potiuk has another idea?
868257d to
b4ee4f8
Compare
1ecb8c9 to
e7bbe1f
Compare
d8955ff to
6c9d9cf
Compare
@@ -0,0 +1,194 @@
# Licensed to the Apache Software Foundation (ASF) under one
Why do we have the utils folder instead of adding sql to the provider root? If we do need a folder, is there a more specific name for it?
I'd say there are at least 2 kinds of utility files now, which is enough reason for me to have a separate folder for them. Probably some more will show up in the near future, e.g. when developing additional support for PythonOperator and the TaskFlow API.
All the SQL string formatting makes me a bit nervous. Would it be possible to build those SQL with SQLAlchemy instead?
:return: A :class:`airflow.providers.openlineage.sqlparser.DatabaseInfo` instance.
Why so?
This can be expressed in the type annotation instead (so you need to add a -> DatabaseInfo if this is deleted)
As it's an empty method that may optionally be implemented, is it enough to annotate it with -> DatabaseInfo | None? Does this say enough? I'm no expert in static typing.
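A minimal sketch of what the suggestion above could look like, assuming a base hook whose default implementation returns nothing. The `DatabaseInfo` dataclass, the class names, and the method name here are hypothetical stand-ins, not the provider's actual code. The point is only that the return annotation carries the information the `:return:` docstring line used to state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatabaseInfo:
    """Hypothetical stand-in for the provider's DatabaseInfo class."""

    scheme: str
    authority: str | None = None


class BaseDbHook:
    """Hypothetical base hook; the default implementation opts out."""

    def get_openlineage_database_info(self, connection: str) -> DatabaseInfo | None:
        # The annotation documents both outcomes: a DatabaseInfo when the
        # hook supports lineage, None when it does not implement it.
        return None


class PostgresLikeHook(BaseDbHook):
    """Hypothetical subclass that chooses to implement the method."""

    def get_openlineage_database_info(self, connection: str) -> DatabaseInfo | None:
        return DatabaseInfo(scheme="postgres", authority=connection)
```

Callers then have to handle the `None` case explicitly, which a static type checker can enforce.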
Some general coding style comments I find confusing:
Implement base methods for SQLExecuteQueryOperator & DbApiHook.
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Rename methods to expose their purpose for OpenLineage.
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Instead of referencing the SQLParser directly, modify various static methods to class methods instead, so they can use the cls argument to avoid spelling out the class name repeatedly. Also added a few changes to better utilize type reference and eliminate some verbose type annotations.
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
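The static-method-to-class-method change described in the commit message above can be illustrated roughly as follows. This is a sketch under assumptions: the class body, `DEFAULT_SCHEMA`, `normalize_name`, and `qualify` are invented for illustration and are not taken from the PR's SQLParser.

```python
from __future__ import annotations


class SQLParser:
    """Hypothetical parser; only the classmethod pattern matters here."""

    DEFAULT_SCHEMA = "public"

    @classmethod
    def normalize_name(cls, name: str) -> str:
        return name.lower()

    @classmethod
    def qualify(cls, table: str, schema: str | None = None) -> str:
        # With @staticmethod this would have to spell out
        # SQLParser.DEFAULT_SCHEMA and SQLParser.normalize_name(...).
        # Going through cls avoids repeating the class name and lets
        # subclasses override either piece.
        return f"{schema or cls.DEFAULT_SCHEMA}.{cls.normalize_name(table)}"


class UpperCaseParser(SQLParser):
    """Hypothetical subclass for a database that upper-cases identifiers."""

    @classmethod
    def normalize_name(cls, name: str) -> str:
        return name.upper()
```

Because `qualify` dispatches through `cls`, calling it on `UpperCaseParser` picks up the overridden `normalize_name` without any change to `qualify` itself.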
I think we need to revert this PR. This PR adds
I think it's perfectly fine. The 'openlineage' extra is optional, and I think (looking at the code) all the openlineage imports are under the "TYPE_CHECKING" flag. Sure, the openlineage package is not yet published, but at most that means no one will be able to use the "openlineage" extra YET, not until the openlineage provider is released. The good thing is that once it is released, it will work as expected with this common.sql package that is going to be released now. In other words, when the openlineage provider is released in the future, there will be no need to release common.sql again. I see fundamentally no problem with releasing common.sql with that code.
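For readers unfamiliar with the pattern referred to above, here is a rough sketch of an import guarded by `TYPE_CHECKING`. The module name `hypothetical_openlineage_provider` and the `DbHook` class are made up for illustration. The sketch assumes, as the comment does, that the optional package is only needed for annotations.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # This branch is only followed by static type checkers. At runtime
    # TYPE_CHECKING is False, so the (hypothetical, possibly uninstalled)
    # package is never imported.
    from hypothetical_openlineage_provider.sqlparser import DatabaseInfo


class DbHook:
    """Hypothetical hook that mentions the optional type only in annotations."""

    def get_openlineage_database_info(self, connection: object) -> DatabaseInfo | None:
        # Postponed evaluation of annotations (the __future__ import above)
        # keeps the DatabaseInfo name from being looked up at runtime.
        return None
```

With this arrangement the module imports cleanly whether or not the optional package is installed, which is the property the argument above relies on.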
This PR adds the OpenLineage SQL parser as an internal provider API.
It also adds support for SQLExecuteQueryOperator. A set of selected SQL providers will be implemented in the next PRs.

closes: #29673