Skip to content

Conversation

@JDarDagran
Copy link
Contributor

This PR adds OpenLineage SQL parser as internal provider API.
It also adds support for SQLExecuteQueryOperator. A set of selected SQL providers will be implemented in next PRs.

closes: #29673

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I haven't looked at the tests in details yet)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this generic enough to be a worthwhile default, or would making it a required field make more sense?

Copy link
Contributor Author

@JDarDagran JDarDagran May 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say both are valuable default values that are common for lots of SQL databases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make it a required dep or optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes it optional as extra package for pip install

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to revert this PR.

This change add openlineage extra to common.sql but openlineage has never been released. We can not add such extra till we release openlineage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eladkal So, I think we have chicken and egg problem - the OL provider is not released yet, but it's useless to release without any provider support. Can we not have those changes - which do not change what common.sql does, just add methods to it - without releasing OL provider first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We often have cases where providers co-depended. In such senario we release the providers in the same wave so users gain the functionality together.

So this PR should have been put on hold till OL provider is ready then merge them in a sequence so both are ready for the same release wave. In providers release cycle we release from main branch we do not cherry pick commits thus anything merged will be released.

Maybe @potiuk have another idea?

@@ -0,0 +1,194 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have the utils folder and didn't add sql to the provider root? If we need a folder, would there be more specific names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say there are at least 2 kinds of utility files now, which is enough reason for me to have a separate folder for it. Probably some more would show up in near future, e.g. when developing additional support PythonOperator and TaskFlow API.

@uranusjr
Copy link
Member

All the SQL string formatting makes me a bit nervous. Would it be possible to build those SQL with SQLAlchemy instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:return: A :class:`airflow.providers.openlineage.sqlparser.DatabaseInfo` instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be expressed in the type annotation instead (so you need to add a -> DatabaseInfo if this is deleted)

Copy link
Contributor Author

@JDarDagran JDarDagran Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's empty method that may be optionally implemented, is it enough to annotate it with -> DatabaseInfo | None? Does this say enough? I'm no expert in static typing.

@JDarDagran
Copy link
Contributor Author

All the SQL string formatting makes me a bit nervous. Would it be possible to build those SQL with SQLAlchemy instead?

@uranusjr I changed it in 4f685d1

Please review again.

@uranusjr
Copy link
Member

uranusjr commented Jun 28, 2023

Some general coding style comments I find confusing:

  1. Why dict(key=value, ...) instead of {}?
  2. There are local imports (sqlparse?) that seemingly can be done globally.
  3. Reasoning is needed for the # type: ignore mypy comments.
  4. Please reformat most of the docstrings to have a short, single sentence summary, a blank line, and longer descriptions in a separate paragraph.

@JDarDagran
Copy link
Contributor Author

Some general coding style comments I find confusing:

  1. Why dict(key=value, ...) instead of {}?
  2. There are local imports (sqlparse?) that seemingly can be done globally.
  3. Reasoning is needed for the # type: ignore mypy comments.
  4. Please reformat most of the docstrings to have a short, single sentence summary, a blank line, and longer descriptions in a separate paragraph.

@uranusjr

  1. Changed to {}.
  2. Moved import sqlparse to global. Did not find anything else.
  3. I've fixed all typing issues and got rid of # type: ignore.
  4. Reformatted.

@JDarDagran JDarDagran requested a review from uranusjr June 28, 2023 12:18
JDarDagran and others added 5 commits June 29, 2023 10:09
Implement base methods for SQLExecuteQueryOperator & DbApiHook.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Rename methods to expose their purpose for OpenLineage.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Instead of referencing the SQLParser directly, modify various static
methods to class methods instead, so they can use the cls argument
to avoid spelling out the class name repeatedly.

Also added a few changes to better ultilize type reference and eliminate
some verbose type annotations.
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
@mobuchowski mobuchowski merged commit f2e2125 into apache:main Jun 29, 2023
@JDarDagran JDarDagran deleted the aip-53-sql branch June 29, 2023 12:24
@eladkal
Copy link
Contributor

eladkal commented Jul 1, 2023

I think we need to revert this PR.
see my note on https://github.com/apache/airflow/pull/31398/files#r1248622721

This PR adds openlineage extra to common.sql but openlineage provider is not in the release cycle.
We can not add such extra yet - this is an issue for releasing common.sql provider

cc @potiuk @ashb

@potiuk
Copy link
Member

potiuk commented Jul 2, 2023

I think it's perfectly fine. 'openlineage' extra is optional, and I think (looking at the code) all the openlineage imports are under "TYPE_CHECKING" flag. Surely the openlineage package is not yet published, but at most it means that no-one will be able to use "openlineage" extra YET. Not until openlineage provider is released.

But a good thing is that once it is released, it will work as expected witht this common.sql package that is going to be released now - in other words, if in the future openlineage provider is released, there will be no need to release common.sql.

I see fundamentally no problem with releasing common.sql with that code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide OL SQL parser as internal OpenLineage provider API

10 participants