[AIP-62] Translate AIP-60 URI to OpenLineage #40173
Merged
05096aa to
918cc0d
Compare
Contributor
@uranusjr this is how we want to use AIP-60 datasets in OpenLineage.
918cc0d to
01756fd
Compare
uranusjr
reviewed
Jun 20, 2024
e0fc1fa to
37490be
Compare
Contributor
Author
I believe this PR is ready to be reviewed; however, it should probably NOT be merged before #40335 gets merged.
37490be to
a0b337f
Compare
741c4b8 to
84c1b09
Compare
uranusjr
reviewed
Jul 12, 2024
uranusjr
reviewed
Jul 12, 2024
f894908 to
29fe565
Compare
f57c53e to
9dbc9ea
Compare
kacpermuda
commented
Jul 22, 2024
3a17c9f to
b2ba021
Compare
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
b2ba021 to
bd2ef8c
Compare
mobuchowski
approved these changes
Jul 23, 2024
This was referenced Jul 28, 2024
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Sep 16, 2024
Fix unit tests:
- test_does_not_double_import_entrypoint_provider_plugins - in apache-airflow-providers-databricks==6.8.0 was added DatabricksWorkflowPlugin (apache/airflow#40724)
- test_dataset - in apache-airflow-providers-amazon==8.27.0 changed Dataset URI format validation (apache/airflow#40173)

Change-Id: Iae902e544aae2086ea4495b0850c19f813aa7069
GitOrigin-RevId: 7d5b7a9ead32610f7e3864230e55bb3a17bf6da5
kosteev
pushed a commit
to GoogleCloudPlatform/composer-airflow
that referenced
this pull request
Sep 20, 2024
Changes:
- add suffix +composer to version
- remove http and sqlite from pre-installed providers as they are included to Composer dependencies already
- add pre-commit configuration file
- set Composer pypi dependencies
- adjust Airflow configs required for unit tests in order to prevent them from being cleaned up during testing
- fix test_dataset as in apache-airflow-providers-amazon==8.27.0 changed Dataset URI format validation (apache/airflow#40173)

Change-Id: Iac6842a49929d9e2c4b8ed29353312dbc450de8a
GitOrigin-RevId: 1680fdc22961fa22517b2fd21ca67e8240e1f16a
Labels
AIP-62
Tasks tracking implementation of AIP-62 Getting Lineage from Hook Instrumentation
area:lineage
area:providers
changelog:skip
Changes that should be skipped from the changelog (CI, tests, etc..)
provider:amazon
AWS/Amazon - related issues
provider:openlineage
AIP-53
closes: #38767
For the important changes, look at the first commit; for an example implementation, look at the second commit.
For Airflow Dataset I've added:
- `_get_normalized_scheme()` function, which still does `scheme.lower()` underneath, but now we can also use it in the OL provider and be sure that we are using the same mechanism everywhere.
- `Dataset.normalized_uri` property, so that we can retrieve a normalized, AIP-60 compliant URI, or None in all other cases (not a URI, no scheme, no normalizer, etc.). At first I thought that in Airflow 3 we could just use `Dataset.uri`, as it will raise an error when a normalizer fails, but there can still be schemes without a normalizer defined, so I felt this property is needed.

Also a small adjustment to ProvidersManager: I felt that the dataset-uris part of provider.yml is getting complex, so I rewrote the `_discover_dataset_uri_handlers` method to be more flexible for future expansions (e.g. OL to AIP-60 converters).

This PR should only be merged AFTER #40335.
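To make the intended `normalized_uri` behaviour concrete, here is a minimal standalone sketch. It does not use Airflow's actual code: the `NORMALIZERS` registry, the `_normalize_s3` helper, and the free functions `get_normalized_scheme` and `normalized_uri` are all hypothetical stand-ins, and the S3 rule shown is only an assumed example of what a provider normalizer might enforce.

```python
from __future__ import annotations

from typing import Callable
from urllib.parse import SplitResult, urlsplit, urlunsplit

# Hypothetical type for a per-scheme normalizer (not Airflow's real signature).
Normalizer = Callable[[SplitResult], SplitResult]


def _normalize_s3(parts: SplitResult) -> SplitResult:
    # Assumed rule for illustration: an s3 URI needs a bucket name.
    if not parts.netloc:
        raise ValueError("s3 URI must contain a bucket name")
    return parts._replace(path=parts.path.rstrip("/"))


# Hypothetical registry mapping a normalized scheme to its normalizer.
NORMALIZERS: dict[str, Normalizer] = {"s3": _normalize_s3}


def get_normalized_scheme(uri: str) -> str:
    # Single place where the scheme is lowercased, so every caller agrees.
    return urlsplit(uri).scheme.lower()


def normalized_uri(uri: str) -> str | None:
    """Return the normalized URI, or None if it cannot be normalized."""
    scheme = get_normalized_scheme(uri)
    normalizer = NORMALIZERS.get(scheme)
    if normalizer is None:
        # No scheme, or a scheme without a registered normalizer.
        return None
    try:
        parts = normalizer(urlsplit(uri)._replace(scheme=scheme))
    except ValueError:
        # The normalizer rejected the URI, so there is nothing compliant to return.
        return None
    return urlunsplit(parts)
```

In this sketch a consumer such as a lineage converter can treat `None` as "skip this dataset" instead of handling exceptions for every unknown scheme.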