Skip to content

Conversation

@kacpermuda
Copy link
Contributor

@kacpermuda kacpermuda commented Jun 11, 2024

closes: #38767

For important changes look at the first commit, then for example implementation look at the second commit.

For Airflow Dataset I've added:

  • _get_normalized_scheme() function that still does scheme.lower() underneath but now we can also use this in OL provider and be sure that we are using the same mechanism everywhere.
  • Dataset.normalized_uri property - so that we can retrieve a normalized and AIP-60 compliant uri or None in all other cases (not an uri, no scheme, no normalizer etc.). At first i thought that in Airflow 3 we could just use Dataset.uri, as it will raise an error when a normalizer fails, but there can still be schems without a normalizer defined so i felt like this is needed.

Also small adjustment to ProvidersManager: I felt like this dataset-uris part of provider.yml is getting complex, so i re-wrote the _discover_dataset_uri_handlers method to be more flexible for future expansions (f.e. OL to AIP-60 converters).

This Pr should be only merged AFTER #40335.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@kacpermuda kacpermuda changed the title WIP - translate AIP-60 URI to OpenLineage WIP [AIP-62] Translate AIP-60 URI to OpenLineage Jun 11, 2024
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch 4 times, most recently from 05096aa to 918cc0d Compare June 19, 2024 08:49
@kacpermuda kacpermuda changed the title WIP [AIP-62] Translate AIP-60 URI to OpenLineage [AIP-62] Translate AIP-60 URI to OpenLineage Jun 19, 2024
@kacpermuda kacpermuda marked this pull request as ready for review June 19, 2024 09:19
@mobuchowski mobuchowski requested a review from uranusjr June 19, 2024 09:23
@mobuchowski mobuchowski added AIP-62 Tasks tracking implementation of AIP-62 Getting Lineage from Hook Instrumentation and removed provider:trino labels Jun 19, 2024
@mobuchowski
Copy link
Contributor

@uranusjr this is how we want to use AIP-60 datasets in OpenLineage

@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch from 918cc0d to 01756fd Compare June 19, 2024 10:30
@kacpermuda kacpermuda marked this pull request as draft July 8, 2024 17:50
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch 3 times, most recently from e0fc1fa to 37490be Compare July 9, 2024 17:22
@kacpermuda
Copy link
Contributor Author

I believe this PR is ready to be reviewed, however it should probably NOT be merged before #40335 gets merged.

@kacpermuda kacpermuda marked this pull request as ready for review July 10, 2024 10:24
@kacpermuda kacpermuda requested a review from bolkedebruin as a code owner July 10, 2024 10:24
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch from 37490be to a0b337f Compare July 10, 2024 10:45
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch 6 times, most recently from 741c4b8 to 84c1b09 Compare July 11, 2024 13:47
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch 3 times, most recently from f894908 to 29fe565 Compare July 16, 2024 07:50
@kacpermuda kacpermuda requested a review from uranusjr July 16, 2024 09:30
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch 4 times, most recently from f57c53e to 9dbc9ea Compare July 22, 2024 10:13
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch 2 times, most recently from 3a17c9f to b2ba021 Compare July 22, 2024 17:27
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
@kacpermuda kacpermuda force-pushed the aip62-ol-dataset-mapping branch from b2ba021 to bd2ef8c Compare July 22, 2024 18:07
@mobuchowski mobuchowski merged commit 8a912f9 into apache:main Jul 23, 2024
@kacpermuda kacpermuda deleted the aip62-ol-dataset-mapping branch July 23, 2024 09:04
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jul 24, 2024
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 16, 2024
Fix unit tests:
  - test_does_not_double_import_entrypoint_provider_plugins - in apache-airflow-providers-databricks==6.8.0 was added DatabricksWorkflowPlugin (apache/airflow#40724)
  - test_dataset - in apache-airflow-providers-amazon==8.27.0 changed
    Dataset URI format validation (apache/airflow#40173)

Change-Id: Iae902e544aae2086ea4495b0850c19f813aa7069
GitOrigin-RevId: 7d5b7a9ead32610f7e3864230e55bb3a17bf6da5
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 20, 2024
Changes:
- add suffix +composer to version
- remove http and sqlite from pre-installed providers as they are included to Composer dependencies already
- add pre-commit configuration file
- set Composer pypi dependencies
- adjust Airflow configs required for unit tests in order to prevent them from being cleaned up during testing
- fix test_dataset as in apache-airflow-providers-amazon==8.27.0 changed Dataset URI format validation (apache/airflow#40173)

Change-Id: Iac6842a49929d9e2c4b8ed29353312dbc450de8a
GitOrigin-RevId: 1680fdc22961fa22517b2fd21ca67e8240e1f16a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AIP-62 Tasks tracking implementation of AIP-62 Getting Lineage from Hook Instrumentation area:lineage area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) provider:amazon AWS/Amazon - related issues provider:openlineage AIP-53

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Define and implement core AIP-60 to OpenLineage dataset mapping

6 participants