Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Feb 14, 2024

Follow up after #37406 - we could not test this change as there was no way to test canary builds but as the capability has been added in #37408 we can fix the failure and actually test it in a "canary" PR.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Follow up after #37406 - we could not test this change as there
was no way to test canary builds but as the capability has been
added in #37408 we can fix the failure and actually test it in
a "canary" PR.
@potiuk potiuk added the canary When set on PR running from apache repo - behave as canary run label Feb 14, 2024
Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it is another part which could be simplified by packaging.utils ... probably one day.

@potiuk
Copy link
Member Author

potiuk commented Feb 14, 2024

Looks like it is another part which could be simplified by packaging.utils ... probably one day.

A bit more complex - because we are using provider.id as identifier in generated provider_dependencies.json, so packaging_utils would not help :).

@potiuk potiuk merged commit 6667711 into main Feb 14, 2024
@Taragolis
Copy link
Contributor

Taragolis commented Feb 14, 2024

A bit more complex - because we are using provider.id as identifier in generated provider_dependencies.json

I can't see a problem for generate valid canonical name from provider_id and build mapping between them (in globals?)
We pretty often use provider_id and after that convert it into the package name and vice versa.

import json
from pathlib import Path
from packaging.utils import canonicalize_name, parse_wheel_filename, parse_sdist_filename


PROJECT_DIR = Path("").resolve(). # Some path here
provider_dependencies = PROJECT_DIR / "generated" / "provider_dependencies.json"
assert provider_dependencies.exists()


provider_dependencies_json = json.loads(provider_dependencies.read_text())
mapping = {
    pi: canonicalize_name(f"apache-airflow-providers-{pi}") for pi in provider_dependencies_json
}
inverted_mapping = {v: k for k, v in mapping.items()}
print(json.dumps(mapping, indent=2))

# Samples
sdist_files = [
    "apache-airflow-providers-airbyte-3.2.1.tar.gz",
    "apache_airflow_providers_airbyte-3.6.0.tar.gz"
]

wheel_files = [
    "apache_airflow_providers_airbyte-3.6.0-py3-none-any.whl",
    "apache-airflow-providers-airbyte-42.0.0-py3-none-any.whl",
    # Invalid name
    "apache_airflow_providers_airbyte-42.0.0.whl",
]

for sdist in sdist_files:
    package, _ = parse_sdist_filename(sdist)
    provider_id = inverted_mapping[package]
    print(f"Provider ID: {provider_id}, sdist: {sdist}")

for wheel in wheel_files:
    package, *_ = parse_wheel_filename(wheel)
    provider_id = inverted_mapping[package]
    print(f"Provider ID: {provider_id}, sdist: {wheel}")

@potiuk
Copy link
Member Author

potiuk commented Feb 14, 2024

A bit more complex - because we are using provider.id as identifier in generated provider_dependencies.json

I can't see a problem for generate valid canonical name from provider_id and build mapping between them (in globals?) We pretty often use provider_id and after that convert it into the package name and vice versa.

import json
from pathlib import Path
from packaging.utils import canonicalize_name, parse_wheel_filename, parse_sdist_filename


PROJECT_DIR = Path("").resolve(). # Some path here
provider_dependencies = PROJECT_DIR / "generated" / "provider_dependencies.json"
assert provider_dependencies.exists()


provider_dependencies_json = json.loads(provider_dependencies.read_text())
mapping = {
    pi: canonicalize_name(f"apache-airflow-providers-{pi}") for pi in provider_dependencies_json
}
inverted_mapping = {v: k for k, v in mapping.items()}
print(json.dumps(mapping, indent=2))

# Samples
sdist_files = [
    "apache-airflow-providers-airbyte-3.2.1.tar.gz",
    "apache_airflow_providers_airbyte-3.6.0.tar.gz"
]

wheel_files = [
    "apache_airflow_providers_airbyte-3.6.0-py3-none-any.whl",
    "apache-airflow-providers-airbyte-42.0.0-py3-none-any.whl",
    # Invalid name
    "apache_airflow_providers_airbyte-42.0.0.whl",
]

for sdist in sdist_files:
    package, _ = parse_sdist_filename(sdist)
    provider_id = inverted_mapping[package]
    print(f"Provider ID: {provider_id}, sdist: {sdist}")

for wheel in wheel_files:
    package, *_ = parse_wheel_filename(wheel)
    provider_id = inverted_mapping[package]
    print(f"Provider ID: {provider_id}, sdist: {wheel}")

Ah ok yes, could be :) . Feel free :)

@Taragolis Taragolis deleted the fix_replacement_of_provider_package branch April 3, 2024 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools canary When set on PR running from apache repo - behave as canary run

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants