Skip to content

Conversation

@mohamedmeqlad99
Copy link

@mohamedmeqlad99 mohamedmeqlad99 commented Jan 2, 2025

Fix Databricks Extra Link to See Job Run with Custom S3 Backend

Problem Addressed

Closes: #45240
This PR addresses the issue described in #45240, where the Databricks Extra Link for "See Job Run" does not work when using a custom XCom backend that writes to S3. Specifically:

  • The link directs to the S3 path of the XCom JSON file instead of the Databricks workspace job run URL.
  • This behavior makes the feature unusable in setups with custom XCom backends, such as those storing XCom values entirely in S3.

Proposed Changes

  • Refactored the DatabricksJobRunLink implementation to reliably fetch the Databricks job run URL from XCom using XCom.get_value.
  • Ensured compatibility with custom XCom backends, decoupling the link generation logic from specific backend implementations.
  • Improved error handling to ensure the system fails gracefully if the required XCom value is missing or malformed.
  • Enhanced maintainability with detailed comments and a cleaner structure.

Key Benefits

  • Resolves the issue of broken links when using custom XCom backends.
  • Ensures compatibility across a wide range of backend configurations, including setups where all XCom values are stored in S3.
  • Improves user experience by providing accurate links to Databricks job run details.

Testing

  • Validated the new implementation using a local Airflow environment with a custom XCom backend.
  • Confirmed that the generated links now correctly point to Databricks job run URLs, regardless of the XCom storage location.

Issue Link

Closes #45240.

Checklist

  • All unit tests pass.
  • Code is formatted according to Airflow’s standards.
  • Changes are backward compatible and do not break existing functionality.
  • adding unit test

Code of Conduct

I agree to follow this project's Code of Conduct.


# Example logic: Extract the job run ID from the path
job_details = parsed.path.split("/")[-1].replace(".json", "")
databricks_workspace_url = f"https://<your-databricks-workspace-url>/#job/{job_details}/run"
Copy link
Member

@pankajkoti pankajkoti Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this <your-databricks-workspace-url> in the URL work?

Also, can you provide an example of how the XCOM value looks like?

assert expected_task_key == operator.databricks_task_key


def test_get_link_with_valid_s3_url():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is doing a lot of mocking and does not seem to test much.

value = XCom.get_value(key=XCOM_RUN_PAGE_URL_KEY, ti_key=ti_key)

# Check if the value is an S3 URL
if value and value.startswith("s3://"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the implementation doesn't store an s3 url but just object key? What about other schemes that are not s3?

ti_key: TaskInstanceKey,
) -> str:
return XCom.get_value(key=XCOM_RUN_PAGE_URL_KEY, ti_key=ti_key)
# Retrieve the value from XCom

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally appreciate having comments, yet I think the method name get_value is already saying that a value is being retrieved

parsed = urlparse(s3_url)

# Example logic: Extract the job run ID from the path
job_details = parsed.path.split("/")[-1].replace(".json", "")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assuming that the json file path contains the job details? That's not true, the details (the complete URL) is stored inside the file


# Example logic: Extract the job run ID from the path
job_details = parsed.path.split("/")[-1].replace(".json", "")
databricks_workspace_url = f"https://<your-databricks-workspace-url>/#job/{job_details}/run"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this clickable? How/where is <your-databricks-workspace-url> going to be replaced?

- Refactored the DatabricksJobRunLink class to include a more robust implementation for retrieving the job run URL from XCom.
- Updated to utilize XCom.get_value for improved clarity and maintainability.
- Enhanced error handling to prevent issues when the XCom key does not exist.
- Added detailed comments to clarify logic and improve readability.

This change ensures better alignment with Airflow's standards for operator links and enhances the reliability of job run URL retrieval.
@potiuk
Copy link
Member

potiuk commented Jan 2, 2025

I rebased it @mohamedmeqlad99 -> we found and issue with @jscheffl with the new caching scheme - fixed in #45347 that would run "main" version of the tests.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 17, 2025
@github-actions github-actions bot closed this Feb 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:databricks stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Databricks Extra Links See Job Run does not work with custom S3 backend

4 participants