@mobuchowski mobuchowski commented Jun 5, 2024

This PR builds on #39890 (already merged).

After this change, OpenLineage will execute metadata extraction in a separate, forked process.
The technique is modeled on the interaction between LocalTaskJobRunner and StandardTaskRunner: one process, in this case the StandardTaskRunner process, watches over the OpenLineage listener process during metadata extraction.

This adds a layer of isolation between task execution and OpenLineage, giving more assurance that OpenLineage execution cannot interfere with task execution in any way other than taking time.
Additionally, this allows us to add a configurable timeout for the OL execute methods.

Beyond configurability, the timeout matters because metadata extraction code can sometimes hang, for example when hitting the Snowflake connection issue snowflakedb/snowflake-connector-python#1898, and we want to give the strongest guarantee we can that OL will not cause a task to fail.
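
To illustrate the idea described above, here is a minimal sketch of running a callable in a forked child while the parent watches it and enforces a timeout. It is not the code from this PR: `run_in_fork` and its `timeout_s` parameter are hypothetical names chosen for illustration, and it assumes a POSIX system where `os.fork` is available.

```python
import os
import signal
import time


def run_in_fork(fn, timeout_s):
    """Run fn() in a forked child process.

    Returns True if the child exited before the timeout,
    False if the parent had to kill it. (Hypothetical helper,
    not part of Airflow or OpenLineage.)
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the work, then exit immediately without running
        # the parent's cleanup handlers. Errors raised by fn() are
        # discarded here, so they never reach the parent process.
        try:
            fn()
        finally:
            os._exit(0)

    # Parent: poll the child until it exits or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        done_pid, _status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return True
        time.sleep(0.01)

    # Deadline passed: kill the child and reap it to avoid a zombie.
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
    return False
```

With this shape, a hanging extraction costs the parent at most roughly `timeout_s` seconds: `run_in_fork(lambda: time.sleep(5), 0.1)` returns `False` after about a tenth of a second, while a callable that finishes quickly returns `True`.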

@boring-cyborg boring-cyborg bot added area:providers area:Scheduler including HA (high availability) scheduler provider:google Google (including GCP) related issues provider:openlineage AIP-53 provider:snowflake Issues related to Snowflake provider labels Jun 5, 2024
@mobuchowski mobuchowski force-pushed the openlineage-process-execution branch 4 times, most recently from 2c96acb to 44ba855 Compare June 7, 2024 11:51
@mobuchowski mobuchowski force-pushed the openlineage-process-execution branch 3 times, most recently from 69865d6 to 7c13f5d Compare June 11, 2024 13:11

@potiuk potiuk left a comment

A few nits about splitting the PR.

potiuk commented Jun 11, 2024

Ah I missed those are two commits/PRs already :)

@mobuchowski mobuchowski force-pushed the openlineage-process-execution branch from 55e5792 to 8cbb8bc Compare June 13, 2024 12:46
Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>
@mobuchowski mobuchowski force-pushed the openlineage-process-execution branch from 8cbb8bc to 187d87e Compare June 14, 2024 12:39
@potiuk potiuk merged commit 1a8d12f into main Jun 14, 2024
@eladkal eladkal deleted the openlineage-process-execution branch June 29, 2024 17:32
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
…ss (apache#40078)

Signed-off-by: Maciej Obuchowski <obuchowski.maciej@gmail.com>