Description
This issue proposes adding OpenLineage support to the PostgresOperator so that metadata for PostgreSQL operations can be tracked. The goal is to improve data pipeline observability by integrating OpenLineage with Apache Airflow's PostgreSQL provider.
Current Behavior
PostgresOperator currently lacks built-in OpenLineage support for tracking metadata.
Users cannot define both the database and schema for PostgreSQL connections directly, limiting flexibility.
Expected Behavior
Add support for OpenLineage in the PostgresOperator to enable metadata tracking for queries and tasks.
Allow users to specify both the database and schema explicitly when using PostgreSQL connections in Airflow.
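To illustrate why both values matter for lineage: OpenLineage's naming convention for PostgreSQL datasets uses a `postgres://{host}:{port}` namespace and a `{database}.{schema}.{table}` name. The following is a minimal sketch under that assumption; `PostgresTarget`, `dataset_namespace`, and `dataset_name` are hypothetical names used for illustration, not part of Airflow or the OpenLineage client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostgresTarget:
    """Hypothetical container for the connection details this issue asks to expose."""

    host: str
    database: str
    schema: str = "public"  # PostgreSQL's default schema
    port: int = 5432  # PostgreSQL's default port


def dataset_namespace(target: PostgresTarget) -> str:
    # Namespace identifies the server: postgres://{host}:{port}
    return f"postgres://{target.host}:{target.port}"


def dataset_name(target: PostgresTarget, table: str) -> str:
    # A table already written as schema.table keeps its own schema;
    # otherwise the explicitly configured schema is used.
    if "." in table:
        return f"{target.database}.{table}"
    return f"{target.database}.{target.schema}.{table}"


target = PostgresTarget(host="db.example.com", database="analytics", schema="sales")
print(dataset_namespace(target), dataset_name(target, "orders"))
```

Without an explicit schema, the sketch falls back to `public`, which may not match the schema the query actually touches. That mismatch is the gap this issue describes.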
Use case/motivation
This feature adds OpenLineage support to the PostgresOperator, making it easier to track how data flows through workflows. With this integration, users can capture metadata about SQL queries and their inputs and outputs, improving visibility and compliance in data pipelines.
It also simplifies connection configurations by letting users explicitly set the database and schema in PostgreSQL connections. This is especially useful in complex setups with multiple databases or schemas.
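A minimal sketch of what setting both values explicitly could look like at the driver level, assuming they end up as libpq connection keywords: `dbname` selects the database, and `options` can set `search_path` when the connection starts. The helper name `build_connect_kwargs` is hypothetical and not part of Airflow's PostgreSQL provider.

```python
from typing import Dict, Optional


def build_connect_kwargs(database: str, schema: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: map an explicit database and schema to libpq keywords."""
    kwargs = {"dbname": database}
    if schema:
        # libpq's `options` keyword passes server settings at connection start;
        # here it sets search_path so unqualified table names resolve to `schema`.
        kwargs["options"] = f"-c search_path={schema}"
    return kwargs


print(build_connect_kwargs("analytics", "sales"))
```

Keeping the database and schema as separate inputs means neither value has to be inferred from the other when lineage metadata is built.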
Adopting OpenLineage aligns Airflow with an open standard for lineage metadata, helping teams monitor, debug, and scale their workflows, and making data pipelines more transparent to manage and maintain.
Related issues
This feature builds on PR #31398, which lays the groundwork for integrating OpenLineage with Apache Airflow. Thorough testing is needed to verify that lineage tracking and the database/schema configuration are implemented correctly.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct