Conversation

@Satoshi-Sh
Contributor

Description

Added a dag_ids query parameter to the GET /datasets endpoint. Updated the documentation and unit tests accordingly.
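
For readers who want to try the new filter, here is a minimal sketch of how a client might build the request URL. The base URL, the DAG ids, the helper name `build_datasets_url`, and the comma-separated encoding of multiple ids are assumptions for illustration and are not stated in this description; the updated API documentation is the authority on the exact format.

```python
from urllib.parse import urlencode


def build_datasets_url(base_url, dag_ids):
    """Build a GET /datasets URL filtered by dag_ids.

    Assumption: multiple DAG ids are sent as one comma-separated value.
    """
    query = urlencode({"dag_ids": ",".join(dag_ids)})
    return f"{base_url.rstrip('/')}/datasets?{query}"


# Hypothetical base URL and DAG ids, for illustration only.
url = build_datasets_url(
    "http://localhost:8080/api/v1", ["etl_daily", "report_weekly"]
)
print(url)
```

`urlencode` percent-encodes the comma as `%2C`, which a server decodes back to a comma when it parses the query string.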

Related Issue

closes #37423



@boring-cyborg boring-cyborg bot added the area:API, area:UI, and area:webserver labels Feb 18, 2024
@uranusjr
Member

I wonder if it’s clear enough to call this just dag_ids, or should a more descriptive name be used, say consuming_dag_ids?

@Satoshi-Sh
Contributor Author

A dag_id can match a dataset through either consuming_dags or producing_tasks.

We could have two separate query parameters, one for consuming_dags and one for producing_tasks. For now, I've combined them into a single dag_ids parameter.

@bbovenzi
Contributor

I wonder if it’s clear enough to call this just dag_ids, or should a more descriptive name be used, say consuming_dag_ids?

My use case is that I want any datasets connected to a single DAG. But I am indifferent whether that is a single param dag_ids or whether I need to pass the dag_id twice, in consuming_dag_ids and producing_dag_ids. I guess the latter is the most flexible.

Contributor

@bbovenzi bbovenzi left a comment

Marking as "request changes" to make sure we don't accidentally merge.

@Satoshi-Sh Satoshi-Sh requested a review from bbovenzi February 20, 2024 19:56
@jedcunningham
Member

My use case is that I want any datasets connected to a single DAG. But I am indifferent whether that is a single param dag_ids or whether I need to pass the dag_id twice, in consuming_dag_ids and producing_dag_ids. I guess the latter is the most flexible.

I think having both makes sense. In your use case, if we only had consuming/producing, you'd have to hit the endpoint twice (they'd be AND'd together, not OR'd).

So maybe we start with the simple dag_ids and make it clear in the description that it matches datasets whose consuming or producing DAGs are in the list. Leave the more granular filters for another PR / future need?
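
To make the AND/OR point concrete, here is a small in-memory sketch. It is not the endpoint's implementation: the `Dataset` class, both filter functions, and the sample URIs and DAG ids are hypothetical, and the granular filters are only the possible future parameters discussed above. It assumes separate filters would be AND'd together, as described in the comment.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset and the DAGs attached to it.
    uri: str
    consuming_dags: set = field(default_factory=set)
    producing_dags: set = field(default_factory=set)


def filter_by_dag_ids(datasets, dag_ids):
    """Combined filter: keep a dataset if any given DAG consumes OR produces it."""
    wanted = set(dag_ids)
    return [d for d in datasets if wanted & (d.consuming_dags | d.producing_dags)]


def filter_granular(datasets, consuming_dag_ids=None, producing_dag_ids=None):
    """Granular filters: every filter that is supplied must match (AND)."""
    result = []
    for d in datasets:
        if consuming_dag_ids is not None and not set(consuming_dag_ids) & d.consuming_dags:
            continue
        if producing_dag_ids is not None and not set(producing_dag_ids) & d.producing_dags:
            continue
        result.append(d)
    return result


datasets = [
    Dataset("s3://raw", consuming_dags={"dag_a"}),
    Dataset("s3://clean", producing_dags={"dag_a"}),
    Dataset("s3://other", consuming_dags={"dag_b"}),
]

# One call with the combined filter finds everything connected to dag_a.
combined = [d.uri for d in filter_by_dag_ids(datasets, ["dag_a"])]

# Passing dag_a to both granular filters in one call matches nothing here,
# because no dataset is both consumed and produced by dag_a.
both = [d.uri for d in filter_granular(datasets, ["dag_a"], ["dag_a"])]

# With only granular filters, the same result takes two calls plus a merge.
two_calls = [d.uri for d in filter_granular(datasets, consuming_dag_ids=["dag_a"])]
two_calls += [d.uri for d in filter_granular(datasets, producing_dag_ids=["dag_a"])]
```

Under these assumptions the combined filter covers the "any dataset connected to a single DAG" use case in one request, while granular-only filters would need two.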

@bbovenzi bbovenzi added this to the Airflow 2.9.0 milestone Feb 20, 2024
Co-authored-by: Brent Bovenzi <brent.bovenzi@gmail.com>
Contributor

@bbovenzi bbovenzi left a comment

Tested locally, works great. Thanks for picking this up!

@potiuk potiuk merged commit fae6310 into apache:main Feb 21, 2024
@Satoshi-Sh Satoshi-Sh deleted the feat/#37423/filter_datasets_by_dag_id branch February 21, 2024 16:01
@ephraimbuddy ephraimbuddy added the type:improvement label Mar 6, 2024

Development

Successfully merging this pull request may close these issues.

Filter datasets by dag_id in rest API
