Skip to content

Conversation

@Joffreybvn
Copy link
Contributor

@Joffreybvn Joffreybvn commented Sep 25, 2023

Hello,

This PR implements support for paginated API in the SimpleHttpOperator. No breaking changes, but adds a lot of code into this Operator. I can also propose a PR with a new PaginatedHttpOperator if better ?

Use case:

I have been facing a very specific use-case where the API:

  • Returns data (as JSON) in a paginated way
  • Do not provide any way to know in advance how much pages / entries will be returned
  • Provides a "next_page_cursor" attribute, which is a random id, to do paginated calls

Dynamic Tasks? It works like a for loop. In this case, knowing in advance the number of pages means calling the API two times. Which is inefficient, and do not guarantee all pages are traversed during the second call.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@jscheffl
Copy link
Contributor

Thanks for the contribution! I was not aware of cases where pagination is used until I saw this PR. I don't know how common this is in API Calls and might be very specific.
As the HTTP operator is very basic and used in many use cases I am not sure whether a pagination function really is best to be suited in the base operator or as proposed an extended Operator inheriting from the SimpleHttpOperator would be better.

Also - to be open - I am not sure whether such function is so common that it makes sense to add it into the code package. I believe (but this is my personal opinion - other opinions welcome!) that this could also be very good placed in a contributor package to keep the basic operator "simple". Python is extensible and the standard operator can easily be used and extended per customer project.

(I am not a committer so my opinion only counts a bit, would be good to get other maintainers opinion on this contribution - also if not in code where such extension could be best placed in.)

@Joffreybvn Joffreybvn force-pushed the feature/add-pagination-http-operator branch from cdf155e to 96f9985 Compare September 27, 2023 20:35
@jscheffl jscheffl added type:new-feature Changelog: New Features and removed kind:documentation labels Sep 27, 2023
@hussein-awala
Copy link
Member

From his name SimpleHttpOperator and the description:

Calls an endpoint on an HTTP system to execute an action.

I see that it should be simple and just used to execute actions and not load big payloads which need pagination.

IMHO creating a new operator is better for this feature, and try to keep it generic, like HttpOperator and maybe a subclass of the first one if possible

@Joffreybvn
Copy link
Contributor Author

Alright, thanks for reviews ! I'll create a new PR with a Paginated one, based on the HttpOperator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants