262 saas connector pagination strategies #286
I have some concerns about whether we can always assume that the data path used for pagination will be exactly the same as the one needed for unwrap post-processing. For example, what if the pagination data sits at a different data path than the identity data we actually need to extract? Further, what if we have a use case where we need to unwrap with data path
Hey @eastandwestwind, these are valid concerns. Here are my thoughts on these two points:
To elaborate a bit, for clarity:

Scenario 1: If the API response is: And we first needed a filter processor, then an unwrap with In this scenario, if we also needed pagination, what would our

Scenario 2: If we have the following API response: Where we first implement an unwrap processor such that
Thanks for the examples; they helped uncover the scenario where the response data is at the root level of the response and does not need a data_path. If we make data_path optional for these scenarios, this is how we can handle the ones you described.

Scenario 1: In this case we can omit the data_path from the request and use this configuration: The expected results from this would be: And the pagination strategy would see 2 results in the list (because of the empty data_path) and would continue to the next page.

Scenario 2: For this scenario we could do this: The result would be: And the pagination strategy would also see the two results after the implicit
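The continue-or-stop decision described in this reply can be sketched for an offset strategy. This is a minimal sketch, not the fidesops implementation: the function name `next_offset_params` and the parameters `incremental_param`, `increment_by`, and `limit` are hypothetical, and `object_list` is assumed to be whatever was found at the request's data_path (or the whole response when data_path is empty).

```python
from typing import Any, Dict, List, Optional


def next_offset_params(
    query_params: Dict[str, Any],
    incremental_param: str,
    increment_by: int,
    limit: int,
    object_list: List[Any],
) -> Optional[Dict[str, Any]]:
    """Return the query params for the next page, or None to stop paging."""
    # An empty page (after applying data_path) means there is nothing left to fetch.
    if not object_list:
        return None
    current = query_params.get(incremental_param)
    if current is None:
        # Without a starting offset there is nothing to increment.
        return None
    next_value = int(current) + increment_by
    # Stop once the next offset would pass the configured limit.
    if next_value > limit:
        return None
    return {**query_params, incremental_param: next_value}


# Scenario 1 style: two objects at the root level, so paging continues.
print(next_offset_params({"page": 1}, "page", 1, 10, [{"id": 1}, {"id": 2}]))
# prints {'page': 2}
```

Checking for an empty page before incrementing keeps the strategy from issuing one extra request per object type once the API runs dry.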
This helps a lot, @galvana! So to summarize, we'll keep So, a postprocessor will first look within the postprocessor config and, if not found there, will look at the endpoint level. Is that right?
To the first point, yes: an empty data_path can be the way we denote "unwrapping" an array at both the endpoint and postprocessor level. For the second point, the postprocessor will only look in its own config for the data_path. The only way the endpoint data_path affects postprocessors is by changing what gets passed into them (the data found at data_path instead of the raw response). I'll add some test cases for the scenarios we discussed here.
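To make the "postprocessor only reads its own config" point concrete, here is a minimal sketch of an unwrap step that accepts either a dict or a list, as the later commit describes. The names `unwrap` and `get_by_path` are hypothetical, and `get_by_path` is a simplified stand-in for `pydash.get` (dot-separated dict keys only); this is an illustration, not the code merged in this PR.

```python
from typing import Any, Dict, List, Optional, Union


def get_by_path(data: Any, data_path: str) -> Any:
    """Follow a dot-separated path through nested dicts; return None if absent."""
    current = data
    for key in data_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def unwrap(data: Union[Dict[str, Any], List[Any]], data_path: str) -> Optional[Any]:
    """Hypothetical unwrap postprocessor; data_path comes from its own config only."""
    if isinstance(data, dict):
        return get_by_path(data, data_path)
    # For a list, unwrap each element and flatten the results into one list.
    result: List[Any] = []
    for item in data:
        value = get_by_path(item, data_path)
        if isinstance(value, list):
            result.extend(value)
        elif value is not None:
            result.append(value)
    return result


# Input already reduced by the endpoint data_path to a root-level list.
pages = [{"members": [{"id": 1}]}, {"members": [{"id": 2}, {"id": 3}]}]
print(unwrap(pages, "members"))  # prints [{'id': 1}, {'id': 2}, {'id': 3}]
```

The endpoint data_path only changes what `data` looks like when it arrives here; the path the postprocessor follows is always its own argument.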
eastandwestwind left a comment
Just a couple comments, otherwise looking good!
```python
object_list = pydash.get(response.json(), data_path)
if object_list:
object_list = (
    pydash.get(response.json(), data_path) if data_path else response.json()
```
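The conditional in this excerpt can be exercised on its own. A minimal, runnable sketch, assuming a hypothetical `extract_object_list` helper and swapping `pydash.get` for a simplified stdlib `get_by_path` (dot-separated dict keys only, no list indices):

```python
from typing import Any, Optional


def get_by_path(data: Any, data_path: str) -> Any:
    """Follow a dot-separated path through nested dicts; return None if absent.

    A simplified stand-in for pydash.get (no list-index or escaping support).
    """
    current = data
    for key in data_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_object_list(response_json: Any, data_path: Optional[str]) -> Any:
    # With no data_path, the objects are expected at the root of the response.
    return get_by_path(response_json, data_path) if data_path else response_json


print(extract_object_list([{"id": 1}, {"id": 2}], None))
# prints [{'id': 1}, {'id': 2}]
```

Treating a missing or empty data_path as "use the root" is what lets the root-level list scenarios from the discussion work without special-casing them in each strategy.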
Do we need to pass in the entire response from execute_prepared_request? It seems like we could just use response_data to reduce duplicate logic across all the strategies. (20245aa#diff-23646bca49075c0841fb934d0e2c0db46cb46242f3deeba83fb13f50432f7198R176)
There are a few reasons I went with this approach:
- The link paging strategy uses the header information from the raw response, so I wanted to keep that info.
- I didn't want to pass in both the raw response and the unwrapped response; they seemed too similar.
- I wanted to pass in as much raw data as possible into the paging strategies and not make any assumptions about what's needed.
- I didn't want to decide outside of the paging strategy whether there should be a next page, since that may vary per strategy.
This does lead to some duplicate logic in the paging strategies, but I'd rather be explicit and offload as much as possible to each strategy.
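The first reason above (the link strategy needs headers from the raw response) can be illustrated with a small sketch. `RawResponse` and `next_page_url` are hypothetical names, and the parser is a simplified reading of the HTTP `Link` header (RFC 8288), not a complete implementation or the PR's actual link strategy.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# One link-value: "<uri>" followed by its parameters up to the next "<".
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or unquoted.
REL_PARAM = re.compile(r'rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


@dataclass
class RawResponse:
    """Hypothetical stand-in for the raw HTTP response given to a strategy."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: Any = None


def next_page_url(response: RawResponse, rel: str = "next") -> Optional[str]:
    # Header names are case-insensitive, so normalise them before the lookup.
    headers = {name.lower(): value for name, value in response.headers.items()}
    link_header = headers.get("link")
    if not link_header:
        return None
    for url, params in LINK_VALUE.findall(link_header):
        match = REL_PARAM.search(params)
        # rel may hold several space-separated relation types.
        if match and rel in match.group(1).split():
            return url
    return None


resp = RawResponse(
    headers={"Link": '<https://api.example.com/users?page=2>; rel="next", '
                     '<https://api.example.com/users?page=5>; rel="last"'}
)
print(next_page_url(resp))  # prints https://api.example.com/users?page=2
```

None of this information survives in the unwrapped `response_data`, which is why handing the strategy the raw response keeps this case simple.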
This makes sense to me, thanks for the detailed explanation!
src/fidesops/service/processors/post_processor_strategy/post_processor_strategy_unwrap.py
eastandwestwind left a comment
Looks great, thanks for all the hard work addressing the additional use cases @galvana !
* Implementations of offset, link, and cursor pagination
* Adding pagination to SaaS connector workflow
  Updating documentation and Postman collection
* Fixing Pylint warning
* Updating unwrap postprocessor to accept lists in addition to dicts
  Accounting for the use case where the list of objects is at the root level of the response and does not need a data_path
* Adding missing test case

Co-authored-by: Adrian Galvan <adrian@ethyca.com>
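Of the three strategies listed in this commit, cursor pagination is the one not discussed above. A minimal sketch of the usual approach, assuming the next cursor is read from a field on the last object of the page; `next_cursor_params`, `cursor_param`, and `field` are hypothetical names, not necessarily what the merged strategy uses.

```python
from typing import Any, Dict, List, Optional


def next_cursor_params(
    query_params: Dict[str, Any],
    cursor_param: str,
    field: str,
    object_list: List[Any],
) -> Optional[Dict[str, Any]]:
    """Return params for the next page using the last object's cursor field."""
    # An empty page means the API has no more results after the current cursor.
    if not object_list:
        return None
    last = object_list[-1]
    # Without the cursor field there is nothing to continue from.
    if not isinstance(last, dict) or field not in last:
        return None
    return {**query_params, cursor_param: last[field]}


print(next_cursor_params({"limit": 2}, "after", "id", [{"id": "a1"}, {"id": "b2"}]))
# prints {'limit': 2, 'after': 'b2'}
```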
Purpose
To add pagination strategies for SaaS connectors
Changes
- retrieve_data function in SaaS Connector
- data_path on the request to reduce verbosity and to prevent pagination strategies (which need the data_path) from depending on postprocessor configurations.

Checklist
- Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket
Closes #262
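For orientation, here is a minimal sketch of how the pieces in this PR fit together in a paginated retrieve loop: the request-level data_path is applied once, postprocessors see only that result, and the paging strategy receives the raw body plus data_path so it never depends on postprocessor output. Everything here (`retrieve_all`, `fetch`, `Strategy`, `page_strategy`, the in-memory `PAGES`) is hypothetical and stands in for the connector's real `retrieve_data` and HTTP client.

```python
from typing import Any, Callable, Dict, List, Optional

Params = Dict[str, Any]
# Hypothetical strategy signature: the params just used, the raw JSON body and the
# request-level data_path in; the next params (or None to stop) out.
Strategy = Callable[[Params, Any, Optional[str]], Optional[Params]]


def get_by_path(data: Any, data_path: str) -> Any:
    """Follow a dot-separated path through nested dicts; None if absent."""
    current = data
    for key in data_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def retrieve_all(
    fetch: Callable[[Params], Any],
    params: Params,
    data_path: Optional[str],
    postprocessors: List[Callable[[Any], Any]],
    strategy: Strategy,
    max_pages: int = 100,
) -> List[Any]:
    rows: List[Any] = []
    next_params: Optional[Params] = params
    for _ in range(max_pages):  # guard against a strategy that never returns None
        if next_params is None:
            break
        response_json = fetch(next_params)
        # The request-level data_path is applied once; postprocessors only see its result.
        data = get_by_path(response_json, data_path) if data_path else response_json
        for process in postprocessors:
            data = process(data)
        if isinstance(data, list):
            rows.extend(data)
        elif data is not None:
            rows.append(data)
        # Paging depends on the raw body and data_path, not on postprocessor output.
        next_params = strategy(next_params, response_json, data_path)
    return rows


# Usage with an in-memory "API" and a simple page-number strategy.
PAGES = {1: {"users": [{"id": 1}, {"id": 2}]}, 2: {"users": [{"id": 3}]}}


def fake_fetch(params: Params) -> Any:
    return PAGES.get(params["page"], {"users": []})


def page_strategy(params: Params, body: Any, data_path: Optional[str]) -> Optional[Params]:
    objects = get_by_path(body, data_path) if data_path else body
    return {**params, "page": params["page"] + 1} if objects else None


print(retrieve_all(fake_fetch, {"page": 1}, "users", [], page_strategy))
# prints [{'id': 1}, {'id': 2}, {'id': 3}]
```

Because the strategy looks at the raw body, a postprocessor that filters every row out of a page does not cut pagination short; the loop still moves on to the next page.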