-
Notifications
You must be signed in to change notification settings - Fork 2
Feature/cursor paging #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
|
|
||
| def get(self, limit: Optional[int] = None, *, params: Optional[dict] = None, **kwargs) -> List[dict]: | ||
| def get(self, url: Optional[str] = None, limit: Optional[int] = None, *, params: Optional[dict] = None, **kwargs) -> requests.Response: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the same List[dict] output as the current code.
Could we instead allow the token argument to be passed into this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 1: Call session.get_response() separately in get and get_pages. (decouple)
Option 2: Pass token as an optional argument to get. If passed, change output type from List[dict] to Union[List[dict], Tuple[List[dict], Optional[str]]].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with option 1 in my most recent updates for more output predictability and consistency + break get_pages into get_pages_offset and get_pages_cursor. I do recognize that there's a bit of repetition between the 2 getters, but since cursor-paging has different API requirements, headers used, compatibility rules, and fallback behaviors than off-set paging, I feel like this decomposition is justified. If you have any code design suggestion, that'd be great!
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
|
|
||
| elif cursor_paging and partitioning: | ||
| ods_version = tuple(map(int, self.client.get_ods_version().split(".")[:2])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move the version-check to the top of the method and fail immediately if any disallowed arguments are passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess with the bifurcation of get_pages(), our notes about disallowed arguments don't apply for now? I did put some built-in checkpoints for get_pages_cursor so that:
- If ODS version is not 7.3, the paging method will switch to reverse-offset (Up to debate whether this should be the desired behavior or if it should error out completely)
- Since deletes and key changes are not supported by cursor paging, if the client tries to fetch them via get_pages_cursor() → throws warning and switch to reverse-offset (Again, up to debate if we should require more manual input from client if their requests are incompatible with the new feature)
edfi_api_client/edfi_endpoint.py
Outdated
| token = result.headers.get("Next-Page-Token") | ||
| return results | ||
|
|
||
| results = [element for sublist in Parallel(n_jobs=len(paged_tokens), backend="threading")(delayed(partitioning_with_token)(token) for token in paged_tokens) for element in sublist] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll need to dive deeper into this approach. There is a future where we enable async in the edfi_api_client and will need this method to be compatible.
jayckaiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a bunch of comments for things to change, most minor. I want to make sure that we:
- maintain backwards compatibility
- minimize code-hopping
I'd also request going through and making sure whitespace is kept the same between this branch and main, just to minimize the number of lines changed.
edfi_api_client/edfi_endpoint.py
Outdated
| paged_params['limit'] = limit | ||
|
|
||
| # Fall back to reverse-offset paging if incompatible with cursor paging | ||
| def _fall_back_to_pages_by_offset(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather put all conditional logic on choosing the internal helper method into get_rows() instead of defining a fallback. The choice of pagination should be made and declared immediately.
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
| ### Prepare pagination variables | ||
| ### First request should not have any `page_token` and `page_size` defined | ||
| paged_params.init_page_by_token(page_token = None, page_size = None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Page_size should be defined regardless, since we always set a default value.
edfi_api_client/edfi_endpoint.py
Outdated
|
|
||
| ### Prepare pagination variables | ||
| ### First request should not have any `page_token` and `page_size` defined | ||
| paged_params.init_page_by_token(page_token = None, page_size = None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consolidate 317 and 332 and put it at the start of the while-loop. We can define page_token = None above the loop for the first pass.
| """ | ||
| # Override init params if passed | ||
| paged_params = EdFiParams(params or self.params).copy() | ||
| logging.info(f"[Get {self.component}] Endpoint: {self.url}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this logging statement.
No description provided.