Fix: OpenSearch queries are huge (azul-private#316) #7775
nadove-ucsc wants to merge 16 commits into `develop` from
Force-pushed from 5556578 to 0018866.
Codecov Report
❌ Patch coverage is

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##           develop    #7775      +/-   ##
===========================================
+ Coverage    85.12%   85.15%   +0.02%
===========================================
  Files          158      158
  Lines        23448    23498      +50
===========================================
+ Hits         19960    20009      +49
- Misses        3488     3489       +1
```
Force-pushed from 9e94090 to 99c77c9.
Force-pushed from c4de53a to 6186d40.
`typing.Iterator` is deprecated; use its `collections.abc` equivalent instead. Is permissive type hinting better than no type hint? If so, consider adding it; the hint for `default` was derived from the superclass's stub.
```diff
Index: src/azul/service/query_service.py
===================================================================
diff --git a/src/azul/service/query_service.py b/src/azul/service/query_service.py
--- a/src/azul/service/query_service.py	(revision 6186d40dc2f9d34c5a28f48ae4177f1e36eb3be3)
+++ b/src/azul/service/query_service.py	(date 1773184850522)
@@ -7,6 +7,7 @@
 )
 from collections.abc import (
     Iterable,
+    Iterator,
     Mapping,
     Sequence,
 )
@@ -15,7 +16,6 @@
 from typing import (
     Any,
     Generic,
-    Iterator,
     Self,
     TypeVar,
 )
@@ -702,11 +702,11 @@
 class TemplateSearchJSONEncoder(json.JSONEncoder):
-    def __init__(self, **kwargs):
+    def __init__(self, **kwargs: Any) -> None:
         super().__init__(**kwargs)
         self.params: MutableJSON = {}
 
-    def default(self, obj):
+    def default(self, obj: Any) -> Any:
         if isinstance(obj, Template):
             try:
                 old_value = self.params[obj.param_name]
```
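For illustration, a minimal self-contained sketch of an annotated `json.JSONEncoder` subclass that imports `Iterator` from `collections.abc`. The `default` and `iterencode` signatures mirror typeshed's stub for `json.JSONEncoder`; `Placeholder` and `PlaceholderJSONEncoder` are hypothetical stand-ins for the PR's `Template` and `TemplateSearchJSONEncoder`, not their actual code.

```python
import json
from collections.abc import Iterator
from typing import Any


class Placeholder:
    """Hypothetical stand-in for the PR's ``Template`` class."""

    def __init__(self, name: str, value: Any) -> None:
        self.name = name
        self.value = value


class PlaceholderJSONEncoder(json.JSONEncoder):
    """Encode ``Placeholder`` objects as their value and record each one."""

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.params: dict[str, Any] = {}

    def default(self, o: Any) -> Any:
        # Only called for objects the base encoder can't serialize natively
        if isinstance(o, Placeholder):
            self.params[o.name] = o.value
            return o.value
        return super().default(o)

    def iterencode(self, o: Any, _one_shot: bool = False) -> Iterator[str]:
        # Delegates unchanged; shown only to illustrate the annotation, which
        # matches the typeshed stub of json.JSONEncoder.iterencode
        yield from super().iterencode(o, _one_shot)


encoder = PlaceholderJSONEncoder()
print(encoder.encode({'terms': Placeholder('ids', ['a', 'b'])}))  # {"terms": ["a", "b"]}
print(encoder.params)  # {'ids': ['a', 'b']}
```

`json.JSONEncoder.encode` calls `self.iterencode(o, _one_shot=True)`, so the override must keep the `_one_shot` parameter name for the call to succeed.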
Should the changes in commits 14b464ac and 79f6dc79 come earlier in the commit history?
From CONTRIBUTING.rst:

> We separate semantically neutral changes from those that alter semantics by committing them separately, …
>
> We also push every semantically neutral commit separately such that the build status checks on Github and Gitlab prove the commit's semantic neutrality.
Commit 94badd2 removed the second filter used in this test, making it identical to `test_create_request`. The test has also been superseded in complexity by `test_create_request_terms_and_missing_values`.
All three cases mentioned in the comment (empty, single filter, complex multiple filters) are already implemented as separate tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminating the call to `scan` helps facilitate the transition to template queries
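One possible shape for such a replacement is paging with `search_after` instead of the scroll-based `scan()`. This sketch assumes, without confirmation from this PR, that the replacement takes that approach, which would be consistent with the later comment about needing two sort fields. `iterate_hits` and the `search` callable are hypothetical, and the second sort field is assumed to be unique per document so that each page resumes exactly where the previous one ended.

```python
from collections.abc import Callable, Iterator, Mapping
from typing import Any

# Hypothetical: sends a request body to an OpenSearch index and returns the
# raw response, e.g. a thin wrapper around a client's search method
SearchFn = Callable[[Mapping[str, Any]], Mapping[str, Any]]


def iterate_hits(search: SearchFn,
                 query: Mapping[str, Any],
                 sort_fields: tuple[str, str],
                 page_size: int = 100) -> Iterator[Mapping[str, Any]]:
    """
    Yield every hit matching ``query`` without the scroll API, paging with
    ``search_after``. The second sort field acts as a tie-breaker and must
    be unique per document.
    """
    search_after: list[Any] | None = None
    while True:
        body: dict[str, Any] = {
            'query': query,
            'size': page_size,
            'sort': [{field: 'asc'} for field in sort_fields],
        }
        if search_after is not None:
            body['search_after'] = search_after
        hits = search(body)['hits']['hits']
        yield from hits
        if len(hits) < page_size:
            break
        # Each hit carries the values it was sorted by, one per sort field
        search_after = hits[-1]['sort']
```

Unlike a scroll, plain `search_after` does not pin a snapshot of the index, so results are not guaranteed to be consistent if the index changes between pages.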
I've updated the type annotations as you suggested, with the exception of
Force-pushed from 6186d40 to 26e593e.
src/azul/service/query_service.py (outdated)
```python
        order=pagination.order)


@attr.s(frozen=True, auto_attribs=True, kw_only=True)
```
This module isn't covered by mypy, and I don't want to accrue more unchecked code. Besides, I don't think this is specific to the service. A more suitable place for it is `azul.es`, which is already covered by mypy.
src/azul/service/query_service.py (outdated)
```python
            'params': encoder.params,
        }

    def execute(self, ignore_cache: bool = False) -> Any:
```
Attribution needed. A comment should explain where this was lifted from and why we need to duplicate it.
Force-pushed from 26e593e to 7339461.
```python
        for hit in request.scan():
            doc = self._hit_to_doc(hit)
            yield doc
        # Need two sort fields to satisfy type constraints
```
Hacky. Maybe `SortKey` isn't the right type, then. I thought `SortKey` was for the actual values, not for the names of the fields they come from.
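One way to keep the two apart, sketched under assumptions: give field names and sort values distinct types so that mypy rejects one where the other is expected. `SortField`, `SortValues`, `sort_clause`, `resume_after` and the field names are hypothetical and do not reflect how `SortKey` is currently defined in Azul.

```python
from collections.abc import Sequence
from typing import Any, NewType

# Hypothetical: the name of a document field used for sorting
SortField = NewType('SortField', str)

# Hypothetical: the values of one hit's sort fields, as found in the hit's
# ``sort`` array and passed back as ``search_after``
SortValues = tuple[Any, ...]


def sort_clause(fields: Sequence[SortField]) -> list[dict[str, str]]:
    """Build an OpenSearch ``sort`` clause from field names only."""
    return [{field: 'asc'} for field in fields]


def resume_after(values: SortValues) -> dict[str, list[Any]]:
    """Build the ``search_after`` part of a request from sort values only."""
    return {'search_after': list(values)}


fields = [SortField('entity_id'), SortField('document_id')]
print(sort_clause(fields))  # [{'entity_id': 'asc'}, {'document_id': 'asc'}]
print(resume_after(('abc', 42)))  # {'search_after': ['abc', 42]}
```

`NewType` has no runtime cost; the distinction is enforced only by the type checker, which would flag, for example, `sort_clause(['entity_id'])` or `sort_clause(('abc', 42))`.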
```python
        return super().default(obj)

    def iterencode(self, o: AnyJSON, _one_shot: bool = False) -> Iterator[str]:
        with unittest.mock.patch('json.encoder.encode_basestring_ascii',
```
You can't patch a global module entry: doing so isn't thread-safe, and we use threads, for the summary, for example.
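A small demonstration of the problem, independent of the PR's code: `unittest.mock.patch` replaces the attribute on the shared `json.encoder` module, so a thread that never asked for the patch sees it too. Any concurrent encoding with the default `ensure_ascii=True` looks up that same module global. The thread functions and the lambda replacement are illustrative only.

```python
import json.encoder
import threading
import unittest.mock

original = json.encoder.encode_basestring_ascii
patched = threading.Event()
checked = threading.Event()
seen: list[bool] = []


def patching_thread() -> None:
    # Patch the module-level function, as the encoder in this PR does
    with unittest.mock.patch('json.encoder.encode_basestring_ascii',
                             lambda s: '"patched"'):
        patched.set()
        # Keep the patch active until the other thread has looked
        checked.wait()


def other_thread() -> None:
    patched.wait()
    # This thread never patched anything, yet it sees the replacement
    seen.append(json.encoder.encode_basestring_ascii is not original)
    checked.set()


threads = [threading.Thread(target=patching_thread),
           threading.Thread(target=other_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)  # [True]
print(json.encoder.encode_basestring_ascii is original)  # True
```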
Generally speaking, this approach is a bit concerning. For one, it doesn't help that it is so sparsely documented. Smoking gun: `Template`, the most obvious place for documenting the overall approach, has no docstring.
I also don't understand why we have to wait until serialization time to expand the templates. I am surprised that something that already has a value is called a "template". The point of a template is to have placeholders, so that the values for the placeholders are provided when the template is applied, not when the template is created. It also seems that the only reason the encoder tracks parameters is to facilitate the assert. Generally speaking, I am a bit confused by this approach.
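To illustrate the distinction drawn above, here is a sketch of a template in the conventional sense: the query tree contains only named placeholders, and values are supplied when the template is applied. `Param`, `apply` and the `file_id` field are hypothetical, not part of Azul or of OpenSearch's search-template API.

```python
from collections.abc import Mapping
from typing import Any


class Param:
    """A named placeholder; it deliberately carries no value."""

    def __init__(self, name: str) -> None:
        self.name = name


def apply(template: Any, params: Mapping[str, Any]) -> Any:
    """Return a copy of ``template`` with every ``Param`` replaced by its value."""
    if isinstance(template, Param):
        # A missing parameter raises KeyError instead of rendering a broken query
        return params[template.name]
    elif isinstance(template, Mapping):
        return {k: apply(v, params) for k, v in template.items()}
    elif isinstance(template, list):
        return [apply(v, params) for v in template]
    else:
        return template


# The template is built once, without knowing any of the values ...
template = {'query': {'terms': {'file_id': Param('ids')}}}
# ... and the values are bound only when it is applied
print(apply(template, {'ids': ['u1', 'u2']}))  # {'query': {'terms': {'file_id': ['u1', 'u2']}}}
```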
Please extract the changes that replace `scan()` into another PR. There may already be an issue for that effort; if not, create one. That issue should block #316.
Lastly, rethink the general "template" approach or ask a PL to explain and defend it.
Linked issues: https://github.com/DataBiosphere/azul-private/issues/316
Checklist
Author
- `develop`
- `issues/<GitHub handle of author>/<issue#>-<slug>`

¹ When the issue title describes a problem, the corresponding PR title is `Fix:` followed by the issue title.

Author (partiality)
- `p` tag to titles of partial commits
- `partial` or completely resolves all linked issues
- `partial` label

Author (reindex)
- `r` tag to commit title or the changes introduced by this PR will not require reindexing of any deployment
- `reindex:dev` or the changes introduced by it will not require reindexing of `dev`
- `reindex:anvildev` or the changes introduced by it will not require reindexing of `anvildev`
- `reindex:anvilprod` or the changes introduced by it will not require reindexing of `anvilprod`
- `reindex:prod` or the changes introduced by it will not require reindexing of `prod`
- `reindex:partial` and its description documents the specific reindexing procedure for `dev`, `anvildev`, `anvilprod` and `prod` or requires a full reindex or carries none of the labels `reindex:dev`, `reindex:anvildev`, `reindex:anvilprod` and `reindex:prod`

Author (API changes)
- `API` or this PR does not modify a REST API
- `a` (`A`) tag to commit title for backwards (in)compatible changes or this PR does not modify a REST API
- `app.py` or this PR does not modify a REST API

Author (upgrading deployments)
- `make docker_images.json` and committed the resulting changes or this PR does not modify `azul_docker_images`, or any other variables referenced in the definition of that variable
- `u` tag to commit title or this PR does not require upgrading deployments
- `upgrade` or does not require upgrading deployments
- `deploy:shared` or does not modify `docker_images.json`, and does not require deploying the `shared` component for any other reason
- `deploy:gitlab` or does not require deploying the `gitlab` component
- `deploy:runner` or does not require deploying the `runner` image

Author (hotfixes)
- `F` tag to main commit title or this PR does not include permanent fix for a temporary hotfix
- `anvilprod` and `prod`) have temporary hotfixes for any of the issues linked to this PR

Author (before every review)
- `develop`, squashed fixups from prior reviews
- `make requirements_update` or this PR does not modify `Dockerfile`, `environment`, `requirements*.txt`, `common.mk`, `Makefile` or `environment.boot`
- `R` tag to commit title or this PR does not modify `requirements*.txt`
- `reqs` or does not modify `requirements*.txt`
- `make integration_test` passes in personal deployment or this PR does not modify functionality that could affect the IT outcome

Peer reviewer (after approval)
Note that after requesting changes, the PR must be assigned to only the author.
System administrator (after approval)
- `demo` or `no demo`
- `no demo`
- `no sandbox`
- `N reviews` label is accurate

Operator
- `reindex:…` labels and `r` commit title tag
- `no demo`
- `develop`

Operator (deploy `.shared` and `.gitlab` components)

- `_select dev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused` or this PR is not labeled `deploy:shared`
- `_select dev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply` or this PR is not labeled `deploy:gitlab`
- `_select anvildev.shared && CI_COMMIT_REF_NAME=develop make -C terraform/shared apply_keep_unused` or this PR is not labeled `deploy:shared`
- `_select anvildev.gitlab && CI_COMMIT_REF_NAME=develop make -C terraform/gitlab apply` or this PR is not labeled `deploy:gitlab`
- `deploy:gitlab`
- `deploy:gitlab`

System administrator (post-deploy of `.gitlab` component)

- `dev.gitlab` are complete or this PR is not labeled `deploy:gitlab`
- `anvildev.gitlab` are complete or this PR is not labeled `deploy:gitlab`

Operator (deploy runner image)
- `_select dev.gitlab && make -C terraform/gitlab/runner` or this PR is not labeled `deploy:runner`
- `_select anvildev.gitlab && make -C terraform/gitlab/runner` or this PR is not labeled `deploy:runner`

Operator (sandbox build)
- `sandbox` label or PR is labeled `no sandbox`
- `dev` or PR is labeled `no sandbox`
- `anvildev` or PR is labeled `no sandbox`
- `sandbox` deployment or PR is labeled `no sandbox`
- `anvilbox` deployment or PR is labeled `no sandbox`
- `sandbox` deployment or PR is labeled `no sandbox`
- `anvilbox` deployment or PR is labeled `no sandbox`
- `sandbox` or this PR does not remove catalogs or otherwise causes unreferenced indices in `sandbox`
- `anvilbox` or this PR does not remove catalogs or otherwise causes unreferenced indices in `anvilbox`
- `sandbox` or this PR is not labeled `reindex:dev`
- `anvilbox` or this PR is not labeled `reindex:anvildev`
- `sandbox` or this PR is not labeled `reindex:dev`
- `anvilbox` or this PR is not labeled `reindex:anvildev`

Operator (merge the branch)
- `p` if the PR is also labeled `partial`

Operator (main build)
- `dev`
- `anvildev`
- `dev`
- `dev`
- `anvildev`
- `anvildev`
- `_select dev.shared && make -C terraform/shared apply` or this PR is not labeled `deploy:shared`
- `_select anvildev.shared && make -C terraform/shared apply` or this PR is not labeled `deploy:shared`
- `dev`
- `anvildev`

Operator (reindex)
- `dev` or this PR is neither labeled `reindex:partial` nor `reindex:dev`
- `anvildev` or this PR is neither labeled `reindex:partial` nor `reindex:anvildev`
- `dev` or this PR is neither labeled `reindex:partial` nor `reindex:dev`
- `anvildev` or this PR is neither labeled `reindex:partial` nor `reindex:anvildev`
- `dev` or this PR is neither labeled `reindex:partial` nor `reindex:dev`
- `anvildev` or this PR is neither labeled `reindex:partial` nor `reindex:anvildev`
- `dev` or this PR does not require reindexing `dev`
- `anvildev` or this PR does not require reindexing `anvildev`
- `dev` or this PR does not require reindexing `dev`
- `anvildev` or this PR does not require reindexing `anvildev`
- `dev` or this PR does not require reindexing `dev`
- `anvildev` or this PR does not require reindexing `anvildev`
- `dev` or this PR does not require reindexing `dev`
- `dev` or this PR does not require reindexing `dev`
- `deploy_browser` job in the GitLab pipeline for this PR in `dev` or this PR does not require reindexing `dev`
- `anvildev` or this PR does not require reindexing `anvildev`
- `deploy_browser` job in the GitLab pipeline for this PR in `anvildev` or this PR does not require reindexing `anvildev`

Operator (mirroring)
- `dev` or this PR does not require mirroring `dev`
- `anvildev` or this PR does not require mirroring `anvildev`
- `dev` or this PR does not require mirroring `dev`
- `anvildev` or this PR does not require mirroring `anvildev`
- `dev` or this PR does not require mirroring `dev`
- `anvildev` or this PR does not require mirroring `anvildev`

Operator
- `deploy:shared`, `deploy:gitlab`, `deploy:runner`, `API`, `reindex:partial`, `reindex:anvilprod` and `reindex:prod` labels to the next promotion PRs or this PR carries none of these labels
- `deploy:shared`, `deploy:gitlab`, `deploy:runner`, `API`, `reindex:partial`, `reindex:anvilprod` and `reindex:prod` labels, from the description of this PR to that of the next promotion PRs or this PR carries none of these labels

Shorthand for review comments
- `L` line is too long
- `W` line wrapping is wrong
- `Q` bad quotes
- `F` other formatting problem