ref(aci): Remove redundant DetectorWorkflow rows (#112402)
Conversation
This PR has a migration; here is the generated SQL:

```sql
--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL
```
```python
issue_stream_workflow_ids = set(
    DetectorWorkflow.objects.filter(detector_id__in=issue_stream_detector_ids).values_list(
        "workflow_id", flat=True
    )
)
```
When I ran this on Redash it took 10 seconds. That might be fine; just a heads up that this could get stuck.
```python
while bulk_delete_objects(
    DetectorWorkflow,
    logger=logger,
    detector_id__in=error_detector_ids,
```
This will pass all of the `error_detector_ids` to the query. There are 2.9 million of them, so I think the query will fail.
I think you might be better off doing this in batches of ids: run the query to get all of the `error_detector_ids`, then use `chunked` to split them into batches of 10k.
For each chunk, query `DetectorWorkflow.objects.filter(detector_id__in=<chunk>, <exists query to detect if an issue stream detector exists for this workflow>).values_list("detector_id", flat=True)`, then just delete however many rows match the query in that batch.
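A sketch of the suggested batching. The `chunked` helper is written out here so the snippet runs on its own; the ORM usage in the comment reuses the model names quoted in this PR but is an assumption about the final shape, not the merged code:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# In the migration this would be used roughly like (sketch only;
# Detector/DetectorWorkflow are the models quoted above):
#
# error_detector_ids = Detector.objects.filter(type="error").values_list("id", flat=True)
# for batch in chunked(list(error_detector_ids), 10_000):
#     doomed = list(
#         DetectorWorkflow.objects.filter(
#             detector_id__in=batch,
#             workflow_id__in=issue_stream_workflow_ids,
#         ).values_list("id", flat=True)
#     )
#     DetectorWorkflow.objects.filter(id__in=doomed).delete()

print(list(chunked(range(7), 3)))  # → [[0, 1, 2], [3, 4, 5], [6]]
```

Each `detector_id__in` list then stays at 10k values instead of 2.9 million, keeping every individual query well within what Postgres will plan efficiently.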
```python
Detector = apps.get_model("workflow_engine", "Detector")
DetectorWorkflow = apps.get_model("workflow_engine", "DetectorWorkflow")

error_detectors = Detector.objects.filter(type="error").only("id")
```
Nit: I'd probably just use `values_list("id", flat=True)` so that you get the ids directly, since you're already just converting to ids in the loop.
```python
Detector = apps.get_model("workflow_engine", "Detector")
DetectorWorkflow = apps.get_model("workflow_engine", "DetectorWorkflow")

error_detectors = Detector.objects.filter(type="error").only("id")
```
It might be better to fetch all of the error detector ids in a single query and just chunk through them, instead of using `RangeQuerySetWrapper`. It's only a few million ints, and it saves having to re-query on an unindexed field (`type`) for each chunk.
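To illustrate the tradeoff, here is a toy sqlite stand-in (not Sentry code; the table, column names, and row counts are made up): one scan of the unindexed column fetches every id up front, and the chunking afterwards is purely in-memory rather than one filtered query per range.

```python
import sqlite3
from itertools import islice

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detector (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO detector (type) VALUES (?)",
    [("error",)] * 25 + [("metric",)] * 5,
)

# One scan over the unindexed `type` column fetches every id at once...
error_ids = [
    row[0] for row in conn.execute("SELECT id FROM detector WHERE type = 'error'")
]

# ...then splitting into batches happens in memory, with no further
# queries against `type` (unlike a per-range wrapper that re-filters
# the table for each id range it walks).
it = iter(error_ids)
batches = []
while batch := list(islice(it, 10)):
    batches.append(batch)

print(len(error_ids), len(batches))  # → 25 3
```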
```python
def test_deletes_error_workflow_with_matching_issue_stream(self) -> None:
    assert not DetectorWorkflow.objects.filter(id=self.dw_error_should_delete.id).exists()

def test_preserves_issue_stream_workflow_when_error_deleted(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_issue_stream_keep.id).exists()

def test_preserves_error_workflow_without_matching_issue_stream(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_error_no_match.id).exists()

def test_preserves_issue_stream_only_workflow(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_issue_stream_only.id).exists()

def test_preserves_cross_project_error_workflow_without_issue_stream(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_error_project2.id).exists()

def test_total_count_after_migration(self) -> None:
    assert DetectorWorkflow.objects.count() == 4
```
These tests are extremely slow. I recommend combining these checks into a single test, especially since we're not changing what the migration actually does.
…per so this processes in batches
them into 10k groups to reduce the querying. Update the tests to all be one test, for performance reasons.
Description
Since these workflows are connected to both the Issue Stream and the Error detector, we can remove the connection to the error detector because it's a subset of the Issue Stream.
This PR will find where this redundant connection is occurring and remove the relationship to the Error detector, while preserving the Issue Stream connection.
We don't need to make any processing changes, because of how these are selected in processors/workflow.py.
This PR should only be merged after #112276, so we don't add any new connections after migrating.
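A toy illustration of the redundancy described above (the data is made up; only the set logic mirrors what the migration does): an `error` row is redundant exactly when its workflow also has an `issue_stream` row, because the Error detector's matches are a subset of the Issue Stream's.

```python
# Each row links a detector (represented by its type, for brevity)
# to a workflow id.
rows = [
    ("issue_stream", 1),
    ("error", 1),         # redundant: workflow 1 already has an issue_stream link
    ("error", 2),         # kept: workflow 2 has no issue_stream link
    ("issue_stream", 3),  # kept
]

issue_stream_workflows = {w for t, w in rows if t == "issue_stream"}
kept = [(t, w) for t, w in rows if not (t == "error" and w in issue_stream_workflows)]
print(kept)  # → [('issue_stream', 1), ('error', 2), ('issue_stream', 3)]
```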