ref(aci): Remove redundant DetectorWorkflow rows #112402

Merged
saponifi3d merged 5 commits into master from jcallender/aci/drop-error-detectors on Apr 8, 2026

Contributor

@saponifi3d saponifi3d commented Apr 7, 2026

Description

Since these workflows are connected to both the Issue Stream and the Error detector, we can remove the connection to the error detector because it's a subset of the Issue Stream.

This PR finds where this redundant connection occurs and removes the relationship for the Error Detector, while preserving the Issue Stream connection.

We don't need to make any processing changes, because of how these are selected in processors/workflow.py.

This PR should only be merged after #112276, so we don't add any new connections after migrating.

@github-actions github-actions bot added the "Scope: Backend" label (automatically applied to PRs that change backend components) Apr 7, 2026
Contributor

github-actions bot commented Apr 7, 2026

This PR has a migration; here is the generated SQL for src/sentry/workflow_engine/migrations/0112_drop_redundant_error_detector_workflows.py

for 0112_drop_redundant_error_detector_workflows in workflow_engine

--
-- Raw Python operation
--
-- THIS OPERATION CANNOT BE WRITTEN AS SQL

@saponifi3d saponifi3d force-pushed the jcallender/aci/drop-error-detectors branch from 4ad4128 to 258e759 Compare April 8, 2026 04:41
@saponifi3d saponifi3d marked this pull request as ready for review April 8, 2026 17:07
@saponifi3d saponifi3d requested review from a team as code owners April 8, 2026 17:07
Comment on lines +35 to +39
issue_stream_workflow_ids = set(
    DetectorWorkflow.objects.filter(detector_id__in=issue_stream_detector_ids).values_list(
        "workflow_id", flat=True
    )
)
Member

When I ran this on redash it took 10 seconds. That might be fine, just a heads up that this could get stuck

while bulk_delete_objects(
    DetectorWorkflow,
    logger=logger,
    detector_id__in=error_detector_ids,
Member

@wedamija wedamija Apr 8, 2026

This will pass all of the error_detector_ids to the query. There are 2.9 million of these, so I think the query will fail.

I think you'd be better off doing this in batches of ids: run one query to get all of the error_detector_ids, then use chunked to split them into batches of 10k.

For each batch, query DetectorWorkflow.objects.filter(detector_id__in=<chunk>, <exists query to detect if stream detector exists for this workflow>).values_list("detector_id", flat=True).

Then just delete however many rows match the query in that batch.
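
To make the batching suggestion concrete, here is a minimal sketch of the shape it describes. It is not the migration that was merged. It uses the stdlib sqlite3 module in place of the Django ORM so it runs on its own, and the table names, the `issue_stream` type string, the local `chunked` helper, and `drop_redundant_error_links` are all assumptions made for illustration. It also ignores project scoping.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield lists of at most `size` ids (local stand-in for a chunking helper)."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory schema; table and type names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE detector (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
        CREATE TABLE detector_workflow (
            id INTEGER PRIMARY KEY,
            detector_id INTEGER NOT NULL,
            workflow_id INTEGER NOT NULL
        );
        INSERT INTO detector (id, type) VALUES
            (1, 'error'), (2, 'issue_stream'), (3, 'error');
        INSERT INTO detector_workflow (id, detector_id, workflow_id) VALUES
            (1, 1, 10),  -- error link; workflow 10 also has issue_stream: redundant
            (2, 2, 10),  -- issue_stream link for workflow 10: keep
            (3, 1, 11),  -- error link with no issue_stream match: keep
            (4, 2, 12),  -- issue_stream-only workflow: keep
            (5, 3, 13);  -- error link on another detector, no match: keep
        """
    )
    return conn


def drop_redundant_error_links(conn: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Delete error-detector links whose workflow also has an issue_stream link."""
    # Fetch every error detector id once, then chunk in memory, so the
    # filter on `type` is not re-run for each batch.
    error_ids = [row[0] for row in conn.execute("SELECT id FROM detector WHERE type = 'error'")]
    deleted = 0
    for batch in chunked(error_ids, batch_size):
        placeholders = ",".join("?" * len(batch))
        cursor = conn.execute(
            f"""
            DELETE FROM detector_workflow
            WHERE detector_id IN ({placeholders})
              AND EXISTS (
                SELECT 1
                FROM detector_workflow AS other
                JOIN detector AS d ON d.id = other.detector_id
                WHERE other.workflow_id = detector_workflow.workflow_id
                  AND d.type = 'issue_stream'
              )
            """,
            batch,
        )
        deleted += cursor.rowcount
    conn.commit()
    return deleted
```

Calling `drop_redundant_error_links(build_demo_db(), batch_size=2)` removes only the one redundant row in the demo data. Each DELETE binds a single batch of ids, so no statement ever carries the full list of error detector ids.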

Detector = apps.get_model("workflow_engine", "Detector")
DetectorWorkflow = apps.get_model("workflow_engine", "DetectorWorkflow")

error_detectors = Detector.objects.filter(type="error").only("id")
Member

Nit: I'd probably just use values_list("id", flat=True) so that you get the ids directly, since you're already just converting to ids in the loop

Detector = apps.get_model("workflow_engine", "Detector")
DetectorWorkflow = apps.get_model("workflow_engine", "DetectorWorkflow")

error_detectors = Detector.objects.filter(type="error").only("id")
Member

It might be better to fetch all of the error_detectors detector ids in a single query and just chunk through them, instead of using the rangewrapper. It's only a few million ints, and it saves having to requery on an unindexed field (type) for each chunk

Comment on lines +54 to +70
def test_deletes_error_workflow_with_matching_issue_stream(self) -> None:
    assert not DetectorWorkflow.objects.filter(id=self.dw_error_should_delete.id).exists()

def test_preserves_issue_stream_workflow_when_error_deleted(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_issue_stream_keep.id).exists()

def test_preserves_error_workflow_without_matching_issue_stream(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_error_no_match.id).exists()

def test_preserves_issue_stream_only_workflow(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_issue_stream_only.id).exists()

def test_preserves_cross_project_error_workflow_without_issue_stream(self) -> None:
    assert DetectorWorkflow.objects.filter(id=self.dw_error_project2.id).exists()

def test_total_count_after_migration(self) -> None:
    assert DetectorWorkflow.objects.count() == 4
Member

These tests are extremely slow. I recommend combining these checks all into a single test, especially since we're not changing what the migration actually does

@saponifi3d saponifi3d force-pushed the jcallender/aci/drop-error-detectors branch from fa11450 to 7fdc783 Compare April 8, 2026 21:51
@saponifi3d saponifi3d merged commit 7042e30 into master Apr 8, 2026
76 of 77 checks passed
@saponifi3d saponifi3d deleted the jcallender/aci/drop-error-detectors branch April 8, 2026 22:08

2 participants