@JadeCara JadeCara commented Dec 9, 2025

Ticket ENG-2200

Description Of Changes

DSR creation has been slow, so I looked into potential bottlenecks. Several duplicate-detection queries filter on identity columns that are not indexed. This PR adds indexes on those columns.

Code Changes

  • add index on privacyrequest: policy_id, created_at
  • add index on providedidentity: privacy_request_id, field_name, hashed_value
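
As a rough illustration of what these two composite indexes do, the sketch below builds a trimmed-down version of both tables in an in-memory SQLite database and asks the planner how it would run a policy-plus-time-window lookup. Assumptions: SQLite stands in for the real database, the schemas contain only the columns named above, the index names are the ones quoted in the Greptile summary on this PR, and the query is a guess at the shape of the duplicate-detection lookup, not the actual Fides query.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns named in this PR are modeled.
SCHEMA = """
CREATE TABLE privacyrequest (
    id TEXT PRIMARY KEY,
    policy_id TEXT,
    created_at TEXT
);
CREATE TABLE providedidentity (
    id TEXT PRIMARY KEY,
    privacy_request_id TEXT,
    field_name TEXT,
    hashed_value TEXT
);
CREATE INDEX ix_privacyrequest_policy_created
    ON privacyrequest (policy_id, created_at);
CREATE INDEX ix_providedidentity_reqid_field_hash
    ON providedidentity (privacy_request_id, field_name, hashed_value);
"""


def query_plan(conn, sql, params=()):
    """Return the planner's detail strings for a query (SQLite-specific)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative lookup: requests for the same policy inside a recent time window.
plan = query_plan(
    conn,
    "SELECT id FROM privacyrequest WHERE policy_id = ? AND created_at >= ?",
    ("policy-1", "2025-12-01"),
)
print(plan)
```

On SQLite the printed plan should name ix_privacyrequest_policy_created. Against the real database, running EXPLAIN on the actual duplicate-detection query would be the equivalent check.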

Steps to Confirm

  1. There should be no change to any functionality.
  2. Create requests and verify that duplicates are detected and marked as duplicates.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.31%. Comparing base (cd528c4) to head (22b8783).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7095   +/-   ##
=======================================
  Coverage   87.31%   87.31%           
=======================================
  Files         532      532           
  Lines       34969    34970    +1     
  Branches     4048     4048           
=======================================
+ Hits        30532    30533    +1     
  Misses       3560     3560           
  Partials      877      877           

@JadeCara JadeCara marked this pull request as ready for review December 9, 2025 21:54
@JadeCara JadeCara requested a review from a team as a code owner December 9, 2025 21:54
@JadeCara JadeCara requested review from adamsachs and removed request for a team December 9, 2025 21:54
greptile-apps bot commented Dec 9, 2025

Greptile Overview

Greptile Summary

This PR addresses performance bottlenecks in Data Subject Request (DSR) creation by optimizing duplicate detection queries. The changes add two composite database indexes to improve query performance: ix_privacyrequest_policy_created on (policy_id, created_at) for the privacy request table and ix_providedidentity_reqid_field_hash on (privacy_request_id, field_name, hashed_value) for the provided identity table. These indexes target specific queries that filter privacy requests by policy and time windows during duplicate detection.

The PR also removes a redundant duplicate detection call from create_privacy_request() since the same check already occurs in the approval handling flow. The migration uses conditional index creation based on table size - tables with fewer than 1 million rows get indexes created immediately, while larger tables defer creation to use CREATE INDEX CONCURRENTLY for non-blocking operation. Additionally, one test was updated to accommodate the change in duplicate detection flow by adjusting its mocking strategy.
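
The size-based decision described above can be sketched as follows. This is a minimal sketch, not the migration's actual code: `plan_index_creation` is a hypothetical helper, and the threshold value of 1,000,000 is taken from the "fewer than 1 million rows" wording in this summary. PostgreSQL does not allow CREATE INDEX CONCURRENTLY inside a transaction block, which is one plausible reason to defer the large-table case to a post-upgrade step.

```python
# Assumed value, inferred from the "fewer than 1 million rows" wording above.
INDEX_ROW_COUNT_THRESHOLD = 1_000_000


def plan_index_creation(table, index_name, columns, row_count,
                        threshold=INDEX_ROW_COUNT_THRESHOLD):
    """Decide how an index should be created for a table of a given size.

    Returns a (mode, sql) tuple: "immediate" means the statement can run
    inside the migration, "deferred" means it should run later, outside a
    transaction, as a non-blocking concurrent build.
    """
    cols = ", ".join(columns)
    if row_count < threshold:
        sql = f"CREATE INDEX IF NOT EXISTS {index_name} ON {table} ({cols})"
        return ("immediate", sql)
    sql = f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} ON {table} ({cols})"
    return ("deferred", sql)


small = plan_index_creation(
    "privacyrequest", "ix_privacyrequest_policy_created",
    ["policy_id", "created_at"], row_count=25_000,
)
large = plan_index_creation(
    "providedidentity", "ix_providedidentity_reqid_field_hash",
    ["privacy_request_id", "field_name", "hashed_value"], row_count=4_000_000,
)
print(small)
print(large)
```

A table sitting exactly at the threshold falls into the deferred branch here, because the summary describes the immediate path as applying to tables with fewer than 1 million rows.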

Important Files Changed

Filename | Score | Overview
src/fides/api/alembic/migrations/versions/xx_2025_12_09_2055_a7241db3ee6a_add_identity_indexes.py | 4/5 | Adds database migration for composite indexes with conditional creation based on table size
src/fides/api/models/privacy_request/privacy_request.py | 5/5 | Adds composite index definition to PrivacyRequest model for duplicate detection optimization
src/fides/api/models/privacy_request/provided_identity.py | 5/5 | Adds composite index definition to ProvidedIdentity model for duplicate detection optimization
src/fides/api/migrations/post_upgrade_index_creation.py | 5/5 | Registers new indexes for deferred creation on large tables using concurrent creation
src/fides/service/privacy_request/privacy_request_service.py | 5/5 | Removes redundant duplicate detection call from privacy request creation flow
tests/ops/service/privacy_request/test_duplication_detection.py | 5/5 | Updates test mocking strategy to align with modified duplicate detection flow

Confidence score: 4/5

  • This PR addresses clear performance issues with well-targeted database optimizations that should be safe to merge
  • Score reflects solid implementation with one minor style issue in the migration file where an import is placed after module identifiers rather than at the top
  • Pay close attention to the migration file for the import placement style violation and ensure the conditional index creation logic works correctly in production environments

@greptile-apps greptile-apps bot left a comment

6 files reviewed, 2 comments

Jade Wibbels and others added 2 commits December 9, 2025 22:14
…7241db3ee6a_add_identity_indexes.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
JadeCara commented Dec 9, 2025

@greptile please review

@greptile-apps greptile-apps bot left a comment

6 files reviewed, no comments

"require_manual_request_approval",
"postgres_example_test_dataset_config",
)
@patch(
Contributor Author

This test was flaking - hoping this solves the 🥐

@adamsachs adamsachs left a comment

looks good to me! just a nit on a design choice in the migration that i don't consider blocking.

branch_labels = None
depends_on = None

from fides.api.migrations.post_upgrade_index_creation import INDEX_ROW_COUNT_THRESHOLD
Contributor

nit: i'm generally not so much in favor of a migration referencing variables/values defined outside of that migration, since it means the migration is no longer a frozen artifact in time -- i.e. if we were to ever redefine this threshold variable in the other module, then the migration behavior would technically change at that point. generally, i find it's much easier to reason about migrations as 'frozen', and i do think that's a general best practice.

obviously in this case it's hard to imagine this being problematic, in practice - so i don't have any major qualms. just wanted to mention that i'm not sure this is a great pattern. IMO, the DRYness that we're achieving here isn't really worth it - for a migration.
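
To make the "frozen artifact" concern concrete, here is a minimal sketch of the two patterns side by side. Everything in it is illustrative: `shared_settings` is a stand-in for a shared application module (not the real fides module), the function names are hypothetical, and the 1,000,000 value is assumed from the review summary on this PR.

```python
from types import SimpleNamespace

# Stand-in for a shared application module (hypothetical, not the real fides module).
shared_settings = SimpleNamespace(INDEX_ROW_COUNT_THRESHOLD=1_000_000)


def creates_index_coupled(row_count):
    """Migration logic that reads the threshold from shared code at run time."""
    return row_count < shared_settings.INDEX_ROW_COUNT_THRESHOLD


# Frozen variant: the value is copied into the migration file itself.
MIGRATION_ROW_COUNT_THRESHOLD = 1_000_000


def creates_index_frozen(row_count):
    """Migration logic whose behavior depends only on the migration file."""
    return row_count < MIGRATION_ROW_COUNT_THRESHOLD


row_count = 750_000
before = (creates_index_coupled(row_count), creates_index_frozen(row_count))

# Later, someone retunes the shared constant for an unrelated reason.
shared_settings.INDEX_ROW_COUNT_THRESHOLD = 500_000
after = (creates_index_coupled(row_count), creates_index_frozen(row_count))

print(before)  # (True, True)
print(after)   # (False, True)
```

The coupled version silently changes what an already-written migration does for a 750,000-row table, while the frozen version keeps behaving as it did when it was reviewed.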

Contributor Author


I really like your perspective! Thanks!

Contributor Author


Updated :)

@JadeCara JadeCara enabled auto-merge December 10, 2025 20:35
@JadeCara JadeCara added this pull request to the merge queue Dec 10, 2025
Merged via the queue into main with commit cd632ba Dec 10, 2025
69 checks passed
@JadeCara JadeCara deleted the duplication_detection_improvement branch December 10, 2025 21:47
JadeCara added a commit that referenced this pull request Dec 11, 2025
Co-authored-by: Jade Wibbels <jade@ethyca.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@greptile-apps greptile-apps bot mentioned this pull request Dec 11, 2025
18 tasks