@JadeCara JadeCara commented Nov 25, 2025

Ticket ENG-2046

Description Of Changes

🎯 We currently don’t have an index on providedidentity.privacy_request_id, which causes a full table scan any time we call a privacy request’s get_persisted_identity method.
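
To illustrate the difference between a full scan and an indexed lookup, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. The table layout (`id`, `privacy_request_id`, `field_name`) and the lookup value are simplified assumptions rather than the real Fides schema, and SQLite's plan text differs from Postgres's `EXPLAIN` output. Only the index name is taken from this PR.

```python
import sqlite3

INDEX_NAME = "ix_providedidentity_privacy_request_id"
LOOKUP = "SELECT * FROM providedidentity WHERE privacy_request_id = ?"


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan description for the lookup by privacy_request_id."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {LOOKUP}", ("pri_123",)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Simplified, assumed table layout (not the real Fides schema).
conn.execute(
    "CREATE TABLE providedidentity ("
    "id TEXT PRIMARY KEY, privacy_request_id TEXT, field_name TEXT)"
)

plan_before = query_plan(conn)  # no index yet: every row must be scanned
conn.execute(f"CREATE INDEX {INDEX_NAME} ON providedidentity (privacy_request_id)")
plan_after = query_plan(conn)  # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The first plan should describe a scan of the whole table, and the second should mention the new index by name.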

More context on this slack thread

AC

  • Add an index on providedidentity.privacy_request_id. Let’s use our pattern of deferring index creation if there are too many rows, just to be safe.

Added a database index on the providedidentity.privacy_request_id foreign key column

  • Uses deferred index creation pattern: creates index immediately for tables < 1M rows, defers for larger tables
    For large tables (≥ 1M rows), index is automatically created using CREATE INDEX CONCURRENTLY on application startup via post_upgrade_index_creation.py
  • Downgrade uses DROP INDEX IF EXISTS to handle cases where index may not exist
  • Entry added to TABLE_OBJECT_MAP in post_upgrade_index_creation.py with comment noting it can be removed once all deployments have upgraded
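
The deferral logic described in the bullets above can be sketched as plain Python that returns the SQL each phase would run. This is an illustration only, not the code in the migration or in post_upgrade_index_creation.py: the function names and the `ROW_THRESHOLD` constant are hypothetical, and the exact SQL text in the real files may differ.

```python
INDEX_NAME = "ix_providedidentity_privacy_request_id"
ROW_THRESHOLD = 1_000_000  # hypothetical constant name for the 1M-row cutoff


def upgrade_sql(row_count: int) -> list[str]:
    """Statements the migration would run for a table with `row_count` rows."""
    if row_count < ROW_THRESHOLD:
        # Small table: a plain CREATE INDEX inside the migration is quick.
        return [f"CREATE INDEX {INDEX_NAME} ON providedidentity (privacy_request_id)"]
    # Large table: run nothing here and leave the work to application startup.
    return []


def startup_sql(index_exists: bool) -> list[str]:
    """Statements the post-upgrade step would run at application startup."""
    if index_exists:
        return []
    # CONCURRENTLY builds the index without blocking writes to the table.
    return [
        f"CREATE INDEX CONCURRENTLY {INDEX_NAME} "
        "ON providedidentity (privacy_request_id)"
    ]


def downgrade_sql() -> list[str]:
    """IF EXISTS keeps the downgrade safe when the index was never created."""
    return [f"DROP INDEX IF EXISTS {INDEX_NAME}"]
```

For example, `upgrade_sql(500)` returns a single `CREATE INDEX` statement, while `upgrade_sql(2_000_000)` returns an empty list and leaves the work to `startup_sql(False)`.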

Code Changes

  • Added migration /src/fides/api/alembic/migrations/versions/xx_2025_11_25_1854_3ff6449c099e_add_index_on_providedidentity_privacy_.py
  • Added providedidentity entry for deferred index creation /src/fides/api/migrations/post_upgrade_index_creation.py

Steps to Confirm

  1. With fides running against this branch, run the migration.
  2. Verify that the index was created by running this query against the fides database:
   SELECT indexname, indexdef
   FROM pg_indexes
   WHERE tablename = 'providedidentity'
   AND indexname = 'ix_providedidentity_privacy_request_id';

You should get:

-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------
indexname | ix_providedidentity_privacy_request_id
indexdef  | CREATE INDEX ix_providedidentity_privacy_request_id ON public.providedidentity USING btree (privacy_request_id)
  3. Now run the downgrade and run the SQL query again. This time you should get:
indexname | indexdef 
-----------+----------
(0 rows)
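
The upgrade, verify, downgrade, verify cycle above can also be rehearsed without a Postgres instance. The sketch below swaps in Python's built-in sqlite3 module and queries `sqlite_master` instead of `pg_indexes`, so it only demonstrates the shape of the check. The two-column table is an assumed stand-in for the real schema, and `index_exists` is a hypothetical helper.

```python
import sqlite3

INDEX_NAME = "ix_providedidentity_privacy_request_id"


def index_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check SQLite's catalog for an index with the given name."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
# Assumed minimal stand-in for the providedidentity table.
conn.execute(
    "CREATE TABLE providedidentity (id TEXT PRIMARY KEY, privacy_request_id TEXT)"
)

# "Upgrade": create the index, then confirm it is listed.
conn.execute(f"CREATE INDEX {INDEX_NAME} ON providedidentity (privacy_request_id)")
after_upgrade = index_exists(conn, INDEX_NAME)

# "Downgrade": drop it, then confirm it is gone.
conn.execute(f"DROP INDEX IF EXISTS {INDEX_NAME}")
after_downgrade = index_exists(conn, INDEX_NAME)

# A second drop does not raise, which is why IF EXISTS suits the downgrade.
conn.execute(f"DROP INDEX IF EXISTS {INDEX_NAME}")
```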

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration label to the entry if your change includes a DB migration
    • Add a high-risk label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

vercel bot commented Nov 25, 2025

The latest updates on your projects.

Project | Deployment | Preview | Comments | Updated (UTC)
fides-plus-nightly | Ready | Preview | Comment | Nov 28, 2025 7:31pm

1 Skipped Deployment

Project | Deployment | Preview | Comments | Updated (UTC)
fides-privacy-center | Ignored | | | Nov 28, 2025 7:31pm

@JadeCara JadeCara marked this pull request as ready for review November 25, 2025 19:23
@JadeCara JadeCara requested a review from a team as a code owner November 25, 2025 19:23
@JadeCara JadeCara requested review from thabofletcher and removed request for a team November 25, 2025 19:23

greptile-apps bot commented Nov 25, 2025

Greptile Overview

Greptile Summary

Adds database index on providedidentity.privacy_request_id to eliminate full table scans when calling get_persisted_identity method.

  • Uses deferred index creation pattern: creates immediately for tables < 1M rows, defers to application startup for larger tables
  • Follows established codebase patterns for safe index creation on production databases
  • Downgrade properly handles cases where index may not exist using DROP INDEX IF EXISTS
  • Post-upgrade entry correctly configured for CREATE INDEX CONCURRENTLY to avoid table locks

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The implementation follows the exact established pattern used in other deferred index migrations in the codebase. The changes are minimal, well-tested patterns, and address a clear performance issue without introducing new logic or complexity.
  • No files require special attention

Important Files Changed

File Analysis

Filename | Score | Overview
src/fides/api/alembic/migrations/versions/xx_2025_11_25_1854_3ff6449c099e_add_index_on_providedidentity_privacy_.py | 5/5 | Adds deferred index creation on providedidentity.privacy_request_id foreign key following established patterns
src/fides/api/migrations/post_upgrade_index_creation.py | 5/5 | Adds providedidentity entry to TABLE_OBJECT_MAP for concurrent index creation on large tables

@greptile-apps greptile-apps bot left a comment

2 files reviewed, no comments

@JadeCara JadeCara requested a review from a team as a code owner November 25, 2025 19:26
@JadeCara JadeCara requested review from speaker-ender and removed request for a team November 25, 2025 19:26

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.01%. Comparing base (171f009) to head (7ce5dc6).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7035   +/-   ##
=======================================
  Coverage   87.00%   87.01%           
=======================================
  Files         528      528           
  Lines       34678    34682    +4     
  Branches     4010     4010           
=======================================
+ Hits        30173    30177    +4     
  Misses       3629     3629           
  Partials      876      876           


@erosselli erosselli left a comment

Thanks for addressing this so quickly! I left two comments but the only one I care about is the first one about adding the index to the model :) approving with that in mind

I think we can also add the index declaratively on the model with __table_args__, that way anyone who looks at the model knows what indexes are available without needing to look at the DB or dig through migrations
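
A minimal sketch of what that declarative index could look like in SQLAlchemy. The two model classes here are simplified stand-ins with assumed columns, not the actual Fides models. Only the table name, column name, and index name come from this PR.

```python
from sqlalchemy import Column, ForeignKey, Index, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class PrivacyRequest(Base):
    """Simplified stand-in for the parent table (assumed columns)."""

    __tablename__ = "privacyrequest"
    id = Column(String, primary_key=True)


class ProvidedIdentity(Base):
    """Simplified stand-in; the real model has more columns."""

    __tablename__ = "providedidentity"
    # Declaring the index here documents it next to the columns it covers.
    __table_args__ = (
        Index("ix_providedidentity_privacy_request_id", "privacy_request_id"),
    )

    id = Column(String, primary_key=True)
    privacy_request_id = Column(String, ForeignKey("privacyrequest.id"))


# The index is now visible from the model's table metadata.
index_names = {ix.name for ix in ProvidedIdentity.__table__.indexes}
print(index_names)
```

With the declaration in place, the index name shows up in `ProvidedIdentity.__table__.indexes`, so a reader of the model can see it without inspecting the database.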

sa.text("SELECT COUNT(*) FROM providedidentity")
).scalar()

if providedidentity_count < 1000000:

iirc this number is kind of arbitrary... but also I don't have a more specific one so 🤷‍♀️

I mostly picked it because it was already being used in other places. I looked at the post_upgrade_index_creation.py file and tried to use the same pattern as migrations for tables in that list. It's used in at least 5 other migrations.

I can add a variable to that file so we have it documented in one spot for future cases, though.

…ethod-on-a-privacy-request-causes-a-full-table-scan
@JadeCara JadeCara enabled auto-merge November 26, 2025 19:23
…ethod-on-a-privacy-request-causes-a-full-table-scan
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 28, 2025
@JadeCara JadeCara added this pull request to the merge queue Dec 1, 2025
Merged via the queue into main with commit 15d8b68 Dec 1, 2025
69 checks passed
@JadeCara JadeCara deleted the ENG-2046-calling-the-get-peristed-identity-method-on-a-privacy-request-causes-a-full-table-scan branch December 1, 2025 16:23
jjdaurora pushed a commit that referenced this pull request Dec 5, 2025
Co-authored-by: Jade Wibbels <jade@ethyca.com>