Adds migration for the new consent_documents table #6790
Conversation
…artitioning; copies the alembic.ini to the root so you can use alembic from the root of the project (where it typically goes).
Greptile Overview
Summary
This PR introduces a PostgreSQL partitioned table for storing consent documents (v3) and updates the Alembic configuration to use date-based migration file naming for better historical tracking.
Key Changes:
- Creates `consent_documents` table with LIST partitioning on the `is_latest` boolean field
- Adds two partitions: `consent_documents_current` (for latest records) and `consent_documents_historic` (for historical records)
- Copies `alembic.ini` to the project root with a date-based file template (`xx_YYYY_MM_DD_HHMM-revision-slug`)
- Updates the existing `src/fides/api/alembic/alembic.ini` with the same date-based template

Issues Found:
- Missing primary key constraint on the `consent_documents` table (critical: the table won't function properly)
- Boolean column `is_latest` should be non-nullable with a default to avoid three-state logic
- JSONB column uses inconsistent syntax (`postgresql.json.JSONB` vs `postgresql.JSONB(astext_type=sa.Text())`)
- `created_at` uses the wrong default syntax (`default=` instead of `server_default=`)

The alembic.ini configuration changes are good and improve developer experience, but the migration has several critical issues that must be fixed before merging.
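The nullable `is_latest` issue is worth spelling out: with `LIST (is_latest)` partitions defined only for true and false, a NULL value has no partition to land in. A minimal Python sketch of the routing behavior (the partition names mirror the migration; the routing function itself is an illustration, not PostgreSQL's implementation):

```python
# Sketch: LIST-partition routing on a nullable boolean (illustrative only).
# Partitions exist for TRUE and FALSE; with no DEFAULT partition, a NULL
# partition key makes the INSERT fail in PostgreSQL.

def route_partition(is_latest):
    if is_latest is True:
        return "consent_documents_current"
    if is_latest is False:
        return "consent_documents_historic"
    # Postgres raises an error like "no partition of relation found for row"
    raise ValueError("no partition for NULL is_latest")

print(route_partition(True))   # routes to the current partition
print(route_partition(False))  # routes to the historic partition
```

Making the column `nullable=False` with a server default removes the NULL case entirely.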
Confidence Score: 1/5
- This PR has critical database schema issues that will cause runtime failures
- Score reflects multiple critical issues: missing primary key constraint (will cause table operations to fail), incorrect default syntax for created_at timestamp (won't work), nullable boolean used for partitioning (creates ambiguity), and inconsistent JSONB syntax. These are not stylistic concerns but functional errors that will prevent the migration from working correctly.
- src/fides/api/alembic/migrations/versions/xx_2025_10_17_1603-5093e92e2a5a_add_consent_data_v3_to_the_database.py requires immediate attention to fix critical schema issues
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| src/fides/api/alembic/migrations/versions/xx_2025_10_17_1603-5093e92e2a5a_add_consent_data_v3_to_the_database.py | 2/5 | Creates partitioned consent_documents table with JSONB search data, TEXT record data, and boolean is_latest field - missing primary key, nullable constraints on critical fields, and proper JSONB syntax |
| alembic.ini | 5/5 | New root-level alembic configuration with date-based migration template for easier reasoning about migration history |
| src/fides/api/alembic/alembic.ini | 5/5 | Updated with date-based migration file template matching the root alembic.ini configuration |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Alembic as Alembic CLI
    participant Migration as Migration Script
    participant DB as PostgreSQL Database
    Dev->>Alembic: alembic upgrade head
    Alembic->>Migration: Execute upgrade()
    Migration->>DB: CREATE TABLE consent_documents
    Note over DB: Partitioned by LIST(is_latest)
    Migration->>DB: CREATE TABLE consent_documents_current
    Note over DB: PARTITION FOR VALUES IN (true)
    Migration->>DB: CREATE TABLE consent_documents_historic
    Note over DB: PARTITION FOR VALUES IN (false)
    DB-->>Migration: Tables created
    Migration-->>Alembic: Migration complete
    Alembic-->>Dev: Database upgraded
    Note over Dev,DB: Rollback scenario
    Dev->>Alembic: alembic downgrade -1
    Alembic->>Migration: Execute downgrade()
    Migration->>DB: DROP TABLE consent_documents
    Note over DB: Cascades to all partitions
    DB-->>Migration: Tables dropped
    Migration-->>Alembic: Downgrade complete
    Alembic-->>Dev: Database downgraded
```
3 files reviewed, 5 comments
```python
op.create_table(
    "consent_documents",
    sa.Column("search_data", postgresql.json.JSONB),
    sa.Column("record_data", postgresql.TEXT),
    sa.Column("is_latest", postgresql.BOOLEAN),
    sa.Column(
        "created_at",
        postgresql.TIMESTAMP(timezone=True),
        nullable=False,
        default=sa.func.now,
    ),
    sa.Column("updated_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    sa.Column("consented_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    postgresql_partition_by="LIST (is_latest)",
)
```
logic: Table missing primary key. All tables need a primary key column with `PrimaryKeyConstraint`.
Add an id column:

```python
sa.Column("id", sa.String(length=255), nullable=False),
```

And add before the partition clause:

```python
sa.PrimaryKeyConstraint("id"),
```
They don't technically need a primary key column; one is only required if we are going to be using SA's ORM for these records (object identity / change tracking does require a unique identifier). I am not sure we want the overhead of an ORM and tracking here, and I haven't tested how much overhead there would be in our case, but I am happy to add it - thoughts?
You raise a good point. If you're not using SQLAlchemy's ORM for this table and just doing raw queries, you technically don't need a primary key. However, there are a few considerations beyond ORM usage:
- PostgreSQL partitioning - While not strictly required, having a primary key (or at least a unique constraint) can help with partition pruning and query performance
- Future flexibility - If you ever need to update/delete specific records, having an identifier makes that much easier
- Indexing - Without any unique constraint, there's no index on this table at all, which could impact query performance
Since you mentioned you haven't tested the overhead yet, here are your options:
Option 1: Add the primary key (if using ORM or want the benefits above)
```python
op.create_table(
    "consent_documents",
    sa.Column("id", sa.String(length=255), nullable=False),
    sa.Column("search_data", postgresql.json.JSONB),
    sa.Column("record_data", postgresql.TEXT),
    sa.Column("is_latest", postgresql.BOOLEAN),
    sa.Column(
        "created_at",
        postgresql.TIMESTAMP(timezone=True),
        nullable=False,
        server_default=sa.text("now()"),
    ),
    sa.Column("updated_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    sa.Column("consented_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    sa.PrimaryKeyConstraint("id"),
    postgresql_partition_by="LIST (is_latest)",
)
```
Option 2: Skip the primary key but add an index on commonly queried fields in search_data or created_at for query performance
What's your access pattern going to be for this table? That might help determine whether the PK is worth it.
The indexes will be created, updated, or removed later as the fields being indexed will be chosen either by configuration or during runtime.
hmm would it be a painful operation to add a PK down the line? e.g. if we want some other table to have a FK to one of these records.
Also I think we almost always use SQLAlchemy's ORM; I'm not sure I can think of a table that doesn't use it -- do we think the performance advantage of not using it will be noticeable?
Good points. Let me address both:
On adding a PK later: It would be painful, yes. PostgreSQL requires that the partition key columns be included in any primary key on a partitioned table. So if you add a PK later, you'd need to either:
- Include `is_latest` in a composite PK (e.g., `PrimaryKeyConstraint("id", "is_latest")`), or
- Drop and recreate the entire table structure including all partitions
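One consequence of the composite-key option, sketched in plain Python with a hypothetical id (set membership here just models the constraint): uniqueness is enforced per `(id, is_latest)` pair, so the same `id` may exist once as latest and once as historic.

```python
# Model the composite PK ("id", "is_latest") as set membership.
rows = set()

def insert(row_id: str, is_latest: bool) -> None:
    key = (row_id, is_latest)
    if key in rows:
        raise ValueError(f"duplicate key: {key}")
    rows.add(key)

insert("pri_abc", True)   # latest version
insert("pri_abc", False)  # historic copy of the same logical record: allowed
```

Whether that semantics is acceptable depends on how "latest" rows are moved to the historic partition on update.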
On ORM usage: You're right that Fides consistently uses the ORM across the codebase. Looking at the models, even high-volume tables like AuditLog, PrivacyRequest, and various consent-related tables all use the ORM. The overhead is generally negligible for most operations, and you get:
- Type safety
- Relationship handling
- Session management
- Query builder benefits
Given these two considerations and the consistency with the rest of the codebase, I'd recommend adding the PK now. Here's the pattern from similar partitioned tables in the codebase:
```python
op.create_table(
    "privacy_preferences",
    sa.Column("id", sa.String(length=255), nullable=False),
    sa.Column("search_data", postgresql.json.JSONB),
    sa.Column("record_data", postgresql.TEXT),
    sa.Column("is_latest", postgresql.BOOLEAN, nullable=False, server_default="f"),
    sa.Column(
        "created_at",
        postgresql.TIMESTAMP(timezone=True),
        nullable=False,
        server_default=sa.text("now()"),
    ),
    sa.Column("updated_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    sa.Column("collected_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    sa.PrimaryKeyConstraint("id", "is_latest"),
    postgresql_partition_by="LIST (is_latest)",
)
```
Note the composite PK includes is_latest since it's the partition key. You'll generate IDs at the application level (probably UUIDs).
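A minimal sketch of application-level ID generation using the `<prefix>_<uuid>` convention mentioned later in this thread (the `pri` prefix is just an example, not the final choice):

```python
import uuid

def generate_id(prefix: str) -> str:
    """Generate a prefixed UUID id, e.g. 'pri_f192b815-fb03-...'."""
    return f"{prefix}_{uuid.uuid4()}"

new_id = generate_id("pri")
print(new_id)
```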
Codecov Report

❌ Your patch status has failed because the patch coverage (46.66%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

```
@@            Coverage Diff             @@
##             main    #6790      +/-   ##
==========================================
- Coverage   87.39%   87.38%    -0.02%
==========================================
  Files         518      518
  Lines       33821    33835      +14
  Branches     3892     3895       +3
==========================================
+ Hits        29559    29566       +7
- Misses       3407     3414       +7
  Partials      855      855
```

View full report in Codecov by Sentry.
```python
        default=sa.func.now,
    ),
    sa.Column("updated_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
    sa.Column("consented_at", postgresql.TIMESTAMP(timezone=True), nullable=True),
```
I think we had discussed calling this collected_at since consented_at kind of implies opt-in consent
```python
def upgrade():
    op.create_table(
        "consent_documents",
```
I think our domain name for "consent" is actually "privacy_preferences" , so maybe we should use that instead?
Sure, I will rename it to privacy_preferences if that makes more sense!
```python
depends_on = None


def upgrade():
```
do we need a PK for the table at all?
No, see above conversation with the bot, which argues with itself 😆
alembic.ini
Outdated
I know it's better to have it top-level but maybe instead of duplicating the alembic file we can leave it where it is for now and then push a separate PR that moves it to the top?
That's fine 😄 I worry that moving it may have unintended consequences, so I will open a PR with the duplicate versions first and then one removing it so it's easier to undo and still have migration files named in a sane way
…093e92e2a5a_add_consent_data_v3_to_the_database.py Switch to `server_default` over `default` to be consistent with other uses Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Deployment failed with the following error:
…093e92e2a5a_add_consent_data_v3_to_the_database.py Force `is_latest` to be not null and default to `f` if not specified. Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…093e92e2a5a_add_consent_data_v3_to_the_database.py Removing un-needed `pass` Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
… but leave the template name change (will make a separate PR with the alembic.ini in the new location)
Approving with one question, mostly to decide if we want to stick to our current pattern for PKs or if we have good reasons to deviate from it. Let's not forget the changelog entry too :) and updating the PR title for future easy searches :)
```python
# Use a big integer for the primary key to ensure we have enough space for all the records
# and also force the ID to be generated, never allowing it to be overridden without expressly using `OVERRIDING SYSTEM VALUE`
sa.Column("id", BigInteger, Identity(start=1, increment=1, always=True), primary_key=True),
```
this makes sense to me. I do want to call out that we don't have this pattern anywhere else in our codebase; our PKs are usually `<prefix>_<uuid>` (e.g. `sta_e14dbde3-e664-4a91-947e-139919630a30`, `pri_f192b815-fb03-4bfb-bd34-4b27ba5bd168`). I don't feel too strongly about adhering to this but will tag @galvana just in case he has opinions
If were still creating a SQLAlchemy model for this, which I think we are, then I would let our base class handle IDs. That way it's consistent, and we'll still have enough identifiers. But open to changes if we have a good reason for it.
My only consideration would be if we have hundreds of millions (or more) of records with an extra string; it's small in comparison to the document, certainly, but not insignificant, and the index will be much bigger as well.
(Hence the 64-bit fixed-size index)
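The size concern can be made concrete with some back-of-envelope arithmetic (the row count is hypothetical and the per-key sizes are simplified; real index entries carry additional per-tuple overhead):

```python
rows = 500_000_000  # hypothetical record count

bigint_bytes = 8  # fixed-size 64-bit integer key
string_id = "pri_f192b815-fb03-4bfb-bd34-4b27ba5bd168"
string_bytes = len(string_id) + 1  # text payload plus a simplified length byte

print(rows * bigint_bytes / 1e9, "GB of bigint key data")
print(rows * string_bytes / 1e9, "GB of prefixed-uuid key data")
```

Roughly a 5x difference in raw key bytes under these assumptions, before index overhead.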
I just tested this and SQLAlchemy doesn't care about the data type, just that an ID is present. Let's keep this how it is
Yeah, SQLAlchemy just needs a stable ID for its internal 'this record has changed and is dirty' bookkeeping
galvana
left a comment
Let's settle on the ID approach, then we can merge
They will be stored in preferences instead Co-authored-by: Adrian Galvan <adrian@ethyca.com>
Use underscore instead of dash for filename Co-authored-by: Adrian Galvan <adrian@ethyca.com>
Ticket ENG-1589
Description Of Changes
Adds PostgreSQL migration to create the new consent document table and its partitions.
Code Changes
The migration as mentioned above, but also copies the
alembic.inito the root so you can use it without moving around (also, this is "normally" where alembic puts the configuration file, we might want to move it but I just copied it for now)Changed both the copies with an updated migration file name template to be date-based instead of just the revision - makes it easier to reason about the history and linear progression of changes to the database.
Steps to Confirm
No manual steps needed
Pre-Merge Checklist
- `CHANGELOG.md` updated
- Migrations up to date with `main`
- `downgrade()` migration is correct and works