feat(data-access): add PostgreSQL/PostgREST backend adapter by solaris007 · Pull Request #1351 · adobe/spacecat-shared

solaris007 · 2026-02-15T07:06:59Z

Summary

Adds a PostgreSQL/PostgREST backend to @adobe/spacecat-shared-data-access alongside the existing DynamoDB/ElectroDB backend. The backend is selectable via config (dataAccessBackend: 'postgresql') or environment variable (DATA_ACCESS_BACKEND=postgresql), allowing consumers to migrate incrementally.

Relation to @ekremney's data-access v3 (#1349)

Thanks to @ekremney for the pioneering work on #1349 which kicked off the v3 PostgREST migration and validated that the approach works. That PR was the inspiration for this one.

This PR takes an evolutionary approach rather than a clean break. The governing thoughts were:

Instant switchability - ability to switch to/from PostgREST at runtime via an env var, without redeploying. This keeps both the migration window and rollback window as short as possible.
Keep original DynamoDB code untouched - the existing ElectroDB path is not modified, so there is zero risk of breaking current consumers.
Keep original IT tests and assertions - the same integration tests run against both backends. If a test passes on DynamoDB but fails on PostgreSQL, that is an adapter bug, not a test bug.
No change in public API - consumers call the same methods, get the same return shapes. The backend switch is invisible to callers.

	#1349 (v3)	This PR (adapter)
Strategy	Replace DynamoDB entirely	Add Postgres alongside DynamoDB
Breaking change	Yes (`feat!`)	No - additive only
Migration	All consumers switch at once	Consumers switch individually via config flag
DynamoDB code	Removed	Kept, still the default backend
Test harness	PostgREST-only	Backend-agnostic (same tests, both backends)

Both approaches have merit. The evolutionary approach trades some code duplication (two backends coexist) for a safer rollout - we can enable Postgres per-Lambda, validate in production, and fall back to DynamoDB if anything breaks. The clean-break approach is simpler long-term but requires coordinated migration across all services.

We should decide together which path to take. This PR is ready for review and discussion.

What's in this PR

Adds PostgresBaseCollection and PostgresBaseModel - generic PostgREST-backed persistence using @supabase/postgrest-js
Adds Postgres-specific collection and model for every entity (Site, Audit, Organization, etc.)
Adds PostgresEntityRegistry to wire all entities and their relationships
Adds PostgresPatcher for update operations with proper field mapping
Adds PostgREST field mapping utilities (camelCase <-> snake_case, schema annotations)
Adds backend-agnostic IT test infrastructure (same tests run against both DynamoDB and Postgres)
Adds Postgres smoke test for all collections
Adds ECR authentication for IT tests to pull the mysticat-data-service Docker image

Documentation

Design document - Architecture, design decisions, data flows, entity coverage, and risk analysis
Implementation plan - Task-by-task implementation breakdown (12 tasks)
Network architecture analysis (mysticat-data-service feat: add support for 404 report backlink #57) - VPC topology, DNS resolution findings, and E2E validation results from deploying a test Lambda against the dev PostgREST endpoint

Related infrastructure PRs

spacecat-infrastructure #327 - Adds security group rules allowing Lambda to reach the PostgREST ALB on port 3000
spacecat-infrastructure #328 - Fixes inline security group rules by migrating to standalone aws_security_group_rule resources (discovered during E2E testing - Terraform was silently dropping inline rules)

Outstanding: Route53 private hosted zone

During E2E validation we discovered that the data service ALB's DNS name (internal-data-svc-alb-....elb.amazonaws.com) resolves to public IPs only, even from within the VPC. This means Lambda traffic to PostgREST exits the VPC to the internet gateway and comes back in, instead of staying on the private network.

The proper fix is a Route53 private hosted zone that maps a friendly DNS name (e.g., data-service.spacecat.internal) to the ALB's private IPs via an alias record. This keeps traffic on the private network and avoids dependency on the internet gateway. This still needs to be implemented in spacecat-infrastructure.

For the E2E test we worked around this by resolving the ALB's private IPs manually and passing them directly to the Lambda.

E2E validation

This adapter was validated end-to-end against the dev PostgREST endpoint on 2026-02-15 using a temporary Lambda deployed into the VPC. The Lambda successfully called dataAccess.Site.findById() and received a correctly formatted site entity from the Postgres database. Full details (including reproducible steps) are documented in the network architecture analysis.

Test plan

Unit tests pass (npm test in data-access package)
Integration tests pass against both DynamoDB and Postgres backends
Postgres smoke test validates all collections can be instantiated
E2E validated in dev (see above)
Route53 private hosted zone for data service ALB
Review with @ekremney to align on migration strategy

Ref: SITES-39598

Port camelCase/snake_case conversion, entity-to-table-name mapping, cursor encoding, field map creation, and query where-clause helpers from ED's v3-postgrest-data-access branch.

Route createDataAccess to either DynamoDB (default) or PostgreSQL based on DATA_ACCESS_BACKEND env var. The PostgreSQL path is a stub that throws 'not yet implemented' until wired up in later tasks. Also pass POSTGREST_URL, POSTGREST_SCHEMA, POSTGREST_API_KEY from context.env through the wrapper.

…tity management Implements a parallel collection class that extends BaseCollection but uses PostgREST queries instead of ElectroDB/DynamoDB for all CRUD operations. Includes pagination, field mapping, validation, defaults, setters, and update watchers.

…gREST persistence PostgresBaseModel extends BaseModel with PostgREST-specific removal logic and replaces the v2 Patcher with PostgresPatcher. PostgresPatcher tracks attribute changes and persists via collection.updateByKeys instead of ElectroDB patch records.

…DataAccess PostgresEntityRegistry mirrors v2 EntityRegistry for PostgREST-backed entities. createPostgresDataAccess now creates a real PostgrestClient and returns registered entity collections. Includes S3 support for Configuration entity.

…ability through TrialUserActivity)

…through Organization)

…e, Audit, LatestAudit)

…stry

findIndexBySortKeys only checked for ElectroDB 'facets' property on index pk/sk, but the raw schema uses 'composite'. This caused the method to fall back to the primary index instead of finding the correct GSI, resulting in wrong sort order for SiteTopForm and SiteTopPage queries that sort by traffic DESC.

The fix_entity_suggestions table uses a composite primary key (suggestion_id, fix_entity_id) with no synthetic 'id' column. - Mark fixEntitySuggestionId with postgrestField: false (no Postgres column) - Mark updatedBy with postgrestField: false (column not in join table) - Mark fixEntityCreatedDate with postgrestField: false (generated column) - Remove phantom id mapping from field maps in PG collection constructor

camelToSnake('baseURL') produces 'base_u_r_l' instead of the correct 'base_url'. Add postgrestField: 'base_url' to all schemas that use the baseURL attribute: Site, ImportJob, ScrapeJob, SiteCandidate.

Ensures every Postgres update gets a DB-side timestamp that differs from the in-memory value, matching ElectroDB behavior where the updatedAt watcher fires during go(). Fixes 6 IT tests (Entitlement, Organization, Project, TrialUser update tests).

PostgreSQL normalizes UUID values to lowercase, but createMany was building model instances from the pre-insert prepared items (which retained the original case). Now uses the PostgREST response data so instances reflect actual stored values. Fixes ScrapeJob UUID case mismatch IT test.

Override getConfig() and setConfig() in PostgresOrganizationModel to properly wrap/unwrap the Config helper object. The Postgres base collection applies schema 'get' transformers uniformly (both on create and read), while v2 only applies them on read. Normalize the test comparison to handle both behaviors. Fixes Organization config returning raw JSON (Bug 1) and config updates not persisting (Bug 7).

Match v2 behavior where the outer _remove() catch always wraps errors in a parent entity error message, rather than re-throwing DataAccessError directly. Also guard the mock-logger assertion for the Postgres backend where a no-op logger is used internally.

…tating record The patcher now tracks updatedAt in its updates map (so getUpdates() reports it) but does not mutate this.record.updatedAt. The actual DB timestamp is injected by updateByKeys, matching ElectroDB behavior where the watcher generates a server-side timestamp. Also adds c8 ignore for createMany DB-returned row mapping (IT-only code path).

Bug 5: Change .equal to .eql for options field comparison (line 97 of import-job.test.js) since Postgres reconstructs objects from DB as new references. Bug 6: Use isPostgres() conditional for validation error assertions. In DynamoDB/ElectroDB, validation errors are wrapped as DataAccessError with ElectroValidationError cause. In Postgres, ValidationError (which extends DataAccessError) is thrown directly without wrapping.

- Enable abortInfo field mapping (abort_info column now in data-service schema) - Multi-column sort ordering in queryPage for composite sort keys - Stable seed reset via PostgREST DELETE (avoids TRUNCATE deadlocks) - One-time schema alteration (disable triggers, drop unique constraints) - Timestamp normalization: pad fractional seconds to 3 digits - LatestAudit: proper all() delegation and unit test mock - Docker compose: add init-db.sql for PostgREST roles, use data-service image - Skip DynamoDB-only IT tests (Configuration, KeyEvent, Sentiment) on Postgres 298 passing, 45 pending, 0 failing on PostgreSQL backend.

ElectroDB populates pk.facets as an object at runtime, not an array. Without the Array.isArray guard, calling .every() on a non-array throws "pkKeys.every is not a function". This matches the existing guard pattern already used by getIndexKeys in the same file.

- Read DATA_ACCESS_BACKEND from config object (context.env) with process.env fallback so serverless environments work correctly - Remove double updatedAt injection in updateByKeys; patcher owns the timestamp and does not mutate the in-memory record after save - Throw DataAccessError on unsupported applyWhere operators instead of silently returning the unfiltered query - Optimize batchGetByKeys with .in() for simple PK lookups, falling back to N sequential requests for composite keys; filter non-UUID values to avoid Postgres type errors on mixed-ID batches - Replace positional index matching in createMany with PK-based matching since PostgREST does not guarantee insertion order - Annotate recordExpiresAt with postgrestField:false and remove hardcoded DYNAMO_ONLY_FIELDS set

…st, and SQL seeding Replace fragile Object.create stubs with Proxy traps that throw clear DataAccessError on any un-overridden method access. Replace string-based error matching with Postgres SQLSTATE 22P02 detection via cause-chain walking. Add exhaustive IT smoke test that dynamically discovers all Postgres collections and exercises read operations. Add optional SQL-based IT seeding mode toggled via IT_SEED_MODE env var.

Add AWS ECR credential configuration and login steps to the CI workflow so the mysticat-data-service image can be pulled for integration tests. Conditional on AWS_ECR_PULL_ROLE_ARN secret being set, so forks and repos without the secret are unaffected.

…ocker-compose Update the default image reference from local mysticat-data-service:test to the ECR-hosted 682033462621.dkr.ecr.us-east-1.amazonaws.com/mysticat-data-service:v1.8.0, matching PR #1349 and ensuring consistency with CI.

Document how to enable the PostgreSQL backend, choose the right PostgREST URL (ALB for VPC-internal, CloudFront for external), network prerequisites, and integration testing instructions.

…s tests - google-client: use esmock to stub composeAuditURL instead of nock. composeAuditURL uses @adobe/fetch (undici) which nock cannot intercept, causing "unable to get local issuer certificate" errors in CI. - data-access: use esmock to stub AWS SDK clients in createDataAccess tests. Without mocks, DynamoDB client creation times out in CI where no AWS credentials are available.

github-actions · 2026-02-15T07:55:04Z

This PR will trigger a minor release when merged.

…ient The sinon stub doesn't have the internal config.translateConfig property that DynamoDBDocument.from() needs. Use Object.create with the right shape instead.

Without AWS credentials, the SDK credential provider chain tries to reach the EC2 instance metadata service (169.254.169.254), which times out after ~2s in CI runners. Set dummy credentials to short-circuit this.

The EntityRegistry initialization loads all model definitions, which takes over 2s on slow CI runners, exceeding the default 2s mocha timeout.

- Set status to Implemented - Fix env var names (POSTGREST_URL, not DATA_SERVICE_URL) - Update package structure to match actual implementation - Note config.dataAccessBackend support

ekremney · 2026-02-16T08:51:29Z

@solaris007 thanks for taking this on and for the thorough write-up. I really appreciate the effort and the E2E validation.

Closing #1349 as part of this PR.

I want to share one architectural concern though: keeping DynamoDB/ElectroDB and Postgres/PostgREST in the same package (same major) creates a long-term operational burden. In practice, that means every change must preserve parity across two persistence implementations, two mapping layers, and two test paths. That tends to drift over time, even with good intentions.

When I proposed the feature flag idea, my thinking was that:

v2 = DynamoDB/ElectroDB (frozen except critical fixes)
v3 = Postgres/PostgREST only
migration/rollback handled at service boundary, not inside the data-access package

I’d just place feature flag one layer up (lambda middleware/wrapper), where services can switch between v2 and v3 cleanly without coupling both backends into one library.

Here is an example pattern at lambda level (same wrapper style as other context injectors):

// support/data-access.js
import { getDataAccess as getDataAccessV2 } from '@adobe/spacecat-shared-data-access-v2';
import { getDataAccess as getDataAccessV3 } from '@adobe/spacecat-shared-data-access-v3';

/**
 * Wrapper function to inject data-access capabilities via context.
 * When wrapped, client code reads from context.dataAccess.
 *
 * @param {UniversalAction} fn
 * @returns {function(object, UniversalContext): Promise<Response>}
 */
export function dataAccessWrapper(fn) {
  return async (request, context) => {
    if (!context.dataAccess) {
      const backend = context.env.DATA_ACCESS_BACKEND || 'dynamodb';
      context.dataAccess = backend === 'postgresql'
        ? getDataAccessV3(context)
        : getDataAccessV2(context);
    }

    return fn(request, context);
  };
}

// action.js
import { dataAccessWrapper } from './support/data-access.js';

async function action(request, context) {
  const site = await context.dataAccess.Site.findById('...');
  // ...
}

export const main = dataAccessWrapper(action);

solaris007 · 2026-02-16T09:24:06Z

Ekrem's PR, given #1351 (comment) is the cleaner solution. @ekremney please pull from this PR some of the fixes made that are bugs in your PR.

solaris007 added 30 commits February 13, 2026 18:18

docs(data-access): add PostgreSQL adapter design and implementation plan

90bc7cf

Ref: SITES-39598

chore(data-access): add @supabase/postgrest-js dependency

d7eecf5

feat(data-access): add PostgREST field mapping utilities

065168e

Port camelCase/snake_case conversion, entity-to-table-name mapping, cursor encoding, field map creation, and query where-clause helpers from ED's v3-postgrest-data-access branch.

feat(data-access): add Postgres collections for entities P-T (PageCit…

cc014e7

…ability through TrialUserActivity)

feat(data-access): add Postgres collections for entities A-O (ApiKey …

15e4dc0

…through Organization)

feat(data-access): add Postgres collections for complex entities (Sit…

9914fa3

…e, Audit, LatestAudit)

feat(data-access): wire all Postgres entities into PostgresEntityRegi…

f33e48b

…stry

chore(data-access): adjust coverage config for Postgres entity files

273a48d

feat(data-access): add backend-agnostic IT test infrastructure

9e22ddd

Add postgrestField annotations for baseURL attributes

6746898

camelToSnake('baseURL') produces 'base_u_r_l' instead of the correct 'base_url'. Add postgrestField: 'base_url' to all schemas that use the baseURL attribute: Site, ImportJob, ScrapeJob, SiteCandidate.

docs: add PostgreSQL/PostgREST backend documentation to README

fdba9ed

Document how to enable the PostgreSQL backend, choose the right PostgREST URL (ALB for VPC-internal, CloudFront for external), network prerequisites, and integration testing instructions.

Merge branch 'main' into dj/feat-postgres-adapter

434a060

solaris007 requested a review from ekremney as a code owner February 15, 2026 07:07

solaris007 self-assigned this Feb 15, 2026

solaris007 added the enhancement New feature or request label Feb 15, 2026

solaris007 added 4 commits February 15, 2026 09:05

fix(test): use proper DynamoDB mock with config shape for document cl…

de92ab3

…ient The sinon stub doesn't have the internal config.translateConfig property that DynamoDBDocument.from() needs. Use Object.create with the right shape instead.

fix(test): set dummy AWS credentials to prevent IMDS timeout in CI

77cbf0f

Without AWS credentials, the SDK credential provider chain tries to reach the EC2 instance metadata service (169.254.169.254), which times out after ~2s in CI runners. Set dummy credentials to short-circuit this.

fix: increase mocha timeout for data-access service tests

e5af009

The EntityRegistry initialization loads all model definitions, which takes over 2s on slow CI runners, exceeding the default 2s mocha timeout.

docs: update design doc status and fix stale details

95f2722

- Set status to Implemented - Fix env var names (POSTGREST_URL, not DATA_SERVICE_URL) - Update package structure to match actual implementation - Note config.dataAccessBackend support

ekremney mentioned this pull request Feb 16, 2026

feat!: data access v3 #1349

Merged

ekremney approved these changes Feb 16, 2026

View reviewed changes

solaris007 closed this Feb 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(data-access): add PostgreSQL/PostgREST backend adapter#1351

feat(data-access): add PostgreSQL/PostgREST backend adapter#1351
solaris007 wants to merge 35 commits intomainfrom
dj/feat-postgres-adapter

solaris007 commented Feb 15, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Feb 15, 2026

Uh oh!

ekremney commented Feb 16, 2026

Uh oh!

solaris007 commented Feb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

solaris007 commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relation to @ekremney's data-access v3 (#1349)

What's in this PR

Documentation

Related infrastructure PRs

Outstanding: Route53 private hosted zone

E2E validation

Test plan

Uh oh!

github-actions Bot commented Feb 15, 2026

Uh oh!

ekremney commented Feb 16, 2026

Uh oh!

solaris007 commented Feb 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

solaris007 commented Feb 15, 2026 •

edited

Loading