feat(data-access): add PostgreSQL/PostgREST backend adapter#1351
feat(data-access): add PostgreSQL/PostgREST backend adapter#1351solaris007 wants to merge 35 commits intomainfrom
Conversation
Port camelCase/snake_case conversion, entity-to-table-name mapping, cursor encoding, field map creation, and query where-clause helpers from ED's v3-postgrest-data-access branch.
Route createDataAccess to either DynamoDB (default) or PostgreSQL based on DATA_ACCESS_BACKEND env var. The PostgreSQL path is a stub that throws 'not yet implemented' until wired up in later tasks. Also pass POSTGREST_URL, POSTGREST_SCHEMA, POSTGREST_API_KEY from context.env through the wrapper.
…tity management Implements a parallel collection class that extends BaseCollection but uses PostgREST queries instead of ElectroDB/DynamoDB for all CRUD operations. Includes pagination, field mapping, validation, defaults, setters, and update watchers.
…gREST persistence PostgresBaseModel extends BaseModel with PostgREST-specific removal logic and replaces the v2 Patcher with PostgresPatcher. PostgresPatcher tracks attribute changes and persists via collection.updateByKeys instead of ElectroDB patch records.
…DataAccess PostgresEntityRegistry mirrors v2 EntityRegistry for PostgREST-backed entities. createPostgresDataAccess now creates a real PostgrestClient and returns registered entity collections. Includes S3 support for Configuration entity.
…ability through TrialUserActivity)
…through Organization)
…e, Audit, LatestAudit)
findIndexBySortKeys only checked for ElectroDB 'facets' property on index pk/sk, but the raw schema uses 'composite'. This caused the method to fall back to the primary index instead of finding the correct GSI, resulting in wrong sort order for SiteTopForm and SiteTopPage queries that sort by traffic DESC.
The fix_entity_suggestions table uses a composite primary key (suggestion_id, fix_entity_id) with no synthetic 'id' column. - Mark fixEntitySuggestionId with postgrestField: false (no Postgres column) - Mark updatedBy with postgrestField: false (column not in join table) - Mark fixEntityCreatedDate with postgrestField: false (generated column) - Remove phantom id mapping from field maps in PG collection constructor
camelToSnake('baseURL') produces 'base_u_r_l' instead of the correct
'base_url'. Add postgrestField: 'base_url' to all schemas that use
the baseURL attribute: Site, ImportJob, ScrapeJob, SiteCandidate.
Ensures every Postgres update gets a DB-side timestamp that differs from the in-memory value, matching ElectroDB behavior where the updatedAt watcher fires during go(). Fixes 6 IT tests (Entitlement, Organization, Project, TrialUser update tests).
PostgreSQL normalizes UUID values to lowercase, but createMany was building model instances from the pre-insert prepared items (which retained the original case). Now uses the PostgREST response data so instances reflect actual stored values. Fixes ScrapeJob UUID case mismatch IT test.
Override getConfig() and setConfig() in PostgresOrganizationModel to properly wrap/unwrap the Config helper object. The Postgres base collection applies schema 'get' transformers uniformly (both on create and read), while v2 only applies them on read. Normalize the test comparison to handle both behaviors. Fixes Organization config returning raw JSON (Bug 1) and config updates not persisting (Bug 7).
Match v2 behavior where the outer _remove() catch always wraps errors in a parent entity error message, rather than re-throwing DataAccessError directly. Also guard the mock-logger assertion for the Postgres backend where a no-op logger is used internally.
…tating record The patcher now tracks updatedAt in its updates map (so getUpdates() reports it) but does not mutate this.record.updatedAt. The actual DB timestamp is injected by updateByKeys, matching ElectroDB behavior where the watcher generates a server-side timestamp. Also adds c8 ignore for createMany DB-returned row mapping (IT-only code path).
Bug 5: Change .equal to .eql for options field comparison (line 97 of import-job.test.js) since Postgres reconstructs objects from DB as new references. Bug 6: Use isPostgres() conditional for validation error assertions. In DynamoDB/ElectroDB, validation errors are wrapped as DataAccessError with ElectroValidationError cause. In Postgres, ValidationError (which extends DataAccessError) is thrown directly without wrapping.
- Enable abortInfo field mapping (abort_info column now in data-service schema) - Multi-column sort ordering in queryPage for composite sort keys - Stable seed reset via PostgREST DELETE (avoids TRUNCATE deadlocks) - One-time schema alteration (disable triggers, drop unique constraints) - Timestamp normalization: pad fractional seconds to 3 digits - LatestAudit: proper all() delegation and unit test mock - Docker compose: add init-db.sql for PostgREST roles, use data-service image - Skip DynamoDB-only IT tests (Configuration, KeyEvent, Sentiment) on Postgres 298 passing, 45 pending, 0 failing on PostgreSQL backend.
ElectroDB populates pk.facets as an object at runtime, not an array. Without the Array.isArray guard, calling .every() on a non-array throws "pkKeys.every is not a function". This matches the existing guard pattern already used by getIndexKeys in the same file.
- Read DATA_ACCESS_BACKEND from config object (context.env) with process.env fallback so serverless environments work correctly - Remove double updatedAt injection in updateByKeys; patcher owns the timestamp and does not mutate the in-memory record after save - Throw DataAccessError on unsupported applyWhere operators instead of silently returning the unfiltered query - Optimize batchGetByKeys with .in() for simple PK lookups, falling back to N sequential requests for composite keys; filter non-UUID values to avoid Postgres type errors on mixed-ID batches - Replace positional index matching in createMany with PK-based matching since PostgREST does not guarantee insertion order - Annotate recordExpiresAt with postgrestField:false and remove hardcoded DYNAMO_ONLY_FIELDS set
…st, and SQL seeding Replace fragile Object.create stubs with Proxy traps that throw clear DataAccessError on any un-overridden method access. Replace string-based error matching with Postgres SQLSTATE 22P02 detection via cause-chain walking. Add exhaustive IT smoke test that dynamically discovers all Postgres collections and exercises read operations. Add optional SQL-based IT seeding mode toggled via IT_SEED_MODE env var.
Add AWS ECR credential configuration and login steps to the CI workflow so the mysticat-data-service image can be pulled for integration tests. Conditional on AWS_ECR_PULL_ROLE_ARN secret being set, so forks and repos without the secret are unaffected.
…ocker-compose Update the default image reference from local mysticat-data-service:test to the ECR-hosted 682033462621.dkr.ecr.us-east-1.amazonaws.com/mysticat-data-service:v1.8.0, matching PR #1349 and ensuring consistency with CI.
Document how to enable the PostgreSQL backend, choose the right PostgREST URL (ALB for VPC-internal, CloudFront for external), network prerequisites, and integration testing instructions.
…s tests - google-client: use esmock to stub composeAuditURL instead of nock. composeAuditURL uses @adobe/fetch (undici) which nock cannot intercept, causing "unable to get local issuer certificate" errors in CI. - data-access: use esmock to stub AWS SDK clients in createDataAccess tests. Without mocks, DynamoDB client creation times out in CI where no AWS credentials are available.
|
This PR will trigger a minor release when merged. |
…ient The sinon stub doesn't have the internal config.translateConfig property that DynamoDBDocument.from() needs. Use Object.create with the right shape instead.
Without AWS credentials, the SDK credential provider chain tries to reach the EC2 instance metadata service (169.254.169.254), which times out after ~2s in CI runners. Set dummy credentials to short-circuit this.
The EntityRegistry initialization loads all model definitions, which takes over 2s on slow CI runners, exceeding the default 2s mocha timeout.
- Set status to Implemented - Fix env var names (POSTGREST_URL, not DATA_SERVICE_URL) - Update package structure to match actual implementation - Note config.dataAccessBackend support
|
@solaris007 thanks for taking this on and for the thorough write-up. I really appreciate the effort and the E2E validation. Closing #1349 as part of this PR. I want to share one architectural concern though: keeping When I proposed the feature flag idea, my thinking was that:
I’d just place feature flag one layer up (lambda middleware/wrapper), where services can switch between Here is an example pattern at lambda level (same wrapper style as other context injectors): // support/data-access.js
import { getDataAccess as getDataAccessV2 } from '@adobe/spacecat-shared-data-access-v2';
import { getDataAccess as getDataAccessV3 } from '@adobe/spacecat-shared-data-access-v3';
/**
* Wrapper function to inject data-access capabilities via context.
* When wrapped, client code reads from context.dataAccess.
*
* @param {UniversalAction} fn
* @returns {function(object, UniversalContext): Promise<Response>}
*/
export function dataAccessWrapper(fn) {
return async (request, context) => {
if (!context.dataAccess) {
const backend = context.env.DATA_ACCESS_BACKEND || 'dynamodb';
context.dataAccess = backend === 'postgresql'
? getDataAccessV3(context)
: getDataAccessV2(context);
}
return fn(request, context);
};
}// action.js
import { dataAccessWrapper } from './support/data-access.js';
async function action(request, context) {
const site = await context.dataAccess.Site.findById('...');
// ...
}
export const main = dataAccessWrapper(action); |
|
Ekrem's PR, given #1351 (comment) is the cleaner solution. @ekremney please pull from this PR some of the fixes made that are bugs in your PR. |
Summary
Adds a PostgreSQL/PostgREST backend to
@adobe/spacecat-shared-data-accessalongside the existing DynamoDB/ElectroDB backend. The backend is selectable via config (dataAccessBackend: 'postgresql') or environment variable (DATA_ACCESS_BACKEND=postgresql), allowing consumers to migrate incrementally.Relation to @ekremney's data-access v3 (#1349)
Thanks to @ekremney for the pioneering work on #1349 which kicked off the v3 PostgREST migration and validated that the approach works. That PR was the inspiration for this one.
This PR takes an evolutionary approach rather than a clean break. The governing thoughts were:
feat!)Both approaches have merit. The evolutionary approach trades some code duplication (two backends coexist) for a safer rollout - we can enable Postgres per-Lambda, validate in production, and fall back to DynamoDB if anything breaks. The clean-break approach is simpler long-term but requires coordinated migration across all services.
We should decide together which path to take. This PR is ready for review and discussion.
What's in this PR
PostgresBaseCollectionandPostgresBaseModel- generic PostgREST-backed persistence using@supabase/postgrest-jsPostgresEntityRegistryto wire all entities and their relationshipsPostgresPatcherfor update operations with proper field mappingmysticat-data-serviceDocker imageDocumentation
Related infrastructure PRs
aws_security_group_ruleresources (discovered during E2E testing - Terraform was silently dropping inline rules)Outstanding: Route53 private hosted zone
During E2E validation we discovered that the data service ALB's DNS name (
internal-data-svc-alb-....elb.amazonaws.com) resolves to public IPs only, even from within the VPC. This means Lambda traffic to PostgREST exits the VPC to the internet gateway and comes back in, instead of staying on the private network.The proper fix is a Route53 private hosted zone that maps a friendly DNS name (e.g.,
data-service.spacecat.internal) to the ALB's private IPs via an alias record. This keeps traffic on the private network and avoids dependency on the internet gateway. This still needs to be implemented in spacecat-infrastructure.For the E2E test we worked around this by resolving the ALB's private IPs manually and passing them directly to the Lambda.
E2E validation
This adapter was validated end-to-end against the dev PostgREST endpoint on 2026-02-15 using a temporary Lambda deployed into the VPC. The Lambda successfully called
dataAccess.Site.findById()and received a correctly formatted site entity from the Postgres database. Full details (including reproducible steps) are documented in the network architecture analysis.Test plan
npm testin data-access package)