Conversation
576935d to
52d8ebd
Compare
|
This PR will trigger a minor release when merged. |
|
Hey @ekremney :) Context applied: v3 is a temporary bridge layer (v4 will be PostgREST-native). Aurora is VPC-internal with Vault-managed credentials. No auth middleware exists by design. Proper indexes exist on the audits table. Note: Several findings from our initial review were addressed by commits pushed after 1. CRITICAL: LatestAudit fetchAllPages bypasses existing database indexesFile:
This is not a gradual degradation - it works until the table crosses a size threshold, then OOMs or times out. Remediation: Create a CREATE VIEW latest_audits AS
SELECT DISTINCT ON (site_id, audit_type) *
FROM audits
ORDER BY site_id, audit_type, audited_at DESC;The existing indexes fully support this. The team already has materialized views (brand_presence) showing this is a known pattern. Recommendation: Block or committed fast-follow with a Jira ticket before v3 hits production traffic. 2. IMPORTANT: N+1 HTTP patterns in batch operationsThree methods use
Works fine with small key sets in testing. Degrades linearly with 100+ keys in production (connection pool pressure on Aurora, latency multiplication). Recommendation: Non-blocking, but address before production scale. 3. IMPORTANT: No null guard on postgrestServiceFile:
Remediation: Trivial fix - 4. MEDIUM: applyWhere silently drops unsupported operatorsFile: Only Remediation: Throw or log a warning for unsupported operator types instead of silently returning unfiltered results. 5. INFORMATIONAL: Dual-path code with electrodb removedThe 6. INFORMATIONAL: Semantic-release mismatchPR title is Summary
Overall this is a well-structured bridge layer. The DynamoDB dep cleanup, expanded IT coverage, and parameterized ECR image in the recent commits are great. The single blocking concern is the LatestAudit full table scan (#1) with a straightforward DB-side fix. |
|
@solaris007 Re: #1349 (comment) Thanks for the review. Several of the “this will break” points were directionally right, and the concrete regressions are now addressed in this branch. Since that comment, commit
Validation:
Still intentionally out-of-scope for this PR:
So yes, useful review, and yes, the high-signal actionable items are now fixed with tests and passing IT. |
solaris007
left a comment
There was a problem hiding this comment.
Reviewed the PostgREST migration approach. Found a critical ordering bug with composite keys, an unsafe localhost default, and a duplicate sort clause issue. See inline comments for details.
…ocker-compose Update the default image reference from local mysticat-data-service:test to the ECR-hosted 682033462621.dkr.ecr.us-east-1.amazonaws.com/mysticat-data-service:v1.8.0, matching PR #1349 and ensuring consistency with CI.
|
Closing due to duplicate effort: #1351 |
|
This is what makes this PR viable again: #1351 (comment) |
|
@solaris007 could you please re-review? |
There was a problem hiding this comment.
Hey @ekremney
Approving conditionally - the architecture and migration approach are solid, but the items below should be addressed before merge.
Previous Review Issues - All 3 Fixed
- Composite key ordering - Fixed.
#getOrderFields()now returns all index key fields. - Localhost default - Fixed. Throws
new Error('POSTGREST_URL is required')if env var missing. - Duplicate ORDER BY - Fixed. Tiebreaker checks if id field is already in the order list.
Critical
C1. Presigned S3 URLs with AWS credentials in seed SQL
test/it/seed/tenants/01_tenant_alpha.sql contains presigned S3 URLs pointing to spacecat-prod-scraper.s3.us-east-1.amazonaws.com with embedded X-Amz-Credential, X-Amz-Security-Token, and X-Amz-Signature values. Even though these are expired STS credentials, they leak production bucket names, IAM credential patterns, and internal path structure into the public repo. Replace with placeholder URLs like https://example.com/screenshot.png.
C2. all-collections-methods-coverage.test.js swallows all errors
The coverage test wraps every method call in try/catch and only asserts expect(error).to.be.instanceOf(Error). Any method that throws for ANY reason (bug, missing table, connection error) is treated as a pass. This test cannot distinguish between "works correctly" and "completely broken" - it provides false confidence.
C3. Triggers never re-enabled after seeding
seed.js calls setPostgresTriggersEnabled(false) before seeding but never calls setPostgresTriggersEnabled(true) afterward. Integration tests run against a database with all triggers disabled, potentially masking bugs related to cascading updates, computed columns, or trigger-based timestamps.
Important
I1. applyWhere only supports eq and contains operators
ElectroDB supported richer where callbacks. Consumers using complex filters will either get no filtering or throw. Needs either documentation of supported operators or expansion.
I2. Patcher save() reverts updatedAt after persisting
In-memory model has stale updatedAt after save. If consumer code reads getUpdatedAt() after calling save(), they get the old timestamp, not the one persisted to the database.
I3. normalizeModelValue silently converts [null] to undefined
Undocumented transformation. Needs a comment explaining which Postgres/PostgREST behavior it compensates for. Also unclear what happens with [null, null] or mixed arrays.
I4. LatestAuditCollection loads ALL audits into memory for single-key queries
When only siteId or auditType is provided, #allAuditsByKeys uses fetchAllPages: true, loading potentially thousands of audits just to find the latest one. Performance risk for high-volume sites. Consider a PostgREST DISTINCT ON approach or server-side limit.
I5. ScrapeJob options null safety in computed attribute setters
optEnableJavascript and optHideConsentBanner setters access options[...] without a null guard. If options is undefined (allowed since it has no default), this throws TypeError at runtime. Fix: options?.[ScrapeJob.ScrapeOptions.ENABLE_JAVASCRIPT].
I6. LatestAudit sort order may differ from v2
allByIndexKeys sorts ascending (a.localeCompare(b)) on auditedAt. If v2 returned descending (most recent first), this is a behavioral change that could break consumers. Verify downstream expectations.
I7. _saveMany issues individual HTTP requests for updates
Unlike batch creates (single .insert() call), updates go one-by-one via Promise.all. For large batches this means N separate HTTP requests. PostgREST supports bulk UPSERT which could be leveraged.
I8. Static mutable state in EntityRegistry and LoggerRegistry
No cleanup/reset mechanism. Entity registrations persist across test cases. Multiple createDataAccess() calls with different loggers - last one wins globally.
I9. Removed DynamoDB IT tests covered entity-specific behaviors not fully replaced
Old per-entity tests (ApiKey hash validation, AsyncJob status transitions, ImportJob URL tracking, etc.) are replaced by a generic method coverage test that cannot verify correctness (see C2).
Suggestions
- S1:
camelToSnake/snakeToCamelroundtrip is lossy for acronyms (baseURL->base_url->baseUrl). Schema-registered fields are safe via explicit maps, but fallback conversion for unknown fields will silently mis-map. - S2: Docker ECR image dependency makes IT tests non-portable for external contributors. Document the ECR login requirement prominently.
- S3: Hardcoded
TEST_IDSinhelpers.jsmust stay in sync with seed SQL - fragile coupling with no validation. - S4:
DEFAULT_PAGE_SIZE = 1000may be too large for PostgREST/Postgres memory. Consider 100-250 or per-entity config. - S5: No request timeout, retry, or connection keep-alive config for the PostgREST client.
- S6: KeyEvent deprecation error message doesn't tell consumers what to do instead. Consider actionable guidance in the message.
- S7: SiteEnrollment
createdoes a full table scan viaallBySiteIdfor dedup - consider a unique constraint at the DB level. - S8:
isMissingDbFieldErrorinseed.jsdoes brittle string matching on PostgREST error messages - consider checking error codes instead. - S9: Add a smoke test that verifies all
TEST_IDSexist in seed data for faster debugging when Docker environment is misconfigured. - S10: FixEntitySuggestion
getId()behavior with composite keys should be verified -fixEntitySuggestionIdisrequired: false+postgrestIgnore: true, which meansgetId()may always return undefined.
|
Addressed the requested follow-ups in commit ff52726.
Also includes previously discussed review fixes (C1/C2/C3/I1/I2/I3/I4/I5/I7/I8/S6) in the same commit set. Validation run: npm run lint and npm test both pass locally. |
## [@adobe/spacecat-shared-data-access-v3.0.0](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-data-access-v2.109.0...@adobe/spacecat-shared-data-access-v3.0.0) (2026-02-16) ### ⚠ BREAKING CHANGES * **data-access:** data-access v3 migrates from DynamoDB/ElectroDB to Postgres/PostgREST Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * data access v3 (#1349) ### Features * data access v3 ([#1349](#1349)) ([6db6b79](6db6b79)) * **data-access:** v3 Postgres/PostgREST migration ([b79725f](b79725f))
|
🎉 This PR is included in version @adobe/spacecat-shared-data-access-v3.0.0 🎉 The release is available on: Your semantic-release bot 📦🚀 |
V3 Data Access Migration: DynamoDB/ElectroDB -> Postgres/PostgREST
BREAKING CHANGE
Summary
This PR moves
@adobe/spacecat-shared-data-accessto a v3 Postgres/PostgREST-backed data layer while preserving the external data-access API shape for consumers.It removes DynamoDB-backed integration coverage and replaces it with a PostgREST integration harness using
mysticat-data-service.Why
What Changed
Core data-access migration
@supabase/postgrest-js.src/util/postgrest.utils.js:siteId<->id)V3 special-case behavior
KeyEvent: deprecated in v3, collection methods throw a deprecation error.LatestAudit: no dedicated table in Postgres; implemented as virtual behavior derived fromAuditquery/grouping logic.Configuration: remains S3-backed (unchanged behavior).Integration tests replacement
test/it.test/it/postgrest/docker-compose.ymltest/it/fixtures.jsglobal setup/teardown to:dbmate upusingmysticat-data-servicemysticat-data-servicebase_url->baseURL)LatestAuditvirtual behaviorKeyEventdeprecation behaviortest/it/seed/tenants/01_tenant_alpha.sqlImage/Runtime Notes
mysticat-data-service:682033462621.dkr.ecr.us-east-1.amazonaws.com/mysticat-data-service:v1.7.1MYSTICAT_DATA_SERVICE_IMAGE=<your-image:tag>Validation
npm run test:it) pass locally when Docker can access required images.Operational Notes
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 682033462621.dkr.ecr.us-east-1.amazonaws.comBreaking/Behavioral Notes
Follow-ups (if needed)