
Add archive workflow SPA and API#30

Merged
jruszo merged 2 commits into master from feature/archive-workflow-spa on Apr 19, 2026

Conversation


@jruszo jruszo commented Apr 19, 2026

Summary

  • Add archive as a first-class SPA flow with list, create, and detail pages
  • Introduce archive REST endpoints, execution helpers, and model support for one-time and scheduled delete-only archives
  • Wire archive navigation, API client types, and demo data updates
  • Cover the new flow with backend unit tests and Playwright E2E tests

Testing

  • docker exec datamingle-app python manage.py test sql_api.tests.ArchiveApiTests sql.test_archiver.ArchiveExecutionHelpersTest
  • E2E_START_FRONTEND=1 npm run e2e:test -- archive-workflow.spec.ts
  • E2E_START_FRONTEND=1 npm run e2e:test -- workflow-smoke.spec.ts

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced comprehensive archive workflow management system with support for one-time and scheduled execution modes.
    • Added archive approval workflow with multi-step review process.
    • Enabled multiple archive execution methods (DML and pt-archiver) with automatic method selection based on database type.
    • Added archive execution logs and monitoring dashboard.
    • Implemented archive list filtering and search capabilities across resource groups and instances.


coderabbitai Bot commented Apr 19, 2026

Warning

Rate limit exceeded

@jruszo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 30 minutes and 51 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 30 minutes and 51 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 668b7a0f-4211-4581-a527-c088bb4cf15f

📥 Commits

Reviewing files that changed from the base of the PR and between 25265d2 and 827a2a6.

📒 Files selected for processing (9)
  • frontend/src/views/ArchiveDetailView.vue
  • frontend/tests/e2e/support/workflow-helpers.ts
  • sql/archiver.py
  • sql/migrations/0012_archiveconfig_consecutive_failures_and_more.py
  • sql/models.py
  • sql/test_archiver.py
  • sql/test_local_demo.py
  • sql_api/api_archive.py
  • sql_api/tests.py
📝 Walkthrough

Walkthrough

This PR introduces a complete archive workflow feature enabling users to create, review, schedule, and execute data archival operations. It spans frontend UI components for archive management, backend REST API endpoints, database schema extensions for scheduling and execution modes, archive scheduling and execution logic, and comprehensive E2E and unit tests.

Changes

Cohort / File(s) Summary
Frontend Navigation & Routing
frontend/src/App.vue, frontend/src/router/index.ts
Added primary navigation entry for /archives route with permission gate (canSeeArchives). Registered three archive routes: list (/archives), creation (/archives/new), and detail (/archives/:archiveId).
Frontend API Layer
frontend/src/lib/api.ts
Introduced 15 exported TypeScript types for archive domain objects (methods, modes, scheduling, records, payloads) and 9 API functions wrapping backend archive endpoints (metadata, list, create, detail, review, run-now, state updates, logs).
Frontend Archive Views
frontend/src/views/ArchivesView.vue, frontend/src/views/ArchiveCreateView.vue, frontend/src/views/ArchiveDetailView.vue
Added three Vue 3 components: list view with filtering and pagination, creation form with metadata loading and approval preview, and detail view with lifecycle management (review decisions, run-now, enable/disable). Each gates access via permission checks.
Frontend E2E Tests & Helpers
frontend/tests/e2e/archive-workflow.spec.ts, frontend/tests/e2e/support/workflow-helpers.ts
Added Playwright E2E spec exercising archive creation, approval, scheduling, filtering, method selection, and state transitions. Extended helpers with archive-specific utilities (seedLocalDemo, archiveIdFromUrl, waitForArchiveAction, polling primitives).
Backend Models & Migrations
sql/models.py, sql/migrations/0011_archiveconfig_archive_method_and_more.py
Extended ArchiveConfig with execution mode, archive method (DML vs pt-archiver), scheduling fields (frequency, time, weekdays), and next_run_at. Added archive_method field to ArchiveLog.
Backend Archive Execution & Scheduling
sql/archiver.py
Refactored archive execution with trigger tracking, added schedule calculation/management helpers, implemented DELETE SQL construction with validation, updated task callback for rescheduling, and integrated queue wrapper for async execution.
Backend API Implementation
sql_api/api_archive.py, sql_api/urls.py
Implemented REST API layer with 8 endpoints (metadata, approval preview, list/create, detail, review, run-now, state, logs), permission helpers, and serializers. Includes state machine logic for workflow transitions and schedule management.
Backend Tests & Seeding
sql/test_archiver.py, sql_api/tests.py, sql/local_demo.py
Added unit tests for archive helpers (condition rendering, SQL validation, schedule calculation, task callback). Extended API tests covering permission gating, method constraints, workflow state transitions, and schedule behavior. Seeded archive audit settings in demo.

Sequence Diagrams

sequenceDiagram
    actor User
    participant Frontend as Frontend<br/>(Browser)
    participant API as REST API<br/>(Backend)
    participant DB as Database
    participant Queue as Task Queue<br/>(Scheduler)

    User->>Frontend: Create new archive
    Frontend->>API: GET /metadata
    API->>DB: Fetch instances, groups
    DB-->>API: Metadata
    API-->>Frontend: Groups, instances, methods
    
    Frontend->>API: GET /approval-preview?groupId=X
    API->>DB: Fetch approval chain
    DB-->>API: Reviewers, audit settings
    API-->>Frontend: Preview data
    
    User->>Frontend: Submit archive form
    Frontend->>API: POST /archive (one-time or scheduled)
    API->>DB: Create ArchiveConfig
    API->>DB: Create WorkflowAudit
    DB-->>API: Success
    API->>Queue: Schedule if scheduled mode
    Queue-->>API: Schedule armed
    API-->>Frontend: Archive ID, redirect
sequenceDiagram
    actor DBA
    participant Frontend as Frontend<br/>(Browser)
    participant API as REST API<br/>(Backend)
    participant DB as Database
    participant Queue as Task Queue<br/>(Scheduler)
    participant Archive as Archive<br/>Execution

    DBA->>Frontend: Open archive detail
    Frontend->>API: GET /archive/:id
    API->>DB: Fetch ArchiveConfig + logs
    DB-->>API: Detail + execution logs
    API-->>Frontend: Display workflow state
    
    DBA->>Frontend: Submit approval decision
    Frontend->>API: POST /archive/:id/reviews
    API->>DB: Update workflow status
    alt If Approved & Scheduled
        API->>DB: Calculate next_run_at
        API->>Queue: Arm schedule
        Queue-->>API: Schedule active
    end
    DB-->>API: Updated
    API-->>Frontend: Success
    
    alt If manual Run Now
        DBA->>Frontend: Click run-now
        Frontend->>API: POST /archive/:id/run
        API->>Queue: Queue execution
        Queue->>Archive: Execute archive task
        Archive->>DB: Record execution log
        DB-->>Archive: Log entry created
        Archive-->>Queue: Done
        Queue-->>API: Queued
        API-->>Frontend: Success message
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

Poem

🐰 Hop, hop, archive and store,
Schedules tick, and logs explore,
Data neat in rows so bright,
Bunny-blessed workflows take flight! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 5.22%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: adding archive workflow as a new SPA (Single Page Application) feature with REST API support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment thread sql_api/api_archive.py (Fixed)
Comment thread sql_api/api_archive.py

        action, request.user, data["audit_remark"]
    )
except AuditException as exc:
    raise serializers.ValidationError({"errors": str(exc)})
Contributor

coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (6)
sql/models.py (1)

1021-1058: Consider model-level validation for execution_mode='scheduled' invariants.

When execution_mode='scheduled', valid configurations require schedule_time (and schedule_weekdays when schedule_frequency='weekly'). Currently this is only enforced at runtime in sql/archiver.py::calculate_next_archive_run/schedule_archive, so invalid rows can be persisted and only fail later when the scheduler tries to arm them. Consider adding a clean() method (and/or CheckConstraints) so that bad combinations are rejected at save time rather than surfacing as runtime exceptions.

Minor consistency nit: schedule_weekdays uses default="" while the other optional schedule fields use null=True, default=None. Aligning these would simplify "is unset?" checks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sql/models.py` around lines 1021 - 1058, Add model-level validation to
prevent invalid schedule combinations being saved: implement a clean() method on
the model that checks when execution_mode == "scheduled" that schedule_time is
set and, if schedule_frequency == "weekly", that schedule_weekdays is set (raise
ValidationError on violations); optionally add database-level CheckConstraint(s)
mirroring these invariants to guard at DB level. Update save() to call
full_clean() or rely on Django admin/forms to invoke clean(), and adjust the
field definition of schedule_weekdays to use null=True, default=None (matching
schedule_time/schedule_frequency) so "unset" checks are consistent. Reference
symbols: execution_mode, schedule_time, schedule_weekdays, schedule_frequency,
clean(), save(), and note calculate_next_archive_run/schedule_archive currently
enforce these at runtime and should rely on the model validation instead.
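The suggested clean() guard reduces to a small invariant check. A framework-free sketch of that check, assuming the field names from the review comment (execution_mode, schedule_time, schedule_frequency, schedule_weekdays); the helper name validate_schedule_config is hypothetical, and a real clean() would raise Django's ValidationError from the returned errors:

```python
def validate_schedule_config(execution_mode, schedule_time,
                             schedule_frequency, schedule_weekdays):
    """Return a dict of field -> error message for invalid schedule combos."""
    errors = {}
    if execution_mode == "scheduled":
        if not schedule_time:
            errors["schedule_time"] = "Required when execution_mode is 'scheduled'."
        if schedule_frequency == "weekly" and not schedule_weekdays:
            errors["schedule_weekdays"] = "Required when schedule_frequency is 'weekly'."
    return errors
```

Inside the model, clean() would call this and raise ValidationError(errors) when the dict is non-empty, so invalid rows are rejected at save time instead of failing later when the scheduler arms them.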
sql_api/api_archive.py (2)

659-697: get_queryset / get do not paginate None safely and bypass ListAPIView's standard flow.

Overriding get(self, request) on a ListAPIView and calling self.paginate_queryset(...) directly works, but skips the normal list() machinery (e.g., filter backends wired through filter_queryset continue to work because you call it explicitly, but any future addition of filter_backends or mixin behavior will silently be bypassed). Also, paginate_queryset returns None when pagination is disabled (e.g., client sends no page params and CustomizedPagination returns None). In that case serializer = self.get_serializer(None, many=True) will error and self.get_paginated_response(...) requires a paginator-paged result.

Two tidier options:

  • Remove the custom get override entirely and rely on ListAPIView.list (just keep get_queryset, _require_archive_module_access can move to initial/check_permissions via a permission class).
  • If you need the module-access gate, call _require_archive_module_access(request.user) at the start and then return super().get(request, *args, **kwargs).

The same shape applies to ArchiveLogList.get at lines 1005–1009.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sql_api/api_archive.py` around lines 659 - 697, The custom get override
bypasses ListAPIView.list and can call paginate_queryset which may return None;
remove the get method and let ListAPIView handle the listing by keeping get_queryset
as-is, or alternatively call _require_archive_module_access(request.user) then
return super().get(request, *args, **kwargs) to preserve DRF’s list flow; if you
need the access gate earlier, move the check into initial()/check_permissions or
implement a Permission class used by this view (also apply the same change to
ArchiveLogList.get).
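The "gate then delegate" option can be illustrated framework-free. The class and function names below are illustrative stand-ins for the DRF view, not the actual repo code:

```python
class BaseListView:
    """Stand-in for DRF's ListAPIView: owns filtering/pagination in get()."""
    def get(self, request):
        return {"results": sorted(request.get("items", []))}

def require_archive_module_access(user):
    # Hypothetical module-access gate; raises instead of returning a response.
    if not user.get("can_see_archives"):
        raise PermissionError("archive module access denied")

class ArchiveListView(BaseListView):
    def get(self, request):
        # Check access first, then hand the request back to the base class
        # so any list machinery (filters, pagination) still runs normally.
        require_archive_module_access(request["user"])
        return super().get(request)
```

The point is that the subclass never re-implements listing, so future filter backends or pagination changes in the base flow cannot be silently bypassed.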

783-795: Duplicated approve→arm-schedule logic between create and review paths (and the legacy Django view).

Lines 783–795 in ArchiveListCreate.post and lines 865–882 in ArchiveReviewCreate.post implement nearly the same sequence (mirror workflow.status from audit, flip state on PASSED, compute next_run_at for scheduled, save, then schedule_archive or cancel_archive_schedule). The legacy Django view archive_audit in sql/archiver.py (lines 766–781) replicates the same logic a third time. Any divergence (e.g., adding a last_approved_by field, adjusting cancel semantics) will require three coordinated edits and is an easy source of drift.

Consider extracting a helper (e.g. apply_archive_audit_outcome(workflow, current_status) in sql/archiver.py) that encapsulates:

  1. Setting state and next_run_at based on current_status.
  2. Saving with update_fields.
  3. Calling schedule_archive / cancel_archive_schedule.

Then call it from both DRF views and the legacy template view.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sql_api/api_archive.py` around lines 783 - 795, Duplicate
approve→schedule/cancel logic is repeated in ArchiveListCreate.post,
ArchiveReviewCreate.post and archive_audit; extract a single helper (e.g.
apply_archive_audit_outcome(workflow, current_status)) in sql/archiver.py that:
sets workflow.status/state based on current_status, computes and assigns
next_run_at when execution_mode == ARCHIVE_EXECUTION_SCHEDULED, saves the
workflow using update_fields, and calls schedule_archive or
cancel_archive_schedule as appropriate; then replace the duplicated blocks in
ArchiveListCreate.post and ArchiveReviewCreate.post and the legacy archive_audit
view to call this new helper with the workflow and audit.current_status.
sql/archiver.py (2)

184-215: schedule_archive unconditionally flips state=True, and cancel_archive_schedule does not reset state.

schedule_archive writes state=True via .update() on lines 204–209 regardless of the caller's intent, and mutates archive_info.state in place. Today all call sites already intend state=True, but it couples "arm the schedule" with "enable the workflow" and bypasses model save() signals/validators. Conversely, cancel_archive_schedule (lines 213–215) only clears next_run_at and does not touch state, so a caller wishing to disable must remember to update state separately (as archive_switch and archive_audit currently do).

Consider making schedule_archive only set next_run_at (leaving state to the caller) and keeping cancel_archive_schedule symmetric; this would also remove the duplicate state writes in callers like archive_switch (lines 892–907) and archive_audit (lines 766–781).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sql/archiver.py` around lines 184 - 215, Currently schedule_archive forces
ArchiveConfig.state=True and mutates archive_info.state in-place while
cancel_archive_schedule only clears next_run_at; make these symmetric by having
schedule_archive only set next_run_at (remove the state=True in the
ArchiveConfig.objects.filter(...).update(...) call and remove archive_info.state
assignment) so state is left to the caller, and keep cancel_archive_schedule
as-is (only clearing next_run_at) so callers control enabling/disabling via
their own state writes; reference functions: schedule_archive,
cancel_archive_schedule, ArchiveConfig, and archive_info.

129-142: DELETE SQL built via string interpolation with a user-authored WHERE fragment.

Ruff S608 fires here, and while the trust model is explicit (requester drafts the condition, DBA approves), a few hardening notes worth considering before merge:

  • sqlparse.split() is not a security boundary. It is best-effort and has historically split some statements differently across versions; a crafted condition containing unusual comment/terminator sequences may not always produce exactly one statement. The follow-up regex check only asserts the statement starts with DELETE FROM, so an attacker‑controlled condition that sqlparse still returns as a single statement (e.g., 1=1 UNION SELECT ... -- style tricks, or nested comments) is not detected here — you are relying entirely on DBA review.
  • ARCHIVE_SAFE_IDENTIFIER_PATTERN permits only [A-Za-z0-9_$], which rejects legitimate quoted identifiers (e.g. MySQL backtick-escaped names, PG schema-qualified schema.table). Scheduled/DML archives against tables needing schema qualification or mixed-case identifiers will be blocked. Since _execute_dml_archive targets arbitrary source DBs via query_engine.execute, consider quoting via the engine's identifier quoting helper rather than a regex whitelist.
  • archive_info.src_db_name is not validated/quoted here, but query_engine.execute(db_name=...) is called with it as a connection-level database selector; worth confirming the engine layer treats it safely and not via string interpolation.

Not blocking given the approval gate, but worth documenting the trust assumptions in the function docstring.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sql/archiver.py` around lines 129 - 142, The build_archive_delete_sql
function currently composes a DELETE via string interpolation (table name
checked by ARCHIVE_SAFE_IDENTIFIER_PATTERN, condition from
render_archive_condition) and then relies on sqlparse.split + a regex to
validate it; instead, either use the query engine's identifier-quoting helper to
quote/escape archive_info.src_table_name (replace the manual
ARCHIVE_SAFE_IDENTIFIER_PATTERN check and interpolate the quoted identifier),
ensure archive_info.src_db_name is validated/quoted or passed safely through the
engine API (not string-interpolated), and harden the condition validation
(reject any semicolons, comment terminators, and dangerous keywords like
UNION/SELECT/INSERT/UPDATE/CREATE or otherwise parse an AST) rather than only
using sqlparse.split; if you intentionally accept the current trust model, add a
clear docstring on build_archive_delete_sql documenting the DBA-review trust
assumptions and the exact guarantees required from render_archive_condition and
src_db_name.
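The hardening described above (reject semicolons, comment markers, and statement keywords in the user-authored WHERE fragment; quote identifiers by doubling rather than whitelisting) can be sketched with the standard library alone. The names and exact rules below are illustrative, not the repo's actual validators:

```python
import re

# Reject statement separators, SQL comments, and keywords that would turn a
# WHERE fragment into something other than a pure filter expression.
_FORBIDDEN = re.compile(
    r";|--|/\*|\b(union|select|insert|update|create|drop|alter)\b",
    re.IGNORECASE,
)

def is_safe_condition(condition: str) -> bool:
    return bool(condition.strip()) and not _FORBIDDEN.search(condition)

def quote_mysql_identifier(name: str) -> str:
    # Backtick-quote, doubling embedded backticks (the MySQL rule), so mixed
    # case and unusual names work without a [A-Za-z0-9_$] whitelist.
    return "`" + name.replace("`", "``") + "`"
```

Note that word boundaries keep ordinary column names safe: "created_at" does not trip the "create" keyword check. This is still defense in depth on top of DBA review, not a replacement for it.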
frontend/src/views/ArchiveCreateView.vue (1)

273-290: Silent groupId reset on instance change can confuse the user.

When the user has already selected a resource group and then picks an instance that isn't associated with that group, form.groupId is wiped on line 277 without any feedback. The downstream effects are correct (approval preview is cleared by the groupId watcher), but from the user's perspective their group selection silently disappears. Consider either:

  • Showing a small inline notice ("Resource group reset because it is not associated with the selected instance."), or
  • Filtering the instance dropdown to only show instances compatible with the already-selected group (symmetric to how eligibleGroups narrows groups by instance).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/views/ArchiveCreateView.vue` around lines 273 - 290, The code
silently clears form.groupId when the newly selected instance isn't in
eligibleInstances (see selectedGroupId, eligibleInstances, and form.groupId)
which confuses users; either (A) surface an inline notice by setting a reactive
flag (e.g., groupResetNotice) before clearing form.groupId and display that
message in the template so users see "Resource group reset because it is not
associated with the selected instance.", or (B) prevent the situation by
filtering the instance selection to only show instances compatible with the
currently selected group (reuse eligibleGroups/eligibleInstances logic to
constrain the instance dropdown) so the user cannot pick an incompatible
instance in the first place; update the groupId watcher/instance change handling
(where form.groupId is set to '') to implement one of these approaches and clear
the notice after user action.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@frontend/tests/e2e/support/workflow-helpers.ts`:
- Around line 170-183: The test currently polls the whole document body in
waitForArchiveExecutionState; narrow the selector to the execution-state badge
by reading a stable test id (e.g., the same pattern as "archive-detail-status"
or add a new data-testid like "archive-execution-state") instead of body. Update
waitForArchiveExecutionState to call pollArchiveDetail with a locator that
targets page.locator('[data-testid="archive-execution-state"]') (or the exact
test id you add) and check its textContent for the expected state; ensure
ArchiveDetailView renders that data-testid on the state badge so the test reads
that element only.

In `@sql_api/api_archive.py`:
- Around line 776-781: The code currently echoes raw AuditException text into
API responses in the audit creation paths (call to
audit_handler.create_audit()), which can leak internals; change both exception
handlers that catch AuditException to log the full exception server‑side (use
logger.exception(...)) and raise a serializers.ValidationError with a generic
client message like "Failed to create approval flow. Contact admin." and
re-raise using "from exc" to preserve the cause chain (e.g. raise
serializers.ValidationError({"errors": "Failed to create approval flow. Contact
admin."}) from exc). Ensure this pattern replaces the current uses of str(exc)
in the handlers around audit_handler.create_audit().
- Around line 902-929: The POST handler performs non-atomic checks and writes an
"Archive Queued" audit then enqueues work, creating a TOCTOU and
duplicate-enqueue race; fix by wrapping the permission/approval/state check plus
audit creation and enqueue in a DB transaction that locks the ArchiveConfig row
(use select_for_update on ArchiveConfig inside a transaction in post),
re-checking archive_config.status == WorkflowStatus.PASSED and
archive_config.state before calling Audit.add_log and queue_archive_execution;
additionally, ensure sql/archiver.archive() enforces a concurrency gate (e.g.,
check/set a pending/locked flag under select_for_update or return early if
already queued) so two quick "Run Now" requests cannot both proceed—refer to the
post method, queue_archive_execution, ArchiveConfig, Audit.add_log and the
archive() function in sql/archiver.py when making these changes.

In `@sql/archiver.py`:
- Around line 460-494: The callback currently always reschedules a scheduled
archive even when task.success is False and also doesn't handle
ArchiveConfig.DoesNotExist from ArchiveConfig.objects.get(id=archive_id); change
archive_task_callback so that after fetching archive_info (wrap
ArchiveConfig.objects.get in try/except DoesNotExist and log/return) you
short-circuit rescheduling when the run was scheduled and failed (i.e., if
trigger == "scheduled" and not task.success, do not call
calculate_next_archive_run/schedule_archive), and additionally add one of the
suggested failure-handling actions: increment a failure counter on ArchiveConfig
(e.g., consecutive_failures) and disable scheduling after a threshold or call
notify_for_audit/another notifier when task.success is False to surface
operator-visible alerts; update references in ArchiveConfig
(state/status/next_run_at) and Audit.add_log usage accordingly.
- Around line 449-457: queue_archive_execution can enqueue duplicate archive
jobs because archive() releases its row lock before performing deletes; fix by
ensuring a strong idempotency gate is established inside archive() while still
holding the transaction used for select_for_update: for example, within the same
transaction that runs the select_for_update in archive() create or update a
short-lived ArchiveRun row (unique per archive_id) or set a dedicated running
flag/state on the Archive row before committing so a concurrent task will see
non-runnable state and abort; alternatively implement an advisory lock held for
the lifetime of archive() execution or add enqueue-time deduplication in
queue_archive_execution by checking for an existing in-progress
ArchiveRun—update the functions archive(), queue_archive_execution and any
Archive model logic accordingly so the guard is created atomically with the
state check.

In `@sql/models.py`:
- Around line 1086-1091: Change the model default for archive_method or add a
data migration so historical ArchiveLog rows reflect the API behavior: either
set archive_method = models.CharField(..., default="dml") in the model (replace
the current default "pt_archiver") or create a RunPython data migration in
migration 0011 that updates ArchiveLog.archive_method values to
ArchiveConfig.ARCHIVE_METHOD_DML (or other legacy-detection logic) instead of
blanket "pt_archiver"; ensure ArchiveConfig.ARCHIVE_METHOD_CHOICES and any code
referencing ARCHIVE_METHOD_DML remain consistent with the chosen approach.
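The enqueue-time deduplication option from the sql/archiver.py comments can be sketched as an in-memory guard. A real implementation would use a unique ArchiveRun row or an advisory lock as the comment suggests; every name below is hypothetical:

```python
import threading

class ArchiveRunRegistry:
    """Process-local stand-in for a unique ArchiveRun row / advisory lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def try_acquire(self, archive_id) -> bool:
        # Atomically record the run; a second caller for the same archive
        # sees it and backs off instead of enqueueing a duplicate job.
        with self._lock:
            if archive_id in self._in_progress:
                return False
            self._in_progress.add(archive_id)
            return True

    def release(self, archive_id):
        with self._lock:
            self._in_progress.discard(archive_id)

def queue_archive_execution(archive_id, registry, enqueue):
    if not registry.try_acquire(archive_id):
        return False  # already queued or running
    enqueue(archive_id)
    return True
```

The key property is that the guard is created atomically with the "is it runnable?" check, so two quick "Run Now" requests cannot both pass the gate.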


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77b0d411-8a21-4de2-b16d-a64ede76ff46

📥 Commits

Reviewing files that changed from the base of the PR and between 453e8a5 and 25265d2.

📒 Files selected for processing (16)
  • frontend/src/App.vue
  • frontend/src/lib/api.ts
  • frontend/src/router/index.ts
  • frontend/src/views/ArchiveCreateView.vue
  • frontend/src/views/ArchiveDetailView.vue
  • frontend/src/views/ArchivesView.vue
  • frontend/tests/e2e/archive-workflow.spec.ts
  • frontend/tests/e2e/support/workflow-helpers.ts
  • sql/archiver.py
  • sql/local_demo.py
  • sql/migrations/0011_archiveconfig_archive_method_and_more.py
  • sql/models.py
  • sql/test_archiver.py
  • sql_api/api_archive.py
  • sql_api/tests.py
  • sql_api/urls.py

Comment on lines +1086 to +1091
    archive_method = models.CharField(
        "Archive Method",
        max_length=20,
        choices=ArchiveConfig.ARCHIVE_METHOD_CHOICES,
        default="pt_archiver",
    )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd -t f '0011_archiveconfig_archive_method_and_more.py' | xargs cat

Repository: jruszo/Datamingle

Length of output: 2680


🏁 Script executed:

# Search for ArchiveLog model definition to understand the context
rg "class ArchiveLog" -A 20

Repository: jruszo/Datamingle

Length of output: 3410


🏁 Script executed:

# Search for references to archive_method and archive execution patterns
rg "archive_method|pt_archiver|dml.*archive" -i --type py -B 2 -A 2

Repository: jruszo/Datamingle

Length of output: 16159


🏁 Script executed:

# Look for any data migration or comments about legacy behavior
find . -type f -name "*.py" -path "*/migrations/*" | xargs grep -l "ArchiveLog\|archive" | head -5

Repository: jruszo/Datamingle

Length of output: 156


🏁 Script executed:

# Search for any comments or documentation about archive execution
rg "pt.archiver|rendered.*dml|archive.*method" -i --type py | head -20

Repository: jruszo/Datamingle

Length of output: 1504


🏁 Script executed:

# Check for any data migration steps in migration 0011
cat sql/migrations/0011_archiveconfig_archive_method_and_more.py | grep -i "runpython\|runsql"

Repository: jruszo/Datamingle

Length of output: 43


🏁 Script executed:

# Examine the API serializer defaults more closely
rg "ARCHIVE_METHOD_DML|default.*=.*ARCHIVE" sql_api/api_archive.py -B 2 -A 2

Repository: jruszo/Datamingle

Length of output: 971


🏁 Script executed:

# Check if there are any comments or documentation about legacy archive behavior
rg "legacy|historical|backward|compat" -i sql/ --type py | grep -i archive

Repository: jruszo/Datamingle

Length of output: 215


Update model default for archive_method to "dml" to align with API behavior, or add a data migration to intelligently handle historical records.

The migration backfills all pre-existing ArchiveLog rows with default="pt_archiver", but the API serializer defaults to ARCHIVE_METHOD_DML and all execution code explicitly sets the method. This mismatch suggests historical archives likely used DML by default, not pt-archiver. Either update the model default to "dml" to match the API, or add a data migration (RunPython) in 0011 that intelligently backfills historical records based on legacy behavior rather than a blanket default.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sql/models.py` around lines 1086 - 1091, Change the model default for
archive_method or add a data migration so historical ArchiveLog rows reflect the
API behavior: either set archive_method = models.CharField(..., default="dml")
in the model (replace the current default "pt_archiver") or create a RunPython
data migration in migration 0011 that updates ArchiveLog.archive_method values
to ArchiveConfig.ARCHIVE_METHOD_DML (or other legacy-detection logic) instead of
blanket "pt_archiver"; ensure ArchiveConfig.ARCHIVE_METHOD_CHOICES and any code
referencing ARCHIVE_METHOD_DML remain consistent with the chosen approach.
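If the data-migration route is taken, the backfill decision can be sketched as a plain function over row dicts standing in for the RunPython forward step (names illustrative; a real migration would iterate ArchiveLog rows fetched via apps.get_model):

```python
def backfill_archive_method(rows, legacy_method="dml"):
    """Rows predating the archive_method column are assumed to have
    used the DML path, matching the API serializer default, rather
    than the blanket "pt_archiver" model default."""
    for row in rows:
        if not row.get("archive_method"):
            row["archive_method"] = legacy_method
    return rows
```

Keeping the legacy value as a parameter makes the assumption explicit and easy to revisit if older logs turn out to have used pt-archiver after all.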

@jruszo jruszo merged commit a555085 into master Apr 19, 2026
7 checks passed