[DF2] #40: Add bulk download from GovInfo Bulk Data Repository #41

Merged
v1d0b0t merged 10 commits into main from df2/issue-40 on Apr 3, 2026

v1d0b0t commented Apr 3, 2026

Automated pipeline PR for #40

v1d0b0t commented Apr 3, 2026

Implementation Plan

  1. Extend src/commands/fetch.ts and src/utils/manifest.ts to support the govinfo-bulk source, a --collection selector, status output, and manifest defaults.
  2. Add src/utils/govinfo-bulk-listing.ts to parse GovInfo XML listings, resolve allowed bulkdata URLs, and classify directories/files.
  3. Add src/sources/govinfo-bulk.ts to traverse selected collection/congress scopes, download with bounded concurrency, validate XML/ZIP payloads, extract safely, and persist resumable manifest state.
  4. Add or update CLI, unit, and integration-style tests covering selector validation, listing traversal, resume behavior, and the no-API-key execution path.
  5. Update docs/DATA-ACQUISITION-RUNBOOK.md, then run npm run build, npx tsc --noEmit, and the full Vitest suite before pushing.
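
Items 2 and 3 call for resolving only allowed bulkdata URLs from the XML listings and for downloading with bounded concurrency. The TypeScript sketch below illustrates both ideas under stated assumptions: the helper names resolveAllowedBulkdataUrl and mapWithConcurrency are hypothetical, the allowlist is assumed to be "HTTPS on www.govinfo.gov under /bulkdata/", and neither function is taken from the PR's govinfo-bulk-listing.ts or govinfo-bulk.ts.

```typescript
// Hypothetical helpers sketching two ideas from the plan; not the PR's code.
// Assumption: the allowed root is HTTPS on www.govinfo.gov under /bulkdata/.
const ALLOWED_ORIGIN = "https://www.govinfo.gov";
const ALLOWED_PATH_PREFIX = "/bulkdata/";

/**
 * Resolve an href found in a listing against the listing's own URL and
 * return it only if it stays inside the allowed bulkdata tree.
 */
function resolveAllowedBulkdataUrl(href: string, listingUrl: string): string | null {
  let resolved: URL;
  try {
    // WHATWG URL resolution also normalizes "." and ".." path segments.
    resolved = new URL(href, listingUrl);
  } catch {
    return null; // malformed href or base URL
  }
  // Origin includes scheme, host, and port, so this rejects plain http and other hosts.
  if (resolved.origin !== ALLOWED_ORIGIN) return null;
  if (!resolved.pathname.startsWith(ALLOWED_PATH_PREFIX)) return null;
  return resolved.href;
}

/**
 * Run `worker` over `items` with at most `limit` calls in flight,
 * preserving input order in the result array.
 */
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because lanes only interleave at await points

  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Because each lane claims an index synchronously before awaiting, the shared cursor needs no lock. One design caveat: if a worker throws, Promise.all rejects right away while other lanes may keep running, so a real downloader would probably record per-file failures in the manifest rather than let one error abort the batch.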

v1d0b0t marked this pull request as ready for review on April 3, 2026 at 15:39
v1d0b0t commented Apr 3, 2026

[adversary-review] — REJECTED

See issue #40 for full findings.

v1d0b0t commented Apr 3, 2026

Implementation Plan

  1. Update src/utils/manifest.ts to merge govinfo-bulk file/collection state on write so stale snapshots cannot clobber completed-file records from another writer.
  2. Update src/sources/govinfo-bulk.ts to stream response bodies to temp files instead of buffering them in memory with arrayBuffer().
  3. Add a pre-commit race re-check in the govinfo-bulk download path so an already-completed artifact is not overwritten after validation.
  4. Keep the existing govinfo-bulk tests green and run the full Vitest suite plus typecheck/build to prove no regressions.
  5. Push the fix commit, update the existing PR, and comment on issue #40 with the verification results and the closure of each adversary finding.
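
Item 1 describes merging govinfo-bulk state on write so a stale snapshot cannot overwrite completed-file records from another writer. A minimal TypeScript sketch of that merge rule follows. It assumes a hypothetical manifest shape (a files map keyed by artifact path, each record carrying a status); the real src/utils/manifest.ts schema is not shown on this page, and collection-level state is omitted for brevity.

```typescript
// Hypothetical manifest shape for illustration only.
type FileStatus = "pending" | "downloading" | "completed" | "failed";

interface BulkFileRecord {
  status: FileStatus;
  bytes?: number;
  completedAt?: string; // ISO-8601 timestamp
}

interface BulkManifestState {
  files: Record<string, BulkFileRecord>;
}

/**
 * Merge this writer's in-memory state into the state freshly re-read from disk.
 * Rule: a "completed" record already on disk is never replaced by a
 * non-completed record from a stale snapshot; otherwise this writer's view wins.
 */
function mergeBulkState(onDisk: BulkManifestState, ours: BulkManifestState): BulkManifestState {
  // Start from disk so records that only another writer knows about are kept.
  const files: Record<string, BulkFileRecord> = { ...onDisk.files };
  for (const [key, mine] of Object.entries(ours.files)) {
    const theirs = onDisk.files[key];
    if (theirs?.status === "completed" && mine.status !== "completed") {
      continue; // keep the other writer's completed record
    }
    files[key] = mine;
  }
  return { files };
}
```

In use, a writer would re-read the manifest immediately before saving, call mergeBulkState(freshFromDisk, inMemory), and write the result through a temp file plus rename. Re-reading narrows the window between two writers but does not close it; strict exclusion would still need a lock.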

v1d0b0t commented Apr 3, 2026

[adversary-review] — REJECTED

See issue #40 for full findings.

v1d0b0t commented Apr 3, 2026

Implementation Plan

  1. Re-read src/sources/govinfo-bulk.ts and the overlap regression test to isolate the remaining pre-rename race window.
  2. Extend the winner-detection helper to check both refreshed manifest state and on-disk final artifact/extraction-root existence immediately before commit.
  3. Keep the fix inside the existing downloadBulkArtifact() path so overlapping writers skip instead of overwriting.
  4. Re-run the focused govinfo-bulk regression test, then the full Vitest suite, typecheck, and build.
  5. Commit, push, update the issue comment, and ready the existing PR for re-review.
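
Items 2 and 3 describe checking, immediately before commit, both the refreshed manifest and the on-disk final artifact or extraction root, and skipping rather than overwriting when another writer has already finished. The TypeScript sketch below shows that check-then-commit shape under stated assumptions: CommitTarget, anotherWriterWon, commitOrSkip, and the readCompletedKeys callback are hypothetical names, not the actual internals of downloadBulkArtifact().

```typescript
import { access, rename, rm } from "node:fs/promises";

// Hypothetical shape; the real downloadBulkArtifact() internals are not shown here.
interface CommitTarget {
  key: string; // manifest key for the artifact
  tempPath: string; // fully downloaded and validated temp file
  finalPath: string; // where the artifact should end up
  extractionRoot?: string; // directory a ZIP would be extracted into, if any
}

async function pathExists(p: string): Promise<boolean> {
  try {
    await access(p);
    return true;
  } catch {
    return false;
  }
}

/** True if another writer already won: manifest says completed, or final outputs exist. */
async function anotherWriterWon(
  target: CommitTarget,
  readCompletedKeys: () => Promise<Set<string>>,
): Promise<boolean> {
  const completed = await readCompletedKeys(); // re-read now; do not trust a cached snapshot
  if (completed.has(target.key)) return true;
  if (await pathExists(target.finalPath)) return true;
  if (target.extractionRoot && (await pathExists(target.extractionRoot))) return true;
  return false;
}

/** Move the temp file into place unless a winner is detected. */
async function commitOrSkip(
  target: CommitTarget,
  readCompletedKeys: () => Promise<Set<string>>,
): Promise<"committed" | "skipped"> {
  if (await anotherWriterWon(target, readCompletedKeys)) {
    await rm(target.tempPath, { force: true }); // discard our copy instead of overwriting
    return "skipped";
  }
  await rename(target.tempPath, target.finalPath);
  return "committed";
}
```

A check followed by rename() still leaves a short window, and on POSIX rename() silently replaces an existing destination. If that residual window matters, one alternative (not what the plan describes) is to publish with fs.link(tempPath, finalPath), which fails with EEXIST when the destination already exists, and then remove the temp file.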

v1d0b0t commented Apr 3, 2026

[adversary-review] — APPROVED

See issue #40 for full findings.

v1d0b0t commented Apr 3, 2026

[adversary-review] — APPROVED

See issue #40 for full findings.

v1d0b0t merged commit b342555 into main on Apr 3, 2026
1 check passed
v1d0b0t deleted the df2/issue-40 branch on April 3, 2026 at 16:16