Add bulk download from GovInfo Bulk Data Repository


## Summary

Add a `fetch --source=govinfo-bulk` command that downloads data from the [GovInfo Bulk Data Repository](https://www.govinfo.gov/bulkdata/) instead of crawling the GovInfo API one request at a time. This replaces days or weeks of rate-limited API crawling with a few hours of direct ZIP downloads.

## Background

The current `fetch --source=govinfo` client crawls the GovInfo API at 5,000 req/hr, a quota shared with the Congress.gov client. A full historical crawl takes days to weeks of hourly sessions.

GovInfo publishes the same data as **bulk ZIP downloads** with no API key and no rate limits:
- https://www.govinfo.gov/bulkdata/

## Available Collections

| Collection | Path | Coverage | What It Contains |
|-----------|------|----------|-----------------|
| **BILLSTATUS** | `/bulkdata/BILLSTATUS/{congress}/{type}/` | 108th–present (2003+) | Bill lifecycle, sponsors, actions, committees, cosponsors, related bills |
| **BILLS** | `/bulkdata/BILLS/{congress}/{type}/` | 113th–present (2013+) | Full bill text as XML |
| **BILLSUM** | `/bulkdata/BILLSUM/{congress}/` | 113th–present | CRS bill summaries |
| **PLAW** | `/bulkdata/PLAW/{congress}/` | TBD (check directory listing) | Enacted public & private law text (USLM XML) |

Directory structure example:
```
/bulkdata/BILLSTATUS/119/hr/   → ZIP of all House bill statuses, 119th Congress
/bulkdata/BILLSTATUS/119/s/    → ZIP of all Senate bill statuses, 119th Congress
/bulkdata/BILLSTATUS/118/hr/   → 118th Congress House bills
...back to 108th Congress
```
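A small URL builder makes the path patterns from the table concrete. This is a sketch, not the tool's code: `bulkDataUrl`, `BILL_TYPES`, and the rule that only BILLSTATUS and BILLS take a `{type}` segment are assumptions read off the table above. The bill-type list is the usual set of House/Senate measure types and should be confirmed against the live directory listing.

```typescript
// Hypothetical helper: builds bulk-data directory URLs from the path
// patterns in the collections table. Per that table, BILLSTATUS and BILLS
// are split by bill type; BILLSUM and PLAW are listed per congress only.
type Collection = "BILLSTATUS" | "BILLS" | "BILLSUM" | "PLAW";

const BASE_URL = "https://www.govinfo.gov/bulkdata";

// Assumed bill types; verify against the directory listing before relying on it.
const BILL_TYPES = ["hr", "s", "hjres", "sjres", "hconres", "sconres", "hres", "sres"] as const;

// Collections whose path includes a {type} segment (per the table).
const TYPED_COLLECTIONS: ReadonlySet<Collection> = new Set<Collection>(["BILLSTATUS", "BILLS"]);

function bulkDataUrl(collection: Collection, congress: number, type?: string): string {
  if (!Number.isInteger(congress) || congress < 1) {
    throw new Error(`invalid congress: ${congress}`);
  }
  const segments = [BASE_URL, collection, String(congress)];
  if (TYPED_COLLECTIONS.has(collection)) {
    if (!type) throw new Error(`${collection} requires a bill type (e.g. "hr", "s")`);
    segments.push(type);
  }
  return segments.join("/") + "/";
}

// Enumerate every BILLSTATUS directory for the 119th Congress.
for (const type of BILL_TYPES) {
  console.log(bulkDataUrl("BILLSTATUS", 119, type));
}
console.log(bulkDataUrl("PLAW", 118)); // https://www.govinfo.gov/bulkdata/PLAW/118/
```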

## Proposed Implementation

### New CLI command
```bash
# Download all bulk collections
npx us-code-tools fetch --source=govinfo-bulk

# Download specific collection
npx us-code-tools fetch --source=govinfo-bulk --collection=BILLSTATUS

# Download specific congress only
npx us-code-tools fetch --source=govinfo-bulk --congress=119

# Download specific collection + congress
npx us-code-tools fetch --source=govinfo-bulk --collection=BILLSTATUS --congress=119
```
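The flag handling for these invocations could look like the sketch below. It is a minimal, dependency-free illustration that only accepts the `--key=value` form used in the examples; `parseFetchArgs` and `BulkFetchOptions` are hypothetical names, and a real implementation would more likely reuse whatever argument parser the CLI already has.

```typescript
// Hypothetical option shape; flag names mirror the CLI examples.
interface BulkFetchOptions {
  source: string;
  collection?: string; // normalized to upper case, e.g. "BILLSTATUS"
  congress?: number;
}

const KNOWN_FLAGS = new Set(["source", "collection", "congress"]);

function parseFetchArgs(argv: string[]): BulkFetchOptions {
  const [command, ...rest] = argv;
  if (command !== "fetch") throw new Error("expected the `fetch` subcommand");

  // Collect --key=value pairs, rejecting anything unexpected.
  const flags = new Map<string, string>();
  for (const arg of rest) {
    const match = /^--([a-z-]+)=(.+)$/.exec(arg);
    if (!match) throw new Error(`unrecognized argument: ${arg}`);
    const [, key, value] = match;
    if (!KNOWN_FLAGS.has(key)) throw new Error(`unknown flag: --${key}`);
    flags.set(key, value);
  }

  const source = flags.get("source");
  if (!source) throw new Error("--source is required");

  // --congress is optional but must be a positive integer when present.
  const rawCongress = flags.get("congress");
  let congress: number | undefined;
  if (rawCongress !== undefined) {
    congress = Number(rawCongress);
    if (!Number.isInteger(congress) || congress < 1) {
      throw new Error(`--congress must be a positive integer, got "${rawCongress}"`);
    }
  }

  return { source, collection: flags.get("collection")?.toUpperCase(), congress };
}

console.log(parseFetchArgs(["fetch", "--source=govinfo-bulk", "--collection=BILLSTATUS", "--congress=119"]));
```

Omitting `--collection` or `--congress` leaves the field `undefined`, which the downloader can treat as "all".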

### Behavior
1. Enumerate available congresses by crawling the bulk data directory listing (XML format)
2. Download ZIPs for each congress/type combination
3. Extract to `data/cache/govinfo-bulk/{collection}/{congress}/{type}/`
4. Track progress in a manifest (per congress and collection)
5. Support resume: skip ZIPs that were already downloaded (compare size and last-modified date)
6. No API key required
7. No rate limiting needed, but stay polite: cap downloads at 1-2 concurrent requests
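The manifest, resume check, and concurrency cap from steps 4, 5, and 7 can be sketched as below. Everything here is an assumption for illustration: the manifest field names, the per-ZIP key format (`"BILLSTATUS/119/hr"`), and the idea that remote size and date come from the directory listing or an HTTP `HEAD` request. Keying per ZIP, which is finer than the congress + collection granularity above, is one possible design choice that lets a partially finished congress resume mid-way.

```typescript
// Hypothetical manifest entry, one per downloaded ZIP.
interface ManifestEntry {
  sizeBytes: number;
  lastModified: string; // as reported by the server for the ZIP
  extracted: boolean;
}
type Manifest = Record<string, ManifestEntry>;

// Remote metadata, assumed to come from the listing or a HEAD request.
interface RemoteZipInfo {
  key: string; // e.g. "BILLSTATUS/119/hr"
  sizeBytes: number;
  lastModified: string;
}

// Resume rule (step 5): skip only if the ZIP was fully extracted and the
// remote size and date still match what was recorded.
function shouldSkip(manifest: Manifest, remote: RemoteZipInfo): boolean {
  const entry = manifest[remote.key];
  return (
    entry !== undefined &&
    entry.extracted &&
    entry.sizeBytes === remote.sizeBytes &&
    entry.lastModified === remote.lastModified
  );
}

// Politeness cap (step 7): at most `limit` workers pull from a shared queue.
async function runWithConcurrency<T>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<void>,
): Promise<void> {
  if (limit < 1) throw new Error("limit must be at least 1");
  let next = 0;
  const runners = Array.from({ length: Math.min(limit, items.length) }, async () => {
    // JavaScript is single-threaded, so reading and bumping `next` without
    // an intervening await cannot hand the same item to two workers.
    while (next < items.length) {
      const item = items[next++];
      await worker(item);
    }
  });
  await Promise.all(runners);
}
```

A caller would filter the remote ZIP list with `shouldSkip`, then pass the remainder to `runWithConcurrency(pending, 2, downloadAndExtract)`, where `downloadAndExtract` is a hypothetical function that fetches one ZIP, extracts it, and updates the manifest.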

### Cache structure
```
data/cache/govinfo-bulk/
├── BILLSTATUS/
│   ├── 119/
│   │   ├── hr/   → extracted XML files
│   │   └── s/
│   ├── 118/
│   │   ├── hr/
│   │   └── s/
│   └── ...back to 108
├── BILLS/
│   └── ...
├── BILLSUM/
│   └── ...
└── PLAW/
    └── ...
```
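Mapping a download to its cache directory is simple, but extraction deserves one guard: ZIP entry names come from a remote server, so a name containing `..` or an absolute path could write outside the cache ("zip slip"). The sketch below is a hypothetical illustration (`cacheDir` and `extractTarget` are not existing functions) that builds paths with forward slashes to match the tree above.

```typescript
const CACHE_ROOT = "data/cache/govinfo-bulk";

// Hypothetical: directory for one collection/congress(/type), per the tree above.
function cacheDir(collection: string, congress: number, type?: string): string {
  const parts = [CACHE_ROOT, collection, String(congress)];
  if (type) parts.push(type);
  return parts.join("/");
}

// Hypothetical zip-slip guard: resolve a ZIP entry name inside `dir`,
// rejecting absolute paths, drive letters, and ".." segments.
function extractTarget(dir: string, entryName: string): string {
  const normalized = entryName.replace(/\\/g, "/");
  const unsafe =
    normalized.startsWith("/") ||
    /^[A-Za-z]:/.test(normalized) ||
    normalized.split("/").some((segment) => segment === "..");
  if (unsafe) throw new Error(`refusing to extract unsafe ZIP entry: ${entryName}`);
  return `${dir}/${normalized}`;
}

console.log(cacheDir("BILLSTATUS", 119, "hr")); // data/cache/govinfo-bulk/BILLSTATUS/119/hr
```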

### Collections priority
1. **BILLSTATUS** — most important, has bill lifecycle data needed for "bills as PRs"
2. **PLAW** — public law text for linking code changes to specific laws
3. **BILLS** — full bill text (large, defer if disk space is tight)
4. **BILLSUM** — summaries (nice to have, small)
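The priority list translates naturally into a download order. This sketch assumes a hypothetical `planCollections` helper: a `--collection` filter selects a single collection, otherwise all four run in priority order, and an assumed `deferLarge` switch drops BILLS when disk space is tight.

```typescript
type Collection = "BILLSTATUS" | "BILLS" | "BILLSUM" | "PLAW";

// Download order, taken from the priority list above.
const COLLECTION_PRIORITY: readonly Collection[] = ["BILLSTATUS", "PLAW", "BILLS", "BILLSUM"];

function isCollection(name: string): name is Collection {
  return (COLLECTION_PRIORITY as readonly string[]).indexOf(name) !== -1;
}

// Hypothetical planner: an explicit filter wins; otherwise follow priority,
// optionally deferring the large BILLS collection.
function planCollections(filter?: string, deferLarge = false): Collection[] {
  if (filter !== undefined) {
    const name = filter.toUpperCase();
    if (!isCollection(name)) throw new Error(`unknown collection: ${filter}`);
    return [name];
  }
  return COLLECTION_PRIORITY.filter((c) => !(deferLarge && c === "BILLS"));
}

console.log(planCollections()); // [ 'BILLSTATUS', 'PLAW', 'BILLS', 'BILLSUM' ]
```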

## Relationship to Existing Clients

- `fetch --source=govinfo` (API client) remains for real-time and incremental updates
- `fetch --source=govinfo-bulk` is for the initial historical bulk load
- `fetch --source=congress` (Congress.gov API) may become unnecessary for bill data if BILLSTATUS covers the same fields; evaluate after the bulk download completes

## Acceptance Criteria

- [ ] `fetch --source=govinfo-bulk` downloads and extracts ZIPs from GovInfo bulk repository
- [ ] Supports `--collection` and `--congress` filters
- [ ] Progress tracked in manifest with resume support
- [ ] No API key required
- [ ] Downloads BILLSTATUS for all available congresses (108–119)
- [ ] Extracted XML files are valid and parseable
- [ ] Runbook updated with bulk download instructions

## Estimated Download Size

- BILLSTATUS: ~2-5 GB across all congresses (rough estimate)
- PLAW: ~500 MB–1 GB
- BILLS: ~10-20 GB (full text of every bill version)
- BILLSUM: ~500 MB

Total for BILLSTATUS + PLAW (priority): roughly 2.5-6 GB by the estimates above, so likely under 5 GB unless BILLSTATUS lands near the top of its range.

