Fix govinfo-bulk listing parser for GovInfo XML structure #43

Merged
v1d0b0t merged 1 commit into main from fix/govinfo-bulk-xml-parsing on Apr 3, 2026

@v1d0b0t v1d0b0t commented Apr 3, 2026

Follow-up to #42. The listing parser could not find directory entries because GovInfo nests them in a data > files > file wrapper that the parser did not handle.

Fixes:

  • findEntriesRoot: handle GovInfo's data > files > file wrapper
  • classifyListingEntry: check folder field from XML
  • readStringField: handle numeric values (congress numbers)
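
As a rough illustration of the first and third fixes, here is a minimal sketch, not the code from this PR. It assumes the XML has already been parsed into plain objects, that a lone `<file>` child may come back as an object rather than an array, and that numeric-looking text such as a congress number may be parsed as a number. The `XmlNode` type, the `isNode` helper, the function signatures, and the `name` field are hypothetical.

```typescript
// Hypothetical shape of a parsed listing; the real parser output may differ.
type XmlNode = { [key: string]: unknown };

// True for plain objects (not null, not arrays).
function isNode(value: unknown): value is XmlNode {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the entries found under data > files > file, or an empty list.
// A single <file> element is normalized to a one-element array.
function findEntriesRoot(parsed: XmlNode): XmlNode[] {
  const data = parsed["data"];
  if (!isNode(data)) return [];
  const files = data["files"];
  if (!isNode(files)) return [];
  const file = files["file"];
  if (Array.isArray(file)) return file.filter(isNode);
  return isNode(file) ? [file] : [];
}

// Read a field as a string, also accepting finite numbers, since a value
// like <name>118</name> may be parsed as the number 118.
function readStringField(entry: XmlNode, key: string): string | undefined {
  const value = entry[key];
  if (typeof value === "string") {
    const trimmed = value.trim();
    return trimmed.length > 0 ? trimmed : undefined;
  }
  if (typeof value === "number" && Number.isFinite(value)) {
    return String(value);
  }
  return undefined;
}

// Example: one congress number parsed as a number, one as a string.
const listing: XmlNode = {
  data: { files: { file: [{ name: 118 }, { name: "119" }] } },
};
const names = findEntriesRoot(listing).map((e) => readStringField(e, "name"));
```

Normalizing the single-element case inside `findEntriesRoot` keeps callers from having to branch on object versus array.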

Verified: all 12 congresses (108-119) now discovered correctly.

- findEntriesRoot: handle GovInfo's <data><files><file> structure
- classifyListingEntry: check 'folder' field from XML
- readStringField: handle numeric values (congress numbers)
- Pass raw entry to classifyListingEntry for folder detection
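
The folder check can be sketched as follows, again as an assumption-laden illustration, not the merged implementation. It assumes the raw entry carries a `folder` field whose value may be a boolean or the string "true"/"false" depending on the XML parser. The `RawEntry` and `EntryKind` types, the `isFolderFlag` helper, the signature, and the trailing-slash fallback are hypothetical.

```typescript
// Hypothetical types; the project's real definitions may differ.
type RawEntry = { [key: string]: unknown };
type EntryKind = "directory" | "file";

// Interpret a folder flag that may be a boolean or a string.
function isFolderFlag(value: unknown): boolean {
  if (typeof value === "boolean") return value;
  if (typeof value === "string") return value.trim().toLowerCase() === "true";
  return false;
}

// Classify an entry using the raw XML-derived object, so the folder
// field is still available at classification time.
function classifyListingEntry(name: string, raw: RawEntry): EntryKind {
  if (isFolderFlag(raw["folder"])) return "directory";
  // Fallback heuristic (assumption): a trailing slash marks a directory.
  return name.endsWith("/") ? "directory" : "file";
}

const congressKind = classifyListingEntry("118", { folder: "true" });
const documentKind = classifyListingEntry("example.xml", { folder: "false" });
```

Passing the raw entry, and not only the extracted name, is what lets the classifier see `folder` at all. A name like `118` has no extension or slash to hint that it is a directory.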
@v1d0b0t v1d0b0t merged commit ab0d835 into main Apr 3, 2026